eynollah

Mirror of https://github.com/qurator-spk/eynollah.git (last synced 2026-02-21 00:41:56 +01:00)


Robert Sachunsky 27f43c175f Merge branch 'main' into ro-fixes and resolve conflicts	2026-02-07 14:05:56 +01:00

major conflicts resolved manually:
- branches for non-`light` segmentation already removed in main
- Keras/TF setup and no TF1 sessions, esp. in new ModelZoo
- changes to binarizer and its CLI (`mode`, `overwrite`, `run_single()`)
- writer: `build...` w/ kwargs instead of positional
- training for segmentation/binarization/enhancement tasks:
  * drop unused `generate_data_from_folder()`
  * simplify `preprocess_imgs()`: turn `preprocess_img()`, `get_patches()` and `get_patches_num_scale_new()` into generators, only writing result files in the caller (top-level loop) instead of passing output directories and file counter
- training for new OCR task:
  * `train`: put keys into additional `config_params` where they belong, resp. (conditioned under existing keys), and w/ better documentation
  * `train`: add new keys as kwargs to `run()` to make usable
  * `utils`: instead of custom data loader `data_gen_ocr()`, re-use existing `preprocess_imgs()` (for cfg capture and top-level loop), but extended w/ new kwargs and calling new `preprocess_img_ocr()`; the latter as single-image generator (also much simplified)
  * `train`: use tf.data loader pipeline from that generator w/ standard mechanisms for batching, shuffling, prefetching etc.
  * `utils` and `train`: instead of `vectorize_label`, use `Dataset.padded_batch`
  * add TensorBoard callback and re-use our checkpoint callback
  * also use standard Keras top-level loop for training

still problematic (substantially unresolved):
- `Patches` now only w/ fixed implicit size (ignoring training config params)
- `PatchEncoder` now only w/ fixed implicit num patches and projection dim (ignoring training config params)
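The OCR items in this commit replace a custom loader and `vectorize_label` with a single-image generator whose output is batched by `Dataset.padded_batch`. The sketch below illustrates that data flow in plain Python only; it assumes nothing about eynollah's actual code. `preprocess_img_ocr_sketch()` and `padded_batches()` are hypothetical names, images are placeholder values, and `padded_batches()` merely imitates what `padded_batch` does by default (pad each label to the longest label in its batch, keep the final partial batch). In a real `tf.data` pipeline, the generator would instead be wrapped with `tf.data.Dataset.from_generator` and batched with `padded_batch`.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Sample = Tuple[Any, List[int]]


def preprocess_img_ocr_sketch(
    samples: Iterable[Tuple[Any, str]], char_to_id: Dict[str, int]
) -> Iterator[Sample]:
    """Hypothetical single-image generator (not eynollah's preprocess_img_ocr()).

    Yields one (image, label) pair at a time; labels are variable-length
    lists of character ids. Characters missing from char_to_id are skipped.
    """
    for image, text in samples:
        label = [char_to_id[c] for c in text if c in char_to_id]
        yield image, label


def _pad(batch: List[Sample], pad_value: int) -> Tuple[List[Any], List[List[int]]]:
    """Pad every label in the batch to the length of the longest one."""
    images = [image for image, _ in batch]
    max_len = max(len(label) for _, label in batch)
    labels = [label + [pad_value] * (max_len - len(label)) for _, label in batch]
    return images, labels


def padded_batches(
    pairs: Iterable[Sample], batch_size: int, pad_value: int = 0
) -> Iterator[Tuple[List[Any], List[List[int]]]]:
    """Group pairs into batches, padding labels per batch (not globally)."""
    batch: List[Sample] = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield _pad(batch, pad_value)
            batch = []
    if batch:
        # Keep the last, smaller batch (like drop_remainder=False).
        yield _pad(batch, pad_value)


# Usage: ids start at 1 so that 0 is free to act as the padding value.
char_to_id = {c: i + 1 for i, c in enumerate("abc")}
samples = [("img0", "ab"), ("img1", "abc"), ("img2", "c")]
for images, labels in padded_batches(
    preprocess_img_ocr_sketch(samples, char_to_id), batch_size=2
):
    print(images, labels)
# ['img0', 'img1'] [[1, 2, 0], [1, 2, 3]]
# ['img2'] [[3]]
```

Padding per batch rather than to one global maximum length keeps short-text batches small. Whatever padding value is used must not collide with a real character id, which is why the example reserves 0.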
Name	Last commit	Date
.gitkeep	code to produce models	2019-12-05 12:01:54 +01:00
config_params.json	The cnn-rnn ocr model can be trained now	2025-12-09 17:22:12 +01:00
config_params_docker.json	docker file to train model with desired cuda and cudnn	2025-06-25 18:24:16 +02:00
custom_config_page2label.json	scaling and cropping of labels and org images	2024-05-30 16:59:50 +02:00
Dockerfile	docker file to train model with desired cuda and cudnn	2025-06-25 18:24:16 +02:00
requirements.txt	training: use proper Keras callbacks and top-level loop	2026-01-29 03:01:57 +01:00
scales_enhancement.json	pass degrading scales for image enhancement as a json file	2024-05-28 10:01:17 +02:00