Merge branch 'main' into ro-fixes and resolve conflicts…

major conflicts resolved manually:

- branches for non-`light` segmentation already removed in main
- Keras/TF setup and no TF1 sessions, esp. in new ModelZoo
- changes to binarizer and its CLI (`mode`, `overwrite`, `run_single()`)
- writer: `build...` w/ kwargs instead of positional
- training for segmentation/binarization/enhancement tasks:
  * drop unused `generate_data_from_folder()`
  * simplify `preprocess_imgs()`: turn `preprocess_img()`, `get_patches()` and `get_patches_num_scale_new()` into generators, only writing result files in the caller (top-level loop) instead of passing output directories and file counter
- training for new OCR task:
  * `train`: put keys into additional `config_params` where they belong, resp. (conditioned under existing keys), and w/ better documentation
  * `train`: add new keys as kwargs to `run()` to make usable
  * `utils`: instead of custom data loader `data_gen_ocr()`, re-use existing `preprocess_imgs()` (for cfg capture and top-level loop), but extended w/ new kwargs and calling new `preprocess_img_ocr()`; the latter as single-image generator (also much simplified)
  * `train`: use tf.data loader pipeline from that generator w/ standard mechanisms for batching, shuffling, prefetching etc.
  * `utils` and `train`: instead of `vectorize_label`, use `Dataset.padded_batch`
  * add TensorBoard callback and re-use our checkpoint callback
  * also use standard Keras top-level loop for training

still problematic (substantially unresolved):

- `Patches` now only w/ fixed implicit size (ignoring training config params)
- `PatchEncoder` now only w/ fixed implicit num patches and projection dim (ignoring training config params)
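
The generator refactoring described for `preprocess_imgs()` can be pictured roughly as follows. This is a minimal stand-alone sketch, not the code from this commit: the function names are borrowed from the message, but every signature, the nested-list image type, the `flip` switch and the file naming are assumptions made for illustration only.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Toy stand-in for an image array: rows of pixel values (assumption, not the real type).
Image = List[List[int]]
Pair = Tuple[Image, Image]


def get_patches(img: Image, label: Image, height: int, width: int) -> Iterator[Pair]:
    """Yield non-overlapping (image, label) patch pairs; performs no file I/O."""
    rows, cols = len(img), len(img[0])
    for top in range(0, rows - height + 1, height):
        for left in range(0, cols - width + 1, width):
            img_patch = [row[left:left + width] for row in img[top:top + height]]
            lab_patch = [row[left:left + width] for row in label[top:top + height]]
            yield img_patch, lab_patch


def preprocess_img(img: Image, label: Image, height: int, width: int,
                   flip: bool = False) -> Iterator[Pair]:
    """Yield patches of one image, optionally followed by a flipped variant."""
    yield from get_patches(img, label, height, width)
    if flip:
        flipped_img = [row[::-1] for row in img]
        flipped_lab = [row[::-1] for row in label]
        yield from get_patches(flipped_img, flipped_lab, height, width)


def preprocess_imgs(pairs: Iterable[Pair], dir_out: str, height: int, width: int,
                    flip: bool = False) -> int:
    """Top-level loop: the only place that knows output paths and the file counter."""
    out = Path(dir_out)
    (out / "images").mkdir(parents=True, exist_ok=True)
    (out / "labels").mkdir(parents=True, exist_ok=True)
    count = 0
    for img, label in pairs:
        for img_patch, lab_patch in preprocess_img(img, label, height, width, flip):
            # Hypothetical file naming; plain text instead of real image encoding.
            (out / "images" / f"img_{count}.txt").write_text(repr(img_patch))
            (out / "labels" / f"img_{count}.txt").write_text(repr(lab_patch))
            count += 1
    return count
```

Since the inner functions only yield results, they no longer need output directories or a running counter as arguments, and the same generators could in principle feed a different consumer (such as a data loader) without touching the disk.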
2026-02-21 00:41:56 +01:00 · 2026-02-07 14:05:56 +01:00 · 2026-02-07 14:05:56 +01:00 · 27f43c175f
commit 27f43c175f
parent 6944d31617 586077fbcd
77 changed files with 5597 additions and 4952 deletions
--- a/train/README.md
+++ b/train/README.md
@@ -1,59 +0,0 @@
-# Training eynollah
-
-This README explains the technical details of how to set up and run training, for detailed information on parameterization, see [`docs/train.md`](../docs/train.md)
-
-## Introduction
-
-This folder contains the source code for training an encoder model for document image segmentation.
-
-## Installation
-
-Clone the repository and install eynollah along with the dependencies necessary for training:
-
-```sh
-git clone https://github.com/qurator-spk/eynollah
-cd eynollah
-pip install '.[training]'
-```
-
-### Pretrained encoder
-
-Download our pretrained weights and add them to a `train/pretrained_model` folder:   
-
-```sh
-cd train
-wget -O pretrained_model.tar.gz https://zenodo.org/records/17243320/files/pretrained_model_v0_5_1.tar.gz?download=1
-tar xf pretrained_model.tar.gz
-```
-
-### Binarization training data
-
-A small sample of training data for binarization experiment can be found [on
-zenodo](https://zenodo.org/records/17243320/files/training_data_sample_binarization_v0_5_1.tar.gz?download=1),
-which contains `images` and `labels` folders.
-
-### Helpful tools
-
-* [`pagexml2img`](https://github.com/qurator-spk/page2img)
-> Tool to extract 2-D or 3-D RGB images from PAGE-XML data. In the former case, the output will be 1 2-D image array which each class has filled with a pixel value. In the case of a 3-D RGB image, 
-each class will be defined with a RGB value and beside images, a text file of classes will also be produced.
-* [`cocoSegmentationToPng`](https://github.com/nightrome/cocostuffapi/blob/17acf33aef3c6cc2d6aca46dcf084266c2778cf0/PythonAPI/pycocotools/cocostuffhelper.py#L130)
-> Convert COCO GT or results for a single image to a segmentation map and write it to disk.
-* [`ocrd-segment-extract-pages`](https://github.com/OCR-D/ocrd_segment/blob/master/ocrd_segment/extract_pages.py)
-> Extract region classes and their colours in mask (pseg) images. Allows the color map as free dict parameter, and comes with a default that mimics PageViewer's coloring for quick debugging; it also warns when regions do overlap.
-
-### Train using Docker
-
-Build the Docker image:
-
-```bash
-cd train
-docker build -t model-training .
-```
-
-Run Docker image 
-
-```bash
-cd train
-docker run --gpus all -v $PWD:/entry_point_dir model-training
-```
--- a/train/config_params.json
+++ b/train/config_params.json
@@ -1,31 +1,50 @@
 {
    "backbone_type" : "transformer",
-    "task": "segmentation",
+    "task": "cnn-rnn-ocr",
    "n_classes" : 2,
-    "n_epochs" : 0,
-    "input_height" : 448,
-    "input_width" : 448,
+    "max_len": 280,
+    "n_epochs" : 3,
+    "input_height" : 32,
+    "input_width" : 512,
    "weight_decay" : 1e-6,
-    "n_batch" : 1,
-    "learning_rate": 1e-4,
+    "n_batch" : 4,
+    "learning_rate": 1e-5,
+    "save_interval": 1500,
    "patches" : false,
    "pretraining" : true,
    "augmentation" : true,
    "flip_aug" : false,
-    "blur_aug" : false,
+    "blur_aug" : true,
    "scaling" : false,
    "adding_rgb_background": true,
    "adding_rgb_foreground": true,
-    "add_red_textlines": false,
-    "channels_shuffling": false,
-    "degrading": false,
-    "brightening": false,
+    "add_red_textlines": true,
+    "white_noise_strap": true,
+    "textline_right_in_depth": true,
+    "textline_left_in_depth": true,
+    "textline_up_in_depth": true,
+    "textline_down_in_depth": true,
+    "textline_right_in_depth_bin": true,
+    "textline_left_in_depth_bin": true,
+    "textline_up_in_depth_bin": true,
+    "textline_down_in_depth_bin": true,
+    "bin_deg": true,
+    "textline_skewing": true,
+    "textline_skewing_bin": true,
+    "channels_shuffling": true,
+    "degrading": true,
+    "brightening": true,
    "binarization" : true,
+    "pepper_aug": true,
+    "pepper_bin_aug": true,
+    "image_inversion": true,
    "scaling_bluring" : false,
    "scaling_binarization" : false,
    "scaling_flip" : false,
    "rotation": false,
-    "rotation_not_90": false,
+    "color_padding_rotation": true,
+    "padding_white": true,
+    "rotation_not_90": true,
    "transformer_num_patches_xy": [56, 56],
    "transformer_patchsize_x": 4,
    "transformer_patchsize_y": 4,
@@ -34,13 +53,18 @@
    "transformer_layers": 1,
    "transformer_num_heads": 1,
    "transformer_cnn_first": false,
-    "blur_k" : ["blur","guass","median"],
+    "blur_k" : ["blur","gauss","median"],
+    "padd_colors" : ["white", "black"],
    "scales" : [0.6, 0.7, 0.8, 0.9],
    "brightness" : [1.3, 1.5, 1.7, 2],
    "degrade_scales" : [0.2, 0.4],
+    "pepper_indexes": [0.01, 0.005],
+    "skewing_amplitudes" : [5, 8],
    "flip_index" : [0, 1, -1],
    "shuffle_indexes" : [ [0,2,1], [1,2,0], [1,0,2] , [2,1,0]],
-    "thetha" : [5, -5],
+    "thetha" : [0.1, 0.2, -0.1, -0.2],
+    "thetha_padd": [-0.6, -1, -1.4, -1.8, 0.6, 1, 1.4, 1.8],
+    "white_padds" : [0.1, 0.3, 0.5, 0.7, 0.9],
    "number_of_backgrounds_per_image": 2,
    "continue_training": false,
    "index_start" : 0,
@@ -48,11 +72,12 @@
    "weighted_loss": false,
    "is_loss_soft_dice": false,
    "data_is_provided": false,
-    "dir_train": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/train_new",
+    "dir_train": "/home/vahid/extracted_lines/1919_bin/train",
    "dir_eval": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/eval_new",
-    "dir_output": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/output_new",
+    "dir_output": "/home/vahid/extracted_lines/1919_bin/output",
    "dir_rgb_backgrounds": "/home/vahid/Documents/1_2_test_eynollah/set_rgb_background",
    "dir_rgb_foregrounds": "/home/vahid/Documents/1_2_test_eynollah/out_set_rgb_foreground",
-    "dir_img_bin": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/train_new/images_bin"
+    "dir_img_bin": "/home/vahid/extracted_lines/1919_bin/images_bin",
+    "characters_txt_file":"/home/vahid/Downloads/models_eynollah/model_eynollah_ocr_cnnrnn_20250930/characters_org.txt"
    
 }
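
Since the new `config_params.json` adds many boolean switches next to list-valued parameters, a small pre-flight check may help catch incomplete configs before training starts. The sketch below is hypothetical: the pairing of each flag with its parameter list is inferred from the key names in the diff above, the choice of `max_len` and `characters_txt_file` as OCR-specific keys is likewise an assumption, and `missing_keys()` is not a function from this repository.

```python
import json

# Assumed pairing of augmentation flags with their list-valued parameters,
# inferred from key names only (not confirmed by the training code).
FLAG_PARAMS = {
    "blur_aug": "blur_k",
    "brightening": "brightness",
    "degrading": "degrade_scales",
    "channels_shuffling": "shuffle_indexes",
    "rotation_not_90": "thetha",
    "pepper_aug": "pepper_indexes",
    "textline_skewing": "skewing_amplitudes",
    "padding_white": "white_padds",
    "color_padding_rotation": "thetha_padd",
}

# Keys assumed to be needed only when "task" is "cnn-rnn-ocr".
OCR_KEYS = ("max_len", "characters_txt_file")


def missing_keys(config: dict) -> list:
    """Return parameter keys that an enabled flag or the OCR task would need but are absent."""
    missing = [param for flag, param in FLAG_PARAMS.items()
               if config.get(flag) and param not in config]
    if config.get("task") == "cnn-rnn-ocr":
        missing += [key for key in OCR_KEYS if key not in config]
    return missing


# Deliberately incomplete sample config for demonstration.
sample = json.loads("""
{
    "task": "cnn-rnn-ocr",
    "max_len": 280,
    "pepper_aug": true,
    "pepper_indexes": [0.01, 0.005],
    "textline_skewing": true
}
""")
print(missing_keys(sample))  # ['skewing_amplitudes', 'characters_txt_file']
```

A check of this kind would sit before any call into the training entry point, so that a missing list parameter surfaces as a clear message rather than as a `KeyError` deep inside augmentation.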