Merge branch 'main' into ro-fixes and resolve conflicts…

major conflicts resolved manually:

- branches for non-`light` segmentation already removed in main
- Keras/TF setup and no TF1 sessions, esp. in new ModelZoo
- changes to binarizer and its CLI (`mode`, `overwrite`, `run_single()`)
- writer: `build...` w/ kwargs instead of positional arguments
- training for segmentation/binarization/enhancement tasks:
  * drop unused `generate_data_from_folder()`
  * simplify `preprocess_imgs()`: turn `preprocess_img()`, `get_patches()`
    and `get_patches_num_scale_new()` into generators, only writing
    result files in the caller (top-level loop) instead of passing
    output directories and file counter
- training for new OCR task:
  * `train`: move keys into additional `config_params` where they belong
    (each conditioned under its existing key), and w/ better documentation
  * `train`: add new keys as kwargs to `run()` to make them usable
  * `utils`: instead of custom data loader `data_gen_ocr()`, re-use
    existing `preprocess_imgs()` (for cfg capture and top-level loop),
    but extended w/ new kwargs and calling new `preprocess_img_ocr()`;
    the latter as single-image generator (also much simplified)
  * `train`: use tf.data loader pipeline from that generator w/ standard
    mechanisms for batching, shuffling, prefetching etc.
  * `utils` and `train`: instead of `vectorize_label`, use `Dataset.padded_batch`
  * add TensorBoard callback and re-use our checkpoint callback
  * also use standard Keras top-level loop for training
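
The generator refactoring described above (no output directories or file counters passed down, files written only in the top-level loop) can be sketched as follows. This is a minimal illustration under assumptions, not the code in this commit: the function names mirror `get_patches()` and `preprocess_img()`, but their signatures, the list-of-lists image, the `flip` option and the `write_samples()` caller are hypothetical.

```python
import os
import tempfile


def get_patches(img, height, width):
    """Yield non-overlapping height x width patches of a 2-D list-of-lists image.

    Hypothetical stand-in: it only shows the generator shape, not eynollah's
    actual patch logic (border remainders are simply skipped here).
    """
    n_rows, n_cols = len(img), len(img[0])
    for top in range(0, n_rows - height + 1, height):
        for left in range(0, n_cols - width + 1, width):
            yield [row[left:left + width] for row in img[top:top + height]]


def preprocess_img(img, height, width, flip=False):
    """Yield every sample derived from one image; performs no file I/O."""
    yield from get_patches(img, height, width)
    if flip:
        # Assumed augmentation for illustration: horizontal mirror.
        flipped = [row[::-1] for row in img]
        yield from get_patches(flipped, height, width)


def write_samples(images, out_dir, height, width, flip=False):
    """Top-level loop: the only place that knows paths and the file counter."""
    count = 0
    for img in images:
        for patch in preprocess_img(img, height, width, flip=flip):
            path = os.path.join(out_dir, f"img_{count}.txt")
            with open(path, "w") as f:
                f.write("\n".join(" ".join(map(str, row)) for row in patch))
            count += 1
    return count


image = [[r * 4 + c for c in range(4)] for r in range(4)]
with tempfile.TemporaryDirectory() as tmp:
    n_written = write_samples([image], tmp, height=2, width=2, flip=True)
    written_files = sorted(os.listdir(tmp))
```

Because the inner functions only yield samples, the same generator can in principle be consumed by a file-writing loop like the one above or wrapped with `tf.data.Dataset.from_generator` for batching, shuffling and prefetching.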

still problematic (substantially unresolved):
- `Patches` now only w/ fixed implicit size
  (ignoring training config params)
- `PatchEncoder` now only w/ fixed implicit num patches and projection dim
  (ignoring training config params)
Robert Sachunsky 2026-02-07 14:05:56 +01:00
commit 27f43c175f
77 changed files with 5597 additions and 4952 deletions


@@ -1,59 +0,0 @@
# Training eynollah
This README explains the technical details of how to set up and run training, for detailed information on parameterization, see [`docs/train.md`](../docs/train.md)
## Introduction
This folder contains the source code for training an encoder model for document image segmentation.
## Installation
Clone the repository and install eynollah along with the dependencies necessary for training:
```sh
git clone https://github.com/qurator-spk/eynollah
cd eynollah
pip install '.[training]'
```
### Pretrained encoder
Download our pretrained weights and add them to a `train/pretrained_model` folder:
```sh
cd train
wget -O pretrained_model.tar.gz https://zenodo.org/records/17243320/files/pretrained_model_v0_5_1.tar.gz?download=1
tar xf pretrained_model.tar.gz
```
### Binarization training data
A small sample of training data for binarization experiment can be found [on
zenodo](https://zenodo.org/records/17243320/files/training_data_sample_binarization_v0_5_1.tar.gz?download=1),
which contains `images` and `labels` folders.
### Helpful tools
* [`pagexml2img`](https://github.com/qurator-spk/page2img)
> Tool to extract 2-D or 3-D RGB images from PAGE-XML data. In the former case, the output will be 1 2-D image array which each class has filled with a pixel value. In the case of a 3-D RGB image,
each class will be defined with a RGB value and beside images, a text file of classes will also be produced.
* [`cocoSegmentationToPng`](https://github.com/nightrome/cocostuffapi/blob/17acf33aef3c6cc2d6aca46dcf084266c2778cf0/PythonAPI/pycocotools/cocostuffhelper.py#L130)
> Convert COCO GT or results for a single image to a segmentation map and write it to disk.
* [`ocrd-segment-extract-pages`](https://github.com/OCR-D/ocrd_segment/blob/master/ocrd_segment/extract_pages.py)
> Extract region classes and their colours in mask (pseg) images. Allows the color map as free dict parameter, and comes with a default that mimics PageViewer's coloring for quick debugging; it also warns when regions do overlap.
### Train using Docker
Build the Docker image:
```bash
cd train
docker build -t model-training .
```
Run Docker image
```bash
cd train
docker run --gpus all -v $PWD:/entry_point_dir model-training
```


@@ -1,31 +1,50 @@
{
"backbone_type" : "transformer",
"task": "segmentation",
"task": "cnn-rnn-ocr",
"n_classes" : 2,
"n_epochs" : 0,
"input_height" : 448,
"input_width" : 448,
"max_len": 280,
"n_epochs" : 3,
"input_height" : 32,
"input_width" : 512,
"weight_decay" : 1e-6,
"n_batch" : 1,
"learning_rate": 1e-4,
"n_batch" : 4,
"learning_rate": 1e-5,
"save_interval": 1500,
"patches" : false,
"pretraining" : true,
"augmentation" : true,
"flip_aug" : false,
"blur_aug" : false,
"blur_aug" : true,
"scaling" : false,
"adding_rgb_background": true,
"adding_rgb_foreground": true,
"add_red_textlines": false,
"channels_shuffling": false,
"degrading": false,
"brightening": false,
"add_red_textlines": true,
"white_noise_strap": true,
"textline_right_in_depth": true,
"textline_left_in_depth": true,
"textline_up_in_depth": true,
"textline_down_in_depth": true,
"textline_right_in_depth_bin": true,
"textline_left_in_depth_bin": true,
"textline_up_in_depth_bin": true,
"textline_down_in_depth_bin": true,
"bin_deg": true,
"textline_skewing": true,
"textline_skewing_bin": true,
"channels_shuffling": true,
"degrading": true,
"brightening": true,
"binarization" : true,
"pepper_aug": true,
"pepper_bin_aug": true,
"image_inversion": true,
"scaling_bluring" : false,
"scaling_binarization" : false,
"scaling_flip" : false,
"rotation": false,
"rotation_not_90": false,
"color_padding_rotation": true,
"padding_white": true,
"rotation_not_90": true,
"transformer_num_patches_xy": [56, 56],
"transformer_patchsize_x": 4,
"transformer_patchsize_y": 4,
@@ -34,13 +53,18 @@
"transformer_layers": 1,
"transformer_num_heads": 1,
"transformer_cnn_first": false,
"blur_k" : ["blur","guass","median"],
"blur_k" : ["blur","gauss","median"],
"padd_colors" : ["white", "black"],
"scales" : [0.6, 0.7, 0.8, 0.9],
"brightness" : [1.3, 1.5, 1.7, 2],
"degrade_scales" : [0.2, 0.4],
"pepper_indexes": [0.01, 0.005],
"skewing_amplitudes" : [5, 8],
"flip_index" : [0, 1, -1],
"shuffle_indexes" : [ [0,2,1], [1,2,0], [1,0,2] , [2,1,0]],
"thetha" : [5, -5],
"thetha" : [0.1, 0.2, -0.1, -0.2],
"thetha_padd": [-0.6, -1, -1.4, -1.8, 0.6, 1, 1.4, 1.8],
"white_padds" : [0.1, 0.3, 0.5, 0.7, 0.9],
"number_of_backgrounds_per_image": 2,
"continue_training": false,
"index_start" : 0,
@@ -48,11 +72,12 @@
"weighted_loss": false,
"is_loss_soft_dice": false,
"data_is_provided": false,
"dir_train": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/train_new",
"dir_train": "/home/vahid/extracted_lines/1919_bin/train",
"dir_eval": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/eval_new",
"dir_output": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/output_new",
"dir_output": "/home/vahid/extracted_lines/1919_bin/output",
"dir_rgb_backgrounds": "/home/vahid/Documents/1_2_test_eynollah/set_rgb_background",
"dir_rgb_foregrounds": "/home/vahid/Documents/1_2_test_eynollah/out_set_rgb_foreground",
"dir_img_bin": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/train_new/images_bin"
"dir_img_bin": "/home/vahid/extracted_lines/1919_bin/images_bin",
"characters_txt_file":"/home/vahid/Downloads/models_eynollah/model_eynollah_ocr_cnnrnn_20250930/characters_org.txt"
}
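
When a config like this is resolved by hand after a merge, both sides of a conflict can end up in the file under the same key (for example two `"task"` entries). Python's `json` module accepts such input and silently keeps the last value. The following is a minimal sketch of a stricter loader; `load_config_strict()` is a hypothetical helper, not part of eynollah, and the sample string is made up for illustration.

```python
import json


def load_config_strict(text):
    """Parse JSON text, raising ValueError if any object repeats a key."""
    def reject_duplicates(pairs):
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(f"duplicate key in config: {key!r}")
            seen[key] = value
        return seen

    # object_pairs_hook receives each object's (key, value) pairs in order.
    return json.loads(text, object_pairs_hook=reject_duplicates)


# Made-up sample with both sides of a conflict left in place.
merged_badly = '{"task": "segmentation", "task": "cnn-rnn-ocr", "n_batch": 4}'

# The default parser keeps the last occurrence without complaint.
lenient = json.loads(merged_badly)

try:
    load_config_strict(merged_badly)
    strict_error = None
except ValueError as err:
    strict_error = str(err)
```

Running such a check before training would surface a leftover conflict early, rather than letting the later value win unnoticed.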