Merge branch 'main' into ro-fixes and resolve conflicts…

major conflicts resolved manually:

- branches for non-`light` segmentation already removed in main
- Keras/TF setup without TF1 sessions, esp. in the new ModelZoo
- changes to binarizer and its CLI (`mode`, `overwrite`, `run_single()`)
- writer: `build...` w/ kwargs instead of positional
- training for segmentation/binarization/enhancement tasks:
  * drop unused `generate_data_from_folder()`
  * simplify `preprocess_imgs()`: turn `preprocess_img()`, `get_patches()`
    and `get_patches_num_scale_new()` into generators, writing result
    files only in the caller (top-level loop) instead of passing
    output directories and a file counter
- training for new OCR task:
  * `train`: put the additional keys into `config_params` where they respectively
    belong (conditioned under existing keys), w/ better documentation
  * `train`: add new keys as kwargs to `run()` to make usable
  * `utils`: instead of the custom data loader `data_gen_ocr()`, re-use the
    existing `preprocess_imgs()` (for cfg capture and top-level loop),
    but extended w/ new kwargs and calling the new `preprocess_img_ocr()`;
    the latter is a single-image generator (also much simplified)
  * `train`: use a tf.data loader pipeline built from that generator, w/ standard
    mechanisms for batching, shuffling, prefetching etc.
    (see the sketch after this list)
  * `utils` and `train`: instead of `vectorize_label`, use `Dataset.padded_batch`
  * add TensorBoard callback and re-use our checkpoint callback
  * also use standard Keras top-level loop for training
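
To make the OCR data path above concrete, here is a minimal sketch, assuming hypothetical names, shapes and placeholder paths (this is not the actual eynollah implementation): a single-image generator in the spirit of `preprocess_img_ocr()` feeds a tf.data pipeline with shuffling, `padded_batch` for the variable-length labels, and prefetching, with TensorBoard and checkpoint callbacks around a standard Keras `fit()` loop.

```python
# Hedged sketch only: names, shapes and paths below are placeholders,
# not the actual eynollah code.
import tensorflow as tf

image_paths = ["line_0001.png", "line_0002.png"]  # placeholder line images
gt_texts = ["sample", "text"]                     # placeholder transcriptions
charset = sorted(set("".join(gt_texts)))

def ocr_sample_generator(paths, texts, charset):
    """Yield one (image, label) pair at a time, like a single-image generator."""
    table = {c: i + 1 for i, c in enumerate(charset)}  # 0 is reserved for padding
    for path, text in zip(paths, texts):
        img = tf.io.decode_png(tf.io.read_file(path), channels=1)
        img = tf.image.convert_image_dtype(img, tf.float32)
        img = tf.image.resize(img, [32, 256])
        label = tf.constant([table[c] for c in text], dtype=tf.int32)
        yield img, label

dataset = tf.data.Dataset.from_generator(
    lambda: ocr_sample_generator(image_paths, gt_texts, charset),
    output_signature=(
        tf.TensorSpec(shape=(32, 256, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
    ),
)
dataset = (
    dataset.shuffle(buffer_size=1024)
    # padded_batch pads the variable-length label sequences per batch
    .padded_batch(batch_size=32, padding_values=(0.0, 0))
    .prefetch(tf.data.AUTOTUNE)
)

callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir="logs"),
    tf.keras.callbacks.ModelCheckpoint("checkpoints/ocr_{epoch:02d}.keras"),
]
# model.fit(dataset, epochs=num_epochs, callbacks=callbacks)
```

The `padded_batch` call is what takes over from a custom `vectorize_label` step: it pads each batch's label sequences to that batch's maximum length.
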

still problematic (substantially unresolved):
- `Patches` now only w/ fixed implicit size
  (ignoring training config params)
- `PatchEncoder` now only w/ fixed implicit num patches and projection dim
  (ignoring training config params); see the sketch below
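
To make the unresolved issue concrete, here is a minimal hypothetical stand-in (not the actual eynollah `Patches`/`PatchEncoder` code) showing what "fixed implicit size" means: the geometry is a hard-coded default on the layer, so values from the training config never reach it.

```python
# Hypothetical, simplified Patches layer illustrating the problem: the patch
# size is a hard-coded default, so training config params are ignored.
import tensorflow as tf

class Patches(tf.keras.layers.Layer):
    def __init__(self, patch_size=16, **kwargs):  # implicit fixed size
        super().__init__(**kwargs)
        self.patch_size = patch_size

    def call(self, images):
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        # flatten the spatial grid of patches into a sequence per image
        batch = tf.shape(images)[0]
        dim = tf.shape(patches)[-1]
        return tf.reshape(patches, [batch, -1, dim])

# Resolving the issue would mean constructing the layer from the config, e.g.
# Patches(patch_size=config_params["patch_size"]), and analogously threading
# num_patches and projection_dim into PatchEncoder.
```
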
Robert Sachunsky 2026-02-07 14:05:56 +01:00
commit 27f43c175f
77 changed files with 5597 additions and 4952 deletions


@@ -1,3 +1,41 @@
# Prerequisites
## 1. Install Eynollah with training dependencies
Clone the repository and install eynollah along with the dependencies necessary for training:
```sh
git clone https://github.com/qurator-spk/eynollah
cd eynollah
pip install '.[training]'
```
## 2. Pretrained encoder
Download our pretrained weights and add them to a `train/pretrained_model` folder:
```sh
cd train
wget -O pretrained_model.tar.gz "https://zenodo.org/records/17243320/files/pretrained_model_v0_5_1.tar.gz?download=1"
tar xf pretrained_model.tar.gz
```
## 3. Example data
### Binarization
A small sample of training data for the binarization experiment can be found on [Zenodo](https://zenodo.org/records/17243320/files/training_data_sample_binarization_v0_5_1.tar.gz?download=1);
it contains `images` and `labels` folders.
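
After extracting the sample, a quick pairing check can be run; this is a hedged sketch in which the extraction path and the assumption that label files share their image's basename are hypothetical:

```python
# Hypothetical sanity check that every sample image has a matching label file.
from pathlib import Path

data_dir = Path("train/training_data_sample_binarization")  # assumed extraction path
for img_path in sorted((data_dir / "images").iterdir()):
    label_path = data_dir / "labels" / img_path.name  # assumes matching basenames
    status = "ok" if label_path.exists() else "MISSING LABEL"
    print(f"{img_path.name}: {status}")
```
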
## 4. Helpful tools
* [`pagexml2img`](https://github.com/qurator-spk/page2img)
> Tool to extract 2-D or 3-D RGB images from PAGE-XML data. In the former case, the output is a single 2-D image array in which each class is filled with a distinct pixel value. In the case of a 3-D RGB image,
> each class is defined by an RGB value, and besides the images, a text file listing the classes is also produced.
* [`cocoSegmentationToPng`](https://github.com/nightrome/cocostuffapi/blob/17acf33aef3c6cc2d6aca46dcf084266c2778cf0/PythonAPI/pycocotools/cocostuffhelper.py#L130)
> Convert COCO GT or results for a single image to a segmentation map and write it to disk.
* [`ocrd-segment-extract-pages`](https://github.com/OCR-D/ocrd_segment/blob/master/ocrd_segment/extract_pages.py)
> Extract region classes and their colours in mask (pseg) images. Allows the colour map as a free dict parameter, and comes with a default that mimics PageViewer's colouring for quick debugging; it also warns when regions overlap.
# Training documentation
This document aims to assist users in preparing training datasets, training models, and