Mirror of https://github.com/qurator-spk/eynollah.git (synced 2026-02-21 00:41:56 +01:00)
Merge branch 'main' into ro-fixes and resolve conflicts…
major conflicts resolved manually:
- branches for non-`light` segmentation already removed in main
- Keras/TF setup and no TF1 sessions, esp. in new ModelZoo
- changes to binarizer and its CLI (`mode`, `overwrite`, `run_single()`)
- writer: `build...` w/ kwargs instead of positional
- training for segmentation/binarization/enhancement tasks:
* drop unused `generate_data_from_folder()`
* simplify `preprocess_imgs()`: turn `preprocess_img()`, `get_patches()`
and `get_patches_num_scale_new()` into generators, only writing
result files in the caller (top-level loop) instead of passing
output directories and file counter
- training for new OCR task:
* `train`: put keys into additional `config_params` where they belong
(conditioned under existing keys, respectively), and w/ better documentation
* `train`: add new keys as kwargs to `run()` to make usable
* `utils`: instead of custom data loader `data_gen_ocr()`, re-use
existing `preprocess_imgs()` (for cfg capture and top-level loop),
but extended w/ new kwargs and calling new `preprocess_img_ocr()`;
the latter as single-image generator (also much simplified)
* `train`: use tf.data loader pipeline from that generator w/ standard
mechanisms for batching, shuffling, prefetching etc. (see the sketch below)
* `utils` and `train`: instead of `vectorize_label`, use `Dataset.padded_batch`
* add TensorBoard callback and re-use our checkpoint callback
* also use standard Keras top-level loop for training
still problematic (substantially unresolved):
- `Patches` now only w/ fixed implicit size
(ignoring training config params)
- `PatchEncoder` now only w/ fixed implicit num patches and projection dim
(ignoring training config params)
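
For orientation, here is a minimal, hypothetical sketch of the OCR data path described above: a single-image generator (named `preprocess_img_ocr()` after the message, but with an assumed signature and label encoding) wrapped into a `tf.data` pipeline that uses `padded_batch` for the variable-length labels plus standard shuffling/prefetching, and fed to the standard Keras `fit()` loop with TensorBoard and checkpoint callbacks. This is an illustration under those assumptions, not the repository's actual implementation.

```python
# Hypothetical sketch only: names follow the commit message, but signatures,
# label encoding and the model are assumptions, not eynollah's actual API.
import tensorflow as tf

def preprocess_img_ocr(image_paths, gt_texts, char_to_id,
                       input_height=32, input_width=512):
    """Single-image generator: yields one (image, label_ids) pair at a time."""
    for path, text in zip(image_paths, gt_texts):
        img = tf.io.decode_image(tf.io.read_file(path), channels=3,
                                 expand_animations=False)
        img = tf.image.resize(img, (input_height, input_width)) / 255.0
        label = [char_to_id[c] for c in text]
        yield img, tf.constant(label, dtype=tf.int64)

def make_dataset(image_paths, gt_texts, char_to_id,
                 n_batch=4, input_height=32, input_width=512):
    ds = tf.data.Dataset.from_generator(
        lambda: preprocess_img_ocr(image_paths, gt_texts, char_to_id,
                                   input_height, input_width),
        output_signature=(
            tf.TensorSpec(shape=(input_height, input_width, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(None,), dtype=tf.int64)))
    # padded_batch pads the variable-length label sequences (instead of a
    # custom vectorize_label step); shuffle/prefetch are the standard mechanisms.
    return ds.shuffle(1000).padded_batch(n_batch).prefetch(tf.data.AUTOTUNE)

# Standard Keras top-level loop with TensorBoard and checkpoint callbacks:
# model.fit(make_dataset(train_imgs, train_texts, char_to_id),
#           epochs=3,
#           callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs"),
#                      tf.keras.callbacks.ModelCheckpoint("ckpt.weights.h5",
#                                                         save_weights_only=True)])
```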
Commit 27f43c175f
77 changed files with 5597 additions and 4952 deletions
@@ -1,59 +0,0 @@
# Training eynollah

This README explains the technical details of how to set up and run training; for detailed information on parameterization, see [`docs/train.md`](../docs/train.md).

## Introduction

This folder contains the source code for training an encoder model for document image segmentation.

## Installation

Clone the repository and install eynollah along with the dependencies necessary for training:

```sh
git clone https://github.com/qurator-spk/eynollah
cd eynollah
pip install '.[training]'
```

### Pretrained encoder

Download our pretrained weights and add them to a `train/pretrained_model` folder:

```sh
cd train
wget -O pretrained_model.tar.gz https://zenodo.org/records/17243320/files/pretrained_model_v0_5_1.tar.gz?download=1
tar xf pretrained_model.tar.gz
```

### Binarization training data

A small sample of training data for the binarization experiment can be found [on zenodo](https://zenodo.org/records/17243320/files/training_data_sample_binarization_v0_5_1.tar.gz?download=1); it contains `images` and `labels` folders.
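
It can be fetched and unpacked in the same way as the pretrained model above; a minimal sketch (the local file name is an arbitrary choice):

```sh
wget -O training_data_sample_binarization.tar.gz "https://zenodo.org/records/17243320/files/training_data_sample_binarization_v0_5_1.tar.gz?download=1"
tar xf training_data_sample_binarization.tar.gz
```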

### Helpful tools

* [`pagexml2img`](https://github.com/qurator-spk/page2img)
> Tool to extract 2-D or 3-D RGB images from PAGE-XML data. In the 2-D case, the output is a single image array in which each class is filled with a distinct pixel value. In the 3-D RGB case,
each class is assigned an RGB value, and besides the images, a text file listing the classes is also produced.
* [`cocoSegmentationToPng`](https://github.com/nightrome/cocostuffapi/blob/17acf33aef3c6cc2d6aca46dcf084266c2778cf0/PythonAPI/pycocotools/cocostuffhelper.py#L130)
> Convert COCO GT or results for a single image to a segmentation map and write it to disk.
* [`ocrd-segment-extract-pages`](https://github.com/OCR-D/ocrd_segment/blob/master/ocrd_segment/extract_pages.py)
> Extract region classes and their colours in mask (pseg) images. Allows passing the color map as a free dict parameter, and comes with a default that mimics PageViewer's coloring for quick debugging; it also warns when regions overlap.

### Train using Docker

Build the Docker image:

```bash
cd train
docker build -t model-training .
```

Run the Docker image:

```bash
cd train
docker run --gpus all -v $PWD:/entry_point_dir model-training
```
@@ -1,31 +1,50 @@
{
"backbone_type" : "transformer",
"task": "segmentation",
"task": "cnn-rnn-ocr",
"n_classes" : 2,
"n_epochs" : 0,
"input_height" : 448,
"input_width" : 448,
"max_len": 280,
"n_epochs" : 3,
"input_height" : 32,
"input_width" : 512,
"weight_decay" : 1e-6,
"n_batch" : 1,
"learning_rate": 1e-4,
"n_batch" : 4,
"learning_rate": 1e-5,
"save_interval": 1500,
"patches" : false,
"pretraining" : true,
"augmentation" : true,
"flip_aug" : false,
"blur_aug" : false,
"blur_aug" : true,
"scaling" : false,
"adding_rgb_background": true,
"adding_rgb_foreground": true,
"add_red_textlines": false,
"channels_shuffling": false,
"degrading": false,
"brightening": false,
"add_red_textlines": true,
"white_noise_strap": true,
"textline_right_in_depth": true,
"textline_left_in_depth": true,
"textline_up_in_depth": true,
"textline_down_in_depth": true,
"textline_right_in_depth_bin": true,
"textline_left_in_depth_bin": true,
"textline_up_in_depth_bin": true,
"textline_down_in_depth_bin": true,
"bin_deg": true,
"textline_skewing": true,
"textline_skewing_bin": true,
"channels_shuffling": true,
"degrading": true,
"brightening": true,
"binarization" : true,
"pepper_aug": true,
"pepper_bin_aug": true,
"image_inversion": true,
"scaling_bluring" : false,
"scaling_binarization" : false,
"scaling_flip" : false,
"rotation": false,
"rotation_not_90": false,
"color_padding_rotation": true,
"padding_white": true,
"rotation_not_90": true,
"transformer_num_patches_xy": [56, 56],
"transformer_patchsize_x": 4,
"transformer_patchsize_y": 4,
@@ -34,13 +53,18 @@
"transformer_layers": 1,
|
||||
"transformer_num_heads": 1,
|
||||
"transformer_cnn_first": false,
|
||||
"blur_k" : ["blur","guass","median"],
|
||||
"blur_k" : ["blur","gauss","median"],
|
||||
"padd_colors" : ["white", "black"],
|
||||
"scales" : [0.6, 0.7, 0.8, 0.9],
|
||||
"brightness" : [1.3, 1.5, 1.7, 2],
|
||||
"degrade_scales" : [0.2, 0.4],
|
||||
"pepper_indexes": [0.01, 0.005],
|
||||
"skewing_amplitudes" : [5, 8],
|
||||
"flip_index" : [0, 1, -1],
|
||||
"shuffle_indexes" : [ [0,2,1], [1,2,0], [1,0,2] , [2,1,0]],
|
||||
"thetha" : [5, -5],
|
||||
"thetha" : [0.1, 0.2, -0.1, -0.2],
|
||||
"thetha_padd": [-0.6, -1, -1.4, -1.8, 0.6, 1, 1.4, 1.8],
|
||||
"white_padds" : [0.1, 0.3, 0.5, 0.7, 0.9],
|
||||
"number_of_backgrounds_per_image": 2,
|
||||
"continue_training": false,
|
||||
"index_start" : 0,
|
||||
@@ -48,11 +72,12 @@
"weighted_loss": false,
|
||||
"is_loss_soft_dice": false,
|
||||
"data_is_provided": false,
|
||||
"dir_train": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/train_new",
|
||||
"dir_train": "/home/vahid/extracted_lines/1919_bin/train",
|
||||
"dir_eval": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/eval_new",
|
||||
"dir_output": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/output_new",
|
||||
"dir_output": "/home/vahid/extracted_lines/1919_bin/output",
|
||||
"dir_rgb_backgrounds": "/home/vahid/Documents/1_2_test_eynollah/set_rgb_background",
|
||||
"dir_rgb_foregrounds": "/home/vahid/Documents/1_2_test_eynollah/out_set_rgb_foreground",
|
||||
"dir_img_bin": "/home/vahid/Documents/test/sbb_pixelwise_segmentation/test_label/pageextractor_test/train_new/images_bin"
|
||||
"dir_img_bin": "/home/vahid/extracted_lines/1919_bin/images_bin",
|
||||
"characters_txt_file":"/home/vahid/Downloads/models_eynollah/model_eynollah_ocr_cnnrnn_20250930/characters_org.txt"
|
||||
|
||||
}
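
As a usage illustration, a config like the one above would be read back into the training code roughly as follows. This is a hypothetical sketch: the file name, import path, and the exact `run()` signature are assumptions, although the commit message states that the new keys are accepted as kwargs by `run()`.

```python
# Hypothetical sketch: load the OCR training config shown above and forward
# its keys as keyword arguments; import path and signature are assumptions.
import json

with open("config_params.json") as f:
    config_params = json.load(f)

print(config_params["task"], config_params["n_batch"], config_params["max_len"])

# Per the commit message, the training entry point now takes these keys as kwargs:
# from train import run   # assumed import location
# run(**config_params)
```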