Commit graph

54 commits

Author SHA1 Message Date
Robert Sachunsky
27f43c175f Merge branch 'main' into ro-fixes and resolve conflicts…
major conflicts resolved manually:

- branches for non-`light` segmentation already removed in main
- Keras/TF setup and no TF1 sessions, esp. in new ModelZoo
- changes to binarizer and its CLI (`mode`, `overwrite`, `run_single()`)
- writer: `build...` w/ kwargs instead of positional
- training for segmentation/binarization/enhancement tasks:
  * drop unused `generate_data_from_folder()`
  * simplify `preprocess_imgs()`: turn `preprocess_img()`, `get_patches()`
    and `get_patches_num_scale_new()` into generators, only writing
    result files in the caller (top-level loop) instead of passing
    output directories and file counter
- training for new OCR task:
  * `train`: put keys into additional `config_params` where they belong,
    resp. (conditioned under existing keys), and w/ better documentation
  * `train`: add new keys as kwargs to `run()` to make usable
  * `utils`: instead of custom data loader `data_gen_ocr()`, re-use
    existing `preprocess_imgs()` (for cfg capture and top-level loop),
    but extended w/ new kwargs and calling new `preprocess_img_ocr()`;
    the latter as single-image generator (also much simplified)
  * `train`: use tf.data loader pipeline from that generator w/ standard
    mechanisms for batching, shuffling, prefetching etc.
  * `utils` and `train`: instead of `vectorize_label`, use `Dataset.padded_batch`
  * add TensorBoard callback and re-use our checkpoint callback
  * also use standard Keras top-level loop for training

still problematic (substantially unresolved):
- `Patches` now only w/ fixed implicit size
  (ignoring training config params)
- `PatchEncoder` now only w/ fixed implicit num patches and projection dim
  (ignoring training config params)
2026-02-07 14:05:56 +01:00
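
The generator refactoring described in the merge commit above (preprocessing helpers yield their results, and only the top-level loop writes files and keeps the counter) can be sketched roughly as follows. This is a toy illustration, not the repository's code: the names `iter_patches`, `iter_preprocessed` and `write_all`, the list-of-lists image type, the flip augmentation and the text-file output format are all assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List

# Hypothetical stand-in for an image array: rows of integer pixel values.
Image = List[List[int]]


def iter_patches(img: Image, height: int, width: int) -> Iterator[Image]:
    """Yield non-overlapping patches; incomplete border patches are skipped."""
    rows, cols = len(img), len(img[0])
    for top in range(0, rows - height + 1, height):
        for left in range(0, cols - width + 1, width):
            yield [row[left:left + width] for row in img[top:top + height]]


def iter_preprocessed(img: Image, height: int, width: int,
                      flip: bool = False) -> Iterator[Image]:
    """Yield the patches of one image, each optionally followed by a mirrored copy."""
    for patch in iter_patches(img, height, width):
        yield patch
        if flip:
            yield [row[::-1] for row in patch]


def write_all(images: List[Image], out_dir: Path, height: int, width: int,
              flip: bool = False) -> int:
    """Top-level loop: the only place that knows about paths and the file counter."""
    count = 0
    for img in images:
        for patch in iter_preprocessed(img, height, width, flip=flip):
            lines = (" ".join(map(str, row)) for row in patch)
            (out_dir / f"patch_{count}.txt").write_text("\n".join(lines))
            count += 1
    return count


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
    with tempfile.TemporaryDirectory() as tmp:
        print(write_all([image], Path(tmp), 2, 2, flip=True), "files written")
```

Because the helpers no longer take output directories or a running file counter, the same generators could in principle also feed an in-memory loader instead of the file-writing loop.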
kba
83e8b289da 🔥 drop light_version/textline_light (now default and implied) 2025-11-26 20:48:22 +01:00
kba
5a1900e664 🔥 remove OCR option from eynollah layout 2025-11-26 18:12:03 +01:00
kba
b34329dd61 tests: more path fixes 2025-11-13 12:21:48 +01:00
kba
9aeff6d155 tests: typo 2025-11-13 11:49:09 +01:00
kba
3afbce023d tests: adapt paths 2025-11-13 11:46:31 +01:00
vahidrezanezhad
ed5b5c13dd Add test images; call TrOCR processor from the same directory as the TrOCR model 2025-11-07 12:47:21 +01:00
kba
8732007aaf . 2025-11-06 16:33:39 +01:00
kba
f902756ce1 try importing torch, then shapely, then tensorflow 2025-11-06 13:10:35 +01:00
kba
44037bc05d add layout marginalia test 2025-11-06 12:42:57 +01:00
kba
62d05917c5 test_layout: str(Path) 2025-10-30 12:17:38 +01:00
kba
29c273685f fix merge issues 2025-10-29 20:15:19 +01:00
kba
de76eabc1d Merge branch 'cli-logging' into model-zoo 2025-10-29 19:41:01 +01:00
kba
5e22e9db64 model_zoo: make type str to reduce importing overhead 2025-10-29 19:16:35 +01:00
kba
a913bdf7dc make --model-basedir and --model-overrides top-level CLI options 2025-10-29 18:48:41 +01:00
kba
b6f82c72b9 refactor cli tests 2025-10-29 17:23:21 +01:00
kba
ef999c8f0a Merge branch 'model-zoo' of lx0145.sbb.spk-berlin.de:/data/eynollah into model-zoo 2025-10-27 11:45:20 +01:00
kba
294b6356d3 wip 2025-10-27 11:45:16 +01:00
kba
51d2680d9c wip 2025-10-27 11:44:59 +01:00
kba
ec1fd93dad wip 2025-10-23 11:58:23 +02:00
kba
9d2b18d2af test_run: check log messages starting with eynollah 2025-10-21 13:29:55 +02:00
kba
b90cfdfcc4 adapt tests to -l being top-level option now 2025-10-20 18:56:24 +02:00
Robert Sachunsky
096def1e9d mbreorder/enhancement: fix missing imports
(not sure if these models really need that, though)
2025-10-09 20:14:11 +02:00
Robert Sachunsky
ca72a095ca tests: cover table detection in various modes 2025-10-09 20:14:11 +02:00
Robert Sachunsky
23535998f7 tests: symlink OCR models into layout model directory
(so layout with OCR options works with our split model packages)
2025-10-09 20:14:11 +02:00
Robert Sachunsky
a1904fa660 tests: cover layout with OCR in various modes 2025-10-09 20:14:11 +02:00
Robert Sachunsky
61b20cc83d tests: switch from subtests to parametrize, use --isolate everywhere to free CUDA memory in between 2025-09-30 19:20:35 +02:00
kba
3123add815 📝 update README 2025-09-26 15:07:32 +02:00
kba
830cc2c30a comment out the offending test outright 2025-09-26 14:37:04 +02:00
kba
eb8d4573a8 tests: also disable ...ocr_directory test 2025-09-26 13:57:08 +02:00
kba
42fb452a7e disable the -doit OCR test 2025-09-26 12:55:29 +02:00
Robert Sachunsky
480daa4c7c test_run: make ocr -doit work (add truetype file) 2025-09-25 22:28:15 +02:00
kba
0bb1fb1a05 tests: adapt to layout/ocr model split 2025-09-25 21:47:15 +02:00
Robert Sachunsky
5c7e1f21fb test_run: add tests for ocr 2025-09-25 19:53:19 +02:00
Robert Sachunsky
f07df080f0 add tests for enhancement and mbreorder 2025-09-25 01:16:19 +02:00
vahidrezanezhad
da141bb42e resolving tests error 2025-07-23 16:44:17 +02:00
Robert Sachunsky
177e017167 test_run: ensure exceptions are shown 2025-04-07 10:39:50 +00:00
Robert Sachunsky
56cc179d35 pytest: add tests for directory mode (layout+bin) 2025-04-05 01:20:38 +02:00
Robert Sachunsky
a3e1b3d4d5 pytest: add asserts for results, add binarization 2025-04-04 23:37:00 +02:00
Robert Sachunsky
b03116f4a6 pytest: use subtests for various layout options, add coverage 2025-04-04 22:22:50 +02:00
Robert Sachunsky
c7dc952851 smoke-test: also test dir-in mode and overwrite 2025-04-01 22:43:30 +02:00
vahidrezanezhad
ce5b611296 tests are passed - new models by the way should be uploaded 2024-11-14 17:18:07 +01:00
kba
84b844203d switch from qurator namespace to src-layout 2024-08-29 17:11:29 +02:00
Konstantin Baierer
4897cefdb7 allow passing PIL image to Eynollah w/o disk I/O 2021-04-15 17:25:05 +02:00
Konstantin Baierer
416a84e542 replace lxml with OCR-D/core PAGE API 2021-04-12 13:25:29 +02:00
Konstantin Baierer
a678bbf966 counter: add reset(); 2021-03-12 18:39:27 +01:00
Konstantin Baierer
56b688befe counter: allow arbitrary line/region id 2021-03-02 17:41:45 +01:00
Konstantin Baierer
98568402c7 counter: init-overrideable 2021-03-02 16:13:03 +01:00
Konstantin Baierer
58c4403e13 rename package to qurator.eynollah 2021-02-27 17:38:55 +01:00
Konstantin Baierer
8c603ae16d check_dpi: use OcrdExif instead of identify callout 2021-02-18 14:27:08 +01:00