1283 commits

Author SHA1 Message Date
vahidrezanezhad
c4434c7f7d same task name for transformer-ocr training and inference 2026-02-19 13:59:16 +01:00
vahidrezanezhad
a84ae67e7a fix a typo 2026-02-19 00:04:42 +01:00
vahidrezanezhad
77adcbea8a copy characters list needed for cnn-rnn ocr model output while training and ensembling 2026-02-18 16:47:21 +01:00
vahidrezanezhad
4f66734e4d eynollah config files have been renamed from config.json to config_eynollah.json - training a trocr model still fails to write the config file into checkpoint directories 2026-02-18 16:04:44 +01:00
vahidrezanezhad
b426f7f152 trocr inference is integrated - works on CPU, causes a segfault on GPU 2026-02-18 15:04:54 +01:00
vahidrezanezhad
733462381c bug fix: layout visualization 2026-02-16 11:50:39 +01:00
vahidrezanezhad
9aa19aa6fa cnn-rnn ocr inference: get input shape of model 2026-02-16 11:49:05 +01:00
vahidrezanezhad
68dd5eab62 fixing: importing StringLookup 2026-02-12 15:37:11 +01:00
vahidrezanezhad
2ee8d8e050 Update inference.py: importing StringLookup for cnn-rnn inference 2026-02-12 15:34:58 +01:00
vahidrezanezhad
ed034aa8ce Update inference.py: fix broken inference model loading introduced during refactoring or merge 2026-02-12 15:28:15 +01:00
vahidrezanezhad
47fa22112c import tensorflow is uncommented for ocr training 2026-02-11 19:52:56 +01:00
vahidrezanezhad
ab43477451 extracting ocr textline images and text: vertical lines threshold has changed to 1.4 2026-02-10 14:32:23 +01:00
vahidrezanezhad
a57914a68a fixed: correct textline and text extraction for page xml when vertical textlines are excluded + textline and text extraction for page alto files 2026-02-09 18:53:08 +01:00
vahidrezanezhad
fbf252db13 torch model ensembling is integrated 2026-02-04 21:16:08 +01:00
vahidrezanezhad
498ff8f7a5 Updating the --help descriptions 2026-02-03 20:27:06 +01:00
vahidrezanezhad
fff4253352 generate or update list of characters in the case of cnn-rnn ocr training 2026-02-03 20:22:47 +01:00
vahidrezanezhad
60f0fb541d integrating transformer ocr 2026-02-03 19:45:50 +01:00
kba
586077fbcd 📦 v0.7.0 2026-01-30 16:40:55 +01:00
kba
4ade0f788f 📝 changelog 2026-01-29 17:33:35 +01:00
kba
f13560726e Merge remote-tracking branch 'origin/adding-cnn-rnn-training-script' into 2026-01-29-training (conflicts: src/eynollah/training/inference.py) 2026-01-29 17:32:08 +01:00
kba
87190f8997 Merge branch 'adding-cnn-rnn-training-script-rfct' into 2026-01-29-training (conflicts: src/eynollah/training/models.py) 2026-01-29 10:27:36 +01:00
kba
a76de1e182 Merge branch 'adding-cnn-rnn-training-script' into 2026-01-29-training 2026-01-29 10:26:34 +01:00
kba
ef3cf02877 Merge branch 'ruff-training' into 2026-01-29-training 2026-01-29 10:26:14 +01:00
vahidrezanezhad
f9695cd7be Merge branch 'adding-cnn-rnn-training-script' of https://github.com/qurator-spk/eynollah into adding-cnn-rnn-training-script 2026-01-28 11:52:36 +01:00
vahidrezanezhad
3500167870 weights ensembling for tensorflow models is integrated 2026-01-28 11:52:12 +01:00
vahidrezanezhad
33f6a231bc fix: prevent crash when printspace is missing in xmls used for label generation 2026-01-26 17:30:26 +01:00
vahidrezanezhad
6ae244bf9b Fix filename stem extraction using binarization. Restore the CNN-RNN model to its previous version, as setting channels_last alone was insufficient for running on both CPU and GPU. Prevent errors caused by null values in image shape elements. 2026-01-26 15:04:47 +01:00
vahidrezanezhad
30f39e7383 mapregion is added to labels 2026-01-26 13:56:34 +01:00
vahidrezanezhad
c8240905a8 Fix label generation by selecting largest contour when erosion splits shapes 2026-01-26 13:36:24 +01:00
kba
9ccc495b4a wip 2025-12-19 14:57:10 +01:00
vahidrezanezhad
49261fa99b CNN–RNN–OCR inference and adaptation of the CNN–RNN–OCR model to support inference on both CPU and GPU 2025-12-17 15:12:39 +01:00
vahidrezanezhad
6ee79c7320 evaluation with a given GT is only possible for segmentation tasks 2025-12-17 13:28:02 +01:00
vahidrezanezhad
4651000191 debugging input shape + enable finetuning a model 2025-12-15 11:36:09 +01:00
vahidrezanezhad
4fc3ff33cb The cnn-rnn ocr model can be trained now 2025-12-09 17:22:12 +01:00
vahidrezanezhad
84a72a128b cnn-rnn model can be called - model input height and width are dynamic now - data generator is also callable 2025-12-09 15:30:19 +01:00
vahidrezanezhad
59e5a73654 adding cnn-rnn training script 2025-12-08 19:30:57 +01:00
vahidrezanezhad
7bf5e077d9 Restore correct execution of export_textline_images_and_text 2025-12-03 15:40:52 +01:00
vahidrezanezhad
6ac37af2f8 Fix eynollah ocr --help so it works again 2025-12-03 14:11:47 +01:00
vahidrezanezhad
d687d862d6 Restored correct functionality of the extract_only_images mode and cleaned up the argument handling 2025-12-03 12:01:42 +01:00
kba
51abe9617a log to STDERR not STDOUT 2025-12-02 15:00:33 +01:00
kba
b161e33854 🔥 refactor eynollah ocr 2025-11-28 15:45:21 +01:00
kba
30f9c695dc move line-gt extraction out of ocr to eynollah-training 2025-11-28 15:12:31 +01:00
kba
951bd2fce6 CI: do not upgrade (now-unpinned) torch 2025-11-28 15:12:31 +01:00
kba
9bcfeab057 💀 remove dead code from eynollah.py 2025-11-28 12:52:28 +01:00
kba
5171e09c2d eynollah.py: fix kwargs to writer 2025-11-28 12:52:28 +01:00
kba
c24cf94bce enforce kwargs for writer.build_... 2025-11-28 12:52:28 +01:00
kba
4aa9543a7d remove more branches now that textline_light defaults to true 2025-11-27 11:30:00 +01:00
kba
177d555ded factor out extract_only_images as eynollah extract-images 2025-11-26 21:37:00 +01:00
kba
83e8b289da 🔥 drop light_version/textline_light (now default and implied) 2025-11-26 20:48:22 +01:00
kba
ca83cf934d fix imports from src/cli/cli_*/*_cli 2025-11-26 20:48:14 +01:00