Commit graph

823 commits

Author SHA1 Message Date
kba
45b05c2316 Merge branch 'mbro_dead_code' into prepare-release-v0.5.0 2025-09-24 17:18:31 +02:00
vahidrezanezhad
80d50d4bf6 get textlines sorted in textregion - verticals 2025-09-24 17:17:27 +02:00
b-vr103
6d8641a518 get textlines sorted in textregion - verticals 2025-09-24 17:17:21 +02:00
vahidrezanezhad
6904a98182 get textlines inside textregion sorted debugging 2025-09-24 17:17:12 +02:00
vahidrezanezhad
ce13d8c5a3 get textlines inside textregion sorted 2025-09-24 17:16:47 +02:00
kba
8b30bdbae2 image_enhancer: use latest page extraction model 2025-09-24 16:39:31 +02:00
kba
c8ebe84697 image_enhancer: add missing models, remove dead code 2025-09-24 16:36:18 +02:00
kba
b75ca0d31f mb_ro_on_layout: remove copy-pasta code not actually used 2025-09-24 16:29:05 +02:00
Konstantin Baierer
9c129c7f54 Merge pull request #180 from bertsky/prepare-release-v0.5.0-fixlogging (prepare release v0.5.0: fix logging) 2025-09-24 12:28:10 +02:00
Robert Sachunsky
5bd318e657 rm print statement (already log msg) 2025-09-24 12:14:32 +02:00
Robert Sachunsky
90f1d7aa47 rm summary msg (info already logged elsewhere) 2025-09-24 12:10:11 +02:00
Robert Sachunsky
7933b103f5 log modes only once (in run, not in run_single) 2025-09-24 12:09:30 +02:00
Robert Sachunsky
d0817f5744 fix typo 2025-09-24 12:08:50 +02:00
kba
9ead58b99a Merge remote-tracking branch 'michalbubula/add-feedback' into prepare-release-v0.5.0 2025-09-23 19:50:27 +02:00
kba
7bde99e866 Merge remote-tracking branch 'origin/updating_readme_for_eynollah_use_cases' into prepare-release-v0.5.0 2025-09-23 19:42:55 +02:00
kba
df8d93dbfa Merge branch 'main' into add-feedback 2025-09-23 19:20:20 +02:00
vahidrezanezhad
554f3988c9 default cnn-rnn and transformer ocr models have changed to model_eynollah_ocr_cnnrnn_20250904 and model_eynollah_ocr_trocr_20250919 respectively 2025-09-21 16:33:14 +02:00
vahidrezanezhad
6bbdfe1074 extending image types 2025-09-21 02:32:40 +02:00
vahidrezanezhad
e97e3ab192 Merge text of textlines and handle hyphenated words by joining them correctly 2025-09-19 23:23:30 +02:00
vahidrezanezhad
b38331b4ab writing page contour correctly in xml output + ignore unsupported file types when loading images 2025-09-19 18:06:18 +02:00
vahidrezanezhad
994bc8a1c0 debug new page extraction in the case of ignoring page extraction 2025-09-19 15:24:34 +02:00
kba
5c9cf8472b remove redundant/brittle interval logging 2025-09-18 13:19:57 +02:00
kba
146102842a convert all print stmts to logger.info calls 2025-09-18 13:15:18 +02:00
kba
c64d102613 move logging to CLI and make initialization optional 2025-09-18 13:07:41 +02:00
vahidrezanezhad
310679eeb8 page extraction model name is changed 2025-09-16 14:27:15 +02:00
vahidrezanezhad
542646791d For TrOCR, the cropped text lines will no longer be added to a list before prediction. Instead, for each batch size, the text line images will be collected and predictions will be made directly on them. 2025-09-23 19:03:13 +02:00
vahidrezanezhad
0711166524 changed the drop capitals bonding box to contour ratio threshold 2025-09-01 11:37:22 +02:00
vahidrezanezhad
e15640aa8a new page extraction model integration 2025-09-15 13:36:58 +02:00
vahidrezanezhad
6a735daa60 Update README.md 2025-08-31 23:30:54 +02:00
vahidrezanezhad
9b9d21d8ac eynollah ocr: support using either a specific model name or a models directory (default model) 2025-08-28 11:30:59 +02:00
vahidrezanezhad
41365645ef Marginals are divided into left and right, and written from top to bottom. 2025-08-26 22:38:03 +02:00
vahidrezanezhad
7741502876 reading order on given layout 2025-08-18 02:31:13 +02:00
Clemens Neudecker
a2359ea4c4 Merge pull request #171 from bertsky/ocrd-machine-based-ro (OCR-D processor: expose reading_order_machine_based) 2025-08-15 18:40:13 +02:00
Robert Sachunsky
21615a986d OCR-D processor: expose reading_order_machine_based 2025-08-13 14:14:37 +02:00
michalbubula
8ebba5ac04 add feedback to command line interface 2025-08-12 16:21:15 +02:00
vahidrezanezhad
268aa141d7 avoiding float in range 2025-08-12 12:50:15 +02:00
vahidrezanezhad
52d9cc9baf deskewing with faster multiprocessing 2025-08-08 11:32:02 +02:00
vahidrezanezhad
322b04145f use the latest ocr model with balanced fraktur-antiqua training dataset 2025-08-05 14:22:22 +02:00
vahidrezanezhad
1b95f8f38d threshold for textline ocr + new ocr model 2025-07-25 13:18:38 +02:00
Clemens Neudecker
2996fc8b30 Merge pull request #166 from qurator-spk/updating_readme_for_eynollah_use_cases-cli (Updating readme for eynollah use cases cli) 2025-07-24 15:30:57 +02:00
vahidrezanezhad
fd0595f920 Update Makefile 2025-07-24 13:52:38 +02:00
vahidrezanezhad
da141bb42e resolving tests error 2025-07-23 16:44:17 +02:00
vahidrezanezhad
6b8893b188 Merge pull request #167 from qurator-spk/ocrd-fixes (Ocrd fixes) 2025-07-22 14:46:25 +02:00
vahidrezanezhad
daa597dbaa should merged text for the whole page be written in xml? 2025-07-21 14:50:05 +02:00
vahidrezanezhad
673e67a847 update model names 2025-07-21 10:54:20 +02:00
vahidrezanezhad
fee40049cd ocr model renamed - image text font for ocr result is now using Charis-7.000 font (downloaded from here https://software.sil.org/charis/download/) 2025-07-16 14:00:12 +02:00
vahidrezanezhad
04fead348f ocr: make sure that image height or width is not zero 2025-07-03 15:24:52 +02:00
vahidrezanezhad
53dd4b26a9 decorated with confidence value for cnnrnn ocr model 2025-07-03 11:50:47 +02:00
kba
b7b218ff11 OCR-D processor: same behavior as standalone wrt light_version/textline_light 2025-06-12 15:30:17 +02:00
vahidrezanezhad
c194a20c9c Fixed duplicate textline_light assignments (true and false) in the OCR-D framework for the Eynollah light version, which caused rectangles to be used instead of contours for textlines 2025-06-12 15:27:22 +02:00