Commit graph

828 commits

Author SHA1 Message Date
vahidrezanezhad
fee40049cd ocr model renamed - image text font for ocr result is now using Charis-7.000 font (downloaded from here https://software.sil.org/charis/download/) 2025-07-16 14:00:12 +02:00
vahidrezanezhad
04fead348f ocr: make sure that image height or width is not zero 2025-07-03 15:24:52 +02:00
vahidrezanezhad
53dd4b26a9 decorated with confidence value for cnnrnn ocr model 2025-07-03 11:50:47 +02:00
kba
b7b218ff11 OCR-D processor: same behavior as standalone wrt light_version/textline_light 2025-06-12 15:30:17 +02:00
vahidrezanezhad
c194a20c9c Fixed duplicate textline_light assignments (true and false) in the OCR-D framework for the Eynollah light version, which caused rectangles to be used instead of contours for textlines 2025-06-12 15:27:22 +02:00
kba
32889ef1e0 adapt binarization CLI according to #156 2025-06-12 13:57:41 +02:00
vahidrezanezhad
9b4e78c55c Fixed duplicate textline_light assignments (true and false) in the OCR-D framework for the Eynollah light version, which caused rectangles to be used instead of contours for textlines 2025-06-11 18:57:08 +02:00
vahidrezanezhad
f79af201ab Fix: Resolved OCR bug when text region type is undefined 2025-06-02 18:21:33 +02:00
vahidrezanezhad
e26c4ab9b4 image enhancer updated 2025-06-01 22:44:50 +02:00
vahidrezanezhad
9342b76038 saving enhanced image in org or scaled resolution 2025-06-01 22:10:13 +02:00
vahidrezanezhad
3b475915c7 image enhancer is integrated 2025-06-01 15:53:04 +02:00
vahidrezanezhad
df903aa1b4 Parametrize OCR for handling curved lines 2025-05-31 01:09:14 +02:00
vahidrezanezhad
1e7cecfcf9 updating ocr 2025-05-28 01:17:21 +02:00
vahidrezanezhad
03f52e7a46 updating ocr 2025-05-27 23:45:22 +02:00
vahidrezanezhad
31d9fa0c80 strings alignment function is added + new changes needed for prediction with both bin and rgb inputs is implemented 2025-05-25 21:44:36 +02:00
vahidrezanezhad
b18691f96a rnn ocr for all layout textregion types 2025-05-25 03:33:54 +02:00
vahidrezanezhad
ba3420b2d8 Drop capitals are written separately and are not attached to their corresponding text line. The OCR use case also supports single-image input. 2025-05-25 01:12:58 +02:00
vahidrezanezhad
0250a6d3d0 enhancing ocr 2025-05-23 18:06:53 +02:00
vahidrezanezhad
089029cec7 commit 21ec4fb is picked + rnn ocr at the same time with segmentation + enhancement of mb reading order 2025-05-23 15:55:03 +02:00
vahidrezanezhad
ee2c7e9013 enhancing curved lines OCR 2025-05-21 17:42:44 +02:00
vahidrezanezhad
14b70c2556 Implement hyphenated textline merging in OCR engine and a bug fixed for curved textline OCR 2025-05-21 14:39:31 +02:00
vahidrezanezhad
3ad621e956 ocr for curved lines 2025-05-20 19:01:52 +02:00
vahidrezanezhad
44ff51f5c1 mb reading order now can be done faster. Text regions are clustered using dilation, and mb reading order needs to be implemented for fewer regions 2025-05-20 16:51:08 +02:00
vahidrezanezhad
5016039cd7 enhancing marginal detection for light version 2025-05-18 02:48:05 +02:00
vahidrezanezhad
1cbc669d36 marginals detection enhanced for light version 2025-05-15 15:33:50 +02:00
vahidrezanezhad
1b229ba7ae enhancement for vertical textlines 2025-05-15 00:45:22 +02:00
cneud
7a22e51f5d resolve some comments from review 2025-05-14 21:56:03 +02:00
vahidrezanezhad
ed46615f00 enhance ocr for vertical textlines 2025-05-14 18:34:58 +02:00
vahidrezanezhad
88e0315321 Accurately writing text line contours into xml file when the deskewing exceeds 45 degrees and the text line is in light mode 2025-05-13 15:53:05 +02:00
vahidrezanezhad
54088c6b04 The initial attempt at reading heavily deskewed or vertically aligned lines. 2025-05-13 14:40:57 +02:00
vahidrezanezhad
c12b09a868 I have tried to address the issues #163 and #161. The changes have also improved marginal detection and enhanced the isolation of headers. 2025-05-12 00:10:18 +02:00
vahidrezanezhad
21ec4fbfb5 The text region coordinates are now correctly written into the XML output when using the skip layout and reading order option 2025-05-07 14:04:01 +02:00
vahidrezanezhad
83211ae684 In the case of skip_layout_and_reading_order, the confidence value was not set correctly, leading to an error while writing to the XML file. 2025-05-07 12:33:03 +02:00
Clemens Neudecker
3dcbb20cac Merge pull request #159 from bertsky/main: update docker 2025-05-06 15:14:06 +02:00
vahidrezanezhad
89aa545049 allow adding dataset abbreviation to extracted textline images and text 2025-05-03 02:59:16 +02:00
vahidrezanezhad
48e8dd4ab3 machine based model name changed to public one 2025-05-02 12:57:26 +02:00
vahidrezanezhad
a1a004b19d inference batch size for ocr is passed as an argument 2025-05-02 12:53:33 +02:00
vahidrezanezhad
5d8c864c08 adding space between split textline predicted text in the case of trocr 2025-05-02 01:02:32 +02:00
vahidrezanezhad
184af46664 displaying detected text on an image is provided for trocr case 2025-05-02 00:30:36 +02:00
Robert Sachunsky
e9179e1d34 docker: use latest core base stage 2025-05-02 00:16:22 +02:00
Robert Sachunsky
f8b4d29a59 docker: prepackage ocrd-all-module-dir.json 2025-05-02 00:16:22 +02:00
vahidrezanezhad
e2da7a6239 Fix model name to return the correct machine-based model name 2025-04-30 16:06:29 +02:00
vahidrezanezhad
b227736094 Fix OCR text cleaning to correctly handle 'U', 'K', and 'N' starting sentence; update text line splitting size 2025-04-30 16:04:34 +02:00
vahidrezanezhad
4cb4414740 Resolve remaining issue with #158 and resolving #124 2025-04-30 16:01:52 +02:00
vahidrezanezhad
208bde706f resolving issue #158 2025-04-30 13:55:09 +02:00
Konstantin Baierer
3e8adb86c2 Merge pull request #157 from qurator-spk/kba-patch-1: CI: Use most recent actions/setup-python@v5 2025-04-29 11:42:18 +02:00
Konstantin Baierer
77dae129d5 CI: Use most recent actions/setup-python@v5 2025-04-22 13:22:28 +02:00
vahidrezanezhad
192b9111e3 updating eynollah README, how to use it for use cases 2025-04-22 00:23:01 +02:00
Clemens Neudecker
b4df978dd5 Merge pull request #154 from qurator-spk/ci-pypi: CI: pypi 2025-04-17 17:01:20 +02:00
kba
30ba234641 CI: pypi 2025-04-16 19:27:17 +02:00