Commit graph

734 commits

Author SHA1 Message Date
Robert Sachunsky
df3510750c Github Actions CI: no more Docker clean or build 2025-04-01 00:28:16 +02:00
Robert Sachunsky
45e3ab9692 Github Actions: free space: all existing Docker images 2025-04-01 00:23:53 +02:00
vahidrezanezhad
4de441eaaa OCR prediction is now enabled to integrate results from both RGB and binarized images or to be performed on each individually 2025-03-31 21:28:05 +02:00
vahidrezanezhad
b1da0a3327 In OCR, the predicted text is now drawn on the image, and the results are saved in a specified directory. This makes it easier to review the predicted output 2025-03-31 18:43:14 +02:00
Robert Sachunsky
31aeb9629d Github Actions: free space more aggressively 2025-03-31 18:16:17 +02:00
Robert Sachunsky
7430b57b65 dockerfile: add smoke test 2025-03-31 16:56:47 +02:00
Robert Sachunsky
f35f49376e run CLI test in TMPDIR, add ocrd-test 2025-03-31 16:55:57 +02:00
Robert Sachunsky
ae066388ea docker: no need for g++, but install w/ 'EXTRAS=OCR' 2025-03-31 15:58:57 +02:00
Robert Sachunsky
722b5c6bf1 add make variable EXTRAS for optional dependencies 2025-03-31 15:58:12 +02:00
Robert Sachunsky
c01609ff4e allow even more empty imports for optional dependencies 2025-03-31 15:57:22 +02:00
Robert Sachunsky
51e9bfd6d7 improve+extend dockerfile 2025-03-31 14:14:08 +02:00
Robert Sachunsky
09248d4829 improve+extend makefile 2025-03-31 14:13:16 +02:00
Robert Sachunsky
46618f4229 allow more empty imports for optional dependencies 2025-03-31 14:11:50 +02:00
Robert Sachunsky
4be89910a2 CLI: fix arg vs kwarg from merge 2025-03-31 02:38:24 +02:00
Robert Sachunsky
9d61acf173 simplify 2025-03-31 02:02:30 +02:00
Robert Sachunsky
a1068ff2eb OCR-D: move sbb-binarize to ocrd-tool.json, update to v3 2025-03-31 01:47:32 +02:00
Robert Sachunsky
c794d4d29f OCR-D: fix typo light_mode→light_version 2025-03-31 01:46:29 +02:00
Robert Sachunsky
4338259ca1 OCR-D: ensure page image gets replaced in result as well if not the original file 2025-03-31 01:17:14 +02:00
Robert Sachunsky
55969b0173 OCR-D: add docstring 2025-03-31 01:15:26 +02:00
Robert Sachunsky
3916474b8b OCR-D: require >=v3.1 2025-03-31 01:15:12 +02:00
Robert Sachunsky
6d02e90570 OCR-D: restrict max_workers=1 2025-03-31 01:14:54 +02:00
Robert Sachunsky
efd3fa6775 allow empty imports for optional dependencies 2025-03-31 00:32:26 +02:00
Robert Sachunsky
238132e260 use 'image_filename' for pseudo-iteration outside 'dir_in' mode 2025-03-31 00:31:49 +02:00
Robert Sachunsky
af4e2a4ffc do not require 'dir_out' outside 'dir_in' mode 2025-03-31 00:31:09 +02:00
Robert Sachunsky
ea136e3ddd 'overwrite' check: only in 'dir_in' mode 2025-03-31 00:30:06 +02:00
Robert Sachunsky
1f4a17b60d Merge remote-tracking branch 'origin/machine_based_reading_order_integration' into v3-api 2025-03-30 21:21:59 +02:00
Robert Sachunsky
edf924c2cb ocrd-tool: add dockerhub 2025-03-30 19:47:25 +02:00
vahidrezanezhad
9b04688ebc The rotate_image function has been updated. Additionally, the reading order is now correct in the case of the light version, provided that slope_deskew exceeds the slope_threshold. 2025-03-30 15:34:27 +02:00
vahidrezanezhad
cf40f9ecc5 The rotate_image function produces the exact same rotation as Imutils. Therefore, there is no need to retain the remove-imutils-1 branch. 2025-03-28 20:58:32 +01:00
vahidrezanezhad
b55389ac62 Update requirements.txt 2025-03-28 14:59:31 +01:00
vahidrezanezhad
8bf70d905f Merge pull request #147 from qurator-spk/revert-146-remove-imutils-1
Revert "replace usages of `imutils` with opencv equivalents" 2025-03-28 14:58:04 +01:00
vahidrezanezhad
f756b08c9b
Revert "replace usages of imutils with opencv equivalents" 2025-03-28 14:57:40 +01:00
vahidrezanezhad
c9de578d4d removing imutils from requirements 2025-03-28 11:25:03 +01:00
vahidrezanezhad
52c605185a Merge pull request #146 from qurator-spk/remove-imutils-1
replace usages of `imutils` with opencv equivalents 2025-03-28 11:10:25 +01:00
cneud
0e9a72ea52 consolidate usage documentation 2025-03-27 23:14:59 +01:00
cneud
3a55b6ce91 consolidate usage documentation 2025-03-27 23:11:18 +01:00
cneud
e9fa691308 add model and training documentation 2025-03-27 22:41:10 +01:00
vahidrezanezhad
6f36c7177f For OCR, the splitting ratio of text lines is adjusted 2025-03-27 18:24:47 +01:00
cneud
181c0c584f bbox rotation with opencv 2025-03-26 22:25:22 +01:00
cneud
eaff9e3537 Merge branch 'main' into remove-imutils-1 2025-03-26 20:16:46 +01:00
vahidrezanezhad
7df0427b04 In the context of OCR, if Page-XML files already contain text, the new predicted text will replace the existing text. 2025-03-26 18:42:06 +01:00
vahidrezanezhad
370d44a66b Slope deskew in the light version is set to zero because when the slope_deskew value exceeds the slope_threshold, the reading order becomes incorrect. This issue needs to be addressed. Additionally, the textlines order within text region in the light version was reversed, and this has been corrected. 2025-03-26 10:45:34 +01:00
Clemens Neudecker
005b6988f4 Merge pull request #140 from qurator-spk/machine_based_reading_order_integration
Machine based reading order integration 2025-03-25 11:00:44 +01:00
vahidrezanezhad
d3a4c06e7f This commit enables the export of cropped text line images along with their corresponding texts from a Page-XML file. These exported text line images and texts can be utilized for training a text line-based OCR model. 2025-03-20 18:21:44 +01:00
vahidrezanezhad
c8b8529951 For the CNN-RNN OCR model, long text lines are split into two segments 2025-03-17 19:50:58 +01:00
vahidrezanezhad
aa72ca3006 Resolved an issue in the OCR-D framework where dir_out received a None value 2025-03-13 15:02:38 +01:00
vahidrezanezhad
a4f1f35125 Resolving test failure 2025-03-07 13:19:56 +01:00
kba
54040c1db4 Merge remote-tracking branch 'bertsky/machine_based_reading_order_integration_fixes' into machine_based_reading_order_integration 2025-03-06 15:48:52 +01:00
cneud
0b2c1b9275 remove imutils dependency 2025-03-03 22:21:57 +01:00
Clemens Neudecker
687aba1fa2 replace usages of imutils with opencv equivalents
should fix https://github.com/qurator-spk/eynollah/issues/141 2025-03-03 22:10:40 +01:00