Commit graph

628 commits

Author SHA1 Message Date
cneud
4179a2ea3a integrate binarization in ocrd-tool.json 2025-03-26 01:27:02 +01:00
cneud
284c2aed7f unresolved references 2025-03-26 01:06:19 +01:00
cneud
881f6f37c1 rename ocrd-sbb-binarize to ocrd-eynollah-binarize 2025-03-26 00:50:39 +01:00
cneud
3f1e140da1 pep8: class names should use CapWords 2025-03-26 00:32:10 +01:00
cneud
62e8d78e73 few minor fixes 2025-03-26 00:27:06 +01:00
cneud
b4d168cae3 pep8-e265: comment should start with # 2025-03-25 23:54:31 +01:00
cneud
87ae6d11a9 pep8: whitespaces around operators 2025-03-25 23:01:12 +01:00
cneud
fa7bb63481 pep8-e302: two blank lines between functions 2025-03-25 22:36:22 +01:00
cneud
caf0fbe90f pep8-e302: two blank lines between functions 2025-03-25 22:32:59 +01:00
cneud
0e8ea64ba4 don't use equality operators to compare with None 2025-03-25 21:49:35 +01:00
cneud
2722a9a464 simplify chained comparisons 2025-03-25 21:44:52 +01:00
cneud
b9030f5203 replace multi-step list initialization with list literal 2025-03-25 21:42:49 +01:00
cneud
5326104d26 remove unnecessary backslash 2025-03-25 21:40:49 +01:00
cneud
badf80039f remove redundant parentheses 2025-03-25 21:39:00 +01:00
Clemens Neudecker
005b6988f4 Merge pull request #140 from qurator-spk/machine_based_reading_order_integration: Machine based reading order integration 2025-03-25 11:00:44 +01:00
vahidrezanezhad
d3a4c06e7f This commit enables the export of cropped text line images along with their corresponding texts from a Page-XML file. These exported text line images and texts can be utilized for training a text line-based OCR model. 2025-03-20 18:21:44 +01:00
vahidrezanezhad
c8b8529951 For the CNN-RNN OCR model, long text lines are split into two segments 2025-03-17 19:50:58 +01:00
vahidrezanezhad
aa72ca3006 Resolved an issue in the OCR-D framework where dir_out received a None value 2025-03-13 15:02:38 +01:00
vahidrezanezhad
a4f1f35125 Resolving test failure 2025-03-07 13:19:56 +01:00
kba
54040c1db4 Merge remote-tracking branch 'bertsky/machine_based_reading_order_integration_fixes' into machine_based_reading_order_integration 2025-03-06 15:48:52 +01:00
vahidrezanezhad
7110bd971f resolved an error for light version in the case that slope_deskew is smaller than slope_threshold 2025-02-27 19:11:15 +01:00
vahidrezanezhad
25116a2c79 resolved 2 errors 2025-02-19 00:35:48 +01:00
vahidrezanezhad
33fda2f8be changing cnn ocr model name 2024-12-26 22:45:40 +01:00
Robert Sachunsky
335aa273a1 simplify, wrap extremely long lines 2024-12-23 13:36:29 +00:00
Robert Sachunsky
cfc65128b1 reduce redundancy/indentation 2024-12-22 14:56:32 +00:00
Robert Sachunsky
01376af905 do_order_of_regions_with_model: simplify 2024-12-22 13:10:05 +00:00
vahidrezanezhad
92bfac4b41 Provide OCR as an option to process a directory of XML files, incorporating layout and text line coordinates. 2024-12-20 15:47:21 +01:00
vahidrezanezhad
fbeef79d50 adding scatter_nd inference 2024-12-16 01:11:54 +01:00
Robert Sachunsky
0ae28f7d3e switch from stdlib to loky.ProcessPoolExecutor, ensure shutdown 2024-12-14 12:16:29 +00:00
vahidrezanezhad
f93c6c288d function of patch-wise inference with scatter_nd is added 2024-12-14 02:50:17 +01:00
vahidrezanezhad
0e8c561618 debugging issues 2024-12-14 00:24:29 +01:00
Robert Sachunsky
e9c0d716f6 CI: install optional dependencies, too 2024-12-11 23:48:56 +00:00
Robert Sachunsky
dcaf796283 change polarity of orientation angle (PAGE schema required cw=positive) 2024-12-11 23:07:56 +00:00
Robert Sachunsky
b4b0890294 add option to overwrite output xml, but skip by default if file exists 2024-12-11 19:52:21 +00:00
Robert Sachunsky
b9ca7a6191 log num_cols-dependent resizing 2024-12-11 18:48:26 +00:00
Robert Sachunsky
9270ea4550 annotate region angles in PAGE 2024-12-11 18:48:26 +00:00
Robert Sachunsky
3b70b11ea6 avoid deskewing patches if binary-empty 2024-12-11 18:48:26 +00:00
Robert Sachunsky
7e9ee90e6e switch from (ad-hoc) mp.Pool to (attribute) concurrent.futures.ProcessPoolExecutor 2024-12-11 18:48:26 +00:00
Robert Sachunsky
68456ea002 do_work_of_slopes_new*, do_back_rotation_and_get_cnt_back, do_work_of_contours_in_image: use mp.Pool, simplify 2024-12-11 18:48:26 +00:00
Robert Sachunsky
25e967397d exit early if no text regions found (to avoid segfault) 2024-12-11 18:48:26 +00:00
Robert Sachunsky
21efea8711 no del on function argument 2024-12-11 18:48:26 +00:00
Robert Sachunsky
5e0c1da711 simplify 2024-12-11 00:18:58 +00:00
Robert Sachunsky
54cb15056b do_image_rotation / return_deskew_slop: avoid code duplication, simplify via mp.Pool 2024-12-10 09:52:32 +00:00
Robert Sachunsky
6fe02df973 do_image_rotation: fix f93fa12 (do return results) 2024-12-09 16:35:31 +00:00
Robert Sachunsky
d68017037c do_prediction: trigger GC to avoid CUDA OOM 2024-12-09 11:27:11 +00:00
Robert Sachunsky
ad748d0039 do_prediction: avoid code duplication 2024-12-09 10:55:41 +00:00
Robert Sachunsky
c3163caefd avoid indentation 2024-12-05 14:28:17 +00:00
Robert Sachunsky
055463d23a avoid indentation 2024-12-05 09:43:30 +00:00
Robert Sachunsky
aaea2ef463 simplify 2024-12-05 09:40:02 +00:00
Robert Sachunsky
3d88b207fc run: log instead of print 2024-12-05 09:39:55 +00:00