Commit graph

890 commits

Author SHA1 Message Date
Robert Sachunsky
13f85b0d5c Merge branch 'main' into loky-with-shm-for-175-rebuilt 2025-09-30 02:07:20 +02:00
Robert Sachunsky
758602403e replace loky with concurrent.futures.ProcessPoolExecutor (faster) 2025-09-29 17:48:22 +02:00
Robert Sachunsky
0366707136 get_smallest_skew: do not pass logger 2025-09-29 17:48:22 +02:00
Robert Sachunsky
b94c96fcbb find_num_col: exit early if empty (avoiding exceptions) 2025-09-29 17:48:22 +02:00
Robert Sachunsky
04c3d7dd1b get_smallest_skew: avoid shm if no ProcessPoolExecutor is passed 2025-09-29 17:48:22 +02:00
Robert Sachunsky
0662ece536 do_work_of_slopes*: use shm also in non-light mode(s) 2025-09-29 17:48:22 +02:00
Robert Sachunsky
31f240c3b8 do_image_rotation, do_work_of_slopes_new_curved: pass arrays via shared memory 2025-09-29 17:48:22 +02:00
Robert Sachunsky
8be2c79771 Revert "deskewing with faster multiprocessing". This reverts commit 5db3e9fa64. 2025-09-29 17:48:22 +02:00
Robert Sachunsky
abf5c0f845 get_smallest_skew: when shifting search range of rotation angle, compare resulting (maximum) variances instead of blindly assuming the new range is better 2025-09-29 17:48:22 +02:00
Robert Sachunsky
dc0caad512 writer: use @type='heading' instead of 'header' 2025-09-29 17:48:22 +02:00
Robert Sachunsky
f458e3ece0 writer: SeparatorRegion needs SeparatorRegionType (not ImageRegionType) 2025-09-29 17:48:22 +02:00
Robert Sachunsky
4337d62985 contours: rename 'pixel' → 'label' for clarity 2025-09-29 17:48:22 +02:00
Robert Sachunsky
5b16c2fc00 avoid pulling unused 'image_page_rotated' through functions 2025-09-29 17:48:22 +02:00
Robert Sachunsky
5bff2d156a use box2rect instead of crop_image_inside_box when no image needed 2025-09-29 17:48:22 +02:00
Robert Sachunsky
9b5182c1c0 utils: introduce box2rect and box2slice 2025-09-29 17:48:19 +02:00
Robert Sachunsky
bca2ae3d78 get_marginals: exit early if no peaks found to avoid spurious overlap mask 2025-09-29 17:47:51 +02:00
Robert Sachunsky
235539a350 filter_contours_without_textline_inside: avoid removing from identical lists twice 2025-09-29 17:47:51 +02:00
Robert Sachunsky
11e143afee polygon2contour: avoid overflow 2025-09-29 17:47:51 +02:00
Robert Sachunsky
7a9e8256ee increase dilatation: textregions/lines (5→6), seplines (0→1) 2025-09-29 17:47:51 +02:00
Robert Sachunsky
f3faa29528 refactor shapely conversions into contour2polygon / polygon2contour, also handle heterogeneous geometries 2025-09-29 17:47:51 +02:00
Robert Sachunsky
0650274ffa move dilate_*_contours to .utils.contour, rename dilate_textregions_contours_textline_version → dilate_textline_contours 2025-09-29 17:47:47 +02:00
Robert Sachunsky
a433c73628 filter_contours_area_of_image*: also ensure validity here 2025-09-29 17:46:50 +02:00
Robert Sachunsky
17bcf1af71 rename *lines_xml → *seplines for clarity 2025-09-29 17:46:50 +02:00
Robert Sachunsky
e730725da3 check_any_text_region_in_model_one_is_main_or_header_light: return original instead of resampled contours 2025-09-29 17:46:50 +02:00
Robert Sachunsky
7b51fd6624 avoid creating invalid polygons via rounding 2025-09-29 17:46:50 +02:00
Robert Sachunsky
41cc38c51a get_textregion_contours_in_org_image_light: no back rotation, drop slope_first (always 0) 2025-09-29 17:46:48 +02:00
Robert Sachunsky
afba70c920 separate_lines/do_work_of_slopes: skip if crop is empty 2025-09-29 17:44:39 +02:00
Robert Sachunsky
66b2bce8b9 return_boxes_of_images_by_order_of_reading_new: log any exceptions 2025-09-29 17:44:36 +02:00
Robert Sachunsky
b48c41e68f return_boxes_of_images_by_order_of_reading_new: simplify, avoid changing dtype during np.append 2025-09-29 17:42:53 +02:00
Robert Sachunsky
09ece86f0d dilate_textregions_contours: simplify (via shapely's Polygon.buffer()), ensure validity 2025-09-29 17:42:53 +02:00
kba
6ea6a62801 📝 v0.5.0 2025-09-26 16:23:46 +02:00
Konstantin Baierer
882e242946 Merge pull request #178 from qurator-spk/prepare-release-v0.5.0: Prepare release v0.5.0 2025-09-26 16:21:09 +02:00
kba
37e64b4e45 📝 changelog 2025-09-26 16:19:04 +02:00
kba
3123add815 📝 update README 2025-09-26 15:07:32 +02:00
kba
830cc2c30a comment out the offending test outright 2025-09-26 14:37:04 +02:00
kba
eb8d4573a8 tests: also disable ...ocr_directory test 2025-09-26 13:57:08 +02:00
kba
42fb452a7e disable the -doit OCR test 2025-09-26 12:55:29 +02:00
Robert Sachunsky
480daa4c7c test_run: make ocr -doit work (add truetype file) 2025-09-25 22:28:15 +02:00
kba
4c6405713a ci: ocr models 2025-09-25 22:19:36 +02:00
kba
b4d460ca79 makefile forgot the OCR models 2025-09-25 22:16:38 +02:00
kba
f3f5426597 Merge branch 'adapt-ocrd' of https://github.com/qurator-spk/eynollah into adapt-ocrd 2025-09-25 21:47:27 +02:00
kba
0bb1fb1a05 tests: adapt to layout/ocr model split 2025-09-25 21:47:15 +02:00
kba
2ec773128b Merge branch 'adapt-ocrd' of https://github.com/qurator-spk/eynollah into adapt-ocrd 2025-09-25 21:40:48 +02:00
kba
f37d80c188 Merge branch 'adapt-ocrd' of https://github.com/qurator-spk/eynollah into adapt-ocrd 2025-09-25 21:39:55 +02:00
kba
57ee1cdc72 Merge remote-tracking branch 'bertsky/mbro_dead_code-plus-fixes-plus-tests' into adapt-ocrd 2025-09-25 21:39:36 +02:00
kba
5c0ab509c4 CI: Update model name 2025-09-25 21:17:32 +02:00
kba
9303ded11f ocrd-tool.json: use models_layout instead of eynollah_layouts for consistency 2025-09-25 21:12:52 +02:00
Robert Sachunsky
7c79902835 enhancement/mbreorder: make all path options kwargs to run() instead of attributes 2025-09-25 20:51:02 +02:00
kba
e6ee26fde3 make models: adapt to zenodo/v0.5.0 2025-09-25 20:35:54 +02:00
kba
11de8a025d Adapt ocrd-eynollah-segment for release 2025-09-25 20:11:48 +02:00