Commit graph

772 commits

Author SHA1 Message Date
Robert Sachunsky
0366707136 get_smallest_skew: do not pass logger 2025-09-29 17:48:22 +02:00
Robert Sachunsky
b94c96fcbb find_num_col: exit early if empty (avoiding exceptions) 2025-09-29 17:48:22 +02:00
Robert Sachunsky
04c3d7dd1b get_smallest_skew: avoid shm if no ProcessPoolExecutor is passed 2025-09-29 17:48:22 +02:00
Robert Sachunsky
0662ece536 do_work_of_slopes*: use shm also in non-light mode(s) 2025-09-29 17:48:22 +02:00
Robert Sachunsky
31f240c3b8 do_image_rotation, do_work_of_slopes_new_curved: pass arrays via shared memory 2025-09-29 17:48:22 +02:00
Robert Sachunsky
8be2c79771 Revert "deskewing with faster multiprocessing"
This reverts commit 5db3e9fa64.
2025-09-29 17:48:22 +02:00
Robert Sachunsky
abf5c0f845 get_smallest_skew: when shifting search range of rotation angle, compare resulting (maximum) variances instead of blindly assuming the new range is better 2025-09-29 17:48:22 +02:00
Robert Sachunsky
dc0caad512 writer: use @type='heading' instead of 'header' 2025-09-29 17:48:22 +02:00
Robert Sachunsky
f458e3ece0 writer: SeparatorRegion needs SeparatorRegionType (not ImageRegionType) 2025-09-29 17:48:22 +02:00
Robert Sachunsky
4337d62985 contours: rename 'pixel' → 'label' for clarity 2025-09-29 17:48:22 +02:00
Robert Sachunsky
5b16c2fc00 avoid pulling unused 'image_page_rotated' through functions 2025-09-29 17:48:22 +02:00
Robert Sachunsky
5bff2d156a use box2rect instead of crop_image_inside_box when no image needed 2025-09-29 17:48:22 +02:00
Robert Sachunsky
9b5182c1c0 utils: introduce box2rect and box2slice 2025-09-29 17:48:19 +02:00
Robert Sachunsky
bca2ae3d78 get_marginals: exit early if no peaks found to avoid spurious overlap mask 2025-09-29 17:47:51 +02:00
Robert Sachunsky
235539a350 filter_contours_without_textline_inside: avoid removing from identical lists twice 2025-09-29 17:47:51 +02:00
Robert Sachunsky
11e143afee polygon2contour: avoid overflow 2025-09-29 17:47:51 +02:00
Robert Sachunsky
7a9e8256ee increase dilatation: textregions/lines (5→6), seplines (0→1) 2025-09-29 17:47:51 +02:00
Robert Sachunsky
f3faa29528 refactor shapely conversions into contour2polygon / polygon2contour, also handle heterogeneous geometries 2025-09-29 17:47:51 +02:00
Robert Sachunsky
0650274ffa move dilate_*_contours to .utils.contour, rename dilate_textregions_contours_textline_version → dilate_textline_contours 2025-09-29 17:47:47 +02:00
Robert Sachunsky
a433c73628 filter_contours_area_of_image*: also ensure validity here 2025-09-29 17:46:50 +02:00
Robert Sachunsky
17bcf1af71 rename *lines_xml → *seplines for clarity 2025-09-29 17:46:50 +02:00
Robert Sachunsky
e730725da3 check_any_text_region_in_model_one_is_main_or_header_light: return original instead of resampled contours 2025-09-29 17:46:50 +02:00
Robert Sachunsky
7b51fd6624 avoid creating invalid polygons via rounding 2025-09-29 17:46:50 +02:00
Robert Sachunsky
41cc38c51a get_textregion_contours_in_org_image_light: no back rotation, drop slope_first (always 0) 2025-09-29 17:46:48 +02:00
Robert Sachunsky
afba70c920 separate_lines/do_work_of_slopes: skip if crop is empty 2025-09-29 17:44:39 +02:00
Robert Sachunsky
66b2bce8b9 return_boxes_of_images_by_order_of_reading_new: log any exceptions 2025-09-29 17:44:36 +02:00
Robert Sachunsky
b48c41e68f return_boxes_of_images_by_order_of_reading_new: simplify, avoid changing dtype during np.append 2025-09-29 17:42:53 +02:00
Robert Sachunsky
09ece86f0d dilate_textregions_contours: simplify (via shapely's Polygon.buffer()), ensure validity 2025-09-29 17:42:53 +02:00
Clemens Neudecker
a2359ea4c4
Merge pull request #171 from bertsky/ocrd-machine-based-ro
OCR-D processor: expose reading_order_machine_based
2025-08-15 18:40:13 +02:00
Robert Sachunsky
21615a986d OCR-D processor: expose reading_order_machine_based 2025-08-13 14:14:37 +02:00
vahidrezanezhad
6b8893b188
Merge pull request #167 from qurator-spk/ocrd-fixes
Ocrd fixes
2025-07-22 14:46:25 +02:00
kba
b7b218ff11 OCR-D processor: same behavior as standalone wrt light_version/textline_light 2025-06-12 15:30:17 +02:00
vahidrezanezhad
c194a20c9c Fixed duplicate textline_light assignments (true and false) in the OCR-D framework for the Eynollah light version, which caused rectangles to be used instead of contours for textlines 2025-06-12 15:27:22 +02:00
Clemens Neudecker
3dcbb20cac
Merge pull request #159 from bertsky/main
update docker
2025-05-06 15:14:06 +02:00
Robert Sachunsky
e9179e1d34 docker: use latest core base stage 2025-05-02 00:16:22 +02:00
Robert Sachunsky
f8b4d29a59 docker: prepackage ocrd-all-module-dir.json 2025-05-02 00:16:22 +02:00
vahidrezanezhad
e2da7a6239 Fix model name to return the correct machine-based model name 2025-04-30 16:06:29 +02:00
vahidrezanezhad
b227736094 Fix OCR text cleaning to correctly handle 'U', 'K', and 'N' starting sentence; update text line splitting size 2025-04-30 16:04:34 +02:00
vahidrezanezhad
4cb4414740 Resolve remaining issue with #158 and resolve #124 2025-04-30 16:01:52 +02:00
vahidrezanezhad
208bde706f resolving issue #158 2025-04-30 13:55:09 +02:00
Konstantin Baierer
3e8adb86c2
Merge pull request #157 from qurator-spk/kba-patch-1
CI: Use most recent actions/setup-python@v5
2025-04-29 11:42:18 +02:00
Konstantin Baierer
77dae129d5 CI: Use most recent actions/setup-python@v5 2025-04-22 13:22:28 +02:00
Clemens Neudecker
b4df978dd5
Merge pull request #154 from qurator-spk/ci-pypi
CI: pypi
2025-04-17 17:01:20 +02:00
kba
30ba234641 CI: pypi 2025-04-16 19:27:17 +02:00
kba
41318f0404 📝 changelog 2025-04-15 11:14:26 +02:00
vahidrezanezhad
a22df11ebb Restoring the contour in the original image caused an error due to an empty tuple. This issue has been resolved, and as expected, the confidence score for this contour is set to zero 2025-04-14 00:42:08 +02:00
kba
8080bd823c 📦 v0.4.0 2025-04-07 16:48:57 +02:00
Robert Sachunsky
bcf1898aa4 📝 changelog 2025-04-07 16:46:58 +02:00
Robert Sachunsky
177e017167 test_run: ensure exceptions are shown 2025-04-07 10:39:50 +00:00
vahidrezanezhad
e2907f67e0 'from PIL.Image import Image' causes an error when using Image.new(), and since Image is already imported, this line can be safely commented out. 2025-04-06 00:33:36 +02:00