Robert Sachunsky
7387f5a929
do_order_of_regions: improve box matching, simplify
...
- when searching for boxes matching contour, be more precise:
  - avoid heuristic rules ("xmin + 80 within xrange") in favour
    of exact criteria (contour properly contained in box)
  - for fallback criterion (nearest centers), also require
    proper containment of center in box
- `order_of_regions`: remove (now) unnecessary (and insufficient)
  workaround for missing indexes (if boxes are not covering contours
  exactly)
2025-10-09 20:14:11 +02:00
Robert Sachunsky
4950e6bd78
order_of_regions: simplify
...
- use new `find_center_of_contours`
- avoid unused calculations
- avoid loops in favour of array processing
2025-10-09 20:14:10 +02:00
Robert Sachunsky
a1c8fd4467
do_order_of_regions / order_of_regions: simplify
...
- array-convert only once (before returning from `order_of_regions`)
- avoid passing `matrix_of_orders` unnecessarily between
  `order_of_regions` and `order_and_id_of_texts`
2025-10-09 20:14:10 +02:00
Robert Sachunsky
415b2cbad8
eynollah, drop_capitals: simplify
...
- use new `find_center_of_contours`
2025-10-09 20:14:10 +02:00
Robert Sachunsky
3f3353ec3a
do_order_of_regions: simplify
...
- avoid loops in favour of array processing
2025-10-09 20:14:10 +02:00
Robert Sachunsky
8c3d5eb0eb
separate_marginals_to_left_and_right_and_order_from_top_to_down: simplify
...
- use new `find_center_of_contours`
- avoid loops in favour of array processing
- avoid repeated sorting
2025-10-09 20:14:10 +02:00
Robert Sachunsky
81827c2942
filter_contours_inside_a_bigger_one: simplify
...
- use new `find_center_of_contours`
- avoid loops in favour of array processing
- use sets instead of `np.unique` and `np.delete` instead of list.pop
2025-10-06 13:32:34 +02:00
Robert Sachunsky
0b9d4901a6
contour features: avoid unused calculations, simplify, add shortcuts
...
- new function: `find_center_of_contours`
- simplified: `find_(new_)features_of_contours`
2025-10-02 20:51:03 +02:00
Robert Sachunsky
3aa7ad04fa
📝 update changelog
2025-09-30 23:14:52 +02:00
Robert Sachunsky
f0de1adabf
rm loky dependency
2025-09-30 23:12:18 +02:00
Robert Sachunsky
7daec392b9
Dockerfile: fix up CUDA installation for mixed TF/Torch
2025-09-30 22:10:45 +02:00
Robert Sachunsky
ad129ed46c
CI: remove OS from model cache keys
2025-09-30 22:05:53 +02:00
Robert Sachunsky
c86e59f481
CI: update model key, split up cache restore/save
2025-09-30 22:03:46 +02:00
Robert Sachunsky
a3d8197930
makefile: update model URL
2025-09-30 21:50:21 +02:00
Robert Sachunsky
61b20cc83d
tests: switch from subtests to parametrize, use --isolate everywhere to free CUDA memory in between
2025-09-30 19:20:35 +02:00
Robert Sachunsky
375e0263d4
CNN-RNN OCR model: switch to 20250930 version (compatible with TF 2.12 on CPU as well)
2025-09-30 19:16:50 +02:00
Robert Sachunsky
b21051db21
ProcessPoolExecutor: shutdown during del() instead of atexit()
2025-09-30 19:16:00 +02:00
Robert Sachunsky
08c8c26028
indent extremely long lines
2025-09-30 03:52:19 +02:00
Robert Sachunsky
f857ee7b51
simplify
2025-09-30 02:26:00 +02:00
Robert Sachunsky
c0137c29ad
try to fix the failed outsourcing of utils_ocr
2025-09-30 02:23:43 +02:00
Robert Sachunsky
13f85b0d5c
Merge branch 'main' into loky-with-shm-for-175-rebuilt
2025-09-30 02:07:20 +02:00
Robert Sachunsky
758602403e
replace loky with concurrent.futures.ProcessPoolExecutor (faster)
2025-09-29 17:48:22 +02:00
Robert Sachunsky
0366707136
get_smallest_skew: do not pass logger
2025-09-29 17:48:22 +02:00
Robert Sachunsky
b94c96fcbb
find_num_col: exit early if empty (avoiding exceptions)
2025-09-29 17:48:22 +02:00
Robert Sachunsky
04c3d7dd1b
get_smallest_skew: avoid shm if no ProcessPoolExecutor is passed
2025-09-29 17:48:22 +02:00
Robert Sachunsky
0662ece536
do_work_of_slopes*: use shm also in non-light mode(s)
2025-09-29 17:48:22 +02:00
Robert Sachunsky
31f240c3b8
do_image_rotation, do_work_of_slopes_new_curved: pass arrays via shared memory
2025-09-29 17:48:22 +02:00
Robert Sachunsky
8be2c79771
Revert "deskewing with faster multiprocessing"
...
This reverts commit 5db3e9fa64.
2025-09-29 17:48:22 +02:00
Robert Sachunsky
abf5c0f845
get_smallest_skew: when shifting search range of rotation angle, compare resulting (maximum) variances instead of blindly assuming the new range is better
2025-09-29 17:48:22 +02:00
Robert Sachunsky
dc0caad512
writer: use @type='heading' instead of 'header'
2025-09-29 17:48:22 +02:00
Robert Sachunsky
f458e3ece0
writer: SeparatorRegion needs SeparatorRegionType (not ImageRegionType)
2025-09-29 17:48:22 +02:00
Robert Sachunsky
4337d62985
contours: rename 'pixel' → 'label' for clarity
2025-09-29 17:48:22 +02:00
Robert Sachunsky
5b16c2fc00
avoid pulling unused 'image_page_rotated' through functions
2025-09-29 17:48:22 +02:00
Robert Sachunsky
5bff2d156a
use box2rect instead of crop_image_inside_box when no image needed
2025-09-29 17:48:22 +02:00
Robert Sachunsky
9b5182c1c0
utils: introduce box2rect and box2slice
2025-09-29 17:48:19 +02:00
Robert Sachunsky
bca2ae3d78
get_marginals: exit early if no peaks found to avoid spurious overlap mask
2025-09-29 17:47:51 +02:00
Robert Sachunsky
235539a350
filter_contours_without_textline_inside: avoid removing from identical lists twice
2025-09-29 17:47:51 +02:00
Robert Sachunsky
11e143afee
polygon2contour: avoid overflow
2025-09-29 17:47:51 +02:00
Robert Sachunsky
7a9e8256ee
increase dilatation: textregions/lines (5→6), seplines (0→1)
2025-09-29 17:47:51 +02:00
Robert Sachunsky
f3faa29528
refactor shapely conversions into contour2polygon / polygon2contour, also handle heterogeneous geometries
2025-09-29 17:47:51 +02:00
Robert Sachunsky
0650274ffa
move dilate_*_contours to .utils.contour, rename dilate_textregions_contours_textline_version → dilate_textline_contours
2025-09-29 17:47:47 +02:00
Robert Sachunsky
a433c73628
filter_contours_area_of_image*: also ensure validity here
2025-09-29 17:46:50 +02:00
Robert Sachunsky
17bcf1af71
rename *lines_xml → *seplines for clarity
2025-09-29 17:46:50 +02:00
Robert Sachunsky
e730725da3
check_any_text_region_in_model_one_is_main_or_header_light: return original instead of resampled contours
2025-09-29 17:46:50 +02:00
Robert Sachunsky
7b51fd6624
avoid creating invalid polygons via rounding
2025-09-29 17:46:50 +02:00
Robert Sachunsky
41cc38c51a
get_textregion_contours_in_org_image_light: no back rotation, drop slope_first (always 0)
2025-09-29 17:46:48 +02:00
Robert Sachunsky
afba70c920
separate_lines/do_work_of_slopes: skip if crop is empty
2025-09-29 17:44:39 +02:00
Robert Sachunsky
66b2bce8b9
return_boxes_of_images_by_order_of_reading_new: log any exceptions
2025-09-29 17:44:36 +02:00
Robert Sachunsky
b48c41e68f
return_boxes_of_images_by_order_of_reading_new: simplify, avoid changing dtype during np.append
2025-09-29 17:42:53 +02:00
Robert Sachunsky
09ece86f0d
dilate_textregions_contours: simplify (via shapely's Polygon.buffer()), ensure validity
2025-09-29 17:42:53 +02:00