kba | aac2e58b05 | Merge remote-tracking branch 'michalbubula/add-feedback' into prepare-release-v0.5.0 | 2025-09-23 19:38:56 +02:00
kba | df8d93dbfa | Merge branch 'main' into add-feedback | 2025-09-23 19:20:20 +02:00
kba | 89e49f46bb | Merge remote-tracking branch 'origin/updating_readme_for_eynollah_use_cases' into prepare-release-v0.5.0 | 2025-09-23 19:16:54 +02:00
vahidrezanezhad | 4c2e15aa00 | default cnn-rnn and transformer ocr models have changed to model_eynollah_ocr_cnnrnn_20250904 and model_eynollah_ocr_trocr_20250919 respectively | 2025-09-22 13:55:31 +02:00
vahidrezanezhad | c7ed7a30e6 | extending image types | 2025-09-21 02:32:40 +02:00
vahidrezanezhad | eb0b44b25c | Merge text of textlines and handle hyphenated words by joining them correctly | 2025-09-19 23:23:30 +02:00
vahidrezanezhad | eb322d5182 | writing page contour correctly in xml output + ignore unsupported file types when loading images | 2025-09-19 18:06:18 +02:00
vahidrezanezhad | ee040c7767 | debug new page extraction in the case of ignoring page extraction | 2025-09-19 15:24:34 +02:00
kba | 5c9cf8472b | remove redundant/brittle interval logging | 2025-09-18 13:19:57 +02:00
kba | 146102842a | convert all print stmts to logger.info calls | 2025-09-18 13:15:18 +02:00
kba | c64d102613 | move logging to CLI and make initialization optional | 2025-09-18 13:07:41 +02:00
vahidrezanezhad | 54d9916f3b | page extraction model name is changed | 2025-09-16 14:27:15 +02:00
vahidrezanezhad | 52cb0d9fac | new page extraction model integration | 2025-09-15 13:38:23 +02:00
vahidrezanezhad | 6e008345a0 | new page extraction model integration | 2025-09-15 13:36:58 +02:00
vahidrezanezhad | 8c949cec71 | PR #173 has been reverted. Additionally, for TrOCR, the cropped text lines will no longer be added to a list before prediction. Instead, for each batch size, the text line images will be collected and predictions will be made directly on them. | 2025-09-03 19:18:11 +02:00
vahidrezanezhad | d9ae7bd12c | merged pr #173 in #175 | 2025-09-02 15:27:19 +02:00
Robert Sachunsky | b84d945b5a | Merge pull request #3 from bertsky/polygon-dilate-buffer-refactor2: some refactoring (second attempt)... | 2025-09-02 13:26:52 +02:00
vahidrezanezhad | 92a7c7cfea | changed the drop capitals bounding box to contour ratio threshold | 2025-09-01 11:37:22 +02:00
vahidrezanezhad | 6a735daa60 | Update README.md | 2025-08-31 23:30:54 +02:00
Robert Sachunsky | 090341241e | writer: use @type='heading' instead of 'header' | 2025-08-29 17:21:30 +02:00
Robert Sachunsky | bb9cba1fd9 | writer: SeparatorRegion needs SeparatorRegionType (not ImageRegionType) | 2025-08-29 17:21:30 +02:00
Robert Sachunsky | eae1303ebb | contours: rename 'pixel' → 'label' for clarity | 2025-08-29 17:21:30 +02:00
Robert Sachunsky | dbbf1073df | avoid pulling unused 'image_page_rotated' through functions | 2025-08-29 17:21:30 +02:00
Robert Sachunsky | 142ac8825e | use box2rect instead of crop_image_inside_box when no image needed | 2025-08-29 17:21:30 +02:00
Robert Sachunsky | 892ff41e38 | utils: introduce box2rect and box2slice | 2025-08-29 17:21:30 +02:00
Robert Sachunsky | d3566e55ef | polygon2contour: fix 698f38e4 (deprecated dtype) | 2025-08-29 17:21:08 +02:00
Robert Sachunsky | 741aa7867c | get_marginals: exit early if no peaks found to avoid spurious overlap mask | 2025-08-29 12:46:19 +02:00
Robert Sachunsky | 57821662b9 | filter_contours_without_textline_inside: avoid removing from identical lists twice | 2025-08-29 12:46:12 +02:00
Robert Sachunsky | 698f38e461 | polygon2contour: avoid overflow | 2025-08-29 12:43:46 +02:00
vahidrezanezhad | fdcae8dd6e | eynollah ocr: support using either a specific model name or a models directory (default model) | 2025-08-28 11:30:59 +02:00
vahidrezanezhad | 7dd281267d | Marginals are divided into left and right, and written from top to bottom. | 2025-08-26 22:38:03 +02:00
Robert Sachunsky | fd6a6495a2 | increase dilatation: textregions/lines (5→6), seplines (0→1) | 2025-08-21 13:00:31 +02:00
Robert Sachunsky | 8be52fb143 | refactor shapely conversions into contour2polygon / polygon2contour, also handle heterogeneous geometries | 2025-08-21 12:59:03 +02:00
Robert Sachunsky | 8b5f90e243 | move dilate_*_contours to .utils.contour, rename dilate_textregions_contours_textline_version → dilate_textline_contours | 2025-08-21 01:42:46 +02:00
Robert Sachunsky | 244772f086 | filter_contours_area_of_image*: also ensure validity here | 2025-08-21 01:33:16 +02:00
Robert Sachunsky | 42474afa4b | rename *lines_xml → *seplines for clarity | 2025-08-21 01:32:32 +02:00
Robert Sachunsky | b610fe07a6 | check_any_text_region_in_model_one_is_main_or_header_light: return original instead of resampled contours | 2025-08-21 01:05:15 +02:00
Robert Sachunsky | 3d53070b90 | avoid creating invalid polygons via rounding | 2025-08-21 01:03:46 +02:00
Robert Sachunsky | 277d00579e | get_textregion_contours_in_org_image_light: no back rotation, drop slope_first (always 0) | 2025-08-20 14:28:14 +02:00
Robert Sachunsky | b6d1c43a85 | dilate_textregions_contours_textline_version: simplify (via shapely's Polygon.buffer()), ensure validity | 2025-08-20 14:26:14 +02:00
Robert Sachunsky | 6c442c9ae9 | separate_lines/do_work_of_slopes: skip if crop is empty | 2025-08-19 22:56:36 +02:00
Robert Sachunsky | e9a6ff5d81 | return_boxes_of_images_by_order_of_reading_new: simplify, avoid changing dtype during np.append | 2025-08-19 20:09:09 +02:00
Robert Sachunsky | f994ea5f0b | dilate_textregions_contours: simplify (via shapely's Polygon.buffer()), ensure validity | 2025-08-19 11:59:26 +02:00
vahidrezanezhad | 8dc2fab9fa | reading order on given layout | 2025-08-18 02:31:13 +02:00
Clemens Neudecker | a2359ea4c4 | Merge pull request #171 from bertsky/ocrd-machine-based-ro: OCR-D processor: expose reading_order_machine_based | 2025-08-15 18:40:13 +02:00
Robert Sachunsky | 21615a986d | OCR-D processor: expose reading_order_machine_based | 2025-08-13 14:14:37 +02:00
michalbubula | 8ebba5ac04 | add feedback to command line interface | 2025-08-12 16:21:15 +02:00
vahidrezanezhad | 20614d1678 | avoiding float in range | 2025-08-12 12:50:15 +02:00
vahidrezanezhad | 5db3e9fa64 | deskewing with faster multiprocessing | 2025-08-08 11:32:02 +02:00
vahidrezanezhad | a0c19c57be | use the latest ocr model with balanced fraktur-antiqua training dataset | 2025-08-05 14:22:22 +02:00