Robert Sachunsky
e2dfec75fb
return_x_start_end_mothers_childs_and_type_of_reading_order: simplify and document
- simplify
- rename identifiers to make readable:
- `y_sep` → `y_mid` (because the cy gets passed)
- `y_diff` → `y_max` (because the ymax gets passed)
- array instead of list operations
- add docstring and in-line comments
- return (zero-length) numpy array instead of empty list
2025-10-25 13:35:06 +02:00
Robert Sachunsky
0fc4b2535d
return_boxes_of_images_by_order_of_reading_new: fix no-mother case
- when handling lines without mother,
and biggest line already accounts for all columns,
but some are too close to the top and therefore must be removed,
avoid invalidating `biggest` index, causing `IndexError`
- remove try-catch (now unnecessary)
- array instead of list operations
2025-10-25 13:34:58 +02:00
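The index problem described in this commit can be illustrated with a minimal sketch. It is not the project's code: the names `lines`, `biggest` and `min_top` are assumptions for illustration, plain Python lists stand in for the arrays, and the sketch assumes the widest line itself is always kept.

```python
# Illustration only: each line is (top_y, width); all names are hypothetical.
lines = [(5, 40), (120, 300), (200, 80)]
min_top = 50  # lines starting above this y count as "too close to the top"

# Index of the widest line, computed before any removal.
biggest = max(range(len(lines)), key=lambda i: lines[i][1])

# Fragile variant: filtering first shifts positions, so the old index may
# point at a different element (or past the end, raising IndexError).
filtered = [line for line in lines if line[0] >= min_top]
stale_is_valid = biggest < len(filtered) and filtered[biggest] == lines[biggest]

# Safer variant: remember which original indices survive (assumption: the
# widest line is always kept) and re-derive its position afterwards.
keep = [i for i, line in enumerate(lines) if line[0] >= min_top or i == biggest]
kept_lines = [lines[i] for i in keep]
new_biggest = keep.index(biggest)
```

Here the stale index still lies inside the filtered list but refers to a different line, which is the quieter form of the same bug.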
Robert Sachunsky
7c3e418588
return_boxes_of_images_by_order_of_reading_new: simplify
- enumeration instead of indexing
- array instead of list operations
- add better plotting (but commented out)
2025-10-25 13:34:52 +02:00
Robert Sachunsky
cd35241e81
find_number_of_columns_in_document: split headings at top+baseline
regarding `splitter_y` result, for headings, instead of cutting right
through them via center line, add their toplines and baselines as if
they were horizontal separators
2025-10-25 13:34:35 +02:00
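The difference between the two splitting rules can be sketched as follows. This is an illustration under assumptions, not the code of `find_number_of_columns_in_document`: the helper names and the `(top, bottom)` representation of a heading are hypothetical, and the bottom edge of the box stands in for the baseline.

```python
def split_at_center(headings):
    """Old rule as described: one split per heading, through its center line."""
    return sorted((top + bottom) // 2 for top, bottom in headings)


def split_at_top_and_baseline(headings):
    """New rule as described: two splits per heading, at its top and bottom edge."""
    ys = set()
    for top, bottom in headings:
        ys.update((top, bottom))
    return sorted(ys)


# Two hypothetical headings given as (top, bottom) y coordinates.
headings = [(100, 140), (400, 460)]
center_splits = split_at_center(headings)
edge_splits = split_at_top_and_baseline(headings)
```

With the second rule, no split position falls inside a heading, so the heading stays in one piece between its own two split lines.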
vahidrezanezhad
6192e5ba5c
qualitative evaluation of OCR models is added to docs
2025-10-23 16:37:24 +02:00
kba
ec1fd93dad
wip
2025-10-23 11:58:23 +02:00
vahidrezanezhad
d0ad7a98b7
starting qualitative ocr evaluation
2025-10-22 22:45:22 +02:00
vahidrezanezhad
7b7714af2e
completing OCR evaluation metrics
2025-10-22 22:42:37 +02:00
vahidrezanezhad
b56bb44284
providing ocr model evaluation metrics
2025-10-22 21:30:06 +02:00
vahidrezanezhad
59eb4fd3be
images with reading order are added to README
2025-10-22 19:04:01 +02:00
vahidrezanezhad
ab9ddd5214
OCR examples are added to README
2025-10-22 18:41:15 +02:00
vahidrezanezhad
2fc723d292
extend README
2025-10-22 18:29:14 +02:00
kba
874cfc247f
.
2025-10-22 17:56:18 +02:00
kba
883546a6b8
eynollah models package
2025-10-22 17:05:40 +02:00
kba
04bc4a63d0
reorganize model_zoo
2025-10-22 16:04:48 +02:00
kba
d94285b3ea
rewrite model spec data structure
2025-10-22 13:07:35 +02:00
kba
146658f026
eynollah layout: fix trocr_processor model_zoo call
2025-10-22 10:48:26 +02:00
kba
4c8abfe19c
eynollah_ocr: actually replace the model calls
2025-10-22 10:48:26 +02:00
kba
1337461d47
adopt image_enhancer to the zoo
2025-10-21 19:24:55 +02:00
kba
f0c86672f8
adopt mb_ro_on_layout to the zoo
2025-10-21 17:55:08 +02:00
kba
bcffa2e503
adopt binarizer to the zoo
2025-10-21 17:53:24 +02:00
kba
de34a15809
Makefile: fix make models for OCR
2025-10-21 17:27:16 +02:00
kba
9d2b18d2af
test_run: check log messages starting with eynollah
2025-10-21 13:29:55 +02:00
kba
a53d5fc452
update docs/makefile to point to v0.6.0 models
2025-10-21 13:15:57 +02:00
kba
c6b863b13f
typing and asserts
2025-10-21 12:05:27 +02:00
kba
44b75eb36f
cli: model -> model_basedir
2025-10-21 11:05:12 +02:00
cneud
7d70835d22
small fixes to main readme
2025-10-20 23:19:10 +02:00
cneud
230e7cc705
integrate ocrd docs
2025-10-20 22:52:54 +02:00
cneud
e5254dc6c5
integrate training docs
2025-10-20 22:39:54 +02:00
cneud
6e3399fe7a
combine Docker docs
2025-10-20 22:16:56 +02:00
kba
062f317d2e
Introduce model_zoo to Eynollah_ocr
2025-10-20 21:14:52 +02:00
kba
d609a532bf
organize imports mostly
2025-10-20 19:46:07 +02:00
kba
48d1198d24
move Eynollah_ocr to separate module
2025-10-20 19:15:31 +02:00
kba
b90cfdfcc4
adapt tests to -l being top-level option now
2025-10-20 18:56:24 +02:00
kba
a850ef39ea
factor model loading in Eynollah to EynollahModelZoo
2025-10-20 18:34:44 +02:00
Robert Sachunsky
5a0e4c3b0f
find_number_of_columns_in_document: improve splitter rule
extend horizontal separators to full img width if they do not overlap
any other regions
(only as regards the returned `splitter_y` result,
without changing the returned separators mask)
2025-10-20 17:41:50 +02:00
Robert Sachunsky
542d38ab43
find_number_of_columns_in_document: simplify, rename line→seps
2025-10-20 17:41:49 +02:00
Robert Sachunsky
d3d599b010
order_of_regions: add better plotting (but commented out)
2025-10-20 17:41:47 +02:00
Robert Sachunsky
c43a825d1d
order_of_regions: filter out-of-image peaks
2025-10-20 17:41:47 +02:00
Robert Sachunsky
48761c3e12
find_num_col: simplify, add better plotting (but commented out)
2025-10-20 17:41:45 +02:00
Robert Sachunsky
184927fb54
find_num_cols: re-sort peaks when cutting n-best num_col_classifier
2025-10-20 17:41:44 +02:00
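Why a re-sort is needed after an n-best cut can be shown with a small sketch. It is an assumption-laden illustration, not the code of `find_num_cols`: the function `n_best_peaks` and the example positions and scores are hypothetical.

```python
def n_best_peaks(positions, scores, n):
    """Keep the n highest-scoring peaks and return them in left-to-right order."""
    # Ranking by score scrambles the positional order...
    ranked = sorted(zip(scores, positions), reverse=True)[:n]
    # ...so the surviving positions are sorted again before being returned.
    return sorted(pos for _, pos in ranked)


positions = [120, 480, 800, 1010]  # hypothetical x coordinates of peaks
scores = [0.4, 0.7, 0.2, 0.9]      # hypothetical peak strengths
best = n_best_peaks(positions, scores, 2)
```

Without the final sort, the two strongest peaks would come back as 1010 before 480, that is, in score order rather than in page order.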
Robert Sachunsky
086c1880ac
binarization: add option --overwrite, skip existing outputs
(also, simplify `run` and separate `run_single`)
2025-10-20 17:40:52 +02:00
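The skip-unless-overwrite behaviour named in this commit can be sketched in a few lines. Only the `--overwrite` option name comes from the commit message; the helper `should_write` and the file names are hypothetical, and the real `run` / `run_single` code may differ.

```python
import tempfile
from pathlib import Path


def should_write(output_path, overwrite=False):
    """Return True if the output is missing or overwriting was requested."""
    return overwrite or not Path(output_path).exists()


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "page1.png"
    existing.write_bytes(b"")          # simulate an output from an earlier run
    missing = Path(tmp) / "page2.png"  # no output yet
    decisions = (
        should_write(existing),                  # skipped by default
        should_write(existing, overwrite=True),  # redone on request
        should_write(missing),                   # always processed
    )
```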
vahidrezanezhad
c8455370a9
updating heuristics and ocr documentation
2025-10-20 15:13:45 +02:00
vahidrezanezhad
3ec5ceb22e
Update flowchart
2025-10-20 14:55:14 +02:00
vahidrezanezhad
9d2dbb8388
updating model-based reading order detection
2025-10-20 14:47:55 +02:00
cneud
496a0e2ca4
readme and documentation updates
2025-10-17 19:19:26 +02:00
cneud
f212ffa22d
remove unnecessary backslash
2025-10-17 18:27:18 +02:00
cneud
9733d575bf
replace list declaration with list literal (faster)
2025-10-17 18:21:49 +02:00
cneud
20a95365c2
remove redundant parentheses
2025-10-17 18:19:00 +02:00
cneud
2a1f892d72
expand keywords and supported Python versions
2025-10-17 18:17:41 +02:00