Robert Sachunsky
a2a9fe5117
delete_separator_around: simplify, eynollah: identifiers
- use array instead of list operations
- rename identifiers:
  - `pixel` → `label`
  - `line` → `sep`
2025-10-25 13:36:17 +02:00
Robert Sachunsky
3ebbc2d693
return_boxes_of_images_by_order_of_reading_new: indent
(by removing an unnecessary conditional)
2025-10-25 13:36:06 +02:00
Robert Sachunsky
66a0e55e49
return_boxes_of_images_by_order_of_reading_new: avoid oversplits
when the y slice (`top:bot`) is not a significant part of the page,
viz. less than 22% (as in `find_number_of_columns_in_document`),
avoid forcing `find_num_col` to reach `num_col_classifier`
(this keeps large headers from being split up, so they are ordered better)
2025-10-25 13:35:56 +02:00
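The oversplit guard above can be sketched as follows. This is an illustration under assumptions, not the eynollah code: the helper name `required_num_cols` and its signature are hypothetical, and "part of the page" is taken as the slice height divided by the page height.

```python
# Hypothetical constant mirroring the 22% threshold from the commit message.
MIN_SLICE_FRACTION = 0.22

def required_num_cols(top, bot, page_height, num_col_classifier):
    """Return the column count to force on the slice top:bot, or None.

    Hypothetical helper: only slices covering a significant part of the
    page are pushed towards the classifier's global column count; small
    slices (e.g. a large header) keep whatever is detected locally.
    """
    if (bot - top) / page_height < MIN_SLICE_FRACTION:
        return None  # do not force; avoids oversplitting small slices
    return num_col_classifier
```

Returning `None` rather than a number lets the caller distinguish "no constraint" from "force N columns".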
Robert Sachunsky
6fbb5f8a12
return_boxes_of_images_by_order_of_reading_new: simplify
- array instead of list operations
- add better plotting (but commented out)
- add more debug printing (but commented out)
- add more inline comments for documentation
- rename identifiers for readability:
  - `cy_hor_diff` → `y_max_hor_some` (because the ymax gets passed)
  - `lines` → `seps`
  - `y_type_2` → `y_mid`
  - `y_diff_type_2` → `y_max`
  - `y_lines_by_order` → `y_mid_by_order`
  - `y_lines_without_mother` → `y_mid_without_mother`
  - `y_lines_with_child_without_mother` → `y_mid_with_child_without_mother`
  - `y_column` → `y_mid_column`
  - `y_column_nc` → `y_mid_column_nc`
  - `y_all_between_nm_wc` → `y_mid_between_nm_wc`
  - `lines_so_close_to_top_separator` → `seps_too_close_to_top_separator`
  - `y_in_cols` and `y_down` → `y_mid_next`
- use `pairwise()` `nc_top:nc_bot` instead of `i_c` indexing
2025-10-25 13:35:44 +02:00
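The switch from `i_c` indexing to `pairwise()` can be illustrated like this. The boundary and value lists are made up; that `nc_top:nc_bot` are consecutive entries of a sorted index list is an assumption based on the names only. `itertools.pairwise` needs Python 3.10+, so a small fallback is included.

```python
try:
    from itertools import pairwise  # Python 3.10+
except ImportError:  # minimal fallback for older interpreters
    def pairwise(iterable):
        items = list(iterable)
        return zip(items, items[1:])

# Hypothetical sorted boundary indices and the values they partition.
bounds = [0, 2, 5, 6]
values = list("abcdef")

# Old style: explicit index arithmetic via i_c.
old = [values[bounds[i_c]:bounds[i_c + 1]] for i_c in range(len(bounds) - 1)]

# New style: pairwise() yields each (nc_top, nc_bot) pair directly.
new = [values[nc_top:nc_bot] for nc_top, nc_bot in pairwise(bounds)]
```

Both produce the same slices; the `pairwise()` form removes the off-by-one-prone `i_c + 1`.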
Robert Sachunsky
6cc5900943
find_num_col: add better plotting (but commented out)
2025-10-25 13:35:34 +02:00
Robert Sachunsky
5d15941b35
contours_in_same_horizon: simplify
- array instead of list operations
- return array of index pairs instead of list objects
2025-10-25 13:35:26 +02:00
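Returning an array of index pairs instead of list objects might look like the sketch below. The "same horizon" criterion used here (overlapping vertical extents) and the function name are assumptions for illustration; the actual rule in `contours_in_same_horizon` may differ.

```python
import numpy as np

def pairs_in_same_horizon(y_min, y_max):
    """Return an (N, 2) int array of index pairs (i < j) whose vertical
    extents [y_min, y_max] overlap. Hypothetical criterion."""
    y_min = np.asarray(y_min)
    y_max = np.asarray(y_max)
    # Broadcast to an (n, n) matrix: intervals i and j overlap
    # iff each one starts before the other one ends.
    overlap = (y_min[:, None] <= y_max[None, :]) & (y_min[None, :] <= y_max[:, None])
    # Keep each unordered pair once: upper triangle without the diagonal.
    i, j = np.nonzero(np.triu(overlap, k=1))
    return np.stack([i, j], axis=1)
```

With no matches the result has shape `(0, 2)`, so callers can index with it without special-casing.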
Robert Sachunsky
acee4c1bfe
find_number_of_columns_in_document: simplify
2025-10-25 13:35:18 +02:00
Robert Sachunsky
b2a79cc6ed
return_x_start_end_mothers_childs_and_type_of_reading_order: fix+1
when calculating `reading_order_type`, the upper limit of the column range
(`x_end`) needs to be `+1` here as well
2025-10-25 13:35:12 +02:00
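Assuming `x_end` is an inclusive column index, a Python slice or `range` needs `x_end + 1` as its exclusive stop; otherwise the last column is silently dropped. The column count and values below are made up.

```python
import numpy as np

# Hypothetical: a separator spans columns x_start..x_end inclusive.
x_start, x_end = 1, 3
n_cols = 5

covered_wrong = np.zeros(n_cols, dtype=bool)
covered_wrong[x_start:x_end] = True      # misses column x_end

covered = np.zeros(n_cols, dtype=bool)
covered[x_start:x_end + 1] = True        # inclusive upper bound
```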
Robert Sachunsky
e2dfec75fb
return_x_start_end_mothers_childs_and_type_of_reading_order: simplify and document
- simplify
- rename identifiers for readability:
  - `y_sep` → `y_mid` (because the cy gets passed)
  - `y_diff` → `y_max` (because the ymax gets passed)
- array instead of list operations
- add docstring and in-line comments
- return (zero-length) numpy array instead of empty list
2025-10-25 13:35:06 +02:00
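Returning a zero-length numpy array instead of an empty list keeps the result type uniform, so callers can apply array operations without a special case. The helper below is hypothetical and only demonstrates that property.

```python
import numpy as np

def find_indices(values, threshold):
    """Hypothetical helper returning indices of values above threshold.

    Always returns an integer numpy array, possibly of length zero,
    instead of a plain empty list.
    """
    values = np.asarray(values)
    return np.flatnonzero(values > threshold)

idx = find_indices([1, 2, 3], threshold=10)
shifted = idx + 5  # works on a zero-length array; `[] + 5` raises TypeError
```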
Robert Sachunsky
0fc4b2535d
return_boxes_of_images_by_order_of_reading_new: fix no-mother case
...
- when handling lines without mother where the biggest line already
  accounts for all columns, but some lines are too close to the top and
  therefore must be removed, avoid invalidating the `biggest` index
  (which caused an `IndexError`)
- remove try/except (now unnecessary)
- array instead of list operations
2025-10-25 13:34:58 +02:00
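One way to remove entries without invalidating a stored index is to filter with a boolean mask and recompute the index as the number of kept entries before it. This is a sketch under assumptions: the function name is hypothetical, "biggest" is taken to mean the widest entry, and the widest entry itself is assumed never to be dropped.

```python
import numpy as np

def drop_keeping_biggest(y_mid, widths, too_close_to_top):
    """Hypothetical: drop entries flagged too close to the top while
    keeping a valid index to the widest entry.

    Returns the filtered y_mid, the filtered widths and the index of
    the widest entry within the filtered arrays.
    """
    y_mid = np.asarray(y_mid)
    widths = np.asarray(widths)
    keep = ~np.asarray(too_close_to_top, dtype=bool)
    biggest = int(np.argmax(widths))
    keep[biggest] = True  # assumption: the widest entry is always kept
    # Its new position equals the number of kept entries before it.
    new_biggest = int(np.count_nonzero(keep[:biggest]))
    return y_mid[keep], widths[keep], new_biggest
```

Deleting items one by one from a list would shift every later position, which is exactly how a stale `biggest` can end up out of range.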
Robert Sachunsky
7c3e418588
return_boxes_of_images_by_order_of_reading_new: simplify
- enumeration instead of indexing
- array instead of list operations
- add better plotting (but commented out)
2025-10-25 13:34:52 +02:00
Robert Sachunsky
cd35241e81
find_number_of_columns_in_document: split headings at top+baseline
for the `splitter_y` result, instead of cutting right through headings
along their center line, add their top lines and baselines as if
they were horizontal separators
2025-10-25 13:34:35 +02:00
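The heading rule can be sketched as adding each heading's top and bottom y to the splitter positions instead of its centre. The box layout `(x0, y0, x1, y1)` and the function name are assumptions for illustration.

```python
import numpy as np

def add_heading_splitters(splitter_y, heading_boxes):
    """Hypothetical: merge heading top lines (y0) and baselines (y1)
    into the splitter positions, rather than the centre (y0 + y1) / 2."""
    boxes = np.asarray(heading_boxes, dtype=float).reshape(-1, 4)
    ys = np.concatenate([np.asarray(splitter_y, dtype=float), boxes[:, 1], boxes[:, 3]])
    return np.unique(ys)  # sorted, duplicates removed
```

Treating both edges as separators keeps the heading intact in its own horizontal band instead of splitting it in half.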
vahidrezanezhad
6192e5ba5c
qualitative evaluation of OCR models is added to docs
2025-10-23 16:37:24 +02:00
kba
ec1fd93dad
wip
2025-10-23 11:58:23 +02:00
vahidrezanezhad
d0ad7a98b7
starting qualitative ocr evaluation
2025-10-22 22:45:22 +02:00
vahidrezanezhad
7b7714af2e
completing OCR evaluation metrics
2025-10-22 22:42:37 +02:00
vahidrezanezhad
b56bb44284
providing ocr model evaluation metrics
2025-10-22 21:30:06 +02:00
vahidrezanezhad
59eb4fd3be
images with reading order (RO) are added to README
2025-10-22 19:04:01 +02:00
vahidrezanezhad
ab9ddd5214
OCR examples are added to README
2025-10-22 18:41:15 +02:00
vahidrezanezhad
2fc723d292
extend README
2025-10-22 18:29:14 +02:00
kba
874cfc247f
.
2025-10-22 17:56:18 +02:00
kba
883546a6b8
eynollah models package
2025-10-22 17:05:40 +02:00
kba
04bc4a63d0
reorganize model_zoo
2025-10-22 16:04:48 +02:00
kba
d94285b3ea
rewrite model spec data structure
2025-10-22 13:07:35 +02:00
kba
146658f026
eynollah layout: fix trocr_processor model_zoo call
2025-10-22 10:48:26 +02:00
kba
4c8abfe19c
eynollah_ocr: actually replace the model calls
2025-10-22 10:48:26 +02:00
kba
1337461d47
adopt image_enhancer to the zoo
2025-10-21 19:24:55 +02:00
kba
f0c86672f8
adopt mb_ro_on_layout to the zoo
2025-10-21 17:55:08 +02:00
kba
bcffa2e503
adopt binarizer to the zoo
2025-10-21 17:53:24 +02:00
kba
de34a15809
Makefile: fix make models for OCR
2025-10-21 17:27:16 +02:00
kba
9d2b18d2af
test_run: check log messages starting with eynollah
2025-10-21 13:29:55 +02:00
kba
a53d5fc452
update docs/makefile to point to v0.6.0 models
2025-10-21 13:15:57 +02:00
kba
c6b863b13f
typing and asserts
2025-10-21 12:05:27 +02:00
kba
44b75eb36f
cli: model -> model_basedir
2025-10-21 11:05:12 +02:00
cneud
7d70835d22
small fixes to main readme
2025-10-20 23:19:10 +02:00
cneud
230e7cc705
integrate ocrd docs
2025-10-20 22:52:54 +02:00
cneud
e5254dc6c5
integrate training docs
2025-10-20 22:39:54 +02:00
cneud
6e3399fe7a
combine Docker docs
2025-10-20 22:16:56 +02:00
kba
062f317d2e
Introduce model_zoo to Eynollah_ocr
2025-10-20 21:14:52 +02:00
kba
d609a532bf
organize imports mostly
2025-10-20 19:46:07 +02:00
kba
48d1198d24
move Eynollah_ocr to separate module
2025-10-20 19:15:31 +02:00
kba
b90cfdfcc4
adapt tests to -l being top-level option now
2025-10-20 18:56:24 +02:00
kba
a850ef39ea
factor model loading in Eynollah to EynollahModelZoo
2025-10-20 18:34:44 +02:00
Robert Sachunsky
5a0e4c3b0f
find_number_of_columns_in_document: improve splitter rule
extend horizontal separators to the full image width if they do not
overlap any other regions
(this only affects the returned `splitter_y` result;
the returned separators mask is left unchanged)
2025-10-20 17:41:50 +02:00
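The splitter rule can be sketched with bounding boxes: a separator is stretched across the page only if the full-width band it would occupy intersects no region vertically. The box format `(x0, y0, x1, y1)`, the function name and the strict-overlap test are assumptions; as the commit notes, a real implementation would feed this only into `splitter_y` and leave the separators mask alone.

```python
import numpy as np

def extend_separators(seps, regions, img_width):
    """Hypothetical: stretch separator boxes to 0..img_width when the
    stretched box would not overlap any region box."""
    seps = np.asarray(seps, dtype=float).reshape(-1, 4).copy()
    regions = np.asarray(regions, dtype=float).reshape(-1, 4)
    for sep in seps:  # each row is a view, so edits update `seps`
        # A full-width band only needs a vertical overlap test.
        y_overlap = (regions[:, 1] < sep[3]) & (regions[:, 3] > sep[1])
        if not y_overlap.any():
            sep[0], sep[2] = 0, img_width
    return seps
```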
Robert Sachunsky
542d38ab43
find_number_of_columns_in_document: simplify, rename line→seps
2025-10-20 17:41:49 +02:00
Robert Sachunsky
d3d599b010
order_of_regions: add better plotting (but commented out)
2025-10-20 17:41:47 +02:00
Robert Sachunsky
c43a825d1d
order_of_regions: filter out-of-image peaks
2025-10-20 17:41:47 +02:00
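Filtering out-of-image peaks amounts to a bounds mask on the peak positions. The function name and the convention that valid positions satisfy `0 <= p < width` are assumptions.

```python
import numpy as np

def filter_peaks(peaks, width):
    """Hypothetical: keep only peak positions inside the image."""
    peaks = np.asarray(peaks)
    return peaks[(peaks >= 0) & (peaks < width)]
```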
Robert Sachunsky
48761c3e12
find_num_col: simplify, add better plotting (but commented out)
2025-10-20 17:41:45 +02:00
Robert Sachunsky
184927fb54
find_num_cols: re-sort peaks when cutting n-best num_col_classifier
2025-10-20 17:41:44 +02:00
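Cutting peaks down to the n best by score returns them in score order, so they must be re-sorted by position before being used as column boundaries. The names and the notion of a per-peak score are assumptions for illustration.

```python
import numpy as np

def n_best_peaks(positions, scores, n):
    """Hypothetical: keep the n highest-scoring peaks, then restore
    their left-to-right order."""
    positions = np.asarray(positions)
    scores = np.asarray(scores)
    best = np.argsort(scores)[::-1][:n]  # indices of the n best scores
    return np.sort(positions[best])      # re-sort by position
```

Without the final `np.sort`, the example below would yield `[900, 700]`, i.e. columns in reverse order.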
Robert Sachunsky
086c1880ac
binarization: add option --overwrite, skip existing outputs
(also, simplify `run` and separate `run_single`)
2025-10-20 17:40:52 +02:00
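The `--overwrite` behaviour with a separate `run_single` can be sketched as a batch loop that skips inputs whose output already exists. Only the option name and the `run`/`run_single` split come from the commit. The signatures, the callback style and the output naming (same file name in the output directory) are assumptions.

```python
from pathlib import Path

def run(image_paths, output_dir, run_single, overwrite=False):
    """Hypothetical batch driver: call run_single(input, output) for each
    image unless its output exists and overwrite is False.

    Returns the list of skipped input paths.
    """
    output_dir = Path(output_dir)
    skipped = []
    for image_path in map(Path, image_paths):
        output_path = output_dir / image_path.name
        if output_path.exists() and not overwrite:
            skipped.append(image_path)  # keep existing output untouched
            continue
        run_single(image_path, output_path)
    return skipped
```

Keeping the per-image work in `run_single` means the skip logic lives in one place and single-image calls stay simple.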