kba
8732007aaf
.
2025-11-06 16:33:39 +01:00
kba
f902756ce1
try importing torch, then shapely, then tensorflow
2025-11-06 13:10:35 +01:00
kba
44037bc05d
add layout marginalia test
2025-11-06 12:42:57 +01:00
kba
d224b0f7e8
try with shapely.set_precision(...mode="keep_collapsed")
2025-11-06 11:55:40 +01:00
kba
0d84e7da16
Merge remote-tracking branch 'origin/docs_and_minor_fixes' into model-zoo
...
# Conflicts:
# README.md
# train/README.md
2025-11-06 11:37:10 +01:00
kba
53e879e289
make *test: another typo
2025-11-05 16:19:55 +01:00
kba
e449dbab6d
make *test: fix paths
2025-11-05 15:28:41 +01:00
kba
0bef6e297b
make models: unzip to the versioned directory
2025-11-05 15:19:16 +01:00
kba
2c211095d7
make deps-test should not depend on the models
2025-11-05 15:02:55 +01:00
kba
b6c7283b4d
further debugging
2025-11-05 14:41:18 +01:00
cneud
f90259d6e2
fix docs links
2025-10-30 22:24:54 +01:00
cneud
d5b7089bad
Merge branch 'docs_and_minor_fixes' of https://github.com/qurator-spk/eynollah into docs_and_minor_fixes
2025-10-30 22:17:41 +01:00
cneud
9dbac280cc
Revert "remove unnecessary backslash"
...
This reverts commit f212ffa22d.
2025-10-30 22:16:53 +01:00
cneud
2d35a0598d
Revert "replace list declaration with list literal (faster)"
...
This reverts commit 9733d575bf.
2025-10-30 22:16:48 +01:00
cneud
70d8577a15
Revert "remove redundant parentheses"
...
This reverts commit 20a95365c2.
2025-10-30 22:16:41 +01:00
Clemens Neudecker
c9efbe1871
refactor image layout in examples.md
2025-10-30 16:52:59 +01:00
kba
8782ef17b2
CI: 🔥 upgrade torch for debugging
2025-10-30 12:19:35 +01:00
kba
62d05917c5
test_layout: str(Path)
2025-10-30 12:17:38 +01:00
cneud
b1e191b2ea
reformat cli options table
2025-10-29 22:30:58 +01:00
cneud
f6c0f56348
Update README.md
2025-10-29 22:23:56 +01:00
cneud
46a45f6b0e
Create examples.md
2025-10-29 22:23:48 +01:00
kba
15e6ecb95d
make models: update URL
2025-10-29 21:27:10 +01:00
kba
600ebfeb50
make: fix to use single-archive ZIP
2025-10-29 21:07:49 +01:00
kba
9ab565fa02
model basedir might be a symlink
2025-10-29 21:02:42 +01:00
kba
4772fd17e2
missed changing override mechanism in eynollah_ocr
2025-10-29 20:47:13 +01:00
kba
29c273685f
fix merge issues
2025-10-29 20:15:19 +01:00
kba
de76eabc1d
Merge branch 'cli-logging' into model-zoo
2025-10-29 19:41:01 +01:00
kba
5e22e9db64
model_zoo: make type str to reduce importing overhead
2025-10-29 19:16:35 +01:00
kba
a913bdf7dc
make --model-basedir and --model-overrides top-level CLI options
2025-10-29 18:48:41 +01:00
kba
b6f82c72b9
refactor cli tests
2025-10-29 17:23:21 +01:00
cneud
22d61e8d94
remove newspaper images from main readme
2025-10-28 19:56:23 +01:00
cneud
8822da17cf
Merge remote-tracking branch 'origin/updating_docs' into docs_and_minor_fixes
2025-10-28 19:53:12 +01:00
kba
ef999c8f0a
Merge branch 'model-zoo' of lx0145.sbb.spk-berlin.de:/data/eynollah into model-zoo
2025-10-27 11:45:20 +01:00
kba
294b6356d3
wip
2025-10-27 11:45:16 +01:00
kba
51d2680d9c
wip
2025-10-27 11:44:59 +01:00
Robert Sachunsky
19b2c3fa42
reading order: improve handling of headings and horizontal seps
...
- drop connected components analysis to test overlaps between
horizontal separators and (horizontal) neighbours (introduced
in ab17a927)
- instead of converting headings to topline and baseline during
`find_number_of_columns_in_document` (introduced in 9f1595d7),
add them to the matrix unchanged, but mark as extra type
(besides horizontal and vertical separators)
- convert headings to toplines and baselines no earlier than in
`return_boxes_of_images_by_order_of_reading_new`
- for both headings and horizontal separators, if they already
span multiple columns, check if they would overlap (horizontal)
neighbours by looking at successively larger (left and right)
intervals of columns (and pick the largest elongation which
does not introduce any overlaps)
2025-10-25 13:36:35 +02:00
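Note on 19b2c3fa42: the elongation step described in the last bullet can be pictured with the minimal sketch below. It is not the code from the commit. The function name `elongate`, the half-open column interval `[start, end)`, and the `(column, y_top, y_bot)` neighbour tuples are all assumptions made for illustration. The sketch widens the interval one column at a time to the left and right and stops on each side as soon as the next column holds a vertically overlapping neighbour.

```python
def elongate(start, end, y_top, y_bot, neighbours, num_cols):
    """Widen the column interval [start, end) of a heading or separator.

    neighbours: iterable of (column, y_top, y_bot) tuples (hypothetical layout).
    Returns the largest interval that does not overlap any neighbour.
    """
    def blocked(col):
        # a neighbour blocks a column if it sits there and overlaps vertically
        return any(c == col and t < y_bot and b > y_top
                   for c, t, b in neighbours)

    # try successively larger intervals to the left ...
    while start > 0 and not blocked(start - 1):
        start -= 1
    # ... and to the right
    while end < num_cols and not blocked(end):
        end += 1
    return start, end


# separator in columns 1..2 at y=100..110 on a 5-column page
neighbours = [(0, 90, 120), (4, 95, 130)]
result = elongate(1, 3, 100, 110, neighbours, num_cols=5)
```

Here column 0 and column 4 both hold overlapping neighbours, so the separator can only grow into column 3.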
Robert Sachunsky
3367462d18
return_boxes_of_images_by_order_of_reading_new: change arg order
2025-10-25 13:36:24 +02:00
Robert Sachunsky
a2a9fe5117
delete_separator_around: simplify, eynollah: identifiers
...
- use array instead of list operations
- rename identifiers:
- `pixel` → `label`
- `line` → `sep`
2025-10-25 13:36:17 +02:00
Robert Sachunsky
3ebbc2d693
return_boxes_of_images_by_order_of_reading_new: indent
...
(by removing unnecessary conditional)
2025-10-25 13:36:06 +02:00
Robert Sachunsky
66a0e55e49
return_boxes_of_images_by_order_of_reading_new: avoid oversplits
...
when y slice (`top:bot`) is not a significant part of the page,
viz. less than 22% (as in `find_number_of_columns_in_document`),
avoid forcing `find_num_col` to reach `num_col_classifier`
(allows large headers not to be split up and thus better ordered)
2025-10-25 13:35:56 +02:00
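Note on 66a0e55e49: the 22% criterion can be read as a simple height-ratio test, sketched below. Only the 22% figure comes from the commit message. The helper name `is_significant_slice`, its parameters, and the idea of passing the page height explicitly are assumptions for illustration, not the actual eynollah code.

```python
def is_significant_slice(top, bot, page_height, min_fraction=0.22):
    """Return True if the y slice top:bot covers at least min_fraction of the page.

    Hypothetical helper: in the scheme described by the commit, only
    significant slices would be forced to reach the classifier's column count.
    """
    return (bot - top) >= min_fraction * page_height


# a 200 px header band on a 1000 px page is below the threshold
small = is_significant_slice(0, 200, 1000)
# a 300 px band is above it
large = is_significant_slice(0, 300, 1000)
```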
Robert Sachunsky
6fbb5f8a12
return_boxes_of_images_by_order_of_reading_new: simplify
...
- array instead of list operations
- add better plotting (but commented out)
- add more debug printing (but commented out)
- add more inline comments for documentation
- rename identifiers to make more readable:
- `cy_hor_diff` → `y_max_hor_some` (because the ymax gets passed)
- `lines` → `seps`
- `y_type_2` → `y_mid`
- `y_diff_type_2` → `y_max`
- `y_lines_by_order` → `y_mid_by_order`
- `y_lines_without_mother` → `y_mid_without_mother`
- `y_lines_with_child_without_mother` → `y_mid_with_child_without_mother`
- `y_column` → `y_mid_column`
- `y_column_nc` → `y_mid_column_nc`
- `y_all_between_nm_wc` → `y_mid_between_nm_wc`
- `lines_so_close_to_top_separator` → `seps_too_close_to_top_separator`
- `y_in_cols` and `y_down` → `y_mid_next`
- use `pairwise()` `nc_top:nc_bot` instead of `i_c` indexing
2025-10-25 13:35:44 +02:00
Robert Sachunsky
6cc5900943
find_num_col: add better plotting (but commented out)
2025-10-25 13:35:34 +02:00
Robert Sachunsky
5d15941b35
contours_in_same_horizon: simplify
...
- array instead of list operations
- return array of index pairs instead of list objects
2025-10-25 13:35:26 +02:00
Robert Sachunsky
acee4c1bfe
find_number_of_columns_in_document: simplify
2025-10-25 13:35:18 +02:00
Robert Sachunsky
b2a79cc6ed
return_x_start_end_mothers_childs_and_type_of_reading_order: fix+1
...
when calculating `reading_order_type`, upper limit on column range
(`x_end`) needs to be `+1` here as well
2025-10-25 13:35:12 +02:00
Robert Sachunsky
e2dfec75fb
return_x_start_end_mothers_childs_and_type_of_reading_order:
...
simplify and document
- simplify
- rename identifiers to make readable:
- `y_sep` → `y_mid` (because the cy gets passed)
- `y_diff` → `y_max` (because the ymax gets passed)
- array instead of list operations
- add docstring and in-line comments
- return (zero-length) numpy array instead of empty list
2025-10-25 13:35:06 +02:00
Robert Sachunsky
0fc4b2535d
return_boxes_of_images_by_order_of_reading_new: fix no-mother case
...
- when handling lines without mother,
and biggest line already accounts for all columns,
but some are too close to the top and therefore must be removed,
avoid invalidating `biggest` index, causing `IndexError`
- remove try-catch (now unnecessary)
- array instead of list operations
2025-10-25 13:34:58 +02:00
Robert Sachunsky
7c3e418588
return_boxes_of_images_by_order_of_reading_new: simplify
...
- enumeration instead of indexing
- array instead of list operations
- add better plotting (but commented out)
2025-10-25 13:34:52 +02:00
Robert Sachunsky
cd35241e81
find_number_of_columns_in_document: split headings at top+baseline
...
regarding `splitter_y` result, for headings, instead of cutting right
through them via center line, add their toplines and baselines as if
they were horizontal separators
2025-10-25 13:34:35 +02:00
vahidrezanezhad
6192e5ba5c
qualitative evaluation of OCR models is added to docs
2025-10-23 16:37:24 +02:00