- re-use Eynollah base class, drop copied code
- simplify `run()` and `run_single()`
- delegate to `do_prediction()`
  instead of the custom (old) tiling loop
- drop `predict()`
- add `--device` option to CLI as well
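
Illustratively, such an option could look as follows - a minimal sketch assuming a Click-based CLI (which Eynollah uses); the option name comes from the message above, while the choices and wiring are invented:

```python
import click

@click.command()
@click.option("--device", type=click.Choice(["cpu", "cuda"]), default="cuda",
              help="device to run inference on (choices are illustrative)")
def main(device):
    # hypothetical wiring: hand the device string down to the shared
    # Eynollah base class, which would place model calls accordingly
    click.echo(f"running on {device}")

if __name__ == "__main__":
    main()
```
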
- re-use Eynollah base class, drop copied code
- write usable `run()` and `run_single()`
- delegate to `resize_image_with_column_classifier()`
for the column classifier, resizing, and enhancement,
instead of `resize_and_enhance_image_with_column_classifier()`
(which does _not_ actually enhance)
- drop unused `predict_enhancement()`
- add defaults to `num_col` options (always numeric)
- add `--device` option to CLI as well
use rules from `resize_and_enhance_image_with_column_classifier()`
and apply them to `resize_image_with_column_classifier()` as well
(to be used by enhancer CLI)
instead of hard cut-offs between overlapping window tiles,
apply sigmoid attenuation to blend from one to the next
(and apply all postprocessing at the end)
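
The idea in a hedged numpy sketch (the actual `do_prediction()` code differs; the names here are invented): weight predictions across the overlap with a sigmoid ramp, so adjacent tiles cross-fade instead of being cut hard:

```python
import numpy as np

def sigmoid_ramp(overlap, steepness=10.0):
    """Blend weights rising smoothly from ~0 to ~1 across the overlap."""
    x = np.linspace(-1.0, 1.0, overlap)
    return 1.0 / (1.0 + np.exp(-steepness * x))

def blend_horizontal(left, right, overlap):
    """Cross-fade two horizontally adjacent prediction tiles."""
    w = sigmoid_ramp(overlap)  # broadcasts over the shared columns
    seam = left[:, -overlap:] * (1 - w) + right[:, :overlap] * w
    return np.concatenate([left[:, :-overlap], seam, right[:, overlap:]],
                          axis=1)
```
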
- tile count calculation: sometimes one fewer tile is needed,
  by making the previous (then last) tile half-full on the
  right side (see the sketch after this list)
- window margin calculation: fix the case where a dimension
  extends to the full image shape
- simplify (identifiers, slicing, etc.)
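
A rough sketch of the tile-count rule meant above (the formula and names are assumptions, not the committed code):

```python
import math

def num_tiles(length, window, stride):
    """One fewer tile suffices when the second-to-last tile already
    reaches past the midpoint of the final stride, i.e. when the
    previous, now last, tile may stay half-full on the right side.
    Hypothetical rule for illustration."""
    if length <= window:
        return 1
    n = math.ceil((length - window) / stride) + 1
    second_to_last_end = (n - 2) * stride + window
    if second_to_last_end >= length - stride // 2:
        n -= 1
    return n
```
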
in `estimate_skew_contours()`, distinguish between angle stats
scattering around <45° vs. >45°: in the latter case, average
modulo 180° - to avoid +90° and -90° cancelling each other out
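
The failure mode in two lines of numpy (values invented): for near-vertical contours, signed angles scatter around ±90°, so a plain mean collapses to 0°, whereas averaging modulo 180° preserves the orientation:

```python
import numpy as np

angles = np.array([89.0, -89.0])    # same orientation, opposite signs

naive = angles.mean()               # 0.0  -- +90° and -90° cancel out
mod180 = (angles % 180.0).mean()    # 90.0 -- orientations reinforce
```

For stats scattering around 0° (e.g. +1° and -1°), the plain mean is already right and modulo 180° would be wrong (it would yield 90°), hence the case distinction above.
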
- introduce `config_params` key `reload_weights` (see the sketch after this list)
- add respective section for all model types:
- build fresh model from code
- load existing weights from `dir_of_start_model`
- save to `dir_output` under same basename as existing model
(but without optimizer and metrics, which currently does not work)
- exit immediately (i.e. no actual training)
- reorder so `reload_weights` runs after compilation but before data loading
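
A sketch of the described control flow (hedged: `config_params`, `dir_of_start_model`, and `dir_output` are from the message above; the Keras calls and file layout are assumptions):

```python
import os
import sys
import tensorflow as tf

def maybe_reload_weights(config_params, model, dir_of_start_model, dir_output):
    """`model` is the freshly built *and compiled* model; this runs after
    compilation but before any data loading."""
    if not config_params.get("reload_weights"):
        return
    old = tf.keras.models.load_model(dir_of_start_model, compile=False)
    model.set_weights(old.get_weights())          # load existing weights
    # save under the same basename, without optimizer state and metrics
    out = os.path.join(dir_output, os.path.basename(dir_of_start_model))
    model.save(out, include_optimizer=False)
    sys.exit(0)                                   # no actual training
```
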
- move `extract_page()` to the start (right after enhancement),
  so early layout and textline model prediction already sees
  the cropped image
- `extract_page()`: also return page mask
- `get_early_layout()`:
* use cropped image
* also run optional table prediction here,
map table label and confidence already
(so no need to pass these arrays everywhere)
* suppress all non-text-type regions in the textline mask
* also return text+table mask
(so no need to reconstruct it everywhere)
- apply the page mask to the textline mask and early layout result
  (i.e. suppress areas beyond the border contour; see the sketch
  after this list)
- `run_graphics_and_columns()`:
* rename → `run_columns()`
* no table prediction here
* no page extraction here
* no page cropping+masking here
* no textline mask suppression here
- `run_graphics_and_columns_without_layout()`: drop
(not needed anymore)
- `run_marginals()` vs. `get_marginals()`: extract
`text_mask` internally from early layout
- early page cropping for col-classifier:
also use cropped image in input binarization mode
- early page cropping for col-classifier:
  get external contours only, instead of the indiscriminate
  contour tree (cf. the sketch after this list)
- writer: skip-layout mode now also uses cropped coordinates
  (so drop the kwarg for it)
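
To make the mask handling above concrete - a hedged OpenCV sketch (function and variable names are illustrative, not Eynollah's API): take only the external contours of the page prediction instead of the full hierarchy, fill the largest one into a mask, and use that to suppress labels beyond the border:

```python
import cv2
import numpy as np

def page_mask_from_prediction(page_pred):
    """External contours only (RETR_EXTERNAL, not RETR_TREE); keep the
    largest one as the page border and fill it into a binary mask."""
    contours, _ = cv2.findContours(page_pred.astype(np.uint8),
                                   cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros(page_pred.shape[:2], dtype=np.uint8)
    if contours:
        border = max(contours, key=cv2.contourArea)
        cv2.drawContours(mask, [border], -1, 1, thickness=cv2.FILLED)
    return mask

# suppressing areas beyond the border contour in both label images:
# textline_mask[page_mask == 0] = 0
# text_regions_p[page_mask == 0] = 0
```
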
for local (within-box) ordering of region contours, use the same
text mask (merely eroded) as for the contour extraction itself:
the text+table+drop mask from early+full layout prediction,
rather than the textline mask, because the latter may be empty
in some boxes and is unlikely to be more useful than the region
mask itself
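
Schematically (hedged; helper and variable names invented), the change amounts to:

```python
import cv2
import numpy as np

def order_region_contours(region_mask):
    """Use a slightly eroded copy of the text+table+drop mask -- the same
    mask the contours were extracted from -- for within-box ordering,
    instead of the (possibly empty) textline mask."""
    eroded = cv2.erode(region_mask, np.ones((3, 3), np.uint8), iterations=1)
    contours, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # e.g. top-to-bottom by bounding-box y-origin
    return sorted(contours, key=lambda c: cv2.boundingRect(c)[1])
```
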
follow-up to 4bdea39: re-allow a left point _and_ a right point
(as there are valid cases where both left and right marginalia
are present) - but still score-based, and not if very asymmetric
- `get_marginals` modifies region labels in-place anyway,
  so no need for a retval
- de/rotate only inside `get_marginals` (for consistency;
  see the sketch after this list)
- return early if no marginals detected
- `run_marginals`: only useful for 1 or 2 columns, so keep to
  that conditional branch; this avoids unnecessarily resizing
  images back and forth
- rename `text_regions_p_1` → `text_regions_p`
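
A hedged sketch of the in-place, de/rotate-internally pattern (the marginal detection itself is a crude stand-in; the label value and heuristics are invented):

```python
import numpy as np
import scipy.ndimage as ndi

MARGINAL_LABEL = 4   # assumed label value, illustrative only

def get_marginals(regions, slope_deg):
    """Relabels marginal areas in `regions` in place (hence no retval);
    rotation and derotation both stay inside, for consistency."""
    rotated = ndi.rotate(regions, slope_deg, order=0, reshape=False)
    # crude stand-in for the peak-based detection: columns with sparse
    # but nonzero text coverage count as marginal
    coverage = (rotated > 0).mean(axis=0)
    marginal = (coverage > 0) & (coverage < 0.05)
    if not marginal.any():
        return                       # return early: no marginals detected
    rotated[:, marginal] = MARGINAL_LABEL
    regions[:] = ndi.rotate(rotated, -slope_deg, order=0, reshape=False)
```
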
in search of valid peaks (gaps between text columns):
- drop the absolute values for minimum gap depth
  (likely crafted for some fixed-resolution examples)
- instead, use a criterion relative to the maximum column depth
  and the page height (loosely approximating the prior constants,
  albeit somewhat more permissive; sketched below)
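
Illustratively (hedged; the committed thresholds differ), the relative criterion replaces fixed cut-offs like this:

```python
import numpy as np

def valid_gaps(profile, page_height):
    """`profile`: per-x foreground pixel counts. A gap is deep enough when
    its depth is large relative to the deepest gap and to the page height,
    not when it exceeds a fixed pixel constant. Thresholds illustrative."""
    depth = profile.max() - profile
    return (depth > 0.3 * depth.max()) & (depth > 0.1 * page_height)
```
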
in search of valid (above-threshold) peaks:
- do not just pick the right-most left span and the left-most
  right span;
- instead,
  * if no peaks on the left, then only search right
  * if no peaks on the right, then only search left
  * if peaks on both sides, then only the better side
    (so never return marginals on both sides!)
  * score peaks by a measure reflecting their prominence
    and height (but keep the positional range constraints
    for what constitutes left and right)
- rename `thickness_along_y_percent` →
`max_textline_thickness_percent`
- rename `marginlas_should_be_main_text` →
`main_text_should_be_marginals`
- constrain `find_peaks()` by prominence and distance (see the sketch below)
- simplify (a lot)
- add comments for possible improvements
and for plotting
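
Putting the peak-side logic together in one hedged sketch (the scoring formula, thresholds, and range constants are illustrative, not the committed values):

```python
import numpy as np
from scipy.signal import find_peaks

def pick_marginal_peak(profile, width):
    """Score candidate peaks by prominence and height, then keep only the
    better side -- never returning marginals on both sides."""
    peaks, props = find_peaks(profile,
                              prominence=0.1 * profile.max(),  # by prominence...
                              distance=max(1, width // 20))    # ...and distance
    if len(peaks) == 0:
        return None
    score = props["prominences"] + profile[peaks]  # prominence + peak height
    left = peaks < 0.4 * width    # positional range constraints for what
    right = peaks > 0.6 * width   # constitutes "left" and "right"
    if not left.any() and not right.any():
        return None
    if not left.any():
        side = right              # no peaks on the left: only search right
    elif not right.any():
        side = left               # no peaks on the right: only search left
    elif score[left].max() >= score[right].max():
        side = left               # peaks on both sides: only the better one
    else:
        side = right
    return peaks[side][np.argmax(score[side])]
```
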