for local (within-box) ordering of region contours, use the same
text mask (merely eroded) as for the contour extraction itself:
the text+table+drop mask from early+full layout prediction,
rather than the textline mask, because the latter may be empty
in some boxes and is unlikely to be more useful than the region
mask itself
follow up 4bdea39 by re-allowing left point _and_ right point
(as there are valid cases where both left and right marginalia
are present) - but still score-based, and not if very asymmetric
- `get_marginals` modifies region labels in-place anyway,
so no need for retval
- de/rotate only inside `get_marginals` (for consistency)
- return early if no marginals detected
- `run_marginals`: only useful in 1 or 2 columns, so keep to
that conditional branch; allows avoiding unnecessary resizing
of images to and fro
- rename `text_regions_p_1` → `text_regions_p`
in search of valid peaks (gaps between text columns),
- drop absolute values for minimum gap depth
(likely crafted for some fixed resolution examples)
- instead, use criterion relative to maximum column depth
and page height (trying to loosely approximate the prior
constants, albeit somewhat more permissive)
in search of valid (above threshold) peaks:
- do not just pick right-most left and left-most right span;
- instead,
* if no peaks on the left, then only search right
* if no peaks on the right, then only search left
* if peaks on both sides, then only better side
(so never return marginals on both sides!)
* use scoring for peaks that reflects their peak
prominence and peak height (but keep positional
range constraints for what constitutes left and right)
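
The side selection described above can be sketched roughly as follows.
This is a minimal illustration, not the actual implementation: the `Peak`
type, `peak_score()` and `choose_marginal_peak()` are hypothetical names,
and the score (prominence times height) is an assumed formula. The
positional range constraints are assumed to have been applied already when
splitting peaks into left and right candidates.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Peak(NamedTuple):
    """Hypothetical record for one candidate gap between text columns."""
    position: int      # x coordinate of the gap
    prominence: float  # peak prominence
    height: float      # peak height


def peak_score(peak: Peak) -> float:
    # Assumed scoring: reward peaks that are both prominent and high.
    return peak.prominence * peak.height


def choose_marginal_peak(
    left: Sequence[Peak], right: Sequence[Peak]
) -> Optional[Tuple[str, Peak]]:
    """Pick at most one side: the best-scoring peak on the better side."""
    best_left = max(left, key=peak_score, default=None)
    best_right = max(right, key=peak_score, default=None)
    if best_left is None and best_right is None:
        return None                      # no marginals detected
    if best_right is None:
        return ("left", best_left)       # only search left
    if best_left is None:
        return ("right", best_right)     # only search right
    # Peaks on both sides: keep only the better side.
    if peak_score(best_left) >= peak_score(best_right):
        return ("left", best_left)
    return ("right", best_right)
```

Returning a single `(side, peak)` pair makes the "never both sides" rule
explicit in the return type.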
- rename `thickness_along_y_percent` →
`max_textline_thickness_percent`
- rename `marginlas_should_be_main_text` →
`main_text_should_be_marginals`
- constrain `find_peaks()` by prominence and distance
- simplify (a lot)
- add comments for possible improvements
and for plotting
- use new `rotate_image_enlarge` instead of
custom (insufficient) padding w/ `rotate_image`
- get external contours instead of tree
(without checking hierarchy afterwards)
- use largest textline contours by area instead of
longest polygon path
- always use `separate_lines` (but without its incorrect
angle/offset calculations) instead of `separate_lines_vertical_cont`
- calculate coordinate transformation (shift, angle)
for all cases (including >45°)
- simplify
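
To illustrate why selecting the largest textline contour by area differs
from selecting the longest polygon path, here is a small sketch. It assumes
that "longest" refers to the number of points in the contour; the helper
names `polygon_area()` and `largest_by_area()` are hypothetical, and
contours are plain lists of `(x, y)` tuples rather than OpenCV arrays.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(contour: Sequence[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x0, y0) in enumerate(contour):
        x1, y1 = contour[(i + 1) % len(contour)]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def largest_by_area(contours: Sequence[List[Point]]) -> List[Point]:
    """Return the contour enclosing the largest area."""
    return max(contours, key=polygon_area)


# A thin strip described by many points vs. a large box with only four.
strip = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (2, 1), (1, 1), (0, 1)]
box = [(0, 0), (10, 0), (10, 10), (0, 10)]
by_length = max([strip, box], key=len)   # picks the strip
by_area = largest_by_area([strip, box])  # picks the box
```

A noisy fragment can have many vertices while covering very little of the
line, which is why area is the more robust criterion here.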
- use relative images, cropped to parent bbox (faster)
- no `scale` parameter (unused)
- use largest textline contours by area instead of first
- simplify
- return early if textline mask is empty
- intersect textline mask with parent mask
(so neighbouring, truncated textlines
will not interfere)
- fix bug when resulting angle is small:
rather, compare with page angle
- if there is more than 1 line in the region,
* use median instead of mean to estimate y_diff
* if height dominates over width and x_diff
over y_diff, then assume 90°: transpose image,
deskew on that, then add 90° to result
- otherwise instead of just using page angle,
try to estimate single-line angle by approximating
slope of linear x-y regression on mask image;
again, if height dominates over width, then
assume +90° and use transposed image
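
The single-line fallback above (slope of a linear x-y regression on the
mask, with transposition for tall regions) might look roughly like this.
It is a sketch under assumptions: `estimate_line_angle()` is a hypothetical
name, the fit uses `numpy.polyfit` on foreground pixel coordinates, and the
sign of the angle follows image coordinates (y pointing down), which may
differ from the convention used in the actual code.

```python
import numpy as np


def estimate_line_angle(mask: np.ndarray) -> float:
    """Estimate the skew angle (degrees) of a single text line from its mask."""
    height, width = mask.shape
    transposed = height > width
    if transposed:
        # Height dominates over width: assume a vertical line and
        # work on the transposed image instead.
        mask = mask.T
    ys, xs = np.nonzero(mask)
    if xs.size < 2 or xs.max() == xs.min():
        return 0.0  # too little horizontal extent for a meaningful fit
    # Least-squares fit of y = slope * x + intercept over foreground pixels.
    slope, _intercept = np.polyfit(xs, ys, 1)
    angle = float(np.degrees(np.arctan(slope)))
    if transposed:
        angle += 90.0
    return angle
```

The early return for empty or degenerate masks mirrors the "return early
if textline mask is empty" item above.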
- drop unused `scale` param
- when merging large line with small lines,
don't use first new contour but largest
- get external contours instead of tree
(without checking hierarchy afterwards)
- simplify
- rename `get_regions()` → `get_early_layout()`
- split up `run_boxes_no/full_layout()` into shared
* `get_full_layout()` (for label mapping,
table decoding and optional full model prediction)
* `get_deskewed_masks()` (for de-rotation)
* extraction of various region types (polygons and confidences)
* `run_boxes_order()` (for column detection and box ordering)
- rename `contours_tables` → `polygons_of_tables`
This further reduces redundant code, avoids splitting up the same
functionality across different places depending on mode etc.
- `run_single`: re-use `return_contours_of_interested_region`
for extraction and filtering of text region contours
- `run_single`: isolate new function `match_deskewed_contours`
- `run_single`: apply dilation afterwards
- rename `contours_only_text_parent_d_ordered` → `polygons_of_textregions_d`
- rename `contours_only_text_parent` → `polygons_of_textregions`
- rename `contours_only_text_parent_h` → `polygons_of_textregions_h`
- `do_work_of_slopes_new_curved` and `get_slopes_and_deskew_new_curved`:
no need for `mask_texts_only` array arg
- `filter_contours_inside_a_bigger_one`: no need for `image` as array arg,
simplify
- `split_textregion_main_vs_head`: simplify, re-order arguments
and return tuple logically
- if no main text regions are found, just convert marginals to main text
and continue normally instead of stopping early w/ empty marginals (i.e.
no textlines)
- `do_order_of_regions_with_model`:
* add `polygons_of_drop_capitals`, order these indices as well
(model was not trained for this, but it works)
* explicit label identifiers instead of number literals
* map marginals and images correctly
* simplify (a lot)
* reduce inference batch size to accommodate 8 GB VRAM GPUs
- `return_indexes_of_contours_located_inside_another_list_of_contours`:
simplify
- pass on probabilities from predicted class everywhere
- rename `confidence_matrix` → `confidence_regions` / `regions_confidence`
- rename `get_textregion_confidences()` → `get_region_confidences()`
- add same for tables, textlines and regionsfl (full layout model)
- aggregate per-region confidence lists for image, table, drop-capital,
left marginal and right marginal regions
- add in writer
- simplify/re-indent some
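
As a rough illustration of per-region confidences, the sketch below
averages the predicted-class probability map over each region. This is an
assumption about the aggregation (a plain mean), and `region_confidences()`
is a hypothetical helper that takes boolean masks instead of polygons to
stay independent of OpenCV.

```python
from typing import List, Sequence

import numpy as np


def region_confidences(
    confidence_map: np.ndarray, region_masks: Sequence[np.ndarray]
) -> List[float]:
    """Mean probability of the predicted class inside each region mask."""
    confidences = []
    for mask in region_masks:
        if mask.any():
            confidences.append(float(confidence_map[mask].mean()))
        else:
            confidences.append(0.0)  # empty region: no evidence at all
    return confidences
```

The same helper could be applied to each region type (images, tables, drop
capitals, marginals) to build the per-type lists.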
- try to replace more number literals with class label identifiers
- re-introduce boosting of `heading` thresholding, which broke
when refactoring (light version and `do_prediction`)
- also return confidence for full layout prediction
1. use connected component analysis to get unique segments
in early prediction result
2. for each drop-capital segment in full prediction result,
find matching early segment
3. when they have high overlap, assign drop-capital label
to the entire early segment
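
The three steps above can be sketched as follows. This is only an
illustration under assumptions: `label_components()` is a simple
4-connected flood fill standing in for whatever connected component
routine is actually used, `propagate_drop_capitals()` and its `min_overlap`
threshold are hypothetical, and "overlap" is taken to mean the fraction of
the early segment covered by the drop-capital segment.

```python
from collections import deque

import numpy as np


def label_components(mask: np.ndarray):
    """4-connected component labelling via breadth-first flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    height, width = mask.shape
    count = 0
    for y0, x0 in zip(*np.nonzero(mask)):
        if labels[y0, x0]:
            continue  # pixel already belongs to a labelled component
        count += 1
        labels[y0, x0] = count
        queue = deque([(y0, x0)])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < height and 0 <= nx < width
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count


def propagate_drop_capitals(early, full, drop_label, min_overlap=0.5):
    """Relabel whole early segments that a drop-capital segment mostly covers."""
    result = early.copy()
    # 1. unique segments in the early prediction (any foreground class)
    early_labels, _ = label_components(early > 0)
    # 2. each drop-capital segment in the full prediction
    drop_labels, n_drop = label_components(full == drop_label)
    for i in range(1, n_drop + 1):
        drop_mask = drop_labels == i
        candidates = np.unique(early_labels[drop_mask])
        for seg in candidates[candidates > 0]:
            seg_mask = early_labels == seg
            covered = np.count_nonzero(seg_mask & drop_mask)
            # 3. high overlap: assign the label to the entire early segment
            if covered / np.count_nonzero(seg_mask) >= min_overlap:
                result[seg_mask] = drop_label
    return result
```

Relabelling the whole early segment, rather than only the overlapping
pixels, keeps the segment boundaries of the early prediction intact.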
- rename `putt_bb_of_drop_capitals_of_model_in_patches_in_layout`
→ `fill_bb_of_drop_capitals`
- also allow image (besides text) label in early layout prediction
result when checking if entire bbox can be filled (as opposed to
just drop-capital | image | background mask)
- simplify