follow up on 4bdea39 by re-allowing left point _and_
right point (as there are valid cases where both left and
right marginalia are present), but still score-based, and
not if very asymmetric
- `get_marginals` modifies region labels in-place anyway,
so no need for a return value
- de/rotate only inside `get_marginals` (for consistency)
- return early if no marginals detected
- `run_marginals`: only useful in 1 or 2 columns, so keep to
that conditional branch; allows avoiding unnecessary resizing
of images to and fro
- rename `text_regions_p_1` → `text_regions_p`
in search of valid peaks (gaps between text columns),
- drop absolute values for minimum gap depth
(likely crafted for some fixed resolution examples)
- instead, use criterion relative to maximum column depth
and page height (trying to loosely approximate the prior
constants, albeit somewhat more permissive)
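
A minimal sketch of such a relative criterion, not the actual code: the
function name `is_valid_gap` and both factors (`rel_depth`, `rel_height`)
are hypothetical, and the profile is assumed to be a per-column count of
text pixels.

```python
def is_valid_gap(profile, x, page_height, rel_depth=0.6, rel_height=0.05):
    """Decide whether column x of a vertical projection profile is a gap.

    profile: per-column counts of text pixels (plain list of numbers).
    Both factors are illustrative assumptions, not the project's values.
    """
    peak = max(profile)          # maximum column depth
    depth = peak - profile[x]    # how far this column dips below it
    # relative to the maximum column depth and to the page height,
    # so the decision does not depend on the scan resolution
    return depth >= rel_depth * peak and depth >= rel_height * page_height


profile = [0, 80, 90, 10, 85, 88]
print(is_valid_gap(profile, 3, page_height=100))
# same page at 10x resolution gives the same decision
print(is_valid_gap([v * 10 for v in profile], 3, page_height=1000))
```

Because both thresholds scale with the image, no constant has to be tuned
for a fixed resolution.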
in search of valid (above threshold) peaks:
- do not just pick right-most left and left-most right span;
- instead,
* if no peaks on the left, then only search right
* if no peaks on the right, then only search left
* if peaks on both sides, then only better side
(so never return marginals on both sides!)
* use scoring for peaks that reflects their peak
prominence and peak height (but keep positional
range constraints for what constitutes left and right)
- rename `thickness_along_y_percent` →
`max_textline_thickness_percent`
- rename `marginlas_should_be_main_text` →
`main_text_should_be_marginals`
- constrain `find_peaks()` by prominence and distance
- simplify (a lot)
- add comments for possible improvements
and for plotting
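
The one-side-only selection rule listed above could look roughly like
this. It is a sketch under assumptions: peaks are taken as given
`(x, height, prominence)` tuples (e.g. from a peak finder), the score
`height * prominence` is a guess at "reflects prominence and height", and
`pick_marginal_side`, `left_max` and `right_min` are hypothetical names.

```python
def pick_marginal_side(peaks, width, left_max=0.3, right_min=0.7):
    """Return (left_x, right_x); at most one of the two is not None.

    peaks: list of (x, height, prominence) tuples above threshold.
    left_max / right_min: assumed positional ranges (fractions of the
    page width) for what counts as a left or right peak.
    """
    def score(peak):
        _, height, prominence = peak
        return height * prominence  # assumed scoring, for illustration

    left = [p for p in peaks if p[0] < left_max * width]
    right = [p for p in peaks if p[0] > right_min * width]
    best_left = max(left, key=score, default=None)
    best_right = max(right, key=score, default=None)

    if best_left is None and best_right is None:
        return None, None
    if best_right is None:       # no peaks on the right: only left
        return best_left[0], None
    if best_left is None:        # no peaks on the left: only right
        return None, best_right[0]
    # peaks on both sides: keep only the better-scoring side
    if score(best_left) >= score(best_right):
        return best_left[0], None
    return None, best_right[0]


print(pick_marginal_side([(50, 10, 8), (120, 5, 2), (900, 12, 9)], 1000))
```

Note that the best peak per side is chosen by score, not by being the
right-most left or left-most right candidate.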
- use new `rotate_image_enlarge` instead of
custom (insufficient) padding w/ `rotate_image`
- get external contours instead of tree
(without checking hierarchy afterwards)
- use largest textline contours by area instead of
longest polygon path
- always use `separate_lines` (but without its incorrect
angle/offset calculations) instead of `separate_lines_vertical_cont`
- calculate coordinate transformation (shift, angle)
for all cases (including >45°)
- simplify
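
To illustrate why area is the better criterion than path length, a small
dependency-free sketch (hypothetical helper names; "longest polygon path"
is assumed here to mean the contour with the most vertices):

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    doubled = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        doubled += x1 * y2 - x2 * y1
    return abs(doubled) / 2


def largest_contour(contours):
    """Pick the contour enclosing the largest area."""
    return max(contours, key=polygon_area)


# a thin sliver with many vertices vs. a large square with only four
sliver = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (0, 1)]
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(max([sliver, square], key=len) is sliver)     # most vertices
print(largest_contour([sliver, square]) is square)  # largest area
```

A jagged fragment can have many vertices while covering few pixels, so
selecting by vertex count can pick the wrong textline.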
- use relative images, cropped to parent bbox (faster)
- no `scale` parameter (unused)
- use largest textline contours by area instead of first
- simplify
- return early if textline mask is empty
- intersect textline mask with parent mask
(so neighbouring, truncated textlines
will not interfere)
- fix bug when resulting angle is small:
rather, compare with page angle
- if there is more than 1 line in the region,
* use median instead of mean to estimate y_diff
* if height dominates over width and x_diff
over y_diff, then assume 90°: transpose image,
deskew on that, then add 90° to result
- otherwise instead of just using page angle,
try to estimate single-line angle by approximating
slope of linear x-y regression on mask image;
again, if height dominates over width, then
assume +90° and use transposed image
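
The single-line fallback described above can be sketched as follows.
This is an illustration only: `estimate_line_angle` is a hypothetical
name, the mask is a plain list of 0/1 rows, and the sign convention
(image y axis pointing down) is an assumption.

```python
import math


def estimate_line_angle(mask):
    """Estimate the angle (degrees) of a single textline mask.

    Fits y = slope * x + b by least squares over all foreground pixels.
    If height dominates over width, the mask is transposed first and
    90 degrees are added to the result.
    """
    height, width = len(mask), len(mask[0])
    transposed = height > width
    if transposed:
        mask = [list(column) for column in zip(*mask)]
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if len(points) < 2:
        return None  # nothing to fit
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return None  # all pixels in one column: slope undefined
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    angle = math.degrees(math.atan(cov_xy / var_x))
    return angle + 90 if transposed else angle


flat = [[0] * 5, [1] * 5, [0] * 5]
upright = [[0, 1, 0] for _ in range(5)]
print(estimate_line_angle(flat), estimate_line_angle(upright))
```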
- drop unused `scale` param
- when merging large line with small lines,
don't use first new contour but largest
- get external contours instead of tree
(without checking hierarchy afterwards)
- simplify
- rename `get_regions()` → `get_early_layout()`
- split up `run_boxes_no/full_layout()` into shared
* `get_full_layout()` (for label mapping,
table decoding and optional full model prediction)
* `get_deskewed_masks()` (for de-rotation)
* extraction of various region types (polygons and confidences)
* `run_boxes_order()` (for column detection and box ordering)
- rename `contours_tables` → `polygons_of_tables`
This further reduces redundant code, avoids splitting up the same
functionality across different places depending on mode etc.
- `run_single`: re-use `return_contours_of_interested_region`
for extraction and filtering of text region contours
- `run_single`: isolate new function `match_deskewed_contours`
- `run_single`: apply dilation afterwards
- rename `contours_only_text_parent_d_ordered` → `polygons_of_textregions_d`
- rename `contours_only_text_parent` → `polygons_of_textregions`
- rename `contours_only_text_parent_h` → `polygons_of_textregions_h`
- `do_work_of_slopes_new_curved` and `get_slopes_and_deskew_new_curved`:
no need for `mask_texts_only` array arg
- `filter_contours_inside_a_bigger_one`: no need for `image` as array arg,
simplify
- `split_textregion_main_vs_head`: simplify, re-order arguments
and return tuple logically
- if no main text regions are found, just convert marginals to main text
and continue normally instead of stopping early w/ empty marginals (i.e.
no textlines)
- `do_order_of_regions_with_model`:
* add `polygons_of_drop_capitals`, order these indices as well
(model was not trained for this, but it works)
* explicit label identifiers instead of number literals
* map marginals and images correctly
* simplify (a lot)
* reduce inference batch size to accommodate 8 GB VRAM GPUs
- `return_indexes_of_contours_located_inside_another_list_of_contours`:
simplify
- pass on probabilities from predicted class everywhere
- rename `confidence_matrix` → `confidence_regions` / `regions_confidence`
- rename `get_textregion_confidences()` → `get_region_confidences()`
- add same for tables, textlines and regionsfl (full layout model)
- aggregate per-region confidence lists for image, table, drop-capital,
left marginal and right marginal regions
- add in writer
- simplify/re-indent some
- try to replace more number literals with class label identifiers
- re-introduce boosting `heading` thresholding broken
when refactoring (light version and do_prediction)
- also return confidence for full layout prediction
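
As an illustration of per-region aggregation, a minimal sketch that
averages the predicted-class probability over a region's pixels. Whether
the mean is the aggregate actually used is an assumption, and
`region_confidence` is a hypothetical helper; only the name
`confidence_regions` comes from the renames above.

```python
def region_confidence(confidence_regions, region_pixels):
    """Mean predicted-class probability over one region.

    confidence_regions: 2D list of per-pixel probabilities.
    region_pixels: iterable of (y, x) coordinates inside the region.
    """
    values = [confidence_regions[y][x] for y, x in region_pixels]
    return sum(values) / len(values) if values else 0.0


confidence_regions = [[0.9, 0.8],
                      [0.1, 0.6]]
# one confidence per region, e.g. for image or marginal regions
regions = {"image": [(0, 0), (0, 1)], "marginal_left": [(1, 1)]}
print({name: region_confidence(confidence_regions, pixels)
       for name, pixels in regions.items()})
```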
1. use connected component analysis to get unique segments
in early prediction result
2. for each drop-capital segment in full prediction result,
find matching early segment
3. when they have high overlap, assign drop-capital label
to the entire early segment
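
The three steps above can be sketched without any imaging library. The
label values (`text_label=1`, `drop_label=4`), the overlap measure (share
of the drop-capital segment covered) and the threshold are assumptions
for illustration, as are the function names.

```python
from collections import deque


def label_components(grid, label):
    """Return one pixel set per 4-connected segment carrying `label`."""
    height, width = len(grid), len(grid[0])
    seen, segments = set(), []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != label or (y, x) in seen:
                continue
            segment, queue = set(), deque([(y, x)])
            seen.add((y, x))
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                segment.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] == label
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            segments.append(segment)
    return segments


def assign_drop_capitals(early, full, text_label=1, drop_label=4,
                         min_overlap=0.5):
    """Relabel whole early segments matched by drop-capital segments."""
    result = [row[:] for row in early]
    early_segments = label_components(early, text_label)      # step 1
    for drop in label_components(full, drop_label):           # step 2
        best = max(early_segments, key=lambda seg: len(seg & drop),
                   default=None)
        if best is None:
            continue
        if len(best & drop) / len(drop) >= min_overlap:       # step 3
            for y, x in best:
                result[y][x] = drop_label
    return result


early = [[1, 1, 0, 1, 1],
         [1, 1, 0, 1, 1],
         [0, 0, 0, 1, 1]]
full = [[4, 4, 0, 1, 1],
        [4, 0, 0, 1, 1],
        [0, 0, 0, 1, 1]]
for row in assign_drop_capitals(early, full):
    print(row)
```

The drop capital predicted by the full model covers only part of the
early segment, but the entire segment receives the label.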
- rename `putt_bb_of_drop_capitals_of_model_in_patches_in_layout`
→ `fill_bb_of_drop_capitals`
- also allow image (besides text) label in early layout prediction
result when checking if entire bbox can be filled (as opposed to
just drop-capital | image | background mask)
- simplify
fix bug where in non-full mode, the wrong class label was assumed
for separator regions (3 in non- vs 6 in full layout mode):
- pass in separator mask instead of full segmentation map
- rename for clarity:
- `regions_without_separators` → `text_mask` (already binary)
- `regions_with_separators` → `sep_mask` (now just binary)
(thresholding and decoding with artificial boundary class can
overwrite existing column separators, which in turn can contribute
to missing column boundaries; this prioritises seps over boundaries,
which does not impair separation of instances, as seps will separate
text/image/etc instances just as well as artificial boundaries)
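
For the separator label mismatch noted above (3 vs 6), a short sketch of
why passing a binary mask avoids the bug. The constants follow the values
named in the message; `separator_mask` and `count_separator_pixels` are
hypothetical helpers, not the project's functions.

```python
SEP_LABEL_EARLY = 3  # separator class in non-full layout mode
SEP_LABEL_FULL = 6   # separator class in full layout mode


def separator_mask(segmentation, full_layout):
    """Binarise the separator class once, where the mode is known."""
    sep_label = SEP_LABEL_FULL if full_layout else SEP_LABEL_EARLY
    return [[1 if value == sep_label else 0 for value in row]
            for row in segmentation]


def count_separator_pixels(sep_mask):
    """Downstream code sees a binary mask and needs no class label."""
    return sum(sum(row) for row in sep_mask)


early_map = [[1, 3, 1],
             [1, 3, 1]]
full_map = [[1, 6, 2],
            [1, 6, 2]]
print(count_separator_pixels(separator_mask(early_map, full_layout=False)))
print(count_separator_pixels(separator_mask(full_map, full_layout=True)))
```

With the label resolved by the caller, the consumer cannot assume the
wrong class for the current mode.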