instead of hard cut-offs between overlapping window tiles,
apply sigmoid attenuation to blend smoothly from one to the next
(apply all postprocessing at the end)
- calculation of number of tiles: sometimes one tile fewer
is needed, by making the previous last tile
half-full on the right side
- calculation of window margins: fix case where a dimension
extends to the full image shape
- simplify (identifiers, slicing etc)
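A minimal sketch of the sigmoid blending idea, reduced to two 1-D tiles. The names `sigmoid_ramp` and `blend_tiles` and the `steepness` value are assumptions for illustration only, not the identifiers or constants used in the code:

```python
import math

def sigmoid_ramp(n, steepness=10.0):
    """Weights rising smoothly from ~0 to ~1 across n overlap positions."""
    if n == 1:
        return [0.5]
    return [1.0 / (1.0 + math.exp(-steepness * (i / (n - 1) - 0.5)))
            for i in range(n)]

def blend_tiles(left, right, overlap):
    """Join two 1-D tiles whose last/first `overlap` samples cover the same pixels."""
    if overlap <= 0:
        # nothing shared: plain concatenation (hard cut-off)
        return left + right
    ramp = sigmoid_ramp(overlap)
    # inside the overlap, fade the left tile out while fading the right tile in
    seam = [(1.0 - w) * a + w * b
            for w, a, b in zip(ramp, left[-overlap:], right[:overlap])]
    return left[:-overlap] + seam + right[overlap:]
```

With `blend_tiles([1.0] * 6, [0.0] * 6, 4)` the seam falls monotonically from close to 1 to close to 0 instead of jumping at a tile border.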
in `estimate_skew_contours()`, distinguish between angle stats
scattering around <45° vs >45°: in the latter case, use modulo
180° for averages - to avoid cancelling out +90° with -90°
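A rough sketch of why the modulo matters, assuming angles in degrees within [-90, 90] and a simple majority vote to decide between the two cases (`mean_skew` and the majority rule are hypothetical, not taken from `estimate_skew_contours()`):

```python
def mean_skew(angles):
    """Average skew estimates without letting +90° cancel against -90°."""
    if not angles:
        raise ValueError("need at least one angle")
    # do the estimates scatter around the vertical (beyond 45°)?
    steep = sum(1 for a in angles if abs(a) > 45) > len(angles) / 2
    if not steep:
        return sum(angles) / len(angles)
    # treat orientations modulo 180°, so -88° becomes 92° (next to +88°)
    wrapped = [a % 180 for a in angles]
    mean = sum(wrapped) / len(wrapped)
    # map the result back into (-90, 90]
    return mean - 180 if mean > 90 else mean
```

A plain mean of `[88, -88, 90, -90]` would be 0°, whereas this version yields 90°.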
- move `extract_page()` to the start (right after enhancement),
so early layout and textline model prediction sees cropped
image
- `extract_page()`: also return page mask
- `get_early_layout()`:
* use cropped image
* also run optional table prediction here,
map table label and confidence already
(so no need to pass these arrays everywhere)
* suppress all non-text type regions in textline mask
* also return text+table mask
(so no need to reconstruct it everywhere)
- apply page mask to textline mask and early layout result
(i.e. suppress areas beyond border contour)
- `run_graphics_and_columns()`:
* rename → `run_columns()`
* no table prediction here
* no page extraction here
* no page cropping+masking here
* no textline mask suppression here
- `run_graphics_and_columns_without_layout()`: drop
(not needed anymore)
- `run_marginals()` vs. `get_marginals()`: extract
`text_mask` internally from early layout
- early page cropping for col-classifier:
also use cropped image in input binarization mode
- early page cropping for col-classifier:
get external contours instead of indiscriminate tree
- writer: skip layout mode now also uses cropped coordinates
(so drop kwarg for it)
for local (within-box) ordering of region contours, use the same
text mask (merely eroded) as for the contour extraction itself:
the text+table+drop mask from early+full layout prediction,
rather than the textline mask, because the latter may be empty
in some boxes and is unlikely to be more useful than the region
mask itself
(as there are valid cases where both left and right marginalia
are present) follow up on 4bdea39 by re-allowing left point _and_
right point - but still score-based, and not if very asymmetric
- `get_marginals` modifies region labels in-place anyway,
so no need for retval
- de/rotate only inside `get_marginals` (for consistency)
- return early if no marginals detected
- `run_marginals`: only useful in 1 or 2 columns, so keep to
that conditional branch; allows avoiding unnecessary resizing
of images to and fro
- rename `text_regions_p_1` → `text_regions_p`
in search of valid peaks (gaps between text columns),
- drop absolute values for minimum gap depth
(likely crafted for some fixed resolution examples)
- instead, use criterion relative to maximum column depth
and page height (trying to loosely approximate the prior
constants, albeit somewhat more permissive)
in search of valid (above threshold) peaks:
- do not just pick right-most left and left-most right span;
- instead,
* if no peaks on the left, then only search right
* if no peaks on the right, then only search left
* if peaks on both sides, then only the better side
(so never return marginals on both sides!)
* use scoring for peaks that reflects their
prominence and height (but keep positional
range constraints for what constitutes left and right)
- rename `thickness_along_y_percent` →
`max_textline_thickness_percent`
- rename `marginlas_should_be_main_text` →
`main_text_should_be_marginals`
- constrain `find_peaks()` by prominence and distance
- simplify (a lot)
- add comments for possible improvements
and for plotting
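The side selection described above could look roughly like this. Peaks are assumed to be given as `(x, height, prominence)` tuples; the function name, the score (height times prominence) and the positional limits `left_max`/`right_min` are illustrative assumptions, not the actual values:

```python
def pick_marginal_side(peaks, width, left_max=0.3, right_min=0.7):
    """Return ('left' | 'right', x) for the best-scoring peak, or None."""
    def score(peak):
        _, height, prominence = peak
        return height * prominence  # assumed scoring, see note above

    # positional range constraints for what counts as left and right
    left = [p for p in peaks if p[0] < left_max * width]
    right = [p for p in peaks if p[0] > right_min * width]
    best_left = max(left, key=score, default=None)
    best_right = max(right, key=score, default=None)
    if best_left is None and best_right is None:
        return None
    if best_right is None:
        return ("left", best_left[0])
    if best_left is None:
        return ("right", best_right[0])
    # peaks on both sides: keep only the better one
    if score(best_left) >= score(best_right):
        return ("left", best_left[0])
    return ("right", best_right[0])
```

Peaks in the middle of the page are ignored by construction, and at most one side is ever returned.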
- use new `rotate_image_enlarge` instead of
custom (insufficient) padding w/ `rotate_image`
- get external contours instead of tree
(without checking hierarchy afterwards)
- use largest textline contours by area instead of
longest polygon path
- always use `separate_lines` (but without its incorrect
angle/offset calculations) instead of `separate_lines_vertical_cont`
- calculate coordinate transformation (shift, angle)
for all cases (including >45°)
- simplify
- use relative images, cropped to parent bbox (faster)
- no `scale` parameter (unused)
- use largest textline contours by area instead of first
- simplify
- return early if textline mask is empty
- intersect textline mask with parent mask
(so neighbouring, truncated textlines
will not interfere)
- fix bug when resulting angle is small:
rather, compare with page angle
- if there is more than 1 line in the region,
* use median instead of mean to estimate y_diff
* if height dominates over width and x_diff
over y_diff, then assume 90°: transpose image,
deskew on that, then add 90° to result
- otherwise instead of just using page angle,
try to estimate single-line angle by approximating
slope of linear x-y regression on mask image;
again, if height dominates over width, then
assume +90° and use transposed image
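As an illustration of the single-line case, a least-squares slope over the foreground pixels can be computed as follows. This is only a sketch on a nested-list mask: the function name, the sign convention (y pointing down) and the fallback for degenerate input are assumptions:

```python
import math

def single_line_angle(mask):
    """Estimate the angle (degrees) of one textline from a 0/1 mask."""
    height, width = len(mask), len(mask[0])
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    # if height dominates over width, work on the transposed image
    transposed = height > width
    if transposed:
        points = [(y, x) for x, y in points]
    offset = 90.0 if transposed else 0.0
    if len(points) < 2:
        return offset
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        return offset  # no horizontal extent, slope undefined
    # slope of the y-on-x regression line, converted to an angle
    return math.degrees(math.atan(sxy / sxx)) + offset
```

A horizontal bar gives 0°, and a vertical bar in a tall mask gives 90° via the transposed branch.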
- drop unused `scale` param
- when merging large line with small lines,
don't use first new contour but largest
- get external contours instead of tree
(without checking hierarchy afterwards)
- simplify
- rename `get_regions()` → `get_early_layout()`
- split up `run_boxes_no/full_layout()` into shared
* `get_full_layout()` (for label mapping,
table decoding and optional full model prediction)
* `get_deskewed_masks()` (for de-rotation)
* extraction of various region types (polygons and confidences)
* `run_boxes_order()` (for column detection and box ordering)
- rename `contours_tables` → `polygons_of_tables`
This further reduces redundant code and avoids splitting the same
functionality across different places depending on mode.