📝 update changelog

Robert Sachunsky 2025-10-09 20:12:06 +02:00
parent e1b56d97da
commit cab392601e

@@ -15,11 +15,17 @@ Fixed:
* `get_smallest_skew`: after shifting search range of rotation angle, use overall best result
* Dockerfile: fix CUDA installation (cuDNN contested between Torch and TF due to extra OCR)
* OCR: re-instate missing methods and fix `utils_ocr` function calls
* mbreorder/enhancement CLIs: missing imports
* :fire: writer: `SeparatorRegion` needs `SeparatorRegionType` (not `ImageRegionType`)
  f458e3e
* tests: switch from `pytest-subtests` to `parametrize` so we can use `pytest-isolate`
  (so CUDA memory gets freed between tests if running on GPU)
Added:
* test coverage for OCR options in `layout`
* test coverage for table detection in `layout`
* CI linting with ruff
Changed:
* polygons: slightly widen for regions and lines, increase for separators
@@ -28,7 +34,19 @@ Changed:
  but use shared memory if necessary, and switch back from `loky` to stdlib,
  and shutdown in `del()` instead of `atexit`
* :fire: OCR: switch CNN-RNN model to `20250930` version compatible with TF 2.12 on CPU, too
* OCR: allow running `-tr` without `-fl`, too
* :fire: writer: use `@type='heading'` instead of `'header'` for headings
* :fire: performance gains via refactoring (simplification, less copy-code, vectorization,
  avoiding unused calculations, avoiding unnecessary 3-channel image operations)
* :fire: heuristic reading order detection: many improvements
  - contour vs splitter box matching:
    * contour must be contained in box exactly instead of heuristics
    * make fallback center matching, center must be contained in box
  - original vs deskewed contour matching:
    * same min-area filter on both sides
    * similar area score in addition to center proximity
    * avoid duplicate and missing mappings by allowing N:M
      matches and splitting+joining where necessary
* CI: update+improve model caching
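
The "contour vs splitter box matching" entries in the changelog describe a strict containment test with a center-based fallback. A minimal sketch of that idea follows. It is not the project's implementation: the function name `match_contours_to_boxes`, the box layout `(x0, y0, x1, y1)` with inclusive bounds, and the use of the vertex mean as the contour center are all assumptions made for illustration.

```python
import numpy as np


def match_contours_to_boxes(contours, boxes):
    """Assign each contour to a box index, or None if no box matches.

    contours: list of (N, 2) arrays of (x, y) points (hypothetical layout)
    boxes: list of (x0, y0, x1, y1) tuples with inclusive bounds (hypothetical layout)
    """
    matches = []
    for contour in contours:
        pts = np.asarray(contour)
        xmin, ymin = pts.min(axis=0)
        xmax, ymax = pts.max(axis=0)
        # assumption: the center is approximated by the mean of the vertices
        cx, cy = pts.mean(axis=0)
        exact = None
        fallback = None
        for idx, (x0, y0, x1, y1) in enumerate(boxes):
            # strict test: the whole contour lies inside the box
            if x0 <= xmin and xmax <= x1 and y0 <= ymin and ymax <= y1:
                exact = idx
                break
            # fallback test: only the contour center lies inside the box
            if fallback is None and x0 <= cx <= x1 and y0 <= cy <= y1:
                fallback = idx
        # prefer the exact match, otherwise use the first center match
        matches.append(exact if exact is not None else fallback)
    return matches


boxes = [(0, 0, 10, 10), (10, 0, 20, 10)]
contours = [
    np.array([[1, 1], [5, 1], [5, 5], [1, 5]]),    # fully inside the first box
    np.array([[8, 2], [16, 2], [16, 6], [8, 6]]),  # straddles both boxes
    np.array([[30, 30], [40, 30], [40, 40]]),      # outside every box
]
result = match_contours_to_boxes(contours, boxes)
print(result)
```

In this sketch the straddling contour is assigned through the fallback, because its center falls inside the second box, while the contour outside every box stays unmatched.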