qurator-spk/eynollah

mirror of https://github.com/qurator-spk/eynollah.git synced 2025-10-06 14:39:55 +02:00

Robert Sachunsky 3aa7ad04fa 📝 update changelog

2025-09-30 23:14:52 +02:00

Change Log

Versioned according to Semantic Versioning.

Unreleased

Fixed:

🔥 polygons: avoid invalid paths (use Polygon.buffer() instead of dilation etc.)
return_boxes_of_images_by_order_of_reading_new: avoid Numpy.dtype mismatch, simplify
return_boxes_of_images_by_order_of_reading_new: log any exceptions instead of ignoring
filter_contours_without_textline_inside: avoid removing from duplicate lists twice
get_marginals: exit early if no peaks found to avoid spurious overlap mask
get_smallest_skew: after shifting search range of rotation angle, use overall best result
Dockerfile: fix CUDA installation (cuDNN contested between Torch and TF due to extra OCR)
OCR: re-instate missing methods and fix utils_ocr function calls
🔥 writer: SeparatorRegion needs SeparatorRegionType (not ImageRegionType) f458e3e
tests: switch from pytest-subtests to parametrize so we can use pytest-isolate (so CUDA memory gets freed between tests if running on GPU)
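
The get_smallest_skew entry above can be illustrated with a small sketch. This is not Eynollah's implementation: the function names, the angle ranges, and the scoring function are all hypothetical. It only shows the idea of searching a second, shifted range when the best angle lies on a border, and then keeping the best result over both passes instead of the result of the last pass.

```python
from typing import Callable, List, Tuple


def frange(start: float, stop: float, step: float) -> List[float]:
    """Evenly spaced angles from start to stop (inclusive)."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def best_of(score: Callable[[float], float], angles: List[float]) -> Tuple[float, float]:
    """Return the (angle, score) pair with the highest score."""
    return max(((a, score(a)) for a in angles), key=lambda pair: pair[1])


def search_skew(score: Callable[[float], float],
                lo: float = -2.0, hi: float = 2.0,
                step: float = 1.0, shift: float = 2.0) -> Tuple[float, float]:
    """Hypothetical two-pass search over rotation angles."""
    angles = frange(lo, hi, step)
    best = best_of(score, angles)
    # A best angle on the border suggests the optimum may lie outside the range.
    if best[0] in (angles[0], angles[-1]):
        direction = 1.0 if best[0] == angles[-1] else -1.0
        shifted = best_of(score, frange(lo + direction * shift,
                                        hi + direction * shift, step))
        # Keep the overall best, which may still be the first-pass result.
        best = max(best, shifted, key=lambda pair: pair[1])
    return best
```

With a score that peaks slightly beyond the border, a shifted range that skips the border angle would yield a worse result if only the second pass were used; taking the maximum over both passes avoids that.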

Changed:

polygons: slightly widen for regions and lines, increase for separators
various refactorings, some code style and identifier improvements
deskewing/multiprocessing: switch back to ProcessPoolExecutor (faster), but use shared memory if necessary; switch back from loky to stdlib; shut down in __del__() instead of atexit
🔥 OCR: switch CNN-RNN model to 20250930 version compatible with TF 2.12 on CPU, too
🔥 writer: use @type='heading' instead of 'header' for headings
CI: update+improve model caching

0.5.0 - 2025-09-26

Fixed:

restoring the contour in the original image caused an error due to an empty tuple, #154

Added:

eynollah machine-based-reading-order CLI to run reading order detection, #175
eynollah enhancement CLI to run image enhancement, #175
Improved models for page extraction and reading order detection, #175

0.4.0 - 2025-04-07

Fixed:

allow empty imports for optional dependencies
avoid Numpy warnings (empty slices etc)
remove deprecated Numpy types
binarization CLI: make dir_in usable again

Added:

Continuous Deployment via Dockerhub and GHCR
CI: also test CLIs and OCR-D
CI: measure code coverage, annotate+upload reports
smoke-test: also check results
smoke-test: also test sbb-binarize
ocrd-test: analog for OCR-D CLI (segment and binarize)
pytest: add asserts, extend coverage, use subtests for various options
pytest: also add binarization
pytest: add dir_in mode (segment and binarize)
make install: control optional dependencies via EXTRAS variable
OCR-D: expose and describe recently added parameters:
- ignore_page_extraction
- allow_enhancement
- textline_light
- right_to_left
OCR-D: 🔥 integrate ocrd-sbb-binarize
add detection confidence in TextRegion/Coords/@conf (but only in light version and not for marginalia)
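
As a sketch of how the new confidence attribute might be consumed, the following reads TextRegion/Coords/@conf with the standard library. The sample document, its region ids, and its confidence value are made up for illustration, and the PAGE namespace version used here is an assumption; check the xmlns of your own files.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumed PAGE namespace version; adjust to match the files at hand.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Made-up sample: one region with a confidence, one without.
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <Coords points="0,0 50,0 50,20 0,20" conf="0.93"/>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="0,30 50,30 50,60 0,60"/>
    </TextRegion>
  </Page>
</PcGts>
""".strip()


def region_confidences(page_xml: str) -> Dict[str, Optional[float]]:
    """Map each TextRegion id to its Coords/@conf value, or None if absent."""
    root = ET.fromstring(page_xml)
    result: Dict[str, Optional[float]] = {}
    for region in root.iterfind(".//pc:TextRegion", NS):
        coords = region.find("pc:Coords", NS)
        conf = coords.get("conf") if coords is not None else None
        result[region.get("id")] = float(conf) if conf is not None else None
    return result
```

Because the attribute is only written in the light version and not for marginalia, a consumer should treat a missing @conf as "unknown" rather than as zero, which is why the sketch returns None in that case.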

Changed:

Docker build: simplify, w/ OCR, conform to OCR-D spec
OCR-D: 🔥 migrate to core v3
- initialize+setup only once
- restrict number of parallel page workers to 1 (conflicts with existing multiprocessing; TF parts not mp-compatible)
- do query maximally annotated page image (but filtering existing binarization/cropping/deskewing), rebase (as new @imageFilename) if necessary
- add behavioural docstring
🔥 refactor Eynollah API:
- no more data (kw)args at init, but kwargs dir_in / image_filename for run()
- no more data attributes, but function kwargs (pcgts, image_filename, image_pil, dir_in, override_dpi)
- remove redundant TF session/model loaders (only load once during init)
- factor run_single() out of run() (loop body), expose for independent calls (like OCR-D)
- expose cache_images(), add dpi kwarg, set self._imgs
- single-image mode writes PAGE file result (just as directory mode does)
CLI: assertions (instead of print+exit) for options checks
light mode: fine-tune ratio to better detect a region as header
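
The shape of the API refactoring listed above can be sketched with a toy class. ToySegmenter and its behaviour are hypothetical and only mirror the structure described in the entries: no data arguments at init, keyword arguments image_filename / dir_in for run(), a run_single() factored out of the loop body, and an assertion for the option check.

```python
from pathlib import Path
from typing import List, Optional


class ToySegmenter:
    """Hypothetical stand-in for the refactored API shape (not Eynollah itself)."""

    def __init__(self, model_dir: str):
        # Only configuration and one-time model loading belong here.
        self.model_dir = model_dir

    def run_single(self, image_filename: str) -> str:
        """Process one image; here it merely derives the result file name."""
        return Path(image_filename).with_suffix(".xml").name

    def run(self, image_filename: Optional[str] = None,
            dir_in: Optional[str] = None) -> List[str]:
        """Process either a single image or every file in a directory."""
        assert bool(image_filename) != bool(dir_in), \
            "pass exactly one of image_filename or dir_in"
        if dir_in:
            files = sorted(str(p) for p in Path(dir_in).iterdir() if p.is_file())
        else:
            files = [image_filename]
        # The loop body is a separate method, so callers can invoke it directly.
        return [self.run_single(f) for f in files]
```

Keeping per-image data out of the constructor means one instance can be reused across many images, while run_single() stays available to callers that manage their own loop.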

0.3.1 - 2024-08-27

Fixed:

regression in OCR-D processor, #106
Expected Ptr<cv::UMat> for argument 'contour', #110
Memory usage explosion with very narrow images (e.g. book spine), #67

0.3.0 - 2023-05-13

Changed:

Eynollah light integration, #86
use PEP420 style qurator namespace, #97
set_memory_growth to all GPU devices alike, #100

Fixed:

PAGE-XML coordinates can have self-intersections, #20
reading order representation (XML order vs index), #22
allow cropping separately, #26
Order of regions, #51
error while running inference, #75
Eynollah crashes while processing image, #77
ValueError: bad marshal data, #87
contour extraction: inhomogeneous shape, #92
Confusing model dir variables, #93
New release?, #96

0.2.0 - 2023-03-24

Changed:

Convert default model from HDFS to TF SavedModel, #91

Added:

parameter tables to toggle table detection, #91
default model described in ocrd-tool.json, #91

0.1.0 - 2023-03-22

Fixed:

Do not produce spurious TextEquiv, #68
Less spammy logging, #64, #65, #71

Changed:

Upgrade to tensorflow 2.4.0, #74
Improved README
CI: test for python 3.7+, #90

0.0.11 - 2022-02-02

Fixed:

models parameter should have content-type, #61, OCR-D/core#777

0.0.10 - 2021-09-27

Fixed:

call to build_pagexml_no_full_layout for empty pages, #52

0.0.9 - 2021-08-16

Added:

Table detection, #48

Fixed:

Catch exception, #47

0.0.8 - 2021-07-27

Fixed:

pc:PcGts/@pcGtsId was not set, #49

0.0.7 - 2021-07-01

Fixed:

slopes/slopes_h retval/arguments mixed up, #45, #46

0.0.6 - 2021-06-22

Fixed:

Cast arguments to opencv2 to python native types, #43, #44, opencv/opencv#20186

0.0.5 - 2021-05-19

Changed:

Remove allow_enhancement parameter, #42

0.0.4 - 2021-05-18

fix contour bug, #40

0.0.3 - 2021-05-11

fix NaN bug, #38

0.0.2 - 2021-05-04

Fixed:

prevent negative coordinates for textlines in marginals
fix a bug in the contour logic, #38
the binarization model has been added to the models, so the input can now be binarized in the first stage of eynollah's pipeline. This option is turned on with the -ib (-input_binary) argument and is suggested for very dark or bright documents

0.0.1 - 2021-04-22

Initial release
