Change Log

Versioned according to Semantic Versioning.

Unreleased

0.4.0 - 2025-04-07

Fixed:

  • allow empty imports for optional dependencies
  • avoid NumPy warnings (empty slices etc.)
  • remove deprecated NumPy types
  • binarization CLI: make dir_in usable again
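
As an illustration of the "empty imports for optional dependencies" fix, here is a minimal sketch of a common guarded-import pattern. It is not taken from the eynollah code base: the helper `optional_import`, the function `run_ocr` and the module name `hypothetical_ocr_backend` are assumptions made up for this example.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# "json" ships with Python; the second name is a deliberately missing,
# hypothetical module standing in for an optional extra.
json_mod = optional_import("json")
missing_mod = optional_import("hypothetical_ocr_backend")


def run_ocr(backend=missing_mod):
    """Fail when the feature is used, not when the package is imported."""
    if backend is None:
        raise RuntimeError("this feature needs an optional extra that is not installed")
    return backend
```

With this shape, importing the package succeeds without the extra, and only the feature that needs it raises an error.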

Added:

  • Continuous Deployment via Dockerhub and GHCR
  • CI: also test CLIs and OCR-D
  • CI: measure code coverage, annotate+upload reports
  • smoke-test: also check results
  • smoke-test: also test sbb-binarize
  • ocrd-test: analog for OCR-D CLI (segment and binarize)
  • pytest: add asserts, extend coverage, use subtests for various options
  • pytest: also add binarization
  • pytest: add dir_in mode (segment and binarize)
  • make install: control optional dependencies via EXTRAS variable
  • OCR-D: expose and describe recently added parameters:
    • ignore_page_extraction
    • allow_enhancement
    • textline_light
    • right_to_left
  • OCR-D: 🔥 integrate ocrd-sbb-binarize
  • add detection confidence in TextRegion/Coords/@conf (but only in light version and not for marginalia)
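
The detection confidence mentioned above is written to the conf attribute of a region's Coords element. The following sketch shows one way to read it back with the standard library. The sample document, its region ids and its values are invented for illustration, and `region_confidences` is a hypothetical helper, not part of eynollah.

```python
import xml.etree.ElementTree as ET

# PAGE-XML 2019 namespace; files written against another schema version
# would need a different URI here.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Invented sample: one region with a confidence, one without.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="10,10 500,10 500,200 10,200" conf="0.93"/>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="10,300 500,300 500,600 10,600"/>
    </TextRegion>
  </Page>
</PcGts>
"""


def region_confidences(xml_text):
    """Map each TextRegion id to its Coords/@conf as a float, or None if absent."""
    root = ET.fromstring(xml_text)
    result = {}
    for region in root.iterfind(".//pc:TextRegion", NS):
        coords = region.find("pc:Coords", NS)
        conf = coords.get("conf") if coords is not None else None
        result[region.get("id")] = float(conf) if conf is not None else None
    return result


print(region_confidences(SAMPLE))
```

Because the attribute is only written in some modes, a consumer should treat a missing conf as "unknown" rather than as zero.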

Changed:

  • Docker build: simplify, w/ OCR, conform to OCR-D spec

  • OCR-D: 🔥 migrate to core v3

    • initialize+setup only once
    • restrict number of parallel page workers to 1 (conflicts with existing multiprocessing; TF parts not mp-compatible)
    • do query maximally annotated page image (but filtering existing binarization/cropping/deskewing), rebase (as new @imageFilename) if necessary
    • add behavioural docstring
  • 🔥 refactor Eynollah API:

    • no more data (kw)args at init, but kwargs dir_in / image_filename for run()
    • no more data attributes, but function kwargs (pcgts, image_filename, image_pil, dir_in, override_dpi)
    • remove redundant TF session/model loaders (only load once during init)
    • factor run_single() out of run() (loop body), expose for independent calls (like OCR-D)
    • expose cache_images(), add dpi kwarg, set self._imgs
    • single-image mode writes PAGE file result (just as directory mode does)
  • CLI: assertions (instead of print+exit) for options checks

  • light mode: fine-tune ratio to better detect a region as header
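
The API refactoring listed above follows a familiar pattern: load expensive resources once at construction, pass per-call inputs as keyword arguments, and factor the loop body into a method that can be called on its own. The sketch below illustrates that pattern only. `LayoutEngine` is a hypothetical stand-in, not the real Eynollah class, and its method bodies and return values are placeholders. Only the names run(), run_single(), dir_in and image_filename are borrowed from the entries above.

```python
from pathlib import Path


class LayoutEngine:
    """Hypothetical stand-in illustrating the pattern, not the real Eynollah class."""

    def __init__(self, model_dir):
        # Expensive resources are set up exactly once, at construction.
        self.model_dir = model_dir
        self.load_count = 0
        self._load_models()

    def _load_models(self):
        # Placeholder for real model loading.
        self.load_count += 1

    def run_single(self, image_filename):
        """Process one image; callable independently of run()."""
        # Placeholder result: the name of the output file for this image.
        return f"{Path(image_filename).stem}.xml"

    def run(self, image_filename=None, dir_in=None):
        """Per-call inputs arrive as kwargs rather than as constructor state."""
        if (image_filename is None) == (dir_in is None):
            raise ValueError("pass exactly one of image_filename or dir_in")
        if dir_in is not None:
            names = sorted(p.name for p in Path(dir_in).iterdir() if p.is_file())
        else:
            names = [image_filename]
        # Single-image mode and directory mode share the same loop body.
        return [self.run_single(name) for name in names]


engine = LayoutEngine("models")
print(engine.run(image_filename="page1.png"))
```

Keeping inputs out of the constructor means one instance can serve many calls, which is what lets an external caller invoke the per-image step repeatedly without reloading models.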

0.3.1 - 2024-08-27

Fixed:

  • regression in OCR-D processor, #106
  • Expected Ptr<cv::UMat> for argument 'contour', #110
  • Memory usage explosion with very narrow images (e.g. book spine), #67

0.3.0 - 2023-05-13

Changed:

  • Eynollah light integration, #86
  • use PEP420 style qurator namespace, #97
  • set_memory_growth to all GPU devices alike, #100

Fixed:

  • PAGE-XML coordinates can have self-intersections, #20
  • reading order representation (XML order vs index), #22
  • allow cropping separately, #26
  • Order of regions, #51
  • error while running inference, #75
  • Eynollah crashes while processing image, #77
  • ValueError: bad marshal data, #87
  • contour extraction: inhomogeneous shape, #92
  • Confusing model dir variables, #93
  • New release?, #96

0.2.0 - 2023-03-24

Changed:

  • Convert default model from HDF5 to TF SavedModel, #91

Added:

  • parameter tables to toggle table detection, #91
  • default model described in ocrd-tool.json, #91

0.1.0 - 2023-03-22

Fixed:

  • Do not produce spurious TextEquiv, #68
  • Less spammy logging, #64, #65, #71

Changed:

  • Upgrade to tensorflow 2.4.0, #74
  • Improved README
  • CI: test for python 3.7+, #90

0.0.11 - 2022-02-02

Fixed:

0.0.10 - 2021-09-27

Fixed:

  • call to build_pagexml_no_full_layout for empty pages, #52

0.0.9 - 2021-08-16

Added:

  • Table detection, #48

Fixed:

  • Catch exception, #47

0.0.8 - 2021-07-27

Fixed:

  • pc:PcGts/@pcGtsId was not set, #49

0.0.7 - 2021-07-01

Fixed:

  • slopes/slopes_h retval/arguments mixed up, #45, #46

0.0.6 - 2021-06-22

Fixed:

0.0.5 - 2021-05-19

Changed:

  • Remove allow_enhancement parameter, #42

0.0.4 - 2021-05-18

Fixed:

  • fix contour bug, #40

0.0.3 - 2021-05-11

Fixed:

  • fix NaN bug, #38

0.0.2 - 2021-05-04

Fixed:

  • prevent negative coordinates for textlines in marginals
  • fix a bug in the contour logic, #38
  • the binarization model is now included with the other models, so the input can be binarized in the first stage of eynollah's pipeline. Enable this with the -ib (-input_binary) argument; it is recommended for very dark or very bright documents

0.0.1 - 2021-04-22

Initial release