You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
5.2 KiB
5.2 KiB
Change Log
Versioned according to Semantic Versioning.
Unreleased
[0.4.0] - 2025-04-07
Fixed:
- allow empty imports for optional dependencies
- avoid Numpy warnings (empty slices etc)
- remove deprecated Numpy types
- binarization CLI: make
dir_in
usable again
Added:
- Continuous Deployment via Dockerhub and GHCR
- CI: also test CLIs and OCR-D
- CI: measure code coverage, annotate+upload reports
- smoke-test: also check results
- smoke-test: also test sbb-binarize
- ocrd-test: analog for OCR-D CLI (segment and binarize)
- pytest: add asserts, extend coverage, use subtests for various options
- pytest: also add binarization
- pytest: add
dir_in
mode (segment and binarize) - make install: control optional dependencies via
EXTRAS
variable - OCR-D: expose and describe recently added parameters:
ignore_page_extraction
allow_enhancement
textline_light
right_to_left
- OCR-D: 🔥 integrate ocrd-sbb-binarize
- add detection confidence in
TextRegion/Coords/@conf
(but only in light version and not for marginalia)
Changed:
-
Docker build: simplify, w/
OCR
, conform to OCR-D spec -
OCR-D: 🔥 migrate to core v3
- initialize+setup only once
- restrict number of parallel page workers to 1 (conflicts with existing multiprocessing; TF parts not mp-compatible)
- do query maximally annotated page image
(but filtering existing binarization/cropping/deskewing),
rebase (as new
@imageFilename
) if necessary - add behavioural docstring
-
🔥 refactor
Eynollah
API:- no more data (kw)args at init,
but kwargs
dir_in
/image_filename
forrun()
- no more data attributes, but function kwargs
(
pcgts
,image_filename
,image_pil
,dir_in
,override_dpi
) - remove redundant TF session/model loaders (only load once during init)
- factor
run_single()
out ofrun()
(loop body), expose for independent calls (like OCR-D) - expose
cache_images()
, adddpi
kwarg, setself._imgs
- single-image mode writes PAGE file result (just as directory mode does)
- no more data (kw)args at init,
but kwargs
-
CLI: assertions (instead of print+exit) for options checks
-
light mode: fine-tune ratio to better detect a region as header
0.3.1 - 2024-08-27
Fixed:
- regression in OCR-D processor, #106
- Expected Ptrcv::UMat for argument 'contour', #110
- Memory usage explosion with very narrow images (e.g. book spine), #67
0.3.0 - 2023-05-13
Changed:
- Eynollah light integration, #86
- use PEP420 style qurator namespace, #97
- set_memory_growth to all GPU devices alike, #100
Fixed:
- PAGE-XML coordinates can have self-intersections, #20
- reading order representation (XML order vs index), #22
- allow cropping separately, #26
- Order of regions, #51
- error while running inference, #75
- Eynollah crashes while processing image, #77
- ValueError: bad marshal data, #87
- contour extraction: inhomogeneous shape, #92
- Confusing model dir variables, #93
- New release?, #96
0.2.0 - 2023-03-24
Changed:
- Convert default model from HDFS to TF SavedModel, #91
Added:
0.1.0 - 2023-03-22
Fixed:
Changed:
0.0.11 - 2022-02-02
Fixed:
models
parameter should havecontent-type
, #61, OCR-D/core#777
0.0.10 - 2021-09-27
Fixed:
- call to
uild_pagexml_no_full_layout
for empty pages, #52
0.0.9 - 2021-08-16
Added:
- Table detection, #48
Fixed:
- Catch exception, #47
0.0.8 - 2021-07-27
Fixed:
pc:PcGts/@pcGtsId
was not set, #49
0.0.7 - 2021-07-01
Fixed:
0.0.6 - 2021-06-22
Fixed:
- Cast arguments to opencv2 to python native types, #43, #44, opencv/opencv#20186
0.0.5 - 2021-05-19
Changed:
- Remove
allow_enhancement
parameter, #42
0.0.4 - 2021-05-18
- fix contour bug, #40
0.0.3 - 2021-05-11
- fix NaN bug, #38
0.0.2 - 2021-05-04
Fixed:
- prevent negative coordinates for textlines in marginals
- fix a bug in the contour logic, #38
- the binarization model is added into the models and now binarization of input can be done at the first stage of eynollah's pipline. This option can be turned on by -ib (-input_binary) argument. This is suggested for very dark or bright documents
0.0.1 - 2021-04-22
Initial release