Commit graph

716 commits

Author SHA1 Message Date
Robert Sachunsky
56cc179d35 pytest: add tests for directory mode (layout+bin) 2025-04-05 01:20:38 +02:00
Robert Sachunsky
a3e1b3d4d5 pytest: add asserts for results, add binarization 2025-04-04 23:37:00 +02:00
Robert Sachunsky
b03116f4a6 pytest: use subtests for various layout options, add coverage 2025-04-04 22:22:50 +02:00
Robert Sachunsky
91a340f619 CLI: simplify option checks to asserts (also avoid stack trace) 2025-04-04 20:42:28 +02:00
Robert Sachunsky
e0a7fde537 logger: fix type hint 2025-04-04 20:27:15 +02:00
Robert Sachunsky
108ce1f5a1 Merge remote-tracking branch 'origin/main' into v3-api-release-foreal
(bad-ass difficult diff diffing)
2025-04-04 20:23:23 +02:00
Konstantin Baierer
e0d38517d3
Merge pull request #130 from qurator-spk/v3-api
port processor to core v3
2025-04-04 16:01:45 +02:00
vahidrezanezhad
2e3a29f66b In light mode: To determine whether a main region is a header, I adjusted the ratio to achieve better results. 2025-04-04 15:36:31 +02:00
Konstantin Baierer
85566c2186
Merge pull request #148 from bertsky/v3-api
fix, merge, resolve conflicts, apply review, migrate sbb-binarize
2025-04-04 13:31:00 +02:00
Robert Sachunsky
1a0b9d1958
Merge pull request #1 from bertsky/v3-api-refactor-init
refactoring of Eynollah init and model loading
2025-04-04 13:30:23 +02:00
vahidrezanezhad
38a2d60fa2 Confidence value for textregions and in the case of not light version is set to zero. This is done to let the pipeline go through. It will be updated to return the correct value in upcoming commits 2025-04-03 12:47:27 +02:00
vahidrezanezhad
6b52da227c decorating eynollah with textregion confidence score #135 2025-04-03 00:39:21 +02:00
Robert Sachunsky
559d001eef another fix to avoid frequent warnings 2025-04-02 05:45:34 +00:00
Robert Sachunsky
dd478279a4 CLI: also --overwrite in single-image mode 2025-04-02 05:40:21 +00:00
Robert Sachunsky
8159e6336a fix typo (preventing log messages) 2025-04-02 00:01:02 +00:00
Robert Sachunsky
2919538382 minor fixes to avoid frequent warnings 2025-04-01 23:33:26 +00:00
Robert Sachunsky
903c87aca0 update readme (OCR-D section) 2025-04-01 23:26:38 +02:00
Robert Sachunsky
dcf2ed5e22 run: also write out XML in single filename mode 2025-04-01 23:13:24 +02:00
Robert Sachunsky
fe77171d45 run_single: reduce indentation 2025-04-01 22:47:33 +02:00
Robert Sachunsky
c7dc952851 smoke-test: also test dir-in mode and overwrite 2025-04-01 22:43:30 +02:00
Robert Sachunsky
79003a083c CLI: ValueError instead of print+exit 2025-04-01 22:43:01 +02:00
Robert Sachunsky
e17d34fafa factor run_single() out of run(), simplify kwargs 2025-04-01 22:12:24 +02:00
Robert Sachunsky
1a0a1cb00b remove session methods and redundant model loaders 2025-04-01 21:15:41 +02:00
Robert Sachunsky
ab3da17547
Update requirements.txt
Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
2025-04-01 18:13:28 +02:00
Robert Sachunsky
dd51f900b9 OCR-D: init Eynollah in 'setup', re-use instance for each page via non-public API 2025-04-01 13:02:30 +02:00
Robert Sachunsky
ffeb4a343d Eynollah: remove useless 'pcgts' attr 2025-04-01 13:00:41 +02:00
Robert Sachunsky
9dc33db108 CI: add binarization models to cache 2025-04-01 11:36:56 +02:00
Robert Sachunsky
9c769d4cc5 CI: run CLI tests, too 2025-04-01 11:13:16 +02:00
Robert Sachunsky
250fc02606 add tests for binarization, remove dependency on deps-test 2025-04-01 11:13:04 +02:00
vahidrezanezhad
91b2201b07 cnnrnn Ocr: width of input textline image can not be zero! 2025-04-01 10:55:40 +02:00
Robert Sachunsky
515b4023f6 sbb_binarize: fix missing reference 2025-04-01 10:54:36 +02:00
Robert Sachunsky
95a681aa8c add Continuous Deployment via Dockerhub and GHCR 2025-04-01 01:31:00 +02:00
Robert Sachunsky
df3510750c Github Actions CI: no more Docker clean or build 2025-04-01 00:28:16 +02:00
Robert Sachunsky
45e3ab9692 Github Actions: free space: all existing Docker images 2025-04-01 00:23:53 +02:00
vahidrezanezhad
4de441eaaa OCR prediction is now enabled to integrate results from both RGB and binarized images or to be performed on each individually 2025-03-31 21:28:05 +02:00
vahidrezanezhad
b1da0a3327 In OCR, the predicted text is now drawn on the image, and the results are saved in a specified directory. This makes it easier to review the predicted output 2025-03-31 18:43:14 +02:00
Robert Sachunsky
31aeb9629d Github Actions: free space more aggressively 2025-03-31 18:16:17 +02:00
Robert Sachunsky
7430b57b65 dockerfile: add smoke test 2025-03-31 16:56:47 +02:00
Robert Sachunsky
f35f49376e run CLI test in TMPDIR, add ocrd-test 2025-03-31 16:55:57 +02:00
Robert Sachunsky
ae066388ea docker: no need for g++, but install w/ 'EXTRAS=OCR' 2025-03-31 15:58:57 +02:00
Robert Sachunsky
722b5c6bf1 add make variable EXTRAS for optional dependencies 2025-03-31 15:58:12 +02:00
Robert Sachunsky
c01609ff4e allow even more empty imports for optional dependencies 2025-03-31 15:57:22 +02:00
Robert Sachunsky
51e9bfd6d7 improve+extend dockerfile 2025-03-31 14:14:08 +02:00
Robert Sachunsky
09248d4829 improve+extend makefile 2025-03-31 14:13:16 +02:00
Robert Sachunsky
46618f4229 allow more empty imports for optional dependencies 2025-03-31 14:11:50 +02:00
Robert Sachunsky
4be89910a2 CLI: fix arg vs kwarg from merge 2025-03-31 02:38:24 +02:00
Robert Sachunsky
9d61acf173 simplify 2025-03-31 02:02:30 +02:00
Robert Sachunsky
a1068ff2eb OCR-D: move sbb-binarize to ocrd-tool.json, update to v3 2025-03-31 01:47:32 +02:00
Robert Sachunsky
c794d4d29f OCR-D: fix typo light_mode→light_version 2025-03-31 01:46:29 +02:00
Robert Sachunsky
4338259ca1 OCR-D: ensure page image gets replaced in result as well if not the original file 2025-03-31 01:17:14 +02:00