Commit graph

687 commits

Author SHA1 Message Date
Konstantin Baierer
e0d38517d3
Merge pull request #130 from qurator-spk/v3-api
port processor to core v3
2025-04-04 16:01:45 +02:00
vahidrezanezhad
2e3a29f66b In light mode: To determine whether a main region is a header, I adjusted the ratio to achieve better results. 2025-04-04 15:36:31 +02:00
Konstantin Baierer
85566c2186
Merge pull request #148 from bertsky/v3-api
fix, merge, resolve conflicts, apply review, migrate sbb-binarize
2025-04-04 13:31:00 +02:00
Robert Sachunsky
1a0b9d1958
Merge pull request #1 from bertsky/v3-api-refactor-init
refactoring of Eynollah init and model loading
2025-04-04 13:30:23 +02:00
Robert Sachunsky
559d001eef another fix to avoid frequent warnings 2025-04-02 05:45:34 +00:00
Robert Sachunsky
dd478279a4 CLI: also --overwrite in single-image mode 2025-04-02 05:40:21 +00:00
Robert Sachunsky
8159e6336a fix typo (preventing log messages) 2025-04-02 00:01:02 +00:00
Robert Sachunsky
2919538382 minor fixes to avoid frequent warnings 2025-04-01 23:33:26 +00:00
Robert Sachunsky
903c87aca0 update readme (OCR-D section) 2025-04-01 23:26:38 +02:00
Robert Sachunsky
dcf2ed5e22 run: also write out XML in single filename mode 2025-04-01 23:13:24 +02:00
Robert Sachunsky
fe77171d45 run_single: reduce indentation 2025-04-01 22:47:33 +02:00
Robert Sachunsky
c7dc952851 smoke-test: also test dir-in mode and overwrite 2025-04-01 22:43:30 +02:00
Robert Sachunsky
79003a083c CLI: ValueError instead of print+exit 2025-04-01 22:43:01 +02:00
Robert Sachunsky
e17d34fafa factor run_single() out of run(), simplify kwargs 2025-04-01 22:12:24 +02:00
Robert Sachunsky
1a0a1cb00b remove session methods and redundant model loaders 2025-04-01 21:15:41 +02:00
Robert Sachunsky
ab3da17547
Update requirements.txt
Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
2025-04-01 18:13:28 +02:00
Robert Sachunsky
dd51f900b9 OCR-D: init Eynollah in 'setup', re-use instance for each page via non-public API 2025-04-01 13:02:30 +02:00
Robert Sachunsky
ffeb4a343d Eynollah: remove useless 'pcgts' attr 2025-04-01 13:00:41 +02:00
Robert Sachunsky
9dc33db108 CI: add binarization models to cache 2025-04-01 11:36:56 +02:00
Robert Sachunsky
9c769d4cc5 CI: run CLI tests, too 2025-04-01 11:13:16 +02:00
Robert Sachunsky
250fc02606 add tests for binarization, remove dependency on deps-test 2025-04-01 11:13:04 +02:00
Robert Sachunsky
515b4023f6 sbb_binarize: fix missing reference 2025-04-01 10:54:36 +02:00
Robert Sachunsky
95a681aa8c add Continuous Deployment via Dockerhub and GHCR 2025-04-01 01:31:00 +02:00
Robert Sachunsky
df3510750c Github Actions CI: no more Docker clean or build 2025-04-01 00:28:16 +02:00
Robert Sachunsky
45e3ab9692 Github Actions: free space: all existing Docker images 2025-04-01 00:23:53 +02:00
Robert Sachunsky
31aeb9629d Github Actions: free space more aggressively 2025-03-31 18:16:17 +02:00
Robert Sachunsky
7430b57b65 dockerfile: add smoke test 2025-03-31 16:56:47 +02:00
Robert Sachunsky
f35f49376e run CLI test in TMPDIR, add ocrd-test 2025-03-31 16:55:57 +02:00
Robert Sachunsky
ae066388ea docker: no need for g++, but install w/ 'EXTRAS=OCR' 2025-03-31 15:58:57 +02:00
Robert Sachunsky
722b5c6bf1 add make variable EXTRAS for optional dependencies 2025-03-31 15:58:12 +02:00
Robert Sachunsky
c01609ff4e allow even more empty imports for optional dependencies 2025-03-31 15:57:22 +02:00
Robert Sachunsky
51e9bfd6d7 improve+extend dockerfile 2025-03-31 14:14:08 +02:00
Robert Sachunsky
09248d4829 improve+extend makefile 2025-03-31 14:13:16 +02:00
Robert Sachunsky
46618f4229 allow more empty imports for optional dependencies 2025-03-31 14:11:50 +02:00
Robert Sachunsky
4be89910a2 CLI: fix arg vs kwarg from merge 2025-03-31 02:38:24 +02:00
Robert Sachunsky
9d61acf173 simplify 2025-03-31 02:02:30 +02:00
Robert Sachunsky
a1068ff2eb OCR-D: move sbb-binarize to ocrd-tool.json, update to v3 2025-03-31 01:47:32 +02:00
Robert Sachunsky
c794d4d29f OCR-D: fix typo light_mode→light_version 2025-03-31 01:46:29 +02:00
Robert Sachunsky
4338259ca1 OCR-D: ensure page image gets replaced in result as well if not the original file 2025-03-31 01:17:14 +02:00
Robert Sachunsky
55969b0173 OCR-D: add docstring 2025-03-31 01:15:26 +02:00
Robert Sachunsky
3916474b8b OCR-D: require >=v3.1 2025-03-31 01:15:12 +02:00
Robert Sachunsky
6d02e90570 OCR-D: restrict max_workers=1 2025-03-31 01:14:54 +02:00
Robert Sachunsky
efd3fa6775 allow empty imports for optional dependencies 2025-03-31 00:32:26 +02:00
Robert Sachunsky
238132e260 use 'image_filename' for pseudo-iteration outside 'dir_in' mode 2025-03-31 00:31:49 +02:00
Robert Sachunsky
af4e2a4ffc do not require 'dir_out' outside 'dir_in' mode 2025-03-31 00:31:09 +02:00
Robert Sachunsky
ea136e3ddd 'overwrite' check: only in 'dir_in' mode 2025-03-31 00:30:06 +02:00
Robert Sachunsky
1f4a17b60d Merge remote-tracking branch 'origin/machine_based_reading_order_integration' into v3-api 2025-03-30 21:21:59 +02:00
Robert Sachunsky
edf924c2cb ocrd-tool: add dockerhub 2025-03-30 19:47:25 +02:00
vahidrezanezhad
d3a4c06e7f This commit enables the export of cropped text line images along with their corresponding texts from a Page-XML file. These exported text line images and texts can be utilized for training a text line-based OCR model. 2025-03-20 18:21:44 +01:00
vahidrezanezhad
c8b8529951 For the CNN-RNN OCR model, long text lines are split into two segments 2025-03-17 19:50:58 +01:00