|
c3aa48ec3b
|
Merge branch 'master' of https://github.com/qurator-spk/dinglehopper
|
2025-04-24 17:16:06 +02:00 |
|
|
628594ef98
|
📦 v0.11.0
|
2025-04-24 17:14:44 +02:00 |
|
|
5639f3db7f
|
✔ Add a test that checks if plain text files with BOM are read correctly
|
2025-04-24 16:44:29 +02:00 |
|
|
14a4bc56d8
|
🐛 Add --plain-encoding option to dinglehopper-extract
|
2025-04-22 18:24:35 +02:00 |
|
|
a70260c10e
|
🐛 Use warning() to fix DeprecationWarning
|
2025-04-22 13:57:19 +02:00 |
|
|
224aa02163
|
🚧 Fix help text
|
2025-04-22 13:57:19 +02:00 |
|
|
9db5b4caf5
|
🚧 Add OCR-D parameter for plain text encoding
|
2025-04-22 13:57:19 +02:00 |
|
|
5578ce83a3
|
🚧 Add option for text encoding to line dir cli
|
2025-04-22 13:57:19 +02:00 |
|
|
cf59b951a3
|
🚧 Add option for text encoding to line dir cli
|
2025-04-22 13:57:19 +02:00 |
|
|
480b3cf864
|
✔ Test that CLI produces a complete HTML report
|
2025-04-22 13:57:19 +02:00 |
|
|
f1a586cff1
|
✔ Test line dirs CLI
|
2025-04-22 13:57:18 +02:00 |
|
|
3b16c14c16
|
✔ Properly test line dir finding
|
2025-04-22 13:57:18 +02:00 |
|
|
322faeb26c
|
🎨 Sort imports
|
2025-04-22 13:57:18 +02:00 |
|
|
c37316da09
|
🐛 cli_line_dirs: Fix word differences section
At the time of generation of the section, the {gt,ocr}_words generators
were drained. Fix by using a list.
Fixes gh-124.
|
2025-04-22 13:57:18 +02:00 |
|
|
9414a92f9f
|
🐛 cli_line_dirs: Type-annotate functions
|
2025-04-22 13:57:18 +02:00 |
|
|
68344e48f8
|
🎨 Reformat cli_line_dirs
|
2025-04-22 13:57:18 +02:00 |
|
|
73ee16fe51
|
🚧 Support 'merged' GT+OCR line directories
|
2025-04-22 13:57:18 +02:00 |
|
|
6980d7a252
|
🚧 Use our own removesuffix() as we still support Python 3.8
|
2025-04-22 13:57:18 +02:00 |
|
|
2bf2529c38
|
🚧 Port new line dir functions
|
2025-04-22 13:57:17 +02:00 |
|
|
ad8e6de36b
|
🐛 cli_line_dirs: Fix character diff reports
|
2025-04-22 13:57:17 +02:00 |
|
|
4024e350f7
|
🚧 Test new flexible line dirs functions
|
2025-04-22 13:57:17 +02:00 |
|
|
817e0c95f7
|
📦 v0.10.1
|
2025-04-22 10:32:29 +02:00 |
|
Robert Sachunsky
|
64444dd419
|
opt out of 7f8a8dd5 (uniseg update that requires py39)
|
2025-04-17 16:12:37 +02:00 |
|
|
ef817cb343
|
📦 v0.10.0
|
2025-04-17 08:37:37 +02:00 |
|
kba
|
831a24fc4c
|
typo: report_prefix -> file_id
|
2025-04-17 08:04:52 +02:00 |
|
Konstantin Baierer
|
f6a2c94520
|
ocrd_cli: but do check for existing output files
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
|
2025-04-17 08:04:52 +02:00 |
|
Konstantin Baierer
|
4162836612
|
ocrd_cli: no need to check fileGrp dir exists
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
|
2025-04-17 08:04:52 +02:00 |
|
Konstantin Baierer
|
c0aa82d188
|
OCR-D processor: properly handle missing or non-downloaded GT/OCR file
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
|
2025-04-17 08:04:51 +02:00 |
|
kba
|
63031b30bf
|
Port to OCR-D/core API v3
|
2025-04-16 14:45:16 +02:00 |
|
|
7f8a8dd564
|
🐛 Fix for changed API of uniseg's word_break
|
2025-04-16 09:10:43 +02:00 |
|
|
f2e290dffe
|
🐛 Fix --version option in OCR-D CLI
|
2024-07-19 14:54:46 +02:00 |
|
|
6d1daf1dfe
|
✨ Support --version option in CLI
|
2024-07-19 14:41:54 +02:00 |
|
|
129e6eb427
|
📦 v0.9.7
|
2024-07-11 17:25:38 +02:00 |
|
|
6048107889
|
Merge branch 'master' of https://github.com/qurator-spk/dinglehopper
|
2024-07-11 16:26:29 +02:00 |
|
|
2ee37ed4e3
|
🎨 Sort imports
|
2024-07-11 16:25:38 +02:00 |
|
|
521f034fba
|
Merge pull request #116 from stweil/master
Fix typo
|
2024-07-10 01:13:24 +02:00 |
|
|
4047f8b6e5
|
🐛 Fix loading ocrd-tool.json for Python 3.12
|
2024-07-09 21:01:31 +02:00 |
|
Stefan Weil
|
cd68a973cb
|
Fix typo
Signed-off-by: Stefan Weil <sw@weilnetz.de>
|
2024-05-26 09:18:00 +02:00 |
|
|
b336f98271
|
🐛 Fix reading plain text files
As reported by @tallemeersch in gh-107, newlines were not removed for plain text files.
Fix this by stripping the lines as suggested.
Fixes gh-107.
|
2024-05-06 18:14:16 +02:00 |
|
|
41a0fad352
|
📦 v0.9.6
|
2024-05-06 17:48:48 +02:00 |
|
Stefan Weil
|
79701e410d
|
Fix some typos (found by codespell and typos)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
|
2024-04-29 08:42:17 +02:00 |
|
|
2383730a55
|
✔ Test using empty files
Test edge cases + empty files, e.g. empty text content and a Unicode BOM character.
See also gh-79.
|
2024-04-08 20:33:03 +02:00 |
|
|
edabffec7e
|
🧹 tests: Move comment out of the code (bad style + weird formatting)
|
2024-04-04 19:46:08 +02:00 |
|
|
32d4037533
|
⚙ cli: Annotate types in process_dir()
|
2024-04-04 19:38:27 +02:00 |
|
|
be7c1dd25d
|
🧹 Make from_text_segment()'s textequiv_level keyword-only
|
2024-03-27 21:09:34 +01:00 |
|
|
932bfafc7d
|
🧹 Make process_dir() keyword arguments keyword-only
|
2024-03-27 19:44:09 +01:00 |
|
|
c29a80bc81
|
📦 v0.9.5
|
2024-03-27 18:49:13 +01:00 |
|
|
5d9f0c482f
|
🐛 Check that we always get a valid ALTO namespace (satisfies mypy)
|
2024-03-27 17:57:53 +01:00 |
|
|
19d1a00817
|
🎨 Reformat (Black)
|
2024-03-27 17:36:05 +01:00 |
|
|
4d4ead4cc8
|
🐛 Fix word segmentation with uniseg 0.8.0
|
2024-03-26 19:34:22 +01:00 |
|