mirror of https://github.com/qurator-spk/dinglehopper.git synced 2025-06-09 20:00:01 +02:00
Commit graph

36 commits

Author SHA1 Message Date
Robert Sachunsky
64444dd419 opt out of 7f8a8dd5 (uniseg update that requires py39) 2025-04-17 16:12:37 +02:00
f287386c0e 🧹Don't pin uniseg and rapidfuzz
Breakage with the newest uniseg API was fixed in master.

Can't see any issue with rapidfuzz, so removing that pin, too.
2025-04-16 14:49:23 +02:00
7f8a8dd564 🐛 Fix for changed API of uniseg's word_break 2025-04-16 09:10:43 +02:00
4047f8b6e5 🐛 Fix loading ocrd-tool.json for Python 3.12 2024-07-09 21:01:31 +02:00
e34adbf41c 🐛 Fix Python 3.12 support by requiring ocrd >= 2.65.0 2024-05-06 16:59:18 +02:00
4d4ead4cc8 🐛 Fix word segmentation with uniseg 0.8.0 2024-03-26 19:34:22 +01:00
38fcbc8e1c Merge branch 'master' into performance 2024-01-02 20:22:38 +01:00
68a12f8f7f ⬆ Update uniseg dependency
@maxbachmann also improved the performance of uniseg, and it is in 0.7.2 - update our
dependency.
2023-11-01 13:48:07 +01:00
7ed076d3c1 ⬆ Update multimethod dependency
We had some issues while reviewing/rebasing #72. We don't support Python 3.5 anymore,
so lifting the hard pin on multimethod 1.3.
2023-10-27 18:43:27 +02:00
d8f84ec9ac 🧹 Remove old six dependency (workaround for OCR-D/core#730) 2023-10-23 15:53:14 +02:00
1c3b28d873 ⬆ Update multimethod dependency
We had some issues while reviewing/rebasing #72. We don't support Python 3.5 anymore,
so lifting the hard pin on multimethod 1.3.
2023-10-23 15:26:20 +02:00
69325facf2 🐛 Detect encoding (incl BOM) when reading files
As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with
BOM well. Fix this by using chardet to detect an encoding, which also detects the BOM
and use the proper encoding to read the files, not including the BOM in the resulting
extracted text.

Fixes gh-80.
2023-08-03 17:48:13 +02:00
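The commit above uses chardet to detect the encoding. The sketch below is a hedged, stdlib-only illustration of just the BOM-handling part, and it is not dinglehopper's actual code. It checks for a known byte-order mark, decodes with the matching codec, and leaves the mark out of the resulting text. `read_text_without_bom` is a hypothetical name. Files without a BOM would still need a detector such as chardet; here they simply fall back to a fixed encoding.

```python
import codecs

# Known byte-order marks and their codecs. UTF-32 is checked before UTF-16
# because the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_text_without_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes, dropping a leading BOM if present (hypothetical helper)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM bytes so the mark never ends up in the extracted text.
            return data[len(bom):].decode(encoding)
    # No BOM: a real implementation would ask a detector such as chardet here.
    return data.decode(fallback)
```

Python's built-in `utf-8-sig` codec does the same thing for UTF-8 alone. The table above also covers UTF-16 and UTF-32 files.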
Max Bachmann
f48e305347 use uniseg again 2022-10-12 18:52:58 +02:00
Max Bachmann
d2bbc8a6c7 update rapidfuzz version 2022-09-11 02:38:32 +02:00
Max Bachmann
a1f0a5e2d3 replace uniseg with uniseg2 2022-08-29 22:08:25 +02:00
15dfbac3a7 Revert "Revert "Merge pull request #67 from maxbachmann/rapidfuzz""
This reverts commit 76bd50f1db.
2022-08-17 11:42:19 +02:00
ede9402a6c Revert "💩 Stick with rapidfuzz < 2.1.0 for now"
This reverts commit 0e153db9ca.
2022-08-17 11:42:07 +02:00
0e153db9ca 💩 Stick with rapidfuzz < 2.1.0 for now 2022-08-16 19:34:48 +02:00
76bd50f1db Revert "Merge pull request #67 from maxbachmann/rapidfuzz"
This reverts commit 85f751aacc, reversing
changes made to 1febea8c92.
2022-08-16 19:31:28 +02:00
Max Bachmann
e543438496 replace usage of deprecated rapidfuzz APIs 2022-08-07 10:40:31 +02:00
76bacc0f15 🐛 Bump rapidfuzz dep to >= 2.0.5 (Fixes gh-65) 2022-02-28 14:35:54 +01:00
f0f3cd2d96 ⬆️ dinglehopper: Require rapidfuzz >= 1.9.1
See https://github.com/qurator-spk/dinglehopper/issues/64.
2021-12-14 11:36:00 +01:00
a5c9c7438f 💩 ocrd-galley: Work around OCR-D/core#730
OCR-D/core currently needs six until the next release. Fix the build by
requiring it here.
2021-11-05 17:05:54 +01:00
af8da1d716 dinglehopper: Use rapidfuzz for editops 2021-10-22 15:38:59 +02:00
8cd8314c8a 🐛 dinglehopper: Bump up ocrd req for zip_input_files
See also GH-49.
2020-11-19 18:59:47 +01:00
f2367ac0c3 🐛 Fix OCR-D CLI for newest OCR-D
Now that find_files() is a generator, we can't use [0] to get the file.
2020-10-16 14:58:27 +02:00
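The fix above exists because generators cannot be indexed. The sketch below is a minimal illustration using a hypothetical stand-in, `find_files_stub`, rather than the real OCR-D `find_files`, whose signature is not reproduced here. It shows why `[0]` fails on a generator and how `next()` takes the first result instead.

```python
from typing import Iterator


def find_files_stub(file_grp: str) -> Iterator[str]:
    """Hypothetical stand-in for a lookup that yields matches lazily."""
    for n in range(2):
        yield f"{file_grp}_{n:04d}.xml"


files = find_files_stub("OCR-D-GT")
# files[0] would raise TypeError: 'generator' object is not subscriptable.
# next() consumes only the first item; the default avoids StopIteration
# when nothing matches.
first = next(files, None)
```

Passing a default to `next()` turns "no matching file" into `None` instead of an exception. If a missing file should be an error, call `next(files)` without a default.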
5ed184c8c4 dinglehopper: Show a progressbar on --progress 2020-10-15 16:09:54 +02:00
f50591abac Merge branch 'feat/display-segment-id' 2020-10-08 13:39:38 +02:00
b14c35e147 🎨 dinglehopper: Use multimethod to handle str vs ExtractedText 2020-10-08 12:15:58 +02:00
Konstantin Baierer
004ae298ca ocrd cli: use make_file_id and assert_file_grp_cardinality 2020-08-07 18:00:33 +02:00
2c69e077fe 🚧 dinglehopper: WIP data structure for extracted text 2020-06-18 13:27:59 +02:00
cdfd4d321d 🐛 dinglehopper: Add missing requirement MarkupSafe 2020-06-12 20:46:51 +02:00
48a31ce672 Revert "Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector"
This reverts commit 2c89bf3b35ee290d7b830ef270df3a96aa48245e, reversing
changes made to 9f7e413148ca5dbac9b555d7b0d0a5fa3a0f5340.
2019-12-09 12:44:05 +01:00
b-vr103
1303a7d92f Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector 2019-12-09 11:57:16 +01:00
02a0e093bf dinglehopper: Add OCR-D interface 2019-08-15 17:42:56 +02:00
89048bf55d ➡ Move dinglehopper into its own directory 2019-08-14 15:32:50 +02:00