Robert Sachunsky
64444dd419
opt out of 7f8a8dd5
(uniseg update that requires py39)
2025-04-17 16:12:37 +02:00
f287386c0e
🧹 Don't pin uniseg and rapidfuzz
...
Breakage with the newest uniseg API was fixed in master.
Can't see any issue with rapidfuzz, so removing that pin, too.
2025-04-16 14:49:23 +02:00
7f8a8dd564
🐛 Fix for changed API of uniseg's word_break
2025-04-16 09:10:43 +02:00
4047f8b6e5
🐛 Fix loading ocrd-tool.json for Python 3.12
2024-07-09 21:01:31 +02:00
e34adbf41c
🐛 Fix Python 3.12 support by requiring ocrd >= 2.65.0
2024-05-06 16:59:18 +02:00
4d4ead4cc8
🐛 Fix word segmentation with uniseg 0.8.0
2024-03-26 19:34:22 +01:00
38fcbc8e1c
Merge branch 'master' into performance
2024-01-02 20:22:38 +01:00
68a12f8f7f
⬆ Update uniseg dependency
...
@maxbachmann also improved the performance of uniseg, and it is in 0.7.2 - update our
dependency.
2023-11-01 13:48:07 +01:00
7ed076d3c1
⬆ Update multimethod dependency
...
We had some issues while reviewing/rebasing #72. We don't support Python 3.5 anymore,
so lifting the hard pin on multimethod 1.3.
2023-10-27 18:43:27 +02:00
d8f84ec9ac
🧹 Remove old six dependency (workaround for OCR-D/core#730)
2023-10-23 15:53:14 +02:00
1c3b28d873
⬆ Update multimethod dependency
...
We had some issues while reviewing/rebasing #72. We don't support Python 3.5 anymore,
so lifting the hard pin on multimethod 1.3.
2023-10-23 15:26:20 +02:00
69325facf2
🐛 Detect encoding (incl BOM) when reading files
...
As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with
a BOM well. Fix this by using chardet to detect the encoding (which also detects the
BOM) and reading the files with that encoding, so that the BOM is not included in the
extracted text.
Fixes gh-80.
2023-08-03 17:48:13 +02:00
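The BOM handling described in commit 69325facf2 can be illustrated with a minimal sketch. This is not dinglehopper's actual code: the commit uses chardet, whereas the sketch below swaps in a plain standard-library BOM check to stay self-contained. The helper name `decode_without_bom` and the UTF-8 fallback are assumptions made for illustration.

```python
import codecs

# Hypothetical helper, not taken from dinglehopper. Known BOMs are checked
# with the UTF-32 variants first, because the UTF-32 LE BOM starts with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_without_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes, dropping a leading BOM if one is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM bytes so they do not end up in the text.
            return data[len(bom):].decode(encoding)
    # No BOM found: fall back to the assumed default encoding.
    return data.decode(default)
```

A statistical detector such as chardet additionally covers files without a BOM in legacy encodings, which this sketch does not attempt.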
Max Bachmann
f48e305347
use uniseg again
2022-10-12 18:52:58 +02:00
Max Bachmann
d2bbc8a6c7
update rapidfuzz version
2022-09-11 02:38:32 +02:00
Max Bachmann
a1f0a5e2d3
replace uniseg with uniseg2
2022-08-29 22:08:25 +02:00
15dfbac3a7
Revert "Revert "Merge pull request #67 from maxbachmann/rapidfuzz""
...
This reverts commit 76bd50f1db.
2022-08-17 11:42:19 +02:00
ede9402a6c
Revert "💩 Stick with rapidfuzz < 2.1.0 for now"
...
This reverts commit 0e153db9ca.
2022-08-17 11:42:07 +02:00
0e153db9ca
💩 Stick with rapidfuzz < 2.1.0 for now
2022-08-16 19:34:48 +02:00
76bd50f1db
Revert "Merge pull request #67 from maxbachmann/rapidfuzz"
...
This reverts commit 85f751aacc, reversing
changes made to 1febea8c92.
2022-08-16 19:31:28 +02:00
Max Bachmann
e543438496
replace usage of deprecated rapidfuzz APIs
2022-08-07 10:40:31 +02:00
76bacc0f15
🐛 Bump rapidfuzz dep to >= 2.0.5 (Fixes gh-65)
2022-02-28 14:35:54 +01:00
f0f3cd2d96
⬆️ dinglehopper: Require rapidfuzz >= 1.9.1
...
See https://github.com/qurator-spk/dinglehopper/issues/64.
2021-12-14 11:36:00 +01:00
a5c9c7438f
💩 ocrd-galley: Work around OCR-D/core#730
...
OCR-D/core currently needs six until the next release. Fix the build by
requiring it here.
2021-11-05 17:05:54 +01:00
af8da1d716
⚡ dinglehopper: Use rapidfuzz for editops
2021-10-22 15:38:59 +02:00
8cd8314c8a
🐛 dinglehopper: Bump up ocrd req for zip_input_files
...
See also GH-49.
2020-11-19 18:59:47 +01:00
f2367ac0c3
🐛 Fix OCR-D CLI for newest OCR-D
...
Now that find_files() is a generator, we can't use [0] to get the file.
2020-10-16 14:58:27 +02:00
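The problem named in commit f2367ac0c3 (indexing the result of a generator) can be reproduced with a small sketch. The `find_files` function below is a hypothetical stand-in that only mimics the generator behaviour; it is not the OCR-D API, and the file names are made up. How the commit itself fetched the file is not shown in this log, so both variants are just common options.

```python
from typing import Iterator


def find_files() -> Iterator[str]:
    # Hypothetical stand-in for a lookup that yields results lazily.
    yield "page_0001.xml"
    yield "page_0002.xml"


# Indexing a generator fails, which is the breakage the commit describes.
try:
    find_files()[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Option 1: take the first item lazily; None signals "no match".
first = next(find_files(), None)

# Option 2: materialise everything, then index as before.
also_first = list(find_files())[0]
```

`next()` with a default avoids building the whole list and does not raise on an empty result, while `list(...)[0]` keeps the old indexing style at the cost of consuming the generator fully.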
5ed184c8c4
✨ dinglehopper: Show a progressbar on --progress
2020-10-15 16:09:54 +02:00
f50591abac
Merge branch 'feat/display-segment-id'
2020-10-08 13:39:38 +02:00
b14c35e147
🎨 dinglehopper: Use multimethod to handle str vs ExtractedText
2020-10-08 12:15:58 +02:00
Konstantin Baierer
004ae298ca
ocrd cli: use make_file_id and assert_file_grp_cardinality
2020-08-07 18:00:33 +02:00
2c69e077fe
🚧 dinglehopper: WIP data structure for extracted text
2020-06-18 13:27:59 +02:00
cdfd4d321d
🐛 dinglehopper: Add missing requirement MarkupSafe
2020-06-12 20:46:51 +02:00
48a31ce672
Revert "Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector"
...
This reverts commit 2c89bf3b35ee290d7b830ef270df3a96aa48245e, reversing
changes made to 9f7e413148ca5dbac9b555d7b0d0a5fa3a0f5340.
2019-12-09 12:44:05 +01:00
b-vr103
1303a7d92f
Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector
2019-12-09 11:57:16 +01:00
02a0e093bf
✨ dinglehopper: Add OCR-D interface
2019-08-15 17:42:56 +02:00
89048bf55d
➡ Move dinglehopper into its own directory
2019-08-14 15:32:50 +02:00