Mike Gerber
69325facf2
🐛 Detect encoding (incl BOM) when reading files
As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with
a BOM well. Fix this by using chardet to detect the encoding (which also detects the
BOM) and by reading the files with that encoding, so that the BOM is not included in
the extracted text.
Fixes gh-80.
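For illustration only, a minimal stdlib sketch of the BOM problem described above. It does not use chardet and is not the code from this commit; the helper name read_text_without_bom and the UTF-8 fallback are assumptions made for the example.

```python
import codecs


def read_text_without_bom(data: bytes) -> str:
    """Decode bytes and drop a leading BOM, if any (hypothetical helper)."""
    # Check UTF-32 before UTF-16: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, encoding in boms:
        if data.startswith(bom):
            # Strip the BOM bytes so they do not end up in the text
            return data[len(bom):].decode(encoding)
    # No BOM found: this sketch simply assumes UTF-8
    return data.decode("utf-8")


# Decoding naively with "utf-8" would keep the BOM as a leading U+FEFF character
with_bom = codecs.BOM_UTF8 + "Grüße".encode("utf-8")
text = read_text_without_bom(with_bom)
```

A detector such as chardet goes further by guessing encodings that carry no BOM at all, which this sketch does not attempt.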
1 year ago
Gerber, Mike
15dfbac3a7
Revert "Revert "Merge pull request #67 from maxbachmann/rapidfuzz""
This reverts commit 76bd50f1db.
2 years ago
Gerber, Mike
ede9402a6c
Revert "💩 Stick with rapidfuzz < 2.1.0 for now"
This reverts commit 0e153db9ca.
2 years ago
Gerber, Mike
0e153db9ca
💩 Stick with rapidfuzz < 2.1.0 for now
2 years ago
Gerber, Mike
76bd50f1db
Revert "Merge pull request #67 from maxbachmann/rapidfuzz"
This reverts commit 85f751aacc, reversing
changes made to 1febea8c92.
2 years ago
Max Bachmann
e543438496
replace usage of deprecated rapidfuzz APIs
2 years ago
Gerber, Mike
76bacc0f15
🐛 Bump rapidfuzz dep to >= 2.0.5 (Fixes gh-65)
3 years ago
Gerber, Mike
f0f3cd2d96
⬆️ dinglehopper: Require rapidfuzz >= 1.9.1
See https://github.com/qurator-spk/dinglehopper/issues/64.
3 years ago
Gerber, Mike
a5c9c7438f
💩 ocrd-galley: Work around OCR-D/core#730
OCR-D/core currently needs six until the next release. Fix the build by
requiring it here.
3 years ago
Gerber, Mike
af8da1d716
⚡ dinglehopper: Use rapidfuzz for editops
3 years ago
Gerber, Mike
8cd8314c8a
🐛 dinglehopper: Bump up ocrd req for zip_input_files
See also GH-49.
4 years ago
Gerber, Mike
f2367ac0c3
🐛 Fix OCR-D CLI for newest OCR-D
Now that find_files() is a generator, we can't use [0] to get the file.
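A small sketch of the issue, using a stand-in generator rather than the real OCR-D find_files() (the stand-in and its file names are hypothetical):

```python
def find_files():
    """Hypothetical stand-in for a function that now yields files lazily."""
    yield "file_0001.xml"
    yield "file_0002.xml"


# find_files()[0] would raise TypeError: generators do not support indexing.
# next() pulls the first item instead; the default avoids StopIteration if empty.
first = next(find_files(), None)
```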
4 years ago
Gerber, Mike
5ed184c8c4
✨ dinglehopper: Show a progressbar on --progress
4 years ago
Gerber, Mike
f50591abac
Merge branch 'feat/display-segment-id'
4 years ago
Gerber, Mike
b14c35e147
🎨 dinglehopper: Use multimethod to handle str vs ExtractedText
4 years ago
Konstantin Baierer
004ae298ca
ocrd cli: use make_file_id and assert_file_grp_cardinality
4 years ago
Gerber, Mike
2c69e077fe
🚧 dinglehopper: WIP data structure for extracted text
4 years ago
Gerber, Mike
cdfd4d321d
🐛 dinglehopper: Add missing requirement MarkupSafe
4 years ago
Gerber, Mike
48a31ce672
Revert "Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector"
This reverts commit 2c89bf3b35ee290d7b830ef270df3a96aa48245e, reversing
changes made to 9f7e413148ca5dbac9b555d7b0d0a5fa3a0f5340.
5 years ago
b-vr103
1303a7d92f
Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector
5 years ago
Gerber, Mike
02a0e093bf
✨ dinglehopper: Add OCR-D interface
5 years ago
Gerber, Mike
89048bf55d
➡ Move dinglehopper into its own directory
5 years ago