22 Commits (915a647949fe311b0e83d09d8a2b09fbabc6de25)

Author SHA1 Message Date
Mike Gerber 69325facf2 🐛 Detect encoding (incl BOM) when reading files
As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with a
BOM well. Fix this by using chardet to detect the encoding, which also detects the BOM,
and use the proper encoding to read the files, so that the BOM is not included in the
extracted text.

Fixes gh-80.
1 year ago
Gerber, Mike 15dfbac3a7 Revert "Revert "Merge pull request #67 from maxbachmann/rapidfuzz""
This reverts commit 76bd50f1db.
2 years ago
Gerber, Mike ede9402a6c Revert "💩 Stick with rapidfuzz < 2.1.0 for now"
This reverts commit 0e153db9ca.
2 years ago
Gerber, Mike 0e153db9ca 💩 Stick with rapidfuzz < 2.1.0 for now 2 years ago
Gerber, Mike 76bd50f1db Revert "Merge pull request #67 from maxbachmann/rapidfuzz"
This reverts commit 85f751aacc, reversing
changes made to 1febea8c92.
2 years ago
Max Bachmann e543438496 replace usage of deprecated rapidfuzz APIs 2 years ago
Gerber, Mike 76bacc0f15 🐛 Bump rapidfuzz dep to >= 2.0.5 (Fixes gh-65) 3 years ago
Gerber, Mike f0f3cd2d96 ⬆️ dinglehopper: Require rapidfuzz >= 1.9.1
See https://github.com/qurator-spk/dinglehopper/issues/64.
3 years ago
Gerber, Mike a5c9c7438f 💩 ocrd-galley: Work around OCR-D/core#730
OCR-D/core currently needs six until the next release. Fix the build by
requiring it here.
3 years ago
Gerber, Mike af8da1d716 dinglehopper: Use rapidfuzz for editops 3 years ago
Gerber, Mike 8cd8314c8a 🐛 dinglehopper: Bump up ocrd req for zip_input_files
See also GH-49.
4 years ago
Gerber, Mike f2367ac0c3 🐛 Fix OCR-D CLI for newest OCR-D
Now that find_files() is a generator, we can't use [0] to get the file.
4 years ago
Gerber, Mike 5ed184c8c4 dinglehopper: Show a progressbar on --progress 4 years ago
Gerber, Mike f50591abac Merge branch 'feat/display-segment-id' 4 years ago
Gerber, Mike b14c35e147 🎨 dinglehopper: Use multimethod to handle str vs ExtractedText 4 years ago
Konstantin Baierer 004ae298ca ocrd cli: use make_file_id and assert_file_grp_cardinality 4 years ago
Gerber, Mike 2c69e077fe 🚧 dinglehopper: WIP data structure for extracted text 4 years ago
Gerber, Mike cdfd4d321d 🐛 dinglehopper: Add missing requirement MarkupSafe 4 years ago
Gerber, Mike 48a31ce672 Revert "Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector"
This reverts commit 2c89bf3b35ee290d7b830ef270df3a96aa48245e, reversing
changes made to 9f7e413148ca5dbac9b555d7b0d0a5fa3a0f5340.
5 years ago
b-vr103 1303a7d92f Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector 5 years ago
Gerber, Mike 02a0e093bf dinglehopper: Add OCR-D interface 5 years ago
Gerber, Mike 89048bf55d ➡ Move dinglehopper into its own directory 5 years ago