Mirror of https://github.com/qurator-spk/dinglehopper.git (synced 2025-06-09 20:00:01 +02:00)
🐛 Detect encoding (incl BOM) when reading files
As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with a BOM well. Fix this by using chardet to detect the encoding. chardet also detects the BOM, so the files are read with the proper encoding and the BOM is not included in the extracted text. Fixes gh-80.
parent 325e5af5f5
commit 69325facf2
2 changed files with 10 additions and 2 deletions
@@ -11,3 +11,4 @@ multimethod == 1.3 # latest version to officially support Python 3.5
 tqdm
 rapidfuzz >= 2.4.2
 six # XXX workaround OCR-D/core#730
+chardet