mirror of https://github.com/qurator-spk/dinglehopper.git synced 2025-06-09 20:00:01 +02:00

🐛 Detect encoding (incl BOM) when reading files

As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with
a BOM well. Fix this by using chardet to detect the encoding (which also detects the
BOM) and reading the files with that encoding, so that the BOM is not included in the
extracted text.

Fixes gh-80.
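
The approach described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper name `read_text_detect_encoding` and the sample file are hypothetical, and the fallback to `utf-8-sig` when chardet is unavailable or returns no guess is an assumption added to keep the sketch runnable.

```python
import tempfile
from pathlib import Path

try:
    import chardet  # third-party package, the dependency added by this commit
except ImportError:  # assumption: fall back gracefully so the sketch still runs
    chardet = None


def read_text_detect_encoding(path):
    """Hypothetical helper: read a text file using a detected encoding."""
    raw = Path(path).read_bytes()
    # chardet.detect() returns a dict; its "encoding" entry may be None
    # (for example for empty input).
    guess = chardet.detect(raw)["encoding"] if chardet else None
    # For UTF-8 data with a BOM, chardet reports "UTF-8-SIG"; decoding with
    # that codec drops the BOM. "utf-8-sig" also decodes BOM-less UTF-8.
    encoding = guess or "utf-8-sig"
    return raw.decode(encoding)


# Demo: a UTF-8 file that starts with a BOM (EF BB BF).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"\xef\xbb\xbfHello")
    text = read_text_detect_encoding(sample)

print(repr(text))
```

Detecting on the raw bytes before decoding is what lets the BOM be handled by the codec rather than leaking into the text as a leading U+FEFF character.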
Mike Gerber 2023-08-03 17:48:13 +02:00
parent 325e5af5f5
commit 69325facf2
2 changed files with 10 additions and 2 deletions

@@ -11,3 +11,4 @@ multimethod == 1.3 # latest version to officially support Python 3.5
tqdm
rapidfuzz >= 2.4.2
six # XXX workaround OCR-D/core#730
chardet