mirror of https://github.com/qurator-spk/dinglehopper.git synced 2025-06-09 03:40:12 +02:00
Commit graph

308 commits

Author SHA1 Message Date
bc6754d0cb ⚙ ruff: Ignore F811 (no redefinitions) for now, as ruff considers the multimethods redefinitions 2023-08-03 19:53:29 +02:00
e4431797e6 🎨 Reformat comments + strings manually (not auto-fixed by Black) 2023-08-03 19:46:01 +02:00
704e7cca1c ⬆ Use f-strings 2023-08-03 19:44:40 +02:00
bea56117ae 🎨 Reformat using Black 2023-08-03 19:25:44 +02:00
d50d624554 🎨 Sort imports (auto-fixed by ruff) 2023-08-03 19:21:21 +02:00
5b20fb24a1 ⚙ Add pre-commit 2023-08-03 19:13:21 +02:00
32bd1896e0 🛠 Replace flake8 + pylint with ruff
As ruff is a lot faster than the other options, use this for code style checks etc. This
change also removes setup.cfg (See also: gh-85).
2023-08-03 19:03:52 +02:00
e8e58e76c4 ⚙ Move mypy settings to pyproject.toml 2023-08-03 18:35:25 +02:00
84a05170ba ⚙ pytest.ini → pyproject.toml 2023-08-03 18:06:13 +02:00
69325facf2 🐛 Detect encoding (incl BOM) when reading files
As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with
a BOM well. Fix this by using chardet to detect the encoding, which also detects the BOM,
and by reading the files with the proper encoding so that the BOM is not included in the
resulting extracted text.

Fixes gh-80.
2023-08-03 17:48:13 +02:00
325e5af5f5 🐛 Move source into src/ to fix install
Installing was broken since moving to pyproject.toml, which we didn't notice because of
leftover files in build/. Fix this by using the convention of having the source files
in src/ and adjusting pyproject.toml accordingly.

Fixes gh-86. 🤞
2023-08-03 17:29:30 +02:00
db7c051b22 ⚙ Migrate to pyproject.toml 2023-08-02 20:55:47 +02:00
fc81233a0e 🚧 CircleCI: Run black 2023-07-18 20:41:16 +02:00
cb0134d2db 🚧 CircleCI: Run black 2023-07-18 20:40:17 +02:00
55d534b981 🚧 CircleCI: Run black 2023-07-18 20:37:47 +02:00
2632cb09b8 🚧 CircleCI: Run black 2023-07-18 20:28:55 +02:00
35be58cb94
Merge pull request #83 from INL/feat/batch-processing
Add batch processing and report summaries
2023-05-26 15:28:36 +02:00
6d3a8cecd2
Merge pull request #82 from CircleCI-config-suggestions-bot/StoreTestResults
Update .circleci/config.yml to use store_test_results
2023-05-24 18:50:40 +02:00
Ruud de Jong
207804e6a6 Add batch processing and report summaries 2023-05-12 09:55:00 +02:00
CircleCI Config Suggestions Bot
89814cbe4b Upload test results to CircleCI 2023-05-05 14:21:14 -04:00
dd9303b429 🧹 .gitignore .python-version (for pyenv) 2023-04-20 20:15:44 +02:00
f1fc3f1880 🧹 Remove qurator. namespace prefix 2023-03-27 18:25:39 +02:00
f668963a2e 🐛 Fix installing by calling find_namespace_packages in setup.py
Turns out just removing __init__.py is not enough for native namespace
packages. We also need to (explicitly) call setuptools.find_namespace_packages()
for setup.py to find the package...

https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages

Fixes gh-77.
2023-03-27 14:35:08 +02:00
c4ab7c9a7c 🕸 Do not use deprecated ID, pageId options
See gh-75.
2023-03-14 13:16:28 +01:00
b4ac24ac9d 🔧 Remove explicit namespace_packages
Fixes gh-76.
2023-03-14 12:59:13 +01:00
2a090c9b5a ✔ CircleCI: Explicitly install binary opencv-python-headless (dep of OCR-D?) to avoid compilation 2023-03-14 12:49:02 +01:00
833efa37da 🐛 Remove deprecated declare_namespace call
Remove deprecated declare_namespace call and use implicit namespace (PEP-0420).

Fixes gh-76.
2023-03-14 12:44:22 +01:00
0fd4ea1973 ✔ Add @cneud's former 40 GB problem files to the test suite 2023-03-02 16:24:08 +01:00
0f0819512e 🎨 Reformat using Black 2023-03-02 10:22:51 +01:00
2268f32a78 ✔ CircleCI: Test on Python 3.11 2023-03-02 10:06:00 +01:00
dcc10c5389 ✔️ Skip test_lines_similar() for now
test_lines_similar() fails with rapidfuzz 2.5 and is flawed anyway:

The test was based on our own implementation that used __eq__ and not __hash__ as
rapidfuzz does. Need to review this in the future.
2022-08-18 15:51:16 +02:00
555f586775 📝 Note that old terminals might not render the Unicode characters correctly 2022-08-17 17:59:15 +02:00
c4e85da5ab 🐛 Update editops() and seq_align() due to RapidFuzz API changes 2022-08-17 17:55:44 +02:00
15dfbac3a7 Revert "Revert "Merge pull request #67 from maxbachmann/rapidfuzz""
This reverts commit 76bd50f1db.
2022-08-17 11:42:19 +02:00
ede9402a6c Revert "💩 Stick with rapidfuzz < 2.1.0 for now"
This reverts commit 0e153db9ca.
2022-08-17 11:42:07 +02:00
0e153db9ca 💩 Stick with rapidfuzz < 2.1.0 for now 2022-08-16 19:34:48 +02:00
76bd50f1db Revert "Merge pull request #67 from maxbachmann/rapidfuzz"
This reverts commit 85f751aacc, reversing
changes made to 1febea8c92.
2022-08-16 19:31:28 +02:00
85f751aacc
Merge pull request #67 from maxbachmann/rapidfuzz
replace usage of deprecated rapidfuzz APIs
2022-08-16 16:35:54 +02:00
Max Bachmann
e543438496 replace usage of deprecated rapidfuzz APIs 2022-08-07 10:40:31 +02:00
1febea8c92
Merge pull request #66 from stweil/master
Ignore Python build artifacts
2022-03-30 13:40:36 +02:00
Stefan Weil
101f50ec88 Ignore Python build artifacts
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2022-03-24 16:51:37 +01:00
edc24cd4db ✔️ DroneCI: Build on Python 3.6 → 3.10
2022-03-03 16:35:26 +01:00
d726396002 👷🏾‍♂️ Remove str() on Path objects
As of Python 3.6 we don't need to call str() on Path objects anymore.

See also gh-20.
2022-03-02 11:19:40 +01:00
a19224dc46 ✔️ CircleCI: Stop testing using Python 3.5
The latest rapidfuzz updates broke Python 3.5 support. As it has been EOL for some time now,
we are stopping testing with it.

See also gh-65 and gh-20.
2022-02-28 14:46:34 +01:00
76bacc0f15 🐛 Bump rapidfuzz dep to >= 2.0.5 (Fixes gh-65) 2022-02-28 14:35:54 +01:00
195354c6d4 Merge branch 'feat/compare-line-texts'
2022-01-24 18:46:33 +01:00
8a3f5e48c2 🐛 dinglehopper: Patch word_break only once
Previously, we (accidentally) patched uniseg's word_break on every call
to words(). Do it only once.
2022-01-24 18:44:30 +01:00
b6bde2b7ec 📝 dinglehopper: Document dinglehopper-line-dirs in the README
2021-12-15 11:16:40 +01:00
f77ce857b2 🚧 dinglehopper: Share json_float code
2021-12-14 18:37:07 +01:00
5b394649a7 🚧 dinglehopper: Compute WER in line-dirs CLI 2021-12-14 18:33:20 +01:00