f4dff64066
🚧 GitHub Actions: Test report
2023-08-04 17:35:43 +02:00
043dbb1749
🚧 GitHub Actions: Test report
2023-08-04 17:33:05 +02:00
207fcd4054
🚧 GitHub Actions: Test report
2023-08-04 17:30:26 +02:00
f7c280e59c
🚧 GitHub Actions: Try shell for loop to install from all requirements*.txt
2023-08-04 16:42:46 +02:00
d42e59846c
🚧 GitHub Actions: Try shell for loop to install from all requirements*.txt
2023-08-04 16:41:03 +02:00
ab1359c7e6
🚧 GitHub Actions: Rework test, run in src/
2023-08-04 16:35:36 +02:00
77bcecd2d0
🚧 GitHub Actions: Allow running test manually
2023-08-04 16:21:00 +02:00
c1b8d983e6
🚧 GitHub Actions: Rename test workflow, also run on schedule
2023-08-04 16:17:21 +02:00
674d833b15
🚧 GitHub Actions: Add test workflow
2023-08-04 16:12:33 +02:00
caef84cebb
🚧 GitHub Actions: Add release workflow
2023-08-04 16:10:54 +02:00
1dad18909c
🧹 Make dinglehopper.* exports explicit
2023-08-03 20:27:17 +02:00
bc6754d0cb
⚙ ruff: Ignore F811 (no redefinitions) for now, as ruff considers the multimethods to be redefinitions
2023-08-03 19:53:29 +02:00
e4431797e6
🎨 Reformat comments + strings manually (not auto-fixed by Black)
2023-08-03 19:46:01 +02:00
704e7cca1c
⬆ Use f-strings
2023-08-03 19:44:40 +02:00
bea56117ae
🎨 Reformat using Black
2023-08-03 19:25:44 +02:00
d50d624554
🎨 Sort imports (auto-fixed by ruff)
2023-08-03 19:21:21 +02:00
5b20fb24a1
⚙ Add pre-commit
2023-08-03 19:13:21 +02:00
32bd1896e0
🛠 Replace flake8 + pylint with ruff
...
As ruff is a lot faster than the other options, use this for code style checks etc. This
change also removes setup.cfg (See also: gh-85).
2023-08-03 19:03:52 +02:00
e8e58e76c4
⚙ Move mypy settings to pyproject.toml
2023-08-03 18:35:25 +02:00
84a05170ba
⚙ pytest.ini → pyproject.toml
2023-08-03 18:06:13 +02:00
69325facf2
🐛 Detect encoding (incl BOM) when reading files
...
As @imlabormitlea-code reported in gh-79, dinglehopper did not handle text files with
a BOM well. Fix this by using chardet to detect the encoding, which also detects the
BOM, and by using the proper encoding to read the files, so that the BOM is not
included in the resulting extracted text.
Fixes gh-80.
2023-08-03 17:48:13 +02:00
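The BOM handling described in the commit above can be illustrated with a minimal sketch. Note that this is not the code from the commit: the commit uses chardet, while this sketch swaps in a plain BOM check from the Python standard library (`codecs`) so that it runs without extra dependencies. The function name `decode_without_bom` and the `fallback` parameter are hypothetical, chosen only for illustration.

```python
import codecs

# Known BOMs and a codec that consumes each one while decoding.
# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_without_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes, dropping a leading BOM if one is present.

    Hypothetical helper: it only looks at the BOM and does not guess
    the encoding of BOM-less input the way chardet would.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # These codecs strip the BOM from the decoded text.
            return data.decode(encoding)
    return data.decode(fallback)


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + "Zeile mit Umlaut: ä".encode("utf-8")
    text = decode_without_bom(raw)
    # The decoded text does not start with U+FEFF.
    print(repr(text))
```

The point of the sketch is only that the BOM must be consumed by the decoder rather than passed through as a U+FEFF character, which would otherwise end up in the extracted text and count as a difference when comparing against the ground truth.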
325e5af5f5
🐛 Move source into src/ to fix install
...
Installing was broken since moving to pyproject.toml, which we didn't notice because of
leftover files in build/. Fix this by using the convention of having the source files
in src/ and adjusting pyproject.toml accordingly.
Fixes gh-86. 🤞
2023-08-03 17:29:30 +02:00
db7c051b22
⚙ Migrate to pyproject.toml
2023-08-02 20:55:47 +02:00
fc81233a0e
🚧 CircleCI: Run black
2023-07-18 20:41:16 +02:00
cb0134d2db
🚧 CircleCI: Run black
2023-07-18 20:40:17 +02:00
55d534b981
🚧 CircleCI: Run black
2023-07-18 20:37:47 +02:00
2632cb09b8
🚧 CircleCI: Run black
2023-07-18 20:28:55 +02:00
35be58cb94
Merge pull request #83 from INL/feat/batch-processing
...
Add batch processing and report summaries
2023-05-26 15:28:36 +02:00
6d3a8cecd2
Merge pull request #82 from CircleCI-config-suggestions-bot/StoreTestResults
...
Update .circleci/config.yml to use store_test_results
2023-05-24 18:50:40 +02:00
Ruud de Jong
207804e6a6
Add batch processing and report summaries
2023-05-12 09:55:00 +02:00
CircleCI Config Suggestions Bot
89814cbe4b
Upload test results to CircleCI
2023-05-05 14:21:14 -04:00
dd9303b429
🧹 .gitignore .python-version (for pyenv)
2023-04-20 20:15:44 +02:00
f1fc3f1880
🧹 Remove qurator. namespace prefix
2023-03-27 18:25:39 +02:00
f668963a2e
🐛 Fix installing by calling find_namespace_packages in setup.py
...
Turns out just removing __init__.py is not enough for native namespace
packages. We also need to (explicitly) call setuptools.find_namespace_packages()
for setup.py to find the package...
https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages
Fixes gh-77.
2023-03-27 14:35:08 +02:00
c4ab7c9a7c
🕸 Do not use deprecated ID, pageId options
...
See gh-75.
2023-03-14 13:16:28 +01:00
b4ac24ac9d
🔧 Remove explicit namespace_packages
...
Fixes gh-76.
2023-03-14 12:59:13 +01:00
2a090c9b5a
✔ CircleCI: Explicitly install binary opencv-python-headless (dep of OCR-D?) to avoid compilation
2023-03-14 12:49:02 +01:00
833efa37da
🐛 Remove deprecated declare_namespace call
...
Remove deprecated declare_namespace call and use an implicit namespace package (PEP 420).
Fixes gh-76.
2023-03-14 12:44:22 +01:00
0fd4ea1973
✔ Add @cneud's former 40 GB problem files to the test suite
2023-03-02 16:24:08 +01:00
0f0819512e
🎨 Reformat using Black
2023-03-02 10:22:51 +01:00
2268f32a78
✔ CircleCI: Test on Python 3.11
2023-03-02 10:06:00 +01:00
a18b25b163
🐛 Update tests for ExtractedText
...
In PR gh-72, @maxbachmann introduced a new argument for ExtractedText(). Update the
corresponding tests.
2023-01-27 19:13:45 +01:00
Max Bachmann
f48e305347
use uniseg again
2022-10-12 18:52:58 +02:00
Max Bachmann
d2bbc8a6c7
update rapidfuzz version
2022-09-11 02:38:32 +02:00
Max Bachmann
a1f0a5e2d3
replace uniseg with uniseg2
2022-08-29 22:08:25 +02:00
Max Bachmann
22c3817f45
apply black
2022-08-29 01:50:19 +02:00
Max Bachmann
01571f23b7
move grapheme clusters to ExtractedText
2022-08-29 01:49:04 +02:00
Max Bachmann
f211d09f56
remove python2.7 futures
2022-08-29 00:50:33 +02:00
Max Bachmann
205a969c0e
remove unused includes
2022-08-29 00:48:40 +02:00
Max Bachmann
f3825cdeb6
only call words_normalized once
2022-08-29 00:22:23 +02:00