Commit Graph

289 Commits (7fef02bf0aa5d7a5abd86bc58f141a9856a795af)
 

Author SHA1 Message Date
Mike Gerber 7fef02bf0a ✔ Add mets:FLocat's @LOCTYPE/OTHERLOCTYPE to test data
Newest OCR-D wasn't happy with the test data anymore (see gh-89). I'm not sure if the
test data was invalid the way it was, but having a LOCTYPE certainly is "prettier" so
adding it. This fixes the test again.
Mike Gerber 7ed076d3c1 ⬆ Update multimethod dependency
We had some issues while reviewing/rebasing . We don't support Python 3.5 anymore,
so lifting the hard pin on multimethod 1.3.
Gerber, Mike a18b25b163 🐛 Update tests for ExtractedText
In PR gh-72, @maxbachmann introduced a new argument for ExtractedText(). Update the
corresponding tests.
Max Bachmann f48e305347 use uniseg again
Max Bachmann d2bbc8a6c7 update rapidfuzz version
Max Bachmann a1f0a5e2d3 replace uniseg with uniseg2
Max Bachmann 22c3817f45 apply black
Max Bachmann 01571f23b7 move grapheme clusters to ExtractedText
Max Bachmann f211d09f56 remove python2.7 futures
Max Bachmann 205a969c0e remove unused includes
Max Bachmann f3825cdeb6 only call `words_normalized` once
Gerber, Mike dcc10c5389 ✔️ Skip test_lines_similar() for now
test_lines_similar() fails with rapidfuzz 2.5 and is flawed anyway:

The test was based on our own implementation that used __eq__ and not __hash__ as
rapidfuzz does. Need to review this in the future.
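
The difference matters because equality-based and hash-based comparison can disagree for the same pair of objects. A minimal sketch, assuming a hypothetical `FuzzyLine` class (not part of dinglehopper or rapidfuzz) whose `__eq__` is deliberately looser than its `__hash__`:

```python
class FuzzyLine:
    """Hypothetical wrapper: equality ignores case, the hash does not."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, FuzzyLine):
            return NotImplemented
        # "Similar" lines compare equal, here simply by ignoring case.
        return self.text.casefold() == other.text.casefold()

    def __hash__(self):
        # Inconsistent with __eq__ on purpose, to show the mismatch.
        return hash(self.text)


a = FuzzyLine("Hello")
b = FuzzyLine("hello")

equal_by_eq = a == b                # goes through __eq__
equal_by_hash = hash(a) == hash(b)  # what a hash-based comparison sees
```

Code that relies on `__eq__` treats the two lines as the same, while code that compares hashes treats them as different, so a test written against one behaviour can fail against the other.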
Gerber, Mike 555f586775 📝 Note that old terminals might not render the Unicode characters correctly
Gerber, Mike c4e85da5ab 🐛 Update editops() and seq_align() due to RapidFuzz API changes
Gerber, Mike 15dfbac3a7 Revert "Revert "Merge pull request from maxbachmann/rapidfuzz""
This reverts commit 76bd50f1db.
Gerber, Mike ede9402a6c Revert "💩 Stick with rapidfuzz < 2.1.0 for now"
This reverts commit 0e153db9ca.
Gerber, Mike 0e153db9ca 💩 Stick with rapidfuzz < 2.1.0 for now
Gerber, Mike 76bd50f1db Revert "Merge pull request from maxbachmann/rapidfuzz"
This reverts commit 85f751aacc, reversing
changes made to 1febea8c92.
Mike Gerber 85f751aacc Merge pull request from maxbachmann/rapidfuzz
replace usage of deprecated rapidfuzz APIs
Max Bachmann e543438496 replace usage of deprecated rapidfuzz APIs
Mike Gerber 1febea8c92 Merge pull request from stweil/master
Ignore Python build artifacts
Stefan Weil 101f50ec88 Ignore Python build artifacts
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Gerber, Mike edc24cd4db ✔️ DroneCI: Build on Python 3.6 → 3.10
Gerber, Mike d726396002 👷🏾‍♂️ Remove str() on Path objects
As of Python 3.6 we don't need to call str() on Path objects anymore.

See also gh-20.
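
For illustration, a short sketch of what this looks like on Python 3.6 or later, where `open()` and the `os.path` functions accept path-like objects. The file name is a made-up example and the snippet is not taken from the dinglehopper code base:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # hypothetical file name

    # open() accepts a Path directly; no str(path) needed.
    with open(path, "w", encoding="utf-8") as f:
        f.write("dinglehopper")

    # os.path functions accept path-like objects as well.
    exists = os.path.exists(path)
    content = path.read_text(encoding="utf-8")

    # os.fspath() gives the plain string form when an API still needs one.
    as_string = os.fspath(path)
```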
Gerber, Mike a19224dc46 ✔️ CircleCI: Stop testing using Python 3.5
The latest rapidfuzz updates broke Python 3.5 support. As it has been EOL for some time now,
we are no longer testing with it.

See also gh-65 and gh-20.
Gerber, Mike 76bacc0f15 🐛 Bump rapidfuzz dep to >= 2.0.5 (Fixes gh-65)
Gerber, Mike 195354c6d4 Merge branch 'feat/compare-line-texts'
Gerber, Mike 8a3f5e48c2 🐛 dinglehopper: Patch word_break only once
Previously, we (accidentally) patched uniseg's word_break on every call
to words(). Do it only once.
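
One common way to make such a patch idempotent is to guard it so that repeated calls do nothing. A minimal sketch with stand-in names: `segmenter`, `patch_word_break`, and this `words` are hypothetical and do not reproduce uniseg's or dinglehopper's actual code.

```python
import functools
import types

# Stand-in for a third-party module whose function we want to wrap.
segmenter = types.SimpleNamespace(word_break=lambda ch: "Other")

patch_count = 0


@functools.lru_cache(maxsize=None)
def patch_word_break():
    """Wrap segmenter.word_break; lru_cache makes repeat calls no-ops."""
    global patch_count
    original = segmenter.word_break

    def patched(ch):
        # Hypothetical tweak: treat one private-use code point as a letter.
        if ch == "\ue000":
            return "ALetter"
        return original(ch)

    segmenter.word_break = patched
    patch_count += 1


def words(text):
    patch_word_break()  # safe to call every time; patches only once
    return text.split()


words("first call")
words("second call")
```

Without the guard, every call to `words()` would wrap the already wrapped function again, so the chain of wrappers would grow with each call.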
Gerber, Mike b6bde2b7ec 📝 dinglehopper: Document dinglehopper-line-dirs in the README
Gerber, Mike f77ce857b2 🚧 dinglehopper: Share json_float code
Gerber, Mike 5b394649a7 🚧 dinglehopper: Compute WER in line-dirs CLI
Gerber, Mike cb2be96179 🚧 dinglehopper: Add word differences in line-dirs report
Gerber, Mike dbb660615a 🚧 dinglehopper: Compare line text directories (WIP)
Gerber, Mike a018006f98 🚧 dinglehopper: Compare line text directories (WIP)
Gerber, Mike 36b36f6986 🚧 dinglehopper: Compare line text directories (WIP)
Gerber, Mike f0f3cd2d96 ⬆️ dinglehopper: Require rapidfuzz >= 1.9.1
See https://github.com/qurator-spk/dinglehopper/issues/64.
Gerber, Mike a5c9c7438f 💩 ocrd-galley: Work around
OCR-D/core currently needs six until the next release. Fix the build by
requiring it here.
Gerber, Mike 7d26b049d1 Merge branch 'fix/ci-py310'
Gerber, Mike 51a44895dc ⬆️ CircleCI: Add Python 3.10
Gerber, Mike 1f8fa5176f Revert "⬆️ CircleCI: Add Python 3.10"
This reverts commit b2b21839c2.
Gerber, Mike b2b21839c2 ⬆️ CircleCI: Add Python 3.10
Gerber, Mike 7d85e21cbc ⬆️ CircleCI: Switch to the new cimg/python image
Gerber, Mike dea0c53f88 Merge branch 'rapidfuzz'
Gerber, Mike 06ea38449c 📝 dinglehopper: Update Levenshtein notebook
Gerber, Mike 3ee688001a 🧹 dinglehopper: Directly import levenshtein() from rapidfuzz
Gerber, Mike 5d496df267 dinglehopper: Remove tests that only test rapidfuzz's levenshtein()
Gerber, Mike 091f069b3c dinglehopper: Remove tests that only test rapidfuzz's levenshtein_ops()
Gerber, Mike af8da1d716 dinglehopper: Use rapidfuzz for editops
Gerber, Mike 249787686f Merge branch 'master' of github.com:qurator-spk/dinglehopper
Gerber, Mike 2a6cc5823e 🐛 dinglehopper: Call initLogging before logging
When using ocrd_utils' getLogger(), we need to call initLogging() before doing any
logging.

Fixes .