Commit Graph

124 Commits (15dfbac3a73a9f6b6d0a7f7e96b6d12cf5032ff6)

Author SHA1 Message Date
Gerber, Mike ced6504ad0 🎨 dinglehopper: Expose clearing the Levenshtein cache as a function
Gerber, Mike 5cf4eddaeb dinglehopper: Clear Levenshtein cache between OCR-D files
Gerber, Mike 58ff140bc0 dinglehopper: Improve performance by caching the Levenshtein matrix
Motivated by [a pull
request](https://github.com/qurator-spk/dinglehopper/pull/7) by
@JKamlah, implement a cache of the Levenshtein matrix calculation.

We calculated the Levenshtein matrices for characters and words twice:
once for the error rates, once for the alignment.
Gerber, Mike 11a6341641 🧹 dinglehopper: Remove broken implementation of the unordered word error rate
Gerber, Mike f22228840e 🧹 dinglehopper: Use exclusively relative imports in tests
Gerber, Mike d61c076aad 🧹 dinglehopper: Remove debug print()s
Gerber, Mike 12a48f3bfe dinglehopper: Test aligning lists of lines
Gerber, Mike 680c2a2661 🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5, again, and again
Gerber, Mike 7cf1a540f4 🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5, again
Gerber, Mike 49e2065ad6 🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5
Gerber, Mike 86178271df dinglehopper: Fix repeated tests for the OCR-D interface
Gerber, Mike b6f50ef853 dinglehopper: Add a test for the OCR-D interface
Konstantin Baierer 2ca44af31d ocrd-tool: add category
Gerber, Mike c30553985f dinglehopper: Substitute more characters
Gerber, Mike 493541fddf 🐛 dinglehopper: Always work with NFC text
Gerber, Mike df93c80e5d 🐛 dinglehopper: Always work with NFC text
Gerber, Mike 715b813bbc dinglehopper: Add two more eMOP ligatures
Gerber, Mike 8d055e7b6e 🐛 dinglehopper: Work on NFC'ed grapheme clusters when aligning text
Gerber, Mike 534958be1d 🐛 dinglehopper: Fix sorting the reading order
Regions were sorted wrongly when there are more than 9 regions in an
OrderedGroup because the index was sorted alphabetically, not
numerically. Fix this by converting the index to integers.
Gerber, Mike 10f010eaa8 🐛 dinglehopper: Do not throw error if a region ID is not found
The ReadingOrder might contain regions of types other than text regions,
so not finding a TextRegion with the referenced ID is not an error.
Downgrade to a warning for now.
Gerber, Mike 8237b3edaf dinglehopper: Substitute more characters
Gerber, Mike 02a0e093bf dinglehopper: Add OCR-D interface
Gerber, Mike 495919c06d 🧹 dinglehopper: Move pytest.ini
Gerber, Mike 89048bf55d ➡ Move dinglehopper into its own directory
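
The reading-order fix described in 534958be1d can be illustrated with a minimal sketch. It is not the code from that commit: the `region_refs` list, its dictionary keys, and both helper functions are hypothetical and only mimic an OrderedGroup whose indices arrive as strings.

```python
# Hypothetical region references; the index values are strings, as they
# would be when read from XML attributes without conversion.
region_refs = [
    {"index": str(i), "region_ref": f"r{i}"} for i in (10, 2, 0, 1, 11, 9)
]


def sort_alphabetically(refs):
    """Sort by the raw string index (the buggy behaviour)."""
    return sorted(refs, key=lambda r: r["index"])


def sort_numerically(refs):
    """Sort by the index converted to an integer (the fixed behaviour)."""
    return sorted(refs, key=lambda r: int(r["index"]))


alphabetical = [r["index"] for r in sort_alphabetically(region_refs)]
numerical = [r["index"] for r in sort_numerically(region_refs)]

print(alphabetical)  # "10" and "11" land before "2"
print(numerical)     # indices in true numeric order
```

With ten or more regions, the string sort places "10" and "11" ahead of "2", which is the symptom the commit message reports; converting the index to an integer restores the intended order.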