Commit Graph

120 Commits (76bacc0f1511957913f8af0e7ea0f2bdb84fa8b2)

Author SHA1 Message Date
Gerber, Mike f3aafb6fdf dinglehopper: Validate ExtractedText.{segments,_text} in both directions 4 years ago
Gerber, Mike b14c35e147 🎨 dinglehopper: Use multimethod to handle str vs ExtractedText 4 years ago
Gerber, Mike a17ee2afec 🚧 dinglehopper: Guarantee NFC + rename from_text → from_str 4 years ago
Gerber, Mike 7843824eaf 🚧 dinglehopper: Support str & ExtractedText in CER and distance functions 4 years ago
Gerber, Mike 5bee55c896 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments 4 years ago
Gerber, Mike 96b55f1806 🚧 dinglehopper: Hierarchical text representation 4 years ago
Gerber, Mike d706ef4621 📝 Document CER/WER and the format detection (Fixes GH-26) 4 years ago
Gerber, Mike da47e41c85 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments 4 years ago
Mike Gerber 7085ee0fd8 Merge pull request #29 from kba/getlogger
getLogger per method
4 years ago
Gerber, Mike 77154ef256 📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.) 4 years ago
Konstantin Baierer 12da98e477 getLogger per method 4 years ago
Konstantin Baierer 004ae298ca ocrd cli: use make_file_id and assert_file_grp_cardinality 4 years ago
Gerber, Mike 6ab38f1bda 🎨 dinglehopper: Make PyCharm happier with the type hinting, newlines etc. 4 years ago
Gerber, Mike d484810038 dinglehopper: Validate read segment ids 4 years ago
Gerber, Mike d39f74f11a 🧹 dinglehopper: Remove obsolete normalization-related FIXME 4 years ago
Gerber, Mike 8c5f7c73d5 🧹 dinglehopper: Replace XXX with an actual comment 4 years ago
Gerber, Mike 37edc0336f 🧹 dinglehopper: Remove obsolete XXX that has a GitHub issue 4 years ago
Gerber, Mike 9f05e6ca4c 🧹 dinglehopper: Remove obsolete XXX about None ids 4 years ago
Gerber, Mike 4469af62c8 🎨 dinglehopper: Unfuck substitutions a bit 4 years ago
Gerber, Mike 079be203bd 🐛 dinglehopper: Fix tests to deal with new normalization logic 4 years ago
Gerber, Mike c010a7f05e 🧹 dinglehopper: Calculate segment ids once, on the first call 4 years ago
Gerber, Mike 0cf7ff4721 🧹 dinglehopper: Remove obsolete XXX about the PAGE hierarchy 4 years ago
Gerber, Mike c432cb505a 🧹 dinglehopper: Clean up test_lines_similar() 4 years ago
Gerber, Mike 0c33e84415 📓 dinglehopper: Document editops() 4 years ago
Gerber, Mike a61c935624 🧹 dinglehopper: Move Python 3.5 XXXs to a GitHub issue
See https://github.com/qurator-spk/dinglehopper/issues/20.
4 years ago
Gerber, Mike 257e4986cc 🚧 dinglehopper: Use a Bootstrap tooltip for the segment id 4 years ago
Gerber, Mike a320d5fd8f 🚧 dinglehopper: Re-introduce "substitute_equivalences" as Normalization.NFC_SBB 4 years ago
Gerber, Mike 2579e0220c 🚧 dinglehopper: Remove debug output 4 years ago
Gerber, Mike d4e39d3d26 🚧 dinglehopper: Display segment id in the corresponding column 4 years ago
Gerber, Mike 48ad340428 🚧 dinglehopper: Display segment id when hovering over a character difference 4 years ago
Gerber, Mike 1f6538b44c 🚧 dinglehopper: Extract text while retaining segment id info 4 years ago
Gerber, Mike 275ff32524 🚧 dinglehopper: Extract text while retaining segment id info 4 years ago
Gerber, Mike 4e182e0794 🚧 dinglehopper: Extract text while retaining segment id info 4 years ago
Gerber, Mike 9f8bb1d8ea 🚧 dinglehopper: Extract text while retaining segment id info 4 years ago
Gerber, Mike 668de758a0 dinglehopper: Support disabling metrics in the OCR-D interface 4 years ago
Gerber, Mike f699697eb3 🐛 dinglehopper: Fix reading OCR-D workspace files when only URLs are provided 4 years ago
Gerber, Mike 22765f02a2 🐛 dinglehopper: Fix tests by making metrics a keyword argument 5 years ago
Gerber, Mike 5cbeb7b0dd dinglehopper: Support disabling the metrics using CLI option --no-metrics 5 years ago
Gerber, Mike 745095e52c dinglehopper: Include number of characters and words in JSON report 5 years ago
Gerber, Mike 48a31ce672 Revert "Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector"
This reverts commit 2c89bf3b35ee290d7b830ef270df3a96aa48245e, reversing
changes made to 9f7e413148ca5dbac9b555d7b0d0a5fa3a0f5340.
5 years ago
b-vr103 1303a7d92f Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector 5 years ago
Gerber, Mike f32eb9eb69 🐛 dinglehopper: Escape text inserted into HTML (Fixes #8) 5 years ago
Gerber, Mike 82e863fac2 📝 dinglehopper: Document seq_editops() 5 years ago
Gerber, Mike 5ccdace1dd 🎨 dinglehopper: Move working_directory() context manager into tests/util 5 years ago
Gerber, Mike f98c527c93 🐛 dinglehopper: Fix working_directory() context manager 5 years ago
Gerber, Mike 5273d10bac 🐛 dinglehopper: Generate a loadable JSON report even if CER=∞ 5 years ago
Gerber, Mike ced6504ad0 🎨 dinglehopper: Expose clearing the Levenshtein cache as a function 5 years ago
Gerber, Mike 5cf4eddaeb dinglehopper: Clear Levenshtein cache between OCR-D files 5 years ago
Gerber, Mike 58ff140bc0 dinglehopper: Improve performance by caching the Levenshtein matrix
Motivated by [a pull
request](https://github.com/qurator-spk/dinglehopper/pull/7) by
@JKamlah, implement a cache of the Levenshtein matrix calculation.

We calculated the Levenshtein matrices for characters and words twice:
once for the error rates, once for the alignment.
5 years ago
Gerber, Mike 11a6341641 🧹 dinglehopper: Remove broken implementation of the unordered word error rate 5 years ago
Gerber, Mike f22228840e 🧹 dinglehopper: Use exclusively relative imports in tests 5 years ago
Gerber, Mike d61c076aad 🧹 dinglehopper: Remove debug print()s 5 years ago
Gerber, Mike 12a48f3bfe dinglehopper: Test aligning lists of lines 5 years ago
Gerber, Mike 680c2a2661 🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5, again, and again 5 years ago
Gerber, Mike 7cf1a540f4 🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5, again 5 years ago
Gerber, Mike 49e2065ad6 🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5 5 years ago
Gerber, Mike 86178271df dinglehopper: Fix repeated tests for the OCR-D interface 5 years ago
Gerber, Mike b6f50ef853 dinglehopper: Add a test for the OCR-D interface 5 years ago
Konstantin Baierer 2ca44af31d ocrd-tool: add category 5 years ago
Gerber, Mike c30553985f dinglehopper: Substitute more characters 5 years ago
Gerber, Mike 493541fddf 🐛 dinglehopper: Always work with NFC text 5 years ago
Gerber, Mike df93c80e5d 🐛 dinglehopper: Always work with NFC text 5 years ago
Gerber, Mike 715b813bbc dinglehopper: Add two more eMOP ligatures 5 years ago
Gerber, Mike 8d055e7b6e 🐛 dinglehopper: Work on NFC'ed grapheme clusters when aligning text 5 years ago
Gerber, Mike 534958be1d 🐛 dinglehopper: Fix sorting the reading order
Regions were sorted incorrectly when an OrderedGroup contained more than
9 regions, because the index was sorted alphabetically, not
numerically. Fix this by converting the index to integers.
5 years ago
Gerber, Mike 10f010eaa8 🐛 dinglehopper: Do not throw error if a region ID is not found
The ReadingOrder might contain regions of types other than text regions,
so not finding a TextRegion with the referenced ID is not an error.
Downgrade to a warning for now.
5 years ago
Gerber, Mike 8237b3edaf dinglehopper: Substitute more characters 5 years ago
Gerber, Mike 02a0e093bf dinglehopper: Add OCR-D interface 5 years ago
Gerber, Mike 495919c06d 🧹 dinglehopper: Move pytest.ini 5 years ago
Gerber, Mike 89048bf55d ➡ Move dinglehopper into its own directory 5 years ago
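
The reading-order fix described in commit 534958be1d above can be illustrated with a minimal Python sketch. The region ids and index values below are invented for illustration and are not taken from dinglehopper's code; the sketch only shows why sorting the index as strings misorders groups with more than 9 regions, and how converting the index to integers restores the intended order.

```python
# Hypothetical (index, region id) pairs, as they might be read from an
# OrderedGroup. The index is a string because it comes from an XML attribute.
refs = [(str(i), f"region{i}") for i in range(12)]

# Sorting on the string index compares character by character,
# so "10" and "11" end up before "2".
by_string = [region for _, region in sorted(refs, key=lambda r: r[0])]

# Converting the index to an integer gives numeric order.
by_number = [region for _, region in sorted(refs, key=lambda r: int(r[0]))]

print(by_string[:4])
print(by_number[:4])
```

With fewer than 10 regions both sorts agree, which is presumably why the problem only showed up on larger pages.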