Gerber, Mike
5cbe148741
🐛 dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34)
4 years ago
Gerber, Mike
e4e2777cb7
🐛 dinglehopper: Do try to get text when no TextEquivs exist
4 years ago
Gerber, Mike
1c88891a98
✔️ Add test data for LAREX's indexed TextEquivs (unused)
4 years ago
Gerber, Mike
19d15e3ecc
🐛 dinglehopper: Honor TextEquiv index (Closes GH-33)
4 years ago
Gerber, Mike
f626a2ebe6
🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder
4 years ago
Gerber, Mike
8b4ee20a40
✨ Add a new CLI tool dinglehopper-extract to just give the extracted text
4 years ago
Gerber, Mike
b23b75b601
✨ dinglehopper: Give segment ids from the extracted textequiv_level
4 years ago
Gerber, Mike
b23e4ce30e
✨ dinglehopper: Add OCR-D parameter to choose TextEquiv level
4 years ago
Gerber, Mike
9744fa2567
✨ dinglehopper: Add CLI option to choose TextEquiv level
4 years ago
Gerber, Mike
75733039b8
🧹 dinglehopper: Do not hardcode joiner to \n
4 years ago
Gerber, Mike
3848412349
✨ dinglehopper: Implement the basic text extraction from PAGE TextLines
4 years ago
Gerber, Mike
f2367ac0c3
🐛 Fix OCR-D CLI for newest OCR-D
Now that find_files() is a generator, we can't use [0] to get the file.
4 years ago
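The failure mode described in the commit above can be reproduced in plain Python. This is a minimal sketch with a hypothetical `find_files()` stand-in (not the OCR-D API, and the file names are made up). It shows why indexing a generator fails and how `next()` retrieves the first item instead.

```python
def find_files(names):
    """Hypothetical stand-in for a lookup that yields its results lazily."""
    for name in names:
        yield name


files = find_files(["OCR-D-GT_0001.xml", "OCR-D-GT_0002.xml"])

try:
    files[0]  # generators do not support indexing
    indexing_works = True
except TypeError:
    indexing_works = False

# Take the first item instead; the default avoids StopIteration on empty results.
first = next(find_files(["OCR-D-GT_0001.xml", "OCR-D-GT_0002.xml"]), None)
```

Whether the actual fix used `next()` or converted the result to a list is not stated in the commit message.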
Gerber, Mike
5ed184c8c4
✨ dinglehopper: Show a progressbar on --progress
4 years ago
Gerber, Mike
4951823a29
🧹 dinglehopper: Disable metrics in JSON report, too
4 years ago
Gerber, Mike
82217a25bb
🧹 dinglehopper: Move all normalization code to extracted_text.py
4 years ago
Gerber, Mike
c6c6b8efab
📝 dinglehopper: Add detail about the text extraction and ExtractedText
4 years ago
Gerber, Mike
f50591abac
Merge branch 'feat/display-segment-id'
4 years ago
Gerber, Mike
c514abfb9f
🧹 dinglehopper: Sanitize imports
4 years ago
Gerber, Mike
1077dc64ce
➡️ dinglehopper: Move ExtractedText to its own file
4 years ago
Gerber, Mike
9dd4ff0aae
✨ dinglehopper: Extract line IDs for ALTO
4 years ago
Gerber, Mike
f3aafb6fdf
✨ dinglehopper: Validate ExtractedText.{segments,_text} in both directions
4 years ago
Gerber, Mike
b14c35e147
🎨 dinglehopper: Use multimethod to handle str vs ExtractedText
4 years ago
Gerber, Mike
a17ee2afec
🚧 dinglehopper: Guarantee NFC + rename from_text → from_str
4 years ago
Gerber, Mike
7843824eaf
🚧 dinglehopper: Support str & ExtractedText in CER and distance functions
4 years ago
Gerber, Mike
5bee55c896
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
4 years ago
Gerber, Mike
96b55f1806
🚧 dinglehopper: Hierarchical text representation
4 years ago
Gerber, Mike
d706ef4621
📝 Document CER/WER and the format detection (Fixes GH-26)
4 years ago
Gerber, Mike
da47e41c85
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
4 years ago
Mike Gerber
7085ee0fd8
Merge pull request #29 from kba/getlogger
getLogger per method
4 years ago
Gerber, Mike
77154ef256
📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.)
4 years ago
Konstantin Baierer
12da98e477
getLogger per method
4 years ago
Konstantin Baierer
004ae298ca
ocrd cli: use make_file_id and assert_file_grp_cardinality
4 years ago
Gerber, Mike
6ab38f1bda
🎨 dinglehopper: Make PyCharm happier with the type hinting, newlines etc.
5 years ago
Gerber, Mike
d484810038
✨ dinglehopper: Validate read segment ids
5 years ago
Gerber, Mike
d39f74f11a
🧹 dinglehopper: Remove obsolete normalization-related FIXME
5 years ago
Gerber, Mike
8c5f7c73d5
🧹 dinglehopper: Replace XXX with an actual comment
5 years ago
Gerber, Mike
37edc0336f
🧹 dinglehopper: Remove obsolete XXX that has a GitHub issue
5 years ago
Gerber, Mike
9f05e6ca4c
🧹 dinglehopper: Remove obsolete XXX about None ids
5 years ago
Gerber, Mike
4469af62c8
🎨 dinglehopper: Unfuck substitutions a bit
5 years ago
Gerber, Mike
079be203bd
🐛 dinglehopper: Fix tests to deal with new normalization logic
5 years ago
Gerber, Mike
c010a7f05e
🧹 dinglehopper: Calculate segment ids once, on the first call
5 years ago
Gerber, Mike
0cf7ff4721
🧹 dinglehopper: Remove obsolete XXX about the PAGE hierarchy
5 years ago
Gerber, Mike
c432cb505a
🧹 dinglehopper: Clean up test_lines_similar()
5 years ago
Gerber, Mike
0c33e84415
📓 dinglehopper: Document editops()
5 years ago
Gerber, Mike
a61c935624
🧹 dinglehopper: Move Python 3.5 XXXs to a GitHub issue
See https://github.com/qurator-spk/dinglehopper/issues/20 .
5 years ago
Gerber, Mike
257e4986cc
🚧 dinglehopper: Use a Bootstrap tooltip for the segment id
5 years ago
Gerber, Mike
a320d5fd8f
🚧 dinglehopper: Re-introduce "substitute_equivalences" as Normalization.NFC_SBB
5 years ago
Gerber, Mike
2579e0220c
🚧 dinglehopper: Remove debug output
5 years ago
Gerber, Mike
d4e39d3d26
🚧 dinglehopper: Display segment id in the corresponding column
5 years ago
Gerber, Mike
48ad340428
🚧 dinglehopper: Display segment id when hovering over a character difference
5 years ago
Gerber, Mike
1f6538b44c
🚧 dinglehopper: Extract text while retaining segment id info
5 years ago
Gerber, Mike
275ff32524
🚧 dinglehopper: Extract text while retaining segment id info
5 years ago
Gerber, Mike
4e182e0794
🚧 dinglehopper: Extract text while retaining segment id info
5 years ago
Gerber, Mike
9f8bb1d8ea
🚧 dinglehopper: Extract text while retaining segment id info
5 years ago
Gerber, Mike
668de758a0
✨ dinglehopper: Support disabling metrics in the OCR-D interface
5 years ago
Gerber, Mike
f699697eb3
🐛 dinglehopper: Fix reading OCR-D workspace files when only URLs are provided
5 years ago
Gerber, Mike
22765f02a2
🐛 dinglehopper: Fix tests by making metrics a keyword argument
5 years ago
Gerber, Mike
5cbeb7b0dd
✨ dinglehopper: Support disabling the metrics using CLI option --no-metrics
5 years ago
Gerber, Mike
745095e52c
✨ dinglehopper: Include number of characters and words in JSON report
5 years ago
Gerber, Mike
48a31ce672
Revert "Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector"
This reverts commit 2c89bf3b35ee290d7b830ef270df3a96aa48245e, reversing
changes made to 9f7e413148ca5dbac9b555d7b0d0a5fa3a0f5340.
5 years ago
b-vr103
1303a7d92f
Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector
5 years ago
Gerber, Mike
f32eb9eb69
🐛 dinglehopper: Escape text inserted into HTML (Fixes #8)
5 years ago
Gerber, Mike
82e863fac2
📝 dinglehopper: Document seq_editops()
5 years ago
Gerber, Mike
5ccdace1dd
🎨 dinglehopper: Move working_directory() context manager into tests/util
5 years ago
Gerber, Mike
f98c527c93
🐛 dinglehopper: Fix working_directory() context manager
5 years ago
Gerber, Mike
5273d10bac
🐛 dinglehopper: Generate a loadable JSON report even if CER=∞
5 years ago
Gerber, Mike
ced6504ad0
🎨 dinglehopper: Expose clearing the Levenshtein cache as a function
5 years ago
Gerber, Mike
5cf4eddaeb
⚡ dinglehopper: Clear Levenshtein cache between OCR-D files
5 years ago
Gerber, Mike
58ff140bc0
⚡ dinglehopper: Improve performance by caching the Levenshtein matrix
Motivated by [a pull
request](https://github.com/qurator-spk/dinglehopper/pull/7) by
@JKamlah, implement a cache of the Levenshtein matrix calculation.
We previously calculated the Levenshtein matrices for characters and words twice:
once for the error rates, once for the alignment.
5 years ago
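The caching idea from the commit above can be illustrated as follows. This is a sketch under assumptions, not the project's implementation: the function names, the cache size, and the use of `functools.lru_cache` are all chosen here for illustration. The matrix is computed once per pair of sequences, so a later call for the same pair (for example, for the alignment) reuses it.

```python
from functools import lru_cache


@lru_cache(maxsize=10)  # assumed cache size, for illustration only
def levenshtein_matrix(seq1, seq2):
    """Compute the full Levenshtein matrix; arguments must be hashable (str or tuple)."""
    m, n = len(seq1), len(seq2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i  # deleting i items from seq1
    for j in range(n + 1):
        D[0][j] = j  # inserting j items from seq2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + cost)
    # Return an immutable copy so cached results cannot be modified by callers.
    return tuple(tuple(row) for row in D)


def distance(seq1, seq2):
    """Edit distance is the bottom-right cell of the (cached) matrix."""
    return levenshtein_matrix(seq1, seq2)[-1][-1]
```

Calling `levenshtein_matrix.cache_clear()` between documents keeps the cache from holding on to matrices that will not be needed again.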
Gerber, Mike
11a6341641
🧹 dinglehopper: Remove broken implementation of the unordered word error rate
5 years ago
Gerber, Mike
f22228840e
🧹 dinglehopper: Use exclusively relative imports in tests
5 years ago
Gerber, Mike
d61c076aad
🧹 dinglehopper: Remove debug print()s
5 years ago
Gerber, Mike
12a48f3bfe
✅ dinglehopper: Test aligning lists of lines
5 years ago
Gerber, Mike
680c2a2661
🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5, again, and again
5 years ago
Gerber, Mike
7cf1a540f4
🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5, again
5 years ago
Gerber, Mike
49e2065ad6
🐛 dinglehopper: Fix test_ocrd_cli for Python 3.5
5 years ago
Gerber, Mike
86178271df
✅ dinglehopper: Fix repeated tests for the OCR-D interface
5 years ago
Gerber, Mike
b6f50ef853
✅ dinglehopper: Add a test for the OCR-D interface
5 years ago
Konstantin Baierer
2ca44af31d
ocrd-tool: add category
5 years ago
Gerber, Mike
c30553985f
dinglehopper: Substitute more characters
5 years ago
Gerber, Mike
493541fddf
🐛 dinglehopper: Always work with NFC text
5 years ago
Gerber, Mike
df93c80e5d
🐛 dinglehopper: Always work with NFC text
5 years ago
Gerber, Mike
715b813bbc
dinglehopper: Add two more eMOP ligatures
5 years ago
Gerber, Mike
8d055e7b6e
🐛 dinglehopper: Work on NFC'ed grapheme clusters when aligning text
5 years ago
Gerber, Mike
534958be1d
🐛 dinglehopper: Fix sorting the reading order
Regions were sorted wrongly when there are more than 9 regions in an
OrderedGroup because the index was sorted alphabetically, not
numerically. Fix this by converting the index to integers.
5 years ago
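The sorting bug described in the commit above is easy to reproduce. In this minimal sketch the index values are hypothetical and stand in for the string-valued index attributes read from the XML.

```python
# Hypothetical index values, as strings the way XML attributes are read.
indexes = [str(i) for i in range(12)]

alphabetical = sorted(indexes)        # compares character by character
numerical = sorted(indexes, key=int)  # compares integer values
```

With twelve entries, the alphabetical order places "10" and "11" directly after "1", while the integer key restores the intended order.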
Gerber, Mike
10f010eaa8
🐛 dinglehopper: Do not throw error if a region ID is not found
The ReadingOrder might contain regions of types other than text regions,
so not finding a TextRegion with the referenced ID is not an error.
Downgrade to a warning for now.
5 years ago
Gerber, Mike
8237b3edaf
dinglehopper: Substitute more characters
5 years ago
Gerber, Mike
02a0e093bf
✨ dinglehopper: Add OCR-D interface
5 years ago
Gerber, Mike
495919c06d
🧹 dinglehopper: Move pytest.ini
5 years ago
Gerber, Mike
89048bf55d
➡ Move dinglehopper into its own directory
5 years ago