Gerber, Mike | f14ae46870 | Merge branch 'feat/text-extraction-levels' | 4 years ago
Gerber, Mike | 1c88891a98 | ✔️ Add test data for LAREX's indexed TextEquivs (unused) | 4 years ago
Gerber, Mike | 19d15e3ecc | 🐛 dinglehopper: Honor TextEquiv index (Closes GH-33) | 4 years ago
Gerber, Mike | f626a2ebe6 | 🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder | 4 years ago
Gerber, Mike | 0f3857d8d3 | 📝 Document OCR-D parameters and restructure README a bit | 4 years ago
Gerber, Mike | 8b4ee20a40 | ✨ Add a new CLI tool dinglehopper-extract to just give the extracted text | 4 years ago
Gerber, Mike | b23b75b601 | ✨ dinglehopper: Give segment ids from the extracted textequiv_level | 4 years ago
Gerber, Mike | b23e4ce30e | ✨ dinglehopper: Add OCR-D parameter to choose TextEquiv level | 4 years ago
Gerber, Mike | 9744fa2567 | ✨ dinglehopper: Add CLI option to choose TextEquiv level | 4 years ago
Gerber, Mike | 75733039b8 | 🧹 dinglehopper: Do not hardcode joiner to \n | 4 years ago
Gerber, Mike | 3848412349 | ✨ dinglehopper: Implement the basic text extraction from PAGE TextLines | 4 years ago
Gerber, Mike | f2367ac0c3 | 🐛 Fix OCR-D CLI for newest OCR-D. Now that find_files() is a generator, we can't use [0] to get the file. | 4 years ago
Gerber, Mike | 5ed184c8c4 | ✨ dinglehopper: Show a progressbar on --progress | 4 years ago
Gerber, Mike | 4951823a29 | 🧹 dinglehopper: Disable metrics in JSON report, too | 4 years ago
Gerber, Mike | 5303eea80c | 📝 dinglehopper: Update README to use OCR-D's new and more readable -P option | 4 years ago
Gerber, Mike | 82217a25bb | 🧹 dinglehopper: Move all normalization code to extracted_text.py | 4 years ago
Gerber, Mike | 009fa55c09 | Merge branch 'master' of https://github.com/qurator-spk/dinglehopper | 4 years ago
Gerber, Mike | c20bbbfa25 | 📝 dinglehopper: Update screenshot to include a region id tooltip | 4 years ago
Mike Gerber | 252bf9b3e7 | 📝 dinglehopper: Fix markdown in README.md | 4 years ago
Gerber, Mike | c6c6b8efab | 📝 dinglehopper: Add detail about the text extraction and ExtractedText | 4 years ago
Gerber, Mike | 7025ea54a8 | 📝 dinglehopper: Move developer info to README-DEV.md | 4 years ago
Gerber, Mike | f50591abac | Merge branch 'feat/display-segment-id' | 4 years ago
Gerber, Mike | c514abfb9f | 🧹 dinglehopper: Sanitize imports | 4 years ago
Gerber, Mike | 1077dc64ce | ➡️ dinglehopper: Move ExtractedText to its own file | 4 years ago
Gerber, Mike | 9dd4ff0aae | ✨ dinglehopper: Extract line IDs for ALTO | 4 years ago
Gerber, Mike | f3aafb6fdf | ✨ dinglehopper: Validate ExtractedText.{segments,_text} in both directions | 4 years ago
Gerber, Mike | 1f9a680fe7 | ⚙️ dinglehopper: PyCharm should use dinglehopper-github virtualenv | 4 years ago
Gerber, Mike | b14c35e147 | 🎨 dinglehopper: Use multimethod to handle str vs ExtractedText | 4 years ago
Gerber, Mike | a17ee2afec | 🚧 dinglehopper: Guarantee NFC + rename from_text → from_str | 4 years ago
Gerber, Mike | 7843824eaf | 🚧 dinglehopper: Support str & ExtractedText in CER and distance functions | 4 years ago
Gerber, Mike | 5bee55c896 | 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments | 4 years ago
Gerber, Mike | 96b55f1806 | 🚧 dinglehopper: Hierarchical text representation | 4 years ago
Gerber, Mike | db6292611f | 🧹 dinglehopper: Remove merged text extraction test code | 4 years ago
Gerber, Mike | d706ef4621 | 📝 Document CER/WER and the format detection (Fixes GH-26) | 4 years ago
Gerber, Mike | da47e41c85 | 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments | 4 years ago
Mike Gerber | 7085ee0fd8 | Merge pull request #29 from kba/getlogger: getLogger per method | 4 years ago
Gerber, Mike | 77154ef256 | 📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.) | 4 years ago
Gerber, Mike | 829b84c66a | ⚙️ dinglehopper: Add PyCharm's vcs.xml to git | 4 years ago
Konstantin Baierer | 12da98e477 | getLogger per method | 4 years ago
Gerber, Mike | 717801bdbb | Merge commit '7930ecd42868cb6785a58f8ee95b05882704621d' | 4 years ago
Gerber, Mike | 7930ecd428 | Merge branch 'master' of https://github.com/qurator-spk/dinglehopper | 4 years ago
Gerber, Mike | 976a042b2b | 🔧 dinglehopper: Add PyCharm code style config | 4 years ago
Gerber, Mike | 7e3dafd3bc | 🔧 dinglehopper: Add PyCharm code style config | 4 years ago
Mike Gerber | 2b98f69afe | Merge pull request #23 from kba/file-ids-and-such: ocrd cli: use make_file_id and assert_file_grp_cardinality | 4 years ago
Konstantin Baierer | 004ae298ca | ocrd cli: use make_file_id and assert_file_grp_cardinality | 4 years ago
Gerber, Mike | 79253c2640 | Merge branch 'feat/display-segment-id' of https://github.com/qurator-spk/dinglehopper into feat/display-segment-id | 4 years ago
Gerber, Mike | 5a3a74b246 | Merge branch 'feat/display-segment-id' of github.com:qurator-spk/dinglehopper into feat/display-segment-id | 4 years ago
Gerber, Mike | 6ab38f1bda | 🎨 dinglehopper: Make PyCharm happier with the type hinting, newlines etc. | 4 years ago
Gerber, Mike | d484810038 | ✨ dinglehopper: Validate read segment ids | 4 years ago
Gerber, Mike | d39f74f11a | 🧹 dinglehopper: Remove obsolete normalization-related FIXME | 4 years ago