Commit Graph

347 Commits (1e7c46285be455f3f87da4c47b02b8d6a2309c83)

Author SHA1 Message Date
Benjamin Rosemann 20661487d6 Add .editorconfig
Add a proposal for a .editorconfig file (see https://editorconfig.org/).
This is natively supported by a lot of editors, others are supported via
plugins.

This will close #19.
4 years ago
Gerber, Mike 6e47acda1c 📝 dinglehopper: Move screenshot higher 4 years ago
Gerber, Mike 5cbe148741 🐛 dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34) 4 years ago
Gerber, Mike e4e2777cb7 🐛 dinglehopper: Do try to get text when no TextEquivs exist 4 years ago
Gerber, Mike f14ae46870 Merge branch 'feat/text-extraction-levels' 4 years ago
Gerber, Mike 1c88891a98 ✔️ Add test data for LAREX's indexed TextEquivs (unused) 4 years ago
Gerber, Mike 19d15e3ecc 🐛 dinglehopper: Honor TextEquiv index (Closes GH-33) 4 years ago
Gerber, Mike f626a2ebe6 🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder 4 years ago
Gerber, Mike 0f3857d8d3 📝 Document OCR-D parameters and restructure README a bit 4 years ago
Gerber, Mike 8b4ee20a40 Add a new CLI tool dinglehopper-extract to just give the extracted text 4 years ago
Gerber, Mike b23b75b601 dinglehopper: Give segment ids from the extracted textequiv_level 4 years ago
Gerber, Mike b23e4ce30e dinglehopper: Add OCR-D parameter to choose TextEquiv level 4 years ago
Gerber, Mike 9744fa2567 dinglehopper: Add CLI option to choose TextEquiv level 4 years ago
Gerber, Mike 75733039b8 🧹 dinglehopper: Do not hardcode joiner to \n 4 years ago
Gerber, Mike 3848412349 dinglehopper: Implement the basic text extraction from PAGE TextLines 4 years ago
Gerber, Mike f2367ac0c3 🐛 Fix OCR-D CLI for newest OCR-D
Now that find_files() is a generator, we can't use [0] to get the file.
4 years ago
Gerber, Mike 5ed184c8c4 dinglehopper: Show a progressbar on --progress 4 years ago
Gerber, Mike 4951823a29 🧹 dinglehopper: Disable metrics in JSON report, too 4 years ago
Gerber, Mike 5303eea80c 📝 dinglehopper: Update README to use OCR-D's new and more readable -P option 4 years ago
Gerber, Mike 82217a25bb 🧹 dinglehopper: Move all normalization code to extracted_text.py 4 years ago
Gerber, Mike 009fa55c09 Merge branch 'master' of https://github.com/qurator-spk/dinglehopper 4 years ago
Gerber, Mike c20bbbfa25 📝 dinglehopper: Update screenshot to include a region id tooltip 4 years ago
Mike Gerber 252bf9b3e7 📝 dinglehopper: Fix markdown in README.md 4 years ago
Gerber, Mike c6c6b8efab 📝 dinglehopper: Add detail about the text extraction and ExtractedText 4 years ago
Gerber, Mike 7025ea54a8 📝 dinglehopper: Move developer info to README-DEV.md 4 years ago
Gerber, Mike f50591abac Merge branch 'feat/display-segment-id' 4 years ago
Gerber, Mike c514abfb9f 🧹 dinglehopper: Sanitize imports 4 years ago
Gerber, Mike 1077dc64ce ➡️ dinglehopper: Move ExtractedText to its own file 4 years ago
Gerber, Mike 9dd4ff0aae dinglehopper: Extract line IDs for ALTO 4 years ago
Gerber, Mike f3aafb6fdf dinglehopper: Validate ExtractedText.{segments,_text} in both directions 4 years ago
Gerber, Mike 1f9a680fe7 ⚙️ dinglehopper: PyCharm should use dinglehopper-github virtualenv 4 years ago
Gerber, Mike b14c35e147 🎨 dinglehopper: Use multimethod to handle str vs ExtractedText 4 years ago
Gerber, Mike a17ee2afec 🚧 dinglehopper: Guarantee NFC + rename from_text → from_str 4 years ago
Gerber, Mike 7843824eaf 🚧 dinglehopper: Support str & ExtractedText in CER and distance functions 4 years ago
Gerber, Mike 5bee55c896 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments 4 years ago
Gerber, Mike 96b55f1806 🚧 dinglehopper: Hierarchical text representation 4 years ago
Gerber, Mike db6292611f 🧹 dinglehopper: Remove merged text extraction test code 4 years ago
Gerber, Mike d706ef4621 📝 Document CER/WER and the format detection (Fixes GH-26) 4 years ago
Gerber, Mike da47e41c85 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments 4 years ago
Mike Gerber 7085ee0fd8 Merge pull request #29 from kba/getlogger
getLogger per method
4 years ago
Gerber, Mike 77154ef256 📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.) 4 years ago
Gerber, Mike 829b84c66a ⚙️ dinglehopper: Add PyCharm's vcs.xml to git 4 years ago
Konstantin Baierer 12da98e477 getLogger per method 4 years ago
Gerber, Mike 717801bdbb Merge commit '7930ecd42868cb6785a58f8ee95b05882704621d' 4 years ago
Gerber, Mike 7930ecd428 Merge branch 'master' of https://github.com/qurator-spk/dinglehopper 4 years ago
Gerber, Mike 976a042b2b 🔧 dinglehopper: Add PyCharm code style config 4 years ago
Gerber, Mike 7e3dafd3bc 🔧 dinglehopper: Add PyCharm code style config 4 years ago
Mike Gerber 2b98f69afe Merge pull request #23 from kba/file-ids-and-such
ocrd cli: use make_file_id and assert_file_grp_cardinality
4 years ago
Konstantin Baierer 004ae298ca ocrd cli: use make_file_id and assert_file_grp_cardinality 4 years ago
Gerber, Mike 79253c2640 Merge branch 'feat/display-segment-id' of https://github.com/qurator-spk/dinglehopper into feat/display-segment-id 5 years ago