Benjamin Rosemann
20661487d6
Add .editorconfig
Add a proposal for a .editorconfig file (see https://editorconfig.org/).
Many editors support this natively; others support it via
plugins.
This will close #19.
4 years ago
Gerber, Mike
6e47acda1c
📝 dinglehopper: Move screenshot higher
4 years ago
Gerber, Mike
5cbe148741
🐛 dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34)
4 years ago
Gerber, Mike
e4e2777cb7
🐛 dinglehopper: Do try to get text when no TextEquivs exist
4 years ago
Gerber, Mike
f14ae46870
Merge branch 'feat/text-extraction-levels'
4 years ago
Gerber, Mike
1c88891a98
✔️ Add test data for LAREX's indexed TextEquivs (unused)
4 years ago
Gerber, Mike
19d15e3ecc
🐛 dinglehopper: Honor TextEquiv index (Closes GH-33)
4 years ago
Gerber, Mike
f626a2ebe6
🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder
4 years ago
Gerber, Mike
0f3857d8d3
📝 Document OCR-D parameters and restructure README a bit
4 years ago
Gerber, Mike
8b4ee20a40
✨ Add a new CLI tool dinglehopper-extract to just give the extracted text
4 years ago
Gerber, Mike
b23b75b601
✨ dinglehopper: Give segment ids from the extracted textequiv_level
4 years ago
Gerber, Mike
b23e4ce30e
✨ dinglehopper: Add OCR-D parameter to choose TextEquiv level
4 years ago
Gerber, Mike
9744fa2567
✨ dinglehopper: Add CLI option to choose TextEquiv level
4 years ago
Gerber, Mike
75733039b8
🧹 dinglehopper: Do not hardcode joiner to \n
4 years ago
Gerber, Mike
3848412349
✨ dinglehopper: Implement the basic text extraction from PAGE TextLines
4 years ago
Gerber, Mike
f2367ac0c3
🐛 Fix OCR-D CLI for newest OCR-D
Now that find_files() is a generator, we can't use [0] to get the file.
4 years ago
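The note in commit f2367ac0c3 (a generator cannot be indexed with [0]) can be illustrated with a minimal sketch. The function `find_files_stub` below is a hypothetical stand-in, not OCR-D's actual `find_files()`, and the file names are made up for illustration; the commit itself does not show how the fix was written, so taking the first item with `next()` is only one plausible approach.

```python
from typing import Iterator, Optional


def find_files_stub(file_grp: str) -> Iterator[str]:
    """Hypothetical stand-in for a find_files() that yields results lazily."""
    for name in ("page_0001.xml", "page_0002.xml"):  # made-up file names
        yield f"{file_grp}/{name}"


def indexing_fails(file_grp: str) -> bool:
    """Show that subscripting a generator raises TypeError."""
    try:
        find_files_stub(file_grp)[0]  # worked when a list was returned
    except TypeError:
        return True
    return False


def first_file(file_grp: str) -> Optional[str]:
    """Take the first yielded item, or None if nothing is yielded."""
    return next(find_files_stub(file_grp), None)
```

Wrapping the call in `list(...)` before indexing would also work, at the cost of consuming the whole generator.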
Gerber, Mike
5ed184c8c4
✨ dinglehopper: Show a progressbar on --progress
4 years ago
Gerber, Mike
4951823a29
🧹 dinglehopper: Disable metrics in JSON report, too
4 years ago
Gerber, Mike
5303eea80c
📝 dinglehopper: Update README to use OCR-D's new and more readable -P option
4 years ago
Gerber, Mike
82217a25bb
🧹 dinglehopper: Move all normalization code to extracted_text.py
4 years ago
Gerber, Mike
009fa55c09
Merge branch 'master' of https://github.com/qurator-spk/dinglehopper
4 years ago
Gerber, Mike
c20bbbfa25
📝 dinglehopper: Update screenshot to include a region id tooltip
4 years ago
Mike Gerber
252bf9b3e7
📝 dinglehopper: Fix markdown in README.md
4 years ago
Gerber, Mike
c6c6b8efab
📝 dinglehopper: Add detail about the text extraction and ExtractedText
4 years ago
Gerber, Mike
7025ea54a8
📝 dinglehopper: Move developer info to README-DEV.md
4 years ago
Gerber, Mike
f50591abac
Merge branch 'feat/display-segment-id'
4 years ago
Gerber, Mike
c514abfb9f
🧹 dinglehopper: Sanitize imports
4 years ago
Gerber, Mike
1077dc64ce
➡️ dinglehopper: Move ExtractedText to its own file
4 years ago
Gerber, Mike
9dd4ff0aae
✨ dinglehopper: Extract line IDs for ALTO
4 years ago
Gerber, Mike
f3aafb6fdf
✨ dinglehopper: Validate ExtractedText.{segments,_text} in both directions
4 years ago
Gerber, Mike
1f9a680fe7
⚙️ dinglehopper: PyCharm should use dinglehopper-github virtualenv
4 years ago
Gerber, Mike
b14c35e147
🎨 dinglehopper: Use multimethod to handle str vs ExtractedText
4 years ago
Gerber, Mike
a17ee2afec
🚧 dinglehopper: Guarantee NFC + rename from_text → from_str
4 years ago
Gerber, Mike
7843824eaf
🚧 dinglehopper: Support str & ExtractedText in CER and distance functions
4 years ago
Gerber, Mike
5bee55c896
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
4 years ago
Gerber, Mike
96b55f1806
🚧 dinglehopper: Hierarchical text representation
4 years ago
Gerber, Mike
db6292611f
🧹 dinglehopper: Remove merged text extraction test code
4 years ago
Gerber, Mike
d706ef4621
📝 Document CER/WER and the format detection (Fixes GH-26)
4 years ago
Gerber, Mike
da47e41c85
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
4 years ago
Mike Gerber
7085ee0fd8
Merge pull request #29 from kba/getlogger
getLogger per method
4 years ago
Gerber, Mike
77154ef256
📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.)
4 years ago
Gerber, Mike
829b84c66a
⚙️ dinglehopper: Add PyCharm's vcs.xml to git
4 years ago
Konstantin Baierer
12da98e477
getLogger per method
4 years ago
Gerber, Mike
717801bdbb
Merge commit '7930ecd42868cb6785a58f8ee95b05882704621d'
4 years ago
Gerber, Mike
7930ecd428
Merge branch 'master' of https://github.com/qurator-spk/dinglehopper
4 years ago
Gerber, Mike
976a042b2b
🔧 dinglehopper: Add PyCharm code style config
4 years ago
Gerber, Mike
7e3dafd3bc
🔧 dinglehopper: Add PyCharm code style config
4 years ago
Mike Gerber
2b98f69afe
Merge pull request #23 from kba/file-ids-and-such
ocrd cli: use make_file_id and assert_file_grp_cardinality
4 years ago
Konstantin Baierer
004ae298ca
ocrd cli: use make_file_id and assert_file_grp_cardinality
4 years ago
Gerber, Mike
79253c2640
Merge branch 'feat/display-segment-id' of https://github.com/qurator-spk/dinglehopper into feat/display-segment-id
5 years ago