mirror of https://github.com/qurator-spk/dinglehopper.git synced 2025-06-08 19:30:01 +02:00
Commit graph

448 commits

Author SHA1 Message Date
082fc9e09a
Merge pull request #38 from b2m/add-editorconfig
Add .editorconfig
2020-10-28 15:16:04 +01:00
Benjamin Rosemann
20661487d6 Add .editorconfig
Add a proposal for a .editorconfig file (see https://editorconfig.org/).
This is natively supported by a lot of editors, others are supported via
plugins.

This will close #19.
2020-10-28 11:31:18 +01:00
6e47acda1c 📝 dinglehopper: Move screenshot higher 2020-10-21 19:31:53 +02:00
5cbe148741 🐛 dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34) 2020-10-21 19:29:45 +02:00
e4e2777cb7 🐛 dinglehopper: Do try to get text when no TextEquivs exist 2020-10-21 17:59:44 +02:00
f14ae46870 Merge branch 'feat/text-extraction-levels' 2020-10-21 17:51:44 +02:00
1c88891a98 ✔️ Add test data for LAREX's indexed TextEquivs (unused) 2020-10-21 17:51:15 +02:00
19d15e3ecc 🐛 dinglehopper: Honor TextEquiv index (Closes GH-33) 2020-10-21 17:50:21 +02:00
f626a2ebe6 🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder 2020-10-21 17:03:55 +02:00
0f3857d8d3 📝 Document OCR-D parameters and restructure README a bit 2020-10-21 16:54:23 +02:00
8b4ee20a40 Add a new CLI tool dinglehopper-extract to just give the extracted text 2020-10-21 16:30:48 +02:00
b23b75b601 dinglehopper: Give segment ids from the extracted textequiv_level 2020-10-21 16:04:33 +02:00
b23e4ce30e dinglehopper: Add OCR-D parameter to choose TextEquiv level 2020-10-21 14:38:19 +02:00
9744fa2567 dinglehopper: Add CLI option to choose TextEquiv level 2020-10-20 19:33:39 +02:00
75733039b8 🧹 dinglehopper: Do not hardcode joiner to \n 2020-10-20 18:43:56 +02:00
3848412349 dinglehopper: Implement the basic text extraction from PAGE TextLines 2020-10-20 18:40:21 +02:00
f2367ac0c3 🐛 Fix OCR-D CLI for newest OCR-D
Now that find_files() is a generator, we can't use [0] to get the file.
2020-10-16 14:58:27 +02:00
5ed184c8c4 dinglehopper: Show a progressbar on --progress 2020-10-15 16:09:54 +02:00
4951823a29 🧹 dinglehopper: Disable metrics in JSON report, too 2020-10-15 15:38:15 +02:00
5303eea80c 📝 dinglehopper: Update README to use OCR-D's new and more readable -P option 2020-10-15 15:37:51 +02:00
82217a25bb 🧹 dinglehopper: Move all normalization code to extracted_text.py 2020-10-08 17:29:25 +02:00
009fa55c09 Merge branch 'master' of https://github.com/qurator-spk/dinglehopper 2020-10-08 17:17:40 +02:00
c20bbbfa25 📝 dinglehopper: Update screenshot to include a region id tooltip 2020-10-08 17:17:34 +02:00
252bf9b3e7
📝 dinglehopper: Fix markdown in README.md 2020-10-08 17:14:29 +02:00
c6c6b8efab 📝 dinglehopper: Add detail about the text extraction and ExtractedText 2020-10-08 17:05:36 +02:00
7025ea54a8 📝 dinglehopper: Move developer info to README-DEV.md 2020-10-08 16:59:50 +02:00
f50591abac Merge branch 'feat/display-segment-id' 2020-10-08 13:39:38 +02:00
c514abfb9f 🧹 dinglehopper: Sanitize imports 2020-10-08 13:33:19 +02:00
1077dc64ce ➡️ dinglehopper: Move ExtractedText to its own file 2020-10-08 13:25:20 +02:00
9dd4ff0aae dinglehopper: Extract line IDs for ALTO 2020-10-08 12:54:28 +02:00
f3aafb6fdf dinglehopper: Validate ExtractedText.{segments,_text} in both directions 2020-10-08 12:20:27 +02:00
1f9a680fe7 ⚙️ dinglehopper: PyCharm should use dinglehopper-github virtualenv 2020-10-08 12:16:42 +02:00
b14c35e147 🎨 dinglehopper: Use multimethod to handle str vs ExtractedText 2020-10-08 12:15:58 +02:00
a17ee2afec 🚧 dinglehopper: Guarantee NFC + rename from_text → from_str 2020-10-08 11:25:01 +02:00
7843824eaf 🚧 dinglehopper: Support str & ExtractedText in CER and distance functions 2020-10-08 10:47:20 +02:00
5bee55c896 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments 2020-10-07 18:40:06 +02:00
96b55f1806 🚧 dinglehopper: Hierarchical text representation 2020-10-07 18:31:52 +02:00
db6292611f 🧹 dinglehopper: Remove merged text extraction test code 2020-10-07 16:07:27 +02:00
d706ef4621 📝 Document CER/WER and the format detection (Fixes GH-26) 2020-09-30 17:58:05 +02:00
da47e41c85 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments 2020-09-25 14:53:19 +02:00
7085ee0fd8
Merge pull request #29 from kba/getlogger
getLogger per method
2020-09-25 13:20:58 +02:00
77154ef256 📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.) 2020-09-24 20:58:15 +02:00
829b84c66a ⚙️ dinglehopper: Add PyCharm's vcs.xml to git 2020-09-24 20:51:42 +02:00
Konstantin Baierer
12da98e477 getLogger per method 2020-09-24 10:16:52 +02:00
717801bdbb Merge commit '7930ecd428' 2020-09-03 14:47:44 +02:00
7930ecd428 Merge branch 'master' of https://github.com/qurator-spk/dinglehopper 2020-08-10 18:03:39 +02:00
976a042b2b 🔧 dinglehopper: Add PyCharm code style config 2020-08-10 18:03:29 +02:00
7e3dafd3bc 🔧 dinglehopper: Add PyCharm code style config 2020-08-10 18:03:00 +02:00
2b98f69afe
Merge pull request #23 from kba/file-ids-and-such
ocrd cli: use make_file_id and assert_file_grp_cardinality
2020-08-07 18:12:07 +02:00
Konstantin Baierer
004ae298ca ocrd cli: use make_file_id and assert_file_grp_cardinality 2020-08-07 18:00:33 +02:00