5cbe148741
🐛 dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34)
Gerber, Mike
2020-10-21 19:29:35 +0200
e4e2777cb7
🐛 dinglehopper: Do try to get text when no TextEquivs exist
Gerber, Mike
2020-10-21 17:59:44 +0200
f14ae46870
Merge branch 'feat/text-extraction-levels'
Gerber, Mike
2020-10-21 17:51:44 +0200
1c88891a98
✔️ Add test data for LAREX's indexed TextEquivs (unused)
Gerber, Mike
2020-10-21 17:51:15 +0200
19d15e3ecc
🐛 dinglehopper: Honor TextEquiv index (Closes GH-33)
Gerber, Mike
2020-10-21 17:50:21 +0200
f626a2ebe6
🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder
Gerber, Mike
2020-10-21 17:03:55 +0200
0f3857d8d3
📝 Document OCR-D parameters and restructure README a bit
Gerber, Mike
2020-10-21 16:51:53 +0200
8b4ee20a40
✨ Add a new CLI tool dinglehopper-extract to just give the extracted text
Gerber, Mike
2020-10-21 16:30:48 +0200
b23b75b601
✨ dinglehopper: Give segment ids from the extracted textequiv_level
Gerber, Mike
2020-10-21 16:04:25 +0200
b23e4ce30e
✨ dinglehopper: Add OCR-D parameter to choose TextEquiv level
Gerber, Mike
2020-10-21 14:38:15 +0200
9744fa2567
✨ dinglehopper: Add CLI option to choose TextEquiv level
Gerber, Mike
2020-10-20 19:33:39 +0200
75733039b8
🧹 dinglehopper: Do not hardcode joiner to \n
Gerber, Mike
2020-10-20 18:43:56 +0200
3848412349
✨ dinglehopper: Implement the basic text extraction from PAGE TextLines
Gerber, Mike
2020-10-20 18:40:07 +0200
f2367ac0c3
🐛 Fix OCR-D CLI for newest OCR-D
Gerber, Mike
2020-10-16 14:58:17 +0200
5ed184c8c4
✨ dinglehopper: Show a progressbar on --progress
Gerber, Mike
2020-10-15 16:09:17 +0200
4951823a29
🧹 dinglehopper: Disable metrics in JSON report, too
Gerber, Mike
2020-10-15 15:38:15 +0200
5303eea80c
📝 dinglehopper: Update README to use OCR-D's new and more readable -P option
Gerber, Mike
2020-10-15 15:37:51 +0200
82217a25bb
🧹 dinglehopper: Move all normalization code to extracted_text.py
Gerber, Mike
2020-10-08 17:29:25 +0200
009fa55c09
Merge branch 'master' of https://github.com/qurator-spk/dinglehopper
Gerber, Mike
2020-10-08 17:17:40 +0200
c20bbbfa25
📝 dinglehopper: Update screenshot to include a region id tooltip
Gerber, Mike
2020-10-08 17:17:34 +0200
252bf9b3e7
📝 dinglehopper: Fix markdown in README.md
Mike Gerber
2020-10-08 17:14:29 +0200
c6c6b8efab
📝 dinglehopper: Add detail about the text extraction and ExtractedText
Gerber, Mike
2020-10-08 17:05:36 +0200
7025ea54a8
📝 dinglehopper: Move developer info to README-DEV.md
Gerber, Mike
2020-10-08 16:59:50 +0200
f50591abac
Merge branch 'feat/display-segment-id'
Gerber, Mike
2020-10-08 13:39:38 +0200
c514abfb9f
🧹 dinglehopper: Sanitize imports
Gerber, Mike
2020-10-08 13:33:19 +0200
1077dc64ce
➡️ dinglehopper: Move ExtractedText to its own file
Gerber, Mike
2020-10-08 13:25:20 +0200
9dd4ff0aae
✨ dinglehopper: Extract line IDs for ALTO
Gerber, Mike
2020-10-08 12:54:28 +0200
f3aafb6fdf
✨ dinglehopper: Validate ExtractedText.{segments,_text} in both directions
Gerber, Mike
2020-10-08 12:20:27 +0200
1f9a680fe7
⚙️ dinglehopper: PyCharm should use dinglehopper-github virtualenv
Gerber, Mike
2020-10-08 12:16:42 +0200
b14c35e147
🎨 dinglehopper: Use multimethod to handle str vs ExtractedText
Gerber, Mike
2020-10-08 12:15:58 +0200
a17ee2afec
🚧 dinglehopper: Guarantee NFC + rename from_text → from_str
Gerber, Mike
2020-10-08 11:25:01 +0200
7843824eaf
🚧 dinglehopper: Support str & ExtractedText in CER and distance functions
Gerber, Mike
2020-10-08 10:47:20 +0200
5bee55c896
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
Gerber, Mike
2020-09-25 14:53:19 +0200
96b55f1806
🚧 dinglehopper: Hierarchical text representation
Gerber, Mike
2020-10-07 18:31:52 +0200
db6292611f
🧹 dinglehopper: Remove merged text extraction test code
Gerber, Mike
2020-10-07 16:07:27 +0200
d706ef4621
📝 Document CER/WER and the format detection (Fixes GH-26)
Gerber, Mike
2020-09-30 17:58:05 +0200
da47e41c85
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
Gerber, Mike
2020-09-25 14:53:19 +0200
7085ee0fd8
Merge pull request #29 from kba/getlogger
Mike Gerber
2020-09-25 13:20:58 +0200
77154ef256
📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.)
Gerber, Mike
2020-09-24 20:56:50 +0200
829b84c66a
⚙️ dinglehopper: Add PyCharm's vcs.xml to git
Gerber, Mike
2020-09-24 20:51:42 +0200
12da98e477
getLogger per method
Konstantin Baierer
2020-09-24 10:16:52 +0200
717801bdbb
Merge commit '7930ecd42868cb6785a58f8ee95b05882704621d'
Gerber, Mike
2020-09-03 14:47:44 +0200
7930ecd428
Merge branch 'master' of https://github.com/qurator-spk/dinglehopper
Gerber, Mike
2020-08-10 18:03:39 +0200
976a042b2b
🔧 dinglehopper: Add PyCharm code style config
Gerber, Mike
2020-08-10 18:03:00 +0200
7e3dafd3bc
🔧 dinglehopper: Add PyCharm code style config
Gerber, Mike
2020-08-10 18:03:00 +0200
2b98f69afe
Merge pull request #23 from kba/file-ids-and-such
Mike Gerber
2020-08-07 18:12:07 +0200
004ae298ca
ocrd cli: use make_file_id and assert_file_grp_cardinality
Konstantin Baierer
2020-08-07 17:51:23 +0200
79253c2640
Merge branch 'feat/display-segment-id' of https://github.com/qurator-spk/dinglehopper into feat/display-segment-id
Gerber, Mike
2020-06-26 17:52:39 +0200
5a3a74b246
Merge branch 'feat/display-segment-id' of github.com:qurator-spk/dinglehopper into feat/display-segment-id
Gerber, Mike
2020-06-23 17:02:56 +0200
6ab38f1bda
🎨 dinglehopper: Make PyCharm happier with the type hinting, newlines etc.
Gerber, Mike
2020-06-12 20:59:37 +0200
d484810038
✨ dinglehopper: Validate read segment ids
Gerber, Mike
2020-06-12 20:43:25 +0200
d39f74f11a
🧹 dinglehopper: Remove obsolete normalization-related FIXME
Gerber, Mike
2020-06-12 20:29:50 +0200
8c5f7c73d5
🧹 dinglehopper: Replace XXX with an actual comment
Gerber, Mike
2020-06-12 20:24:58 +0200
37edc0336f
🧹 dinglehopper: Remove obsolete XXX that has a GitHub issue
Gerber, Mike
2020-06-12 20:21:18 +0200
9f05e6ca4c
🧹 dinglehopper: Remove obsolete XXX about None ids
Gerber, Mike
2020-06-12 20:19:38 +0200
4469af62c8
🎨 dinglehopper: Unfuck substitutions a bit
Gerber, Mike
2020-06-12 20:05:33 +0200
079be203bd
🐛 dinglehopper: Fix tests to deal with new normalization logic
Gerber, Mike
2020-06-12 20:04:24 +0200
c010a7f05e
🧹 dinglehopper: Calculate segment ids once, on the first call
Gerber, Mike
2020-06-12 18:06:42 +0200
0cf7ff4721
🧹 dinglehopper: Remove obsolete XXX about the PAGE hierarchy
Gerber, Mike
2020-06-12 17:04:07 +0200
c432cb505a
🧹 dinglehopper: Clean up test_lines_similar()
Gerber, Mike
2020-06-12 17:01:56 +0200
0c33e84415
📓 dinglehopper: Document editops()
Gerber, Mike
2020-06-12 17:01:28 +0200
a61c935624
🧹 dinglehopper: Move Python 3.5 XXXs to a GitHub issue
Gerber, Mike
2020-06-12 16:08:56 +0200
257e4986cc
🚧 dinglehopper: Use a Bootstrap tooltip for the segment id
Gerber, Mike
2020-06-12 15:56:01 +0200
a320d5fd8f
🚧 dinglehopper: Re-introduce "substitute_equivalences" as Normalization.NFC_SBB
Gerber, Mike
2020-06-12 15:53:15 +0200
2579e0220c
🚧 dinglehopper: Remove debug output
Gerber, Mike
2020-06-12 14:25:11 +0200
d4e39d3d26
🚧 dinglehopper: Display segment id in the corresponding column
Gerber, Mike
2020-06-12 13:46:28 +0200
48ad340428
🚧 dinglehopper: Display segment id when hovering over a character difference
Gerber, Mike
2020-06-12 13:25:35 +0200
1f6538b44c
🚧 dinglehopper: Extract text while retaining segment id info
Gerber, Mike
2020-06-11 17:43:30 +0200
275ff32524
🚧 dinglehopper: Extract text while retaining segment id info
Gerber, Mike
2020-06-11 16:54:48 +0200
4e182e0794
🚧 dinglehopper: Extract text while retaining segment id info
Gerber, Mike
2020-06-11 15:37:34 +0200
9f8bb1d8ea
🚧 dinglehopper: Extract text while retaining segment id info
Gerber, Mike
2020-06-11 15:35:52 +0200
1083dcc5b9
🚧 dinglehopper: Test aligning by character while retaining segment id info
Gerber, Mike
2020-06-11 14:56:23 +0200
55db2b713f
🚧 dinglehopper: Test aligning by character while retaining segment id info
Gerber, Mike
2020-06-11 14:54:50 +0200
0d569e81c3
🚧 dinglehopper: Test aligning by character while retaining segment id info
Gerber, Mike
2020-06-11 14:50:32 +0200
167dad18f4
🚧 dinglehopper: Test aligning by character while retaining segment id info
Gerber, Mike
2020-06-11 13:54:46 +0200
4cd835ae51
🚧 dinglehopper: Test aligning by character while retaining segment id info
Gerber, Mike
2020-06-11 13:04:36 +0200
8435d88419
🚧 dinglehopper: WIP data structure for extracted text
Gerber, Mike
2020-06-10 20:31:54 +0200
534e042f9e
🚧 dinglehopper: WIP data structure for extracted text
Gerber, Mike
2020-06-10 20:29:01 +0200
89852314dc
🚧 dinglehopper: WIP data structure for extracted text
Gerber, Mike
2020-06-10 19:49:12 +0200
4bd30e6686
🚧 dinglehopper: WIP data structure for extracted text
Gerber, Mike
2020-06-10 19:40:57 +0200
bc630233d0
🚧 dinglehopper: WIP data structure for extracted text
Gerber, Mike
2020-06-10 19:36:49 +0200
2c69e077fe
🚧 dinglehopper: WIP data structure for extracted text
Gerber, Mike
2020-06-10 18:30:34 +0200
84c9e6a9c7
🚧 dinglehopper: WIP data structure for extracted text
Gerber, Mike
2020-06-10 18:29:11 +0200
c3709e2ec0
🧹 dinglehopper: Remove .vimrc again (security)
Gerber, Mike
2020-06-18 13:27:24 +0200
5aa74e8383
🎨 dinglehopper: Make PyCharm happier with the type hinting, newlines etc.
Gerber, Mike
2020-06-12 20:59:37 +0200
e972328e51
✨ dinglehopper: Validate read segment ids
Gerber, Mike
2020-06-12 20:43:25 +0200
c9109999db
🧹 dinglehopper: Remove obsolete normalization-related FIXME
Gerber, Mike
2020-06-12 20:29:50 +0200
bc006746dd
🧹 dinglehopper: Replace XXX with an actual comment
Gerber, Mike
2020-06-12 20:24:58 +0200
507ad6b6a4
🧹 dinglehopper: Remove obsolete XXX that has a GitHub issue
Gerber, Mike
2020-06-12 20:21:18 +0200
e0aa9bc3f4
🧹 dinglehopper: Remove obsolete XXX about None ids
Gerber, Mike
2020-06-12 20:19:38 +0200
6eb0a9350c
🎨 dinglehopper: Unfuck substitutions a bit
Gerber, Mike
2020-06-12 20:05:33 +0200
e3e7938162
🐛 dinglehopper: Fix tests to deal with new normalization logic
Gerber, Mike
2020-06-12 20:04:24 +0200
c3ae73d576
🧹 dinglehopper: Calculate segment ids once, on the first call
Gerber, Mike
2020-06-12 18:06:42 +0200
bc05f83088
🧹 dinglehopper: Remove obsolete XXX about the PAGE hierarchy
Gerber, Mike
2020-06-12 17:04:07 +0200
453247c2f3
🧹 dinglehopper: Clean up test_lines_similar()
Gerber, Mike
2020-06-12 17:01:56 +0200
dc85294380
📓 dinglehopper: Document editops()
Gerber, Mike
2020-06-12 17:01:28 +0200
e1c8546336
🧹 dinglehopper: Move Python 3.5 XXXs to a GitHub issue
Gerber, Mike
2020-06-12 16:08:56 +0200
4b86f01b15
🚧 dinglehopper: Use a Bootstrap tooltip for the segment id
Gerber, Mike
2020-06-12 15:56:01 +0200
a1c1b9c5ca
🚧 dinglehopper: Re-introduce "substitute_equivalences" as Normalization.NFC_SBB
Gerber, Mike
2020-06-12 15:53:15 +0200
28849c701b
🚧 dinglehopper: Remove debug output
Gerber, Mike
2020-06-12 14:25:11 +0200