-
0dd5fc0ee5
Small corrections
Benjamin Rosemann
2020-11-23 09:18:22 +0100
-
b24d8d5664
Performance increases
Benjamin Rosemann
2020-11-13 15:33:06 +0100
-
0ef7810dd0
Reduce number of splits for short (one char) elements
Benjamin Rosemann
2020-11-13 11:45:55 +0100
-
c9219cbacd
Make sure that 0 cer and wer are reported
Benjamin Rosemann
2020-11-13 09:01:33 +0100
-
fd6f57a263
Fix broken build on Python 3.5
Benjamin Rosemann
2020-11-13 08:54:21 +0100
-
cac437afbf
Evaluate some performance issues
Benjamin Rosemann
2020-11-12 18:38:16 +0100
-
1bc7ef6c8b
Correct report for fca
Benjamin Rosemann
2020-11-12 16:23:04 +0100
-
750ad00d1b
Add tooltips to fca report
Benjamin Rosemann
2020-11-11 17:21:56 +0100
-
53064bf833
Include fca as parameter and add some tests
Benjamin Rosemann
2020-11-11 11:14:44 +0100
-
9b76539936
Fix numpy version conflict with ocrd_utils
Benjamin Rosemann
2020-11-11 11:13:56 +0100
-
26fe98dde7
Readd pytest.ini
Benjamin Rosemann
2020-11-11 11:13:24 +0100
-
4a87adc2c7
Implement version specific data structures
Benjamin Rosemann
2020-11-10 17:18:09 +0100
-
2a215a1062
Reformat using black
Benjamin Rosemann
2020-11-10 14:26:31 +0100
-
5277593bdb
Fix some special cases
Benjamin Rosemann
2020-11-10 12:33:49 +0100
-
d7a74fa58b
First draft of flexible character accuracy
Benjamin Rosemann
2020-11-09 17:29:40 +0100
-
-
082e30822f
Fix method return type
Benjamin Rosemann
2020-11-19 11:24:38 +0100
-
e371da899e
Switch from custom Levenshtein to python-Levenshtein
Benjamin Rosemann
2020-11-16 12:06:44 +0100
-
0e263cfac2
Switch between c and own implementation for distance and editops.
Benjamin Rosemann
2020-11-16 09:48:54 +0100
-
11916c2dcf
Refactor tests in preparation of refactoring levenshtein.
Benjamin Rosemann
2020-11-16 08:40:41 +0100
-
-
bd324331e6
🚧 dinglehopper: Try out Drone CI
Gerber, Mike
2021-02-11 14:26:29 +0100
-
a59ecb795c
🚧 dinglehopper: Try out Drone CI
Gerber, Mike
2021-02-11 14:15:08 +0100
-
14230e073a
🚧 dinglehopper: Try out Drone CI
Gerber, Mike
2021-02-11 14:08:25 +0100
-
985666a71c
🚧 dinglehopper: Try out Drone CI
Gerber, Mike
2021-02-10 20:35:22 +0100
-
4a73053cfc
🚧 Replace Travis with CircleCI
Gerber, Mike
2021-02-10 18:22:52 +0100
-
e3d4493c82
🚧 Replace Travis with CircleCI
Gerber, Mike
2021-02-10 17:58:58 +0100
-
27f4c3bdf8
🚧 Replace Travis with CircleCI
Gerber, Mike
2021-02-10 17:57:08 +0100
-
8533e6d421
🚧 Replace Travis with CircleCI
Gerber, Mike
2021-02-10 17:55:09 +0100
-
e8da8b63f8
🚧 Replace Travis with CircleCI
Gerber, Mike
2021-02-10 17:53:50 +0100
-
3b7a1a5631
🚧 Replace Travis with CircleCI
Gerber, Mike
2021-02-10 17:50:34 +0100
-
691ce371ca
Merge pull request #50 from b2m/fix-table-extraction
Mike Gerber
2021-02-01 17:51:33 +0100
-
-
a68fc269d9
Fix the extraction of text from Page with TableRegion
Benjamin Rosemann
2020-11-27 11:18:11 +0100
-
-
8cd8314c8a
🐛 dinglehopper: Bump up ocrd req for zip_input_files
Gerber, Mike
2020-11-19 18:59:20 +0100
-
62670dd0c7
Merge pull request #49 from kba/zip_input_files
Mike Gerber
2020-11-19 18:54:21 +0100
-
-
74e0ac18ed
ocrd cli: use core-provided zip_input_files method
Konstantin Baierer
2020-11-19 16:00:28 +0100
-
-
389e253c11
🐛 dinglehopper: Fix alto_extract_lines()'s type annotation
Gerber, Mike
2020-11-12 19:32:38 +0100
-
fe3923a8af
🐛 dinglehopper: Fix alto_extract()'s type annotation
Gerber, Mike
2020-11-12 19:19:05 +0100
-
132f91d500
✔️ dinglehopper: Add missing integration test markers
Gerber, Mike
2020-11-12 19:10:23 +0100
-
c48d7646df
📝 dinglehopper: README-DEV: Massage markdown a bit
Gerber, Mike
2020-11-12 19:04:38 +0100
-
fed021090d
Merge pull request #46 from b2m/tool-changes
Mike Gerber
2020-11-12 18:59:25 +0100
-
-
cb1ac9d260
Add black to developer requirements.
Benjamin Rosemann
2020-11-10 13:09:06 +0100
-
03ad413f4a
Added some helpful tools and configurations
Benjamin Rosemann
2020-11-10 12:56:08 +0100
-
5cbd4f3d95
Preparation for black code formatter
Benjamin Rosemann
2020-11-10 12:55:31 +0100
-
ce752e1912
Remove .idea folder and modify .gitignore
Benjamin Rosemann
2020-11-10 12:45:13 +0100
-
5270737c1f
Skip test on windows because it is unix specific.
Benjamin Rosemann
2020-10-28 14:48:38 +0100
-
-
32a4b95a99
🐛 dinglehopper: Normalize in plain_extract()
Gerber, Mike
2020-11-10 18:51:14 +0100
-
14421c8e53
🎨 dinglehopper: Reformat using black
Gerber, Mike
2020-11-10 12:29:55 +0100
-
31c63f9e4c
🎨 dinglehopper: s/LOG/log
Gerber, Mike
2020-11-09 16:55:43 +0100
-
0804b029c4
Merge pull request #43 from bertsky/patch-1
Mike Gerber
2020-11-09 16:51:00 +0100
-
-
a60c14351e
1 more update for core's getLogger context
Robert Sachunsky
2020-11-03 17:46:59 +0100
-
-
a51f0b3dcd
Merge pull request #42 from b2m/test-python-cache-for-travis
Mike Gerber
2020-10-30 12:35:20 +0100
-
-
b10af9f138
Test travis pip caching
Benjamin Rosemann
2020-10-29 11:49:35 +0100
-
-
089f6d299e
Merge pull request #37 from b2m/fix-sort-with-none
Mike Gerber
2020-10-29 15:05:46 +0100
-
-
5138a1de21
Merge pull request #39 from b2m/test-python-3.9
Mike Gerber
2020-10-29 13:42:24 +0100
-
-
c02569b41e
Fix f-strings for Python 3.5
Benjamin Rosemann
2020-10-29 12:33:54 +0100
-
7b27b2834e
More complex sorting for text extraction
Benjamin Rosemann
2020-10-29 09:51:15 +0100
-
6ff831dfd2
Sort textlines with missing indices
Benjamin Rosemann
2020-10-27 12:33:37 +0100
-
-
-
e77f19fefc
Add Python 3.9 to .travis.yml
Benjamin Rosemann
2020-10-28 14:53:04 +0100
-
-
082fc9e09a
Merge pull request #38 from b2m/add-editorconfig
Mike Gerber
2020-10-28 15:16:04 +0100
-
-
20661487d6
Add .editorconfig
Benjamin Rosemann
2020-10-28 11:31:18 +0100
-
-
6e47acda1c
📝 dinglehopper: Move screenshot higher
Gerber, Mike
2020-10-21 19:31:53 +0200
-
5cbe148741
🐛 dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34)
Gerber, Mike
2020-10-21 19:29:35 +0200
-
e4e2777cb7
🐛 dinglehopper: Do try to get text when no TextEquivs exist
Gerber, Mike
2020-10-21 17:59:44 +0200
-
f14ae46870
Merge branch 'feat/text-extraction-levels'
Gerber, Mike
2020-10-21 17:51:44 +0200
-
-
1c88891a98
✔️ Add test data for LAREX's indexed TextEquivs (unused)
Gerber, Mike
2020-10-21 17:51:15 +0200
-
19d15e3ecc
🐛 dinglehopper: Honor TextEquiv index (Closes GH-33)
Gerber, Mike
2020-10-21 17:50:21 +0200
-
f626a2ebe6
🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder
Gerber, Mike
2020-10-21 17:03:55 +0200
-
0f3857d8d3
📝 Document OCR-D parameters and restructure README a bit
Gerber, Mike
2020-10-21 16:51:53 +0200
-
8b4ee20a40
✨ Add a new CLI tool dinglehopper-extract to just give the extracted text
Gerber, Mike
2020-10-21 16:30:48 +0200
-
b23b75b601
✨ dinglehopper: Give segment ids from the extracted textequiv_level
Gerber, Mike
2020-10-21 16:04:25 +0200
-
b23e4ce30e
✨ dinglehopper: Add OCR-D parameter to choose TextEquiv level
Gerber, Mike
2020-10-21 14:38:15 +0200
-
9744fa2567
✨ dinglehopper: Add CLI option to choose TextEquiv level
Gerber, Mike
2020-10-20 19:33:39 +0200
-
75733039b8
🧹 dinglehopper: Do not hardcode joiner to \n
Gerber, Mike
2020-10-20 18:43:56 +0200
-
3848412349
✨ dinglehopper: Implement the basic text extraction from PAGE TextLines
Gerber, Mike
2020-10-20 18:40:07 +0200
-
-
f2367ac0c3
🐛 Fix OCR-D CLI for newest OCR-D
Gerber, Mike
2020-10-16 14:58:17 +0200
-
5ed184c8c4
✨ dinglehopper: Show a progressbar on --progress
Gerber, Mike
2020-10-15 16:09:17 +0200
-
4951823a29
🧹 dinglehopper: Disable metrics in JSON report, too
Gerber, Mike
2020-10-15 15:38:15 +0200
-
5303eea80c
📝 dinglehopper: Update README to use OCR-D's new and more readable -P option
Gerber, Mike
2020-10-15 15:37:51 +0200
-
82217a25bb
🧹 dinglehopper: Move all normalization code to extracted_text.py
Gerber, Mike
2020-10-08 17:29:25 +0200
-
009fa55c09
Merge branch 'master' of https://github.com/qurator-spk/dinglehopper
Gerber, Mike
2020-10-08 17:17:40 +0200
-
-
c20bbbfa25
📝 dinglehopper: Update screenshot to include a region id tooltip
Gerber, Mike
2020-10-08 17:17:34 +0200
-
252bf9b3e7
📝 dinglehopper: Fix markdown in README.md
Mike Gerber
2020-10-08 17:14:29 +0200
-
-
c6c6b8efab
📝 dinglehopper: Add detail about the text extraction and ExtractedText
Gerber, Mike
2020-10-08 17:05:36 +0200
-
7025ea54a8
📝 dinglehopper: Move developer info to README-DEV.md
Gerber, Mike
2020-10-08 16:59:50 +0200
-
f50591abac
Merge branch 'feat/display-segment-id'
Gerber, Mike
2020-10-08 13:39:38 +0200
-
-
c514abfb9f
🧹 dinglehopper: Sanitize imports
Gerber, Mike
2020-10-08 13:33:19 +0200
-
1077dc64ce
➡️ dinglehopper: Move ExtractedText to its own file
Gerber, Mike
2020-10-08 13:25:20 +0200
-
9dd4ff0aae
✨ dinglehopper: Extract line IDs for ALTO
Gerber, Mike
2020-10-08 12:54:28 +0200
-
f3aafb6fdf
✨ dinglehopper: Validate ExtractedText.{segments,_text} in both directions
Gerber, Mike
2020-10-08 12:20:27 +0200
-
1f9a680fe7
⚙️ dinglehopper: PyCharm should use dinglehopper-github virtualenv
Gerber, Mike
2020-10-08 12:16:42 +0200
-
b14c35e147
🎨 dinglehopper: Use multimethod to handle str vs ExtractedText
Gerber, Mike
2020-10-08 12:15:58 +0200
-
a17ee2afec
🚧 dinglehopper: Guarantee NFC + rename from_text → from_str
Gerber, Mike
2020-10-08 11:25:01 +0200
-
7843824eaf
🚧 dinglehopper: Support str & ExtractedText in CER and distance functions
Gerber, Mike
2020-10-08 10:47:20 +0200
-
5bee55c896
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
Gerber, Mike
2020-09-25 14:53:19 +0200
-
96b55f1806
🚧 dinglehopper: Hierarchical text representation
Gerber, Mike
2020-10-07 18:31:52 +0200
-
db6292611f
🧹 dinglehopper: Remove merged text extraction test code
Gerber, Mike
2020-10-07 16:07:27 +0200
-
d706ef4621
📝 Document CER/WER and the format detection (Fixes GH-26)
Gerber, Mike
2020-09-30 17:58:05 +0200
-
da47e41c85
💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments
Gerber, Mike
2020-09-25 14:53:19 +0200
-
7085ee0fd8
Merge pull request #29 from kba/getlogger
Mike Gerber
2020-09-25 13:20:58 +0200
-
-
77154ef256
📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.)
Gerber, Mike
2020-09-24 20:56:50 +0200
-
829b84c66a
⚙️ dinglehopper: Add PyCharm's vcs.xml to git
Gerber, Mike
2020-09-24 20:51:42 +0200