mirror of https://github.com/qurator-spk/dinglehopper.git synced 2025-10-24 15:04:14 +02:00
Commit graph

255 commits

Author SHA1 Message Date
Benjamin Rosemann
9e64c4f0d0 Remove obsolete test 2021-02-16 11:28:24 +01:00
Benjamin Rosemann
b9259b9d01 Add multiprocessing to flexible_character_accuracy 2021-02-16 11:28:24 +01:00
Benjamin Rosemann
c4f75d5264 Increase cache size for bad OCR results. 2021-02-16 11:28:24 +01:00
Benjamin Rosemann
84d34f5b26 Fix annoying logging exceptions and encoding errors. 2021-02-16 11:28:24 +01:00
Benjamin Rosemann
0dd5fc0ee5 Small corrections 2021-02-16 11:28:24 +01:00
Benjamin Rosemann
b24d8d5664 Performance increases
Temporarily switch to the C implementation of python-levenshtein for
editops calculation. Also added some variables, caching and type
changes for performance gains.
2021-02-16 11:28:24 +01:00
Benjamin Rosemann
0ef7810dd0 Reduce number of splits for short (one char) elements 2021-02-16 11:28:24 +01:00
Benjamin Rosemann
c9219cbacd Make sure that 0 cer and wer are reported 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
fd6f57a263 Fix broken build on Python 3.5 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
cac437afbf Evaluate some performance issues 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
1bc7ef6c8b Correct report for fca
As the fca implementation already knows the editing operations for each
segment we use a different sequence alignment method.
2021-02-16 11:28:23 +01:00
Benjamin Rosemann
750ad00d1b Add tooltips to fca report 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
53064bf833 Include fca as parameter and add some tests 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
9b76539936 Fix numpy version conflict with ocrd_utils 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
26fe98dde7 Readd pytest.ini 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
4a87adc2c7 Implement version specific data structures
As OCR-D continues to support Python 3.5 until the end of this year,
version-specific data structures have been implemented.

When the support for Python 3.5 is dropped the extra file can easily be
removed.
2021-02-16 11:28:23 +01:00
Benjamin Rosemann
2a215a1062 Reformat using black 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
5277593bdb Fix some special cases 2021-02-16 11:28:23 +01:00
Benjamin Rosemann
d7a74fa58b First draft of flexible character accuracy 2021-02-16 11:28:23 +01:00
bd324331e6 🚧 dinglehopper: Try out Drone CI
All checks were successful
continuous-integration/drone/push Build is passing
2021-02-11 14:26:29 +01:00
a59ecb795c 🚧 dinglehopper: Try out Drone CI
Some checks failed
continuous-integration/drone/push Build is failing
2021-02-11 14:15:08 +01:00
14230e073a 🚧 dinglehopper: Try out Drone CI 2021-02-11 14:08:25 +01:00
985666a71c 🚧 dinglehopper: Try out Drone CI 2021-02-10 20:35:22 +01:00
4a73053cfc 🚧 Replace Travis with CircleCI 2021-02-10 18:22:52 +01:00
e3d4493c82 🚧 Replace Travis with CircleCI 2021-02-10 17:58:58 +01:00
27f4c3bdf8 🚧 Replace Travis with CircleCI 2021-02-10 17:57:08 +01:00
8533e6d421 🚧 Replace Travis with CircleCI 2021-02-10 17:55:09 +01:00
e8da8b63f8 🚧 Replace Travis with CircleCI 2021-02-10 17:53:50 +01:00
3b7a1a5631 🚧 Replace Travis with CircleCI 2021-02-10 17:50:34 +01:00
691ce371ca
Merge pull request #50 from b2m/fix-table-extraction
Fix the extraction of text from Page with TableRegion
2021-02-01 17:51:33 +01:00
Benjamin Rosemann
a68fc269d9 Fix the extraction of text from Page with TableRegion
Dinglehopper did not consider `OrderedGroupIndex` in the `ReadingOrder`
element when extracting text regions. As a consequence a `TableRegion`
was not considered for text extraction.
2020-11-27 11:18:11 +01:00
8cd8314c8a 🐛 dinglehopper: Bump up ocrd req for zip_input_files
See also GH-49.
2020-11-19 18:59:47 +01:00
62670dd0c7
Merge pull request #49 from kba/zip_input_files
ocrd cli: use core-provided zip_input_files method
2020-11-19 18:54:21 +01:00
Konstantin Baierer
74e0ac18ed ocrd cli: use core-provided zip_input_files method 2020-11-19 16:00:28 +01:00
389e253c11 🐛 dinglehopper: Fix alto_extract_lines()'s type annotation 2020-11-12 19:32:38 +01:00
fe3923a8af 🐛 dinglehopper: Fix alto_extract()'s type annotation 2020-11-12 19:19:05 +01:00
132f91d500 ✔️ dinglehopper: Add missing integration test markers 2020-11-12 19:10:23 +01:00
c48d7646df 📝 dinglehopper: README-DEV: Massage markdown a bit 2020-11-12 19:05:14 +01:00
fed021090d
Merge pull request #46 from b2m/tool-changes
Tool changes
2020-11-12 18:59:25 +01:00
Benjamin Rosemann
cb1ac9d260 Add black to developer requirements. 2020-11-11 11:36:17 +01:00
Benjamin Rosemann
03ad413f4a Added some helpful tools and configurations 2020-11-11 11:36:17 +01:00
Benjamin Rosemann
5cbd4f3d95 Preparation for black code formatter 2020-11-11 11:36:17 +01:00
Benjamin Rosemann
ce752e1912 Remove .idea folder and modify .gitignore
Sharing even parts of the .idea folder in a worldwide setting is bound to
generate more problems than solutions. Therefore it should be removed
and consequently ignored in .gitignore.

Also adds some Python specific stuff to the .gitignore file.
2020-11-11 11:36:17 +01:00
Benjamin Rosemann
5270737c1f Skip test on windows because it is unix specific. 2020-11-11 11:36:17 +01:00
32a4b95a99 🐛 dinglehopper: Normalize in plain_extract() 2020-11-10 18:51:14 +01:00
14421c8e53 🎨 dinglehopper: Reformat using black 2020-11-10 12:29:55 +01:00
31c63f9e4c 🎨 dinglehopper: s/LOG/log 2020-11-09 16:55:43 +01:00
0804b029c4
Merge pull request #43 from bertsky/patch-1
1 more update for core's getLogger context
2020-11-09 16:51:00 +01:00
Robert Sachunsky
a60c14351e
1 more update for core's getLogger context 2020-11-03 17:46:59 +01:00
a51f0b3dcd
Merge pull request #42 from b2m/test-python-cache-for-travis
Add travis pip caching
2020-10-30 12:35:20 +01:00