Commit Graph

257 Commits (675a096dfe52ea5ba248323f596f1c5152148dca)
 

Author SHA1 Message Date
Benjamin Rosemann 675a096dfe Remove restrictions on numpy
Benjamin Rosemann 85b784f9a1 Fix problem with json encoding
Benjamin Rosemann 9e64c4f0d0 Remove obsolete test
Benjamin Rosemann b9259b9d01 Add multiprocessing to flexible_character_accuracy
Benjamin Rosemann c4f75d5264 Increase cache size for bad OCR results.
Benjamin Rosemann 84d34f5b26 Fix annoying logging exceptions and encoding errors.
Benjamin Rosemann 0dd5fc0ee5 Small corrections
Benjamin Rosemann b24d8d5664 Performance increases
Temporarily switch to the c-implementation of python-levenshtein for
editops calculation. Also added some variables, caching and type
changes for performance gains.
Benjamin Rosemann 0ef7810dd0 Reduce number of splits for short (one char) elements
Benjamin Rosemann c9219cbacd Make sure that 0 cer and wer are reported
Benjamin Rosemann fd6f57a263 Fix broken build on Python 3.5
Benjamin Rosemann cac437afbf Evaluate some performance issues
Benjamin Rosemann 1bc7ef6c8b Correct report for fca
As the fca implementation already knows the editing operations for each
segment, we use a different sequence alignment method.
Benjamin Rosemann 750ad00d1b Add tooltips to fca report
Benjamin Rosemann 53064bf833 Include fca as parameter and add some tests
Benjamin Rosemann 9b76539936 Fix numpy version conflict with ocrd_utils
Benjamin Rosemann 26fe98dde7 Re-add pytest.ini
Benjamin Rosemann 4a87adc2c7 Implement version specific data structures
As OCR-D continues to support Python 3.5 until the end of this year,
version-specific data structures have been implemented.

Once support for Python 3.5 is dropped, the extra file can easily be
removed.
Benjamin Rosemann 2a215a1062 Reformat using black
Benjamin Rosemann 5277593bdb Fix some special cases
Benjamin Rosemann d7a74fa58b First draft of flexible character accuracy
Gerber, Mike bd324331e6 🚧 dinglehopper: Try out Drone CI
Gerber, Mike a59ecb795c 🚧 dinglehopper: Try out Drone CI
Gerber, Mike 14230e073a 🚧 dinglehopper: Try out Drone CI
Gerber, Mike 985666a71c 🚧 dinglehopper: Try out Drone CI
Gerber, Mike 4a73053cfc 🚧 Replace Travis with CircleCI
Gerber, Mike e3d4493c82 🚧 Replace Travis with CircleCI
Gerber, Mike 27f4c3bdf8 🚧 Replace Travis with CircleCI
Gerber, Mike 8533e6d421 🚧 Replace Travis with CircleCI
Gerber, Mike e8da8b63f8 🚧 Replace Travis with CircleCI
Gerber, Mike 3b7a1a5631 🚧 Replace Travis with CircleCI
Mike Gerber 691ce371ca
Merge pull request from b2m/fix-table-extraction
Fix the extraction of text from Page with TableRegion
Benjamin Rosemann a68fc269d9 Fix the extraction of text from Page with TableRegion
Dinglehopper did not consider `OrderedGroupIndex` in the `ReadingOrder`
element when extracting text regions. As a consequence a `TableRegion`
was not considered for text extraction.
Gerber, Mike 8cd8314c8a 🐛 dinglehopper: Bump up ocrd req for zip_input_files
See also GH-49.
Mike Gerber 62670dd0c7
Merge pull request from kba/zip_input_files
ocrd cli: use core-provided zip_input_files method
Konstantin Baierer 74e0ac18ed ocrd cli: use core-provided zip_input_files method
Gerber, Mike 389e253c11 🐛 dinglehopper: Fix alto_extract_lines()'s type annotation
Gerber, Mike fe3923a8af 🐛 dinglehopper: Fix alto_extract()'s type annotation
Gerber, Mike 132f91d500 ✔️ dinglehopper: Add missing integration test markers
Gerber, Mike c48d7646df 📝 dinglehopper: README-DEV: Massage markdown a bit
Mike Gerber fed021090d
Merge pull request from b2m/tool-changes
Tool changes
Benjamin Rosemann cb1ac9d260 Add black to developer requirements.
Benjamin Rosemann 03ad413f4a Added some helpful tools and configurations
Benjamin Rosemann 5cbd4f3d95 Preparation for black code formatter
Benjamin Rosemann ce752e1912 Remove .idea folder and modify .gitignore
Sharing even parts of the .idea folder in a worldwide setting is bound to
generate more problems than solutions. Therefore it should be removed
and consequently ignored in .gitignore.

Also adds some Python-specific entries to the .gitignore file.
Benjamin Rosemann 5270737c1f Skip test on Windows because it is Unix-specific.
Gerber, Mike 32a4b95a99 🐛 dinglehopper: Normalize in plain_extract()
Gerber, Mike 14421c8e53 🎨 dinglehopper: Reformat using black
Gerber, Mike 31c63f9e4c 🎨 dinglehopper: s/LOG/log
Mike Gerber 0804b029c4
Merge pull request from bertsky/patch-1
1 more update for core's getLogger context