Commit Graph

  • 0dd5fc0ee5 Small corrections Benjamin Rosemann 2020-11-23 09:18:22 +0100
  • b24d8d5664 Performance increases Benjamin Rosemann 2020-11-13 15:33:06 +0100
  • 0ef7810dd0 Reduce number of splits for short (one char) elements Benjamin Rosemann 2020-11-13 11:45:55 +0100
  • c9219cbacd Make sure that 0 cer and wer are reported Benjamin Rosemann 2020-11-13 09:01:33 +0100
  • fd6f57a263 Fix broken build on Python 3.5 Benjamin Rosemann 2020-11-13 08:54:21 +0100
  • cac437afbf Evaluate some performance issues Benjamin Rosemann 2020-11-12 18:38:16 +0100
  • 1bc7ef6c8b Correct report for fca Benjamin Rosemann 2020-11-12 16:23:04 +0100
  • 750ad00d1b Add tooltips to fca report Benjamin Rosemann 2020-11-11 17:21:56 +0100
  • 53064bf833 Include fca as parameter and add some tests Benjamin Rosemann 2020-11-11 11:14:44 +0100
  • 9b76539936 Fix numpy version conflict with ocrd_utils Benjamin Rosemann 2020-11-11 11:13:56 +0100
  • 26fe98dde7 Readd pytest.ini Benjamin Rosemann 2020-11-11 11:13:24 +0100
  • 4a87adc2c7 Implement version specific data structures Benjamin Rosemann 2020-11-10 17:18:09 +0100
  • 2a215a1062 Reformat using black Benjamin Rosemann 2020-11-10 14:26:31 +0100
  • 5277593bdb Fix some special cases Benjamin Rosemann 2020-11-10 12:33:49 +0100
  • d7a74fa58b First draft of flexible character accuracy Benjamin Rosemann 2020-11-09 17:29:40 +0100
  • 082e30822f Fix method return type Benjamin Rosemann 2020-11-19 11:24:38 +0100
  • e371da899e Switch from custom Levenshtein to python-Levenshtein Benjamin Rosemann 2020-11-16 12:06:44 +0100
  • 0e263cfac2 Switch between c and own implementation for distance and editops. Benjamin Rosemann 2020-11-16 09:48:54 +0100
  • 11916c2dcf Refactor tests in preparation of refactoring levenshtein. Benjamin Rosemann 2020-11-16 08:40:41 +0100
  • bd324331e6 🚧 dinglehopper: Try out Drone CI Gerber, Mike 2021-02-11 14:26:29 +0100
  • a59ecb795c 🚧 dinglehopper: Try out Drone CI Gerber, Mike 2021-02-11 14:15:08 +0100
  • 14230e073a 🚧 dinglehopper: Try out Drone CI Gerber, Mike 2021-02-11 14:08:25 +0100
  • 985666a71c 🚧 dinglehopper: Try out Drone CI Gerber, Mike 2021-02-10 20:35:22 +0100
  • 4a73053cfc 🚧 Replace Travis with CircleCI Gerber, Mike 2021-02-10 18:22:52 +0100
  • e3d4493c82 🚧 Replace Travis with CircleCI Gerber, Mike 2021-02-10 17:58:58 +0100
  • 27f4c3bdf8 🚧 Replace Travis with CircleCI Gerber, Mike 2021-02-10 17:57:08 +0100
  • 8533e6d421 🚧 Replace Travis with CircleCI Gerber, Mike 2021-02-10 17:55:09 +0100
  • e8da8b63f8 🚧 Replace Travis with CircleCI Gerber, Mike 2021-02-10 17:53:50 +0100
  • 3b7a1a5631 🚧 Replace Travis with CircleCI Gerber, Mike 2021-02-10 17:50:34 +0100
  • 691ce371ca Merge pull request #50 from b2m/fix-table-extraction Mike Gerber 2021-02-01 17:51:33 +0100
  • a68fc269d9 Fix the extraction of text from Page with TableRegion Benjamin Rosemann 2020-11-27 11:18:11 +0100
  • 8cd8314c8a 🐛 dinglehopper: Bump up ocrd req for zip_input_files Gerber, Mike 2020-11-19 18:59:20 +0100
  • 62670dd0c7 Merge pull request #49 from kba/zip_input_files Mike Gerber 2020-11-19 18:54:21 +0100
  • 74e0ac18ed ocrd cli: use core-provided zip_input_files method Konstantin Baierer 2020-11-19 16:00:28 +0100
  • 389e253c11 🐛 dinglehopper: Fix alto_extract_lines()'s type annotation Gerber, Mike 2020-11-12 19:32:38 +0100
  • fe3923a8af 🐛 dinglehopper: Fix alto_extract()'s type annotation Gerber, Mike 2020-11-12 19:19:05 +0100
  • 132f91d500 ✔️ dinglehopper: Add missing integration test markers Gerber, Mike 2020-11-12 19:10:23 +0100
  • c48d7646df 📝 dinglehopper: README-DEV: Massage markdown a bit Gerber, Mike 2020-11-12 19:04:38 +0100
  • fed021090d Merge pull request #46 from b2m/tool-changes Mike Gerber 2020-11-12 18:59:25 +0100
  • cb1ac9d260 Add black to developer requirements. Benjamin Rosemann 2020-11-10 13:09:06 +0100
  • 03ad413f4a Added some helpful tools and configurations Benjamin Rosemann 2020-11-10 12:56:08 +0100
  • 5cbd4f3d95 Preparation for black code formatter Benjamin Rosemann 2020-11-10 12:55:31 +0100
  • ce752e1912 Remove .idea folder and modify .gitignore Benjamin Rosemann 2020-11-10 12:45:13 +0100
  • 5270737c1f Skip test on windows because it is unix specific. Benjamin Rosemann 2020-10-28 14:48:38 +0100
  • 32a4b95a99 🐛 dinglehopper: Normalize in plain_extract() Gerber, Mike 2020-11-10 18:51:14 +0100
  • 14421c8e53 🎨 dinglehopper: Reformat using black Gerber, Mike 2020-11-10 12:29:55 +0100
  • 31c63f9e4c 🎨 dinglehopper: s/LOG/log Gerber, Mike 2020-11-09 16:55:43 +0100
  • 0804b029c4 Merge pull request #43 from bertsky/patch-1 Mike Gerber 2020-11-09 16:51:00 +0100
  • a60c14351e 1 more update for core's getLogger context Robert Sachunsky 2020-11-03 17:46:59 +0100
  • a51f0b3dcd Merge pull request #42 from b2m/test-python-cache-for-travis Mike Gerber 2020-10-30 12:35:20 +0100
  • b10af9f138 Test travis pip caching Benjamin Rosemann 2020-10-29 11:49:35 +0100
  • 089f6d299e Merge pull request #37 from b2m/fix-sort-with-none Mike Gerber 2020-10-29 15:05:46 +0100
  • 5138a1de21 Merge pull request #39 from b2m/test-python-3.9 Mike Gerber 2020-10-29 13:42:24 +0100
  • c02569b41e Fix f-strings for Python 3.5 Benjamin Rosemann 2020-10-29 12:33:54 +0100
  • 7b27b2834e More complex sorting for text extraction Benjamin Rosemann 2020-10-29 09:51:15 +0100
  • 6ff831dfd2 Sort textlines with missing indices Benjamin Rosemann 2020-10-27 12:33:37 +0100
  • e77f19fefc Add Python 3.9 to .travis.yml Benjamin Rosemann 2020-10-28 14:53:04 +0100
  • 082fc9e09a Merge pull request #38 from b2m/add-editorconfig Mike Gerber 2020-10-28 15:16:04 +0100
  • 20661487d6 Add .editorconfig Benjamin Rosemann 2020-10-28 11:31:18 +0100
  • 6e47acda1c 📝 dinglehopper: Move screenshot higher Gerber, Mike 2020-10-21 19:31:53 +0200
  • 5cbe148741 🐛 dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34) Gerber, Mike 2020-10-21 19:29:35 +0200
  • e4e2777cb7 🐛 dinglehopper: Do try to get text when no TextEquivs exist Gerber, Mike 2020-10-21 17:59:44 +0200
  • f14ae46870 Merge branch 'feat/text-extraction-levels' Gerber, Mike 2020-10-21 17:51:44 +0200
  • 1c88891a98 ✔️ Add test data for LAREX's indexed TextEquivs (unused) Gerber, Mike 2020-10-21 17:51:15 +0200
  • 19d15e3ecc 🐛 dinglehopper: Honor TextEquiv index (Closes GH-33) Gerber, Mike 2020-10-21 17:50:21 +0200
  • f626a2ebe6 🧹 dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder Gerber, Mike 2020-10-21 17:03:55 +0200
  • 0f3857d8d3 📝 Document OCR-D parameters and restructure README a bit Gerber, Mike 2020-10-21 16:51:53 +0200
  • 8b4ee20a40 Add a new CLI tool dinglehopper-extract to just give the extracted text Gerber, Mike 2020-10-21 16:30:48 +0200
  • b23b75b601 dinglehopper: Give segment ids from the extracted textequiv_level Gerber, Mike 2020-10-21 16:04:25 +0200
  • b23e4ce30e dinglehopper: Add OCR-D parameter to choose TextEquiv level Gerber, Mike 2020-10-21 14:38:15 +0200
  • 9744fa2567 dinglehopper: Add CLI option to choose TextEquiv level Gerber, Mike 2020-10-20 19:33:39 +0200
  • 75733039b8 🧹 dinglehopper: Do not hardcode joiner to \n Gerber, Mike 2020-10-20 18:43:56 +0200
  • 3848412349 dinglehopper: Implement the basic text extraction from PAGE TextLines Gerber, Mike 2020-10-20 18:40:07 +0200
  • f2367ac0c3 🐛 Fix OCR-D CLI for newest OCR-D Gerber, Mike 2020-10-16 14:58:17 +0200
  • 5ed184c8c4 dinglehopper: Show a progressbar on --progress Gerber, Mike 2020-10-15 16:09:17 +0200
  • 4951823a29 🧹 dinglehopper: Disable metrics in JSON report, too Gerber, Mike 2020-10-15 15:38:15 +0200
  • 5303eea80c 📝 dinglehopper: Update README to use OCR-D's new and more readable -P option Gerber, Mike 2020-10-15 15:37:51 +0200
  • 82217a25bb 🧹 dinglehopper: Move all normalization code to extracted_text.py Gerber, Mike 2020-10-08 17:29:25 +0200
  • 009fa55c09 Merge branch 'master' of https://github.com/qurator-spk/dinglehopper Gerber, Mike 2020-10-08 17:17:40 +0200
  • c20bbbfa25 📝 dinglehopper: Update screenshot to include a region id tooltip Gerber, Mike 2020-10-08 17:17:34 +0200
  • 252bf9b3e7 📝 dinglehopper: Fix markdown in README.md Mike Gerber 2020-10-08 17:14:29 +0200
  • c6c6b8efab 📝 dinglehopper: Add detail about the text extraction and ExtractedText Gerber, Mike 2020-10-08 17:05:36 +0200
  • 7025ea54a8 📝 dinglehopper: Move developer info to README-DEV.md Gerber, Mike 2020-10-08 16:59:50 +0200
  • f50591abac Merge branch 'feat/display-segment-id' Gerber, Mike 2020-10-08 13:39:38 +0200
  • c514abfb9f 🧹 dinglehopper: Sanitize imports Gerber, Mike 2020-10-08 13:33:19 +0200
  • 1077dc64ce ➡️ dinglehopper: Move ExtractedText to its own file Gerber, Mike 2020-10-08 13:25:20 +0200
  • 9dd4ff0aae dinglehopper: Extract line IDs for ALTO Gerber, Mike 2020-10-08 12:54:28 +0200
  • f3aafb6fdf dinglehopper: Validate ExtractedText.{segments,_text} in both directions Gerber, Mike 2020-10-08 12:20:27 +0200
  • 1f9a680fe7 ⚙️ dinglehopper: PyCharm should use dinglehopper-github virtualenv Gerber, Mike 2020-10-08 12:16:42 +0200
  • b14c35e147 🎨 dinglehopper: Use multimethod to handle str vs ExtractedText Gerber, Mike 2020-10-08 12:15:58 +0200
  • a17ee2afec 🚧 dinglehopper: Guarantee NFC + rename from_text → from_str Gerber, Mike 2020-10-08 11:25:01 +0200
  • 7843824eaf 🚧 dinglehopper: Support str & ExtractedText in CER and distance functions Gerber, Mike 2020-10-08 10:47:20 +0200
  • 5bee55c896 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments Gerber, Mike 2020-09-25 14:53:19 +0200
  • 96b55f1806 🚧 dinglehopper: Hierarchical text representation Gerber, Mike 2020-10-07 18:31:52 +0200
  • db6292611f 🧹 dinglehopper: Remove merged text extraction test code Gerber, Mike 2020-10-07 16:07:27 +0200
  • d706ef4621 📝 Document CER/WER and the format detection (Fixes GH-26) Gerber, Mike 2020-09-30 17:58:05 +0200
  • da47e41c85 💩 dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments Gerber, Mike 2020-09-25 14:53:19 +0200
  • 7085ee0fd8 Merge pull request #29 from kba/getlogger Mike Gerber 2020-09-25 13:20:58 +0200
  • 77154ef256 📝 dinglehopper: Document REPORT_PREFIX (Closes GH-27.) Gerber, Mike 2020-09-24 20:56:50 +0200
  • 829b84c66a ⚙️ dinglehopper: Add PyCharm's vcs.xml to git Gerber, Mike 2020-09-24 20:51:42 +0200