Commit Graph

  • 69325facf2 🐛 Detect encoding (incl BOM) when reading files Mike Gerber 2023-08-03 17:48:13 +0200
  • 325e5af5f5 🐛 Move source into src/ to fix install Mike Gerber 2023-08-03 17:29:28 +0200
  • db7c051b22 ⚙ Migrate to pyproject.toml Mike Gerber 2023-08-02 20:55:47 +0200
  • fc81233a0e 🚧 CircleCI: Run black Mike Gerber 2023-07-18 20:41:16 +0200
  • cb0134d2db 🚧 CircleCI: Run black Mike Gerber 2023-07-18 20:40:17 +0200
  • 55d534b981 🚧 CircleCI: Run black Mike Gerber 2023-07-18 20:37:47 +0200
  • 2632cb09b8 🚧 CircleCI: Run black Mike Gerber 2023-07-18 20:28:55 +0200
  • 35be58cb94 Merge pull request #83 from INL/feat/batch-processing Mike Gerber 2023-05-26 15:28:36 +0200
  • 6d3a8cecd2 Merge pull request #82 from CircleCI-config-suggestions-bot/StoreTestResults Mike Gerber 2023-05-24 18:50:40 +0200
  • 207804e6a6 Add batch processing and report summaries Ruud de Jong 2023-05-12 09:55:00 +0200
  • 89814cbe4b Upload test results to CircleCI CircleCI Config Suggestions Bot 2023-05-05 14:21:14 -0400
  • dd9303b429 🧹 .gitignore .python-version (for pyenv) neingeist 2023-04-20 20:15:44 +0200
  • f1fc3f1880 🧹 Remove qurator. namespace prefix Mike Gerber 2023-03-27 18:25:39 +0200
  • f668963a2e 🐛 Fix installing by calling find_namespace_packages in setup.py Mike Gerber 2023-03-27 14:34:52 +0200
  • c4ab7c9a7c 🕸Do not use deprecated ID, pageId options Mike Gerber 2023-03-14 13:16:09 +0100
  • b4ac24ac9d 🔧 Remove explicit namespace_packages Mike Gerber 2023-03-14 12:59:10 +0100
  • 2a090c9b5a ✔ CircleCI: Explicitly install binary opencv-python-headless (dep of OCR-D?) to avoid compilation Mike Gerber 2023-03-14 12:49:02 +0100
  • 833efa37da 🐛 Remove deprecated declare_namespace call Mike Gerber 2023-03-14 12:44:22 +0100
  • 0fd4ea1973 ✔ Add @cneud's former 40 GB problem files to the test suite Gerber, Mike 2023-03-02 16:24:08 +0100
  • 0f0819512e 🎨 Reformat using Black Gerber, Mike 2023-03-02 10:22:51 +0100
  • 2268f32a78 ✔ CircleCI: Test on Python 3.11 Gerber, Mike 2023-03-02 10:06:00 +0100
  • d07bd5ecc6 add version to ocrd-tool.json (and setup.py) Konstantin Baierer 2023-02-28 17:14:04 +0100
  • a18b25b163 🐛 Update tests for ExtractedText Gerber, Mike 2023-01-27 19:13:45 +0100
  • f48e305347 use uniseg again Max Bachmann 2022-10-12 18:52:58 +0200
  • d2bbc8a6c7 update rapidfuzz version Max Bachmann 2022-09-11 02:38:32 +0200
  • a1f0a5e2d3 replace uniseg with uniseg2 Max Bachmann 2022-08-29 22:08:25 +0200
  • 22c3817f45 apply black Max Bachmann 2022-08-29 01:50:19 +0200
  • 01571f23b7 move grapheme clusters to ExtractedText Max Bachmann 2022-08-29 01:49:04 +0200
  • f211d09f56 remove python2.7 futures Max Bachmann 2022-08-29 00:50:33 +0200
  • 205a969c0e remove unused includes Max Bachmann 2022-08-29 00:48:40 +0200
  • f3825cdeb6 only call `words_normalized` once Max Bachmann 2022-08-29 00:22:23 +0200
  • dcc10c5389 ✔️ Skip test_lines_similar() for now Gerber, Mike 2022-08-18 15:51:13 +0200
  • 555f586775 📝 Note that old terminals might not render the Unicode characters correctly Gerber, Mike 2022-08-17 17:59:15 +0200
  • c4e85da5ab 🐛 Update editops() and seq_align() due to RapidFuzz API changes Gerber, Mike 2022-08-17 17:55:44 +0200
  • 15dfbac3a7 Revert "Revert "Merge pull request #67 from maxbachmann/rapidfuzz"" Gerber, Mike 2022-08-17 11:42:19 +0200
  • ede9402a6c Revert "💩 Stick with rapidfuzz < 2.1.0 for now" Gerber, Mike 2022-08-17 11:42:07 +0200
  • 0e153db9ca 💩 Stick with rapidfuzz < 2.1.0 for now Gerber, Mike 2022-08-16 19:34:48 +0200
  • 76bd50f1db Revert "Merge pull request #67 from maxbachmann/rapidfuzz" Gerber, Mike 2022-08-16 19:31:28 +0200
  • 85f751aacc Merge pull request #67 from maxbachmann/rapidfuzz Mike Gerber 2022-08-16 16:35:54 +0200
  • e543438496 replace usage of deprecated rapidfuzz APIs Max Bachmann 2022-08-07 10:40:31 +0200
  • 1febea8c92 Merge pull request #66 from stweil/master Mike Gerber 2022-03-30 13:40:36 +0200
  • 101f50ec88 Ignore Python build artifacts Stefan Weil 2022-03-24 16:51:37 +0100
  • edc24cd4db ✔️ DroneCI: Build on Python 3.6 → 3.10 Gerber, Mike 2022-03-03 16:35:26 +0100
  • d726396002 👷🏾‍♂️ Remove str() on Path objects Gerber, Mike 2022-03-02 11:19:40 +0100
  • a19224dc46 ✔️ CircleCI: Stop testing using Python 3.5 Gerber, Mike 2022-02-28 14:46:34 +0100
  • 76bacc0f15 🐛 Bump rapidfuzz dep to >= 2.0.5 (Fixes gh-65) Gerber, Mike 2022-02-28 14:35:54 +0100
  • 195354c6d4 Merge branch 'feat/compare-line-texts' Gerber, Mike 2022-01-24 18:46:33 +0100
  • 8a3f5e48c2 🐛 dinglehopper: Patch word_break only once Gerber, Mike 2022-01-24 18:44:30 +0100
  • b6bde2b7ec 📝 dinglehopper: Document dinglehopper-line-dirs in the README Gerber, Mike 2021-12-15 11:16:40 +0100
  • f77ce857b2 🚧 dinglehopper: Sahre json_float code Gerber, Mike 2021-12-14 18:37:07 +0100
  • 5b394649a7 🚧 dinglehopper: Compute WER in line-dirs CLI Gerber, Mike 2021-12-14 18:33:20 +0100
  • cb2be96179 🚧 dinglehopper: Add word differences in line-dirs report Gerber, Mike 2021-12-14 18:20:04 +0100
  • dbb660615a 🚧 dinglehopper: Compare line text directories (WIP) Gerber, Mike 2021-12-13 20:02:18 +0100
  • a018006f98 🚧 dinglehopper: Compare line text directories (WIP) Gerber, Mike 2021-12-13 19:32:55 +0100
  • 36b36f6986 🚧 dinglehopper: Compare line text directories (WIP) Gerber, Mike 2021-12-13 19:26:21 +0100
  • f0f3cd2d96 ⬆️ dinglehopper: Require rapidfuzz >= 1.9.1 Gerber, Mike 2021-12-14 11:35:57 +0100
  • a5c9c7438f 💩 ocrd-galley: Work around OCR-D/core#730 Gerber, Mike 2021-11-05 17:05:54 +0100
  • 7d26b049d1 Merge branch 'fix/ci-py310' Gerber, Mike 2021-10-26 13:28:57 +0200
  • 51a44895dc ⬆️ CircleCI: Add Python 3.10 Gerber, Mike 2021-10-26 13:24:50 +0200
  • 1f8fa5176f Revert "⬆️ CircleCI: Add Python 3.10" Gerber, Mike 2021-10-23 15:22:57 +0200
  • b2b21839c2 ⬆️ CircleCI: Add Python 3.10 Gerber, Mike 2021-10-22 18:41:47 +0200
  • 7d85e21cbc ⬆️ CircleCI: Switch to the new cimg/python image Gerber, Mike 2021-10-22 18:39:54 +0200
  • dea0c53f88 Merge branch 'rapidfuzz' Gerber, Mike 2021-10-22 18:19:58 +0200
  • 06ea38449c 📝 dinglehopper: Update Levenshtein notebook Gerber, Mike 2021-10-22 16:58:40 +0200
  • 3ee688001a 🧹 dinglehopper: Directly import levenshtein() from rapidfuzz Gerber, Mike 2021-10-22 16:30:21 +0200
  • 5d496df267 dinglehopper: Remove tests that only test rapidfuzz's levenshtein() Gerber, Mike 2021-10-22 16:26:55 +0200
  • 091f069b3c dinglehopper: Remove tests that only test rapidfuzz's levenshtein_ops() Gerber, Mike 2021-10-22 16:21:16 +0200
  • af8da1d716 dinglehopper: Use rapidfuzz for editops Gerber, Mike 2021-10-22 15:38:59 +0200
  • 9f8f88df1f Reintroduce tooltips in report. Benjamin Rosemann 2021-06-15 08:58:56 +0200
  • 12dcdb81da Add metrics parameter to integration test Benjamin Rosemann 2021-06-14 17:08:02 +0200
  • 7642a53091 Allow disabling the html report. Benjamin Rosemann 2021-06-14 16:25:31 +0200
  • e8ccffb275 Updated reports and dependencies. Benjamin Rosemann 2021-06-14 15:52:14 +0200
  • 40f23b8482 Added comments Benjamin Rosemann 2021-06-14 12:29:34 +0200
  • cee7b6891b Fix CI Build Benjamin Rosemann 2021-06-12 09:43:02 +0200
  • 714b569195 Fixed some flake8 and mypy issues. Benjamin Rosemann 2021-06-11 16:09:19 +0200
  • a44a3d4bf2 Error handling Benjamin Rosemann 2021-06-11 15:33:13 +0200
  • 06468a436e Implemented new metrics behaviour Benjamin Rosemann 2021-06-11 15:08:45 +0200
  • 9f5112f8f6 Remove support for ExtractedText for bag metrics. Benjamin Rosemann 2021-06-11 10:23:26 +0200
  • 381fe7cb6b Switch to result tuple instead of multiple return parameters Benjamin Rosemann 2021-06-11 10:21:23 +0200
  • 974ca3e5c0 Split html and json report generation Benjamin Rosemann 2021-06-11 09:35:26 +0200
  • 8cd624f795 Add BoC and BoW metric Benjamin Rosemann 2021-06-08 17:41:44 +0200
  • 4ccae9432d Move metrics into separate package Benjamin Rosemann 2021-05-27 16:37:34 +0200
  • 45465f8d13 Remove restriction on Python 3.5 Benjamin Rosemann 2021-05-27 16:26:02 +0200
  • 249787686f Merge branch 'master' of github.com:qurator-spk/dinglehopper Gerber, Mike 2021-05-20 09:42:15 +0200
  • 2a6cc5823e 🐛 dinglehopper: Call initLogging before logging Gerber, Mike 2021-05-20 09:39:09 +0200
  • 675a096dfe Remove restrictions on numpy Benjamin Rosemann 2021-05-19 15:02:49 +0200
  • 0b9af3a21e Merge pull request #58 from kba/unorderedgroupindexed Mike Gerber 2021-05-18 18:32:32 +0200
  • 7fde00d911 ReadingOrder may also contain UnorderedGroupIndexed Konstantin Baierer 2021-05-18 17:34:08 +0200
  • a39a89a50d Adapt version matrix Benjamin Rosemann 2021-05-05 16:52:24 +0200
  • 685c37ece3 Test missing trigger Benjamin Rosemann 2021-05-05 16:38:26 +0200
  • 0f69ec85fa Also consider packages on CircleCI Benjamin Rosemann 2021-05-05 16:31:18 +0200
  • 72ad03b4df Test triggering via .allowed-licenses Benjamin Rosemann 2021-05-05 16:27:15 +0200
  • 1232dee64a Test with version specific requirement files Benjamin Rosemann 2021-05-05 16:13:05 +0200
  • 15e584f0ab Introduce version pinning and license checcking Benjamin Rosemann 2021-05-05 15:20:35 +0200
  • 1778b36a9a 🚧 dinglehopper: Read PAGE UnorderedGroup in XML order Gerber, Mike 2021-04-15 21:09:45 +0200
  • 85b784f9a1 Fix problem with json encoding Benjamin Rosemann 2021-02-16 11:23:37 +0100
  • 9e64c4f0d0 Remove obsolete test Benjamin Rosemann 2020-11-27 11:31:25 +0100
  • b9259b9d01 Add multiprocessing to flexible_character_accuracy Benjamin Rosemann 2020-11-26 09:58:40 +0100
  • c4f75d5264 Increase cache size for bad OCR results. Benjamin Rosemann 2020-11-24 17:10:59 +0100
  • 84d34f5b26 Fix annoying logging exceptions and encoding errors. Benjamin Rosemann 2020-11-24 17:10:18 +0100