Commit Graph

  • 06c8b382db character normalization based on aletheia mapping master Kai Labusch 2024-03-14 17:31:41 +0100
  • eac71b3e40 Merge pull request #13 from qurator-spk/fix-ppn-xpath Kai Labusch 2024-03-13 12:50:07 +0100
  • 3a8bfa74cc fix namespace typo: s/mets/mods/ fix-ppn-xpath Konstantin Baierer 2024-03-13 12:13:24 +0100
  • 2f7d01c7cd fix alto2tsv bug Kai Labusch 2023-11-10 17:28:30 +0100
  • eb750752c6 Merge pull request #15 from stweil/typo Kai Labusch 2023-10-23 08:09:38 +0200
  • 3f35554a70 Fix typo (found by codespell) Stefan Weil 2023-10-23 08:06:07 +0200
  • fa1c6b5aa4 Merge pull request #14 from stweil/gitignore Kai Labusch 2023-10-23 07:54:57 +0200
  • 175694d25d .gitignore: Ignore build directory Stefan Weil 2023-10-22 13:27:23 +0200
  • 0ec6f83c4c add alto2tsv Kai Labusch 2023-10-20 16:36:43 +0200
  • 82769077df make xpath for PPN number more specific to avoid catching the PPN of containing work Konstantin Baierer 2023-06-08 19:08:40 +0200
  • 0f64f07635 📦 v0.0.1 v0.0.1 Konstantin Baierer 2022-11-09 16:01:44 +0100
  • 3b10dcb05b Merge branch 'ocrd-processors' of https://github.com/kba/page2tsv into ocrd-processors Konstantin Baierer 2022-11-08 16:24:04 +0100
  • 1c0c1cd525 ocrd processors: use snake_case for add_file Konstantin Baierer 2022-11-08 16:22:28 +0100
  • e1a440b91c install into qurator namespace Konstantin Baierer 2022-11-08 16:19:23 +0100
  • abeca0df16 drop requirement for matplotlib (not used) Konstantin Baierer 2022-10-24 14:29:42 +0200
  • db25239075 Merge branch 'master' into ocrd-processors Konstantin Baierer 2022-10-24 14:02:24 +0200
  • a0e5c82929 Merge branch 'master' into ocrd-processors Kai Labusch 2022-06-10 10:32:34 +0200
  • 75796b5c0c refactor Kai 2022-06-10 10:00:32 +0200
  • 81ba7cff82 tests Konstantin Baierer 2022-05-30 17:01:50 +0200
  • 60a07c6310 drop support for scaling, not necessary for SBB use case anymore Konstantin Baierer 2022-05-30 14:29:01 +0200
  • fe4a1eabb1 setup.py: use ocrd-tool.json for version Konstantin Baierer 2022-02-21 18:51:28 +0100
  • aabcc4866d remove obsolete tsv.py (now in qurator-sbb-tools Konstantin Baierer 2022-02-21 18:50:09 +0100
  • f813c45ba2 Merge remote-tracking branch 'origin/master' into ocrd-processors Konstantin Baierer 2022-02-21 18:47:10 +0100
  • aeb67e445f implement page2tsv/tsv2page as ocrd-neat-{ex,im}port Konstantin Baierer 2022-02-21 18:47:03 +0100
  • 0aee20a7f6 cli: separate tsv2page and tsv2page_cli Konstantin Baierer 2022-02-21 17:00:18 +0100
  • fe0c355e5a cli: produce TSV if no words are transcribed Konstantin Baierer 2022-02-21 17:00:03 +0100
  • 93ee53c8e2 cli: split page2tsv from page2tsv_cli Konstantin Baierer 2022-02-21 15:22:04 +0100
  • 9d2d5fcd31 add missing imports Kai 2022-02-21 14:03:07 +0100
  • 568e1cd104 remove ner/ned code from page2tsv package Kai 2021-12-15 15:51:00 +0100
  • ed90193c45 support segmentation only Page-XML Kai 2021-11-19 11:27:46 +0100
  • ee5f03ce07 change default scale factor to 1.0 Kai 2021-05-07 12:36:42 +0200
  • 5e60fabe4a revert changes Kai 2021-05-07 11:28:24 +0200
  • e5b635ec2d try other coordinate computation Kai 2021-05-07 10:50:09 +0200
  • f320904503 try other coordinate computation Kai 2021-05-07 10:40:52 +0200
  • 1eb05d0d62 xlrd does not support xsls files anymore Kai 2021-05-07 08:02:33 +0200
  • ae93668bac xlrd does not support xsls files anymore Kai 2021-05-07 07:54:55 +0200
  • 2bd4ae8d5a add ned-priority option to page2tsv Kai 2021-05-06 16:23:08 +0200
  • d4eb95b64b make code more robust Kai 2021-05-06 15:13:26 +0200
  • 49861b1652 support confidences in find-entities Kai 2021-05-06 13:17:07 +0200
  • 0da38d6ec6 support confidences in find-entities Kai 2021-05-06 13:01:41 +0200
  • 9b3198e401 add priority option for find-entities Kai 2021-05-06 12:42:12 +0200
  • 7b53cc5539 add priority option for find-entities Kai 2021-05-06 12:24:47 +0200
  • 318d9bd122 fix #7 Kai 2021-04-09 08:07:27 +0200
  • abcdb67e9e Merge pull request #6 from kba/lineid-ocr-tsv Kai Labusch 2021-04-08 10:53:51 +0200
  • f03acbf54d tsv2page CLI to propagate TSV results back to PAGE-XML Konstantin Baierer 2021-04-01 17:53:27 +0200
  • ad379aea2b store pc:TextLine ID in TSV, fix #5 Konstantin Baierer 2021-04-01 17:12:58 +0200
  • 9c63631d7a Merge pull request #4 from kba/core-page-api Kai Labusch 2021-04-01 15:43:46 +0200
  • 675c88a67d requirements: ocrd pulls in requests already Konstantin Baierer 2021-04-01 15:27:23 +0200
  • d80b02c56d use OCR-D/core PAGE API for reading order and recursive regions Konstantin Baierer 2021-04-01 15:17:13 +0200
  • e21fbc09a1 fix url Kai Labusch 2021-03-18 21:20:46 +0100
  • 1ec06a3087 fix setup.py Kai 2021-03-18 08:27:36 +0100
  • eca7823b10 Merge pull request #3 from qurator-spk/cneud-patch-1 Kai Labusch 2021-03-10 17:00:59 +0100
  • 5c82b83b2e fix snippets Clemens Neudecker 2021-03-10 16:59:09 +0100
  • 243c7b48c6 fix line shift Kai 2021-03-10 16:08:39 +0100
  • 6ffba183ab fix repeated text lines Kai 2021-03-10 15:33:30 +0100
  • de575037e6 fix repeated text rows Kai 2021-03-10 15:15:24 +0100
  • a6008b83b5 remove full Kai 2021-03-10 15:11:38 +0100
  • 487b74b6e6 #2 Kai 2021-03-10 14:06:51 +0100
  • 243d373913 fix iiif-url (dirty hack) Clemens Neudecker 2021-03-10 13:43:39 +0100
  • c554644838 Add directory parsing option to make-page2tsv-commands Kai 2021-03-10 12:03:45 +0100
  • aa79678403 Add directory parsing option to make-page2tsv-commands Kai 2021-03-10 11:57:39 +0100
  • 7fc39739b7 Add directory parsing option to make-page2tsv-commands Kai 2021-03-10 11:51:49 +0100
  • f606cb92b0 Change scale-factor default parameter. Fix make-page2tsv-commands Kai 2021-03-10 11:15:42 +0100
  • 900015da61 store OCR or NED confidences in tsv file Kai 2021-02-26 12:18:10 +0100
  • 5d55ba24a3 use max confidence instead of mean Kai 2021-02-04 07:37:58 +0100
  • 85ec36218e support visualization of ocr confidences Kai 2021-02-03 15:31:36 +0100
  • 2b73b421ae support visualization of ocr confidences Kai 2021-02-03 15:22:38 +0100
  • c3acd74e9f add OCR annotation functionality Kai 2021-02-01 16:25:12 +0100
  • a834da494a permit empty files Kai Labusch 2020-08-15 08:46:18 +0200
  • 2dc3857770 make tools more robust against glitches within the input files Kai Labusch 2020-07-02 11:37:54 +0200
  • e09f40db61 proper support for retroactive entity linking Kai Labusch 2020-06-19 14:30:38 +0200
  • 449bd1d3ca preserve URL structure in tsv files during NER/NED amendment U-PK\b-kl104 2020-06-19 11:29:51 +0200
  • 361c811264 add command line tool that creates page2tsv commands from an excel file Kai Labusch 2020-06-02 15:40:36 +0200
  • 83fb2ea033 enable NED only usage of find-entities Kai 2020-05-25 15:10:08 +0200
  • c12bea2cb0 enable NED only usage of find-entities Kai 2020-05-25 15:09:13 +0200
  • 975487a233 adapt find-entities to CLEF2020 requirements Kai Labusch 2020-05-25 07:15:46 +0200
  • 0d650ebcc5 support loading ned result from disk Kai 2020-05-22 08:29:08 +0200
  • 9fe35377e3 disable proxy option in find-entities Kai 2020-05-19 19:51:23 +0200
  • c7f4b6fe53 add proper NED support Kai 2020-04-09 09:57:03 +0200
  • 24fd7245f5 add findentities command line tool that can be used in order to NER/NED tag an existing .tsv file Kai 2020-03-25 08:18:59 +0100
  • b13dae29f5 rename GND-ID column to more generic ID Kai 2020-03-13 09:05:55 +0100
  • 0cd9cd932a support automatic named entity disambiguation Kai Labusch 2020-03-12 11:01:58 +0100
  • 05f49df6d2 support Qurator calamari PAGE xml Kai Labusch 2020-03-11 12:56:09 +0100
  • abdabbac4f try to infer correct line ordering ... Kai Labusch 2020-03-09 13:44:16 +0100
  • 7bf9cfa5de try to infer correct line ordering ... Kai Labusch 2020-03-09 10:58:07 +0100
  • e535a070c4 Update cli.py Clemens Neudecker 2020-02-20 18:35:16 +0100
  • 2946909cf3 add command line option for image scale factor Kai Labusch 2020-01-10 13:04:07 +0100
  • 311dac31ac Update README.md Clemens Neudecker 2019-12-16 17:24:41 +0100
  • f888017f03 add example.xml PAGE-XML cneud 2019-12-16 16:40:39 +0100
  • 59a1e81243 extract TSV Tools from qurator-spk/neath cneud 2019-12-16 16:37:47 +0100
  • 92a81a869c Initial commit Clemens Neudecker 2019-12-16 16:36:36 +0100