Mirror of https://github.com/qurator-spk/page2tsv.git (last synced 2025-06-09 11:29:55 +02:00).

TSV - Processing Tools

Installation:

Set up a virtual environment:

virtualenv --python=python3.6 venv

Activate the virtual environment:

source venv/bin/activate

Upgrade pip:

pip install -U pip

Install the package together with its dependencies in development (editable) mode:

pip install -e ./
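
As an optional sanity check after installation, one can confirm that the two console scripts used in this README are on PATH. This is a minimal sketch assuming a POSIX-compatible shell such as bash or dash; `check_tools` is a hypothetical helper name, not something the package provides.

```shell
# Hypothetical helper: report which of the given commands are not on PATH.
# With no arguments it checks the two console scripts used in this README.
check_tools() {
  local tool missing
  if [ "$#" -eq 0 ]; then
    set -- page2tsv annotate-tsv
  fi
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, inside the activated virtual environment:
# check_tools && echo "page2tsv and annotate-tsv are available"
```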

PAGE-XML to TSV Transformation:

Create a TSV file from OCR output in PAGE-XML format (with word segmentation):

page2tsv PAGE1.xml PAGE.tsv --image-url=http://link-to-corresponding-image-1

To create a single TSV file from multiple PAGE-XML files, call the tool successively with the same output TSV file:

page2tsv PAGE1.xml PAGE.tsv --image-url=http://link-to-corresponding-image-1
page2tsv PAGE2.xml PAGE.tsv --image-url=http://link-to-corresponding-image-2
page2tsv PAGE3.xml PAGE.tsv --image-url=http://link-to-corresponding-image-3
page2tsv PAGE4.xml PAGE.tsv --image-url=http://link-to-corresponding-image-4
page2tsv PAGE5.xml PAGE.tsv --image-url=http://link-to-corresponding-image-5
...
...
...
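
With more than a handful of pages, these successive calls can be generated by a loop. The following is a minimal sketch that assumes a hypothetical tab-separated mapping file (called `pages.txt` here) listing one PAGE-XML file and its image URL per line. `append_pages` is also a hypothetical helper. page2tsv itself is invoked exactly as in the calls above.

```shell
# Hypothetical helper: call page2tsv once per line of a mapping file, always
# writing to the same output TSV, as in the successive calls above.
# Assumed mapping file format: <PAGE-XML path><TAB><image URL>
append_pages() {
  local mapping_file="$1" out_tsv="$2" xml url tab
  tab=$(printf '\t')
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS="$tab" read -r xml url || [ -n "$xml" ]; do
    if [ -z "$xml" ]; then
      continue  # skip blank lines
    fi
    # Redirect stdin so page2tsv cannot consume the rest of the mapping file.
    page2tsv "$xml" "$out_tsv" --image-url="$url" < /dev/null
  done < "$mapping_file"
}

# Usage:
# append_pages pages.txt PAGE.tsv
```

Keeping the file-to-URL pairs in a separate file preserves the one-URL-per-page requirement while avoiding a hand-written command for every page.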

For instance, for the file example.xml:

page2tsv example.xml example.tsv --image-url=http://content.staatsbibliothek-berlin.de/zefys/SNP27646518-18800101-0-3-0-0/left,top,width,height/full/0/default.jpg
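
The image URL in this example appears to follow the IIIF Image API pattern (`.../{region}/{size}/{rotation}/{quality}.{format}`), with the literal text `left,top,width,height` in the region position. The sketch below assumes that placeholder is meant to be replaced by a pixel bounding box, which IIIF writes as `x,y,w,h`. The README does not say which tool performs this substitution, and `fill_region` is a hypothetical helper.

```shell
# Hypothetical helper: replace the literal "left,top,width,height" placeholder
# in an image URL template with a concrete pixel box (IIIF region x,y,w,h).
# The four coordinates are expected to be plain integers.
fill_region() {
  local template="$1" left="$2" top="$3" width="$4" height="$5"
  printf '%s\n' "$template" |
    sed "s/left,top,width,height/$left,$top,$width,$height/"
}

# Usage with the URL from the example above:
# fill_region 'http://content.staatsbibliothek-berlin.de/zefys/SNP27646518-18800101-0-3-0-0/left,top,width,height/full/0/default.jpg' 100 200 300 40
# prints http://content.staatsbibliothek-berlin.de/zefys/SNP27646518-18800101-0-3-0-0/100,200,300,40/full/0/default.jpg
```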

Processing existing TSV files:

Create a URL-annotated TSV file from an existing TSV file:

annotate-tsv enp_DE.tsv enp_DE-annotated.tsv
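
A loop can apply the same step to several existing TSV files. This is a minimal sketch that assumes the `-annotated.tsv` output naming from the example above is the desired convention. `annotate_all` is a hypothetical helper, and annotate-tsv is called with the same two positional arguments shown above.

```shell
# Hypothetical helper: run annotate-tsv on each given TSV file, writing
# <name>-annotated.tsv next to it (naming taken from the example above).
annotate_all() {
  local src dst
  for src in "$@"; do
    case "$src" in
      *-annotated.tsv) continue ;;  # skip outputs of an earlier run
    esac
    dst="${src%.tsv}-annotated.tsv"
    annotate-tsv "$src" "$dst"
  done
}

# Usage:
# annotate_all *.tsv
```

Skipping files that already end in `-annotated.tsv` lets the glob be re-run in the same directory without annotating earlier outputs a second time.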