Update README

master
Kai Labusch 1 week ago
parent e189222b1e
commit 2577a47d40

@@ -66,7 +66,14 @@ annotate-tsv enp_DE.tsv enp_DE-annotated.tsv
# Command-line interface:
```
page2tsv [OPTIONS] PAGE_XML_FILE TSV_OUT_FILE
page2tsv --help
Usage: page2tsv [OPTIONS] PAGE_XML_FILE TSV_OUT_FILE
Converts a page-XML file into a TSV file that can be edited with neat.
Optionally the tool also accepts NER and Entity Linking API-Endpoints as
parameters and performs NER and EL on the document if these are provided.
PAGE_XML_FILE: The source page-XML file. TSV_OUT_FILE: Resulting TSV file.
Options:
--purpose [NERD|OCR] Purpose of output tsv file.
@@ -76,7 +83,11 @@ Options:
OCR: OCR application/ground-truth creation.
default: NERD.
--image-url TEXT
--image-url TEXT An image retrieval link that enables neat to show
the scan images corresponding to the text tokens.
Example: https://content.staatsbibliothek-berlin.
de/zefys/SNP26824620-18371109-0-1-0-0/left,top,wi
dth,height/full/0/default.jpg
--ner-rest-endpoint TEXT REST endpoint of sbb_ner service. See
https://github.com/qurator-spk/sbb_ner for
details. Only applicable in case of NERD.
@@ -89,6 +100,32 @@ Options:
--min-confidence FLOAT
--max-confidence FLOAT
--ned-priority INTEGER
--normalization-file PATH
--help Show this message and exit.
```
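The `--image-url` help text gives a link containing the literal segment `left,top,width,height`. The following is a minimal sketch of how such a template could be filled in for one token's bounding box. It assumes the placeholders are simply replaced by pixel coordinates; the function name `fill_image_url` and the sample coordinates are hypothetical and are not taken from page2tsv or neat.

```python
def fill_image_url(template: str, left: int, top: int, width: int, height: int) -> str:
    """Replace the 'left,top,width,height' placeholder with concrete values.

    Assumption: the region placeholder appears exactly as in the help text
    example and is substituted as one comma-separated group.
    """
    region = f"{left},{top},{width},{height}"
    return template.replace("left,top,width,height", region)


# Template taken from the --image-url help text (line wrapping removed).
template = (
    "https://content.staatsbibliothek-berlin.de/zefys/"
    "SNP26824620-18371109-0-1-0-0/left,top,width,height/full/0/default.jpg"
)

# Hypothetical bounding box of a single token, in pixels.
print(fill_image_url(template, 100, 200, 300, 50))
```

The example only illustrates the shape of the link; how the tools actually build the per-token URLs may differ.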
```
tsv2tsv --help
Usage: tsv2tsv [OPTIONS] TSV_IN_FILE
Options:
--tsv-out-file PATH Write modified TSV to this file.
--ner-rest-endpoint TEXT REST endpoint of sbb_ner service. See
https://github.com/qurator-spk/sbb_ner for
details.
--noproxy disable proxy. default: enabled.
--num-tokens Print number of tokens in input/output file.
--sentence-count Print sentence count in input/output file.
--max-sentence-len Print maximum sentence len for input/output
file.
--keep-tokenization Keep the word tokenization exactly as it is.
--sentence-split-only Do only sentence splitting.
--show-urls Print contained visualization URLs.
--just-zero Process only files that have max sentence
length zero, i.e., that do not have sentence
splitting.
--sanitize-sentence-numbers Sanitize sentence numbering.
--show-columns Show TSV columns.
--drop-column TEXT Drop column
--help Show this message and exit.
```
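To make the effect of `--show-columns` and `--drop-column` concrete, here is a small standalone sketch of the same idea on an in-memory TSV string. It is not the tsv2tsv implementation: the helper names and the sample column names (`No.`, `TOKEN`, `NE-TAG`) are assumptions chosen for illustration, and the sketch treats every line as a plain tab-separated row.

```python
def show_columns(tsv_text: str) -> list[str]:
    """Return the column names from the header row of a TSV string."""
    return tsv_text.splitlines()[0].split("\t")


def drop_column(tsv_text: str, name: str) -> str:
    """Return the TSV with the named column removed from every row."""
    lines = tsv_text.splitlines()
    index = lines[0].split("\t").index(name)  # raises ValueError if absent
    result = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) > index:  # tolerate short rows
            del fields[index]
        result.append("\t".join(fields))
    return "\n".join(result) + "\n"


# Hypothetical sample data; real files may have more or different columns.
sample = "No.\tTOKEN\tNE-TAG\n1\tBerlin\tB-LOC\n"

print(show_columns(sample))
print(drop_column(sample, "NE-TAG"), end="")
```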