diff --git a/tools/README.md b/tools/README.md
index 8b5840e..502f899 100644
--- a/tools/README.md
+++ b/tools/README.md
@@ -37,4 +37,14 @@ extract-doc-links enp_DE.tsv enp_DE-urls.tsv
 
 By loading the annotated TSV as well as the url mapping file into
 ner.edith, you will be able to jump directly to the original image
-where the full text has been extracted from.
\ No newline at end of file
+where the full text has been extracted from.
+
+# PAGE-XML to TSV Transformation
+
+## Usage:
+
+Create a TSV file from OCR in PAGE-XML format (with word segmentation):
+
+```
+python page2tsv.py PAGE.xml > PAGE.tsv
+```