From f2e9ed535debd919ffce29e004b56be919f250c7 Mon Sep 17 00:00:00 2001
From: Clemens Neudecker
Date: Wed, 30 Oct 2019 19:08:12 +0100
Subject: [PATCH] Update README.md

---
 tools/README.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/README.md b/tools/README.md
index 8b5840e..502f899 100644
--- a/tools/README.md
+++ b/tools/README.md
@@ -37,4 +37,14 @@ extract-doc-links enp_DE.tsv enp_DE-urls.tsv
 By loading the annotated TSV as well as the url mapping file into ner.edith, you will be able to jump directly to the original image
-where the full text has been extracted from.
\ No newline at end of file
+where the full text has been extracted from.
+
+# PAGE-XML to TSV Transformation
+
+## Usage:
+
+Create a TSV file from OCR in PAGE-XML format (with word segmentation):
+
+```
+python page2tsv.py PAGE.xml > PAGE.tsv
+```
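
The usage added by this patch turns word-level PAGE-XML into TSV. The patch does not show `page2tsv.py` itself, so the following is only a minimal sketch of what such a conversion could look like, not the script's actual implementation. The function name `page_words_to_tsv` and the column layout (`TOKEN`, `left`, `top`, `right`, `bottom`) are assumptions made for illustration. It also assumes that each `Word` carries its polygon in a `Coords` element with a `points` attribute holding integer `x,y` pairs.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so any PAGE schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def page_words_to_tsv(page_xml):
    """Return one TSV row per PAGE-XML Word: its text and bounding box."""
    root = ET.fromstring(page_xml)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Hypothetical column layout; not taken from page2tsv.py.
    writer.writerow(["TOKEN", "left", "top", "right", "bottom"])
    for word in root.iter():
        if local_name(word.tag) != "Word":
            continue
        text, points = "", ""
        for child in word:
            name = local_name(child.tag)
            if name == "Coords":
                points = child.get("points", "")
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        # Assumption: "points" is a space-separated list of integer "x,y" pairs.
        xy = [tuple(int(v) for v in p.split(",")) for p in points.split()]
        if not xy:
            continue  # skip words without usable coordinates
        xs, ys = zip(*xy)
        # Reduce the word polygon to its axis-aligned bounding box.
        writer.writerow([text, min(xs), min(ys), max(xs), max(ys)])
    return out.getvalue()
```

Passing the contents of a PAGE-XML file to `page_words_to_tsv` and printing the result would give output comparable in spirit to redirecting the script into `PAGE.tsv`. The real tool may emit different columns.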