Textline-Recognition

Introduction

This tool detects textlines in document images and returns the results as PAGE-XML.
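As an illustration of how the PAGE-XML result might be consumed, the following sketch collects each `TextLine` element together with the polygon stored in its `Coords` child. It uses only the Python standard library and matches elements by local name, so it does not depend on a particular PAGE schema version. The sample document is hypothetical and hand-written for this example; the exact structure this tool emits may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PAGE-XML sample (not actual output of this tool).
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,50 0,50"/>
      <TextLine id="l1">
        <Coords points="5,5 95,5 95,20 5,20"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def read_textlines(page_xml):
    """Return [(line_id, [(x, y), ...]), ...] for every TextLine in a PAGE-XML string."""
    root = ET.fromstring(page_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # The outline polygon lives in the 'points' attribute of the direct Coords child.
        coords = next((c for c in elem if local_name(c.tag) == "Coords"), None)
        if coords is None:
            continue
        points = [
            tuple(int(v) for v in pair.split(","))
            for pair in coords.get("points", "").split()
        ]
        lines.append((elem.get("id"), points))
    return lines


if __name__ == "__main__":
    for line_id, polygon in read_textlines(SAMPLE_PAGE):
        print(line_id, polygon)
```

To read a file written by the tool instead of the sample string, pass the file's contents to `read_textlines`.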

Installation

pip install .

Models

To run this tool you also need trained models. Our pre-trained models can be downloaded here:
https://file.spk-berlin.de:8443/textline_detection/

Usage

sbb_textline_detector -i <image file name> -o <directory to write output xml> -m <directory of models>
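The command above handles one image per call. A minimal sketch of a batch wrapper, assuming `sbb_textline_detector` is on the `PATH` after installation, could look like the following. The helper names `build_command` and `detect_all`, the `*.png` default pattern, and the `dry_run` switch are assumptions made for this example; only the `-i`, `-o` and `-m` options come from the usage line above.

```python
import subprocess
from pathlib import Path


def build_command(image, out_dir, model_dir):
    """Build the argument list for one sbb_textline_detector call."""
    return [
        "sbb_textline_detector",
        "-i", str(image),      # image file name
        "-o", str(out_dir),    # directory to write output xml
        "-m", str(model_dir),  # directory of models
    ]


def detect_all(image_dir, out_dir, model_dir, pattern="*.png", dry_run=False):
    """Run the detector once per matching image and return the commands used.

    With dry_run=True the commands are only built, not executed.
    """
    images = sorted(Path(image_dir).glob(pattern))
    commands = [build_command(image, out_dir, model_dir) for image in images]
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for command in commands:
            # check=True stops the batch at the first failing image.
            subprocess.run(command, check=True)
    return commands
```

Calling `detect_all("scans", "out", "models", dry_run=True)` returns the list of commands without running them, which is a cheap way to check paths before starting a long batch.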

Usage with OCR-D

ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd_sbb_textline_detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
        -p '{ "model": "/path/to/the/models/textline_detection" }'

Segmentation works on raw RGB images, but respects and retains AlternativeImages from binarization steps. It is therefore a good idea to binarize first and then run the textline detection. The binarization processor must add an AlternativeImage for the binarized image rather than replace the original raw RGB image.
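One way to check that a binarization step behaved as required is to inspect its PAGE-XML output: the `Page` element should still point at the original image through `imageFilename`, and the binarized image should appear as an `AlternativeImage` child. The sketch below does this with the Python standard library. The sample document and its file names are hypothetical, and the `comments="binarized"` value follows the OCR-D convention as far as we know.

```python
import xml.etree.ElementTree as ET

# Hypothetical PAGE-XML as a binarization processor might write it.
SAMPLE_BINARIZED = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="OCR-D-IMG/page_0001.tif" imageWidth="100" imageHeight="50">
    <AlternativeImage filename="OCR-D-IMG-BIN/page_0001.bin.png" comments="binarized"/>
  </Page>
</PcGts>"""


def alternative_images(page_xml):
    """Return (imageFilename, [AlternativeImage filenames]) for the Page element."""
    root = ET.fromstring(page_xml)
    for elem in root.iter():
        # Compare local names so the check works for any PAGE namespace version.
        if elem.tag.rsplit("}", 1)[-1] != "Page":
            continue
        alternatives = [
            child.get("filename")
            for child in elem
            if child.tag.rsplit("}", 1)[-1] == "AlternativeImage"
        ]
        return elem.get("imageFilename"), alternatives
    raise ValueError("no Page element found")


if __name__ == "__main__":
    original, alternatives = alternative_images(SAMPLE_BINARIZED)
    print("original image:", original)
    print("alternative images:", alternatives)
```

If the returned list is empty, or `imageFilename` already points at the binarized file, the processor most likely replaced the original image instead of adding an AlternativeImage.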
