Textline Detection

Detect textlines in document images

Introduction

This tool performs printspace, region and textline detection from document image data and returns the results as PAGE-XML.

Installation

pip install .

Models

To run this tool you also need trained models. Our pretrained models can be downloaded from:
https://qurator-data.de/sbb_textline_detector/

Usage

sbb_textline_detector -i <image file name> -o <directory to write output xml> -m <directory of models>
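
The command above handles one image per call. For a whole directory, a small wrapper can loop over the files. The following is a minimal sketch, not part of the tool: the helper names `build_command` and `run_batch`, the `*.tif` pattern, and the up-front creation of the output directory are all assumptions made for illustration. Only the `-i`, `-o` and `-m` options come from the usage line above.

```python
import subprocess
from pathlib import Path


def build_command(image, out_dir, model_dir):
    """Return the argument list for one sbb_textline_detector call.

    Hypothetical helper; it only mirrors the -i/-o/-m options shown above.
    """
    return [
        "sbb_textline_detector",
        "-i", str(image),
        "-o", str(out_dir),
        "-m", str(model_dir),
    ]


def run_batch(image_dir, out_dir, model_dir, pattern="*.tif"):
    """Run the detector once per matching image and stop on the first failure."""
    # Assumption: the tool may not create the output directory itself.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob(pattern)):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(build_command(image, out_dir, model_dir), check=True)
```

Calling `run_batch("images", "out", "models")` would then process every `.tif` file in `images` in sorted order. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.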

Usage with OCR-D

ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd-sbb-textline-detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
        -p '{ "model": "/path/to/the/models/textline_detection" }'
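
The `-p` argument is a JSON object, which is easy to get wrong when typed by hand inside shell quotes. A minimal sketch, assuming the call is assembled from Python: the helpers `ocrd_command` and `as_shell` are hypothetical names, and only the `-I`, `-O`, `-p` options and the `model` parameter are taken from the example above.

```python
import json
import shlex


def ocrd_command(input_grp, output_grp, model_dir):
    """Build the argument list for ocrd-sbb-textline-detector.

    Hypothetical helper; json.dumps produces valid JSON for the -p value.
    """
    params = json.dumps({"model": model_dir})
    return [
        "ocrd-sbb-textline-detector",
        "-I", input_grp,
        "-O", output_grp,
        "-p", params,
    ]


def as_shell(argv):
    """Render an argument list as a single, safely quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `as_shell(ocrd_command("OCR-D-IMG-BIN", "OCR-D-SEG-LINE-SBB", "/path/to/the/models/textline_detection"))` yields a command line equivalent to the one shown above, with the JSON wrapped in single quotes.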

Segmentation works on raw RGB images, but retains AlternativeImages from binarization steps, so it's OK to do binarization first, then perform the textline detection. The binarization processor used must produce an AlternativeImage for the binarized image, not replace the original raw RGB image.
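
One way to check that a binarization step met this requirement is to inspect its PAGE-XML output: the `Page` element should still name the original image in `imageFilename`, and the binarized image should appear as an `AlternativeImage` child. The sketch below is an illustration under assumptions, not part of this tool: `page_images` and `local_name` are hypothetical helpers, the sample document and its file names are invented, and elements are matched by local name so that the PAGE namespace version does not matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical PAGE-XML fragment; file names are made up for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="OCR-D-IMG/page_0001.tif" imageWidth="100" imageHeight="100">
    <AlternativeImage filename="OCR-D-IMG-BIN/page_0001.png" comments="binarized"/>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def page_images(page_xml):
    """Return (original image, [page-level alternative images]) from PAGE-XML text."""
    root = ET.fromstring(page_xml)
    page = next(el for el in root.iter() if local_name(el.tag) == "Page")
    alternatives = [
        child.get("filename")
        for child in page  # direct children only, i.e. page-level images
        if local_name(child.tag) == "AlternativeImage"
    ]
    return page.get("imageFilename"), alternatives
```

If the returned list of alternatives is empty, or `imageFilename` already points at a binarized file, the binarization processor probably replaced the original image instead of adding an alternative one.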