Latest commit: Mike Gerber, 4b566830a9, 5 years ago
See #32: "respects" is probably an ambiguous or even incorrect term. Also rephrase "it's a good idea" to "it's OK to do".

| File | Last commit |
|---|---|
| qurator | 5 years ago |
| .gitignore | 5 years ago |
| .gitkeep | 5 years ago |
| Dockerfile | 5 years ago |
| LICENSE | 5 years ago |
| README.md | 5 years ago |
| ocrd-tool.json | 5 years ago |
| requirements.txt | 5 years ago |
| setup.py | 5 years ago |
Textline Detection
Detect textlines in document images
Introduction
This tool performs printspace, region and textline detection from document image data and returns the results as PAGE-XML.
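The PAGE-XML result can be read with any XML library. The following is a minimal sketch and not part of this tool: it assumes the output follows the usual PAGE layout, in which each TextLine element has a Coords child whose points attribute lists "x,y" pairs. The sample document and the names `local_name` and `iter_textlines` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; real output from the tool will differ.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.png" imageWidth="1000" imageHeight="800">
    <TextRegion id="r0">
      <Coords points="10,10 500,10 500,200 10,200"/>
      <TextLine id="r0_l0">
        <Coords points="10,10 500,10 500,50 10,50"/>
      </TextLine>
      <TextLine id="r0_l1">
        <Coords points="10,60 500,60 500,100 10,100"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across PAGE versions."""
    return tag.rsplit("}", 1)[-1]


def iter_textlines(root):
    """Yield (line id, list of (x, y) polygon points) for every TextLine."""
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        for child in elem:
            if local_name(child.tag) == "Coords":
                points = [
                    tuple(int(value) for value in pair.split(","))
                    for pair in child.get("points", "").split()
                ]
                yield elem.get("id"), points
                break


for line_id, polygon in iter_textlines(ET.fromstring(SAMPLE)):
    print(line_id, polygon)
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice here, since the namespace URI differs between PAGE schema versions.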
Installation
pip install .
Models
In order to run this tool you also need trained models. You can download our pretrained models from here:
https://qurator-data.de/sbb_textline_detector/
Usage
sbb_textline_detector -i <image file name> -o <directory to write output xml> -m <directory of models>
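To process several images, the command above can be wrapped in a small script. This is only a sketch under assumptions: it uses the `-i`, `-o` and `-m` options exactly as shown above, while the helper names `build_command` and `run_on_directory`, the `*.png` pattern and the directory layout are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(image, out_dir, model_dir):
    """Return the argument list for one sbb_textline_detector run."""
    return [
        "sbb_textline_detector",
        "-i", str(image),
        "-o", str(out_dir),
        "-m", str(model_dir),
    ]


def run_on_directory(image_dir, out_dir, model_dir, pattern="*.png"):
    """Run the detector once per matching image (hypothetical batch helper)."""
    if shutil.which("sbb_textline_detector") is None:
        raise RuntimeError("sbb_textline_detector is not on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob(pattern)):
        # check=True stops the batch at the first failing image.
        subprocess.run(build_command(image, out_dir, model_dir), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.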
Usage with OCR-D
ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd-sbb-textline-detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
-p '{ "model": "/path/to/the/models/textline_detection" }'
Segmentation works on raw RGB images, but retains AlternativeImages from binarization steps, so it's OK to do binarization first, then perform the textline detection. The binarization processor used must produce an AlternativeImage for the binarized image, not replace the original raw RGB image.
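Whether a binarization step behaved this way can be checked in its PAGE-XML output. The sketch below is an illustration, not part of this tool. It assumes that the Page element keeps the raw image in its imageFilename attribute and that the binarized derivative appears as an AlternativeImage child whose comments attribute contains "binarized"; that last point is an assumption about the workflow's convention. The sample file names and the function name `binarized_alternatives` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample of a binarization step's PAGE-XML output.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="OCR-D-IMG/page_0001.tif" imageWidth="1000" imageHeight="800">
    <AlternativeImage filename="OCR-D-IMG-BIN/page_0001.png" comments="binarized"/>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def binarized_alternatives(root):
    """Return (original image name, [binarized AlternativeImage file names])."""
    page = next(e for e in root.iter() if local_name(e.tag) == "Page")
    alternatives = [
        child.get("filename")
        for child in page
        if local_name(child.tag) == "AlternativeImage"
        # Assumption: derived-image features are listed comma-separated in @comments.
        and "binarized" in (child.get("comments") or "").split(",")
    ]
    return page.get("imageFilename"), alternatives


original, binarized = binarized_alternatives(ET.fromstring(SAMPLE))
print(original, binarized)
```

If the list of alternatives is empty, or the original name already points at a binarized file, the binarization step probably replaced the raw image instead of adding an AlternativeImage.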