Textline-Recognition


Tool

This tool detects textlines in an image and writes the result as XML data.

Models

To run this tool you need the corresponding models. You can find them here:

https://file.spk-berlin.de:8443/textline_detection/

Installation

sudo pip install .

Usage

sbb_textline_detector -i 'image file name' -o 'directory to write output xml' -m 'directory of models'
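
To process a whole directory, the call above can be repeated once per image. The following is a minimal sketch, assuming the images are TIFF files in a single directory; the helper name `build_commands`, the `*.tif` pattern and the directory names are hypothetical, and only the `-i`, `-o` and `-m` options come from the usage line above.

```python
import shlex
from pathlib import Path


def build_commands(image_dir, out_dir, model_dir, pattern="*.tif"):
    """Return one sbb_textline_detector argument list per matching image.

    build_commands is a hypothetical helper, not part of the tool itself.
    """
    commands = []
    for image in sorted(Path(image_dir).glob(pattern)):
        commands.append([
            "sbb_textline_detector",
            "-i", str(image),      # image file name
            "-o", str(out_dir),    # directory to write output XML
            "-m", str(model_dir),  # directory of models
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in build_commands("images", "out", "models"):
        print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.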

Usage with OCR-D

ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd_sbb_textline_detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
        -p '{ "model": "/path/to/the/models/textline_detection" }'

Segmentation works on raw RGB images, but respects and retains any AlternativeImages from binarization steps. It is therefore a good idea to binarize first and then run the textline detection. The binarization processor used must produce an AlternativeImage for the binarized image rather than replace the original raw RGB image.
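
One way to check that a binarization step added an AlternativeImage instead of replacing the raw image is to look for `AlternativeImage` elements in the page XML. The sketch below assumes the file is PAGE XML in which each `AlternativeImage` has a `filename` attribute; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def alternative_images(page_xml):
    """Return the filename attribute of every AlternativeImage element."""
    root = ET.fromstring(page_xml)
    found = []
    for elem in root.iter():
        # Compare local names only, so any PAGE namespace version matches.
        if elem.tag.rsplit("}", 1)[-1] == "AlternativeImage":
            found.append(elem.get("filename"))
    return found


# Made-up sample: a page that keeps its raw image and adds a binarized one.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="raw.tif">
    <AlternativeImage filename="bin.png" comments="binarized"/>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(alternative_images(SAMPLE))  # ['bin.png']
```

An empty list for a page that has already been binarized suggests that the processor replaced the original image rather than adding an alternative.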