
Textline-Recognition


Tool

This tool detects text lines in an image and writes the result as XML data.

Models

To run this tool you need the corresponding models. You can find them here:

https://file.spk-berlin.de:8443/textline_detection/

Installation

sudo pip install .

Usage

sbb_textline_detector -i 'image file name' -o 'directory to write output xml' -m 'directory of models'
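Since the command takes one image per call, processing a whole directory needs a loop. The following is a minimal sketch, not part of the tool itself: the function name `detect_all`, the `DETECTOR` override variable, and the list of image extensions are assumptions made for illustration. Only the `-i`, `-o` and `-m` options come from the usage line above.

```shell
#!/bin/sh
# Sketch: call the detector once per image found in a directory.
# detect_all and DETECTOR are hypothetical names, not provided by the tool.
# Set DETECTOR=echo to print the argument lists instead of running anything.

detect_all() {
    image_dir=$1    # directory containing the input images
    output_dir=$2   # directory to write output xml
    model_dir=$3    # directory of models

    # Creating the output directory up front is a precaution (assumption:
    # the tool may not create it on its own).
    mkdir -p "$output_dir"

    # The extension list is an assumption; adjust it to your data.
    for image in "$image_dir"/*.tif "$image_dir"/*.png "$image_dir"/*.jpg; do
        # Skip glob patterns that matched no file.
        [ -e "$image" ] || continue
        "${DETECTOR:-sbb_textline_detector}" -i "$image" -o "$output_dir" -m "$model_dir"
    done
}
```

With the function loaded, `DETECTOR=echo detect_all ./images ./output ./models` prints the argument list for each image without running the detector, and `detect_all ./images ./output ./models` performs the actual run. The three paths are placeholders.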

Usage with OCR-D

ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd_sbb_textline_detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB -p '{ "model": "/path/to/the/models/textline_detection" }'

Segmentation works on raw RGB images, but it respects and retains AlternativeImages from binarization steps, so it is a good idea to binarize first and then run the textline detection. The binarization processor used must produce an AlternativeImage for the binarized image rather than replace the original raw RGB image.
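One rough way to check whether a binarization step met this requirement is to look for an `AlternativeImage` element in the PAGE XML files it wrote. The sketch below is a plain text search, not a schema-aware check, and the helper name `has_alternative_image` is hypothetical. It only tells you that such an element is present, not that it refers to a binarized image.

```shell
#!/bin/sh
# Sketch: succeed if a PAGE XML file contains an AlternativeImage element.
# has_alternative_image is a hypothetical helper, not part of OCR-D or this tool.
# The pattern allows an optional namespace prefix such as "pc:".

has_alternative_image() {
    grep -Eq '<([A-Za-z0-9_]+:)?AlternativeImage[ />]' "$1"
}
```

For example, assuming the PAGE files of the binarized file group are stored under `OCR-D-IMG-BIN/` in the workspace (a common layout, but an assumption here), `for f in OCR-D-IMG-BIN/*.xml; do has_alternative_image "$f" || echo "no AlternativeImage in $f"; done` lists the files that lack one.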