mirror of https://github.com/qurator-spk/sbb_textline_detection.git synced 2026-03-16 11:41:55 +01:00


Textline-Recognition

Tool

This tool detects text lines in an image and writes the result as XML data.

Models

To run this tool, you need the corresponding models. You can find them here:

https://file.spk-berlin.de:8443/textline_detection/

Installation

sudo pip install .

Usage

sbb_textline_detector -i 'image file name' -o 'directory to write output xml' -m 'directory of models'
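
To process a whole directory of images, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the tool itself: the helper names `build_command` and `run_batch` are hypothetical, the `*.png` pattern is an assumption about the input files, and only the `-i`, `-o` and `-m` options documented above are used.

```python
import subprocess
from pathlib import Path


def build_command(image, out_dir, model_dir):
    """Build the documented command line for a single image."""
    return [
        "sbb_textline_detector",
        "-i", str(image),
        "-o", str(out_dir),
        "-m", str(model_dir),
    ]


def run_batch(image_dir, out_dir, model_dir, pattern="*.png"):
    """Run the detector once per matching image; stop on the first failure."""
    # Assumption: the output directory may not exist yet, so create it.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob(pattern)):
        subprocess.run(build_command(image, out_dir, model_dir), check=True)
```

Calling `run_batch("images", "output", "models")` would then invoke `sbb_textline_detector` once for each PNG file in `images`, assuming the tool is installed and on the `PATH`.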

Usage with OCR-D

ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd_sbb_textline_detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB -p '{ "model": "/path/to/the/models/textline_detection" }'

Segmentation works on raw RGB images, but it respects and retains any AlternativeImages from binarization steps. It is therefore a good idea to binarize first and then run the textline detection. The binarization processor you use must add an AlternativeImage for the binarized image rather than replace the original raw RGB image.
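
One way to check whether a binarization step behaved as required is to look for `AlternativeImage` elements in the PAGE XML it wrote. The following is a minimal sketch using only the Python standard library. The function name `alternative_images` is hypothetical, and matching on the element's local name is an assumption made so that the check works regardless of which PAGE namespace version the file uses.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alternative_images(page_xml):
    """Return (filename, comments) for every AlternativeImage in a PAGE XML string."""
    root = ET.fromstring(page_xml)
    return [
        (el.get("filename"), el.get("comments"))
        for el in root.iter()
        if local_name(el.tag) == "AlternativeImage"
    ]
```

An empty result for a PAGE file from the binarized file group suggests that the processor did not record the binarized image as an AlternativeImage.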