mirror of https://github.com/qurator-spk/sbb_textline_detection.git synced 2026-07-13 06:49:11 +02:00



Textline Detection

Detect textlines in document images

Introduction

This tool performs printspace, region, and textline detection on document images and returns the results as PAGE-XML.
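As a rough illustration of consuming the PAGE-XML output, the sketch below collects the polygon of every TextLine element. It assumes only the generic PAGE layout (a `TextLine` with a `Coords` child carrying a `points` attribute) and ignores the namespace, since the schema version the tool writes is not stated here. The helper names and the `SAMPLE` document are hypothetical and hand-written, not actual tool output.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn a PAGE 'points' string such as '10,20 110,20' into (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def textline_polygons(page_xml_text):
    """Map each TextLine id to the polygon given by its Coords child."""
    root = ET.fromstring(page_xml_text)
    polygons = {}
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        for child in elem:
            if local_name(child.tag) == "Coords":
                polygons[elem.get("id")] = parse_points(child.get("points", ""))
                break
    return polygons


# Hand-written sample for illustration only (not produced by the tool).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="200" imageHeight="100">
    <TextRegion id="r1">
      <Coords points="0,0 200,0 200,100 0,100"/>
      <TextLine id="l1">
        <Coords points="10,20 110,20 110,40 10,40"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for line_id, polygon in textline_polygons(SAMPLE).items():
        print(line_id, polygon)
```

Matching on the local tag name keeps the sketch independent of the PAGE schema version, at the cost of not validating the document.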

Installation

pip install .

Models

Running this tool also requires trained models. Our pretrained models can be downloaded from:
https://qurator-data.de/sbb_textline_detector/

Usage

sbb_textline_detector -i <image file name> -o <directory to write output xml> -m <directory of models>
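The command above handles one image per call. A minimal sketch for running it over a whole directory is shown below. It uses only the `-i`, `-o` and `-m` options from the usage line; the function names, the `*.png` pattern, and the directory names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_command(image, out_dir, model_dir):
    """Assemble the argument list for one sbb_textline_detector call."""
    return [
        "sbb_textline_detector",
        "-i", str(image),      # image file name
        "-o", str(out_dir),    # directory to write output xml
        "-m", str(model_dir),  # directory of models
    ]


def detect_all(image_dir, out_dir, model_dir, pattern="*.png"):
    """Run the detector once per matching image (pattern is an assumption)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob(pattern)):
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(build_command(image, out_dir, model_dir), check=True)


if __name__ == "__main__":
    # Hypothetical directory names; adjust to the local setup.
    detect_all("images", "output", "models")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.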

Usage with OCR-D

ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd-sbb-textline-detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
        -p '{ "model": "/path/to/the/models/textline_detection" }'

Segmentation runs on the raw RGB image but retains any AlternativeImages from earlier binarization steps, so it is fine to binarize first and then run textline detection. The binarization processor must therefore add the binarized image as an AlternativeImage rather than replace the original raw RGB image.
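One way to check that a binarization step met this requirement is to inspect the PAGE-XML it wrote: the `Page` element should still name the original image in `imageFilename` and list the binarized file as an `AlternativeImage` child. The sketch below assumes that generic PAGE layout and ignores the namespace. The function names, file names, and `SAMPLE` document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def page_images(page_xml_text):
    """Return (original image, [alternative image files]) for the first Page."""
    root = ET.fromstring(page_xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "Page":
            alternatives = [
                child.get("filename")
                for child in elem
                if local_name(child.tag) == "AlternativeImage"
            ]
            return elem.get("imageFilename"), alternatives
    return None, []


# Hand-written sample for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="OCR-D-IMG/page_0001.tif" imageWidth="200" imageHeight="100">
    <AlternativeImage filename="OCR-D-IMG-BIN/page_0001.bin.png" comments="binarized"/>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    original, alternatives = page_images(SAMPLE)
    print("original image:", original)
    print("alternative images:", alternatives)
```

If the list of alternatives is empty, or `imageFilename` already points at the binarized file, the processor has probably replaced the raw image instead of adding to it.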