Binarization

Binarization for document images

Examples

Introduction

This tool performs document image binarization using trained models. The method is based on Calvo-Zaragoza and Gallego, 2018.

Installation

Clone the repository, change into its directory, and run:

pip install .

Models

Pre-trained models can be downloaded from here:

https://qurator-data.de/sbb_binarization/

Usage

sbb_binarize \
  --patches \
  -m <path to directory containing model files> \
  <input image> \
  <output image>

Note: In virtually all cases, applying the --patches flag will improve the quality of results.

Example

sbb_binarize --patches -m /path/to/models/ myimage.tif myimage-bin.tif
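
For a whole directory of scans, the single-image call above can be wrapped in a loop. The following is a minimal sketch, assuming a bash-compatible shell and input files ending in .tif; the helper name binarize_dir and the -bin.tif output suffix are illustrative choices, not part of the tool.

```shell
# binarize_dir is a hypothetical helper, not part of sbb_binarization.
# Arguments: <model directory> <input directory> <output directory>
binarize_dir() {
  local model_dir="$1"
  local in_dir="$2"
  local out_dir="$3"
  local img base

  mkdir -p "$out_dir" || return 1

  for img in "$in_dir"/*.tif; do
    # If no .tif files exist, the glob stays literal; skip it.
    [ -e "$img" ] || continue
    base="$(basename "$img" .tif)"
    # Same invocation as the single-image example, once per file.
    sbb_binarize --patches -m "$model_dir" "$img" "$out_dir/$base-bin.tif" || return 1
  done
}
```

It could then be called as binarize_dir /path/to/models/ scans/ scans-bin/ (the directory names are placeholders). The function stops at the first image that fails, so a broken input does not go unnoticed in a long run.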

To use the OCR-D interface:

ocrd-sbb-binarize --overwrite -I INPUT_FILE_GRP -O OCR-D-IMG-BIN -P model "/var/lib/sbb_binarization"