This tool performs document image binarization using trained models. The method is based on [Calvo-Zaragoza and Gallego, 2018](https://arxiv.org/abs/1706.10241).
This tool performs document image binarization using a trained ResNet50-UNet model.
## Installation
@@ -18,10 +18,14 @@ Clone the repository, enter it and run
### Models
Pre-trained models can be downloaded from here:
Pre-trained models in `HDF5` format can be downloaded from here:
https://qurator-data.de/sbb_binarization/
We also provide a TensorFlow `saved_model` via Hugging Face:
https://huggingface.co/SBB/sbb_binarization
## Usage
```sh
@@ -31,7 +35,9 @@ sbb_binarize \
<outputimage>
```
Example
Images containing a lot of border noise (black pixels) should be cropped beforehand to improve the quality of results.
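
The cropping step could look roughly like the following NumPy sketch. It is not part of `sbb_binarize`: the function name `crop_dark_border` and both threshold values are assumptions chosen for illustration, and the input is assumed to be a grayscale image already loaded as a 2-D array.

```python
import numpy as np


def crop_dark_border(gray, dark_threshold=40, min_bright_fraction=0.5):
    """Crop rows and columns that consist mostly of near-black pixels.

    gray: 2-D uint8 array holding a grayscale image.
    Both thresholds are illustrative guesses, not values used by sbb_binarize.
    """
    bright = gray > dark_threshold
    # A row or column counts as part of the page if enough of its pixels are bright.
    rows = np.flatnonzero(bright.mean(axis=1) >= min_bright_fraction)
    cols = np.flatnonzero(bright.mean(axis=0) >= min_bright_fraction)
    if rows.size == 0 or cols.size == 0:
        # Nothing that looks like a page was found; return the image unchanged.
        return gray
    # Keep everything between the first and last page-like row and column.
    return gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Only the outermost dark rows and columns are removed, so dark text lines inside the page are kept. The cropped array would then be saved and passed to `sbb_binarize` as the input image.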
"description":"PAGE XML hierarchy level to operate on"
},
"model":{
"description":"Directory containing HDF5 models. Can be an absolute path or a path relative to the current working directory or $SBB_BINARIZE_DATA environment variable (if set)",
"description":"Directory containing HDF5 or SavedModel/ProtoBuf models. Can be an absolute path or a path relative to the OCR-D resource location, the current working directory or the $SBB_BINARIZE_DATA environment variable (if set)",