Gerber, Mike dc38f0ee51 🎨 Use TOOL constant convention from the other OCR-D processors
.circleci CircleCI: Install imagemagick
.idea 🔧 Add PyCharm project files
ocrd_calamari 🎨 Use TOOL constant convention from the other OCR-D processors
test Remove broken __main__ handling (stick to pytest)
.coveragerc Only do the coverage on our code
.gitignore Fix tests by 1. binarizing and 2. use the GT4HistOCR model
Dockerfile Dockerfile
LICENSE Initial commit
Makefile 🧹 Do not advertise and support untested models
README-DEV.md 📝 README: Add testing instructions + reference README-DEV.md
README.md 🧹 Do not advertise and support untested models
ocrd-tool.json .
requirements-test.txt Use GT segmentation to test
requirements.txt 🐛 Further tighten dependencies to a known good configuration
setup.py 📦 v0.0.3 – To fix version inconsistency

README.md

ocrd_calamari

Recognize text using Calamari OCR.


Introduction

This project offers an OCR-D-compliant workspace processor for some of the functionality of Calamari OCR.

This processor operates only at the text line level, so it needs a line segmentation (and, by extension, a binarized image) as its input.

Installation

From PyPI

pip install ocrd_calamari

From Repo

pip install .

Install models

Download models trained on GT4HistOCR data:

make gt4histocr-calamari
ls gt4histocr-calamari

Example Usage

ocrd-calamari-recognize -p test-parameters.json -m mets.xml -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI
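Because the processor needs line-segmented input, it can help to confirm that the input file group exists in mets.xml before running the command above. The following is a minimal sketch, not part of ocrd_calamari: the helper names file_groups and build_recognize_command are hypothetical, and it assumes only that file groups are recorded as mets:fileGrp elements with a USE attribute, as in standard METS.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace; the helpers below are illustrative only.
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def file_groups(mets_path):
    """Return the USE attribute of every mets:fileGrp in the METS file."""
    tree = ET.parse(mets_path)
    return [grp.get("USE") for grp in tree.iterfind(".//mets:fileGrp", METS_NS)]


def build_recognize_command(mets_path, input_grp, output_grp, parameters):
    """Assemble the argument list for the call shown above (does not run it)."""
    return [
        "ocrd-calamari-recognize",
        "-p", parameters,
        "-m", mets_path,
        "-I", input_grp,
        "-O", output_grp,
    ]


# Possible usage (commented out so nothing is executed on import):
# import subprocess
# if "OCR-D-SEG-LINE" in file_groups("mets.xml"):
#     subprocess.run(
#         build_recognize_command(
#             "mets.xml", "OCR-D-SEG-LINE", "OCR-D-OCR-CALAMARI",
#             "test-parameters.json",
#         ),
#         check=True,
#     )
```

If the input group is missing, the workspace most likely still needs a binarization and line segmentation step before recognition.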

With test-parameters.json:

{
    "checkpoint": "/path/to/some/trained/models/*.ckpt.json"
}
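The checkpoint value contains a wildcard, so a mistyped path can silently match no model files. The sketch below writes the parameter file and lists which files the pattern matches. It assumes the pattern behaves like a shell-style glob, which may differ in detail from how the processor expands it; write_parameters and matching_checkpoints are hypothetical helper names.

```python
import glob
import json


def write_parameters(checkpoint_pattern, path="test-parameters.json"):
    """Write a parameter file with the single "checkpoint" key shown above."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"checkpoint": checkpoint_pattern}, f, indent=4)
    return path


def matching_checkpoints(parameters_path):
    """Return the sorted list of files matched by the "checkpoint" pattern.

    Assumption: the pattern is interpreted like a shell glob.
    """
    with open(parameters_path, encoding="utf-8") as f:
        pattern = json.load(f)["checkpoint"]
    return sorted(glob.glob(pattern))
```

An empty list from matching_checkpoints suggests that the path in the parameter file should be checked before running the processor.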

Development & Testing

For information regarding development and testing, please see README-DEV.md.