ocrd_calamari

Recognize text using Calamari OCR.

Introduction

This package offers an OCR-D-compliant workspace processor for some of the functionality of Calamari OCR.

This processor operates only at the text line level, so it needs a line segmentation (and, by extension, a binarized image) as its input.
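
Because the processor expects line-level segmentation, it can help to confirm that an input page actually contains text lines before running recognition. The following is a minimal sketch, not part of ocrd_calamari: it assumes the input is PAGE-XML (the page format used in OCR-D workspaces), the helper names `local_name` and `count_text_lines` are hypothetical, and the embedded sample is a trimmed stand-in rather than a schema-valid document.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for a PAGE-XML file (assumption: real files carry many
# more attributes and coordinates; only the element nesting matters here).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1"/>
      <TextLine id="l2"/>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the check works for any PAGE version."""
    return tag.rsplit("}", 1)[-1]


def count_text_lines(page_xml):
    """Count TextLine elements in a PAGE-XML string (hypothetical helper)."""
    root = ET.fromstring(page_xml)
    return sum(1 for el in root.iter() if local_name(el.tag) == "TextLine")


if __name__ == "__main__":
    n = count_text_lines(SAMPLE)
    if n == 0:
        print("No text lines found: run a line segmentation step first.")
    else:
        print(f"Found {n} text line(s).")
```

Matching on the local element name rather than a fixed namespace keeps the check independent of the PAGE schema version a given workspace uses.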

Installation

From PyPI

pip install ocrd_calamari

From Repo

pip install .

Install models

Download standard models:

wget https://github.com/Calamari-OCR/calamari_models/archive/master.zip
unzip master.zip

Download models trained on GT4HistOCR data:

make gt4histocr-calamari
ls gt4histocr-calamari

Example Usage

ocrd-calamari-recognize -p test-parameters.json -m mets.xml -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI

With test-parameters.json:

{
    "checkpoint": "/path/to/some/trained/models/*.ckpt.json"
}
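
The `checkpoint` value in the example above is a glob pattern, so a typo in the path can go unnoticed until the processor runs. As a rough pre-flight check, the sketch below writes a parameter file with that single key and lists the files the pattern matches. It is an illustration under stated assumptions: `matching_checkpoints` and `write_parameters` are hypothetical helpers, not part of ocrd_calamari, and the demo uses empty stand-in files in a temporary directory instead of real models.

```python
import glob
import json
import os
import tempfile


def matching_checkpoints(pattern):
    """Return the sorted list of files matched by a checkpoint glob pattern."""
    return sorted(glob.glob(pattern))


def write_parameters(checkpoint_pattern, path):
    """Write a parameter file containing only the "checkpoint" key."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"checkpoint": checkpoint_pattern}, f, indent=4)
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory with two empty stand-in checkpoint files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("0.ckpt.json", "1.ckpt.json"):
            open(os.path.join(tmp, name), "w").close()
        pattern = os.path.join(tmp, "*.ckpt.json")
        print(len(matching_checkpoints(pattern)), "checkpoint file(s) matched")
        print("wrote", write_parameters(pattern, os.path.join(tmp, "test-parameters.json")))
```

If the pattern matches no files, it is worth fixing the path before invoking `ocrd-calamari-recognize`.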

Development & Testing

For information regarding development and testing, please see README-DEV.md.