This reverts commit 2c89bf3b35ee290d7b830ef270df3a96aa48245e, reversing
changes made to 9f7e413148ca5dbac9b555d7b0d0a5fa3a0f5340.
pull/23/head
Gerber, Mike 5 years ago
parent 1303a7d92f
commit 48a31ce672

2
.gitignore vendored

@ -1,2 +0,0 @@
__pycache__
*.egg-info

Binary file not shown. Image after the change: 144 KiB.

@ -0,0 +1,14 @@
dist: xenial # required for Python >= 3.7
language: python
python:
- "3.5"
- "3.6"
- "3.7"
- "3.8"
install:
- pip install -r requirements.txt
script:
- pytest

@ -1,9 +0,0 @@
FROM python:3
ADD requirements.txt /
RUN pip install --proxy=http-proxy.sbb.spk-berlin.de:3128 -r requirements.txt
COPY . /usr/src/sbb_textline_detector
RUN pip install /usr/src/sbb_textline_detector
ENTRYPOINT ["sbb_textline_detector"]

@ -178,7 +178,7 @@
 APPENDIX: How to apply the Apache License to your work.
 To apply the Apache License to your work, attach the following
-boilerplate notice, with the fields enclosed by brackets "[]"
+boilerplate notice, with the fields enclosed by brackets "{}"
 replaced with your own identifying information. (Don't include
 the brackets!) The text should be enclosed in the appropriate
 comment syntax for the file format. We also recommend that a
@ -186,7 +186,7 @@
 same "printed page" as the copyright notice for easier
 identification within third-party archives.
-Copyright [yyyy] [name of copyright owner]
+Copyright 2019 qurator
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

@ -1,30 +1,49 @@
-# Textline Detection
-## Introduction
-This tool performs textline detection from document image data and returns the results as PAGE-XML.
-## Installation
-`pip install .`
-## Models
-In order to run this tool you also need trained models. You can download our pre-trained models from here:
-https://file.spk-berlin.de:8443/textline_detection/
-## Usage
-`sbb_textline_detector -i <image file name> -o <directory to write output xml> -m <directory of models>`
-## Usage with OCR-D
+dinglehopper
+============
+dinglehopper is an OCR evaluation tool and reads [ALTO](https://github.com/altoxml), [PAGE](https://github.com/PRImA-Research-Lab/PAGE-XML) and text files.
+[![Build Status](https://travis-ci.org/qurator-spk/dinglehopper.svg?branch=master)](https://travis-ci.org/qurator-spk/dinglehopper)
+Goals
+-----
+* Useful
+  * As a UI tool
+  * For an automated evaluation
+  * As a library
+* Unicode support
+Installation
+------------
+It's best to use pip, e.g.:
+~~~
+sudo pip install .
+~~~
+Usage
+-----
+~~~
+dinglehopper some-document.gt.page.xml some-document.ocr.alto.xml
+~~~
+This generates `report.html` and `report.json`.
+As an OCR-D processor:
 ~~~
-ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
-ocrd-sbb-textline-detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
-  -p '{ "model": "/path/to/the/models/textline_detection" }'
+ocrd-dinglehopper -m mets.xml -I OCR-D-GT-PAGE,OCR-D-OCR-TESS -O OCR-D-OCR-TESS-EVAL
 ~~~
-Segmentation works on raw RGB images, but respects and retains
-`AlternativeImage`s from binarization steps, so it's a good idea to do
-binarization first, then perform the textline detection. The used binarization
-processor must produce an `AlternativeImage` for the binarized image, not
-replace the original raw RGB image.
+This generates HTML and JSON reports in the `OCR-D-OCR-TESS-EVAL` filegroup.
+![dinglehopper displaying metrics and character differences](.screenshots/dinglehopper.png?raw=true)
+Testing
+-------
+Use `pytest` to run the tests in [the tests directory](qurator/dinglehopper/tests):
+~~~
+virtualenv -p /usr/bin/python3 venv
+. venv/bin/activate
+pip install -r requirements.txt
+pip install pytest
+pytest
+~~~

@ -1 +1 @@
-qurator/sbb_textline_detector/ocrd-tool.json
+qurator/dinglehopper/ocrd-tool.json

@ -0,0 +1,4 @@
[pytest]
markers =
    integration: integration tests
    serial

@ -1 +1,2 @@
__import__('pkg_resources').declare_namespace(__name__)

@ -0,0 +1,6 @@
# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

@ -0,0 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$" />
<orderEntry type="jdk" jdkName="Python 3.7 (dinglehopper)" jdkType="Python SDK" />
<orderEntry type="sourceFolder" forTests="false" />
</component>
<component name="TestRunnerService">
<option name="projectConfiguration" value="pytest" />
<option name="PROJECT_TEST_RUNNER" value="pytest" />
</component>
</module>

@ -0,0 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.7 (dinglehopper)" project-jdk-type="Python SDK" />
<component name="PyCharmProfessionalAdvertiser">
<option name="shown" value="true" />
</component>
</project>

@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectModuleManager">
<modules>
<module fileurl="file://$PROJECT_DIR$/.idea/dinglehopper.iml" filepath="$PROJECT_DIR$/.idea/dinglehopper.iml" />
</modules>
</component>
</project>

@ -0,0 +1,5 @@
from .ocr_files import *
from .substitute_equivalences import *
from .character_error_rate import *
from .word_error_rate import *
from .align import *

@ -0,0 +1,43 @@
from .edit_distance import *


def align(t1, t2):
    """Align text."""
    s1 = list(grapheme_clusters(unicodedata.normalize('NFC', t1)))
    s2 = list(grapheme_clusters(unicodedata.normalize('NFC', t2)))
    return seq_align(s1, s2)


def seq_align(s1, s2):
    """Align general sequences."""
    s1 = list(s1)
    s2 = list(s2)
    ops = seq_editops(s1, s2)
    i = 0
    j = 0

    while i < len(s1) or j < len(s2):
        o = None
        try:
            ot = ops[0]
            if ot[1] == i and ot[2] == j:
                ops = ops[1:]
                o = ot
        except IndexError:
            pass

        if o:
            if o[0] == 'insert':
                yield (None, s2[j])
                j += 1
            elif o[0] == 'delete':
                yield (s1[i], None)
                i += 1
            elif o[0] == 'replace':
                yield (s1[i], s2[j])
                i += 1
                j += 1
        else:
            yield (s1[i], s2[j])
            i += 1
            j += 1
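
Review note: `seq_align` turns a list of edit operations into a stream of `(left, right)` pairs, where `None` marks a gap on one side. The sketch below illustrates that output shape only. It is an assumption-laden stand-in that uses the standard library's `difflib.SequenceMatcher` instead of the Levenshtein-based `seq_editops` from this commit, so the two can pick different alignments on ambiguous inputs. The name `align_pairs` is hypothetical.

```python
from difflib import SequenceMatcher


def align_pairs(s1, s2):
    """Yield (left, right) pairs for two sequences; None marks a gap.

    Stand-in for illustration: difflib opcodes, not Levenshtein editops.
    """
    s1, s2 = list(s1), list(s2)
    matcher = SequenceMatcher(a=s1, b=s2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = s1[i1:i2], s2[j1:j2]
        if tag == 'equal':
            yield from zip(left, right)
        elif tag == 'delete':
            for x in left:
                yield (x, None)
        elif tag == 'insert':
            for y in right:
                yield (None, y)
        else:
            # 'replace': the two blocks may differ in length, pad with None
            for k in range(max(len(left), len(right))):
                yield (left[k] if k < len(left) else None,
                       right[k] if k < len(right) else None)


pairs = list(align_pairs('abc', 'abXc'))
print(pairs)  # [('a', 'a'), ('b', 'b'), (None, 'X'), ('c', 'c')]
```

The report generator in this commit consumes pairs of exactly this shape: equal pairs are rendered plainly, and unequal pairs (including gaps) get a `diff` CSS class.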

@ -0,0 +1,21 @@
from __future__ import division

import unicodedata

from uniseg.graphemecluster import grapheme_clusters

from qurator.dinglehopper.edit_distance import distance


def character_error_rate(reference, compared):
    d = distance(reference, compared)
    if d == 0:
        return 0

    n = len(list(grapheme_clusters(unicodedata.normalize('NFC', reference))))
    if n == 0:
        return float('inf')

    return d/n

    # XXX Should we really count newlines here?
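
Review note: the character error rate above is the edit distance divided by the length of the reference, with two special cases (distance 0 gives 0, an empty reference with a non-zero distance gives infinity). The following is a minimal, dependency-free sketch of the same arithmetic. As a simplifying assumption it counts NFC code points rather than grapheme clusters (the commit uses `uniseg` for the latter), and `levenshtein_rows` and `cer_sketch` are hypothetical names.

```python
import unicodedata


def levenshtein_rows(a, b):
    """Levenshtein distance, keeping only two rows of the matrix."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (x != y),  # same or substitution
                           cur[j - 1] + 1,          # insertion
                           prev[j] + 1))            # deletion
        prev = cur
    return prev[-1]


def cer_sketch(reference, compared):
    # Simplification: NFC code points, not grapheme clusters
    ref = unicodedata.normalize('NFC', reference)
    cmp_ = unicodedata.normalize('NFC', compared)
    d = levenshtein_rows(ref, cmp_)
    if d == 0:
        return 0
    if len(ref) == 0:
        return float('inf')
    return d / len(ref)


print(cer_sketch('abcd', 'abxd'))  # 0.25
print(cer_sketch('a', 'abcd'))     # 3.0
```

Because the denominator is the reference length only, the rate is not bounded by 1: an OCR result much longer than the ground truth yields a value above 1.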

@ -0,0 +1,106 @@
import os

import click
from jinja2 import Environment, FileSystemLoader
from markupsafe import escape

from qurator.dinglehopper import *


def gen_diff_report(gt_things, ocr_things, css_prefix, joiner, none, align):
    gtx = ''
    ocrx = ''

    def format_thing(t, css_classes=None):
        if t is None:
            html_t = none
            css_classes += ' ellipsis'
        elif t == '\n':
            html_t = '<br>'
        else:
            html_t = escape(t)

        if css_classes:
            return '<span class="{css_classes}">{html_t}</span>'.format(css_classes=css_classes, html_t=html_t)
        else:
            return '{html_t}'.format(html_t=html_t)

    for k, (g, o) in enumerate(align(gt_things, ocr_things)):
        if g == o:
            css_classes = None
        else:
            css_classes = '{css_prefix}diff{k} diff'.format(css_prefix=css_prefix, k=k)

        gtx += joiner + format_thing(g, css_classes)
        ocrx += joiner + format_thing(o, css_classes)

    return \
        '''
        <div class="row">
           <div class="col-md-6 gt">{}</div>
           <div class="col-md-6 ocr">{}</div>
        </div>
        '''.format(gtx, ocrx)


def process(gt, ocr, report_prefix):
    """Check OCR result against GT.

    The @click decorators change the signature of the decorated functions, so we keep this undecorated version and use
    Click on a wrapper.
    """

    gt_text = text(gt)
    ocr_text = text(ocr)

    gt_text = substitute_equivalences(gt_text)
    ocr_text = substitute_equivalences(ocr_text)

    cer = character_error_rate(gt_text, ocr_text)
    wer = word_error_rate(gt_text, ocr_text)

    char_diff_report = gen_diff_report(gt_text, ocr_text, css_prefix='c', joiner='', none='·', align=align)

    gt_words = words_normalized(gt_text)
    ocr_words = words_normalized(ocr_text)
    word_diff_report = gen_diff_report(gt_words, ocr_words, css_prefix='w', joiner=' ', none='', align=seq_align)

    def json_float(value):
        """Convert a float value to a JSON float.

        This is here so that float('inf') yields "Infinity", not "inf".
        """
        if value == float('inf'):
            return 'Infinity'
        elif value == float('-inf'):
            return '-Infinity'
        else:
            return str(value)

    env = Environment(loader=FileSystemLoader(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'templates')))
    env.filters['json_float'] = json_float

    for report_suffix in ('.html', '.json'):
        template_fn = 'report' + report_suffix + '.j2'
        out_fn = report_prefix + report_suffix

        template = env.get_template(template_fn)
        template.stream(
            gt=gt, ocr=ocr,
            cer=cer, wer=wer,
            char_diff_report=char_diff_report,
            word_diff_report=word_diff_report
        ).dump(out_fn)


@click.command()
@click.argument('gt', type=click.Path(exists=True))
@click.argument('ocr', type=click.Path(exists=True))
@click.argument('report_prefix', type=click.Path(), default='report')
def main(gt, ocr, report_prefix):
    process(gt, ocr, report_prefix)


if __name__ == '__main__':
    main()
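
Review note: the `json_float` filter exists because `str(float('inf'))` is `'inf'`, which JSON parsers reject. The standalone sketch below copies the filter's logic out of `process()` and checks it against Python's own `json` module. Note that `Infinity` is not part of strict JSON either; Python's `json.loads` accepts it by default, and whether other consumers of `report.json` do is an open assumption.

```python
import json


def json_float(value):
    """Render a float so that infinities become Infinity / -Infinity."""
    if value == float('inf'):
        return 'Infinity'
    elif value == float('-inf'):
        return '-Infinity'
    else:
        return str(value)


# The plain string form of infinity cannot be parsed back
try:
    json.loads(str(float('inf')))
    plain_inf_parses = True
except ValueError:
    plain_inf_parses = False

print(plain_inf_parses)                          # False
print(json.loads(json_float(float('inf'))))      # inf
print(json.loads(json_float(0.25)))              # 0.25
```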

@ -0,0 +1,122 @@
from __future__ import division, print_function

import unicodedata
from functools import partial, lru_cache
from typing import Sequence, Tuple

import numpy as np
from uniseg.graphemecluster import grapheme_clusters


def levenshtein_matrix(seq1: Sequence, seq2: Sequence):
    """Compute the matrix commonly computed to produce the Levenshtein distance.
    This is also known as the Wagner-Fischer algorithm. The matrix element at the bottom right contains the desired
    edit distance.

    This algorithm is implemented here because we need an implementation that can work with sequences other than
    strings, e.g. lists of grapheme clusters or lists of word strings.
    """

    # Internally, we use a cached version. As the cache only works on hashable parameters, we convert the input
    # sequences to tuples to make them hashable.
    return _levenshtein_matrix(tuple(seq1), tuple(seq2))


@lru_cache(maxsize=10)
def _levenshtein_matrix(seq1: Tuple, seq2: Tuple):
    """Compute the matrix commonly computed to produce the Levenshtein distance.

    This is a LRU cached function not meant to be used directly. Use levenshtein_matrix() instead.
    """
    m = len(seq1)
    n = len(seq2)

    def from_to(start, stop):
        return range(start, stop + 1, 1)

    D = np.zeros((m + 1, n + 1), int)
    D[0, 0] = 0
    for i in from_to(1, m):
        D[i, 0] = i
    for j in from_to(1, n):
        D[0, j] = j
    for i in from_to(1, m):
        for j in from_to(1, n):
            D[i, j] = min(
                D[i - 1, j - 1] + 1 * (seq1[i - 1] != seq2[j - 1]),  # Same or Substitution
                D[i, j - 1] + 1,  # Insertion
                D[i - 1, j] + 1  # Deletion
            )

    return D


def levenshtein(seq1, seq2):
    """Compute the Levenshtein edit distance between two sequences"""
    m = len(seq1)
    n = len(seq2)

    D = levenshtein_matrix(seq1, seq2)
    return D[m, n]


def levenshtein_matrix_cache_clear():
    """Clear internal Levenshtein matrix cache.

    You want to do this between different input file pairs to decrease memory
    usage by not caching results from prior input files.
    """
    _levenshtein_matrix.cache_clear()


def distance(s1, s2):
    """Compute the Levenshtein edit distance between two Unicode strings

    Note that this is different from levenshtein() as this function knows about Unicode normalization and grapheme
    clusters. This should be the correct way to compare two Unicode strings.
    """
    s1 = list(grapheme_clusters(unicodedata.normalize('NFC', s1)))
    s2 = list(grapheme_clusters(unicodedata.normalize('NFC', s2)))
    return levenshtein(s1, s2)


def seq_editops(seq1, seq2):
    """
    Return sequence of edit operations transforming one sequence to another.

    This aims to return the same/similar results as python-Levenshtein's editops(), just generalized to arbitrary
    sequences.
    """
    seq1 = list(seq1)
    seq2 = list(seq2)
    m = len(seq1)
    n = len(seq2)
    D = levenshtein_matrix(seq1, seq2)

    def _tail_backtrace(i, j, accumulator):
        if i > 0 and D[i - 1, j] + 1 == D[i, j]:
            return partial(_tail_backtrace, i - 1, j, [('delete', i-1, j)] + accumulator)
        if j > 0 and D[i, j - 1] + 1 == D[i, j]:
            return partial(_tail_backtrace, i, j - 1, [('insert', i, j-1)] + accumulator)
        if i > 0 and j > 0 and D[i - 1, j - 1] + 1 == D[i, j]:
            return partial(_tail_backtrace, i - 1, j - 1, [('replace', i-1, j-1)] + accumulator)
        if i > 0 and j > 0 and D[i - 1, j - 1] == D[i, j]:
            return partial(_tail_backtrace, i - 1, j - 1, accumulator)  # NOP
        return accumulator

    def backtrace(i, j):
        result = partial(_tail_backtrace, i, j, [])
        while isinstance(result, partial):
            result = result()

        return result

    b = backtrace(m, n)
    return b


def editops(word1, word2):
    # XXX Note that this returns indices to the _grapheme clusters_, not characters!
    word1 = list(grapheme_clusters(unicodedata.normalize('NFC', word1)))
    word2 = list(grapheme_clusters(unicodedata.normalize('NFC', word2)))
    return seq_editops(word1, word2)

File diff suppressed because it is too large

@ -0,0 +1,558 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import unicodedata"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def list_characters(s):\n",
" \"\"\"List characters of string s, as seen by Python\"\"\"\n",
" for c in s:\n",
" print(c, end=' ')\n",
" if unicodedata.combining(c):\n",
" print(end=' ')\n",
" print(unicodedata.name(c))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparing two Unicode strings"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"S LATIN CAPITAL LETTER S\n",
"c LATIN SMALL LETTER C\n",
"h LATIN SMALL LETTER H\n",
"l LATIN SMALL LETTER L\n",
"y LATIN SMALL LETTER Y\n",
"ñ LATIN SMALL LETTER N WITH TILDE\n",
"\n",
"S LATIN CAPITAL LETTER S\n",
"c LATIN SMALL LETTER C\n",
"h LATIN SMALL LETTER H\n",
"l LATIN SMALL LETTER L\n",
"y LATIN SMALL LETTER Y\n",
"n LATIN SMALL LETTER N\n",
"̃ COMBINING TILDE\n",
"\n"
]
}
],
"source": [
"words = [unicodedata.normalize('NFC', 'Schlyñ'), unicodedata.normalize('NFD', 'Schlyñ')]\n",
"\n",
"for s in words:\n",
" list_characters(s)\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These two strings are different:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"words[0] == words[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And yet they are canonically equivalent:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unicodedata.normalize('NFC', words[0]) == unicodedata.normalize('NFC', words[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"→ Normalize to NFC (Normalization Form C, canonical composition) before comparing. NFC yields composed characters, which is what we want. It matters little here, though, because we're not interested in the characters as Python sees them, but in grapheme clusters (see below)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Grapheme clusters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For evaluation we're interested in what is perceived as \"characters\". But is \"ñ\" 1 character (LATIN SMALL LETTER N WITH TILDE) or 2 (LATIN SMALL LETTER N + COMBINING TILDE)?\n",
"\n",
"What we probably want are [grapheme clusters](https://uniseg-python.readthedocs.io/en/latest/graphemecluster.html):"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['S', 'c', 'h', 'l', 'y', 'ñ']\n",
"['S', 'c', 'h', 'l', 'y', 'ñ']\n"
]
}
],
"source": [
"from uniseg.graphemecluster import grapheme_clusters\n",
"\n",
"for w in words:\n",
" print(list(grapheme_clusters(w)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just looking at the interesting character (the last one) from both words:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ñ LATIN SMALL LETTER N WITH TILDE\n",
"\n",
"n LATIN SMALL LETTER N\n",
"̃ COMBINING TILDE\n",
"\n"
]
}
],
"source": [
"for w in words:\n",
" list_characters(list(grapheme_clusters(w))[-1])\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"→ Work with grapheme clusters, not \"characters as Python sees them\"."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def unicode_name(c):\n",
" if 0xE000 <= ord(c) <= 0xF8FF:\n",
" return 'private use character 0x{:04X}'.format(ord(c))\n",
" else:\n",
" return unicodedata.name(c)\n",
" \n",
"\n",
"def list_grapheme_clusters(s):\n",
" \"\"\"List grapheme clusters of string s\"\"\"\n",
" for g in grapheme_clusters(s):\n",
" print(g, end=' ')\n",
" if len(g) > 1:\n",
" print('(multiple)', end=' ')\n",
" try:\n",
" print(', '.join(unicode_name(c) for c in g))\n",
" except ValueError:\n",
" print('ValueError')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"S LATIN CAPITAL LETTER S\n",
"c LATIN SMALL LETTER C\n",
"h LATIN SMALL LETTER H\n",
"l LATIN SMALL LETTER L\n",
"y LATIN SMALL LETTER Y\n",
"ñ LATIN SMALL LETTER N WITH TILDE\n",
"\n",
"S LATIN CAPITAL LETTER S\n",
"c LATIN SMALL LETTER C\n",
"h LATIN SMALL LETTER H\n",
"l LATIN SMALL LETTER L\n",
"y LATIN SMALL LETTER Y\n",
"ñ (multiple) LATIN SMALL LETTER N, COMBINING TILDE\n",
"\n"
]
}
],
"source": [
"for w in words:\n",
" list_grapheme_clusters(w)\n",
" print()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"私 CJK UNIFIED IDEOGRAPH-79C1\n",
"は HIRAGANA LETTER HA\n",
"彼 CJK UNIFIED IDEOGRAPH-5F7C\n",
"女 CJK UNIFIED IDEOGRAPH-5973\n",
"が HIRAGANA LETTER GA\n",
"お HIRAGANA LETTER O\n",
"茶 CJK UNIFIED IDEOGRAPH-8336\n",
"を HIRAGANA LETTER WO\n",
"好 CJK UNIFIED IDEOGRAPH-597D\n",
"き HIRAGANA LETTER KI\n",
"な HIRAGANA LETTER NA\n",
"事 CJK UNIFIED IDEOGRAPH-4E8B\n",
"が HIRAGANA LETTER GA\n",
"分 CJK UNIFIED IDEOGRAPH-5206\n",
"か HIRAGANA LETTER KA\n",
"っ HIRAGANA LETTER SMALL TU\n",
"た HIRAGANA LETTER TA\n",
"。 IDEOGRAPHIC FULL STOP\n"
]
}
],
"source": [
"list_grapheme_clusters('私は彼女がお茶を好きな事が分かった。')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
". FULL STOP\n",
" SPACE\n",
"ا ARABIC LETTER ALEF\n",
"م ARABIC LETTER MEEM\n",
"ا ARABIC LETTER ALEF\n",
" SPACE\n",
"چ ARABIC LETTER TCHEH\n",
"ن ARABIC LETTER NOON\n",
"د ARABIC LETTER DAL\n",
" SPACE\n",
"ت ARABIC LETTER TEH\n",
"ا ARABIC LETTER ALEF\n",
" SPACE\n",
"ح ARABIC LETTER HAH\n",
"ر ARABIC LETTER REH\n",
"ف ARABIC LETTER FEH\n",
" SPACE\n",
"ت ARABIC LETTER TEH\n",
"و ARABIC LETTER WAW\n",
" SPACE\n",
"ف ARABIC LETTER FEH\n",
"ا ARABIC LETTER ALEF\n",
"ر ARABIC LETTER REH\n",
"س ARABIC LETTER SEEN\n",
"ی ARABIC LETTER FARSI YEH\n",
" SPACE\n",
"ه ARABIC LETTER HEH\n",
"س ARABIC LETTER SEEN\n",
"ت ARABIC LETTER TEH\n",
" SPACE\n",
"ک ARABIC LETTER KEHEH\n",
"ه ARABIC LETTER HEH\n",
" SPACE\n",
"ت ARABIC LETTER TEH\n",
"و ARABIC LETTER WAW\n",
" SPACE\n",
"ع ARABIC LETTER AIN\n",
"ر ARABIC LETTER REH\n",
"ب ARABIC LETTER BEH\n",
"ی ARABIC LETTER FARSI YEH\n",
" SPACE\n",
"ن ARABIC LETTER NOON\n",
"ی ARABIC LETTER FARSI YEH\n",
"س ARABIC LETTER SEEN\n",
"ت ARABIC LETTER TEH\n"
]
}
],
"source": [
"list_grapheme_clusters('. اما چند تا حرف تو فارسی هست که تو عربی نیست')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
". FULL STOP\n",
" SPACE\n",
"ل ARABIC LETTER LAM\n",
"ك ARABIC LETTER KAF\n",
"ن ARABIC LETTER NOON\n",
" SPACE\n",
"ك ARABIC LETTER KAF\n",
"م ARABIC LETTER MEEM\n",
" SPACE\n",
"ع ARABIC LETTER AIN\n",
"د ARABIC LETTER DAL\n",
"د ARABIC LETTER DAL\n",
" SPACE\n",
"ا ARABIC LETTER ALEF\n",
"ل ARABIC LETTER LAM\n",
"ك ARABIC LETTER KAF\n",
"ل ARABIC LETTER LAM\n",
"م ARABIC LETTER MEEM\n",
"ا ARABIC LETTER ALEF\n",
"ت ARABIC LETTER TEH\n",
" SPACE\n",
"ب ARABIC LETTER BEH\n",
"ا ARABIC LETTER ALEF\n",
"ل ARABIC LETTER LAM\n",
"ف ARABIC LETTER FEH\n",
"ا ARABIC LETTER ALEF\n",
"ر ARABIC LETTER REH\n",
"س ARABIC LETTER SEEN\n",
"ي ARABIC LETTER YEH\n",
"ة ARABIC LETTER TEH MARBUTA\n",
" SPACE\n",
"ه ARABIC LETTER HEH\n",
"ل ARABIC LETTER LAM\n",
" SPACE\n",
"أ ARABIC LETTER ALEF WITH HAMZA ABOVE\n",
"ن ARABIC LETTER NOON\n",
"ت ARABIC LETTER TEH\n",
" SPACE\n",
"ب ARABIC LETTER BEH\n",
"ا ARABIC LETTER ALEF\n",
"ل ARABIC LETTER LAM\n",
"ل ARABIC LETTER LAM\n",
"غ ARABIC LETTER GHAIN\n",
"ة ARABIC LETTER TEH MARBUTA\n",
" SPACE\n",
"ا ARABIC LETTER ALEF\n",
"ل ARABIC LETTER LAM\n",
"ع ARABIC LETTER AIN\n",
"ر ARABIC LETTER REH\n",
"ب ARABIC LETTER BEH\n",
"ي ARABIC LETTER YEH\n",
"ة ARABIC LETTER TEH MARBUTA\n",
"؟ ARABIC QUESTION MARK\n"
]
}
],
"source": [
"list_grapheme_clusters('. لكن كم عدد الكلمات بالفارسية هل أنت باللغة العربية؟')"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"H LATIN CAPITAL LETTER H\n",
"e LATIN SMALL LETTER E\n",
"l LATIN SMALL LETTER L\n",
"l LATIN SMALL LETTER L\n",
"😀 GRINNING FACE\n",
" SPACE\n",
"W LATIN CAPITAL LETTER W\n",
"😀 GRINNING FACE\n",
"r LATIN SMALL LETTER R\n",
"l LATIN SMALL LETTER L\n",
"d LATIN SMALL LETTER D\n",
"! EXCLAMATION MARK\n"
]
}
],
"source": [
"list_grapheme_clusters('Hell😀 W😀rld!')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"u̶̜͓̬̞͚͙̪̰͓̯̲̝̬͔͎̳̼͇̓͊ͤ̋̃̀̄̓̿͊̀̚͟͜͟ͅ (multiple) LATIN SMALL LETTER U, COMBINING COMMA ABOVE, COMBINING NOT TILDE ABOVE, COMBINING LATIN SMALL LETTER E, COMBINING DOUBLE ACUTE ACCENT, COMBINING TILDE, COMBINING GRAVE ACCENT, COMBINING LEFT ANGLE ABOVE, COMBINING MACRON, COMBINING COMMA ABOVE, COMBINING DOUBLE OVERLINE, COMBINING NOT TILDE ABOVE, COMBINING DOUBLE MACRON BELOW, COMBINING GRAVE TONE MARK, COMBINING DOUBLE BREVE BELOW, COMBINING LONG STROKE OVERLAY, COMBINING DOUBLE MACRON BELOW, COMBINING LEFT HALF RING BELOW, COMBINING X BELOW, COMBINING CARON BELOW, COMBINING DOWN TACK BELOW, COMBINING DOUBLE RING BELOW, COMBINING ASTERISK BELOW, COMBINING BRIDGE BELOW, COMBINING TILDE BELOW, COMBINING X BELOW, COMBINING INVERTED BREVE BELOW, COMBINING LOW LINE, COMBINING UP TACK BELOW, COMBINING CARON BELOW, COMBINING LEFT ARROWHEAD BELOW, COMBINING UPWARDS ARROW BELOW, COMBINING DOUBLE LOW LINE, COMBINING SEAGULL BELOW, COMBINING EQUALS SIGN BELOW, COMBINING GREEK YPOGEGRAMMENI\n",
"ņ̷͔̤̜̗̘̠̦̦̖̟͉̹͕̬͎̙̲̲̎̅̈́ͮͣ̔̀̌͂̄͆͑̚ (multiple) LATIN SMALL LETTER N, COMBINING DOUBLE VERTICAL LINE ABOVE, COMBINING OVERLINE, COMBINING GREEK DIALYTIKA TONOS, COMBINING LEFT ANGLE ABOVE, COMBINING LATIN SMALL LETTER V, COMBINING LATIN SMALL LETTER A, COMBINING REVERSED COMMA ABOVE, COMBINING GRAVE ACCENT, COMBINING CARON, COMBINING GREEK PERISPOMENI, COMBINING MACRON, COMBINING BRIDGE ABOVE, COMBINING LEFT HALF RING ABOVE, COMBINING SHORT SOLIDUS OVERLAY, COMBINING CEDILLA, COMBINING LEFT ARROWHEAD BELOW, COMBINING DIAERESIS BELOW, COMBINING LEFT HALF RING BELOW, COMBINING ACUTE ACCENT BELOW, COMBINING LEFT TACK BELOW, COMBINING MINUS SIGN BELOW, COMBINING COMMA BELOW, COMBINING COMMA BELOW, COMBINING GRAVE ACCENT BELOW, COMBINING PLUS SIGN BELOW, COMBINING LEFT ANGLE BELOW, COMBINING RIGHT HALF RING BELOW, COMBINING RIGHT ARROWHEAD BELOW, COMBINING CARON BELOW, COMBINING UPWARDS ARROW BELOW, COMBINING RIGHT TACK BELOW, COMBINING LOW LINE, COMBINING LOW LINE\n",
"i̴̢͖̳̣̙͕̍ͯͧ̀ͥͭ̆ͣ̉͐͆̊͋͛̈́͒͟ (multiple) LATIN SMALL LETTER I, COMBINING VERTICAL LINE ABOVE, COMBINING LATIN SMALL LETTER X, COMBINING LATIN SMALL LETTER U, COMBINING GRAVE ACCENT, COMBINING LATIN SMALL LETTER I, COMBINING LATIN SMALL LETTER T, COMBINING BREVE, COMBINING LATIN SMALL LETTER A, COMBINING HOOK ABOVE, COMBINING RIGHT ARROWHEAD ABOVE, COMBINING BRIDGE ABOVE, COMBINING RING ABOVE, COMBINING HOMOTHETIC ABOVE, COMBINING ZIGZAG ABOVE, COMBINING GREEK DIALYTIKA TONOS, COMBINING FERMATA, COMBINING TILDE OVERLAY, COMBINING RETROFLEX HOOK BELOW, COMBINING DOUBLE MACRON BELOW, COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW, COMBINING DOUBLE LOW LINE, COMBINING DOT BELOW, COMBINING RIGHT TACK BELOW, COMBINING RIGHT ARROWHEAD BELOW\n",
"c̰̟̫̲͇̺̹͖̼̦̾ͮ̍̐ͤͪ̓ͤ̐̈́̅ͯͤ̚̚͘ (multiple) LATIN SMALL LETTER C, COMBINING VERTICAL TILDE, COMBINING LATIN SMALL LETTER V, COMBINING VERTICAL LINE ABOVE, COMBINING CANDRABINDU, COMBINING LATIN SMALL LETTER E, COMBINING LEFT ANGLE ABOVE, COMBINING LATIN SMALL LETTER H, COMBINING COMMA ABOVE, COMBINING LATIN SMALL LETTER E, COMBINING LEFT ANGLE ABOVE, COMBINING CANDRABINDU, COMBINING GREEK DIALYTIKA TONOS, COMBINING OVERLINE, COMBINING LATIN SMALL LETTER X, COMBINING LATIN SMALL LETTER E, COMBINING DOT ABOVE RIGHT, COMBINING TILDE BELOW, COMBINING PLUS SIGN BELOW, COMBINING INVERTED DOUBLE ARCH BELOW, COMBINING LOW LINE, COMBINING EQUALS SIGN BELOW, COMBINING INVERTED BRIDGE BELOW, COMBINING RIGHT HALF RING BELOW, COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW, COMBINING SEAGULL BELOW, COMBINING COMMA BELOW\n",
"o̴ͣ̑̐ͫ̈̄͊ͥ̓͟͏̫͔̠̤̜̤̥͘ (multiple) LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER A, COMBINING INVERTED BREVE, COMBINING CANDRABINDU, COMBINING LATIN SMALL LETTER M, COMBINING DIAERESIS, COMBINING MACRON, COMBINING NOT TILDE ABOVE, COMBINING LATIN SMALL LETTER I, COMBINING GREEK KORONIS, COMBINING DOUBLE MACRON BELOW, COMBINING TILDE OVERLAY, COMBINING GRAPHEME JOINER, COMBINING DOT ABOVE RIGHT, COMBINING INVERTED DOUBLE ARCH BELOW, COMBINING LEFT ARROWHEAD BELOW, COMBINING MINUS SIGN BELOW, COMBINING DIAERESIS BELOW, COMBINING LEFT HALF RING BELOW, COMBINING DIAERESIS BELOW, COMBINING RING BELOW\n",
"ḍ̛̥͖͓̪͈̹̯͖̱̘͙͖ͧ̿ͧ̓̓͊̈͑͘̕ (multiple) LATIN SMALL LETTER D, COMBINING LATIN SMALL LETTER U, COMBINING DOUBLE OVERLINE, COMBINING LATIN SMALL LETTER U, COMBINING COMMA ABOVE, COMBINING COMMA ABOVE, COMBINING NOT TILDE ABOVE, COMBINING DIAERESIS, COMBINING LEFT HALF RING ABOVE, COMBINING DOT ABOVE RIGHT, COMBINING COMMA ABOVE RIGHT, COMBINING HORN, COMBINING DOT BELOW, COMBINING RING BELOW, COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW, COMBINING X BELOW, COMBINING BRIDGE BELOW, COMBINING DOUBLE VERTICAL LINE BELOW, COMBINING RIGHT HALF RING BELOW, COMBINING INVERTED BREVE BELOW, COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW, COMBINING MACRON BELOW, COMBINING LEFT TACK BELOW, COMBINING ASTERISK BELOW, COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW\n",
"e̛̺͈̜̰̜̖͎͚͈͋̒̆̈́̏͊ͬ̎̑̇̾̆̓ͬ̔̐̾ͭ́͞ (multiple) LATIN SMALL LETTER E, COMBINING HOMOTHETIC ABOVE, COMBINING TURNED COMMA ABOVE, COMBINING BREVE, COMBINING GREEK DIALYTIKA TONOS, COMBINING DOUBLE GRAVE ACCENT, COMBINING NOT TILDE ABOVE, COMBINING LATIN SMALL LETTER R, COMBINING DOUBLE VERTICAL LINE ABOVE, COMBINING INVERTED BREVE, COMBINING DOT ABOVE, COMBINING VERTICAL TILDE, COMBINING BREVE, COMBINING GREEK KORONIS, COMBINING LATIN SMALL LETTER R, COMBINING REVERSED COMMA ABOVE, COMBINING CANDRABINDU, COMBINING VERTICAL TILDE, COMBINING LATIN SMALL LETTER T, COMBINING ACUTE TONE MARK, COMBINING HORN, COMBINING DOUBLE MACRON, COMBINING INVERTED BRIDGE BELOW, COMBINING DOUBLE VERTICAL LINE BELOW, COMBINING LEFT HALF RING BELOW, COMBINING TILDE BELOW, COMBINING LEFT HALF RING BELOW, COMBINING GRAVE ACCENT BELOW, COMBINING UPWARDS ARROW BELOW, COMBINING DOUBLE RING BELOW, COMBINING DOUBLE VERTICAL LINE BELOW\n"
]
}
],
"source": [
"list_grapheme_clusters('u̶̜͓̬̞͚͙̪̰͓̯̲̝̬͔͎̳̼͇̓͊ͤ̋̃̀̄̓̿͊̀̚͟͜͟ͅņ̷͔̤̜̗̘̠̦̦̖̟͉̹͕̬͎̙̲̲̎̅̈́ͮͣ̔̀̌͂̄͆͑̚i̴̢͖̳̣̙͕̍ͯͧ̀ͥͭ̆ͣ̉͐͆̊͋͛̈́͒͟c̰̟̫̲͇̺̹͖̼̦̾ͮ̍̐ͤͪ̓ͤ̐̈́̅ͯͤ̚̚͘o̴ͣ̑̐ͫ̈̄͊ͥ̓͟͏̫͔̠̤̜̤̥͘ḍ̛̥͖͓̪͈̹̯͖̱̘͙͖ͧ̿ͧ̓̓͊̈͑͘̕e̛̺͈̜̰̜̖͎͚͈͋̒̆̈́̏͊ͬ̎̑̇̾̆̓ͬ̔̐̾ͭ́͞')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Z LATIN CAPITAL LETTER Z\n",
"e LATIN SMALL LETTER E\n",
"u LATIN SMALL LETTER U\n",
"g LATIN SMALL LETTER G\n",
"n LATIN SMALL LETTER N\n",
"uͤ (multiple) LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER E\n",
"ß LATIN SMALL LETTER SHARP S\n"
]
}
],
"source": [
"list_grapheme_clusters('Zeugnuͤß')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Z LATIN CAPITAL LETTER Z\n",
"e LATIN SMALL LETTER E\n",
"u LATIN SMALL LETTER U\n",
"g LATIN SMALL LETTER G\n",
"n LATIN SMALL LETTER N\n",
" private use character 0xE72B\n",
"ß LATIN SMALL LETTER SHARP S\n"
]
}
],
"source": [
"list_grapheme_clusters('Zeugnß')"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,107 @@
from __future__ import division, print_function
from warnings import warn
from lxml import etree as ET
import sys
from lxml.etree import XMLSyntaxError


def alto_namespace(tree):
    """Return the ALTO namespace used in the given ElementTree.

    This relies on the assumption that, in any given ALTO file, the root element has the local name "alto". We do not
    check whether the file uses a valid ALTO namespace.
    """
    root_name = ET.QName(tree.getroot().tag)
    if root_name.localname == 'alto':
        return root_name.namespace
    else:
        raise ValueError('Not an ALTO tree')


def alto_text(tree):
    """Extract text from the given ALTO ElementTree."""

    nsmap = {'alto': alto_namespace(tree)}

    # One output line per TextLine, built from the CONTENT of its String children
    lines = (
        ' '.join(string.attrib.get('CONTENT') for string in line.iterfind('alto:String', namespaces=nsmap))
        for line in tree.iterfind('.//alto:TextLine', namespaces=nsmap))
    text_ = '\n'.join(lines)

    return text_


def page_namespace(tree):
    """Return the PAGE content namespace used in the given ElementTree.

    This relies on the assumption that, in any given PAGE content file, the root element has the local name "PcGts". We
    do not check whether the file uses a valid PAGE namespace.
    """
    root_name = ET.QName(tree.getroot().tag)
    if root_name.localname == 'PcGts':
        return root_name.namespace
    else:
        raise ValueError('Not a PAGE tree')


def page_text(tree):
    """Extract text from the given PAGE content ElementTree."""

    nsmap = {'page': page_namespace(tree)}

    def region_text(region):
        # Region-level text only; a region without TextEquiv/Unicode yields None
        try:
            return region.find('./page:TextEquiv/page:Unicode', namespaces=nsmap).text
        except AttributeError:
            return None

    region_texts = []
    reading_order = tree.find('.//page:ReadingOrder', namespaces=nsmap)
    if reading_order is not None:
        # Follow the reading order: only regions referenced there are extracted
        for group in reading_order.iterfind('./*', namespaces=nsmap):
            if ET.QName(group.tag).localname == 'OrderedGroup':
                region_ref_indexeds = group.findall('./page:RegionRefIndexed', namespaces=nsmap)
                for region_ref_indexed in sorted(region_ref_indexeds, key=lambda r: int(r.attrib['index'])):
                    region_id = region_ref_indexed.attrib['regionRef']
                    region = tree.find('.//page:TextRegion[@id="%s"]' % region_id, namespaces=nsmap)
                    if region is not None:
                        region_texts.append(region_text(region))
                    else:
                        warn('Not a TextRegion: "%s"' % region_id)
            else:
                raise NotImplementedError
    else:
        # No reading order given: fall back to document order
        for region in tree.iterfind('.//page:TextRegion', namespaces=nsmap):
            region_texts.append(region_text(region))

    # XXX Does a file have to have regions etc.? region vs lines etc.
    # Filter empty region texts
    region_texts = (t for t in region_texts if t)

    text_ = '\n'.join(region_texts)

    return text_


def text(filename):
    """Read the text from the given file.

    Supports PAGE, ALTO and falls back to plain text.
    """

    try:
        tree = ET.parse(filename)
    except XMLSyntaxError:
        # Not XML at all: treat the file as plain text
        with open(filename, 'r') as f:
            return f.read()
    try:
        return page_text(tree)
    except ValueError:
        return alto_text(tree)


if __name__ == '__main__':
    print(text(sys.argv[1]))
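
The reading-order handling in `page_text` can be tried out in isolation with the following minimal sketch. It is not the code from this commit: it swaps lxml for the standard library's `xml.etree.ElementTree` so that it runs without extra dependencies, and the names `SAMPLE_PAGE`, `split_tag` and `page_text_sketch` as well as the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; region IDs and texts are made up.
NS = 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'
SAMPLE_PAGE = """<PcGts xmlns="{ns}">
  <Page>
    <ReadingOrder>
      <OrderedGroup id="reading-order">
        <RegionRefIndexed index="1" regionRef="r_b"/>
        <RegionRefIndexed index="0" regionRef="r_a"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r_b"><TextEquiv><Unicode>second</Unicode></TextEquiv></TextRegion>
    <TextRegion id="r_a"><TextEquiv><Unicode>first</Unicode></TextEquiv></TextRegion>
    <TextRegion id="r_c"><TextEquiv><Unicode>unreferenced</Unicode></TextEquiv></TextRegion>
  </Page>
</PcGts>""".format(ns=NS)


def split_tag(tag):
    """Split an ElementTree tag of the form '{namespace}local'."""
    if tag.startswith('{'):
        namespace, local = tag[1:].split('}', 1)
        return namespace, local
    return None, tag


def page_text_sketch(root):
    """Join region texts in the order given by RegionRefIndexed/@index."""
    namespace, local = split_tag(root.tag)
    if local != 'PcGts':
        raise ValueError('Not a PAGE tree')
    nsmap = {'page': namespace}

    # Index all text regions by ID so that references can be resolved
    regions = {r.get('id'): r for r in root.iterfind('.//page:TextRegion', nsmap)}
    refs = root.findall('.//page:ReadingOrder/page:OrderedGroup/page:RegionRefIndexed', nsmap)

    texts = []
    for ref in sorted(refs, key=lambda r: int(r.get('index'))):
        region = regions.get(ref.get('regionRef'))
        if region is None:
            continue  # the reference does not point to a TextRegion
        texts.append(region.findtext('./page:TextEquiv/page:Unicode', default='', namespaces=nsmap))
    return '\n'.join(t for t in texts if t)


print(page_text_sketch(ET.fromstring(SAMPLE_PAGE)))
```

As in `page_text`, a region that the reading order does not reference (`r_c` here) is left out of the result.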

@ -0,0 +1,22 @@
{
"git_url": "https://github.com/qurator-spk/dinglehopper",
"tools": {
"ocrd-dinglehopper": {
"executable": "ocrd-dinglehopper",
"description": "Evaluate OCR text against ground truth with dinglehopper",
"input_file_grp": [
"OCR-D-GT-PAGE",
"OCR-D-OCR"
],
"output_file_grp": [
"OCR-D-OCR-EVAL"
],
"categories": [
"Quality assurance"
],
"steps": [
"recognition/text-recognition"
]
}
}
}

@ -0,0 +1,71 @@
import json
import os
import click
from ocrd import Processor
from ocrd.decorators import ocrd_cli_options, ocrd_cli_wrap_processor
from ocrd_utils import concat_padded, getLogger
from pkg_resources import resource_string
from qurator.dinglehopper.cli import process as cli_process
from qurator.dinglehopper.edit_distance import levenshtein_matrix_cache_clear
log = getLogger('processor.OcrdDinglehopperEvaluate')
OCRD_TOOL = json.loads(resource_string(__name__, 'ocrd-tool.json').decode('utf8'))


@click.command()
@ocrd_cli_options
def ocrd_dinglehopper(*args, **kwargs):
    return ocrd_cli_wrap_processor(OcrdDinglehopperEvaluate, *args, **kwargs)


class OcrdDinglehopperEvaluate(Processor):

    def __init__(self, *args, **kwargs):
        kwargs['ocrd_tool'] = OCRD_TOOL['tools']['ocrd-dinglehopper']
        super(OcrdDinglehopperEvaluate, self).__init__(*args, **kwargs)

    def _make_file_id(self, input_file, input_file_grp, n):
        # Derive the output ID from the input ID; fall back to a numbered ID
        # if the input ID does not contain the input file group name
        file_id = input_file.ID.replace(input_file_grp, self.output_file_grp)
        if file_id == input_file.ID:
            file_id = concat_padded(self.output_file_grp, n)
        return file_id

    def process(self):
        # Expects exactly two input file groups: ground truth first, OCR second
        gt_grp, ocr_grp = self.input_file_grp.split(',')
        for n, page_id in enumerate(self.workspace.mets.physical_pages):
            gt_file = self.workspace.mets.find_files(fileGrp=gt_grp, pageId=page_id)[0]
            ocr_file = self.workspace.mets.find_files(fileGrp=ocr_grp, pageId=page_id)[0]
            log.info("INPUT FILES %i / %s ↔ %s", n, gt_file, ocr_file)

            file_id = self._make_file_id(ocr_file, ocr_grp, n)
            report_prefix = os.path.join(self.output_file_grp, file_id)

            # Process the files
            try:
                os.mkdir(self.output_file_grp)
            except FileExistsError:
                pass
            cli_process(gt_file.local_filename, ocr_file.local_filename, report_prefix)

            # Add reports to the workspace
            for report_suffix, mimetype in [
                ('.html', 'text/html'),
                ('.json', 'application/json'),
            ]:
                self.workspace.add_file(
                    ID=file_id + report_suffix,
                    file_grp=self.output_file_grp,
                    pageId=page_id,
                    mimetype=mimetype,
                    local_filename=report_prefix + report_suffix)

            # Clear cache between files
            levenshtein_matrix_cache_clear()


if __name__ == '__main__':
    ocrd_dinglehopper()
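
The ID derivation in `_make_file_id` can be illustrated without an OCR-D installation. The following is only a sketch: `concat_padded_sketch` is a hypothetical stand-in for `ocrd_utils.concat_padded`, whose exact padding and numbering may differ, and `make_file_id` is a free-function rewrite made up for this example.

```python
def concat_padded_sketch(base, n):
    """Hypothetical stand-in for ocrd_utils.concat_padded (format assumed)."""
    return '%s_%04i' % (base, n)


def make_file_id(input_id, input_file_grp, output_file_grp, n):
    """Swap the file group name inside the ID, or fall back to a numbered ID."""
    file_id = input_id.replace(input_file_grp, output_file_grp)
    if file_id == input_id:
        # The input ID did not contain the group name, so nothing was replaced
        file_id = concat_padded_sketch(output_file_grp, n)
    return file_id


# The ID contains the group name: only the group part changes
print(make_file_id('OCR-D-OCR-TESS_0001', 'OCR-D-OCR-TESS', 'OCR-D-OCR-EVAL', 0))
# The ID does not contain the group name: a numbered ID is generated
print(make_file_id('page24', 'OCR-D-OCR-TESS', 'OCR-D-OCR-EVAL', 3))
```

The fallback matters because the report files are added to the workspace under this ID; reusing an unchanged input ID would collide with the input file's own ID.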

@ -0,0 +1,46 @@
import unicodedata


def substitute_equivalences(s):
    # These are for OCR-D GT vs Tesseract frk vs Calamari GT4HistOCR
    # It might make sense to use different rules for GT and for the different OCR engines
    equivalences = {
'': 'ü',
'': 'ſſ',
"\ueba7": 'ſſi', # MUFI: LATIN SMALL LIGATURE LONG S LONG S I
'': 'ä',
'': 'ch',
'==': '\u2013',  # → en-dash
'\u2014': '\u2013',  # em-dash → en-dash
'': 'ck',
'': 'll',
'': 'ö',
'': 'ſi',
'': 'ſt',
'': 'fi',
'': 'ff',
'': 'fl',
'': 'ffi',
'': 'ct',
'': '\'',
'': '-',
'': 'tz', # MUFI: LATIN SMALL LIGATURE TZ
'a\u0364': 'ä',  # LATIN SMALL LETTER A, COMBINING LATIN SMALL LETTER E
'o\u0364': 'ö',  # LATIN SMALL LETTER O, COMBINING LATIN SMALL LETTER E
'u\u0364': 'ü',  # LATIN SMALL LETTER U, COMBINING LATIN SMALL LETTER E
'\uf532': 'as', # eMOP: Latin small ligature as
'\uf533': 'is', # eMOP: Latin small ligature is
'\uf534': 'us', # eMOP: Latin small ligature us
'\uf535': 'Qu', # eMOP: Latin ligature capital Q small u
'\u0133': 'ij',  # U+0133 LATIN SMALL LIGATURE IJ
'\uE8BF': 'q&', # MUFI: LATIN SMALL LETTER Q LIGATED WITH FINAL ET XXX How to replace this correctly?
'\uEBA5': 'ſp', # MUFI: LATIN SMALL LIGATURE LONG S P
'\uFB06': 'st',  # U+FB06 LATIN SMALL LIGATURE ST
'\uF50E': 'q\u0301'  # U+F50E LATIN SMALL LETTER Q WITH ACUTE ACCENT
    }

    s = unicodedata.normalize('NFC', s)
    for fr, to in equivalences.items():
        s = s.replace(fr, to)
    return s
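
The two-stage approach of `substitute_equivalences` (Unicode NFC normalization first, then table-driven replacement) can be sketched with a small table. `EQUIVALENCES_SKETCH` and `substitute_sketch` are names made up for this example, and the table is an illustrative subset written with escape sequences, not the full mapping.

```python
import unicodedata

# Illustrative subset only; keys are written as escapes to keep them unambiguous
EQUIVALENCES_SKETCH = {
    'a\u0364': '\u00e4',  # a + COMBINING LATIN SMALL LETTER E -> ä
    '\ufb06': 'st',       # LATIN SMALL LIGATURE ST -> st
    '\u2014': '\u2013',   # em-dash -> en-dash
}


def substitute_sketch(s):
    # NFC composes canonical sequences such as a + COMBINING DIAERESIS
    s = unicodedata.normalize('NFC', s)
    # The table covers what NFC leaves untouched (ligatures, historic forms)
    for fr, to in EQUIVALENCES_SKETCH.items():
        s = s.replace(fr, to)
    return s


print(substitute_sketch('Ma\u0364dchen'))  # table-driven replacement
print(substitute_sketch('Ma\u0308dchen'))  # handled by NFC alone
```

NFC alone is not enough here: it does not decompose compatibility ligatures such as U+FB06, and `a` followed by U+0364 has no precomposed form, so both need explicit table entries.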

@ -0,0 +1,60 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<style type="text/css">
.gt .diff {
color: green;
}
.ocr .diff {
color: red;
}
.ellipsis {
opacity: 0.5;
font-style: italic;
}
.diff-highlight {
border: 2px solid;
border-radius: 5px;
}
</style>
</head>
<body>
<div class="container">
{{ gt }}<br>
{{ ocr }}
<h2>Metrics</h2>
<p>CER: {{ cer|round(4) }}</p>
<p>WER: {{ wer|round(4) }}</p>
<h2>Character differences</h2>
{{ char_diff_report }}
<h2>Word differences</h2>
{{ word_diff_report }}
</div>
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js" integrity="sha384-UO2eT0CpHqdSJQ6hJty5KVphtPhzWj9WO1clHTMGa3JDZwrnQq4sF86dIHNDz0W1" crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script>
<script>
{% include 'report.html.js' %}
</script>
</body>
</html>

@ -0,0 +1,14 @@
function find_diff_class(classes) {
return classes.split(/\s+/).find(x => x.match(/.diff\d.*/));
}
$(document).ready(function() {
$('.diff').mouseover(function() {
let c = find_diff_class($(this).attr('class'))
$('.' + c).addClass('diff-highlight')
});
$('.diff').mouseout(function() {
let c = find_diff_class($(this).attr('class'))
$('.' + c).removeClass('diff-highlight')
});
});

@ -0,0 +1,6 @@
{
"gt": "{{ gt }}",
"ocr": "{{ ocr }}",
"cer": {{ cer|json_float }},
"wer": {{ wer|json_float }}
}
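
The `json_float` filter used in this template is not part of this section. A minimal sketch of what such a filter might look like follows, under the assumption that its purpose is to render infinite error rates in a form that Python's `json` module can read back; `json_float_sketch` is a hypothetical name, not the project's implementation.

```python
import json


def json_float_sketch(value):
    """Render a float for embedding in a JSON template (assumed behaviour)."""
    # str(float('inf')) is 'inf', which json.loads rejects; 'Infinity' is accepted
    if value == float('inf'):
        return 'Infinity'
    if value == float('-inf'):
        return '-Infinity'
    return str(value)


rendered = '{"cer": %s, "wer": %s}' % (json_float_sketch(0.25), json_float_sketch(float('inf')))
print(rendered)
print(json.loads(rendered))
```

Note that `Infinity` is accepted by Python's `json.loads` but is not valid in strict JSON, so other consumers may reject such a report.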

@ -0,0 +1,287 @@
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version17/mets.v1-7.xsd http://www.loc.gov/mix/v10 http://www.loc.gov/standards/mix/mix10/mix10.xsd">
<mets:metsHdr CREATEDATE="2017-08-22T14:23:38">
<mets:agent OTHERTYPE="SOFTWARE" ROLE="CREATOR" TYPE="OTHER">
<mets:name>Goobi - UGH-1.11.1-v1.11.0-11-gbafb11b - 16&#8722;November&#8722;2015</mets:name>
<mets:note>Goobi</mets:note>
</mets:agent>
</mets:metsHdr>
<mets:dmdSec ID="DMDLOG_0000">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
<mods:location>
<mods:physicalLocation authority="marcorg" displayLabel="Staatsbibliothek zu Berlin - Preu&#223;ischer Kulturbesitz, Berlin, Germany">DE-1</mods:physicalLocation>
<mods:shelfLocator>4" Fy 11178</mods:shelfLocator>
</mods:location>
<mods:originInfo eventType="publication">
<mods:place>
<mods:placeTerm type="text">Hanau</mods:placeTerm>
</mods:place>
<mods:dateIssued encoding="iso8601" keyDate="yes">1749</mods:dateIssued>
</mods:originInfo>
<mods:originInfo eventType="digitization">
<mods:place>
<mods:placeTerm type="text">Berlin</mods:placeTerm>
</mods:place>
<mods:dateCaptured encoding="iso8601">2012</mods:dateCaptured>
<mods:publisher>Staatsbibliothek zu Berlin - Preu&#223;ischer Kulturbesitz, Germany</mods:publisher>
<mods:edition>[Electronic ed.]</mods:edition>
</mods:originInfo>
<mods:classification authority="ZVDD">Historische Drucke</mods:classification>
<mods:classification authority="ZVDD">Rechtswissenschaft</mods:classification>
<mods:classification authority="ZVDD">VD18 digital</mods:classification>
<mods:recordInfo>
<mods:recordIdentifier source="gbv-ppn">PPN718448162</mods:recordIdentifier>
</mods:recordInfo>
<mods:identifier type="purl">http://resolver.staatsbibliothek-berlin.de/SBB00008F1000000000</mods:identifier>
<mods:identifier type="vd18">11750219</mods:identifier>
<mods:identifier type="PPNanalog">PPN370506340</mods:identifier>
<mods:titleInfo>
<mods:title>Acten-m&#228;&#223;iger Verlauff, Des Fameusen Processus sich verhaltende zwischen Herrn Hoff-Rath Era&#223;mus Senckenberg des Raths zu Franckfurt An einem und der Unschuldigen Catharina Agricola, am andern Theil puncto stupri violenti</mods:title>
<mods:subTitle>Worinnen allen unpartheyischen Iustitiariis diese unverantwortliche Procedur und dabey gespielte listige Touren kl&#228;rlich vor Augen gestellet werden</mods:subTitle>
</mods:titleInfo>
<mods:note type="source characteristics">P_Drucke_VD18</mods:note>
<mods:note type="bibliography">VD18 11750219</mods:note>
<mods:language>
<mods:languageTerm authority="iso639-2b" type="code">ger</mods:languageTerm>
</mods:language>
<mods:relatedItem type="series">
<mods:titleInfo>
<mods:title>VD18 digital</mods:title>
</mods:titleInfo>
</mods:relatedItem>
<mods:name type="personal">
<mods:role>
<mods:roleTerm authority="marcrelator" type="code">asn</mods:roleTerm>
</mods:role>
<mods:namePart type="family">Senckenberg</mods:namePart>
<mods:namePart type="given">Era&#223;mus</mods:namePart>
<mods:displayForm>Senckenberg, Era&#223;mus</mods:displayForm>
</mods:name>
<mods:name type="personal">
<mods:role>
<mods:roleTerm authority="marcrelator" type="code">asn</mods:roleTerm>
</mods:role>
<mods:namePart type="family">Agricola</mods:namePart>
<mods:namePart type="given">Catharina</mods:namePart>
<mods:displayForm>Agricola, Catharina</mods:displayForm>
</mods:name>
<mods:name type="corporate">
<mods:role>
<mods:roleTerm authority="marcrelator" type="code">fnd</mods:roleTerm>
</mods:role>
<mods:namePart>Deutsche Forschungsgemeinschaft</mods:namePart>
</mods:name>
<mods:physicalDescription>
<mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
<mods:extent>44 S.</mods:extent>
<mods:extent>2&#176;</mods:extent>
</mods:physicalDescription>
<mods:extension>
<zvdd:zvddWrap xmlns:zvdd="http://zvdd.gdz-cms.de/">
<zvdd:titleWord>Aktenm&#228;&#223;iger Verlauf famosen Prozesses Hofrat Erasmus Rats Frankfurt Justitiariis</zvdd:titleWord>
</zvdd:zvddWrap>
</mods:extension>
<mods:accessCondition type="use and reproduction">CC BY-NC-SA 4.0 International</mods:accessCondition>
<mods:typeOfResource>text</mods:typeOfResource>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:dmdSec ID="DMDLOG_0001">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
<mods:titleInfo>
<mods:title>Ursachen so diesen Druck veranlasset</mods:title>
</mods:titleInfo>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:dmdSec ID="DMDLOG_0002">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
<mods:titleInfo>
<mods:title>Endlich Abgetrungene Rechtliche Interims-Defensions-Schrifft ...</mods:title>
</mods:titleInfo>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:amdSec ID="AMD">
<mets:rightsMD ID="RIGHTS">
<mets:mdWrap MDTYPE="OTHER" MIMETYPE="text/xml" OTHERMDTYPE="DVRIGHTS">
<mets:xmlData>
<dv:rights xmlns:dv="http://dfg-viewer.de/">
<dv:owner>Staatsbibliothek zu Berlin - Preu&#223;ischer Kulturbesitz</dv:owner>
<dv:ownerLogo>http://resolver.staatsbibliothek-berlin.de/SBB0000000100000000</dv:ownerLogo>
<dv:ownerSiteURL>http://www.staatsbibliothek-berlin.de</dv:ownerSiteURL>
<dv:ownerContact>mailto:info@sbb.spk-berlin.de</dv:ownerContact>
</dv:rights>
</mets:xmlData>
</mets:mdWrap>
</mets:rightsMD>
<mets:digiprovMD ID="DIGIPROV">
<mets:mdWrap MDTYPE="OTHER" MIMETYPE="text/xml" OTHERMDTYPE="DVLINKS">
<mets:xmlData>
<dv:links xmlns:dv="http://dfg-viewer.de/">
<dv:reference>http://www.stabikat.de/DB=1/PPN?PPN=718448162 </dv:reference>
<dv:presentation>http://digital.staatsbibliothek-berlin.de/dms/werkansicht/?PPN=PPN718448162</dv:presentation>
</dv:links>
</mets:xmlData>
</mets:mdWrap>
</mets:digiprovMD>
</mets:amdSec>
<mets:fileSec>
<mets:fileGrp USE="OCR-D-GT-PAGE">
<mets:file MIMETYPE="application/xml" ID="OCR-D-GT-PAGE_00000024">
<mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-GT-PAGE/00000024.page.xml"/>
</mets:file>
</mets:fileGrp>
<mets:fileGrp USE="OCR-D-OCR-CALAMARI">
<mets:file MIMETYPE="application/vnd.prima.page+xml" ID="OCR-D-OCR-CALAMARI_0001">
<mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-OCR-CALAMARI/OCR-D-OCR-CALAMARI_0001.xml"/>
</mets:file>
</mets:fileGrp>
<mets:fileGrp USE="OCR-D-OCR-TESS">
<mets:file MIMETYPE="application/vnd.prima.page+xml" ID="OCR-D-OCR-TESS_0001">
<mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-OCR-TESS/OCR-D-OCR-TESS_0001.xml"/>
</mets:file>
</mets:fileGrp>
</mets:fileSec>
<mets:structMap TYPE="LOGICAL">
<mets:div ADMID="AMD" CONTENTIDS="http://resolver.staatsbibliothek-berlin.de/SBB00008F1000000000" DMDID="DMDLOG_0000" ID="LOG_0000" LABEL="Acten-m&#228;&#223;iger Verlauff, Des Fameusen Processus sich verhaltende zwischen Herrn Hoff-Rath Era&#223;mus Senckenberg des Raths zu Franckfurt An einem und der Unschuldigen Catharina Agricola, am andern Theil puncto stupri violenti" ORDERLABEL="Acten-m&#228;&#223;iger Verlauff, Des Fameusen Processus sich verhaltende zwischen Herrn Hoff-Rath Era&#223;mus Senckenberg des Raths zu Franckfurt An einem und der Unschuldigen Catharina Agricola, am andern Theil puncto stupri violenti" TYPE="monograph">
<mets:div ID="LOG_0001" TYPE="binding">
<mets:div ID="LOG_0002" TYPE="cover_front"/>
</mets:div>
<mets:div ID="LOG_0003" TYPE="title_page"/>
<mets:div DMDID="DMDLOG_0001" ID="LOG_0004" LABEL="Ursachen so diesen Druck veranlasset" TYPE="section"/>
<mets:div DMDID="DMDLOG_0002" ID="LOG_0005" LABEL="Endlich Abgetrungene Rechtliche Interims-Defensions-Schrifft ..." TYPE="section"/>
<mets:div ID="LOG_0006" TYPE="binding">
<mets:div ID="LOG_0007" TYPE="cover_back"/>
</mets:div>
</mets:div>
</mets:structMap>
<mets:structMap TYPE="PHYSICAL">
<mets:div CONTENTIDS="http://resolver.staatsbibliothek-berlin.de/SBB00008F1000000000" DMDID="DMDPHYS_0000" ID="PHYS_0000" TYPE="physSequence">
<mets:div TYPE="page" ID="00000024">
<mets:fptr FILEID="OCR-D-GT-PAGE_00000024"/>
<mets:fptr FILEID="OCR-D-OCR-CALAMARI_0001"/>
<mets:fptr FILEID="OCR-D-OCR-TESS_0001"/>
</mets:div>
</mets:div>
</mets:structMap>
<mets:structLink>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0001" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0002" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0003" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0004" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0005" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0006" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0007" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0008" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0009" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0010" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0011" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0012" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0013" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0014" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0015" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0016" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0017" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0018" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0019" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0020" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0021" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0022" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0023" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0024" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0025" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0026" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0027" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0028" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0029" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0030" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0031" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0032" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0033" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0034" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0035" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0036" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0037" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0038" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0039" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0040" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0041" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0042" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0043" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0044" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0045" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0046" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0047" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0048" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0049" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0050" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0051" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0052" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0053" xlink:from="LOG_0000"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0001" xlink:from="LOG_0001"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0002" xlink:from="LOG_0001"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0003" xlink:from="LOG_0001"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0004" xlink:from="LOG_0001"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0001" xlink:from="LOG_0002"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0005" xlink:from="LOG_0003"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0006" xlink:from="LOG_0003"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0007" xlink:from="LOG_0004"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0008" xlink:from="LOG_0004"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0008" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0009" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0010" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0011" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0012" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0013" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0014" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0015" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0016" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0017" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0018" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0019" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0020" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0021" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0022" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0023" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0024" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0025" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0026" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0027" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0028" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0029" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0030" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0031" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0032" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0033" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0034" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0035" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0036" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0037" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0038" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0039" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0040" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0041" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0042" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0043" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0044" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0045" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0046" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0047" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0048" xlink:from="LOG_0005"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0049" xlink:from="LOG_0006"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0050" xlink:from="LOG_0006"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0051" xlink:from="LOG_0006"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0052" xlink:from="LOG_0006"/>
<mets:smLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:to="PHYS_0052" xlink:from="LOG_0007"/>
</mets:structLink>
</mets:mets>

@ -0,0 +1,289 @@
<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
<Metadata>
<Creator>OCR-D/core 1.0.0b11</Creator>
<Created>2019-08-01T15:03:17.741679</Created>
<LastChange>2019-08-01T15:03:17.741679</LastChange>
<MetadataItem type="processingStep" name="recognition/text-recognition" value="ocrd-tesserocr-recognize">
<Labels>
<Label value="frk" type="model"/>
<Label value="line" type="textequiv_level"/>
<Label value="False" type="overwrite_words"/>
</Labels>
</MetadataItem>
</Metadata>
<Page imageFilename="../OCR-D-IMG-BIN/OCR-D-IMG-BIN_0002" imageWidth="1386" imageHeight="2372">
<ReadingOrder>
<OrderedGroup id="reading-order">
<RegionRefIndexed index="0" regionRef="region0000"/>
<RegionRefIndexed index="1" regionRef="region0001"/>
<RegionRefIndexed index="2" regionRef="region0002"/>
<RegionRefIndexed index="3" regionRef="region0003"/>
<RegionRefIndexed index="4" regionRef="region0004"/>
<RegionRefIndexed index="5" regionRef="region0005"/>
<RegionRefIndexed index="6" regionRef="region0006"/>
</OrderedGroup>
</ReadingOrder>
<TextRegion id="region0000">
<Coords points="488,133 1197,133 1197,193 488,193"/>
<TextEquiv>
<Unicode></Unicode>
</TextEquiv>
</TextRegion>
<TextRegion id="region0001">
<Coords points="40,221 1198,221 1198,626 40,626"/>
<TextLine id="region0001_line0000">
<Coords points="40,221 1198,221 1198,281 40,281"/>
<TextEquiv conf="0.86">
<Unicode>Die ſcheinen uns bald kleine Hügel - bald Hütten x Zelten und bald</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0001_line0001">
<Coords points="768,290 879,290 879,325 768,325"/>
<TextEquiv conf="0.62">
<Unicode>„Bellen</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0001_line0002">
<Coords points="86,337 1174,337 1174,396 86,396"/>
<TextEquiv conf="0.8">
<Unicode>Den Blicken , welche ſie durchlaufen , von weiten öfters vorzuſtellen,</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0001_line0003">
<Coords points="88,397 841,397 841,455 88,455"/>
<TextEquiv conf="0.84">
<Unicode>Sieht man ein ſolch gemähtes Feld - von oben,</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0001_line0004">
<Coords points="87,455 1142,455 1142,510 87,510"/>
<TextEquiv conf="0.92">
<Unicode>Sy gleicht es einem weiten Meer - worauf erhabne Wellen kobeny</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0001_line0005">
<Coords points="87,510 1153,510 1153,570 87,570"/>
<TextEquiv conf="0.85">
<Unicode>Jedoch mit dieſem Unterſcheid - daß, da ſich die beſtändig rühren:</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0001_line0006">
<Coords points="88,569 1161,569 1161,626 88,626"/>
<TextEquiv conf="0.84">
<Unicode>Von einiger Bewegung hier - in dieſen Wellen ; nichts zu ſpähren,</Unicode>
</TextEquiv>
</TextLine>
<TextEquiv>
<Unicode>Die ſcheinen uns bald kleine Hügel - bald Hütten x Zelten und bald
„Bellen
Den Blicken , welche ſie durchlaufen , von weiten öfters vorzuſtellen,
Sieht man ein ſolch gemähtes Feld - von oben,
Sy gleicht es einem weiten Meer - worauf erhabne Wellen kobeny
Jedoch mit dieſem Unterſcheid - daß, da ſich die beſtändig rühren:
Von einiger Bewegung hier - in dieſen Wellen ; nichts zu ſpähren,</Unicode>
</TextEquiv>
</TextRegion>
<TextRegion id="region0002">
<Coords points="517,670 745,670 745,716 517,716"/>
<TextEquiv>
<Unicode></Unicode>
</TextEquiv>
</TextRegion>
<TextRegion id="region0003">
<Coords points="243,739 1124,739 1124,1094 243,1094"/>
<TextLine id="region0003_line0000">
<Coords points="243,739 884,739 884,795 243,795"/>
<TextEquiv conf="0.83">
<Unicode>Was erhebt des Schöpfers Güte</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0003_line0001">
<Coords points="244,792 972,792 972,859 244,859"/>
<TextEquiv conf="0.8">
<Unicode>Mehr , als dieſes Seegens Meer?</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0003_line0002">
<Coords points="243,855 931,855 931,913 243,913"/>
<TextEquiv conf="0.83">
<Unicode>Kommt dies wohl von ungefehv?</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0003_line0003">
<Coords points="244,914 918,914 918,971 244,971"/>
<TextEquiv conf="0.84">
<Unicode>Nein , rüſt mein erfreut Gemühte</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0003_line0004">
<Coords points="245,972 1059,972 1059,1034 245,1034"/>
<TextEquiv conf="0.86">
<Unicode>Nur von GOTT komint alles hers</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0003_line0005">
<Coords points="247,1029 1124,1029 1124,1094 247,1094"/>
<TextEquiv conf="0.74">
<Unicode>Ihm ſey Preiß und Dan und Ehr!</Unicode>
</TextEquiv>
</TextLine>
<TextEquiv>
<Unicode>Was erhebt des Schöpfers Güte
Mehr , als dieſes Seegens Meer?
Kommt dies wohl von ungefehv?
Nein , rüſt mein erfreut Gemühte
Nur von GOTT komint alles hers
Ihm ſey Preiß und Dan und Ehr!</Unicode>
</TextEquiv>
</TextRegion>
<TextRegion id="region0004">
<Coords points="1043,1096 1204,1096 1204,1136 1043,1136"/>
<TextLine id="region0004_line0000">
<Coords points="1043,1096 1204,1096 1204,1136 1043,1136"/>
<TextEquiv conf="0.8">
<Unicode>Da Capo,</Unicode>
</TextEquiv>
</TextLine>
<TextEquiv>
<Unicode>Da Capo,</Unicode>
</TextEquiv>
</TextRegion>
<TextRegion id="region0005">
<Coords points="68,1183 1236,1183 1236,2056 68,2056"/>
<TextLine id="region0005_line0000">
<Coords points="91,1183 1170,1183 1170,1235 91,1235"/>
<TextEquiv conf="0.65">
<Unicode>Geht man auf einen ſolhen Felde, ſo eben erſi gemäht - ſpaßtiereny</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0001">
<Coords points="89,1236 1182,1236 1182,1289 89,1289"/>
<TextEquiv conf="0.73">
<Unicode>Das man gewohnt voll Korn zu ſehn; ſo kommen wir uns gröſſer für,</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0002">
<Coords points="89,1294 1208,1294 1208,1346 89,1346"/>
<TextEquiv conf="0.85">
<Unicode>Das Feld hingegen niedriger. Auch nimmt ſodean ein neuer Scheinz</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0003">
<Coords points="90,1351 519,1351 519,1399 90,1399"/>
<TextEquiv conf="0.92">
<Unicode>Und eine neue Farben Zier</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0004">
<Coords points="91,1405 561,1405 561,1457 91,1457"/>
<TextEquiv conf="0.91">
<Unicode>Den erſt gemähten Aker ein,</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0005">
<Coords points="92,1459 1208,1459 1208,1510 92,1510"/>
<TextEquiv conf="0.88">
<Unicode>Der Grund iſt grün - die Stoppeln gelb und wenn fich unjrer Son-</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0006">
<Coords points="782,1514 1007,1514 1007,1555 782,1555"/>
<TextEquiv conf="0.46">
<Unicode>nen B;Of</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0007">
<Coords points="68,1562 1177,1562 1177,1617 68,1617"/>
<TextEquiv conf="0.82">
<Unicode>Un ihre runde glatte Röhren , zumahlen früh und Abends bricht;</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0008">
<Coords points="90,1618 1236,1618 1236,1670 90,1670"/>
<TextEquiv conf="0.79">
<Unicode>So kann ein Gold kaum ſtärcker glänßen.- Dies macht ein liebliches</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0009">
<Coords points="777,1671 1159,1671 1159,1716 777,1716"/>
<TextEquiv conf="0.76">
<Unicode>Gemiſche, |</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0010">
<Coords points="92,1722 1211,1722 1211,1783 92,1783"/>
<TextEquiv conf="0.7">
<Unicode>Zutnahl wenn , in der Nachbarſchaft - ein dumfel-grünendes Gebüſche</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0011">
<Coords points="91,1779 1210,1779 1210,1837 91,1837"/>
<TextEquiv conf="0.84">
<Unicode>Den gelben Schimmer noch erhöht. Wir ich nun jüngſt, zur Abend Zeif,</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0012">
<Coords points="93,1837 1210,1837 1210,1895 93,1895"/>
<TextEquiv conf="0.84">
<Unicode>Durch ſo viel ſhwere Scegens-Berge, mit ſanften Schritten, hin und</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0013">
<Coords points="800,1896 914,1896 914,1936 800,1936"/>
<TextEquiv conf="0.52">
<Unicode>Wieder;</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0014">
<Coords points="92,1943 1212,1943 1212,2001 92,2001"/>
<TextEquiv conf="0.74">
<Unicode>Gepühret durch des Feldes Schmu, gerühret durc&lt; die Fruchtbarkeitz</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0005_line0015">
<Coords points="90,1998 1125,1998 1125,2056 90,2056"/>
<TextEquiv conf="0.76">
<Unicode>Vergmigt auf meinem Acker gieng - ertönten dieſe meine Lieder:</Unicode>
</TextEquiv>
</TextLine>
<TextEquiv>
<Unicode>Geht man auf einen ſolhen Felde, ſo eben erſi gemäht - ſpaßtiereny
Das man gewohnt voll Korn zu ſehn; ſo kommen wir uns gröſſer für,
Das Feld hingegen niedriger. Auch nimmt ſodean ein neuer Scheinz
Und eine neue Farben Zier
Den erſt gemähten Aker ein,
Der Grund iſt grün - die Stoppeln gelb und wenn fich unjrer Son-
nen B;Of
Un ihre runde glatte Röhren , zumahlen früh und Abends bricht;
So kann ein Gold kaum ſtärcker glänßen.- Dies macht ein liebliches
Gemiſche, |
Zutnahl wenn , in der Nachbarſchaft - ein dumfel-grünendes Gebüſche
Den gelben Schimmer noch erhöht. Wir ich nun jüngſt, zur Abend Zeif,
Durch ſo viel ſhwere Scegens-Berge, mit ſanften Schritten, hin und
Wieder;
Gepühret durch des Feldes Schmu, gerühret durc&lt; die Fruchtbarkeitz
Vergmigt auf meinem Acker gieng - ertönten dieſe meine Lieder:</Unicode>
</TextEquiv>
</TextRegion>
<TextRegion id="region0006">
<Coords points="688,2060 1216,2060 1216,2120 688,2120"/>
<TextLine id="region0006_line0000">
<Coords points="688,2069 787,2069 787,2120 688,2120"/>
<TextEquiv conf="0.74">
<Unicode>5) 2</Unicode>
</TextEquiv>
</TextLine>
<TextLine id="region0006_line0001">
<Coords points="1044,2060 1216,2060 1216,2105 1044,2105"/>
<TextEquiv conf="0.89">
<Unicode>ARIA.</Unicode>
</TextEquiv>
</TextLine>
<TextEquiv>
<Unicode>5) 2
ARIA.</Unicode>
</TextEquiv>
</TextRegion>
</Page>
</PcGts>

@ -0,0 +1,47 @@
<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15/pagecontent.xsd">
<Metadata>
<Creator></Creator>
<Created>2019-07-26T13:59:00</Created>
<LastChange>2019-07-26T14:00:29</LastChange></Metadata>
<Page imageFilename="lorem-ipsum-scan.tif" imageXResolution="300.00000" imageYResolution="300.00000" imageWidth="2481" imageHeight="3508">
<TextRegion id="tempReg357564684568544579089">
<Coords points="0,0 1,0 1,1 0,1"/>
<TextLine id="l0">
<Coords points="228,237 228,295 2216,295 2216,237"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l1">
<Coords points="228,298 228,348 2160,348 2160,298"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l2">
<Coords points="225,348 225,410 2178,410 2178,348"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l3">
<Coords points="218,413 218,463 2153,463 2153,413"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l4">
<Coords points="225,466 225,522 2153,522 2153,466"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l5">
<Coords points="216,524 216,581 2187,581 2187,524"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l6">
<Coords points="219,584 219,640 542,640 542,584"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine></TextRegion>
<TextRegion id="r7" type="paragraph">
<Coords points="204,212 204,651 2227,651 2227,212"/>
<TextEquiv>
<Unicode>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo
dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor
invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et
justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum
dolor sit amet.</Unicode></TextEquiv></TextRegion></Page></PcGts>

@ -0,0 +1,139 @@
<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v3# http://www.loc.gov/alto/v3/alto-3-0.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName> </fileName>
</sourceImageInformation>
<OCRProcessing ID="OCR_0">
<ocrProcessingStep>
<processingSoftware>
<softwareName>tesseract 4.1.0-rc4</softwareName>
</processingSoftware>
</ocrProcessingStep>
</OCRProcessing>
</Description>
<Layout>
<Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0">
<PrintSpace HPOS="0" VPOS="0" WIDTH="2481" HEIGHT="3508">
<TextBlock ID="block_0" HPOS="209" VPOS="258" WIDTH="1954" HEIGHT="437">
<TextLine ID="line_0" HPOS="209" VPOS="258" WIDTH="1954" HEIGHT="103">
<String ID="string_0" HPOS="209" VPOS="319" WIDTH="134" HEIGHT="34" WC="0.96" CONTENT="Lorem"/><SP WIDTH="13" VPOS="319" HPOS="343"/>
<String ID="string_1" HPOS="356" VPOS="316" WIDTH="121" HEIGHT="45" WC="0.96" CONTENT="ipsum"/><SP WIDTH="14" VPOS="316" HPOS="477"/>
<String ID="string_2" HPOS="491" VPOS="312" WIDTH="102" HEIGHT="36" WC="0.96" CONTENT="dolor"/><SP WIDTH="15" VPOS="312" HPOS="593"/>
<String ID="string_3" HPOS="608" VPOS="309" WIDTH="46" HEIGHT="35" WC="0.96" CONTENT="sit"/><SP WIDTH="14" VPOS="309" HPOS="654"/>
<String ID="string_4" HPOS="668" VPOS="311" WIDTH="106" HEIGHT="37" WC="0.96" CONTENT="amet,"/><SP WIDTH="16" VPOS="311" HPOS="774"/>
<String ID="string_5" HPOS="790" VPOS="307" WIDTH="201" HEIGHT="32" WC="0.88" CONTENT="consetetur"/><SP WIDTH="14" VPOS="307" HPOS="991"/>
<String ID="string_6" HPOS="1005" VPOS="297" WIDTH="205" HEIGHT="46" WC="0.96" CONTENT="sadipscing"/><SP WIDTH="15" VPOS="297" HPOS="1210"/>
<String ID="string_7" HPOS="1225" VPOS="293" WIDTH="84" HEIGHT="42" WC="0.91" CONTENT="elitr,"/><SP WIDTH="16" VPOS="293" HPOS="1309"/>
<String ID="string_8" HPOS="1325" VPOS="289" WIDTH="65" HEIGHT="38" WC="0.96" CONTENT="sed"/><SP WIDTH="14" VPOS="289" HPOS="1390"/>
<String ID="string_9" HPOS="1404" VPOS="286" WIDTH="97" HEIGHT="36" WC="0.93" CONTENT="diam"/><SP WIDTH="14" VPOS="286" HPOS="1501"/>
<String ID="string_10" HPOS="1515" VPOS="291" WIDTH="100" HEIGHT="24" WC="0.69" CONTENT="nonu"/><SP WIDTH="32" VPOS="291" HPOS="1615"/>
<String ID="string_11" HPOS="1647" VPOS="285" WIDTH="30" HEIGHT="36" WC="0.37" CONTENT="yy"/><SP WIDTH="17" VPOS="285" HPOS="1677"/>
<String ID="string_12" HPOS="1694" VPOS="268" WIDTH="140" HEIGHT="42" WC="0.93" CONTENT="eirmod"/><SP WIDTH="11" VPOS="268" HPOS="1834"/>
<String ID="string_13" HPOS="1845" VPOS="273" WIDTH="139" HEIGHT="37" WC="0.96" CONTENT="tempor"/><SP WIDTH="15" VPOS="273" HPOS="1984"/>
<String ID="string_14" HPOS="1999" VPOS="258" WIDTH="164" HEIGHT="38" WC="0.95" CONTENT="invidunt"/>
</TextLine>
<TextLine ID="line_1" HPOS="211" VPOS="315" WIDTH="1904" HEIGHT="102">
<String ID="string_15" HPOS="211" VPOS="380" WIDTH="39" HEIGHT="31" WC="0.96" CONTENT="ut"/><SP WIDTH="13" VPOS="380" HPOS="250"/>
<String ID="string_16" HPOS="263" VPOS="373" WIDTH="123" HEIGHT="44" WC="0.96" CONTENT="labore"/><SP WIDTH="16" VPOS="373" HPOS="386"/>
<String ID="string_17" HPOS="402" VPOS="379" WIDTH="33" HEIGHT="27" WC="0.95" CONTENT="et"/><SP WIDTH="14" VPOS="379" HPOS="435"/>
<String ID="string_18" HPOS="449" VPOS="370" WIDTH="123" HEIGHT="36" WC="0.95" CONTENT="dolore"/><SP WIDTH="15" VPOS="370" HPOS="572"/>
<String ID="string_19" HPOS="587" VPOS="374" WIDTH="133" HEIGHT="37" WC="0.96" CONTENT="magna"/><SP WIDTH="14" VPOS="374" HPOS="720"/>
<String ID="string_20" HPOS="734" VPOS="363" WIDTH="183" HEIGHT="43" WC="0.96" CONTENT="aliquyam"/><SP WIDTH="14" VPOS="363" HPOS="917"/>
<String ID="string_21" HPOS="931" VPOS="360" WIDTH="82" HEIGHT="36" WC="0.95" CONTENT="erat,"/><SP WIDTH="17" VPOS="360" HPOS="1013"/>
<String ID="string_22" HPOS="1030" VPOS="354" WIDTH="65" HEIGHT="35" WC="0.96" CONTENT="sed"/><SP WIDTH="13" VPOS="354" HPOS="1095"/>
<String ID="string_23" HPOS="1108" VPOS="352" WIDTH="96" HEIGHT="36" WC="0.96" CONTENT="diam"/><SP WIDTH="13" VPOS="352" HPOS="1204"/>
<String ID="string_24" HPOS="1217" VPOS="350" WIDTH="181" HEIGHT="44" WC="0.95" CONTENT="voluptua."/><SP WIDTH="13" VPOS="350" HPOS="1398"/>
<String ID="string_25" HPOS="1411" VPOS="345" WIDTH="49" HEIGHT="34" WC="0.95" CONTENT="At"/><SP WIDTH="11" VPOS="345" HPOS="1460"/>
<String ID="string_26" HPOS="1471" VPOS="348" WIDTH="88" HEIGHT="26" WC="0.93" CONTENT="Vero"/><SP WIDTH="16" VPOS="348" HPOS="1559"/>
<String ID="string_27" HPOS="1575" VPOS="345" WIDTH="65" HEIGHT="26" WC="0.96" CONTENT="eos"/><SP WIDTH="15" VPOS="345" HPOS="1640"/>
<String ID="string_28" HPOS="1655" VPOS="339" WIDTH="36" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="14" VPOS="339" HPOS="1691"/>
<String ID="string_29" HPOS="1705" VPOS="336" WIDTH="168" HEIGHT="31" WC="0.87" CONTENT="accusam"/><SP WIDTH="15" VPOS="336" HPOS="1873"/>
<String ID="string_30" HPOS="1888" VPOS="329" WIDTH="34" HEIGHT="28" WC="0.96" CONTENT="et"/><SP WIDTH="11" VPOS="329" HPOS="1922"/>
<String ID="string_31" HPOS="1933" VPOS="322" WIDTH="96" HEIGHT="44" WC="0.96" CONTENT="justo"/><SP WIDTH="15" VPOS="322" HPOS="2029"/>
<String ID="string_32" HPOS="2044" VPOS="315" WIDTH="71" HEIGHT="63" WC="0.96" CONTENT="duo"/>
</TextLine>
<TextLine ID="line_2" HPOS="214" VPOS="375" WIDTH="1919" HEIGHT="93">
<String ID="string_33" HPOS="214" VPOS="431" WIDTH="144" HEIGHT="37" WC="0.96" CONTENT="dolores"/><SP WIDTH="16" VPOS="431" HPOS="358"/>
<String ID="string_34" HPOS="374" VPOS="433" WIDTH="34" HEIGHT="31" WC="0.96" CONTENT="et"/><SP WIDTH="14" VPOS="433" HPOS="408"/>
<String ID="string_35" HPOS="422" VPOS="437" WIDTH="42" HEIGHT="25" WC="0.96" CONTENT="ea"/><SP WIDTH="13" VPOS="437" HPOS="464"/>
<String ID="string_36" HPOS="477" VPOS="426" WIDTH="136" HEIGHT="35" WC="0.96" CONTENT="rebum."/><SP WIDTH="18" VPOS="426" HPOS="613"/>
<String ID="string_37" HPOS="631" VPOS="424" WIDTH="75" HEIGHT="34" WC="0.96" CONTENT="Stet"/><SP WIDTH="14" VPOS="424" HPOS="706"/>
<String ID="string_38" HPOS="720" VPOS="419" WIDTH="85" HEIGHT="36" WC="0.96" CONTENT="clita"/><SP WIDTH="13" VPOS="419" HPOS="805"/>
<String ID="string_39" HPOS="818" VPOS="415" WIDTH="90" HEIGHT="35" WC="0.97" CONTENT="kasd"/><SP WIDTH="14" VPOS="415" HPOS="908"/>
<String ID="string_40" HPOS="922" VPOS="412" WIDTH="206" HEIGHT="48" WC="0.96" CONTENT="gubergren,"/><SP WIDTH="16" VPOS="412" HPOS="1128"/>
<String ID="string_41" HPOS="1144" VPOS="417" WIDTH="47" HEIGHT="26" WC="0.97" CONTENT="no"/><SP WIDTH="16" VPOS="417" HPOS="1191"/>
<String ID="string_42" HPOS="1207" VPOS="415" WIDTH="61" HEIGHT="25" WC="0.96" CONTENT="sea"/><SP WIDTH="13" VPOS="415" HPOS="1268"/>
<String ID="string_43" HPOS="1281" VPOS="405" WIDTH="169" HEIGHT="36" WC="0.91" CONTENT="iakimata"/><SP WIDTH="14" VPOS="405" HPOS="1450"/>
<String ID="string_44" HPOS="1464" VPOS="400" WIDTH="144" HEIGHT="33" WC="0.96" CONTENT="sanctus"/><SP WIDTH="16" VPOS="400" HPOS="1608"/>
<String ID="string_45" HPOS="1624" VPOS="397" WIDTH="54" HEIGHT="29" WC="0.97" CONTENT="est"/><SP WIDTH="13" VPOS="397" HPOS="1678"/>
<String ID="string_46" HPOS="1691" VPOS="390" WIDTH="132" HEIGHT="34" WC="0.96" CONTENT="Lorem"/><SP WIDTH="14" VPOS="390" HPOS="1823"/>
<String ID="string_47" HPOS="1837" VPOS="383" WIDTH="120" HEIGHT="44" WC="0.96" CONTENT="ipsum"/><SP WIDTH="14" VPOS="383" HPOS="1957"/>
<String ID="string_48" HPOS="1971" VPOS="375" WIDTH="102" HEIGHT="37" WC="0.96" CONTENT="dolor"/><SP WIDTH="15" VPOS="375" HPOS="2073"/>
<String ID="string_49" HPOS="2088" VPOS="377" WIDTH="45" HEIGHT="31" WC="0.96" CONTENT="sit"/>
</TextLine>
<TextLine ID="line_3" HPOS="215" VPOS="435" WIDTH="1896" HEIGHT="93">
<String ID="string_50" HPOS="215" VPOS="494" WIDTH="106" HEIGHT="32" WC="0.96" CONTENT="amet."/><SP WIDTH="16" VPOS="494" HPOS="321"/>
<String ID="string_51" HPOS="337" VPOS="488" WIDTH="130" HEIGHT="33" WC="0.96" CONTENT="Lorem"/><SP WIDTH="14" VPOS="488" HPOS="467"/>
<String ID="string_52" HPOS="481" VPOS="484" WIDTH="121" HEIGHT="44" WC="0.96" CONTENT="ipsum"/><SP WIDTH="14" VPOS="484" HPOS="602"/>
<String ID="string_53" HPOS="616" VPOS="479" WIDTH="104" HEIGHT="37" WC="0.96" CONTENT="dolor"/><SP WIDTH="14" VPOS="479" HPOS="720"/>
<String ID="string_54" HPOS="734" VPOS="476" WIDTH="46" HEIGHT="36" WC="0.93" CONTENT="sit"/><SP WIDTH="14" VPOS="476" HPOS="780"/>
<String ID="string_55" HPOS="794" VPOS="477" WIDTH="104" HEIGHT="36" WC="0.75" CONTENT="armet,"/><SP WIDTH="17" VPOS="477" HPOS="898"/>
<String ID="string_56" HPOS="915" VPOS="474" WIDTH="200" HEIGHT="30" WC="0.97" CONTENT="consetetur"/><SP WIDTH="14" VPOS="474" HPOS="1115"/>
<String ID="string_57" HPOS="1129" VPOS="463" WIDTH="205" HEIGHT="45" WC="0.96" CONTENT="sadipscing"/><SP WIDTH="15" VPOS="463" HPOS="1334"/>
<String ID="string_58" HPOS="1349" VPOS="457" WIDTH="86" HEIGHT="41" WC="0.96" CONTENT="elitr,"/><SP WIDTH="16" VPOS="457" HPOS="1435"/>
<String ID="string_59" HPOS="1451" VPOS="452" WIDTH="65" HEIGHT="39" WC="0.96" CONTENT="sed"/><SP WIDTH="14" VPOS="452" HPOS="1516"/>
<String ID="string_60" HPOS="1530" VPOS="449" WIDTH="99" HEIGHT="36" WC="0.93" CONTENT="diam"/><SP WIDTH="14" VPOS="449" HPOS="1629"/>
<String ID="string_61" HPOS="1643" VPOS="451" WIDTH="162" HEIGHT="36" WC="0.59" CONTENT="nonurny"/><SP WIDTH="16" VPOS="451" HPOS="1805"/>
<String ID="string_62" HPOS="1821" VPOS="435" WIDTH="138" HEIGHT="39" WC="0.96" CONTENT="eirmod"/><SP WIDTH="12" VPOS="435" HPOS="1959"/>
<String ID="string_63" HPOS="1971" VPOS="440" WIDTH="140" HEIGHT="37" WC="0.96" CONTENT="tempor"/>
</TextLine>
<TextLine ID="line_4" HPOS="216" VPOS="483" WIDTH="1888" HEIGHT="97">
<String ID="string_64" HPOS="216" VPOS="543" WIDTH="165" HEIGHT="37" WC="0.97" CONTENT="invidunt"/><SP WIDTH="13" VPOS="543" HPOS="381"/>
<String ID="string_65" HPOS="394" VPOS="546" WIDTH="39" HEIGHT="30" WC="0.97" CONTENT="ut"/><SP WIDTH="12" VPOS="546" HPOS="433"/>
<String ID="string_66" HPOS="445" VPOS="539" WIDTH="122" HEIGHT="36" WC="0.96" CONTENT="labore"/><SP WIDTH="16" VPOS="539" HPOS="567"/>
<String ID="string_67" HPOS="583" VPOS="543" WIDTH="35" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="14" VPOS="543" HPOS="618"/>
<String ID="string_68" HPOS="632" VPOS="536" WIDTH="125" HEIGHT="34" WC="0.96" CONTENT="dolore"/><SP WIDTH="14" VPOS="536" HPOS="757"/>
<String ID="string_69" HPOS="771" VPOS="539" WIDTH="131" HEIGHT="37" WC="0.46" CONTENT="magna"/><SP WIDTH="14" VPOS="539" HPOS="902"/>
<String ID="string_70" HPOS="916" VPOS="526" WIDTH="182" HEIGHT="45" WC="0.96" CONTENT="aliquyam"/><SP WIDTH="14" VPOS="526" HPOS="1098"/>
<String ID="string_71" HPOS="1112" VPOS="527" WIDTH="82" HEIGHT="37" WC="0.96" CONTENT="erat,"/><SP WIDTH="17" VPOS="527" HPOS="1194"/>
<String ID="string_72" HPOS="1211" VPOS="519" WIDTH="63" HEIGHT="36" WC="0.97" CONTENT="sed"/><SP WIDTH="14" VPOS="519" HPOS="1274"/>
<String ID="string_73" HPOS="1288" VPOS="517" WIDTH="97" HEIGHT="37" WC="0.96" CONTENT="diam"/><SP WIDTH="11" VPOS="517" HPOS="1385"/>
<String ID="string_74" HPOS="1396" VPOS="513" WIDTH="185" HEIGHT="44" WC="0.96" CONTENT="voluptua."/><SP WIDTH="14" VPOS="513" HPOS="1581"/>
<String ID="string_75" HPOS="1595" VPOS="505" WIDTH="50" HEIGHT="35" WC="0.96" CONTENT="At"/><SP WIDTH="11" VPOS="505" HPOS="1645"/>
<String ID="string_76" HPOS="1656" VPOS="511" WIDTH="89" HEIGHT="27" WC="0.96" CONTENT="vero"/><SP WIDTH="16" VPOS="511" HPOS="1745"/>
<String ID="string_77" HPOS="1761" VPOS="508" WIDTH="63" HEIGHT="26" WC="0.96" CONTENT="eos"/><SP WIDTH="15" VPOS="508" HPOS="1824"/>
<String ID="string_78" HPOS="1839" VPOS="501" WIDTH="35" HEIGHT="30" WC="0.97" CONTENT="et"/><SP WIDTH="13" VPOS="501" HPOS="1874"/>
<String ID="string_79" HPOS="1887" VPOS="499" WIDTH="168" HEIGHT="53" WC="0.80" CONTENT="accusam"/><SP WIDTH="-3" VPOS="499" HPOS="2055"/>
<String ID="string_80" HPOS="2052" VPOS="483" WIDTH="52" HEIGHT="55" WC="0.97" CONTENT="et"/>
</TextLine>
<TextLine ID="line_5" HPOS="215" VPOS="552" WIDTH="1941" HEIGHT="97">
<String ID="string_81" HPOS="215" VPOS="604" WIDTH="97" HEIGHT="45" WC="0.97" CONTENT="justo"/><SP WIDTH="16" VPOS="604" HPOS="312"/>
<String ID="string_82" HPOS="328" VPOS="600" WIDTH="71" HEIGHT="35" WC="0.97" CONTENT="duo"/><SP WIDTH="16" VPOS="600" HPOS="399"/>
<String ID="string_83" HPOS="415" VPOS="597" WIDTH="143" HEIGHT="36" WC="0.93" CONTENT="dolores"/><SP WIDTH="16" VPOS="597" HPOS="558"/>
<String ID="string_84" HPOS="574" VPOS="600" WIDTH="34" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="14" VPOS="600" HPOS="608"/>
<String ID="string_85" HPOS="622" VPOS="602" WIDTH="43" HEIGHT="26" WC="0.96" CONTENT="ea"/><SP WIDTH="13" VPOS="602" HPOS="665"/>
<String ID="string_86" HPOS="678" VPOS="590" WIDTH="136" HEIGHT="36" WC="0.96" CONTENT="rebum."/><SP WIDTH="19" VPOS="590" HPOS="814"/>
<String ID="string_87" HPOS="833" VPOS="588" WIDTH="74" HEIGHT="34" WC="0.96" CONTENT="Stet"/><SP WIDTH="14" VPOS="588" HPOS="907"/>
<String ID="string_88" HPOS="921" VPOS="584" WIDTH="83" HEIGHT="36" WC="0.96" CONTENT="clita"/><SP WIDTH="12" VPOS="584" HPOS="1004"/>
<String ID="string_89" HPOS="1016" VPOS="580" WIDTH="90" HEIGHT="36" WC="0.97" CONTENT="kasd"/><SP WIDTH="15" VPOS="580" HPOS="1106"/>
<String ID="string_90" HPOS="1121" VPOS="578" WIDTH="205" HEIGHT="47" WC="0.96" CONTENT="gubergren,"/><SP WIDTH="16" VPOS="578" HPOS="1326"/>
<String ID="string_91" HPOS="1342" VPOS="582" WIDTH="47" HEIGHT="25" WC="0.96" CONTENT="no"/><SP WIDTH="16" VPOS="582" HPOS="1389"/>
<String ID="string_92" HPOS="1405" VPOS="581" WIDTH="62" HEIGHT="26" WC="0.97" CONTENT="sea"/><SP WIDTH="13" VPOS="581" HPOS="1467"/>
<String ID="string_93" HPOS="1480" VPOS="566" WIDTH="172" HEIGHT="38" WC="0.96" CONTENT="takimata"/><SP WIDTH="14" VPOS="566" HPOS="1652"/>
<String ID="string_94" HPOS="1666" VPOS="563" WIDTH="145" HEIGHT="33" WC="0.97" CONTENT="sanctus"/><SP WIDTH="15" VPOS="563" HPOS="1811"/>
<String ID="string_95" HPOS="1826" VPOS="558" WIDTH="54" HEIGHT="30" WC="0.97" CONTENT="est"/><SP WIDTH="12" VPOS="558" HPOS="1880"/>
<String ID="string_96" HPOS="1892" VPOS="552" WIDTH="130" HEIGHT="34" WC="0.96" CONTENT="Lorem"/><SP WIDTH="15" VPOS="552" HPOS="2022"/>
<String ID="string_97" HPOS="2037" VPOS="553" WIDTH="119" HEIGHT="37" WC="0.51" CONTENT="Ipsum"/>
</TextLine>
<TextLine ID="line_6" HPOS="219" VPOS="657" WIDTH="282" HEIGHT="38">
<String ID="string_98" HPOS="219" VPOS="658" WIDTH="104" HEIGHT="37" WC="0.97" CONTENT="dolor"/><SP WIDTH="15" VPOS="658" HPOS="323"/>
<String ID="string_99" HPOS="338" VPOS="657" WIDTH="45" HEIGHT="35" WC="0.97" CONTENT="sit"/><SP WIDTH="14" VPOS="657" HPOS="383"/>
<String ID="string_100" HPOS="397" VPOS="660" WIDTH="104" HEIGHT="35" WC="0.94" CONTENT="amet."/>
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>

@ -0,0 +1,47 @@
<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15/pagecontent.xsd">
<Metadata>
<Creator></Creator>
<Created>2019-07-26T13:59:00</Created>
<LastChange>2019-07-26T14:00:29</LastChange></Metadata>
<Page imageFilename="lorem-ipsum-scan.tif" imageXResolution="300.00000" imageYResolution="300.00000" imageWidth="2481" imageHeight="3508">
<TextRegion id="tempReg357564684568544579089">
<Coords points="0,0 1,0 1,1 0,1"/>
<TextLine id="l0">
<Coords points="228,237 228,295 2216,295 2216,237"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l1">
<Coords points="228,298 228,348 2160,348 2160,298"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l2">
<Coords points="225,348 225,410 2178,410 2178,348"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l3">
<Coords points="218,413 218,463 2153,463 2153,413"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l4">
<Coords points="225,466 225,522 2153,522 2153,466"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l5">
<Coords points="216,524 216,581 2187,581 2187,524"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine>
<TextLine id="l6">
<Coords points="219,584 219,640 542,640 542,584"/>
<TextEquiv>
<Unicode></Unicode></TextEquiv></TextLine></TextRegion>
<TextRegion id="r7" type="paragraph">
<Coords points="204,212 204,651 2227,651 2227,212"/>
<TextEquiv>
<Unicode>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo
dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor
invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et
justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum
dolor sit amet.</Unicode></TextEquiv></TextRegion></Page></PcGts>

@ -0,0 +1,138 @@
<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v3# http://www.loc.gov/alto/v3/alto-3-0.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName> </fileName>
</sourceImageInformation>
<OCRProcessing ID="OCR_0">
<ocrProcessingStep>
<processingSoftware>
<softwareName>tesseract 4.1.0-rc4</softwareName>
</processingSoftware>
</ocrProcessingStep>
</OCRProcessing>
</Description>
<Layout>
<Page WIDTH="2481" HEIGHT="3508" PHYSICAL_IMG_NR="0" ID="page_0">
<PrintSpace HPOS="0" VPOS="0" WIDTH="2481" HEIGHT="3508">
<TextBlock ID="block_0" HPOS="234" VPOS="244" WIDTH="1966" HEIGHT="387">
<TextLine ID="line_0" HPOS="237" VPOS="244" WIDTH="1963" HEIGHT="48">
<String ID="string_0" HPOS="237" VPOS="248" WIDTH="133" HEIGHT="34" WC="0.96" CONTENT="Lorem"/><SP WIDTH="14" VPOS="248" HPOS="370"/>
<String ID="string_1" HPOS="384" VPOS="247" WIDTH="120" HEIGHT="45" WC="0.96" CONTENT="ipsum"/><SP WIDTH="15" VPOS="247" HPOS="504"/>
<String ID="string_2" HPOS="519" VPOS="246" WIDTH="103" HEIGHT="36" WC="0.96" CONTENT="dolor"/><SP WIDTH="14" VPOS="246" HPOS="622"/>
<String ID="string_3" HPOS="636" VPOS="247" WIDTH="46" HEIGHT="35" WC="0.96" CONTENT="sit"/><SP WIDTH="14" VPOS="247" HPOS="682"/>
<String ID="string_4" HPOS="696" VPOS="252" WIDTH="105" HEIGHT="36" WC="0.97" CONTENT="amet,"/><SP WIDTH="17" VPOS="252" HPOS="801"/>
<String ID="string_5" HPOS="818" VPOS="251" WIDTH="202" HEIGHT="30" WC="0.96" CONTENT="consetetur"/><SP WIDTH="14" VPOS="251" HPOS="1020"/>
<String ID="string_6" HPOS="1034" VPOS="244" WIDTH="207" HEIGHT="46" WC="0.96" CONTENT="sadipscing"/><SP WIDTH="15" VPOS="244" HPOS="1241"/>
<String ID="string_7" HPOS="1256" VPOS="244" WIDTH="86" HEIGHT="43" WC="0.96" CONTENT="elitr,"/><SP WIDTH="16" VPOS="244" HPOS="1342"/>
<String ID="string_8" HPOS="1358" VPOS="244" WIDTH="65" HEIGHT="36" WC="0.96" CONTENT="sed"/><SP WIDTH="15" VPOS="244" HPOS="1423"/>
<String ID="string_9" HPOS="1438" VPOS="244" WIDTH="99" HEIGHT="36" WC="0.96" CONTENT="diam"/><SP WIDTH="14" VPOS="244" HPOS="1537"/>
<String ID="string_10" HPOS="1551" VPOS="255" WIDTH="164" HEIGHT="35" WC="0.97" CONTENT="nonumy"/><SP WIDTH="15" VPOS="255" HPOS="1715"/>
<String ID="string_11" HPOS="1730" VPOS="244" WIDTH="139" HEIGHT="36" WC="0.96" CONTENT="eirmod"/><SP WIDTH="13" VPOS="244" HPOS="1869"/>
<String ID="string_12" HPOS="1882" VPOS="250" WIDTH="140" HEIGHT="40" WC="0.96" CONTENT="tempor"/><SP WIDTH="13" VPOS="250" HPOS="2022"/>
<String ID="string_13" HPOS="2035" VPOS="244" WIDTH="165" HEIGHT="35" WC="0.96" CONTENT="invidunt"/>
</TextLine>
<TextLine ID="line_1" HPOS="237" VPOS="301" WIDTH="1913" HEIGHT="49">
<String ID="string_14" HPOS="237" VPOS="310" WIDTH="39" HEIGHT="29" WC="0.96" CONTENT="ut"/><SP WIDTH="13" VPOS="310" HPOS="276"/>
<String ID="string_15" HPOS="289" VPOS="304" WIDTH="123" HEIGHT="44" WC="0.96" CONTENT="labore"/><SP WIDTH="16" VPOS="304" HPOS="412"/>
<String ID="string_16" HPOS="428" VPOS="310" WIDTH="34" HEIGHT="29" WC="0.97" CONTENT="et"/><SP WIDTH="14" VPOS="310" HPOS="462"/>
<String ID="string_17" HPOS="476" VPOS="304" WIDTH="123" HEIGHT="36" WC="0.96" CONTENT="dolore"/><SP WIDTH="15" VPOS="304" HPOS="599"/>
<String ID="string_18" HPOS="614" VPOS="313" WIDTH="133" HEIGHT="37" WC="0.96" CONTENT="magna"/><SP WIDTH="14" VPOS="313" HPOS="747"/>
<String ID="string_19" HPOS="761" VPOS="302" WIDTH="183" HEIGHT="46" WC="0.96" CONTENT="aliquyam"/><SP WIDTH="15" VPOS="302" HPOS="944"/>
<String ID="string_20" HPOS="959" VPOS="308" WIDTH="81" HEIGHT="36" WC="0.96" CONTENT="erat,"/><SP WIDTH="17" VPOS="308" HPOS="1040"/>
<String ID="string_21" HPOS="1057" VPOS="301" WIDTH="65" HEIGHT="36" WC="0.96" CONTENT="sed"/><SP WIDTH="14" VPOS="301" HPOS="1122"/>
<String ID="string_22" HPOS="1136" VPOS="301" WIDTH="97" HEIGHT="36" WC="0.95" CONTENT="diam"/><SP WIDTH="13" VPOS="301" HPOS="1233"/>
<String ID="string_23" HPOS="1246" VPOS="301" WIDTH="183" HEIGHT="46" WC="0.96" CONTENT="voluptua."/><SP WIDTH="13" VPOS="301" HPOS="1429"/>
<String ID="string_24" HPOS="1442" VPOS="303" WIDTH="51" HEIGHT="34" WC="0.96" CONTENT="At"/><SP WIDTH="12" VPOS="303" HPOS="1493"/>
<String ID="string_25" HPOS="1505" VPOS="312" WIDTH="88" HEIGHT="25" WC="0.96" CONTENT="vero"/><SP WIDTH="17" VPOS="312" HPOS="1593"/>
<String ID="string_26" HPOS="1610" VPOS="312" WIDTH="64" HEIGHT="25" WC="0.96" CONTENT="eos"/><SP WIDTH="16" VPOS="312" HPOS="1674"/>
<String ID="string_27" HPOS="1690" VPOS="308" WIDTH="35" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="14" VPOS="308" HPOS="1725"/>
<String ID="string_28" HPOS="1739" VPOS="312" WIDTH="168" HEIGHT="25" WC="0.96" CONTENT="accusam"/><SP WIDTH="15" VPOS="312" HPOS="1907"/>
<String ID="string_29" HPOS="1922" VPOS="308" WIDTH="34" HEIGHT="29" WC="0.97" CONTENT="et"/><SP WIDTH="11" VPOS="308" HPOS="1956"/>
<String ID="string_30" HPOS="1967" VPOS="302" WIDTH="96" HEIGHT="45" WC="0.97" CONTENT="justo"/><SP WIDTH="16" VPOS="302" HPOS="2063"/>
<String ID="string_31" HPOS="2079" VPOS="301" WIDTH="71" HEIGHT="36" WC="0.96" CONTENT="duo"/>
</TextLine>
<TextLine ID="line_2" HPOS="238" VPOS="359" WIDTH="1928" HEIGHT="46">
<String ID="string_32" HPOS="238" VPOS="361" WIDTH="144" HEIGHT="36" WC="0.96" CONTENT="dolores"/><SP WIDTH="16" VPOS="361" HPOS="382"/>
<String ID="string_33" HPOS="398" VPOS="368" WIDTH="34" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="15" VPOS="368" HPOS="432"/>
<String ID="string_34" HPOS="447" VPOS="372" WIDTH="41" HEIGHT="25" WC="0.96" CONTENT="ea"/><SP WIDTH="14" VPOS="372" HPOS="488"/>
<String ID="string_35" HPOS="502" VPOS="361" WIDTH="136" HEIGHT="36" WC="0.96" CONTENT="rebum."/><SP WIDTH="19" VPOS="361" HPOS="638"/>
<String ID="string_36" HPOS="657" VPOS="363" WIDTH="75" HEIGHT="33" WC="0.97" CONTENT="Stet"/><SP WIDTH="14" VPOS="363" HPOS="732"/>
<String ID="string_37" HPOS="746" VPOS="360" WIDTH="84" HEIGHT="36" WC="0.96" CONTENT="clita"/><SP WIDTH="13" VPOS="360" HPOS="830"/>
<String ID="string_38" HPOS="843" VPOS="359" WIDTH="91" HEIGHT="36" WC="0.96" CONTENT="kasd"/><SP WIDTH="13" VPOS="359" HPOS="934"/>
<String ID="string_39" HPOS="947" VPOS="359" WIDTH="208" HEIGHT="46" WC="0.96" CONTENT="gubergren,"/><SP WIDTH="16" VPOS="359" HPOS="1155"/>
<String ID="string_40" HPOS="1171" VPOS="370" WIDTH="47" HEIGHT="24" WC="0.96" CONTENT="no"/><SP WIDTH="16" VPOS="370" HPOS="1218"/>
<String ID="string_41" HPOS="1234" VPOS="370" WIDTH="61" HEIGHT="25" WC="0.96" CONTENT="sea"/><SP WIDTH="13" VPOS="370" HPOS="1295"/>
<String ID="string_42" HPOS="1308" VPOS="359" WIDTH="172" HEIGHT="36" WC="0.96" CONTENT="takimata"/><SP WIDTH="15" VPOS="359" HPOS="1480"/>
<String ID="string_43" HPOS="1495" VPOS="365" WIDTH="145" HEIGHT="30" WC="0.96" CONTENT="sanctus"/><SP WIDTH="16" VPOS="365" HPOS="1640"/>
<String ID="string_44" HPOS="1656" VPOS="365" WIDTH="55" HEIGHT="29" WC="0.96" CONTENT="est"/><SP WIDTH="13" VPOS="365" HPOS="1711"/>
<String ID="string_45" HPOS="1724" VPOS="361" WIDTH="131" HEIGHT="33" WC="0.96" CONTENT="Lorem"/><SP WIDTH="15" VPOS="361" HPOS="1855"/>
<String ID="string_46" HPOS="1870" VPOS="360" WIDTH="119" HEIGHT="44" WC="0.96" CONTENT="ipsum"/><SP WIDTH="15" VPOS="360" HPOS="1989"/>
<String ID="string_47" HPOS="2004" VPOS="359" WIDTH="103" HEIGHT="35" WC="0.96" CONTENT="dolor"/><SP WIDTH="14" VPOS="359" HPOS="2107"/>
<String ID="string_48" HPOS="2121" VPOS="360" WIDTH="45" HEIGHT="34" WC="0.96" CONTENT="sit"/>
</TextLine>
<TextLine ID="line_3" HPOS="238" VPOS="416" WIDTH="1905" HEIGHT="48">
<String ID="string_49" HPOS="238" VPOS="425" WIDTH="105" HEIGHT="29" WC="0.96" CONTENT="amet."/><SP WIDTH="16" VPOS="425" HPOS="343"/>
<String ID="string_50" HPOS="359" VPOS="421" WIDTH="132" HEIGHT="33" WC="0.96" CONTENT="Lorem"/><SP WIDTH="13" VPOS="421" HPOS="491"/>
<String ID="string_51" HPOS="504" VPOS="420" WIDTH="121" HEIGHT="44" WC="0.96" CONTENT="ipsum"/><SP WIDTH="15" VPOS="420" HPOS="625"/>
<String ID="string_52" HPOS="640" VPOS="418" WIDTH="104" HEIGHT="36" WC="0.96" CONTENT="dolor"/><SP WIDTH="14" VPOS="418" HPOS="744"/>
<String ID="string_53" HPOS="758" VPOS="419" WIDTH="45" HEIGHT="35" WC="0.97" CONTENT="sit"/><SP WIDTH="15" VPOS="419" HPOS="803"/>
<String ID="string_54" HPOS="818" VPOS="424" WIDTH="104" HEIGHT="36" WC="0.96" CONTENT="amet,"/><SP WIDTH="17" VPOS="424" HPOS="922"/>
<String ID="string_55" HPOS="939" VPOS="422" WIDTH="201" HEIGHT="30" WC="0.96" CONTENT="consetetur"/><SP WIDTH="15" VPOS="422" HPOS="1140"/>
<String ID="string_56" HPOS="1155" VPOS="416" WIDTH="207" HEIGHT="46" WC="0.96" CONTENT="sadipscing"/><SP WIDTH="15" VPOS="416" HPOS="1362"/>
<String ID="string_57" HPOS="1377" VPOS="417" WIDTH="86" HEIGHT="42" WC="0.96" CONTENT="elitr,"/><SP WIDTH="17" VPOS="417" HPOS="1463"/>
<String ID="string_58" HPOS="1480" VPOS="416" WIDTH="66" HEIGHT="36" WC="0.96" CONTENT="sed"/><SP WIDTH="15" VPOS="416" HPOS="1546"/>
<String ID="string_59" HPOS="1561" VPOS="416" WIDTH="98" HEIGHT="36" WC="0.96" CONTENT="diam"/><SP WIDTH="14" VPOS="416" HPOS="1659"/>
<String ID="string_60" HPOS="1673" VPOS="427" WIDTH="163" HEIGHT="35" WC="0.96" CONTENT="nonumy"/><SP WIDTH="16" VPOS="427" HPOS="1836"/>
<String ID="string_61" HPOS="1852" VPOS="416" WIDTH="138" HEIGHT="36" WC="0.96" CONTENT="eirmod"/><SP WIDTH="13" VPOS="416" HPOS="1990"/>
<String ID="string_62" HPOS="2003" VPOS="422" WIDTH="140" HEIGHT="40" WC="0.96" CONTENT="tempor"/>
</TextLine>
<TextLine ID="line_4" HPOS="236" VPOS="474" WIDTH="1897" HEIGHT="47">
<String ID="string_63" HPOS="236" VPOS="476" WIDTH="166" HEIGHT="35" WC="0.96" CONTENT="invidunt"/><SP WIDTH="14" VPOS="476" HPOS="402"/>
<String ID="string_64" HPOS="416" VPOS="482" WIDTH="39" HEIGHT="29" WC="0.96" CONTENT="ut"/><SP WIDTH="12" VPOS="482" HPOS="455"/>
<String ID="string_65" HPOS="467" VPOS="476" WIDTH="122" HEIGHT="35" WC="0.96" CONTENT="labore"/><SP WIDTH="16" VPOS="476" HPOS="589"/>
<String ID="string_66" HPOS="605" VPOS="482" WIDTH="34" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="15" VPOS="482" HPOS="639"/>
<String ID="string_67" HPOS="654" VPOS="475" WIDTH="125" HEIGHT="36" WC="0.96" CONTENT="dolore"/><SP WIDTH="14" VPOS="475" HPOS="779"/>
<String ID="string_68" HPOS="793" VPOS="484" WIDTH="131" HEIGHT="37" WC="0.96" CONTENT="magna"/><SP WIDTH="15" VPOS="484" HPOS="924"/>
<String ID="string_69" HPOS="939" VPOS="474" WIDTH="182" HEIGHT="45" WC="0.96" CONTENT="aliquyam"/><SP WIDTH="15" VPOS="474" HPOS="1121"/>
<String ID="string_70" HPOS="1136" VPOS="480" WIDTH="81" HEIGHT="37" WC="0.96" CONTENT="erat,"/><SP WIDTH="18" VPOS="480" HPOS="1217"/>
<String ID="string_71" HPOS="1235" VPOS="474" WIDTH="63" HEIGHT="35" WC="0.96" CONTENT="sed"/><SP WIDTH="15" VPOS="474" HPOS="1298"/>
<String ID="string_72" HPOS="1313" VPOS="474" WIDTH="97" HEIGHT="35" WC="0.96" CONTENT="diam"/><SP WIDTH="13" VPOS="474" HPOS="1410"/>
<String ID="string_73" HPOS="1423" VPOS="474" WIDTH="186" HEIGHT="46" WC="0.96" CONTENT="voluptua."/><SP WIDTH="14" VPOS="474" HPOS="1609"/>
<String ID="string_74" HPOS="1623" VPOS="475" WIDTH="50" HEIGHT="34" WC="0.96" CONTENT="At"/><SP WIDTH="12" VPOS="475" HPOS="1673"/>
<String ID="string_75" HPOS="1685" VPOS="485" WIDTH="89" HEIGHT="24" WC="0.96" CONTENT="vero"/><SP WIDTH="16" VPOS="485" HPOS="1774"/>
<String ID="string_76" HPOS="1790" VPOS="484" WIDTH="63" HEIGHT="25" WC="0.96" CONTENT="eos"/><SP WIDTH="15" VPOS="484" HPOS="1853"/>
<String ID="string_77" HPOS="1868" VPOS="480" WIDTH="34" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="14" VPOS="480" HPOS="1902"/>
<String ID="string_78" HPOS="1916" VPOS="484" WIDTH="168" HEIGHT="25" WC="0.96" CONTENT="accusam"/><SP WIDTH="16" VPOS="484" HPOS="2084"/>
<String ID="string_79" HPOS="2100" VPOS="480" WIDTH="33" HEIGHT="29" WC="0.96" CONTENT="et"/>
</TextLine>
<TextLine ID="line_5" HPOS="234" VPOS="531" WIDTH="1950" HEIGHT="47">
<String ID="string_80" HPOS="234" VPOS="534" WIDTH="98" HEIGHT="44" WC="0.97" CONTENT="justo"/><SP WIDTH="16" VPOS="534" HPOS="332"/>
<String ID="string_81" HPOS="348" VPOS="533" WIDTH="71" HEIGHT="35" WC="0.96" CONTENT="duo"/><SP WIDTH="16" VPOS="533" HPOS="419"/>
<String ID="string_82" HPOS="435" VPOS="533" WIDTH="143" HEIGHT="35" WC="0.96" CONTENT="dolores"/><SP WIDTH="15" VPOS="533" HPOS="578"/>
<String ID="string_83" HPOS="593" VPOS="539" WIDTH="35" HEIGHT="29" WC="0.96" CONTENT="et"/><SP WIDTH="14" VPOS="539" HPOS="628"/>
<String ID="string_84" HPOS="642" VPOS="543" WIDTH="42" HEIGHT="25" WC="0.97" CONTENT="ea"/><SP WIDTH="14" VPOS="543" HPOS="684"/>
<String ID="string_85" HPOS="698" VPOS="533" WIDTH="137" HEIGHT="35" WC="0.96" CONTENT="rebum."/><SP WIDTH="18" VPOS="533" HPOS="835"/>
<String ID="string_86" HPOS="853" VPOS="534" WIDTH="74" HEIGHT="34" WC="0.96" CONTENT="Stet"/><SP WIDTH="14" VPOS="534" HPOS="927"/>
<String ID="string_87" HPOS="941" VPOS="531" WIDTH="84" HEIGHT="36" WC="0.96" CONTENT="clita"/><SP WIDTH="13" VPOS="531" HPOS="1025"/>
<String ID="string_88" HPOS="1038" VPOS="531" WIDTH="89" HEIGHT="35" WC="0.96" CONTENT="kasd"/><SP WIDTH="15" VPOS="531" HPOS="1127"/>
<String ID="string_89" HPOS="1142" VPOS="531" WIDTH="208" HEIGHT="46" WC="0.96" CONTENT="gubergren,"/><SP WIDTH="16" VPOS="531" HPOS="1350"/>
<String ID="string_90" HPOS="1366" VPOS="542" WIDTH="48" HEIGHT="25" WC="0.96" CONTENT="no"/><SP WIDTH="16" VPOS="542" HPOS="1414"/>
<String ID="string_91" HPOS="1430" VPOS="542" WIDTH="62" HEIGHT="25" WC="0.96" CONTENT="sea"/><SP WIDTH="13" VPOS="542" HPOS="1492"/>
<String ID="string_92" HPOS="1505" VPOS="531" WIDTH="173" HEIGHT="36" WC="0.96" CONTENT="takimata"/><SP WIDTH="15" VPOS="531" HPOS="1678"/>
<String ID="string_93" HPOS="1693" VPOS="538" WIDTH="144" HEIGHT="29" WC="0.96" CONTENT="sanctus"/><SP WIDTH="16" VPOS="538" HPOS="1837"/>
<String ID="string_94" HPOS="1853" VPOS="537" WIDTH="53" HEIGHT="29" WC="0.96" CONTENT="est"/><SP WIDTH="14" VPOS="537" HPOS="1906"/>
<String ID="string_95" HPOS="1920" VPOS="533" WIDTH="130" HEIGHT="33" WC="0.96" CONTENT="Lorem"/><SP WIDTH="14" VPOS="533" HPOS="2050"/>
<String ID="string_96" HPOS="2064" VPOS="532" WIDTH="120" HEIGHT="44" WC="0.95" CONTENT="ipsum"/>
</TextLine>
<TextLine ID="line_6" HPOS="237" VPOS="590" WIDTH="282" HEIGHT="41">
<String ID="string_97" HPOS="237" VPOS="590" WIDTH="104" HEIGHT="35" WC="0.96" CONTENT="dolor"/><SP WIDTH="15" VPOS="590" HPOS="341"/>
<String ID="string_98" HPOS="356" VPOS="591" WIDTH="45" HEIGHT="35" WC="0.96" CONTENT="sit"/><SP WIDTH="14" VPOS="591" HPOS="401"/>
<String ID="string_99" HPOS="415" VPOS="597" WIDTH="104" HEIGHT="34" WC="0.96" CONTENT="amet."/>
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>

@ -0,0 +1,290 @@
<?xml version="1.0" encoding="UTF-8"?>
<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15">
<pc:Metadata>
<pc:Creator>OCR-D/core 1.0.0b19</pc:Creator>
<pc:Created>2019-09-26T11:59:19.519140</pc:Created>
<pc:LastChange>2019-09-26T11:59:19.519140</pc:LastChange>
<pc:MetadataItem type="processingStep" name="layout/segmentation/region" value="ocrd-tesserocr-segment-region">
<pc:Labels>
<pc:Label value="True" type="overwrite_regions"/>
<pc:Label value="8" type="padding"/>
<pc:Label value="False" type="crop_polygons"/>
<pc:Label value="True" type="find_tables"/>
</pc:Labels>
</pc:MetadataItem>
<pc:MetadataItem type="processingStep" name="layout/segmentation/line" value="ocrd-tesserocr-segment-line">
<pc:Labels>
<pc:Label value="True" type="overwrite_lines"/>
</pc:Labels>
</pc:MetadataItem>
</pc:Metadata>
<pc:Page imageFilename="../OCR-D-IMG-BIN/OCR-D-IMG-BIN_0001.png" imageWidth="1832" imageHeight="2408">
<pc:ReadingOrder>
<pc:OrderedGroup id="reading-order">
<pc:RegionRefIndexed index="0" regionRef="region0000"/>
<pc:RegionRefIndexed index="1" regionRef="region0001"/>
<pc:RegionRefIndexed index="2" regionRef="region0002"/>
<pc:RegionRefIndexed index="3" regionRef="region0003"/>
</pc:OrderedGroup>
</pc:ReadingOrder>
<pc:TextRegion id="region0001">
<pc:Coords points="184,196 1338,196 1338,1969 184,1969"/>
<pc:TextLine id="region0001_line0000">
<pc:Coords points="217,204 1324,204 1324,264 217,264"/>
<pc:TextEquiv>
<pc:Unicode>phariſei hypocritæ, qui comeditis domos uiduarã ſub</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0001">
<pc:Coords points="220,258 1325,258 1325,314 220,314"/>
<pc:TextEquiv>
<pc:Unicode>prætextu longarum precationum, propterea maiorẽ</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0002">
<pc:Coords points="218,305 1325,305 1325,359 218,359"/>
<pc:TextEquiv>
<pc:Unicode>accipieris condemnationem. Ideo enim ꝙ non oratis</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0003">
<pc:Coords points="217,354 1325,354 1325,413 217,413"/>
<pc:TextEquiv>
<pc:Unicode>ſecundum præſeriptum ſacræ ſcripturæ, nec ex ſpiritu</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0004">
<pc:Coords points="216,401 1322,401 1322,460 216,460"/>
<pc:TextEquiv>
<pc:Unicode>&amp; ueritate ſed iuxta ueſtram propriam conſtitutionẽ,</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0005">
<pc:Coords points="219,454 1324,454 1324,505 219,505"/>
<pc:TextEquiv>
<pc:Unicode>orationes ueſtræ nõ ſiunt Deo acceptæ, neq; ab eo ex⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0006">
<pc:Coords points="219,501 1326,501 1326,563 219,563"/>
<pc:TextEquiv>
<pc:Unicode>audiunt᷑ Eſa, Cum multiplicaueritis orationes ueſtras</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0007">
<pc:Coords points="215,556 1325,556 1325,607 215,607"/>
<pc:TextEquiv>
<pc:Unicode>non exaudiam uos. Chriſtiani uero quia orant iuxta</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0008">
<pc:Coords points="218,605 1324,605 1324,665 218,665"/>
<pc:TextEquiv>
<pc:Unicode>tenorem ſcripturæ, &amp; ex ſpiritu &amp; ueritate, ideo eo⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0009">
<pc:Coords points="217,651 1324,651 1324,707 217,707"/>
<pc:TextEquiv>
<pc:Unicode>rum orationes a Deo exaudiuntur, ſuntq; illi grat iſsi⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0010">
<pc:Coords points="219,705 1322,705 1322,756 219,756"/>
<pc:TextEquiv>
<pc:Unicode>mæ, dicunt enim Pater noſter qui es iu cœlis &amp;c. Vos</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0011">
<pc:Coords points="218,756 1323,756 1323,806 218,806"/>
<pc:TextEquiv>
<pc:Unicode>autem hoc tenore orandi contempto, obmur muratis</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0012">
<pc:Coords points="218,803 1327,803 1327,854 218,854"/>
<pc:TextEquiv>
<pc:Unicode>ueſtras Horas canonicas, hoc eſt, diabolicas ab Anti⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0013">
<pc:Coords points="218,852 1324,852 1324,904 218,904"/>
<pc:TextEquiv>
<pc:Unicode>chriſto inſtitutas. Paulus mauult quinq; uerba in Ec⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0014">
<pc:Coords points="219,904 1323,904 1323,958 219,958"/>
<pc:TextEquiv>
<pc:Unicode>cle ſia loqui in ſenſu, qß decem milia uerborum in lin⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0015">
<pc:Coords points="218,954 1326,954 1326,1010 218,1010"/>
<pc:TextEquiv>
<pc:Unicode>ua, Quibus uerbis adeo dãnat ueſtras prolixas ora⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0016">
<pc:Coords points="192,1002 1324,1002 1324,1052 192,1052"/>
<pc:TextEquiv>
<pc:Unicode>tiones, ut ſi ſemiuncia ſanæ mentis uel mica ſidei eſfet</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0017">
<pc:Coords points="218,1055 965,1055 965,1101 218,1101"/>
<pc:TextEquiv>
<pc:Unicode>in uobis, eas ſine dubio omitteretis.</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0018">
<pc:Coords points="325,1103 1323,1103 1323,1160 325,1160"/>
<pc:TextEquiv>
<pc:Unicode>De inuocatione diuorum ne apiculus quidem ha</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0019">
<pc:Coords points="216,1156 1326,1156 1326,1212 216,1212"/>
<pc:TextEquiv>
<pc:Unicode>betur in ſacris literis, quare ter ſtulti eſtis quod inuo⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0020">
<pc:Coords points="220,1210 1326,1210 1326,1262 220,1262"/>
<pc:TextEquiv>
<pc:Unicode>catis ſanctos, cum ex præce pto Dei ne mo inuocandus</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0021">
<pc:Coords points="218,1261 1326,1261 1326,1307 218,1307"/>
<pc:TextEquiv>
<pc:Unicode>ſit niſi ſolus Deus. Inuoca inquit me in die tribulatio⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0022">
<pc:Coords points="222,1305 1324,1305 1324,1354 222,1354"/>
<pc:TextEquiv>
<pc:Unicode>nis. &amp; eruam te, &amp; honorificabis me. Et omnis qui⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0023">
<pc:Coords points="221,1353 1324,1353 1324,1415 221,1415"/>
<pc:TextEquiv>
<pc:Unicode>cumq; inuocauerit nomen domini, ſaluus erit Sed</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0024">
<pc:Coords points="220,1404 1321,1404 1321,1465 220,1465"/>
<pc:TextEquiv>
<pc:Unicode>quomodo inuocabitis, in quem non credidiſtis? Quo</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0025">
<pc:Coords points="221,1456 1325,1456 1325,1508 221,1508"/>
<pc:TextEquiv>
<pc:Unicode>modo credetis ſine uerbo ? Inuocationẽ ergo in ſcrip⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0026">
<pc:Coords points="222,1509 1323,1509 1323,1559 222,1559"/>
<pc:TextEquiv>
<pc:Unicode>turis non legitis cõmemorationem uero ſæpe, non ut</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0027">
<pc:Coords points="222,1555 1330,1555 1330,1612 222,1612"/>
<pc:TextEquiv>
<pc:Unicode>intercedant pro uobis ſancti, ſed nt meminerit Deus</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0028">
<pc:Coords points="219,1604 1325,1604 1325,1664 219,1664"/>
<pc:TextEquiv>
<pc:Unicode>Teſtamenti cum patribus ſanctis pacti, ut ſimiliter uo⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0029">
<pc:Coords points="218,1653 1323,1653 1323,1719 218,1719"/>
<pc:TextEquiv>
<pc:Unicode>biſcum agat per miſericordiam, quemadmodum cum</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0030">
<pc:Coords points="219,1704 1321,1704 1321,1769 219,1769"/>
<pc:TextEquiv>
<pc:Unicode>ilis egit. Atq; hoc non eſt inuocare ſanctos. ſed Deum</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0031">
<pc:Coords points="222,1758 1322,1758 1322,1817 222,1817"/>
<pc:TextEquiv>
<pc:Unicode>ſuæ miſericordiæ &amp; promiſsionis admonere Sic pſal</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0032">
<pc:Coords points="224,1809 1324,1809 1324,1866 224,1866"/>
<pc:TextEquiv>
<pc:Unicode>mographus dicit, Qui paſcis Iſrael attende, qui de⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0033">
<pc:Coords points="222,1858 1320,1858 1320,1913 222,1913"/>
<pc:TextEquiv>
<pc:Unicode>ducis uelut ouem Iacob Sic &amp; Moſes orat, Memento</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextLine id="region0001_line0034">
<pc:Coords points="345,1909 1320,1909 1320,1963 345,1963"/>
<pc:TextEquiv>
<pc:Unicode>B 3 domi⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextLine>
<pc:TextEquiv>
<pc:Unicode>phariſei hypocritæ, qui comeditis domos uiduarã ſub
prætextu longarum precationum, propterea maiorẽ
accipieris condemnationem. Ideo enim ꝙ non oratis
ſecundum præſeriptum ſacræ ſcripturæ, nec ex ſpiritu
&amp; ueritate ſed iuxta ueſtram propriam conſtitutionẽ,
orationes ueſtræ nõ ſiunt Deo acceptæ, neq; ab eo ex⸗
audiunt᷑ Eſa, Cum multiplicaueritis orationes ueſtras
non exaudiam uos. Chriſtiani uero quia orant iuxta
tenorem ſcripturæ, &amp; ex ſpiritu &amp; ueritate, ideo eo⸗
rum orationes a Deo exaudiuntur, ſuntq; illi grat iſsi⸗
mæ, dicunt enim Pater noſter qui es iu cœlis &amp;c. Vos
autem hoc tenore orandi contempto, obmur muratis
ueſtras Horas canonicas, hoc eſt, diabolicas ab Anti⸗
chriſto inſtitutas. Paulus mauult quinq; uerba in Ec⸗
cle ſia loqui in ſenſu, qß decem milia uerborum in lin⸗
ua, Quibus uerbis adeo dãnat ueſtras prolixas ora⸗
tiones, ut ſi ſemiuncia ſanæ mentis uel mica ſidei eſfet
in uobis, eas ſine dubio omitteretis.
De inuocatione diuorum ne apiculus quidem ha
betur in ſacris literis, quare ter ſtulti eſtis quod inuo⸗
catis ſanctos, cum ex præce pto Dei ne mo inuocandus
ſit niſi ſolus Deus. Inuoca inquit me in die tribulatio⸗
nis. &amp; eruam te, &amp; honorificabis me. Et omnis qui⸗
cumq; inuocauerit nomen domini, ſaluus erit Sed
quomodo inuocabitis, in quem non credidiſtis? Quo
modo credetis ſine uerbo ? Inuocationẽ ergo in ſcrip⸗
turis non legitis cõmemorationem uero ſæpe, non ut
intercedant pro uobis ſancti, ſed nt meminerit Deus
Teſtamenti cum patribus ſanctis pacti, ut ſimiliter uo⸗
biſcum agat per miſericordiam, quemadmodum cum
ilis egit. Atq; hoc non eſt inuocare ſanctos. ſed Deum
ſuæ miſericordiæ &amp; promiſsionis admonere Sic pſal
mographus dicit, Qui paſcis Iſrael attende, qui de⸗
ducis uelut ouem Iacob Sic &amp; Moſes orat, Memento
B 3 domi⸗</pc:Unicode>
</pc:TextEquiv>
</pc:TextRegion>
<pc:ImageRegion id="region0000">
<pc:Coords points="5,21 1790,21 1790,302 5,302"/>
</pc:ImageRegion>
<pc:ImageRegion id="region0002">
<pc:Coords points="0,1962 1813,1962 1813,2361 0,2361"/>
</pc:ImageRegion>
<pc:ImageRegion id="region0003">
<pc:Coords points="1316,166 1790,166 1790,238 1316,238"/>
</pc:ImageRegion>
</pc:Page>
</pc:PcGts>

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -0,0 +1,64 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v2# http://www.loc.gov/standards/alto/alto-v2.0.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<OCRProcessing ID="IdOcr"><ocrProcessingStep><processingDateTime>2017-03-27</processingDateTime><processingSoftware><softwareCreator>ABBYY</softwareCreator><softwareName>ABBYY FineReader Engine</softwareName><softwareVersion>11</softwareVersion></processingSoftware></ocrProcessingStep></OCRProcessing>
</Description>
<Styles><TextStyle ID="font0" FONTFAMILY="Times New Roman" FONTSIZE="7"/><TextStyle ID="font1" FONTFAMILY="Times New Roman" FONTSIZE="11"/>
</Styles>
<Layout>
<Page ID="Page1" PHYSICAL_IMG_NR="1" HEIGHT="2500" WIDTH="1720">
<TopMargin HEIGHT="172" WIDTH="1720" VPOS="0" HPOS="0">
</TopMargin>
<LeftMargin HEIGHT="2016" WIDTH="341" VPOS="172" HPOS="0">
</LeftMargin>
<RightMargin HEIGHT="2016" WIDTH="111" VPOS="172" HPOS="1609">
</RightMargin>
<BottomMargin HEIGHT="312" WIDTH="1720" VPOS="2188" HPOS="0">
</BottomMargin>
<PrintSpace HEIGHT="2016" WIDTH="1268" VPOS="172" HPOS="341">
<TextBlock ID="Page1_Block1" HEIGHT="43" WIDTH="72" VPOS="174" HPOS="936" language="de" STYLEREFS="font1">
<TextLine HEIGHT="31" WIDTH="60" VPOS="180" HPOS="942"><String STYLE="bold" WC="0.676666677" CONTENT="142" HEIGHT="31" WIDTH="60" VPOS="180" HPOS="942"/></TextLine>
</TextBlock>
<ComposedBlock ID="Page1_Block2" HEIGHT="1306" WIDTH="1266" VPOS="257" HPOS="341" TYPE="container"><Shape><Polygon POINTS="348,262 1610,262 1610,1564 348,1564 348,262"/></Shape>
<TextBlock ID="Page1_Block3" HEIGHT="776" WIDTH="1261" VPOS="257" HPOS="343" language="de" STYLEREFS="font1"><Shape><Polygon POINTS="350,262 1610,262 1610,708 992,708 992,1034 350,1034 350,262"/></Shape>
<TextLine HEIGHT="50" WIDTH="1223" VPOS="267" HPOS="363"><String WC="0.6899999976" CONTENT="die" HEIGHT="33" WIDTH="46" VPOS="271" HPOS="363"/><SP WIDTH="16" VPOS="272" HPOS="410"/><String WC="0.7875000238" CONTENT="Zugtiere" HEIGHT="44" WIDTH="142" VPOS="270" HPOS="427"/><SP WIDTH="20" VPOS="281" HPOS="570"/><String WC="0.9499999881" CONTENT="eines" HEIGHT="34" WIDTH="82" VPOS="271" HPOS="591"/><SP WIDTH="10" VPOS="272" HPOS="674"/><String WC="0.6349999905" CONTENT="Joches" HEIGHT="42" WIDTH="113" VPOS="272" HPOS="685"/><SP WIDTH="15" VPOS="271" HPOS="799"/><String WC="0.6009091139" CONTENT="(griechisch" HEIGHT="45" WIDTH="161" VPOS="270" HPOS="815"/><SP WIDTH="19" VPOS="271" HPOS="977"/><String WC="0.7699999809" CONTENT="zygos)," HEIGHT="44" WIDTH="126" VPOS="269" HPOS="997"/><SP WIDTH="21" VPOS="272" HPOS="1124"/><String WC="0.7099999785" CONTENT="so" HEIGHT="42" WIDTH="27" VPOS="271" HPOS="1146"/><SP WIDTH="19" VPOS="280" HPOS="1174"/><String WC="0.6679999828" CONTENT="nennt" HEIGHT="32" WIDTH="94" VPOS="272" HPOS="1194"/><SP WIDTH="19" VPOS="272" HPOS="1289"/><String WC="0.4133333266" CONTENT="man" HEIGHT="23" WIDTH="72" VPOS="281" HPOS="1309"/><SP WIDTH="21" VPOS="271" HPOS="1382"/><String WC="0.5099999905" CONTENT="die" HEIGHT="33" WIDTH="46" VPOS="271" HPOS="1404"/><SP WIDTH="15" VPOS="272" HPOS="1451"/><String WC="0.8700000048" CONTENT="Zporen" HEIGHT="43" WIDTH="119" VPOS="271" HPOS="1467"/></TextLine>
<TextLine HEIGHT="51" WIDTH="1224" VPOS="321" HPOS="363"><String WC="0.8133333325" CONTENT="der" HEIGHT="34" WIDTH="50" VPOS="325" HPOS="363"/><SP WIDTH="24" VPOS="327" HPOS="414"/><String WC="0.8700000048" CONTENT="Tonjugaten" HEIGHT="43" WIDTH="197" VPOS="326" HPOS="439"/><SP WIDTH="32" VPOS="337" HPOS="637"/><String WC="0.6499999762" CONTENT="auch" HEIGHT="43" WIDTH="70" VPOS="326" HPOS="670"/><SP WIDTH="31" VPOS="326" HPOS="741"/><String WC="0.7120000124" CONTENT="Jochsporen" HEIGHT="43" WIDTH="185" VPOS="326" HPOS="773"/><SP WIDTH="37" VPOS="336" HPOS="959"/><String WC="0.9200000167" CONTENT="oder" HEIGHT="32" WIDTH="71" VPOS="327" HPOS="997"/><SP WIDTH="31" VPOS="326" HPOS="1069"/><String WC="0.7072727084" CONTENT="Zpgosporen." HEIGHT="44" WIDTH="203" VPOS="325" HPOS="1101"/><SP WIDTH="53" VPOS="326" HPOS="1305"/><String WC="0.5320000052" CONTENT="Daher" HEIGHT="43" WIDTH="107" VPOS="326" HPOS="1359"/><SP WIDTH="36" VPOS="325" HPOS="1467"/><String WC="0.5720000267" CONTENT="heißt" HEIGHT="43" WIDTH="83" VPOS="325" HPOS="1504"/></TextLine>
<TextLine HEIGHT="46" WIDTH="655" VPOS="379" HPOS="363"><String WC="0.8650000095" CONTENT="auch" HEIGHT="43" WIDTH="70" VPOS="381" HPOS="363"/><SP WIDTH="29" VPOS="381" HPOS="434"/><String WC="0.6299999952" CONTENT="die" HEIGHT="33" WIDTH="46" VPOS="381" HPOS="464"/><SP WIDTH="24" VPOS="392" HPOS="511"/><String WC="0.7699999809" CONTENT="ganze" HEIGHT="33" WIDTH="94" VPOS="391" HPOS="536"/><SP WIDTH="24" VPOS="381" HPOS="631"/><String WC="0.7371428609" CONTENT="Ordnung" HEIGHT="43" WIDTH="154" VPOS="381" HPOS="656"/><SP WIDTH="24" VPOS="382" HPOS="811"/><String WC="0.800999999" CONTENT="Jochalgen." HEIGHT="43" WIDTH="182" VPOS="381" HPOS="836"/></TextLine>
<TextLine HEIGHT="50" WIDTH="1182" VPOS="432" HPOS="406"><String WC="0.3966666758" CONTENT="Wir" HEIGHT="33" WIDTH="69" VPOS="436" HPOS="406"/><SP WIDTH="24" VPOS="446" HPOS="475"/><String WC="0.6949999928" CONTENT="wollen" HEIGHT="33" WIDTH="112" VPOS="436" HPOS="499"/><SP WIDTH="24" VPOS="445" HPOS="611"/><String WC="0.5166666508" CONTENT="nun" HEIGHT="23" WIDTH="65" VPOS="446" HPOS="635"/><SP WIDTH="24" VPOS="446" HPOS="700"/><String WC="0.7570000291" CONTENT="versuchen," HEIGHT="44" WIDTH="166" VPOS="435" HPOS="724"/><SP WIDTH="27" VPOS="446" HPOS="890"/><String WC="0.6733333468" CONTENT="uns" HEIGHT="23" WIDTH="59" VPOS="446" HPOS="917"/><SP WIDTH="25" VPOS="446" HPOS="976"/><String WC="0.6725000143" CONTENT="eine" HEIGHT="33" WIDTH="66" VPOS="436" HPOS="1001"/><SP WIDTH="25" VPOS="436" HPOS="1067"/><String WC="0.6690909266" CONTENT="Vorstellung" HEIGHT="44" WIDTH="192" VPOS="435" HPOS="1092"/><SP WIDTH="25" VPOS="446" HPOS="1284"/><String WC="0.8466666937" CONTENT="von" HEIGHT="23" WIDTH="62" VPOS="446" HPOS="1309"/><SP WIDTH="25" VPOS="436" HPOS="1371"/><String WC="0.5866666436" CONTENT="den" HEIGHT="32" WIDTH="56" VPOS="436" HPOS="1396"/><SP WIDTH="25" VPOS="436" HPOS="1452"/><String WC="0.7366666794" CONTENT="Zchon-" HEIGHT="44" WIDTH="111" VPOS="435" HPOS="1477"/></TextLine>
<TextLine HEIGHT="50" WIDTH="1224" VPOS="486" HPOS="363"><String WC="0.7181817889" CONTENT="heitsformen" HEIGHT="45" WIDTH="199" VPOS="489" HPOS="363"/><SP WIDTH="32" VPOS="490" HPOS="563"/><String WC="0.8633333445" CONTENT="der" HEIGHT="33" WIDTH="50" VPOS="490" HPOS="596"/><SP WIDTH="31" VPOS="491" HPOS="647"/><String WC="0.7749999762" CONTENT="in" HEIGHT="33" WIDTH="30" VPOS="491" HPOS="679"/><SP WIDTH="31" VPOS="501" HPOS="710"/><String WC="0.5479999781" CONTENT="viele" HEIGHT="33" WIDTH="75" VPOS="491" HPOS="742"/><SP WIDTH="32" VPOS="502" HPOS="818"/><String WC="0.7345454693" CONTENT="artenreiche" HEIGHT="44" WIDTH="181" VPOS="490" HPOS="851"/><SP WIDTH="31" VPOS="491" HPOS="1033"/><String WC="0.7277777791" CONTENT="Gattungen" HEIGHT="43" WIDTH="181" VPOS="490" HPOS="1065"/><SP WIDTH="32" VPOS="501" HPOS="1247"/><String WC="0.7766666412" CONTENT="geteilten" HEIGHT="43" WIDTH="140" VPOS="490" HPOS="1280"/><SP WIDTH="32" VPOS="491" HPOS="1421"/><String WC="0.7514285445" CONTENT="Familie" HEIGHT="44" WIDTH="133" VPOS="489" HPOS="1454"/></TextLine>
<TextLine HEIGHT="51" WIDTH="1225" VPOS="540" HPOS="362"><String WC="0.7633333206" CONTENT="der" HEIGHT="32" WIDTH="51" VPOS="546" HPOS="362"/><SP WIDTH="24" VPOS="544" HPOS="414"/><String WC="0.4366666675" CONTENT="OesmiäiLLeen" HEIGHT="35" WIDTH="254" VPOS="543" HPOS="439"/><SP WIDTH="29" VPOS="555" HPOS="694"/><String WC="0.8199999928" CONTENT="zu" HEIGHT="31" WIDTH="35" VPOS="556" HPOS="724"/><SP WIDTH="24" VPOS="556" HPOS="760"/><String WC="0.5699999928" CONTENT="machen." HEIGHT="44" WIDTH="131" VPOS="545" HPOS="785"/><SP WIDTH="47" VPOS="546" HPOS="917"/><String WC="0.7466666698" CONTENT="Vas" HEIGHT="33" WIDTH="68" VPOS="546" HPOS="965"/><SP WIDTH="25" VPOS="556" HPOS="1034"/><String WC="0.6685714126" CONTENT="gelingt" HEIGHT="43" WIDTH="116" VPOS="545" HPOS="1060"/><SP WIDTH="24" VPOS="545" HPOS="1177"/><String WC="0.5785714388" CONTENT="leicht," HEIGHT="43" WIDTH="95" VPOS="545" HPOS="1202"/><SP WIDTH="31" VPOS="556" HPOS="1298"/><String WC="0.6675000191" CONTENT="wenn" HEIGHT="23" WIDTH="90" VPOS="556" HPOS="1330"/><SP WIDTH="23" VPOS="556" HPOS="1421"/><String WC="0.5666666627" CONTENT="wir" HEIGHT="35" WIDTH="58" VPOS="544" HPOS="1445"/><SP WIDTH="23" VPOS="555" HPOS="1504"/><String WC="0.8000000119" CONTENT="uns" HEIGHT="23" WIDTH="59" VPOS="555" HPOS="1528"/></TextLine>
<TextLine HEIGHT="50" WIDTH="1225" VPOS="596" HPOS="362"><String WC="0.6399999857" CONTENT="selbst" HEIGHT="42" WIDTH="84" VPOS="600" HPOS="362"/><SP WIDTH="23" VPOS="603" HPOS="447"/><String WC="0.80400002" CONTENT="etwas" HEIGHT="33" WIDTH="98" VPOS="601" HPOS="471"/><SP WIDTH="23" VPOS="601" HPOS="570"/><String WC="0.6587499976" CONTENT="Material" HEIGHT="34" WIDTH="156" VPOS="600" HPOS="594"/><SP WIDTH="24" VPOS="601" HPOS="751"/><String WC="0.7300000191" CONTENT="holen," HEIGHT="44" WIDTH="99" VPOS="600" HPOS="776"/><SP WIDTH="25" VPOS="600" HPOS="876"/><String WC="0.7516666651" CONTENT="höchst" HEIGHT="43" WIDTH="95" VPOS="600" HPOS="902"/><SP WIDTH="22" VPOS="603" HPOS="998"/><String WC="0.5454545617" CONTENT="mangelhaft," HEIGHT="44" WIDTH="206" VPOS="600" HPOS="1021"/><SP WIDTH="25" VPOS="610" HPOS="1228"/><String WC="0.7599999905" CONTENT="wenn" HEIGHT="23" WIDTH="90" VPOS="610" HPOS="1254"/><SP WIDTH="23" VPOS="610" HPOS="1345"/><String WC="0.6299999952" CONTENT="wir" HEIGHT="34" WIDTH="58" VPOS="600" HPOS="1369"/><SP WIDTH="23" VPOS="611" HPOS="1428"/><String WC="0.8100000024" CONTENT="uns" HEIGHT="24" WIDTH="59" VPOS="610" HPOS="1452"/><SP WIDTH="20" VPOS="610" HPOS="1512"/><String WC="0.5966666937" CONTENT="auf" HEIGHT="42" WIDTH="54" VPOS="600" HPOS="1533"/></TextLine>
<TextLine HEIGHT="50" WIDTH="1224" VPOS="651" HPOS="362"><String WC="0.7933333516" CONTENT="die" HEIGHT="33" WIDTH="46" VPOS="655" HPOS="362"/><SP WIDTH="23" VPOS="655" HPOS="409"/><String WC="0.8428571224" CONTENT="Lektüre" HEIGHT="35" WIDTH="129" VPOS="654" HPOS="433"/><SP WIDTH="24" VPOS="655" HPOS="563"/><String WC="0.6150000095" CONTENT="dieses" HEIGHT="42" WIDTH="92" VPOS="655" HPOS="588"/><SP WIDTH="23" VPOS="656" HPOS="681"/><String WC="0.8766666651" CONTENT="Buches" HEIGHT="43" WIDTH="115" VPOS="655" HPOS="705"/><SP WIDTH="30" VPOS="655" HPOS="821"/><String WC="0.6575000286" CONTENT="beschränken." HEIGHT="45" WIDTH="211" VPOS="654" HPOS="852"/><SP WIDTH="46" VPOS="656" HPOS="1064"/><String WC="0.5699999928" CONTENT="Das" HEIGHT="34" WIDTH="68" VPOS="655" HPOS="1111"/><SP WIDTH="23" VPOS="656" HPOS="1180"/><String WC="0.7912499905" CONTENT="Material" HEIGHT="33" WIDTH="156" VPOS="655" HPOS="1204"/><SP WIDTH="24" VPOS="655" HPOS="1361"/><String WC="0.8199999928" CONTENT="ist" HEIGHT="42" WIDTH="33" VPOS="655" HPOS="1386"/><SP WIDTH="23" VPOS="655" HPOS="1420"/><String WC="0.6716666818" CONTENT="leicht" HEIGHT="44" WIDTH="83" VPOS="654" HPOS="1444"/><SP WIDTH="22" VPOS="657" HPOS="1528"/><String WC="0.6999999881" CONTENT="zu" HEIGHT="31" WIDTH="35" VPOS="665" HPOS="1551"/></TextLine>
<TextLine HEIGHT="46" WIDTH="608" VPOS="707" HPOS="361"><String WC="0.6736363769" CONTENT="beschaffen." HEIGHT="43" WIDTH="175" VPOS="709" HPOS="361"/><SP WIDTH="30" VPOS="710" HPOS="537"/><String WC="0.6533333063" CONTENT="Man" HEIGHT="33" WIDTH="84" VPOS="710" HPOS="568"/><SP WIDTH="22" VPOS="710" HPOS="653"/><String WC="0.6228571534" CONTENT="sammelt" HEIGHT="42" WIDTH="137" VPOS="710" HPOS="676"/><SP WIDTH="20" VPOS="712" HPOS="814"/><String WC="0.7666666508" CONTENT="aus" HEIGHT="24" WIDTH="57" VPOS="720" HPOS="835"/><SP WIDTH="20" VPOS="710" HPOS="893"/><String WC="0.5966666937" CONTENT="den" HEIGHT="33" WIDTH="55" VPOS="710" HPOS="914"/></TextLine>
<TextLine HEIGHT="47" WIDTH="607" VPOS="762" HPOS="364"><String WC="0.7990909219" CONTENT="Torflöchern" HEIGHT="44" WIDTH="195" VPOS="763" HPOS="364"/><SP WIDTH="16" VPOS="764" HPOS="559"/><String WC="0.9300000072" CONTENT="der" HEIGHT="33" WIDTH="52" VPOS="764" HPOS="575"/><SP WIDTH="8" VPOS="764" HPOS="627"/><String WC="0.7636363506" CONTENT="Niedermoore" HEIGHT="34" WIDTH="217" VPOS="764" HPOS="635"/><SP WIDTH="11" VPOS="765" HPOS="852"/><String WC="0.7620000243" CONTENT="Moose" HEIGHT="42" WIDTH="108" VPOS="765" HPOS="863"/></TextLine>
<TextLine HEIGHT="48" WIDTH="608" VPOS="817" HPOS="363"><String WC="1." CONTENT="oder" HEIGHT="33" WIDTH="70" VPOS="819" HPOS="363"/><SP WIDTH="28" VPOS="819" HPOS="434"/><String WC="0.6233333349" CONTENT="höhere" HEIGHT="45" WIDTH="111" VPOS="818" HPOS="463"/><SP WIDTH="28" VPOS="820" HPOS="575"/><String WC="0.6035714149" CONTENT="Wasserpflanzen" HEIGHT="44" WIDTH="260" VPOS="818" HPOS="604"/><SP WIDTH="29" VPOS="818" HPOS="865"/><String WC="0.7839999795" CONTENT="(sehr" HEIGHT="45" WIDTH="76" VPOS="818" HPOS="895"/></TextLine>
<TextLine HEIGHT="46" WIDTH="609" VPOS="872" HPOS="362"><String WC="0.6299999952" CONTENT="ist" HEIGHT="42" WIDTH="35" VPOS="874" HPOS="362"/><SP WIDTH="25" VPOS="875" HPOS="398"/><String WC="0.9666666389" CONTENT="der" HEIGHT="33" WIDTH="51" VPOS="875" HPOS="424"/><SP WIDTH="25" VPOS="875" HPOS="476"/><String WC="0.5278571248" CONTENT="Wasserschlauch" HEIGHT="44" WIDTH="245" VPOS="874" HPOS="502"/><SP WIDTH="25" VPOS="874" HPOS="748"/><String WC="0.8245454431" CONTENT="Utricularia" HEIGHT="36" WIDTH="197" VPOS="873" HPOS="774"/></TextLine>
<TextLine HEIGHT="47" WIDTH="608" VPOS="927" HPOS="361"><String WC="0.7950000167" CONTENT="zu" HEIGHT="32" WIDTH="36" VPOS="939" HPOS="361"/><SP WIDTH="24" VPOS="939" HPOS="398"/><String WC="0.7300000191" CONTENT="empfehlen)," HEIGHT="44" WIDTH="194" VPOS="928" HPOS="423"/><SP WIDTH="32" VPOS="930" HPOS="618"/><String WC="0.9433333278" CONTENT="die" HEIGHT="33" WIDTH="46" VPOS="929" HPOS="651"/><SP WIDTH="29" VPOS="940" HPOS="698"/><String WC="0.5666666627" CONTENT="mit" HEIGHT="33" WIDTH="56" VPOS="930" HPOS="728"/><SP WIDTH="23" VPOS="930" HPOS="785"/><String WC="0.7674999833" CONTENT="braunem," HEIGHT="44" WIDTH="160" VPOS="929" HPOS="809"/></TextLine>
<TextLine HEIGHT="49" WIDTH="606" VPOS="980" HPOS="362"><String WC="0.6863636374" CONTENT="schlickigem" HEIGHT="43" WIDTH="176" VPOS="984" HPOS="362"/><SP WIDTH="32" VPOS="981" HPOS="539"/><String WC="0.6887500286" CONTENT="Überzüge" HEIGHT="45" WIDTH="157" VPOS="981" HPOS="572"/><SP WIDTH="31" VPOS="984" HPOS="730"/><String WC="0.5857142806" CONTENT="besetzt" HEIGHT="45" WIDTH="101" VPOS="983" HPOS="762"/><SP WIDTH="32" VPOS="985" HPOS="864"/><String WC="0.8379999995" CONTENT="sind." HEIGHT="42" WIDTH="71" VPOS="984" HPOS="897"/></TextLine>
</TextBlock>
<Illustration ID="Page1_Block4" HEIGHT="232" WIDTH="604" VPOS="1131" HPOS="374"><Shape><Polygon POINTS="378,1134 982,1134 982,1364 378,1364 378,1134"/></Shape></Illustration>
<Illustration ID="Page1_Block5" HEIGHT="664" WIDTH="539" VPOS="732" HPOS="1013"><Shape><Polygon POINTS="1019,737 1556,737 1556,1399 1019,1399 1019,737"/></Shape></Illustration>
<TextBlock ID="Page1_Block6" HEIGHT="140" WIDTH="1258" VPOS="1423" HPOS="345" language="de" STYLEREFS="font0"><Shape><Polygon POINTS="348,1428 1606,1428 1606,1564 348,1564 348,1428"/></Shape>
<TextLine HEIGHT="32" WIDTH="1225" VPOS="1429" HPOS="362"><String WC="0.4325000048" CONTENT="Fig." HEIGHT="26" WIDTH="46" VPOS="1435" HPOS="362"/><SP WIDTH="22" VPOS="1438" HPOS="409"/><String WC="0.3540000021" CONTENT="J54;." HEIGHT="22" WIDTH="44" VPOS="1438" HPOS="432"/><SP WIDTH="33" VPOS="1434" HPOS="477"/><String WC="0.7620000243" CONTENT="Cosmarium." HEIGHT="22" WIDTH="139" VPOS="1433" HPOS="511"/><SP WIDTH="32" VPOS="1432" HPOS="651"/><String WC="0.4550000131" CONTENT="A." HEIGHT="21" WIDTH="30" VPOS="1432" HPOS="684"/><SP WIDTH="19" VPOS="1432" HPOS="715"/><String WC="0.7699999809" CONTENT="C." HEIGHT="21" WIDTH="25" VPOS="1432" HPOS="735"/><SP WIDTH="23" VPOS="1439" HPOS="761"/><String WC="0.6628571153" CONTENT="margaritaceum," HEIGHT="28" WIDTH="184" VPOS="1431" HPOS="785"/><SP WIDTH="30" VPOS="1432" HPOS="970"/><String WC="0.4524999857" CONTENT="Fig." HEIGHT="27" WIDTH="46" VPOS="1432" HPOS="1001"/><SP WIDTH="15" VPOS="1435" HPOS="1048"/><String WC="0.5400000215" CONTENT="J35." HEIGHT="23" WIDTH="44" VPOS="1435" HPOS="1064"/><SP WIDTH="31" VPOS="1432" HPOS="1109"/><String WC="0.7572727203" CONTENT="Clostcrium." HEIGHT="23" WIDTH="134" VPOS="1430" HPOS="1141"/><SP WIDTH="27" VPOS="1431" HPOS="1276"/><String WC="0.5199999809" CONTENT="A" HEIGHT="19" WIDTH="22" VPOS="1431" HPOS="1304"/><SP WIDTH="18" VPOS="1430" HPOS="1327"/><String WC="0.6366666555" CONTENT="CI." HEIGHT="21" WIDTH="33" VPOS="1430" HPOS="1346"/><SP WIDTH="16" VPOS="1430" HPOS="1380"/><String WC="0.6342856884" CONTENT="lunula," HEIGHT="25" WIDTH="86" VPOS="1430" HPOS="1397"/><SP WIDTH="21" VPOS="1429" HPOS="1484"/><String WC="0.6314285994" CONTENT="Linzel-" HEIGHT="26" WIDTH="81" VPOS="1429" HPOS="1506"/></TextLine>
<TextLine HEIGHT="32" WIDTH="1225" VPOS="1461" HPOS="361"><String WC="0.5600000024" CONTENT="a" HEIGHT="13" WIDTH="13" VPOS="1474" HPOS="361"/><SP WIDTH="14" VPOS="1468" HPOS="375"/><String WC="0.5083333254" CONTENT="Lnizelzellp," HEIGHT="26" WIDTH="128" VPOS="1467" HPOS="390"/><SP WIDTH="15" VPOS="1467" HPOS="519"/><String WC="0.25" CONTENT="b" HEIGHT="20" WIDTH="13" VPOS="1466" HPOS="535"/><SP WIDTH="14" VPOS="1466" HPOS="549"/><String WC="0.5822222233" CONTENT="Iochspore" HEIGHT="26" WIDTH="112" VPOS="1465" HPOS="564"/><SP WIDTH="14" VPOS="1471" HPOS="677"/><String WC="0.3700000048" CONTENT="mit" HEIGHT="20" WIDTH="39" VPOS="1465" HPOS="692"/><SP WIDTH="10" VPOS="1465" HPOS="732"/><String WC="0.3100000024" CONTENT="den" HEIGHT="20" WIDTH="37" VPOS="1465" HPOS="743"/><SP WIDTH="13" VPOS="1471" HPOS="781"/><String WC="0.4350000024" CONTENT="entleerten" HEIGHT="21" WIDTH="111" VPOS="1464" HPOS="795"/><SP WIDTH="8" VPOS="1464" HPOS="907"/><String WC="0.7940000296" CONTENT="Zell-" HEIGHT="27" WIDTH="55" VPOS="1464" HPOS="916"/><SP WIDTH="28" VPOS="1471" HPOS="972"/><String WC="0.6333333254" CONTENT="zelle," HEIGHT="25" WIDTH="54" VPOS="1465" HPOS="1001"/><SP WIDTH="15" VPOS="1464" HPOS="1056"/><String WC="0.2800000012" CONTENT="B" HEIGHT="20" WIDTH="18" VPOS="1464" HPOS="1072"/><SP WIDTH="14" VPOS="1464" HPOS="1091"/><String WC="0.9233333468" CONTENT="CI." HEIGHT="21" WIDTH="32" VPOS="1464" HPOS="1106"/><SP WIDTH="15" VPOS="1471" HPOS="1139"/><String WC="0.8188889027" CONTENT="rostratum" HEIGHT="19" WIDTH="111" VPOS="1465" HPOS="1155"/><SP WIDTH="12" VPOS="1463" HPOS="1267"/><String WC="0.2399999946" CONTENT="(nad?" 
HEIGHT="25" WIDTH="62" VPOS="1463" HPOS="1280"/><SP WIDTH="8" VPOS="1464" HPOS="1343"/><String WC="0.2949999869" CONTENT="Präparat" HEIGHT="26" WIDTH="110" VPOS="1463" HPOS="1352"/><SP WIDTH="10" VPOS="1465" HPOS="1463"/><String WC="0.1566666663" CONTENT="uon" HEIGHT="16" WIDTH="41" VPOS="1467" HPOS="1474"/><SP WIDTH="8" VPOS="1463" HPOS="1516"/><String WC="0.3420000076" CONTENT="pvof." HEIGHT="27" WIDTH="61" VPOS="1461" HPOS="1525"/></TextLine>
<TextLine HEIGHT="33" WIDTH="1224" VPOS="1493" HPOS="362"><String WC="0.6571428776" CONTENT="häuten." HEIGHT="27" WIDTH="88" VPOS="1499" HPOS="362"/><SP WIDTH="27" VPOS="1499" HPOS="451"/><String WC="0.400000006" CONTENT="B" HEIGHT="20" WIDTH="18" VPOS="1499" HPOS="479"/><SP WIDTH="15" VPOS="1499" HPOS="498"/><String WC="0.6918181777" CONTENT="Linzelzelle" HEIGHT="27" WIDTH="120" VPOS="1497" HPOS="514"/><SP WIDTH="22" VPOS="1503" HPOS="635"/><String WC="0.453333348" CONTENT="von" HEIGHT="14" WIDTH="42" VPOS="1503" HPOS="658"/><SP WIDTH="21" VPOS="1497" HPOS="701"/><String WC="0.9250000119" CONTENT="C." HEIGHT="20" WIDTH="24" VPOS="1497" HPOS="723"/><SP WIDTH="15" VPOS="1497" HPOS="748"/><String WC="0.8562499881" CONTENT="botrytis" HEIGHT="26" WIDTH="89" VPOS="1497" HPOS="764"/><SP WIDTH="18" VPOS="1502" HPOS="854"/><String WC="0.4499999881" CONTENT="mit" HEIGHT="21" WIDTH="40" VPOS="1496" HPOS="873"/><SP WIDTH="19" VPOS="1498" HPOS="914"/><String WC="0.6700000167" CONTENT="un-" HEIGHT="15" WIDTH="38" VPOS="1502" HPOS="934"/><SP WIDTH="29" VPOS="1496" HPOS="973"/><String WC="0.5155555606" CONTENT="Homfeld)," HEIGHT="27" WIDTH="115" VPOS="1496" HPOS="1003"/><SP WIDTH="20" VPOS="1497" HPOS="1119"/><String WC="0.3355555534" CONTENT=")ochspore" HEIGHT="28" WIDTH="112" VPOS="1495" HPOS="1140"/><SP WIDTH="14" VPOS="1501" HPOS="1253"/><String WC="0.853333354" CONTENT="mit" HEIGHT="20" WIDTH="39" VPOS="1495" HPOS="1268"/><SP WIDTH="13" VPOS="1495" HPOS="1308"/><String WC="0.5233333111" CONTENT="den" HEIGHT="20" WIDTH="37" VPOS="1495" HPOS="1322"/><SP WIDTH="13" VPOS="1494" HPOS="1360"/><String WC="0.4783333242" CONTENT="leeren" HEIGHT="22" WIDTH="65" VPOS="1494" HPOS="1374"/><SP WIDTH="10" VPOS="1494" HPOS="1440"/><String WC="0.6600000262" CONTENT="Zellhäuten," HEIGHT="28" WIDTH="135" VPOS="1493" HPOS="1451"/></TextLine>
<TextLine HEIGHT="29" WIDTH="839" VPOS="1527" HPOS="568"><String WC="0.4187499881" CONTENT="gleichen" HEIGHT="27" WIDTH="90" VPOS="1529" HPOS="568"/><SP WIDTH="14" VPOS="1529" HPOS="659"/><String WC="0.6687499881" CONTENT="Hälften." HEIGHT="27" WIDTH="97" VPOS="1529" HPOS="674"/><SP WIDTH="411" VPOS="1527" HPOS="772"/><String WC="0.7599999905" CONTENT="in" HEIGHT="21" WIDTH="22" VPOS="1527" HPOS="1184"/><SP WIDTH="13" VPOS="1534" HPOS="1207"/><String WC="0.4300000072" CONTENT="zwei" HEIGHT="26" WIDTH="50" VPOS="1527" HPOS="1221"/><SP WIDTH="15" VPOS="1527" HPOS="1272"/><String WC="0.6629999876" CONTENT="Ansichten." HEIGHT="26" WIDTH="119" VPOS="1527" HPOS="1288"/></TextLine>
</TextBlock></ComposedBlock>
<TextBlock ID="Page1_Block7" HEIGHT="610" WIDTH="1241" VPOS="1578" HPOS="354" language="de" STYLEREFS="font1"><Shape><Polygon POINTS="357,1583 1596,1583 1596,2189 357,2189 357,1583"/></Shape>
<TextLine HEIGHT="49" WIDTH="1224" VPOS="1583" HPOS="363"><String WC="0.6650000215" CONTENT="Zu" HEIGHT="34" WIDTH="45" VPOS="1589" HPOS="363"/><SP WIDTH="37" VPOS="1590" HPOS="409"/><String WC="0.7360000014" CONTENT="hause" HEIGHT="43" WIDTH="97" VPOS="1589" HPOS="447"/><SP WIDTH="37" VPOS="1588" HPOS="545"/><String WC="0.7419999838" CONTENT="spült" HEIGHT="43" WIDTH="77" VPOS="1587" HPOS="583"/><SP WIDTH="32" VPOS="1589" HPOS="661"/><String WC="0.6266666651" CONTENT="man" HEIGHT="24" WIDTH="75" VPOS="1597" HPOS="694"/><SP WIDTH="37" VPOS="1587" HPOS="770"/><String WC="0.9300000072" CONTENT="die" HEIGHT="34" WIDTH="46" VPOS="1587" HPOS="808"/><SP WIDTH="36" VPOS="1596" HPOS="855"/><String WC="0.8169230819" CONTENT="mitgenommenen" HEIGHT="43" WIDTH="280" VPOS="1586" HPOS="892"/><SP WIDTH="38" VPOS="1586" HPOS="1173"/><String WC="0.7077777982" CONTENT="Pröbchen," HEIGHT="43" WIDTH="172" VPOS="1585" HPOS="1212"/><SP WIDTH="39" VPOS="1584" HPOS="1385"/><String WC="0.5366666913" CONTENT="die" HEIGHT="35" WIDTH="46" VPOS="1584" HPOS="1425"/><SP WIDTH="40" VPOS="1594" HPOS="1472"/><String WC="0.6233333349" CONTENT="man" HEIGHT="24" WIDTH="74" VPOS="1594" HPOS="1513"/></TextLine>
<TextLine HEIGHT="48" WIDTH="1224" VPOS="1639" HPOS="363"><String WC="0.6377778053" CONTENT="natürlich" HEIGHT="43" WIDTH="148" VPOS="1644" HPOS="363"/><SP WIDTH="43" VPOS="1643" HPOS="512"/><String WC="0.5960000157" CONTENT="nicht" HEIGHT="43" WIDTH="75" VPOS="1642" HPOS="556"/><SP WIDTH="41" VPOS="1642" HPOS="632"/><String WC="0.7549999952" CONTENT="literweise" HEIGHT="43" WIDTH="157" VPOS="1642" HPOS="674"/><SP WIDTH="42" VPOS="1642" HPOS="832"/><String WC="0.6299999952" CONTENT="sammelt," HEIGHT="43" WIDTH="156" VPOS="1641" HPOS="875"/><SP WIDTH="43" VPOS="1641" HPOS="1032"/><String WC="1." CONTENT="in" HEIGHT="34" WIDTH="30" VPOS="1641" HPOS="1076"/><SP WIDTH="41" VPOS="1651" HPOS="1107"/><String WC="0.6600000262" CONTENT="wenig" HEIGHT="44" WIDTH="102" VPOS="1640" HPOS="1149"/><SP WIDTH="37" VPOS="1641" HPOS="1252"/><String WC="0.6949999928" CONTENT="Wasser" HEIGHT="42" WIDTH="118" VPOS="1640" HPOS="1290"/><SP WIDTH="37" VPOS="1650" HPOS="1409"/><String WC="0.8700000048" CONTENT="ab" HEIGHT="33" WIDTH="39" VPOS="1640" HPOS="1447"/><SP WIDTH="38" VPOS="1639" HPOS="1487"/><String WC="0.3733333349" CONTENT="und" HEIGHT="33" WIDTH="61" VPOS="1639" HPOS="1526"/></TextLine>
<TextLine HEIGHT="48" WIDTH="1226" VPOS="1693" HPOS="362"><String WC="0.7250000238" CONTENT="bringt" HEIGHT="42" WIDTH="107" VPOS="1699" HPOS="362"/><SP WIDTH="43" VPOS="1700" HPOS="469"/><String WC="0.6857143044" CONTENT="winzige" HEIGHT="44" WIDTH="131" VPOS="1697" HPOS="512"/><SP WIDTH="36" VPOS="1698" HPOS="643"/><String WC="0.7214285731" CONTENT="Partien" HEIGHT="43" WIDTH="129" VPOS="1697" HPOS="679"/><SP WIDTH="46" VPOS="1697" HPOS="808"/><String WC="0.7133333087" CONTENT="des" HEIGHT="35" WIDTH="53" VPOS="1696" HPOS="854"/><SP WIDTH="46" VPOS="1706" HPOS="907"/><String WC="0.7216666937" CONTENT="abgeklopften" HEIGHT="43" WIDTH="222" VPOS="1696" HPOS="953"/><SP WIDTH="38" VPOS="1696" HPOS="1175"/><String WC="0.5181818008" CONTENT="Scf]lid?es-" HEIGHT="43" WIDTH="151" VPOS="1695" HPOS="1213"/><SP WIDTH="32" VPOS="1705" HPOS="1364"/><String WC="0.7933333516" CONTENT="mit" HEIGHT="35" WIDTH="57" VPOS="1694" HPOS="1396"/><SP WIDTH="37" VPOS="1696" HPOS="1453"/><String WC="0.7400000095" CONTENT="einem" HEIGHT="35" WIDTH="98" VPOS="1694" HPOS="1490"/></TextLine>
<TextLine HEIGHT="47" WIDTH="1224" VPOS="1749" HPOS="363"><String WC="0.7430769205" CONTENT="Wassertropfen" HEIGHT="43" WIDTH="240" VPOS="1753" HPOS="363"/><SP WIDTH="32" VPOS="1763" HPOS="604"/><String WC="0.6000000238" CONTENT="auf" HEIGHT="42" WIDTH="55" VPOS="1752" HPOS="637"/><SP WIDTH="29" VPOS="1752" HPOS="693"/><String WC="0.6359999776" CONTENT="einen" HEIGHT="34" WIDTH="87" VPOS="1752" HPOS="723"/><SP WIDTH="31" VPOS="1753" HPOS="811"/><String WC="0.7069230676" CONTENT="Objektträger." HEIGHT="44" WIDTH="233" VPOS="1751" HPOS="843"/><SP WIDTH="51" VPOS="1752" HPOS="1077"/><String WC="0.6866666675" CONTENT="Mit" HEIGHT="35" WIDTH="65" VPOS="1750" HPOS="1129"/><SP WIDTH="29" VPOS="1752" HPOS="1195"/><String WC="0.6750000119" CONTENT="zwei" HEIGHT="42" WIDTH="75" VPOS="1750" HPOS="1225"/><SP WIDTH="30" VPOS="1750" HPOS="1301"/><String WC="0.7866666913" CONTENT="feinen" HEIGHT="42" WIDTH="101" VPOS="1750" HPOS="1332"/><SP WIDTH="30" VPOS="1751" HPOS="1434"/><String WC="0.6683333516" CONTENT="Nadeln" HEIGHT="35" WIDTH="122" VPOS="1749" HPOS="1465"/></TextLine>
<TextLine HEIGHT="48" WIDTH="1224" VPOS="1804" HPOS="363"><String WC="0.7785714269" CONTENT="breitet" HEIGHT="33" WIDTH="109" VPOS="1809" HPOS="363"/><SP WIDTH="23" VPOS="1810" HPOS="473"/><String WC="0.4099999964" CONTENT="man" HEIGHT="24" WIDTH="74" VPOS="1818" HPOS="497"/><SP WIDTH="24" VPOS="1808" HPOS="572"/><String WC="0.8100000024" CONTENT="das" HEIGHT="33" WIDTH="56" VPOS="1808" HPOS="597"/><SP WIDTH="19" VPOS="1808" HPOS="654"/><String WC="0.7633333206" CONTENT="Klümpchen" HEIGHT="43" WIDTH="186" VPOS="1807" HPOS="674"/><SP WIDTH="24" VPOS="1817" HPOS="861"/><String WC="0.678888917" CONTENT="möglichst" HEIGHT="44" WIDTH="151" VPOS="1806" HPOS="886"/><SP WIDTH="23" VPOS="1809" HPOS="1038"/><String WC="0.6850000024" CONTENT="weit" HEIGHT="34" WIDTH="71" VPOS="1806" HPOS="1062"/><SP WIDTH="23" VPOS="1809" HPOS="1134"/><String WC="0.6025000215" CONTENT="aus," HEIGHT="33" WIDTH="68" VPOS="1816" HPOS="1158"/><SP WIDTH="25" VPOS="1805" HPOS="1227"/><String WC="0.7080000043" CONTENT="damit" HEIGHT="34" WIDTH="98" VPOS="1805" HPOS="1253"/><SP WIDTH="23" VPOS="1807" HPOS="1352"/><String WC="1." CONTENT="es" HEIGHT="24" WIDTH="31" VPOS="1815" HPOS="1376"/><SP WIDTH="25" VPOS="1807" HPOS="1408"/><String WC="0.8366666436" CONTENT="übersicht" HEIGHT="44" WIDTH="140" VPOS="1804" HPOS="1434" SUBS_TYPE="HypPart1" SUBS_CONTENT="übersichtlich"/><HYP CONTENT="­"/></TextLine>
<TextLine HEIGHT="48" WIDTH="1224" VPOS="1859" HPOS="363"><String WC="0.6650000215" CONTENT="lich" HEIGHT="43" WIDTH="52" VPOS="1864" HPOS="363" SUBS_TYPE="HypPart2" SUBS_CONTENT="übersichtlich"/><SP WIDTH="31" VPOS="1864" HPOS="416"/><String WC="0.5849999785" CONTENT="wird" HEIGHT="33" WIDTH="76" VPOS="1864" HPOS="448"/><SP WIDTH="31" VPOS="1863" HPOS="525"/><String WC="0.9066666961" CONTENT="und" HEIGHT="34" WIDTH="61" VPOS="1862" HPOS="557"/><SP WIDTH="31" VPOS="1862" HPOS="619"/><String WC="0.8728571534" CONTENT="bedeckt" HEIGHT="34" WIDTH="119" VPOS="1862" HPOS="651"/><SP WIDTH="30" VPOS="1863" HPOS="771"/><String WC="0.7833333611" CONTENT="das" HEIGHT="33" WIDTH="57" VPOS="1863" HPOS="802"/><SP WIDTH="24" VPOS="1862" HPOS="860"/><String WC="0.7537500262" CONTENT="Präparat" HEIGHT="43" WIDTH="161" VPOS="1862" HPOS="885"/><SP WIDTH="27" VPOS="1863" HPOS="1047"/><String WC="0.7566666603" CONTENT="mit" HEIGHT="34" WIDTH="56" VPOS="1861" HPOS="1075"/><SP WIDTH="24" VPOS="1863" HPOS="1132"/><String WC="0.7179999948" CONTENT="einem" HEIGHT="34" WIDTH="96" VPOS="1861" HPOS="1157"/><SP WIDTH="24" VPOS="1861" HPOS="1254"/><String WC="0.6629999876" CONTENT="veckglase." HEIGHT="42" WIDTH="171" VPOS="1861" HPOS="1279"/><SP WIDTH="47" VPOS="1860" HPOS="1451"/><String WC="1." CONTENT="Beim" HEIGHT="34" WIDTH="88" VPOS="1859" HPOS="1499"/></TextLine>
<TextLine HEIGHT="48" WIDTH="1223" VPOS="1914" HPOS="364"><String WC="0.5649999976" CONTENT="Züchen" HEIGHT="43" WIDTH="115" VPOS="1919" HPOS="364"/><SP WIDTH="24" VPOS="1929" HPOS="480"/><String WC="0.8666666746" CONTENT="mit" HEIGHT="35" WIDTH="55" VPOS="1918" HPOS="505"/><SP WIDTH="22" VPOS="1919" HPOS="561"/><String WC="0.8566666842" CONTENT="mittlerer" HEIGHT="33" WIDTH="148" VPOS="1918" HPOS="584"/><SP WIDTH="24" VPOS="1918" HPOS="733"/><String WC="0.6583333611" CONTENT="Vergrößerung" HEIGHT="44" WIDTH="238" VPOS="1917" HPOS="758"/><SP WIDTH="31" VPOS="1927" HPOS="997"/><String WC="0.4524999857" CONTENT="wird" HEIGHT="34" WIDTH="78" VPOS="1916" HPOS="1029"/><SP WIDTH="24" VPOS="1917" HPOS="1108"/><String WC="0.6800000072" CONTENT="man" HEIGHT="25" WIDTH="73" VPOS="1926" HPOS="1133"/><SP WIDTH="25" VPOS="1916" HPOS="1207"/><String WC="0.6316666603" CONTENT="Formen" HEIGHT="42" WIDTH="132" VPOS="1916" HPOS="1233"/><SP WIDTH="24" VPOS="1915" HPOS="1366"/><String WC="0.7300000191" CONTENT="finden," HEIGHT="43" WIDTH="116" VPOS="1914" HPOS="1391"/><SP WIDTH="32" VPOS="1915" HPOS="1508"/><String WC="0.8633333445" CONTENT="die" HEIGHT="33" WIDTH="46" VPOS="1915" HPOS="1541"/></TextLine>
<TextLine HEIGHT="48" WIDTH="1222" VPOS="1969" HPOS="365"><String WC="0.6333333254" CONTENT="aus" HEIGHT="23" WIDTH="58" VPOS="1984" HPOS="365"/><SP WIDTH="29" VPOS="1984" HPOS="424"/><String WC="0.7825000286" CONTENT="zwei" HEIGHT="41" WIDTH="74" VPOS="1974" HPOS="454"/><SP WIDTH="31" VPOS="1973" HPOS="529"/><String WC="0.6437500119" CONTENT="einander" HEIGHT="34" WIDTH="147" VPOS="1972" HPOS="561"/><SP WIDTH="30" VPOS="1983" HPOS="709"/><String WC="0.6938889027" CONTENT="gegenüberstehenden" HEIGHT="44" WIDTH="336" VPOS="1972" HPOS="740"/><SP WIDTH="32" VPOS="1972" HPOS="1077"/><String WC="0.5190908909" CONTENT="Halbkreisen" HEIGHT="45" WIDTH="198" VPOS="1970" HPOS="1110"/><SP WIDTH="33" VPOS="1970" HPOS="1309"/><String WC="0.2849999964" CONTENT="in" HEIGHT="34" WIDTH="31" VPOS="1970" HPOS="1343"/><SP WIDTH="33" VPOS="1970" HPOS="1375"/><String WC="0.8033333421" CONTENT="der" HEIGHT="33" WIDTH="51" VPOS="1970" HPOS="1409"/><SP WIDTH="30" VPOS="1971" HPOS="1461"/><String WC="0.8820000291" CONTENT="Mitte" HEIGHT="35" WIDTH="95" VPOS="1969" HPOS="1492"/></TextLine>
<TextLine HEIGHT="49" WIDTH="1227" VPOS="2023" HPOS="361"><String WC="0.6323529482" CONTENT="zusammengewachsen" HEIGHT="43" WIDTH="350" VPOS="2028" HPOS="361"/><SP WIDTH="32" VPOS="2038" HPOS="711"/><String WC="0.7599999905" CONTENT="erscheinen" HEIGHT="44" WIDTH="163" VPOS="2027" HPOS="743"/><SP WIDTH="26" VPOS="2025" HPOS="906"/><String WC="0.8854545355" CONTENT="(Cosmarium," HEIGHT="44" WIDTH="238" VPOS="2025" HPOS="932"/><SP WIDTH="31" VPOS="2026" HPOS="1170"/><String WC="0.7774999738" CONTENT="Fig." HEIGHT="42" WIDTH="68" VPOS="2026" HPOS="1201"/><SP WIDTH="30" VPOS="2029" HPOS="1269"/><String WC="0.6140000224" CONTENT="134)," HEIGHT="43" WIDTH="84" VPOS="2024" HPOS="1299"/><SP WIDTH="34" VPOS="2035" HPOS="1383"/><String WC="0.6825000048" CONTENT="oder" HEIGHT="33" WIDTH="72" VPOS="2025" HPOS="1417"/><SP WIDTH="24" VPOS="2034" HPOS="1489"/><String WC="0.6833333373" CONTENT="man" HEIGHT="24" WIDTH="75" VPOS="2034" HPOS="1513"/></TextLine>
<TextLine HEIGHT="47" WIDTH="1223" VPOS="2079" HPOS="365"><String WC="0.7799999714" CONTENT="findet" HEIGHT="41" WIDTH="94" VPOS="2083" HPOS="365"/><SP WIDTH="18" VPOS="2083" HPOS="460"/><String WC="0.8355555534" CONTENT="türkische" HEIGHT="44" WIDTH="142" VPOS="2082" HPOS="479"/><SP WIDTH="15" VPOS="2083" HPOS="622"/><String WC="0.6140000224" CONTENT="Halbmonde," HEIGHT="43" WIDTH="203" VPOS="2082" HPOS="638"/><SP WIDTH="20" VPOS="2083" HPOS="842"/><String WC="0.7233333588" CONTENT="die" HEIGHT="34" WIDTH="46" VPOS="2082" HPOS="863"/><SP WIDTH="21" VPOS="2092" HPOS="910"/><String WC="0.5899999738" CONTENT="genau" HEIGHT="33" WIDTH="101" VPOS="2091" HPOS="932"/><SP WIDTH="17" VPOS="2081" HPOS="1034"/><String WC="0.6620000005" CONTENT="durch" HEIGHT="43" WIDTH="86" VPOS="2081" HPOS="1052"/><SP WIDTH="20" VPOS="2081" HPOS="1139"/><String WC="0.6340000033" CONTENT="einen" HEIGHT="35" WIDTH="87" VPOS="2080" HPOS="1160"/><SP WIDTH="15" VPOS="2081" HPOS="1248"/><String WC="0.7910000086" CONTENT="Ouerstrich" HEIGHT="43" WIDTH="168" VPOS="2080" HPOS="1264"/><SP WIDTH="21" VPOS="2080" HPOS="1433"/><String WC="0.5950000286" CONTENT="halbiert" HEIGHT="44" WIDTH="133" VPOS="2079" HPOS="1455"/></TextLine>
<TextLine HEIGHT="50" WIDTH="1222" VPOS="2133" HPOS="365"><String WC="0.5674999952" CONTENT="sind" HEIGHT="43" WIDTH="62" VPOS="2137" HPOS="365"/><SP WIDTH="37" VPOS="2137" HPOS="428"/><String WC="0.8000000119" CONTENT="und" HEIGHT="34" WIDTH="61" VPOS="2137" HPOS="466"/><SP WIDTH="38" VPOS="2136" HPOS="528"/><String WC="0.6499999762" CONTENT="an" HEIGHT="24" WIDTH="40" VPOS="2147" HPOS="567"/><SP WIDTH="33" VPOS="2137" HPOS="608"/><String WC="0.8183333278" CONTENT="beiden" HEIGHT="35" WIDTH="107" VPOS="2137" HPOS="642"/><SP WIDTH="34" VPOS="2138" HPOS="750"/><String WC="0.4499999881" CONTENT="Enden" HEIGHT="34" WIDTH="106" VPOS="2137" HPOS="785"/><SP WIDTH="36" VPOS="2137" HPOS="892"/><String WC="0.8600000143" CONTENT="je" HEIGHT="44" WIDTH="27" VPOS="2137" HPOS="929"/><SP WIDTH="34" VPOS="2146" HPOS="957"/><String WC="0.7225000262" CONTENT="eine" HEIGHT="34" WIDTH="64" VPOS="2136" HPOS="992"/><SP WIDTH="33" VPOS="2136" HPOS="1057"/><String WC="0.9139999747" CONTENT="kreisrunde" HEIGHT="35" WIDTH="180" VPOS="2135" HPOS="1091"/><SP WIDTH="33" VPOS="2135" HPOS="1272"/><String WC="0.6079999804" CONTENT="Blase" HEIGHT="43" WIDTH="89" VPOS="2135" HPOS="1306"/><SP WIDTH="33" VPOS="2145" HPOS="1396"/><String WC="0.7266666889" CONTENT="enthalten" HEIGHT="46" WIDTH="157" VPOS="2133" HPOS="1430"/></TextLine>
</TextBlock><GraphicalElement ID="Page1_Block8" HEIGHT="184" WIDTH="8" VPOS="900" HPOS="1258"/><GraphicalElement ID="Page1_Block9" HEIGHT="90" WIDTH="3" VPOS="896" HPOS="1427"/><GraphicalElement ID="Page1_Block10" HEIGHT="146" WIDTH="7" VPOS="885" HPOS="1544"/>
</PrintSpace>
</Page>
</Layout>
</alto>

@ -0,0 +1,37 @@
<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
<Layout>
<Page WIDTH="1148" HEIGHT="1852" PHYSICAL_IMG_NR="0" ID="page_0">
<PrintSpace HPOS="0" VPOS="0" WIDTH="1148" HEIGHT="1852">
<TextBlock ID="block_3" HPOS="135" VPOS="251" WIDTH="741" HEIGHT="47">
<TextLine ID="line_3" HPOS="135" VPOS="251" WIDTH="741" HEIGHT="47">
<String ID="string_5" HPOS="135" VPOS="251" WIDTH="65" HEIGHT="34" WC="0.89" CONTENT="über"/><SP WIDTH="19" VPOS="251" HPOS="200"/>
<String ID="string_6" HPOS="219" VPOS="256" WIDTH="41" HEIGHT="31" WC="0.96" CONTENT="die"/><SP WIDTH="23" VPOS="256" HPOS="260"/>
<String ID="string_7" HPOS="283" VPOS="258" WIDTH="87" HEIGHT="30" WC="0.87" CONTENT="vielen"/><SP WIDTH="16" VPOS="258" HPOS="370"/>
<String ID="string_8" HPOS="386" VPOS="259" WIDTH="118" HEIGHT="37" WC="0.96" CONTENT="Sorgen"/><SP WIDTH="14" VPOS="259" HPOS="504"/>
<String ID="string_9" HPOS="518" VPOS="265" WIDTH="90" HEIGHT="32" WC="0.21" CONTENT="wegen"/><SP WIDTH="12" VPOS="265" HPOS="608"/>
<String ID="string_10" HPOS="620" VPOS="254" WIDTH="130" HEIGHT="42" WC="0.21" CONTENT="deſſelben"/><SP WIDTH="24" VPOS="254" HPOS="750"/>
<String ID="string_11" HPOS="774" VPOS="255" WIDTH="102" HEIGHT="43" WC="0.74" CONTENT="vergaß"/>
</TextLine>
</TextBlock>
<TextBlock ID="block_4" HPOS="134" VPOS="304" WIDTH="740" HEIGHT="40">
<TextLine ID="line_4" HPOS="134" VPOS="304" WIDTH="740" HEIGHT="40">
<String ID="string_12" HPOS="134" VPOS="304" WIDTH="203" HEIGHT="40" WC="0.75" CONTENT="Hartkopf,"/><SP WIDTH="30" VPOS="304" HPOS="337"/>
<String ID="string_13" HPOS="367" VPOS="310" WIDTH="45" HEIGHT="27" WC="0.93" CONTENT="der"/><SP WIDTH="24" VPOS="310" HPOS="412"/>
<String ID="string_14" HPOS="436" VPOS="309" WIDTH="74" HEIGHT="35" WC="0.59" CONTENT="Frau"/><SP WIDTH="22" VPOS="309" HPOS="510"/>
<String ID="string_15" HPOS="532" VPOS="306" WIDTH="189" HEIGHT="36" WC="0.23" CONTENT="Amtmännin"/><SP WIDTH="16" VPOS="306" HPOS="721"/>
<String ID="string_16" HPOS="737" VPOS="307" WIDTH="66" HEIGHT="34" WC="0.52" CONTENT="das"/><SP WIDTH="16" VPOS="307" HPOS="803"/>
<String ID="string_17" HPOS="819" VPOS="318" WIDTH="55" HEIGHT="24" WC="0.0" CONTENT="ver-"/>
</TextLine>
</TextBlock>
<TextBlock ID="block_5" HPOS="134" VPOS="356" WIDTH="761" HEIGHT="46">
<TextLine ID="line_5" HPOS="134" VPOS="356" WIDTH="761" HEIGHT="46">
<String ID="string_18" HPOS="134" VPOS="356" WIDTH="137" HEIGHT="37" WC="0.92" CONTENT="ſprochene"/><SP WIDTH="31" VPOS="356" HPOS="271"/>
<String ID="string_19" HPOS="302" VPOS="365" WIDTH="32" HEIGHT="30" WC="0.73" CONTENT="zu"/><SP WIDTH="29" VPOS="365" HPOS="334"/>
<String ID="string_20" HPOS="363" VPOS="356" WIDTH="170" HEIGHT="39" WC="0.52" CONTENT="überliefern."/><SP WIDTH="28" VPOS="356" HPOS="533"/>
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>

File diff suppressed because it is too large

@ -0,0 +1 @@
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

@ -0,0 +1,108 @@
from .util import unzip
from .. import align, seq_align, distance


def test_left_empty():
    result = list(align('', 'foo'))
    expected = [(None, 'f'), (None, 'o'), (None, 'o')]
    assert result == expected


def test_right_empty():
    result = list(align('foo', ''))
    expected = [('f', None), ('o', None), ('o', None)]
    assert result == expected


def test_left_longer():
    result = list(align('food', 'foo'))
    expected = [('f', 'f'), ('o', 'o'), ('o', 'o'), ('d', None)]
    assert result == expected


def test_right_longer():
    result = list(align('foo', 'food'))
    expected = [('f', 'f'), ('o', 'o'), ('o', 'o'), (None, 'd')]
    assert result == expected


def test_some_diff():
    result = list(align('abcde', 'aaadef'))
    left, right = unzip(result)
    assert list(left) == ['a', 'b', 'c', 'd', 'e', None]
    assert list(right) == ['a', 'a', 'a', 'd', 'e', 'f']


def test_longer():
    s1 = 'Dies ist eine Tst!'
    s2 = 'Dies ist ein Test.'

    result = list(align(s1, s2)) # ; diffprint(*unzip(result))
    expected = [('D', 'D'), ('i', 'i'), ('e', 'e'), ('s', 's'), (' ', ' '),
                ('i', 'i'), ('s', 's'), ('t', 't'), (' ', ' '),
                ('e', 'e'), ('i', 'i'), ('n', 'n'), ('e', None), (' ', ' '),
                ('T', 'T'), (None, 'e'), ('s', 's'), ('t', 't'), ('!', '.')]

    assert result == expected


def test_completely_different():
    assert len(list(align('abcde', 'fghij'))) == 5


def test_with_some_fake_ocr_errors():
    result = list(align('Über die vielen Sorgen wegen desselben vergaß',
                        'SomeJunk MoreJunk Übey die vielen Sorgen wegen AdditionalJunk deffelben vcrgab'))
    left, right = unzip(result)

    # Beginning
    assert list(left[:18]) == [None]*18
    assert list(right[:18]) == list('SomeJunk MoreJunk ')

    # End
    assert list(left[-1:]) == ['ß']
    assert list(right[-1:]) == ['b']


def test_lines():
    """Test comparing list of lines.

    This mainly serves as documentation for comparing lists of lines.
    """
    result = list(seq_align(
        ['This is a line.', 'This is another', 'And the last line'],
        ['This is a line.', 'This is another', 'J u n k', 'And the last line']
    ))
    left, right = unzip(result)
    assert list(left) == ['This is a line.', 'This is another', None, 'And the last line']
    assert list(right) == ['This is a line.', 'This is another', 'J u n k', 'And the last line']


def test_lines_similar():
    """Test comparing list of lines while using a "weaker equivalence".

    This mainly serves as documentation.
    """

    class SimilarString:
        def __init__(self, string):
            self._string = string

        def __eq__(self, other):
            return distance(self._string, other._string) < 2 # XXX NOT the final version

        def __ne__(self, other):
            return not self.__eq__(other)

        def __repr__(self):
            return 'SimilarString(\'%s\')' % self._string

        def __hash__(self):
            return hash(self._string)

    result = list(seq_align(
        [SimilarString('This is a line.'), SimilarString('This is another'), SimilarString('And the last line')],
        [SimilarString('This is a ljne.'), SimilarString('This is another'), SimilarString('J u n k'), SimilarString('And the last line')]
    ))
    left, right = unzip(result)
    assert list(left) == [SimilarString('This is a line.'), SimilarString('This is another'), None, SimilarString('And the last line')]
    assert list(right) == [SimilarString('This is a ljne.'), SimilarString('This is another'), SimilarString('J u n k'), SimilarString('And the last line')]
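
The alignment tests above pair up items from two sequences and use None to mark a gap on either side. As a rough illustration of that output shape only, here is a minimal sketch built on the standard library's difflib. This is a swapped-in technique, not the project's `align` or `seq_align`; the name `align_sketch` is hypothetical, and difflib's matching is not guaranteed to produce the same pairs as an edit-distance alignment when the inputs differ a lot.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def align_sketch(left, right):
    """Yield (left_item, right_item) pairs; None marks a gap on that side.

    Hypothetical helper for illustration, not the alignment under test.
    """
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Each opcode covers left[i1:i2] and right[j1:j2]; the shorter
        # slice is padded with None so that every item gets a partner.
        yield from zip_longest(left[i1:i2], right[j1:j2])
```

Because it only slices and compares items, the same function accepts strings (character alignment) and lists of lines (line alignment), which mirrors how the tests exercise both cases.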

@ -0,0 +1,37 @@
from __future__ import division, print_function

import math
import unicodedata

from .. import character_error_rate


def test_character_error_rate():
    assert character_error_rate('a', 'a') == 0
    assert character_error_rate('a', 'b') == 1/1
    assert character_error_rate('Foo', 'Bar') == 3/3

    assert character_error_rate('Foo', '') == 3/3

    assert character_error_rate('', '') == 0
    assert math.isinf(character_error_rate('', 'Foo'))

    assert character_error_rate('Foo', 'Food') == 1/3
    assert character_error_rate('Fnord', 'Food') == 2/5
    assert character_error_rate('Müll', 'Mull') == 1/4
    assert character_error_rate('Abstand', 'Sand') == 4/7


def test_character_error_rate_hard():
    s1 = unicodedata.normalize('NFC', 'Schlyñ lorem ipsum.')
    s2 = unicodedata.normalize('NFD', 'Schlyñ lorem ipsum!') # Different, decomposed!
    assert character_error_rate(s1, s2) == 1/19

    s1 = 'Schlyñ'
    assert len(s1) == 6 # This ends with LATIN SMALL LETTER N WITH TILDE, so 6 code points
    s2 = 'Schlym̃'
    assert len(s2) == 7 # This, OTOH, ends with LATIN SMALL LETTER M + COMBINING TILDE, 7 code points

    # Both strings have the same length in terms of grapheme clusters. So the CER should be symmetrical.
    assert character_error_rate(s2, s1) == 1/6
    assert character_error_rate(s1, s2) == 1/6
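
The last test relies on both strings having six grapheme clusters even though their code point counts differ. The sketch below shows one way to see this with only the standard library. It is an approximation under stated assumptions: it normalizes to NFC and attaches combining marks to the preceding character, which is not a full implementation of Unicode grapheme cluster segmentation and is not claimed to be how `character_error_rate` works internally. The name `clusters_sketch` is hypothetical.

```python
import unicodedata


def clusters_sketch(text):
    """Group each base character with the combining marks that follow it.

    Approximation for illustration only (assumption: combining marks are
    the only thing that extends a cluster).
    """
    clusters = []
    for char in unicodedata.normalize('NFC', text):
        if clusters and unicodedata.combining(char):
            # A non-zero combining class means the mark belongs to the previous cluster.
            clusters[-1] += char
        else:
            clusters.append(char)
    return clusters
```

With this grouping, 'Schlyñ' and 'Schlym̃' both split into six units that differ only in the last one, which is consistent with the expected rate of 1/6 in either direction.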

@ -0,0 +1,40 @@
from __future__ import division, print_function

import unicodedata

from .. import levenshtein, distance


def test_levenshtein():
    assert levenshtein('a', 'a') == 0
    assert levenshtein('a', 'b') == 1
    assert levenshtein('Foo', 'Bar') == 3

    assert levenshtein('', '') == 0
    assert levenshtein('Foo', '') == 3
    assert levenshtein('', 'Foo') == 3

    assert levenshtein('Foo', 'Food') == 1
    assert levenshtein('Fnord', 'Food') == 2
    assert levenshtein('Müll', 'Mull') == 1
    assert levenshtein('Abstand', 'Sand') == 4


def test_levenshtein_other_sequences():
    assert levenshtein(['a', 'ab'], ['a', 'ab', 'c']) == 1
    assert levenshtein(['a', 'ab'], ['a', 'c']) == 1


def test_distance():
    assert distance('Fnord', 'Food') == 2
    assert distance('Müll', 'Mull') == 1

    word1 = unicodedata.normalize('NFC', 'Schlyñ')
    word2 = unicodedata.normalize('NFD', 'Schlyñ') # Different, decomposed!
    assert distance(word1, word2) == 0

    word1 = 'Schlyñ'
    assert len(word1) == 6 # This ends with LATIN SMALL LETTER N WITH TILDE, so 6 code points
    word2 = 'Schlym̃'
    assert len(word2) == 7 # This, OTOH, ends with LATIN SMALL LETTER M + COMBINING TILDE, 7 code points
    assert distance(word1, word2) == 1
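
For readers unfamiliar with the quantity these tests check, the following is a minimal sketch of the textbook dynamic-programming Levenshtein distance over arbitrary sequences. It is not the project's `levenshtein` implementation, only an illustration that should agree with the values asserted above; the name `levenshtein_sketch` is hypothetical.

```python
def levenshtein_sketch(seq1, seq2):
    """Edit distance between two sequences (strings or lists of items)."""
    # previous[j] is the distance between the items of seq1 seen so far
    # (initially none) and the first j items of seq2.
    previous = list(range(len(seq2) + 1))
    for i, item1 in enumerate(seq1, start=1):
        current = [i]  # turning i items into an empty prefix takes i deletions
        for j, item2 in enumerate(seq2, start=1):
            cost = 0 if item1 == item2 else 1
            current.append(min(
                previous[j] + 1,         # delete item1
                current[j - 1] + 1,      # insert item2
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]
```

Keeping only two rows of the table uses memory proportional to the shorter dimension of the problem, at the cost of not being able to backtrace the edit operations afterwards.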

@ -0,0 +1,48 @@
import unicodedata

from .. import seq_editops, editops


def test_trivial():
    assert seq_editops('abc', 'abc') == []
    assert seq_editops('', '') == []


def test_insert():
    assert seq_editops('bc', 'abc') == [('insert', 0, 0)]
    assert seq_editops('ac', 'abc') == [('insert', 1, 1)]
    assert seq_editops('ab', 'abc') == [('insert', 2, 2)]
    assert seq_editops('', 'a') == [('insert', 0, 0)]


def test_multiple():
    assert seq_editops('bcd', 'abce') == [('insert', 0, 0), ('replace', 2, 3)]


def test_delete():
    assert seq_editops('abcdef', 'cdef') == [('delete', 0, 0), ('delete', 1, 0)]
    assert seq_editops('Xabcdef', 'Xcdef') == [('delete', 1, 1), ('delete', 2, 1)]
    assert seq_editops('abcdefg', 'acdefX') == [('delete', 1, 1), ('replace', 6, 5)]
    assert seq_editops('abcde', 'aabcd') == [('insert', 1, 1), ('delete', 4, 5)]
    assert seq_editops('Foo', '') == [('delete', 0, 0), ('delete', 1, 0), ('delete', 2, 0)]
    assert seq_editops('Foolish', 'Foo') == [('delete', 3, 3), ('delete', 4, 3), ('delete', 5, 3), ('delete', 6, 3)]


def test_ambiguous():
    assert seq_editops('bcd', 'abcef') == [('insert', 0, 0), ('replace', 2, 3), ('insert', 3, 4)]


def test_editops():
    """Test editops() in cases where dealing with grapheme clusters matters"""

    # In these cases, one of the words has a composed form, the other one does not.
    assert editops('Schlyñ', 'Schlym̃') == [('replace', 5, 5)]
    assert editops('oͤde', 'öde') == [('replace', 0, 0)]


def test_editops_canonically_equivalent():
    left = unicodedata.lookup('LATIN SMALL LETTER N') + unicodedata.lookup('COMBINING TILDE')
    right = unicodedata.lookup('LATIN SMALL LETTER N WITH TILDE')
    assert left != right
    assert unicodedata.normalize('NFC', left) == unicodedata.normalize('NFC', right)
    assert editops(left, right) == []

@ -0,0 +1,23 @@
from __future__ import division, print_function

import os

import pytest
from lxml import etree as ET

from .. import align, page_text

data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')


@pytest.mark.integration
def test_align_page_files():
    # In the fake OCR file, we changed 2 characters and replaced a fi ligature with fi.
    # → 4 elements in the alignment should be different.
    # NOTE: In this example, it doesn't matter that we work with "characters", not grapheme clusters.
    gt = page_text(ET.parse(os.path.join(data_dir, 'test-gt.page2018.xml')))
    ocr = page_text(ET.parse(os.path.join(data_dir, 'test-fake-ocr.page2018.xml')))

    result = list(align(gt, ocr))
    assert sum(left != right for left, right in result) == 4

@ -0,0 +1,35 @@
from __future__ import division, print_function

import os

import pytest
from lxml import etree as ET

from .. import character_error_rate, page_text, alto_text

data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')


@pytest.mark.integration
def test_character_error_rate_between_page_files():
    # In the fake OCR file, we changed 2 characters and replaced a fi ligature with fi.
    gt = page_text(ET.parse(os.path.join(data_dir, 'test-gt.page2018.xml')))
    ocr = page_text(ET.parse(os.path.join(data_dir, 'test-fake-ocr.page2018.xml')))
    assert character_error_rate(gt, ocr) == 4/(470 + 1 + 311) # 2 TextRegions, 1 \n


@pytest.mark.integration
def test_character_error_rate_between_page_alto():
    gt = page_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan.gt.page.xml')))
    ocr = alto_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan.ocr.tesseract.alto.xml')))

    assert gt == ocr
    assert character_error_rate(gt, ocr) == 0


@pytest.mark.integration
def test_character_error_rate_between_page_alto_2():
    gt = page_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan-bad.gt.page.xml')))
    ocr = alto_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan-bad.ocr.tesseract.alto.xml')))

    assert character_error_rate(gt, ocr) == 8/591 # Manually verified

@ -0,0 +1,39 @@
import os
import json

import pytest

from .util import working_directory
from ..cli import process


def test_cli_json(tmp_path):
    """Test that the cli/process() yields a loadable JSON report"""

    # XXX Path.__str__() is necessary for Python 3.5
    with working_directory(str(tmp_path)):
        with open('gt.txt', 'w') as gtf:
            gtf.write('AAAAA')
        with open('ocr.txt', 'w') as ocrf:
            ocrf.write('AAAAB')

        process('gt.txt', 'ocr.txt', 'report')
        with open('report.json', 'r') as jsonf:
            j = json.load(jsonf)
            assert j['cer'] == pytest.approx(0.2)


def test_cli_json_cer_is_infinity(tmp_path):
    """Test that the cli/process() yields a loadable JSON report when CER == inf"""

    # XXX Path.__str__() is necessary for Python 3.5
    with working_directory(str(tmp_path)):
        with open('gt.txt', 'w') as gtf:
            gtf.write('') # Empty to yield CER == inf
        with open('ocr.txt', 'w') as ocrf:
            ocrf.write('Not important')

        process('gt.txt', 'ocr.txt', 'report')
        with open('report.json', 'r') as jsonf:
            j = json.load(jsonf)
            assert j['cer'] == pytest.approx(float('inf'))
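
The infinity test above only passes if the report survives a JSON round trip with an infinite CER. A short sketch of how Python's standard `json` module behaves here, assuming the report is written with something equivalent to `json.dumps` (how `process()` actually serializes the report is not shown in this diff):

```python
import json

report = {'cer': float('inf')}

# By default, Python's json module emits the non-standard token Infinity ...
encoded = json.dumps(report)
# ... and its own parser accepts that token again.
decoded = json.loads(encoded)

print(encoded)
print(decoded['cer'] == float('inf'))

# Strict JSON has no representation for infinity. With allow_nan=False the
# encoder raises ValueError instead of writing the Infinity token.
try:
    json.dumps(report, allow_nan=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True
print(strict_rejected)
```

The round trip works within Python, but other JSON parsers may reject `Infinity`, which is worth keeping in mind if the report is consumed elsewhere.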

@ -0,0 +1,35 @@
from __future__ import division, print_function

import os

import pytest
from lxml import etree as ET

from .. import distance, page_text, alto_text

data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')


@pytest.mark.integration
def test_distance_between_page_files():
    # In the fake OCR file, we changed 2 characters and replaced a fi ligature with fi.
    gt = page_text(ET.parse(os.path.join(data_dir, 'test-gt.page2018.xml')))
    ocr = page_text(ET.parse(os.path.join(data_dir, 'test-fake-ocr.page2018.xml')))
    assert distance(gt, ocr) == 4


@pytest.mark.integration
def test_distance_between_page_alto():
    gt = page_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan.gt.page.xml')))
    ocr = alto_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan.ocr.tesseract.alto.xml')))

    assert gt == ocr
    assert distance(gt, ocr) == 0


@pytest.mark.integration
def test_distance_between_page_alto_2():
    gt = page_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan-bad.gt.page.xml')))
    ocr = alto_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan-bad.ocr.tesseract.alto.xml')))

    assert distance(gt, ocr) == 8 # Manually verified

@ -0,0 +1,37 @@
import os
import re
import shutil
import json
from pathlib import Path

from click.testing import CliRunner
import pytest

from .util import working_directory
from ..ocrd_cli import ocrd_dinglehopper

data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')


def test_ocrd_cli(tmp_path):
    """Test OCR-D interface"""

    # XXX Path.__str__() is necessary for Python 3.5

    # Copy test workspace
    test_workspace_dir_source = Path(data_dir) / 'actevedef_718448162'
    test_workspace_dir = tmp_path / 'test_ocrd_cli'
    shutil.copytree(str(test_workspace_dir_source), str(test_workspace_dir))

    # Run through the OCR-D interface
    with working_directory(str(test_workspace_dir)):
        runner = CliRunner()
        result = runner.invoke(ocrd_dinglehopper, [
            '-m', 'mets.xml',
            '-I', 'OCR-D-GT-PAGE,OCR-D-OCR-CALAMARI',
            '-O', 'OCR-D-OCR-CALAMARI-EVAL'
        ])
    assert result.exit_code == 0
    result_json = list((test_workspace_dir / 'OCR-D-OCR-CALAMARI-EVAL').glob('*.json'))
    assert json.load(open(str(result_json[0])))['cer'] < 0.03

@ -0,0 +1,43 @@
from __future__ import division, print_function

import os

import pytest
from lxml import etree as ET

from .. import word_error_rate, words, page_text, alto_text

data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')


@pytest.mark.integration
def test_word_error_rate_between_page_files():
    # In the fake OCR file, we changed 2 characters and replaced a fi ligature with fi. → 3 changed words
    gt = page_text(ET.parse(os.path.join(data_dir, 'test-gt.page2018.xml')))

    gt_word_count = 7+6+5+8+7+6+7+8+6+7+7+5+6+8+8+7+7+6+5+4 # Manually verified word count per line
    assert len(list(words(gt))) == gt_word_count

    ocr = page_text(ET.parse(os.path.join(data_dir, 'test-fake-ocr.page2018.xml')))
    assert word_error_rate(gt, ocr) == 3/gt_word_count


@pytest.mark.integration
def test_word_error_rate_between_page_alto():
    gt = page_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan.gt.page.xml')))
    ocr = alto_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan.ocr.tesseract.alto.xml')))

    assert gt == ocr
    assert word_error_rate(gt, ocr) == 0


@pytest.mark.integration
def test_word_error_rate_between_page_alto_2():
    gt = page_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan-bad.gt.page.xml')))

    gt_word_count = 14+18+17+14+17+17+3 # Manually verified word count per line
    assert len(list(words(gt))) == gt_word_count

    ocr = alto_text(ET.parse(os.path.join(data_dir, 'lorem-ipsum', 'lorem-ipsum-scan-bad.ocr.tesseract.alto.xml')))

    assert word_error_rate(gt, ocr) == 7/gt_word_count # Manually verified, 6 words are wrong, 1 got split (=2 errors)

@ -0,0 +1,110 @@
import os
import re
import lxml.etree as ET
import textwrap
import pytest
from .. import alto_namespace, alto_text, page_namespace, page_text, text
data_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')
def test_alto_namespace():
tree = ET.parse(os.path.join(data_dir, 'test.alto3.xml'))
assert alto_namespace(tree) == 'http://www.loc.gov/standards/alto/ns-v3#'
def test_alto_text():
tree = ET.parse(os.path.join(data_dir, 'test.alto3.xml'))
result = alto_text(tree)
expected = textwrap.dedent("""\
über die vielen Sorgen wegen deſſelben vergaß
Hartkopf, der Frau Amtmännin das ver-
ſprochene zu überliefern.""")
assert result == expected
def test_alto_text_ALTO1():
tree = ET.parse(os.path.join(data_dir, 'test.alto1.xml'))
assert "being erected at the Broadway stock" in alto_text(tree)
def test_alto_text_ALTO2():
tree = ET.parse(os.path.join(data_dir, 'test.alto2.xml'))
assert "Halbmonde, die genau durch einen Ouerstrich halbiert\nsind und an beiden Enden" in alto_text(tree)
def test_alto_text_ALTO3():
tree = ET.parse(os.path.join(data_dir, 'test.alto3.xml'))
assert "über die vielen Sorgen wegen deſſelben vergaß" in alto_text(tree)
def test_page_namespace():
tree = ET.parse(os.path.join(data_dir, 'test.page2018.xml'))
assert page_namespace(tree) == 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15'
def test_page_test():
tree = ET.parse(os.path.join(data_dir, 'test.page2018.xml'))
result = page_text(tree)
expected = textwrap.dedent("""\
ber die vielen Sorgen wegen deelben vergaß
Hartkopf, der Frau Amtmnnin das ver
ſproene zu berliefern. Ein Erpreer
wurde an ihn abgeſit, um ihn ums Him
melswien zu ſagen, daß er das Verſproene
glei den Augenbli berbringen mte, die
Frau Amtmnnin htte auf ihn verlaen,
und nun wßte e nit, was e anfangen
ſote. Den Augenbli ſote er kommen,
ſon vergieng e in ihrer Ang. Die
Ge wren ſon angekommen, und es fehlte
ihr do no an aem.
Hartkopf mußte er bennen, und
endli na langem Nadenken fiel es ihm er
wieder ein. Er langte den Zettel aus dem
Accisbue heraus, und ſagte ſeiner Frau, daß
e das, was da wre, herbeyſaffen mte.
Jndeß mangelten do einige Generalia, die
alſo wegfielen. Hartkopf gieng ſelb
mit und berbrate es. """)
assert result == expected
def test_page_with_empty_region():
# This file contains an empty TextRegion:
#
# <TextRegion id="region0000">
# <Coords points="488,133 1197,133 1197,193 488,193"/>
# <TextEquiv>
# <Unicode></Unicode>
# </TextEquiv>
# </TextRegion>
tree = ET.parse(os.path.join(data_dir, 'brochrnx_73075507X/00000139.ocrd-tess.ocr.page.xml'))
result = page_text(tree)
assert result
def test_page_order():
# This file contains TextRegions where file order is not the same as reading order.
tree = ET.parse(os.path.join(data_dir, 'order.page.xml'))
result = page_text(tree)
assert re.search(r'Herr Konfrater.*75.*Etwas f.r Wittwen.*Ein gewi.er Lord.*76\. Die', result, re.DOTALL)
def test_page_mixed_regions():
# This file contains ImageRegions and TextRegions in the ReadingOrder
tree = ET.parse(os.path.join(data_dir, 'mixed-regions.page.xml'))
with pytest.warns(UserWarning, match=r'Not a TextRegion'):
result = page_text(tree)
assert 'non exaudiam uos. Chriſtiani uero quia orant iuxta' in result
def test_text():
assert "being erected at the Broadway stock" in text(os.path.join(data_dir, 'test.alto1.xml'))
assert "wieder ein. — Er langte den Zettel aus dem" in text(os.path.join(data_dir, 'test.page2018.xml'))
assert "Lorem ipsum" in text(os.path.join(data_dir, 'test.txt'))

@ -0,0 +1,37 @@
from __future__ import division, print_function
import math
from .. import word_error_rate, words
def test_words():
result = list(words('Der schnelle [„braune“] Fuchs kann keine 3,14 Meter springen, oder?'))
expected = ['Der', 'schnelle', 'braune', 'Fuchs', 'kann', 'keine', '3,14', 'Meter', 'springen', 'oder']
assert result == expected
def test_words_private_use_area():
result = list(words(
'ber die vielen Sorgen wegen deelben vergaß Hartkopf, der Frau Amtmnnin das ver⸗\n'
'ſproene zu berliefern.'))
expected = [
'ber', 'die', 'vielen', 'Sorgen', 'wegen', 'deelben', 'vergaß', 'Hartkopf',
'der', 'Frau', 'Amtmnnin', 'das', 'ver',
'ſproene', 'zu', 'berliefern']
assert result == expected
def test_word_error_rate():
assert word_error_rate('Dies ist ein Beispielsatz!', 'Dies ist ein Beispielsatz!') == 0
assert word_error_rate('Dies. ist ein Beispielsatz!', 'Dies ist ein Beispielsatz!') == 0
assert word_error_rate('Dies. ist ein Beispielsatz!', 'Dies ist ein Beispielsatz.') == 0
assert word_error_rate('Dies ist ein Beispielsatz!', 'Dies ist ein Beispielsarz:') == 1/4
assert word_error_rate('Dies ist ein Beispielsatz!', 'Dies ein ist Beispielsatz!') == 2/4
assert word_error_rate('Dies ist ein Beispielsatz!', '') == 4/4
assert math.isinf(word_error_rate('', 'Dies ist ein Beispielsatz!'))
assert word_error_rate('', '') == 0
assert word_error_rate('Schlyñ lorem ipsum dolor sit amet,', 'Schlym̃ lorem ipsum dolor sit amet.') == 1/6

@ -0,0 +1,38 @@
from itertools import zip_longest
from typing import Iterable

import colorama
import os


def diffprint(x, y):
    """Print elements or lists x and y, with differences in red"""

    def _diffprint(x, y):
        if x != y:
            print(colorama.Fore.RED, x, y, colorama.Fore.RESET)
        else:
            print(x, y)

    if isinstance(x, Iterable):
        for xe, ye in zip_longest(x, y):
            _diffprint(xe, ye)
    else:
        _diffprint(x, y)


def unzip(l):
    return zip(*l)


class working_directory:
    """Context manager to temporarily change the working directory"""

    def __init__(self, wd):
        self.wd = wd

    def __enter__(self):
        self.old_wd = os.getcwd()
        os.chdir(self.wd)

    def __exit__(self, etype, value, traceback):
        os.chdir(self.old_wd)
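
The `working_directory` class above restores the previous directory in `__exit__`, so the change is undone even when the body raises. The same behaviour can be sketched with a generator-based context manager; the name `working_directory_cm` is hypothetical and only used for illustration here:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory_cm(wd):
    """Temporarily change the working directory (hypothetical generator-based variant)."""
    old_wd = os.getcwd()
    os.chdir(wd)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, like __exit__ in the class version
        os.chdir(old_wd)


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory_cm(tmp):
        inside = os.getcwd()
after = os.getcwd()

# realpath() guards against platforms where the temp dir is reached via a symlink
print(os.path.realpath(inside) == os.path.realpath(tmp))
print(before == after)
```

The class-based form in the diff and this variant are interchangeable for the tests; the class version avoids the extra import.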

@ -0,0 +1,63 @@
from __future__ import division

import unicodedata

import uniseg.wordbreak

from .edit_distance import levenshtein


def words(s):
    # Patch uniseg.wordbreak.word_break to deal with our private use characters. See also
    # https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/WordBreakProperty.txt
    old_word_break = uniseg.wordbreak.word_break

    def new_word_break(c, index=0):
        if 0xE000 <= ord(c) <= 0xF8FF: # Private Use Area
            return 'ALetter'
        else:
            return old_word_break(c, index)
    uniseg.wordbreak.word_break = new_word_break

    # Check if c is an unwanted character, i.e. whitespace, punctuation, or similar
    def unwanted(c):

        # See https://www.fileformat.info/info/unicode/category/index.htm
        # and https://unicodebook.readthedocs.io/unicode.html#categories
        unwanted_categories = 'O', 'M', 'P', 'Z', 'S'
        unwanted_subcategories = 'Cc', 'Cf'

        subcat = unicodedata.category(c)
        cat = subcat[0]
        return cat in unwanted_categories or subcat in unwanted_subcategories

    # We follow Unicode Standard Annex #29 on Unicode Text Segmentation here: Split on word boundaries using
    # uniseg.wordbreak.words() and ignore all "words" that contain only whitespace, punctuation "or similar characters."
    for word in uniseg.wordbreak.words(s):
        if all(unwanted(c) for c in word):
            pass
        else:
            yield word


def words_normalized(s):
    return words(unicodedata.normalize('NFC', s))


def word_error_rate(reference, compared):
    if isinstance(reference, str):
        reference_seq = list(words_normalized(reference))
        compared_seq = list(words_normalized(compared))
    else:
        reference_seq = list(reference)
        compared_seq = list(compared)

    d = levenshtein(reference_seq, compared_seq)
    if d == 0:
        return 0

    n = len(reference_seq)
    if n == 0:
        return float('inf')

    return d / n
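
The word error rate above is the Levenshtein distance between the two word sequences divided by the number of reference words, with special cases for a zero distance and an empty reference. A self-contained sketch of that calculation follows. It is an approximation under stated assumptions: `levenshtein_seq` and `simple_wer` are hypothetical names, and the tokenizer is a crude regular expression rather than the UAX #29 segmentation from `uniseg` used in the diff.

```python
import re


def levenshtein_seq(a, b):
    """Edit distance between two sequences; insert, delete and replace each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # replace, or match at no cost
        prev = cur
    return prev[-1]


def simple_wer(reference, compared):
    """Word error rate using a regex tokenizer (assumption, not the uniseg-based one)."""
    ref_words = re.findall(r'\w+', reference)
    cmp_words = re.findall(r'\w+', compared)
    d = levenshtein_seq(ref_words, cmp_words)
    if d == 0:
        return 0
    if not ref_words:
        return float('inf')  # errors exist but there is nothing to divide by
    return d / len(ref_words)


print(simple_wer('Dies ist ein Beispielsatz!', 'Dies ein ist Beispielsatz!'))
print(simple_wer('Dies ist ein Beispielsatz!', 'Dies ist ein Beispielsarz:'))
print(simple_wer('', 'Dies ist ein Beispielsatz!'))
```

Because the rate is normalized by the reference length only, it can exceed 1 when the compared text contains many extra words.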

@ -1,2 +0,0 @@
from .main import *
from .ocrd_cli import *

File diff suppressed because it is too large

@ -1,19 +0,0 @@
{
  "version": "0.0.1",
  "tools": {
    "ocrd-sbb-textline-detector": {
      "executable": "ocrd-sbb-textline-detector",
      "description": "Detect lines",
      "steps": ["layout/segmentation/line"],
      "input_file_grp": [
        "OCR-D-IMG"
      ],
      "output_file_grp": [
        "OCR-D-SBB-SEG-LINE"
      ],
      "parameters": {
        "model": {"type": "string", "format": "file", "cacheable": true}
      }
    }
  }
}

@ -1,110 +0,0 @@
import json
import os
import tempfile

import click
import ocrd_models.ocrd_page
from ocrd import Processor
from ocrd.decorators import ocrd_cli_options, ocrd_cli_wrap_processor
from ocrd_modelfactory import page_from_file
from ocrd_models import OcrdFile
from ocrd_models.ocrd_page_generateds import MetadataItemType, LabelsType, LabelType
from ocrd_utils import concat_padded, getLogger, MIMETYPE_PAGE
from pkg_resources import resource_string

from qurator.sbb_textline_detector import textline_detector

log = getLogger('processor.OcrdSbbTextlineDetectorRecognize')

OCRD_TOOL = json.loads(resource_string(__name__, 'ocrd-tool.json').decode('utf8'))


@click.command()
@ocrd_cli_options
def ocrd_sbb_textline_detector(*args, **kwargs):
    return ocrd_cli_wrap_processor(OcrdSbbTextlineDetectorRecognize, *args, **kwargs)


TOOL = 'ocrd_sbb_textline_detector'


class OcrdSbbTextlineDetectorRecognize(Processor):

    def __init__(self, *args, **kwargs):
        kwargs['ocrd_tool'] = OCRD_TOOL['tools'][TOOL]
        kwargs['version'] = OCRD_TOOL['version']
        super(OcrdSbbTextlineDetectorRecognize, self).__init__(*args, **kwargs)

    def _make_file_id(self, input_file, input_file_grp, n):
        file_id = input_file.ID.replace(input_file_grp, self.output_file_grp)
        if file_id == input_file.ID:
            file_id = concat_padded(self.output_file_grp, n)
        return file_id

    def _resolve_image_file(self, input_file: OcrdFile) -> str:
        if input_file.mimetype == MIMETYPE_PAGE:
            pcgts = page_from_file(self.workspace.download_file(input_file))
            page = pcgts.get_Page()
            image_file = page.imageFilename
        else:
            image_file = input_file.local_filename
        return image_file

    def process(self):
        for n, page_id in enumerate(self.workspace.mets.physical_pages):
            input_file = self.workspace.mets.find_files(fileGrp=self.input_file_grp, pageId=page_id)[0]
            log.info("INPUT FILE %i / %s", n, input_file)

            file_id = self._make_file_id(input_file, self.input_file_grp, n)

            # Process the files
            try:
                os.mkdir(self.output_file_grp)
            except FileExistsError:
                pass

            with tempfile.TemporaryDirectory() as tmp_dirname:
                # Segment the image
                image_file = self._resolve_image_file(input_file)
                model = self.parameter['model']
                x = textline_detector(image_file, tmp_dirname, file_id, model)
                x.run()

                # Read segmentation results
                tmp_filename = os.path.join(tmp_dirname, file_id) + '.xml'
                tmp_pcgts = ocrd_models.ocrd_page.parse(tmp_filename)
                tmp_page = tmp_pcgts.get_Page()

            # Create a new PAGE file from the input file
            pcgts = page_from_file(self.workspace.download_file(input_file))
            page = pcgts.get_Page()

            # Merge results → PAGE file
            page.set_PrintSpace(tmp_page.get_PrintSpace())
            page.set_ReadingOrder(tmp_page.get_ReadingOrder())
            page.set_TextRegion(tmp_page.get_TextRegion())

            # Save metadata about this operation
            metadata = pcgts.get_Metadata()
            metadata.add_MetadataItem(
                MetadataItemType(type_="processingStep",
                                 name=self.ocrd_tool['steps'][0],
                                 value=TOOL,
                                 Labels=[LabelsType(
                                     externalModel="ocrd-tool",
                                     externalId="parameters",
                                     Label=[LabelType(type_=name, value=self.parameter[name])
                                            for name in self.parameter.keys()])]))

            self.workspace.add_file(
                ID=file_id,
                file_grp=self.output_file_grp,
                pageId=page_id,
                mimetype='application/vnd.prima.page+xml',
                local_filename=os.path.join(self.output_file_grp, file_id) + '.xml',
                content=ocrd_models.ocrd_page.to_xml(pcgts)
            )


if __name__ == '__main__':
    ocrd_sbb_textline_detector()

@ -1,10 +1,7 @@
-opencv-python-headless
-matplotlib
-seaborn
-tqdm
-keras
-shapely
-scikit-learn
-tensorflow-gpu < 2.0
-scipy
-ocrd >= 2.0.0
+click
+jinja2
+lxml
+uniseg
+numpy
+colorama
+ocrd >= 1.0.0b15

@ -5,34 +5,24 @@ with open('requirements.txt') as fp:
     install_requires = fp.read()
 setup(
-    name="qurator-sbb-textline",
-    version="0.0.1",
-    author="The Qurator Team",
-    author_email="qurator@sbb.spk-berlin.de",
-    description="Qurator",
-    long_description=open("README.md", "r", encoding='utf-8').read(),
-    long_description_content_type="text/markdown",
-    keywords='qurator',
+    name='dinglehopper',
+    author='Mike Gerber, The QURATOR SPK Team',
+    author_email='mike.gerber@sbb.spk-berlin.de, qurator@sbb.spk-berlin.de',
+    description='The OCR evaluation tool',
+    long_description=open('README.md', 'r', encoding='utf-8').read(),
+    long_description_content_type='text/markdown',
+    keywords='qurator ocr',
     license='Apache',
-    url="https://qurator.ai",
-    packages=find_packages(exclude=["*.tests", "*.tests.*",
-                                    "tests.*", "tests"]),
+    namespace_packages=['qurator'],
+    packages=find_packages(exclude=['*.tests', '*.tests.*', 'tests.*', 'tests']),
     install_requires=install_requires,
     package_data={
-        '': ['*.json'],
+        '': ['*.json', 'templates/*'],
     },
     entry_points={
         'console_scripts': [
-            "sbb_textline_detector=qurator.sbb_textline_detector:main",
-            "ocrd-sbb-textline-detector=qurator.sbb_textline_detector:ocrd_sbb_textline_detector",
+            'dinglehopper=qurator.dinglehopper.cli:main',
+            'ocrd-dinglehopper=qurator.dinglehopper.ocrd_cli:ocrd_dinglehopper',
         ]
-    },
-    python_requires='>=3.6.0',
-    tests_require=['pytest'],
-    classifiers=[
-        'Intended Audience :: Science/Research',
-        'License :: OSI Approved :: Apache Software License',
-        'Programming Language :: Python :: 3',
-        'Topic :: Scientific/Engineering :: Artificial Intelligence',
-    ],
+    }
 )
