Newest OCR-D wasn't happy with the test data anymore (see gh-89). I'm not sure if the
test data was invalid the way it was, but having a LOCTYPE certainly is "prettier" so
adding it. This fixes the test again.
See https://github.com/qurator-spk/setuptools_ocrd/issues/10 - The sdist does not
contain ocrd-tool.json, so the wheel built from it does not get the proper version.
Needs to be fixed in setuptools_ocrd, then MANIFEST.in can be removed again.
-"dinglehopper uses to have its own (very inefficient) Levenshtein edit distance implementation, but now uses RapidFuzz."
+"dinglehopper used to have its own (very inefficient) Levenshtein edit distance implementation, but now uses RapidFuzz."
]
},
{
@@ -391,7 +391,7 @@
"\\text{CER} = \\frac{i + s + d}{n}\n",
"$$\n",
"\n",
-"where $i$ is the number of inserts, $s$ the number of substitutions, $d$ the number of deletions and $n$ is the number of characters in the reference text. (The text is not super clear about $n$ being the number of characters in the reference text, but it seems appropiate as they *are* clear about this when computing the word error rate.)"
+"where $i$ is the number of inserts, $s$ the number of substitutions, $d$ the number of deletions and $n$ is the number of characters in the reference text. (The text is not super clear about $n$ being the number of characters in the reference text, but it seems appropriate as they *are* clear about this when computing the word error rate.)"
]
},
{
@@ -680,7 +680,7 @@
" return cat in unwanted_categories or subcat in unwanted_subcategories\n",
"\n",
" # We follow Unicode Standard Annex #29 on Unicode Text Segmentation here: Split on word boundaries using\n",
-" # uniseg.wordbreak.words() and ignore all \"words\" that contain only whitespace, punctation \"or similar characters.\"\n",
+" # uniseg.wordbreak.words() and ignore all \"words\" that contain only whitespace, punctuation \"or similar characters.\"\n",