Update Preprocessing.md

pull/40/head
Clemens Neudecker 5 years ago committed by GitHub
parent 5da39a06e0
commit 6a49815db3

@@ -14,7 +14,11 @@ OCR is based on [OCR-D](https://github.com/OCR-D)'s [ocrd_tesserocr](https://git
 ### Tokenization
-[Transformation](https://github.com/qurator-spk/neath/tree/master/tools) of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neath/blob/master/docs/User_Guide.md#data-format).
+* [Transformation](https://github.com/qurator-spk/neath/tree/master/tools) of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neath/blob/master/docs/User_Guide.md#data-format).
+* Postprocessing:
+  * replace ``„`` and ``“`` with ``"``
+  * sentence boundaries
+  * punctuation
 ### Named Entity Recognition
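
The quote replacement listed under Postprocessing could look roughly like the sketch below. The helper name `normalize_quotes` and the `QUOTE_MAP` table are hypothetical and are not taken from the neath tools; the sketch only covers the two characters named in the diff and does not attempt the sentence-boundary or punctuation steps, which the commit does not specify.

```python
# Hypothetical helper: the names and the mapping table are assumptions
# for illustration, not part of the neath tooling.
QUOTE_MAP = str.maketrans({
    "\u201e": '"',  # „ (double low-9 quotation mark)
    "\u201c": '"',  # “ (left double quotation mark)
})


def normalize_quotes(text: str) -> str:
    """Replace „ and “ with the ASCII double quote; leave other text unchanged."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    print(normalize_quotes("\u201eBerlin\u201c"))
```

Using a translation table keeps the replacement to a single pass over the string and makes it easy to extend if further quote characters turn up in the OCR output.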
