mirror of
https://github.com/qurator-spk/neat.git
synced 2025-06-09 19:59:55 +02:00
Update Preprocessing.md
This commit is contained in:
parent 5da39a06e0
commit 6a49815db3
1 changed file with 5 additions and 1 deletion
@@ -14,7 +14,11 @@ OCR is based on [OCR-D](https://github.com/OCR-D)'s [ocrd_tesserocr](https://git
### Tokenization
* [Transformation](https://github.com/qurator-spk/neath/tree/master/tools) of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neath/blob/master/docs/User_Guide.md#data-format).
* Postprocessing:
    * replace ``„`` and ``“`` with ``"``
    * sentence boundaries
    * punctuation
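
The quote replacement listed under postprocessing can be sketched as follows. This is a minimal illustration, not the code used by the tools linked above: the names `QUOTE_MAP` and `normalize_quotes` are hypothetical, and the sketch assumes that tokens are plain Python strings. The sentence-boundary and punctuation steps are not specified in enough detail here to sketch.

```python
# Illustrative sketch only; QUOTE_MAP and normalize_quotes are hypothetical
# names and do not come from the neat repository.

# Map the German low opening quote (U+201E) and the left double
# quotation mark (U+201C) to the ASCII double quote.
QUOTE_MAP = str.maketrans({"\u201e": '"', "\u201c": '"'})


def normalize_quotes(token: str) -> str:
    """Return the token with U+201E and U+201C replaced by an ASCII double quote."""
    return token.translate(QUOTE_MAP)


if __name__ == "__main__":
    # Tokens as they might appear after the PAGE-XML to TSV transformation.
    tokens = ["\u201eBerlin\u201c", "ist", "gro\u00df", "."]
    print([normalize_quotes(t) for t in tokens])
```

Using a translation table keeps the replacement to a single pass per token and makes it easy to add further characters if other quote variants turn up in the OCR output.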
### Named Entity Recognition