mirror of https://github.com/qurator-spk/neat.git, synced 2025-06-09 11:49:54 +02:00
Update Preprocessing.md
parent 5da39a06e0
commit 6a49815db3
1 changed file with 5 additions and 1 deletion
@@ -14,7 +14,11 @@ OCR is based on [OCR-D](https://github.com/OCR-D)'s [ocrd_tesserocr](https://git

 ### Tokenization

-[Transformation](https://github.com/qurator-spk/neath/tree/master/tools) of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neath/blob/master/docs/User_Guide.md#data-format).
+* [Transformation](https://github.com/qurator-spk/neath/tree/master/tools) of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neath/blob/master/docs/User_Guide.md#data-format).
+* Postprocessing:
+    * replace ``„`` and ``“`` with ``"``
+    * sentence boundaries
+    * punctuation

 ### Named Entity Recognition

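The quotation-mark replacement named in the postprocessing list can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code in the repository's tools: `QUOTE_TABLE` and `normalize_quotes` are hypothetical names, and the sketch covers only the two characters the list mentions.

```python
# Hypothetical helper: maps the German-style quotation marks „ (U+201E)
# and “ (U+201C) to a plain ASCII double quote.
QUOTE_TABLE = str.maketrans({"\u201e": '"', "\u201c": '"'})


def normalize_quotes(token: str) -> str:
    """Return the token with „ and “ replaced by a plain double quote."""
    return token.translate(QUOTE_TABLE)


if __name__ == "__main__":
    print(normalize_quotes("\u201eBerlin\u201c"))
```

Using a translation table keeps the replacement to a single pass per token and leaves all other characters, including other quotation-mark variants, untouched.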