mirror of https://github.com/qurator-spk/neat.git synced 2025-06-09 11:49:54 +02:00

Update Preprocessing.md

Clemens Neudecker 2019-11-19 23:50:52 +01:00 committed by GitHub
parent 564a9ee851
commit 5f6b8bc9c3
GPG key ID: 4AEE18F83AFDEB23

@@ -3,7 +3,15 @@
 The preprocessing pipeline that is developed at the
 [Berlin State Library](http://staatsbibliothek-berlin.de/)
 comprises the following steps:
-- textline extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
-- OCR + word segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
+- Layout Analysis & Textline Extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
+- OCR & Word Segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
 - Tokenization
-- Pretagging @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
+- Named Entity Recognition @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
+### Layout Analysis & Textline Extraction
+### OCR & Word Segmentation
+### Tokenization
+### Named Entity Recognition
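
The four steps named in this change (layout analysis and textline extraction, OCR and word segmentation, tokenization, named entity recognition) run in sequence, each consuming the output of the previous one. The sketch below only illustrates that chaining. Every function in it is a hypothetical stand-in written for this example: it does not call `sbb_pixelwise_segmentation`, `ocrd_tesserocr`, or `sbb_ner`, and the regex tokenizer and the capitalization-based tagger are placeholders, not the project's methods.

```python
import re
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]


def extract_textlines(doc: Document) -> Document:
    # Stand-in for layout analysis: treat each non-empty line of the
    # input string as one detected text line.
    lines = [ln for ln in doc["page"].splitlines() if ln.strip()]
    return {**doc, "textlines": lines}


def recognize_text(doc: Document) -> Document:
    # Stand-in for OCR: the "recognized" text is the text lines joined.
    return {**doc, "text": " ".join(doc["textlines"])}


def tokenize(doc: Document) -> Document:
    # Placeholder tokenizer: runs of word characters, or single
    # punctuation marks.
    return {**doc, "tokens": re.findall(r"\w+|[^\w\s]", doc["text"])}


def tag_entities(doc: Document) -> Document:
    # Placeholder tagger: flag tokens that start with an uppercase letter.
    tags = ["ENT" if tok[:1].isupper() else "O" for tok in doc["tokens"]]
    return {**doc, "tags": tags}


# Stage order mirrors the list of steps in the README.
PIPELINE: List[Stage] = [extract_textlines, recognize_text, tokenize, tag_entities]


def run_pipeline(doc: Document, stages: List[Stage] = PIPELINE) -> Document:
    # Feed the output of each stage into the next one.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline({"page": "Staatsbibliothek zu\nBerlin, 1914."})
```

Keeping each stage as a function from document to document means any single stand-in can be swapped for a wrapper around the real tool without touching the others.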