Mirror of https://github.com/qurator-spk/neat.git, synced 2025-06-09 19:59:55 +02:00
Update Preprocessing.md
Parent: 564a9ee851
Commit: 5f6b8bc9c3
1 changed file with 11 additions and 3 deletions
@@ -3,7 +3,15 @@
 The preprocessing pipeline that is developed at the
 [Berlin State Library](http://staatsbibliothek-berlin.de/)
 comprises the following steps:
-- textline extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
-- OCR + word segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
+- Layout Analysis & Textline Extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
+- OCR & Word Segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
 - Tokenization
-- Pretagging @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
+- Named Entity Recognition @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
+
+### Layout Analysis & Textline Extraction
+
+### OCR & Word Segmentation
+
+### Tokenization
+
+### Named Entity Recognition
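After this commit the pipeline has four steps: layout analysis and textline extraction, OCR and word segmentation, tokenization, and named entity recognition. The following minimal sketch models only that ordering. It assumes each step consumes the output of the step before it. The `make_stage` and `run_pipeline` helpers, the page dict, and the file name `page_0001.tif` are all hypothetical placeholders. The sketch does not call sbb_pixelwise_segmentation, ocrd_tesserocr, or sbb_ner, and it does not reflect their APIs.

```python
from typing import Callable

# Hypothetical stage signature: takes a page description, returns an updated copy.
# No real tool (sbb_pixelwise_segmentation, ocrd_tesserocr, sbb_ner) is invoked here.
Stage = Callable[[dict], dict]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that only records that it ran."""
    def stage(page: dict) -> dict:
        # Build a new dict and a new history list so the caller's page is not mutated.
        return {**page, "history": page.get("history", []) + [name]}
    return stage


# Step order as listed in Preprocessing.md after this commit.
# Assumption: each step consumes the previous step's output.
PIPELINE: list[tuple[str, Stage]] = [
    ("Layout Analysis & Textline Extraction", make_stage("layout")),
    ("OCR & Word Segmentation", make_stage("ocr")),
    ("Tokenization", make_stage("tokenize")),
    ("Named Entity Recognition", make_stage("ner")),
]


def run_pipeline(page: dict) -> dict:
    """Apply every stage in order, feeding each stage's result into the next."""
    for _title, stage in PIPELINE:
        page = stage(page)
    return page


result = run_pipeline({"image": "page_0001.tif"})
print(result["history"])  # ['layout', 'ocr', 'tokenize', 'ner']
```

Each stage returns a fresh dict instead of mutating its input. This keeps every intermediate result available if a step needs to be rerun or inspected separately.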