Mirror of https://github.com/qurator-spk/sbb_ner.git
Synced 2025-06-07 19:35:14 +02:00

Commit b14f7ce90b (parent 8752403746): improve README

1 changed file, README.md, with 171 additions and 12 deletions.
How the models have been obtained: http://area.staatsbibliothek-berlin.de/sbb-upload/qurator/sbb_ner/konvens2019.pdf .

***

# Installation:
Setup virtual environment:
```
virtualenv --python=python3.6 venv
```

Activate virtual environment:
```
source venv/bin/activate
```

Upgrade pip:
```
pip install -U pip
```

Install package together with its dependencies in development mode:
```
pip install -e ./
```
Download required models: http://area.staatsbibliothek-berlin.de/sbb-upload/qurator/sbb_ner/models.tar.gz

Extract model archive:
```
tar -xzf models.tar.gz
```

Run webapp directly:
```
env FLASK_APP=qurator/sbb_ner/webapp/app.py env FLASK_ENV=development env USE_CUDA=True flask run --host=0.0.0.0
```

Set USE_CUDA=False if you do not have a GPU available/installed.
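USE_CUDA is passed to the app as an environment variable. As a minimal sketch (a hypothetical helper, not the webapp's actual code), such a flag can be interpreted like this:

```python
import os


def use_cuda_from_env(default: bool = False) -> bool:
    """Interpret the USE_CUDA environment variable as a boolean flag."""
    value = os.environ.get("USE_CUDA")
    if value is None:
        return default
    # Accept common truthy spellings; anything else counts as False.
    return value.strip().lower() in ("1", "true", "yes")
```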
# Docker

## CPU-only:

```
docker build --build-arg http_proxy=$http_proxy -t qurator/webapp-ner-cpu -f Dockerfile.cpu .
```

```
docker run -ti --rm=true --mount type=bind,source=data/konvens2019,target=/usr/src/qurator-sbb-ner/data/konvens2019 -p 5000:5000 qurator/webapp-ner-cpu
```

## GPU:

Make sure that your GPU is correctly set up and that nvidia-docker has been installed.
```
docker build --build-arg http_proxy=$http_proxy -t qurator/webapp-ner-gpu -f Dockerfile .
```

```
docker run -ti --rm=true --mount type=bind,source=data/konvens2019,target=/usr/src/qurator-sbb-ner/data/konvens2019 -p 5000:5000 qurator/webapp-ner-gpu
```

The NER web interface is available at http://localhost:5000 .
# REST - Interface

Get available models:
```
curl http://localhost:5000/models
```

Output:

```
[
  {
    "default": true,
    "id": 1,
    "model_dir": "data/konvens2019/build-wd_0.03/bert-all-german-de-finetuned",
    "name": "DC-SBB + CONLL + GERMEVAL"
  },
  {
    "default": false,
    "id": 2,
    "model_dir": "data/konvens2019/build-on-all-german-de-finetuned/bert-sbb-de-finetuned",
    "name": "DC-SBB + CONLL + GERMEVAL + SBB"
  },
  {
    "default": false,
    "id": 3,
    "model_dir": "data/konvens2019/build-wd_0.03/bert-sbb-de-finetuned",
    "name": "DC-SBB + SBB"
  },
  {
    "default": false,
    "id": 4,
    "model_dir": "data/konvens2019/build-wd_0.03/bert-all-german-baseline",
    "name": "CONLL + GERMEVAL"
  }
]
```
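For scripting against the endpoint, the returned list can be filtered for the model marked as default. A minimal sketch (the helper name is illustrative, not part of the API), using an abbreviated copy of the example response:

```python
def default_model_id(models):
    """Return the id of the model flagged as default, or None if there is none."""
    for model in models:
        if model.get("default"):
            return model["id"]
    return None


# Abbreviated example response from the /models endpoint:
models = [
    {"default": True, "id": 1, "name": "DC-SBB + CONLL + GERMEVAL"},
    {"default": False, "id": 2, "name": "DC-SBB + CONLL + GERMEVAL + SBB"},
]
```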
Perform NER using model 1:

```
curl -d '{ "text": "Paris Hilton wohnt im Hilton Paris in Paris." }' -H "Content-Type: application/json" http://localhost:5000/ner/1
```

Output:

```
[
  [
    { "prediction": "B-PER", "word": "Paris" },
    { "prediction": "I-PER", "word": "Hilton" },
    { "prediction": "O", "word": "wohnt" },
    { "prediction": "O", "word": "im" },
    { "prediction": "B-ORG", "word": "Hilton" },
    { "prediction": "I-ORG", "word": "Paris" },
    { "prediction": "O", "word": "in" },
    { "prediction": "B-LOC", "word": "Paris" },
    { "prediction": "O", "word": "." }
  ]
]
```
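The response is a list of token-level BIO predictions; clients usually want whole entity spans. A minimal sketch of grouping B-/I- tags into entities (the helper name is illustrative, not part of the API), applied to the example output:

```python
def bio_to_entities(tokens):
    """Group token-level BIO predictions into (label, text) entity spans."""
    entities, current = [], None
    for token in tokens:
        tag, word = token["prediction"], token["word"]
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], [word])       # start a new entity
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(word)           # continue the current entity
        else:
            if current:
                entities.append(current)      # an O tag closes any open entity
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]


# Tokens taken from the example output above:
tokens = [
    {"prediction": "B-PER", "word": "Paris"},
    {"prediction": "I-PER", "word": "Hilton"},
    {"prediction": "O", "word": "wohnt"},
    {"prediction": "O", "word": "im"},
    {"prediction": "B-ORG", "word": "Hilton"},
    {"prediction": "I-ORG", "word": "Paris"},
    {"prediction": "O", "word": "in"},
    {"prediction": "B-LOC", "word": "Paris"},
    {"prediction": "O", "word": "."},
]
```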
# Model-Training

***

## Preprocessing of NER ground-truth:

### compile_conll

Read CONLL 2003 NER ground truth files from a directory and write the outcome of the data parsing to a pandas DataFrame that is stored as a pickle.

#### Usage

```
compile_conll --help
```
### compile_germ_eval

Read GermEval .tsv files from a directory and write the outcome of the data parsing to a pandas DataFrame that is stored as a pickle.

#### Usage

```
compile_germ_eval --help
```
### compile_europeana_historic

Read Europeana historic NER ground truth .bio files from a directory and write the outcome of the data parsing to a pandas DataFrame that is stored as a pickle.

#### Usage

```
compile_europeana_historic --help
```
### compile_wikiner

Read WikiNER files from a directory and write the outcome of the data parsing to a pandas DataFrame that is stored as a pickle.

#### Usage

```
compile_wikiner --help
```
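All four compile_* tools follow the same pattern: parse ground-truth files into a pandas DataFrame and store it as a pickle. A minimal illustrative sketch of that pattern (not the actual implementation) for CoNLL-style token/tag lines:

```python
import io

import pandas as pd

# Illustrative only: parse CoNLL-style "token ... tag" lines into a
# DataFrame and store it as a pickle, mirroring what the compile_* tools do.
sample = """Paris B-LOC
ist O
schoen O
"""

rows = []
for line in io.StringIO(sample):
    line = line.strip()
    if line:  # skip blank sentence separators
        parts = line.split()
        rows.append({"word": parts[0], "tag": parts[-1]})

df = pd.DataFrame(rows)
df.to_pickle("conll_sample.pkl")
```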
***

## Train BERT - NER model:

### bert-ner

Perform supervised training and test/cross-validation of BERT for NER.

#### Usage

```
bert-ner --help
```