mirror of https://github.com/qurator-spk/sbb_ner.git (synced 2025-07-09 10:59:56 +02:00)
fix README
parent 5c2b5ecd9c
commit 9057148d8d
1 changed file with 31 additions and 0 deletions
README.md
@@ -237,6 +237,37 @@ Perform BERT for NER supervised training and test/cross-validation.
 bert-ner --help
 ```
 
+## BERT-Pre-training:
+
+### collectcorpus
+
+```
+collectcorpus --help
+
+Usage: collectcorpus [OPTIONS] FULLTEXT_FILE SELECTION_FILE CORPUS_FILE
+
+Reads the fulltext from a CSV or SQLITE3 file (see also altotool) and
+write it to one big text file.
+
+FULLTEXT_FILE: The CSV or SQLITE3 file to read from.
+
+SELECTION_FILE: Consider only a subset of all pages that is defined by the
+DataFrame that is stored in <selection_file>.
+
+CORPUS_FILE: The output file that can be used by bert-pregenerate-trainingdata.
+
+Options:
+--chunksize INTEGER Process the corpus in chunks of <chunksize>.
+default:10**4
+
+--processes INTEGER Number of parallel processes. default: 6
+--min-line-len INTEGER Lower bound of line length in output file.
+default:80
+
+--help Show this message and exit.
+
+```
+
 ### bert-pregenerate-trainingdata
 
 Generate data for BERT pre-training from a corpus text file where
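
As a hedged illustration of the `collectcorpus` usage line added in this commit, the following Python sketch assembles the documented arguments into a command line. The option names and default values are taken from the help text; the file names and the helper `build_collectcorpus_command` are hypothetical and not part of sbb_ner.

```python
import shlex


def build_collectcorpus_command(fulltext_file, selection_file, corpus_file,
                                chunksize=10**4, processes=6, min_line_len=80):
    """Assemble an argument list matching the documented usage:
    collectcorpus [OPTIONS] FULLTEXT_FILE SELECTION_FILE CORPUS_FILE
    Defaults mirror the values shown in the help text.
    """
    return [
        "collectcorpus",
        "--chunksize", str(chunksize),
        "--processes", str(processes),
        "--min-line-len", str(min_line_len),
        fulltext_file,
        selection_file,
        corpus_file,
    ]


# Hypothetical file names, chosen for illustration only.
command = build_collectcorpus_command(
    "fulltext.sqlite3", "selected_pages", "corpus.txt", processes=4
)

# Print the command as a single shell-quoted string.
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would invoke the tool, assuming `collectcorpus` is installed and the input files exist.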