mirror of https://github.com/qurator-spk/sbb_ner.git
synced 2025-06-30 22:49:56 +02:00

fix README

parent 5c2b5ecd9c
commit 9057148d8d

1 changed file with 31 additions and 0 deletions

README.md
@@ -237,6 +237,37 @@ Perform BERT for NER supervised training and test/cross-validation.
bert-ner --help
```

## BERT-Pre-training:

### collectcorpus

```
collectcorpus --help

Usage: collectcorpus [OPTIONS] FULLTEXT_FILE SELECTION_FILE CORPUS_FILE

  Reads the fulltext from a CSV or SQLITE3 file (see also altotool) and
  write it to one big text file.

  FULLTEXT_FILE: The CSV or SQLITE3 file to read from.

  SELECTION_FILE: Consider only a subset of all pages that is defined by the
  DataFrame that is stored in <selection_file>.

  CORPUS_FILE: The output file that can be used by bert-pregenerate-trainingdata.

Options:
  --chunksize INTEGER     Process the corpus in chunks of <chunksize>.
                          default:10**4

  --processes INTEGER     Number of parallel processes. default: 6
  --min-line-len INTEGER  Lower bound of line length in output file.
                          default:80

  --help                  Show this message and exit.

```
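
The help text above describes three steps: read page texts, keep only the selected pages, and write the remaining text to one corpus file. The following is a minimal Python sketch of that flow, not the actual `collectcorpus` implementation. It assumes a SQLite table named `text` with columns `file_name` and `text` (hypothetical; the real schema may differ), passes the selection as a plain set of file names instead of a stored DataFrame, interprets `--min-line-len` as "drop shorter lines" (also an assumption), and leaves out the parallelism of `--processes`.

```python
import sqlite3
from contextlib import closing
from typing import Iterable, Iterator, List, Optional, Set


def iter_chunks(rows: Iterable, chunksize: int) -> Iterator[List]:
    """Yield successive lists of at most `chunksize` items."""
    chunk: List = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def collect_corpus(fulltext_db: str, corpus_file: str,
                   selection: Optional[Set[str]] = None,
                   chunksize: int = 10**4, min_line_len: int = 80) -> int:
    """Write selected page texts to one text file; return the number of lines written.

    ASSUMPTION: the SQLite file has a table `text(file_name, text)`.
    """
    written = 0
    with closing(sqlite3.connect(fulltext_db)) as con, \
            open(corpus_file, "w", encoding="utf-8") as out:
        rows = con.execute("SELECT file_name, text FROM text ORDER BY rowid")
        for chunk in iter_chunks(rows, chunksize):
            for file_name, text in chunk:
                # Skip pages that are not part of the selection, if one is given.
                if selection is not None and file_name not in selection:
                    continue
                for line in text.splitlines():
                    line = line.strip()
                    # ASSUMPTION: lines shorter than the bound are dropped.
                    if len(line) >= min_line_len:
                        out.write(line + "\n")
                        written += 1
    return written
```

Under these assumptions, `collect_corpus("fulltext.sqlite3", "corpus.txt", selection={"page1.xml"})` would write only the sufficiently long lines of `page1.xml` to `corpus.txt`; the file names here are placeholders.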

### bert-pregenerate-trainingdata

Generate data for BERT pre-training from a corpus text file where