Input data must follow the format used in the [GermEval2014 Named Entity Recognition Shared Task](https://sites.google.com/site/germeval2014ner/data). Text is encoded as one token per line, with information provided in tab-separated columns. The first column contains either a `#`, which marks a line giving the source the sentence is cited from and the date it was retrieved, or the token number within the sentence. The second column contains the token. Name spans are encoded in the BIO scheme: outer spans in the third column, embedded spans in the fourth column.
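To make the column layout concrete, here is a minimal sketch of a reader for this format. The function name `read_germeval`, the dictionary keys, and the sample sentence are hypothetical and are not part of the tools in this repository. The sketch also assumes that a blank line or a new `#` line starts a new sentence, and that missing tag columns can be treated as `O`.

```python
from typing import Dict, Iterable, List


def read_germeval(lines: Iterable[str]) -> List[Dict]:
    """Parse GermEval2014-style lines into a list of sentence dicts.

    Hypothetical helper for illustration; not part of this repository.
    """
    sentences: List[Dict] = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Assumption: a blank line ends the current sentence.
            current = None
            continue
        if line.startswith("#"):
            # First column is '#': the rest gives source and retrieval date.
            current = {"source": line[1:].strip(), "tokens": []}
            sentences.append(current)
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            # Skip lines that carry no token column.
            continue
        if current is None:
            # Token line without a preceding '#' line.
            current = {"source": None, "tokens": []}
            sentences.append(current)
        # Assumption: absent tag columns are treated as 'O' (outside).
        cols += ["O"] * (4 - len(cols))
        current["tokens"].append({
            "index": int(cols[0]),  # token number within the sentence
            "token": cols[1],       # surface form
            "outer": cols[2],       # BIO tag of the outer span
            "inner": cols[3],       # BIO tag of the embedded span
        })
    return sentences


# Made-up sample data, for illustration only.
sample = (
    "#\texample.org [2014-01-01]\n"
    "1\tDie\tO\tO\n"
    "2\tUniversität\tB-ORG\tO\n"
    "3\tLeipzig\tI-ORG\tB-LOC\n"
)
sentences = read_germeval(sample.splitlines())
```

In the sample, `Universität Leipzig` is an outer span in the third column, while `Leipzig` alone carries an embedded span in the fourth column.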
#### Data preparation
We also provide some [Python tools](https://github.com/cneud/ner.edith/tree/master/tools) that help with data wrangling.