mirror of https://github.com/qurator-spk/neat.git synced 2025-10-26 06:14:15 +01:00

downloaded old neath

snmnzl 2020-02-05 10:49:16 +01:00
parent 1fe2479b6f
commit 62f2f4963b
5 changed files with 1145 additions and 264 deletions

BIN
Annotation_Guidelines.pdf Normal file

Binary file not shown.

195
README.md

@ -1,5 +1,192 @@
# neath: named entity annotation tool in html
[User Guide](docs/User_Guide.md) | [Annotation Guidelines](docs/Annotation_Guidelines.md)
# neath: named entity annotation tool
#### version 0.1
---
![Screenshot](assets/screenshot.png)
![Screenshot](https://user-images.githubusercontent.com/952378/72167036-ad2c6680-33ca-11ea-980f-5859e5155877.png)
---
### Table of contents
[1. Introduction](https://github.com/qurator-spk/neath/blob/master/README.md#1-introduction)
[2. User Guide](https://github.com/qurator-spk/neath/blob/master/README.md#2-user-guide)
   [2.1 Technical requirements](https://github.com/qurator-spk/neath/blob/master/README.md#21-technical-requirements)
   [2.2 Installation](https://github.com/qurator-spk/neath/blob/master/README.md#22-installation)
   [2.3 Data format](https://github.com/qurator-spk/neath/blob/master/README.md#23-data-format)
   [2.4 Data preparation](https://github.com/qurator-spk/neath/blob/master/README.md#24-data-preparation)
   [2.5 Provenance](https://github.com/qurator-spk/neath/blob/master/README.md#25-provenance)
   [2.6 Keyboard navigation](https://github.com/qurator-spk/neath/blob/master/README.md#26-keyboard-navigation)
   [2.7 Mouse navigation](https://github.com/qurator-spk/neath/blob/master/README.md#27-mouse-navigation)
   [2.8 Image support](https://github.com/qurator-spk/neath/blob/master/README.md#28-image-support)
   [2.9 Saving progress](https://github.com/qurator-spk/neath/blob/master/README.md#29-saving-progress)
[3. Annotation Guidelines](https://github.com/qurator-spk/neath/blob/master/README.md#3-annotation-guidelines)
### 1. Introduction
[neath](https://github.com/qurator-spk/neath) is a simple, browser-based tool for editing and annotating text with named entities to produce a corpus for training/testing/evaluation. It can be used to add or correct named entity BIO-tags in a TSV file and to correct the token text or tokenization (e.g. due to OCR/segmentation errors).
[neath](https://github.com/qurator-spk/neath) is developed at the [Berlin State Library](https://staatsbibliothek-berlin.de/) for data annotation in the context of the [SoNAR-IDH](https://sonar.fh-potsdam.de/) project and the [QURATOR](https://qurator.ai/) project.
### 2. User Guide
#### 2.1 Technical Requirements
[neath](https://github.com/qurator-spk/neath) runs locally as a pure HTML+JavaScript webpage in your web browser. No software needs to be installed, but JavaScript has to be enabled in the browser.
#### 2.2 Installation
Simply clone the repo using ``git clone https://github.com/qurator-spk/neath.git`` or download the [ZIP](https://github.com/qurator-spk/neath/archive/master.zip). Make sure that at least ``neath.html`` and ``neath.js`` reside in a local directory; it is then sufficient to open ``neath.html`` in a browser. Any fairly recent browser should work, but only Chrome and Firefox have been tested.
#### 2.3 Data format
The data format is based on the format used in the [GermEval2014 Named Entity Recognition Shared Task](https://sites.google.com/site/germeval2014ner/data). Text is encoded as one token per line, with name spans encoded in the BIO-scheme, provided as tab-separated values:
* the first column contains either
  * a `#`, which signals the source the sentence is cited from, or
  * the token position within the sentence (``>=1``); sentence boundaries are indicated by ``0``
* the second column contains the token ``text``
* outer entity spans are encoded in the third column ``NE-TAG``
* embedded entity spans are encoded in the fourth column ``NE-EMB``
Example (simple):
```tsv
No. TOKEN NE-TAG NE-EMB
# https://example.url
1 Donnerstag O O
2 , O O
3 1 O O
4 . O O
5 Januar O O
6 . O O
0 O O
1 Berliner B-ORG B-LOC
2 Tageblatt I-ORG O
3 . O O
0 O O
1 Nr O O
2 . O O
3 1 O O
4 . O O
0 O O
1 Seite O O
2 3 O O
```
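The column layout above can be read back into sentences with a few lines of JavaScript. The following is only a minimal sketch, not code from `neath.js`: the helper names `parseSentences` and `entitySpans` are hypothetical, and it assumes tab-separated columns, a header line, and `0` rows that act as pure separators with an empty token, as in the example.

```javascript
// Hypothetical helper (not part of neath.js): split TSV text in the simple
// four-column format into sentences. Assumes tab-separated columns, a header
// line first, and "0" rows that only mark sentence boundaries.
function parseSentences(tsvText) {
  const sentences = [];
  let source = null;
  let current = [];
  const lines = tsvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  for (const line of lines.slice(1)) { // skip the header line
    if (line.startsWith("#")) {
      source = line.slice(1).trim(); // source the following tokens are cited from
      continue;
    }
    const [position, token = "", neTag = "O", neEmb = "O"] = line.split("\t");
    if (position === "0") {
      // sentence boundary: close the sentence collected so far
      if (current.length > 0) sentences.push({ source, tokens: current });
      current = [];
      continue;
    }
    current.push({ position: Number(position), token, neTag, neEmb });
  }
  if (current.length > 0) sentences.push({ source, tokens: current });
  return sentences;
}

// Hypothetical helper: collect outer entity spans from the NE-TAG column
// by joining a B- tag with the I- tags of the same type that follow it.
function entitySpans(tokens) {
  const spans = [];
  let open = null;
  for (const { token, neTag } of tokens) {
    if (neTag.startsWith("B-")) {
      open = { type: neTag.slice(2), text: token };
      spans.push(open);
    } else if (neTag.startsWith("I-") && open && open.type === neTag.slice(2)) {
      open.text += " " + token;
    } else {
      open = null;
    }
  }
  return spans;
}
```

Applied to the example, the second sentence would yield a single `ORG` span covering `Berliner Tageblatt`. Embedded spans in `NE-EMB` could be collected the same way by reading the fourth column instead.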
For our purposes we extend this format by adding
* a fifth column for an ``ID`` for the outer ``NE-TAG`` from an authority file (in this case, the [GND](https://www.dnb.de/EN/Professionell/Standardisierung/GND/gnd_node.html) is used)
* column six for use as a variable ``url_id`` (see [Image Support](https://github.com/qurator-spk/neath/blob/master/README.md#28-image-support) for further details)
* finally, columns 7+ are used for storing ``left,right,top,bottom`` pixel coordinates for facsimile snippets
Example (full):
```tsv
No. TOKEN NE-TAG NE-EMB GND-ID url_id left,right,top,bottom
# https://example.url/iiif/left,right,top,bottom/full/0/default.jpg
1 Donnerstag O O - 0 174,352,358,390
2 , O O - 0 174,352,358,390
3 1 O O - 0 367,392,361,381
4 . O O - 0 370,397,352,379
5 Januar O O - 0 406,518,358,386
6 . O O - 0 406,518,358,386
0
1 Berliner B-ORG B-LOC 1086206452 0 816,984,358,388
2 Tageblatt I-ORG O 1086206452 0 1005,1208,360,387
3 . O O - 0 1005,1208,360,387
0
1 Nr O O - 0 1237,1288,360,382
2 . O O - 0 1237,1288,360,382
3 1 O O - 0 1304,1326,361,381
4 . O O - 0 1304,1326,361,381
0
1 Seite O O - 0 1837,1926,361,392
2 3 O O - 0 1939,1967,364,385
```
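A single row of the extended format can be unpacked as sketched below. This is an illustration under assumptions, not the parser used by neath: `parseExtendedRow` is a hypothetical name, columns are assumed to be tab-separated, and the coordinates are accepted either as one comma-separated field or as four separate columns, since both layouts appear in this repository.

```javascript
// Hypothetical helper (not part of neath.js): unpack one row of the extended
// format. Assumes tab-separated columns; the coordinates may be stored as one
// comma-separated field or as four separate columns.
function parseExtendedRow(line) {
  const cols = line.split("\t");
  const [position, token = "", neTag = "O", neEmb = "O", gndId = "-", urlId = "0"] = cols;
  const coords = cols
    .slice(6)
    .join(",")
    .split(",")
    .filter((value) => value !== "")
    .map(Number);
  const box =
    coords.length === 4
      ? { left: coords[0], right: coords[1], top: coords[2], bottom: coords[3] }
      : null; // no usable bounding box in this row
  return {
    position: Number(position),
    token,
    neTag,
    neEmb,
    gndId: gndId === "-" ? null : gndId, // "-" marks a missing authority ID
    urlId: Number(urlId),
    box,
  };
}
```

Returning `null` for a missing bounding box is a design choice of this sketch; it lets callers skip the facsimile snippet for rows that carry no coordinates.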
#### 2.4 Data preparation
The source data that is used for annotation are OCR results in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) format. We provide a [Python tool](https://github.com/qurator-spk/page2tsv) that supports the transformation of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) OCR files into the [TSV format](https://github.com/qurator-spk/neath/blob/master/README.md#23-data-format) required for use with [neath](https://github.com/qurator-spk/neath).
#### 2.5 Provenance
The processing pipeline applied at the Berlin State Library comprises the following steps:
1. Layout Analysis & Textline Extraction
Layout analysis and textline extraction are performed with [sbb_textline_detector](https://github.com/qurator-spk/sbb_textline_detector).
2. OCR & Word Segmentation
OCR is based on [OCR-D](https://github.com/OCR-D)'s [ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr) which requires [Tesseract](https://github.com/tesseract-ocr/tesseract) **>= 4.1.0**. The [GT4HistOCR_2000000](https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/GT4HistOCR_2000000.traineddata) model, which is [trained](https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR) on the [GT4HistOCR](https://zenodo.org/record/1344132) corpus, is used. Further details are available in the [paper](https://arxiv.org/abs/1809.05501).
3. TSV Transformation
A simple [Python tool](https://github.com/qurator-spk/page2tsv) is used for the transformation of the OCR results in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neath/blob/master/docs/README.md#23-data-format).
4. Tokenization
For tokenization, [SoMaJo](https://github.com/tsproisl/SoMaJo) is used.
5. Named Entity Recognition
For Named Entity Recognition, a [BERT-Base](https://github.com/google-research/bert) model was trained for noisy OCR texts with historical spelling variation. [sbb_ner](https://github.com/qurator-spk/sbb_ner) combines unsupervised training on a large (~2.3m pages) [corpus of German OCR](https://zenodo.org/record/3257041) with supervised training on a small (47k tokens) [annotated corpus](https://github.com/EuropeanaNewspapers/ner-corpora/tree/master/enp_DE.sbb.bio). Further details are available in the [paper](https://corpora.linguistik.uni-erlangen.de/data/konvens/proceedings/papers/KONVENS2019_paper_4.pdf).
#### 2.6 Keyboard-Navigation
| Key Combination| Action |
|:---------|:-------------------------------------------|
| Left | Move one cell left |
| Right | Move one cell right |
| Up | Move one row up |
| Down | Move one row down |
| PageDown | Move page down |
| PageUp | Move page up |
| Ctrl+Up | Move entire table one row up |
| Ctrl+Down| Move entire table one row down |
|----------|--------------------------------------------|
| s t | Start new sentence in current row |
| m e | Merge current row with row above |
| s p | Create copy of current row |
| d l | Delete current row |
|----------|--------------------------------------------|
| backspace| Set NE-TAG / NE-EMB to "O" |
| b p | Set NE-TAG / NE-EMB to "B-PER" |
| b l | Set NE-TAG / NE-EMB to "B-LOC" |
| b o | Set NE-TAG / NE-EMB to "B-ORG" |
| b w | Set NE-TAG / NE-EMB to "B-WORK" |
| b c | Set NE-TAG / NE-EMB to "B-CONF" |
| b e | Set NE-TAG / NE-EMB to "B-EVT" |
| b t | Set NE-TAG / NE-EMB to "B-TODO" |
| i p | Set NE-TAG / NE-EMB to "I-PER" |
| i l | Set NE-TAG / NE-EMB to "I-LOC" |
| i o | Set NE-TAG / NE-EMB to "I-ORG" |
| i w | Set NE-TAG / NE-EMB to "I-WORK" |
| i c | Set NE-TAG / NE-EMB to "I-CONF" |
| i e | Set NE-TAG / NE-EMB to "I-EVT" |
| i t | Set NE-TAG / NE-EMB to "I-TODO" |
|----------|--------------------------------------------|
| enter | Edit TOKEN or GND-ID |
| esc | Close TOKEN or GND-ID edit field without applying changes |
|----------|--------------------------------------------|
| l a | Add one display row |
| l r | Remove one display row (minimum is 5) |
|----------|--------------------------------------------|
#### 2.7 Mouse-Navigation
* use mouse wheel to scroll up and down
* left-click `<<` and `>>` to move 15 rows up or down
* left-click `O` in the `NE-TAG` or `NE-EMB` columns to open the drop-down menu and select any of the supported NE-Tags to tag a token or change an existing tag to another one
* left-click a tag in the `NE-TAG` or `NE-EMB` columns and subsequently select `O` to remove a wrong tag
* left-click a token in the `TOKEN` column to edit/correct the text content
* left-click the `POSITION` of a row and select `split` from the drop-down menu to create a copy of the current row
* left-click the `POSITION` of a row and select `merge` from the drop-down menu to merge the current row with the row above
* left-click the `POSITION` of a row and select `start-sentence` from the drop-down menu to start a new sentence
#### 2.8 Image Support
Provided that facsimile images are available online via the [IIIF](https://iiif.io/) Image API, [neath](https://github.com/qurator-spk/neath) supports the embedding of facsimile snippets into its interface to help with data annotation and correction.
This further requires that OCR with word segmentation is applied to the image to determine bounding boxes for tokens.
The iiif-image-url contained in the source ``#`` can then be used as a replacement for ``url_id`` in combination with the token bounding boxes as ``left,right,top,bottom`` to obtain the facsimile snippet url and display the image in the leftmost column. Clicking on the facsimile snippet opens up a new tab with a larger context.
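How a snippet URL could be derived from a token's bounding box is sketched below. This is an assumption-laden illustration rather than the logic in `neath.js`: the helper `snippetUrl` is hypothetical, and it assumes the source URL contains the literal placeholder `left,top,width,height` (as in the `example.tsv` shipped with this commit), which matches the `x,y,w,h` region syntax of the IIIF Image API. A template using a different placeholder would need the replacement string adjusted.

```javascript
// Hypothetical helper (not part of neath.js): fill the region placeholder of
// an IIIF image URL template with a token's bounding box.
// Assumes the template contains the literal text "left,top,width,height".
function snippetUrl(template, box) {
  // The IIIF Image API expects the region as x,y,w,h, so width and height
  // are derived from the stored left/right/top/bottom pixel coordinates.
  const region = [
    box.left,
    box.top,
    box.right - box.left,
    box.bottom - box.top,
  ].join(",");
  return template.replace("left,top,width,height", region);
}

// Usage with the source line and the "Gambetta" row from example.tsv:
const template =
  "https://content.staatsbibliothek-berlin.de/zefys/SNP27646518-18800101-0-3-0-0/left,top,width,height/full/0/default.jpg";
const url = snippetUrl(template, { left: 202, right: 277, top: 419, bottom: 437 });
// url now contains the region "202,419,75,18"
```

A larger context, such as the one opened when clicking a snippet, could be requested by widening the box before building the URL.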
#### 2.9 Saving progress
[neath](https://github.com/qurator-spk/neath) runs fully locally in the browser. Therefore it cannot automatically save any changes you make to disk. You have to use the `Save Changes` button to do so manually from time to time. If your browser automatically saves all downloads to your `Downloads` folder, you might want to configure it so that it instead prompts you where to save.
### 3. Annotation Guidelines
The most recent version of the [Annotation Guidelines](https://github.com/qurator-spk/neath/blob/master/Annotation_Guidelines.pdf) is included in this repository.

351
example.tsv Normal file

@ -0,0 +1,351 @@
No. TOKEN NE-TAG NE-EMB GND-ID url_id left right top bottom
# https://content.staatsbibliothek-berlin.de/zefys/SNP27646518-18800101-0-3-0-0/left,top,width,height/full/0/default.jpg
0 Kampf O O - 0 154 212 400 419
0 , O O - 0 154 212 400 419
0 deſſen O O - 0 221 264 400 419
0 Ende O O - 0 274 313 401 417
0 vielleicht O O - 0 324 388 399 418
0 noch O O - 0 397 429 400 418
0 heute O O - 0 439 478 400 418
0 nicht O O - 0 487 523 399 417
0 abzuſehen O O - 0 532 605 399 418
0 wäre O O - 0 615 656 399 417
0 , O O - 0 615 656 399 417
0 wenn O O - 0 671 701 402 415
0 nicht O O - 0 702 755 399 417
0 Herr O O - 0 155 192 419 437
0 Gambetta B-PER O 118716263 0 202 277 419 437
0 als O O - 0 287 311 420 436
0 deus O O - 0 320 357 419 434
0 ex O O - 0 366 385 422 434
0 machina O O - 0 395 451 419 434
0 erſchienen O O - 0 452 543 417 436
0 wäre O O - 0 553 594 417 437
0 , O O - 0 553 594 417 437
0 reſp O O - 0 608 642 418 437
0 . O O - 0 608 642 418 437
0 durch O O - 0 652 692 418 436
0 perſön⸗ O O - 0 698 756 418 437
0 liche O O - 0 156 188 437 457
0 Intervention O O - 0 197 298 438 457
0 bei O O - 0 309 330 438 453
0 dem O O - 0 339 370 437 453
0 Präſidenten O O - 0 379 468 437 457
0 Grévy B-PER O 119064693 0 475 524 436 456
0 einen O O - 0 534 572 437 453
0 Ausgleich O O - 0 577 650 437 455
0 herbeigeführt O O - 0 658 755 436 455
0 hätte O O - 0 155 207 457 475
0 . O O - 0 155 207 457 475
0 O O - 0 216 239 457 474
0 Es O O - 0 216 239 457 474
0 ſcheint O O - 0 252 300 457 475
0 dem O O - 0 309 339 457 472
0 Kammerpräſidenten O O - 0 349 498 455 474
0 plötzlich O O - 0 508 566 455 475
0 ein O O - 0 576 598 455 472
0 Argwohn O O - 0 604 676 455 475
0 oder O O - 0 686 710 455 471
0 eine O O - 0 711 756 455 471
0 Befürchtung O O - 0 155 250 475 495
0 gekommen O O - 0 259 338 475 495
0 zu O O - 0 346 354 479 495
0 ſein O O - 0 354 404 475 494
0 , O O - 0 354 404 475 494
0 als O O - 0 414 438 475 490
0 ob O O - 0 449 467 474 490
0 hinter O O - 0 476 522 474 493
0 dem O O - 0 531 561 474 491
0 Bemühen O O - 0 570 648 474 492
0 , O O - 0 570 648 474 492
0 Waddington B-PER O 117086630 0 660 756 474 493
0 unb O O - 0 155 185 494 512
0 Léon B-PER O 117619744 0 200 249 494 512
0 Say I-PER O - 0 254 288 494 512
0 zu O O - 0 308 324 498 512
0 halten O O - 0 343 398 494 512
0 , O O - 0 343 398 494 512
0 dagegen O O - 0 410 477 492 512
0 Lepère B-PER O 1012607569 0 492 544 493 512
0 zu O O - 0 563 581 497 512
0 entfernen O O - 0 600 678 492 511
0 , O O - 0 600 678 492 511
0 die O O - 0 693 718 492 509
0 Ab O O - 0 724 756 492 509
0 ſicht O O - 0 156 187 513 531
0 ſtecke O O - 0 206 250 513 531
0 , O O - 0 206 250 513 531
0 das O O - 0 268 296 513 529
0 neue O O - 0 316 349 516 529
0 Miniſterium O O - 0 367 463 511 529
0 von O O - 0 482 509 515 528
0 dem O O - 0 529 559 512 528
0 bisher O O - 0 566 632 511 530
0 dominirenden O O - 0 653 756 511 528
0 Einfluß O O - 0 156 216 531 550
0 des O O - 0 240 266 532 548
0 Palais B-LOC O 4342820-4 0 293 346 530 550
0 Bourbon I-LOC O - 0 368 437 530 546
0 frei O O - 0 462 488 530 549
0 zu O O - 0 511 528 535 550
0 machen O O - 0 552 610 530 549
0 . O O - 0 552 610 530 549
0 O O - 0 644 682 529 546
0 Sein O O - 0 644 682 529 546
0 Beſuch O O - 0 706 756 530 548
0 bei O O - 0 159 189 550 567
0 Grévy B-PER O 119064693 0 195 246 551 569
0 am O O - 0 262 285 554 566
0 Sonntag O O - 0 300 368 550 569
0 Morgen O O - 0 380 442 549 569
0 um O O - 0 457 482 553 565
0 10 O O - 0 496 514 550 565
0 Uhr O O - 0 525 546 549 568
0 ſoll O O - 0 546 593 549 569
0 keineswegs O O - 0 607 691 548 567
0 erbeten O O - 0 703 756 549 565
0 ſondern O O - 0 163 216 570 586
0 — O O - 0 225 243 577 580
0 zum O O - 0 254 285 573 587
0 erſten O O - 0 295 335 569 587
0 Mal O O - 0 345 386 567 587
0 ! O O - 0 345 386 567 587
0 — O O - 0 396 414 576 578
0 freiwillig O O - 0 418 493 567 587
0 und O O - 0 508 537 568 584
0 ziemlich O O - 0 542 605 567 587
0 unerwartet O O - 0 615 697 568 583
0 erfolgt O O - 0 707 756 567 586
0 ſein O O - 0 156 190 586 606
0 . O O - 0 156 190 586 606
0 O O - 0 209 237 588 604
0 Was O O - 0 209 237 588 604
0 zwiſchen O O - 0 238 317 587 606
0 den O O - 0 327 353 587 603
0 beiden O O - 0 362 408 587 603
0 Präſidenten O O - 0 418 508 586 606
0 verhandelt O O - 0 523 602 587 606
0 worden O O - 0 611 671 586 604
0 , O O - 0 611 671 586 604
0 weiß O O - 0 687 723 586 604
0 na⸗ O O - 0 732 756 590 602
0 türlich O O - 0 157 205 606 624
0 Niemand O O - 0 217 289 607 624
0 , O O - 0 217 289 607 624
0 wenn O O - 0 300 339 609 623
0 nicht O O - 0 349 383 605 624
0 Herr O O - 0 393 429 606 624
0 Gambetta B-PER O 118716263 0 434 509 606 622
0 ſelbſt O O - 0 519 557 604 624
0 es O O - 0 566 582 607 621
0 hinterher O O - 0 588 656 605 623
0 beim O O - 0 666 700 605 621
0 Früh O O - 0 710 756 604 624
0 — O O - 0 710 756 604 624
0 ftück O O - 0 157 189 625 643
0 ſeinem O O - 0 199 248 624 643
0 Intimus O O - 0 257 330 625 643
0 , O O - 0 257 330 625 643
0 dem O O - 0 339 370 624 640
0 Schauſpieler O O - 0 380 476 624 643
0 Coquelin B-PER O 116670673 0 491 559 624 642
0 dem O O - 0 575 605 624 640
0 „ O O - 0 620 714 623 642
0 Jüngeren O O - 0 620 714 623 642
0 “ O O - 0 620 714 623 642
0 von O O - 0 728 756 626 639
0 der O O - 0 157 181 643 660
0 Comédie B-ORG O 16295404-9 0 197 262 643 661
0 françaiſe I-ORG O - 0 277 345 642 661
0 anvertraut O O - 0 359 440 644 659
0 hat O O - 0 455 484 644 661
0 . O O - 0 455 484 644 661
0 O O - 0 503 560 642 659
0 Abends O O - 0 503 560 642 659
0 im O O - 0 576 595 642 658
0 Theater O O - 0 604 665 642 661
0 ſpielte O O - 0 665 724 642 661
0 der O O - 0 733 756 642 658
0 Allgewaltige O O - 0 157 252 662 682
0 freilich O O - 0 262 312 662 681
0 wieder O O - 0 326 375 661 678
0 den O O - 0 389 415 662 678
0 Unbefangenen O O - 0 425 530 662 681
0 und O O - 0 544 572 661 677
0 Ununterrichteten O O - 0 582 711 661 679
0 , O O - 0 582 711 661 679
0 denn O O - 0 720 755 661 677
0 er O O - 0 158 172 686 697
0 leugnete O O - 0 182 242 682 700
0 ſogar O O - 0 256 296 681 699
0 ſeinen O O - 0 312 356 680 699
0 Beſuch O O - 0 366 416 681 699
0 vom O O - 0 433 465 683 696
0 Vormittag O O - 0 481 566 679 699
0 , O O - 0 481 566 679 699
0 obwohl O O - 0 583 638 681 698
0 Hunderte O O - 0 646 716 679 699
0 das O O - 0 728 755 679 695
0 wohlbekannte O O - 0 157 258 700 718
0 kleine O O - 0 271 312 698 715
0 Coupé O O - 0 322 371 699 718
0 Gambettas B-PER O 119064693 0 382 466 698 716
0 eine O O - 0 482 510 699 715
0 ganze O O - 0 525 566 702 718
0 Stunde O O - 0 577 633 698 715
0 lang O O - 0 648 681 698 715
0 von O O - 0 695 712 701 714
0 der O O - 0 714 756 698 714
0 Rue B-LOC O - 0 157 189 718 735
0 du I-LOC O - 0 204 222 719 735
0 Faubourg I-LOC O - 0 232 308 718 736
0 St I-LOC O - 0 324 351 718 735
0 . I-LOC O - 0 324 351 718 735
0 Honoré I-LOC O - 0 360 418 718 736
0 aus O O - 0 434 462 720 733
0 im O O - 0 476 496 718 734
0 Vorhof O O - 0 505 562 717 735
0 des O O - 0 577 602 718 733
0 Elyſée B-LOC O 4075880-1 0 612 661 717 736
0 ſtationiren O O - 0 666 755 703 736
0 geſehen O O - 0 158 211 737 756
0 hatten O O - 0 222 273 737 754
0 . O O - 0 222 273 737 754
0 O O - 0 292 321 737 753
0 Der O O - 0 292 321 737 753
0 Erfolg O O - 0 331 382 736 756
0 dieſer O O - 0 392 432 736 755
0 Viſite O O - 0 437 480 736 754
0 war O O - 0 490 520 740 753
0 denn O O - 0 530 565 736 752
0 auch O O - 0 574 606 736 754
0 ſchon O O - 0 616 655 735 755
0 in O O - 0 665 679 735 752
0 derſelben O O - 0 689 756 736 754
0 Zeit O O - 0 157 189 756 775
0 zu O O - 0 198 214 760 775
0 ſpüren O O - 0 224 277 755 774
0 , O O - 0 224 277 755 774
0 da O O - 0 287 305 755 772
0 derjenige O O - 0 314 392 755 774
0 , O O - 0 314 392 755 774
0 der O O - 0 396 419 756 771
0 ſie O O - 0 429 445 755 774
0 gemacht O O - 0 455 519 756 774
0 , O O - 0 455 519 756 774
0 ſie O O - 0 533 550 754 773
0 ableugnen O O - 0 565 641 756 774
0 wollte O O - 0 651 702 754 770
0 . O O - 0 651 702 754 770
0 O O - 0 720 756 754 774
0 Herr O O - 0 720 756 754 774
0 Lepère B-PER O 1012607569 0 156 212 774 793
0 , O O - 0 156 212 774 793
0 der O O - 0 227 250 774 790
0 bereits O O - 0 264 314 774 790
0 ſeine O O - 0 331 374 773 792
0 Siebenſachen O O - 0 382 480 773 792
0 zuſammengepackt O O - 0 494 623 773 793
0 hatte O O - 0 638 679 773 791
0 , O O - 0 638 679 773 791
0 weil O O - 0 696 727 773 789
0 er O O - 0 743 756 777 789
0 glaubte O O - 0 157 211 793 811
0 ausziehen O O - 0 221 295 793 811
0 zu O O - 0 305 322 797 811
0 müſſen O O - 0 332 383 793 811
0 — O O - 0 393 412 801 803
0 Freycinet B-PER O 118703099 0 421 493 793 811
0 ſelbſt O O - 0 496 544 792 811
0 hatte O O - 0 554 590 792 810
0 ihm O O - 0 600 629 793 809
0 das O O - 0 639 666 792 808
0 zu O O - 0 675 692 796 811
0 wieder O O - 0 702 756 791 808
0 — O O - 0 702 756 791 808
0 holten O O - 0 156 202 810 830
0 Malen O O - 0 212 262 811 828
0 in O O - 0 272 287 811 828
0 dürren O O - 0 297 347 812 827
0 Worten O O - 0 357 415 812 827
0 geſagt O O - 0 425 475 811 830
0 — O O - 0 484 503 819 822
0 Herr O O - 0 512 548 811 830
0 Lepre B-PER O 1012607569 0 556 607 811 829
0 erhielt O O - 0 616 664 811 830
0 von O O - 0 674 701 814 826
0 Gam B-PER O 118716263 0 711 755 811 827
0 — I-PER O - 0 711 755 811 827
0 betta I-PER O - 0 156 192 829 846
0 die O O - 0 202 224 830 846
0 Nachricht O O - 0 234 308 830 848
0 , O O - 0 234 308 830 848
0 daß O O - 0 318 346 830 848
0 er O O - 0 356 370 835 846
0 bleiben O O - 0 380 432 830 846
0 dürfe O O - 0 445 488 830 848
0 . O O - 0 445 488 830 848
0 O O - 0 508 592 830 848
0 Gleichzeitig O O - 0 508 592 830 848
0 wurde O O - 0 602 649 829 845
0 Herrn O O - 0 658 703 829 848
0 Waddington B-PER O 117086630 0 714 756 829 845
0 das O O - 0 230 257 849 865
0 Gegentheil O O - 0 272 354 848 867
0 bedeutet O O - 0 370 437 849 867
0 ; O O - 0 370 437 849 867
0 den O O - 0 451 476 849 864
0 Botſchafterpoſten O O - 0 486 617 848 867
0 in O O - 0 633 648 848 864
0 London B-LOC O 4074335-4 0 658 716 848 864
0 , O O - 0 658 716 848 864
0 der O O - 0 720 756 848 866
0 ihm O O - 0 156 185 866 885
0 als O O - 0 196 219 868 884
0 Entſchädigung O O - 0 230 339 867 886
0 angeboten O O - 0 350 426 868 886
0 wurde O O - 0 436 486 868 884
0 , O O - 0 436 486 868 884
0 ſchlug O O - 0 496 539 867 886
0 er O O - 0 549 563 872 883
0 aus O O - 0 573 605 869 883
0 . O O - 0 573 605 869 883
0 O O - 0 625 648 866 882
0 Von O O - 0 625 648 866 882
0 allen O O - 0 649 699 868 882
0 dieſen O O - 0 699 756 866 884
0 Vorgängen O O - 0 159 244 885 905
0 erhielt O O - 0 254 305 886 904
0 Léon B-PER O 117619744 0 310 350 885 902
0 Say I-PER O - 0 360 394 886 905
0 erſt O O - 0 407 432 886 902
0 in O O - 0 445 460 886 902
0 ſpäter O O - 0 475 519 886 903
0 Nachmittagsſtunde O O - 0 528 671 885 905
0 Kenntniß O O - 0 682 756 885 903
0 . O O - 0 682 756 885 903
0 O O - 0 161 198 904 921
0 Sein O O - 0 161 198 904 921
0 Entſchluß O O - 0 208 281 904 923
0 war O O - 0 297 328 908 920
0 ſofort O O - 0 343 391 905 923
0 gefaßt O O - 0 400 451 903 923
0 . O O - 0 400 451 903 923
0 O O - 0 471 519 905 923
0 Gegen O O - 0 471 519 905 923
0 6 O O - 0 535 544 907 920
0 Uhr O O - 0 560 589 905 922
0 Abends O O - 0 599 656 904 920
0 fuhr O O - 0 666 690 904 922
0 er O O - 0 692 723 909 920
0 ins O O - 0 733 756 904 919
0 Elyſée B-LOC O 4075880-1 0 158 207 923 942
0 und O O - 0 220 248 924 939
0 legte O O - 0 264 299 924 940
0 ſein O O - 0 313 340 923 940
0 Portefeuille O O - 0 355 445 923 942
0 in O O - 0 461 475 923 939
0 Grevys B-PER O 119064693 0 490 546 923 942
0 Hände O O - 0 557 606 923 942
0 zurück O O - 0 621 671 923 942
0 . O O - 0 621 671 923 942


@ -3,33 +3,35 @@
<head>
<meta charset="UTF-8">
<title>neath</title>
<base href="neath.html" target="_blank">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css"
integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/5.0.1/papaparse.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/5.1.0/papaparse.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/keypress/2.1.5/keypress.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script type="text/javascript" src="http://code.jquery.com/ui/1.12.1/jquery-ui.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js"></script>
<style>
body{font-family:Verdana;font-size:16px}
table{table-layout:fixed;width:100%;text-align:center}
th{background-color:lightgray}
.editable:hover{background-color:yellow}
tr:hover{background-color:whitesmoke}
.editable:focus{background-color:#f0e442}
tr:focus-within{background-color:#dddddd}
.accordion:hover .accordion-item:hover .accordion-item-content,
.accordion .accordion-item--default .accordion-item-content{height:9em;}
.accordion-item-content, .accordion:hover .accordion-item-content{height:0;overflow:hidden;transition:height.25s;}
.accordion{padding:0;margin:0auto;width:100px;}
.accordion-item:hover{background-color:yellow;}
.accordion .accordion-item--default .accordion-item-content{height:10.5em}
.accordion-item-content, .accordion:hover .accordion-item-content{height:0;overflow:hidden;transition:height.25s}
.accordion{padding:0;margin:auto;width:100px}
.accordion-item:hover{background-color:#f0e442}
.type_select:hover{background-color:yellow;}
.type_select:hover{background-color:#f0e442}
.ner_per{background-color:skyblue}
.ner_loc{background-color:goldenrod}
.ner_org{background-color:plum}
.ner_pub{background-color:lightgreen}
.ner_conf{background-color:olive}
.ner_art{background-color:lavender}
.ner_todo{background-color:turquoise}
.ner_per{background-color:#56b3e9}
.ner_loc{background-color:#e69d00}
.ner_org{background-color:#df6caa}
.ner_work{background-color:#009e74}
.ner_conf{background-color:#0072b2}
.ner_evt{background-color:#a60a2d}
.ner_todo{background-color:#d55e00}
.fit-image{
width: 100%;
@ -53,8 +55,8 @@
<div class="col-9">
<div class="row">
<div class="col text-center">
<h3><a href="https://github.com/qurator-spk/neath" target="_blank">neath</a>: named entity annotation tool in html</h3>
<a href="https://github.com/qurator-spk/neath/blob/master/docs/User_Guide.md" target="_blank">User Guide</a> | <a href="https://github.com/qurator-spk/neath/blob/master/docs/Annotation_Guidelines.md" target="_blank">Annotation Guidelines</a> | <a href="https://github.com/qurator-spk/neath/issues" target="_blank">Issues</a><hr>
<h3><a href="https://github.com/qurator-spk/neath" target="_blank" tabindex="-1">neath</a>: named entity annotation tool</h3>
<a href="https://github.com/qurator-spk/neath/blob/master/README.md#2-user-guide" target="_blank" tabindex="-1">User Guide</a> | <a href="https://github.com/qurator-spk/neath/blob/master/Annotation_Guidelines.pdf" target="_blank" tabindex="-1">Annotation Guidelines</a> | <a href="https://github.com/qurator-spk/neath/issues" target="_blank" tabindex="-1">Issues</a><hr>
</div>
</div>
</div>
@ -62,13 +64,13 @@
</div>
</div>
<div class="row mt-3">
<div class="col-2" id="region-left">
<div class="col-3" id="region-left">
<a href="" id="preview-link">
<img id="preview" alt="facsimile_preview" class="img-responsive fit-image"/>
</a>
</div>
<div class="col-9 text-center" id="tableregion">
Please upload a TSV file:
<div class="col-8 text-center" id="tableregion">
Please upload a TSV<sup>(<a href="https://github.com/qurator-spk/neath/blob/master/User_Guide.md#22-data-format">i</a>)</sup> file:
<br><br>
<input type="file" id="tsv-file" name="files"/>
</div>

819
neath.js

File diff suppressed because it is too large Load diff