mirror of https://github.com/qurator-spk/neat.git (synced 2025-10-31 00:34:14 +01:00)

final name

parent 6adee57607
commit dbafbc3261
3 changed files with 32 additions and 32 deletions
README.md (46 changes)
							|  | @ -1,46 +1,46 @@ | ||||||
| # neath: named entity annotation tool | # neat: named entity annotation tool | ||||||
| #### version 0.1 | #### version 0.1 | ||||||
| --- | --- | ||||||
|  |  | ||||||
| --- | --- | ||||||
|  |  | ||||||
| ### Table of contents | ### Table of contents | ||||||
| [1. Introduction](https://github.com/qurator-spk/neath/blob/master/README.md#1-introduction)  | [1. Introduction](https://github.com/qurator-spk/neat/blob/master/README.md#1-introduction)  | ||||||
|  |  | ||||||
| [2. User Guide](https://github.com/qurator-spk/neath/blob/master/README.md#2-user-guide) | [2. User Guide](https://github.com/qurator-spk/neat/blob/master/README.md#2-user-guide) | ||||||
|  |  | ||||||
|    [2.1 Technical requirements](https://github.com/qurator-spk/neath/blob/master/README.md#21-technical-requirements)  |    [2.1 Technical requirements](https://github.com/qurator-spk/neat/blob/master/README.md#21-technical-requirements)  | ||||||
|  |  | ||||||
|    [2.2 Installation](https://github.com/qurator-spk/neath/blob/master/README.md#22-installation)  |    [2.2 Installation](https://github.com/qurator-spk/neat/blob/master/README.md#22-installation)  | ||||||
|      |      | ||||||
|    [2.3 Data format](https://github.com/qurator-spk/neath/blob/master/README.md#23-data-format) |    [2.3 Data format](https://github.com/qurator-spk/neat/blob/master/README.md#23-data-format) | ||||||
|      |      | ||||||
|    [2.4 Data preparation](https://github.com/qurator-spk/neath/blob/master/README.md#24-data-preparation) |    [2.4 Data preparation](https://github.com/qurator-spk/neat/blob/master/README.md#24-data-preparation) | ||||||
|      |      | ||||||
|    [2.5 Provenance](https://github.com/qurator-spk/neath/blob/master/README.md#25-provenance) |    [2.5 Provenance](https://github.com/qurator-spk/neat/blob/master/README.md#25-provenance) | ||||||
|      |      | ||||||
|    [2.6 Keyboard navigation](https://github.com/qurator-spk/neath/blob/master/README.md#26-keyboard-navigation) |    [2.6 Keyboard navigation](https://github.com/qurator-spk/neat/blob/master/README.md#26-keyboard-navigation) | ||||||
|      |      | ||||||
|    [2.7 Mouse navigation](https://github.com/qurator-spk/neath/blob/master/README.md#27-mouse-navigation) |    [2.7 Mouse navigation](https://github.com/qurator-spk/neat/blob/master/README.md#27-mouse-navigation) | ||||||
|      |      | ||||||
|    [2.8 Image support](https://github.com/qurator-spk/neath/blob/master/README.md#28-image-support) |    [2.8 Image support](https://github.com/qurator-spk/neat/blob/master/README.md#28-image-support) | ||||||
|      |      | ||||||
|    [2.9 Saving progress](https://github.com/qurator-spk/neath/blob/master/README.md#29-saving-progress) |    [2.9 Saving progress](https://github.com/qurator-spk/neat/blob/master/README.md#29-saving-progress) | ||||||
|  |  | ||||||
| [3. Annotation Guidelines](https://github.com/qurator-spk/neath/blob/master/README.md#3-annotation-guidelines) | [3. Annotation Guidelines](https://github.com/qurator-spk/neat/blob/master/README.md#3-annotation-guidelines) | ||||||
|  |  | ||||||
| ### 1. Introduction | ### 1. Introduction | ||||||
| [neath](https://github.com/qurator-spk/neath) is a simple, browser-based tool for editing and annotating text with named entities to produce a corpus for training/testing/evaluation. It can be used to add or correct named entity BIO-tags in a TSV file and to correct the token text or tokenization (e.g. due to OCR/segmentation errors).  | [neat](https://github.com/qurator-spk/neat) is a simple, browser-based tool for editing and annotating text with named entities to produce a corpus for training/testing/evaluation. It can be used to add or correct named entity BIO-tags in a TSV file and to correct the token text or tokenization (e.g. due to OCR/segmentation errors).  | ||||||
|  |  | ||||||
| [neath](https://github.com/qurator-spk/neath) is developed at the [Berlin State Library](https://staatsbibliothek-berlin.de/) for data annotation in the context of the [SoNAR-IDH](https://sonar.fh-potsdam.de/) project and the [QURATOR](https://qurator.ai/) project. | [neat](https://github.com/qurator-spk/neat) is developed at the [Berlin State Library](https://staatsbibliothek-berlin.de/) for data annotation in the context of the [SoNAR-IDH](https://sonar.fh-potsdam.de/) project and the [QURATOR](https://qurator.ai/) project. | ||||||
|  |  | ||||||
| ### 2. User Guide | ### 2. User Guide | ||||||
|  |  | ||||||
| #### 2.1 Technical Requirements  | #### 2.1 Technical Requirements  | ||||||
| [neath](https://github.com/qurator-spk/neath) runs locally as a pure HTML+JavaScript webpage in your web browser. No software needs to be installed, but JavaScript has to be enabled in the browser.  | [neat](https://github.com/qurator-spk/neat) runs locally as a pure HTML+JavaScript webpage in your web browser. No software needs to be installed, but JavaScript has to be enabled in the browser.  | ||||||
|  |  | ||||||
| #### 2.2. Installation | #### 2.2. Installation | ||||||
| Simply clone the repo using ``git clone https://github.com/qurator-spk/neath.git`` or download the [ZIP](https://github.com/qurator-spk/neath/archive/master.zip). Make sure you have at minimum ``neath.html`` and ``neath.js`` residing in a local directory, then it is sufficient to just open ``neath.html`` in a browser. Any fairly recent browser should work, but only Chrome and Firefox are tested. | Simply clone the repo using ``git clone https://github.com/qurator-spk/neat.git`` or download the [ZIP](https://github.com/qurator-spk/neat/archive/master.zip). Make sure you have at minimum ``neat.html`` and ``neat.js`` residing in a local directory, then it is sufficient to just open ``neat.html`` in a browser. Any fairly recent browser should work, but only Chrome and Firefox are tested. | ||||||
|  |  | ||||||
| #### 2.3 Data format    | #### 2.3 Data format    | ||||||
| The data format is based on the format used in the [GermEval2014 Named Entity Recognition Shared Task](https://sites.google.com/site/germeval2014ner/data). Text is encoded as one token per line, with name spans encoded in the BIO-scheme, provided as tab-separated values: | The data format is based on the format used in the [GermEval2014 Named Entity Recognition Shared Task](https://sites.google.com/site/germeval2014ner/data). Text is encoded as one token per line, with name spans encoded in the BIO-scheme, provided as tab-separated values: | ||||||
|  | @ -77,7 +77,7 @@ No.	TOKEN	NE-TAG	NE-EMB | ||||||
|  |  | ||||||
| For our purposes we extend this format by adding | For our purposes we extend this format by adding | ||||||
| * a fifth column for an ``ID`` for the outer ``NE-TAG`` from an authority file (in this case, the [GND](https://www.dnb.de/EN/Professionell/Standardisierung/GND/gnd_node.html) is used)  | * a fifth column for an ``ID`` for the outer ``NE-TAG`` from an authority file (in this case, the [GND](https://www.dnb.de/EN/Professionell/Standardisierung/GND/gnd_node.html) is used)  | ||||||
| * column six for use as a variable ``url_id`` (see [Image Support](https://github.com/qurator-spk/neath/blob/master/README.md#28-image-support) for further details) | * column six for use as a variable ``url_id`` (see [Image Support](https://github.com/qurator-spk/neat/blob/master/README.md#28-image-support) for further details) | ||||||
| * finally, columns 7+ are used for storing ``left,right,top,bottom`` pixel coordinates for facsimile snippets  | * finally, columns 7+ are used for storing ``left,right,top,bottom`` pixel coordinates for facsimile snippets  | ||||||
|  |  | ||||||
| Example (full): | Example (full): | ||||||
|  | @ -105,7 +105,7 @@ No.	TOKEN	NE-TAG	NE-EMB	GND-ID	url_id	left,right,top,bottom | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| #### 2.4 Data preparation   | #### 2.4 Data preparation   | ||||||
| The source data that is used for annotation are OCR results in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) format. We provide a [Python tool](https://github.com/qurator-spk/page2tsv) that supports the transformation of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) OCR files into the [TSV format](https://github.com/qurator-spk/neath/blob/master/README.md#23-data-format) required for use with [neath](https://github.com/qurator-spk/neath). | The source data that is used for annotation are OCR results in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) format. We provide a [Python tool](https://github.com/qurator-spk/page2tsv) that supports the transformation of [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) OCR files into the [TSV format](https://github.com/qurator-spk/neat/blob/master/README.md#23-data-format) required for use with [neat](https://github.com/qurator-spk/neat). | ||||||
|  |  | ||||||
| #### 2.5 Provenance | #### 2.5 Provenance | ||||||
| The processing pipeline applied at the Berlin State Library comprises the follows steps:  | The processing pipeline applied at the Berlin State Library comprises the follows steps:  | ||||||
|  | @ -115,7 +115,7 @@ Layout Analysis & Textline Extraction @[sbb_textline_detector](https://github.co | ||||||
| 2. OCR & Word Segmentation     | 2. OCR & Word Segmentation     | ||||||
| OCR is based on [OCR-D](https://github.com/OCR-D)'s [ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr) which requires [Tesseract](https://github.com/tesseract-ocr/tesseract) **>= 4.1.0**. The [GT4HistOCR_2000000](https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/GT4HistOCR_2000000.traineddata) model, which is [trained](https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR) on the [GT4HistOCR](https://zenodo.org/record/1344132) corpus, is used. Further details are available in the [paper](https://arxiv.org/abs/1809.05501). | OCR is based on [OCR-D](https://github.com/OCR-D)'s [ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr) which requires [Tesseract](https://github.com/tesseract-ocr/tesseract) **>= 4.1.0**. The [GT4HistOCR_2000000](https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/GT4HistOCR_2000000.traineddata) model, which is [trained](https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR) on the [GT4HistOCR](https://zenodo.org/record/1344132) corpus, is used. Further details are available in the [paper](https://arxiv.org/abs/1809.05501). | ||||||
| 3. TSV Transformation    | 3. TSV Transformation    | ||||||
| A simple [Python tool](https://github.com/qurator-spk/page2tsv) is used for the transformation of the OCR results in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neath/blob/master/docs/README.md#23-data-format). | A simple [Python tool](https://github.com/qurator-spk/page2tsv) is used for the transformation of the OCR results in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) to [TSV](https://github.com/qurator-spk/neat/blob/master/docs/README.md#23-data-format). | ||||||
| 4. Tokenization     | 4. Tokenization     | ||||||
| For tokenization, [SoMaJo](https://github.com/tsproisl/SoMaJo) is used. | For tokenization, [SoMaJo](https://github.com/tsproisl/SoMaJo) is used. | ||||||
| 5. Named Entity Recognition     | 5. Named Entity Recognition     | ||||||
|  | @ -180,13 +180,13 @@ For Named Entity Recognition, a [BERT-Base](https://github.com/google-research/b | ||||||
| * left-click the `POSITION` of a row and select `start-sentence` from the drop-down menu to start a new sentence | * left-click the `POSITION` of a row and select `start-sentence` from the drop-down menu to start a new sentence | ||||||
|  |  | ||||||
| #### 2.8 Image Support | #### 2.8 Image Support | ||||||
| Provided facsimile images are available online via the [iiif.io](https://iiif.io/) Image API, [neath](https://github.com/qurator-spk/neath) supports the embedding of facsimile snippets into its interface to help with data annotation and correction.  | Provided facsimile images are available online via the [iiif.io](https://iiif.io/) Image API, [neat](https://github.com/qurator-spk/neat) supports the embedding of facsimile snippets into its interface to help with data annotation and correction.  | ||||||
| This further requires that OCR with word segmentation is applied to the image to determine bounding boxes for tokens.  | This further requires that OCR with word segmentation is applied to the image to determine bounding boxes for tokens.  | ||||||
|  |  | ||||||
| The iiif-image-url contained in the source ``#`` can then be used as a replacement for ``url_id`` in combination with the token bounding boxes as ``left,right,top,bottom`` to obtain the facsimile snippet url and display the image in the leftmost column. Clicking on the facsimile snippet opens up a new tab with a larger context. | The iiif-image-url contained in the source ``#`` can then be used as a replacement for ``url_id`` in combination with the token bounding boxes as ``left,right,top,bottom`` to obtain the facsimile snippet url and display the image in the leftmost column. Clicking on the facsimile snippet opens up a new tab with a larger context. | ||||||
|  |  | ||||||
| #### 2.9 Saving progress | #### 2.9 Saving progress | ||||||
| [neath](https://github.com/qurator-spk/neath) runs fully locally in the browser. Therefore it can not automatically save any changes you made to disk. You have to use the `Save Changes` button in order to so manually from time to time. If your browser automatically saves all downloads to your `Downloads` folder, you might want to configure it so that it instead prompts you where to save. | [neat](https://github.com/qurator-spk/neat) runs fully locally in the browser. Therefore it can not automatically save any changes you made to disk. You have to use the `Save Changes` button in order to so manually from time to time. If your browser automatically saves all downloads to your `Downloads` folder, you might want to configure it so that it instead prompts you where to save. | ||||||
|  |  | ||||||
| ### 3. Annotation Guidelines | ### 3. Annotation Guidelines | ||||||
| The most recent version of the [Annotation Guidelines](https://github.com/qurator-spk/neath/blob/master/Annotation_Guidelines.pdf) is included in this repository.  | The most recent version of the [Annotation Guidelines](https://github.com/qurator-spk/neat/blob/master/Annotation_Guidelines.pdf) is included in this repository.  | ||||||
|  |  | ||||||
|  | @ -2,8 +2,8 @@ | ||||||
| <html> | <html> | ||||||
| <head> | <head> | ||||||
|     <meta charset="UTF-8"> |     <meta charset="UTF-8"> | ||||||
|     <title>neath</title> |     <title>neat</title> | ||||||
|     <base href="neath.html" target="_blank"> |     <base href="neat.html" target="_blank"> | ||||||
|     <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" |     <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" | ||||||
|           integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous"> |           integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous"> | ||||||
|     <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/5.1.0/papaparse.min.js"></script> |     <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/5.1.0/papaparse.min.js"></script> | ||||||
|  | @ -55,8 +55,8 @@ | ||||||
|         <div class="col-9"> |         <div class="col-9"> | ||||||
|             <div class="row"> |             <div class="row"> | ||||||
|                 <div class="col text-center"> |                 <div class="col text-center"> | ||||||
|                     <h3><a href="https://github.com/qurator-spk/neath" target="_blank" tabindex="-1">neath</a>: named entity annotation tool</h3> |                     <h3><a href="https://github.com/qurator-spk/neat" target="_blank" tabindex="-1">neat</a>: named entity annotation tool</h3> | ||||||
|                     <a href="https://github.com/qurator-spk/neath/blob/master/README.md#2-user-guide" target="_blank"  tabindex="-1">User Guide</a> | <a href="https://github.com/qurator-spk/neath/blob/master/Annotation_Guidelines.pdf" target="_blank" tabindex="-1">Annotation Guidelines</a> | <a href="https://github.com/qurator-spk/neath/issues" target="_blank" tabindex="-1">Issues</a><hr> |                     <a href="https://github.com/qurator-spk/neat/blob/master/README.md#2-user-guide" target="_blank"  tabindex="-1">User Guide</a> | <a href="https://github.com/qurator-spk/neat/blob/master/Annotation_Guidelines.pdf" target="_blank" tabindex="-1">Annotation Guidelines</a> | <a href="https://github.com/qurator-spk/neat/issues" target="_blank" tabindex="-1">Issues</a><hr> | ||||||
|                 </div> |                 </div> | ||||||
|             </div> |             </div> | ||||||
|         </div> |         </div> | ||||||
|  | @ -70,7 +70,7 @@ | ||||||
|             </a> |             </a> | ||||||
|         </div> |         </div> | ||||||
|         <div class="col-8 text-center" id="tableregion"> |         <div class="col-8 text-center" id="tableregion"> | ||||||
|             Please upload a TSV<sup>(<a href="https://github.com/qurator-spk/neath/blob/master/User_Guide.md#22-data-format">i</a>)</sup> file: |             Please upload a TSV<sup>(<a href="https://github.com/qurator-spk/neat/blob/master/User_Guide.md#22-data-format">i</a>)</sup> file: | ||||||
|             <br><br> |             <br><br> | ||||||
|             <input type="file" id="tsv-file" name="files"/> |             <input type="file" id="tsv-file" name="files"/> | ||||||
|         </div> |         </div> | ||||||
|  | @ -88,6 +88,6 @@ | ||||||
|  |  | ||||||
| </div> | </div> | ||||||
|  |  | ||||||
| <script src="neath.js"></script> | <script src="neat.js"></script> | ||||||
| </body> | </body> | ||||||
| </html> | </html> | ||||||
|  | @ -1009,14 +1009,14 @@ $(document).ready( | ||||||
|                 loadFile ( evt, |                 loadFile ( evt, | ||||||
|                     function(results, file, urls) { |                     function(results, file, urls) { | ||||||
|  |  | ||||||
|                         let neath = setupInterface(results, file, urls); |                         let neat = setupInterface(results, file, urls); | ||||||
|  |  | ||||||
|                         $(window).bind("beforeunload", |                         $(window).bind("beforeunload", | ||||||
|                             function() { |                             function() { | ||||||
|  |  | ||||||
|                                 console.log(neath.hasChanges()); |                                 console.log(neat.hasChanges()); | ||||||
|  |  | ||||||
|                                 if (neath.hasChanges()) |                                 if (neat.hasChanges()) | ||||||
|                                     return confirm("You have unsaved changes. Do you want to save them before leaving?"); |                                     return confirm("You have unsaved changes. Do you want to save them before leaving?"); | ||||||
|                             } |                             } | ||||||
|                         ); |                         ); | ||||||
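
The README touched by this commit describes an extended TSV format (``No.``, ``TOKEN``, ``NE-TAG``, ``NE-EMB``, ``GND-ID``, ``url_id``, then ``left,right,top,bottom``) and says the bounding box is combined with an IIIF image URL to fetch a facsimile snippet. The sketch below illustrates that idea only and is not taken from ``neat.js``. The helper names ``parseRow`` and ``iiifRegionUrl``, the sample row, and the ``example.org`` base URL are all hypothetical, and the choice of ``max/0/default.jpg`` for size, rotation and quality is an assumption based on the IIIF Image API URL pattern.

```javascript
// Hypothetical helper: split one line of the extended TSV format into named fields.
// Assumes the column order shown in the README header. The coordinates may arrive
// as one comma-separated field or as separate tab-separated fields, so both are accepted.
function parseRow(line) {
  const fields = line.split("\t");
  const [no, token, neTag, neEmb, gndId, urlId] = fields;

  // Everything from column 7 onward is treated as left,right,top,bottom.
  const coords = fields
    .slice(6)
    .join(",")
    .split(",")
    .filter((s) => s.trim() !== "")
    .map(Number);

  // Only build a box when exactly four finite numbers are present.
  const box =
    coords.length === 4 && coords.every(Number.isFinite)
      ? { left: coords[0], right: coords[1], top: coords[2], bottom: coords[3] }
      : null;

  return { no: Number(no), token, neTag, neEmb, gndId, urlId, box };
}

// Hypothetical helper: turn a left/right/top/bottom box into an IIIF Image API
// request. IIIF expects the region as x,y,w,h, so width and height are derived.
function iiifRegionUrl(baseUrl, box) {
  const width = box.right - box.left;
  const height = box.bottom - box.top;
  const region = [box.left, box.top, width, height].join(",");
  return `${baseUrl}/${region}/max/0/default.jpg`;
}

// Usage with a made-up row and a made-up image base URL.
const row = parseRow("3\tBerlin\tB-LOC\tO\t-\t0\t100,180,40,70");
if (row.box !== null) {
  console.log(iiifRegionUrl("https://example.org/iiif/page1", row.box));
}
```

Returning ``null`` for rows without a complete box keeps rows that carry no image coordinates usable for plain text annotation.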