mirror of https://github.com/qurator-spk/modstool.git (synced 2025-11-03 19:04:13 +01:00)

	📝 README: Add some documentation for alto4pandas
parent 6f2265a619
commit f507370729

1 changed files with 30 additions and 1 deletions

README.md (31 changes)
							| 
						 | 
					@@ -1,19 +1,29 @@
 | 
				
			||||||
Extract the MODS metadata of a bunch of METS files into a pandas DataFrame.
 | 
					Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
[](https://circleci.com/gh/qurator-spk/modstool)
 | 
					[](https://circleci.com/gh/qurator-spk/modstool)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					**modstool** converts the MODS metadata from METS files into a pandas DataFrame.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Column names are derived from the corresponding MODS elements. Some domain
 | 
					Column names are derived from the corresponding MODS elements. Some domain
 | 
				
			||||||
knowledge is used to convert elements to a useful column, e.g. produce sets
 | 
					knowledge is used to convert elements to a useful column, e.g. produce sets
 | 
				
			||||||
instead of ordered lists for topics, etc. Parts of the tool are specific to
 | 
					instead of ordered lists for topics, etc. Parts of the tool are specific to
 | 
				
			||||||
our environment/needs at the State Library Berlin and may need to be changed for
 | 
					our environment/needs at the State Library Berlin and may need to be changed for
 | 
				
			||||||
your library.
 | 
					your library.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					**alto4pandas** converts the metadata from ALTO files into a pandas DataFrame.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Column names are derived from the corresponding ALTO elements. Some columns
 | 
				
			||||||
 | 
					contain descriptive statistics (e.g. counts or mean) of the corresponding ALTO
 | 
				
			||||||
 | 
					elements or attributes.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
## Usage
 | 
					## Usage
 | 
				
			||||||
~~~
 | 
					~~~
 | 
				
			||||||
modstool /path/to/a/directory/containing/mets_files
 | 
					modstool /path/to/a/directory/containing/mets_files
 | 
				
			||||||
~~~
 | 
					~~~
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					~~~
 | 
				
			||||||
 | 
					alto4pandas /path/to/a/directory/full/of/alto_files
 | 
				
			||||||
 | 
					~~~
 | 
				
			||||||
 | 
					
 | 
				
			||||||
## Example
 | 
					## Example
 | 
				
			||||||
In this example we convert the MODS metadata contained in the METS files in
 | 
					In this example we convert the MODS metadata contained in the METS files in
 | 
				
			||||||
| 
						 | 
					@@ -29,3 +39,22 @@ INFO:root:Processing METS files
 | 
				
			||||||
100%|████████████████████████████████████████| 301/301 [00:01<00:00, 162.59it/s]
 | 
					100%|████████████████████████████████████████| 301/301 [00:01<00:00, 162.59it/s]
 | 
				
			||||||
INFO:root:Writing DataFrame to mods_info_df.pkl
 | 
					INFO:root:Writing DataFrame to mods_info_df.pkl
 | 
				
			||||||
~~~
 | 
					~~~
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In the next example we convert the metadata from the ALTO files in the test data
 | 
				
			||||||
 | 
					directory:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					~~~
 | 
				
			||||||
 | 
					% alto4pandas qurator/modstool/tests/data/alto
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/PPN636777308
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/734008031
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/PPN895016346
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/PPN640992293
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/alto-ner
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/PPN767883624
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/PPN715049151
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/749782137
 | 
				
			||||||
 | 
					Scanning directory qurator/modstool/tests/data/alto/weird-ns
 | 
				
			||||||
 | 
					INFO:alto4pandas:Processing ALTO files
 | 
				
			||||||
 | 
					INFO:alto4pandas:Writing DataFrame to alto_info_df.pkl
 | 
				
			||||||
 | 
					~~~
 | 
				
			||||||
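
The README text added in this commit says that some alto4pandas columns hold descriptive statistics (counts or means) of ALTO elements or attributes. The sketch below illustrates that idea only and is not the tool's actual code. It assumes a tiny hand-written ALTO fragment, a hypothetical helper `alto_to_row`, and hypothetical column names `String-count` and `String-WC-mean`; `WC` is the ALTO word-confidence attribute on `String` elements.

```python
import statistics
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO fragment, for illustration only (not from the test data).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="p1">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1">
            <String CONTENT="Staatsbibliothek" WC="0.9"/>
            <String CONTENT="zu" WC="0.7"/>
            <String CONTENT="Berlin" WC="0.8"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def alto_to_row(xml_text):
    """Flatten one ALTO document into a dict of summary statistics.

    Hypothetical helper; the column names are assumptions, not the
    names alto4pandas actually emits.
    """
    root = ET.fromstring(xml_text)
    # Compare local names so that any ALTO namespace version matches.
    strings = [el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "String"]
    # Collect word confidences, skipping String elements without a WC attribute.
    confidences = [float(el.get("WC")) for el in strings if el.get("WC") is not None]
    return {
        "String-count": len(strings),
        "String-WC-mean": statistics.mean(confidences) if confidences else None,
    }


row = alto_to_row(SAMPLE_ALTO)
print(row)
```

One such dict per ALTO file, collected in a list, can be passed to `pandas.DataFrame(...)` to obtain one row per file. The file that alto4pandas itself writes (`alto_info_df.pkl` in the log above) can presumably be loaded with `pandas.read_pickle`.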