Mirror of https://github.com/qurator-spk/modstool.git, synced 2025-06-08 11:20:07 +02:00
📝 README: Add some documentation for alto4pandas
parent 6f2265a619
commit f507370729
1 changed file with 30 additions and 1 deletion
README.md
@@ -1,19 +1,29 @@
-Extract the MODS metadata of a bunch of METS files into a pandas DataFrame.
+Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames.

 [](https://circleci.com/gh/qurator-spk/modstool)

+**modstool** converts the MODS metadata from METS files into a pandas DataFrame.

 Column names are derived from the corresponding MODS elements. Some domain
 knowledge is used to convert elements to a useful column, e.g. produce sets
 instead of ordered lists for topics, etc. Parts of the tool are specific to
 our environment/needs at the State Library Berlin and may need to be changed for
 your library.

+**alto4pandas** converts the metadata from ALTO files into a pandas DataFrame.

+Column names are derived from the corresponding ALTO elements. Some columns
+contain descriptive statistics (e.g. counts or mean) of the corresponding ALTO
+elements or attributes.

 ## Usage
 ~~~
 modstool /path/to/a/directory/containing/mets_files
 ~~~

+~~~
+alto4pandas /path/to/a/directory/full/of/alto_files
+~~~

 ## Example
 In this example we convert the MODS metadata contained in the METS files in
@@ -29,3 +39,22 @@ INFO:root:Processing METS files
 100%|████████████████████████████████████████| 301/301 [00:01<00:00, 162.59it/s]
 INFO:root:Writing DataFrame to mods_info_df.pkl
 ~~~
+
+In the next example we convert the metadata from the ALTO files in the test data
+directory:
+
+~~~
+% alto4pandas qurator/modstool/tests/data/alto
+Scanning directory qurator/modstool/tests/data/alto
+Scanning directory qurator/modstool/tests/data/alto/PPN636777308
+Scanning directory qurator/modstool/tests/data/alto/734008031
+Scanning directory qurator/modstool/tests/data/alto/PPN895016346
+Scanning directory qurator/modstool/tests/data/alto/PPN640992293
+Scanning directory qurator/modstool/tests/data/alto/alto-ner
+Scanning directory qurator/modstool/tests/data/alto/PPN767883624
+Scanning directory qurator/modstool/tests/data/alto/PPN715049151
+Scanning directory qurator/modstool/tests/data/alto/749782137
+Scanning directory qurator/modstool/tests/data/alto/weird-ns
+INFO:alto4pandas:Processing ALTO files
+INFO:alto4pandas:Writing DataFrame to alto_info_df.pkl
+~~~
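
Both example runs end by writing a pickled DataFrame (`mods_info_df.pkl` and `alto_info_df.pkl`). The following is a minimal sketch of loading such a file for inspection, assuming it is an ordinary pandas pickle that `pandas.read_pickle` can read. The helper name `load_info_df` and the stand-in columns are hypothetical and are not taken from the tools' real output.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_info_df(path):
    """Load a pickled DataFrame (hypothetical helper, not part of modstool)."""
    df = pd.read_pickle(path)
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"{path} does not contain a pandas DataFrame")
    return df


def demo():
    # Stand-in data: the real column names are derived from the MODS/ALTO
    # elements and are not reproduced here.
    stand_in = pd.DataFrame(
        {"example_column": ["a", "b"], "example_count": [1, 2]}
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "alto_info_df.pkl"
        stand_in.to_pickle(path)    # simulate the file written by the tool
        return load_info_df(path)   # read it back into memory


if __name__ == "__main__":
    print(demo().head())
```

With a real run, something like `load_info_df("alto_info_df.pkl")` would replace the stand-in file. Pickle files can execute code when loaded, so only open files you produced yourself or otherwise trust.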