mirror of https://github.com/qurator-spk/modstool.git (synced 2025-06-09 19:59:57 +02:00)
🐛 Fix converting/writing out per-page information (e.g. structure information)
parent 1bf86bfb4c
commit a1f333f4a4
2 changed files with 14 additions and 12 deletions
11
README.md
@@ -10,6 +10,9 @@ instead of ordered lists for topics, etc. Parts of the tool are specific to
 our environment/needs at the State Library Berlin and may need to be changed for
 your library.
+
+Per-page information (e.g. structure information from the METS structMap) can
+be converted as well (`--output-page-info`).
 
 **alto4pandas** converts the metadata from ALTO files into a pandas DataFrame.
 
 Column names are derived from the corresponding ALTO elements. Some columns
@@ -31,14 +34,14 @@ In this example we convert the MODS metadata contained in the METS files in
 `mods_info_df.parquet`. This file can then be read by your data scientist using
 `pd.read_parquet()`.
 
-~~~
+```
 % mods4pandas /srv/data/digisam_mets-sample-300
 INFO:root:Scanning directory /srv/data/digisam_mets-sample-300
 301it [00:00, 19579.19it/s]
 INFO:root:Processing METS files
 100%|████████████████████████████████████████| 301/301 [00:01<00:00, 162.59it/s]
-INFO:root:Writing DataFrame to mods_info_df.pkl
-~~~
+INFO:root:Writing DataFrame to mods_info_df.parquet
+```
 
 In the next example we convert the metadata from the ALTO files in the test data
 directory:
@@ -56,5 +59,5 @@ Scanning directory qurator/mods4pandas/tests/data/alto/PPN715049151
 Scanning directory qurator/mods4pandas/tests/data/alto/749782137
 Scanning directory qurator/mods4pandas/tests/data/alto/weird-ns
 INFO:alto4pandas:Processing ALTO files
-INFO:alto4pandas:Writing DataFrame to alto_info_df.pkl
+INFO:alto4pandas:Writing DataFrame to alto_info_df.parquet
 ~~~
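The README text in this diff says the resulting file can be read with `pd.read_parquet()`, and the log lines change from a `.pkl` to a `.parquet` target. Below is a minimal sketch of the reading side, assuming pandas is installed along with a Parquet engine such as pyarrow. The helper name `load_info_df` is hypothetical and not part of the tool. It also accepts the older `.pkl` name shown in the removed lines, on the assumption that such files are ordinary pandas pickles.

```python
from pathlib import Path

import pandas as pd


def load_info_df(path):
    """Load a DataFrame file, choosing the pandas reader by file suffix.

    Hypothetical helper for illustration; not part of mods4pandas.
    """
    path = Path(path)
    if path.suffix == ".parquet":
        # pd.read_parquet needs a Parquet engine (e.g. pyarrow) at runtime.
        return pd.read_parquet(path)
    if path.suffix == ".pkl":
        # Assumption: older output files are standard pandas pickles.
        return pd.read_pickle(path)
    raise ValueError(f"Unsupported file type: {path.suffix!r}")


# Usage (assumes mods4pandas has already written this file):
# df = load_info_df("mods_info_df.parquet")
# print(df.shape)
```

Dispatching on the suffix keeps notebooks working across both output names. If only Parquet files are in use, calling `pd.read_parquet()` directly is enough.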