Extract the MODS metadata of a bunch of METS files into a pandas DataFrame.
Column names are derived from the corresponding MODS elements. Some domain knowledge is used to turn elements into useful columns; for example, topics are stored as sets instead of ordered lists. Parts of the tool are specific to the environment and needs of the State Library Berlin and may need to be changed for your library.
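To illustrate the kind of flattening described above, here is a minimal sketch that uses only Python's standard library. It is not modstool's implementation. The function name mods_to_row, the column naming scheme (parent and child element names joined with an underscore, e.g. titleInfo_title) and the subject_topic column are assumptions made for this example. The sketch reads the first mods:mods element of a METS document, copies simple text elements into columns, and collects all subject/topic values into a set, so repeated or reordered topics compare equal.

```python
import xml.etree.ElementTree as ET

# Namespaces used by METS and MODS documents.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}


def local_name(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def mods_to_row(mets_xml):
    """Flatten the first MODS section of a METS document into a dict.

    Hypothetical scheme: leaf elements become columns named after the element,
    one level of nesting becomes 'parent_child'. Later elements with the same
    column name overwrite earlier ones (a simplification for this sketch).
    All subject/topic values are collected into one set.
    """
    root = ET.fromstring(mets_xml)
    mods = root.find(".//mods:mods", NS)
    if mods is None:
        return {}
    row = {}
    topics = set()
    for child in mods:
        name = local_name(child.tag)
        if name == "subject":
            # Sets discard order and duplicates across all subject elements.
            for topic in child.findall("mods:topic", NS):
                if topic.text and topic.text.strip():
                    topics.add(topic.text.strip())
        elif len(child) == 0 and child.text and child.text.strip():
            row[name] = child.text.strip()
        else:
            for sub in child:
                if sub.text and sub.text.strip():
                    row[f"{name}_{local_name(sub.tag)}"] = sub.text.strip()
    if topics:
        row["subject_topic"] = topics
    return row


SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo><mods:title>Example title</mods:title></mods:titleInfo>
          <mods:subject><mods:topic>Berlin</mods:topic></mods:subject>
          <mods:subject><mods:topic>Maps</mods:topic><mods:topic>Berlin</mods:topic></mods:subject>
          <mods:typeOfResource>text</mods:typeOfResource>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""

print(mods_to_row(SAMPLE))
```

For the sample document this yields the keys titleInfo_title, typeOfResource and subject_topic, where subject_topic is the set {'Berlin', 'Maps'} even though 'Berlin' appears twice. A list of such dicts can be passed to pandas.DataFrame to get one row per METS file.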
Usage
modstool /path/to/a/directory/containing/mets_files
Example
In this example we convert the MODS metadata contained in the METS files in /srv/data/digisam_mets-sample-300 to a pandas DataFrame, which is written to mods_info_df.pkl. Your data scientist can then load this file using pd.read_pickle().
% modstool /srv/data/digisam_mets-sample-300
INFO:root:Scanning directory /srv/data/digisam_mets-sample-300
301it [00:00, 19579.19it/s]
INFO:root:Processing METS files
100%|████████████████████████████████████████| 301/301 [00:01<00:00, 162.59it/s]
INFO:root:Writing DataFrame to mods_info_df.pkl
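Once mods_info_df.pkl has been written, it can be loaded with pd.read_pickle(). The sketch below round-trips a small, made-up DataFrame through a temporary pickle file so that it runs without real METS data. The column names titleInfo_title and subject_topic are hypothetical and will differ from the columns modstool actually produces. The sketch also shows that set-valued columns, such as the topic sets mentioned above, come back as Python sets and can be filtered with ordinary membership tests.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the file modstool writes; these columns are invented for illustration.
sample = pd.DataFrame(
    {
        "titleInfo_title": ["Example title", "Another title"],
        "subject_topic": [{"Berlin", "Maps"}, {"Music"}],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mods_info_df.pkl")
    sample.to_pickle(path)

    # This is the step you would run on the real output file.
    df = pd.read_pickle(path)

print(df.shape)

# Set-valued cells survive pickling, so membership tests work directly.
has_berlin = df["subject_topic"].apply(lambda topics: "Berlin" in topics)
print(df.loc[has_berlin, "titleInfo_title"].tolist())
```

With the real file, only the pd.read_pickle("mods_info_df.pkl") call is needed; the temporary directory exists solely to keep the sketch self-contained.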