mirror of https://github.com/qurator-spk/modstool.git synced 2025-06-09 03:40:01 +02:00
Gerber, Mike db79960ba1 Force singleton for shelfLocator
Very few input sources contain multiple mods:shelfLocator elements
for a mods:Location (illegal according to DFG MODS Anwendungsprofil).
Force a singleton in this case.

Fixes gh-7.
2022-04-07 16:22:41 +02:00

Extract the MODS metadata of a bunch of METS files into a pandas DataFrame.


Column names are derived from the corresponding MODS elements. Some domain knowledge is used to convert elements into useful columns. For example, topics are stored as sets instead of ordered lists. Parts of the tool are specific to our environment and needs at the State Library Berlin and may need to be adapted for your library.
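The following is a minimal Python sketch of deriving columns from MODS elements. It uses only the standard library and is not modstool's actual implementation. The function names `local` and `mods_to_row`, the `parent_child` column naming scheme and the `subject_topic` column are hypothetical. The sketch flattens one `mods:mods` element into a row dict and collects all topics into a set, which discards their order and any duplicates.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def local(tag):
    # Strip the "{namespace}" prefix that ElementTree adds to tag names
    return tag.split("}", 1)[-1]


def mods_to_row(mods):
    """Flatten a <mods:mods> element into {column: value} (hypothetical naming scheme)."""
    row = {}
    topics = set()
    for child in mods:
        name = local(child.tag)
        if name == "subject":
            # Collect topics from all <mods:subject> elements into one set
            for topic in child.iter(f"{{{MODS_NS}}}topic"):
                if topic.text:
                    topics.add(topic.text.strip())
        elif len(child) == 0:
            # Leaf element: the column name is simply the element name
            row[name] = (child.text or "").strip()
        else:
            # Nested element: join parent and child names, e.g. "originInfo_dateIssued"
            for grandchild in child:
                if len(grandchild) == 0:
                    row[f"{name}_{local(grandchild.tag)}"] = (grandchild.text or "").strip()
    if topics:
        row["subject_topic"] = topics
    return row


SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Example title</mods:title></mods:titleInfo>
  <mods:originInfo><mods:dateIssued>1850</mods:dateIssued></mods:originInfo>
  <mods:subject><mods:topic>History</mods:topic></mods:subject>
  <mods:subject><mods:topic>Berlin</mods:topic><mods:topic>History</mods:topic></mods:subject>
</mods:mods>"""

row = mods_to_row(ET.fromstring(SAMPLE))
print(row)
```

In this sketch, a list of rows like this could then be passed to `pandas.DataFrame` to produce one row per METS file. The real tool handles many more elements, attributes and repeated elements than this example does.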

Usage

modstool /path/to/a/directory/containing/mets_files

Example

In this example, we convert the MODS metadata contained in the METS files in /srv/data/digisam_mets-sample-300 into a pandas DataFrame, which is saved as mods_info_df.pkl. Your data scientist can then load this file using pd.read_pickle().

% modstool /srv/data/digisam_mets-sample-300
INFO:root:Scanning directory /srv/data/digisam_mets-sample-300
301it [00:00, 19579.19it/s]
INFO:root:Processing METS files
100%|████████████████████████████████████████| 301/301 [00:01<00:00, 162.59it/s]
INFO:root:Writing DataFrame to mods_info_df.pkl
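Here is a small Python sketch of loading the result. The helper name `load_mods_info` is hypothetical. Its default path is the file name from the log output above. The only pandas call it relies on is `pd.read_pickle()`.

```python
import pandas as pd


def load_mods_info(path="mods_info_df.pkl"):
    """Read the pickled DataFrame written by modstool (hypothetical helper)."""
    # Only unpickle files from a trusted source: unpickling can execute arbitrary code
    df = pd.read_pickle(path)
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"{path} does not contain a pandas DataFrame")
    return df
```

For example, `df = load_mods_info()` followed by `df.columns.tolist()` lists the MODS-derived columns, and `df.head()` shows the first few records.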