|
46847c5000
|
💄 Use loguru for logging/remove extra progress bars
Closes gh-42.
|
2025-08-08 14:34:06 +02:00 |
|
|
308d2430a8
|
🐛 Fix --help
The --help text mentioned the INPUT arguments which were renamed to
METS_FILES. Update the text so it fits the renamed arguments.
Fixes gh-44.
|
2025-08-08 14:14:11 +02:00 |
|
|
b5b88cf772
|
🐛 Add index column as regular column, too
Closes gh-63.
|
2025-08-08 13:59:47 +02:00 |
|
|
0928caa9d6
|
🧹 MODS: Only add type='text' if there is no type attribute
|
2025-08-08 12:36:25 +02:00 |
|
|
b9190a3695
|
🐛 Update test
|
2025-08-08 12:29:27 +02:00 |
|
|
f332f46e99
|
Merge branch 'master' of https://github.com/qurator-spk/mods4pandas
|
2025-08-08 12:06:48 +02:00 |
|
|
2af30598bd
|
✨ Be more flexible about recordIdentifiers
|
2025-08-08 12:06:35 +02:00 |
|
|
0855ccb66b
|
✨ Add --mets-files-list option to give a list of input files
|
2025-08-07 21:16:32 +02:00 |
|
|
4178f1e380
|
🧹 MODS: Fix typo in type=text fix
|
2025-08-07 20:50:12 +02:00 |
|
|
91502c519d
|
🧹 MODS: Warn if we see the incorrect tag 'mods:origininfo'
|
2025-08-07 20:48:37 +02:00 |
|
|
64aafbb88c
|
🧹 MODS: Handle mods:languageTerm with authority=iso639-2/rfc3066
|
2025-08-07 20:29:40 +02:00 |
|
|
231f53eb7a
|
🧹 MODS: Remove extra attributes in mods:titleInfo
|
2025-08-07 20:07:20 +02:00 |
|
|
2f00dfcce0
|
🧹 MODS: Add missing type='text' for mods:placeTerm
|
2025-08-07 19:32:37 +02:00 |
|
|
2c47a34c14
|
✔ Add some more test data
|
2025-08-07 18:48:59 +02:00 |
|
|
16155e72d6
|
✔ Add some more test data
|
2025-08-07 18:45:39 +02:00 |
|
|
fce59e72f9
|
💩 Work around missing mods:recordInfo
|
2025-08-07 18:27:40 +02:00 |
|
|
c8744829cf
|
Merge branch 'master' of https://github.com/qurator-spk/mods4pandas
|
2025-08-06 20:19:46 +02:00 |
|
|
a3fc34fcdc
|
✔ MODS: Check for Warnings
|
2025-08-06 20:19:34 +02:00 |
|
|
2f5c872563
|
🐛 Explicitly set con_page_info to None if we don't output page_info
|
2025-08-06 20:18:44 +02:00 |
|
|
eae273452c
|
🐛 MODS/get_mets_div: Return empty list in case an ID is not found
|
2025-08-06 20:17:10 +02:00 |
|
|
a9d650e345
|
✔ ALTO: Make sure we have inner types when testing SQLite conversion for 'object'
|
2025-08-06 20:16:16 +02:00 |
|
|
f59bcfbd63
|
🐛 Fix alto4pandas CLI
|
2025-08-06 20:00:19 +02:00 |
|
|
feb8d09126
|
🧹 MODS: Filter UUID
|
2025-08-05 20:53:43 +02:00 |
|
|
c25de380b4
|
✨ mods:issuance
|
2025-07-30 13:56:43 +02:00 |
|
|
814bc57401
|
🧹 Fix recordIdentifier attribute for 'zdb': use the correct 'source' instead of 'type'
|
2025-07-30 13:03:50 +02:00 |
|
|
b12973adb1
|
🐛 Fix alto4pandas CLI
|
2025-06-13 20:29:12 +02:00 |
|
|
01c1762d32
|
🎨 Remove unused var + use 'not in'
|
2025-06-13 19:35:48 +02:00 |
|
|
2511fe8ca4
|
🎨 Sort and remove unused imports
|
2025-06-13 19:22:30 +02:00 |
|
|
89b71dd5c4
|
🎨 Sort/format imports
|
2025-06-13 19:03:35 +02:00 |
|
|
d7c75914d9
|
🐛 Make mypy happier by making mods4pandas a real package
|
2025-06-13 19:00:07 +02:00 |
|
|
212df99436
|
🎨 Reformat (Black)
|
2025-06-12 09:51:02 +02:00 |
|
|
ac8740c33f
|
✔ Test if dtypes are as expected in produced Parquet files
|
2025-06-12 09:42:29 +02:00 |
|
|
215bfbb11f
|
✨ Represent sets as arrays in the Parquet file
|
2025-06-12 07:45:22 +02:00 |
|
|
ebdded90d6
|
🤓 Add type annotations (and related changes)
|
2025-06-12 07:02:23 +02:00 |
|
|
d685454c52
|
✨ page_info: Use boolean for indicator variable, str for hrefs
|
2025-06-11 20:41:44 +02:00 |
|
|
64ed7298da
|
✨ Make Layout_Page_WIDTH/HEIGHT integer values
|
2025-06-11 19:13:38 +02:00 |
|
|
a20c979351
|
🧹 Filter annoying UserWarning on every pandas import (on WSL)
|
2025-06-11 17:20:28 +02:00 |
|
|
62b93c760b
|
🤓 Add type annotations (and related changes)
|
2025-06-11 14:56:26 +02:00 |
|
|
580442a4c9
|
🤓 Add type annotations (and related changes)
|
2025-06-11 14:36:29 +02:00 |
|
|
ebe988cfff
|
🚧 Restore types before saving as Parquet
|
2025-06-04 21:10:10 +02:00 |
|
|
14172e3b81
|
🚧 Save Python types for later conversion
|
2025-06-04 20:32:07 +02:00 |
|
|
88a6c5f26f
|
🐛 alto4pandas: *Really* commit data to SQLite DB
|
2024-12-03 17:34:07 +01:00 |
|
|
4d6e1f4ff4
|
🐛 Add missing tag alto:fileIdentifier
|
2024-12-03 17:24:24 +01:00 |
|
|
ef4eeac7e2
|
🧹 Remove unused/obsolete code
|
2024-12-03 17:02:24 +01:00 |
|
|
6af4a6f671
|
🧹 Remove unused/obsolete code
|
2024-12-03 17:02:12 +01:00 |
|
|
39f7d8646a
|
🚧 Use temporary SQLite DB for alto4pandas, too
|
2024-11-29 15:53:00 +01:00 |
|
|
ca8f165955
|
🧹 Remove redundant comment
|
2024-11-28 20:05:55 +01:00 |
|
|
6981efb87c
|
🐛 Write page_info Parquet file again
|
2024-11-28 18:32:40 +01:00 |
|
|
11a04916f3
|
🐛 Write mods_info Parquet file again
|
2024-11-28 18:27:39 +01:00 |
|
|
abb20b8ba9
|
🐛 Add multivolume type 'multivolume_manuscript'
|
2024-11-28 14:25:27 +01:00 |
|