-
88a6c5f26f
🐛 alto4pandas: *Really* commit data to SQLite DB
fix/use-temp-sqlite3
Mike Gerber
2024-12-03 17:34:07 +0100
-
4d6e1f4ff4
🐛 Add missing tag alto:fileIdentifier
Mike Gerber
2024-12-03 17:24:24 +0100
-
ef4eeac7e2
🧹 Remove unused/obsolete code
Mike Gerber
2024-12-03 17:02:24 +0100
-
6af4a6f671
🧹 Remove unused/obsolete code
Mike Gerber
2024-12-03 17:02:12 +0100
-
39f7d8646a
🚧 Use temporary SQLite DB for alto4pandas, too
Mike Gerber
2024-11-29 15:53:00 +0100
-
ca8f165955
🧹 Remove redundant comment
Mike Gerber
2024-11-28 20:05:55 +0100
-
6981efb87c
🐛 Write page_info Parquet file again
Mike Gerber
2024-11-28 18:32:40 +0100
-
11a04916f3
🐛 Write mods_info Parquet file again
Mike Gerber
2024-11-28 18:27:39 +0100
-
abb20b8ba9
🐛 Add multivolume type 'multivolume_manuscript'
Mike Gerber
2024-11-28 14:25:27 +0100
-
8ee4c3d0bc
🐛 Normalize structure type names to lower case
Mike Gerber
2024-11-27 19:56:36 +0100
-
939967edc8
🐛 De-couple mods_info from page_info
Mike Gerber
2024-11-27 19:05:05 +0100
-
11d7b012ec
🐛 Leave tqdm progress bar to avoid confusion through other left-over progress bars
Mike Gerber
2024-11-27 19:03:35 +0100
-
22b62d7a2f
🐛 Remove output files before writing them again
Mike Gerber
2024-11-27 18:48:56 +0100
-
eeaad03686
🚧 Avoid nested quotes for Python < 3.12
Mike Gerber
2024-11-27 16:38:18 +0100
-
b385f27391
🚧 Write out page_info
Mike Gerber
2024-11-27 14:43:42 +0100
-
a1390699d4
🚧 Use a temporary sqlite db
Mike Gerber
2024-11-26 16:27:43 +0100
-
8d6b97f6b3
🐛 Fix typo in XlsxWriter dependency
master
Mike Gerber
2024-08-02 05:55:47 +0200
-
7122f0265f
Remove direct CSV/Excel support
Mike Gerber
2024-07-31 11:09:34 +0200
-
a1f333f4a4
🐛 Fix converting/writing out per-page information (e.g. structure information)
Mike Gerber
2024-07-31 10:27:46 +0200
-
1bf86bfb4c
✔ Test on Python 3.12
Mike Gerber
2024-07-29 07:02:59 +0200
-
191867cdef
⚙ Make saving per-page information optional
Mike Gerber
2024-07-29 06:08:01 +0200
-
dd4febf24d
🚧 Write a Parquet file
Mike Gerber
2024-07-27 12:57:33 +0200
-
03d86ce68a
🐛 Fix tests
Mike Gerber
2024-07-25 13:26:12 +0200
-
ae650f70a2
⚙ Migrate to pyproject.toml
Mike Gerber
2024-07-25 13:20:18 +0200
-
187da054b0
⚙ Migrate to pyproject.toml
Mike Gerber
2024-07-25 12:45:22 +0200
-
acd9c5cd4b
Merge branch 'feat/page_info' of https://github.com/qurator-spk/mods4pandas into feat/page_info
Mike Gerber
2024-07-25 12:01:49 +0200
-
-
515d39bb1d
🚧 Workaround NumPy incompatibility by requiring < 2
Mike Gerber
2024-07-25 12:01:35 +0200
-
e9fca0f563
🐛 mods4pandas: Handle MultiVolumeWork (spelled differently compared to before) without structMap TYPE='PHYSICAL'
Gerber, Mike
2023-12-12 13:33:05 +0100
-
b8a2872582
🐛 mods4pandas: Handle periodical without structMap TYPE='PHYSICAL'
Gerber, Mike
2023-12-12 13:13:23 +0100
-
6226618f40
🐛 mods4pandas: Handle multivolume_work without structMap TYPE='PHYSICAL'
Gerber, Mike
2023-12-12 12:34:24 +0100
-
0acaa83163
⚡ Use relative predicate
Gerber, Mike
2023-12-09 12:48:07 +0100
-
8fc4eeeb3b
⚡ Make get_mets_file faster by using a lookup table
Gerber, Mike
2023-12-09 12:05:20 +0100
-
912e5d2b4a
⚡ Make get_struct_log faster by using precise predicates
Gerber, Mike
2023-12-09 11:40:45 +0100
-
448639b05b
⚡ Make get_struct_log faster by using precise predicates
Gerber, Mike
2023-12-09 11:35:24 +0100
-
1dac77a2f5
⚡ Make get_struct_log faster by using find(all) instead of xpath()
Gerber, Mike
2023-12-09 10:36:55 +0100
-
90c60ebb80
⚡ Make get_mets_file a lot faster by using find() instead of xpath()
Gerber, Mike
2023-12-09 10:24:38 +0100
-
16a3a3bcc8
✔ Fix tests on Python 3.8 by backporting removeprefix
Gerber, Mike
2023-12-09 09:21:10 +0100
-
8d0dc72ca2
✔ Enable/document profiling
Gerber, Mike
2023-12-08 16:28:45 +0100
-
8c269b35a4
✔ Test creation of page_info
Gerber, Mike
2023-12-08 15:58:59 +0100
-
-
f243dd204a
✒ Add comments for populating type indicator variables
Gerber, Mike
2023-11-27 16:36:45 +0100
-
ddffb76fb6
🐛 Fix getting parent elements if necessary
Gerber, Mike
2023-11-27 16:35:48 +0100
-
c5332ae80d
🚧 Write out page_info
Mike Gerber
2023-11-23 16:37:30 +0100
-
e51fa5750f
🧹 Remove debug noise
Mike Gerber
2023-11-23 16:08:49 +0100
-
b8980bbf25
🧹 page_info: Name structMap type columns a bit more consistently
Mike Gerber
2023-11-23 16:07:28 +0100
-
3ec0f8c62a
✔ CircleCI: Don't test on Python 3.12 yet
Mike Gerber
2023-11-23 15:15:29 +0100
-
e1238259b7
✔ CircleCI: Don't test on EOL Python 3.6/3.7, but test on 3.11/3.12
Mike Gerber
2023-11-23 15:09:05 +0100
-
3d920f2b50
🐛 Use List/Dict for type annotations to support ye olde Python
Mike Gerber
2023-11-23 15:04:27 +0100
-
968572168e
🧹 Extract a function to convert list[dict] to a DataFrame
Mike Gerber
2023-11-23 15:00:06 +0100
-
5c2dfa8505
✔ Add another (large) METS example
Mike Gerber
2023-11-23 11:26:33 +0100
-
889d36f0d4
✨ page_info: Retrieve filenames + structMap types
Mike Gerber
2023-11-22 18:11:14 +0100
-
dd3943eaf6
🧹 .gitignore pyenv's .python-version
Mike Gerber
2023-11-28 15:45:48 +0100
-
a769d89d0a
🎨 Rename test_modstool → test_mods4pandas
Mike Gerber
2023-11-10 17:58:53 +0100
-
5238c0600b
Merge branch 'master' of https://github.com/qurator-spk/mods4pandas
Mike Gerber
2023-11-10 17:57:46 +0100
-
-
7def0bccaf
🎨 Reformat test METS/MODS files (to make them easier to read)
Mike Gerber
2023-11-10 17:57:37 +0100
-
100b2a5e6c
🐛 Fix mods:relatedItem with mods:recordIdentifier source=dnb-ppn
Gerber, Mike
2023-04-17 19:21:43 +0200
-
-
4e7b8ed642
✨ Convert mods:relatedItem for types original and host
Mike Gerber
2023-04-14 12:53:11 +0200
-
6d8ba871eb
🎨 Fix link to CircleCI project
Gerber, Mike
2022-07-04 19:34:13 +0200
-
1dfdacc5a5
🎨 Rename the Python package to mods4pandas
Gerber, Mike
2022-07-04 19:28:34 +0200
-
9c0dce7a04
🎨 Rename modstool to mods4pandas in the last code parts
Gerber, Mike
2022-07-04 19:26:41 +0200
-
03d2fc9670
🎨 Rename qurator.modstool to qurator.mods4pandas
Gerber, Mike
2022-07-04 19:24:33 +0200
-
3c2e59f0ed
🎨 Rename qurator.modstool to qurator.mods4pandas
Gerber, Mike
2022-07-04 19:11:10 +0200
-
1d2c5e2d10
🎨 Rename modstool.py to mods4pandas.py
Gerber, Mike
2022-07-04 18:58:39 +0200
-
c48084de93
➡️ Rename modstool script to mods4pandas
Gerber, Mike
2022-06-29 17:25:39 +0200
-
3121621e14
📝 README: Fix typo
Gerber, Mike
2022-06-21 13:16:10 +0200
-
83befba3ab
📝 README: Fix markdown
Gerber, Mike
2022-06-21 13:15:14 +0200
-
f507370729
📝 README: Add some documentation for alto4pandas
Gerber, Mike
2022-06-21 13:12:44 +0200
-
6f2265a619
✔️ Add test data
Gerber, Mike
2022-06-21 12:54:26 +0200
-
c803ce0907
✨ Count all alto:String elements with TAGREFS attribute
Gerber, Mike
2022-06-17 17:59:34 +0200
-
a40716a320
✨ ALTO: Count alto:Tags
Gerber, Mike
2022-06-17 17:32:17 +0200
-
de50f13043
🚧 alto4pandas: Determine ALTO namespace for group
Gerber, Mike
2022-06-17 17:01:07 +0200
-
53a8db955c
🐛 Consistently use lxml for etree
Gerber, Mike
2022-06-16 19:42:44 +0200
-
a2fb3ee387
Merge branch 'feat/alto'
Gerber, Mike
2022-06-16 19:29:44 +0200
-
-
21f906ec7d
✨ Rename altotool to alto4pandas
Gerber, Mike
2022-06-16 19:27:54 +0200
-
3d2e53f739
✨ ALTO: Extract namespace == ALTO version
Gerber, Mike
2022-06-08 18:25:33 +0200
-
8285bdb423
🚧 ALTO: Calculate more descriptive statistics for String@WC
Gerber, Mike
2022-05-23 19:45:44 +0200
-
aa4e8e290d
🚧 ALTO: Move xpath_statistics to TagGroup class
Gerber, Mike
2022-05-23 19:39:21 +0200
-
9246519162
🚧 ALTO: Extract a function to calculate statistics on xpath expressions
Gerber, Mike
2022-05-23 19:33:54 +0200
-
e24a846ea2
🚧 ALTO: Calculate mean of String@WC
Gerber, Mike
2022-05-23 19:12:39 +0200
-
9b3db1cd1d
✨ ALTO: Support more ALTO versions
Gerber, Mike
2022-05-10 19:32:26 +0200
-
937e7d74eb
✨ ALTO: Support more ALTO versions
Gerber, Mike
2022-05-10 18:15:35 +0200
-
4bb3379ab1
🐛 Use tqdm's write() instead of logging during scanning
Gerber, Mike
2022-05-10 17:57:36 +0200
-
6a549968b5
🐛 Produce a text attribute even if the attribute has no value
Gerber, Mike
2022-05-10 17:47:38 +0200
-
c85356bd23
✨ ALTO: Support more ALTO versions
Gerber, Mike
2022-05-10 17:46:50 +0200
-
c91c9b1714
✨ ALTO: preProcessingStep/processingAgency/sourceImageInformation etc.
Gerber, Mike
2022-05-10 14:27:39 +0200
-
01326050d3
✨ ALTO: Handle PermissionErrors
Gerber, Mike
2022-05-09 18:28:31 +0200
-
10b8023dd6
✨ ALTO: Count Layout/Page/* elements
Gerber, Mike
2022-05-06 20:59:51 +0200
-
1c62085612
✨ ALTO: Count Layout/Page/* elements
Gerber, Mike
2022-05-06 20:28:55 +0200
-
c9737683b1
✨ ALTO: Add Layout/Page's attribute values
Gerber, Mike
2022-05-06 19:59:19 +0200
-
102b15ffa9
🧹 Do not duplicate ALTO metadata
Gerber, Mike
2022-05-06 19:36:50 +0200
-
6e2e0bd67a
🐛 Fix imports
Gerber, Mike
2022-05-05 11:10:59 +0200
-
e86369e76d
🚧 Add support for ALTO Description
Gerber, Mike
2022-05-04 20:02:27 +0200
-
-
08082d5fe8
✨ Support mods:partName
Gerber, Mike
2022-04-07 17:37:08 +0200
-
9227575555
🎨 Improve log/output a bit
Gerber, Mike
2022-04-07 16:35:18 +0200
-
db79960ba1
✨ Force singleton for shelfLocator
Gerber, Mike
2022-04-07 16:22:41 +0200
-
d35032067a
🐛 Fix install on Python 3.10
Gerber, Mike
2022-04-07 15:17:19 +0200
-
2ff15f3497
Merge branch 'master' of https://github.com/qurator-spk/modstool
Gerber, Mike
2022-04-06 19:12:45 +0200
-
-
e65cad772d
🐛 Require pandas ~ 1.0 to fix import
Gerber, Mike
2022-04-06 19:12:40 +0200
-
f9e418c460
✨ Optionally output to an Excel .xlsx file
Gerber, Mike
2022-04-06 19:08:38 +0200
-
-
0a9a66c2cc
✨ Optionally output to a CSV file
Gerber, Mike
2022-04-06 16:46:48 +0200
-
840045a54a
⚙️ Add VSCode settings
Gerber, Mike
2022-04-05 14:24:08 +0200