-
8d6b97f6b3
🐛 Fix typo in XlsxWriter dependency
master
Mike Gerber
2024-08-02 05:55:47 +0200
-
7122f0265f
Remove direct CSV/Excel support
Mike Gerber
2024-07-31 11:09:34 +0200
-
a1f333f4a4
🐛 Fix converting/writing out per-page information (e.g. structure information)
Mike Gerber
2024-07-31 10:27:46 +0200
-
1bf86bfb4c
✔ Test on Python 3.12
Mike Gerber
2024-07-29 07:02:59 +0200
-
191867cdef
⚙ Make saving per-page information optional
Mike Gerber
2024-07-29 06:08:01 +0200
-
dd4febf24d
🚧 Write a Parquet file
Mike Gerber
2024-07-27 12:57:33 +0200
-
03d86ce68a
🐛 Fix tests
Mike Gerber
2024-07-25 13:26:12 +0200
-
ae650f70a2
⚙ Migrate to pyproject.toml
Mike Gerber
2024-07-25 13:20:18 +0200
-
187da054b0
⚙ Migrate to pyproject.toml
Mike Gerber
2024-07-25 12:45:22 +0200
-
acd9c5cd4b
Merge branch 'feat/page_info' of https://github.com/qurator-spk/mods4pandas into feat/page_info
Mike Gerber
2024-07-25 12:01:49 +0200
-
515d39bb1d
🚧 Workaround NumPy incompatibility by requiring < 2
Mike Gerber
2024-07-25 12:01:35 +0200
-
e9fca0f563
🐛 mods4pandas: Handle MultiVolumeWork (differently spelled type cp. to before) without structMap TYPE='PHYSICAL'
Gerber, Mike
2023-12-12 13:33:05 +0100
-
b8a2872582
🐛 mods4pandas: Handle periodical without structMap TYPE='PHYSICAL'
Gerber, Mike
2023-12-12 13:13:23 +0100
-
6226618f40
🐛 mods4pandas: Handle multivolume_work without structMap TYPE='PHYSICAL'
Gerber, Mike
2023-12-12 12:34:24 +0100
-
0acaa83163
⚡ Use relative predicate
Gerber, Mike
2023-12-09 12:48:07 +0100
-
8fc4eeeb3b
⚡ Make get_sets_file faster by using a lookup table
Gerber, Mike
2023-12-09 12:05:20 +0100
-
912e5d2b4a
⚡ Make get_struct_log faster by using precise predicates
Gerber, Mike
2023-12-09 11:40:45 +0100
-
448639b05b
⚡ Make get_struct_log faster by using precise predicates
Gerber, Mike
2023-12-09 11:35:24 +0100
-
1dac77a2f5
⚡ Make get_struct_log faster by using find(all) instead of xpath()
Gerber, Mike
2023-12-09 10:36:55 +0100
-
90c60ebb80
⚡ Make get_mets_file a lot faster by using find() instead of xpath()
Gerber, Mike
2023-12-09 10:24:38 +0100
-
16a3a3bcc8
✔ Fix tests on Python 3.8 by backporting removeprefix
Gerber, Mike
2023-12-09 09:21:10 +0100
-
8d0dc72ca2
✔ Enable/document profiling
Gerber, Mike
2023-12-08 16:28:45 +0100
-
8c269b35a4
✔ Test creation of page_info
Gerber, Mike
2023-12-08 15:58:59 +0100
-
f243dd204a
✒ Add comments for populating type indicator variables
Gerber, Mike
2023-11-27 16:36:45 +0100
-
ddffb76fb6
🐛 Fix getting parent elements if necessary
Gerber, Mike
2023-11-27 16:35:48 +0100
-
c5332ae80d
🚧 Write out page_info
Mike Gerber
2023-11-23 16:37:30 +0100
-
e51fa5750f
🧹 Remove debug noise
Mike Gerber
2023-11-23 16:08:49 +0100
-
b8980bbf25
🧹 page_info: Name structMap type columns a bit more consistently
Mike Gerber
2023-11-23 16:07:28 +0100
-
3ec0f8c62a
✔ CircleCI: Don't test on Python 3.12 yet
Mike Gerber
2023-11-23 15:15:29 +0100
-
e1238259b7
✔ CircleCI: Don't test on EOL Python 3.6/3.7, but test on 3.11/3.12
Mike Gerber
2023-11-23 15:09:05 +0100
-
3d920f2b50
🐛 Use List/Dict for type annotations to support ye olde Python
Mike Gerber
2023-11-23 15:04:27 +0100
-
968572168e
🧹 Extract a function to convert list[dict] to a DataFrame
Mike Gerber
2023-11-23 15:00:06 +0100
-
5c2dfa8505
✔ Add another (large) METS example
Mike Gerber
2023-11-23 11:26:33 +0100
-
889d36f0d4
✨ page_info: Retrieve filenames + structMap types
Mike Gerber
2023-11-22 18:11:14 +0100
-
dd3943eaf6
🧹 .gitignore pyenv's .python-version
Mike Gerber
2023-11-28 15:45:48 +0100
-
a769d89d0a
🎨 Rename test_modstool → test_mods4pandas
Mike Gerber
2023-11-10 17:58:53 +0100
-
5238c0600b
Merge branch 'master' of https://github.com/qurator-spk/mods4pandas
Mike Gerber
2023-11-10 17:57:46 +0100
-
7def0bccaf
🎨 Reformat test METS/MODS files (to make them easier to read)
Mike Gerber
2023-11-10 17:57:37 +0100
-
100b2a5e6c
🐛 Fix mods:relatedItem with mods:recordIdentifier source=dnb-ppn
Gerber, Mike
2023-04-17 19:21:43 +0200
-
4e7b8ed642
✨ Convert mods:relatedItem for types original and host
Mike Gerber
2023-04-14 12:53:11 +0200
-
6d8ba871eb
🎨 Fix link to CircleCI project
Gerber, Mike
2022-07-04 19:34:13 +0200
-
1dfdacc5a5
🎨 Rename the Python package to mods4pandas
Gerber, Mike
2022-07-04 19:28:34 +0200
-
9c0dce7a04
🎨 Rename modstool to mods4pandas in the last code parts
Gerber, Mike
2022-07-04 19:26:41 +0200
-
03d2fc9670
🎨 Rename qurator.modstool to qurator.mods4pandas
Gerber, Mike
2022-07-04 19:24:33 +0200
-
3c2e59f0ed
🎨 Rename qurator.modstool to qurator.mods4pandas
Gerber, Mike
2022-07-04 19:11:10 +0200
-
1d2c5e2d10
🎨 Rename modstool.py to mods4pandas.py
Gerber, Mike
2022-07-04 18:58:39 +0200
-
c48084de93
➡️ Rename modstool script to mods4pandas
Gerber, Mike
2022-06-29 17:25:39 +0200
-
3121621e14
📝 README: Fix typo
Gerber, Mike
2022-06-21 13:16:10 +0200
-
83befba3ab
📝 README: Fix markdown
Gerber, Mike
2022-06-21 13:15:14 +0200
-
f507370729
📝 README: Add some documentation for alto4pandas
Gerber, Mike
2022-06-21 13:12:44 +0200
-
6f2265a619
✔️ Add test data
Gerber, Mike
2022-06-21 12:54:26 +0200
-
c803ce0907
✨ Count all alto:String elements with TAGREFS attribute
Gerber, Mike
2022-06-17 17:59:34 +0200
-
a40716a320
✨ ALTO: Count alto:Tags
Gerber, Mike
2022-06-17 17:32:17 +0200
-
de50f13043
🚧 alto4pandas: Determine ALTO namespace for group
Gerber, Mike
2022-06-17 17:01:07 +0200
-
53a8db955c
🐛 Consistently use lxml for etree
Gerber, Mike
2022-06-16 19:42:44 +0200
-
a2fb3ee387
Merge branch 'feat/alto'
Gerber, Mike
2022-06-16 19:29:44 +0200
-
21f906ec7d
✨ Rename altotool to alto4pandas
Gerber, Mike
2022-06-16 19:27:54 +0200
-
3d2e53f739
✨ ALTO: Extract namespace == ALTO version
Gerber, Mike
2022-06-08 18:25:33 +0200
-
8285bdb423
🚧 ALTO: Calculate more descriptive statistics for String@WC
Gerber, Mike
2022-05-23 19:45:44 +0200
-
aa4e8e290d
🚧 ALTO: Move xpath_statistics to TagGroup class
Gerber, Mike
2022-05-23 19:39:21 +0200
-
9246519162
🚧 ALTO: Extract a function to calculate statistics on xpath expressions
Gerber, Mike
2022-05-23 19:33:54 +0200
-
e24a846ea2
🚧 ALTO: Calculate mean of String@WC
Gerber, Mike
2022-05-23 19:12:39 +0200
-
9b3db1cd1d
✨ ALTO: Support more ALTO versions
Gerber, Mike
2022-05-10 19:32:26 +0200
-
937e7d74eb
✨ ALTO: Support more ALTO versions
Gerber, Mike
2022-05-10 18:15:35 +0200
-
4bb3379ab1
🐛 Use tqdm's write() instead of logging during scanning
Gerber, Mike
2022-05-10 17:57:36 +0200
-
6a549968b5
🐛 Produce a text attribute even if the attribute has no value
Gerber, Mike
2022-05-10 17:47:38 +0200
-
c85356bd23
✨ ALTO: Support more ALTO versions
Gerber, Mike
2022-05-10 17:46:50 +0200
-
c91c9b1714
✨ ALTO: preProcessingStep/processingAgency/sourceImageInformation etc.
Gerber, Mike
2022-05-10 14:27:39 +0200
-
01326050d3
✨ ALTO: Handle PermissionErrors
Gerber, Mike
2022-05-09 18:28:31 +0200
-
10b8023dd6
✨ ALTO: Count Layout/Page/* elements
Gerber, Mike
2022-05-06 20:59:51 +0200
-
1c62085612
✨ ALTO: Count Layout/Page/* elements
Gerber, Mike
2022-05-06 20:28:55 +0200
-
c9737683b1
✨ ALTO: Add Layout/Page's attribute values
Gerber, Mike
2022-05-06 19:59:19 +0200
-
102b15ffa9
🧹 Do not duplicate ALTO metadata
Gerber, Mike
2022-05-06 19:36:50 +0200
-
6e2e0bd67a
🐛 Fix imports
Gerber, Mike
2022-05-05 11:10:59 +0200
-
e86369e76d
🚧 Add support for ALTO Description
Gerber, Mike
2022-05-04 20:02:27 +0200
-
08082d5fe8
✨ Support mods:partName
Gerber, Mike
2022-04-07 17:37:08 +0200
-
9227575555
🎨 Improve log/output a bit
Gerber, Mike
2022-04-07 16:35:18 +0200
-
db79960ba1
✨ Force singleton for shelfLocator
Gerber, Mike
2022-04-07 16:22:41 +0200
-
d35032067a
🐛 Fix install on Python 3.10
Gerber, Mike
2022-04-07 15:17:19 +0200
-
2ff15f3497
Merge branch 'master' of https://github.com/qurator-spk/modstool
Gerber, Mike
2022-04-06 19:12:45 +0200
-
e65cad772d
🐛 Require pandas ~ 1.0 to fix import
Gerber, Mike
2022-04-06 19:12:40 +0200
-
f9e418c460
✨ Optionally output to an Excel .xlsx file
Gerber, Mike
2022-04-06 19:08:38 +0200
-
0a9a66c2cc
✨ Optionally output to a CSV file
Gerber, Mike
2022-04-06 16:46:48 +0200
-
840045a54a
⚙️ Add VSCode settings
Gerber, Mike
2022-04-05 14:24:08 +0200
-
86d1154638
✔️ Add METS tests
Gerber, Mike
2022-04-01 16:05:07 +0200
-
f7c51d127d
⚙️ Add .editorconfig
Gerber, Mike
2022-04-01 16:04:47 +0200
-
ad2b0a1d9b
🧹 .gitignore
Gerber, Mike
2022-04-01 16:04:21 +0200
-
9a8f7f69bd
🐛 Handle multiple mods:role
Gerber, Mike
2022-04-01 14:52:54 +0200
-
75ff143a25
🐛 modstool: Fix handling multiple <mods:language>
Gerber, Mike
2022-04-01 14:02:19 +0200
-
5c48541dee
✔️ CircleCI: - Python 3.5 + Python 3.10
Gerber, Mike
2022-03-31 21:10:13 +0200
-
73333ea2e2
⚡ Include METS fileGrp counts
Gerber, Mike
2022-03-31 21:03:58 +0200
-
2399699990
✔️ CircleCI: Use non-deprecated "next-gen Docker convenience image"
Gerber, Mike
2022-03-31 19:07:11 +0200
-
93ce1505c4
✨ Handle namePart + nameIdentifier
Gerber, Mike
2022-03-31 18:57:35 +0200
-
c49ac4f6e8
🚧 modstool: Replace Travis with CircleCI
Gerber, Mike
2021-02-10 18:56:01 +0100
-
3cf596d5fe
Merge commit '1bc366706b2296c4d81bbe44a723dbd9bb585618'
Gerber, Mike
2021-02-10 18:52:19 +0100
-
1bc366706b
🚧 modstool: Replace Travis with CircleCI
Gerber, Mike
2021-02-10 18:46:25 +0100
-
662343fdcf
📝 modstool: Add LICENSE
Gerber, Mike
2019-10-11 13:41:33 +0200
-
7e07bd26bf
🤪 modstool: Fix setup.py keywords
Gerber, Mike
2019-08-29 16:28:06 +0200
-
d885d6bfa2
🤪 modstool: Fiddle with the damn packaging/namespace so that stuff works
Gerber, Mike
2019-08-29 16:15:48 +0200
-
5773b9c9b1
🐛 modstool: Handle multiple scriptTerms per language correctly
Gerber, Mike
2019-08-29 15:42:13 +0200