https://github.com/qurator-spk/mods4pandas
Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis
- Host: GitHub
- URL: https://github.com/qurator-spk/mods4pandas
- Owner: qurator-spk
- License: apache-2.0
- Created: 2019-08-28T14:37:30.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2025-08-06T18:24:29.000Z (10 months ago)
- Last Synced: 2025-08-06T20:30:58.338Z (10 months ago)
- Topics: alto, alto-xml, data-analysis, digital-humanities, library, mets, mods, pandas, qurator
- Language: Python
- Homepage:
- Size: 435 KB
- Stars: 12
- Watchers: 2
- Forks: 0
- Open Issues: 30
Metadata Files:
- Readme: README-DEV.md
- License: LICENSE
README
To install the development dependencies:
```
pip install -r requirements-dev.txt
```
To run tests:
```
pip install -e .
pytest
```
To run a test with profiling:
1. Make sure Graphviz is installed
2. Run pytest with profiling enabled:
```
pytest --profile-svg -k test_page_info
```
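If you prefer a textual summary to the SVG, the raw profile can also be inspected with Python's standard `pstats` module. The following is a minimal sketch, not part of this project: it assumes the profiling plugin writes a cProfile dump to `prof/combined.prof` (the usual location for pytest-profiling, but check your own output), and the helper name `print_top_functions` is made up for illustration.

```python
import pstats
from pathlib import Path


def print_top_functions(prof_path, limit=10):
    """Print the `limit` entries with the highest cumulative time."""
    stats = pstats.Stats(str(prof_path))
    # strip_dirs() shortens file paths; "cumulative" sorts by time spent
    # in a function including everything it calls.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stats


if __name__ == "__main__":
    # Assumed output location of the profiling run; adjust if yours differs.
    prof_file = Path("prof") / "combined.prof"
    if prof_file.exists():
        print_top_functions(prof_file)
    else:
        print(f"No profile found at {prof_file}; run pytest with profiling first.")
```

Sorting by cumulative time tends to surface the high-level functions that dominate a test run, which is usually the first thing to look at before drilling into the graph.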
To directly test the CLIs using our test data, run:
```
mods4pandas --output-page-info "page_info_df.parquet" src/mods4pandas/tests/data/mets-mods
alto4pandas src/mods4pandas/tests/data/alto
```
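To check what the first command produced, the output file can be loaded back into pandas. This is a rough sketch under a few assumptions: that `page_info_df.parquet` is a regular Parquet file (as its extension suggests), and that a Parquet engine such as `pyarrow` is installed. The helper names `load_page_info` and `summarize` are hypothetical, and no column names are assumed.

```python
from pathlib import Path

import pandas as pd


def load_page_info(path):
    """Read the page-info output back into a DataFrame (assumes a Parquet file)."""
    return pd.read_parquet(path)


def summarize(df):
    """Return basic shape information without assuming any column names."""
    return {"rows": len(df), "columns": list(df.columns)}


if __name__ == "__main__":
    # File name taken from the command above; adjust if you chose another one.
    page_info_path = Path("page_info_df.parquet")
    if page_info_path.exists():
        print(summarize(load_page_info(page_info_path)))
    else:
        print(f"{page_info_path} not found; run the mods4pandas command first.")
```

A quick look at the row count and column list is often enough to confirm that the CLI ran over the test data as expected.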
# How to use pre-commit
This project optionally uses [pre-commit](https://pre-commit.com) to check commits. To use it:
- Install pre-commit, e.g. `pip install -r requirements-dev.txt`
- Install the repo-local git hooks: `pre-commit install`