https://github.com/isawnyu/oracc2csv
The Open Richly Annotated Cuneiform Corpus (ORACC) publishes JSON data for each of its projects. Sometimes you want the catalog data listing each text to be in CSV format. This package does that.
csv cuneiform json oracc
- Host: GitHub
- URL: https://github.com/isawnyu/oracc2csv
- Owner: isawnyu
- License: agpl-3.0
- Created: 2022-06-26T09:46:41.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-26T09:47:06.000Z (almost 4 years ago)
- Last Synced: 2025-02-15T06:36:41.793Z (about 1 year ago)
- Topics: csv, cuneiform, json, oracc
- Language: Python
- Homepage:
- Size: 8.68 MB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# oracc2csv
The [Open Richly Annotated Cuneiform Corpus (ORACC)](http://oracc.museum.upenn.edu/) publishes JSON data for each of its projects. Sometimes you want the catalog data listing each text to be in CSV format. This package does that.
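For orientation, here is a minimal sketch of the kind of conversion involved. It is not this package's code. It assumes, as a simplification, that a project's `catalogue.json` holds a top-level `members` object mapping each text ID to a flat dict of catalogue fields. The function name `catalogue_to_csv` and the `text_id` column are hypothetical, chosen only for illustration.

```python
import csv
import json
from pathlib import Path


def catalogue_to_csv(catalogue_path: Path, csv_path: Path) -> int:
    """Flatten an ORACC-style catalogue.json into a CSV file (hypothetical helper).

    Assumes the JSON has a top-level "members" object mapping each text ID
    to a flat dict of catalogue fields. Returns the number of rows written.
    """
    with catalogue_path.open(encoding="utf-8") as f:
        members = json.load(f)["members"]

    # Entries need not share the same fields, so build the header from the
    # union of all keys, keeping first-seen order after the text ID column.
    fieldnames = ["text_id"]
    seen = set(fieldnames)
    for fields in members.values():
        for key in fields:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)

    with csv_path.open("w", newline="", encoding="utf-8") as f:
        # restval="" writes an empty cell for any field an entry lacks.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for text_id, fields in members.items():
            writer.writerow({**fields, "text_id": text_id})
    return len(members)
```

Collecting the union of keys is the main design choice: catalogue entries in a project do not all carry the same fields, and a fixed header taken from the first entry would drop data from later ones. Non-string values (lists, nested objects) are simply stringified by the `csv` module in this sketch.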
This program was written by [Tom Elliott](https://orcid.org/0000-0002-4114-6677) for the [Institute for the Study of the Ancient World (NYU)](https://isaw.nyu.edu) and is Copyright 2022 by New York University. It is licensed under the GNU Affero General Public License (see LICENSE.txt).
## Install
Create a Python 3.10.4+ virtual environment. Download or clone this package from GitHub, then run the following from its top-level directory:
```
pip install -U -r requirements_dev.txt
```
## Use
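Download the zip file for the ORACC project you're interested in (e.g., http://oracc.org/json/hbtin.zip) and unzip it. Then run the oracc2csv `dump` script, giving it the project data (the example uses the unzipped `hbtin` directory) and the directory where the output should be written. The `-v` option raises the logging level from WARNING to INFO: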
```
> python scripts/dump.py -v ~/oracc/hbtin ~/scratch
INFO:root:logging level changed to INFO via command line option; was WARNING
INFO:oracc2csv:Loaded corpus from /Users/banana/oracc/hbtin:
HBTIN: Hellenistic Babylonia: Texts, Iconography, Names
Cuneiform texts, iconography and onomastic data from Hellenistic Babylonia, primarily from Uruk. HBTIN texts form the demonstrator corpus of the Berkeley Prosopography Service (BPS). Directed by Laurie Pearce at UC Berkeley.
572 entries
INFO:oracc2csv:Wrote corpus to /Users/banana/scratch
```
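The example above reads the project from `~/oracc/hbtin`, which appears to be the unzipped contents of `hbtin.zip`. If you prefer to script the download and unzip steps, here is a minimal standard-library sketch. `fetch_zip` and `extract_project` are hypothetical helpers, not part of oracc2csv; whether an archive unpacks into a single folder named after the project is an assumption, so `extract_project` returns the top-level entries to let you check.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_zip(url: str, zip_path: Path) -> Path:
    """Download a project zip (e.g. http://oracc.org/json/hbtin.zip) to zip_path."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response body straight to disk rather than holding it in memory.
    with urllib.request.urlopen(url) as response, zip_path.open("wb") as out:
        shutil.copyfileobj(response, out)
    return zip_path


def extract_project(zip_path: Path, dest: Path) -> list[str]:
    """Unzip a project archive into dest and return its top-level entries."""
    with zipfile.ZipFile(zip_path) as zf:
        # extractall creates dest and any needed subdirectories.
        zf.extractall(dest)
        # The first path component of each member shows where files landed,
        # e.g. a single folder named after the project, if the archive has one.
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})
```

For example, `extract_project(fetch_zip("http://oracc.org/json/hbtin.zip", Path.home() / "oracc" / "hbtin.zip"), Path.home() / "oracc")` would download the HBTIN archive into `~/oracc` and report the top-level names it unpacked, which tells you what directory to pass to `dump.py`.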