{"id":44343868,"url":"https://github.com/joemull/pyjade","last_synced_at":"2026-02-11T13:12:14.737Z","repository":{"id":37747641,"uuid":"259529986","full_name":"joemull/pyjade","owner":"joemull","description":"A data curation script for the Jane Addams Digital Edition","archived":false,"fork":false,"pushed_at":"2024-05-03T19:56:00.000Z","size":827,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-04T14:04:23.940Z","etag":null,"topics":["data-analysis","digital-humanities"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/joemull.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-28T04:21:21.000Z","updated_at":"2022-02-11T16:35:57.000Z","dependencies_parsed_at":"2023-01-17T17:15:31.065Z","dependency_job_id":null,"html_url":"https://github.com/joemull/pyjade","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/joemull/pyjade","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joemull%2Fpyjade","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joemull%2Fpyjade/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joemull%2Fpyjade/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joemull%2Fpyjade/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/joemull","download_url":"https://codeload.github.com/joemull/pyjade/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/joemull%2Fpyjade/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29333170,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T12:42:24.625Z","status":"ssl_error","status_checked_at":"2026-02-11T12:41:23.344Z","response_time":97,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","digital-humanities"],"created_at":"2026-02-11T13:12:14.050Z","updated_at":"2026-02-11T13:12:14.731Z","avatar_url":"https://github.com/joemull.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pyjade\n\nJoseph Muller\n\njhmuller@umich.edu\n\nPyjade is a program for interacting with the Omeka Classic MySQL database used by the [Jane Addams Digital Edition](https://github.com/joemull/pyjade/). The program exports, transforms, and curates datasets for use by data modelers, researchers, and programmers. 
Export format options include XLSX, CSV, JSON, HTML, and TXT.\n\n## Data model\nThe program structures the data according to the following model.\n\n![The data model relates entities including text, publication, event, person, organization, and location with relationships including written by, lived in, and located in.](jade_data_model.jpeg)\n\n## Installing the environment\nThe program runs on Python 3.7 and several external libraries, which are installed into a virtual environment created with [`virtualenv`](https://pypi.org/project/virtualenv/).\n\n1. Clone or download the repository and navigate into it with Terminal (Mac) or [Git Bash](https://gitforwindows.org/) (Windows).\n2. If you do not have `virtualenv` installed, install it with `pip install virtualenv` or [another installation method](https://virtualenv.pypa.io/en/latest/installation.html).\n3. Create a virtual environment.\n    ```\n    virtualenv venv\n    ```\n4. Activate the virtualenv.\n\n    Mac\n    ```\n    source venv/bin/activate\n    ```\n    Windows Git Bash\n    ```\n    source venv/Scripts/activate\n    ```\n\n5. Install the external libraries.\n    ```\n    pip install -r requirements.txt\n    ```\n6. Deactivate the virtual environment.\n    ```\n    deactivate\n    ```\n\n## Running the program\n1. Create a copy of `example_options.json` and name it `options.json`.\n2. Activate the virtual environment if not already activated (see above).\n3. Edit `options.json` as desired (see [Options](#Options)).\n4. Run the program.\n    ```\n    python pyjade.py\n    ```\n5. Choose between partial or full ingest.\n    ```\n    Loaded options file\n     'part' to test script (by using limit=1000 on the main SQL queries)\n     'full' to run full export\n    pyjade: part\n    ```\n6. Choose between cached or fresh data.\n    ```\n     'cached' to load from cache (if just output settings were changed)\n     'fresh' to load from RDBMS\n    pyjade: fresh\n    ```\n7. 
View time-stamped outputs in the `outputs` directory.\n8. Repeat steps 3–7 as needed.\n\n## Options\nThe `options.json` file is used to set up the SQL connection, control what data is exported, and provide frequently referenced information for ingesting the item elements from the SQL database.\n\n### `SQL`\nEnter the parameters for your SQL connection. Visit [`mysql-connector-python`](https://github.com/mysql/mysql-connector-python) for details.\n\n### `DATASET_OPTIONS`\n* `EXPORT` Whether to export any data. Set to `false` if you only want to ingest data from SQL. Overrides the subsequent five format export options if set to `false`.\n* `EXPORT_XLSX` Whether to include XLSX in each dataset.\n* `EXPORT_CSV` Whether to include CSV in each dataset.\n* `EXPORT_JSON` Whether to include JSON in each dataset.\n* `WRAP_JSON_RECORDS_IN_TYPE_BRANCHES` Recommended `false`, so that foreign keys in relations (`jade_id` in object position) can be easily looked up in one large dictionary without having to sort by type. If set to `true`, will segment the object-level dictionary into up to six separate dictionaries named by type.\n* `EXPORT_TXT` Whether to include TXT in each dataset.\n* `EXPORT_HTML` Whether to include HTML in each dataset.\n* `EXPORT_COMPLETE_CURATED_DATASET` Whether to export a complete dataset. (The total items ingested and exported can still be capped at runtime with `part`; see above.)\n* `EXPORT_SUBSETS` Whether to export any subsets, as specified below.\n* `SUBSET_PROPERTIES_AND_QUANTITIES` Set the property names for which to export value subsets (if `EXPORT_SUBSETS` is `true`), and how many sets to make for each. The script takes the top values by frequency of occurrence. 
For example,\n  ```json\n  \"dcterms_subject\" : 15,\n  ```\n  will export fifteen subsets containing objects tagged with the fifteen most frequently occurring subjects.\n* `OTHER_SUBSET_PROPERTIES_AND_QUANTITIES` Use as a place to hold property names you don't want to export.\n* `PROPERTIES_TO_INCLUDE_FOR_EACH_TYPE` Defines what properties are considered regular for each type. Include or exclude properties to export more or less data for each type.\n* `EXPORT_SEPARATE_SQL_CRUMBS` Whether to export properties that occur irregularly in the RDBMS (regular as defined above) as `sql_crumbs.xlsx`. The program otherwise treats these properties as noise.\n* `EXPORT_EVERYTHING_IN_SINGLE_XLSX_SHEET` Whether to create one big spreadsheet with all objects and properties. Includes lots of blank cells. Overwhelming but useful if you just want to see all the data in one place.\n* `OUTPUT_STATS` Whether to create markdown and XLSX files with tables of property and relation counts. Tables may have bugs, so edit before publishing.\n\n### `ELEMENT_DICTIONARY`\nIt is recommended that you leave this unchanged. Used by the script to translate SQL IDs to labels and to know which elements to ingest (mostly `DCTERMS_IN_USE` and `DESC_JADE_ELEMENTS`) and which object types to keep (`TYPES`). Most administrative data are excluded.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoemull%2Fpyjade","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoemull%2Fpyjade","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoemull%2Fpyjade/lists"}