https://github.com/ondata/appaltipop
ETL scripts and issue tracking for AppaltiPOP project.
appaltipop csv elasticsearch json json-schema jsonl jupiter python
- Host: GitHub
- URL: https://github.com/ondata/appaltipop
- Owner: ondata
- License: mit
- Created: 2020-02-02T11:10:32.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T10:22:01.000Z (over 3 years ago)
- Last Synced: 2026-03-10T01:32:20.560Z (20 days ago)
- Topics: appaltipop, csv, elasticsearch, json, json-schema, jsonl, jupiter, python
- Language: Jupyter Notebook
- Homepage: https://www.appaltipop.it
- Size: 138 MB
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# AppaltiPOP
This repository is intended for project tracking. Here you can also find raw data and utilities for validation and indexing.
## Data
Pipeline:
- start: json files (an array of objects per source) in `json` folder
- then: jsonl files (same data, but one object per line) in `jsonl` folder
- finally: indexing in `elasticsearch` folder
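
The pipeline above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repository's actual scripts: the function names `json_to_jsonl` and `bulk_body` and the index name passed in are hypothetical. The second function only builds the newline-delimited text that the Elasticsearch `_bulk` endpoint accepts (an action line followed by the document); it does not send anything.

```python
import json
from pathlib import Path


def json_to_jsonl(src: Path, dst: Path) -> int:
    """Rewrite a JSON array of objects as one JSON object per line.

    Returns the number of records written. (Hypothetical helper.)
    """
    records = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError(f"{src} does not contain a JSON array")
    with dst.open("w", encoding="utf-8") as out:
        for record in records:
            # ensure_ascii=False keeps accented characters readable in the output
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


def bulk_body(jsonl_path: Path, index_name: str) -> str:
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint.

    Each document is preceded by an "index" action line. (Hypothetical helper;
    index_name is whatever index you choose.)
    """
    lines = []
    with jsonl_path.open(encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            lines.append(json.dumps({"index": {"_index": index_name}}))
            lines.append(line)
    # The bulk format requires every line, including the last, to end with a newline
    return "".join(item + "\n" for item in lines)
```

Keeping the conversion and the bulk-body construction separate mirrors the folder layout: the `jsonl` output can be inspected or validated on its own before anything is indexed.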
## Schema
You can validate all files against the [JSON Schema](https://json-schema.org/) definitions in the `schema` folder. Refer to the README file in each folder for further information. You need Python 3 and virtual environments managed by [pipenv](https://pipenv.pypa.io/en/latest/).
General usage:
- `cd [folder]`
- `pipenv shell`
- `pipenv install` (only the first time)
- `python [script] [...args]` (inside the virtual env) or `pipenv run python [script] [...args]`
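
As an illustration of the validation step described under Schema, the sketch below checks each line of a jsonl file against a schema. It assumes the third-party `jsonschema` package and the Draft 7 dialect; both are assumptions, and the scripts in the `schema` folder may use a different validator, draft, or interface. The function name `validate_jsonl` is hypothetical.

```python
import json
from pathlib import Path

from jsonschema import Draft7Validator  # assumption: Draft 7; adjust to the schema's draft


def validate_jsonl(jsonl_path: Path, schema: dict) -> list:
    """Return a list of (line_number, message) for every validation error.

    Hypothetical helper, not part of the repository.
    """
    validator = Draft7Validator(schema)
    problems = []
    with jsonl_path.open(encoding="utf-8") as src:
        for line_no, line in enumerate(src, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            # iter_errors reports all violations instead of stopping at the first
            for error in validator.iter_errors(record):
                problems.append((line_no, error.message))
    return problems
```

An empty list means every record passed; otherwise each entry points at the offending line of the jsonl file.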