{"id":23617020,"url":"https://github.com/willf/ejpipeline","last_synced_at":"2025-11-06T14:30:42.908Z","repository":{"id":268929020,"uuid":"899183758","full_name":"willf/ejpipeline","owner":"willf","description":"Environmental Justice Data Pipeline Tool","archived":false,"fork":false,"pushed_at":"2025-01-23T15:10:41.000Z","size":64,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-15T20:58:42.554Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/willf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-05T19:19:45.000Z","updated_at":"2025-01-23T15:10:44.000Z","dependencies_parsed_at":"2024-12-19T19:25:49.794Z","dependency_job_id":"c19f9d41-538f-4db3-9040-a0683c0fd7d6","html_url":"https://github.com/willf/ejpipeline","commit_stats":null,"previous_names":["willf/ejpipeline"],"tags_count":0,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fejpipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fejpipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fejpipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fejpipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/willf","download_url":"https://codeload.github.com/willf/ejpipeline/tar.gz/refs/hea
ds/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239494590,"owners_count":19648154,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-27T18:16:01.182Z","updated_at":"2025-02-18T15:24:42.874Z","avatar_url":"https://github.com/willf.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Environmental Justice Data Pipeline (EJPipeline)\n\n[![Python tests](https://github.com/willf/ejpipeline/actions/workflows/test.yml/badge.svg)](https://github.com/willf/ejpipeline/actions/workflows/test.yml)\n\nThis repository contains the code for the Environmental Justice Data Pipeline (EJPipeline) project. The EJPipeline is a data processing pipeline that collects data from various sources relevant to environmental justice and processes it to make it more accessible to researchers and the public. The pipeline is designed to be modular and extensible, allowing for the addition of new data sources and processing steps. We also aim to minimize the Python dependencies required to run the pipeline, so that it can be run on a wide variety of systems. Further, we aim to build on existing open source work as much as possible, and to contribute back to the open source community.\n\n## Installation\n\nThe current way to install the EJPipeline is to clone the repository and run it using `uv`. 
For example,\nif you have `gh` installed, you can run:\n\n```bash\ngh repo clone willf/ejpipeline\ncd ejpipeline\n```\n\n## Use\n\nTo run the pipeline, use `uv` to run the `data_pipeline/etl/base.py` script from the `ejpipeline` directory. For example:\n\n```bash\nPYTHONPATH=`pwd`:$PYTHONPATH uv run data_pipeline/etl/base.py\n```\n\nPass `--force` to run the pipeline even if the data is already up to date.\n\nYou will likely need to install the Playwright browsers before the first run:\n\n`uv run playwright install`\n\n## Development\n\nETL modules are located in the `data_pipeline/etl` directory. Each module should be a subclass of `BaseETL` and should implement the `extract`, `transform`, and `load` methods as necessary, as well as an `already_etled` method that checks whether the data is already up to date. See, for example, the `data_pipeline/etl/calenviroscreen/` module.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillf%2Fejpipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillf%2Fejpipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillf%2Fejpipeline/lists"}