{"id":20284058,"url":"https://github.com/ccao-data/extract-permits","last_synced_at":"2026-06-01T06:31:50.673Z","repository":{"id":207846451,"uuid":"719754937","full_name":"ccao-data/extract-permits","owner":"ccao-data","description":"Script and workflow for permit data extraction","archived":false,"fork":false,"pushed_at":"2026-05-29T19:35:39.000Z","size":1594,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-29T21:15:02.537Z","etag":null,"topics":["etl","gh-actions","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ccao-data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-11-16T20:45:29.000Z","updated_at":"2026-05-29T19:35:42.000Z","dependencies_parsed_at":"2024-01-08T22:22:40.997Z","dependency_job_id":"568c2a27-9ca0-4e99-8e44-669178dddf51","html_url":"https://github.com/ccao-data/extract-permits","commit_stats":null,"previous_names":["ccao-data/extract-permits"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ccao-data/extract-permits","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Fextract-permits","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Fextract-permits/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Fextract-permits/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Fextract-permits/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ccao-data","download_url":"https://codeload.github.com/ccao-data/extract-permits/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Fextract-permits/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33763648,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["etl","gh-actions","python"],"created_at":"2024-11-14T14:18:11.628Z","updated_at":"2026-06-01T06:31:50.667Z","avatar_url":"https://github.com/ccao-data.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# extract-permits\n\nScripts and workflows for permit data extraction.\n\nCurrently, the only permits we extract are from the City of Chicago data\nportal. 
The code that extracts these permits is defined in the [`chicago/`\nsubdirectory](./chicago/) and forms the basis of the\n[`extract-chicago-permits`\nworkflow](https://github.com/ccao-data/extract-permits/actions/workflows/extract-chicago-permits.yaml).\n\n## Running the workflow\n\nTo run the workflow, navigate to [the page for the\nworkflow](https://github.com/ccao-data/extract-permits/actions/workflows/extract-chicago-permits.yaml),\nclick the **Run workflow** button to open a dropdown, and select your parameters:\n\n- **Use workflow from**: The git branch to use as the basis for running the\n  workflow. Unless you are testing changes, this should always be `main`.\n- **Start date**: The lower bound (inclusive) for a date range to use to filter\n  permits. Must be in `YYYY-MM-DD` format.\n- **End date**: The upper bound (inclusive) for a date range to use to filter\n  permits. Must be in `YYYY-MM-DD` format.\n- **Deduplicate**: Filter out permits that have already been extracted to our\n  data warehouse. We recommend leaving this option unchecked because we have\n  not extensively tested the deduplication logic. Instead, we only query\n  mutually-exclusive date ranges of permits to send to the Data Integrity team.\n\nOnce the workflow finishes running, it will upload a ZIP archive containing the\npermits to an AWS S3 bucket. It will also send a message containing a link to\nthe bucket to an AWS SNS topic dedicated to the workflow. If you subscribe to\nthat AWS SNS topic, you will receive an email with this link when the workflow\nhas finished running. 
Alternatively, the workflow also prints a link to the\nS3 bucket in its logs, so you can check the logs instead of subscribing to\nthe SNS topic.\n\n## Development\n\nFollow these instructions if you need to make changes to the permit extraction\nscripts.\n\nThese instructions are for Ubuntu, which is the only platform we've tested.\n\n### Installation\n\n#### Requirements\n\n* Python 3 with `uv` installed (pre-installed on the CCAO server)\n* [AWS CLI installed\n  locally](https://github.com/ccao-data/wiki/blob/master/How-To/Connect-to-AWS-Resources.md)\n  * You'll also need permissions for Athena, Glue, and S3\n* [`aws-mfa` installed locally](https://github.com/ccao-data/wiki/blob/master/How-To/Setup-the-AWS-Command-Line-Interface-and-Multi-factor-Authentication.md)\n\n#### Install Python dependencies\n\nRun the following commands to install Python dependencies:\n\n```bash\ncd chicago\nuv sync --frozen\n```\n\n### Run the script\n\nTo run the script, make sure you're in the `chicago/` subdirectory:\n\n```bash\ncd chicago\n```\n\nYou must also authenticate with AWS using MFA if you haven't already today:\n\n```bash\naws-mfa\n```\n\nThen, run the script:\n\n```bash\n# Arguments, in order:\n#   1. Lower bound for the date range (inclusive)\n#   2. Upper bound for the date range (inclusive)\n#   3. Boolean indicating whether to filter out permits that are already in\n#      our warehouse. We recommend not deduplicating because the logic has\n#      not been extensively tested\nuv run python3 permit_cleaning.py \u003cYYYY-MM-DD\u003e \u003cYYYY-MM-DD\u003e False\n```\n\nYou can also run the script using the `extract-chicago-permits` workflow. 
See\n[Running the workflow](#running-the-workflow) for instructions on how to do\nthat.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fccao-data%2Fextract-permits","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fccao-data%2Fextract-permits","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fccao-data%2Fextract-permits/lists"}