{"id":29003762,"url":"https://github.com/prachisarode95/automated-spatial_data-qa","last_synced_at":"2025-08-29T14:47:24.663Z","repository":{"id":301130135,"uuid":"1006962752","full_name":"prachisarode95/Automated-Spatial_Data-QA","owner":"prachisarode95","description":"Python-based spatial QA tool for detecting geometry issues in OSM urban data using PostGIS. Outputs include CSV logs and GIS-ready files. Built to showcase automation skills for real-world QA workflows.","archived":false,"fork":false,"pushed_at":"2025-06-25T09:16:51.000Z","size":41696,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-25T10:20:00.352Z","etag":null,"topics":["error-handling","geopandas","jupyter-notebook","openstreetmap-data","osm2pgsql","pandas","pbf-data","pgadmin4-desktop","postgis-extension","postgis-sql","psycopg2","python-3","qa-automation","qgis","sqlalchemy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prachisarode95.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-23T08:58:20.000Z","updated_at":"2025-06-25T09:16:54.000Z","dependencies_parsed_at":"2025-06-25T10:20:28.549Z","dependency_job_id":null,"html_url":"https://github.com/prachisarode95/Automated-Spatial_Data-QA","commit_stats":null,"previous_names":["prachisarode95/automated-spatial_data-qa"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/prachisarode95/Automated-Spatial_Data-QA","repository_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prachisarode95%2FAutomated-Spatial_Data-QA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prachisarode95%2FAutomated-Spatial_Data-QA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prachisarode95%2FAutomated-Spatial_Data-QA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prachisarode95%2FAutomated-Spatial_Data-QA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/prachisarode95","download_url":"https://codeload.github.com/prachisarode95/Automated-Spatial_Data-QA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prachisarode95%2FAutomated-Spatial_Data-QA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261852242,"owners_count":23219637,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["error-handling","geopandas","jupyter-notebook","openstreetmap-data","osm2pgsql","pandas","pbf-data","pgadmin4-desktop","postgis-extension","postgis-sql","psycopg2","python-3","qa-automation","qgis","sqlalchemy"],"created_at":"2025-06-25T10:29:10.886Z","updated_at":"2025-08-29T14:47:21.826Z","avatar_url":"https://github.com/prachisarode95.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Automated Spatial QA for Urban GIS Layers using Python \u0026 PostGIS\r\n\r\n## Project Summary\r\nA Python-based QA engine to detect geometry errors in OSM urban datasets using PostGIS. 
Outputs include CSV reports and GIS-ready spatial files. Inspired by enterprise QA workflows and built to demonstrate spatial data automation skills.\r\n\r\n## Project Phases\r\n\r\n**Phase 1:** Project setup, virtual environment, database setup  \r\n**Phase 2:** Download OSM data, clip it to Pune, and import it into PostGIS using `osm2pgsql`  \r\n**Phase 3:** Run spatial QA checks (invalid geometries, overlaps, duplicates, slivers)  \r\n**Phase 4:** Generate CSV and GeoPackage outputs  \r\n\r\n---\r\n## Features\r\n- Runs entirely on Python + PostGIS\r\n- Validates spatial data stored in PostGIS\r\n- Detects invalid geometries, overlaps, slivers, zero-length lines, topology errors, and duplicates\r\n- Auto-fixes common geometry errors\r\n- Generates error logs and summary QA reports\r\n\r\n---\r\n## Sample QA Logic Implemented\r\n\r\n| Check             | PostGIS Function(s)               |\r\n| ----------------- | --------------------------------- |\r\n| Invalid Geometry  | `ST_IsValid()`                    |\r\n| Overlaps          | `ST_Overlaps()`                   |\r\n| Duplicates        | `ST_Equals()` + `ST_Intersects()` |\r\n| Slivers           | `ST_Area() \u003c threshold`           |\r\n| Zero-Length Lines | `ST_Length() = 0`                 |\r\n| Point Duplicates  | `ST_Equals()` between points      |\r\n\r\n---\r\n\r\n## Project Structure\r\n```\r\nautomated_spatial_qa/\r\n├── .gitignore                     # Files/folders to exclude from version control\r\n├── requirements.txt               # Conda or pip dependencies\r\n│\r\n├── data/                          # Input spatial data (e.g., .osm.pbf files)\r\n│   ├── pune.osm.pbf\r\n│   └── generic.lua\r\n│\r\n├── notebook/                      # Jupyter notebooks\r\n│   └── run_all_qa.ipynb           # Final automation notebook (with markdown + code)\r\n│\r\n├── outputs/                       # QA output files\r\n│   ├── pune_qa_errors.csv         # CSV report of all spatial QA issues\r\n│   └── pune_qa_errors.gpkg        # GeoPackage of 
invalid/duplicate geometries\r\n│\r\n├── scripts/                       # Python scripts for modular QA\r\n│   ├── run_all_qa.py              # Master script for full automation\r\n│   ├── run_qa_checks_polygons.py  # Polygon QA (invalid, overlap, sliver, duplicate)\r\n│   ├── run_qa_checks_lines.py     # Line QA (invalid, zero-length, duplicate)\r\n│   └── run_qa_checks_points.py    # Point QA (invalid, duplicate location)\r\n│\r\n└── sql/                           # Optional: future SQL-only QA scripts (raw)\r\n    └── (optional custom .sql files)\r\n```\r\n\r\n## Technologies Used\r\n- Python (3.10+)\r\n- PostgreSQL (v14+) with the PostGIS extension\r\n- GeoPandas, pandas, psycopg2, SQLAlchemy\r\n- osm2pgsql for importing OSM data into PostGIS\r\n- OpenStreetMap PBF data\r\n- Jupyter Notebook for interactive runs\r\n\r\n---\r\n\r\n## How to Run This Project\r\n\r\n### Clone the Repository\r\n```bash\r\ngit clone https://github.com/prachisarode95/Automated-Spatial_Data-QA.git\r\ncd Automated-Spatial_Data-QA\r\n```\r\n### Set Up the Conda Environment\r\n```bash\r\nconda create -n urbanqa_env python=3.11\r\nconda activate urbanqa_env\r\nconda install psycopg2 pandas geopandas sqlalchemy jupyter -c conda-forge\r\n```\r\n### Prepare the Database\r\nCreate a PostgreSQL database, enable the PostGIS extension, and import `data/pune.osm.pbf` into it with `osm2pgsql` (Phases 1 and 2).\r\n\r\n### Start the Notebook\r\n```bash\r\njupyter notebook\r\n```\r\nNote: Open `notebook/run_all_qa.ipynb` and run all cells. 
Alternatively, run `python scripts/run_all_qa.py` from the terminal.\r\n\r\n---\r\n\r\n## Outputs\r\n\r\n| Output Type           | Description                                     |\r\n| --------------------- | ----------------------------------------------- |\r\n| `pune_qa_errors.csv`  | Tabular log of all detected issues              |\r\n| `pune_qa_errors.gpkg` | Flagged geometries, viewable in QGIS/ArcGIS     |\r\n| `.ipynb` notebook     | Entire spatial QA pipeline in a single notebook |\r\n\r\n---\r\n## Output Visualization\r\n![Visualizing Spatial QA Errors](https://github.com/user-attachments/assets/eca582f6-4627-46e5-9bb3-5e0dbd7e6282)\r\n\r\n---\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprachisarode95%2Fautomated-spatial_data-qa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprachisarode95%2Fautomated-spatial_data-qa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprachisarode95%2Fautomated-spatial_data-qa/lists"}