https://github.com/prachisarode95/automated-spatial_data-qa
Python-based spatial QA tool for detecting geometry issues in OSM urban data using PostGIS. Outputs include CSV logs and GIS-ready files. Built to showcase automation skills for real-world QA workflows.
error-handling geopandas jupyter-notebook openstreetmap-data osm2pgsql pandas pbf-data pgadmin4-desktop postgis-extension postgis-sql psycopg2 python-3 qa-automation qgis sqlalchemy
- Host: GitHub
- URL: https://github.com/prachisarode95/automated-spatial_data-qa
- Owner: prachisarode95
- Created: 2025-06-23T08:58:20.000Z (6 days ago)
- Default Branch: main
- Last Pushed: 2025-06-25T09:16:51.000Z (4 days ago)
- Last Synced: 2025-06-25T10:20:00.352Z (4 days ago)
- Topics: error-handling, geopandas, jupyter-notebook, openstreetmap-data, osm2pgsql, pandas, pbf-data, pgadmin4-desktop, postgis-extension, postgis-sql, psycopg2, python-3, qa-automation, qgis, sqlalchemy
- Language: Python
- Homepage:
- Size: 39.8 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Title: Automated Spatial QA for Urban GIS Layers using Python & PostGIS
## Project Summary:
A Python-based QA engine to detect geometry errors in OSM urban datasets using PostGIS. Outputs include CSV reports and GIS-ready spatial files. Inspired by enterprise QA workflows and built to demonstrate spatial data automation skills.

## Project Phases:
- **Phase 1:** Project setup, virtual environment, database setup
- **Phase 2:** Download OSM data, clip to Pune, import into PostGIS using `osm2pgsql`
- **Phase 3:** Run spatial QA checks (invalid geometry, overlaps, duplicates, slivers)
- **Phase 4:** Generate CSV and GeoPackage output

---
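
The Phase 2 import step could be driven from Python roughly as follows. This is a minimal sketch, not the repository's actual code: the helper names `build_osm2pgsql_command` and `import_pbf`, the database name `urban_qa`, and the user `postgres` are all hypothetical, and the PBF file is assumed to have been clipped to Pune already.

```python
import subprocess


def build_osm2pgsql_command(pbf_path, database, user, host="localhost", port=5432):
    """Assemble an osm2pgsql invocation as an argument list (hypothetical helper)."""
    return [
        "osm2pgsql",
        "-c",              # create mode: import into fresh tables
        "-s",              # slim mode: keep intermediate data in the database
        "-d", database,    # target PostgreSQL database (PostGIS must be enabled)
        "-U", user,
        "-H", host,
        "-P", str(port),
        pbf_path,
    ]


def import_pbf(pbf_path, database, user):
    """Run the import; raises CalledProcessError if osm2pgsql exits non-zero."""
    subprocess.run(build_osm2pgsql_command(pbf_path, database, user), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to execute.
    print(" ".join(build_osm2pgsql_command("data/pune.osm.pbf", "urban_qa", "postgres")))
```

Building the command as a list (rather than a shell string) avoids quoting problems with paths; the database password would typically be supplied through the `PGPASSWORD` environment variable or a `.pgpass` file.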
## Features
- Runs entirely in Python + PostGIS
- Validate spatial data stored in PostGIS
- Detect invalid geometries, overlaps, topology errors, duplicates
- Auto-fix common geometry errors
- Generate error logs and summary QA reports

---
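
The README does not say how the auto-fix is implemented. One common PostGIS approach is `ST_MakeValid()`, sketched below as an assumption rather than the project's confirmed method. The table `planet_osm_polygon` and geometry column `way` are osm2pgsql's default names; the helper `build_repair_sql` is hypothetical.

```python
def build_repair_sql(table="planet_osm_polygon", geom_col="way"):
    """Return an UPDATE that repairs only the rows PostGIS reports as invalid.

    Table and column names are interpolated directly, so they must come from
    trusted constants, never from user input.
    """
    return (
        f"UPDATE {table} "
        f"SET {geom_col} = ST_MakeValid({geom_col}) "
        f"WHERE NOT ST_IsValid({geom_col});"
    )


if __name__ == "__main__":
    print(build_repair_sql())
```

`ST_MakeValid()` can change the geometry type (for example, a polygon may come back as a multipolygon or a geometry collection), so it is worth logging the affected rows before updating them in place.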
## Sample QA Logic Implemented

| Check             | Function                          |
| ----------------- | --------------------------------- |
| Invalid Geometry  | `ST_IsValid()`                    |
| Overlaps          | `ST_Overlaps()`                   |
| Duplicates        | `ST_Equals()` + `ST_Intersects()` |
| Slivers           | `ST_Area() < threshold`           |
| Zero-Length Lines | `ST_Length()`                     |
| Point Duplicates  | `ST_Equals()` between points      |

---
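
The checks in the table above might translate into SQL along these lines. This is an illustrative sketch, not the queries shipped in `scripts/`: the helper `build_qa_queries` is hypothetical, the table and column names (`planet_osm_polygon`, `planet_osm_line`, `way`, `osm_id`) are osm2pgsql defaults, and the sliver threshold of `1.0` (in squared units of the layer's CRS) is an assumed placeholder.

```python
def build_qa_queries(polygons="planet_osm_polygon", lines="planet_osm_line",
                     geom="way", key="osm_id", sliver_area=1.0):
    """Return one SQL string per QA check (hypothetical helper).

    Identifiers are interpolated directly, so pass trusted constants only.
    """
    return {
        # Rows failing validity, with the reason reported by PostGIS.
        "invalid_geometry": (
            f"SELECT {key}, ST_IsValidReason({geom}) AS reason "
            f"FROM {polygons} WHERE NOT ST_IsValid({geom})"
        ),
        # Overlapping polygon pairs; a.key < b.key lists each pair once.
        "overlaps": (
            f"SELECT a.{key} AS id_a, b.{key} AS id_b "
            f"FROM {polygons} a JOIN {polygons} b ON a.{key} < b.{key} "
            f"WHERE ST_Overlaps(a.{geom}, b.{geom})"
        ),
        # Spatially identical polygons stored under different keys.
        "duplicates": (
            f"SELECT a.{key} AS id_a, b.{key} AS id_b "
            f"FROM {polygons} a JOIN {polygons} b ON a.{key} < b.{key} "
            f"WHERE ST_Intersects(a.{geom}, b.{geom}) "
            f"AND ST_Equals(a.{geom}, b.{geom})"
        ),
        # Polygons smaller than the assumed area threshold.
        "slivers": (
            f"SELECT {key}, ST_Area({geom}) AS area "
            f"FROM {polygons} WHERE ST_Area({geom}) < {sliver_area}"
        ),
        # Line features with no measurable length.
        "zero_length_lines": (
            f"SELECT {key} FROM {lines} WHERE ST_Length({geom}) = 0"
        ),
    }


if __name__ == "__main__":
    for name, sql in build_qa_queries().items():
        print(f"-- {name}\n{sql};\n")
```

Each string could be passed to `pandas.read_sql` or `geopandas.read_postgis` with a SQLAlchemy engine. Spatial predicates such as `ST_Overlaps()` can fail or give unreliable results on invalid input, so running the validity check first is a sensible ordering. Note also that pairing on `osm_id` assumes the key is unique per row, which may not hold for every osm2pgsql import.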
## Project Structure
```
automated_spatial_qa/
├── .gitignore                        # Files/folders to exclude from version control
├── requirements.txt                  # Conda or pip dependencies
│
├── data/                             # Input spatial data (e.g., .osm.pbf files)
│   └── pune.osm.pbf
│
├── notebook/                         # Jupyter notebooks
│   └── run_all_qa.ipynb              # Final automation notebook (with markdown + code)
│
├── outputs/                          # QA output files
│   ├── qa_summary.csv                # CSV report of all spatial QA issues
│   └── spatial_qa_errors.gpkg        # GeoPackage of invalid/duplicate geometries
│
├── scripts/                          # Python scripts for modular QA
│   ├── run_all_qa.py                 # Master script for full automation
│   ├── run_qa_checks_polygons.py     # Polygon QA (invalid, overlap, sliver, duplicate)
│   ├── run_qa_checks_lines.py        # Line QA (invalid, zero-length, duplicate)
│   └── run_qa_checks_points.py       # Point QA (invalid, duplicate location)
│
└── sql/                              # Optional: future SQL-only QA scripts (raw)
    └── (optional custom .sql files)
```
## Technologies Used
- Python (3.10)
- PostgreSQL + PostGIS (v14+)
- Geopandas, Pandas, Psycopg2, SQLAlchemy
- OpenStreetMap PBF data
- Jupyter Notebook for interactive runs
---
## How to Run This Project
### Clone the Repository
```bash
git clone https://github.com/prachisarode95/Automated-Spatial_Data-QA.git
cd Automated-Spatial_Data-QA
```
### Set Up Conda Environment
```bash
conda create -n urbanqa_env python=3.10
conda activate urbanqa_env
conda install psycopg2 pandas geopandas sqlalchemy jupyter -c conda-forge
```
### Start the Notebook
```bash
jupyter notebook
```
Note: Open `notebook/QA_Reporting_Output_Automation.ipynb` and run all cells. Alternatively, run `scripts/run_all_qa.py` from the terminal.

---
## Outputs
| Output Type              | Description                                        |
| ------------------------ | -------------------------------------------------- |
| `qa_summary.csv`         | Tabular log of all issues                          |
| `spatial_qa_errors.gpkg` | Can be opened in QGIS/ArcGIS                       |
| `.ipynb` notebook        | Full code for entire spatial QA pipeline in one go |

---
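
As a rough illustration of how a tabular log like `qa_summary.csv` could be produced, the sketch below writes a list of issue records using only the standard library. The column names (`layer`, `check`, `feature_id`, `detail`) and the helper `write_qa_summary` are assumptions; the README does not document the actual CSV schema.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real report's schema is not documented.
FIELDNAMES = ["layer", "check", "feature_id", "detail"]


def write_qa_summary(issues, path="outputs/qa_summary.csv"):
    """Write one CSV row per QA issue and return the number of rows written."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create outputs/ if missing
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(issues)
    return len(issues)
```

The spatial counterpart would typically be written with GeoPandas, for example `GeoDataFrame.to_file("outputs/spatial_qa_errors.gpkg", layer="invalid_polygons", driver="GPKG")`, where the layer name is again a placeholder.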
## Outputs Visualization
---