https://github.com/prachisarode95/automated-spatial_data-qa

Python-based spatial QA tool for detecting geometry issues in OSM urban data using PostGIS. Outputs include CSV logs and GIS-ready files. Built to showcase automation skills for real-world QA workflows.

Host: GitHub
URL: https://github.com/prachisarode95/automated-spatial_data-qa
Owner: prachisarode95
Created: 2025-06-23T08:58:20.000Z (6 days ago)
Default Branch: main
Last Pushed: 2025-06-25T09:16:51.000Z (4 days ago)
Last Synced: 2025-06-25T10:20:00.352Z (4 days ago)
Topics: error-handling, geopandas, jupyter-notebook, openstreetmap-data, osm2pgsql, pandas, pbf-data, pgadmin4-desktop, postgis-extension, postgis-sql, psycopg2, python-3, qa-automation, qgis, sqlalchemy
Language: Python
Homepage:
Size: 39.8 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Automated Spatial QA for Urban GIS Layers using Python & PostGIS

## Project Summary
A Python-based QA engine to detect geometry errors in OSM urban datasets using PostGIS. Outputs include CSV reports and GIS-ready spatial files. Inspired by enterprise QA workflows and built to demonstrate spatial data automation skills.

## Project Phases

- **Phase 1:** Project setup, virtual environment, database setup
- **Phase 2:** Download OSM data, clip to Pune, import into PostGIS using `osm2pgsql`
- **Phase 3:** Run spatial QA checks (invalid geometry, overlaps, duplicates, slivers)
- **Phase 4:** Generate CSV and GeoPackage output

---
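Phase 2 might be scripted along the following lines. This is a minimal sketch, not the repository's actual import step: it assumes the clip is done with `osmium extract` (the README does not name the clipping tool), and the bounding box, the source file `data/region.osm.pbf`, and the database name `urban_qa` are illustrative placeholders. The function only builds the command lines; nothing runs until they are passed to `run_commands`.

```python
import subprocess

# Assumption: a rough bounding box around Pune as (min_lon, min_lat, max_lon, max_lat).
PUNE_BBOX = (73.70, 18.40, 74.05, 18.65)


def build_import_commands(source_pbf, clipped_pbf, database,
                          user="postgres", host="localhost", bbox=PUNE_BBOX):
    """Return the clip and import commands as argument lists (nothing is executed)."""
    bbox_arg = ",".join(f"{value:g}" for value in bbox)
    clip_cmd = [
        "osmium", "extract",
        "--bbox", bbox_arg,            # LEFT,BOTTOM,RIGHT,TOP
        str(source_pbf),
        "--output", str(clipped_pbf),
        "--overwrite",
    ]
    import_cmd = [
        "osm2pgsql", "--create", "--slim",
        "-d", database, "-U", user, "-H", host,
        str(clipped_pbf),
    ]
    return [clip_cmd, import_cmd]


def run_commands(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical usage (requires osmium and osm2pgsql on PATH and a PostGIS database):
# run_commands(build_import_commands("data/region.osm.pbf", "data/pune.osm.pbf", "urban_qa"))
```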
## Features
- Run entirely in Python and PostGIS
- Validate spatial data stored in PostGIS
- Detect invalid geometries, overlaps, topology errors, duplicates
- Auto-fix common geometry errors
- Generate error logs and summary QA reports

---
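The README does not show how the auto-fix feature listed above works, so the following is only one plausible approach: repairing invalid geometries in place with PostGIS `ST_MakeValid()`. The table name `planet_osm_polygon` and geometry column `way` are the defaults of the `osm2pgsql` pgsql output; the helper name `make_valid_sql` is hypothetical. Note that `ST_MakeValid()` can change the geometry type (for example, a polygon may become a multipolygon or a collection), so results should be reviewed before being trusted.

```python
import re

# Only plain SQL identifiers are accepted, since they are interpolated into the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def make_valid_sql(table="planet_osm_polygon", geom_col="way"):
    """Build an UPDATE that repairs only the rows PostGIS reports as invalid."""
    for name in (table, geom_col):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"UPDATE {table} "
        f"SET {geom_col} = ST_MakeValid({geom_col}) "
        f"WHERE NOT ST_IsValid({geom_col});"
    )


print(make_valid_sql())
```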
## Sample QA Logic Implemented

| Check | Function |
| ----------------- | --------------------------------- |
| Invalid Geometry | `ST_IsValid()` |
| Overlaps | `ST_Overlaps()` |
| Duplicates | `ST_Equals()` + `ST_Intersects()` |
| Slivers | `ST_Area() < threshold` |
| Zero-Length Lines | `ST_Length() = 0` |
| Point Duplicates | `ST_Equals()` between points |

---
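The checks in the table above could be written as SQL roughly as follows. This is a sketch under stated assumptions, not the queries used in the repository: the table names and the `way` / `osm_id` columns are the `osm2pgsql` defaults, the sliver threshold is an arbitrary placeholder in the layer's SRID units, and pairs are compared with `a.osm_id < b.osm_id`, which reports each pair once but skips rows that share an `osm_id`.

```python
POLYGONS = "planet_osm_polygon"
LINES = "planet_osm_line"
POINTS = "planet_osm_point"
SLIVER_MAX_AREA = 1.0  # assumption: tune to the data and its coordinate system


def pair_query(table, predicate):
    """Self-join reporting each pair of rows once; && pre-filters on bounding boxes."""
    return (
        f"SELECT a.osm_id AS id_a, b.osm_id AS id_b "
        f"FROM {table} a JOIN {table} b "
        f"ON a.osm_id < b.osm_id AND a.way && b.way AND {predicate}"
    )


QA_CHECKS = {
    "invalid_geometry": (
        f"SELECT osm_id, ST_IsValidReason(way) AS detail "
        f"FROM {POLYGONS} WHERE NOT ST_IsValid(way)"
    ),
    "overlaps": pair_query(POLYGONS, "ST_Overlaps(a.way, b.way)"),
    "duplicates": pair_query(
        POLYGONS, "ST_Intersects(a.way, b.way) AND ST_Equals(a.way, b.way)"
    ),
    "slivers": (
        f"SELECT osm_id, ST_Area(way) AS area "
        f"FROM {POLYGONS} WHERE ST_Area(way) < {SLIVER_MAX_AREA}"
    ),
    "zero_length_lines": f"SELECT osm_id FROM {LINES} WHERE ST_Length(way) = 0",
    "point_duplicates": pair_query(POINTS, "ST_Equals(a.way, b.way)"),
}

for name, sql in QA_CHECKS.items():
    print(f"-- {name}\n{sql};\n")
```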

## Project Structure
```
automated_spatial_qa/
├── .gitignore # Files/folders to exclude from version control
├── requirements.txt # Conda or pip dependencies
│
├── data/ # Input spatial data (e.g., .osm.pbf files)
│ └── pune.osm.pbf
│
├── notebook/ # Jupyter notebooks
│ └── run_all_qa.ipynb # Final automation notebook (with markdown + code)
│
├── outputs/ # QA output files
│ ├── qa_summary.csv # CSV report of all spatial QA issues
│ └── spatial_qa_errors.gpkg # GeoPackage of invalid/duplicate geometries
│
├── scripts/ # Python scripts for modular QA
│ ├── run_all_qa.py # Master script for full automation
│ ├── run_qa_checks_polygons.py # Polygon QA (invalid, overlap, sliver, duplicate)
│ ├── run_qa_checks_lines.py # Line QA (invalid, zero-length, duplicate)
│ └── run_qa_checks_points.py # Point QA (invalid, duplicate location)
│
└── sql/ # Optional: future SQL-only QA scripts (raw)
  └── (optional custom .sql files)
```
## Technologies Used
- Python (3.10)
- PostgreSQL + PostGIS (v14+)
- GeoPandas, pandas, psycopg2, SQLAlchemy
- OpenStreetMap PBF data
- Jupyter Notebook for interactive runs

---
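With this stack, Python typically reaches PostGIS through a SQLAlchemy engine backed by psycopg2. The sketch below only builds the connection URL, using the standard library so it runs without a database; the helper name `postgis_url` and the credentials are hypothetical. Percent-encoding the user and password keeps characters such as `@` or `/` from breaking the URL.

```python
from urllib.parse import quote


def postgis_url(user, password, database, host="localhost", port=5432):
    """Build a SQLAlchemy URL for the psycopg2 driver with escaped credentials."""
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


url = postgis_url("postgres", "p@ss/word", "urban_qa")  # placeholder credentials
print(url)

# With SQLAlchemy installed, the URL can be passed straight to create_engine:
# from sqlalchemy import create_engine
# engine = create_engine(url)
```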

## How to Run This Project

### Clone the Repository
```bash
git clone https://github.com/prachisarode95/Automated-Spatial_Data-QA.git
cd Automated-Spatial_Data-QA
```
### Set Up Conda Environment
```bash
conda create -n urbanqa_env python=3.10
conda activate urbanqa_env
conda install psycopg2 pandas geopandas sqlalchemy jupyter -c conda-forge
```
### Start the Notebook
```bash
jupyter notebook
```
Note: Open `notebook/QA_Reporting_Output_Automation.ipynb` and run all cells. Alternatively, run `scripts/run_all_qa.py` from the terminal.

---

## Outputs

| Output Type | Description |
| ------------------------ | ---------------------------- |
| `qa_summary.csv` | Tabular log of all issues |
| `spatial_qa_errors.gpkg` | GeoPackage of flagged geometries; opens in QGIS or ArcGIS |
| `.ipynb` notebook | Full code for the entire spatial QA pipeline in a single notebook |

---
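The two output files listed above could be produced along these lines. This is a sketch rather than the repository's reporting code: the column names in `SUMMARY_FIELDS`, the layer name `qa_errors`, and both helper names are assumptions. The CSV part uses only the standard library; the GeoPackage part expects a GeoPandas `GeoDataFrame` and relies on its `to_file` method with the `GPKG` driver.

```python
import csv
from collections import Counter

# Assumption: one row per flagged feature with these columns.
SUMMARY_FIELDS = ["layer", "check", "osm_id", "detail"]


def write_qa_summary(issues, path):
    """Write all issues to a CSV file and return the number of issues per check."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=SUMMARY_FIELDS)
        writer.writeheader()
        writer.writerows(issues)
    return Counter(issue["check"] for issue in issues)


def write_error_layer(gdf, path="outputs/spatial_qa_errors.gpkg", layer="qa_errors"):
    """Save a GeoDataFrame of flagged geometries as one GeoPackage layer."""
    gdf.to_file(path, layer=layer, driver="GPKG")


# Hypothetical usage with made-up rows:
# issues = [{"layer": "polygons", "check": "overlap", "osm_id": 1, "detail": "overlaps 2"}]
# print(write_qa_summary(issues, "outputs/qa_summary.csv"))
```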
## Output Visualization
![Visualizing Spatial QA Errors](https://github.com/user-attachments/assets/e2c673e0-71f0-4156-b77b-20a374be66de)

---
