https://github.com/ammaryasser455/geoqa
GeoQA: Geospatial Data Quality Assessment. One-liner profiling, quality scoring, interactive reports
- Host: GitHub
- URL: https://github.com/ammaryasser455/geoqa
- Owner: AmmarYasser455
- License: MIT
- Created: 2026-02-11T02:35:35.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-02-12T19:35:42.000Z (about 2 months ago)
- Last Synced: 2026-02-12T21:51:23.416Z (about 2 months ago)
- Topics: data-quality-checks, geopandas, geopython, geospatial, gis, interactive-mapping, profiling, python, quality-assessment, quality-control, shapely, vector-data
- Language: Python
- Homepage: https://ammaryasser455.github.io/geoqa/
- Size: 5.74 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# GeoQA
**Geospatial Data Quality Assessment & Interactive Profiling**
*Profile any geodataset with a single line of code*
---
## What is GeoQA?
**GeoQA** is a Python package for **automated quality assessment and interactive profiling** of geospatial vector data. Think of it as [ydata-profiling](https://github.com/ydataai/ydata-profiling) (formerly pandas-profiling) but purpose-built for geodata.
- **Profile** any vector dataset (Shapefile, GeoJSON, GeoPackage, etc.) with one line of code
- **Validate** geometry quality — invalid, empty, duplicate, and mixed-type detection
- **Analyze** attribute completeness, statistics, and distributions
- **Visualize** data on interactive maps with quality-issue highlighting
- **Generate** self-contained HTML quality reports with charts and tables
- **Automate** QA/QC workflows via CLI or Python API
## Key Features
| Feature | Description |
|---|---|
| **One-liner profiling** | `geoqa.profile("data.shp")` — instant dataset overview |
| **Geometry validation** | OGC-compliant validity checks, empty/null detection, duplicate finding |
| **Attribute profiling** | Data types, null analysis, unique values, descriptive statistics |
| **Interactive maps** | Folium-based maps with issue highlighting and quality coloring |
| **HTML reports** | Self-contained quality reports with charts and tables |
| **CLI interface** | `geoqa profile data.shp` — terminal access to all features |
| **Auto-fix** | Repair invalid geometries flagged in `profile.geometry_results` |
| **Spatial analysis** | CRS info, extent, area/length statistics, centroid computation |
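To make the attribute profiling row concrete, here is a minimal sketch of a per-column summary (data type, null percentage, unique count, basic statistics) written with plain pandas. It is not GeoQA's implementation; the function name `profile_attributes` and the sample columns are hypothetical.

```python
import pandas as pd


def profile_attributes(df):
    """Summarise each column: dtype, null share, unique values, numeric stats.

    Hypothetical helper for illustration only, not part of GeoQA.
    """
    rows = []
    for col in df.columns:
        series = df[col]
        row = {
            "column": col,
            "dtype": str(series.dtype),
            "null_pct": round(100 * series.isna().mean(), 1),
            "unique": int(series.nunique(dropna=True)),
        }
        # Descriptive statistics only make sense for numeric columns.
        if pd.api.types.is_numeric_dtype(series):
            row.update(min=series.min(), mean=series.mean(), max=series.max())
        rows.append(row)
    return pd.DataFrame(rows).set_index("column")


# Made-up attribute table with one missing value per column.
attributes = pd.DataFrame(
    {
        "height": [10.0, 12.5, None, 8.0],
        "use": ["res", "com", "res", None],
    }
)
report = profile_attributes(attributes)
print(report)
```

For a GeoDataFrame, the geometry column would typically be dropped first, for example with `gdf.drop(columns="geometry")`.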
## Installation
```bash
pip install geoqa
```
**From source (development):**
```bash
git clone https://github.com/AmmarYasser455/geoqa.git
cd geoqa
pip install -e ".[dev]"
```
**Requirements:** Python 3.9+ — depends on geopandas, shapely, folium, matplotlib, pandas, numpy, jinja2, click, and rich.
## Quick Start
### Python API
```python
import geoqa
# Profile a dataset
profile = geoqa.profile("buildings.shp")
# View summary
profile.summary()
# Interactive map with issue highlighting
profile.show_map()
# Quality check details
profile.quality_checks()
# Generate HTML report
profile.to_html("quality_report.html")
# Attribute and geometry statistics
profile.attribute_stats()
profile.geometry_stats()
```
### From a GeoDataFrame
```python
import geopandas as gpd
import geoqa
gdf = gpd.read_file("roads.geojson")
profile = geoqa.profile(gdf, name="City Roads")
profile.summary()
```
### CLI
```bash
geoqa profile data.shp # Profile a dataset
geoqa report data.shp --output report.html # Generate HTML report
geoqa check data.geojson # Run quality checks only
geoqa show data.gpkg --output map.html # Save interactive map to HTML
```
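For batch QA/QC, the CLI can be driven from a small script. The sketch below only builds and runs the `geoqa report <file> --output <html>` command shown above. The helper names `build_report_commands` and `run_all` and the output naming scheme are hypothetical, and the CLI's exit-code behaviour is not documented here, so the script does not interpret it.

```python
import shutil
import subprocess
from pathlib import Path


def build_report_commands(paths, out_dir):
    """Build one `geoqa report` argv list per input file (hypothetical helper)."""
    commands = []
    for p in paths:
        # Hypothetical naming scheme: <input stem>_report.html
        out = Path(out_dir) / f"{Path(p).stem}_report.html"
        commands.append(["geoqa", "report", str(p), "--output", str(out)])
    return commands


def run_all(commands):
    """Run each command in turn; skip everything if the CLI is not installed."""
    if shutil.which("geoqa") is None:
        print("geoqa CLI not found on PATH")
        return
    for cmd in commands:
        # check=False: exit-code semantics are an unknown, so do not raise on them.
        subprocess.run(cmd, check=False)


commands = build_report_commands(["data/roads.gpkg", "data/parcels.shp"], "reports")
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_all(commands)` would then execute the reports one by one, assuming the `reports` directory already exists.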
## Quality Score
GeoQA computes an overall quality score (0–100) from four weighted components:
| Component | Weight | Description |
|---|---|---|
| Geometry Validity | 40% | Percentage of valid geometries (OGC compliance) |
| Attribute Completeness | 30% | Percentage of non-null attribute values |
| CRS Defined | 15% | Whether a coordinate reference system is set |
| No Empty Geometries | 15% | Percentage of non-empty geometries |
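As a worked example of the weights in the table, the sketch below combines the four components as a plain weighted sum. The weights come from the table; the exact formula GeoQA uses is not stated here, so the weighted sum, the function name `quality_score`, and the sample inputs are assumptions.

```python
# Weights (in score points) taken from the table above.
# Combining them as a simple weighted sum is an assumption.
WEIGHTS = {
    "geometry_validity": 40.0,
    "attribute_completeness": 30.0,
    "crs_defined": 15.0,
    "no_empty_geometries": 15.0,
}


def quality_score(valid_frac, complete_frac, crs_defined, non_empty_frac):
    """Combine four components (fractions in [0, 1]) into a 0-100 score."""
    components = {
        "geometry_validity": valid_frac,
        "attribute_completeness": complete_frac,
        "crs_defined": 1.0 if crs_defined else 0.0,
        "no_empty_geometries": non_empty_frac,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return round(sum(WEIGHTS[k] * v for k, v in components.items()), 2)


# 95% valid geometries, 80% non-null attributes, CRS set, no empty geometries.
score = quality_score(
    valid_frac=0.95, complete_frac=0.80, crs_defined=True, non_empty_frac=1.0
)
print(score)  # 40*0.95 + 30*0.80 + 15 + 15 = 92.0
```

Under this reading, a dataset that is otherwise perfect but has no CRS defined would top out at 85.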
## Quality Checks
| Check | Severity | Description |
|---|---|---|
| Geometry Validity | High | OGC Simple Features compliance |
| Empty Geometries | Medium | Geometries with no coordinates |
| Duplicate Geometries | Medium | Identical geometry pairs (WKB comparison) |
| CRS Defined | High | Coordinate reference system presence |
| Attribute Completeness | Varies | Null/missing value analysis |
| Mixed Geometry Types | Low | Multiple geometry types in one layer |
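Several of these checks can be approximated directly with Shapely. The sketch below counts invalid, empty, and duplicate geometries (duplicates by comparing WKB bytes, as the table describes) and flags mixed geometry types. It is an illustration under stated assumptions, not GeoQA's code: `run_basic_checks` is a hypothetical name, and the CRS and attribute checks are left out because they need a full GeoDataFrame.

```python
from collections import Counter

from shapely.geometry import Point, Polygon


def run_basic_checks(geometries):
    """Count common geometry issues in a list of Shapely geometries (or None)."""
    non_empty = [g for g in geometries if g is not None and not g.is_empty]
    # Validity per OGC Simple Features, as reported by GEOS.
    invalid = sum(1 for g in non_empty if not g.is_valid)
    # Duplicates: geometries whose WKB bytes are identical to an earlier one.
    wkb_counts = Counter(g.wkb for g in non_empty)
    duplicates = sum(n - 1 for n in wkb_counts.values() if n > 1)
    geom_types = sorted({g.geom_type for g in non_empty})
    return {
        "invalid": invalid,
        "empty_or_missing": len(geometries) - len(non_empty),
        "duplicates": duplicates,
        "geometry_types": geom_types,
        "mixed_types": len(geom_types) > 1,
    }


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])  # self-intersecting, so invalid
square_copy = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

results = run_basic_checks([square, bowtie, Point(2, 2), square_copy, Polygon(), None])
print(results)
```

Note that byte-level WKB comparison only catches exact duplicates; two polygons with the same shape but a different starting vertex would not match.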
## Interactive Visualization
GeoQA creates interactive Folium maps with auto-reprojection to WGS84, quality highlighting (invalid in red, valid in blue), interactive tooltips, multiple basemaps, and layer controls.
```python
profile.show_map()
# Or use the visualization API directly
from geoqa.visualization import MapVisualizer
viz = MapVisualizer(profile.gdf, name="My Data")
quality_map = viz.create_quality_map(profile.geometry_results)
```
## HTML Reports
Generate comprehensive, self-contained HTML reports:
```python
profile.to_html("report.html")
```
Reports include quality score badges, dataset overview cards, quality check tables with pass/fail/warn indicators, spatial extent information, attribute completeness bars, numeric column statistics, and geometry type distributions.
## Supported Formats
All vector formats readable by GeoPandas/Fiona: Shapefile, GeoJSON, GeoPackage, KML, GML, CSV with geometry, File Geodatabase, and more via GDAL/OGR drivers.
## Architecture
```
geoqa/
├── core.py # GeoProfile — main entry point
├── geometry.py # Geometry validation & quality checks
├── attributes.py # Attribute profiling & statistics
├── spatial.py # CRS, extent, area/length analysis
├── visualization.py # Folium-based interactive maps
├── report.py # HTML report generation (Jinja2)
├── charts.py # Matplotlib chart generation
├── cli.py # Click-based CLI interface
└── utils.py # Utility functions
```
## Contributing
Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
```bash
git clone https://github.com/AmmarYasser455/geoqa.git
cd geoqa
pip install -e ".[dev]"
pytest
black geoqa/ tests/
```
## License
[MIT License](LICENSE)
## Acknowledgments
GeoQA is inspired by the development methodology and open-source philosophy of [Dr. Qiusheng Wu](https://github.com/giswqs) and the [opengeos](https://github.com/opengeos) community. Key inspirations include [leafmap](https://github.com/opengeos/leafmap), [geemap](https://github.com/gee-community/geemap), and [ydata-profiling](https://github.com/ydataai/ydata-profiling).
## Citation
```bibtex
@software{geoqa2026,
title = {GeoQA: A Python Package for Geospatial Data Quality Assessment},
year = {2026},
url = {https://github.com/AmmarYasser455/geoqa},
license = {MIT}
}
```