https://github.com/brownag/py-soildb
Python client for USDA-NRCS Soil Data
- Host: GitHub
- URL: https://github.com/brownag/py-soildb
- Owner: brownag
- License: MIT
- Created: 2025-09-29T02:17:32.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-03-30T00:13:41.000Z (7 days ago)
- Last Synced: 2026-03-30T03:54:15.394Z (7 days ago)
- Topics: agriculture, gis, ncss, nrcs, python, sda, soil, soil-data-access, soil-science, soil-survey, sql, usda
- Language: Python
- Homepage: https://py-soildb.readthedocs.io
- Size: 644 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# soildb
[PyPI](https://pypi.org/project/soildb/)
[License: MIT](https://opensource.org/licenses/MIT)
Python client for the USDA-NRCS Soil Data Access (SDA) web service and other
National Cooperative Soil Survey data sources.
## Overview
`soildb` provides Python access to:
- **Soil Survey Data**: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO
- **Laboratory Data**: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data
- **Bulk Downloads**: Complete SSURGO/STATSGO datasets from Web Soil Survey
- **Multiple Backends**: Query data from SDA web service, local SQLite snapshots, or GeoPackage files
Query soil survey data via web service or local database, export to pandas/polars DataFrames,
and handle spatial queries.
## Installation
``` bash
pip install soildb
```
For spatial functionality:
``` bash
pip install soildb[spatial]
```
For all optional features:
``` bash
pip install soildb[all]
```
## Features
### Soil Survey Data (SDA)
- Query SSURGO/STATSGO data from NRCS Soil Data Access web service
- Build custom SQL queries with fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination and chunking
- Export to pandas and polars DataFrames
### Laboratory Characterization Data
- Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data
- Query via SDA web service or local SQLite snapshot databases
- Full horizon-level data with lab analyses
- Structured object models for nested pedon data
- Support for flexible column selection
### Web Soil Survey Downloads
- Download complete SSURGO datasets as ZIP files
- Download STATSGO (general soil map) data
- Concurrent downloads with progress tracking
- Automatic file extraction and organization
- State-wide and custom area selections
### Multi-Backend Support
- Query from SDA web service (live data)
- Query from local SQLite snapshots (offline analysis)
- Support for GeoPackage files with spatial features
- Unified interface across all backends
- Async I/O for high performance and concurrency
## Quick Start
### Query Builder
Build and execute custom SQL queries with the fluent interface:
``` python
from soildb import Query

query = (
    Query()
    .select("mukey", "muname", "musym")
    .from_("mapunit")
    .inner_join("legend", "mapunit.lkey = legend.lkey")
    .where("areasymbol = 'IA109'")
    .limit(5)
)

# Inspect the generated SQL
print(query.to_sql())

# Execute and get results
import asyncio
from soildb import SDAClient

async def main():
    result = await SDAClient().execute(query)
    return result.to_pandas()

df = asyncio.run(main())
print(df.head())
```
SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
mukey muname musym
0 408337 Colo silty clay loam, channeled, 0 to 2 percen... 1133
1 408339 Colo silty clay loam, 0 to 2 percent slopes 133
2 408340 Colo silty clay loam, 2 to 4 percent slopes 133B
3 408345 Clarion loam, 9 to 14 percent slopes, moderate... 138D2
4 408348 Harpster silt loam, 0 to 2 percent slopes 1595
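A fluent builder like this simply accumulates clauses and renders them as T-SQL (SDA runs on SQL Server, hence `TOP` rather than `LIMIT`). A minimal sketch of the idea, assuming nothing about soildb's actual internals:

``` python
# Minimal sketch of a fluent SQL builder (illustrative, NOT soildb's implementation).
class MiniQuery:
    def __init__(self):
        self._cols, self._table = [], None
        self._joins, self._wheres, self._top = [], [], None

    def select(self, *cols):
        self._cols.extend(cols)
        return self  # returning self enables method chaining

    def from_(self, table):
        self._table = table
        return self

    def inner_join(self, table, on):
        self._joins.append(f"INNER JOIN {table} ON {on}")
        return self

    def where(self, cond):
        self._wheres.append(cond)
        return self

    def limit(self, n):
        self._top = n  # rendered as TOP n for SQL Server
        return self

    def to_sql(self):
        top = f"TOP {self._top} " if self._top else ""
        sql = f"SELECT {top}{', '.join(self._cols)} FROM {self._table}"
        for join in self._joins:
            sql += f" {join}"
        if self._wheres:
            sql += " WHERE " + " AND ".join(self._wheres)
        return sql

sql = (MiniQuery()
       .select("mukey", "muname", "musym")
       .from_("mapunit")
       .inner_join("legend", "mapunit.lkey = legend.lkey")
       .where("areasymbol = 'IA109'")
       .limit(5)
       .to_sql())
print(sql)
# SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
```

Each method mutates the builder and returns `self`, which is what makes the chained style above possible.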
## Async vs Synchronous Usage
All soildb functions have both async and synchronous versions. For most use cases, the synchronous `.sync()` version is simpler and easier to use.
### Synchronous Usage
For simple scripts and interactive use, soildb provides synchronous versions of all async functions:
``` python
from soildb import get_mapunit_by_areasymbol
# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```
Found 80 map units
| | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|----|----|----|----|----|----|----|----|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |
The `.sync` methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:
``` python
from soildb import SDAClient, get_mapunit_by_areasymbol
client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()
```
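The `.sync` attribute pattern can be illustrated with a small wrapper that runs an async function on an event loop; this is a simplified sketch of the general technique, not soildb's actual implementation (`fetch_demo` is a hypothetical stand-in for a real SDA call):

``` python
import asyncio
import functools

class SyncWrapper:
    """Attach a .sync entry point to an async function (illustrative sketch)."""

    def __init__(self, async_fn):
        functools.update_wrapper(self, async_fn)
        self._async_fn = async_fn

    def __call__(self, *args, **kwargs):
        # Called normally, it behaves like the original coroutine function
        return self._async_fn(*args, **kwargs)

    def sync(self, *args, **kwargs):
        # Blocking convenience entry point for scripts and REPL use
        return asyncio.run(self._async_fn(*args, **kwargs))

@SyncWrapper
async def fetch_demo(area):
    await asyncio.sleep(0)  # stand-in for a real SDA request
    return f"results for {area}"

print(fetch_demo.sync("IA109"))  # no async/await needed
# results for IA109
```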
### Convenience Functions
soildb provides high-level functions for common tasks, such as the `get_mapunit_by_areasymbol` call shown above.
If you have suggestions for new convenience functions please file a
[feature request on
GitHub](https://github.com/brownag/py-soildb/issues/new).
### Spatial Queries
Query soil data by location with points, bounding boxes, or polygons:
``` python
from soildb import spatial_query
# Point query
response = spatial_query.sync(
    geometry="POINT(-93.6 42.0)",
    table="mupolygon"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
```
Point query found 1 results
| | mukey | areasymbol | musym | nationalmusym | muname | mukind |
|----|----|----|----|----|----|----|
| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |
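The `geometry` argument takes WKT (Well-Known Text) strings. For bounding-box queries you can build the polygon string yourself; `bbox_to_wkt` below is a hypothetical helper, not part of soildb:

``` python
def bbox_to_wkt(xmin, ymin, xmax, ymax):
    """Render a lon/lat bounding box as a closed WKT POLYGON ring."""
    # WKT rings must repeat the first vertex at the end to close the polygon
    ring = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"

wkt = bbox_to_wkt(-93.7, 41.9, -93.5, 42.1)
print(wkt)
# POLYGON((-93.7 41.9, -93.5 41.9, -93.5 42.1, -93.7 42.1, -93.7 41.9))
```

A string built this way can be passed as the `geometry` argument in place of the `POINT(...)` literal above.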
### Bulk Data Fetching
Retrieve large datasets efficiently with automatic pagination and chunking:
``` python
from soildb import fetch_by_keys, get_mukey_by_areasymbol
# Get mukeys for survey areas
areas = ["IA109", "IA113", "IA117"]
all_mukeys = get_mukey_by_areasymbol.sync(areas)
print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")
# Fetch components in chunks automatically
response = fetch_by_keys.sync(
    all_mukeys,
    "component",
    key_column="mukey",
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
```
Found 410 mukeys across 3 areas
Fetching 410 keys in 5 chunks of 100
Fetched 1067 component records
| | mukey | cokey | compname | localphase | comppct_r |
|-----|--------|----------|----------|------------|-----------|
| 0 | 408333 | 25562547 | Kingston | \ | 2 |
| 1 | 408333 | 25562548 | Okoboji | \ | 5 |
| 2 | 408333 | 25562549 | Spicer | \ | 90 |
| 3 | 408333 | 25562550 | Madelia | \ | 3 |
| 4 | 408334 | 25562837 | Okoboji | \ | 5 |
| 5 | 408334 | 25562838 | Glencoe | \ | 3 |
| 6 | 408334 | 25562839 | Canisteo | \ | 2 |
| 7 | 408334 | 25562840 | Webster | \ | 85 |
| 8 | 408334 | 25562841 | Nicollet | \ | 5 |
| 9 | 408335 | 25562135 | Biscay | \ | 1 |
The `component` table has a hierarchical relationship:
- mukey (map unit key) is the parent
- cokey (component key) is the child
So when fetching components, you typically filter by mukey to get all
components for specific map units. Use `fetch_by_keys()` with `"mukey"`
as the `key_column` to do this; the keys are paginated automatically in
chunks of 100 rows (or pass your own `chunk_size`).
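The chunking arithmetic itself is simple: the key list is split into fixed-size batches and one query is issued per batch. A sketch of that step (illustrative, not soildb's internals):

``` python
def chunked(keys, chunk_size=100):
    """Split a key list into fixed-size batches, one query per batch."""
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]

mukeys = list(range(410))  # e.g. 410 mukeys across 3 survey areas
batches = chunked(mukeys, 100)
print(f"Fetching {len(mukeys)} keys in {len(batches)} chunks of 100")
# Fetching 410 keys in 5 chunks of 100
```

This matches the progress message shown above: 410 keys split into four full chunks of 100 plus one final chunk of 10.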
### Bulk Downloads (Web Soil Survey)
Download complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:
``` python
from soildb import download_wss
# Download specific survey areas
paths = download_wss.sync(
    areasymbols=["IA109", "IA113"],
    dest_dir="./ssurgo_data",
    extract=True
)
print(f"Downloaded {len(paths)} survey areas")

# Download all survey areas for a state
paths = download_wss.sync(
    where_clause="areasymbol LIKE 'IA%'",
    dest_dir="./iowa_ssurgo",
    extract=True,
    remove_zip=True  # Clean up ZIP files after extraction
)

# Download STATSGO (general soil map) data
paths = download_wss.sync(
    areasymbols=["IA"],
    db="STATSGO",
    dest_dir="./iowa_statsgo",
    extract=True
)
```
Each extracted survey area directory contains:
- `tabular/` - Pipe-delimited TXT files with soil data tables
- `spatial/` - ESRI shapefiles with map unit polygons and boundaries
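The pipe-delimited tabular files can be read with the standard library's `csv` module (fields are typically quoted, and column order comes from the SSURGO metadata tables rather than a header row). A sketch using an in-memory sample in place of a real file path:

``` python
import csv
import io

# In practice you would open a file such as tabular/mapunit.txt from an
# extracted survey area; a sample string stands in here for illustration.
sample = '"408333"|"1032"|"Spicer silty clay loam, 0 to 2 percent slopes"\n'

reader = csv.reader(io.StringIO(sample), delimiter="|", quotechar='"')
rows = list(reader)
print(rows[0])
# ['408333', '1032', 'Spicer silty clay loam, 0 to 2 percent slopes']
```

Note that the quoted third field contains a comma, which is why the delimiter must be set to `|` rather than relying on CSV defaults.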
**Use Cases:**
- **SDA**: Live queries, filtered data, programmatic access to current data
- **WSS Downloads**: Complete offline datasets, bulk data for analysis, static snapshots updated annually
## Async Usage
For performance-critical applications, use the async functions directly and compose them with `asyncio`:
``` python
import asyncio
from soildb import fetch_by_keys, get_mukey_by_areasymbol
async def concurrent_example():
    # Get mukeys for multiple survey areas
    areas = ["IA109", "IA113", "IA117"]
    all_mukeys = await get_mukey_by_areasymbol(areas)

    # Fetch components with automatic pagination
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "comppct_r"]
    )
    return response.to_pandas()

# Run the async function
df = asyncio.run(concurrent_example())
```
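Independent requests can also be fanned out concurrently with `asyncio.gather`. A self-contained sketch of the pattern, where `fetch_area` is a hypothetical stand-in for real soildb calls:

``` python
import asyncio

async def fetch_area(area):
    # Stand-in for an awaitable soildb call such as get_mukey_by_areasymbol
    await asyncio.sleep(0)
    return (area, f"data for {area}")

async def main():
    areas = ["IA109", "IA113", "IA117"]
    # gather schedules all three coroutines on one event loop so their
    # network waits can overlap instead of running back to back
    results = await asyncio.gather(*(fetch_area(a) for a in areas))
    return dict(results)

data = asyncio.run(main())
print(sorted(data))
# ['IA109', 'IA113', 'IA117']
```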
For more async patterns, see the [Async Programming Guide](docs/async.qmd).
# Examples
See the [`examples/` directory](examples/) and [documentation](docs/)
for detailed usage patterns.
## License
This project is licensed under the MIT License. See the
[LICENSE](LICENSE) file for details.