https://github.com/brownag/py-soildb

Python client for USDA-NRCS Soil Data
Host: GitHub
URL: https://github.com/brownag/py-soildb
Owner: brownag
License: mit
Created: 2025-09-29T02:17:32.000Z (8 months ago)
Default Branch: main
Last Pushed: 2026-03-30T00:13:41.000Z (2 months ago)
Last Synced: 2026-03-30T03:54:15.394Z (2 months ago)
Topics: agriculture, gis, ncss, nrcs, python, sda, soil, soil-data-access, soil-science, soil-survey, sql, usda
Language: Python
Homepage: https://py-soildb.readthedocs.io
Size: 644 KB
Stars: 3
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# soildb

[![PyPI version](https://badge.fury.io/py/soildb.svg)](https://pypi.org/project/soildb/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

Python client for the USDA-NRCS Soil Data Access (SDA) web service and other National Cooperative Soil Survey data sources.

## Overview

`soildb` provides Python access to:

- **Soil Survey Data**: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO

- **Laboratory Data**: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data

- **Bulk Downloads**: Complete SSURGO/STATSGO datasets from Web Soil Survey

- **Multiple Backends**: Query data from SDA web service, local SQLite snapshots, or GeoPackage files

Query soil survey data through the web service or a local database, export results to pandas or polars DataFrames, and run spatial queries.

## Installation

``` bash
pip install soildb
```

For spatial functionality:

``` bash
pip install "soildb[spatial]"
```

To install all optional features:

``` bash
pip install "soildb[all]"
```

## Features

### Soil Survey Data (SDA)

- Query SSURGO/STATSGO data from NRCS Soil Data Access web service

- Build custom SQL queries with a fluent interface

- Spatial queries with points, bounding boxes, and polygons

- Bulk data fetching with automatic pagination and chunking

- Export to pandas and polars DataFrames

### Laboratory Characterization Data

- Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data

- Query via SDA web service or local SQLite snapshot databases

- Full horizon-level data with lab analyses

- Structured object models for nested pedon data

- Flexible column selection

### Web Soil Survey Downloads

- Download complete SSURGO datasets as ZIP files

- Download STATSGO (general soil map) data

- Concurrent downloads with progress tracking

- Automatic file extraction and organization

- State-wide and custom area selections

### Multi-Backend Support

- Query from SDA web service (live data)

- Query from local SQLite snapshots (offline analysis)

- Support for GeoPackage files with spatial features

- Unified interface across all backends

- Async I/O for high performance and concurrency
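
Querying a local SQLite snapshot ultimately means running SQL against a database file. As a minimal illustration that uses only Python's standard `sqlite3` module (soildb's own SQLite backend API is not shown here), the sketch below builds an in-memory database with a hypothetical two-table subset of the SSURGO `legend` and `mapunit` tables and joins map units to their survey area. The `lkey` value is a placeholder; the map unit values are copied from the example output further down.

``` python
import sqlite3

# Hypothetical minimal subset of the SSURGO schema; a real snapshot
# contains many more tables and columns.
SCHEMA = """
CREATE TABLE legend (lkey INTEGER PRIMARY KEY, areasymbol TEXT, areaname TEXT);
CREATE TABLE mapunit (mukey INTEGER PRIMARY KEY, lkey INTEGER, musym TEXT, muname TEXT);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # lkey = 1 is a placeholder value chosen for this sketch.
    conn.execute("INSERT INTO legend VALUES (1, 'IA109', 'Kossuth County, Iowa')")
    conn.executemany(
        "INSERT INTO mapunit VALUES (?, ?, ?, ?)",
        [
            (408333, 1, "1032", "Spicer silty clay loam, 0 to 2 percent slopes"),
            (408334, 1, "107", "Webster clay loam, 0 to 2 percent slopes"),
        ],
    )
    return conn


def mapunits_for_area(conn: sqlite3.Connection, areasymbol: str) -> list[tuple]:
    """Join mapunit to legend and filter by survey area symbol (parameterized query)."""
    sql = (
        "SELECT m.mukey, m.musym, m.muname "
        "FROM mapunit AS m INNER JOIN legend AS l ON m.lkey = l.lkey "
        "WHERE l.areasymbol = ? ORDER BY m.mukey"
    )
    return conn.execute(sql, (areasymbol,)).fetchall()


conn = build_demo_db()
for row in mapunits_for_area(conn, "IA109"):
    print(row)
```

One dialect difference to keep in mind: SQLite limits rows with `LIMIT n`, whereas SDA runs on SQL Server and uses `SELECT TOP n`.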

## Quick Start

### Query Builder

Build and execute custom SQL queries with the fluent interface:

``` python
import asyncio

from soildb import Query, SDAClient

# Build a query with the fluent interface
query = (Query()
        .select("mukey", "muname", "musym")
        .from_("mapunit")
        .inner_join("legend", "mapunit.lkey = legend.lkey")
        .where("areasymbol = 'IA109'")
        .limit(5))

# Inspect the generated SQL
print(query.to_sql())

# Execute and get results
async def main():
    result = await SDAClient().execute(query)
    return result.to_pandas()

df = asyncio.run(main())
print(df.head())
```

    SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
        mukey                                             muname  musym
    0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
    1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
    2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
    3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
    4  408348          Harpster silt loam, 0 to 2 percent slopes   1595
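
To make the fluent pattern concrete, here is a toy builder. It is hypothetical and not soildb's `Query` class: each method records one clause and returns `self` so calls can be chained, and `to_sql()` assembles SQL Server-style text with `TOP n`, producing the same SQL string printed above.

``` python
from __future__ import annotations


class ToyQuery:
    """Hypothetical minimal fluent SQL builder; illustrates method chaining only."""

    def __init__(self) -> None:
        self._columns: list[str] = []
        self._table: str | None = None
        self._joins: list[str] = []
        self._where: list[str] = []
        self._limit: int | None = None

    def select(self, *columns: str) -> ToyQuery:
        self._columns.extend(columns)
        return self  # returning self is what makes chaining possible

    def from_(self, table: str) -> ToyQuery:
        self._table = table
        return self

    def inner_join(self, table: str, on: str) -> ToyQuery:
        self._joins.append(f"INNER JOIN {table} ON {on}")
        return self

    def where(self, condition: str) -> ToyQuery:
        self._where.append(condition)
        return self

    def limit(self, n: int) -> ToyQuery:
        self._limit = n
        return self

    def to_sql(self) -> str:
        if self._table is None:
            raise ValueError("from_() must be called before to_sql()")
        parts = ["SELECT"]
        if self._limit is not None:
            parts.append(f"TOP {self._limit}")  # SQL Server dialect, as used by SDA
        parts.append(", ".join(self._columns) or "*")
        parts.append(f"FROM {self._table}")
        parts.extend(self._joins)
        if self._where:
            parts.append("WHERE " + " AND ".join(self._where))
        return " ".join(parts)


query = (ToyQuery()
         .select("mukey", "muname", "musym")
         .from_("mapunit")
         .inner_join("legend", "mapunit.lkey = legend.lkey")
         .where("areasymbol = 'IA109'")
         .limit(5))
print(query.to_sql())
```

The toy pastes clause strings verbatim into the SQL, so it should never be given untrusted input.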

## Async vs Synchronous Usage

soildb's data-access functions are async, and each one also provides a synchronous `.sync()` variant. For most scripts and interactive sessions, the `.sync()` variant is the simpler choice.
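
One common way to offer both calling styles is to attach a blocking wrapper to each coroutine function. The sketch below shows that general pattern under stated assumptions; it is not soildb's actual implementation. A hypothetical `with_sync` decorator adds a `.sync` attribute that runs the coroutine with `asyncio.run()`, and the hypothetical `get_thing` stands in for a real network call.

``` python
import asyncio
import functools
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def with_sync(
    func: Callable[..., Coroutine[Any, Any, T]],
) -> Callable[..., Coroutine[Any, Any, T]]:
    """Attach a blocking `.sync` variant to an async function (hypothetical helper)."""

    @functools.wraps(func)
    def sync(*args: Any, **kwargs: Any) -> T:
        # asyncio.run() creates a fresh event loop, runs the coroutine to completion,
        # and closes the loop. It raises RuntimeError if an event loop is already
        # running in this thread (for example, directly inside a Jupyter cell).
        return asyncio.run(func(*args, **kwargs))

    func.sync = sync  # type: ignore[attr-defined]
    return func


@with_sync
async def get_thing(areasymbol: str) -> str:
    """Stand-in for a network request: yields to the event loop instead of calling SDA."""
    await asyncio.sleep(0)
    return f"result for {areasymbol}"


print(get_thing.sync("IA109"))          # blocking call, no await needed
print(asyncio.run(get_thing("IA109")))  # the equivalent explicit async call
```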

### Synchronous Usage

For simple scripts and interactive use, soildb provides synchronous versions of all async functions:

``` python
from soildb import get_mapunit_by_areasymbol

# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

    Found 80 map units




|  | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|----|----|----|----|----|----|----|----|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |



The `.sync` methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:

``` python
from soildb import SDAClient, get_mapunit_by_areasymbol

client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()
```

### Convenience Functions

soildb provides high-level functions for common tasks:

``` python
from soildb import get_mapunit_by_areasymbol

mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```


If you have suggestions for new convenience functions, please file a [feature request on GitHub](https://github.com/brownag/py-soildb/issues/new).

### Spatial Queries

Query soil data by location with points, bounding boxes, or polygons:

``` python
from soildb import spatial_query

# Point query
response = spatial_query.sync(
    geometry="POINT(-93.6 42.0)",
    table="mupolygon"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
```

    Point query found 1 results




|  | mukey | areasymbol | musym | nationalmusym | muname | mukind |
|----|----|----|----|----|----|----|
| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |
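
The `geometry` argument in the point example is a Well-Known Text (WKT) string, with coordinates in x y order, so longitude comes before latitude. As a sketch, the hypothetical helpers below build a point string and a bounding-box polygon; a box becomes a closed ring whose first and last vertices are identical. How soildb expects bounding boxes to be passed (as WKT or as separate arguments) is not shown in this README, so treat the polygon helper as illustrative.

``` python
def point_wkt(lon: float, lat: float) -> str:
    """Return a WKT point; WKT orders coordinates as x (longitude) then y (latitude)."""
    return f"POINT({lon} {lat})"


def bbox_wkt(xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Return a WKT polygon for a bounding box as a closed, counter-clockwise ring."""
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("expected xmin < xmax and ymin < ymax")
    ring = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


print(point_wkt(-93.6, 42.0))                    # POINT(-93.6 42.0)
print(bbox_wkt(-93.7, 41.9, -93.5, 42.1))
```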



### Bulk Data Fetching

Retrieve large datasets efficiently with automatic pagination and chunking:

``` python
from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys for survey areas
areas = ["IA109", "IA113", "IA117"]
all_mukeys = get_mukey_by_areasymbol.sync(areas)
print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

# Fetch components in chunks automatically
response = fetch_by_keys.sync(
    all_mukeys,
    "component",
    key_column="mukey",
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
```

    Found 410 mukeys across 3 areas
    Fetching 410 keys in 5 chunks of 100
    Fetched 1067 component records




|     | mukey  | cokey    | compname | localphase | comppct_r |
|-----|--------|----------|----------|------------|-----------|
| 0   | 408333 | 25562547 | Kingston | \     | 2         |
| 1   | 408333 | 25562548 | Okoboji  | \     | 5         |
| 2   | 408333 | 25562549 | Spicer   | \     | 90        |
| 3   | 408333 | 25562550 | Madelia  | \     | 3         |
| 4   | 408334 | 25562837 | Okoboji  | \     | 5         |
| 5   | 408334 | 25562838 | Glencoe  | \     | 3         |
| 6   | 408334 | 25562839 | Canisteo | \     | 2         |
| 7   | 408334 | 25562840 | Webster  | \     | 85        |
| 8   | 408334 | 25562841 | Nicollet | \     | 5         |
| 9   | 408335 | 25562135 | Biscay   | \     | 1         |
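
Each map unit (`mukey`) has several component rows, and a common next step is to pick the dominant component, the one with the largest `comppct_r`. The plain-Python sketch below does this for the rows shown above; the rows for `mukey` 408335 are cut off in that output, so they are left out here. The helper name `dominant_components` is hypothetical.

``` python
# (mukey, cokey, compname, comppct_r), copied from the first rows of the output above.
rows = [
    (408333, 25562547, "Kingston", 2),
    (408333, 25562548, "Okoboji", 5),
    (408333, 25562549, "Spicer", 90),
    (408333, 25562550, "Madelia", 3),
    (408334, 25562837, "Okoboji", 5),
    (408334, 25562838, "Glencoe", 3),
    (408334, 25562839, "Canisteo", 2),
    (408334, 25562840, "Webster", 85),
    (408334, 25562841, "Nicollet", 5),
]


def dominant_components(rows):
    """Return {mukey: (compname, comppct_r)} for the largest component of each map unit."""
    best = {}
    for mukey, _cokey, compname, pct in rows:
        # Strict ">" keeps the first component seen when two share the top percentage.
        if mukey not in best or pct > best[mukey][1]:
            best[mukey] = (compname, pct)
    return best


for mukey, (name, pct) in dominant_components(rows).items():
    print(mukey, name, pct)  # 408333 Spicer 90, then 408334 Webster 85
```

With the full DataFrame from `response.to_pandas()`, the pandas equivalent is `df.loc[df.groupby("mukey")["comppct_r"].idxmax()]`, assuming `comppct_r` is numeric.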



The `component` table is a child of `mapunit`:

- `mukey` (map unit key) identifies the parent map unit

- `cokey` (component key) identifies each child component

When fetching components, you therefore typically filter by `mukey` to get all components of specific map units. Use `fetch_by_keys()` with `key_column="mukey"` to do this; it splits the keys into chunks of 100 keys each (or your own `chunk_size`) and paginates over them automatically.
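
Chunking here means splitting the key list into batches of at most `chunk_size` keys and issuing one `WHERE ... IN (...)` query per batch, which keeps each request small. A minimal sketch of the idea follows; the helper names are hypothetical and soildb's internal query text may differ.

``` python
import math
from typing import Iterable, Iterator, Sequence


def chunked(keys: Sequence[int], chunk_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most chunk_size keys."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, len(keys), chunk_size):
        yield keys[start:start + chunk_size]


def chunk_queries(
    keys: Sequence[int],
    table: str,
    key_column: str,
    columns: Iterable[str],
    chunk_size: int = 100,
) -> Iterator[str]:
    """Build one SELECT ... WHERE key IN (...) statement per chunk (illustrative only)."""
    cols = ", ".join(columns)
    for chunk in chunked(keys, chunk_size):
        # int() rejects non-numeric keys, so only integers reach the SQL text.
        in_list = ", ".join(str(int(k)) for k in chunk)
        yield f"SELECT {cols} FROM {table} WHERE {key_column} IN ({in_list})"


keys = list(range(1, 411))  # 410 placeholder keys, the same count as the example above
queries = list(chunk_queries(keys, "component", "mukey", ["mukey", "cokey"]))
print(math.ceil(len(keys) / 100), len(queries))  # 5 5: four chunks of 100 and one of 10
```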

### Bulk Downloads (Web Soil Survey)

Download complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:

``` python
from soildb import download_wss

# Download specific survey areas
paths = download_wss.sync(
    areasymbols=["IA109", "IA113"],
    dest_dir="./ssurgo_data",
    extract=True
)
print(f"Downloaded {len(paths)} survey areas")

# Download all survey areas for a state
paths = download_wss.sync(
    where_clause="areasymbol LIKE 'IA%'",
    dest_dir="./iowa_ssurgo",
    extract=True,
    remove_zip=True  # Clean up ZIP files after extraction
)

# Download STATSGO (general soil map) data
paths = download_wss.sync(
    areasymbols=["IA"],
    db="STATSGO",
    dest_dir="./iowa_statsgo",
    extract=True
)
```

Each extracted survey area directory contains:

- `tabular/` - Pipe-delimited TXT files with soil data tables

- `spatial/` - ESRI shapefiles with map unit polygons and boundaries
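
The pipe-delimited tabular files can be read with Python's standard `csv` module by setting `delimiter="|"`. The sketch below parses an in-memory sample; it assumes the files have no header row and wrap text fields in double quotes (check the files you download), so column names are supplied separately. The three column names and the sample lines are illustrative, not the full `mapunit` layout.

``` python
import csv
import io

# Illustrative sample in the assumed format: pipe-delimited, quoted text, no header row.
SAMPLE = (
    '"1032"|"Spicer silty clay loam, 0 to 2 percent slopes"|"408333"\n'
    '"107"|"Webster clay loam, 0 to 2 percent slopes"|"408334"\n'
)

# Hypothetical column subset; supply the real names for the table you read.
COLUMNS = ["musym", "muname", "mukey"]


def read_pipe_table(handle, columns):
    """Parse pipe-delimited rows into dicts keyed by the given column names."""
    reader = csv.reader(handle, delimiter="|", quotechar='"')
    return [dict(zip(columns, row)) for row in reader]


records = read_pipe_table(io.StringIO(SAMPLE), COLUMNS)
for rec in records:
    print(rec["mukey"], rec["muname"])
```

For a real file, pass `open(path, newline="")` instead of the `StringIO` object. The commas inside `muname` are safe because only `|` is treated as a delimiter.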

**Use Cases:**

- **SDA**: Live queries, filtered data, programmatic access to current data

- **WSS Downloads**: Complete offline datasets, bulk data for analysis, static snapshots updated annually

## Async Usage

For performance-critical applications, use async functions directly with concurrent requests:

``` python
import asyncio

from soildb import fetch_by_keys, get_mukey_by_areasymbol

async def concurrent_example():
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    all_mukeys = await get_mukey_by_areasymbol(areas)

    # Fetch components concurrently with automatic pagination
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "comppct_r"]
    )
    return response.to_pandas()

# Run async function
df = asyncio.run(concurrent_example())
```
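
The speed-up from async code comes from keeping several requests in flight at once. A generic pattern is `asyncio.gather()` combined with an `asyncio.Semaphore` that caps how many run concurrently, so a shared service is not flooded. In this sketch `fake_fetch` is a hypothetical stand-in that sleeps instead of calling SDA, and the cap of 3 is arbitrary; it illustrates the pattern, not soildb's internals.

``` python
import asyncio

MAX_CONCURRENT = 3  # arbitrary illustrative cap on simultaneous requests


async def fake_fetch(chunk: list[int]) -> list[int]:
    """Hypothetical stand-in for one network request; echoes the keys it was given."""
    await asyncio.sleep(0.01)
    return chunk


async def fetch_all(chunks: list[list[int]]) -> tuple[list[int], int]:
    """Fetch every chunk with bounded concurrency; return (all keys, peak concurrency)."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    in_flight = 0
    peak = 0

    async def bounded(chunk: list[int]) -> list[int]:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while MAX_CONCURRENT requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(chunk)
            finally:
                in_flight -= 1

    # gather() returns results in input order, whatever order the tasks finish in.
    results = await asyncio.gather(*(bounded(c) for c in chunks))
    return [key for part in results for key in part], peak


# 410 placeholder keys split into chunks of 100 (the last chunk holds 10).
chunks = [list(range(i, min(i + 100, 410))) for i in range(0, 410, 100)]
keys, peak = asyncio.run(fetch_all(chunks))
print(len(keys), peak)
```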

For more async patterns, see the [Async Programming Guide](docs/async.qmd).

## Examples

See the [`examples/` directory](examples/) and [documentation](docs/) for detailed usage patterns.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.