{"id":31681048,"url":"https://github.com/brownag/py-soildb","last_synced_at":"2026-04-02T18:40:49.364Z","repository":{"id":317134180,"uuid":"1066090124","full_name":"brownag/py-soildb","owner":"brownag","description":"Python client for USDA-NRCS Soil Data","archived":false,"fork":false,"pushed_at":"2026-03-30T00:13:41.000Z","size":659,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-30T03:54:15.394Z","etag":null,"topics":["agriculture","gis","ncss","nrcs","python","sda","soil","soil-data-access","soil-science","soil-survey","sql","usda"],"latest_commit_sha":null,"homepage":"https://py-soildb.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brownag.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-29T02:17:32.000Z","updated_at":"2026-03-30T00:10:57.000Z","dependencies_parsed_at":"2025-09-29T05:24:16.045Z","dependency_job_id":"4b405ccf-af82-4765-afa0-7625260f4a72","html_url":"https://github.com/brownag/py-soildb","commit_stats":null,"previous_names":["brownag/py-soildb"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/brownag/py-soildb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brownag%2Fpy-soildb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brownag%2Fpy-soildb/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/brownag%2Fpy-soildb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brownag%2Fpy-soildb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brownag","download_url":"https://codeload.github.com/brownag/py-soildb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brownag%2Fpy-soildb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31313158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agriculture","gis","ncss","nrcs","python","sda","soil","soil-data-access","soil-science","soil-survey","sql","usda"],"created_at":"2025-10-08T07:45:18.921Z","updated_at":"2026-04-02T18:40:49.352Z","avatar_url":"https://github.com/brownag.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# soildb\n\n\n[![PyPI\nversion](https://badge.fury.io/py/soildb.svg)](https://pypi.org/project/soildb/)\n[![License:\nMIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n\nPython client for the USDA-NRCS Soil Data Access (SDA) web service 
and other\nNational Cooperative Soil Survey data sources.\n\n## Overview\n\n`soildb` provides Python access to:\n\n- **Soil Survey Data**: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO\n- **Laboratory Data**: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data\n- **Bulk Downloads**: Complete SSURGO/STATSGO datasets from Web Soil Survey\n- **Multiple Backends**: Query data from the SDA web service, local SQLite snapshots, or GeoPackage files\n\nQuery soil survey data via web service or local database, export to pandas/polars DataFrames,\nand handle spatial queries.\n\n## Installation\n\n``` bash\npip install soildb\n```\n\nFor spatial functionality:\n\n``` bash\npip install soildb[spatial]\n```\n\nFor all optional features:\n\n``` bash\npip install soildb[all]\n```\n\n## Features\n\n### Soil Survey Data (SDA)\n\n- Query SSURGO/STATSGO data from the NRCS Soil Data Access web service\n- Build custom SQL queries with a fluent interface\n- Spatial queries with points, bounding boxes, and polygons\n- Bulk data fetching with automatic pagination and chunking\n- Export to pandas and polars DataFrames\n\n### Laboratory Characterization Data\n\n- Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data\n- Query via the SDA web service or local SQLite snapshot databases\n- Full horizon-level data with lab analyses\n- Structured object models for nested pedon data\n- Support for flexible column selection\n\n### Web Soil Survey Downloads\n\n- Download complete SSURGO datasets as ZIP files\n- Download STATSGO (general soil map) data\n- Concurrent downloads with progress tracking\n- Automatic file extraction and organization\n- State-wide and custom area selections\n\n### Multi-Backend Support\n\n- Query from the SDA web service (live data)\n- Query from local SQLite snapshots (offline analysis)\n- Support for GeoPackage files with spatial features\n- Unified interface across all backends\n- Async I/O for high performance and concurrency\n\n## Quick 
Start\n\n### Query Builder\n\nBuild and execute custom SQL queries with the fluent interface:\n\n``` python\nfrom soildb import Query\n\nquery = (Query()\n        .select(\"mukey\", \"muname\", \"musym\")\n        .from_(\"mapunit\")\n        .inner_join(\"legend\", \"mapunit.lkey = legend.lkey\")\n        .where(\"areasymbol = 'IA109'\")\n        .limit(5))\n\n# Inspect the generated SQL\nprint(query.to_sql())\n\n# Execute and get results\nimport asyncio\nfrom soildb import SDAClient\n\nasync def main():\n    result = await SDAClient().execute(query)\n    return result.to_pandas()\n\ndf = asyncio.run(main())\nprint(df.head())\n```\n\n    SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'\n        mukey                                             muname  musym\n    0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133\n    1  408339        Colo silty clay loam, 0 to 2 percent slopes    133\n    2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B\n    3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2\n    4  408348          Harpster silt loam, 0 to 2 percent slopes   1595\n\n## Async vs Synchronous Usage\n\nAll soildb functions have both async and synchronous versions. 
For most use cases, the synchronous `.sync()` version is simpler to use.\n\n### Synchronous Usage\n\nFor simple scripts and interactive use, soildb provides synchronous versions of all async functions:\n\n``` python\nfrom soildb import get_mapunit_by_areasymbol\n\n# Synchronous usage - no async/await needed!\nmapunits = get_mapunit_by_areasymbol.sync(\"IA109\")\ndf = mapunits.to_pandas()\nprint(f\"Found {len(df)} map units\")\ndf.head()\n```\n\n    Found 80 map units\n\n|  | mukey | musym | muname | mukind | muacres | areasymbol | areaname |\n|----|----|----|----|----|----|----|----|\n| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |\n| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |\n| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |\n| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |\n| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |\n\nThe `.sync` methods automatically manage SDA client connections for you. 
For multiple calls, consider reusing a client:\n\n``` python\nfrom soildb import SDAClient, get_mapunit_by_areasymbol\n\nclient = SDAClient()\nmapunits1 = get_mapunit_by_areasymbol.sync(\"IA109\", client=client)\nmapunits2 = get_mapunit_by_areasymbol.sync(\"IA113\", client=client)\nclient.close()\n```\n\n### Convenience Functions\n\nsoildb provides high-level functions for common tasks:\n\n``` python\nfrom soildb import get_mapunit_by_areasymbol\n\nmapunits = get_mapunit_by_areasymbol.sync(\"IA109\")\ndf = mapunits.to_pandas()\nprint(f\"Found {len(df)} map units\")\ndf.head()\n```\n\n    Found 80 map units\n\n|  | mukey | musym | muname | mukind | muacres | areasymbol | areaname |\n|----|----|----|----|----|----|----|----|\n| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |\n| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |\n| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |\n| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |\n| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... 
| Consociation | 1403 | IA109 | Kossuth County, Iowa |\n\nIf you have suggestions for new convenience functions, please file a\n[feature request on\nGitHub](https://github.com/brownag/py-soildb/issues/new).\n\n### Spatial Queries\n\nQuery soil data by location with points, bounding boxes, or polygons:\n\n``` python\nfrom soildb import spatial_query\n\n# Point query\nresponse = spatial_query.sync(\n    geometry=\"POINT(-93.6 42.0)\",\n    table=\"mupolygon\"\n)\ndf = response.to_pandas()\nprint(f\"Point query found {len(df)} results\")\n```\n\n    Point query found 1 results\n\n|  | mukey | areasymbol | musym | nationalmusym | muname | mukind |\n|----|----|----|----|----|----|----|\n| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... 
| Complex |\n\n### Bulk Data Fetching\n\nRetrieve large datasets efficiently with automatic pagination and chunking:\n\n``` python\nfrom soildb import fetch_by_keys, get_mukey_by_areasymbol\n\n# Get mukeys for survey areas\nareas = [\"IA109\", \"IA113\", \"IA117\"]\nall_mukeys = get_mukey_by_areasymbol.sync(areas)\n\nprint(f\"Found {len(all_mukeys)} mukeys across {len(areas)} areas\")\n\n# Fetch components in chunks automatically\nresponse = fetch_by_keys.sync(\n    all_mukeys,\n    \"component\",\n    key_column=\"mukey\",\n    chunk_size=100,\n    columns=[\"mukey\", \"cokey\", \"compname\", \"localphase\", \"comppct_r\"]\n)\ndf = response.to_pandas()\nprint(f\"Fetched {len(df)} component records\")\n```\n\n    Found 410 mukeys across 3 areas\n    Fetching 410 keys in 5 chunks of 100\n    Fetched 1067 component records\n\n|     | mukey  | cokey    | compname | localphase | comppct_r |\n|-----|--------|----------|----------|------------|-----------|\n| 0   | 408333 | 25562547 | Kingston | \\\u003cNA\\\u003e     | 2         |\n| 1   | 408333 | 25562548 | Okoboji  | \\\u003cNA\\\u003e     | 5         |\n| 2   | 408333 | 25562549 | Spicer   | \\\u003cNA\\\u003e     | 90        |\n| 3   | 408333 | 25562550 | Madelia  | \\\u003cNA\\\u003e     | 3         |\n| 4   | 408334 | 25562837 | Okoboji  | \\\u003cNA\\\u003e     | 5         |\n| 5   | 408334 | 25562838 | Glencoe  | \\\u003cNA\\\u003e     | 3         |\n| 6   | 408334 | 25562839 | Canisteo | \\\u003cNA\\\u003e     | 2         |\n| 7   | 408334 | 25562840 | Webster  | \\\u003cNA\\\u003e     | 85        |\n| 8   | 408334 | 25562841 | Nicollet | \\\u003cNA\\\u003e     | 5         |\n| 9   | 408335 | 
25562135 | Biscay   | \\\u003cNA\\\u003e     | 1         |\n\nThe `component` table is hierarchical:\n\n- `mukey` (map unit key) is the parent\n- `cokey` (component key) is the child\n\nWhen fetching components, you typically filter by `mukey` to\nget all components for specific map units.\n\nUse the `fetch_by_keys()` function with `\"mukey\"` as the\n`key_column` to do this, with automatic pagination over chunks of\n`100` keys each (or specify your own `chunk_size`).\n\n### Bulk Downloads (Web Soil Survey)\n\nDownload complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:\n\n``` python\nfrom soildb import download_wss\n\n# Download specific survey areas\npaths = download_wss.sync(\n    areasymbols=[\"IA109\", \"IA113\"],\n    dest_dir=\"./ssurgo_data\",\n    extract=True\n)\nprint(f\"Downloaded {len(paths)} survey areas\")\n\n# Download all survey areas for a state\npaths = download_wss.sync(\n    where_clause=\"areasymbol LIKE 'IA%'\",\n    dest_dir=\"./iowa_ssurgo\",\n    extract=True,\n    remove_zip=True  # Clean up ZIP files after extraction\n)\n\n# Download STATSGO (general soil map) data\npaths = download_wss.sync(\n    areasymbols=[\"IA\"],\n    db=\"STATSGO\",\n    dest_dir=\"./iowa_statsgo\",\n    extract=True\n)\n```\n\nEach extracted survey area directory contains:\n\n- `tabular/` - Pipe-delimited TXT files with soil data tables\n- `spatial/` - ESRI shapefiles with map unit polygons and boundaries\n\n**Use Cases:**\n\n- **SDA**: Live queries, filtered data, programmatic access to current data\n- **WSS Downloads**: Complete offline datasets, bulk data for analysis, static snapshots updated annually\n\n## Async Usage\n\nFor performance-critical applications, use async functions directly with concurrent requests:\n\n``` python\nimport asyncio\nfrom soildb import fetch_by_keys, get_mukey_by_areasymbol\n\nasync def concurrent_example():\n    # Get mukeys for multiple areas 
concurrently\n    areas = [\"IA109\", \"IA113\", \"IA117\"]\n    all_mukeys = await get_mukey_by_areasymbol(areas)\n\n    # Fetch components concurrently with automatic pagination\n    response = await fetch_by_keys(\n        all_mukeys,\n        \"component\",\n        key_column=\"mukey\",\n        chunk_size=100,\n        columns=[\"mukey\", \"cokey\", \"compname\", \"comppct_r\"]\n    )\n    return response.to_pandas()\n\n# Run async function\ndf = asyncio.run(concurrent_example())\n```\n\nFor more async patterns, see the [Async Programming Guide](docs/async.qmd).\n\n## Examples\n\nSee the [`examples/` directory](examples/) and [documentation](docs/)\nfor detailed usage patterns.\n\n## License\n\nThis project is licensed under the MIT License. See the\n[LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrownag%2Fpy-soildb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrownag%2Fpy-soildb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrownag%2Fpy-soildb/lists"}