https://github.com/stac-utils/stac-geoparquet
- Host: GitHub
- URL: https://github.com/stac-utils/stac-geoparquet
- Owner: stac-utils
- License: mit
- Created: 2022-06-01T20:42:02.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-06-07T14:54:05.000Z (about 1 year ago)
- Last Synced: 2024-06-11T16:57:53.047Z (about 1 year ago)
- Language: Python
- Size: 206 KB
- Stars: 56
- Watchers: 7
- Forks: 8
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-earthobservation-code - stac-geoparquet - Convert STAC items to geoparquet. `Python` (`Python` processing of optical imagery (non deep learning) / Cloud Native Geospatial)
README
# STAC-GeoParquet
Convert [STAC](https://stacspec.org/en) items between JSON, [GeoParquet](https://geoparquet.org/), [pgstac](https://github.com/stac-utils/pgstac), and [Delta Lake](https://delta.io/).
## Purpose
The STAC spec defines a JSON-based schema.
But it can be hard to manage and search through many millions of STAC items in JSON format.
For one, JSON is very large on disk.
And you need to parse the entire JSON data into memory to extract just a small piece of information, say the `datetime` and one `asset` of an Item.

GeoParquet can be a good complement to JSON for many bulk-access and analytic use cases.
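To make the parsing cost described above concrete, here is a minimal stdlib sketch. The Item below is a hypothetical, heavily trimmed example, not taken from any real collection. Even to read only `datetime` and one asset's `href`, `json.loads` has to decode the whole document, including every other asset and property.

```python
import json

# A hypothetical, trimmed STAC Item serialized as JSON text.
# Real Items often carry dozens of assets and properties.
raw = json.dumps({
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "geometry": {"type": "Point", "coordinates": [-73.98, 40.75]},
    "bbox": [-73.98, 40.75, -73.98, 40.75],
    "properties": {"datetime": "2024-06-15T15:30:00Z", "eo:cloud_cover": 12.5},
    "assets": {
        "visual": {"href": "https://example.com/visual.tif"},
        "B04": {"href": "https://example.com/B04.tif"},
        "B08": {"href": "https://example.com/B08.tif"},
    },
    "links": [],
})


def datetime_and_asset(text: str, asset_key: str) -> tuple[str, str]:
    """Return (datetime, href of one asset) from a STAC Item JSON string.

    json.loads cannot skip fields: it builds the full dict, with all
    assets and properties, before we can index into it.
    """
    item = json.loads(text)
    return item["properties"]["datetime"], item["assets"][asset_key]["href"]


print(datetime_and_asset(raw, "visual"))
```

With millions of Items, that full decode is repeated for every file, which is the overhead a columnar format avoids.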
While STAC Items are commonly distributed as individual JSON files on object storage or through a [STAC API](https://github.com/radiantearth/stac-api-spec), STAC GeoParquet allows users to access a large number of STAC items in bulk without making repeated HTTP requests.

For analytic questions like "find the items in the Sentinel-2 collection in June 2024 over New York City with cloud cover of less than 20%" it can be much, much faster to find the relevant data from a GeoParquet source than from JSON, because GeoParquet needs to load only the relevant columns for that query, not the full data.
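The column-pruning idea behind that example query can be sketched in plain Python. This illustrates only the access pattern. It is not how stac-geoparquet or a Parquet reader is implemented. The columns are held in a hypothetical dict of lists, the rows are invented, and the New York City bounding box is an approximation chosen for the sketch. The filter reads five columns (`id`, `collection`, `datetime`, `eo:cloud_cover`, `bbox`) and never touches `assets`.

```python
from datetime import datetime, timezone

# Hypothetical columnar table: one list per column, one entry per Item.
# A Parquet reader can load only the columns a query needs; here the
# "assets" column is never read by the filter.
table = {
    "id": ["a", "b", "c", "d"],
    "collection": ["sentinel-2-l2a", "sentinel-2-l2a", "landsat-c2-l2", "sentinel-2-l2a"],
    "datetime": [
        "2024-06-03T15:40:00+00:00",
        "2024-06-18T15:40:00+00:00",
        "2024-06-10T15:30:00+00:00",
        "2024-07-02T15:40:00+00:00",
    ],
    "eo:cloud_cover": [8.0, 45.0, 5.0, 3.0],
    # bbox as [min_lon, min_lat, max_lon, max_lat]
    "bbox": [[-74.1, 40.6, -73.8, 40.9]] * 4,
    "assets": [{"visual": {"href": "https://example.com/visual.tif"}}] * 4,
}

# Approximate bounding box around New York City (an assumption for this sketch).
NYC_BBOX = (-74.26, 40.48, -73.70, 40.92)
START = datetime(2024, 6, 1, tzinfo=timezone.utc)
END = datetime(2024, 7, 1, tzinfo=timezone.utc)


def intersects(a, b) -> bool:
    """True if two [min_x, min_y, max_x, max_y] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(t: dict) -> list[str]:
    # Zip together only the columns the predicate needs.
    cols = zip(t["id"], t["collection"], t["datetime"], t["eo:cloud_cover"], t["bbox"])
    return [
        item_id
        for item_id, coll, dt, cloud, bbox in cols
        if coll == "sentinel-2-l2a"
        and START <= datetime.fromisoformat(dt) < END
        and cloud < 20
        and intersects(bbox, NYC_BBOX)
    ]


print(query(table))  # ['a']
```

Item `b` is rejected for cloud cover, `c` for its collection, and `d` for its date, leaving only `a`.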
See the [STAC-GeoParquet specification](./spec/stac-geoparquet-spec.md) for details on the exact schema of the written Parquet files.
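As a rough picture of the JSON-to-columns conversion, the sketch below pivots a list of Item dicts into one list per column, lifting each key of `properties` to a top-level column. That flattening is a simplifying assumption made for illustration. The specification linked above is the authority on the actual Parquet schema (column types, geometry encoding, and so on). `items_to_columns` is a hypothetical helper, not part of the stac-geoparquet API.

```python
def items_to_columns(items: list[dict]) -> dict[str, list]:
    """Pivot row-oriented STAC Items into a column-oriented dict of lists.

    Hypothetical helper for illustration. Keys inside "properties" become
    top-level columns (a property with the same name as a top-level key
    would overwrite it here); an Item missing a key gets None in that column.
    """
    rows = []
    for item in items:
        row = {k: v for k, v in item.items() if k != "properties"}
        row.update(item.get("properties", {}))
        rows.append(row)

    # Union of all column names, in first-seen order.
    names: list[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    return {name: [row.get(name) for row in rows] for name in names}


items = [
    {"id": "a", "collection": "demo",
     "properties": {"datetime": "2024-06-03T00:00:00Z", "eo:cloud_cover": 8.0}},
    {"id": "b", "collection": "demo",
     "properties": {"datetime": "2024-06-18T00:00:00Z"}},
]
columns = items_to_columns(items)
print(columns["eo:cloud_cover"])  # [8.0, None]
```

Filling missing values with `None` mirrors how a columnar table needs a value (or a null) in every row of every column, even when the source JSON simply omits a key.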
## Installation
Install via `pip` or `conda`:
* `pip install stac-geoparquet`
* `conda install conda-forge::stac-geoparquet`

## Documentation
[Documentation website](https://stac-utils.github.io/stac-geoparquet/)
## Development
Get [uv](https://docs.astral.sh/uv/getting-started/installation/), then:
```shell
git clone git@github.com:stac-utils/stac-geoparquet.git
cd stac-geoparquet
uv sync
uv run pre-commit install
uv run pytest
scripts/lint
```

Validate the example collection metadata against the JSON Schema:
```shell
check-jsonschema --schemafile spec/json-schema/metadata.json spec/example-metadata.json
```