https://github.com/spatialx-project/geolake
Universal solution for geospatial data tailored to data lakehouse systems for the first time in the industry
- Host: GitHub
- URL: https://github.com/spatialx-project/geolake
- Owner: spatialx-project
- License: apache-2.0
- Created: 2023-03-02T02:02:50.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-24T09:58:45.000Z (about 1 year ago)
- Last Synced: 2024-09-29T07:08:46.778Z (4 months ago)
- Topics: geospatial, geospatial-analysis, geospatial-processing, iceberg, spark, spatial, spatial-data
- Language: Java
- Homepage:
- Size: 20.2 MB
- Stars: 60
- Watchers: 7
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-datalake - Geolake - Universal solution for geospatial data tailored to data lakehouse systems for the first time in the industry. (Lakehouse)
README
# GeoLake
**GeoLake** aims at bringing geospatial support to lakehouses.
![geolake-overview](docs/geolake-overview.png)
Note: GeoLake is developed on top of Apache Iceberg and preserves Iceberg's commit history, which is why the contributor list on this project is so long. Keeping that history makes it easy to track upstream changes, rebase our code onto the latest version of Iceberg, and stay compatible with its new releases.
## GeoLake Architecture
GeoLake can be used to build a lakehouse with geospatial support. It is built on top of [Apache Spark](https://spark.apache.org/) and [Apache Iceberg](https://iceberg.apache.org/).
- **GeoLake Parquet**: An extension to Apache Parquet to support geospatial data types.
- **Spatial Partition**: A spatial partitioning scheme for Apache Iceberg.
- **Geometry Type**: A geometry type for Apache Iceberg.
- **Spark & Sedona**: Integrate with Apache Spark and Apache Sedona seamlessly.
## Spark SQL Examples
```sql
-- Create a table with a geometry column and a spatial partition on that column
CREATE TABLE iceberg.geom_table(
    id int,
    geom geometry
) USING ICEBERG PARTITIONED BY (xz2(geom, 7));

-- Insert geometry values using WKT
INSERT INTO iceberg.geom_table VALUES
    (1, 'POINT(1 2)'),
    (2, 'LINESTRING(1 2, 3 4)'),
    (3, 'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))');

-- Query with spatial predicates
SELECT * FROM iceberg.geom_table
WHERE ST_Contains(geom, ST_Point(0.5, 0.5));
```
## Quickstart
See the [docker-spark-geolake](https://github.com/spatialx-project/docker-spark-geolake) repo for early access; it includes some [notebooks](https://github.com/spatialx-project/docker-spark-geolake/tree/main/spark/notebooks).
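
To see what the `ST_Contains` query in the Spark SQL example should return before running it, the predicate can be approximated in plain Python. This is only a minimal sketch under stated assumptions: `parse_polygon_ring` and `ring_contains` are hypothetical helper names, not part of GeoLake or Sedona, and the ray-casting test below handles just a single-ring polygon and ignores boundary cases and holes, which the real predicate treats more carefully.

```python
def parse_polygon_ring(wkt: str) -> list[tuple[float, float]]:
    """Parse the outer ring of a simple WKT POLYGON (no holes).

    Hypothetical helper for illustration only; real WKT parsing
    should be left to a geometry library.
    """
    body = wkt.strip()
    if not body.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is handled in this sketch")
    inner = body[body.index("((") + 2 : body.rindex("))")]
    return [tuple(map(float, pair.split())) for pair in inner.split(",")]


def ring_contains(ring: list[tuple[float, float]], x: float, y: float) -> bool:
    """Ray-casting point-in-polygon test for a closed ring.

    Counts how many edges a horizontal ray from (x, y) crosses;
    an odd count means the point is inside.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Row 3 from the example: the unit square.
square = parse_polygon_ring("POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))")
print(ring_contains(square, 0.5, 0.5))
print(ring_contains(square, 2.0, 2.0))
```

Of the three rows inserted in the example, only the polygon in row 3 can contain the point (0.5, 0.5); the point and linestring in rows 1 and 2 lie elsewhere, so the query is expected to return row 3 alone.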