{"id":15208971,"url":"https://github.com/spatialx-project/geolake","last_synced_at":"2025-10-03T01:31:32.450Z","repository":{"id":151865750,"uuid":"608431360","full_name":"spatialx-project/geolake","owner":"spatialx-project","description":"Universal solution for geospatial data tailored to data lakehouse systems for the first time in the industry","archived":false,"fork":false,"pushed_at":"2023-10-24T09:58:45.000Z","size":21219,"stargazers_count":60,"open_issues_count":0,"forks_count":4,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-09-29T07:08:46.778Z","etag":null,"topics":["geospatial","geospatial-analysis","geospatial-processing","iceberg","spark","spatial","spatial-data"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spatialx-project.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-02T02:02:50.000Z","updated_at":"2024-08-10T03:21:46.000Z","dependencies_parsed_at":"2023-05-13T01:00:16.483Z","dependency_job_id":"d5ef6167-31e0-4ab8-bea5-a4a46743f594","html_url":"https://github.com/spatialx-project/geolake","commit_stats":{"total_commits":4032,"total_committers":392,"mean_commits":"10.285714285714286","dds":0.8742559523809523,"last_synced_commit":"940efacbd80fc32a0573a33399a92052ebc6519b"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatialx-project%2Fgeolake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatialx-project%2Fgeolake/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatialx-project%2Fgeolake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatialx-project%2Fgeolake/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spatialx-project","download_url":"https://codeload.github.com/spatialx-project/geolake/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235059234,"owners_count":18929279,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["geospatial","geospatial-analysis","geospatial-processing","iceberg","spark","spatial","spatial-data"],"created_at":"2024-09-28T07:08:34.468Z","updated_at":"2025-10-03T01:31:25.967Z","avatar_url":"https://github.com/spatialx-project.png","language":"Java","readme":"# GeoLake\n\n**GeoLake** aims to bring geospatial support to lakehouses.\n\n![geolake-overview](docs/geolake-overview.png)\n\nNote: We develop GeoLake atop Apache Iceberg and preserve the commit history of Apache Iceberg in the process. This is why our project shows such a long contributor list. Keeping the commit history makes it easy to track changes in the Apache Iceberg project, so we can rebase our code onto the latest version of Iceberg and stay compatible with its new releases.\n\n## GeoLake Architecture\n\nGeoLake can be used to build a lakehouse with geospatial support. 
It is built on top of [Apache Spark](https://spark.apache.org/) and [Apache Iceberg](https://iceberg.apache.org/).\n\n- **GeoLake Parquet**: An extension to Apache Parquet that supports geospatial data types.\n- **Spatial Partition**: A spatial partitioning scheme for Apache Iceberg.\n- **Geometry Type**: A geometry type for Apache Iceberg.\n- **Spark \u0026 Sedona**: Seamless integration with Apache Spark and Apache Sedona.\n\n## Spark SQL Examples\n```sql\n-- Create a table with a geometry column, spatially partitioned on that column\nCREATE TABLE iceberg.geom_table(\n    id int,\n    geom geometry\n) USING ICEBERG PARTITIONED BY (xz2(geom, 7));\n\n-- insert geometry values using WKT\nINSERT INTO iceberg.geom_table VALUES\n(1, 'POINT(1 2)'),\n(2, 'LINESTRING(1 2, 3 4)'),\n(3, 'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))');\n\n-- query with a spatial predicate\nSELECT * FROM iceberg.geom_table\nWHERE ST_Contains(geom, ST_Point(0.5, 0.5));\n```\n\n## Quickstart\n\nSee the [docker-spark-geolake](https://github.com/spatialx-project/docker-spark-geolake) repository for early access; it includes several example [notebooks](https://github.com/spatialx-project/docker-spark-geolake/tree/main/spark/notebooks).\n\n\n","funding_links":[],"categories":["Lakehouse","Table of Contents"],"sub_categories":["Lakehouse System"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspatialx-project%2Fgeolake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspatialx-project%2Fgeolake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspatialx-project%2Fgeolake/lists"}