{"id":13582760,"url":"https://github.com/apache/sedona","last_synced_at":"2026-04-09T05:32:18.345Z","repository":{"id":30975430,"uuid":"34533675","full_name":"apache/sedona","owner":"apache","description":"A cluster computing framework for processing large-scale geospatial data","archived":false,"fork":false,"pushed_at":"2025-05-06T22:24:36.000Z","size":1329403,"stargazers_count":2049,"open_issues_count":63,"forks_count":710,"subscribers_count":94,"default_branch":"master","last_synced_at":"2025-05-07T23:40:05.929Z","etag":null,"topics":["cluster-computing","geospatial","java","python","scala","spatial-analysis","spatial-query","spatial-sql"],"latest_commit_sha":null,"homepage":"https://sedona.apache.org/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-04-24T18:01:29.000Z","updated_at":"2025-05-07T17:22:36.000Z","dependencies_parsed_at":"2024-01-16T05:37:38.651Z","dependency_job_id":"bc02151a-4a3b-444b-9672-7ef2d4ddc541","html_url":"https://github.com/apache/sedona","commit_stats":{"total_commits":1767,"total_committers":125,"mean_commits":14.136,"dds":0.5585738539898133,"last_synced_commit":"7876d8c62de1ce97b2d0392d4fc75d01c882fb35"},"previous_names":["datasystemslab/geospark","apache/incubator-sedona","sarwat/geospark"],"tags_count":99,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fsedona","tags_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/apache%2Fsedona/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fsedona/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fsedona/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/sedona/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254045548,"owners_count":22005395,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster-computing","geospatial","java","python","scala","spatial-analysis","spatial-query","spatial-sql"],"created_at":"2024-08-01T15:03:00.231Z","updated_at":"2025-12-12T00:39:12.517Z","avatar_url":"https://github.com/apache.png","language":"Java","readme":"\u003c!--\n Licensed to the Apache Software Foundation (ASF) under one\n or more contributor license agreements.  See the NOTICE file\n distributed with this work for additional information\n regarding copyright ownership.  The ASF licenses this file\n to you under the Apache License, Version 2.0 (the\n \"License\"); you may not use this file except in compliance\n with the License.  You may obtain a copy of the License at\n\n   http://www.apache.org/licenses/LICENSE-2.0\n\n Unless required by applicable law or agreed to in writing,\n software distributed under the License is distributed on an\n \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n KIND, either express or implied.  
See the License for the\n specific language governing permissions and limitations\n under the License.\n --\u003e\n\n[![Apache Sedona](docs/image/sedona_logo.png)](https://sedona.apache.org/)\n\n[![Scala and Java build](https://github.com/apache/sedona/actions/workflows/java.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/java.yml) [![Python build](https://github.com/apache/sedona/actions/workflows/python.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/python.yml) [![R build](https://github.com/apache/sedona/actions/workflows/r.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/r.yml) [![Docker image build](https://github.com/apache/sedona/actions/workflows/docker-build.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/docker-build.yml) [![Example project build](https://github.com/apache/sedona/actions/workflows/example.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/example.yml) [![Docs build](https://github.com/apache/sedona/actions/workflows/docs.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/docs.yml)\n\n| Download statistics        | **Maven**  | **PyPI**                                                                                                                                                                                                                                                                                                                                     | Conda-forge                                                                                                                                     | **CRAN**                                                                                                                                                                                                                                                                                                    | **DockerHub**                          
                                                                                        |\n|----------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|\n| Apache Sedona              | 225k/month | [![PyPI - Downloads](https://img.shields.io/pypi/dm/apache-sedona)](https://pepy.tech/project/apache-sedona) [![Downloads](https://static.pepy.tech/personalized-badge/apache-sedona?period=total\u0026units=international_system\u0026left_color=black\u0026right_color=brightgreen\u0026left_text=total%20downloads)](https://pepy.tech/project/apache-sedona) | [![Anaconda-Server Badge](https://anaconda.org/conda-forge/apache-sedona/badges/downloads.svg)](https://anaconda.org/conda-forge/apache-sedona) | [![CRAN monthly downloads](https://cranlogs.r-pkg.org/badges/apache.sedona?color=brightgreen)](https://cran.r-project.org/package=apache.sedona) [![Total CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/apache.sedona?color=brightgreen)](https://cran.r-project.org/package=apache.sedona) | [![Docker 
pulls](https://img.shields.io/docker/pulls/apache/sedona?color=brightgreen)](https://hub.docker.com/r/apache/sedona) |\n| Archived GeoSpark releases | 10k/month  | [![PyPI - Downloads](https://img.shields.io/pypi/dm/geospark)](https://pepy.tech/project/geospark)[![Downloads](https://static.pepy.tech/personalized-badge/geospark?period=total\u0026units=international_system\u0026left_color=black\u0026right_color=brightgreen\u0026left_text=total%20downloads)](https://pepy.tech/project/geospark)                      |                                                                                                                                                 |                                                                                                                                                                                                                                                                                                             |                                                                                                                                |\n\n* [Join the community](#join-the-community)\n* [What is Apache Sedona?](#what-is-apache-sedona)\n  * [Features](#features)\n* [When to use Sedona?](#when-to-use-sedona)\n  * [Use Cases](#use-cases)\n  * [Code Example](#code-example)\n* [Docker image](#docker-image)\n* [Building Sedona](#building-sedona)\n* [Documentation](#documentation)\n* [Powered by](#powered-by)\n\n## Join the community\n\nEveryone is welcome to join our community events. We have a community office hour every 4 weeks. 
Please register for the event you want to attend: https://bit.ly/3UBmxFY\n\nPlease join our Discord community!\n\n[![Apache Sedona Community Discord Server](https://dcbadge.vercel.app/api/server/9A3k5dEBsY)](https://discord.gg/9A3k5dEBsY)\n\n* [Apache Sedona@LinkedIn](https://www.linkedin.com/company/apache-sedona)\n* [Apache Sedona@X](https://X.com/ApacheSedona)\n* [Sedona JIRA](https://issues.apache.org/jira/projects/SEDONA): bug reports and feature requests\n* [Sedona GitHub Issues](https://github.com/apache/sedona/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen): bug reports and feature requests\n* [Sedona GitHub Discussion](https://github.com/apache/sedona/discussions): project development and general questions\n* [Sedona Mailing Lists](https://lists.apache.org/list.html?sedona.apache.org): [dev@sedona.apache.org](https://lists.apache.org/list.html?dev@sedona.apache.org): project development and general questions\n\nTo post to the mailing list, please subscribe first. To subscribe, send an email (leave the subject and content blank) to [dev-subscribe@sedona.apache.org](mailto:dev-subscribe@sedona.apache.org?subject=Subscribe\u0026body=Subscribe).\n\n## What is Apache Sedona?\n\nApache Sedona™ is a [spatial computing](https://en.wikipedia.org/wiki/Spatial_computing) engine that enables developers to easily process spatial data at any scale within modern cluster computing systems such as [Apache Spark](https://spark.apache.org/) and [Apache Flink](https://flink.apache.org/).\nSedona developers can express their spatial data processing tasks in [Spatial SQL](https://carto.com/spatial-sql), [Spatial Python](https://docs.scipy.org/doc/scipy/reference/spatial.html) or [Spatial R](https://r-spatial.org/). 
Internally, Sedona provides spatial data loading, indexing, partitioning, and query processing/optimization functionality that enables users to efficiently analyze spatial data at any scale.\n\n![Sedona Ecosystem](docs/image/sedona-ecosystem.png \"Sedona Ecosystem\")\n\n### Features\n\nSome of the key features of Apache Sedona include:\n\n* Support for a wide range of geospatial data formats, including [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON), [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry), and [ESRI](https://www.esri.com) [Shapefile](https://en.wikipedia.org/wiki/Shapefile).\n* Scalable distributed processing of large vector and raster datasets.\n* Tools for spatial indexing, spatial querying, and spatial join operations.\n* Integration with popular geospatial Python tools such as [GeoPandas](https://geopandas.org).\n* Integration with popular big data tools, such as Spark, [Hadoop](https://hadoop.apache.org/), [Hive](https://hive.apache.org/), and Flink for data storage and querying.\n* A user-friendly API for working with geospatial data in the [SQL](https://en.wikipedia.org/wiki/SQL), [Python](https://www.python.org/), [Scala](https://www.scala-lang.org/) and [Java](https://www.java.com) languages.\n* Flexible deployment options, including standalone, local, and cluster modes.\n\nAdditional capabilities may be available depending on the specific version and configuration.\n\n## When to use Sedona?\n\n### Use Cases:\n\nApache Sedona is a widely used framework for working with spatial data, and it has many different use cases and applications. 
Some of the main use cases for Apache Sedona include:\n\n* Automotive data analytics: Apache Sedona is used to perform spatial analysis and data mining on large and complex datasets collected from vehicle fleets.\n* Urban planning and development: Apache Sedona is used to analyze and visualize spatial datasets related to urban environments, such as land use, transportation networks, and population density.\n* Location-based services: Apache Sedona is used in mapping and navigation applications to process and analyze spatial data and provide location-based information and services to users.\n* Environmental modeling and analysis: Apache Sedona is used to process and analyze spatial data related to environmental factors, such as air quality, water quality, and weather patterns.\n* Disaster response and management: Apache Sedona is used to process and analyze spatial data related to floods, earthquakes, and other natural disasters in order to support emergency response and recovery efforts.\n\n### Code Example:\n\nThis example loads NYC taxi trip records and taxi zone information stored as CSV files on AWS S3 into Sedona spatial dataframes. It then runs a spatial SQL query on the taxi trip dataset to keep only the records within the Manhattan area of New York. The example also shows a spatial join operation that matches taxi trip records to zones based on whether the trip's pickup point lies within the geographical extent of the zone. 
Finally, the last code snippet integrates the output of Sedona with GeoPandas and plots the spatial distribution of both datasets.\n\n#### Load NYC taxi trips and taxi zones data from CSV Files Stored on AWS S3\n\n```python\n# `sedona` is assumed to be an existing Sedona-enabled Spark session.\ntaxidf = (\n    sedona.read.format(\"csv\")\n    .option(\"header\", \"true\")\n    .option(\"delimiter\", \",\")\n    .load(\"s3a://your-directory/data/nyc-taxi-data.csv\")\n)\n# Build a point geometry from the pickup longitude/latitude columns.\ntaxidf = taxidf.selectExpr(\n    \"ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup\",\n    \"Trip_Pickup_DateTime\",\n    \"Payment_Type\",\n    \"Fare_Amt\",\n)\n```\n\n```python\nzoneDf = (\n    sedona.read.format(\"csv\")\n    .option(\"delimiter\", \",\")\n    .load(\"s3a://your-directory/data/TIGER2018_ZCTA5.csv\")\n)\n# The file has no header row, so columns are read as _c0 (WKT geometry) and _c1 (zip code).\nzoneDf = zoneDf.selectExpr(\"ST_GeomFromWKT(_c0) as zone\", \"_c1 as zipcode\")\n```\n\n#### Spatial SQL query to only return Taxi trips in Manhattan\n\n```python\ntaxidf_mhtn = taxidf.where(\n    \"ST_Contains(ST_PolygonFromEnvelope(-74.01,40.73,-73.93,40.79), pickup)\"\n)\n```\n\n#### Spatial Join between Taxi Dataframe and Zone Dataframe to Find taxis in each zone\n\n```python\n# Register both dataframes as temporary views so the SQL query can reference them by name.\nzoneDf.createOrReplaceTempView(\"zoneDf\")\ntaxidf.createOrReplaceTempView(\"taxiDf\")\n\ntaxiVsZone = sedona.sql(\n    \"SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, taxiDf WHERE ST_Contains(zone, pickup)\"\n)\n```\n\n#### Show a map of the loaded Spatial Dataframes using GeoPandas\n\n```python\nimport geopandas as gpd\n\nzoneGpd = gpd.GeoDataFrame(zoneDf.toPandas(), geometry=\"zone\")\ntaxiGpd = gpd.GeoDataFrame(taxidf.toPandas(), geometry=\"pickup\")\n\nzone = zoneGpd.plot(color=\"yellow\", edgecolor=\"black\", zorder=1)\nzone.set_xlabel(\"Longitude (degrees)\")\nzone.set_ylabel(\"Latitude (degrees)\")\n\nzone.set_xlim(-74.1, -73.8)\nzone.set_ylim(40.65, 40.9)\n\ntaxi = taxiGpd.plot(ax=zone, alpha=0.01, color=\"red\", zorder=3)\n```\n\n## Docker image\n\nWe provide a Docker image for Apache Sedona with Python JupyterLab and a single-node cluster. 
The image is available on [DockerHub](https://hub.docker.com/r/apache/sedona).\n\n## Building Sedona\n\n* To install the Python package:\n\n  ```\n  pip install apache-sedona\n  ```\n\n* To compile the source code, please refer to the [Sedona website](https://sedona.apache.org/latest-snapshot/setup/compile/).\n\n* Modules in the source code\n\n| Name             | API                                      | Introduction                                           |\n|------------------|------------------------------------------|--------------------------------------------------------|\n| common           | Java                                     | Core geometric operations, serialization, indexing     |\n| spark            | Spark RDD/DataFrame Scala/Java/SQL       | Distributed geospatial data processing on Apache Spark |\n| flink            | Flink DataStream/Table in Scala/Java/SQL | Distributed geospatial data processing on Apache Flink |\n| snowflake        | Snowflake SQL                            | Distributed geospatial data processing on Snowflake    |\n| spark-shaded     | No source code                           | shaded jar for Sedona Spark                            |\n| flink-shaded     | No source code                           | shaded jar for Sedona Flink                            |\n| snowflake-tester | Java                                     | tester program for Sedona Snowflake                    |\n| python           | Spark RDD/DataFrame Python               | Distributed geospatial data processing on Apache Spark |\n| R                | Spark RDD/DataFrame in R                 | R wrapper for Sedona                                   |\n| Zeppelin         | Apache Zeppelin                          | Plugin for Apache Zeppelin 0.8.1+                      |\n\n## Documentation\n\n* [Spatial SQL in Sedona](https://sedona.apache.org/latest-snapshot/tutorial/sql/)\n* [Integrate with GeoPandas and 
Shapely](https://sedona.apache.org/latest-snapshot/tutorial/geopandas-shapely/)\n* [Working with Spatial R in Sedona](https://sedona.apache.org/latest-snapshot/api/rdocs/)\n\nPlease visit the [Apache Sedona website](http://sedona.apache.org/) for detailed information.\n\n## Powered by\n\n\u003ca href=\"https://www.apache.org/\"\u003e\n  \u003cimg alt=\"The Apache Software Foundation\" class=\"center\" src=\"https://www.apache.org/foundation/press/kit/asf_logo_wide.png\"\n    title=\"The Apache Software Foundation\" width=\"500\"\u003e\n\u003c/a\u003e\n","funding_links":[],"categories":["Java","Data Lake Engines","大数据"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fsedona","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fsedona","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fsedona/lists"}