https://github.com/azavea/hiveless
Scala API for Hive UDFs with the GIS extension
- Host: GitHub
- URL: https://github.com/azavea/hiveless
- Owner: azavea
- License: apache-2.0
- Created: 2021-11-13T14:30:51.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-05-26T21:51:42.000Z (almost 4 years ago)
- Last Synced: 2025-04-03T07:12:22.974Z (about 1 year ago)
- Topics: geospatial, gis, scala, spark, typelevel
- Language: Scala
- Homepage:
- Size: 3.6 MB
- Stars: 8
- Watchers: 6
- Forks: 1
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
# Hiveless
[CI](https://github.com/azavea/hiveless/actions/workflows/ci.yml)
[Maven Central](https://search.maven.org/search?q=g:com.azavea%20and%20hiveless)
[Snapshots](https://oss.sonatype.org/content/repositories/snapshots/com/azavea/hiveless-core_2.12/)
Hiveless is a Scala library for working with [Spark](https://spark.apache.org/) and [Hive](https://hive.apache.org/) using a more expressive typed API.
It adds typed Hive UDFs and implements spatial Hive UDFs. It consists of the following modules:
* `hiveless-core` with the typed Hive UDFs API and the initial base set of codecs
* `hiveless-jts` with the TWKB JTS encoding support
* `hiveless-spatial` with Hive GIS UDFs (depends on [GeoMesa](https://github.com/locationtech/geomesa))
* `hiveless-spatial-index` with extra Hive GIS UDFs that can be used for GIS indexing (depends on [GeoMesa](https://github.com/locationtech/geomesa) and [GeoTrellis](https://github.com/locationtech/geotrellis))
* There is also a forked release, [CartoDB/analytics-toolbox-databricks](https://github.com/CartoDB/analytics-toolbox-databricks), which at this point is a complete copy of `hiveless-spatial` and `hiveless-spatial-index`. It may gain extended GIS functionality in the future.
## Quick Start
To use Hiveless in your project, add the following to your `build.sbt` file as needed:
```scala
resolvers ++= Seq(
  // for snapshot artifacts only
  "oss-sonatype" at "https://oss.sonatype.org/content/repositories/snapshots"
)
// set each empty version string to the release you need
libraryDependencies ++= List(
  "com.azavea" %% "hiveless-core" % "",
  "com.azavea" %% "hiveless-spatial" % "",
  "com.azavea" %% "hiveless-spatial-index" % ""
)
```
## Hiveless Spatial supported GIS functions
```sql
CREATE OR REPLACE FUNCTION st_geometryFromText as 'com.azavea.hiveless.spatial.ST_GeomFromWKT';
CREATE OR REPLACE FUNCTION st_intersects as 'com.azavea.hiveless.spatial.ST_Intersects';
CREATE OR REPLACE FUNCTION st_simplify as 'com.azavea.hiveless.spatial.ST_Simplify';
-- ...and more
```
The full list of supported functions can be found [here](./spatial/sql/createUDFs.sql).
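
As a minimal sketch of how these functions might be registered and called from a Spark application: the function names and class names below are taken from the statements above, while the `HivelessUdfSketch` object, the `createFunctionSql` helper, the use of `TEMPORARY` functions, the local Hive-enabled session, and the sample geometries are assumptions made for illustration. It also assumes `ST_GeomFromWKT` accepts a WKT string and `ST_Intersects` accepts two geometries.

```scala
import org.apache.spark.sql.SparkSession

object HivelessUdfSketch {
  // Function name -> implementing class, copied from the CREATE FUNCTION statements above.
  val udfs: Seq[(String, String)] = Seq(
    "st_geometryFromText" -> "com.azavea.hiveless.spatial.ST_GeomFromWKT",
    "st_intersects"       -> "com.azavea.hiveless.spatial.ST_Intersects"
  )

  // Hypothetical helper: builds the DDL for one function.
  // TEMPORARY is an assumption of this sketch, so nothing is written to a metastore.
  def createFunctionSql(name: String, className: String): String =
    s"CREATE OR REPLACE TEMPORARY FUNCTION $name AS '$className'"

  def main(args: Array[String]): Unit = {
    // Assumes hiveless-spatial and spark-hive are on the classpath.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hiveless-udf-sketch")
      .enableHiveSupport() // Hive UDF classes are resolved through Spark's Hive integration
      .getOrCreate()

    // Register each UDF for the lifetime of this session.
    udfs.foreach { case (name, cls) => spark.sql(createFunctionSql(name, cls)) }

    // Check whether a point intersects a square polygon.
    spark.sql(
      """SELECT st_intersects(
        |  st_geometryFromText('POINT (1 1)'),
        |  st_geometryFromText('POLYGON ((0 0, 0 2, 2 2, 2 0, 0 0))')
        |) AS intersects""".stripMargin
    ).show()

    spark.stop()
  }
}
```

The remaining functions from `createUDFs.sql` could be added to `udfs` in the same way.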
## Spatial Query Optimizations
Two spatial predicates are currently optimized: `ST_Intersects` and `ST_Contains`. For filters that use them, Spark can push the predicate down when possible.
To enable optimizations:
```scala
import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.rules.SpatialFilterPushdownRules
val spark: SparkSession = ???
// register the pushdown rules on the session's SQLContext
SpatialFilterPushdownRules.registerOptimizations(spark.sqlContext)
```
The optimizations can also be enabled through the Spark configuration, via the optimizations injector:
```scala
import org.apache.spark.SparkConf
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations
val conf: SparkConf = ???
conf.set("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)
```
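
Because `spark.sql.extensions` is read when the session is built, the setting has to be in place before `getOrCreate()` is called. A minimal sketch of wiring the configuration into a new session, assuming a local master and a Hive-enabled build (the `HivelessPushdownSketch` object name, app name, and master URL are illustrative, not part of Hiveless):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

object HivelessPushdownSketch {
  def main(args: Array[String]): Unit = {
    // Configuration carrying the extensions injector; master and app name are assumptions.
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("hiveless-pushdown-sketch")
      .set("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)

    // The extension is applied while the session is being created.
    val spark = SparkSession.builder()
      .config(conf)
      .enableHiveSupport()
      .getOrCreate()

    // Print the effective setting to confirm it reached the session.
    println(spark.conf.get("spark.sql.extensions"))

    spark.stop()
  }
}
```

The same key can also be supplied outside the code, for example with `--conf spark.sql.extensions=...` on `spark-submit`.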
## License
Code is provided under the Apache 2.0 license, available at http://opensource.org/licenses/Apache-2.0
and in the LICENSE file. This is the same license that Spark uses.