azavea / hiveless   0.0.12

Apache License 2.0 GitHub

Scala API for Hive UDFs with the GIS extension

Scala versions: 2.12

Hiveless

Hiveless is a Scala library for working with Spark and Hive through a more expressive typed API. It adds typed Hive UDFs and implements spatial Hive UDFs. It consists of the following modules:

  • hiveless-core with the typed Hive UDFs API and the initial base set of codecs
  • hiveless-jts with the TWKB JTS encoding support
  • hiveless-spatial with Hive GIS UDFs (depends on GeoMesa)
  • hiveless-spatial-index with extra Hive GIS UDFs that may be used for GIS indexing purposes (depends on GeoMesa and GeoTrellis)
    • There is also a forked release, CartoDB/analytics-toolbox-databricks, which is currently a complete copy of hiveless-spatial and hiveless-spatial-index. It may gain extended GIS functionality in the future.

Quick Start

To use Hiveless in your project, add the following to your build.sbt file as needed:

resolvers ++= Seq(
  // for snapshot artifacts only
  "oss-sonatype" at "https://oss.sonatype.org/content/repositories/snapshots"
)

libraryDependencies ++= List(
  "com.azavea" %% "hiveless-core"          % "<latest version>",
  "com.azavea" %% "hiveless-spatial"       % "<latest version>",
  "com.azavea" %% "hiveless-spatial-index" % "<latest version>"
)

Hiveless Spatial supported GIS functions

CREATE OR REPLACE FUNCTION st_geometryFromText as 'com.azavea.hiveless.spatial.ST_GeomFromWKT';
CREATE OR REPLACE FUNCTION st_intersects as 'com.azavea.hiveless.spatial.ST_Intersects';
CREATE OR REPLACE FUNCTION st_simplify as 'com.azavea.hiveless.spatial.ST_Simplify';
 -- ...and more

The full list of supported functions can be found here.
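As a rough illustration of how the statements above might be used from Scala, the sketch below registers two of the listed functions and calls them in a query. It assumes that hiveless-spatial is on the classpath and that the session can be created with Hive support; the object name, the local master setting, and the sample geometries are made up for this example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Hiveless.
object SpatialFunctionsExample {
  // SQL function name -> implementing class, taken from the list above.
  val functions: Seq[(String, String)] = Seq(
    "st_geometryFromText" -> "com.azavea.hiveless.spatial.ST_GeomFromWKT",
    "st_intersects"       -> "com.azavea.hiveless.spatial.ST_Intersects"
  )

  // Builds one registration statement per function.
  def createStatements: Seq[String] =
    functions.map { case (name, cls) => s"CREATE OR REPLACE FUNCTION $name AS '$cls'" }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session with Hive support is acceptable for a quick try-out.
    val spark = SparkSession.builder()
      .appName("hiveless-example")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Register the functions, then use them like any other SQL function.
    createStatements.foreach(stmt => spark.sql(stmt))

    spark.sql(
      """SELECT st_intersects(
        |  st_geometryFromText('POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))'),
        |  st_geometryFromText('POINT (5 5)')
        |) AS intersects""".stripMargin
    ).show()

    spark.stop()
  }
}
```

Keeping the name-to-class pairs in one sequence makes it easy to register further functions from the full list without repeating the SQL boilerplate.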

Spatial Query Optimizations

Optimizations are supported for two functions, ST_Intersects and ST_Contains. They allow Spark to push down predicates that use these functions when possible.

To enable optimizations:

import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.rules.SpatialFilterPushdownRules

val spark: SparkSession = ???
// register the pushdown rules on the session's SQLContext
SpatialFilterPushdownRules.registerOptimizations(spark.sqlContext)

The optimizations can also be enabled through the Spark configuration via the optimizations injector:

import org.apache.spark.SparkConf
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

val conf: SparkConf = ???
// inject the optimizations as a Spark SQL extension
conf.set("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)
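To show where such a configuration might end up, here is a minimal sketch that builds a SparkConf with the extension set and passes it to a new SparkSession. The object name, application name, and local master are assumptions made for illustration; the extension class and its package are taken as written in the snippet above.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

// Hypothetical example object; not part of Hiveless.
object PushdownSessionExample {
  // Configuration with the optimizations injector registered as a SQL extension.
  def conf: SparkConf = new SparkConf()
    .setAppName("hiveless-pushdown")
    .setMaster("local[*]") // assumption: local run for experimentation
    .set("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)

  def main(args: Array[String]): Unit = {
    // Extensions are read when the session is created, so set them before getOrCreate().
    val spark = SparkSession.builder()
      .config(conf)
      .enableHiveSupport()
      .getOrCreate()

    // Print the effective setting to confirm the extension was picked up.
    println(spark.conf.get("spark.sql.extensions"))

    spark.stop()
  }
}
```

Setting the extension in the configuration is convenient when the session is created by a launcher or a shared helper, since no extra registration call is needed in application code.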

License

Code is provided under the Apache 2.0 license, available at http://opensource.org/licenses/Apache-2.0 and in the LICENSE file. This is the same license that Spark uses.