qbeast-io/qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Scala versions: 2.12
projectglow/glow
An open-source toolkit for large-scale genomic analysis
Scala versions: 2.12 2.11
swoop-inc/spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Scala versions: 2.12
setl-framework/setl
A simple Spark-powered ETL framework that just works 🍺
Scala versions: 2.12 2.11
azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
Scala versions: 2.11 2.10
leobenkel/zparkio
Boilerplate framework for using Spark and ZIO together.
Scala versions: 2.11
sparkling-graph/sparkling-graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs with Spark and GraphX.
Scala versions: 2.11 2.10
housepower/spark-clickhouse-connector
Spark ClickHouse Connector built on the DataSourceV2 API
Scala versions: 2.13 2.12
clustering4ever/clustering4ever
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Scala versions: 2.11
zouzias/spark-lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Scala versions: 2.12 2.11 2.10
streamnative/pulsar-spark
Spark connector for reading from and writing to Pulsar
Scala versions: 2.13 2.12 2.11
aliyun/aliyun-emapreduce-datasources
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Scala versions: 2.11 2.10
indix/schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schemas. Supports schema inference and a GraphQL API.
Scala versions: 2.11
smart-data-lake/smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Scala versions: 2.13 2.12 2.11