Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/douban/dpark
Python clone of Spark, a MapReduce-like framework in Python
bigdata dpark mapreduce python spark stream-processing
Last synced: 26 Mar 2024
![](https://github.com/douban.png)
https://github.com/minzhang-1/PointHop-PointHop2_Spark
A fast, low-memory version of PointHop and PointHop++, built on Apache Spark.
3d 3d-classification classification feature-extraction knn least-square-regression pca point-cloud pyspark python spark
Last synced: 26 Mar 2024
![](https://github.com/minzhang-1.png)
https://github.com/tencentmusic/cube-studio
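Cube Studio is an open-source, cloud-native, one-stop machine learning/deep learning AI platform. It supports SSO login, multi-tenancy/multiple project groups, big data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, VGPU inference serving, edge computing, serverless, a labeling platform, automated labeling, dataset management, large model fine-tuning, vLLM large model inference, LLMOps, private knowledge bases, and an AI model app store. It also supports one-click model development/inference/fine-tuning, Chinese domestic CPU/GPU/NPU chips, RDMA, and distributed pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano.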
ai aihub argo automl gpt inference kubeflow kubernetes llmops mlops notebook pipeline pytorch spark vgpu workflow
Last synced: 26 Mar 2024
![](https://github.com/tencentmusic.png)
https://github.com/deanwampler/JustEnoughScalaForSpark
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Last synced: 26 Mar 2024
![](https://github.com/deanwampler.png)
https://github.com/pixiedust/pixiedust
Python Helper library for Jupyter Notebooks
data-science jupyter-notebook pixiedust python python-notebook scala-notebooks spark visualization
Last synced: 23 Mar 2024
![](https://github.com/pixiedust.png)
https://github.com/awslabs/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
dataquality scala spark unit-testing
Last synced: 23 Mar 2024
![](https://github.com/awslabs.png)
https://github.com/yahoo/TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
cluster featured machine-learning python scala spark tensorflow yahoo
Last synced: 23 Mar 2024
![](https://github.com/yahoo.png)
https://github.com/SETL-Framework/setl
A simple Spark-powered ETL framework that just works 🍺
big-data data-analysis data-engineering data-science data-transformation dataset etl etl-pipeline framework machine-learning modularization pipeline scala setl spark
Last synced: 23 Mar 2024
![](https://github.com/SETL-Framework.png)
https://github.com/basin-etl/basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
emr etl hadoop informatica odi pipeline pyspark spark
Last synced: 23 Mar 2024
![](https://github.com/basin-etl.png)
https://github.com/logicalclocks/maggy
Distribution-transparent machine learning experiments on Apache Spark
ablation ablation-studies ablation-study automl blackbox-optimization hyperparameter-optimization hyperparameter-search hyperparameter-tuning spark
Last synced: 23 Mar 2024
![](https://github.com/logicalclocks.png)
https://github.com/combust/mleap
MLeap: Deploy ML Pipelines to Production
data-pipelines python scala scikit-learn spark tensorflow transformers
Last synced: 23 Mar 2024
![](https://github.com/combust.png)
https://github.com/delta-io/delta-sharing
An open protocol for secure data sharing
big-data data-sharing delta-lake pandas spark
Last synced: 21 Mar 2024
![](https://github.com/delta-io.png)
https://github.com/getyourguide/TypedPyspark
Type-annotate your Spark DataFrames and validate them
Last synced: 19 Mar 2024
![](https://github.com/getyourguide.png)
https://github.com/Ibotta/sk-dist
Distributed scikit-learn meta-estimators in PySpark
data-science machine-learning ml scikit-learn spark
Last synced: 18 Mar 2024
![](https://github.com/Ibotta.png)
https://github.com/projectglow/glow
An open-source toolkit for large-scale genomic analysis
delta genomics gwas machine-learning population-genetics regression spark
Last synced: 18 Mar 2024
![](https://github.com/projectglow.png)
https://github.com/manuzhang/jupyterlab_spark
Spark Application UI extension for JupyterLab
jupyterlab jupyterlab-extension spark typescript
Last synced: 18 Mar 2024
![](https://github.com/manuzhang.png)
https://github.com/itsjafer/jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
apache-spark jupyter jupyter-lab jupyterlab jupyterlab-extension pyspark spark
Last synced: 18 Mar 2024
![](https://github.com/itsjafer.png)
https://github.com/jupyter-server/enterprise_gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
enterprise gateway hacktoberfest jupyter jupyter-enterprise-gateway jupyter-kernels jupyter-notebook kernel kubernetes remote-kernels spark spark-on-kubernetes yarn
Last synced: 18 Mar 2024
![](https://github.com/jupyter-server.png)
https://github.com/vericast/spylon-kernel
Jupyter kernel for Scala and Spark
jupyter-kernels kernel metakernel scala spark team-platform
Last synced: 18 Mar 2024
![](https://github.com/vericast.png)
https://github.com/almond-sh/almond
A Scala kernel for Jupyter
jupyter jupyter-kernels jupyter-notebook repl scala spark spark-sql
Last synced: 18 Mar 2024
![](https://github.com/almond-sh.png)
https://github.com/paypal/PPExtensions
Set of IPython and Jupyter extensions to improve user experience
gimel hive ipython-magic jupyer jupyter-extension magics notebooks spark tableau teradata
Last synced: 18 Mar 2024
![](https://github.com/paypal.png)
https://github.com/krishnan-r/sparkmonitor
Monitor Apache Spark from Jupyter Notebook
Last synced: 18 Mar 2024
![](https://github.com/krishnan-r.png)
https://github.com/asavinov/prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow
Last synced: 18 Mar 2024
![](https://github.com/asavinov.png)
https://github.com/garystafford/kafka-connect-msk-demo
For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR
aws kafka kafka-connect kubernetes spark spark-streaming
Last synced: 18 Mar 2024
![](https://github.com/garystafford.png)
https://github.com/Chabane/generator-mitosis
A micro-service infrastructure generator based on Yeoman/Chatbot, Kubernetes/Docker Swarm, Traefik, Ansible, Jenkins, Spark, Hadoop, Kafka, etc.
ansible chatbot docker elasticsearch golang jenkins kafka kibana kubernetes logstash machine-learning rust sonarqube spark swarm traefik vagrant yeoman-generator
Last synced: 16 Mar 2024
![](https://github.com/Chabane.png)
https://github.com/trK54Ylmz/kafka-spark-streaming-example
Simple example of Spark Streaming over a Kafka topic
java kafka spark stream-processing
Last synced: 16 Mar 2024
![](https://github.com/trK54Ylmz.png)