Projects in Awesome Lists tagged with dask-distributed
A curated list of projects in awesome lists tagged with dask-distributed.
https://github.com/datacanvasio/hypergbm
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 15 May 2025
https://github.com/DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 09 May 2025
https://github.com/TimeEval/TimeEval
Evaluation Tool for Anomaly Detection Algorithms on Time Series
anomaly-detection benchmark-framework benchmarking dask dask-distributed distributed jupyter-notebooks numpy pandas python3 time-series time-series-analysis time-series-anomaly-detection
Last synced: 27 Jun 2026
https://github.com/modin-project/unidist
Unified Distributed Execution
dask-distributed distributed mpi multiprocessing python ray
Last synced: 11 Jun 2025
https://github.com/shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars, etc.
arrow dask dask-distributed data-engineering datafusion jupyter numpy pandas polars pyspark ray spark
Last synced: 16 Apr 2025
https://github.com/pyiron/pylammpsmpi
Parallel LAMMPS Python interface - control an mpi4py parallel LAMMPS instance from a serial Python process or a Jupyter notebook - based on executorlib
dask-distributed lammps lammps-python-interface mpi4py openmpi
Last synced: 13 Feb 2026
https://github.com/elcorto/psweep
Loop like a pro, make parameter studies fun.
computational-experiment dask dask-distributed dask-jobqueue database pandas parameter-estimation parameter-scan parameter-search parameter-study parameter-sweep python
Last synced: 17 Jun 2025
https://github.com/aws-solutions-library-samples/distributed-compute-on-aws-with-cross-regional-dask
Run I/O-intensive workloads on high-volume data sparsely located across multiple AWS regions using Dask.
dask dask-distributed dask-worker-pools
Last synced: 14 Oct 2025
https://github.com/jameslamb/lightgbm-dask-testing
Test LightGBM's Dask integration on different cluster types
aws dask dask-distributed docker lightgbm machine-learning
Last synced: 06 Sep 2025
https://github.com/gjoseph92/sneks
Launch a Dask cluster from a Poetry environment
coiled dask dask-distributed poetry-python
Last synced: 20 Mar 2025
https://github.com/pleiszenburg/scherbelberg
HPC cluster deployment and management for the Hetzner Cloud
cloud cluster cluster-management dask dask-distributed deployment hetzner high-performance high-performance-computing hpc hpc-cluster hpc-clusters management python
Last synced: 15 Jun 2025
https://github.com/eth-cscs/ipcluster_magic
Magic commands for running MPI Python code as well as multi-node Dask workloads in Jupyter notebooks.
dask-distributed ipyparallel jupyter-notebook mpi4py
Last synced: 04 Apr 2025
https://github.com/comp-dev-cms-ita/dask-remote-jobqueue
A custom Dask remote jobqueue for HTCondor.
dask dask-distributed dask-jobqueue htcondor
Last synced: 28 Feb 2026
https://github.com/maawoo/stac-access-performance
Testing access performance of Sentinel-1 RTC metadata catalogs
analysis-ready-data dask-distributed earth-observation metadata sentinel-1 xarray
Last synced: 17 Jan 2026
https://github.com/jbris/pycaret-fugue-dask-test
Testing PyCaret, Fugue, and Dask
dask dask-distributed fugue pycaret pycaret-library
Last synced: 13 May 2026
https://github.com/lebedov/dask-ml-on-azure-ml
Using Dask-ML on Azure ML
azure-ml dask-distributed dask-ml
Last synced: 06 May 2026
https://github.com/jkanche/asynchronous-api-dask-terraform
Asynchronous API using Dask and AWS Fargate
aws dask-distributed fargate-containers fastapi
Last synced: 17 May 2026
https://github.com/ivanbgd/dask_demo_reins
A demo of Dask, a Python library for Big Data processing
dask dask-distributed distributed distributed-computing insurance larger-than-memory python python3 reinsurance
Last synced: 12 Jul 2025
https://github.com/amishidesai04/distributed-machine-learning
A lightweight, scalable system that demonstrates model and data parallelism in machine learning using Dask, PyTorch, and Flask. Features distributed CNN inference and linear regression training across multiple networked devices.
dask-distributed distributed-computing distributed-machine-learning flask machine-learning pytorch
Last synced: 30 Apr 2026
https://github.com/hamedalemo/dask-tutorial
A tutorial to learn Dask DataArray and Dask DataFrames with examples from geospatial data catalogs.
dask dask-dataframes dask-distributed geospatial geospatial-analysis geospatial-data
Last synced: 06 Jun 2026
https://github.com/daniel-elston/real-time-reddit-scalable-processing
Scaling NLP processing pipelines with Dask and PySpark, utilising Apache Kafka real-time data streaming, for optimal LLM training
apache-kafka dask-distributed embeddings llm llm-training nlp pyspark scalability
Last synced: 12 May 2026
https://github.com/kaydvc/semmed-neo4j
A project using the National Library of Medicine's Semantic Medline Database to create a graphical-relational database.
aws dask dask-distributed graphical-data neo4j relational-databases semmeddb
Last synced: 17 Mar 2025