Projects in Awesome Lists tagged with dask
A curated list of projects in awesome lists tagged with dask.
https://github.com/dask/dask
Parallel computing with task scheduling
dask numpy pandas pydata python scikit-learn scipy
Last synced: 12 May 2025
https://github.com/rapidsai/cudf
cuDF - GPU DataFrame Library
arrow cpp cuda cudf dask data-analysis data-science dataframe gpu pandas pydata python rapids
Last synced: 12 May 2025
https://github.com/tdameritrade/stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
anomaly-detection dask data-science matrix-profile motif-discovery numba pattern-matching pydata python time-series-analysis time-series-data-mining time-series-segmentation
Last synced: 14 May 2025
https://github.com/mars-project/mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
dask dataframe joblib lightgbm machine-learning numpy pandas python pytorch ray scikit-learn statsmodels tensor tensorflow xgboost
Last synced: 25 Apr 2025
https://github.com/jmcarpenter2/swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
dask modin pandas pandas-dataframe parallel-computing parallelization
Last synced: 14 May 2025
https://github.com/fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
dask data-practitioners distributed distributed-computing distributed-systems machine-learning pandas spark sql
Last synced: 12 May 2025
https://github.com/dask/distributed
A distributed task scheduler for Dask
dask distributed-computing hacktoberfest pydata python
Last synced: 06 May 2025
https://github.com/hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
big-data-cleaning bigdata cudf dask dask-cudf data-analysis data-cleaner data-cleaning data-cleansing data-exploration data-extraction data-preparation data-profiling data-science data-transformation data-wrangling machine-learning pyspark spark
Last synced: 14 May 2025
https://github.com/itamarst/eliot
Eliot: the logging system that tells you *why* it happened
asyncio causality causality-analysis causation dask elasticsearch journald logging logging-library numpy python scientific-computing tracing twisted
Last synced: 14 May 2025
https://github.com/pytroll/satpy
Python package for earth-observing satellite data processing
closember dask hacktoberfest python satellite weather xarray
Last synced: 13 May 2025
https://github.com/Nixtla/mlforecast
Scalable machine 🤖 learning for time series forecasting.
dask forecast forecasting lightgbm machine-learning python time-series xgboost
Last synced: 04 Apr 2025
https://github.com/ranaroussi/pystore
Fast data store for Pandas time-series data
dask database dataframe datastore pandas parquet timeseries
Last synced: 01 Apr 2025
https://github.com/capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
compare dask data data-science dataframes fugue numpy pandas polars pyspark python snowflake snowpark spark
Last synced: 14 May 2025
https://github.com/polyaxon/datatile
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 17 Aug 2025
https://github.com/polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 12 Dec 2025
https://github.com/dask-contrib/dask-sql
Distributed SQL Engine in Python using Dask
dask distributed ml python sql sql-engines sql-server
Last synced: 15 May 2025
https://github.com/ouranosinc/xclim
Library of derived climate variables, i.e. climate indicators, based on xarray.
anuclim climate-analysis climate-science dask icclim netcdf4 python xarray xclim
Last synced: 14 May 2025
https://github.com/datacanvasio/hypergbm
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 15 May 2025
https://github.com/pytroll/pyresample
Geospatial image resampling in Python
closember dask hacktoberfest kd-tree numpy python resampling xarray
Last synced: 08 Apr 2025
https://github.com/nebari-dev/nebari
🪴 Nebari - your open source data science platform
aws azure dask devops gcp infrastructure jupyter jupyterhub kubernetes python
Last synced: 12 Dec 2025
https://github.com/JiaweiZhuang/xESMF
Universal Regridder for Geospatial Data
dask earth-science esmf esmpy geospatial interpolation python regridding remapping xarray
Last synced: 04 Apr 2025
https://github.com/aws-samples/amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
amazon-sagemaker catboost dask delta-lake gensim-word2vec hdbscan-clustering-algorithm huggingface huggingface-transformers lightgbm machine-learning prophet prophet-model pycharm pytorch pytorch-training sagemaker sagemaker-processing scikit-learn tensorflow tensorflow-training
Last synced: 16 May 2025
https://github.com/gjoseph92/stackstac
Turn a STAC catalog into a dask-based xarray
cog dask geospatial rasterio stac xarray
Last synced: 16 May 2025
https://github.com/tkp-archive/paperboy
A web frontend for scheduling Jupyter notebook reports
airflow apache-airflow celery dask docker jupyter jupyter-notebook jupyter-notebooks jupyterlab kubernetes luigi notebook nteract papermill phosphorjs scheduling-notebooks
Last synced: 12 Dec 2025
https://github.com/dask/dask-jobqueue
Deploy Dask on job schedulers like PBS, SLURM, and SGE
dask distributed hpc pbs-cluster python sge-cluster slurm-cluster
Last synced: 01 May 2025
https://github.com/pangeo-data/climpred
🌎 Verification of weather and climate forecasts 🌍
climate climate-analysis dask forecasting pangeo prediction python s2d s2s xarray
Last synced: 15 May 2025
https://github.com/nvidia-merlin/models
Merlin Models is a collection of deep learning recommender system model reference implementations
dask deep-learning gpu machine-learning pytorch rapidsai recommendation-system recommender-system recsys tensorflow
Last synced: 12 Apr 2025
https://github.com/LDO-CERT/orochi
The Volatility Collaborative GUI
dask hacktoberfest memory-dump orochi volatility volatility-framework volatility-gui
Last synced: 30 Mar 2025
https://github.com/AllenCellModeling/aicsimageio
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
bio-formats dask image-metadata imageio microscopy python scientific-computing scientific-formats xarray
Last synced: 07 May 2025
https://github.com/jgrss/geowombat
GeoWombat: Utilities for geospatial data
dask geography python raster rasterio remote-sensing satellite xarray
Last synced: 21 Oct 2025
https://github.com/nci/scores
scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
climate contingency-table dask forecast-evaluation forecast-verification forecasting model-validation oceanography pandas python verification weather xarray
Last synced: 18 Aug 2025
https://github.com/jcmgray/autoray
Abstract your array operations.
array dask deep-learning jax lazy machine-learning numpy python pytorch tensor tensorflow
Last synced: 15 May 2025
https://github.com/hi-primus/bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
bumblebee cudf dask dask-cudf data-cleaning data-preparation data-profiling datasets gpu gui optimus prepare-data python
Last synced: 02 May 2025
https://github.com/dask/dask-cloudprovider
Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure and more...
aws azure cloud dask deployment-tools digitalocean distributed-computing google-cloud hetzner ibm-cloud openstack
Last synced: 08 May 2025
https://github.com/ray-project/xgboost_ray
Distributed XGBoost on Ray
dask data-science kaggle machine-learning modin xgboost
Last synced: 12 Apr 2025
https://github.com/xarray-contrib/flox
Fast & furious GroupBy operations for dask.array
Last synced: 12 Dec 2025
https://github.com/p2p-ld/numpydantic
Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)
arrays dask data-modeling hdf5 numpy pydantic pydantic-numpy serialization validation zarr
Last synced: 06 Oct 2025
https://github.com/xarray-contrib/xeofs
Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
climate-science dask dimensionality-reduction eof-analysis pattern-recognition pca xarray
Last synced: 12 Dec 2025
https://github.com/facultyai/lens
Summarise and explore Pandas DataFrames
dask data-exploration data-science data-visualisation dataframe pandas
Last synced: 14 Apr 2025
https://github.com/dymaxionlabs/dask-rasterio
Read and write rasters in parallel using Rasterio and Dask
Last synced: 30 Apr 2025
https://github.com/polyaxon/mloperator
Machine learning operator & controller for Kubernetes
dask deep-learning k8s keras kubernetes kubernetes-operator machine-learning mlops mpi mxnet notebook pytorch scikit-learn spark tensorboard tensorflow xgboost
Last synced: 03 Mar 2025
https://github.com/msoechting/lexcube
Lexcube: 3D Data Cube Visualization in Jupyter Notebooks
dask datacube earth-engine eo geographical-information-system geospatial gis google-earth-engine javascript jupyterlab-extension python raster remote-sensing typescript visualization xarray
Last synced: 19 Mar 2025
https://github.com/ds2-lab/wukong
Wukong: A scalable and locality-enhanced serverless parallel framework (ACM SoCC'20)
analytics aws aws-lambda cloud-computing dask data-analytics faas linear-algebra machine-learning parallel-computing python serverless serverless-computing
Last synced: 09 Jul 2025
https://github.com/superlinear-ai/graphchain
⚡️ An efficient cache for the execution of dask graphs.
Last synced: 27 Apr 2025
https://github.com/treebeardtech/kubeflow-bootstrap
🪐 1-click Kubeflow using ArgoCD
ai airflow argocd dask gpu helm jupyter jupyterhub jupyterlab kserve kubeflow kubernetes kustomize llms machine-learning mlflow ray spark terraform
Last synced: 08 May 2025
https://github.com/ncar/ncar-python-tutorial
Numerical & Scientific Computing with Python Tutorial
cartopy dask jupyter matplotlib numpy python scipy tutorial xarray
Last synced: 08 Oct 2025
https://github.com/mitgcm/xmitgcm
Read MITgcm mds binary files into xarray
dask ocean-modelling oceanography xarray
Last synced: 21 Oct 2025
https://github.com/dask-contrib/dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
columnar-format dask data-analysis data-science data-structure jagged-array python ragged-array
Last synced: 12 Apr 2025
https://github.com/bytehub-ai/bytehub
ByteHub: making feature stores simple
bytehub-cloud dask data-engineering data-science feature-engineering feature-store featurestore forecasting machine-learning machinelearning machinelearning-python pandas timeseries
Last synced: 27 Aug 2025
https://github.com/saturncloud/dask-pytorch-ddp
dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.
computer-vision dask deep-learning distributed-computing machine-learning nlp pytorch
Last synced: 14 Dec 2025
https://github.com/dask-contrib/dask-deltatable
A Delta Lake reader for Dask
dask dask-dataframes delta-lake parquet python
Last synced: 11 Oct 2025
https://github.com/pnavaro/big-data
Python tools for big data
dask data-science hadoop jupyter-book notebooks python spark
Last synced: 06 May 2025
https://github.com/ml-tooling/lazycluster
🎛 Distributed machine learning made simple.
cluster dask distributed-computing hyperopt machine-learning python ssh
Last synced: 30 Dec 2025
https://github.com/shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
arrow dask dask-distributed data-engineering datafusion jupyter numpy pandas polars pyspark ray spark
Last synced: 16 Apr 2025
https://github.com/lesommer/oocgcm
oocgcm is a Python library for the analysis of large gridded geophysical datasets.
dask geoscientific ocean ocean-models python python-packages xarray
Last synced: 05 Sep 2025
https://github.com/jrbourbeau/madpy-dask
MadPy Dask talk materials
dask parallel-computing python
Last synced: 30 Apr 2025
https://github.com/thewtex/ngff-zarr
A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.
dask imaging medical microscopy ome-zarr ome-zarr-converter python
Last synced: 07 Apr 2025
https://github.com/mdanalysis/pmda
Parallel algorithms for MDAnalysis
analysis dask mdanalysis molecular-dynamics parallel
Last synced: 12 Apr 2025
https://github.com/basnijholt/adaptive-scheduler
Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. 🎉
active-learning adaptive adaptive-learning dask distributed-computing interactive ipyparallel loky mpi4py parallel-computing pbs python slurm
Last synced: 06 Apr 2025
https://github.com/dask-contrib/dask-snowflake
Dask integration for Snowflake
Last synced: 12 Dec 2025
https://github.com/gjbex/python-for-hpc
Repository for participants of the "Python for HPC" training
cuda cython dask gpu hpc mpi numba python python-training scientific-computing swig training
Last synced: 13 Jul 2025
https://github.com/vizzuality/cog_worker
Scalable arbitrary analysis on COGs
cog dask geotiff gis raster rasterio remote-sensing
Last synced: 14 Oct 2025
https://github.com/sinhrks/daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
dask machine-learning reproducibility
Last synced: 08 May 2025
https://github.com/makepath/austin-ml-change-detection-demo
A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.
change-detection dask deep-learning planet pytorch remote-sensing
Last synced: 31 Oct 2025
https://github.com/dask-contrib/dask-histogram
Histograms with task scheduling.
Last synced: 09 Apr 2025
https://github.com/ratt-ru/codex-africanus
Radio Astronomy Algorithms Library
dask numba python radio-astronomy
Last synced: 06 Jul 2025
https://github.com/blazingdb/welcome_to_blazingsql_notebooks
RAPIDS data science. No setup required.
blazingsql blazingsql-notebooks dask data-science data-visualization demos gpu jupyterlab machine-learning notebooks rapids sql
Last synced: 12 Apr 2025
https://github.com/pnnl/mercat
MerCat: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis for meta-ome data
dask database-independent-analysis diversity diversity-estimation divideandconquer fastq k-mer-counting k-mer-frequency kmer-frequency-count metagenomic-analysis metatranscriptomic-analysis nucleotides plotly protein python
Last synced: 12 Apr 2025
https://github.com/pangeo-data/pangeo-binder
Pangeo + Binder (dev repo for a binder/pangeo fusion concept)
binderhub dask jupyter-notebook jupyterhub pangeo
Last synced: 12 Apr 2025
https://github.com/elcorto/psweep
Loop like a pro, make parameter studies fun.
computational-experiment dask dask-distributed dask-jobqueue database pandas parameter-estimation parameter-scan parameter-search parameter-study parameter-sweep python
Last synced: 17 Jun 2025
https://github.com/splunk/deep-learning-toolkit
Deep Learning Toolkit for Splunk
dask kubernetes pytorch spark splunk tensorflow
Last synced: 11 Oct 2025
https://github.com/ncar/ncar-jobqueue
Utilities for configuring dask-jobqueue with appropriate settings for NCAR clusters
Last synced: 11 Apr 2025
https://github.com/jameslamb/lightgbm-dask-testing
Test LightGBM's Dask integration on different cluster types
aws dask dask-distributed docker lightgbm machine-learning
Last synced: 06 Sep 2025
https://github.com/mansenfranzen/pywrangler
Advanced data wrangling for python
dask dataframe datawrangling pyspark python
Last synced: 24 Apr 2025
https://github.com/aws-solutions-library-samples/distributed-compute-on-aws-with-cross-regional-dask
Perform I/O intensive workloads on high-volume data sparsely located across multiple AWS regions through the use of Dask.
dask dask-distributed dask-worker-pools
Last synced: 14 Oct 2025
https://github.com/carpentries-incubator/lesson-parallel-python
Parallel Programming in Python
beta carpentries-incubator dask english lesson parallel-programming python
Last synced: 10 Aug 2025
https://github.com/ashishpatel26/rapidsai_machine_learning_on_gpu
Rapidsai_Machine_learning_on_GPU
cudf cuml dask dask-ml deep-learning machine-learning machine-learning-algorithms nvidia pandas scikit-learn sklearn
Last synced: 14 Oct 2025
https://github.com/wigging/pythonic
Examples of the Python programming language
dask flask matplotlib numpy python scipy tkinter
Last synced: 11 Apr 2025
https://github.com/dask/dask-pyspy
Profile the dask distributed scheduler with py-spy and viztracer
dask profiling py-spy viztracer
Last synced: 08 May 2025
https://github.com/lisaong/diec
Workshop content for Designing Intelligent Edge Computing by NUS ISS
dask docker eclipse-iot edge-computing iota iss kapua kura microbit mqtt nus openai-gym-environment raspberry-pi-3 reinforcement-learning swarm-intelligence tensorflow tensorflow-lite
Last synced: 17 Apr 2025
https://github.com/social-media-public-analysis/dozent
Dozent is a powerful downloader that is used to collect large amounts of Twitter data from the Internet Archive.
accelerator dask download followers following image likes python save-image scrape scraper scraping selenium social-media tweets twitter webdriver
Last synced: 06 Oct 2025
https://github.com/tomwhite/dask-executor-scheduler
A Dask scheduler that uses a Python concurrent.futures.Executor to run tasks
Last synced: 24 Jul 2025
https://github.com/aporia-ai/aporia-importer
🏋️‍♀️ Import inference data from Amazon S3, Azure Blob Storage, Google Cloud Storage and others to Aporia
amazon-s3 azure-blob-storage csv dask google-cloud-storage importer parquet
Last synced: 30 Apr 2025
https://github.com/developmentseed/label-maker-dask
Library for running label-maker as a dask job
dask machine-learning microsoft osm
Last synced: 13 Oct 2025