An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with dask

A curated list of projects in awesome lists tagged with dask.

https://github.com/dask/dask

Parallel computing with task scheduling

dask numpy pandas pydata python scikit-learn scipy

Last synced: 12 May 2025

https://github.com/pydata/xarray

N-D labeled arrays and datasets in Python

dask netcdf numpy pandas python xarray

Last synced: 22 Dec 2025

https://github.com/mars-project/mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

dask dataframe joblib lightgbm machine-learning numpy pandas python pytorch ray scikit-learn statsmodels tensor tensorflow xgboost

Last synced: 25 Apr 2025

https://github.com/jmcarpenter2/swifter

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner

dask modin pandas pandas-dataframe parallel-computing parallelization

Last synced: 14 May 2025

https://github.com/fugue-project/fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

dask data-practitioners distributed distributed-computing distributed-systems machine-learning pandas spark sql

Last synced: 12 May 2025

https://github.com/dask/distributed

A distributed task scheduler for Dask

dask distributed-computing hacktoberfest pydata python

Last synced: 06 May 2025

https://github.com/narwhals-dev/narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

cudf dask duckdb ibis pandas polars pyarrow pyspark

Last synced: 06 Jan 2026

https://narwhals-dev.github.io/narwhals/

Lightweight and extensible compatibility layer between dataframe libraries!

cudf dask duckdb ibis pandas polars pyarrow pyspark

Last synced: 18 Jul 2025

https://github.com/pytroll/satpy

Python package for earth-observing satellite data processing

closember dask hacktoberfest python satellite weather xarray

Last synced: 13 May 2025

https://github.com/Nixtla/mlforecast

Scalable machine 🤖 learning for time series forecasting.

dask forecast forecasting lightgbm machine-learning python time-series xgboost

Last synced: 04 Apr 2025

https://github.com/ranaroussi/pystore

Fast data store for Pandas time-series data

dask database dataframe datastore pandas parquet timeseries

Last synced: 01 Apr 2025

https://github.com/capitalone/datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

compare dask data data-science dataframes fugue numpy pandas polars pyspark python snowflake snowpark spark

Last synced: 14 May 2025

https://github.com/dask-contrib/dask-sql

Distributed SQL Engine in Python using Dask

dask distributed ml python sql sql-engines sql-server

Last synced: 15 May 2025

https://github.com/ouranosinc/xclim

Library of derived climate variables, i.e. climate indicators, based on xarray.

anuclim climate-analysis climate-science dask icclim netcdf4 python xarray xclim

Last synced: 14 May 2025

https://github.com/Ouranosinc/xclim

Library of derived climate variables, i.e. climate indicators, based on xarray.

anuclim climate-analysis climate-science dask icclim netcdf4 python xarray xclim

Last synced: 04 Apr 2025

https://github.com/pytroll/pyresample

Geospatial image resampling in Python

closember dask hacktoberfest kd-tree numpy python resampling xarray

Last synced: 08 Apr 2025

https://github.com/nebari-dev/nebari

🪴 Nebari - your open source data science platform

aws azure dask devops gcp infrastructure jupyter jupyterhub kubernetes python

Last synced: 12 Dec 2025

https://github.com/gjoseph92/stackstac

Turn a STAC catalog into a dask-based xarray

cog dask geospatial rasterio stac xarray

Last synced: 16 May 2025

https://github.com/dask/dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE

dask distributed hpc pbs-cluster python sge-cluster slurm-cluster

Last synced: 01 May 2025

https://github.com/pangeo-data/climpred

🌎 Verification of weather and climate forecasts 🌍

climate climate-analysis dask forecasting pangeo prediction python s2d s2s xarray

Last synced: 15 May 2025

https://github.com/nvidia-merlin/models

Merlin Models is a collection of deep learning recommender system model reference implementations

dask deep-learning gpu machine-learning pytorch rapidsai recommendation-system recommender-system recsys tensorflow

Last synced: 12 Apr 2025

https://github.com/AllenCellModeling/aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

bio-formats dask image-metadata imageio microscopy python scientific-computing scientific-formats xarray

Last synced: 07 May 2025

https://github.com/allencellmodeling/aicsimageio

Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

bio-formats dask image-metadata imageio microscopy python scientific-computing scientific-formats xarray

Last synced: 14 Apr 2025

https://github.com/jgrss/geowombat

GeoWombat: Utilities for geospatial data

dask geography python raster rasterio remote-sensing satellite xarray

Last synced: 21 Oct 2025

https://github.com/nci/scores

scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.

climate contingency-table dask forecast-evaluation forecast-verification forecasting model-validation oceanography pandas python verification weather xarray

Last synced: 18 Aug 2025

https://github.com/google/xarray-beam

Distributed Xarray with Apache Beam

beam dask xarray zarr

Last synced: 21 Oct 2025

https://github.com/hi-primus/bumblebee

🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)

bumblebee cudf dask dask-cudf data-cleaning data-preparation data-profiling datasets gpu gui optimus prepare-data python

Last synced: 02 May 2025

https://github.com/dask/dask-cloudprovider

Cloud provider cluster managers for Dask. Supports AWS, Google Cloud, Azure and more...

aws azure cloud dask deployment-tools digitalocean distributed-computing google-cloud hetzner ibm-cloud openstack

Last synced: 08 May 2025

https://github.com/xarray-contrib/flox

Fast & furious GroupBy operations for dask.array

dask map-reduce xarray

Last synced: 12 Dec 2025

https://github.com/p2p-ld/numpydantic

Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)

arrays dask data-modeling hdf5 numpy pydantic pydantic-numpy serialization validation zarr

Last synced: 06 Oct 2025

https://github.com/xarray-contrib/xeofs

Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis

climate-science dask dimensionality-reduction eof-analysis pattern-recognition pca xarray

Last synced: 12 Dec 2025

https://github.com/facultyai/lens

Summarise and explore Pandas DataFrames

dask data-exploration data-science data-visualisation dataframe pandas

Last synced: 14 Apr 2025

https://github.com/dymaxionlabs/dask-rasterio

Read and write rasters in parallel using Rasterio and Dask

dask gdal python rasterio

Last synced: 30 Apr 2025

https://github.com/ds2-lab/wukong

Wukong: A scalable and locality-enhanced serverless parallel framework (ACM SoCC'20)

analytics aws aws-lambda cloud-computing dask data-analytics faas linear-algebra machine-learning parallel-computing python serverless serverless-computing

Last synced: 09 Jul 2025

https://github.com/superlinear-ai/graphchain

⚡️ An efficient cache for the execution of dask graphs.

cache dask s3

Last synced: 27 Apr 2025

https://github.com/ncar/ncar-python-tutorial

Numerical & Scientific Computing with Python Tutorial

cartopy dask jupyter matplotlib numpy python scipy tutorial xarray

Last synced: 08 Oct 2025

https://github.com/mitgcm/xmitgcm

Read MITgcm mds binary files into xarray

dask ocean-modelling oceanography xarray

Last synced: 21 Oct 2025

https://github.com/dask-contrib/dask-awkward

Native Dask collection for awkward arrays, and the library to use it.

columnar-format dask data-analysis data-science data-structure jagged-array python ragged-array

Last synced: 12 Apr 2025

https://github.com/saturncloud/dask-pytorch-ddp

dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.

computer-vision dask deep-learning distributed-computing machine-learning nlp pytorch

Last synced: 14 Dec 2025

https://github.com/dask-contrib/dask-deltatable

A Delta Lake reader for Dask

dask dask-dataframes delta-lake parquet python

Last synced: 11 Oct 2025

https://github.com/ml-tooling/lazycluster

🎛 Distributed machine learning made simple.

cluster dask distributed-computing hyperopt machine-learning python ssh

Last synced: 30 Dec 2025

https://github.com/shauryashaurya/learn-data-munging

Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.

arrow dask dask-distributed data-engineering datafusion jupyter numpy pandas polars pyspark ray spark

Last synced: 16 Apr 2025

https://github.com/ncar/cesm-lens-aws

Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask

aws binder cesm-lens dask intake pangeo python xarray zarr

Last synced: 11 Apr 2025

https://github.com/dgerlanc/dask-scaling-dataframe

Python and Dask: Scaling the Dataframe

big-data dask dataframe exercises pandas python

Last synced: 16 Aug 2025

https://github.com/lesommer/oocgcm

oocgcm is a Python library for the analysis of large gridded geophysical datasets.

dask geoscientific ocean ocean-models python python-packages xarray

Last synced: 05 Sep 2025

https://github.com/jrbourbeau/madpy-dask

MadPy Dask talk materials

dask parallel-computing python

Last synced: 30 Apr 2025

https://github.com/thewtex/ngff-zarr

A lean and kind Open Microscopy Environment (OME) Next Generation File Format (NGFF) Zarr implementation.

dask imaging medical microscopy ome-zarr ome-zarr-converter python

Last synced: 07 Apr 2025

https://github.com/mdanalysis/pmda

Parallel algorithms for MDAnalysis

analysis dask mdanalysis molecular-dynamics parallel

Last synced: 12 Apr 2025

https://github.com/basnijholt/adaptive-scheduler

Run many functions (adaptively) on many cores (>10k-100k) using mpi4py.futures, ipyparallel, loky, or dask-mpi. 🎉

active-learning adaptive adaptive-learning dask distributed-computing interactive ipyparallel loky mpi4py parallel-computing pbs python slurm

Last synced: 06 Apr 2025

https://github.com/dask-contrib/dask-snowflake

Dask integration for Snowflake

dask python snowflake

Last synced: 12 Dec 2025

https://github.com/gjbex/python-for-hpc

Repository for participants of the "Python for HPC" training

cuda cython dask gpu hpc mpi numba python python-training scientific-computing swig training

Last synced: 13 Jul 2025

https://github.com/vizzuality/cog_worker

Scalable arbitrary analysis on COGs

cog dask geotiff gis raster rasterio remote-sensing

Last synced: 14 Oct 2025

https://github.com/sinhrks/daskperiment

Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.

dask machine-learning reproducibility

Last synced: 08 May 2025

https://github.com/itamarst/dask-memusage

A low-impact profiler to figure out how much memory each task in Dask is using

dask memory profiler profiling python

Last synced: 14 Sep 2025

https://github.com/makepath/austin-ml-change-detection-demo

A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery.

change-detection dask deep-learning planet pytorch remote-sensing

Last synced: 31 Oct 2025

https://github.com/dask-contrib/dask-histogram

Histograms with task scheduling.

dask histograms pydata

Last synced: 09 Apr 2025

https://github.com/ratt-ru/codex-africanus

Radio Astronomy Algorithms Library

dask numba python radio-astronomy

Last synced: 06 Jul 2025

https://github.com/pnnl/mercat

MerCat: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis of meta-ome data

dask database-independent-analysis diversity diversity-estimation divideandconquer fastq k-mer-counting k-mer-frequency kmer-frequency-count metagenomic-analysis metatranscriptomic-analysis nucleotides plotly protein python

Last synced: 12 Apr 2025

https://github.com/pangeo-data/pangeo-binder

Pangeo + Binder (dev repo for a binder/pangeo fusion concept)

binderhub dask jupyter-notebook jupyterhub pangeo

Last synced: 12 Apr 2025

https://github.com/splunk/deep-learning-toolkit

Deep Learning Toolkit for Splunk

dask kubernetes pytorch spark splunk tensorflow

Last synced: 11 Oct 2025

https://github.com/ncar/ncar-jobqueue

Utilities for configuring dask-jobqueue with appropriate settings for NCAR clusters

dask dask-jobqueue

Last synced: 11 Apr 2025

https://github.com/jameslamb/lightgbm-dask-testing

Test LightGBM's Dask integration on different cluster types

aws dask dask-distributed docker lightgbm machine-learning

Last synced: 06 Sep 2025

https://github.com/mansenfranzen/pywrangler

Advanced data wrangling for python

dask dataframe datawrangling pyspark python

Last synced: 24 Apr 2025

https://github.com/aws-solutions-library-samples/distributed-compute-on-aws-with-cross-regional-dask

Perform I/O intensive workloads on high-volume data sparsely located across multiple AWS regions through the use of Dask.

dask dask-distributed dask-worker-pools

Last synced: 14 Oct 2025

https://github.com/unum-cloud/udsb

Unlimited Data-Science Benchmarks for Numeric, Tabular and Graph Workloads

apache-arrow arrow cublas cudf cugraph dask modin networkx numpy pandas sqlite

Last synced: 26 Jun 2025

https://github.com/wigging/pythonic

Examples of the Python programming language

dask flask matplotlib numpy python scipy tkinter

Last synced: 11 Apr 2025

https://github.com/nasa-nccs-hpda/terragpu

Python library to process and classify remote sensing imagery by means of GPUs and ML.

ai cudf cupy cuspatial dask earth-science geopandas gpu ml numpy raster vector

Last synced: 10 Apr 2025

https://github.com/dask/dask-pyspy

Profile the dask distributed scheduler with py-spy and viztracer

dask profiling py-spy viztracer

Last synced: 08 May 2025

https://github.com/casangi/cngi_prototype

Prototype Development of CNGI

astronomy dask numba scipy xarray zarr

Last synced: 07 Oct 2025

https://github.com/social-media-public-analysis/dozent

Dozent is a powerful downloader that is used to collect large amounts of Twitter data from the Internet Archive.

accelerator dask download followers following image likes python save-image scrape scraper scraping selenium social-media tweets twitter webdriver

Last synced: 06 Oct 2025

https://github.com/tomwhite/dask-executor-scheduler

A Dask scheduler that uses a Python concurrent.futures.Executor to run tasks

dask pywren serverless

Last synced: 24 Jul 2025

https://github.com/aporia-ai/aporia-importer

🏋️‍♀️ Import inference data from Amazon S3, Azure Blob Storage, Google Cloud Storage and others to Aporia

amazon-s3 azure-blob-storage csv dask google-cloud-storage importer parquet

Last synced: 30 Apr 2025

https://github.com/developmentseed/label-maker-dask

Library for running label-maker as a dask job

dask machine-learning microsoft osm

Last synced: 13 Oct 2025