Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with dask
A curated list of projects in awesome lists tagged with dask.
https://github.com/dask/dask
Parallel computing with task scheduling
dask numpy pandas pydata python scikit-learn scipy
Last synced: 29 Sep 2024
https://github.com/rapidsai/cudf
cuDF - GPU DataFrame Library
arrow cpp cuda cudf dask data-analysis data-science dataframe gpu pandas pydata python rapids
Last synced: 29 Sep 2024
https://github.com/ibis-project/ibis
The portable Python dataframe library
bigquery clickhouse dask database datafusion duckdb impala mssql mysql pandas polars postgresql pyarrow pyspark python snowflake sql sqlalchemy sqlite trino
Last synced: 29 Sep 2024
https://github.com/tdameritrade/stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
anomaly-detection dask data-science matrix-profile motif-discovery numba pattern-matching pydata python time-series-analysis time-series-data-mining time-series-segmentation
Last synced: 01 Oct 2024
https://github.com/TDAmeritrade/stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
anomaly-detection dask data-science matrix-profile motif-discovery numba pattern-matching pydata python time-series-analysis time-series-data-mining time-series-segmentation
Last synced: 31 Jul 2024
https://github.com/mars-project/mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
dask dataframe joblib lightgbm machine-learning numpy pandas python pytorch ray scikit-learn statsmodels tensor tensorflow xgboost
Last synced: 29 Sep 2024
https://github.com/jmcarpenter2/swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
dask modin pandas pandas-dataframe parallel-computing parallelization
Last synced: 30 Sep 2024
https://github.com/fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
dask data-practitioners distributed distributed-computing distributed-systems machine-learning pandas spark sql
Last synced: 28 Sep 2024
https://github.com/dask/distributed
A distributed task scheduler for Dask
dask distributed-computing hacktoberfest pydata python
Last synced: 31 Jul 2024
https://github.com/ironmussa/Optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
big-data-cleaning bigdata cudf dask dask-cudf data-analysis data-cleaner data-cleaning data-cleansing data-exploration data-extraction data-preparation data-profiling data-science data-transformation data-wrangling machine-learning pyspark spark
Last synced: 30 Jul 2024
https://github.com/hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
big-data-cleaning bigdata cudf dask dask-cudf data-analysis data-cleaner data-cleaning data-cleansing data-exploration data-extraction data-preparation data-profiling data-science data-transformation data-wrangling machine-learning pyspark spark
Last synced: 28 Sep 2024
https://github.com/itamarst/eliot
Eliot: the logging system that tells you *why* it happened
asyncio causality causality-analysis causation dask elasticsearch journald logging logging-library numpy python scientific-computing tracing twisted
Last synced: 28 Sep 2024
https://github.com/pytroll/satpy
Python package for earth-observing satellite data processing
closember dask hacktoberfest python satellite weather xarray
Last synced: 01 Aug 2024
https://github.com/Nixtla/mlforecast
Scalable machine 🤖 learning for time series forecasting.
dask forecast forecasting lightgbm machine-learning python time-series xgboost
Last synced: 01 Aug 2024
https://github.com/ranaroussi/pystore
Fast data store for Pandas time-series data
dask database dataframe datastore pandas parquet timeseries
Last synced: 01 Aug 2024
https://github.com/polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 27 Sep 2024
https://github.com/capitalone/datacompy
Pandas and Spark DataFrame comparison for humans and more!
compare dask data data-science dataframes fugue numpy pandas polars pyspark python spark
Last synced: 28 Sep 2024
https://github.com/dask-contrib/dask-sql
Distributed SQL Engine in Python using Dask
dask distributed ml python sql sql-engines sql-server
Last synced: 28 Sep 2024
https://github.com/pytroll/pyresample
Geospatial image resampling in Python
closember dask hacktoberfest kd-tree numpy python resampling xarray
Last synced: 03 Aug 2024
https://github.com/datacanvasio/hypergbm
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 26 Sep 2024
https://github.com/DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 03 Aug 2024
https://github.com/Ouranosinc/xclim
Library of derived climate variables, i.e. climate indicators, based on xarray.
anuclim climate-analysis climate-science dask icclim netcdf4 python xarray xclim
Last synced: 01 Aug 2024
https://github.com/JiaweiZhuang/xESMF
Universal Regridder for Geospatial Data
dask earth-science esmf esmpy geospatial interpolation python regridding remapping xarray
Last synced: 01 Aug 2024
https://github.com/tkp-archive/paperboy
A web frontend for scheduling Jupyter notebook reports
airflow apache-airflow celery dask docker jupyter jupyter-notebook jupyter-notebooks jupyterlab kubernetes luigi notebook nteract papermill phosphorjs scheduling-notebooks
Last synced: 02 Oct 2024
https://github.com/aws-samples/amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
amazon-sagemaker catboost dask delta-lake gensim-word2vec hdbscan-clustering-algorithm huggingface huggingface-transformers lightgbm machine-learning prophet prophet-model pycharm pytorch pytorch-training sagemaker sagemaker-processing scikit-learn tensorflow tensorflow-training
Last synced: 26 Sep 2024
https://github.com/gjoseph92/stackstac
Turn a STAC catalog into a dask-based xarray
cog dask geospatial rasterio stac xarray
Last synced: 03 Oct 2024
https://github.com/dask/dask-jobqueue
Deploy Dask on job schedulers like PBS, SLURM, and SGE
dask distributed hpc pbs-cluster python sge-cluster slurm-cluster
Last synced: 02 Aug 2024
https://github.com/pangeo-data/climpred
🌎 Verification of weather and climate forecasts 🌍
climate climate-analysis dask forecasting pangeo prediction python s2d s2s xarray
Last synced: 01 Aug 2024
https://github.com/LDO-CERT/orochi
The Volatility Collaborative GUI
dask hacktoberfest memory-dump orochi volatility volatility-framework volatility-gui
Last synced: 01 Aug 2024
https://github.com/allencellmodeling/aicsimageio
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
bio-formats dask image-metadata imageio microscopy python scientific-computing scientific-formats xarray
Last synced: 01 Oct 2024
https://github.com/AllenCellModeling/aicsimageio
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
bio-formats dask image-metadata imageio microscopy python scientific-computing scientific-formats xarray
Last synced: 03 Aug 2024
https://github.com/jgrss/geowombat
GeoWombat: Utilities for geospatial data
dask geography python raster rasterio remote-sensing satellite xarray
Last synced: 31 Jul 2024
https://github.com/ray-project/xgboost_ray
Distributed XGBoost on Ray
dask data-science kaggle machine-learning modin xgboost
Last synced: 03 Aug 2024
https://github.com/xarray-contrib/flox
Fast & furious GroupBy operations for dask.array
Last synced: 02 Aug 2024
https://github.com/dymaxionlabs/dask-rasterio
Read and write rasters in parallel using Rasterio and Dask
Last synced: 03 Aug 2024
https://github.com/polyaxon/mloperator
Machine learning operator & controller for Kubernetes
dask deep-learning k8s keras kubernetes kubernetes-operator machine-learning mlops mpi mxnet notebook pytorch scikit-learn spark tensorboard tensorflow xgboost
Last synced: 30 Sep 2024
https://github.com/xarray-contrib/xeofs
Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
climate-science dask dimensionality-reduction eof-analysis pattern-recognition pca xarray
Last synced: 08 Aug 2024
https://github.com/ncar/ncar-python-tutorial
Numerical & Scientific Computing with Python Tutorial
cartopy dask jupyter matplotlib numpy python scipy tutorial xarray
Last synced: 01 Oct 2024
https://github.com/bytehub-ai/bytehub
ByteHub: making feature stores simple
bytehub-cloud dask data-engineering data-science feature-engineering feature-store featurestore forecasting machine-learning machinelearning machinelearning-python pandas timeseries
Last synced: 03 Aug 2024
https://github.com/saturncloud/dask-pytorch-ddp
dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.
computer-vision dask deep-learning distributed-computing machine-learning nlp pytorch
Last synced: 03 Aug 2024
https://github.com/ml-tooling/lazycluster
🎛 Distributed machine learning made simple.
cluster dask distributed-computing hyperopt machine-learning python ssh
Last synced: 06 Aug 2024
https://github.com/sinhrks/daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
dask machine-learning reproducibility
Last synced: 03 Aug 2024
https://github.com/nci/scores
scores: verification scores and metrics, supporting the earth system modelling community
climate dask forecast-evaluation forecast-verification forecasting model-validation oceanography pandas python verification weather xarray
Last synced: 08 Aug 2024
https://github.com/mansenfranzen/pywrangler
Advanced data wrangling for Python
dask dataframe datawrangling pyspark python
Last synced: 02 Oct 2024
https://github.com/developmentseed/label-maker-dask
Library for running label-maker as a dask job
dask machine-learning microsoft osm
Last synced: 03 Aug 2024
https://github.com/osoceanacoustics/echodataflow
Orchestrated sonar data processing workflow
dask docker elasticsearch kafka kibana logstash prefect python
Last synced: 28 Sep 2024
https://github.com/ornl/flowcept
Runtime data integration system that empowers any data processing system to capture and query workflow provenance using data observability.
big-data dask data-integration lineage machine-learning mlflow model-management parallel-processing provenance reproducibility responsible-ai scientific-workflows tensorboard trustworthy-ai workflows
Last synced: 29 Sep 2024
https://github.com/ramy-badr-ahmed/higgs-dataset-training
Training Higgs Dataset with Keras - https://doi.org/10.5281/zenodo.13133945
binary-classification cuda-toolkit cupy dask dask-dataframes higgs-boson keras keras-tensorflow matplotlib matplotlib-python numpy pandas pandas-dataframe scikit-learn uci-dataset uci-machine-learning
Last synced: 26 Sep 2024
https://github.com/rhasanm/airflower
Airflower adds intelligent decision-making to Apache Airflow by capturing real-time metadata through listeners and sending it to an external brain via gRPC. The brain, equipped with analytical and rule engines, processes the data and sends decisions back to Airflow to optimize task execution dynamically.
airflow dask data-engineering etl grpc machine-learning pandas python rabbitmq tensorflow
Last synced: 29 Sep 2024