awesome-datascience
:memo: An awesome Data Science repository to learn and apply for real world problems.
https://github.com/academic/awesome-datascience
The Data Science Toolbox
Deep Learning Packages
- vizzu
- Sonnet
- TRFL
- tensorflow-upstream
- Glue
- Wrangler
- r2d3
- TensorWatch
- Dash
- TensorLight
- PyTorchNet
- pytorch_tabular
- Metabase
- MetaReview - Free online meta-analysis platform with 11 interactive D3.js statistical charts (forest plot, funnel plot, Galbraith, L'Abbé, Baujat, etc.), 5 effect size measures, AI literature screening, and publication-ready report export. [github.com](https://github.com/TerryFYL/metareview)
- raw
- torchvista - Interactive notebook-based tool to visualize the forward pass of any PyTorch model.
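The MetaReview entry mentions effect sizes and forest plots. As a rough illustration of the kind of number a forest plot summarizes, here is a minimal sketch of fixed-effect inverse-variance pooling with Cochran's Q and the I^2 heterogeneity statistic. It assumes a fixed-effect model and a 95% normal interval (z = 1.96). The function name `pool_fixed_effect` and the study values are hypothetical and are not taken from MetaReview, which may use different estimators.

```python
import math
from typing import Sequence, Tuple


def pool_fixed_effect(
    effects: Sequence[float], std_errors: Sequence[float], z: float = 1.96
) -> Tuple[float, float, Tuple[float, float], float]:
    """Hypothetical helper: fixed-effect inverse-variance pooling.

    Returns (pooled_effect, pooled_se, (ci_low, ci_high), i_squared).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need at least one study and one standard error per effect")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (s * s) for s in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    ci = (pooled - z * pooled_se, pooled + z * pooled_se)
    # Cochran's Q measures dispersion around the pooled estimate;
    # I^2 is the share of that dispersion beyond what chance would explain.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, ci, i_squared


# Hypothetical log odds ratios and standard errors from three studies.
pooled, pooled_se, (ci_low, ci_high), i2 = pool_fixed_effect([0.2, 0.5, 0.3], [0.1, 0.2, 0.1])
print(f"pooled={pooled:.3f} se={pooled_se:.3f} 95% CI=({ci_low:.3f}, {ci_high:.3f}) I^2={i2:.1%}")
```

The more precise studies (smaller standard errors) pull the pooled estimate toward themselves. A random-effects model would add a between-study variance term to each weight.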
General Machine Learning Packages
- scikit-learn
- Shogun
- scikit-survival
- scikit-multilearn
- sklearn-expertsys
- scikit-feature
- scikit-rebate
- seqlearn
- sklearn-bayes
- sklearn-crfsuite
- sklearn-deap
- sigopt_sklearn
- sklearn-evaluation
- scikit-image
- scikit-opt
- scikit-posthocs
- pystruct
- xLearn
- cuML
- causalml
- mlpack
- MLxtend
- modAL
- Sparkit-learn
- dlib
- imodels
- RuleFit
- pyGAM
- Deepchecks
- XGBoost
- LightGBM
- CatBoost
- interpretable
- feature-engine
- PerpetualBooster
- hyperlearn
- jSciPy - A Java port of SciPy's signal processing module, offering filters, transformations, and other scientific computing utilities.
- JAX
Miscellaneous Tools
- Hortonworks Sandbox
- R
- Tidyverse
- Scikit-Learn
- NumPy - Multi-dimensional arrays and matrices, with an assortment of high-level mathematical functions to operate on these arrays.
- Vaex
- SciPy
- Data Science Toolbox
- Datadog - High-scale data science.
- Variance
- Kite Development Kit
- Apache Flink - Framework for distributed stream and batch data processing.
- Apache Hama - Apache top-level open source project, allowing you to do advanced analytics beyond MapReduce.
- Apache Spark - Lightning-fast cluster computing
- Data Mechanics - Makes Apache Spark developer-friendly and cost-effective.
- Caffe
- Torch
- Aerosolve
- Datawrapper
- TensorFlow
- Natural Language Toolkit
- nlp-toolkit for node.js
- Apache Zeppelin - Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more
- LightTag
- Amazon Rekognition
- Amazon Textract
- Amazon Lookout for Vision
- Amazon CodeGuru - Automates code reviews and optimizes application performance with ML-powered recommendations.
- Statsmodels - Python-based inferential statistics, hypothesis testing and regression framework
- Gensim - Open-source library for topic modeling of natural language text
- spaCy
- PyMC3
- Nimblebox - Full-stack MLOps platform designed to help data scientists and machine learning practitioners around the world discover, create, and launch multi-cloud apps from their web browser.
- Explore Data Science Libraries
- MLflow
- Arize AI - Observability for machine learning models in production, including root-causing issues such as data quality and performance drift.
- Aureo.io - Low-code platform that focuses on building artificial intelligence. It lets users create pipelines and automations and integrate them with artificial intelligence models, all with their basic data.
- ERD Lab
- Arize-Phoenix - Uncover insights, surface problems, monitor, and fine-tune your models.
- Synthical - AI-powered collaborative environment for research. Find relevant papers, create collections to manage bibliography, and summarize content, all in one place.
- The Data Science Lifecycle Process
- Data Science Lifecycle Template Repo
- RexMex
- ChemicalX
- PyTorch Geometric Temporal
- Little Ball of Fur - Graph sampling extension library for NetworkX with a Scikit-Learn-like API.
- Karate Club - Unsupervised machine learning extension library for NetworkX with a Scikit-Learn-like API.
- ML Workspace - All-in-one web-based IDE for machine learning and data science. The workspace is deployed as a Docker container and is preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch) and dev tools (e.g., Jupyter, VS Code).
- steppy
- steppy-toolkit
- Weka
- Octave - High-level interpreted language, primarily intended for numerical computations (free MATLAB alternative)
- Hydrosphere Mist
- Nervana's Python-based deep learning framework
- Skale
- Intel framework
- IJulia - Julia-language backend combined with the Jupyter interactive environment
- Featuretools
- Optimus - Processing, feature engineering, exploratory data analysis and easy ML with a PySpark backend.
- Albumentations
- Lambdo
- Feast
- Hopsworks - Open-source data-intensive machine learning platform with a feature store. Ingest and manage features for both online (MySQL Cluster) and offline (Apache Hive) access, and train and serve models at scale.
- MindsDB
- Lightwood
- AWS Data Wrangler - Open-source Python package that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services (Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, etc.).
- CML - Train and test models in production-like environments with GitHub Actions & GitLab CI, and autogenerate visual reports on pull/merge requests.
- Grid Studio - Web-based spreadsheet application with full integration of the Python programming language.
- Python Data Science Handbook
- Shapley - Data-driven framework to quantify the value of classifiers in a machine learning ensemble.
- DAGsHub
- Towhee
- LineaPy
- envd
- MLEM
- cleanlab - Library for data-centric AI and automatically detecting various issues in ML datasets
- AutoGluon - AutoML for image, text, tabular, time-series, and multi-modal data
- Comet
- Opik
- teeplot
- Streamlit
- Gradio
- Weights & Biases
- Optuna
- Ray Tune
- Apache Airflow
- Prefect
- Kedro - Open-source Python framework for creating reproducible, maintainable data science code
- InterpretML - Open-source package for interpretable machine learning, including Explainable Boosting Machines (EBMs), with visualization tools for EBMs, other glass-box models, and black-box explanations
- LIME
- flyte
- dbt
- zasper
- Codeflash - Fast Python code, every time.
- Hugging Face
- Chinese-Elite - Open-source project that automatically maps relationship networks by parsing public data using LLMs and visualizes them as an interactive graph.
- RunMat - MATLAB-syntax runtime with automatic CPU/GPU execution and fused array kernels.
- Python - Pandas - Anaconda - Enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing
- Turbostream - Real-time data streams, without worrying about streaming infrastructure or backpressure.
- CorpusExplorer
- Trains - Auto-Magical Experiment Manager, Version Control & DevOps for AI
- Pandas GUI
- DVC - Open-source data science version control system. It helps track, organize and make data science projects reproducible. In its most basic scenario, it helps version and share large data and model files.
- Chaos Genius
- Hamilton
- SHAP
- skrub
- Desbordante - Open-source data profiler specifically focused on discovery and validation of complex patterns, such as [numerical association rules](https://colab.research.google.com/github/Desbordante/desbordante-core/blob/main/examples/notebooks/Numerical_Association_Rules.ipynb), [differential dependencies](https://colab.research.google.com/github/Desbordante/desbordante-core/blob/main/examples/notebooks/Differential_Dependencies.ipynb), [denial constraints](https://colab.research.google.com/github/Desbordante/desbordante-core/blob/main/examples/notebooks/Denial_Constraints.ipynb), and more.
- xonsh shell - Python-powered shell that enables integration, management and orchestration of data science libraries mostly written in Python, allowing you to build pipelines, code and command-based workflows. It can also be used as a kernel for Jupyter Notebook.
- Neptune.ai - Platform supporting data scientists in creating and sharing machine learning models. Neptune facilitates teamwork, infrastructure management, model comparison and reproducibility.
- Polars
- DuckDB - In-process SQL OLAP database management system
- dna-claude-analysis - style single-page HTML visualization. |
- WFGY ProblemMap
- Deploybase - Real-time GPU and LLM pricing across all cloud and inference providers.
- DeepAnalyze
- TabGAN
- FileShot.io - Zero-knowledge encrypted file sharing (AES-256-GCM in-browser). No account required, MIT licensed, self-hostable, optional link expiry.
- Disco - P-values, effect sizes, and literature citations. Free for public data.
- Annotation Lab - End-to-End No-Code platform for text annotation and DL model training/tuning. Out-of-the-box support for Named Entity Recognition, Classification, Relation extraction and Assertion Status Spark NLP models. Unlimited support for users, teams, projects, documents.
- Domino Data Labs
- FunASR - Speech recognition toolkit supporting 50+ languages with built-in VAD, punctuation, speaker diarization, and emotion detection. OpenAI-compatible API server included.
- AI for Database - Refreshing dashboards and automated workflows triggered by database changes.
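The Shapley entry describes putting a value on each classifier inside an ensemble. As a hedged illustration of the underlying idea (not the Shapley library's own API), the sketch below computes exact Shapley values for a tiny majority-vote ensemble, using validation accuracy as the value of each coalition of models. The helper names, the binary labels, the rule that ties go to class 0, the empty ensemble scoring 0, and the toy predictions are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial
from typing import Dict, List, Sequence


def ensemble_accuracy(members: Sequence[str], predictions: Dict[str, List[int]], labels: List[int]) -> float:
    """Hypothetical value function: accuracy of a majority vote over `members`.

    Assumes binary labels; a tied vote predicts class 0. The empty ensemble scores 0.
    """
    if not members:
        return 0.0
    correct = 0
    for idx, y in enumerate(labels):
        ones = sum(predictions[m][idx] for m in members)
        vote = 1 if 2 * ones > len(members) else 0
        correct += int(vote == y)
    return correct / len(labels)


def shapley_values(predictions: Dict[str, List[int]], labels: List[int]) -> Dict[str, float]:
    """Exact Shapley value of each classifier by enumerating every coalition (O(2^n))."""
    players = list(predictions)
    n = len(players)
    values: Dict[str, float] = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                gain = (ensemble_accuracy(coalition + (player,), predictions, labels)
                        - ensemble_accuracy(coalition, predictions, labels))
                total += weight * gain
        values[player] = total
    return values


# Toy validation labels and per-model predictions (hypothetical).
labels = [1, 0, 1, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0],
    "model_b": [1, 1, 1, 1, 0],
    "model_c": [0, 0, 1, 1, 1],
}
shapley = shapley_values(predictions, labels)
print({name: round(v, 3) for name, v in shapley.items()})
```

By the efficiency property, the values sum to the accuracy of the full ensemble minus that of the empty one. Exact enumeration grows exponentially with the number of models, so larger ensembles usually call for sampling-based approximations.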
Training Resources
Colleges
- Data Science Degree @ Berkeley
- Data Science Degree @ UVA
- Data Science Degree @ Wisconsin
- BS in Data Science & Applications
- MS in Computer Information Systems @ Boston University
- MS in Applied Data Science @ Syracuse
- M.S. Management & Data Science @ Leuphana
- Master of Data Science @ Melbourne University
- MSc in Data Science @ The University of Edinburgh
- Master of Management Analytics @ Queen's University
Sub Categories
- Miscellaneous Tools (136)
- Bloggers (123)
- Deep Learning Packages (90)
- Books (84)
- Datasets (80)
- Comparison (76)
- Twitter Accounts (73)
- Comics (50)
- MOOCs (45)
- YouTube Videos & Channels (42)
- Facebook Accounts (41)
- Journals, Publications and Magazines (41)
- General Machine Learning Packages (38)
- Podcasts (33)
- Algorithms (28)
- Tutorials (24)
- Free Courses (22)
- Colleges (20)
- Infographics (15)
- Presentations (11)
- Data Science Competitions (6)
- Tools (4)
- Research & Knowledge Retrieval (4)
- Newsletters (4)
- Telegram Channels (3)
- Intensive Programs (2)
- Workflow (2)
- Hobby (1)
- Mailing lists (1)
- GitHub Groups (1)
- Slack Communities (1)
- Disaster (1)
- Frameworks (1)
Keywords
- machine-learning (86)
- python (59)
- deep-learning (58)
- data-science (50)
- pytorch (26)
- tensorflow (21)
- scikit-learn (13)
- keras (13)
- ml (12)
- data-analysis (12)
- neural-network (11)
- reinforcement-learning (11)
- artificial-intelligence (10)
- mlops (10)
- ai (9)
- data-visualization (9)
- numpy (8)
- computer-vision (8)
- hyperparameter-optimization (7)
- object-detection (7)
- awesome-list (7)
- gradient-boosting (7)
- neural-networks (7)
- llm (6)
- data-mining (6)
- jupyter (6)
- pandas (6)
- r (6)
- jupyter-notebook (6)
- data (5)
- pipeline (5)
- workflow (5)
- image-processing (5)
- big-data (5)
- spark (5)
- nlp (5)
- explainable-ai (5)
- explainable-ml (5)
- awesome (5)
- dataset (5)
- cli (4)
- classifier (4)
- gpu (4)
- data-engineering (4)
- node-embedding (4)
- feature-engineering (4)
- random-forest (4)
- network-embedding (4)
- graph-embedding (4)
- machine-learning-algorithms (4)