fucking-awesome-datascience
📝 An awesome Data Science repository to learn from and apply to real-world problems, annotated with repository stars ⭐ and forks 🍴
https://github.com/correia-jpv/fucking-awesome-datascience
-
The Data Science Toolbox
-
General Machine Learning Packages
- jSciPy - A Java port of SciPy's signal processing module, offering filters, transformations, and other scientific computing utilities.
- JAX
- scikit-multilearn
- sklearn-expertsys
- scikit-feature
- scikit-rebate
- seqlearn
- sklearn-bayes
- sklearn-crfsuite
- sklearn-deap
- sigopt_sklearn
- sklearn-evaluation
- scikit-image
- scikit-opt
- scikit-posthocs
- pystruct
- xLearn
- cuML
- causalml
- mlpack
- MLxtend
- modAL
- Sparkit-learn
- Deepchecks
- XGBoost
- dlib
- imodels
- RuleFit
- pyGAM
- LightGBM
- CatBoost
- PerpetualBooster
- scikit-learn
- Shogun
- interpretable
- hyperlearn
-
Miscellaneous Tools
- Turbostream - Real-time data streams, without worrying about streaming infra or backpressure.
- CorpusExplorer
- Pandas GUI
- Trains - Magical Experiment Manager, Version Control & DevOps for AI
- skrub
- RunMat - syntax runtime with automatic CPU/GPU execution and fused array kernels.
- The Data Science Lifecycle Process
- Data Science Lifecycle Template Repo
- RexMex
- ChemicalX
- PyTorch Geometric Temporal
- Little Ball of Fur - Scikit-learn like API.
- Karate Club - Scikit-learn like API.
- ML Workspace - All-in-one web-based IDE for machine learning and data science. The workspace is deployed as a Docker container and is preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch) and dev tools (e.g., Jupyter, VS Code).
- steppy
- steppy-toolkit
- Kite Development Kit
- Hydrosphere Mist
- Torch
- Nervana's Python-based Deep Learning Framework
- Skale
- Intel framework
- IJulia - Julia-language backend combined with the Jupyter interactive environment.
- Featuretools
- Optimus - Pre-processing, feature engineering, exploratory data analysis and easy ML with a PySpark backend.
- Albumentations
- Lambdo
- Feast
- Hopsworks - Open-source data-intensive machine learning platform with a feature store. Ingest and manage features for both online (MySQL Cluster) and offline (Apache Hive) access, and train and serve models at scale.
- MindsDB
- Lightwood
- AWS Data Wrangler - Open-source Python package that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services (Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, etc.).
- CML - Production-like environments with GitHub Actions & GitLab CI, and autogenerated visual reports on pull/merge requests.
- Grid Studio - Web-based spreadsheet application with full integration of the Python programming language.
- Python Data Science Handbook
- Shapley - Data-driven framework to quantify the value of classifiers in a machine learning ensemble.
- Towhee
- LineaPy
- envd
- MLEM
- cleanlab - Data-centric AI and automatically detecting various issues in ML datasets.
- AutoGluon - Time-series and multi-modal data.
- Comet
- Opik
- teeplot
- Streamlit
- Gradio
- Weights & Biases
- Optuna
- Ray Tune
- Apache Airflow
- Prefect
- Kedro - Open-source Python framework for creating reproducible, maintainable data science code.
- LIME
- flyte
- dbt
- zasper
- Chinese-Elite - Open-source project that automatically maps relationship networks by parsing public data using LLMs and visualizes them as an interactive graph.
- InterpretML - Open-source package that also provides visualization tools for EBMs, other glass-box models, and black-box explanations.
- Desbordante - Open-source data profiler specifically focused on discovery and validation of complex patterns, such as 🌎 [numerical association rules](colab.research.google.com/github/Desbordante/desbordante-core/blob/main/examples/notebooks/Numerical_Association_Rules.ipynb), 🌎 [differential dependencies](colab.research.google.com/github/Desbordante/desbordante-core/blob/main/examples/notebooks/Differential_Dependencies.ipynb), 🌎 [denial constraints](colab.research.google.com/github/Desbordante/desbordante-core/blob/main/examples/notebooks/Denial_Constraints.ipynb), and more.
- DVC - Open-source data science version control system. It helps track, organize and make data science projects reproducible. In its most basic scenario, it helps version-control and share large data and model files.
- Chaos Genius
- Hamilton
- SHAP
-
Deep Learning Packages
- TensorLight
- PyTorch
- torchvision
- torchtext
- torchaudio
- ignite
- pytorch_tabular
- PyToune
- skorch
- PyVarInf
- pytorch_geometric
- Yolov3
- GPyTorch
- pyro
- Catalyst
- Yolov5
- Yolov8
- TensorFlow
- TensorLayer
- TFLearn
- Sonnet
- tensorpack
- NeuPy
- TRFL
- tfdeploy
- Polyaxon
- TensorFlow Fold
- tensorlm
- Mesh TensorFlow
- Ludwig
- TF-Agents
- TensorForce
- keras-contrib
- Hyperas
- Elephas
- Hera
- Spektral
- qkeras
- Talos
- keras-rl
- altair
- amcharts
- anychart
- bokeh
- Comet
- slemma
- cartodb
- Cube
- d3plus
- Data-Driven Documents (D3js)
- dygraphs
- exhibit
- gephi
- ggplot2
- tensorflow-upstream
- Glue
- Netron
- Resseract Lite
- vizzu
- Wrangler
- r2d3
- TensorWatch
- PyTorchNet
-
Comparison
- datacompy - DataComPy is a package to compare two Pandas DataFrames.
- Boosting
- Temporal difference learning
- Regression
- Linear Regression
- Ordinary Least Squares
- Logistic Regression
- Stepwise Regression
- Multivariate Adaptive Regression Splines
- Softmax Regression
- Locally Estimated Scatterplot Smoothing
- Decision Trees
- ID3 algorithm
- Ensemble Learning
- Bagging
- Random Forest
- AdaBoost
- KNN (K-Nearest Neighbors)
- Self-Organized Maps
- Density-based clustering
- Fuzzy clustering
- Mixture models
- Dimension Reduction
- Latent Dirichlet Allocation (LDA)
- Neural Networks
- Adaptive resonance theory
- Hidden Markov Models (HMM)
- C4.5
- SVM (Support Vector Machine)
- Multilayer Perceptron
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (RNN)
- Heuristic approaches
- Q Learning
- SARSA (State-Action-Reward-State-Action) algorithm
- k-Means
- Apriori
- EM (Expectation-Maximization)
- PageRank
- Naive Bayes
- CART (Classification and Regression Trees)
- Boltzmann Machines
- Autoencoder
- Generative Adversarial Network (GAN)
- Transformer
- ML System Designs
-
-
Training Resources
-
Free Courses
- AI Expert Roadmap - Roadmap to becoming an Artificial Intelligence Expert
- MLSys-NYU-2022 - Slides, scripts and materials for the Machine Learning in Finance course at NYU Tandon, 2022.
- Hands-on Train and Deploy ML - A hands-on course to train and deploy a serverless API that predicts crypto prices.
- Learning from Data - Introduction to machine learning covering basic theory, algorithms and applications
- Kaggle - Learn about Data Science, Machine Learning, Python etc
- ML Observability Fundamentals - Learn how to monitor and root-cause production ML issues.
- Weights & Biases Effective MLOps: Model Development - Free course and certification for building an end-to-end machine learning workflow using W&B
- Python for Data Science by Scaler - This course is designed to empower beginners with the essential skills to excel in today's data-driven world. The comprehensive curriculum will give you a solid foundation in statistics, programming, data visualization, and machine learning.
- LLMOps: Building Real-World Applications With Large Language Models - Learn to build modern software with LLMs using the newest tools and techniques in the field.
- Prompt Engineering for Vision Models - Learn to prompt cutting-edge computer vision models with natural language, coordinate points, bounding boxes, segmentation masks, and even other images in this free course from DeepLearning.AI.
- Data Science Course By IBM - Free resources for learning what data science is and how it is used in different industries.
-
MOOCs
- Data Science Specialization
- Coursera Introduction to Data Science
- Data Science - 9-course Specialization on Coursera
- Data Mining - 5-course Specialization on Coursera
- Machine Learning - 5-course Specialization on Coursera
- CS 109 Data Science
- OpenIntro
- CS 171 Visualization
- Process Mining: Data science in Action
- Oxford Deep Learning
- Oxford Machine Learning
- UBC Machine Learning - video
- Programming with Julia
- Coursera Big Data Specialization
- Scaler Data Science & Machine Learning Program
- Cognitive Class AI by IBM
- Udacity - Deep Learning
- Keras in Motion
- Microsoft Professional Program for Data Science
- COMP3222/COMP6246 - Machine Learning Technologies
- CS 231 - Convolutional Neural Networks for Visual Recognition
- Coursera Tensorflow in practice
- Coursera Deep Learning Specialization
- 365 Data Science Course
- Coursera Natural Language Processing Specialization
- Coursera GAN Specialization
- Codecademy's Data Science
- Linear Algebra - Linear Algebra course by Gilbert Strang
- Data Science: Statistics & Machine Learning
- Recommender Systems Specialization from University of Minnesota
- Stanford Artificial Intelligence Professional Program
- Data Scientist with Python
- Data Science Skill Tree
- Data Science for Beginners - Learn with AI tutor
- Machine Learning for Beginners - Learn with AI tutor
-
Colleges
- A list of colleges and universities offering degrees in data science.
- Data Science Degree @ Berkeley
- Data Science Degree @ UVA
- BS in Data Science & Applications
- MS in Computer Information Systems @ Boston University
- M.S. Management & Data Science @ Leuphana
- Master of Data Science @ Melbourne University
- Master of Management Analytics @ Queen's University
- Master of Data Science @ Illinois Institute of Technology
- Master of Applied Data Science @ The University of Michigan
- Master's Degree in Data Science and Computer Engineering @ University of Granada
-
Tutorials
- #tidytuesday
- Tutorials of source code from the book Genetic Algorithms with Python by Clinton Sheppard
- Tutorials to get started on signal processing for machine learning
- Minimum Viable Study Plan for Machine Learning Interviews
- Data science your way
- PySpark Cheatsheet
- Machine Learning, Data Science and Deep Learning with Python
- Your Guide to Latent Dirichlet Allocation
- Python for Data Science: A Beginner’s Guide
- 12 free Data Science projects to practice Python and Pandas
- Best CV/Resume for Data Science Freshers
- Understand Data Science Course in Java
- Data Analytics Interview Questions (Beginner to Advanced)
- Top 100+ Data Science Interview Questions and Answers
-
Intensive Programs
-
-
What is Data Science?
- Data Science For Beginners - A multi-week, 20-lesson curriculum all about Data Science.
- What is Data Science @ Quora
- The sexiest job of the 21st century
- Wikipedia
- How to Become a Data Scientist
- a very short history of #datascience - The term “Data Science” has emerged only recently to specifically designate a new profession that is expected to make sense of the vast stores of big data. But making sense of data has a long history and has been discussed by scientists, statisticians, librarians, computer scientists and others for years. The following timeline traces the evolution of the term “Data Science” and its use, attempts to define it, and related terms.
- Data Scientist Roadmap - In today's data-driven world, approximately 328.77 million terabytes of data are generated daily. That number keeps growing, which in turn increases the demand for skilled data scientists who can use this data to drive business growth.
- Navigating Your Path to Becoming a Data Scientist - Data science is one of the most in-demand careers today. With businesses increasingly relying on data to make decisions, the need for skilled data scientists has grown rapidly. Whether it’s tech companies, healthcare organizations, or even government institutions, data scientists play a crucial role in turning raw data into valuable insights. But how do you become a data scientist, especially if you’re just starting out?
-
Where do I Start?
- Scikit-Learn - General-purpose data science package that implements the most popular algorithms. It also includes rich documentation, tutorials, and examples of the models it implements. Even if you prefer to write your own implementations, Scikit-Learn is a valuable reference to the nuts and bolts behind many of the common algorithms you'll find. With [Pandas](https://pandas.pydata.org/), you can collect your data into a convenient table format and analyze it. [NumPy](https://numpy.org/) provides very fast tooling for mathematical operations, with a focus on vectors and matrices. [Seaborn](https://seaborn.pydata.org/), itself based on the [Matplotlib](https://matplotlib.org/) package, is a quick way to generate beautiful visualizations of your data, with many good defaults available out of the box, as well as a gallery showing how to produce many common visualizations.
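
As a rough illustration of how these packages fit together, here is a minimal sketch, assuming scikit-learn, Pandas and NumPy are installed. The choice of the built-in iris dataset, the logistic regression model, and all variable names are assumptions made for the example; they do not come from this list.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Pandas: load a small built-in dataset as a DataFrame (needs pandas installed).
df = load_iris(as_frame=True).frame  # four feature columns plus "target"
per_class_means = df.groupby("target").mean()  # quick tabular summary

# NumPy: vectorised math on the underlying arrays.
X = df.drop(columns="target").to_numpy()
y = df["target"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
# Standardise with statistics taken from the training split only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

# scikit-learn: fit a simple classifier and score it on held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_std, y_train)
accuracy = model.score(X_test_std, y_test)

print(per_class_means)
print(f"held-out accuracy: {accuracy:.2f}")
```

For a first look at the same table, a Seaborn call such as `seaborn.pairplot(df, hue="target")` is one option; it is left out of the block so the sketch runs without a plotting backend.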
-
Agents
-
Frameworks
- ADK-Rust - Production-ready AI agent development kit for Rust with model-agnostic design (Gemini, OpenAI, Anthropic), multiple agent types (LLM, Graph, Workflow), MCP support, and built-in telemetry.
-
-
Literature and Media
-
Books
- Mining Massive Datasets - Free e-book accompanied by an online course
- Advances in Genetic Programming, Vol. 3 - Free Download
-
Bloggers
- Greg Reda - Greg Reda Personal Blog
- Drew Conway - Personal Web Page
- Noah Iliinsky - Personal Blog
- Clare Corthell - The Open Source Data Science Masters
- Emilio Ferrara's web page
-
Presentations
-
Podcasts
-
-
Socialize
-
GitHub Groups
-
-
Fun
-
Other Awesome Lists
-
Comics
- awesome-awesomeness
- Awesome Machine Learning
- lists
- awesome-dataviz
- awesome-python
- Data Science IPython Notebooks.
- awesome-r
- awesome-datasets
- awesome-Machine Learning & Deep Learning Tutorials
- Awesome Data Science Ideas
- Machine Learning for Software Engineers
- Awesome Machine Learning On Source Code
- Awesome Community Detection
- Awesome Graph Classification
- Awesome Decision Tree Papers
- Awesome Fraud Detection Papers
- Awesome Gradient Boosting Papers
- Awesome Computer Vision Models
- Awesome Monte Carlo Tree Search
- 100 NLP Papers
- Awesome Game Datasets
- Data Science Interviews Questions
- Awesome Explainable Graph Reasoning
- Awesome Drug Synergy, Interaction and Polypharmacy Prediction
- Data Science Projects
- Awesome Data Analysis - A curated list of data analysis tools, libraries and resources.
-
Hobby
-
-