awesome-data-analysis
500+ curated resources for Data Analysis & Data Science: Python, SQL, Statistics, ML, AI, Visualization, Cheatsheets, Roadmaps, Interview Prep. For beginners and experts.
https://github.com/pavelgrigoryevds/awesome-data-analysis
SQL & Databases
Resources
- MySQL Tutorial by W3Resource - Tutorial for MySQL.
- Awesome MongoDB - A curated list of awesome MongoDB resources, libraries, tools, and applications.
- Awesome DuckDB - Curated tools, resources, and extensions for the DuckDB analytical database.
- EverSQL - AI-powered SQL query optimization and database observability tool.
- GeeksforGeeks - SQL Tutorial - Detailed SQL tutorial.
- Awesome Postgres - A curated list of awesome PostgreSQL software, libraries, tools and resources.
- Awesome MySQL - A curated list of awesome MySQL software, libraries, tools and resources.
- Awesome SQLAlchemy - A curated list of awesome tools for SQLAlchemy.
- Practice Window Functions - Free interactive SQL tutorial site focused on mastering window functions through 80+ hands-on problems with hints and solutions.
- SQLZoo - SQL Tutorial - Interactive SQL tutorial.
- SQL Bolt - Learn SQL - Learn SQL through interactive lessons.
- SQL Tutorial - Comprehensive SQL tutorial resource.
- SQL Tutorial by W3Schools - Comprehensive SQL tutorial.
- PostgreSQL Tutorial by W3Resource - Tutorial for PostgreSQL.
- MongoDB Tutorial by W3Resource - Tutorial for MongoDB.
- Awesome Database Learning - Educational resources on database internals, distributed systems, and storage.
- Awesome ClickHouse - A curated list of awesome ClickHouse software.
- Awesome SQL - List of tools and techniques for working with relational databases.
- AnimateSQL - Interactive tool that visualizes the step-by-step execution of SQL queries.
- SQL Tips and Tricks - Useful SQL techniques and optimizations for data analysis.
Tools
- TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries.
- PyMySQL - A pure-Python MySQL client library for interacting with MySQL databases from Python applications.
- DuckDB - In-process analytical database for fast SQL queries.
- Psycopg2 - PostgreSQL database adapter.
- MySQL Connector/Python - MySQL driver for Python.
- DB Browser for SQLite - A high quality, visual, open source tool to create, design, and edit database files compatible with SQLite.
- SQLite - A C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine.
- Records - Simple library for running raw SQL queries against relational databases from Python.
- PyODBC - Python library for ODBC database access.
- SQLAlchemy - SQL toolkit and ORM for Python.
- PonyORM - ORM for Python with dynamic query generation.
- PyMongo - Official MongoDB driver for Python.
- SQLiteviz - A tool for exploring SQLite databases and visualizing the results of your queries.
- DBeaver - A free universal database tool and SQL client for developers, SQL programmers, and administrators.
- Beekeeper Studio - A modern, easy-to-use SQL client and database manager with a clean, cross-platform interface.
- SQLFluff - A modular SQL linter and auto-formatter designed to enforce consistent style and catch errors in SQL code.
- Vanna.AI - An AI-powered tool for generating SQL queries from natural language questions.
- SQLChat - A chat-based SQL client that allows you to query databases using natural language conversations.
- Dataset - JSON-like interface for working with SQL databases.
- SQLGlot - A no-dependency SQL parser, transpiler, and optimizer for Python.
- TDengine - An open-source big data platform designed for time-series data, IoT, and industrial monitoring.
Statistics & Probability
Resources
- The Statistics Handbook - Open-source, hands-on statistics handbook.
- All of Statistics - Resource for studying statistics based on Wasserman's book.
- Think Stats - Book and code for an introduction to Probability and Statistics.
- The Effect - Modern introduction to causality and research design.
- Awesome Statistics - A curated list of statistics resources, software, and learning materials.
- Seeing Theory - Interactive visual resource for learning probability and statistics.
- Code repository for O'Reilly book - Companion code for a practical statistics book.
- StatLect - Comprehensive online textbook covering probability and statistics concepts.
- stanford.edu - Probabilities and Statistics - Refresher course on probabilities and statistics from Stanford University.
- Bayesian Methods for Hackers - Resource for learning Bayesian methods in Python.
- Bayesian Modeling and Computation in Python - Code for the book "Bayesian Modeling and Computation in Python".
- Stat Trek - A resource for learning statistics and probability, with tutorials and tools.
- Statistical Learning Theory - Stanford University - Lecture notes on statistical learning theory.
- Think Bayes 2 - Book and code for Bayesian statistical methods.
- The Elements of Statistical Learning - Notebooks for understanding statistical learning concepts.
- Online Statistics Book - An interactive online statistics book with simulations and demonstrations.
- Causal Inference: The Mixtape - Practical guide to causal inference methods.
Tools
- Lifelines - Survival analysis and event history analysis in Python.
- NumPyro - A probabilistic programming library built on JAX for high-performance Bayesian modeling.
- Causal Impact - A Python implementation of the R package for causal inference using Bayesian structural time-series models.
- DoWhy - A Python library for causal inference that supports explicit modeling and testing of causal assumptions.
- Patsy - A Python library for describing statistical models and building design matrices.
- Pomegranate - Fast and flexible probabilistic modeling library for Python with GPU support.
- Pgmpy - Python library for probabilistic and causal inference using graphical models.
- SciPy - Fundamental library for scientific computing and statistics.
- Statsmodels - Statistical modeling, testing, and data exploration.
- PyMC - A probabilistic programming library for Python that allows for flexible Bayesian modeling.
- Pingouin - Statistical package with improved usability over SciPy.
- scikit-posthocs - Post-hoc tests for statistical analysis of data.
- scikit-survival - Survival analysis built on scikit-learn for time-to-event prediction.
- Bootstrap - Bootstrap confidence interval estimation methods.
- PyStan - Python interface to Stan for Bayesian statistical modeling.
- ArviZ - Exploratory analysis of Bayesian models with visual diagnostics.
- PyGAM - A Python library for generalized additive models with built-in smoothing and regularization.
AI Applications & Platforms
Tools
- OpenManus - Open-source platform for building and deploying AI agents.
- youtu-agent - Multi-modal intelligent agent framework by Tencent Cloud.
- trae-agent - Tool-using reasoning agent with execution-augmented reasoning.
- deepagents - LangChain framework for building sophisticated multi-agent systems.
- mem0 - AI memory system for long-term context and personalized interactions.
- web-ui - AI-powered browser automation framework for web interaction.
- autogen - Framework for building multi-agent conversational systems.
- AutoGPT - Autonomous AI agent that can complete complex tasks.
- Agents.md - Open format for giving AI coding agents project-specific instructions.
- tabby - Self-hosted AI coding assistant.
- LLaMA-Factory - Easy-to-use LLM fine-tuning framework.
- open-webui - Web interface for interacting with various LLMs.
- ComfyUI - Visual node-based interface for Stable Diffusion.
- lobe-chat - Modern AI conversation interface.
- Deep Research - AI-powered research assistant for iterative, deep research on any topic.
- LangChain - Framework for developing applications powered by language models.
- LlamaIndex - Data framework for LLM-based applications with RAG capabilities.
- openai-python - Official Python library for OpenAI API.
- OpenLLM - Open platform for operating large language models in production.
- Fabric - Framework for augmenting humans using AI.
- Agent-S - Open agentic framework that autonomously interacts with computer GUIs like a human.
- Langflow - Powerful visual platform for building and deploying AI-powered agents and workflows.
- NeMo - Scalable generative AI framework from NVIDIA for LLMs, Multimodal, and Speech AI.
- Bagel - Open-source unified multimodal model for understanding and generating images.
- crewAI - Framework for orchestrating role-playing AI agents.
- LangGraph - Framework for building stateful, multi-actor applications with LLMs, with cycles and control flow.
- openai-agents-python - Official OpenAI framework for building AI agents.
- Dyad - Open-source platform for building AI applications with custom API keys.
- gpt-engineer - AI-powered code generation tool.
- dify - Visual LLM application development platform.
- upscayl - AI-powered image upscaling tool.
- facefusion - AI face swapping and enhancement tool.
- DocsGPT - Documentation-based question answering system.
- Whisper - Robust speech recognition model for transcription and translation.
- n8n - Workflow automation platform for connecting APIs and services.
- ragflow - Open-source RAG (Retrieval-Augmented Generation) workflow platform.
- firecrawl - Web crawling and data extraction service for AI applications.
- gpt-pilot - AI pair programmer that writes entire applications.
- LocalAI - Self-hosted, local-first AI model deployment platform.
- unsloth - Library for faster and more memory-efficient LLM fine-tuning.
- LibreChat - Open-source ChatGPT alternative.
- quivr - Personal second brain and AI assistant.
Resources
- AI Agents for Beginners - Microsoft's course on designing and building AI agents.
- Generative AI for Beginners - Course on generative AI for beginners from Microsoft.
- LLM Course - Practical course to master large language models from start to finish.
- Awesome AI Agents - A curated list of AI autonomous agents, environments, and frameworks.
- AI Collection - The Generative AI Landscape - A Collection of Awesome Generative AI Applications.
- Awesome LLM Security - A curation of awesome tools, documents and projects about LLM Security.
- 500 AI Agents Projects - 500+ AI agent projects with code for learning and inspiration.
- Prompt Engineering Guide - Guides, papers, and resources for prompt engineering with LLMs.
- Prompt Engineering - Collection of prompt engineering techniques and strategies.
- Awesome Generative AI - A curated list of modern Generative Artificial Intelligence projects and services.
- Awesome LLM Apps - Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.
- Awesome AI Apps - A collection of projects showcasing RAG, agents, workflows, and other AI use cases.
- System Prompts and Models - System Prompts, Internal Tools & AI Models from various AI applications and coding tools.
- RAG Techniques - Collection of advanced techniques for Retrieval-Augmented Generation.
- Awesome LangChain - A curated list of tools and projects built with the LangChain framework.
- Awesome AI Tools - A curated list of top artificial intelligence tools.
- Claude Cookbooks - Official Anthropic examples and recipes for working with Claude AI.
- Hands On Large Language Models - Covers LLM fundamentals, prompt engineering, and fine-tuning.
- AI Engineering Hub - Resources for building, deploying, and maintaining AI systems.
- Agents Towards Production - Code-first tutorials for building production-grade GenAI agents.
- LLM Engineer Toolkit - Curated list of 120+ LLM libraries across various categories.
- GenAI Agents - Repository of AI agent implementations and tutorials.
- AI Notes - Personal notes and essays on AI and software development.
- Open LLMs - Comprehensive list of open-source large language models and their capabilities.
- Generative AI - Roadmap and resources for mastering generative AI technologies.
Awesome Data Science Repositories
- Data Science Best Resources - Carefully curated links for data science resources in one place.
- Data Science for Beginners - Microsoft's data science curriculum.
- Data Science Articles from CodeCut - A collection of articles, videos, and code related to data science.
- Data Science Using Python - Resources for data analysis using Python.
- Awesome Data Science - A curated list of courses, books, tools, and resources for data science.
- OSSU Data Science - Open Source Society University's self-study path.
Python
Useful Python Tools for Data Analysis
- Polars - Multithreaded, vectorized query engine for DataFrames.
- Fugue - Unified interface for Pandas, Spark, and Dask.
- IGraph - A library for creating and manipulating graphs and networks, with bindings for multiple languages.
- ImageIO - A library that provides an easy interface to read and write a wide range of image data.
- TheFuzz - Fuzzy string matching (Levenshtein distance).
- DateUtil - Extensions for standard Python datetime features.
- Pandas Stubs - Type stubs for pandas that improve IDE autocompletion.
- AutoViz - Automatic data visualization in 1 line of code.
- Category Encoders - Extensive collection of categorical variable encoders.
- Pendulum - Alternative to datetime with timezone support.
- DataCleaner - Python tool for automatically cleaning and preparing datasets.
- Great Tables - Create awesome display tables using Python.
- Joblib - A lightweight pipelining library for Python, particularly useful for saving and loading large NumPy arrays.
- fitter - Figures out the distribution your data comes from.
- Arrow - Enhanced work with dates and times.
- Cerberus - Data validation through schemas.
- Pandera - Data validation through declarative schemas.
- Petl - ETL tool for data cleaning and transformation.
- Pandarallel - Parallel operations for pandas DataFrames.
- Dask - Parallel computing for arrays and DataFrames.
- Pillow - Image processing library.
- Geopy - Geocoding addresses and calculating distances.
- Scattertext - Beautiful visualizations of language differences among document types.
- CuPy - A NumPy-compatible array library accelerated by NVIDIA CUDA for high-performance computing.
- Sweetviz - Automatic EDA with dataset comparison.
- Yellowbrick - Visual diagnostic tools for machine learning.
- DataMapPlot - Create beautiful plots of data maps.
- Prince - Multivariate exploratory data analysis (PCA, CA, MCA).
- Mimesis - Generates realistic test data.
- PyOD - Outlier and anomaly detection.
- Pandas DQ - Data type correction and automatic DataFrame cleaning.
- Modin - Speeds up Pandas by distributing computations.
- Pandas Flavor - Add custom methods to Pandas.
- Pandas DataReader - Reads data from various online sources into pandas DataFrames.
- Lux - Automatic DataFrame visualization in Jupyter.
- YData Profiling - Data quality profiling & exploratory data analysis.
- Missingno - Visualize missing data patterns.
- Datashader - Quickly and accurately render even the largest data.
- PandasAI - Conversational data analysis using LLMs and RAG.
- Mito - Jupyter extensions for faster code writing.
- D-Tale - Interactive GUI for data analysis in a browser.
- Pandasgui - GUI for viewing and filtering DataFrames.
- PyGWalker - Interactive UIs for visual analysis of DataFrames.
- Pivottablejs - Interactive PivotTable.js tables in Jupyter.
- Alibi Detect - Outlier, adversarial and drift detection.
- Pydantic - Data validation using Python type annotations.
- Dora - Automate EDA: preprocessing, feature engineering, visualization.
- Great Expectations - Data validation and testing.
- FeatureTools - Automated feature engineering.
- Feature Engine - Feature engineering with Scikit-Learn compatibility.
- Feature Selector - Tool for dimensionality reduction of machine learning datasets.
- Imbalanced Learn - Handling imbalanced datasets.
- cuDF - A GPU DataFrame library for loading, joining, and aggregating data.
- Faker - Generates fake data for testing.
- PySAL - Spatial analysis functions.
- Factor Analyzer - A Python package for factor analysis, including exploratory and confirmatory methods.
- Texthero - Text preprocessing, representation and visualization.
- Geopandas - Geographic data operations with pandas.
- NetworkX - Network analysis and graph theory.
- Vizro - Low-code toolkit for building data visualization apps.
- Vaex - High-performance Python library for lazy Out-of-Core DataFrames.
- Sklearn Pandas - Bridge between Pandas and Scikit-learn.
- Numba - A JIT compiler that translates a subset of Python and NumPy code into fast machine code.
- QGrid - Interactive grid for DataFrames in Jupyter.
Resources
- Best of Python - A ranked list of awesome Python open-source libraries and tools.
- Data Science Python - Common data analysis and machine learning tasks using Python.
- Python for Algorithms & Interviews - Files for Udemy course on algorithms and data structures.
- List of Python API Wrappers - List of Python API wrappers and libraries.
- Awesome Python - An opinionated list of awesome Python frameworks, libraries, software, and resources.
- 30 Days Of Python - A 30-day programming challenge to learn the Python programming language.
- Python Data Science Handbook - Full text of the "Python Data Science Handbook" in Jupyter Notebooks.
- W3Schools Python - A beginner-friendly tutorial and reference for the Python programming language.
- Tanu N Prabhu Python - This repository helps you understand Python from scratch.
- Think Python - Jupyter notebooks and other resources for Think Python by Allen Downey.
- GeeksforGeeks Python - Python tutorial from GeeksforGeeks.
- Real Python Tutorials - Tutorials on Python from Real Python.
- Awesome Python Data Science - A curated list of Python resources for data science.
- Interactive Coding Challenges - 120+ interactive Python coding interview challenges.
- Clean Code Python - Clean Code concepts adapted for Python.
Data Manipulation with Pandas and Numpy
- NumPy 100 Exercises - A collection of 100 exercises to master the NumPy library for scientific computing.
- Awesome Pandas - A curated list of resources for using the Pandas library.
- 100 data puzzles for pandas - A collection of data puzzles to practice your Pandas skills.
- Pandas Tutor - Visualize Pandas operations step-by-step (perfect for beginners).
- Effective Pandas - A series focused on writing effective and idiomatic Pandas code.
- Pandas Exercises - Exercises designed to help you improve your Pandas skills.
- Pandas Cookbook - A cookbook with various recipes for using Pandas effectively.
- Hands-On Data Analysis with Pandas - Materials for following along with Hands-On Data Analysis with Pandas.
- From Python to Numpy - An open-access book on vectorization and efficient numerical computing with NumPy.
Data Visualization
Resources
- Scientific Visualization Book - Guide to creating effective scientific visualizations and plots.
- Data Visualization Catalogue - A comprehensive catalog of data visualization types.
- Data Viz Project - A resource for selecting suitable visualizations.
- Friends Don't Let Friends - A collection of bad data visualization practices and better alternatives.
- Colorgorical - Resource for generating categorical color palettes using perceptual principles.
- The Python Graph Gallery - A collection of Python graph examples for data visualization.
- Cedric Scherer's DataViz Resources - A collection of top data visualization resources and inspiration.
- Natural Colours - A digital archive of historical color systems and pigments.
- From Data to Viz - A guide to choosing the right visualization based on your data.
- Awesome DataViz - A curated list of awesome data visualization libraries, tools, and resources.
- Visualization Curriculum - Interactive notebooks designed to teach data visualization concepts.
- FlowingData - Insights on data analysis and visualization.
- Chartopedia - A guide to help you select the appropriate chart types.
- DataForVisualization - Tutorials and insights on data visualization techniques.
- Truth & Beauty - Exploration of the aesthetics of data visualization.
- Information is Beautiful - A site dedicated to visualizations that make complex ideas clear and engaging.
- Plottie - A vast library of scientific plots for visualization inspiration and ideas.
Tools
- Matplotlib - A comprehensive library for creating static, animated, and interactive visualizations in Python.
- Seaborn - A statistical data visualization library based on Matplotlib.
- Plotly - A library for creating interactive plots and dashboards.
- Altair - A declarative statistical visualization library for Python.
- Contextily - Add background basemaps to your GeoPandas plots.
- Plotnine - A grammar of graphics for Python.
- Pygal - A Python SVG charting library.
- Deck.gl - A WebGL-powered framework for visual exploratory data analysis of large datasets.
- OSMnx - A package to easily download, model, analyze, and visualize street networks from OpenStreetMap.
- Apache ECharts - A powerful, interactive charting and visualization library for browser-based applications.
- VisPy - A high-performance interactive 2D/3D data visualization library leveraging the power of OpenGL.
- Glumpy - A Python library for scientific visualization that is fast, scalable and beautiful, based on OpenGL.
- Pandas-bokeh - Bokeh plotting backend for Pandas.
- Bokeh - A library for creating interactive visualizations for modern web browsers.
- HoloViews - A tool for building complex visualizations easily.
- Geopandas - An extension of Pandas for geospatial data.
- Folium - A library for visualizing data on interactive maps.
- Bqplot - A plotting library for IPython/Jupyter notebooks.
- PyPalettes - A large collection (2,500+) of color maps for Python.
Mathematics
Resources
- Immersive Linear Algebra - Interactive resource for understanding linear algebra.
- Stats Maths with Python - Collection of Python scripts and notebooks for statistics and mathematics.
- MML Book - Comprehensive resource for mathematics in machine learning.
- 3Blue1Brown - Visual explanations of mathematical concepts through animated videos.
- Hackermath - Resource for learning statistics and mathematics for data science.
- Fast.ai - Computational Linear Algebra - Resource for learning linear algebra computationally.
- Awesome Math - A curated list of mathematics resources, books, and online courses.
Natural Language Processing (NLP)
Resources
- The NLP Pandect - Comprehensive NLP guide covering theory, models, and practical implementations.
- NLP in Python with Deep Learning - A resource for learning NLP with deep learning.
- Hands on NLTK Tutorial - The hands-on NLTK tutorial for NLP in Python.
- Awesome NLP - A ranked list of awesome Python libraries for natural language processing (NLP).
- Hugging Face NLP Course - Official course on transformers and NLP from Hugging Face.
- Practical NLP Code - Code examples and notebooks for practical natural language processing.
- Oxford Deep NLP Lectures - Lecture materials from Oxford's Deep Natural Language Processing course.
- NLTK Book - Natural Language Processing with Python.
- NLP with Python by Susan Li - Jupyter notebooks demonstrating various NLP techniques and applications.
- YSDA NLP Course - Yandex School of Data Analysis course on Natural Language Processing.
Tools
- TextBlob - A simple library for processing textual data.
- OpenHands - A library and framework for building applications with large language models.
- Rasa - Open-source framework for building contextual AI assistants and chatbots.
- John Snow Labs Spark-NLP - A state-of-the-art Natural Language Processing library built on Apache Spark.
- TextAttack - A Python framework for adversarial attacks, data augmentation, and model training in NLP.
- Gensim - Topic modeling and natural language processing library for Python.
- TextRank - A library for TextRank algorithm implementation.
- Natural Language Toolkit (NLTK) - A leading platform for building Python programs to work with human language data.
- SpaCy - An open-source software library for advanced NLP in Python.
- BERT - A transformer-based model for NLP tasks.
- Flair - A simple framework for state-of-the-art NLP.
- Stanford CoreNLP - A Java suite of core NLP tools providing fundamental linguistic analysis capabilities.
- Stanza - Python NLP library for many human languages, from the Stanford NLP Group.
- SentenceTransformers - Framework for state-of-the-art sentence and text embeddings.
- LangExtract - Google's library for structured information extraction from text using language models.
Machine Learning & AI
Resources
- Awesome Machine Learning - A curated list of awesome Machine Learning frameworks, libraries and software.
- Machine Learning Tutorials - Machine learning and deep learning tutorials, articles and other resources.
- Awesome Deep Learning - A curated list of awesome Deep Learning tutorials, projects and communities.
- Best of ML Python - A ranked list of awesome machine learning Python libraries and tools.
- mlcourse.ai - Open Machine Learning Course with practical assignments and real-world applications.
- Machine Learning Zoomcamp - A free practical machine learning course focused on building and deploying models.
- Google Research - Official repository for Google Research projects and publications.
- 100 Days of ML Coding - A comprehensive coding challenge to learn machine learning over 100 days.
- Made With ML - Resource for building and deploying machine learning applications.
- Awesome LLM - A curated list of papers, projects, and resources related to Large Language Models.
- Machine Learning with Python by Susan Li - Jupyter notebooks covering various machine learning algorithms and applications.
- Handson-ml3 - Hands-on guide to machine learning and deep learning using Python.
- Microsoft ML for Beginners - A beginner-friendly introduction to machine learning concepts and practices.
- AI For Beginners - Microsoft's curriculum on artificial intelligence.
- LLMs-from-scratch - Educational repository for building LLMs from scratch.
- Understanding Deep Learning - Comprehensive and accessible textbook on deep learning fundamentals.
- Deep Learning Papers Reading Roadmap - Curated roadmap of seminal deep learning papers for newcomers.
- Applied ML - Curated resources and tools for applied machine learning in industry.
- Awesome Artificial Intelligence - A curated list of artificial intelligence resources.
- Awesome Generative AI Guide - A comprehensive guide to generative AI models, tools, and applications.
- Annotated deep learning paper implementations - Implementations of deep learning papers with annotated code.
Tools
- HuggingFace Transformers - Model-definition framework for state-of-the-art machine learning models.
- YOLOv5 - Real-time object detection system.
- TensorFlow - End-to-end open source platform for machine learning and deep learning.
- PyTorch - Deep learning framework with strong support for research and production.
- PyTorch Lightning - PyTorch wrapper for high-performance AI research.
- HuggingFace Diffusers - Library for state-of-the-art pretrained diffusion models.
- Skorch - Scikit-learn compatible neural network library.
- Sonnet - DeepMind's library for building complex neural networks.
- JAX - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
- TensorFlow Models - Official TensorFlow repository with models and examples.
- Ultralytics - YOLOv8 and other computer vision models.
- PEFT - Library for efficiently adapting large pretrained models.
- Optuna - Hyperparameter optimization framework.
- CatBoost - High-performance gradient boosting on decision trees with categorical features support.
- Pyro - Deep universal probabilistic programming with Python and PyTorch.
- Scikit-learn - Machine learning library for classical algorithms and model building.
- XGBoost - Optimized distributed gradient boosting library for tree-based models.
- H2O-3 - Open-source distributed machine learning platform.
- SHAP - Game theoretic approach to explain the output of any machine learning model.
- LightGBM - Fast, distributed, high-performance gradient boosting framework.
- cuML - GPU-accelerated machine learning algorithms from RAPIDS.
- dlib - Modern C++ toolkit containing machine learning algorithms and tools.
- InterpretML - Fit interpretable models and explain blackbox machine learning.
- PyTorch Ignite - High-level library to help with training and evaluating neural networks.
- Keras - High-level neural networks API that runs on top of JAX, TensorFlow, or PyTorch.
- Fast.ai - Deep learning library simplifying training fast and accurate neural nets.
- ONNX - Open standard for machine learning interoperability.
- PyTorch Geometric - Geometric deep learning extension library for PyTorch.
Cloud Platforms & Infrastructure
Resources
- Awesome Kubernetes Resources - A curated list of awesome Kubernetes tutorials, tools, and resources.
- Awesome Cloud Security - A curated list of awesome cloud security resources, tools, and best practices.
- DevOps Exercises - Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, and more.
- Awesome Cloud Native - A curated list of resources for cloud native technologies.
- Awesome Kubernetes - A curated list of awesome Kubernetes resources.
- AWS EKS Best Practices - A best practices guide for Amazon EKS.
- Awesome Docker - A curated list of Docker resources and projects.
- AWS Well-Architected Labs - Hands-on labs to help you learn about the AWS Well-Architected Framework.
- Kubernetes The Hard Way - Tutorial for bootstrapping a Kubernetes cluster the hard way on Google Cloud Platform.
- Awesome Compose - A curated list of Docker Compose samples.
- Awesome Selfhosted Docker - A curated list of awesome selfhosted applications and solutions using Docker.
- Awesome Selfhosted - A list of Free Software network services and web applications which can be hosted locally.
Tools
- Helm - Package manager for Kubernetes.
- Flagger - Progressive delivery operator.
- Spinnaker - Multi-cloud continuous delivery.
- Dagger - Portable devkit for CI/CD pipelines.
- Harness - End-to-end developer platform.
- Tilt - Local development for Kubernetes.
- KubeVela - Application delivery platform.
- Docker Compose - A tool for defining and running multi-container Docker applications.
- OpenTofu - Open source fork of Terraform.
- Pulumi - Modern IaC platform using familiar programming languages.
- CDK8s - Define Kubernetes apps using familiar languages.
- Jenkins - Open source automation server.
- Argo CD - Declarative GitOps continuous delivery.
- Argo Workflows - Container-native workflow engine.
- Envoy Gateway - Manages Envoy Proxy as gateway.
- Higress - Cloud-native API gateway based on Istio.
- Meshery - Service mesh management.
- Kustomize - Configuration customization for Kubernetes.
- Kubernetes Dashboard - Web-based UI for Kubernetes.
- Skaffold - Continuous development for Kubernetes.
- KubeSphere - Kubernetes multi-cloud management.
- Crossplane - Cloud native control plane.
- Artifact Hub - Kubernetes packages and Helm charts.
- Devtron - Kubernetes dashboard.
- Kubernetes - Production-grade container orchestration system.
- Kompose - Conversion tool from Docker Compose to Kubernetes.
- Terraform - Infrastructure as Code tool.
- Tekton - Kubernetes-native CI/CD framework.
- Traefik - Modern HTTP reverse proxy and load balancer.
- Kong - Cloud-native API Gateway.
- Apache APISIX - Dynamic API gateway.
- Docker - Open platform for developing, shipping, and running applications in containers.
-
-
Productivity
-
Resources
- AFFiNE - All-in-one workspace for notes, docs, and data visualization.
- ChatGPT Data Science Prompts - A collection of useful prompts for data scientists using ChatGPT.
- Gamma.app - AI-powered platform for creating and sharing presentations and documents.
- Cookiecutter Data Science - A standardized project structure for data science projects.
- Learn Regex - Comprehensive guide to learning regular expressions with examples and exercises.
- Awesome Regex - Curated collection of regex tools, libraries, and learning resources.
- Habitica - A habit-building and productivity app that treats your life like a role-playing game.
- Bujo - Tools to help transform the way you work and live.
- Asana - A project management platform for tracking work and projects.
- Trello - A visual project management tool.
- Nanobrowser - An open-source AI web automation tool with a multi-agent system that runs directly in your browser.
- Best of Jupyter - Ranked list of notable Jupyter Notebook, Hub, and Lab projects.
- Deepnote - AI-native data science notebook platform compatible with Jupyter, featuring real-time collaboration, environment management, and integrations.
- Marimo - Reactive Python notebook for reproducible and interactive data science.
- screenshot-to-code - AI tool that converts screenshots into code for various frontend stacks.
- Codebeautify - All-in-one online code formatter and beautifier for Python, SQL, JSON, and more.
- Parabola - An AI-powered workflow builder for organizing data.
- Puter - An open-source, browser-based computing environment and cloud OS.
- Positron - A next-generation data science IDE.
- The Markdown Guide - Comprehensive guide to learning Markdown.
- Readme-AI - A tool to automatically generate README.md files for your projects.
- Markdown Here - Extension for writing emails in Markdown and rendering them before sending.
- MarkText - Simple and elegant markdown editor for documentation.
- QuarkDown - Lightweight markdown processor for fast document rendering.
- Notion - An all-in-one workspace for note-taking and task management.
-
Useful Linux Tools
- Zoxide - Smarter cd command.
- HTop - An interactive process viewer.
- CopyQ - Clipboard manager with advanced features.
- Translate Shell - Command-line translator using Google Translate, Bing Translator, Yandex.Translate, etc.
- Espanso - Cross-platform Text Expander written in Rust.
- Flameshot - Powerful yet simple to use screenshot software.
- csvkit - Suite of command-line tools for working with CSV data.
- Bat - Cat clone with syntax highlighting.
- VisiData - Interactive multitool for tabular data exploration in the terminal.
- tldr-pages - Simplified and community-driven man pages with practical examples.
- Exa - Modern replacement for ls.
- Ripgrep - Faster grep alternative.
- Peek - Simple animated GIF screen recorder with an easy to use interface.
- DrawIO Desktop - Open-source diagramming software for making flowcharts, process diagrams, and more.
- Timeshift - System restore tool for Linux that creates filesystem snapshots using rsync+hardlinks or BTRFS snapshots.
- Backintime - A comfortable and well-configurable graphical frontend for incremental backups.
- Fzf - A command-line fuzzy finder.
- Osquery - SQL powered operating system instrumentation, monitoring, and analytics.
- GNU Parallel - A tool to run jobs in parallel.
- Ncdu - A disk usage analyzer with an ncurses interface.
- Thefuck - A command line tool to correct your previous console command.
- Miller - A tool for querying, processing, and formatting data in various file formats (CSV, JSON, etc.), like awk/sed/cut for data.
- jq - Command-line JSON processor for parsing and manipulating JSON data.
- q - Run SQL directly on CSV or TSV files from the command line.
- httpie - Modern command-line HTTP client for API testing and debugging.
- glances - Cross-platform system monitoring tool for resource usage analysis.
- hyperfine - Command-line benchmarking tool for performance testing.
- termgraph - Draw basic graphs in the terminal for quick data visualization.
- fd - Simple, fast and user-friendly alternative to 'find'.
- dust - More intuitive version of du written in Rust.
- bottom - Cross-platform graphical process/system monitor.
- Keychain - Tool for managing and securely storing passwords and secrets.
- yq - Portable command-line YAML processor (like jq for YAML and XML).
- Inkscape - A powerful, free, and open-source vector graphics editor for creating and editing visualizations.
- Rclone - A command-line program to manage files on cloud storage.
- Rsync - A fast and versatile file copying tool that can synchronize files and directories between two locations over a network or locally.
-
Useful VS Code Extensions
- Data Preview - Import, view, slice, and export data.
- Error Lens - Enhances the display of errors and warnings in code.
- Indent Rainbow - Makes indentation easier to read.
- Markdown Table Editor - Add features to edit Markdown tables.
- WYSIWYG Editor for Markdown - View Word and Excel files and edit Markdown.
- SQL Notebooks - Open SQL files as VSCode Notebooks.
- Workspace Dashboard - Organize your workspaces in a speed-dial manner.
- Text Power Tools - An all-in-one solution with 240+ commands for text manipulation.
- Toggle Quotes - Toggle between single, double, and backticks for strings.
- Comment Translate - Helps translate comments, strings, and variable names in your code.
- Bookmarks - Mark lines in your code and jump to them easily.
- Gitignore Generator - Simplifies the process of generating .gitignore files.
- Test Explorer UI - Run your tests in the sidebar of Visual Studio Code.
- PDF Viewer for Visual Studio Code - View PDF files directly in VS Code.
- Path Autocomplete - Provides path completion for files and directories in VS Code.
- PDF Preview in VSCode - Show PDF previews in VS Code.
- JDBC Adapter - Connect to various databases using JDBC.
- DBCode - Connect - Database client for managing and querying databases.
- Markdown All in One - Essential tools for Markdown editing.
- Markdown Preview GitHub Styles - Changes VS Code's markdown preview to match GitHub's styling.
- Snippington Python Pandas Basic - Basic tools for working with Pandas in Python.
- Quick Python Print - Quickly handle print operations in Python.
- Rainbow CSV - Highlight CSV and TSV files and run SQL-like queries.
- Remove Blank Lines - Extension to remove empty lines in documents.
- Path Intellisense - Autocompletes filenames in your code.
- Python Imports Utils - Utilities for managing Python imports.
- VSCode Markdownlint - A VS Code extension to lint and style check markdown files.
- CSV to Table - Convert CSV/TSV/PSV files to ASCII formatted tables.
- Data Wrangler - Tool for cleaning and preparing tabular datasets.
- Prettier - Code formatting extension for VS Code.
- Project Manager - Easily switch between projects.
- Python Indent - Automatically indent Python code.
- SandDance - Visually explore and present your data.
- SQL Tools - Database management tools for VSCode.
- Kanban Board - A Kanban board extension for organizing tasks within VS Code.
- Remote Development - Open any folder in a container, on a remote machine, or in WSL.
- Text Marker - Select text in your code and mark all matches with configurable highlight color.
- Dendron - A hierarchical note-taking tool that grows as you do.
- Python Test Explorer - Run your Python tests in the sidebar of Visual Studio Code.
-
-
More Awesome Lists
-
Miscellaneous
- Awesome Big Data - A curated list of awesome big data frameworks, resources, and tools.
- Awesome Algorithms - Collection of resources for learning and practicing algorithms and data structures.
- Awesome AI System Prompts - Collection of effective system prompts for various AI models.
- Awesome Osint - Curated list of Open Source Intelligence (OSINT) tools and resources.
- Awesome Telegram - Collection of Telegram bots, channels, and tools for developers.
- Free for Dev - List of SaaS, PaaS, and IaaS offerings with free developer tiers.
- Awesome Product Design - A collection of bookmarks, resources, articles about product design.
- Awesome LaTeX - A curated list of LaTeX resources, libraries, and tools.
- Awesome Tunneling - A list of ngrok alternatives and tunneling software.
- Awesome Scientific Writing - A curated list of resources for scientific writing, publishing, and research.
- Awesome GitHub Profile Readme - A collection of awesome GitHub profile READMEs and resources.
- Awesome Certificates - A curated list of IT and developer certifications and learning resources.
- Awesome Claude Prompts - Collection of powerful prompts for Anthropic's Claude AI.
- Awesome Linux Software - A list of awesome applications and tools for Linux.
- Awesome AutoHotkey - A curated list of awesome AutoHotkey libraries, scripts, and resources.
- Awesome Productivity - A curated list of delightful productivity resources.
- Awesome Actions - A curated list of awesome GitHub Actions for automation.
- Awesome - A curated list of awesome lists.
- Awesome Geospatial - A curated list of awesome geospatial libraries, tools, and resources.
- Awesome Chatgpt Prompts - A repository for ChatGPT prompt curation.
- Awesome Jupyter - Curated list of Jupyter projects, libraries, and resources.
- Awesome Business Intelligence - Actively curated list of awesome BI tools.
- Awesome Prompt Engineering - A curated list of resources for prompt engineering with LLMs like ChatGPT.
- Awesome Shell - A curated list of awesome command-line frameworks, toolkits, and guides.
- Awesome FastAPI - A curated list of awesome FastAPI frameworks, libraries, and resources.
- Awesome Product Management - A curated list of resources for product managers and aspiring PMs.
- Awesome Python Applications - A list of free software and applications written in Python.
- Awesome Quarto - A curated list of Quarto resources, including talks, tools, examples, and articles.
- Awesome Vscode - A comprehensive list of useful VS Code extensions and resources.
- Awesome Readme - Collection of well-crafted README files for inspiration.
- Awesome Code Review - A collection of resources for code review practices.
- Anomaly Detection Resources - Books, papers, videos, and toolboxes related to anomaly detection.
- Awesome Linux - Curated list of Linux applications, tools, and resources for users and developers.
- Awesome for Beginners - List of beginner-friendly projects for contributing to open-source software.
- Best websites a programmer should visit - Curated list of helpful websites for programmers and engineers.
- Awesome Creative Coding - Curated list of creative coding resources and libraries.
- Awesome AI in Finance - Curated list of AI applications, tools, and research in finance.
- Awesome Serverless - Curated resources for serverless architectures and cloud computing.
- Awesome R - Curated list of R packages, frameworks, and learning resources.
- Font-Awesome - Icon library and toolkit for scalable vector graphics on the web.
-
-
Contributing
-
Miscellaneous
-
-
Additional Python Libraries
-
Miscellaneous
- Poetry - Python dependency management and packaging.
- Pampy - Pattern matching for Python dictionaries.
- UV - An extremely fast Python package installer and resolver.
- Pytest - Framework for writing small tests.
- Pygorithm - A Python module for learning all major algorithms.
- GitPython - A Python library used to interact with Git repositories.
- TQDM - Progress bars for loops and operations.
- Loguru - Python logging made simple.
- Hydra - Elegant configuration management.
- Funcy - Fancy functional tools for Python.
- Pillow - Image processing library.
- Ftfy - Fixes broken Unicode strings.
- JmesPath - Queries JSON data (SQL-like for JSON).
- Glom - Transforms nested data structures.
- Diagrams - Diagrams as code for cloud architecture.
- Click - Beautiful command line interfaces.
- papermill - Tool for parameterizing and executing Jupyter notebooks programmatically.
-
Code Quality & Development
- Mypy - Optional static typing for Python.
- Pydeps - Python module dependency graphs.
- PyForest - Automated Python imports for data science.
- Complexipy - Blazingly fast cognitive complexity analysis for Python, written in Rust.
- Black - Uncompromising Python code formatter.
- Pre-commit - Framework for managing pre-commit hooks.
- Pylint - Python code static analysis.
- Rich - Rich text and beautiful formatting in the terminal.
- Icecream - Debugging without using print.
- Pandas-log - Logs pandas operations for data transformation tracking.
- PandasVet - Code style validator for Pandas.
-
Documentation & File Processing
- Camelot - PDF table extraction library.
- PyPDF2 - Reads and writes PDF files.
- Python-docx - Reads and writes Word documents.
- Python-markdownify - Convert HTML to Markdown.
- Sphinx - Documentation generator.
- Pdoc - API documentation for Python projects.
- Mkdocs - Project documentation with Markdown.
- OpenPyXL - Read/write Excel files.
- Tablib - Exports data to XLSX, JSON, CSV.
- CleverCSV - Smart CSV reader for messy data.
- Xlwings - Integration of Python with Excel.
- WeasyPrint - Convert HTML to PDF.
- Jupyter-book - Build publication-quality books from Jupyter notebooks.
- MarkItDown - Python tool for converting files and office documents to Markdown.
- PyMuPDF - Advanced PDF manipulation library.
- Xmltodict - Converts XML to Python dictionaries.
-
Web & APIs
- Requests-cache - Persistent caching for requests library.
- HTTPX - Next-generation HTTP client for Python.
- FastAPI - Modern web framework for building APIs.
- Flask - Lightweight Python web framework for building applications and APIs.
- Typer - Library for building CLI applications.
-
-
Skill Development & Career
-
Resume and Interview Tips
- Interviews AI - AI interview preparation guide with questions and solutions.
- Data Science Interview Questions Answers - Curated list of data science interview questions and answers.
- Interview - Everything you need to prepare for your technical interview.
- Cracking Data Science Interview - A collection of cheatsheets, books, questions, and portfolio resources for DS/ML interview prep.
- Awesome Behavioral Interviews - Curated resources for mastering behavioral and system design interviews.
- Enhancv Data Scientist Resumes - A collection of resume examples and tips tailored for data scientists.
- Data Science Portfolio - A platform to create and showcase your data science portfolio.
- The Data Science Interview Book - A comprehensive resource to prepare for data science and machine learning interviews.
- Machine Learning Interviews Book - A comprehensive guide to preparing for machine learning engineering interviews.
- Interviews - Personal tech interview study guide covering algorithms and data structures.
- Devinterview - Ace your next tech interview with confidence.
- Interviewqs - Ace your next data science interview.
- Data Science Interview Preparation Resources - Resources to help you prepare for upcoming data science interviews.
- Data Science Interviews - A comprehensive collection of data science interview questions and resources.
- MLQuestions - Collection of machine learning interview questions and answers.
- Interview Query - Platform for preparing for data science interviews.
- InterviewBit - SQL Interview Questions - Collection of SQL interview questions.
- Best Resume Ever - Collection of modern resume templates and CV examples.
- StrataScratch - Platform with real data science interview questions from top companies.
- LeetCode Patterns - Curated collection of coding patterns and strategies for technical interviews.
- Bartosz Jarocki's CV - Modern, open-source technical resume template and example.
- Awesome-CV - Professional CV and resume templates built with LaTeX.
- Reactive-Resume - Open-source resume builder with multiple templates and customization options.
-
Practice Resources
- Leetcode Company Wise Problems - Company-wise Leetcode problems for interview preparation.
- LeetCode - A platform for preparing technical coding interviews.
- Kaggle Competitions - Platform for participating in data analysis and machine learning competitions.
- Makeovermonday - A platform focused on enhancing data visualization practices.
- Workout Wednesday - Engage in weekly challenges to improve your visualization skills.
- Official TidyTuesday Repository - Repository for the TidyTuesday project, promoting data analysis.
- DrivenData Competitions - Data analysis competitions with a social impact focus.
- Codecademy Data Science Path - Interactive courses for learning data analysis.
- SQL Masterclass - A course to master SQL for data analysis, complete with real-world projects.
- Hugging Face Tasks - Hands-on practice with specific NLP and machine learning tasks using real models.
- Awesome LeetCode Resources - Collection of curated resources and strategies for LeetCode practice.
-
Curated Jupyter Notebooks
- Deep Learning with Python Notebooks - Official Jupyter notebooks from François Chollet's Deep Learning with Python book.
- Awesome Notebooks - Data & AI notebook templates catalog organized by tools.
- Pydata Book - Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney.
- Python For Data Analysis - An introduction to data science using Python and Pandas with Jupyter notebooks.
- Jdwittenauer Ipython Notebooks - A collection of IPython notebooks covering various topics.
- DataScienceInteractivePython - A collection of interactive Python notebooks for learning data science concepts.
- Unsloth Notebooks - Optimized notebooks for faster AI model training and fine-tuning.
- Huggingface Notebooks - Official Hugging Face notebooks for NLP, vision, audio, and diffusion models.
- PythonNumericalDemos - Python notebooks for geostatistics and numerical demonstrations.
- Spark py Notebooks - Apache Spark & Python tutorials for big data analysis and machine learning.
- DataMiningNotebooks - Example notebooks for data mining accompanying the course at Southern Methodist University.
- Pythondataanalysis - Python data repository with Jupyter notebooks and scripts.
- Data Science Ipython Notebooks - Data science Python notebooks covering various topics.
-
Data Sources & Datasets
- Kaggle Datasets - Extensive collection of datasets for practice in data analysis.
- Opendatasets - A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.
- Open Data Sources - Collection of various open data sources.
- TensorFlow Datasets - A collection of ready-to-use datasets for use with TensorFlow and other Python ML frameworks.
- LLM Datasets - A collection of datasets and resources for training and fine-tuning Large Language Models (LLMs).
- Datasette - An open source multi-tool for exploring and publishing data.
- Awesome Public Datasets - Curated list of high-quality open datasets.
- Data World - Enterprise data catalog platform for analysts, engineers, and governance teams.
- Awesome Public Real Time Datasets - A list of publicly available datasets with real-time data.
- Google Dataset Search - A search engine for datasets from across the web.
- NASA Open Data Portal - A site for NASA's open data initiative, providing access to NASA's data resources.
- The World Bank Data - Free and open access to global development data by The World Bank.
- Voice Datasets - A collection of audio and speech datasets for voice AI and machine learning.
- HuggingFace Datasets - A lightweight library to easily share and access datasets for audio, computer vision, and NLP.
- NLP Datasets - A curated list of datasets for natural language processing (NLP) tasks.
- TorchVision Datasets - The torchvision.datasets module provides many built-in computer vision datasets.
- Unsplash Datasets - A collection of datasets from Unsplash, useful for computer vision and research.
- Awesome JSON Datasets - A curated list of awesome JSON datasets that are publicly available without authentication.
- Free Datasets for Projects - Dataquest's compilation of free datasets.
-
-
Dashboards & BI
-
Resources
- Best of Streamlit - Showcase of community-built Streamlit applications.
- Awesome Dash - Comprehensive resources for Dash users.
- GeeksforGeeks - Tableau Tutorial - Comprehensive tutorial on Tableau.
- Awesome Dashboards - A collection of outstanding dashboard and visualization resources.
- Awesome Panel - Resources and support for Panel users.
- Awesome Streamlit - Curated list of Streamlit resources and components.
- Dash Enterprise Samples - Production-ready Dash apps.
-
Tools
- Gradio - Tool for creating and sharing machine learning applications.
- Panel - Python library for creating custom interactive web apps and dashboards.
- Tremor - A React library to build dashboards fast with pre-built components for charts, KPIs, and more.
- Appsmith - An open-source platform to build and deploy internal tools, admin panels, and CRUD apps quickly.
- Grafanalib - A Python library for generating Grafana dashboards configuration as code.
- H2O Wave - A Python framework for rapidly building and deploying realtime web apps and dashboards for AI and analytics.
- Shiny for Python - Python version of the popular R Shiny framework.
- Voilà - Turn Jupyter notebooks into standalone web applications.
- Reflex - Full-stack Python framework for building web apps.
- Taipy - Python library for building web applications and interactive dashboards.
- Evidence - Business intelligence platform that uses SQL and Markdown for reports.
- Panel - Framework for creating interactive web applications.
- Dash - Framework for creating interactive web applications.
- Streamlit - Simplified framework for building data applications.
- OpenSearch Dashboards - A powerful data visualization and dashboarding tool for OpenSearch data, forked from Kibana.
- GridStack.js - A library for building draggable, resizable responsive dashboard layouts.
-
Software
- ChartBlocks - Online chart creation platform.
- Redash - Tool for visualizing and sharing data insights.
- Metabase - The simplest way to get analytics and business intelligence for everyone in your company.
- Metabase - User-friendly open-source BI tool.
- Grafana - Dashboarding and monitoring tool.
- Infogram - Tool for creating infographics and visual content.
- Google Data Studio - Free tool for creating interactive dashboards and reports (now Looker Studio).
- Microsoft Power BI - Business analytics tool for visualizing data.
- QlikView - Tool for data visualization and business intelligence.
- Preset - A platform for modern business intelligence, providing a hosted version of Apache Superset.
- Rath - Next-generation automated data exploratory analysis and visualization platform.
- Kibana - The official visualization and dashboarding tool for the Elastic Stack (Elasticsearch, Logstash, Beats).
-
-
Time Series Analysis
-
Tools
- Uber Orbit - A Python package for Bayesian time series forecasting and inference.
- sktime - A unified Python framework for machine learning with time series, compatible with scikit-learn.
- GluonTS - A Python toolkit for probabilistic time series modeling, originally built on MXNet and now also supporting PyTorch.
- Time-Series-Library - A library for deep learning-based time series analysis and forecasting.
- TimesFM - A pretrained time series foundation model from Google Research for zero-shot forecasting.
- PyTorch Forecasting - A PyTorch-based library for time series forecasting with neural networks.
- Time-series-prediction - A collection of time series prediction methods and implementations.
- PlotJuggler - A tool to visualize and analyze time series data logs in real-time.
- TSFresh - Automatically extracting features from time series data.
- pmdarima - Python library for ARIMA modeling and time series analysis.
- Kats - Toolkit for analyzing time series data from Facebook Research.
- Facebook Prophet - A procedure for forecasting time series data based on an additive model.
-
Resources
- Awesome Time Series - A curated list of resources dedicated to time series analysis and forecasting.
- Forecasting: Principles and Practice - Comprehensive textbook on forecasting methods with practical examples.
- NIST/SEMATECH e-Handbook - Official time series analysis guide from NIST.
- Awesome Time Series Anomaly Detection - A curated list of tools, datasets, and papers dedicated to time series anomaly detection.
- Awesome Time Series in Python - A comprehensive list of Python tools and libraries for time series analysis.
-
-
Data Engineering
-
Tools
- Apache Spark - A unified engine for large-scale data processing and analytics.
- Apache Iceberg - A high-performance table format for huge analytic datasets.
- Apache Cassandra - A highly scalable distributed NoSQL database designed for handling large amounts of data across many commodity servers.
- Apache Beam - A unified model for defining both batch and streaming data-parallel processing pipelines.
- Apache Pulsar - A cloud-native, distributed messaging and streaming platform.
- Prefect - Workflow orchestration for building resilient data pipelines.
- Kestra - An open-source, event-driven orchestrator that simplifies data workflow management.
- Apache Flink - A framework for stateful computations over unbounded and bounded data streams (real-time stream processing).
- Apache Kafka - A distributed event streaming platform for building real-time data pipelines.
- Dagster - A data orchestrator for machine learning, analytics, and ETL.
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
- Apache Hive - Data warehouse software for reading, writing, and managing large datasets in distributed storage using SQL.
- Apache Hadoop - A framework that allows for the distributed processing of large data sets across clusters of computers.
- Trino - A distributed SQL query engine designed for fast analytic queries against large datasets.
- DataHub - A metadata platform for the modern data stack.
- OpenLineage - An open framework for collection and analysis of data lineage.
- Kedro - A framework for creating reproducible, maintainable and modular data science code.
- Apache Calcite - A dynamic data management framework that allows for SQL parsing, optimization, and federation.
- Apache Arrow - Universal columnar format and multi-language toolbox for fast data interchange.
- dbt-core - A framework for transforming data in your warehouse using SQL and Jinja.
- Luigi - A Python module for building complex and batch-oriented data pipelines.
- Delta Lake - A storage layer that brings ACID transactions to Apache Spark and big data workloads.
- Apache Hudi - An open data lakehouse platform, built on a high-performance open table format.
-
Resources
- Data Engineer Handbook - A comprehensive guide covering fundamental and advanced data engineering concepts.
- Data Engineering Zoomcamp - Free course on data engineering fundamentals.
- Awesome Data Engineering - A curated list of data engineering tools, software, and resources.
- Data Engineering Cookbook - Techniques and strategies for building reliable data platforms.
- Awesome Pipeline - A curated list of pipeline toolkits for data processing and workflow management.
- Awesome DB Tools - A curated list of awesome database tools.
-
-
MLOps
-
Resources
- MLOps Zoomcamp - A free course focused on the practical aspects of deploying and maintaining ML systems.
- Awesome MLOps (visenger) - A curated list of references for MLOps.
- Awesome LLMOps - A curated list of the best LLMOps tools for developers.
- ML Engineering Guide - A practical guide to machine learning engineering and MLOps best practices.
- LLM Zoomcamp - A course dedicated to Large Language Models, their architecture and applications.
- Awesome MLOps (kelvins) - A curated list of awesome MLOps tools.
- Awesome Production Machine Learning - A curated list of tools for deploying, monitoring, and maintaining ML systems in production.
- Llama Cookbook - Official recipes and examples for working with Llama models.
-
Tools
- Netflix Metaflow - A human-friendly Python library for helping scientists and engineers build and manage real-life data science projects.
- DVC - Version control system for machine learning projects.
- Evidently - Tool for analyzing and monitoring data and model drift.
- Deepchecks - Validation for ML models and data.
- netdata - Real-time performance monitoring.
- Sematic - Tool to build, debug, and execute ML pipelines with native Python.
- vLLM - High-throughput and memory-efficient inference library for LLMs.
- ColossalAI - High-performance distributed training framework.
- haystack - LLM framework for building search and question answering systems.
- mindsdb - Platform for integrating AI into databases and applications.
- KServe - Standardized serverless inference platform for deploying and serving machine learning models on Kubernetes.
- SQLFlow - Brings machine learning capabilities to SQL, enabling model training and prediction using SQL syntax.
- Jina AI Serve - Framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets.
- LiteLLM - Unified interface to call all LLM APIs (OpenAI, Anthropic, Cohere, etc.) with consistent output formatting.
- meilisearch - Fast, open-source search engine.
- Kubeflow - Machine learning toolkit for Kubernetes.
- Seldon Core - Open source platform for deploying and monitoring machine learning models in production.
- Feast - A feature store for machine learning that manages and serves ML features to models.
- BentoML - Framework for building, shipping, and scaling ML applications.
- MLflow - Open-source platform for the complete machine learning lifecycle.
- Wandb - Tool for experiment tracking, dataset versioning, and model management.
- Comet ML - ML platform for tracking, comparing and optimizing experiments.
-
-
Cheatsheets
-
Linux & Git
- Linux Bash Commands - Comprehensive list of Linux/Bash commands for developers and sysadmins.
- Bash Awesome Cheatsheets - Bash scripting essentials.
- Unix Commands Reference - Unix terminal basics.
- Git and Git Flow Cheat Sheet - Branching strategies.
- Linux Cheatsheet - Linux commands and shortcuts.
- GitHub Cheat Sheet - Git/GitHub workflows and tips.
- Git Awesome Cheatsheets - Git commands and best practices.
-
Probability & Statistics
- Stanford CME 106 Cheatsheets - Probability and statistics for engineers.
- 10-Page Probability Cheatsheet - In-depth probability concepts.
- Statistics Cheatsheet - Key statistical methods.
-
Data Science & Machine Learning
- Machine Learning Cheat Sheet - Concise machine learning cheat sheets covering key concepts and equations.
- DS Cheatsheets - List of Data Science Cheatsheets.
- DS Notes & Cheatsheets - Cheatsheets for data science, ML, computer science and more.
- Data Science Cheat Sheets (Math) - Cheat sheets for quick reference in data science mathematics.
- Pandas Cheat Sheet - Data manipulation with Pandas.
- PySpark Cheatsheet - Common PySpark patterns.
-
Miscellaneous
- Docker Awesome Cheatsheets - Containerization basics.
- CheatSheet for CheatSheets - Mega-repository of cheat sheets.
- Matplotlib Cheatsheets - Official cheatsheets for the Matplotlib plotting library in Python.
- VSCode Awesome Cheatsheets - VS Code shortcuts.
- Markdown Cheatsheet - Formatting for GitHub READMEs.
- Emoji Cheat Sheet - Emojis in Markdown.
- Dataquest - Power BI Cheat Sheet - A helpful resource for Power BI users.
- Data Structures Cheat Sheet - A concise reference for common data structures and their properties.
- Docker Cheat Sheet - Docker commands and workflows.
-
GoalKicker Programming Notes
- MongoDB Notes for Professionals - A practical guide to working with NoSQL and MongoDB for modern application development.
- Bash Notes for Professionals - A comprehensive guide to shell scripting and command-line mastery.
- Oracle Database Notes for Professionals - A guide to Oracle Database concepts, PL/SQL, and administration tasks.
- Git Notes for Professionals - Everything you need to know about version control with Git, from basics to advanced workflows.
- Linux Notes for Professionals - A deep dive into Linux system administration, commands, and environment management.
- Microsoft SQL Server Notes for Professionals - A detailed reference for developing and administering MS SQL Server databases.
- PowerShell Notes for Professionals - A guide to task automation and configuration management using PowerShell.
- Python Notes for Professionals - A massive collection of Python concepts, idioms, and best practices for all levels.
- SQL Notes for Professionals - A definitive guide to SQL syntax, queries, and database interaction concepts.
- PostgreSQL Notes for Professionals - A professional compendium of knowledge for PostgreSQL administration and development.
- MySQL Notes for Professionals - Essential reference material for working with the MySQL database management system.
-
SQL & Databases
- PostgreSQL Cheatsheet - A handy reference for the most common PostgreSQL psql commands and queries.
- Quick SQL Cheatsheet - Handy SQL reference guide.
-
Python
- Learn Python - Interactive Python learning.
- Comprehensive Python Cheatsheet - Detailed Python functions and libraries.
- Python Cheatsheet - A comprehensive cheatsheet for the Python programming language.
- Pysheeet - Concise Python cheat sheet for quick reference and interview prep.
- Python Cheat Sheet - Comprehensive Python syntax and examples.
- Pythoncheatsheet - Quick reference for Python basics and advanced topics.
-
-
🕸️ Web Scraping & Crawling
-
Tools
- Ferret - A web scraping system that lets you declaratively describe what data to extract using a simple query language.
- Grab - A Python framework for building web scraping apps, providing a high-level API for asynchronous requests.
- Playwright - Python version of the Playwright browser automation library.
- PyQuery - A jQuery-like library for parsing HTML documents in Python.
- Helium - High-level Selenium wrapper for easier web automation.
- Scrapling - A framework for building web scrapers and crawlers.
- Selenium - A tool for automating web applications for testing purposes.
- Dirsearch - A web path scanner.
- Requests - A simple, yet elegant, HTTP library for Python.
- BeautifulSoup - A library for parsing HTML and XML documents.
- Feedparser - A library to parse feeds in Python.
- Trafilatura - A Python & command-line tool to gather text and metadata on the web.
- You-Get - A tiny command-line utility to download media contents (videos, audios, images) from the web.
- Snscrape - A social networking service scraper in Python.
- Crawl4AI - Advanced web crawling framework designed for AI and data extraction tasks.
- Browser Use - A library for browser automation and web scraping.
- Gerapy - Distributed Crawler Management Framework based on Scrapy, Scrapyd, Django, and Vue.js.
- AutoScraper - A smart, automatic, fast, and lightweight web scraper for Python.
- MechanicalSoup - A Python library for automating interaction with websites.
- ScrapeGraph AI - A Python scraper based on AI.
-
Resources
- Easy Scraping Tutorial - Simple but useful Python web scraping tutorial code.
- Trump Lies - Tutorial for web scraping in Python with Beautiful Soup.
- Scraper Projects - List of mini projects that involve web scraping.
- Best of Web Python - A ranked list of awesome Python libraries for web development.
- Awesome Web Scraping - List of libraries, tools, and APIs for web scraping and data processing.
- Python Scraping - Code samples from the book "Web Scraping with Python".
- Scraping Tutorial - Tutorial for scraping streaming sites.
- Webscraping from 0 to Hero - An open project repository sharing knowledge and experiences about web scraping with Python.
-
-
🗺️ Roadmaps
- Data Analyst RoadMap - Comprehensive roadmap for aspiring data analysts.
- Roadmap for Data Science - Structured roadmap for aspiring data scientists.
- Data Analyst Roadmap for Professionals - 8-week program for analysts at all levels.
- Data Science Roadmap Tutorials - Tutorials for the data science roadmap.
- Data Analyst Roadmap from Zero - Guide to becoming a data analyst from scratch.
- Data Science Roadmap from A to Z - Comprehensive roadmap for data science.
- Roadmap To Learn Data Science - A comprehensive and updated roadmap for learning data science with modern tools and technologies.
- 66DaysOfData - 66-day data analytics learning challenge.
- Data Analyst Roadmap - Structured learning path for analysts.
-
Dashboards
-
Resources
- GeeksforGeeks - Power BI Tutorial - Detailed tutorial on Power BI.
- Plotly Dash Tutorial - Tutorial for learning Plotly Dash.
- GeeksforGeeks - Tableau Tutorial - Comprehensive tutorial on Tableau.
- DashTools - Command line tools for Dash applications.
-
-
🔢 Mathematics, Statistics & Probability
-
Mathematics
- ML foundations - Focus on calculus and optimization techniques for ML.
- Khan Academy - Math for Data Science - Free online courses covering various math topics.
- Towards Data Science - Math Section - Articles and resources on mathematics for data science.
-
-
🧪 A/B Testing
-
Tools
- Experimentguide - A practical guide to A/B testing and experimentation from industry leaders.
- Google's A/B Testing Course - A free Udacity course covering the fundamentals of A/B testing.
- DynamicYield A/B Testing - An online course covering advanced testing and optimization techniques.
- Evan's Awesome A/B Tools - A/B test calculators.
- So You Think You Can Test? - Experience the challenges of A/B testing through this educational simulation.
-
-
Additional Resources and Tools
-
Miscellaneous
- A collective list of free APIs - A comprehensive list of free APIs for various purposes.
- arXiv.org - A free distribution service and open-access archive for scholarly articles.
- Elicit - An AI research assistant that helps automate parts of literature review.
- 500+ AI/ML/DL/NLP Projects - A massive collection of AI and machine learning projects with code for learning and portfolios.
- Full Stack Fastapi Template - Full-stack template with FastAPI, React, and PostgreSQL.
- Kittl - Platform for creating and editing charts and data visualizations.
- Zasper - High-performance IDE for Jupyter Notebooks.
- Sketch - Toolkit designed for designers, focusing on their workflow.
- CS Video Courses - Curated list of free university computer science video courses.
- Build Your Own X - Tutorials on how to build your own technology from scratch.
- What Happens When - Technical explanation of what happens when you type a URL and press Enter.
- OSSU Computer Science - Path to a free, self-taught education in computer science.
- UC Berkeley - Data 8 - Course materials for the Data Science Foundations course.
- PaddleOCR - Production-ready OCR toolkit with multilingual and document AI support.
- Growth.Design - A collection of product case studies and behavioral psychology insights for data-driven decision-making.
- Markdown Badges - Collection of badges for GitHub profiles and Markdown files.
-