awesome-data-analysis
400+ curated resources for data analysis and data science: Python, SQL, ML, Visualization, Dashboards, Cheatsheets, Roadmaps, and Interview Prep. Perfect for beginners and pros!
https://github.com/pavelgrigoryevds/awesome-data-analysis
Roadmaps
- Roadmap To Learn Data Science - A comprehensive and updated roadmap for learning data science with modern tools and technologies.
- Data Analyst RoadMap - Comprehensive roadmap for aspiring data analysts.
- 66DaysOfData - 66-day data analytics learning challenge.
- Data Analyst Roadmap from Zero - Guide to becoming a data analyst from scratch.
- Data Analyst Roadmap for Professionals - 8-week program for analysts at all levels.
- Data Analyst Roadmap - Structured learning path for analysts.
- Data Science Roadmap Tutorials - Tutorials for the data science roadmap.
- Data Science Roadmap from A to Z - Comprehensive roadmap for data science.
- Roadmap for Data Science - Structured roadmap for aspiring data scientists.
Python
Resources
- Awesome Python Data Science - A curated list of Python resources for data science.
- W3Schools Python - A beginner-friendly tutorial and reference for the Python programming language.
- Awesome Python - An opinionated list of awesome Python frameworks, libraries, software, and resources.
- 30 Days Of Python - A 30-day programming challenge to learn the Python programming language.
- Real Python Tutorials - Tutorials on Python from Real Python.
- Data Science Python - Common data analysis and machine learning tasks using Python.
- Python Data Science Handbook - Full text of the "Python Data Science Handbook" in Jupyter Notebooks.
- Python for Algorithms & Interviews - Files for Udemy course on algorithms and data structures.
- Tanu N Prabhu Python - This repository helps you understand Python from scratch.
- Interactive Coding Challenges - 120+ interactive Python coding interview challenges.
- Clean Code Python - Clean Code concepts adapted for Python.
- Awesome Python Applications - Free software that works great, and also happens to be open-source Python.
- List of Python API Wrappers - List of Python API wrappers and libraries.
- Awesome Time Series in Python - Curated list of Python packages for time series analysis.
Useful Python Tools for Data Analysis
- Great Expectations - Data validation and testing.
- Fitter - Figures out the distribution your data comes from.
- Sklearn Pandas - Bridge between Pandas and Scikit-learn.
- CuPy - A NumPy-compatible array library accelerated by NVIDIA CUDA for high-performance computing.
- Numba - A JIT compiler that translates a subset of Python and NumPy code into fast machine code.
- Pandas Stubs - Type stubs for pandas, improves IDE autocompletion.
- Pydantic - Data validation using Python type annotations.
- Category Encoders - Extensive collection of categorical variable encoders.
- Imbalanced Learn - Handling imbalanced datasets.
- PySAL - Spatial analysis functions.
- ImageIO - A library that provides an easy interface to read and write a wide range of image data.
- Texthero - Text preprocessing, representation and visualization.
- Geopandas - Geographic data operations with pandas.
- NetworkX - Network analysis and graph theory.
- Pandas DQ - Data type correction and automatic DataFrame cleaning.
- Vaex - High-performance Python library for lazy Out-of-Core DataFrames.
- DataCleaner - Python tool for automatically cleaning and preparing datasets.
- TheFuzz - Fuzzy string matching (Levenshtein distance).
- PandasAI - Conversational data analysis using LLMs and RAG.
- DateUtil - Extensions for standard Python datetime features.
- Fugue - Unified interface for Pandas, Spark, and Dask.
- Pandas DataReader - Reads data from various online sources into pandas DataFrames.
- Polars - Multithreaded, vectorized query engine for DataFrames.
- Arrow - Better dates and times for Python.
- Pendulum - Alternative to datetime with timezone support.
- AutoViz - Automatic data visualization in 1 line of code.
- Datashader - Quickly and accurately render even the largest data.
- Vizro - Low-code toolkit for building data visualization apps.
- Great Tables - Create awesome display tables using Python.
- DataMapPlot - Create beautiful plots of data maps.
- Sweetviz - Automatic EDA with dataset comparison.
- Lux - Automatic DataFrame visualization in Jupyter.
- Yellowbrick - Visual diagnostic tools for machine learning.
- PyOD - Outlier and anomaly detection.
- YData Profiling - Data quality profiling & exploratory data analysis.
- Missingno - Visualize missing data patterns.
- Alibi Detect - Outlier, adversarial and drift detection.
- Dora - Automate EDA: preprocessing, feature engineering, visualization.
- FeatureTools - Automated feature engineering.
- Feature Selector - Tool for dimensionality reduction of machine learning datasets.
- TSFresh - A Python library for automatically extracting features from time series data.
- Prince - Multivariate exploratory data analysis (PCA, CA, MCA).
- Factor Analyzer - A Python package for factor analysis, including exploratory and confirmatory methods.
- Pytest - Framework for writing small tests.
- Feature Engine - Feature engineering with Scikit-Learn compatibility.
- Cerberus - Data validation through schemas.
- Pandera - Data validation through declarative schemas.
- PandasVet - Code style validator for Pandas (similar to ESLint).
- Prefect - Workflow orchestration for building resilient data pipelines.
- Airflow - Platform for automating data workflows.
- Apache Arrow - Universal columnar format and multi-language toolbox for fast data interchange.
- Petl - ETL tool for data cleaning and transformation.
- D-Tale - Interactive GUI for data analysis in a browser.
- DuckDB - In-memory analytical database for fast SQL queries.
- Pandasgui - GUI for viewing and filtering DataFrames.
- QGrid - Interactive grid for DataFrames in Jupyter.
- PyGWalker - Interactive UIs for visual analysis of DataFrames.
- Pivottablejs - Interactive PivotTable.js tables in Jupyter.
- Faker - Generates fake data for testing.
- Mimesis - Generates realistic test data.
- Rich - Rich text and beautiful formatting in the terminal.
- Pandas-log - Logs pandas operations for data transformation tracking.
- Icecream - Debugging without using print.
- Pydeps - Python module dependency graphs.
- PyForest - Automated Python imports for data science.
- Pandarallel - Parallel operations for pandas DataFrames.
- Dask - Parallel computing for arrays and DataFrames.
- Modin - Speeds up Pandas by distributing computations.
- Sphinx - The Sphinx documentation generator.
- Pdoc - API documentation for Python projects.
- Mkdocs - Project documentation with Markdown.
- OpenPyXL - Read/write Excel files with support for advanced features.
- Tablib - Exports data to XLSX, JSON, CSV via a single API.
- PyPDF2 - Reads and writes PDF files.
- Python-docx - Reads and writes Word documents.
- CleverCSV - Smart CSV reader for messy data.
- Xlwings - Integration of Python with Excel.
- Xmltodict - Converts XML to Python dictionaries.
- Python-markdownify - Convert HTML to Markdown.
- MarkItDown - Python tool for converting files and office documents to Markdown.
- Pillow - Image processing library.
- Ftfy - Fixes broken Unicode strings.
- JMESPath - Queries JSON data (SQL-like for JSON).
- Glom - Transforms nested data structures.
- Pampy - Pattern matching for Python dictionaries.
- Geopy - Geocoding addresses and calculating distances.
- Diagrams - Diagrams as code for cloud system architecture prototyping.
- Scattertext - Beautiful visualizations of language differences among document types.
- Pygorithm - A Python module for learning all major algorithms.
- IGraph - A library for creating and manipulating graphs and networks, with bindings for multiple languages.
- Joblib - A lightweight pipelining library for Python, particularly useful for saving and loading large NumPy arrays.
- Dataset - JSON-like interface for working with SQL databases.
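TheFuzz, listed above, scores how alike two strings are using Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below computes that distance with the classic dynamic-programming recurrence in plain Python. It is not TheFuzz's implementation or API; `levenshtein` and `similarity` are hypothetical names, and the 0-100 normalization in `similarity` is an assumption that will not necessarily match TheFuzz's scores.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning `a` into `b`."""
    # Keep only the previous DP row, so memory is O(len(b)).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical 0-100 score: 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
```

The two-row layout keeps the example short; a full matrix would be needed only if you also wanted to recover the actual edit sequence.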
Data Manipulation with Pandas and NumPy
- From Python to Numpy - An open-access book on vectorization and efficient numerical computing with NumPy.
- NumPy 100 Exercises - A collection of 100 exercises to master the NumPy library for scientific computing.
- Awesome Pandas - A curated list of resources for using the Pandas library.
- 100 data puzzles for pandas - A collection of data puzzles to practice your Pandas skills.
- Pandas Tutor - Visualize Pandas operations step-by-step (perfect for beginners).
- Pandas Exercises - Exercises designed to help you improve your Pandas skills.
- Pandas Cookbook - A cookbook with various recipes for using Pandas effectively.
- Hands-On Data Analysis with Pandas - Materials for following along with Hands-On Data Analysis with Pandas.
- Effective Pandas - A series focused on writing effective and idiomatic Pandas code.
SQL & Databases
Tools
- SQLiteviz - A tool for exploring SQLite databases and visualizing the results of your queries.
- SQLite - A C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine.
- DB Browser for SQLite - A high quality, visual, open source tool to create, design, and edit database files compatible with SQLite.
- DBeaver - A free universal database tool and SQL client for developers, SQL programmers, and administrators.
- Beekeeper Studio - A modern, easy-to-use SQL client and database manager with a clean, cross-platform interface.
- SQLFluff - A modular SQL linter and auto-formatter designed to enforce consistent style and catch errors in SQL code.
- PyMySQL - A pure-Python MySQL client library for interacting with MySQL databases from Python applications.
- Vanna.AI - An AI-powered tool for generating SQL queries from natural language questions.
- SQLChat - A chat-based SQL client that allows you to query databases using natural language conversations.
- SQLGlot - A no-dependency SQL parser, transpiler, and optimizer for Python.
- TDengine - An open-source big data platform designed for time-series data, IoT, and industrial monitoring.
- TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries.
- DuckDB - In-memory analytical database for fast SQL queries.
- Dataset - JSON-like interface for working with SQL databases.
- PyODBC - Python library for ODBC database access.
- SQLAlchemy - SQL toolkit and ORM for Python.
- Psycopg2 - PostgreSQL database adapter.
- MySQL Connector/Python - MySQL driver for Python.
- PonyORM - ORM for Python with dynamic query generation.
- PyMongo - Official MongoDB driver for Python.
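Several tools above (SQLite, DuckDB, SQLAlchemy) let you run SQL from Python. As a minimal sketch that needs no extra install, Python's standard-library `sqlite3` module can open an in-memory SQLite database and run an aggregate query. The `sales` table and its rows are made-up example data.

```python
import sqlite3

# ":memory:" creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Hypothetical example rows; "?" placeholders keep values out of the SQL text.
rows = [("north", 120.0), ("south", 80.0), ("north", 30.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Total per region, largest first.
totals = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

print(totals)  # [('north', 150.0), ('south', 80.0)]
```

Moving to another engine from the list mostly changes the connection code; SQL dialects still differ in the details.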
Resources
- SQLZoo - SQL Tutorial - Interactive SQL tutorial.
- SQL Bolt - Learn SQL - Learn SQL through interactive lessons.
- SQL Tutorial - Comprehensive SQL tutorial resource.
- SQL Tutorial by W3Schools - Comprehensive SQL tutorial.
- PostgreSQL Tutorial by W3Resource - Tutorial for PostgreSQL.
- MongoDB Tutorial by W3Resource - Tutorial for MongoDB.
- InterviewBit - SQL Interview Questions - Collection of SQL interview questions.
- MySQL Tutorial by W3Resource - Tutorial for MySQL.
- EverSQL - AI-powered SQL query optimization and database observability tool.
- GeeksforGeeks - SQL Tutorial - Detailed SQL tutorial.
- Awesome Postgres - A curated list of awesome PostgreSQL software, libraries, tools and resources.
- Awesome MySQL - A curated list of awesome MySQL software, libraries, tools and resources.
- Awesome ClickHouse - A curated list of awesome ClickHouse software.
- Awesome SQLAlchemy - A curated list of awesome tools for SQLAlchemy.
- Awesome SQL - List of tools and techniques for working with relational databases.
Data Visualization
Resources
- Awesome DataViz - A curated list of awesome data visualization libraries, tools, and resources.
- Cedric Scherer's DataViz Resources - A collection of top data visualization resources and inspiration.
- Information is Beautiful - A site dedicated to visualizations that make complex ideas clear and engaging.
- Plottie - A vast library of scientific plots for visualization inspiration and ideas.
- Data Viz Project - A resource for selecting suitable visualizations.
- Chartopedia - A guide to help you select the appropriate chart types.
- Visualization Curriculum - Interactive notebooks designed to teach data visualization concepts.
- The Python Graph Gallery - A collection of Python graph examples for data visualization.
- FlowingData - Insights on data analysis and visualization.
- Data Visualization Catalogue - A comprehensive catalog of data visualization types.
- From Data to Viz - A guide to choosing the right visualization based on your data.
- DataForVisualization - Tutorials and insights on data visualization techniques.
- Truth & Beauty - Exploration of the aesthetics of data visualization.
Tools
- Altair - A declarative statistical visualization library for Python.
- Glumpy - A Python library for scientific visualization that is fast, scalable and beautiful, based on OpenGL.
- Pandas-bokeh - Bokeh plotting backend for Pandas.
- Deck.gl - A WebGL-powered framework for visual exploratory data analysis of large datasets.
- Python for Geo - Contextily: add background basemaps to your plots in GeoPandas.
- OSMnx - A package to easily download, model, analyze, and visualize street networks from OpenStreetMap.
- Apache ECharts - A powerful, interactive charting and visualization library for browser-based applications.
- VisPy - A high-performance interactive 2D/3D data visualization library leveraging the power of OpenGL.
- Matplotlib - A comprehensive library for creating static, animated, and interactive visualizations in Python.
- Seaborn - A statistical data visualization library based on Matplotlib.
- Plotly - A library for creating interactive plots and dashboards.
- Bokeh - A library for creating interactive visualizations for modern web browsers.
- HoloViews - A tool for building complex visualizations easily.
- Geopandas - An extension of Pandas for geospatial data.
- Folium - A library for visualizing data on interactive maps.
- Plotnine - A grammar of graphics for Python.
- Bqplot - A plotting library for IPython/Jupyter notebooks.
- PyPalettes - A collection of 2,500+ color maps for Python.
Dashboards & BI
Resources
- Awesome Dash - Comprehensive resources for Dash users.
- Awesome Streamlit - Curated list of Streamlit resources and components.
- Awesome Dashboards - A collection of outstanding dashboard and visualization resources.
- Best of Streamlit - Showcase of community-built Streamlit applications.
- Awesome Panel - Resources and support for Panel users.
- Dash Enterprise Samples - Production-ready Dash apps.
Tools
- OpenSearch Dashboards - A powerful data visualization and dashboarding tool for OpenSearch data, forked from Kibana.
- GridStack.js - A library for building draggable, resizable responsive dashboard layouts.
- Tremor - A React library to build dashboards fast with pre-built components for charts, KPIs, and more.
- Appsmith - An open-source platform to build and deploy internal tools, admin panels, and CRUD apps quickly.
- Grafanalib - A Python library for generating Grafana dashboards configuration as code.
- H2O Wave - A Python framework for rapidly building and deploying realtime web apps and dashboards for AI and analytics.
- Shiny for Python - Python version of the popular R Shiny framework.
- Voilà - Turn Jupyter notebooks into standalone web applications.
- Reflex - Full-stack Python framework for building web apps.
- Dash - Framework for creating interactive web applications.
- Streamlit - Simplified framework for building data applications.
- Panel - Framework for creating interactive web applications.
- Gradio - Tool for creating and sharing machine learning applications.
Software
- Preset - A platform for modern business intelligence, providing a hosted version of Apache Superset.
- Kibana - The official visualization and dashboarding tool for the Elastic Stack (Elasticsearch, Logstash, Beats).
- Rath - Next-generation automated data exploratory analysis and visualization platform.
- Microsoft Power BI - Business analytics tool for visualizing data.
- QlikView - Tool for data visualization and business intelligence.
- Redash - Tool for visualizing and sharing data insights.
Web Scraping & Crawling
Tools
- Ferret - A web scraping system that lets you declaratively describe what data to extract using a simple query language.
- Grab - A Python framework for building web scraping apps, providing a high-level API for asynchronous requests.
- Playwright - Python version of the Playwright browser automation library.
- PyQuery - A jQuery-like library for parsing HTML documents in Python.
- Helium - High-level Selenium wrapper for easier web automation.
- BeautifulSoup - A library for parsing HTML and XML documents.
- Selenium - A tool for automating web applications for testing purposes.
- Gerapy - Distributed Crawler Management Framework based on Scrapy, Scrapyd, Django, and Vue.js.
- TextAttack - A Python framework for adversarial attacks, data augmentation, and model training in NLP.
- AutoScraper - A smart, automatic, fast, and lightweight web scraper for Python.
- Feedparser - A library to parse feeds in Python.
- Trafilatura - A Python & command-line tool to gather text and metadata on the web.
- You-Get - A tiny command-line utility to download media contents (videos, audios, images) from the web.
- Dirsearch - A web path scanner.
- MechanicalSoup - A Python library for automating interaction with websites.
- ScrapeGraph AI - A Python scraper based on AI.
- Snscrape - A social networking service scraper in Python.
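BeautifulSoup and PyQuery, listed above, parse HTML so you can pull out elements such as links. To show the basic idea without third-party packages, this sketch swaps in the standard library's `html.parser.HTMLParser` and collects `(href, link text)` pairs from an HTML string. The `LinkCollector` class and the sample HTML are hypothetical, and nested `<a>` tags are not handled.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only buffer text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


html = """
<ul>
  <li><a href="https://example.com/a">First</a></li>
  <li><a href="/b">Second <b>item</b></a></li>
  <li><a name="anchor-only">No href</a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(html)
parser.close()
print(parser.links)
# [('https://example.com/a', 'First'), ('/b', 'Second item')]
```

The dedicated libraries add what this sketch lacks: a full document tree, CSS selectors, and tolerance for badly broken markup.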
Resources
- Python Scraping - Code samples from the book "Web Scraping with Python".
- Awesome Web Scraping - List of libraries, tools, and APIs for web scraping and data processing.
- Easy Scraping Tutorial - Simple but useful Python web scraping tutorial code.
- Webscraping from 0 to Hero - An open project repository sharing knowledge and experiences about web scraping with Python.
- Best of Web Python - A ranked list of awesome Python libraries for web development.
- Trump Lies - Tutorial for web scraping in Python with Beautiful Soup.
- Scraping Tutorial - Tutorial for scraping streaming sites.
- Scraper Projects - List of mini projects that involve web scraping.
Mathematics
Tools
- Awesome Math - A curated list of mathematics resources, books, and online courses.
- 3Blue1Brown - Visual explanations of mathematical concepts through animated videos.
- MML Book - Comprehensive resource for mathematics in machine learning.
Awesome Data Science Repositories
- OSSU Data Science - Open Source Society University's self-study path.
- Data Science Best Resources - Carefully curated links for data science resources in one place.
- Awesome Data Science - A curated list of courses, books, tools, and resources for data science.
- Data Science for Beginners - Microsoft's data science curriculum.
- Data Science Articles from CodeCut - A collection of articles, videos, and code related to data science.
- Data Science Using Python - Resources for data analysis using Python.
Dashboards
Resources
- Awesome Dashboards - A collection of outstanding dashboard and visualization resources.
- Best of Streamlit - Showcase of community-built Streamlit applications.
- Awesome Dash - Comprehensive resources for Dash users.
- Awesome Panel - Resources and support for Panel users.
- Dash Enterprise Samples - Production-ready Dash apps.
- Plotly Dash Tutorial - Tutorial for learning Plotly Dash.
- GeeksforGeeks - Tableau Tutorial - Comprehensive tutorial on Tableau.
- GeeksforGeeks - Power BI Tutorial - Detailed tutorial on Power BI.
- DashTools - Command line tools for Dash applications.
Software
- Microsoft Power BI - Business analytics tool for visualizing data.
- QlikView - Tool for data visualization and business intelligence.
- Redash - Tool for visualizing and sharing data insights.
- Rath - Next-generation automated data exploratory analysis and visualization platform.
Natural Language Processing (NLP)
Resources
- NLP in Python with Deep Learning - A resource for learning NLP with deep learning.
- Awesome NLP - A ranked list of awesome Python libraries for natural language processing (NLP).
- Hands on NLTK Tutorial - The hands-on NLTK tutorial for NLP in Python.
- NLTK Book - Natural Language Processing with Python.
Tools
- Natural Language Toolkit (NLTK) - A leading platform for building Python programs to work with human language data.
- TextBlob - A simple library for processing textual data.
- SpaCy - An open-source software library for advanced NLP in Python.
- TextRank - A library for TextRank algorithm implementation.
- Flair - A simple framework for state-of-the-art NLP.
- BERT - A transformer-based model for NLP tasks.
- Transformers - A library for state-of-the-art NLP models.
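The TextRank entry above refers to a graph-based ranking algorithm: words become nodes, words that co-occur within a small window are linked, and a PageRank-style iteration scores each node. Below is a minimal pure-Python sketch of the keyword-extraction variant. It assumes naive regex tokenization, a tiny hypothetical stopword list, a window of 2 (adjacent words only) and a damping factor of 0.85; `textrank_keywords` is a hypothetical name, not the listed library's API.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real pipelines use larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "on"}


def textrank_keywords(text: str, top_k: int = 5, window: int = 2,
                      damping: float = 0.85, iterations: int = 50) -> list:
    # Lowercase, keep alphabetic tokens, drop stopwords.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected co-occurrence graph: link words within `window` positions.
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)

    # PageRank-style power iteration; every node has at least one neighbor.
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                scores[nb] / len(graph[nb]) for nb in graph[node]
            )
            for node in graph
        }

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


text = ("Graph based ranking algorithms rank words in a graph. "
        "Keyword extraction with graph ranking finds important words.")
keywords = textrank_keywords(text, top_k=3)
print(keywords)
```

Fuller implementations typically also filter candidates by part of speech and merge adjacent top-ranked words into multi-word keyphrases.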
Mathematics, Statistics & Probability
Mathematics
- Stats Maths with Python - Collection of Python scripts and notebooks for statistics and mathematics.
- Hackermath - Resource for learning statistics and mathematics for data science.
- MML Book - Comprehensive resource for mathematics in machine learning.
- ML foundations - Focus on calculus and optimization techniques for ML.
- Khan Academy - Math for Data Science - Free online courses covering various math topics.
- Towards Data Science - Math Section - Articles and resources on mathematics for data science.
- Fast.ai - Computational Linear Algebra - Resource for learning linear algebra computationally.