An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/hazyresearch/meerkat

Creative interactive views of any dataset.

data-science foundation-models machine-learning ml pandas

Last synced: 15 May 2025

https://github.com/dswah/pyGAM

[HELP REQUESTED] Generalized Additive Models in Python

data-science gams interpretable-machine-learning machine-learning python scientific-computing

Last synced: 27 Mar 2025

https://github.com/thealgorithms/jupyter

This repository contains scripts and notebooks related to statistics, machine learning, neural networks, deep learning, NLP, numerical methods, and automation.

algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network

Last synced: 16 May 2025

https://github.com/HazyResearch/meerkat

Creative interactive views of any dataset.

data-science foundation-models machine-learning ml pandas

Last synced: 26 Mar 2025

https://github.com/opengeos/geoai

GeoAI: Artificial Intelligence for Geospatial Data

ai data-science geoai geopython geospatial jupyter python

Last synced: 26 Apr 2025

https://github.com/TheAlgorithms/Jupyter

This repository contains scripts and notebooks related to statistics, machine learning, neural networks, deep learning, NLP, numerical methods, and automation.

algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network

Last synced: 04 May 2025

https://github.com/aloctavodia/statistical-rethinking-with-python-and-pymc3

Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

bayesian-data-analysis data-science pymc python statistics

Last synced: 12 Apr 2025

https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3

Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

bayesian-data-analysis data-science pymc python statistics

Last synced: 27 Nov 2024

https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn

Last synced: 23 Jan 2025

https://github.com/janpfeifer/gonb

GoNB, a Go Notebook Kernel for Jupyter

data-science go golang gonb jupyter jupyter-notebook jupyter-notebook-kernel

Last synced: 14 May 2025

https://github.com/pymc-labs/pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.

btyd buy-till-you-die clv customer-lifetime-value data-science marketing marketing-mix-modeling media-mix-modeling mmm python

Last synced: 26 Mar 2025

https://github.com/enkidevs/curriculum

👩‍🏫 👨‍🏫 The open-source curriculum of Enki!

ai algorithms blockchain chatgpt computer-science css curriculum data-science education enki git gpt4 html java javascript learn-to-code linux python security sql

Last synced: 15 May 2025

https://github.com/mrankitgupta/data-analyst-roadmap

I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge

ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau

Last synced: 13 Apr 2025

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 24 Mar 2025

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we feed third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of Interest from OpenStreetMap c) Google Popular Times

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 30 Mar 2025

https://github.com/scrapinghub/python-crfsuite

A python binding for crfsuite

crf crfsuite data-science

Last synced: 14 May 2025

https://github.com/biomedsciai/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 14 May 2025

https://github.com/glue-viz/glue

Linked Data Visualizations Across Multiple Files

data-science linked-data python visualization

Last synced: 14 May 2025

https://github.com/pm4py/pm4py-source

Public repository for the PM4Py (Process Mining for Python) project.

data-mining data-science machine-learning processmining python

Last synced: 07 Jan 2025

https://github.com/ipython-books/cookbook-2nd-code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization

Last synced: 12 Apr 2025

https://github.com/target/matrixprofile-ts

A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

data-science matrix-profile motif motif-discovery pip pip3 pypi pypi-packages python python3 time-series timeseries-analysis timeseries-segmentation

Last synced: 26 Mar 2025

https://github.com/williamFalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter search for neural networks

caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow

Last synced: 27 Mar 2025

https://github.com/williamfalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter search for neural networks

caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow

Last synced: 04 Apr 2025

https://github.com/akfamily/aktools

AKTools is an elegant and simple HTTP API library for AKShare, built for AKSharers!

akshare asyncio data data-science fastapi openapi pydanti

Last synced: 14 May 2025

https://github.com/arvkevi/kneed

Knee point detection in Python 📈

data-analysis data-science elbow-method knee-point python scientific-computing systems

Last synced: 23 Mar 2025

https://github.com/iterative/mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day 🤞

cli data-science deployment developer-tools git machine-learning mlem model-registry python

Last synced: 26 Mar 2025

https://github.com/erikaduan/r_tips

A repository of R usage tips for data cleaning, data mining, data visualisation, statistical inference and machine learning

data-science data-visualization machine-learning r rstats statistics

Last synced: 04 Dec 2024

https://github.com/pdpipe/pdpipe

Easy pipelines for pandas DataFrames.

data data-science dataframe dataframes pandas pandas-dataframe pipeline

Last synced: 17 Apr 2025

https://github.com/ashishpatel26/amazing-feature-engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 16 May 2025

https://github.com/nicolaskruchten/jupyter_pivottablejs

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables

Last synced: 15 May 2025

https://github.com/ashishpatel26/Amazing-Feature-Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 10 Apr 2025

https://github.com/TrainingByPackt/Data-Science-Projects-with-Python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 14 Apr 2025

https://github.com/trainingbypackt/data-science-projects-with-python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 04 Apr 2025

https://github.com/litaotao/ipython-dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data science and data analysis practitioners. It aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 16 May 2025

https://github.com/litaotao/IPython-Dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data science and data analysis practitioners. It aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 07 Dec 2024

https://github.com/BiomedSciAI/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 27 Mar 2025

https://github.com/odpi/opends4all

OpenDS4All project, hosted by LF AI & Data

data-science jupyter-notebooks materials

Last synced: 23 Feb 2025

https://github.com/github/codespaces-jupyter

Explore machine learning and data science with Codespaces

codespaces data-science jupyter-notebook machine-learning

Last synced: 11 Apr 2025

https://github.com/faktionai/awesome-ai-usecases

A list of awesome and proven Artificial Intelligence use cases and applications

data-science machine-learning

Last synced: 14 Mar 2025

https://github.com/fastai/fastai2

Temporary home for fastai v2 while it's being developed

data-science deep-learning fastai jupyter machine-learning nbdev python pytorch

Last synced: 27 Nov 2024

https://github.com/aeturrell/coding-for-economists

This repository hosts the code behind the online book, Coding for Economists.

book data-science econometrics economics economics-models jupyter-notebook learning python research vscode

Last synced: 12 Apr 2025

https://github.com/Kotlin/kandy

Kotlin plotting library.

data-science graphics jupyter-notebooks kotlin plot

Last synced: 12 Apr 2025

https://github.com/blue-yonder/turbodbc

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup

Last synced: 14 May 2025

https://github.com/sforaidl/kd_lib

A PyTorch knowledge distillation library for benchmarking and extending work in the domains of knowledge distillation, pruning, and quantization.

algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization

Last synced: 16 May 2025

https://github.com/cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow

Last synced: 22 Jan 2025

https://github.com/rstojnic/lazydata

Lazydata: Scalable data dependencies for Python projects

data-science datamanagement machine-learning python

Last synced: 26 Mar 2025

https://github.com/squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 16 May 2025

https://github.com/farukalamai/advanced-machine-learning-engineer-roadmap-2024

A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing, model development, deployment, and maintenance.

aws computer-vision data-analysis data-science data-visualization deep-learning git-github machine-learning machine-learning-roadmap mlops natural-language-processing neural-network nlp opencv pandas python pytorch statistics tensorflow yolo

Last synced: 04 Apr 2025

https://github.com/erezsh/preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 16 May 2025

https://github.com/Squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 15 Mar 2025

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 07 Apr 2025

https://github.com/juliastats/glm.jl

Generalized linear models in Julia

data-science glm julia regression statistical-models statistics

Last synced: 14 May 2025

https://github.com/gesistsa/rio

🐟 A Swiss-Army Knife for Data I/O

cran csv csvy data data-science excel io r rio sas spss stata

Last synced: 14 May 2025

https://github.com/erezsh/Preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 26 Mar 2025

https://github.com/rnorm/book_sample

Another book on data science

book data-science python r

Last synced: 27 Nov 2024

https://github.com/fabsig/gpboost

Combining tree-boosting with Gaussian process and mixed effects models

artificial-intelligence boosting cpp data-science gaussian-processes machine-learning mixed-effects python r

Last synced: 14 May 2025

https://github.com/jadianes/data-science-your-way

Ways of doing Data Science Engineering and Machine Learning in R and Python

data-frame data-science data-science-engineering exploratory-data-analysis jupyter machine-learning notebook python r tutorial

Last synced: 04 Apr 2025