An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/winvector/data_algebra

Codd method-chained SQL generator and Pandas data processing in Python.

data-analysis data-science pandas python

Last synced: 09 Apr 2025

https://github.com/crowdcent/numerblox

Solid Numerai pipelines

data-science mlops numerai

Last synced: 14 Jan 2026

https://github.com/autoviml/pandas_dq

Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.

data data-science dataquality dataqualitycheck machine-learning pandas python scikit-learn

Last synced: 07 Apr 2025

https://github.com/codait/max-central-repo

Central Repository of Model Asset Exchange project. This repository contains information about the available models, current project status, contribution guidelines and supporting assets.

cloud codait data-science deep-learning ibm-developer kubernetes model-asset-exchange node-red-flow openshift trainable-models watson-machine-learning watson-st

Last synced: 06 May 2025

https://github.com/ieshreya/data-science-resources

Free self-taught educational resources for Data Science! I'm currently learning Data Science. I built this repository to help myself, but if it helps you in any way, feel free to star it!

computer-science data-science python resources

Last synced: 03 Oct 2025

https://github.com/diffusionkinetics/open

DiffusionKinetics open-source monorepo

data-science haskell

Last synced: 06 Apr 2025

https://github.com/outerbounds/dsbook

Code samples for the Effective Data Science Infrastructure book

data-science infrastructure machine-learning

Last synced: 22 Apr 2025

https://github.com/felixriese/susi

SuSi: Python package for unsupervised, supervised and semi-supervised self-organizing maps (SOM)

data-science machine-learning opensource pypi-package python self-organizing-map semi-supervised-learning som sphinx-doc supervised-learning unsupervised-learning

Last synced: 21 Feb 2026

https://github.com/lawmurray/birch

A probabilistic programming language that combines automatic differentiation, automatic marginalization, and automatic conditioning within Monte Carlo methods.

autodiff bayesian bayesian-inference bayesian-methods bayesian-statistics data-science machine-learning machine-learning-algorithms machine-learning-projects monte-carlo-methods monte-carlo-sampling probabilistic-programming-languages statistics

Last synced: 05 Apr 2025

https://github.com/lawmurray/Birch

A probabilistic programming language that combines automatic differentiation, automatic marginalization, and automatic conditioning within Monte Carlo methods.

autodiff bayesian bayesian-inference bayesian-methods bayesian-statistics data-science machine-learning machine-learning-algorithms machine-learning-projects monte-carlo-methods monte-carlo-sampling probabilistic-programming-languages statistics

Last synced: 26 Mar 2025

https://github.com/DataKitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake

Last synced: 05 May 2025

https://github.com/LankyCyril/pyvenn

Python module for plotting Venn diagrams of 2..6 sets

data-science matplotlib matplotlib-venn venn venn-diagram venndiagram visualization

Last synced: 08 May 2025

https://github.com/ColtAllen/btyd

Buy Till You Die and Customer Lifetime Value statistical models in Python.

bayesian buy-til-you-die customer-lifetime-value data-science python

Last synced: 06 May 2025

https://github.com/broadinstitute/depmap_omics

What you need to process the Quarterly DepMap-Omics releases from Terra

cancer-genomics cloud-computing data-science depmap

Last synced: 17 Mar 2026

https://github.com/tulip-lab/sit742

SIT742: Modern Data Science

data-science jupyter-notebook python tuliplab

Last synced: 21 Feb 2026

https://github.com/ujjwalkarn/xda

R package for exploratory data analysis

data-analysis data-science exploratory-data-analysis r

Last synced: 26 Apr 2025

https://github.com/mybridge/learn-machine-learning

Learn to Build a Machine Learning Application from Top Articles

computer-vision data-science deep-learning machine-learning neural-networks

Last synced: 27 Jan 2026

https://github.com/jovianhq/jovian-py

Collaboration platform for data science projects & Jupyter notebooks

data-science deep-learning jupyter-notebook machine-learning ml

Last synced: 20 Aug 2025

https://github.com/alan-turing-institute/environmental-ds-book

A computational notebook community for open environmental data science 🌎

climate-change community-project data-science ecosystem-modeling environmental-monitoring

Last synced: 02 Mar 2025

https://github.com/firmai/business-analytics-and-mathematics-python-book

Advanced Business Analytics and Mathematics with Python (by @firmai)

analytics business data-analysis data-science mathematics python

Last synced: 06 May 2025

https://github.com/JovianHQ/jovian-py

Collaboration platform for data science projects & Jupyter notebooks

data-science deep-learning jupyter-notebook machine-learning ml

Last synced: 28 Oct 2025

https://github.com/jayantgoel001/jayantgoel001

JayantGoel001's profile with 111 stars ⭐ and 110 forks 🎉.

android data-science devops git github mean-stack portfolio profile readme web-development

Last synced: 04 Apr 2025

https://github.com/datakitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake

Last synced: 06 Apr 2026

https://github.com/lsys/forestplot

A Python package to make publication-ready but customizable coefficient plots.

coefficientplot data-science data-visualization dataviz forestplot matplotlib python visualization

Last synced: 07 Apr 2025

https://github.com/weijie-chen/basic-statistics-with-python

Introduction to statistics featuring Python. This series of lecture notes aims to walk you through the basic concepts of statistics, such as descriptive statistics, parameter estimation, hypothesis testing, and ANOVA. All code is straightforward to understand.

anova-analysis data-science descriptive-statistics econometrics frequentist-statistics hypothesis-testing mathematics parameter-estimation probability

Last synced: 14 Aug 2025

https://github.com/imsanjoykb/data-science-regular-bootcamp

Regular practice in Data Science, Machine Learning, and Deep Learning, solving ML project problems and analytical issues to regularly boost my knowledge. The goal is to help learners with learning resources in the Data Science field.

artificial-intelligence data-analysis data-science data-science-notebook data-science-projects data-visualization database-connection deep-learning etl-pipeline etl-process feature-engineering machine-learning mysql-database neural-network numpy pandas postgresql python python-automation sqlite

Last synced: 30 Oct 2025

https://github.com/innat/ML-Resource

A concise resource repository for machine learning

data-analysis data-science deep-learning kaggle machine-learning python spark

Last synced: 29 Apr 2025

https://github.com/DataWithBaraa/sql-data-analytics-project

This repository contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

analytics business-analytics business-intelligence data data-analysis data-analyst data-analytics data-engineering data-science data-scientist database datascience query reporting sql sql-queries sql-query sql-server window-functions window-functions-in-sql

Last synced: 14 Oct 2025

https://github.com/clipperhouse/jargon

Tokenizers and lemmatizers for Go

data-science go lemmatizer nlp tokenizer

Last synced: 09 Apr 2025

https://github.com/thomasnield/oreilly_reactive_python_for_data

Resources for the O'Reilly online video "Reactive Python for Data"

data-science database python reactivex rxpy sqlalchemy tweepy twitter

Last synced: 27 Mar 2025

https://github.com/sb-ai-lab/hypex

Fast and customizable framework for automatic and quick Causal Inference in Python

ab-testing causal-inference causalinference data-science faiss kaggle matching python statistics

Last synced: 04 Apr 2025

https://github.com/NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories - A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 18 Jul 2025

https://github.com/nicholasmamo/multiplex-plot

Multiplex: visualizations that tell stories - A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 19 Aug 2025

https://github.com/scottshambaugh/monaco

Quantify uncertainty and sensitivities in your computer models with an industry-grade Monte Carlo library.

data-science monaco monte-carlo python scientific-computing sensitivity-analysis simulation statistics uncertainty-analysis uncertainty-quantification

Last synced: 03 Apr 2026

https://github.com/takuti/flurs

🌊 FluRS: A Python library for streaming recommendation algorithms

data-science factorization-machines machine-learning matrix-factorization python recommender-system

Last synced: 18 Aug 2025

https://github.com/scrapinghub/mdr

A Python library to detect and extract listing data from HTML pages.

data-science

Last synced: 14 Dec 2025

https://github.com/null8626/python-weather

A free and asynchronous weather API wrapper made in Python, for Python.

data-science forecast python weather weather-api weather-forecast

Last synced: 23 Jan 2026

https://github.com/senderle/topic-modeling-tool

A point-and-click tool for creating and analyzing topic models produced by MALLET.

data-science digital-humanities mallet text-analytics topic-modeling

Last synced: 21 Jan 2026

https://github.com/olow304/data-science-machine-learning

The objective of this toolkit is to offer a free collection of data analysis and machine learning material specifically suited to data science. Its purpose is to get you started in a matter of minutes. You can run this collection either in a Jupyter notebook or with Python alone.

all best-practices cheatsheet cheatsheets data-science data-science-toolkit deep-learning jupyter-notebook machine-learning machine-learning-algorithms machine-learning-tutorials matplotlib mindmap numpy pandas popular-posts python roadmap sklearn toolkit

Last synced: 24 Oct 2025

https://github.com/AlexIoannides/pymc-example-project

Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.

bayesian-data-analysis bayesian-inference data-science machine-learning numpy pandas probabilistic-programming pymc3 python scikit-learn

Last synced: 19 Jul 2025

https://github.com/formlio/forml

ForML - A development framework and MLOps platform for the lifecycle management of data science projects

ai data-science machine-learning ml mlops portability python reproducibility

Last synced: 08 May 2025

https://github.com/mc2-project/secure-xgboost

Secure collaborative training and inference for XGBoost.

collaborative-learning data-science enclave machine-learning privacy security xgboost

Last synced: 17 Jan 2026

https://github.com/juliaml/tabletransforms.jl

Transforms and pipelines with tabular data in Julia

data-science machine-learning pipelines statistics table transforms

Last synced: 05 May 2026

https://github.com/aershov24/machine-learning-ds-interview-questions

🔴 1704 Machine Learning, Data Science & Python Interview Questions (ANSWERED) To Kill Your Next ML & DS Interview. Get All Answers + PDFs on MLStack.Cafe. Post your ML Jobs 👉

algorithms-and-data-structures data-analysis data-science interview-practice interview-preparation interview-questions machine-learning machine-learning-algorithms machinelearning

Last synced: 17 Aug 2025

https://github.com/contextlab/storytelling-with-data

Course materials for Dartmouth Course: Storytelling with Data (PSYC 81.09).

course-materials data-science data-stories python tutorials

Last synced: 20 Aug 2025

https://github.com/alexioannides/pymc-example-project

Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.

bayesian-data-analysis bayesian-inference data-science machine-learning numpy pandas probabilistic-programming pymc3 python scikit-learn

Last synced: 05 Jul 2025

https://github.com/nischalshrestha/Unravel

A fluent code explorer for R. 🔍

data-science datawrangling dplyr r rstats shiny tidyr tidyverse

Last synced: 29 Jul 2025

https://github.com/dssg/MLforPublicPolicy

Class resources for CAPP 30254 (Machine Learning for Public Policy)

data-science machine-learning public-policy

Last synced: 15 Mar 2025

https://github.com/tidypyverse/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse

Last synced: 13 Mar 2026

https://github.com/OpenSTEF/openstef

Automated Machine Learning pipelines. Builds the Open Short Term Energy Forecasting package.

data-science energy energy-forecasting forecasting machine-learning python time-series

Last synced: 07 May 2025

https://github.com/danmorales/cursods_profdanilo

Python code covering various applications, such as machine learning and deep learning techniques, fundamentals of statistics, and regression and classification problems. Videos with the theoretical explanations are available on my YouTube channel.

aprendizado-de-maquina ciencia-de-dados data-science deep-learning keras-classification-models keras-layer keras-models keras-neural-networks machine-learning machine-learning-algorithms numpy pandas-dataframe pandas-python scikit-learn scikitlearn-machine-learning scipy tensorflow tensorflow-tutorials

Last synced: 14 Jul 2025

https://github.com/tlverse/sl3

💪 🤔 Modern Super Learning with Machine Learning Pipelines

data-science ensemble-learning ensemble-model machine-learning model-selection r r-package regression stacking statistics

Last synced: 19 Feb 2026

https://github.com/ideos/gloe

A general-purpose library designed to guide developers in expressing their code as a flow.

clean-code data-science flow functional-programming machine-learning python typing

Last synced: 26 Mar 2025

https://github.com/santiment/sanpy

Santiment API Python Client

blockchain data-science machine-learning numpy pandas python

Last synced: 16 May 2025

https://github.com/target/data-validator

A tool to validate data, built around Apache Spark.

data-science data-validation hacktoberfest

Last synced: 13 May 2025

https://github.com/TexteaInc/funix

Building web apps without manually creating widgets

app-builder data-science frontend machine-learning

Last synced: 29 Jul 2025

https://github.com/jay-johnson/sci-pype

A Machine Learning API with native redis caching and export + import using S3. Analyze entire datasets using an API for building, training, testing, analyzing, extracting, importing, and archiving. This repository can run from a docker container or from the repository.

data-science devops-for-data-science docker docker-compose ipython ipython-notebook jupyter jupyter-notebook jupyter-themes machine-learning machine-learning-api predictive python red10 redis s3 seaborn stock-price-prediction xgb xgboost

Last synced: 14 Jul 2025

https://github.com/oracle/macest

Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

confidence-estimation data-science machine-learning python

Last synced: 19 Aug 2025

https://github.com/facultyai/lens

Summarise and explore Pandas DataFrames

dask data-exploration data-science data-visualisation dataframe pandas

Last synced: 14 Apr 2025

https://github.com/autogluon/autogluon-assistant

ML Assistant for Competitive Machine Learning

automl data-science llms machine-learning

Last synced: 01 Apr 2026

https://github.com/php-ai/php-mlx

PHP-MLX (php-ml next generation) - Machine Learning library for PHP

artificial-intelligence data-science dataset feature-extraction machine-learning supervised-learning unsupervised-learning

Last synced: 17 Jun 2026

https://github.com/yamalight/litlytics

🔥 LitLytics - an affordable, simple analytics platform that leverages LLMs to automate data analysis

analytics data-science llm

Last synced: 06 Apr 2025

https://github.com/talegari/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse

Last synced: 03 Mar 2025