An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/alishobeiri/thread

AI-powered Jupyter Notebook: use local AI to generate and edit code cells, automatically fix errors, and chat with your data

ai analysis analytics data-science jupyter jupyter-notebook jupyter-notebooks jupyterhub jupyterlab ollama python react reactjs

Last synced: 14 May 2025

https://github.com/novak-99/mlpp

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 23 Oct 2025

https://github.com/novak-99/MLPP

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 20 Mar 2025

https://github.com/mrkn/pycall.rb

Calling Python functions from the Ruby language

data-science pycall python ruby rubydatascience rubyml

Last synced: 13 Apr 2025

https://github.com/datumbox/datumbox-framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

big-data data-science java machine-learning nlp statistics

Last synced: 15 May 2025

https://github.com/TeoMeWhy/teomerefs

Technical reference guide for a career in data

data data-science machine-learning python

Last synced: 25 Mar 2025

https://github.com/makcedward/nlp

:memo: This repository recorded my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 12 Apr 2025

https://github.com/fraunhoferportugal/tsfel

An intuitive library to extract features from time series.

classification colab-notebook data-science feature-engineering feature-extraction time-series

Last synced: 05 Apr 2026

https://github.com/rhiever/datacleaner

A Python tool that automatically cleans data sets and readies them for analysis.

automation data-science machine-learning python

Last synced: 15 May 2025

https://github.com/egbertbouman/youtube-comment-downloader

Simple script for downloading YouTube comments without using the YouTube API

data-science data-scraper python youtube youtube-comments

Last synced: 15 May 2025

https://github.com/mlr-org/mlr3

mlr3: Machine Learning in R - next generation

classification data-science machine-learning mlr3 r r-package regression

Last synced: 06 Oct 2025

https://github.com/dataquestio/project-walkthroughs

Data science, machine learning, and web development project code for https://www.youtube.com/c/Dataquestio .

data-science machine-learning pandas python

Last synced: 08 Apr 2025

https://github.com/maxpumperla/deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"

alphago alphago-zero data-science deep-learning game-of-go games machine-learning neural-networks python

Last synced: 15 May 2025

https://github.com/firmai/data-science-career

Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository

analytics big-data business-analytics business-intelligence career data-science machine-learning resources

Last synced: 17 Feb 2026

https://github.com/dssg/hitchhikers-guide

The Hitchhiker's Guide to Data Science for Social Good

data-science dssg machine-learning training tutorial-exercises

Last synced: 21 Jan 2026

https://github.com/sematic-ai/sematic

An open-source ML pipeline development platform

ai data-science machine-learning ml ml-ops ml-pipeline ml-pipelines mlops pipeline python python3

Last synced: 14 May 2025

https://github.com/tidyverse/datascience-box

Data Science Course in a Box

data-science education r rstats teaching

Last synced: 15 May 2025

https://github.com/rstudio-education/datascience-box

Data Science Course in a Box

data-science education r rstats teaching

Last synced: 26 Mar 2025

https://github.com/webartifex/intro-to-python

An intro to Python & programming for wanna-be data scientists

data-science introduction-to-programming jupyter python tutorial

Last synced: 04 Mar 2026

https://github.com/mybridge/machine-learning-open-source

Monthly Series - Machine Learning Top 10 Open Source Projects

ai algorithm artificial-intelligence data-science machine-learning neural-network

Last synced: 23 Jan 2026

https://github.com/Mybridge/machine-learning-open-source

Monthly Series - Machine Learning Top 10 Open Source Projects

ai algorithm artificial-intelligence data-science machine-learning neural-network

Last synced: 22 Mar 2025

https://github.com/WenjieDu/PyPOTS

A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

classification clustering data-mining data-science deep-learning forecasting healthcare imputation incomplete industrial interpolation machine-learning missing-values missingness neural-network partially-observed-time-series pytorch science-research time-series time-series-analysis

Last synced: 01 Apr 2025

https://github.com/grailbio/reflow

A language and runtime for distributed, incremental data processing in the cloud

analysis-pipeline aws bioinformatics-pipeline cloud-computing data-science golang language runtime scientific-computing

Last synced: 15 Mar 2025

https://github.com/caserec/Datasets-for-Recommender-Systems

A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)

data-science database datasets public-data recommender-systems

Last synced: 20 Jul 2025

https://github.com/iamaziz/pydataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 16 May 2025

https://github.com/kotlin/dataframe

Structured data processing in Kotlin

data-analysis data-science dataframe kotlin

Last synced: 04 Jul 2025

https://github.com/iamaziz/PyDataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 19 Jul 2025

https://github.com/process-intelligence-solutions/pm4py

Official public repository for PM4Py (Process Mining for Python), an open-source library for exploring, analyzing, and optimizing business processes with Python.

data-mining data-science machine-learning processmining python

Last synced: 06 Apr 2026

https://github.com/probcomp/bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.

automatic-data-modeling data-science databases machine-learning probabilistic-programming

Last synced: 14 Mar 2026

https://github.com/youssefHosni/Practical-Machine-Learning

Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.

data-science machine-learning

Last synced: 17 Mar 2025

https://github.com/youssefhosni/practical-machine-learning

Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.

data-science machine-learning

Last synced: 12 Apr 2025

https://github.com/wecoai/aideml

AIDE: AI-Driven Exploration in the Space of Code. A state-of-the-art machine learning engineering agent that automates AI R&D.

ai data-science llm machine-learning

Last synced: 18 Jun 2025

https://github.com/epsilla-cloud/vectordb

Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Last synced: 15 May 2025

https://github.com/chawlaavi/daily-dose-of-data-science

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn

Last synced: 04 Apr 2025

https://github.com/google/lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.

bayesian data-science econometrics marketing-science mmm

Last synced: 06 May 2025

https://github.com/empathy87/the-elements-of-statistical-learning-python-notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 13 Apr 2025

https://github.com/turicas/rows

A common, beautiful interface to tabular data, no matter the format

convert-data csv data data-science excel hacktoberfest python table tabular-data xls xlsx

Last synced: 14 May 2025

https://github.com/Kotlin/dataframe

Structured data processing in Kotlin

data-analysis data-science dataframe kotlin

Last synced: 11 Apr 2025

https://github.com/stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix

Last synced: 29 Sep 2025

https://github.com/GoogleCloudPlatform/DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 01 May 2025

https://github.com/googlecloudplatform/dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 03 Oct 2025

https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 01 May 2025

https://github.com/yoshoku/rumale

Rumale is a machine learning library in Ruby

artificial-intelligence data-analysis data-science machine-learning ml ruby rubyml

Last synced: 29 Apr 2025

https://github.com/compdemocracy/polis

:milky_way: Open Source AI for large scale open ended feedback

civic-tech data-science deliberative-democracy participatory-democracy

Last synced: 30 Mar 2025

https://github.com/hazyresearch/meerkat

Creative interactive views of any dataset.

data-science foundation-models machine-learning ml pandas

Last synced: 15 May 2025

https://github.com/dswah/pyGAM

[HELP REQUESTED] Generalized Additive Models in Python

data-science gams interpretable-machine-learning machine-learning python scientific-computing

Last synced: 27 Mar 2025

https://github.com/thealgorithms/jupyter

The repository contains scripts and notebooks related to Statistics, Machine learning, Neural network, Deep learning, NLP, Numerical methods, and Automation.

algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network

Last synced: 16 May 2025

https://github.com/opengeos/geoai

GeoAI: Artificial Intelligence for Geospatial Data

ai data-science geoai geopython geospatial jupyter python

Last synced: 22 May 2026

https://github.com/HazyResearch/meerkat

Creative interactive views of any dataset.

data-science foundation-models machine-learning ml pandas

Last synced: 26 Mar 2025

https://github.com/TheAlgorithms/Jupyter

The repository contains scripts and notebooks related to Statistics, Machine learning, Neural network, Deep learning, NLP, Numerical methods, and Automation.

algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network

Last synced: 04 May 2025

https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3

Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

bayesian-data-analysis data-science pymc python statistics

Last synced: 19 Jul 2025

https://github.com/aloctavodia/statistical-rethinking-with-python-and-pymc3

Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

bayesian-data-analysis data-science pymc python statistics

Last synced: 12 Apr 2025