An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn

Last synced: 04 Oct 2025

https://github.com/janpfeifer/gonb

GoNB, a Go Notebook Kernel for Jupyter

data-science go golang gonb jupyter jupyter-notebook jupyter-notebook-kernel

Last synced: 14 May 2025

https://github.com/inseefrlab/onyxia

🔬 Data science environment for k8s

bluehats data-science datalab helm insee kubernetes onyxia

Last synced: 07 Jun 2026

https://github.com/mrankitgupta/Data-Analyst-Roadmap

I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge

ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau

Last synced: 07 Sep 2025

https://github.com/pymc-labs/pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.

btyd buy-till-you-die clv customer-lifetime-value data-science marketing marketing-mix-modeling media-mix-modeling mmm python

Last synced: 14 Dec 2025

https://github.com/mrankitgupta/data-analyst-roadmap

I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge

ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau

Last synced: 13 Apr 2025

https://github.com/enkidevs/curriculum

👩‍🏫 👨‍🏫 The open-source curriculum of Enki!

ai algorithms blockchain chatgpt computer-science css curriculum data-science education enki git gpt4 html java javascript learn-to-code linux python security sql

Last synced: 15 May 2025

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 24 Mar 2025

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 30 Mar 2025

https://github.com/erikaduan/r_tips

R programming tips for data cleaning, data visualisation, statistical modelling and machine learning

data-science data-visualization machine-learning r rstats statistics

Last synced: 29 Jul 2025

https://github.com/scrapinghub/python-crfsuite

A python binding for crfsuite

crf crfsuite data-science

Last synced: 14 May 2025

https://github.com/jalapic/engsoccerdata

English and European soccer results 1871-2022

data-science data-visualization r rstats soccer sport sports sports-stats

Last synced: 06 Feb 2026

https://github.com/biomedsciai/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 14 May 2025

https://github.com/glue-viz/glue

Linked Data Visualizations Across Multiple Files

data-science linked-data python visualization

Last synced: 14 May 2025

https://github.com/target/matrixprofile-ts

A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

data-science matrix-profile motif motif-discovery pip pip3 pypi pypi-packages python python3 time-series timeseries-analysis timeseries-segmentation

Last synced: 14 Jan 2026

https://github.com/ipython-books/cookbook-2nd-code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization

Last synced: 12 Apr 2025

https://github.com/williamFalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter search for neural networks

caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow

Last synced: 27 Mar 2025

https://github.com/williamfalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter search for neural networks

caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow

Last synced: 23 Feb 2026

https://github.com/akfamily/aktools

AKTools is an elegant and simple HTTP API library for AKShare, built for AKSharers!

akshare asyncio data data-science fastapi openapi pydanti

Last synced: 14 May 2025

https://github.com/pdpipe/pdpipe

Easy pipelines for pandas DataFrames.

data data-science dataframe dataframes pandas pandas-dataframe pipeline

Last synced: 06 Mar 2026

https://github.com/arvkevi/kneed

Knee point detection in Python 📈

data-analysis data-science elbow-method knee-point python scientific-computing systems

Last synced: 21 Oct 2025

https://github.com/iterative/mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day 🤞

cli data-science deployment developer-tools git machine-learning mlem model-registry python

Last synced: 26 Mar 2025

https://github.com/ShawhinT/YouTube-Blog

Code to complement YouTube videos and blog posts on Medium.

data-science example-code machine-learning medium-articles youtube

Last synced: 18 Jul 2025

https://github.com/ashishpatel26/amazing-feature-engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 16 May 2025

https://github.com/nicolaskruchten/jupyter_pivottablejs

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables

Last synced: 15 May 2025

https://github.com/ashishpatel26/Amazing-Feature-Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 10 Apr 2025

https://github.com/TrainingByPackt/Data-Science-Projects-with-Python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 14 Apr 2025

https://github.com/trainingbypackt/data-science-projects-with-python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 04 Apr 2025

https://github.com/litaotao/ipython-dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for people working in data science and data analysis. It aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 16 May 2025

https://github.com/litaotao/IPython-Dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for people working in data science and data analysis. It aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 03 Aug 2025

https://github.com/BiomedSciAI/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 27 Mar 2025

https://github.com/buckaroo-data/buckaroo

Buckaroo - The data table UI for Notebooks. Quickly explore dataframes, scroll through dataframes, search, sort, view summary stats and histograms. Works with Pandas, Polars, Jupyter, Marimo, VSCode Notebooks

buckaroo data-science jupyter marimo-notebook paddy pandas polars

Last synced: 22 May 2026

https://github.com/odpi/opends4all

OpenDS4All project, hosted by LF AI & Data

data-science jupyter-notebooks materials

Last synced: 10 Jun 2025

https://github.com/floydwch/kaggle-cli

(Deprecated, use https://github.com/Kaggle/kaggle-api instead) An unofficial Kaggle command line tool.

cli data-science

Last synced: 30 Dec 2025

https://github.com/kotlin/kandy

Kotlin plotting library.

data-science graphics jupyter-notebooks kotlin plot

Last synced: 04 Jul 2025

https://github.com/perpetual-ml/perpetual

Perpetual is a high-performance gradient boosting machine. It delivers optimal accuracy in a single run without complex tuning through a simple budget parameter. It features out-of-the-box support for causal ML, continual learning, native calibration, and robust drift monitoring, along with a Rust core and zero-copy bindings for Python and R.

data-science gbdt gbm gradient-boosted-trees gradient-boosting gradient-boosting-decision-trees kaggle machine-learning python rust

Last synced: 02 Apr 2026

https://github.com/github/codespaces-jupyter

Explore machine learning and data science with Codespaces

codespaces data-science jupyter-notebook machine-learning

Last synced: 11 Apr 2025

https://github.com/fabsig/GPBoost

Combining tree-boosting with Gaussian process and mixed effects models

artificial-intelligence boosting cpp data-science gaussian-processes machine-learning mixed-effects python r

Last synced: 04 Feb 2026

https://github.com/fastai/fastai2

Temporary home for fastai v2 while it's being developed

data-science deep-learning fastai jupyter machine-learning nbdev python pytorch

Last synced: 19 Jul 2025

https://github.com/faktionai/awesome-ai-usecases

A list of awesome and proven Artificial Intelligence use cases and applications

data-science machine-learning

Last synced: 14 Mar 2025

https://github.com/aeturrell/coding-for-economists

This repository hosts the code behind the online book, Coding for Economists.

book data-science econometrics economics economics-models jupyter-notebook learning python research vscode

Last synced: 12 Oct 2025

https://github.com/Kotlin/kandy

Kotlin plotting library.

data-science graphics jupyter-notebooks kotlin plot

Last synced: 12 Apr 2025

https://github.com/blue-yonder/turbodbc

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup

Last synced: 14 May 2025

https://github.com/sforaidl/kd_lib

A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization

Last synced: 16 May 2025

https://github.com/cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow

Last synced: 03 Oct 2025

https://github.com/rstojnic/lazydata

Lazydata: Scalable data dependencies for Python projects

data-science datamanagement machine-learning python

Last synced: 26 Mar 2025

https://github.com/farukalamai/advanced-machine-learning-engineer-roadmap-2024

A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing, model development, deployment, and maintenance.

aws computer-vision data-analysis data-science data-visualization deep-learning git-github machine-learning machine-learning-roadmap mlops natural-language-processing neural-network nlp opencv pandas python pytorch statistics tensorflow yolo

Last synced: 04 Apr 2025

https://github.com/squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 16 May 2025

https://github.com/erezsh/preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 16 May 2025

https://github.com/Squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 15 Mar 2025

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 07 Apr 2025

https://github.com/juliastats/glm.jl

Generalized linear models in Julia

data-science glm julia regression statistical-models statistics

Last synced: 14 May 2025

https://github.com/gesistsa/rio

🐟 A Swiss-Army Knife for Data I/O

cran csv csvy data data-science excel io r rio sas spss stata

Last synced: 12 Dec 2025

https://github.com/erezsh/Preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 26 Mar 2025

https://github.com/rnorm/book_sample

Another book on data science

book data-science python r

Last synced: 19 Jul 2025

https://github.com/fabsig/gpboost

Combining tree-boosting with Gaussian process and mixed effects models

artificial-intelligence boosting cpp data-science gaussian-processes machine-learning mixed-effects python r

Last synced: 14 May 2025

https://github.com/jadianes/data-science-your-way

Ways of doing Data Science Engineering and Machine Learning in R and Python

data-frame data-science data-science-engineering exploratory-data-analysis jupyter machine-learning notebook python r tutorial

Last synced: 04 Apr 2025

https://github.com/tuangauss/DataScienceProjects

The code repository for projects and tutorials in R and Python that cover a variety of topics in data visualization, statistics, sports analytics, and general applications of probability theory.

data-science data-visualization statistics

Last synced: 29 Mar 2025

https://github.com/diskframe/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 16 Jun 2025