An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/mybridge/learn-python

Python Top 45 Articles of 2017

algorithm data-science machine-learning python python3

Last synced: 13 Apr 2025

https://github.com/rivasiker/ggHoriPlot

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 02 May 2025

https://github.com/khuyentran1401/reproducible-data-science

Tutorials on creating a reproducible and maintainable data science project

data-science machine-learning mlops mlops-workflow python

Last synced: 13 Apr 2025

https://rivasiker.github.io/ggHoriPlot/

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 06 May 2025

https://github.com/bps-statistics/stadata

STADATA is a Python package that simplifies access to statistical data provided by BPS - Statistics Indonesia

data data-analytics data-science national-statistics nso official-statistics python python-package statistics

Last synced: 14 Oct 2025

https://github.com/rk2900/drsa

Deep Recurrent Survival Analysis, an auto-regressive deep model for time-to-event data analysis with censorship handling. An implementation of our AAAI 2019 paper and a benchmark for several (Python) implemented survival analysis methods.

data-science deep-learning machine-learning survival-analysis

Last synced: 23 Jul 2025

https://github.com/dlab-berkeley/R-Fundamentals-Legacy

D-Lab's 12-hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

automation data-science data-visualization data-wrangling r

Last synced: 26 Apr 2025

https://github.com/hemansnation/Data-Analyst-Roadmap

Data-Analyst-Roadmap for Professionals. This roadmap contains 8 Chapters that can be completed in 8 weeks, whether you are a fresher in the field or an experienced professional who wants to transition into Data Analysis.

analytics data-analysis data-analysis-python data-analytics data-science numpy predictive-analytics project-based-learning python statistics tableau

Last synced: 07 Sep 2025

https://github.com/jupyterhub/repo2docker-action

A GitHub action to build data science environment images with repo2docker and push them to registries.

actions binder data-science datascience docker jupyter jupyter-notebook repo2docker repo2docker-action

Last synced: 06 Apr 2025

https://github.com/hamelsmu/Seq2Seq_Tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 26 Mar 2025

https://github.com/hamelsmu/seq2seq_tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 16 Mar 2025

https://github.com/picnicml/doddle-model

:cake: doddle-model: machine learning in Scala.

breeze data-science doddle-model machine-learning scala

Last synced: 11 Jan 2026

https://github.com/google/bayesnf

Bayesian Neural Field models for prediction in large-scale spatiotemporal datasets

bayesian-inference data-science machine-learning spatiotemporal-data-analysis statistics

Last synced: 16 Sep 2025

https://github.com/jacobgil/confidenceinterval

The long-missing library for Python confidence intervals

data-science machine-learning metrics statistics

Last synced: 14 Oct 2025

https://github.com/hemansnation/data-analyst-roadmap

Data-Analyst-Roadmap for Professionals. This roadmap contains 8 Chapters that can be completed in 8 weeks, whether you are a fresher in the field or an experienced professional who wants to transition into Data Analysis.

analytics data-analysis data-analysis-python data-analytics data-science numpy predictive-analytics project-based-learning python statistics tableau

Last synced: 15 Apr 2025

https://github.com/ing-bank/probatus

SHAP-based validation for linear and tree-based models. Applied to binary, multiclass and regression problems.

binary-classifiers data-analysis data-science feature-elimination machine-learning multi-class-classification recursive-feature-elimination regressors shap statistics tree-model

Last synced: 07 Apr 2025

https://github.com/joshuaeckroth/clj-ml

A machine learning library for Clojure built on top of Weka and friends

clojure data-science machine-learning weka

Last synced: 13 Nov 2025

https://github.com/nickpoison/astsa

R package to accompany "Time Series Analysis and Its Applications: With R Examples" and "Time Series: A Data Analysis Approach Using R"

astsa data-analysis data-science dna-sequences em-algorithm kalman-filter missing-data package r state-space-models time-series-analysis

Last synced: 21 Oct 2025

https://github.com/svilupp/promptingtools.jl

Streamline your life using PromptingTools.jl, the Julia package that simplifies interacting with large language models.

data-science generative-ai julia

Last synced: 12 Dec 2025

https://github.com/dmnfarrell/tablexplore

Table analysis and plotting application written in PySide2/PyQt5

data-analysis data-science dataframe pandas plotting pyqt5 pyside2 python qt

Last synced: 01 Aug 2025

https://github.com/ome/ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.

bioimaging cloud data-science file-formats spec

Last synced: 09 May 2025

https://github.com/metalblueberry/go-plotly

The goal of the go-plotly package is to provide a pleasant Go interface for creating figure specifications which are displayed by the plotly.js JavaScript graphing library.

charts data-science data-visualization go golang graph plotly plotly-python plotlyjs plotting

Last synced: 15 Jan 2026

https://github.com/BCG-X-Official/artkit

Automated prompt-based testing and evaluation of Gen AI applications

asyncio data-science gen-ai genai python red-teaming test-automation

Last synced: 27 Jul 2025

https://github.com/svilupp/PromptingTools.jl

Streamline your life using PromptingTools.jl, the Julia package that simplifies interacting with large language models.

data-science generative-ai julia

Last synced: 23 Mar 2025

https://github.com/suji04/normalizednerd

Code for the videos on my YouTube channel

data-science machine-learning python tutorial youtube

Last synced: 08 Apr 2025

https://github.com/trainingbypackt/data-wrangling-with-python

Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

analytics beautifulsoup data-analytics data-munging data-science data-wrangling database numpy pandas python regular-expression web-scraping

Last synced: 06 Apr 2025

https://github.com/njtierney/rmd4sci

Rmarkdown for Scientists

book bookdown data-science r rmarkdown rstats science

Last synced: 16 Mar 2025

https://github.com/datacamp/viewflow

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

airflow apache-airflow data-engineering data-science packages python workflow

Last synced: 20 Aug 2025

https://github.com/machine-learning-apps/ml-template-azure

Template for getting started with automated ML Ops on Azure Machine Learning

aml azure azure-machine-learning data-science machine-learning machine-learning-lifecycle mlops

Last synced: 01 Apr 2025

https://github.com/dkedar7/fast_dash

Turn your Python functions into interactive apps! Fast Dash is an innovative way to deploy your Python code as interactive web apps with minimal changes.

dash data-analysis data-science data-visualization deep-learning fast-dash flask machine-learning plotly-dash python ui webdevelopment

Last synced: 20 Apr 2026

https://github.com/RamiKrispin/Introduction-to-Docker

(WIP) Getting started with Docker - An introduction to Docker with data science and engineering applications

data-engineering data-science docker dockerfile

Last synced: 14 Mar 2025

https://github.com/miniql/miniql

A tiny JSON-based query language inspired by GraphQL

data data-analysis data-science graphql javascript json queries query query-language typescript

Last synced: 15 Apr 2025

https://github.com/voxel51/papers-with-data

A curated list of papers that released datasets along with their work

ai artificial-intelligence computer-vision data-science datasets deep-learning machine-learning papers

Last synced: 16 Jul 2025

https://github.com/linogaliana/python-datascientist

Repository for the course Python for data scientists (ENSAE, 2nd year)

data-science jupyter jupyter-notebook machine-learning opendata python teaching

Last synced: 04 Apr 2025

https://github.com/romanmichaelpaolucci/AI_Stock_Trading

Design pattern for critical stages in the development process of an AI Stock Trading Bot

artificial-intelligence data-science machine-learning neural-network python trading trading-algorithms trading-bot trading-strategies

Last synced: 10 Apr 2025

https://github.com/scrapinghub/python-simhash

An efficient simhash implementation for Python

data-science

Last synced: 25 Jun 2025

https://github.com/scitime/scitime

Training time estimation for scikit-learn algorithms

data-science machine-learning python scikit-learn timer

Last synced: 14 Apr 2026

https://github.com/vkoul/Econ-Data-Science

Articles/Journals and Videos related to Economics :chart_with_upwards_trend: and Data Science :bar_chart:

casual-inference data-science econometrics economics economist machine-learning social-sciences

Last synced: 06 May 2025

https://github.com/autoviml/deep_autoviml

Build TensorFlow Keras model pipelines in a single line of code. Now with MLflow tracking. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.

autokeras automl data-science deep-learning gcp keras machine-learning mlflow mljar pycaret python tensorflow tensorflow2 tpot

Last synced: 06 Apr 2025

https://github.com/solegalli/hyperparameter-optimization

Code repository for the online course Hyperparameter Optimization for Machine Learning

data-science hyperopt hyperparameter-optimization machine-learning optuna python scikit-optimize

Last synced: 06 Apr 2025

https://github.com/wyattowalsh/data-science-notes

Open-source project hosted at https://makeuseofdata.com to crowdsource a robust collection of notes related to data science (math, visualization, modeling, etc)

calculus classification compilation crowdsourcing data-science first-timers first-timers-only jupyter-book linear-algebra machine-learning modeling probability regression simulation statistics up-for-grabs visualization

Last synced: 18 Jan 2026

https://github.com/winvector/pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.

data-science machine-learning pydata python

Last synced: 13 Apr 2025

https://github.com/jadianes/spark-r-notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science exploratory-data-analysis jupyter jupyter-notebook notebook r sparkr

Last synced: 21 Apr 2025

https://github.com/WinVector/pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.

data-science machine-learning pydata python

Last synced: 13 Jul 2025

https://github.com/soda-inria/carte

Repository for CARTE: Context-Aware Representation of Table Entries

classification data-science graph-transformer machine-learning regression transformers

Last synced: 12 Apr 2025

https://github.com/medtagger/MedTagger

A collaborative framework for annotating medical datasets using crowdsourcing.

crowdsourcing data-science data-validation deep-learning labeling medical-imaging

Last synced: 08 May 2025

https://github.com/napjon/krisk

Statistical Interactive Visualization with pandas+Jupyter integration on top of Echarts.

dashboard data-science data-visualization echarts interactive-charts jupyter-notebook python

Last synced: 29 Jul 2025

https://github.com/voxel51/fiftyone-plugins

A curated list of plugins that you can add to your FiftyOne install!

artificial-intelligence computer-vision data-science deep-learning fiftyone machine-learning plugins python

Last synced: 11 Apr 2025

https://github.com/materialsinnovation/pymks

Materials Knowledge System in Python

data-science machine-learning materials-science python

Last synced: 22 Feb 2026

https://github.com/winvector/data_algebra

Codd method-chained SQL generator and Pandas data processing in Python.

data-analysis data-science pandas python

Last synced: 09 Apr 2025