An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/joaquinamatrodrigo/estadistica-con-r

Personal notes on statistics, machine learning, and the R programming language

bioestadistica data-mining data-science estadistica machine-learning mineria-de-datos r

Last synced: 05 Apr 2025

https://github.com/tellery/tellery

Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

analytics bigquery business-intelligence collaboration dashboard data-analytics data-modeling data-science data-visualization database dbt notebook self-hosted sql

Last synced: 16 May 2025

https://github.com/AnotherSamWilson/miceforest

Multiple Imputation with LightGBM in Python

data-science imputed-values mice-algorithm python random-forest

Last synced: 13 Jul 2025

https://github.com/InseeFrLab/onyxia

🔬 Data science environment for k8s

bluehats data-science datalab helm insee kubernetes onyxia

Last synced: 30 Aug 2025

https://github.com/finlay-liu/kaggle_public

A Shui's open-source branch for data competitions

data-science kaggle-competition

Last synced: 07 Apr 2025

https://github.com/KiranGershenfeld/VisualizingTwitchCommunities

Graphing communities on Twitch.tv in a visually intuitive way

community data-science python twitch visualization

Last synced: 14 Mar 2025

https://github.com/nshiab/simple-data-analysis

Easy-to-use and high-performance TypeScript library for data analysis. Works with tabular, geospatial and vector data.

ai analysis bun data data-analysis data-science deno duckdb geospatial javascript llm machine-learning node node-js nodejs spatial spatial-analysis sql typescript

Last synced: 22 May 2026

https://github.com/aporia-ai/mlnotify

🔔 No need to keep checking your training - just one import line and you'll know the second it's done.

data-science deep-learning deeplearning machine-learning machinelearning machinelearning-python ml notification notifications opensource python python3 tool tools

Last synced: 05 Apr 2025

https://github.com/datmo/datmo

Open source production model management tool for data scientists

artificial-intelligence data-science deep-learning machine-learning reproducibility version-control

Last synced: 14 Dec 2025

https://github.com/autonlab/auton-survival

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events

causal-inference counterfactual-inference data-science deep-learning graphical-models machine-learning python regression reliability-analysis survival-analysis time-to-event

Last synced: 16 Jan 2026

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 14 Oct 2025

https://github.com/gdsbook/book

This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.

data-analysis-python data-science geographic-data geographical-information-system spatial-analysis spatial-data-analysis spatial-statistics statistics

Last synced: 15 Mar 2025

https://github.com/larswaechter/voici.js

A Node.js library for pretty printing your data on the terminal🎨

console data-science javascript shell terminal tty typescript

Last synced: 05 Apr 2025

https://github.com/yzhao062/data-mining-conferences

Ranking, acceptance rate, deadline, and publication tips

data-mining data-science research

Last synced: 04 Oct 2025

https://github.com/jovianhq/opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

data-science datasets machine-learning python

Last synced: 04 Apr 2025

https://github.com/tommyod/efficient-apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 15 May 2025

https://github.com/mljar/plotai

PlotAI - Your Ultimate Plotting Assistant! 📊🤖 Use ChatGPT-3.5 to create plots in Python and Matplotlib directly in your Python script or notebook.

charts chatgpt data-science llm matplotlib plots python visualization

Last synced: 15 May 2025

https://github.com/upgini/upgini

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

automated-feature-engineering automl automl-pipeline chatgpt data-enrichment data-science feature-engineering feature-extraction feature-selection features kaggle kaggle-solution large-language-models llm machine-learning open-data open-datasets public-data python-library scikit-learn

Last synced: 15 May 2025

https://github.com/anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 13 Apr 2025

https://github.com/petrobras/3w

Promotes development of ML algorithms for early detection and classification of undesirable events in offshore oil wells.

anomaly-detection data-science machine-learning multivariate-time-series-analysis oil-well-monitoring

Last synced: 07 Feb 2026

https://github.com/Anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 30 Mar 2025

https://github.com/machine-learning-apps/Issue-Label-Bot

Code for the Issue Label Bot, an app that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also the code for the blog article "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 14 Mar 2025

https://github.com/machine-learning-apps/issue-label-bot

Code for the Issue Label Bot, an app that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also the code for the blog article "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 05 Oct 2025

https://github.com/databrickslabs/tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation

data-analysis data-science pandas python scala time-series timeseries timeseries-analysis timeseries-data

Last synced: 29 Apr 2025

https://github.com/profjsb/python-seminar

Python for Data Science (Seminar Course at UC Berkeley; AY 250)

data-science distributed-computing machine-learning python visualization

Last synced: 19 Jul 2025

https://github.com/tommyod/Efficient-Apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 26 Mar 2025

https://github.com/maxhumber/redframes

General Purpose Data Manipulation Library

data-science pandas python

Last synced: 05 Apr 2025

https://github.com/microsoft/genalog

Genalog is an open source, cross-platform Python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

data-generation data-science machine-learning ner ocr-recognition python synthetic-data synthetic-data-generation synthetic-images text-alignment

Last synced: 04 Apr 2025

https://github.com/kamu-data/kamu-cli

Next-generation decentralized data lakehouse and a multi-party stream processing network

blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql

Last synced: 17 Feb 2026

https://github.com/jbryer/likert

Package to analyze Likert-based items.

data-science r visualization

Last synced: 28 Jan 2026

https://github.com/solegalli/feature-selection-for-machine-learning

Code repository for the online course Feature Selection for Machine Learning

data-science feature-selection machine-learning python

Last synced: 15 May 2025

https://github.com/ml-tooling/ml-hub

🧰 Multi-user development platform for machine learning teams. Simple to set up within minutes.

data-science docker jupyter jupyterhub machine-learning python

Last synced: 06 Apr 2025

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 21 Oct 2025

https://jbryer.github.io/likert/

Package to analyze Likert-based items.

data-science r visualization

Last synced: 06 May 2025

https://github.com/huridocs/uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections

ai data-science database documents non-profit open-source pdf

Last synced: 27 Jun 2026

https://github.com/CJWorkbench/cjworkbench

The data journalism platform with built-in training

data-analysis data-journalism data-science data-visualization journalism notebook

Last synced: 17 Jul 2025

https://github.com/yamafaktory/hypergraph

Hypergraph is a data structure library for creating a directed hypergraph in which a hyperedge can join any number of vertices.

data data-science data-structure data-structures hypergraph hypergraphs rust rust-lang rustlang

Last synced: 15 May 2025

https://github.com/flyteorg/flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.

automation data data-science extensible flyte flyte-tasks hacktoberfest mlops pypi python sdk spark workflows

Last synced: 04 May 2026

https://github.com/pog87/PtitPrince

Python version of raincloud plots

data-science hacktoberfest matplotlib python

Last synced: 21 Nov 2025

https://github.com/data-describe/data-describe

data⎰describe: Pythonic EDA Accelerator for Data Science

analysis data-science eda exploratory-data-analysis pypi

Last synced: 12 Apr 2025

https://github.com/priorlabs/tabpfn-extensions

Community extensions for TabPFN - the foundation model for tabular data. Built with TabPFN! 🤗

data-science machine-learning tabpfn tabular-data

Last synced: 02 Jun 2026

https://github.com/PPshrimpGo/BDCI2018-ChinauUicom-1st-solution

This is the first-place solution to the China Unicom challenge at BDCI 2018

competition data-science

Last synced: 20 Jul 2025

https://github.com/greenelab/scihub

Source code and data analyses for the Sci-Hub Coverage Study

crossref data-science doi journals libgen open-data sci-hub scimag scopus

Last synced: 09 Apr 2025

https://github.com/PKU-DAIR/Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

artificial-intelligence autograd data-science deep-learning deep-neural-networks distributed-systems distributed-training embeddings gpu high-dimensional machine-learning python state-of-the-art

Last synced: 20 Mar 2025

https://github.com/kde/labplot

LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.

data-analysis data-science data-visualization fitting graph graph2d plotting scientific-plotting scientific-visualization

Last synced: 16 May 2025

https://github.com/drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.

data-ethics data-science ethics machine-learning

Last synced: 04 Apr 2025

https://github.com/Dyakonov/PZAD

Course "Applied Problems of Data Analysis" (Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University)

data-mining data-science data-visualization education lectures machine-learning ml russian slides

Last synced: 19 Jul 2025

https://github.com/glm-tools/pyglmnet

Python implementation of elastic-net regularized generalized linear models

data-science elastic-net glm lasso machine-learning python

Last synced: 18 Feb 2026

https://github.com/Ibotta/sk-dist

Distributed scikit-learn meta-estimators in PySpark

data-science machine-learning ml scikit-learn spark

Last synced: 18 Jul 2025

https://github.com/amanovishnu/ineuron-full-stack-data-science-assignments

This repository features assignments and projects from the iNeuron full stack data science course, providing valuable resources for learners to enhance their skills and apply their knowledge.

computer-vision data-science datascience deep-learning exploratory-data-analysis linear-regression machine-learning natural-language-processing python recommender-system sql statistics

Last synced: 08 Apr 2025

https://github.com/microsoft/NimbusML

Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

data-science machine-learning ml mlnet nimbusml python scikit-learn

Last synced: 18 Apr 2025

https://github.com/microsoft/nimbusml

Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

data-science machine-learning ml mlnet nimbusml python scikit-learn

Last synced: 07 Oct 2025

https://github.com/ibotta/sk-dist

Distributed scikit-learn meta-estimators in PySpark

data-science machine-learning ml scikit-learn spark

Last synced: 16 May 2025

https://github.com/amanovishnu/ineuron-full-stack-data-science-assignment-collection

This repository features assignments and projects from the iNeuron full stack data science course, providing valuable resources for learners to enhance their skills and apply their knowledge.

computer-vision data-science datascience deep-learning exploratory-data-analysis linear-regression machine-learning natural-language-processing python recommender-system sql statistics

Last synced: 28 Feb 2025

https://github.com/Mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 22 Mar 2025

https://github.com/project-ryoma/ryoma

Common AI agent framework solving your data problems

ai data-science llm

Last synced: 02 May 2025

https://github.com/mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 21 Jul 2025

https://github.com/senseyeio/roger

Golang RServe client. Use R from Go

data-science go r rserve scientific-computing

Last synced: 09 Apr 2025

https://github.com/stocknear/backend

Backend of stocknear - Open Source Stock Analysis

data data-science fastapi fastify finance javascript machine-learning nodejs pocketbase python redis

Last synced: 16 May 2025

https://github.com/shahinrostami/plotapi

Engaging visualisations, made easy.

data data-science data-visualization plotting python visualization

Last synced: 22 Jan 2026

https://github.com/kraina-ai/srai

Spatial Representations for Artificial Intelligence - a Python library toolkit for geospatial machine learning focused on creating embeddings for downstream tasks

artificial-intelligence data-science geo geospatial machine-learning python spatial spatial-analysis srai

Last synced: 15 May 2025

https://github.com/Griperis/BlenderDataVis

Data visualisation addon for Blender

blender blender-addon chart data-science data-visualisation

Last synced: 09 May 2025