An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/amirhosseinhonardoust/workout-efficiency-benchmark

Streamlit + Python pipeline that benchmarks gym workout efficiency (kcal/min) using present sessions only. Generates sortable workout-type benchmarks, distribution plots, fairness-aware gap analysis with uncertainty/low-sample flags, and a data-quality report to prevent misleading comparisons.

analytics benchmarking bias-audit dashboard data-analysis data-quality data-science eda fairness fitness health-data pandas plotly python reporting reproducible-research statistics streamlit visualization workout

Last synced: 10 Jun 2026

https://github.com/nikhilaravi/neuralnetflix

Movie Genre Prediction from movie posters using Deep Learning

data-science deeplearning

Last synced: 18 Oct 2025

https://github.com/liamarguedas/uber-eats-delivery-time

Delivery time prediction system for Uber Eats

data-science machine-learning regression

Last synced: 10 Oct 2025

https://github.com/gianlucatruda/warfit-learn

A machine learning toolkit for reproducible research in anticoagulant dose estimation.

data-science iwpc pandas preprocessing python reproducible-research sklearn supervised-learning warfarin warfit-learn

Last synced: 24 Oct 2025

https://github.com/teddyoweh/dimensionality-reduction-pca

Dimensionality reduction is the process of reducing the number of random features, attributes, or variables (here called dimensions) in a dataset while preserving as much of its variation as possible, keeping only the relevant features to improve the efficiency of a model.

data-science dataset dimensional-analysis dimensionality-reduction feature-extraction feature-selection machine-learning

Last synced: 09 Apr 2025
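
As a rough illustration of the idea in the description above, here is a minimal pure-Python sketch of principal component analysis that finds only the first component by power iteration. It is not taken from the repository. The toy data, the function names (`center`, `covariance`, `top_component`), and the iteration count are all assumptions made for this example.

```python
import random

def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=500):
    """Power iteration: unit eigenvector and eigenvalue of the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Hypothetical toy data: the second feature is roughly twice the first.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)]
data = [[x, 2 * x + random.gauss(0, 0.1)] for x in xs]

centered = center(data)
cov = covariance(centered)
direction, variance = top_component(cov)

# Project each 2-D point onto the first principal component (1-D result).
projected = [sum(a * b for a, b in zip(row, direction)) for row in centered]

# Share of the total variance kept by the single retained dimension.
explained = variance / (cov[0][0] + cov[1][1])
```

Because the two toy features are almost perfectly correlated, one dimension keeps nearly all of the variation. On real data, a library routine based on the SVD would normally be preferred over hand-written power iteration.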

https://github.com/joshuaulrich/stl-rug

Content presented at the Saint Louis R User Group

data-analysis data-science r

Last synced: 26 Aug 2025

https://github.com/cadcad-org/snippets

Repo containing notebooks showcasing features and applications of cadCAD.

cadcad data-science education python simulation snippets

Last synced: 23 Apr 2025

https://github.com/dionhaefner/fowd

Processing framework for FOWD, a free ocean wave dataset, ready for your ML application 🌊

data-science machine-learning ocean open-data waves

Last synced: 21 Aug 2025

https://github.com/tuanle618/AEDA

AEDA - Automated Data Exploratory Analysis in R

data-science eda eda-report exploratory-data-analysis r

Last synced: 29 Jul 2025

https://github.com/durgeshsamariya/100daysofdatascience

A 100-day data science challenge to learn and implement DS concepts, progressing from beginner to data scientist.

100days 100daysofcode 100daysofdscode 100daysofmlcode data data-science

Last synced: 15 Apr 2025

https://github.com/quantifyearth/yirgacheffe

A declarative geospatial library for Python to make data-science with maps easier

data-science geospatial python3

Last synced: 01 Apr 2026

https://github.com/ihmeuw/easylink

A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.

data-science entity-resolution record-linkage

Last synced: 01 Apr 2026

https://github.com/tezansahu/dvc-pycaret-fastapi-demo

Repository for the Demo of using DVC with PyCaret & MLOps (DVC Office Hours - 20th Jan, 2022)

data-science demo deployment dvc fastapi machine-learning mlops-workflow pycaret

Last synced: 26 Dec 2025

https://github.com/mrdandelion6/learn-to-code

This repository is a collection of my notes and code snippets as I journey through learning different programming languages and coding concepts.

c data-analysis data-science javascript learn-to-code machine-learning matlab python r react shell-script

Last synced: 11 Apr 2025

https://github.com/clowdr/clowdr

Command-line utility for iteratively developing pipelines, deploying them at scale, and sharing data and derivatives

data-science docker hpc-applications pipelines python singularity

Last synced: 14 Jan 2026

https://github.com/storopoli/linguagem-r

Graduate-level course on the R language for data science at UNINOVE

data-science r-language r-programming r-stats

Last synced: 31 Oct 2025

https://github.com/cakecrusher/mimicbot

Mimicbot enables the effortless yet modular creation of an AI chat bot model that imitates another person's manner of speech.

ai bot data-science discord discord-bot huggingface natural-language-processing pypi python python-package

Last synced: 28 Oct 2025

https://github.com/nalomran/pyreqtl

A collection of Python modules, equivalent to the R ReQTL Toolkit, that aims to identify associations between expressed SNVs and gene expression using RNA-sequencing data.

bioinformatics bioinformatics-analysis bioinformatics-tool data-science gene-expression matrixeqtl numpy pandas python python3 r rna-seq rna-seq-analysis rpy2 scipy snvs

Last synced: 27 Oct 2025

https://github.com/giswqs/leafmaptools

A Python package for building a tool widgets infrastructure with ipyleaflet and ipywidgets

data-science data-visualization geopython geospatial ipyleaflet ipywidgets jupyter jupyter-notebook mapping python

Last synced: 12 May 2025

https://github.com/mahdi-eth/linear-regression-from-scratch

This project implements a Python-based linear regression model from scratch, complete with custom functions for mean squared error and gradient descent algorithm. It is tested on data, using features to predict target variables. The project offers a practical introduction to linear regression.

algorithm data-science data-visualization linear-regression machine-learning machine-learning-algorithms python

Last synced: 15 Apr 2025
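
The approach named in the description above (mean squared error plus gradient descent) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's code. The function names `mse` and `fit`, the learning rate, the epoch count, and the toy data are assumptions chosen for this example.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, lr=0.05, epochs=2000):
    """Fit w and b by batch gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit(xs, ys)
final_error = mse(xs, ys, w, b)
```

The learning rate matters here: for this data, values much above roughly 0.15 make the updates diverge, so the step size in a from-scratch implementation usually needs tuning or feature scaling.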

https://github.com/serialbandicoot/great-assertions

This library is inspired by the Great Expectations library. The library has made the various expectations found in Great Expectations available when using the inbuilt python unittest assertions.

data-science data-testing databricks great-expectations jupyter-notebook python python3 quality-assurance testing

Last synced: 28 Oct 2025

https://github.com/iterative/features

A collection of development container 'features' for machine learning and data science

data-science dvc features machine-learning

Last synced: 18 Jun 2025

https://github.com/minusxai/minusx

MinusX is an Agentic Business Intelligence platform. It's Claude Code for data.

artificial-intelligence data-analytics data-science jupyter metabase

Last synced: 18 Feb 2026

https://github.com/google-marketing-solutions/fractional_uplift

A flexible python package for cost-aware uplift modelling.

data-science marketing python uplift-modeling

Last synced: 31 Jul 2025

https://github.com/srlozano/tinder-big-data-analysis

Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech

big-data big-data-analytics data-science dating-app mongodb python

Last synced: 11 Oct 2025

https://github.com/mljar/variable-inspector

Explore variables in Jupyter notebooks

data-science jupyter jupyterlab jupyterlab-extension mljar python

Last synced: 01 Mar 2025

https://github.com/celbridge-org/celbridge

Celbridge is an open source tool that provides a bridge between spreadsheets and Python scripting.

data-science data-visualization excel markdown python spreadjs spreadsheets webviewer

Last synced: 02 Feb 2026

https://github.com/stainlessai/micronaut-jupyter

A Micronaut configuration that integrates your app with an existing Jupyter installation.

data-science jupyter jupyter-notebooks jupyterlab micronaut microservices

Last synced: 14 Jan 2026

https://github.com/bfortuner/zoosearch

Search engine for machine learning models and datasets

data-science deep-learning fusejs machine-learning react

Last synced: 23 Oct 2025

https://github.com/akshaysharma096/classify-human-diseases-using-deeplearning

Automated methods to detect and classify human diseases from medical images, using Deep Neural Networks

convolutional-neural-networks data-science deep-learning keras keras-neural-networks machine-learning python3

Last synced: 12 Sep 2025

https://github.com/jl33-ai/dotplotlib

A basic extension library for creating tree dot plots, strip plots or dot charts w/ matplotlib or seaborn in Python

data-analysis data-science data-visualization dot-chart dotplot dotplots matplotlib-pyplot matplotlib-python python seaborn seaborn-plots strip-plots

Last synced: 07 Sep 2025

https://github.com/zjunlp/datamind

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

agent artificial-intelligence data-analysis data-science language-model natural-language-processing

Last synced: 04 Oct 2025

https://github.com/omarsar/nlp_research

🔥 Summary of interesting NLP Papers and Research (Fast and easy reads!) 🔥

artificial-intelligence data-science deep-learning machine-learning nlp

Last synced: 13 Feb 2026

https://github.com/cool-japan/pandrs

DataFrame library for data analysis implemented in Rust. It has features and design inspired by Python's pandas library, combining fast data processing with type safety.

data-analysis data-science datafrane pandas rust rust-lang

Last synced: 04 Apr 2026

https://github.com/adityakamble49/loss-ratio-prediction

Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage

big-data big-data-analytics data-science insurance jupyter-notebook politics python

Last synced: 04 Apr 2026

https://github.com/chifisource/ipycells.jl

cells, pluto, ipython, and olive readers and writers

data-science ipython-notebook julia jupyter-notebook odd-data olive pluto pluto-notebooks

Last synced: 28 Oct 2025

https://github.com/madhurimarawat/developer-resources-hub

A comprehensive collection of valuable resources for developers, covering job preparation, programming, frontend, backend, IoT, databases, and more.

ai-art ai-ml app-links aptitude awesome-list blockchain coding-questions data-science databases developer-experience developer-tools free-books free-courses full-stack-development graphic-design iot-resources linux llm powerbi python

Last synced: 11 Oct 2025

https://github.com/scigolib/matlab

Pure Go library for reading and writing MATLAB .mat files (v5-v7.3+). No CGo, no external dependencies. Full support for numeric types, complex numbers, and multi-dimensional arrays. Cross-platform (Windows/Linux/macOS). Part of SciGoLib ecosystem.

cross-platform data-science go golang hdf5 mat-files matlab no-cgo octave pure-go scientific-computing scientific-data

Last synced: 05 Apr 2026

https://github.com/bgreenwell/statlingua

Explain Statistical Output with Large Language Models

data-science explainability large-language-models llm llms statistics teaching-tools

Last synced: 28 Feb 2026

https://github.com/manuparra/volleyball-performance-analysis

R package to Volleyball Performance Analysis and Visualization

analysis data-science datavisualization performance r sport volleyball

Last synced: 12 Apr 2025

https://github.com/kehaowu/dailypython

Python Daily: five hand-picked Python articles shared every day

data-science data-visualization machine-learning python

Last synced: 10 Mar 2026

https://github.com/gagolews/programowanie_w_jezyku_r

M. Gągolewski, Programowanie w języku R (Programming in R), PWN, 2016

data-science polski r statistics

Last synced: 14 Jul 2025

https://github.com/calkit/calkit-cloud

A platform for creating and sharing knowledge via Calkit projects.

analytics data-science open-science reproducibility reproducible-research research sharing sharing-data

Last synced: 11 Apr 2026

https://github.com/6chaoran/data-story

data story tech-blog

data-science data-visualization

Last synced: 08 Apr 2026

https://github.com/tsg405/applied-machine-learning-in-python

Starter files, coursework, and programming assignments for the Coursera course Applied Machine Learning in Python (University of Michigan).

applied-machine-learning assignment classification coursera data-science fruit-dataset machine-learning matplotlib-pyplot numpy pandas python quiz regression scikit-learn scipy seaborn supervised-machine-learning university-of-michigan unsupervised-machine-learning

Last synced: 14 Apr 2025

https://github.com/sachinl0har/lgmvip-data-science

Lets Grow More Data Science Internship.

data-science letsgrowmore lgm lgmvip

Last synced: 28 Jul 2025

https://github.com/ahmednasef3/data-science-roadmap

A roadmap divided into weeks and tasks for beginners to learn and master data science

beginners data-science master roadmap

Last synced: 03 Jul 2025

https://github.com/selva221724/edasql

edaSQL is a Python library that bridges SQL and exploratory data analysis: connect to a database, run queries, and pass the results to the EDA tool for further insight.

correlation data-analysis data-science data-visualization dataprofiling eda missing-values outlier-detection pandas python sql

Last synced: 10 Jun 2025

https://github.com/georgesalkhouri/l3wtransformer

A word hashing method based on vectors of letter n-grams. Currently transforms text into sequences of numbers.

bag-of-words data-science feature-extraction letter-trigram-word-hashing python text-processing

Last synced: 10 Apr 2025
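
To make the idea of letter n-gram word hashing concrete, here is a minimal sketch that turns text into a sequence of integer ids, one per letter trigram. It does not reproduce the l3wtransformer API. The function names, the `#` boundary marker, and the convention of reserving id 0 for unseen trigrams are assumptions made for this example.

```python
def letter_ngrams(word, n=3):
    """Split a word into letter n-grams, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_vocab(texts, n=3):
    """Assign each distinct n-gram an integer id, starting at 1."""
    vocab = {}
    for text in texts:
        for word in text.split():
            for gram in letter_ngrams(word, n):
                # Id 0 is reserved for n-grams not seen while building the vocab.
                vocab.setdefault(gram, len(vocab) + 1)
    return vocab

def transform(text, vocab, n=3):
    """Map text to a flat sequence of n-gram ids (0 for unknown n-grams)."""
    return [vocab.get(gram, 0)
            for word in text.split()
            for gram in letter_ngrams(word, n)]

vocab = build_vocab(["good dog"])
known = transform("good", vocab)    # trigrams: #go, goo, ood, od#
unknown = transform("cat", vocab)   # none of these trigrams are in the vocab
```

Working at the trigram level keeps the vocabulary small and lets misspelled or unseen words share most of their ids with known words, which is the usual motivation for this kind of hashing.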

https://github.com/tushar2704/my_homebrewed_notebooks_archived-account-kaggle.com-tusharaggarwal27

My_homebrewed_NOTEBOOKS is a GitHub repository that houses a collection of personal notebooks derived from various sources, including Kaggle and Jupyter Notebooks. This repository serves as a curated collection of notebooks created and customized by the repository owner, providing a valuable resource for learning and exploring different topics.

data-analysis data-science kaggle kaggle-competition kaggle-competition-notebooks kaggle-competiton kaggle-scripts machine-learning python

Last synced: 07 May 2025

https://github.com/akbaritabar/bibliometric_data_for_demographic_research

Materials for workshop on "Using bibliometric data in demographic research". A report here: https://iussp.org/en/using-bibliometric-data-demographic-research-0

computational-social-science data-science demographic-research migration-research

Last synced: 07 May 2025

https://github.com/chrislemke/sk-transformers

A collection of pandas & scikit-learn compatible transformers for preprocessing and feature engineering 🛠

data-science feature-engineering feature-selection machine-learning pandas preprocessing python scikit-learn scikit-learn-pipelines scikit-learn-transformer

Last synced: 17 Jun 2025

https://github.com/code2k13/nlphose

Enables creation of complex NLP pipelines in seconds, for processing static files or streaming text, using a set of simple command line tools. Perform multiple operations on text like NER, Sentiment Analysis, Chunking, Language Identification, Q&A, 0-shot Classification and more by executing a single command in the terminal. Can be used as a low code or no code Natural Language Processing solution. Also works with Kubernetes and PySpark!

ai artifical-intelligense data-science language-detection low-code machine-learning named-entity-recognition natural-language-processing nlp no-code sentiment-analysis text-mining twitter-sentiment-analysis

Last synced: 06 May 2025

https://github.com/govau/galileo

Quantifying interactions with government services to support delivery teams to improve their own products and services

analytics data data-science government observatory pandas python r shiny website

Last synced: 10 Jul 2025

https://github.com/ibm-cloud/iot-device-phone-simulator

A web application which acts as an IoT device when loaded in a smart phone browser. The data from the sensors are then used for Anomaly detection.

anomaly-detection cloud data-science datascience gyroscope-data ibm-cloud-solutions internet-of-things iot iot-device machine-learning mobile-web

Last synced: 11 Jul 2025

https://github.com/houarizegai/datasciencelearning

For learning data science

data-science python

Last synced: 14 Jul 2025

https://github.com/jincheng9/python-tutorial

Python tutorial on quantitative trading, covering basic, intermediate, and advanced material

data data-analysis-python data-analyst data-science django flask numpy pandas python quant quant-dev tutorial

Last synced: 07 May 2025

https://github.com/contextlab/data-wrangler

Wrangle messy numerical, image, and text data into consistent well-organized formats

data data-analysis data-science data-wrangling hugging-face image-data machine-learning nlp numpy pandas python scikit-learn

Last synced: 10 Apr 2025

https://github.com/nathaneastwood/brew-ds

Common Data Science set up for Mac and Linux 🍺🔬

data-science homebrew linuxbrew package-manager

Last synced: 08 Sep 2025

https://github.com/carefree0910/carefree-toolkit

Some commonly used functions and modules

data-science numpy python

Last synced: 19 Jul 2025