An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/sithu-khant/math-for-ml-ds

Mathematics learning path for Machine Learning and Data Science.

awesome-list data-science deep-learning machine-learning mathematics

Last synced: 13 Apr 2025

https://github.com/lironmiz/data.intro

Introductory data science course from the Cyber Education Center at Campus IL, covering both the theoretical and practical aspects of big data analysis in Python.

big-data course data-analysis data-science data-visualization education jupyter-notebook learning-by-doing matplotlib numpy pandas-library python3 statistics

Last synced: 05 Jul 2025

https://github.com/a-poor/flask-celery-ml

Handling long-running processes (like ML model predictions) inside a Flask app using Celery.

api celery data-science flask machine-learning python

Last synced: 03 Aug 2025

https://github.com/zgornel/datalinter

Linting tools for ML workflows, data, and code

code-analysis-tool coding-agent data-science linting

Last synced: 21 Apr 2026

https://github.com/ihmeuw/easylink

A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.

data-science entity-resolution record-linkage

Last synced: 01 Apr 2026

https://github.com/quantifyearth/yirgacheffe

A declarative geospatial library for Python to make data science with maps easier

data-science geospatial python3

Last synced: 01 Apr 2026

https://github.com/nikhilba/aerial-imagery

Data Science Research Project: Map poverty using satellite images.

carnegie-mellon-university data-science deep-learning ipynb neural-network satellite-images vgg16

Last synced: 28 Oct 2025

https://github.com/tkonopka/rcssplot

R plots styled with CSS

css data-science r visualization

Last synced: 22 Oct 2025

https://github.com/fwd/reddit

Graph Visualization UI for Reddit.

data data-science datasets worldnews

Last synced: 24 Apr 2025

https://github.com/toxpi/toxpir

toxpiR R package for the Toxicological Priority Index (ToxPi) algorithm.

data-science modeling r r-package toxicology

Last synced: 19 Aug 2025

https://github.com/ammarlodhi255/student_performance_indicator_end-to-end_implementation

An end-to-end machine learning project: a student performance indicator. The goal of this project is to understand the influence of the parents' background, test preparation, and various other variables on the students' performance.

aws cd-pipeline data-analysis data-science data-science-projects eda end-to-end-machine-learning machine-learning machine-learning-projects regression regression-analysis

Last synced: 27 Sep 2025

https://github.com/opt-nc/setup-duckdb-action

🦆 Blazing-fast and highly customizable GitHub Action to set up a DuckDB runtime

action actions analytics csv data-science database databases dataquality dataqualitycheck duckdb embedded-database github-actions olap sql

Last synced: 16 Mar 2026

https://github.com/nikhilaravi/neuralnetflix

Movie Genre Prediction from movie posters using Deep Learning

data-science deeplearning

Last synced: 18 Oct 2025

https://github.com/manuparra/volleyball-performance-analysis

R package for volleyball performance analysis and visualization

analysis data-science datavisualization performance r sport volleyball

Last synced: 12 Apr 2025

https://github.com/adityakamble49/loss-ratio-prediction

Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage

big-data big-data-analytics data-science insurance jupyter-notebook politics python

Last synced: 04 Apr 2026

https://github.com/bfortuner/zoosearch

Search engine for machine learning models and datasets

data-science deep-learning fusejs machine-learning react

Last synced: 23 Oct 2025

https://github.com/mljar/variable-inspector

Explore variables in Jupyter notebooks

data-science jupyter jupyterlab jupyterlab-extension mljar python

Last synced: 01 Mar 2025

https://github.com/mahdi-eth/linear-regression-from-scratch

This project implements a Python-based linear regression model from scratch, complete with custom functions for mean squared error and gradient descent. It is tested on data, using features to predict target variables. The project offers a practical introduction to linear regression.

algorithm data-science data-visualization linear-regression machine-learning machine-learning-algorithms python

Last synced: 15 Apr 2025

https://github.com/stainlessai/micronaut-jupyter

A Micronaut configuration that integrates your app with an existing Jupyter installation.

data-science jupyter jupyter-notebooks jupyterlab micronaut microservices

Last synced: 14 Jan 2026

https://github.com/cakecrusher/mimicbot

Mimicbot enables the effortless yet modular creation of an AI chat bot model that imitates another person's manner of speech.

ai bot data-science discord discord-bot huggingface natural-language-processing pypi python python-package

Last synced: 28 Oct 2025

https://github.com/6chaoran/data-story

data story tech-blog

data-science data-visualization

Last synced: 08 Apr 2026

https://github.com/gagolews/programowanie_w_jezyku_r

M. Gągolewski, Programowanie w języku R (Programming in the R Language), PWN, 2016

data-science polski r statistics

Last synced: 14 Jul 2025

https://github.com/scigolib/matlab

Pure Go library for reading and writing MATLAB .mat files (v5-v7.3+). No CGo, no external dependencies. Full support for numeric types, complex numbers, and multi-dimensional arrays. Cross-platform (Windows/Linux/macOS). Part of the SciGoLib ecosystem.

cross-platform data-science go golang hdf5 mat-files matlab no-cgo octave pure-go scientific-computing scientific-data

Last synced: 05 Apr 2026

https://github.com/selva221724/edasql

edaSQL is a Python library that bridges SQL and exploratory data analysis: you connect to a database and run queries, and the query results can be passed to the EDA tool for deeper insight.

correlation data-analysis data-science data-visualization dataprofiling eda missing-values outlier-detection pandas python sql

Last synced: 10 Jun 2025

https://github.com/omarsar/nlp_research

🔥 Summary of interesting NLP Papers and Research (Fast and easy reads!) 🔥

artificial-intelligence data-science deep-learning machine-learning nlp

Last synced: 13 Feb 2026

https://github.com/madhurimarawat/developer-resources-hub

A comprehensive collection of valuable resources for developers, covering job preparation, programming, frontend, backend, IoT, databases, and more.

ai-art ai-ml app-links aptitude awesome-list blockchain coding-questions data-science databases developer-experience developer-tools free-books free-courses full-stack-development graphic-design iot-resources linux llm powerbi python

Last synced: 11 Oct 2025

https://github.com/jincheng9/python-tutorial

Python tutorial for quantitative trading, covering beginner, intermediate, and advanced material

data data-analysis-python data-analyst data-science django flask numpy pandas python quant quant-dev tutorial

Last synced: 07 May 2025

https://github.com/akbaritabar/bibliometric_data_for_demographic_research

Materials for workshop on "Using bibliometric data in demographic research". A report here: https://iussp.org/en/using-bibliometric-data-demographic-research-0

computational-social-science data-science demographic-research migration-research

Last synced: 07 May 2025

https://github.com/chrislemke/sk-transformers

A collection of pandas & scikit-learn compatible transformers for preprocessing and feature engineering 🛠

data-science feature-engineering feature-selection machine-learning pandas preprocessing python scikit-learn scikit-learn-pipelines scikit-learn-transformer

Last synced: 17 Jun 2025

https://github.com/govau/galileo

Quantifying interactions with government services to help delivery teams improve their own products and services

analytics data data-science government observatory pandas python r shiny website

Last synced: 10 Jul 2025

https://github.com/code2k13/nlphose

Enables creation of complex NLP pipelines in seconds, for processing static files or streaming text, using a set of simple command-line tools. Perform multiple operations on text, such as NER, sentiment analysis, chunking, language identification, Q&A, zero-shot classification, and more, by executing a single command in the terminal. Can be used as a low-code or no-code natural language processing solution. Also works with Kubernetes and PySpark.

ai artifical-intelligense data-science language-detection low-code machine-learning named-entity-recognition natural-language-processing nlp no-code sentiment-analysis text-mining twitter-sentiment-analysis

Last synced: 06 May 2025

https://github.com/georgesalkhouri/l3wtransformer

A word hashing method based on vectors of letter n-grams. Currently transforms text into sequences of numbers.

bag-of-words data-science feature-extraction letter-trigram-word-hashing python text-processing

Last synced: 10 Apr 2025

https://github.com/xilinjia/xj-strategist

A powerful machine learning and AI system for constructing sustainable strategies for financial trading.

data-analysis data-science data-visualization julia machine-learning quantitative-analysis quantitative-finance quantitative-trading rest-api trading-algorithms trading-strategies

Last synced: 12 May 2025

https://github.com/houarizegai/datasciencelearning

For learning data science

data-science python

Last synced: 14 Jul 2025

https://github.com/ibm-cloud/iot-device-phone-simulator

A web application that acts as an IoT device when loaded in a smartphone browser. The data from the sensors is then used for anomaly detection.

anomaly-detection cloud data-science datascience gyroscope-data ibm-cloud-solutions internet-of-things iot iot-device machine-learning mobile-web

Last synced: 11 Jul 2025

https://github.com/carefree0910/carefree-toolkit

Some commonly used functions and modules

data-science numpy python

Last synced: 19 Jul 2025

https://github.com/shekohex/my-free-ebooks

My collection of free e-books, for everyone!

data-science ebooks free-ebook learning programming-languages

Last synced: 14 Mar 2026

https://github.com/nathaneastwood/brew-ds

Common data science setup for Mac and Linux 🍺🔬

data-science homebrew linuxbrew package-manager

Last synced: 08 Sep 2025

https://github.com/tushar2704/my_homebrewed_notebooks_archived-account-kaggle.com-tusharaggarwal27

My_homebrewed_NOTEBOOKS is a GitHub repository that houses a collection of personal notebooks derived from various sources, including Kaggle and Jupyter Notebooks. This repository serves as a curated collection of notebooks created and customized by the repository owner, providing a valuable resource for learning and exploring different topics.

data-analysis data-science kaggle kaggle-competition kaggle-competition-notebooks kaggle-competiton kaggle-scripts machine-learning python

Last synced: 07 May 2025

https://github.com/contextlab/data-wrangler

Wrangle messy numerical, image, and text data into consistent well-organized formats

data data-analysis data-science data-wrangling hugging-face image-data machine-learning nlp numpy pandas python scikit-learn

Last synced: 10 Apr 2025

https://github.com/jl33-ai/dotplotlib

A basic extension library for creating tree dot plots, strip plots or dot charts w/ matplotlib or seaborn in Python

data-analysis data-science data-visualization dot-chart dotplot dotplots matplotlib-pyplot matplotlib-python python seaborn seaborn-plots strip-plots

Last synced: 07 Sep 2025

https://github.com/srlozano/tinder-big-data-analysis

Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech

big-data big-data-analytics data-science dating-app mongodb python

Last synced: 11 Oct 2025

https://github.com/bgreenwell/statlingua

Explain Statistical Output with Large Language Models

data-science explainability large-language-models llm llms statistics teaching-tools

Last synced: 28 Feb 2026

https://github.com/clowdr/clowdr

Command-line utility for iteratively developing pipelines, deploying them at scale, and sharing data and derivatives

data-science docker hpc-applications pipelines python singularity

Last synced: 14 Jan 2026

https://github.com/cool-japan/pandrs

DataFrame library for data analysis implemented in Rust. It has features and design inspired by Python's pandas library, combining fast data processing with type safety.

data-analysis data-science datafrane pandas rust rust-lang

Last synced: 04 Apr 2026

https://github.com/serialbandicoot/great-assertions

This library is inspired by the Great Expectations library. It makes the various expectations found in Great Expectations available through Python's built-in unittest assertions.

data-science data-testing databricks great-expectations jupyter-notebook python python3 quality-assurance testing

Last synced: 28 Oct 2025

https://github.com/chifisource/ipycells.jl

cells, pluto, ipython, and olive readers and writers

data-science ipython-notebook julia jupyter-notebook odd-data olive pluto pluto-notebooks

Last synced: 28 Oct 2025

https://github.com/sachinl0har/lgmvip-data-science

Lets Grow More Data Science Internship. Blog 👇🏻

data-science letsgrowmore lgm lgmvip

Last synced: 28 Jul 2025

https://github.com/nalomran/pyreqtl

A collection of Python modules equivalent to the R ReQTL Toolkit, which aims to identify associations between expressed SNVs and their gene expression using RNA-sequencing data.

bioinformatics bioinformatics-analysis bioinformatics-tool data-science gene-expression matrixeqtl numpy pandas python python3 r rna-seq rna-seq-analysis rpy2 scipy snvs

Last synced: 27 Oct 2025

https://github.com/mrdandelion6/learn-to-code

This repository is a collection of my notes and code snippets as I journey through learning different programming languages and coding concepts.

c data-analysis data-science javascript learn-to-code machine-learning matlab python r react shell-script

Last synced: 11 Apr 2025

https://github.com/akshaysharma096/classify-human-diseases-using-deeplearning

Automated methods to detect and classify human diseases from medical images, using Deep Neural Networks

convolutional-neural-networks data-science deep-learning keras keras-neural-networks machine-learning python3

Last synced: 12 Sep 2025

https://github.com/celbridge-org/celbridge

Celbridge is an open source tool that provides a bridge between spreadsheets and Python scripting.

data-science data-visualization excel markdown python spreadjs spreadsheets webviewer

Last synced: 02 Feb 2026

https://github.com/kehaowu/dailypython

Python Daily: five hand-picked Python articles shared every day

data-science data-visualization machine-learning python

Last synced: 10 Mar 2026

https://github.com/storopoli/linguagem-r

Graduate course on the R language for data science at UNINOVE

data-science r-language r-programming r-stats

Last synced: 31 Oct 2025

https://github.com/zjunlp/datamind

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

agent artificial-intelligence data-analysis data-science language-model natural-language-processing

Last synced: 04 Oct 2025

https://github.com/ahmednasef3/data-science-roadmap

A roadmap divided into weeks and tasks to help beginners learn and master data science

beginners data-science master roadmap

Last synced: 03 Jul 2025

https://github.com/giswqs/leafmaptools

A Python package for building a tool-widget infrastructure with ipyleaflet and ipywidgets

data-science data-visualization geopython geospatial ipyleaflet ipywidgets jupyter jupyter-notebook mapping python

Last synced: 12 May 2025

https://github.com/minusxai/minusx

MinusX is an Agentic Business Intelligence platform. It's Claude Code for data.

artificial-intelligence data-analytics data-science jupyter metabase

Last synced: 18 Feb 2026

https://github.com/calkit/calkit-cloud

A platform for creating and sharing knowledge via Calkit projects.

analytics data-science open-science reproducibility reproducible-research research sharing sharing-data

Last synced: 11 Apr 2026

https://github.com/iterative/features

A collection of development container 'features' for machine learning and data science

data-science dvc features machine-learning

Last synced: 18 Jun 2025