An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/thomasnield/oreilly_kotlin_for_data_science

Notes, slides, and contents for the O'Reilly videos using Kotlin for Data Science

data-engineering data-science etl kotlin oreilly statistics

Last synced: 27 Mar 2025

https://github.com/l480/rewe-price-data

🏪 Daily updated prices of all items from the German supermarket chain REWE as CSV (including EAN, grammage, product image etc.)

csv data-science ean inflation prices rewe shrinkflation supermarket

Last synced: 11 Jan 2026

https://github.com/yangfa-zhang/lunax

Lunax is a machine learning framework specifically designed for the processing and analysis of tabular data.

data-analysis data-science lunax machine-learning tabular-data

Last synced: 14 Dec 2025

https://github.com/florents-tselai/sqlite-for-data-scientists

Notebooks and supporting files for the "SQLite for Data Scientists" online live training on the O'Reilly learning platform

data-science learning sql sqlite3 training-materials

Last synced: 11 Apr 2025

https://github.com/dhhruv/stock-price-prediction

A deep learning project in which a model was trained using LSTM layers to predict Tata stock prices, and the predictions were compared with their actual values.

algorithm cli college-project data data-science dataset deep-learning jupyter jupyter-notebook lstm machine-learning prediction science shell stock-price-prediction tata-beverages terminal

Last synced: 03 May 2025

https://github.com/klarna-incubator/mleko

Simplify and accelerate your machine learning development with mleko. Designed with modularity and customization in mind, it seamlessly integrates into your existing workflows. Its robust caching system optimizes performance, taking you from data ingestion to finalized models with unparalleled efficiency.

artificial-intelligence data-science machine-learning pipeline python vaex

Last synced: 11 Apr 2025

https://github.com/rbhatia46/data-preprocessing-template

This repository includes all the data preprocessing required before using a dataset with a machine learning model. Please refer to the README for usage instructions.

data-preprocessing data-science machine-learning python

Last synced: 11 Apr 2025

https://github.com/hsins/mpl-tc-fonts

🇹🇼 A package to solve the problem of "Tofu" in your matplotlib plots whenever you're trying to use Traditional Chinese characters in labels or texts.

cjk-characters data-science matplotlib

Last synced: 29 Oct 2025

https://github.com/bcgov/canwqdata

R 📦 to download 🇨🇦 open water quality data

data-science env r r-package rlang rstats

Last synced: 20 Jul 2025

https://github.com/luminousmen/python_for_ds

Python for Data Analysis workshop

data-analysis data-science python tutorial

Last synced: 01 May 2025

https://github.com/thecoderpinar/spotify_trends_2023_analysis

Exploring Spotify's latest trends, top songs, genres, and artists using Python, Pandas, NumPy, Matplotlib, CNNs for image-based analysis, and advanced algorithms for music recommendation. Dive into the world of music data and discover what's trending on Spotify! 🎵📊

cnn cnn-keras data-analysis data-science data-visualization machine-learning matplotlib music-trend numpy pandas python spotify

Last synced: 30 Apr 2025

https://github.com/ptyadana/tableau_2020_a-z_hands-on

Tableau projects for data analysis, data analytics, and data visualization on different data sets

data-analysis data-science data-visualization tableau tableau-dashboards tableau-desktop tableau-public tableau-workbooks

Last synced: 03 Aug 2025

https://github.com/doubleml/doubleml-serverless

DoubleML-Serverless - Distributed Double Machine Learning with a Serverless Architecture

aws-lambda causal-inference data-science double-machine-learning econometrics machine-learning python scikit-learn serverless statistics

Last synced: 07 May 2025

https://github.com/joaocarabetta/project-templates

Fast Project Templates

data-science python template

Last synced: 19 Sep 2025

https://github.com/cimentadaj/dataharvesting

Material for the course 'Data Harvesting' for the master's in computational social science - UC3M

api data-science r web-scraping

Last synced: 30 Apr 2025

https://github.com/rasmusrynell/predicting-nhl

The project explores the idea of using different machine learning techniques to determine different stats in NHL games.

ai algorithms data-science database machine-learning ml nhl nhl-api python scikit-learn sports sports-analytics sports-stats sportsanalytics

Last synced: 14 Apr 2025

https://github.com/dhimmel/openskistats

The study of skiing where we shred open data like pow. Quantifying alpine ski areas with geospatial metrics derived from OpenStreetMap.

data-science data-visualization downhill elevation geospatial gis mapping open-data openskimap openstreetmap orientation python quarto ski-areas skiing slope snowpack solar-irradiance sunlight topography

Last synced: 21 Jul 2025

https://github.com/bradflaugher/ai-101

Notes, links, code samples, and resources for teaching yourself PyTorch and TensorFlow.

bootcamp course data-engineering data-science learn-to-code learning-by-doing learning-python machine-learning

Last synced: 10 May 2025

https://github.com/anaclumos/heart-diagnosis-engine

2019 Korean Minjok Leadership Academy (민족사관고등학교) graduation project

data-science machine-learning pandas python scikit-learn

Last synced: 22 Aug 2025

https://github.com/networks-learning/discussion-complexity

Code for "On the Complexity of Opinions and Online Discussions", WSDM 2019

complexity data-science discussion online-discussions opinion-mining paper wsdm

Last synced: 10 Aug 2025

https://github.com/mratsim/meilleur-data-scientist-france-2018

My solution for the competition "Le meilleur data scientist de France 2018" (Best Data Scientist of France 2018)

data-science data-science-competition machine-learning xgboost

Last synced: 15 Sep 2025

https://github.com/fabriziomusacchio/python_neuro_practical

This is the course material for the advanced course on Python for Data Scientists.

data-analysis data-science jupyter jupyter-notebook jupyter-notebooks open-source python teaching teaching-materials

Last synced: 22 Jul 2025

https://github.com/hassaku/audio-plot

Python library that converts a line graph to sound and returns an object that can be played in a Jupyter notebook or Google Colab. Values are represented by pitches, and the timeline is represented by left and right pans. It was created to make data science fun for the visually impaired.

audio-plot colab data-science jupyter-notebook python visually-impaired

Last synced: 01 Nov 2025

https://github.com/firaskahlaoui/heart-disease-analysis-r

R for data visualization and analysis of heart disease datasets.

data-science data-visualization ggplot kaggle-dataset r statistics

Last synced: 14 Apr 2025

https://github.com/ndxdeveloper/formation-python

Python training course - From beginner to advanced | 13 modules (FastAPI, Type Hints, Data Science, SQLAlchemy, asyncio) | 75+ topics | 100% in French | MIT License

api-rest asyncio data-science developpement fastapi formation francais french learning numpy pandas poetry poo programmation pytest python python3 sqlalchemy type-hints

Last synced: 08 Apr 2026

https://github.com/juniortorresmtj/projeto_deupositivo

Open Data Analysis Project - SUS

alura bootcampds brazil data-science projeto python

Last synced: 29 Jul 2025

https://github.com/MCodrescu/octopus

R Package for Interacting with Databases

data-science database r rshiny

Last synced: 29 Jul 2025

https://github.com/adilshamim8/100-ai-machine-learning-deep-learnin-projects

100 AI Machine Learning Deep Learning Projects is a curated repository showcasing innovative, production-ready solutions across computer vision, NLP, and more.

ai artificial-intelligence computer-vision computer-vision-projects data-science deep-learning deep-learning-projects machine-learning machine-learning-projects nlp nlp-projects python

Last synced: 20 Apr 2026

https://github.com/durgeshsamariya/100daysofdatascience

A 100-day DS challenge to learn and implement DS concepts, progressing from Data Science beginner to Data Scientist.

100days 100daysofcode 100daysofdscode 100daysofmlcode data data-science

Last synced: 15 Apr 2025

https://github.com/ammarlodhi255/student_performance_indicator_end-to-end_implementation

An end-to-end machine learning project: a student performance indicator. The goal of this project is to understand the influence of the parents' background, test preparation, and various other variables on the students' performance.

aws cd-pipeline data-analysis data-science data-science-projects eda end-to-end-machine-learning machine-learning machine-learning-projects regression regression-analysis

Last synced: 27 Sep 2025

https://github.com/bluegreen-labs/appeears

Interface to the NASA AppEEARS API

api data-science r-package remote-sensing rstats

Last synced: 23 Aug 2025

https://github.com/tezansahu/dvc-pycaret-fastapi-demo

Repository for the Demo of using DVC with PyCaret & MLOps (DVC Office Hours - 20th Jan, 2022)

data-science demo deployment dvc fastapi machine-learning mlops-workflow pycaret

Last synced: 26 Dec 2025

https://github.com/fwd/reddit

Graph Visualization UI for Reddit.

data data-science datasets worldnews

Last synced: 24 Apr 2025

https://github.com/synthesized-io/insight

🧿 Metrics & Monitoring of Datasets

data data-analysis data-science framework insights metrics monitoring python

Last synced: 24 Jun 2025

https://github.com/fabiosmuu/rna

This repository is intended to demonstrate a neural network module that I have been developing.

algorithms data-science ia inteligencia-artificial redes-neurais-artificiais rna

Last synced: 10 Apr 2025

https://github.com/buccaneerai/rxjs-stats

Moved to @bottlenose/rxstats (https://github.com/buccaneerai/bottlenose)

analytics data data-mining data-science observables reactive rxjs statistics

Last synced: 15 Jul 2025

https://github.com/codewithmuh/insatgram-ai-model

Create high-quality images effortlessly for your brand using Fooocus, an advanced image generation software.

ai ai-models artificial-intelligence chatgpt data-science generative-ai-model generative-ai-tools generative-model instagram machine-learning models text-to-image

Last synced: 10 Apr 2025

https://github.com/gabrieldim/calculation-cholesterol-data-science

Cholesterol is calculated from the given set of data.

convolutional-layers data-science dense layer

Last synced: 07 Jul 2025

https://github.com/sahahn/bpt

The Brain Predictability toolbox (BPt) is a Python-based machine learning library designed primarily for tabular and neuroimaging-specific data, but it can easily be generalized further.

bp bpt brain-predictability-toolbox data-analysis data-science machine-learning ml neuroimaging-data neuroscience neuroscience-methods pandas python sklearn

Last synced: 13 Apr 2025

https://github.com/vianneymi/baker

Project demonstrating a TDS article about structuring unstructured data using LLMs

data-engineering data-mining data-science langchain llm mistralai pydantic

Last synced: 11 Jul 2025

https://github.com/zohaib58/gdsc-dsx2022

Google Developers Student Club - Data Science Bootcamp 2022

data-science

Last synced: 05 May 2025

https://github.com/millengustavo/demo-datasus-streamlit

Demo Application with DataSUS death records and Streamlit

data-science datasus health healthcare streamlit

Last synced: 10 Apr 2025

https://github.com/virajbhutada/capstones

This repository contains all the necessary files and documentation for a detailed analysis of bank loan data using a combination of SQL, Power BI, Excel, and Tableau. The project aims to uncover insights related to loan applications, funding, repayments, and borrower demographics, facilitating data-driven decision-making in the banking sector.

bank-loan-analysis dashboard data-science dax-query eda excel excel-dashboard excel-functions mssql-server powerbi powerbi-reports powerbi-visuals sql sql-database tableau tableau-public tableau-server

Last synced: 30 Oct 2025

https://github.com/sithu-khant/math-for-ml-ds

Mathematics learning path for Machine Learning and Data Science.

awesome-list data-science deep-learning machine-learning mathematics

Last synced: 13 Apr 2025

https://github.com/torkamanilab/zoish

Zoish is a Python package that streamlines machine learning by leveraging SHAP values for feature selection and interpretability, making model development more efficient and user-friendly

automl data-science feature-engineering feature-selection machine-learning python scikit-learn

Last synced: 10 Apr 2025

https://github.com/tkonopka/rcssplot

R plots styled with css

css data-science r visualization

Last synced: 22 Oct 2025

https://github.com/fffaraz/datasets

My collection of random datasets

data-mining data-science dataset

Last synced: 04 Sep 2025

https://github.com/nicodupont/resources

Resources on SAS, Python, SQL, VBA-Excel, etc ...

airflow data-science data-visualization excel python r sas sql vba

Last synced: 24 Jun 2025

https://github.com/teddyoweh/dimensionality-reduction-pca

Dimensionality reduction is the process of reducing the number of features (also called attributes, variables, or, in this case, dimensions) in a dataset while preserving as much of the dataset's variation as possible, keeping only the relevant features to increase the efficiency of a model.

data-science dataset dimensional-analysis dimensionality-reduction feature-extraction feature-selection machine-learning

Last synced: 09 Apr 2025

https://github.com/nikhilaravi/neuralnetflix

Movie Genre Prediction from movie posters using Deep Learning

data-science deeplearning

Last synced: 18 Oct 2025

https://github.com/mrtkp9993/anomalydetectioncpp

Simple anomaly detection for univariate time series data.

anomaly-detection cpp data-science statistics

Last synced: 24 Oct 2025

https://github.com/supercowpowers/scp-labs

SCP Labs (Open Source Team for SuperCowPowers)

data-analysis data-science pandas python scikit-learn security

Last synced: 06 May 2025

https://github.com/jdiaz97/iucnredlist.jl

API Wrapper for the IUCN Red List.

biodiversity data-science ecology

Last synced: 21 Oct 2025

https://github.com/quantifyearth/yirgacheffe

A declarative geospatial library for Python to make data-science with maps easier

data-science geospatial python3

Last synced: 01 Apr 2026

https://github.com/arose13/rosey

Data science utilities for statistics and machine learning

data-science data-visualization keras machine-learning tensorflow

Last synced: 24 Oct 2025

https://github.com/gianlucatruda/warfit-learn

A machine learning toolkit for reproducible research in anticoagulant dose estimation.

data-science iwpc pandas preprocessing python reproducible-research sklearn supervised-learning warfarin warfit-learn

Last synced: 24 Oct 2025

https://github.com/amirhosseinhonardoust/workout-efficiency-benchmark

Streamlit + Python pipeline that benchmarks gym workout efficiency (kcal/min) using present sessions only. Generates sortable workout-type benchmarks, distribution plots, fairness-aware gap analysis with uncertainty/low-sample flags, and a data-quality report to prevent misleading comparisons.

analytics benchmarking bias-audit dashboard data-analysis data-quality data-science eda fairness fitness health-data pandas plotly python reporting reproducible-research statistics streamlit visualization workout

Last synced: 10 Jun 2026

https://github.com/zgornel/datalinter

Linting tools for ML workflows, data, code

code-analysis-tool coding-agent data-science linting

Last synced: 21 Apr 2026

https://github.com/aiguofer/sql_connectors

A simple wrapper for SQL connections using SQLAlchemy and Pandas read_sql to standardize SQL workflow with multiple data sources.

data-analysis data-analytics data-exploration data-science pandas relational-databases sql sqlalchemy standardized-api

Last synced: 13 Oct 2025

https://github.com/liamarguedas/uber-eats-delivery-time

Delivery time prediction system for Uber Eats

data-science machine-learning regression

Last synced: 10 Oct 2025

https://github.com/toxpi/toxpir

toxpiR R package for the Toxicological Priority Index (ToxPi) algorithm.

data-science modeling r r-package toxicology

Last synced: 19 Aug 2025

https://github.com/nikhilba/aerial-imagery

Data Science Research Project: Map poverty using satellite images.

carnegie-mellon-university data-science deep-learning ipynb neural-network satellite-images vgg16

Last synced: 28 Oct 2025

https://github.com/ihmeuw/easylink

A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.

data-science entity-resolution record-linkage

Last synced: 01 Apr 2026