
Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/l480/rewe-price-data

๐Ÿช Daily updated prices of all items from the German supermarket chain REWE as CSV (including EAN, grammage, product image etc.)

csv data-science ean inflation prices rewe shrinkflation supermarket

Last synced: 11 Jan 2026

https://github.com/rbhatia46/data-preprocessing-template

This repository includes the data preprocessing steps required before using a dataset with a machine learning model. Refer to the README for usage instructions.

data-preprocessing data-science machine-learning python

Last synced: 11 Apr 2025

https://github.com/hsins/mpl-tc-fonts

🇹🇼 A package that solves the "tofu" problem (missing-glyph boxes) in matplotlib plots when you use Traditional Chinese characters in labels or text.

cjk-characters data-science matplotlib

Last synced: 29 Oct 2025

https://github.com/ptyadana/tableau_2020_a-z_hands-on

Tableau projects for data analysis, data analytics, and data visualization on different datasets.

data-analysis data-science data-visualization tableau tableau-dashboards tableau-desktop tableau-public tableau-workbooks

Last synced: 03 Aug 2025

https://github.com/alvarobartt/ea-associate-ds

Electronic Arts (EA) NLP Assignment for: Associate Data Scientist

data-science electronic-arts nlp recruitment-task

Last synced: 12 Apr 2025

https://github.com/chandraprakash-bathula/apparel-recommendations

This project implements a personalized apparel recommendation engine using content-based search with the Amazon API, NLTK, and Keras libraries.

boxplot cnn-keras data-analysis data-science deep-learning linear-regression machine-learning numpy pandas scatter-plot scikit-learn svm tensorflow xgboost

Last synced: 23 Mar 2025

https://github.com/aruizeac/alexandria

The Alexandria Project is an open-source platform where people can share their knowledge through books, podcasts, docs and videos.

alexandria data-science donation ebooks go golang grpc http kafka knowledge knowledge-sharing library microservice podcasts python societies streaming videos webservice

Last synced: 11 Mar 2026

https://github.com/nas5w/imdb-data

A JSON file of 50,000 IMDB movie reviews to be used in machine learning applications.

data data-science imdb javascript machine-learning

Last synced: 19 Apr 2025

https://github.com/rbhatia46/python-for-data-science

This repository contains IPython notebooks that teach you enough Python to get started on your data science journey.

data-science python-basics python3

Last synced: 03 Sep 2025

https://github.com/numeract/rflow

Flexible R Pipelines with Caching

cache data-science pipeline r rflow

Last synced: 28 May 2026

https://github.com/blurred-machine/data-science

This repository contains the minor projects I built while learning machine learning and data science. Feel free to open a PR with modifications.

algorithms-python data-science jupyter-notebook learning-by-doing machine-learning-algorithms minor-project python

Last synced: 27 Apr 2025

https://github.com/koalaverse/analyticssummit19

Material for the 2019 Analytics Summit "Machine Learning with R" training

data-science educational-materials machine-learning r workshop-materials

Last synced: 15 May 2025

https://github.com/gabrieltempass/abtester

A web application to design and evaluate the results of A/B tests.

ab-testing data-science hypothesis-testing python sample-size statistical-significance statistics streamlit web-app

Last synced: 06 Oct 2025

https://github.com/mertguvencli/keyword-extractor

This project uses named-entity recognition (NER) to find which technologies are trending in data science jobs.

data-science machine-learning ner nlp python spacy

Last synced: 10 Sep 2025

https://github.com/lambdaclass/data_etudes

LambdaClass statistics, machine learning and data science etudes

data-science notebook probability statistics

Last synced: 09 Apr 2025

https://github.com/arv-anshul/yt-watch-history

Analyse your YouTube watch history using Data Science, ML and NLP.

data-science docker docker-compose fastapi ml mlflow mlops mongodb nlp pydantic python3 streamlit youtube-api

Last synced: 22 Apr 2025

https://github.com/ndxdeveloper/formation-python

Formation Python: a Python course from beginner to advanced | 13 modules (FastAPI, Type Hints, Data Science, SQLAlchemy, asyncio) | 75+ topics | 100% in French | MIT License

api-rest asyncio data-science developpement fastapi formation francais french learning numpy pandas poetry poo programmation pytest python python3 sqlalchemy type-hints

Last synced: 08 Apr 2026

https://github.com/urbanclimatefr/coursera-learn-sql-basics-for-data-science

This repository contains the materials to "Learn SQL Basics for Data Science", a specialization provided by University of California, Davis through Coursera.

coursera data-science sql

Last synced: 19 Feb 2026

https://github.com/fabriziomusacchio/python_neuro_practical

This is the course material for an advanced course in Python for data scientists.

data-analysis data-science jupyter jupyter-notebook jupyter-notebooks open-source python teaching teaching-materials

Last synced: 22 Jul 2025

https://github.com/bcgov/canwqdata

R 📦 to download 🇨🇦 open water quality data

data-science env r r-package rlang rstats

Last synced: 20 Jul 2025

https://github.com/mmore500/teeplot

Organize data visualization output, automatically picking meaningful names based on semantic plotting variables.

data-science data-visualization python python-package workflow

Last synced: 25 Feb 2026

https://github.com/kennethleungty/english-premier-league-var-analysis

Analyzing Video Assistant Referee (VAR) decisions in the English Premier League (2019 - 2021)

data-analysis data-analytics data-science english-premier-league football soccer var

Last synced: 27 Aug 2025

https://github.com/firaskahlaoui/heart-disease-analysis-r

R for data visualization and analysis of heart disease datasets.

data-science data-visualization ggplot kaggle-dataset r statistics

Last synced: 14 Apr 2025

https://github.com/bradflaugher/ai-101

Notes, links, code samples, and resources for teaching yourself PyTorch and TensorFlow.

bootcamp course data-engineering data-science learn-to-code learning-by-doing learning-python machine-learning

Last synced: 10 May 2025

https://github.com/networks-learning/discussion-complexity

Code for "On the Complexity of Opinions and Online Discussions", WSDM 2019

complexity data-science discussion online-discussions opinion-mining paper wsdm

Last synced: 10 Aug 2025

https://github.com/joshuaulrich/stl-rug

Content presented at the Saint Louis R User Group

data-analysis data-science r

Last synced: 26 Aug 2025

https://github.com/vianneymi/baker

Project demonstrating a TDS article about structuring unstructured data using LLMs

data-engineering data-mining data-science langchain llm mistralai pydantic

Last synced: 11 Jul 2025

https://github.com/bluegreen-labs/appeears

Interface to the NASA AppEEARS API

api data-science r-package remote-sensing rstats

Last synced: 23 Aug 2025

https://github.com/a-poor/flask-celery-ml

Handling long-running processes (like ML model predictions) inside a Flask app using Celery.

api celery data-science flask machine-learning python

Last synced: 03 Aug 2025

https://github.com/ammarlodhi255/student_performance_indicator_end-to-end_implementation

An end-to-end machine learning project: a student performance indicator. The goal of this project is to understand how parents' background, test preparation, and various other variables influence students' performance.

aws cd-pipeline data-analysis data-science data-science-projects eda end-to-end-machine-learning machine-learning machine-learning-projects regression regression-analysis

Last synced: 27 Sep 2025

https://github.com/aiguofer/sql_connectors

A simple wrapper for SQL connections that uses SQLAlchemy and pandas read_sql to standardize the SQL workflow across multiple data sources.

data-analysis data-analytics data-exploration data-science pandas relational-databases sql sqlalchemy standardized-api

Last synced: 13 Oct 2025

https://github.com/liamarguedas/uber-eats-delivery-time

Delivery time prediction system for Uber Eats

data-science machine-learning regression

Last synced: 10 Oct 2025

https://github.com/opt-nc/setup-duckdb-action

🦆 Blazing-fast and highly customizable GitHub Action to set up a DuckDB runtime

action actions analytics csv data-science database databases dataquality dataqualitycheck duckdb embedded-database github-actions olap sql

Last synced: 16 Mar 2026

https://github.com/nikhilba/aerial-imagery

Data Science Research Project: Map poverty using satellite images.

carnegie-mellon-university data-science deep-learning ipynb neural-network satellite-images vgg16

Last synced: 28 Oct 2025

https://github.com/sithu-khant/math-for-ml-ds

Mathematics learning path for Machine Learning and Data Science.

awesome-list data-science deep-learning machine-learning mathematics

Last synced: 13 Apr 2025

https://github.com/jdiaz97/iucnredlist.jl

API Wrapper for the IUCN Red List.

biodiversity data-science ecology

Last synced: 21 Oct 2025

https://github.com/toddbirchard/planetjupyter

:red_circle: :milky_way: :telescope: Beautify your Jupyter Notebooks.

data-science flask-application ipython jupyter jupyter-notebook jupyter-themes plotly

Last synced: 14 Apr 2025

https://github.com/zgornel/datalinter

Linting tools for ML workflows, data, code

code-analysis-tool coding-agent data-science linting

Last synced: 21 Apr 2026

https://github.com/buccaneerai/rxjs-stats

Moved to @bottlenose/rxstats (https://github.com/buccaneerai/bottlenose)

analytics data data-mining data-science observables reactive rxjs statistics

Last synced: 15 Jul 2025

https://github.com/tkonopka/rcssplot

R plots styled with css

css data-science r visualization

Last synced: 22 Oct 2025

https://github.com/mrtkp9993/anomalydetectioncpp

Simple anomaly detection for univariate time series data.

anomaly-detection cpp data-science statistics

Last synced: 24 Oct 2025

https://github.com/arose13/rosey

Data science utilities for statistics and machine learning

data-science data-visualization keras machine-learning tensorflow

Last synced: 24 Oct 2025

https://github.com/nikhilaravi/neuralnetflix

Movie Genre Prediction from movie posters using Deep Learning

data-science deeplearning

Last synced: 18 Oct 2025

https://github.com/lironmiz/data.intro

An introductory data science course from the Cyber Education Center at Campus IL, covering both the theoretical and the practical aspects of big data analysis in Python.

big-data course data-analysis data-science data-visualization education jupyter-notebook learning-by-doing matplotlib numpy pandas-library python3 statistics

Last synced: 05 Jul 2025

https://github.com/tezansahu/dvc-pycaret-fastapi-demo

Repository for a demo of using DVC with PyCaret for MLOps (DVC Office Hours, 20 January 2022)

data-science demo deployment dvc fastapi machine-learning mlops-workflow pycaret

Last synced: 26 Dec 2025

https://github.com/MCodrescu/octopus

R Package for Interacting with Databases

data-science database r rshiny

Last synced: 29 Jul 2025

https://github.com/teddyoweh/sentiment-analysis-api

The Sentiment Analysis API was built with the Python Flask framework. It lets users pass a text or sentence through the ?text argument and then view the sentiment analysis of that sentence. It can be integrated into a web application.

api data-science flask machine-learning nlp-machine-learning php python sentiment-analysis

Last synced: 09 Apr 2025

https://github.com/gianlucatruda/warfit-learn

A machine learning toolkit for reproducible research in anticoagulant dose estimation.

data-science iwpc pandas preprocessing python reproducible-research sklearn supervised-learning warfarin warfit-learn

Last synced: 24 Oct 2025

https://github.com/supercowpowers/scp-labs

SCP Labs (Open Source Team for SuperCowPowers)

data-analysis data-science pandas python scikit-learn security

Last synced: 06 May 2025

https://github.com/tuanle618/AEDA

AEDA - Automated Data Exploratory Analysis in R

data-science eda eda-report exploratory-data-analysis r

Last synced: 29 Jul 2025

https://github.com/fffaraz/datasets

My collection of random datasets

data-mining data-science dataset

Last synced: 04 Sep 2025

https://github.com/virajbhutada/capstones

This repository contains all the necessary files and documentation for a detailed analysis of bank loan data using a combination of SQL, Power BI, Excel, and Tableau. The project aims to uncover insights related to loan applications, funding, repayments, and borrower demographics, facilitating data-driven decision-making in the banking sector.

bank-loan-analysis dashboard data-science dax-query eda excel excel-dashboard excel-functions mssql-server powerbi powerbi-reports powerbi-visuals sql sql-database tableau tableau-public tableau-server

Last synced: 30 Oct 2025

https://github.com/paritoshtripathi935/product-matching

This project performs product matching via machine learning. It uses techniques such as natural language processing, image recognition, and collaborative filtering to match similar products together.

amazon-scraper collaborative-filtering data-science django flipkart-scraper-python langchain machine-learning nlp opencv product-matching python

Last synced: 08 Jul 2025

https://github.com/cadcad-org/snippets

Repo containing notebooks showcasing features and applications of cadCAD.

cadcad data-science education python simulation snippets

Last synced: 23 Apr 2025

https://github.com/gregyjames/insidebarscanner

Scan every stock listed on the Nasdaq to find those with daily inside bars for trading.

data-science investment pandas-dataframe python3 scanner stock-market stocks yfinance yfinance-api

Last synced: 25 Apr 2025
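An inside bar is commonly defined as a bar whose high is below the previous bar's high and whose low is above the previous bar's low. A minimal sketch of such a scan over one stock's daily bars, assuming that strict definition (some traders use non-strict comparisons) and a hypothetical Bar record; it does not fetch data or reproduce the repository's code.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One daily price bar (hypothetical record; only high and low are needed)."""
    high: float
    low: float


def is_inside_bar(prev: Bar, curr: Bar) -> bool:
    # Strict definition assumed: today's range lies entirely inside yesterday's
    return curr.high < prev.high and curr.low > prev.low


def inside_bar_days(bars: list[Bar]) -> list[int]:
    """Return the indices of bars that are inside bars relative to the day before."""
    return [i for i in range(1, len(bars)) if is_inside_bar(bars[i - 1], bars[i])]


bars = [Bar(10, 5), Bar(9, 6), Bar(11, 4), Bar(10.5, 4.5)]
print(inside_bar_days(bars))  # [1, 3]
```

A full scanner would run this check on the most recent two bars of every ticker; the first bar has no predecessor, so it is never reported.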

https://github.com/fwd/reddit

Graph Visualization UI for Reddit.

data data-science datasets worldnews

Last synced: 24 Apr 2025

https://github.com/nicodupont/resources

Resources on SAS, Python, SQL, VBA-Excel, etc.

airflow data-science data-visualization excel python r sas sql vba

Last synced: 24 Jun 2025

https://github.com/dionhaefner/fowd

Processing framework for FOWD, a free ocean wave dataset, ready for your ML application :ocean:

data-science machine-learning ocean open-data waves

Last synced: 21 Aug 2025

https://github.com/teddyoweh/dimensionality-reduction-pca

Dimensionality reduction is the process of reducing the number of random features (attributes or variables, here called dimensions) in a dataset while retaining as much of the dataset's variation as possible, by keeping only a set of relevant features to increase the efficiency of a model.

data-science dataset dimensional-analysis dimensionality-reduction feature-extraction feature-selection machine-learning

Last synced: 09 Apr 2025
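Principal component analysis (PCA), the technique named in this repository, is one common way to do this. A minimal sketch, assuming NumPy is available; the pca helper is hypothetical and not taken from the repository. It centers the data, takes its singular value decomposition, keeps the top n_components principal axes, and reports the share of total variance those axes retain.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top n_components principal axes.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    X = np.asarray(X, dtype=float)
    # PCA operates on mean-centered features
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each axis is S**2 / (n_samples - 1)
    variance = S**2 / (X.shape[0] - 1)
    retained = variance[:n_components] / variance.sum()
    # Coordinates of each sample in the reduced space
    projected = X_centered @ components.T
    return projected, components, retained


rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two strongly correlated features plus one low-variance noise feature
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200), rng.normal(scale=0.1, size=200)])
Z, axes, retained = pca(X, n_components=1)
print(Z.shape)  # (200, 1)
print(retained)
```

Using the SVD of the centered data rather than forming the covariance matrix explicitly avoids squaring the data's condition number; the retained ratio is what you would inspect to decide how many dimensions to keep.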

https://github.com/mindful-ai-assistants/.github

✯ Empowering businesses with AI-driven technologies like Copilots, Agents, Bots, and Predictions, alongside intelligent Decision-Making Support

agents artificial-intelligence automation copilots data-science descion-making-systems design geolocation jupyter-notebook machine-learning mathematical-modelling mathpix mongodb oneness-consciousness predictive-analytics predictive-modeling python3 sql tsql

Last synced: 11 Jul 2025

https://github.com/durgeshsamariya/100daysofdatascience

A 100-day data science (DS) challenge to learn and implement DS concepts, progressing from data science beginner to data scientist.

100days 100daysofcode 100daysofdscode 100daysofmlcode data data-science

Last synced: 15 Apr 2025

https://github.com/toxpi/toxpir

toxpiR R package for the Toxicological Priority Index (ToxPi) algorithm.

data-science modeling r r-package toxicology

Last synced: 19 Aug 2025