An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/adamvvu/snapshot_ensemble

Train TensorFlow Keras models with cosine annealing and save an ensemble of models with no additional computational expense.

data-science deep-learning keras machine-learning python tensorflow

Last synced: 28 Oct 2025

https://github.com/jcolechanged/josh.meanings

A k-means implementation in Clojure that supports clustering on datasets that are larger than memory but smaller than storage.

assumption-free-k-mc clojure-library clustering data-science k-mc k-means k-means-clustering k-means-parallel k-means-plus-plus machine-learning medium-data

Last synced: 13 Apr 2025

https://github.com/stefanrmmr/kaggle_twitter_airline_sentiment

Kaggle Twitter US Airline Sentiment: implementation of a tweet text sentiment analysis model using custom-trained word embeddings and LSTM deep learning [TUM Data Analysis & ML, summer 2021] @adrianbruenger @stefanrmmr

data-science deep-learning kaggle-airline-dataset kaggle-sentiment-analysis kaggle-us-airlines lstm-neural-networks python sentiment-analysis skipgram text-sentiment-classification tweepy tweet-classification tweet-sentiment-analysis twitter twitter-sentiment-analysis us-airline-dataset word2vec

Last synced: 19 Mar 2025

https://github.com/zai-kun/2d-chess-pieces-detection

YOLO11n model for detecting 2D chess boards and pieces

ai chess data-science machine-learning onnxruntime python yolov11

Last synced: 19 Jul 2025

https://github.com/crissyro/base-of-ds

This repository serves as a foundation for projects in Data Science and Machine Learning.

clustering-algorithm data-science data-visualization machine-learning

Last synced: 04 Jul 2025

https://github.com/eikevons/pandas-paddles

Access the parent Pandas data frame in loc[], iloc[], assign(), and other Pandas helpers

data-analysis data-exploration data-science pandas pandas-dataframe pandas-library pandas-loc

Last synced: 16 Jun 2025

https://github.com/praveen1664/chatbot

A chatbot written in Python that gets its inputs directly from a SQL database

chatbot data-science database json nlp python3 sqlite sqlite3

Last synced: 11 Jul 2025

https://github.com/blmoore/summerdatachallenge

My entry for: http://summerdatachallenge.com (I came 3rd)

analytics data-science london r real-estate rstats

Last synced: 30 Apr 2025

https://github.com/chiraag-kakar/pubg

What's the best strategy to win in PUBG? Should you sit in one spot and hide your way into victory, or do you need to be the top shot? Let's let the data do the talking!

data-science feature-engineering machine-learning-algorithms project pubg-api random-forest

Last synced: 07 May 2025

https://github.com/nationalparkservice/qckit

QCkit provides useful functions for data quality control and manipulation including updating data to DarwinCore standards, unit conversions, and data flagging.

darwin-core data-quality data-science npsdataverse quality-control r r-package rstats

Last synced: 22 Jun 2025

https://github.com/vedadiyan/genql

GenQL is a generic querying language fully written in Go

data-analysis data-mapping data-processing data-science data-translation json json-data sql

Last synced: 22 Jun 2025

https://github.com/rahul-jha98/restauranttrends.stats

Visualise the trends in food and restaurant choices of customers in a city by scraping data from Zomato.

data-analysis data-science visualization vuejs zomato zomato-api zomato-scraper

Last synced: 08 Jul 2025

https://github.com/marwan116/supreme-task

A Prefect extension that builds on top of the task decorator to reduce negative engineering!

data-ops data-science infrastructure ml-ops orchestration prefect python workflow

Last synced: 23 Jun 2025

https://github.com/neo4j-graph-examples/contact-tracing

Contact Tracing graph for pandemic spread e.g. COVID-19 based on http://blog.bruggen.com/search/label/contact%20tracing

contact-tracing covid-data covid19 data-science dataset example-data graphdb healthcare neo4j neo4j-approved

Last synced: 18 Jul 2025

https://github.com/yusufcinarci/web-scraping-projects

In these project files, I will host the web scraping examples that I will make day by day.

data-analysis data-science jupyter-notebook python web-scraping

Last synced: 01 May 2025

https://github.com/iguptashubham/online-retail-sales

This Power BI dashboard, designed for marketing strategists, analyzes sales trends and customer behavior. It provides key insights empowering them to identify sales opportunities and optimize marketing campaigns, ultimately boosting business sales.

dashboard data data-analysis data-analysis-project data-analysis-project-powerbi data-analysis-python data-project data-science powerbi project

Last synced: 19 Mar 2026

https://github.com/armanx200/gold-price-prediction

Predicting the future adjusted closing price of Gold ETF using machine learning! 📈✨

arman-kianian data-science data-visualization finance gold-price-prediction machine-learning prediction-models python random-forest regression stock-market time-series-analysis

Last synced: 30 Apr 2025

https://github.com/lucasrodes/pyphoon

Tools for Digital Typhoon DL/ML Project

data-science dataset environment machine-learning tropical-cyclone

Last synced: 18 Mar 2025

https://github.com/hunterdii/iriswise

IrisWise is a machine learning application for predicting Iris flower species. Built with Streamlit, this app provides a user-friendly interface to input flower measurements and receive predictions using various models, including K-Nearest Neighbors, with Random Forest, SVM, and Logistic Regression still in progress.

classifier-model data-science flowers-recognition iris-dataset iris-recognition knn-classification machine-learning pickle python python3 streamlit streamlit-webapp

Last synced: 21 Feb 2026

https://github.com/zaman-hamza/citadel-datathon

My submission to the 2022 East Coast Datathon. The event started on the 21st of March and ended on the 28th, lasting about a week. I was on a team of two that analyzed the non-conventional indicators and instigators of traffic.

citadel data-science data-visualization datathon

Last synced: 10 Apr 2025

https://github.com/davidssmith/rawarray.jl

Raw array (RA) file format for simple, robust, and user-friendly N-dimensional array storage

bytes complex-numbers data-science file-format julia large-dataset large-files ra-format rawarray scientific-computing storage

Last synced: 10 Sep 2025

https://github.com/gjtorikian/destroy-all-monuments

This is data taken from the SPLC report titled "Whose Heritage? Public Symbols of the Confederacy" from April 21, 2016

data-science government-data social-justice

Last synced: 10 Apr 2025

https://github.com/tensorsense/vlm_databuilder

This SDK generates datasets for training Video LLMs from YouTube videos.

data-generation data-science llm video-llms vlm

Last synced: 11 Sep 2025

https://github.com/jbris/stan-cmdstanr-gpu-docker

A Docker image to run Stan, cmdstanr, and brms for Bayesian statistical modelling. GPU support using OpenCL is available.

bayes bayesian-inference brms cmdstan cmdstanr data-science docker posterior probabilistic-programming projpred rstan rstanarm shinystan stan stan-gpu stan-lang stan-math-library tidybayes tidyverse

Last synced: 04 May 2025

https://github.com/whizsid/kddbscan-rs

A Rust library inspired by the kDDBSCAN clustering algorithm

clustering data-science density-based-clustering deviation machine-learning-algorithms pinned

Last synced: 10 Apr 2025

https://github.com/carlos-gg/digitalgarden

NO LONGER MAINTAINED. Go to: https://aigarden.vercel.app/

artificial-intelligence data-science digital-garden knowledge-management machine-learning

Last synced: 07 May 2025

https://github.com/leomaurodesenv/data-science-api-framework

A simple framework to test and deploy your Data Science API

api api-rest data-science dataops docker flask-api python

Last synced: 09 Sep 2025

https://github.com/nikbarb810/pattern-recognition

Basic pattern recognition algorithms implemented in Python

data-science ipynb-jupyter-notebook matplotlib numpy pattern-recognition python

Last synced: 06 Mar 2026

https://github.com/polakowo/textai

Applications using the state of the art in NLP

bert data-science gpt-2 machine-learning nlp telegram-bot transformers

Last synced: 07 May 2025

https://github.com/newjerseystyle/litepolis

The package manager for a customizable e-democracy opinion collection and insight mining system. Built using Python and optimized for scalability and performance.

civic-tech data-science deliberative-democracy litepolis package-manager participatory-democracy

Last synced: 28 Feb 2026

https://github.com/pfed-prog/catalonia_data

We analyzed air quality in Catalonia using data from the Catalan Transparency Portal.

data-science dspyt jupyter-notebook ocean oceanprotocol python3

Last synced: 05 Oct 2025

https://github.com/amirhosseinhonardoust/algorithmic-empath-human-fallibility

A deep exploration of Algorithmic Empathy, the next frontier in AI understanding. This project examines how machines can learn from human fallibility, model disagreement, and align with moral reasoning. It blends psychology, fairness metrics, interpretability, and co-learning design into one framework for humane intelligence.

ai algorithmic-bias co-learning cognitive-science data-science empathy ethics fairness human-centered-ai intelligence interpretability machine-learning neural-networks neurosymbolic philosophy psychology reflective-ai research responsible-ai xai

Last synced: 28 Feb 2026

https://github.com/oscarqjh/ntu_sc1015_project

A mini project for NTU's data science and artificial intelligence module - analysis of League of Legends competitive matches

data-science machine-learning pandas python scikit-learn

Last synced: 09 Apr 2025

https://github.com/hritik5102/shala2020

MastAI ki paathSHALA : Data Science, Machine Learning, and Deep Learning codes with explanation and reference links 👨‍💻

artificial-intelligence computer-vision data-science deep-learning machine-learning statistics

Last synced: 12 Oct 2025

https://github.com/trafficgcn/optimal_path_dijkstra_for_data_science

Plotting the Optimal Route in Python for Data Scientists using the Dijkstra Algorithm

data-science dijkstra dijkstra-algorithm dijkstra-shortest-path map mapping open-street-map optimal-route osm osmnx python

Last synced: 27 Oct 2025

https://github.com/sevdanurgenc/r-programming-for-data-science-lecture-notes

This repo holds the course contents of the R Programming For Data Science training, which will be given to Sigorta Bilgi ve Gözetim Merkezi in cooperation with Academy Peak Information Technologies Training and Consultancy, 21 - 23 March 2023.

data-analysis data-science data-visualization r r-programming r-programming-projects

Last synced: 11 Oct 2025

https://github.com/joaopfonseca/ml-research

A Python library with utilities for Machine Learning research and algorithm implementations

active-learning data-science machine-learning python scikit-learn

Last synced: 26 Oct 2025

https://github.com/ahammadmejbah/python-problem-statement-and-solutions

Create a Python Problem Statement to challenge programmers. Specify a task, input/output requirements, and constraints. Ensure clarity and complexity to evaluate coding skills effectively.

data-science machine machine-learning python python3

Last synced: 27 Apr 2025

https://github.com/caerbannogwhite/aargh

A library that helps you out of data nightmares in Go. 🧙‍♂️

csv data data-science data-wrangling dataframe go golang html json linq statistics stats xlsx xpt

Last synced: 14 Jan 2026

https://github.com/sferez/twitter_toolbox

Complete toolbox for scraping, streaming, interacting with the API, cleaning, preprocessing, and applying NLP to Twitter data

data-collection data-science nlp preprocessing twitter twitter-api twitter-scraping twitter-streaming-api

Last synced: 10 Apr 2025

https://github.com/marianogappa/sctool

Starcraft: Remastered replay analyzer library and CLI tool

cli cli-app data-science replays starcraft starcraft-broodwar starcraft-remastered

Last synced: 17 Apr 2026

https://github.com/zsxkib/ttds-g35-cw3

TTDS Group Project: Video Games Search Engine. Sakib Ahamed, Dan Buxton, Kenza Amira, Wini Lau, Mansoor Ahmad

corpora data-science neural-ranking-models pagerank query search-engine technologies text text-analysis text-classification ttds web-search

Last synced: 10 Apr 2025

https://github.com/taharallouche/hakeem

Flexible crowdsourced data labeling solutions for scarce and incomplete annotations

crowdsourcing data-science datalabeling python

Last synced: 10 Oct 2025

https://github.com/insightsengineering/nest

Website for the Nest project 🪺

clinical-trial-analysis data-science nest r shiny website

Last synced: 12 Sep 2025

https://github.com/gbeckers/birdwatcher

A Python computer vision library for animal behavior

animal behavior computer-vision data-science ffmpeg opencv python science

Last synced: 13 Oct 2025

https://github.com/milos-agathon/forest_map_europe

This repo demonstrates how to easily overlay community polygons on forest cover data and make a beautiful map using R and ggplot2

data-science data-visualization ggplot2 gis r satellite-imagery spatial-analysis zonal-statistics

Last synced: 04 Apr 2026

https://github.com/akcarsten/non_negative_matrix_factorization

From scratch Python implementation of the Non-Negative Matrix Factorization algorithm.

clustering data-science machine-learning python

Last synced: 11 Mar 2026

https://github.com/cricksmaidiene/mids_machine_learning

🤖 A unified repository of coursework fragments from UC Berkeley MIDS ML courses

coursework data-science generative-ai jupyter-notebook machine-learning numpy pandas prompt-engineering scikit-learn spark tensorflow uc-berkeley

Last synced: 10 Oct 2025

https://github.com/ucla-biostat-203b/2023winter

Course webpage for UCLA Biostat 203B (Intro. to Data Science)

biostatistics data-science docker machine-learning r

Last synced: 07 Sep 2025

https://github.com/joschnitzbauer/dalymi

A lightweight, data-focused and non-opinionated pipeline manager written in and for Python.

dag data data-science pipeline python workflow

Last synced: 14 Jan 2026

https://github.com/raynardj/forgebox

The deep learning tool box

data-science machine-learning nlp pandas-dataframe

Last synced: 16 Oct 2025

https://github.com/pufanyi/genderrecognitionbyvoice

NTU SC1015 Group Project - Gender Recognition by Voice

data-science machine-learning voice-recognition

Last synced: 09 Feb 2026

https://github.com/ikegwukc/csc-405-605_spring_2022

Introductory Data Science Course Taught at UNCG (Spring 2022)

data-science datascience

Last synced: 09 Oct 2025

https://github.com/wlongxiang/dutch_traffic_monitor

Visualize traffic on the Dutch highway A9 as an example

computer-vision data-science deep-learning object-detection opencv visualization

Last synced: 04 Aug 2025

https://github.com/tuliosg/cdp

Repository for the course "Ciência de Dados para Pesquisa" (Data Science for Research).

data-analysis data-manipulation data-science data-visualization google-colab jupyter-notebook python

Last synced: 03 Mar 2026

https://github.com/erictleung/data-science

:computer: Repository for teaching materials and notes on machine learning and data science for freeCodeCamp

data-cleaning data-engineering data-science data-visualization freecodecamp learning machine-learning mathematics notes python statistics

Last synced: 25 Mar 2025

https://github.com/aianytime/fishvision

FishVision, built using Streamlit, identifies the fish species in a given image. It is trained on "A Large Scale Fish Data", available on Kaggle, using the pre-trained model "MobileNetV2".

data-science deep-learning deep-neural-networks heroku heroku-deployment machine-learning machine-learning-algorithms mobilenetv2 python python3 streamlit streamlit-webapp

Last synced: 01 Sep 2025

https://github.com/zapier/awsjavasdk

Boilerplate rJava Access to the AWS Java SDK

awsjavasdk cran data-science

Last synced: 14 Apr 2025

https://github.com/abhinav-ark/mal_lyrics_analysis

Preprocessing and EDA on a Dataset of Malayalam Songs and Lyrics

data-science eda jupyter-notebook python

Last synced: 22 Jul 2025

https://github.com/snowflakedb/snowpark-checkpoints

Snowpark Python / Spark Migration Testing Tools

data-analytics data-engineering data-science python snowflake sql

Last synced: 31 Aug 2025

https://github.com/prakalp-pande/twitter-sentiment-analysis

Analyze public opinion on Twitter by mining and processing live Twitter data. Employ machine learning to classify tweets as positive, negative, or neutral, then visualize sentiment trends and identify key influencers.

data-science twitter-sentiment-analysis

Last synced: 18 Aug 2025

https://github.com/fusky-labs/pacopanda-drawing-stats

A case study and data analysis project that collects drawings from the furry artist Paco Panda

data-science data-visualization fastapi furries furry furry-fandom pandas python

Last synced: 09 Aug 2025

https://github.com/chris-santiago/steps

A scikit-learn-style feature selector using best subsets and stepwise regression.

best-subset-selection data-science python scikit-learn stepwise-selection

Last synced: 28 Jun 2025

https://github.com/datadistillr/datadistillr-python-sdk

A Python SDK for Programmatically Interacting with DataDistillr

apache-drill data data-science datadistillr jupyter sql

Last synced: 01 Jul 2025

https://github.com/wgierke/git_better

3rd-placed solution for the informatiCup2017

data-science docker docker-image heroku machine-learning tensorboard

Last synced: 24 Mar 2025

https://github.com/firelink-sh/evolve-py

A highly efficient, composable, and lightweight ETL and data integration framework.

analytics arrow big-data data data-engineering data-integration data-science duckdb elt etl ingestion ingress ml olap pipeline polars postgresql python s3

Last synced: 10 Mar 2026

https://github.com/ahammadmejbah/ultimate-data-science-resources

🚀 Welcome to the Unlimited Data Science Resources community! Dive into a wealth of knowledge with curated tutorials, courses, and insights. Elevate your data science journey with boundless learning opportunities! 📊✨

data-engineering data-mining data-science data-visualization database datascience

Last synced: 26 Feb 2025