An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/amirhosseinhonardoust/underwriting-decision-safety-lab

A decision-safety lab for loan approval: trains a baseline classifier, calibrates probabilities (ECE/Brier), sweeps confidence thresholds to build a coverage-quality frontier, and outputs a defensible abstention policy (auto-decide vs. review). Includes a Streamlit dashboard for report cards, a triage UI, and data quality checks.

abstention calibration classification credit-risk data-quality data-science decision-policy loan-approval machine-learning mlops model-evaluation monitoring pandas reliability responsible-ai scikit-learn selective-classification streamlit uncertainty underwriting

Last synced: 10 Jun 2026

https://github.com/bdist/bdist-workspace

This repository provides containerized applications and microservices for the Information Systems and Databases Course @ Instituto Superior Técnico

data-engineering data-science docker jupyter jupyterlab notebook postgres postgresql python sql sqlite

Last synced: 09 Apr 2026

https://github.com/elliotwutingfeng/twitter200m

Simple analysis of the Twitter 200M Data Dump of January 2023.

200m data-science haveibeenpwned leak osint twitter

Last synced: 16 Mar 2026

https://github.com/chaganti-reddy/evmarket-india

Electric Vehicle Market Segmentation Analysis in India

data-analysis data-science machine-learning market-segmentation pandas python

Last synced: 12 Apr 2025

https://github.com/sjcobb/webxr-threejs-midi-visualizer

WebXR, augmented reality MIDI data visualization, built with Three.js and Tone.js. See video: https://youtu.be/lIecCGtbqSM

3d aframe cannonjs data-science data-visualization depth-estimation game-development hit-detection javascript midi music-theory physics three threejs tone tonejs webvr webxr

Last synced: 12 Jul 2025

https://github.com/eyadsibai/machine-learning-docker-image

Data Science/Machine Learning Docker Image for CPU

data-science docker docker-image google-cloud machine-learning

Last synced: 30 Apr 2025

https://github.com/clojurecivitas/clojurecivitas.github.io

An open effort to structure learning resources with meaningful connections.

blog clay clojure data-science literate markdown notebooks

Last synced: 24 Jun 2025

https://github.com/xuri/excelize-py

Excelize is a Python port of the Go Excelize library that allows you to read from and write to XLAM / XLSM / XLSX / XLTM / XLTX files.

calculation chart data-analysis data-science data-visualization ecma-376 excel excelize golang microsoft office ooxml pipy python spreadsheet visualization xlsm xlsx xlsxreader xlsxwriter

Last synced: 07 May 2025

https://github.com/martincastroalvarez/html2vec

Algorithm that converts an HTML document to a vectorized object suitable for neural networks.

data-science html2vec natural-language-processing python web-scraping word2vec

Last synced: 11 Apr 2025

https://github.com/alan-turing-institute/hds-discussiongroup

Repo of the Turing's Humanities & Data Science Discussion Group

data-science digital-humanities discussion-group

Last synced: 03 Mar 2026

https://github.com/faridrashidi/cnsplots

🎨 Toolkit for generating publication-quality plots for Cell, Nature and Science journals

data-science data-visualization plotting publication-quality python scientific-publications

Last synced: 06 Apr 2026

https://github.com/jimbrig/lossrx

An R package, plumber API, database, and Shiny App for Actuarial Loss Development and Reserving Workflows.

actuarial-science claims-data claims-reserving data-science insurance modelling property-casualty reserving rpackage rshiny rstats workflow

Last synced: 01 Jul 2025

https://github.com/fabsta/interesting_notebooks

A collection of Data Science Jupyter notebooks (reference material)

data-science eda jupyter-notebook kaggle machine-learning python

Last synced: 03 Jul 2025

https://github.com/zen-reportz/zen_dash

Simple, fast, scalable, production-grade dashboard application. The right solution for teams.

dashboard data-analytics data-science fastapi flask python3 shiny streamlit

Last synced: 13 Apr 2025

https://github.com/krypty/trefle

Trefle is a scikit-learn compatible estimator implementing the FuzzyCoCo algorithm that uses a cooperative coevolution algorithm to find and build interpretable fuzzy systems.

data-science deap evolutionary-algorithm fuzzy-logic interpretability machine-learning python scikit-learn

Last synced: 29 Oct 2025

https://github.com/openbridge/ob_pysh-db

pysh-db - The Data Science Toolkit (DSK)

bash data-science mysql postgres python redshift sql

Last synced: 10 Apr 2025

https://github.com/lungben/tableio.jl

A glue package for reading and writing tabular data. It aims to provide a uniform API for reading tabular data from, and writing it to, multiple sources.

arrow csv data data-science database dataframe dataframes excel jdf json-format parquet postgresql sqlite zip

Last synced: 12 Oct 2025

https://github.com/shwetajoshi601/world-bank-data-analysis

An Exploratory Data Analysis on the World Bank Dataset.

analysis data-science eda python3 world-bank-api worldbank

Last synced: 02 Aug 2025

https://github.com/dovolopor-research/data-science-research-toolbox

🧰 Data science research toolbox

data-science data-science-research data-science-resourses research-resources research-tool visualization

Last synced: 05 Jan 2026

https://github.com/oceannetworkscanada/api-python-client

Provides easy access to ONC data in Python

api data-science ocean-sciences onc python

Last synced: 20 Jul 2025

https://github.com/anshchoudhary/xgmodel

This repository contains code to predict the Expected Goals (xG) from shots in football using various machine learning models.

data-science football-analytics football-data machine-learning machine-learning-algorithms

Last synced: 10 Apr 2025

https://github.com/emptymalei/audiorepr

A Python package to represent data using musical notes.

audiolization data data-audiolization data-science

Last synced: 12 Oct 2025

https://github.com/tristanbilot/airflow-rbac-roles-cli

A tool to create Airflow RBAC roles with DAG-level permissions from the CLI.

airflow cloud-composer data-engineering data-science gcp permissions pipeline rbac-roles

Last synced: 25 Oct 2025

https://github.com/canagnos/mcp

Tools for Measuring Classification Performance for R, Python and Spark

artificial-intelligence classification data-mining data-science machine-learning machine-learning-algorithms

Last synced: 28 Apr 2025

https://github.com/eliasdabbas/dash-aggrid-scales

Color scales (continuous and categorical) and bar charts for Dash-Ag-Grid

aggrid color-scales color-scheme data-science data-visualization html plotly-dash table

Last synced: 16 Mar 2026

https://github.com/doarakko/kagoole

Search Kaggle competitions and solutions by data type, prediction type, evaluation metric, etc.

artificial-intelligence data-science heroku kaggle kaggle-competition kaggle-solution machine-learning webapp

Last synced: 17 Oct 2025

https://github.com/mathewroy/ynabr

Analyze and visualize your You Need A Budget (YNAB) data. YNAB meets R programming language.

api data-analysis data-science data-visualization r ynab ynab-api

Last synced: 30 Jul 2025

https://github.com/matteocargnelutti/maguire-lab-seizure-detection-webapp

🧠 Maguire Lab's Deep Learning Seizure Detection WebApp.

data-science eeg-signals-processing neuroscience

Last synced: 21 Apr 2025

https://github.com/devopscorner/nifi

Production-grade NiFi and NiFi Registry. Deploys to VMs (virtual machines) with Terraform and Ansible, or to Kubernetes (EKS) with Helm and Helmfile.

ansible data-science data-structures docker docker-compose dockerhub ecr eks eks-cluster etl kubernetes machine-learning ml mlops nifi nifi-registry terraform vpn vpn-client

Last synced: 08 Sep 2025

https://github.com/sdpython/mlstatpy

Mathematics, Algorithmic, Data-Science, Teaching Materials

algorithms data-science mathematics python3 teaching-materials

Last synced: 23 Jun 2025

https://github.com/zenml-io/template-starter

A template for a starter project for ZenML

cookiecutter copier-template data-science machine-learning mlops zenml

Last synced: 14 Apr 2025

https://github.com/hoangsonww/standard-deviation-calculator

📊 This repository contains a Standard Deviation Calculator implemented in C++. It provides an efficient algorithm for calculating the statistical standard deviation of a dataset, making it a valuable tool for students, researchers, and analysts seeking a reliable method for data analysis.

algorithms cplusplus cpp data data-analysis data-analytics data-science standard-deviation standard-deviation-calculator standard-deviations

Last synced: 22 Sep 2025

https://github.com/qpwedev/blockchain-network-visualizer

Blockchain Network Visualizer for TON.

blockchain data-science network ton toncoin

Last synced: 14 Mar 2025

https://github.com/takuti/anompy

A Python library for anomaly detection

anomaly-detection data-science forecasting machine-learning python

Last synced: 15 Apr 2025

https://github.com/bsomps/OpenGeoPlotter

A PyQt5 app catered to the exploration industry for visualizing geologic drill hole data with features like cross-sections, simple 3D views, strip logs, scatter plots, and downhole line plots. Includes data transformation techniques like factor analysis, desurveying, and alpha-beta conversion.

cross-sections data-science drilling exploration geology geoscience pyqt5 python strip-logs

Last synced: 05 Mar 2025

https://github.com/mindful-ai-assistants/hackapucsp-2024

🏆 HackaPUCSP 2024 - Data Science and AI Hackathon - Pontifical Catholic University of São Paulo

automation data-science design github-actions hackathon-project oneness-consciousness package-manager programming pucsp pytest python3 unittest

Last synced: 11 Jul 2025

https://github.com/arv-anshul/yt-watch-history

Analyse your YouTube watch history using Data Science, ML and NLP.

data-science docker docker-compose fastapi ml mlflow mlops mongodb nlp pydantic python3 streamlit youtube-api

Last synced: 22 Apr 2025

https://github.com/koalaverse/analyticssummit19

Material for 2019 Analytics Summit Machine Learning with R Training

data-science educational-materials machine-learning r workshop-materials

Last synced: 15 May 2025

https://github.com/nas5w/imdb-data

A JSON file of 50,000 IMDB movie reviews to be used in machine learning applications.

data data-science imdb javascript machine-learning

Last synced: 19 Apr 2025

https://github.com/mratsim/meilleur-data-scientist-france-2018

My solution for the competition "Le meilleur data scientist de France 2018" (Best Data Scientist of France 2018)

data-science data-science-competition machine-learning xgboost

Last synced: 15 Sep 2025

https://github.com/dhimmel/openskistats

The study of skiing where we shred open data like pow. Quantifying alpine ski areas with geospatial metrics derived from OpenStreetMap.

data-science data-visualization downhill elevation geospatial gis mapping open-data openskimap openstreetmap orientation python quarto ski-areas skiing slope snowpack solar-irradiance sunlight topography

Last synced: 21 Jul 2025

https://github.com/firaskahlaoui/heart-disease-analysis-r

R for data visualization and analysis of heart disease datasets.

data-science data-visualization ggplot kaggle-dataset r statistics

Last synced: 14 Apr 2025

https://github.com/networks-learning/discussion-complexity

Code for "On the Complexity of Opinions and Online Discussions", WSDM 2019

complexity data-science discussion online-discussions opinion-mining paper wsdm

Last synced: 10 Aug 2025

https://github.com/anaclumos/heart-diagnosis-engine

2019 graduation project for Korean Minjok Leadership Academy

data-science machine-learning pandas python scikit-learn

Last synced: 22 Aug 2025

https://github.com/rbhatia46/python-for-data-science

This repository contains IPython notebooks that cover enough Python to get you started on your data science journey.

data-science python-basics python3

Last synced: 03 Sep 2025

https://github.com/strazto/mandrake

📖🐉- Bring reading the manual 📖 closer to your drake 🐉 workflow 🔥

data-science drake high-performance-computing makefile pipeline r r-package reproducibility reproducible-research rstats workflow

Last synced: 13 Jul 2025

https://github.com/mertguvencli/keyword-extractor

This project aims to find "what are the trending techs on Data Science jobs?" using NER.

data-science machine-learning ner nlp python spacy

Last synced: 10 Sep 2025

https://github.com/numeract/rflow

Flexible R Pipelines with Caching

cache data-science pipeline r rflow

Last synced: 28 May 2026

https://github.com/chandraprakash-bathula/apparel-recommendations

This project implements a personalized apparel recommendation engine using content-based search with the Amazon API, NLTK, and Keras libraries.

boxplot cnn-keras data-analysis data-science deep-learning linear-regression machine-learning numpy pandas scatter-plot scikit-learn svm tensorflow xgboost

Last synced: 23 Mar 2025

https://github.com/kennethleungty/english-premier-league-var-analysis

Analyzing Video Assistant Referee (VAR) decisions in the English Premier League (2019 - 2021)

data-analysis data-analytics data-science english-premier-league football soccer var

Last synced: 27 Aug 2025

https://github.com/lambdaclass/data_etudes

LambdaClass statistics, machine learning and data science etudes

data-science notebook probability statistics

Last synced: 09 Apr 2025

https://github.com/blurred-machine/data-science

This repository contains all of the minor projects I built during the learning phase of Machine Learning and Data Science. Feel free to create a PR for modifications.

algorithms-python data-science jupyter-notebook learning-by-doing machine-learning-algorithms minor-project python

Last synced: 27 Apr 2025

https://github.com/urbanclimatefr/coursera-learn-sql-basics-for-data-science

This repository contains the materials for "Learn SQL Basics for Data Science", a specialization offered by the University of California, Davis through Coursera.

coursera data-science sql

Last synced: 19 Feb 2026

https://github.com/rbhatia46/data-preprocessing-template

This repository includes all the data preprocessing steps required before using a dataset with a machine learning model. Please refer to the README for usage instructions.

data-preprocessing data-science machine-learning python

Last synced: 11 Apr 2025

https://github.com/hsins/mpl-tc-fonts

🇹🇼 A package to solve the problem of "Tofu" in your matplotlib plots whenever you're trying to use Traditional Chinese characters in labels or texts.

cjk-characters data-science matplotlib

Last synced: 29 Oct 2025