An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/bsomps/OpenGeoPlotter

A PyQt5 app catered to the exploration industry for visualizing geologic drill hole data with features like cross-sections, simple 3D views, strip logs, scatter plots, and downhole line plots. Includes data transformation techniques like factor analysis, desurveying, and alpha-beta conversion.

cross-sections data-science drilling exploration geology geoscience pyqt5 python strip-logs

Last synced: 05 Mar 2025

https://github.com/sjcobb/webxr-threejs-midi-visualizer

WebXR, augmented reality MIDI data visualization, built with Three.js and Tone.js. See video: https://youtu.be/lIecCGtbqSM

3d aframe cannonjs data-science data-visualization depth-estimation game-development hit-detection javascript midi music-theory physics three threejs tone tonejs webvr webxr

Last synced: 12 Jul 2025

https://github.com/sdpython/mlstatpy

Mathematics, Algorithmic, Data-Science, Teaching Materials

algorithms data-science mathematics python3 teaching-materials

Last synced: 23 Jun 2025

https://github.com/matteocargnelutti/maguire-lab-seizure-detection-webapp

🧠 Maguire Lab's Deep Learning Seizure Detection WebApp.

data-science eeg-signals-processing neuroscience

Last synced: 21 Apr 2025

https://github.com/eliasdabbas/dash-aggrid-scales

Color scales (continuous and categorical) and bar charts for Dash-Ag-Grid

aggrid color-scales color-scheme data-science data-visualization html plotly-dash table

Last synced: 16 Mar 2026

https://github.com/correia-jpv/fucking-awesome-datascience

📝 An awesome Data Science repository to learn from and apply to real-world problems. With repository stars⭐ and forks🍴

analytics awesome awesome-list data-mining data-science data-scientists data-visualization deep-learning hacktoberfest machine-learning science

Last synced: 27 Apr 2025

https://github.com/oceannetworkscanada/api-python-client

Provides easy access to ONC data in Python

api data-science ocean-sciences onc python

Last synced: 20 Jul 2025

https://github.com/shwetajoshi601/world-bank-data-analysis

An Exploratory Data Analysis on the World Bank Dataset.

analysis data-science eda python3 world-bank-api worldbank

Last synced: 02 Aug 2025

https://github.com/devopscorner/nifi

Production-grade NiFi & NiFi Registry. Deploy to a VM (virtual machine) with Terraform + Ansible, or to Kubernetes (EKS) with Helm & Helmfile.

ansible data-science data-structures docker docker-compose dockerhub ecr eks eks-cluster etl kubernetes machine-learning ml mlops nifi nifi-registry terraform vpn vpn-client

Last synced: 08 Sep 2025

https://github.com/tristanbilot/airflow-rbac-roles-cli

A tool to create Airflow RBAC roles with DAG-level permissions from the CLI.

airflow cloud-composer data-engineering data-science gcp permissions pipeline rbac-roles

Last synced: 25 Oct 2025

https://github.com/clojurecivitas/clojurecivitas.github.io

An open effort to structure learning resources with meaningful connections.

blog clay clojure data-science literate markdown notebooks

Last synced: 24 Jun 2025

https://github.com/doarakko/kagoole

Search Kaggle competitions and solutions by data type, prediction type, evaluation metric, etc.

artificial-intelligence data-science heroku kaggle kaggle-competition kaggle-solution machine-learning webapp

Last synced: 17 Oct 2025

https://github.com/mathewroy/ynabr

Analyze and visualize your You Need A Budget (YNAB) data. YNAB meets R programming language.

api data-analysis data-science data-visualization r ynab ynab-api

Last synced: 30 Jul 2025

https://github.com/hoangsonww/standard-deviation-calculator

📊 This repository contains a Standard Deviation Calculator implemented in C++. It provides an efficient algorithm for calculating the statistical standard deviation of a dataset, making it a valuable tool for students, researchers, and analysts seeking a reliable method for data analysis.

algorithms cplusplus cpp data data-analysis data-analytics data-science standard-deviation standard-deviation-calculator standard-deviations

Last synced: 22 Sep 2025

https://github.com/zen-reportz/zen_dash

Simple, fast, scalable, production-grade dashboard application. The right solution for teams.

dashboard data-analytics data-science fastapi flask python3 shiny streamlit

Last synced: 13 Apr 2025

https://github.com/xuri/excelize-py

Excelize is a Python port of the Go Excelize library that allows you to write to and read from XLAM / XLSM / XLSX / XLTM / XLTX files.

calculation chart data-analysis data-science data-visualization ecma-376 excel excelize golang microsoft office ooxml pipy python spreadsheet visualization xlsm xlsx xlsxreader xlsxwriter

Last synced: 07 May 2025

https://github.com/fabsta/interesting_notebooks

A collection of Data Science Jupyter notebooks (reference material)

data-science eda jupyter-notebook kaggle machine-learning python

Last synced: 03 Jul 2025

https://github.com/krypty/trefle

Trefle is a scikit-learn compatible estimator implementing the FuzzyCoCo algorithm that uses a cooperative coevolution algorithm to find and build interpretable fuzzy systems.

data-science deap evolutionary-algorithm fuzzy-logic interpretability machine-learning python scikit-learn

Last synced: 29 Oct 2025

https://github.com/chaganti-reddy/evmarket-india

Electric Vehicle Market Segmentation Analysis in India

data-analysis data-science machine-learning market-segmentation pandas python

Last synced: 12 Apr 2025

https://github.com/martincastroalvarez/html2vec

Algorithm that converts an HTML document to a vectorized object suitable for neural networks.

data-science html2vec natural-language-processing python web-scraping word2vec

Last synced: 11 Apr 2025

https://github.com/eyadsibai/machine-learning-docker-image

Data Science/Machine Learning Docker Image for CPU

data-science docker docker-image google-cloud machine-learning

Last synced: 30 Apr 2025

https://github.com/alan-turing-institute/hds-discussiongroup

Repo of the Turing's Humanities & Data Science Discussion Group

data-science digital-humanities discussion-group

Last synced: 03 Mar 2026

https://github.com/jimbrig/lossrx

An R package, plumber API, database, and Shiny App for Actuarial Loss Development and Reserving Workflows.

actuarial-science claims-data claims-reserving data-science insurance modelling property-casualty reserving rpackage rshiny rstats workflow

Last synced: 01 Jul 2025

https://github.com/faridrashidi/cnsplots

🎨 Toolkit for generating publication-quality plots for Cell, Nature and Science journals

data-science data-visualization plotting publication-quality python scientific-publications

Last synced: 06 Apr 2026

https://github.com/zenml-io/template-starter

A template for a starter project for ZenML

cookiecutter copier-template data-science machine-learning mlops zenml

Last synced: 14 Apr 2025

https://github.com/openbridge/ob_pysh-db

pysh-db - The Data Science Toolkit (DSK)

bash data-science mysql postgres python redshift sql

Last synced: 10 Apr 2025

https://github.com/bdist/bdist-workspace

This repository provides containerized applications and microservices for the Information Systems and Databases Course @ Instituto Superior Técnico

data-engineering data-science docker jupyter jupyterlab notebook postgres postgresql python sql sqlite

Last synced: 09 Apr 2026

https://github.com/amirhosseinhonardoust/underwriting-decision-safety-lab

A decision-safety lab for loan approval: trains a baseline classifier, calibrates probabilities (ECE/Brier), sweeps confidence thresholds to build a coverage-quality frontier, and outputs a defensible abstention policy (auto-decide vs. review). Includes a Streamlit dashboard for report cards, triage UI, and data quality checks.

abstention calibration classification credit-risk data-quality data-science decision-policy loan-approval machine-learning mlops model-evaluation monitoring pandas reliability responsible-ai scikit-learn selective-classification streamlit uncertainty underwriting

Last synced: 10 Jun 2026

https://github.com/takuti/anompy

A Python library for anomaly detection

anomaly-detection data-science forecasting machine-learning python

Last synced: 15 Apr 2025

https://github.com/mindful-ai-assistants/hackapucsp-2024

🏆 HackaPUCSP 2024 - Data Science and AI Hackathon - Pontifical Catholic University of São Paulo

automation data-science design github-actions hackathon-project oneness-consciousness package-manager programming pucsp pytest python3 unittest

Last synced: 11 Jul 2025

https://github.com/qpwedev/blockchain-network-visualizer

Blockchain Network Visualizer for TON.

blockchain data-science network ton toncoin

Last synced: 14 Mar 2025

https://github.com/anshchoudhary/xgmodel

This repository contains code to predict the Expected Goals (xG) from shots in football using various machine learning models.

data-science football-analytics football-data machine-learning machine-learning-algorithms

Last synced: 10 Apr 2025

https://github.com/emptymalei/audiorepr

A Python package to represent data using musical notes.

audiolization data data-audiolization data-science

Last synced: 12 Oct 2025

https://github.com/lungben/tableio.jl

A glue package for reading and writing tabular data. It aims to provide a uniform API for reading and writing tabular data from and to multiple sources.

arrow csv data data-science database dataframe dataframes excel jdf json-format parquet postgresql sqlite zip

Last synced: 12 Oct 2025

https://github.com/canagnos/mcp

Tools for Measuring Classification Performance for R, Python and Spark

artificial-intelligence classification data-mining data-science machine-learning machine-learning-algorithms

Last synced: 28 Apr 2025

https://github.com/elliotwutingfeng/twitter200m

Simple analysis of the Twitter 200M Data Dump of January 2023.

200m data-science haveibeenpwned leak osint twitter

Last synced: 16 Mar 2026

https://github.com/dina-hosny/chaincare

ChainCare is a health information system that uses smart contracts to handle medical procedures and stores medical history on blockchains.

api-rest bigchain blockchain blockchain-technology data-science data-storage data-visualization ethereum golang health-informatics-systems healthcare insomnia metamask postgresql postman reactjs solidity truffle web3

Last synced: 13 Apr 2026

https://github.com/strazto/mandrake

📖🐉- Bring reading the manual 📖 closer to your drake 🐉 workflow 🔥

data-science drake high-performance-computing makefile pipeline r r-package reproducibility reproducible-research rstats workflow

Last synced: 13 Jul 2025

https://github.com/urbanclimatefr/coursera-learn-sql-basics-for-data-science

This repository contains the materials for "Learn SQL Basics for Data Science", a specialization provided by the University of California, Davis through Coursera.

coursera data-science sql

Last synced: 19 Feb 2026

https://github.com/networks-learning/discussion-complexity

Code for "On the Complexity of Opinions and Online Discussions", WSDM 2019

complexity data-science discussion online-discussions opinion-mining paper wsdm

Last synced: 10 Aug 2025

https://github.com/gabrieltempass/abtester

A web application to design and evaluate the results of A/B tests.

ab-testing data-science hypothesis-testing python sample-size statistical-significance statistics streamlit web-app

Last synced: 06 Oct 2025

https://github.com/arv-anshul/yt-watch-history

Analyse your YouTube watch history using Data Science, ML and NLP.

data-science docker docker-compose fastapi ml mlflow mlops mongodb nlp pydantic python3 streamlit youtube-api

Last synced: 22 Apr 2025

https://github.com/kennethleungty/english-premier-league-var-analysis

Analyzing Video Assistant Referee (VAR) decisions in the English Premier League (2019 - 2021)

data-analysis data-analytics data-science english-premier-league football soccer var

Last synced: 27 Aug 2025

https://github.com/chandraprakash-bathula/apparel-recommendations

This project implements a personalized apparel recommendation engine using content-based search with the Amazon API, NLTK, and Keras libraries.

boxplot cnn-keras data-analysis data-science deep-learning linear-regression machine-learning numpy pandas scatter-plot scikit-learn svm tensorflow xgboost

Last synced: 23 Mar 2025

https://github.com/alvarobartt/ea-associate-ds

Electronic Arts (EA) NLP Assignment for: Associate Data Scientist

data-science electronic-arts nlp recruitment-task

Last synced: 12 Apr 2025

https://github.com/dhhruv/stock-price-prediction

A deep learning project in which a model was trained using LSTM layers, and Tata stock prices were predicted and compared with their actual values.

algorithm cli college-project data data-science dataset deep-learning jupyter jupyter-notebook lstm machine-learning prediction science shell stock-price-prediction tata-beverages terminal

Last synced: 03 May 2025

https://github.com/thomasnield/oreilly_kotlin_for_data_science

Notes, slides, and contents for the O'Reilly videos using Kotlin for Data Science

data-engineering data-science etl kotlin oreilly statistics

Last synced: 27 Mar 2025

https://github.com/ndxdeveloper/formation-python

Python training - from beginner to advanced | 13 modules (FastAPI, Type Hints, Data Science, SQLAlchemy, asyncio) | 75+ topics | 100% in French | MIT License

api-rest asyncio data-science developpement fastapi formation francais french learning numpy pandas poetry poo programmation pytest python python3 sqlalchemy type-hints

Last synced: 08 Apr 2026

https://github.com/luminousmen/python_for_ds

Python for Data Analysis workshop

data-analysis data-science python tutorial

Last synced: 01 May 2025

https://github.com/florents-tselai/sqlite-for-data-scientists

Notebooks and supporting files for the SQLite for Data Scientists Online Live Training, on the O'Reilly Learning Platform

data-science learning sql sqlite3 training-materials

Last synced: 11 Apr 2025

https://github.com/l480/rewe-price-data

🏪 Daily updated prices of all items from the German supermarket chain REWE as CSV (including EAN, grammage, product image etc.)

csv data-science ean inflation prices rewe shrinkflation supermarket

Last synced: 11 Jan 2026

https://github.com/yangfa-zhang/lunax

Lunax is a machine learning framework specifically designed for the processing and analysis of tabular data.

data-analysis data-science lunax machine-learning tabular-data

Last synced: 14 Dec 2025

https://github.com/joaocarabetta/project-templates

Fast Project Templates

data-science python template

Last synced: 19 Sep 2025

https://github.com/ptyadana/tableau_2020_a-z_hands-on

Tableau projects for data analysis, data analytics, and data visualization on different data sets

data-analysis data-science data-visualization tableau tableau-dashboards tableau-desktop tableau-public tableau-workbooks

Last synced: 03 Aug 2025

https://github.com/bcgov/canwqdata

R 📦 to download 🇨🇦 open water quality data

data-science env r r-package rlang rstats

Last synced: 20 Jul 2025

https://github.com/cimentadaj/dataharvesting

Material for the course 'Data Harvesting' for the master's in computational social science - UC3M

api data-science r web-scraping

Last synced: 30 Apr 2025

https://github.com/thecoderpinar/spotify_trends_2023_analysis

Exploring Spotify's latest trends, top songs, genres, and artists using Python, Pandas, NumPy, Matplotlib, CNNs for image-based analysis, and advanced algorithms for music recommendation. Dive into the world of music data and discover what's trending on Spotify! 🎵📊

cnn cnn-keras data-analysis data-science data-visualization machine-learning matplotlib music-trend numpy pandas python spotify

Last synced: 30 Apr 2025