An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/emptymalei/audiorepr

A Python package to represent data using musical notes.

audiolization data data-audiolization data-science

Last synced: 12 Oct 2025

https://github.com/bdist/bdist-workspace

This repository provides containerized applications and microservices for the Information Systems and Databases Course @ Instituto Superior Técnico

data-engineering data-science docker jupyter jupyterlab notebook postgres postgresql python sql sqlite

Last synced: 09 Apr 2026

https://github.com/matteocargnelutti/maguire-lab-seizure-detection-webapp

🧠 Maguire Lab's Deep Learning Seizure Detection WebApp.

data-science eeg-signals-processing neuroscience

Last synced: 21 Apr 2025

https://github.com/lungben/tableio.jl

A glue package for reading and writing tabular data. It aims to provide a uniform API for reading and writing tabular data from and to multiple sources.

arrow csv data data-science database dataframe dataframes excel jdf json-format parquet postgresql sqlite zip

Last synced: 12 Oct 2025

https://github.com/devopscorner/nifi

Production Grade Nifi & Nifi Registry. Deploy for VM (Virtual Machine) with Terraform + Ansible, Helm & Helmfile for Kubernetes (EKS)

ansible data-science data-structures docker docker-compose dockerhub ecr eks eks-cluster etl kubernetes machine-learning ml mlops nifi nifi-registry terraform vpn vpn-client

Last synced: 08 Sep 2025

https://github.com/mindful-ai-assistants/hackapucsp-2024

🏆 HackaPUCSP 2024 - Data Science and AI Hackathon - Pontifical Catholic University of São Paulo

automation data-science design github-actions hackathon-project oneness-consciousness package-manager programming pucsp pytest python3 unittest

Last synced: 11 Jul 2025

https://github.com/shwetajoshi601/world-bank-data-analysis

An Exploratory Data Analysis on the World Bank Dataset.

analysis data-science eda python3 world-bank-api worldbank

Last synced: 02 Aug 2025

https://github.com/openbridge/ob_pysh-db

pysh-db - The Data Science Toolkit (DSK)

bash data-science mysql postgres python redshift sql

Last synced: 10 Apr 2025

https://github.com/fabsta/interesting_notebooks

A collection of Data Science Jupyter notebooks (reference material)

data-science eda jupyter-notebook kaggle machine-learning python

Last synced: 03 Jul 2025

https://github.com/alan-turing-institute/hds-discussiongroup

Repo of the Turing's Humanities & Data Science Discussion Group

data-science digital-humanities discussion-group

Last synced: 03 Mar 2026

https://github.com/clojurecivitas/clojurecivitas.github.io

An open effort to structure learning resources with meaningful connections.

blog clay clojure data-science literate markdown notebooks

Last synced: 24 Jun 2025

https://github.com/eyadsibai/machine-learning-docker-image

Data Science/Machine Learning Docker Image for CPU

data-science docker docker-image google-cloud machine-learning

Last synced: 30 Apr 2025

https://github.com/chaganti-reddy/evmarket-india

Electric Vehicle Market Segmentation Analysis in India

data-analysis data-science machine-learning market-segmentation pandas python

Last synced: 12 Apr 2025

https://github.com/xuri/excelize-py

Excelize is a Python port of the Go Excelize library that allows you to write to and read from XLAM / XLSM / XLSX / XLTM / XLTX files.

calculation chart data-analysis data-science data-visualization ecma-376 excel excelize golang microsoft office ooxml pipy python spreadsheet visualization xlsm xlsx xlsxreader xlsxwriter

Last synced: 07 May 2025

https://github.com/bsomps/OpenGeoPlotter

A PyQt5 app catered to the exploration industry for visualizing geologic drill hole data with features like cross-sections, simple 3D views, strip logs, scatter plots, and downhole line plots. Includes data transformation techniques like factor analysis, desurveying, and alpha-beta conversion.

cross-sections data-science drilling exploration geology geoscience pyqt5 python strip-logs

Last synced: 05 Mar 2025

https://github.com/jimbrig/lossrx

An R package, plumber API, database, and Shiny App for Actuarial Loss Development and Reserving Workflows.

actuarial-science claims-data claims-reserving data-science insurance modelling property-casualty reserving rpackage rshiny rstats workflow

Last synced: 01 Jul 2025

https://github.com/faridrashidi/cnsplots

🎨 Toolkit for generating publication-quality plots for Cell, Nature and Science journals

data-science data-visualization plotting publication-quality python scientific-publications

Last synced: 06 Apr 2026

https://github.com/zen-reportz/zen_dash

Simple, fast, scalable, production-grade dashboard application. The right solution for teams.

dashboard data-analytics data-science fastapi flask python3 shiny streamlit

Last synced: 13 Apr 2025

https://github.com/martincastroalvarez/html2vec

Algorithm that converts an HTML document to a vectorized object suitable for neural networks.

data-science html2vec natural-language-processing python web-scraping word2vec

Last synced: 11 Apr 2025

https://github.com/krypty/trefle

Trefle is a scikit-learn compatible estimator implementing the FuzzyCoCo algorithm that uses a cooperative coevolution algorithm to find and build interpretable fuzzy systems.

data-science deap evolutionary-algorithm fuzzy-logic interpretability machine-learning python scikit-learn

Last synced: 29 Oct 2025

https://github.com/sjcobb/webxr-threejs-midi-visualizer

WebXR, augmented reality MIDI data visualization, built with Three.js and Tone.js. See video: https://youtu.be/lIecCGtbqSM

3d aframe cannonjs data-science data-visualization depth-estimation game-development hit-detection javascript midi music-theory physics three threejs tone tonejs webvr webxr

Last synced: 12 Jul 2025

https://github.com/canagnos/mcp

Tools for Measuring Classification Performance for R, Python and Spark

artificial-intelligence classification data-mining data-science machine-learning machine-learning-algorithms

Last synced: 28 Apr 2025

https://github.com/hoangsonww/standard-deviation-calculator

📊 This repository contains a Standard Deviation Calculator implemented in C++. It provides an efficient algorithm for calculating the statistical standard deviation of a dataset, making it a valuable tool for students, researchers, and analysts seeking a reliable method for data analysis.

algorithms cplusplus cpp data data-analysis data-analytics data-science standard-deviation standard-deviation-calculator standard-deviations

Last synced: 22 Sep 2025

https://github.com/mathewroy/ynabr

Analyze and visualize your You Need A Budget (YNAB) data. YNAB meets R programming language.

api data-analysis data-science data-visualization r ynab ynab-api

Last synced: 30 Jul 2025

https://github.com/elliotwutingfeng/twitter200m

Simple analysis of the Twitter 200M Data Dump of January 2023.

200m data-science haveibeenpwned leak osint twitter

Last synced: 16 Mar 2026

https://github.com/correia-jpv/fucking-awesome-datascience

📝 An awesome Data Science repository to learn from and apply to real-world problems, with repository stars ⭐ and forks 🍴

analytics awesome awesome-list data-mining data-science data-scientists data-visualization deep-learning hacktoberfest machine-learning science

Last synced: 27 Apr 2025

https://github.com/amirhosseinhonardoust/underwriting-decision-safety-lab

A decision-safety lab for loan approval: trains a baseline classifier, calibrates probabilities (ECE/Brier), sweeps confidence thresholds to build a coverage-quality frontier, and outputs a defensible abstention policy (auto-decide vs review). Includes a Streamlit dashboard for report cards, triage UI, and data quality checks.

abstention calibration classification credit-risk data-quality data-science decision-policy loan-approval machine-learning mlops model-evaluation monitoring pandas reliability responsible-ai scikit-learn selective-classification streamlit uncertainty underwriting

Last synced: 10 Jun 2026

https://github.com/eliasdabbas/dash-aggrid-scales

Color scales (continuous and categorical) and bar charts for Dash-Ag-Grid

aggrid color-scales color-scheme data-science data-visualization html plotly-dash table

Last synced: 16 Mar 2026

https://github.com/tristanbilot/airflow-rbac-roles-cli

A tool to create Airflow RBAC roles with dag-level permissions from cli.

airflow cloud-composer data-engineering data-science gcp permissions pipeline rbac-roles

Last synced: 25 Oct 2025

https://github.com/anshchoudhary/xgmodel

This repository contains code to predict the Expected Goals (xG) from shots in football using various machine learning models.

data-science football-analytics football-data machine-learning machine-learning-algorithms

Last synced: 10 Apr 2025

https://github.com/takuti/anompy

A Python library for anomaly detection

anomaly-detection data-science forecasting machine-learning python

Last synced: 15 Apr 2025

https://github.com/qpwedev/blockchain-network-visualizer

Blockchain Network Visualizer for TON.

blockchain data-science network ton toncoin

Last synced: 14 Mar 2025

https://github.com/dovolopor-research/data-science-research-toolbox

🧰 Data science research toolbox

data-science data-science-research data-science-resourses research-resources research-tool visualization

Last synced: 05 Jan 2026

https://github.com/doarakko/kagoole

Search Kaggle competitions and solutions by data type, prediction type, evaluation metric, etc.

artificial-intelligence data-science heroku kaggle kaggle-competition kaggle-solution machine-learning webapp

Last synced: 17 Oct 2025

https://github.com/oceannetworkscanada/api-python-client

Provides easy access to ONC data in Python

api data-science ocean-sciences onc python

Last synced: 20 Jul 2025

https://github.com/sdpython/mlstatpy

Mathematics, Algorithmic, Data-Science, Teaching Materials

algorithms data-science mathematics python3 teaching-materials

Last synced: 23 Jun 2025

https://github.com/zenml-io/template-starter

A template for a starter project for ZenML

cookiecutter copier-template data-science machine-learning mlops zenml

Last synced: 14 Apr 2025

https://github.com/hassaku/audio-plot

Python library that converts a line graph to sound and returns an object that can be played in Jupyter notebook or Google Colab. Values are represented by pitches, and the timeline is represented by left and right pans. It was created to make data science fun for the visually impaired.

audio-plot colab data-science jupyter-notebook python visually-impaired

Last synced: 01 Nov 2025

https://github.com/lucadibello/it-salary-analysis

💰 Analysis of Salaries in IT Roles: DevOps, Cyber Security, and AI

ai cybersecurity data-science devops jupyter-notebook salary-analysis

Last synced: 03 Jul 2025

https://github.com/mertguvencli/keyword-extractor

This project aims to find "what are the trending techs on Data Science jobs?" using NER.

data-science machine-learning ner nlp python spacy

Last synced: 10 Sep 2025

https://github.com/chandraprakash-bathula/apparel-recommendations

This project implements a personalized apparel recommendation engine using content-based search with the Amazon API, NLTK, and Keras libraries.

boxplot cnn-keras data-analysis data-science deep-learning linear-regression machine-learning numpy pandas scatter-plot scikit-learn svm tensorflow xgboost

Last synced: 23 Mar 2025

https://github.com/kennethleungty/english-premier-league-var-analysis

Analyzing Video Assistant Referee (VAR) decisions in the English Premier League (2019 - 2021)

data-analysis data-analytics data-science english-premier-league football soccer var

Last synced: 27 Aug 2025

https://github.com/firaskahlaoui/heart-disease-analysis-r

R for data visualization and analysis of heart disease datasets.

data-science data-visualization ggplot kaggle-dataset r statistics

Last synced: 14 Apr 2025

https://github.com/ndxdeveloper/formation-python

Python training - From beginner to advanced | 13 modules (FastAPI, Type Hints, Data Science, SQLAlchemy, asyncio) | 75+ topics | 100% in French | MIT License

api-rest asyncio data-science developpement fastapi formation francais french learning numpy pandas poetry poo programmation pytest python python3 sqlalchemy type-hints

Last synced: 08 Apr 2026

https://github.com/fabriziomusacchio/python_neuro_practical

This is the course material for the advanced course on Python for Data Scientists.

data-analysis data-science jupyter jupyter-notebook jupyter-notebooks open-source python teaching teaching-materials

Last synced: 22 Jul 2025

https://github.com/lambdaclass/data_etudes

LambdaClass statistics, machine learning and data science etudes

data-science notebook probability statistics

Last synced: 09 Apr 2025

https://github.com/rasmusrynell/predicting-nhl

The project explores the idea of using different machine learning techniques to determine different stats in NHL games.

ai algorithms data-science database machine-learning ml nhl nhl-api python scikit-learn sports sports-analytics sports-stats sportsanalytics

Last synced: 14 Apr 2025

https://github.com/koalaverse/analyticssummit19

Material for 2019 Analytics Summit Machine Learning with R Training

data-science educational-materials machine-learning r workshop-materials

Last synced: 15 May 2025

https://github.com/arv-anshul/yt-watch-history

Analyse your YouTube watch history using Data Science, ML and NLP.

data-science docker docker-compose fastapi ml mlflow mlops mongodb nlp pydantic python3 streamlit youtube-api

Last synced: 22 Apr 2025

https://github.com/aruizeac/alexandria

The Alexandria Project is an open-source platform where people can share their knowledge through books, podcasts, docs and videos.

alexandria data-science donation ebooks go golang grpc http kafka knowledge knowledge-sharing library microservice podcasts python societies streaming videos webservice

Last synced: 11 Mar 2026

https://github.com/dina-hosny/chaincare

ChainCare is a health information system that uses smart contracts to handle medical procedures and stores the medical history in Block Chains.

api-rest bigchain blockchain blockchain-technology data-science data-storage data-visualization ethereum golang health-informatics-systems healthcare insomnia metamask postgresql postman reactjs solidity truffle web3

Last synced: 13 Apr 2026

https://github.com/juniortorresmtj/projeto_deupositivo

Open Data Analysis Project - SUS

alura bootcampds brazil data-science projeto python

Last synced: 29 Jul 2025

https://github.com/bradflaugher/ai-101

Notes, links, code samples, and resources for teaching yourself PyTorch and TensorFlow.

bootcamp course data-engineering data-science learn-to-code learning-by-doing learning-python machine-learning

Last synced: 10 May 2025

https://github.com/mratsim/meilleur-data-scientist-france-2018

My solution for the competition "Le meilleur data scientist de France 2018" (Best Data Scientist of France 2018)

data-science data-science-competition machine-learning xgboost

Last synced: 15 Sep 2025

https://github.com/dhimmel/openskistats

The study of skiing where we shred open data like pow. Quantifying alpine ski areas with geospatial metrics derived from OpenStreetMap.

data-science data-visualization downhill elevation geospatial gis mapping open-data openskimap openstreetmap orientation python quarto ski-areas skiing slope snowpack solar-irradiance sunlight topography

Last synced: 21 Jul 2025

https://github.com/anaclumos/heart-diagnosis-engine

2019 Korean Minjok Leadership Academy graduation project

data-science machine-learning pandas python scikit-learn

Last synced: 22 Aug 2025

https://github.com/nas5w/imdb-data

A JSON file of 50,000 IMDB movie reviews to be used in machine learning applications.

data data-science imdb javascript machine-learning

Last synced: 19 Apr 2025

https://github.com/numeract/rflow

Flexible R Pipelines with Caching

cache data-science pipeline r rflow

Last synced: 28 May 2026

https://github.com/rbhatia46/python-for-data-science

This repository contains IPython notebooks that cover enough Python to get you started on your Data Science journey.

data-science python-basics python3

Last synced: 03 Sep 2025

https://github.com/networks-learning/discussion-complexity

Code for "On the Complexity of Opinions and Online Discussions", WSDM 2019

complexity data-science discussion online-discussions opinion-mining paper wsdm

Last synced: 10 Aug 2025

https://github.com/urbanclimatefr/coursera-learn-sql-basics-for-data-science

This repository contains the materials for "Learn SQL Basics for Data Science", a specialization provided by University of California, Davis through Coursera.

coursera data-science sql

Last synced: 19 Feb 2026