An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/hfawaz/miccai18

Evaluating surgical skills from kinematic data using convolutional neural networks

class-activation-maps cnn cnn-keras data-science deep-learning research-paper surgery surgical time-series-classification

Last synced: 09 Apr 2025

https://github.com/saranshbansal/data-science-with-python

Data science with Python: This repository mostly contains DataCamp data-science courses/exercises that I have completed.

data-analysis data-science datacamp-exercises numpy python

Last synced: 07 Oct 2025

https://github.com/aarkue/rust4pm

Rust4PM: Rust for Process Mining. For more details, documentation and the Python/Web UI Bindings visit https://rust4pm.aarkue.eu/!

data-science process-mining processmining

Last synced: 15 Mar 2026

https://github.com/montanaz0r/mma-parser-for-sherdog-and-ufc-data

Python web scraper for Sherdog & UFC data. Writes output in the format of your choice, CSV or JSON.

beautifulsoup data-science mma python ufc webscraping

Last synced: 12 Aug 2025

https://github.com/graphbookai/graphbook

The framework for AI-driven data pipelines. Build interactive, highly efficient data pipelines with PyTorch. ⭐ Leave a star to support us!

ai data-processing data-processing-pipelines data-science framework machine-learning ml pytorch research workflow

Last synced: 07 Sep 2025

https://github.com/gagolews/teaching-data

Dr Marek's Data for Teaching/Training

data data-science data-wrangling datasets machine-learning

Last synced: 03 Jan 2026

https://github.com/chiarorosa/ia_aprendizado_maquina_basico

Basic material on Artificial Intelligence, applying Machine Learning and Data Science

artificial-intelligence data-science machine-learning python

Last synced: 26 Jul 2025

https://github.com/azure/azure-data-labs

Terraform templates to deploy Azure Data resources

analytics azure blueprints data data-science github github-actions labs terraform

Last synced: 20 Oct 2025

https://github.com/octoenergy/tentaclio

Single repository regrouping IO connectors used in the data world.

data data-science database-connections protocols python3

Last synced: 24 Jun 2025

https://github.com/hoangsonww/global-covid19-analysis

🌍 This repository hosts an in-depth analysis of COVID-19's impact across five key countries from Jan 2020 to Dec 2021. Through advanced data analysis and visualization, we aim to provide insights into how the pandemic evolved differently across these nations, shedding light on the effectiveness of various health measures and vaccination campaigns.

covid covid-19 covid19-tracker data data-analysis data-analytics data-science data-visualization ggplot2 julia julia-language python r r-language r-markdown r-programming sas sas-programming stata vaccination

Last synced: 10 Apr 2025

https://github.com/hneth/ds4psy

Data science for psychologists (ds4psy): R package supporting book and course

data-literacy data-science education exploratory-data-analysis psychology r r-package social-sciences visualisation

Last synced: 14 Apr 2025

https://github.com/gagolews/genie

Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)

cluster cluster-analysis clustering data-analysis data-mining data-science datascience genie hierarchical-clustering-algorithm machine-learning machine-learning-algorithms outliers r

Last synced: 14 Jul 2025

https://github.com/esri/arcpy

Resources and ideas about arcpy and Python in ArcGIS.

arcgis-enterprise arcgis-pro arcgis-server arcpy data-science esri geoprocessing gis jupyter python

Last synced: 07 Jul 2025

https://github.com/naqvis/crysda

Crystal library for Data Analysis, Wrangling, Munging

crystal crystal-lang crystal-language crystal-shard data-a data-science data-wrangling

Last synced: 22 Jun 2025

https://github.com/mainakrepositor/covid19-india-bcr

A bar chart race demonstrating the start and trends of COVID-19 in India

barchartrace covid-19 data-science data-visualization dataanalysisandmlusingpython visualization

Last synced: 02 May 2025

https://github.com/rfordatascience/r4dswebsite

Public repository for the R4DS community website.

blogdown data-analysis data-analytics data-science data-visualization r r4ds tidyverse

Last synced: 11 Apr 2025

https://github.com/nrennie/data-science-resources

Resources relating to data science.

data-science resources

Last synced: 02 Apr 2025

https://github.com/kalebu/desktop-chatbot-app

A knowledge-based chatbot application written in Python and built with Tkinter

chatbot chatbot-application data-science nlp nlp-projects python-tanzania python3 tanzania

Last synced: 07 May 2025

https://github.com/njanakiev/scalable-geospatial-data-science

Scripts and notebooks for scalable geospatial data science

data-science geospatial python

Last synced: 10 Apr 2025

https://github.com/morgan-sell/caiso-price-forecast

Predicts the CAISO day-ahead market hourly prices using different forecasting methods including ARIMA and LSTM.

arima data-science electricity-prices lstm neural-networks python time-series

Last synced: 22 Jun 2025

https://github.com/loaiabdalslam/dbd

Demo By Demo Machine Learning Book Written in Arabic

book data-science deep-learning machine-learning

Last synced: 29 Jan 2026

https://github.com/ragibhasan894/phishing_website_detection

This project detects phishing, fraudulent, and malicious websites using Random Forest classification. Implemented in Python with the Django framework.

cyber-security data-mining data-science django django-framework machine-learning phsihing python random-forest scikit-learn security

Last synced: 26 Oct 2025

https://github.com/cjtu/craterpy

A Python library for impact crater data science

data-science open-science planetary python

Last synced: 10 Apr 2026

https://github.com/mindbeam/mindbase

A database for convergent intersubjectivity

data-science database language ontologies

Last synced: 08 Apr 2025

https://github.com/heavyai/heavyai.jl

Julia client for OmniSci GPU-accelerated SQL engine and analytics platform

cuda data-science database gpu julia-language julia-package julialang sql

Last synced: 13 Aug 2025

https://github.com/imvladikon/yandex-practicum

Tasks and projects from the data science course by Yandex.Practicum

data-science jupyter-notebook

Last synced: 28 Apr 2025

https://github.com/pyurbans/urbans

A tool for translating text from a source grammar to a target grammar (context-free) with a corresponding dictionary.

artificial-intelligence data-science machine-translation nlp python

Last synced: 24 Apr 2025

https://github.com/mr-easy/badminton-stroke-classification

Classifying badminton strokes based on data from accelerometer and gyroscope sensors attached to the player's wrist. An end-to-end Machine Learning project, from data collection and preprocessing to final model evaluation.

badminton-stroke-classification data-analysis data-analytics data-science deep-learning machine-learning model-evaluation notebook project time-series-analysis tutorial

Last synced: 31 Aug 2025

https://github.com/gbeckers/darr

A Python library for numpy arrays that persist on disk in a format that is simple, self-documented and tool-independent, and maximizes universal readability.

array bsd-3-clause data-science data-sharing data-storage idl interoperability jagged-array julia-language maple mathematica matlab numeric octave python r ragged-array science scilab

Last synced: 21 Aug 2025

https://github.com/autonomio/studio

GUI for Keras and TensorFlow with integrated hyperparameter optimization and NLP

ai artificial-intelligence data-science deep-learning hyperparameter-optimization hyperparameter-tuning keras tensorflow

Last synced: 07 Apr 2025

https://github.com/earth-artificial-intelligence/earth_ai_book_materials

This repo contains the source code, notebooks, and technical resources that help students work through the book Artificial Intelligence in Earth Science.

data-science earth-science machine-learning python

Last synced: 08 May 2025

https://github.com/khuyentran1401/python_snippet

Python and data science snippets on the command line

cli command-line command-line-tool data-science python python3 snippet

Last synced: 13 Apr 2025

https://github.com/rubydamodar/loan-approval-prediction-

Loan approval prediction is a popular machine learning project, especially in the banking and finance industry. The goal of this project is to build a predictive model that can determine whether a loan application will be approved or not based on the applicant's information such as income, credit history, and loan amount.

ai-in-finance banking classification classification-internal credit-risk data-science exploratory-data-analysis feature-engineering financial-analytics loan-approval machine-learning matplotlib pandas predictive-modeling python scikit-learn seaborn visualization

Last synced: 17 Jul 2025

https://github.com/maastrichtu-ids/dsri-documentation

📖 Documentation for the Data Science Research Infrastructure at Maastricht University

data-science data-science-research documentation dsri kubernetes openshift

Last synced: 15 Jun 2025

https://github.com/onlyphantom/textmining

Beginner's Introduction to Text Mining: An App Store Reviews Exercise

app appstore data-science r reviews sentiment-analysis text-mining wordcloud

Last synced: 11 Jul 2025

https://github.com/eocode/docker-spark-big-data

Exercises in Spark with Docker and Data Languages

big-data data-science docker java python scala spark

Last synced: 11 Oct 2025

https://github.com/rpodcast/shinycal

The Data Science StreamRs Calendar!

data-science r shiny streaming

Last synced: 07 Jul 2025

https://github.com/bcgov/wqbc

An R package for water quality thresholds and index calculation for British Columbia

data-science env r r-package rstats

Last synced: 15 Dec 2025

https://github.com/adamvvu/tsfracdiff

Efficient and easy to use fractional differentiation transformations for stationarizing time series data in Python.

data-science machine-learning python quantitative-finance

Last synced: 01 May 2025

https://github.com/pprevos/r4h2o

Data Science for Water Utilities: Data as a Source of Value is an applied, practical guide that shows water professionals how to use data science to solve urban water management problems.

data-science r water-utilities

Last synced: 10 Jul 2025

https://github.com/bcgov/groundwater-levels-indicator

R scripts for an indicator on long-term trends in groundwater levels in B.C. published on Environmental Reporting BC

data-science env r rstats soe

Last synced: 20 Jul 2025

https://github.com/scienxlab/redflag

Safety net for machine learning pipelines. Plays nice with sklearn and pandas.

data-quality data-quality-checks data-science machine-learning numpy pandas python

Last synced: 06 Oct 2025

https://github.com/dalekube/hr

Methods for better worker data engineering in the human resources (HR) corporate domain. Designed for HR analytics practitioners to get value from common workforce-oriented data sets.

analytics data data-engineering data-science human-resources r

Last synced: 08 Mar 2026

https://github.com/izam-mohammed/geminsights

🔍 GemInsights: Unleash Gemini AI on your data! 🚀 Analyze dataframes for valuable insights, replacing traditional data analysis. 📊 A cutting-edge tool revolutionizes the way you analyze dataframes, offering a paradigm shift from conventional data analysis methods.

ai autoviz data-science gemini gemini-api gemini-pro gemini-pro-vision google google-api llama-index llms python3 trulens vertex-ai

Last synced: 28 Apr 2025

https://github.com/majorlift/volatility-modeling-python-datasci

Undergraduate thesis, Seoul National University Dept. of Economics — "Modeling Volatility and Risk Spillover Between the Financial Markets of US and China Using GARCH Value-at-Risk Forecasting and Granger Causality."

arima-forecasting data-science data-vizualization financial-engineering garch-model granger-causality jupyter-notebook numpy pandas pyplot python3 regression-models research-paper risk-modelling scipy-stats seaborn statsmodels time-series-analysis value-at-risk volatility-modeling

Last synced: 25 Apr 2025

https://github.com/laurentrdc/javelin

Haskell implementation of data structures for data science

data-science data-structures-and-algorithms dataframe haskell quantitative-finance series

Last synced: 28 Apr 2025

https://github.com/erictleung/pixarfilms

🎥 R data package to explore Pixar films, the people, and reception data

data data-science datapackage disney imdb imdb-dataset pixar pixar-films rstats web-scraping wikipedia

Last synced: 29 Oct 2025

https://github.com/aws/amazon-finspace-examples

This repo contains sample code and sample notebooks to illustrate how to work with Amazon FinSpace

aws data-science data-versioning examples finspace timeseries-analysis

Last synced: 20 Oct 2025

https://github.com/vianneymi/monggregate

Library that makes the MongoDB aggregation framework and pipelines easy to use in Python.

aggregation-framework aggregation-pipeline data-science data-wrangling database mongodb nosql pandas pydantic pymongo query-builder query-engine

Last synced: 30 Jul 2025

https://github.com/cfpb/aurora

An open source enterprise data warehousing and analysis platform.

ansible data-science data-warehousing

Last synced: 09 Apr 2025

https://github.com/zhoudaxia233/pyalpha

A process mining tool written in Python3

alpha-miner data-science petri-net process-mining

Last synced: 06 Oct 2025

https://github.com/ikivanc/data-driven-cycling-and-workout-prediction

Data-Driven Cycling using Strava data and GPX data analysis. Digital Personal Trainer using old cycling workout data to predict new workouts

botframework chatbot csharp cycling cycling-workouts data-science digital-assistant fastapi gpx-files jupyter-notebook machine-learning machine-learning-algorithms microsoft-teams python strava strava-data

Last synced: 31 Jul 2025

https://github.com/buabaj/xplore

A Python package built for data scientists, analysts, and AI/ML engineers to explore the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.

artificial-intelligence data-preprocessing data-science data-wrangling machine-learning

Last synced: 12 Apr 2025

https://github.com/longxingtan/python-profeld

ProFeld: survival analysis, predictive maintenance, churn analysis, and remaining useful life prediction in Python

data-science machine-learning predictive-maintenance profeld remaining-useful-life survival-analysis tensorflow time-to-event time-to-failure weibull-distribution

Last synced: 17 Sep 2025

https://github.com/charlywargnier/keywordmapperforbrightonseo

As part of my BrightonSEO talk, I created a mighty Streamlit app which auto-maps your keywords to your crawled URLs!

data-science python seo streamlit

Last synced: 15 May 2025

https://github.com/hassaku/ds-and-ml-with-screen-reader

Data science and machine learning resources for screen reader users

colaboratory data-science machine-learning python screen-reader visually-impaired

Last synced: 01 Nov 2025

https://github.com/rurlus/diptest

Python/C++ implementation of Hartigan & Hartigan's dip test, based on Martin Maechler's R package

data-science modality python statistics unimodal

Last synced: 25 Oct 2025

https://github.com/databrickslabs/databricks-sdk-r

Databricks SDK for R (Experimental)

data-science databricks r sdk

Last synced: 27 Oct 2025

https://github.com/bdpedigo/networks-course

A short course on network data science at Johns Hopkins University

data-science jupyter-book network-analysis networks python python3 teaching teaching-materials

Last synced: 01 Mar 2026

https://github.com/primaprashant/ai-customer-support

📚 Curated collection of blogs and papers on how different companies are using machine learning in production for better customer support.

ai applied-data-science applied-machine-learning applied-ml artificial-intelligence customer-service customer-support data-science deep-learning machine-learning natural-language-processing nlp paper production tech-blog

Last synced: 10 Feb 2026

https://github.com/talitalobo/controle-social

Content on machine learning applied to social control. Articles on ML, statistics, ethics, and other topics.

data-science deep-learning machine-learning social-control social-good social-impact social-justice

Last synced: 08 Jan 2026

https://github.com/inab/biolitmap

Code for the paper "BIOLITMAP: a web-based geolocated and temporal visualization of the evolution of bioinformatics publications" in Oxford Bioinformatics.

data-mining data-science data-visualization machine-learning maps natural-language-processing research research-paper science social-analytics-team

Last synced: 22 Apr 2025

https://github.com/dineshpinto/nft-analytics

Modeling Ethereum NFTs using lognormal models and trait based pricing

analytics blockchain data-science data-visualization ethereum nfts opensea

Last synced: 12 Apr 2025

https://github.com/ax-va/python-machine-learning-recipes-gallatin-albon-2023

Machine learning recipes in Python with scikit-learn, OpenCV, PyTorch, and other libraries, including classical machine learning and neural networks, based on the book "Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning", Second Edition, by Kyle Gallatin and Chris Albon published by O'Reilly Media in 2023

ax-va data-science deep-learning image-processing machine-learning neural-networks opencv opencv-python python pytorch scikit-learn

Last synced: 08 Oct 2025

https://github.com/ahammadmejbah/ahammadmejbah

Data Science || Machine Learning || Deep Learning || Computer Vision || NLP Enthusiast. Talks about #datascience, #deeplearning, #dataanalytics, #machinelearning, and #machinelearningalgorithms

artificial-intelligence computer-vision data-science deep-learning machine-learning nlp python

Last synced: 27 Apr 2025

https://github.com/ahammadmejbah/pytorch-developers-roadmap

PyTorch is an open-source machine learning framework that provides a flexible platform for building, training, and deploying deep learning models. It is widely used for research and development in artificial intelligence, offering dynamic computation, GPU acceleration, and a rich ecosystem of libraries and tools.

ai data-science deep-learning developer machine-learning python python3 pytroch

Last synced: 27 Apr 2025

https://github.com/wilfredpine/python-tutorial

Notebook tutorials for Python Programming Language (Fundamentals, OOP, MVT, Frameworks, Django, Machine Learning, NLP)

ai computer-vision data-analytics data-science django-framework fundamentals machine-learning nlp oop python web web-development

Last synced: 14 Oct 2025

https://github.com/melling/data-science-from-scratch-swift

Data Science from Scratch Implemented in Swift

data-science swift

Last synced: 09 Apr 2025

https://github.com/bartczernicki/ArtificialIntelligence-Presentations

Public location of delivered Artificial Intelligence & Machine Intelligence Presentations

analytics artificial-intelligence data-science machine-learning

Last synced: 18 Apr 2025

https://github.com/ccao-data/model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)

assessment data-science machine-learning model property-taxes r res tidymodels

Last synced: 10 Mar 2026

https://github.com/migraf/fhir-kindling

HL7 FHIR client library. Sync and async CRUD operations against R4 FHIR servers. Resource validation & serialization

data-science fhir fhir-client hl7 medical medical-data pydantic python

Last synced: 23 Aug 2025

https://github.com/edrubin/ec524w22

Master's-level applied econometrics course, focusing on prediction, at the University of Oregon (EC424/524 during Winter quarter, 2022). Taught by Ed Rubin.

course data-science econometrics machine-learning prediction university

Last synced: 04 Jan 2026

https://github.com/edaaydinea/365-days-of-code-2022

This coding challenge, created by Eda AYDIN, aims to progress from Python programming to Computer Vision and Natural Language Processing in Artificial Intelligence and Neuroscience.

artificial-intelligence coding-challenges computer-science computer-vision data-science deep-learning machine-learning natural-language-processing neuroscience python

Last synced: 11 Apr 2025

https://github.com/mainakrepositor/whosthegoat

Find out which footballer is the greatest of all time from their La Liga stats. Is it Leo Messi or CR7?

data-science data-visualization football-data messi ronaldo streamlit webapp

Last synced: 02 May 2025

https://github.com/mainakrepositor/brain-stroke-detection

Detects Brain Stroke using machine learning models with the highest optimal probability

data-science deployment-automation gui-application machine-learning streamlit-webapp

Last synced: 02 May 2025