An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/privacy-tech-lab/cross-device-tracking

Data and software for cross-device tracking data collection

cross-device-tracking data-science internet-tracking privacy privacy-tech

Last synced: 30 Apr 2025

https://github.com/chendaniely/ds4biomed

Data Science for the Biomedical Sciences

biomedical-sciences data-science

Last synced: 11 Apr 2025

https://github.com/facultyai/ipydataclean

Interactive cleaning for Pandas DataFrames

data-cleaning data-science dataframe jupyter-notebook pandas

Last synced: 26 Aug 2025

https://github.com/alexioannides/lime-interpretable-ml

An example of how the LIME algorithm can be used to provide real-world insight into the decision processes of a 'black-box' machine learning algorithm, in this case a Random Forest regressor.

data-science interpretability lime machine-learning numpy pandas pydata python scikit-learn

Last synced: 29 Oct 2025

https://github.com/feedzai/feedzai-openml

API for Feedzai's Open Machine Learning that allows integrating ML algorithms into Feedzai's platform.

api data-science feedzai machine-learning openml

Last synced: 30 Apr 2025

https://github.com/vinibrsl/internet-affordability

๐ŸŒ Did you know that internet costs >20% of the average income in some countries?

data-science dataset human-rights insights jupyter-notebook scraping

Last synced: 12 Sep 2025

https://github.com/lintangwisesa/python_fundamental_datascience

Python ๐Ÿ for Jr Data Scientist ๐Ÿ“ˆ๐Ÿ“Š๐Ÿ“‰

data-science machine-learning python

Last synced: 12 Sep 2025

https://github.com/cscherrer/sossmlj.jl

SossMLJ makes it easy to build MLJ machines from user-defined models from the Soss probabilistic programming language

bayesian-inference data-science julialang mlj probabilistic-programming soss

Last synced: 10 Apr 2025

https://github.com/harrystaley/open-source-data-science-degree-python

A fully curated, open-source Data Science curriculum focused on Python. Includes top-tier university courses (MIT, Stanford, Princeton) covering essential topics in computer science, data analysis, machine learning, and statistics. It covers everything you need to build a solid foundation in Data Science, 100% free.

data data-science dataanalysis datasci ds open open-source py python python3 science source statistics

Last synced: 13 Apr 2025

https://github.com/morningman/mcp-doris

An MCP server for Apache Doris & VeloDB

data-science doris mcp-server

Last synced: 14 Dec 2025

https://github.com/Absolventa/iruby-chartkick

Minimalistic wrapper around chartkick for using it within iruby

chartkick data-science iruby rubydatascience visualization

Last synced: 07 May 2025

https://github.com/giswqs/streamlit-mapbox

A Streamlit Component for rendering Mapbox GL JS

data-science geospatial mapping streamlit streamlit-component streamlit-webapp

Last synced: 19 Jul 2025

https://github.com/codait/flight-delay-notebooks

Analyzing flight delay and weather data using Elyra, IBM Data Asset Exchange, Kubeflow Pipelines and KFServing

codait data-science elyra jupyter jupyter-notebook jupyterlab kfserving kubeflow-pipelines machine-learning

Last synced: 06 May 2025

https://github.com/charlesaverill/satyrn

A Notebook alternative that supports branching code and local collaboration.

data-science full-stack ide jupyter-notebook machine-learning open-source python web-development

Last synced: 11 Apr 2025

https://github.com/yashksaini-coder/floraloracle--iris-inference-hub

The objective is to combine the prediction and classification scenarios of machine learning algorithms on the morphological flower dataset.

classification data-science jupyter-notebook machine-learning machine-learning-algorithms machinelearning prediction-model python3 scikit-learn scikitlearn-machine-learning

Last synced: 11 Apr 2025

https://github.com/martin-sicho/genui-gui

GenUI frontend application. It provides a GUI to the GenUI REST API web services.

cheminformatics data-science gui molecular-generation qsar react visualization webapp

Last synced: 19 Jan 2026

https://github.com/flipkart/foxtrot

A store abstraction and analytics system for real-time event data.

alerting analytics data-engineering data-science data-visualization elasticsearch hbase java monitoring

Last synced: 12 Dec 2025

https://github.com/autonlab/aqua

AQuA: A Benchmarking Tool for Label Quality Assessment

data-centric-ai data-cleaning data-science label-errors machine-learning robust-machine-learning

Last synced: 16 Jan 2026

https://github.com/exasol/data-science-examples

Collection of data science and machine learning examples with Exasol

data-science exasol-integration

Last synced: 07 May 2025

https://github.com/larribas/dagger

Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).

argo-workflows data-engineering data-pipelines data-science distributed-systems pipelines-as-code workflows

Last synced: 28 Jul 2025

https://github.com/Lemniscate-world/Neural

Neural is a domain-specific language (DSL) designed for defining, training, debugging, and deploying neural networks. With declarative syntax, cross-framework support, and built-in execution tracing (NeuralDbg), it simplifies deep learning development.

automation data-science data-visualization diagrams dsl hyperparameter-optimization lark llms machine-learning neural-architecture-search neural-networks neural-networks-and-deep-learning neural-networks-from-scratch nocode onnx pytorch tensorflow visual-programming-language visualization

Last synced: 13 Oct 2025

https://github.com/nicbet/infozilla

The infoZilla unstructured software engineering data mining tool. It can find and extract source code regions, patches, stack traces, enumerations and itemizations from discussion threads.

bugreport bugzilla data-mining data-science tools unstructured-data

Last synced: 13 Oct 2025

https://github.com/milos-agathon/map-rivers-with-sf-and-ggplot2-in-r

Let's make a pretty map of European rivers using the Global River Classification dataset 🧑🏼‍💻 Check the full tutorial at https://milospopovic.net/map-rivers-with-sf-and-ggplot2-in-r/

data-science data-visualization gis r rivers

Last synced: 04 Apr 2026

https://github.com/dakimura/learn-data-science-for-free-jp

ๅˆๅญฆ่€…ใŒใƒ‡ใƒผใ‚ฟใ‚ตใ‚คใ‚จใƒณใ‚นใ‚’ใ‚ณใƒณใƒ‘ใ‚ฏใƒˆใซไธ€้€šใ‚ŠๅญฆใถใŸใ‚ใฎ็„กๆ–™ใฎ่ณ‡ๆ–™ใงใ™ใ€‚

artificial-intelligence computer-vision data-science datascienceproject deeplearning machine-learning machine-learning-algorithms natural-language-processing neural-networks

Last synced: 09 Apr 2025

https://github.com/tushar2704/streamlit-magic-cheat-sheets

Streamlit Magic Cheat Sheets: All of Streamlit in one Streamlit App! (Available in English, Français & Deutsch.)

data-science machine-learning python snowflake streamlit streamlit-tushar2704 tushar2704 webapp

Last synced: 27 Apr 2026

https://github.com/chifisource/oddframes.jl

The unique data management platform for Julia

data data-science julia machine-learning

Last synced: 13 Aug 2025

https://github.com/yashuv/python-for-data-science-ai-and-development

Python for Data Science, AI & Development - offered by IBM on Coursera

coursera-course data-analysis data-science ibm numpy pandas python

Last synced: 12 Apr 2025

https://github.com/autonomio/wrangle

A data transformation package for deep learning with Autonomio, Keras and TensorFlow.

data-science deep-learning etl keras resampling transformation wrangling

Last synced: 07 Apr 2025

https://github.com/mostafa-wael/exploring-the-landscape-of-the-egyptian-software-market

A Data-driven Approach. Our story begins with a quest for knowledge. The information we obtained from LinkedIn offers us unprecedented insights into the landscape of the Egyptian Software Market.

carrers data-science hypothesis-testing linkedin market scraping software story-telling

Last synced: 13 Sep 2025

https://github.com/mohammadreza-mohammadi94/data-analysis-and-machine-learning-projects

A comprehensive collection of data analysis and machine learning projects, showcasing techniques and models for various data challenges. Dive in to explore code examples, analyses, and machine learning workflows.

data-analysis data-science dataframes deep-learning exploratory-data-analysis hyperparameter-tuning machine-learning machine-learning-algorithms pandas python scikit-learn visualization

Last synced: 06 Oct 2025

https://github.com/codelibs/fione

Fione is an enterprise AI platform.

ai automl data-science machine-learning

Last synced: 30 Apr 2025

https://github.com/som-research/hfcommunity

HFCommunity offers an offline, up-to-date relational database built from the data available on the Hugging Face Hub, providing queryable data about the repositories hosted in the Hub.

data-science database dataset huggingface

Last synced: 05 Apr 2026

https://github.com/clojure-finance/datajure

Clojure data manipulation DSL: composable query syntax built on tech.ml.dataset

clojure data-manipulation data-science dataframe dsl empirical-research query-dsl tech-ml-dataset

Last synced: 20 Apr 2026

https://github.com/latentcat/network-vis

WIP. Visualization of social networks.

complex-networks data data-science graph graph-visualization social-media social-network-analysis visualization wechat

Last synced: 16 Mar 2026

https://github.com/eshikashah/ibm-data-science-ml-with-python-project

Capstone project for the IBM data science course: ML with Python.

algorithms data-analysis data-science ibm machine-learning python

Last synced: 07 May 2025

https://github.com/srowen/cdsw-simple-serving

Modeling Lifecycle with ACME Occupancy Detection and Cloudera

cloudera cloudera-data-science data-science openscoring pmml workbench

Last synced: 10 Oct 2025

https://github.com/mehmetkahya0/web-resource-downloader

This is a Python script that downloads all resources (images, scripts, stylesheets, etc.) from a given website.

algorithms beautifulsoup4 bs4 bs4-requests data-analysis data-science datascience python python3 requests scraper scraping

Last synced: 12 Oct 2025

https://github.com/mratsim/humpback-whale-identification

Kaggle Humpback whale identification: 2xGPU Data augmentation + FP16 mixed precision training

computer-vision data-science identification kaggle pytorch

Last synced: 07 May 2025

https://github.com/philipyip1988/python-notebooks

Data-science tutorials covering Python, object-oriented programming, and Python standard libraries such as collections, itertools, math, statistics, random and datetime. The tutorials also cover data-science libraries such as numpy, pandas, matplotlib and seaborn, as well as the conda ecosystem.

anaconda anaconda-environment conda data-science jupyterlab math matplotlib numpy pandas python python3 seaborn statistics

Last synced: 31 Jul 2025

https://github.com/pmbrull/azure-ds-examdp100-notes

Personal notes for the Azure Data Science exam DP-100

azure cloud data-science exam machine-learning notes python

Last synced: 12 Apr 2025

https://github.com/boniolp/dsymb-playground

[ICDE 2024] Python and Streamlit implementation of "d_{symb} playground: an interactive tool to explore large multivariate time series datasets"

clustering data-science data-visualization streamlit symbolization time-series time-series-analysis webapp

Last synced: 30 Apr 2025

https://github.com/darenasc/data-science-for-good

Data Science for Good links.

data-for-good data-science

Last synced: 12 Jan 2026

https://github.com/khuyentran1401/prefect-alert

A decorator that sends an alert when a Prefect flow fails

data data-engineering data-science prefect python

Last synced: 13 Apr 2025

https://github.com/vunb/node-crfsuite

A Node.js binding for CRFsuite

crf crfsuite data-science node-crfsuite vntk

Last synced: 17 Jul 2025

https://github.com/compilerla/data-donuts

Public sector breakfast lecture series meant to inspire.

data-donuts data-science events government los-angeles

Last synced: 12 Feb 2026

https://github.com/vidhi1290/deep-learning-for-eeg-emotion-classification

This repository contains a Python script for performing emotion classification using EEG (electroencephalogram) data. Emotion classification from EEG signals is an important application in neuroscience and human-computer interaction. The code uses deep learning techniques to analyze EEG data and predict emotional states.

coorelation data-exploration data-preprocessing data-science data-visualization deep-learning deep-learning-algorithms eeg-emotion-recognition egg-signals emotion-distribution emotion-prediction feature-analysis heatmap human-emotions machine-learning machine-learning-algorithms pie-chart spectral-analysis time-series-visualization

Last synced: 10 Apr 2025

https://github.com/laderast/burro

Exploring data together using shiny (burro(w) into the data)

data-science data-visualization eda exploratory-data-analysis shiny

Last synced: 25 Sep 2025

https://github.com/ditikrushna/predict-sales-revenue-using-multiple-regression-model

In this project you will build and evaluate multiple linear regression models using Python. You will use scikit-learn to calculate the regression, pandas for data management, and seaborn for data visualization. The project uses the popular Advertising dataset to predict sales revenue from advertising spending on media such as TV, radio, and newspaper.

data-science multiple-regression multiple-regression-analysis regression-models seaborn

Last synced: 01 Mar 2025

https://github.com/reddyprasade/python-basic-for-all-3.x

We are going to learn Python, a powerful multi-purpose programming language created by Guido van Rossum. Its simple, easy-to-use syntax makes it the perfect language for someone learning computer programming for the first time. This is a comprehensive guide on how to get started with Python, why you should learn it, and how you can learn it. It also suits readers who already know other programming languages and want to get started with Python quickly.

comprehensive-guide data-science knowledge perfect-language programming-languages python python-3 python-3-6 python3

Last synced: 25 Feb 2026

https://github.com/adamrossnelson/stataipedsall

Scripts to download and build panel data files for IPEDS.

bigdata data-science ipeds test-scores

Last synced: 13 Feb 2026

https://github.com/dongjunlee/beawesometoday

Be Awesome Today - My Awesome List & Today I Learned & Blogging Articles

awesome-list blog chatbot data-science deep-learning machine-learning python til today-i-learned

Last synced: 17 Mar 2026

https://github.com/glentner/dataphile

Data analytics library for Python and suite of open source, command line based data ops tools.

data-analysis data-ops data-science python scientific-computing

Last synced: 07 May 2025

https://github.com/omarsar/data_mining_2017_fall_lab

Contains information and instructions for the first Data Mining lab session for 2017 Fall.

data data-analysis data-mining data-science data-visualization

Last synced: 08 Sep 2025

https://github.com/yashksaini-coder/bharat-intern

Exploring data's depths with Bharat Intern! 🚀 Unveiling insights from the dataset, my project is a fusion of creativity and analytics. Stay tuned for more! 📊

classification data data-science data-visualization internship-project

Last synced: 19 Jul 2025

https://github.com/chalmerlowe/jupyter_tutorial

An introduction to Jupyter and JupyterLab for data analysis, data science, and Python development

data-analysis data-science jupyter jupyter-notebook jupyterlab notebook python tutorial

Last synced: 10 Apr 2025

https://github.com/nceas/nceas-training

Training materials and modules from R-based data science short courses at NCEAS

data-science lessons training

Last synced: 30 Apr 2025

https://github.com/sravb/nba-predictive-analytics

With its gameplay analysis of NBA players, NBA Predictive Analytics is a basketball coach's new best friend.

basketball data-mining data-science data-visualization decision-tree k-nearest-neighbors kaggle-dataset machine-learning matplotlib nba-analytics pandas predictive-analytics python scikit-learn scipy

Last synced: 07 May 2025

https://github.com/open-risk/correlationmatrix

correlationMatrix is a Python-powered library for the statistical analysis and visualization of correlations

correlation-analysis correlation-matrices data-analysis data-science statistics

Last synced: 04 Jul 2025

https://github.com/greenelab/gbm_immune_validation

Validating glioblastoma immune cell immunohistochemistry using computational deconvolution of TCGA tumors

analysis cancer data-science gene-expression glioblastoma machine-learning survival-analysis tool

Last synced: 07 Jul 2025

https://github.com/shivangraikar/datasciencevalue

Web application built with Streamlit to host an intelligent salary predictor. The project returns the user's position within the field of data science.

data-science heroku-deployment logistic-regression machine-learning streamlit-webapp

Last synced: 15 Apr 2025

https://github.com/udst/bayarea_urbansim

UrbanSim implementation for the San Francisco Bay Area

bay-area data-science modeling simulation urbansim

Last synced: 07 May 2025

https://github.com/amey-thakur/python-crash-course

IIT ROPAR - Diginique Techlabs: Data Science, Machine Learning and AI using Python

ai amey ameythakur data-science data-science-projects house-price-prediction machine-learning python python-crash-course

Last synced: 07 Oct 2025

https://github.com/predicthq/phq-data-science-docs

PredictHQ’s Data Science documentation

data-science

Last synced: 11 Jun 2025

https://github.com/laderast/cvdriskdata

R package for Cardiovascular Risk Dataset and Data generation script

cardiovascular data-science synthetic-data synthetic-dataset-generation

Last synced: 25 Sep 2025

https://github.com/alex-snd/malwareclassifier

👾 Malware Classification using Deep Learning and Cuckoo Sandbox

cuckoo-sandbox cvae data-science deep-learning malware malware-classification malware-detection python pytorch vae

Last synced: 25 Apr 2025