An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/czagoni/darkgreybox

DarkGreyBox: An open-source data-driven python building thermal model inspired by Genetic Algorithms and Machine Learning

data-science genetic-algorithm machine-learning model python thermal

Last synced: 08 Apr 2026

https://github.com/welding-torch/excel-anonymizer

A Python script that anonymizes an Excel file and synthesizes new data in its place.

data-science microsoft nlp pandas presidio privacy

Last synced: 11 Apr 2025

https://github.com/okfn-brasil/whistleblower

🚨A Twitter bot for publicly reporting suspicions found by Rosie, Serenata de Amor's AI

data-science facebook-messenger-bot machine-learning twitter-bot

Last synced: 28 Mar 2025

https://github.com/ingoscholtes/kdd2018-tutorial

Companion repository for the KDD'18 hands-on tutorial on Higher-Order Data Analytics for Temporal Network Data

data-analytics data-science graph-mining higher-order-models network-science

Last synced: 11 Oct 2025

https://github.com/dfinke/psduckdb

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 16 Mar 2025

https://github.com/tushar2704/powerbi-portfolio

Welcome to my personal Power BI portfolio repository! Here you will find a collection of Power BI projects and dashboards that demonstrate my skills and expertise in data visualization, business intelligence, and analytics using Power BI.

artificial-intelligence dashboards data-science data-visualization powerbi streamlit-tushar2704 tushar2704

Last synced: 24 Jan 2026

https://github.com/elysian01/data-purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

data-analysis data-cleaning data-cleaning-pipeline data-preprocessing data-science data-visualization datapurifier eda exploratory-data-analysis jupyter python-lib python-library python3

Last synced: 04 Oct 2025

https://github.com/ploomber/soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.

airflow argo argo-workflows aws data-science kubeflow kubeflow-pipelines kubernetes machine-learning slurm workflow

Last synced: 21 Aug 2025

https://github.com/soumyadip007/data-science-using-python-university-course-module

“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.

data-preparation data-preprocessing data-processing data-science data-visualization jupyter-notebook knn numpy panda plotting python

Last synced: 23 Jun 2025

https://github.com/electronick1/stairs

A framework that helps you run parallel/distributed computations using data pipelines.

data-engineering data-pipeline data-science distributed-computing python

Last synced: 23 Apr 2025

https://github.com/flintml/flint

A self-contained, lightweight, out-of-the-box (OOB) research platform for modern ML.

data-science deltalake jupyter machine-learning mlops polars

Last synced: 09 May 2025

https://github.com/AidanCooper/shap-analysis-guide

How to Interpret SHAP Analyses: A Non-Technical Guide

data-science machine-learning shap tutorial

Last synced: 01 May 2025
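For readers new to SHAP, the quantity being interpreted is the Shapley value of each feature. The sketch below is a minimal brute-force illustration, not code from the guide or from the `shap` library. It assumes a single baseline input: features outside a coalition are set to their baseline values. The `shap` library instead averages over a background dataset and uses much faster approximations. The names `shapley_values` and `model` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline input.

    Features not in a coalition take their baseline value. The enumeration
    is exponential in the number of features, so this is for illustration only.
    """
    n = len(x)

    def value(coalition):
        # Build an input that uses x for features in the coalition, baseline otherwise.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 (excluding feature i)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                # Weighted marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with an interaction between features 0 and 2.
def model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The values add up to `model(x) - model(baseline)`. This "efficiency" property is what lets a SHAP plot be read as an additive breakdown of a single prediction. In this toy model, the interaction term `z[0] * z[2]` is split equally between features 0 and 2.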

https://github.com/ismailuddin/markovclick

Python package to model clickstream data as a Markov chain. Inspired by the R package clickstream.

analytics clickstream data-science markov-chain python

Last synced: 07 Apr 2026
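Modelling clickstreams as a first-order Markov chain comes down to estimating a transition matrix from consecutive page views. The sketch below shows that core step under simple assumptions. Sessions are ordered lists of page names, and each row is normalised by maximum likelihood without smoothing. It uses only the standard library and is not markovclick's API. The function name `transition_probabilities` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from click sessions.

    Consecutive page pairs in each session are counted as transitions, and
    each source page's counts are normalised so its row sums to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Hypothetical sessions; pages with no outgoing clicks (e.g. "checkout") get no row.
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "home"],
    ["home", "search", "search", "product"],
]
P = transition_probabilities(sessions)
for page, row in P.items():
    print(page, row)
```

A fuller model would usually add explicit start/end states or smoothing, so that rare transitions and absorbing pages such as `checkout` are handled consistently.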

https://github.com/briatte/dsr

Introduction to Data Science with R

course data-analysis data-science data-visualization r statistics

Last synced: 16 Mar 2025

https://github.com/mlabonne/how-to-data-science

Scripts, notebooks, and articles about data science in general.

data-science numpy pandas pandas-dataframe python pytorch

Last synced: 06 Sep 2025

https://github.com/joaopaulolndev/my-data-scientist-roadmap

A description of my roadmap to becoming a Data Scientist and Machine Learning Engineer.

artificial-intelligence data-science deep-learning machine-learning python python3

Last synced: 23 Apr 2025

https://github.com/lukasmosser/snist

A Benchmark for Seismic Velocity Inversion from Synthetics

data-science deep-learning geology geophysics machine-learning physics seismic waveform

Last synced: 21 Aug 2025

https://github.com/kb22/GitHub-User-Insights-using-API

The project uses the GitHub API with user authentication to fetch information such as commits and repositories for a specific user, and stores it as CSV files for data collection and analysis.

api data-analysis data-science data-scraping github-api python

Last synced: 16 Apr 2025

https://github.com/imgcook/datacook

Machine Learning and Data Analysis in JavaScript.

data-science feature-engineering javascript machine-learning

Last synced: 24 Jun 2025

https://github.com/fremantle-industries/prop

An open and opinionated trading platform using productive & familiar open source libraries and tools for strategy research, execution and operation.

algo-trading data-science defi elixir grafana trading-platform

Last synced: 13 Apr 2025

https://github.com/plantinformatics/pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

big-data bioinformatics data-science data-visualization ember emberjs express expressjs javascript open-source

Last synced: 17 Jun 2025

https://github.com/franzdiebold/data-science-cheat-sheets

A collection of Data Science cheat sheets.

cheat-sheet cheat-sheets data-science pandas

Last synced: 20 Jun 2025

https://github.com/matteocourthoud/Machine-Learning-for-Economic-Analysis

Material for the exercise sessions of the master's course Machine Learning for Economic Analysis @UZH

course data-science econometrics economics machine-learning phd python statistics

Last synced: 15 Jun 2026

https://github.com/mlr-org/mlr3torch

Deep learning framework for the mlr3 ecosystem based on torch

data-science deep-learning machine-learning mlr3 r r-package torch

Last synced: 31 Mar 2025

https://github.com/datalab-platform/datalab

Open-source Platform for Scientific and Technical Data Processing and Visualization

data-science data-visualization image-processing opencv python scientific-computing scikit-image scipy signal-processing visualization

Last synced: 18 Nov 2025

https://github.com/leriomaggio/python-data-science

Lecture notes and materials for Python Data Science course

data-science jupyter-notebooks machine-learning materials python-tutorials

Last synced: 26 Oct 2025

https://github.com/gcedo/master-thesis

The (un)official repository for my master thesis

botnet-detection data-science latex master-thesis svm-classifier

Last synced: 08 Sep 2025

https://github.com/vida-nyu/data-polygamy

Data Polygamy is a topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.

data data-science nyucds

Last synced: 10 Apr 2025

https://github.com/hemansnation/python-for-data-professionals

This course is designed to give you a good grip on Python programming, logic building, solving algorithm-based questions, data structures, understanding data analytics, working with pandas, professional practices, and API building.

data-analytics data-professionals data-science exploratory-data-analysis logic-programming machine-learning pandas python

Last synced: 23 Jul 2025

https://github.com/weiji14/deepbedmap

Going beyond BEDMAP2 using a super resolution deep neural network. Also a convenient flat file data repository for high resolution bed elevation datasets around Antarctica.

antarctica bedmap binder chainer data-science deep-neural-network digital-elevation-model flat-file-db generative-adversarial-network glaciology jupyter-notebook optuna pangeo remote-sensing super-resolution

Last synced: 13 Sep 2025

https://github.com/martolen1/data-science

Comprehensive repository of Data Science projects spanning Machine Learning, Deep Learning, and Natural Language Processing. Demonstrates practical applications of algorithms and tools on real-world datasets.

cnn-model data-analysis data-science data-visualization deep-learning gans-models keras machine-learning-algorithms natural-language-processing python3 rnn-lstm scikit-learn tensorflow transfer-learning transformers

Last synced: 31 Jul 2025

https://github.com/marimo-team/examples

A curated collection of example marimo notebooks — use these as templates for your own experiments, workflows, and tools.

ai data-engineering data-science examples machine-learning marimo notebooks python

Last synced: 14 Apr 2025

https://github.com/juanitorduz/website_projects

Repository containing the code of the projects presented in my personal website.

data-science data-viz knowledge-sharing mathematics python rstats

Last synced: 23 Jun 2025

https://github.com/rfordatascience/rfordatasciencewiki

Resources for the R4DS Online Learning Community, including answer keys to the text

beginner beginner-friendly beginner-tutorial-series data-science help-wanted r4ds rstats rstudio tidyverse

Last synced: 07 May 2025

https://github.com/sun-umn/pygranso

PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation

computer-vision data-science deep-learning machine-learning mathematical-software numerical-optimization

Last synced: 14 Jan 2026

https://github.com/nicolaskruchten/scipy2021

Data Visualization as the First and Last Mile of Data Science: Plotly Express and Dash

data-analysis data-science data-visualization python visualization

Last synced: 23 Apr 2025

https://github.com/stefan-m-lenz/BoltzmannMachines.jl

A Julia package for training and evaluating multimodal deep Boltzmann machines

data-science deep-boltzmann-machine deep-learning julia machine-learning neural-networks restricted-boltzmann-machine

Last synced: 04 May 2025

https://github.com/goldencheetah/scikit-sports

Sports analysis library for Python

data-science sports

Last synced: 19 Apr 2025

https://github.com/rrrlw/tdastats

R pipeline for computing persistent homology in topological data analysis. See https://doi.org/10.21105/joss.00860 for more details.

cran data-science ggplot2 homology homology-calculations homology-computation joss persistent-homology pipeline r r-package r-packages ripser tda topological-data-analysis topology topology-visualization visualization

Last synced: 23 Aug 2025
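To show what "computing persistent homology" produces, the sketch below handles only the simplest case, 0-dimensional homology (connected components) of a Vietoris-Rips filtration. It uses union-find over edges sorted by length. This is not TDAstats' code; the package relies on Ripser, which also computes higher-dimensional features such as loops. The function name `h0_persistence` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """0-dimensional (birth, death) pairs of a Vietoris-Rips filtration.

    Every point is born at scale 0. Edges are processed by increasing length;
    each edge that joins two components kills one of them at that length.
    The last surviving component never dies (death = inf).
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the component root, with path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    if points:
        pairs.append((0.0, inf))
    return pairs


# Two tight clusters far apart: two short bars, one long bar, one infinite bar.
pairs = h0_persistence([(0, 0), (1, 0), (5, 0), (6, 0)])
print(pairs)
```

The long-lived bar (death at 4.0) reflects the two-cluster structure. This is the kind of signal a persistence barcode or diagram is meant to highlight.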

https://github.com/njanakiev/openstreetmap-data-science

Data Science with OpenStreetMap

data-science openstreetmap python

Last synced: 03 Oct 2025

https://github.com/joaquinamatrodrigo/cienciadedatos.net

An educational outreach website with training material on statistics, machine learning algorithms, data science, and programming in R and Python.

analytics ciencia-de-dados data-science estadistica forecasting machine-learning python r-programming rstats statistics

Last synced: 05 May 2025

https://github.com/realdatadriven/etlx

ETL / ELT / Reverse ETL Framework powered by DuckDB, designed to seamlessly integrate and process data from diverse sources. It leverages Markdown as a configuration medium, where YAML blocks define metadata for each data source, and embedded SQL blocks specify the extraction, transformation, and loading logic.

data-engineering data-lake data-lakehouse data-quality data-quality-checks data-quality-monitoring data-science duckdb elt elt-pipeline etl etl-elt-pipelines etl-pipeline object-storage relational-databases report report-automation s3 s3-storage

Last synced: 30 Apr 2026

https://github.com/scientific-python/scientific-python.org

Source for the Scientific Python Project homepage.

data-science python scientific-computing

Last synced: 27 Jan 2026

https://github.com/afermg/cp_measure

Morphological features from images and masks made easy.

computer-vision data-science imaging microscopy

Last synced: 14 Jan 2026

https://github.com/facultyai/scala-plotly-client

Visualise your data from Scala using Plotly

data-science graph plot plotly scala visualisation

Last synced: 14 Apr 2025

https://github.com/fsprojects/furnace

Production-grade ML - F# power & precision guiding Torch performance

ai data-science differential-equations dotnet fsharp llm-framework machine-learning ml optimization

Last synced: 07 Apr 2025

https://github.com/ipeirotis/introduction-to-python

Notes for the "Introduction to Programming for Data Science" class

data-science for-beginners python python3

Last synced: 01 Jul 2025

https://github.com/SOCR/SOCRAT

A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization

data-analysis data-science data-visualization socr statistics visual-analytics visualization

Last synced: 02 Apr 2025

https://github.com/edinsonrequena/articicial-inteligence-and-data-science

This repository is mainly based on Platzi's machine learning and data science career track, but it will also include resources from other platforms and educational institutions.

algebra algorithms articicial-inteligence data-science instituciones-educativas jupyter-notebook platzi python university

Last synced: 17 Mar 2025