An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/rlan/notebooks

A Docker-based starter kit for machine learning via Jupyter notebooks. Designed for those who just want a runtime environment so they can get on with machine learning. Docker Hub:

data-science deep-learning docker docker-image gpu-computing gpu-ready jupyter jupyter-notebook keras machine-learning notebook python pytorch pytorch-lightning scikit-learn starter-kit tensorboard tensorflow

Last synced: 17 Jan 2026

https://github.com/johnbumgarner/newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

article-extracting article-extractor data-extraction data-mining data-science datascience news news-aggregator news-crawler newspaper-crawler python-newspaper python3 text-mining web-scraping webscraping

Last synced: 14 Jan 2026

https://github.com/niekdt/latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods

cluster-analysis clustering-evaluation clustering-methods data-science longitudinal-clustering longitudinal-data mixture-models r r-package time-series-analysis

Last synced: 13 Feb 2026

https://github.com/braph-software/BRAPH-2

BRAPH 2.0 is a comprehensive software package for the analysis and visualization of brain connectivity data, offering flexible customization, rich visualization capabilities, and a platform for collaboration in neuroscience research.

biomedical-engineering brain-connectivity-analysis brain-research computational-neuroscience connectomics data-analysis data-science data-visualization deep-learning graph-theory machine-learning matlab network-analysis neuroimaging neuroscience open-source reproducible-research research-tools scientific-software toolbox

Last synced: 01 May 2025

https://github.com/sandialabs/MEWS

Multi-scenario Extreme Weather Simulator (MEWS)

application data-science scr-2664 snl-applications

Last synced: 20 Jul 2025

https://github.com/blmoore/blogr

Scripts + data to recreate analyses published on http://benjaminlmoore.wordpress.com and http://blm.io

data-science dataviz r rstats statistics

Last synced: 30 Apr 2025

https://github.com/wri-dssg-omdena/policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

active-learning bert data-science document-classification environmental huggingface incentives landscape-restoration lda machine-learning nlp policy sbert scraping scrapy sentence-transformers spyder text-classification topic transformers

Last synced: 27 Mar 2025

https://github.com/racinmat/mal-analysis

GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.

analysis anime crawling data-science kaggle-dataset mal scraped-data

Last synced: 07 Apr 2025

https://github.com/epistasislab/rebate

Relief Based Algorithms of ReBATE implemented in Python with Cython optimization. This repository is no longer being updated. Please see scikit-rebate.

cython data-science feature-selection

Last synced: 16 Apr 2025

https://github.com/jmaasch/sanzo

R Color Palettes Based on the Works of Sanzo Wada – A CRAN Package

color-palettes data-analysis data-science data-visualization r sanzo-wada visualizations

Last synced: 17 Aug 2025

https://github.com/tirthajyoti/mlr

Multiple linear regression with statistical inference, residual analysis, direct CSV loading, and other features

analytics data-analytics data-science linear-regression machine-learning modeling predictive-modeling python regression scikit-learn statiscal-learning statistical-analysis statistics

Last synced: 28 Jun 2025

https://github.com/chainbound/apollo

cross-chain ETL tool for EVM chaindata

blockchain data-science dsl ethereum etl evm golang hcl web3

Last synced: 10 Apr 2025

https://github.com/fluhus/gostuff

Convenience packages for data science in Go.

data data-science data-structures go golang

Last synced: 12 Jan 2026

https://github.com/noaa-mdl/grib2io

Python interface to the NCEP G2C Library for reading and writing GRIB2 messages.

atmospheric-science data-science grib2 grib2-decoder grib2-encoder grib2-tables meteorology ncep ncep-grib2 ndfd-grib2 numpy python python3 weather weather-data

Last synced: 06 Apr 2025

https://github.com/mindinventory/bank-marketing-data-visualisation

This repository contains Python code for visualizing the Bank Marketing dataset using various data visualization techniques. The dataset is loaded from a CSV file, and both numerical and categorical features are explored using popular libraries such as Pandas, Matplotlib, Seaborn, and Plotly.

artificial-intelligence data-science data-visualization jupyter-notebook machine-learning matplotlib pandas plotly python seaborn

Last synced: 13 Apr 2025

https://github.com/ww-tech/primrose

Primrose modeling framework for simple production models

dag data-science datascience deployment machine-learning primrose python workflows

Last synced: 01 Aug 2025

https://github.com/fredhutch/wiki

SciWiki: Collective KnowledgeBase for Scientific Data and Use

bioinformatics community computing data-science documentation fhdasl open-science sciwiki wiki

Last synced: 11 Apr 2025

https://github.com/oslabs-beta/mlflow-js

A JavaScript client library for MLflow that streamlines machine learning lifecycle management in web environments.

ai data-science javascript machine-learning mlflow mlops node-js typescript

Last synced: 30 Apr 2025

https://github.com/kylegrealis/frogger

Collection of workflow tools to get a "jump" on your projects

data-science project-management quarto

Last synced: 25 Oct 2025

https://github.com/anilkumarteegala/wqu-ds-unit-2

This repo contains all the files and material related to WorldQuant University's Data Science Summer 2020 Session Unit 2: Machine Learning and Statistical Analysis

data-science machine-learning statistical-analysis wqu

Last synced: 02 Mar 2026

https://github.com/sharan-naribole/wlan_localization

A Machine Learning Approach to WLAN Fingerprinting based Localization

data-science localization machine-learning wi-fi

Last synced: 17 Mar 2026

https://github.com/noahgift/core-stats-datascience

Core Statistics for Datascience

core data-science pragmaticai statistics

Last synced: 14 Oct 2025

https://github.com/shixiangwang/self-study

My Self-Study Room: keep tidy and lightweight

bioinformatics data-science python r study study-room

Last synced: 12 Apr 2025

https://github.com/aphp/eds-scikit

eds-scikit is a Python library providing tools to process and analyse OMOP data

clinical-data-warehouse data-science medical omop python

Last synced: 12 Apr 2025

https://github.com/cortexflow/cortexbrain

CortexBrain is an ambitious open source project aimed at creating an intelligent, lightweight, and efficient service mesh architecture to seamlessly connect cloud and edge devices

big-data data-science devops docker edge-computing in-development k8s kubernetes machine-learning microservices-architecture open-source rust rust-lang service-mesh

Last synced: 11 Apr 2025

https://github.com/prakhar-ff13/customer-analytics

Machine Learning Case study on customer segmentation and prediction of groups.

analytics case-study data-analysis data-science data-visualization dimensionality-reduction machine-learning python python3

Last synced: 24 Jul 2025

https://github.com/tjmahr/madr_pipelines

Slides and materials for my talk to the Madison R Users Group

data-science dplyr magrittr presentation r

Last synced: 30 Apr 2025

https://github.com/m-clark/data-processing-and-visualization

This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.

data-processing data-science datatable dplyr ggplot2 htmlwidgets jupyter-notebooks machine-learning model-criticism modeling numpy pandas programming programming-exercises python r tidyverse visualization workshop workshops

Last synced: 02 Sep 2025

https://github.com/chris-greening/spyrograph

Python library for analyzing, exploring, and visualizing epitrochoids and hypotrochoids in just a few lines of code

beginner-friendly data-science data-visualization flexible hacktoberfest mathematics physics python python3

Last synced: 17 Jan 2026

https://github.com/Smat26/Roman-Urdu-Dataset

Compilation of Manually Tagged Roman Urdu Dataset (Urdu written in Latin/Roman Script), along with other helpful Roman Urdu NLP resources

data-science dataset hindi hindi-language natural-language-processing nlp urdu urdu-language urdu-nlp

Last synced: 13 May 2025

https://github.com/lenguyenthedat/minimal-datascience

This repository contains all the code and dataset used in my blog series: Minimal Data Science

blog-series data-science kaggle machine-learning python scikit-learn xgboost

Last synced: 13 Apr 2025

https://github.com/marvinbuss/explainableml-vision

This repository introduces different Explainable AI approaches and demonstrates how they can be implemented with PyTorch and torchvision. The approaches used are Class Activation Mappings, LIME and SHapley Additive exPlanations.

cam class-activation-maps data-science explainable-ai explainable-deepneuralnetwork explainable-ml hymenoptera-dataset lime machine-learning notebook pytorch shap transfer-learning

Last synced: 29 Jul 2025

https://github.com/elysian01/codify

Codify lets data scientists carry out the tedious, time-consuming tasks of the data-science life cycle, such as EDA (exploratory data analysis), data cleaning, data pre-processing, data visualization, modeling, and evaluation, by describing only the logic of the task in natural language (English). The system then automatically produces the relevant Python code snippets.

ai ai-assistant autocomplete automation data-science data-science-tools final-year-project intent-classification ml named-entity-recognition nlp reactjs research-paper research-project

Last synced: 12 Apr 2025

https://github.com/petersontylerd/mlmachine

mlmachine accelerates machine learning experimentation

data-analysis data-science data-visualization machine-learning python

Last synced: 21 Mar 2025

https://github.com/microsoft/responsible-ai-toolbox-genbit

A tool for gender bias identification in text. Part of Microsoft's Responsible AI toolbox.

data-science fairness-ai fairness-ml fairnness gender gender-bias jupyter machine-learning natural-language natural-language-processing

Last synced: 27 Jul 2025

https://github.com/philips-software/latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods

cluster-analysis clustering-evaluation clustering-methods data-science longitudinal-clustering longitudinal-data mixture-models r r-package time-series-analysis

Last synced: 13 Apr 2025

https://github.com/martin-sicho/genui

The backend services of the GenUI framework. The backend provides the REST API used for molecular generation, QSAR modelling and chemical space visualization.

chemical-space cheminformatics data-science molecular-generation qsar rest-api visualization webapp

Last synced: 19 Jan 2026

https://github.com/kwokhing/yandexcatboost-python-demo

Demo of the capabilities of the Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing and model tuning are performed on the dataset.

catboost data-analysis data-preprocessing data-science feature-selection gradient-boosting gradient-boosting-classifier one-hot-encode pandas pearson-correlation python python27 seaborn variance-analysis visualization yandex-catboost

Last synced: 09 Apr 2025

https://github.com/fgazzelloni/tidytuesday

Explore fascinating TidyTuesday projects in my portfolio, showcasing data visualization and analysis skills.

data-science datavis datavisualisation datavisualization dataviz infographics rstats

Last synced: 02 Mar 2026

https://github.com/asavinov/machine-learning-and-data-processing

A collection of resources on machine learning, data processing and related areas

analytics big-data data-mining data-processing data-science databases machine-learning software stream-processing

Last synced: 02 Mar 2026

https://github.com/reymond-group/lore

WebGL engine for (big) data visualization.

3d-engine data data-science interactive visualization webgl

Last synced: 06 Mar 2026

https://github.com/jezcope/pyrefine

Execute OpenRefine JSON scripts without OpenRefine (or Java)

data-science data-wrangling openrefine python

Last synced: 17 Feb 2026

https://github.com/snowflake-labs/emerging-solutions-toolbox

The Emerging Solutions Toolbox is a collection of solutions created by Snowflake's Solution Innovation Team (SIT) that consists of demos, helpers, and frameworks to help you get the most out of Snowflake.

ai data-engineering data-science data-warehousing machine-learning native-apps notebooks optimization python snowflake streamlit

Last synced: 22 Jun 2025

https://github.com/JieZheng-ShanghaiTech/KG4SL

Synthetic lethality (SL) is a promising gold mine for the discovery of anti-cancer drug targets. KG4SL is the first graph neural network (GNN)-based model that uses knowledge graph for SL prediction.

ai4science bioinformatics cancer data-science drug-discovery machine-learning

Last synced: 06 May 2025

https://github.com/datawithbaraa/sql-data-analytics-project

This repository contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

analytics business-analytics business-intelligence data data-analysis data-analyst data-analytics data-engineering data-science data-scientist database datascience query reporting sql sql-queries sql-query sql-server window-functions window-functions-in-sql

Last synced: 15 Apr 2025

https://github.com/leerenjie/100-days-of-code-in-python

Angela Yu's Udemy course has 100 projects for students to build, one each day, with 2 hours of classes per day. This repository stores all the related projects.

100-days-of-code api backend-webdevelopment data-science database flask frontend-web game-development version-control

Last synced: 21 Aug 2025

https://github.com/hackersandslackers/pandas-sqlalchemy-tutorial

Load or insert data into a SQL database using Pandas DataFrames.

data-analysis data-science dataframes pandas pandas-sqlalchemy-tutorial python sql-database sqlalchemy tutorial

Last synced: 16 Apr 2025

https://github.com/hendersontrent/gam.jl

Fit, evaluate, and visualise generalised additive models (GAMs) in native Julia

data-science generalized-additive-models machine-learning regression statistical-models statistics

Last synced: 08 Apr 2025

https://github.com/srwi/pycharm-pixellens

Free PyCharm image viewer plugin for visualizing and debugging NumPy, OpenCV, PyTorch, TensorFlow, JAX, and PIL data.

data-science debugger-visualizer debugging machine-learning pycharm pycharm-plugin python

Last synced: 23 Oct 2025

https://github.com/alagoa/youtube-or-pornhub

Service identification on ciphered traffic.

capture data-science machinelearning ml pcap python3 spotify traffic tshark youtube

Last synced: 14 Oct 2025

https://github.com/rohan-paul/awesome-machine-learning-datascience_resources

Curated Collection of Online and Free Resources for serious learning of Machine Learning and Data Science.

artificial-intelligence data-science deep-learning deeplearning machine-learning mathematics

Last synced: 03 Jul 2025

https://github.com/alpstable/gidari

Transport web data to local/remote storage using Gidari

api-wrapper csv data-science go http mongodb storage

Last synced: 14 Jan 2026

https://github.com/mfcabrera/hooqu

hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to Python

data-quality data-quality-checks data-science

Last synced: 14 Jan 2026

https://github.com/hassaku/audio-plot-lib

This library provides graph sonification functions and has been developed for a project named "Data science and machine learning resources for screen reader users". Please refer to the project page for more details.

audio data-science google-colab graphs machine-learning python sonification visually-impaired

Last synced: 29 Oct 2025

https://github.com/sandialabs/mews

Multi-scenario Extreme Weather Simulator (MEWS)

application data-science scr-2664 snl-applications

Last synced: 20 Oct 2025

https://github.com/bluegreen-labs/daymetr

An R Interface to the Daymet Web Services

climate-data data-science daymet gridded-data netcdf ornl-daac r-package rstats

Last synced: 10 Apr 2025