An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/mdeff/ntds_2017

Material for the EPFL master course "A Network Tour of Data Science", edition 2017.

data-science education epfl graphs network-science

Last synced: 06 Jul 2025

https://github.com/fastai/book.fast.ai

Information for readers of the fastai book

data-science deep-learning machine-learning python pytorch teaching

Last synced: 24 Dec 2025

https://github.com/visual-layer/visuallayer

Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, mislabels and others.

cleaning computer computer-vision data data-science dataset datasets-preparation generative machine-learning python vision

Last synced: 19 Apr 2025

https://github.com/localcascadeensemble/lce

Random Forest or XGBoost? It is Time to Explore LCE

classification data-science machine-learning python regression scikit-learn-api

Last synced: 13 Apr 2025

https://github.com/gurupatil0003/python_tutorial

Python is a high-level, interpreted programming language known for its simplicity and readability. Python emphasizes code readability and allows programmers to express concepts in fewer lines of code compared to languages like C++ or Java.

data-science database library modules oops-in-python operator python

Last synced: 09 Apr 2025

https://github.com/charmve/paperweeklyai

📚「@MaiweiAI」Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.

advanced applied-machine-learning computer-vision data-mining data-science deep-learning machine-learning machine-learning-algorithms nlp paper-with-code papers study-papers tutorials

Last synced: 23 Jun 2025

https://github.com/dayyass/qaner

Unofficial implementation of QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition.

data-science machine-learning named-entity-recognition natural-language-processing ner nlp python python3 question-answering

Last synced: 13 Apr 2025

https://github.com/randyzwitch/streamlit-embedcode

Streamlit component for embedding code snippets such as GitHub gists, CodePen snippets, Gitlab snippets, etc.

data-analysis data-science data-visualization python streamlit streamlit-component

Last synced: 12 May 2025

https://github.com/dwhitena/oreilly-ai-k8s-tutorial

Materials for the "AI on Kubernetes" tutorial at O'Reilly AI SF 2018

ai data-science deep-learning docker kubernetes machine-learning

Last synced: 13 Sep 2025

https://github.com/hemansnation/machine-learning-mlops-generativeai-nlp-cv-mlsystem-design

MLOps - Deploy models at scale, Generative AI - Build applications with LLMs, NLP - Understand Transformers & Text Generation Models, Computer Vision - Build GANs projects like Deepfakes, ML System Design, hands-on project building and code algorithms from scratch.

computer-vision data-science deep-learning generative-ai machine-learning natural-language-processing python

Last synced: 15 Apr 2025

https://github.com/loukesio/ggvolc

ggvolc effortlessly translates differential expression datasets and RNAseq data into informative volcano plots. Highlight genes of interest with unprecedented ease. With just a single line of code, visualize complex datasets, gaining deeper insights and simplifying data representation.

bioinformatics data-science data-visualization gro-seq rna-seq

Last synced: 02 Sep 2025

https://github.com/hannansatopay/roughviz

A Python visualization library for creating sketchy/hand-drawn styled charts.

charts data-science hacktoberfest jupyter-notebook python-visualization roughviz vizualisation

Last synced: 14 Apr 2025

https://github.com/LaihoE/did-it-spill

Check if you have training samples in your test set

computer-vision data-science deep-learning pytorch semantic-similarity time-series

Last synced: 01 May 2025

https://github.com/rickiepark/hg-da

Code repository for the book <혼자 공부하는 데이터 분석 with 파이썬> ("Self-Study Data Analysis with Python")

data-analysis data-science data-visualization machine-learning matplotlib numpy pandas scikit-learn scipy

Last synced: 06 Apr 2025

https://github.com/renumics/sliceguard

A library for detecting problematic data segments in structured and unstructured data with few lines of code.

data-analysis data-cleaning data-curation data-exploration data-science data-visualization deep-learning eda exploratory-data-analysis machine-learning python visualization

Last synced: 16 Mar 2025

https://github.com/tatevkaren/tatevkaren-data-science-portfolio

Data Science Portfolio of Tatev Karen Aslanyan including Case Studies and Research Projects that I have completed that solve business problems or introduce new products. Case Study papers, codes, and additional resources are all included.

blog case-study computer-science data-analysis data-science deep-learning econometrics machine-learning papers portfolio portfolio-website statistics

Last synced: 10 Apr 2025

https://github.com/opengeos/streamlit-map-template

A streamlit template for mapping applications

data-science geospatial mapping python streamlit

Last synced: 07 Apr 2025

https://github.com/cihat/datastructure

📌🔎 Data Structures (BMU221): documentation, notes, and examples for all lessons in the course. Data Structures with Java.

bilgisayar-muhendisligi computer-science data-science data-structure data-structure-blogs data-structures data-structures-and-algorithms documentation turkce-dokumantasyon veri-bilimi veri-yapilari

Last synced: 23 Jan 2026

https://github.com/codeperfectplus/machine-learning-web-applications

Data science web project implemented in Django framework.

data-science django portfolio python python3

Last synced: 13 May 2025

https://github.com/dask-contrib/dask-awkward

Native Dask collection for awkward arrays, and the library to use it.

columnar-format dask data-analysis data-science data-structure jagged-array python ragged-array

Last synced: 12 Apr 2025

https://github.com/almost-matching-exactly/dame-flame-python-package

A Python Package providing two algorithms, DAME and FLAME, for fast and interpretable treatment-control matches of categorical data

causal-inference data-science econometrics machine-learning matching python

Last synced: 09 Apr 2026

https://github.com/ucd-dnp/leila

Library for data quality assessment and interaction with the datos.gov.co portal

data-quality data-science eda espanol exploratory-data-analysis python report-generator ucd

Last synced: 05 Apr 2026

https://github.com/bnosac/crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp r r-package

Last synced: 15 Mar 2026

https://github.com/apple/ml-symphony

Symphony: Interactive Data Widgets (CHI 2022)

computational-notebooks data-science data-visualization machine-learning

Last synced: 19 Oct 2025

https://github.com/france-travail/gabarit

Gabarit : kickstart your data science project from scratch

data-science deep-learning machine-learning python

Last synced: 09 Apr 2025

https://github.com/terryyz/pyarmadillo

PyArmadillo: an alternative approach to linear algebra in Python

armadillo-library calculations data-science linear-algebra machine-learning

Last synced: 15 Jun 2025

https://github.com/tf-encrypted/moose

Secure distributed dataflow framework for encrypted machine learning and data processing

cryptography data-science distributed-computing machine-learning privacy secure-computation

Last synced: 08 May 2025

https://github.com/octoenergy/timeserio

Better `keras` models for time series and beyond

data data-science

Last synced: 24 Jun 2025

https://github.com/alexioannides/ml-workflow-automation

Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.

classification data-science flask helm jupyter-notebook kaggle kubernetes machine-learning mlops numpy pandas python rest-api sklearn

Last synced: 21 Mar 2025

https://github.com/datamole-ai/edvart

An open-source Python library for Data Scientists & Data Analysts designed to simplify the exploratory data analysis process. Using Edvart, you can explore data sets and generate reports with minimal coding.

analysis data-analysis data-science data-visualization data-viz eda exploration exploratory-data-analysis exploratory-data-analysis-eda plots python

Last synced: 11 Feb 2026

https://github.com/balapriyac/data-science-tutorials

If you're coming from one of my data science tutorials, you'll find the code and the links to the tutorials here. I hope you find them helpful. Happy learning and coding!

data-science python tutorial-sourcecode

Last synced: 02 Jul 2025

https://github.com/symmetryinvestments/excel-d

Excel API bindings and wrapper API for D

ctfe data-science dlang excel metaprogramming native sdk wrapper-api xls xlsw

Last synced: 24 Jan 2026

https://github.com/seandavi/sars2pack

An R package with over 50 highly cited, ready-to-use, up-to-date COVID-19 pandemic data resources

biomedical-data coronavirus coronavirus-tracking covid-19 data-science data-visualization datascience datasets epidemics epidemiology geospatial public-health rstats rstats-package

Last synced: 25 Feb 2026

https://github.com/AstraZeneca/judgyprophet

Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).

ai bayesian data-science forecasting machine-learning python statistics

Last synced: 28 Sep 2025

https://github.com/dataprofessor/streamlit-for-datascience

Streamlit for Data Science shows how to build interactive data apps powered by data visualization and machine learning.

data-science machine-learning numpy pandas python

Last synced: 19 Jun 2025

https://github.com/mganjoo/apple-health-exporter

Python module to export Apple Health dump file to a data frame for analysis

data-science python r

Last synced: 03 Mar 2025

https://github.com/mratsim/mckinsey-smartcities-traffic-prediction

Adventure into using multi attention recurrent neural networks for time-series (city traffic) for the 2017-11-18 McKinsey IronMan (24h non-stop) prediction challenge

data-science deep-learning keras machine-learning neural-networks tensorflow time-series

Last synced: 30 Apr 2025

https://github.com/ashishpatel26/datascienv

datascienv is a package that helps you set up your environment in a single line of code with all dependencies. It also includes pyforest, which imports all required ML libraries in a single line.

catboost data-science data-science-env datascienv imbalanced-data lightgbm matplotlib numpy pandas pycaret scikit-learn seaborn tensorflow2 xgboost

Last synced: 24 Oct 2025

https://mlverse.github.io/mall/

Run multiple LLM predictions against a data frame with R and Python

data-science dplyr llm polars python r

Last synced: 06 Apr 2025

https://github.com/tommyod/paretoset

Compute the Pareto (non-dominated) set, i.e., skyline operator/query.

data-mining data-science datascience multi-objective-optimization optimization pandas skyline-query

Last synced: 05 Apr 2025

https://github.com/mlverse/mall

Run multiple LLM predictions against a data frame with R and Python

data-science dplyr llm polars python r

Last synced: 24 Oct 2025

https://github.com/noopeeks/datanvim

A fully-featured batteries-included Neovim distribution for the world of Data Science. Prepared to run code and interact with Jupyter Notebooks without ever leaving your terminal.

data data-science distribution jupyter-notebook machine-learning neovim nvim nvim-config text-editor vim

Last synced: 06 Oct 2025

https://github.com/gabrieltseng/datascience-projects

A collection of personal data science projects

data-science machine-learning

Last synced: 18 Jan 2026

https://ddotta.github.io/cookbook-rpolars/

Cookbook to provide solutions to common tasks and problems in using Polars with R

benchmark cookbook data-engineering data-science datatable dplyr polars r tidyr

Last synced: 13 May 2025

https://github.com/ajl2718/whereabouts

Fast, accurate, open-source geocoding in Python

data-science duckdb geocoding geospatial record-linkage

Last synced: 26 Aug 2025

https://github.com/mdeff/ntds_2018

Material for the EPFL master course "A Network Tour of Data Science", edition 2018.

data-science education epfl graphs network-science

Last synced: 12 Jul 2025

https://github.com/jmwoloso/pychattr

Python Channel Attribution (pychattr) - A Python implementation of the excellent R ChannelAttribution library

channel-attribution data-analysis data-science machine-learning python python-channel-attribution rpy2 wrapper

Last synced: 06 May 2025

https://github.com/tgsmith61591/skoot

A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.

data-science imbalanced-data machine-learning pandas python scikit-learn skutil

Last synced: 11 Sep 2025

https://github.com/aravind-selvam/forest-fire-prediction

Project for Predicting Algerian Forest Fires and Fire Weather Index Using Machine Learning with Python.

classification-model data-science flask-application jupyter-notebook machine-learning ml prediction-model python regression-models sklearn

Last synced: 11 Apr 2025

https://github.com/splunk/splunk-mltk-container-docker

Splunk App for Data Science and Deep Learning - container images repository

agentic ai artificial-intelligence data-science deep-learning docker llm machine-learning rag splunk splunk-ai

Last synced: 11 Oct 2025

https://github.com/lkuffo/data-viz

More than 50 examples of data visualization and analysis in Matplotlib, Pandas, Seaborn, Plotly, Bokeh, and NetworkX

data-analysis data-science dataviz geoviz jupyter jupyter-notebook matplotlib networkx pandas plotly python seaborn

Last synced: 30 Jul 2025

https://github.com/tatevkaren/free-resources-books-papers

Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc., for different levels that can be useful for Data Scientists, Developers, and everyone who is interested in STEM.

books data-science databricks delta-lake developers econometrics free-books free-resources machine-learning mathematics statistics

Last synced: 17 Feb 2026

https://github.com/mine-cetinkaya-rundel/teach-r-online

Materials for the Teaching statistics and data science online workshops in July 2020

data-science education rstats statistics

Last synced: 08 Apr 2025

https://github.com/antononcube/mathematicavsr

Example projects, code, and documents for comparing Mathematica with R.

comparison data-analysis data-science machine-learning mathematica r time-series

Last synced: 17 Oct 2025

https://github.com/druths/xp

A framework (command-line tool + libraries) for creating flexible compute pipelines

data-science notebook pipeline research-tool workflow

Last synced: 27 Mar 2026

https://github.com/junpenglao/planet_sakaar_data_science

A colourful collection of codes and notebooks, like Planet Sakaar

bayesian-inference data-science pymc3

Last synced: 06 May 2025