An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/holgern/pyrdatasets

2293 datasets from various R packages packed as DataFrames through compressed pickle files

data-science datasets python rdatasets

Last synced: 13 Jul 2025

https://github.com/agnostiqhq/tutorials_covalent_pydata_2023

Covalent tutorial notebooks and slides for PyData 2023, NYC

ai aws covalent data-science gena hpc llm ml pydata pydata-nyc

Last synced: 11 May 2025

https://github.com/tseemann/kounta

🧮 🔢 Generate multi-sample k-mer count matrix from WGS

data-science genomics-data gwas kmer-counting machine-learning

Last synced: 12 Apr 2025

https://github.com/davidgasquez/datalab

⚗️ A local and devcontainer friendly alternative to Google Colab

data-science docker jupyter-notebook

Last synced: 08 May 2025

https://github.com/dusenberrymw/systemml-nn

A deep learning library for Apache SystemML.

data-science deep-learning machine-learning neural-networks systemml

Last synced: 02 Sep 2025

https://github.com/psyplot/psy-maps

The psyplot plugin for visualizations on a map

cartopy cf-conventions data-science icon-esm matplotlib netcdf psyplot ugrid visualization xarray

Last synced: 03 Aug 2025

https://github.com/aditeyabaral/kepler-exoplanet-analysis

Analysis of Kepler Objects of Interest using Machine Learning for Exoplanet Identification.

data-analytics data-science exoplanet-analysis exoplanets kepler machine-learning nasa space

Last synced: 16 Apr 2025

https://github.com/app-generator/devtool-data-converter

Open-Source Data Converter - CSV, XLS, DF | AppSeed

appseed-sample data-converter data-science

Last synced: 01 Aug 2025

https://github.com/mrankitgupta/python-libraries-roadmap

I am sharing lessons on various Python libraries, from scratch to intermediate level, including practice sets that were useful in my Data Science journey.

66daysofdata ai analytics ankitgupta artificial-intelligence data-science data-visualization libraries library machine-learning matplotlib mrankitgupta numpy pandas python python-libraries python-library pythonlib scikit-learn tensorflow

Last synced: 22 Apr 2025

https://github.com/vicotrbb/data_science

Repository created to store all my studies about data science, machine learning and artificial intelligence.

data-science machine-learning python roadmap studies

Last synced: 14 Apr 2025

https://github.com/phanatagama/data-science

🚀 This repository contains Data Science docs in Jupyter notebooks, written in Python 3 while learning DS material.

big-data data-science image-processing matplotlib-pyplot numpy opencv pandas python3 scatter scipy

Last synced: 19 Apr 2025

https://github.com/andrewhinh/captafied

Multimodal Table Understanding

data-science python

Last synced: 31 Jan 2026

https://github.com/tnwei/nbread

Snappy previews of Jupyter notebooks from the command line, with ranger integration

data-science jupyter python ranger

Last synced: 22 Apr 2025

https://github.com/adrienc21/vulpes

Vulpes: Test many classification, regression models and clustering algorithms to see which one is most suitable for your dataset

automl data-analysis data-science machine-learning models package python scikit-learn statistics

Last synced: 25 Oct 2025

https://github.com/oneoffcoder/zava

Parallel coordinates with grand tour for exploratory data visualization of massive and high-dimensional data

angular d3 data-science exploratory-data-visualization grand-tour parallel-coordinates python typescript

Last synced: 06 Apr 2025

https://github.com/niaid/r_intro

A Gentle Introduction to R, RStudio, and visualization

bcbb-training data-science machine-learning programming r visualization

Last synced: 28 Aug 2025

https://github.com/anshumansinha3301/matplotlib_visualizations

Some Graphs using Matplotlib in Python

data-science matplotlib python

Last synced: 07 Oct 2025

https://github.com/robertvazan/sourceafis-visualization-java

Visualizations of biometric features in fingerprint templates produced by SourceAFIS and in algorithm transparency data captured during feature extraction and matching in SourceAFIS.

biometrics data-science feature-extraction fingerprint fingerprint-authentication minutia sourceafis visualization-library

Last synced: 14 Oct 2025

https://github.com/pabannier/sparseglm

Fast and modular solver for sparse generalized linear models

data-science machine-learning optimization

Last synced: 10 Apr 2025

https://github.com/datumorphism/datumorphism.github.io

My knowledgebase on machine learning, data visualization, and some fun stuff.

artificial-intelligence data-science data-visualization giscus machine-learning statistics

Last synced: 24 Oct 2025

https://github.com/TrilemmaFoundation/Trilemma-Beta

Official repo for the Trilemma Beta Tournament

bitcoin data-science forecasting tournament

Last synced: 11 May 2025

https://github.com/orico/flexeegile

Extending Agile For AI & Data Teams

agile ai data data-science flexeegile methodology

Last synced: 08 Jan 2026

https://github.com/ahammadmejbah/artificial-intelligence-research-and-development-projects

The field of Artificial Intelligence (AI) is a frontier of computer science that focuses on creating systems capable of performing tasks that would typically require human intelligence. This encompasses a wide range of capabilities such as visual perception, speech recognition, decision-making, and language translation.

data-engineering data-science data-visualization database datascience deep-learning deep-learning-algorithms deep-neural-networks deep-reinforcement-learning machine-learning machine-learning-algorithms machine-vision machinelearning

Last synced: 27 Apr 2025

https://github.com/hoxo-m/deltatest

R Package for Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing

ab-testing data-science statistics

Last synced: 22 Oct 2025

https://github.com/nagasaki45/dbdapy

Following "Doing Bayesian Data Analysis", in python

bayesian-data-analysis data-science pymc3

Last synced: 29 Jul 2025

https://github.com/pottekkat/heart-disease-classifier

Given clinical parameters of a patient, can we predict whether or not they have heart disease?

data-science data-visualization heart-disease-analysis heart-disease-predictor jupyter-notebook machine-learning

Last synced: 25 Oct 2025

https://github.com/teddyoweh/cheat-model

NLP Text Binary Probabilistic Classification Model for predicting cheat statements

data-science machine-learning nlp tokenizer

Last synced: 23 Aug 2025

https://github.com/dr-montasir/mnjs

MATH NODE JS (MNJS): A tiny math library for node.js & JavaScript on browser

data-analysis data-science javascript js jsdelivr library math nextjs npm react svelte sveltekit ts typescript yarn

Last synced: 26 Apr 2025

https://github.com/dsacms/deduplifhir

Prototype for basic deduplication and aggregation of eCQM data

ai cmsoss-tier3 data-science deduplication electron government healthcare poetry python

Last synced: 13 Apr 2025

https://github.com/curiousily/ml-in-the-browser-for-hackers-with-tensorflow-js

Machine Learning examples for beginners showing how to use TensorFlow.js in the browser

data-science linear-regression machine-learning tensorflow-js tensorflow-tutorials tensorflowjs

Last synced: 26 Apr 2025

https://github.com/arjunan-k/machine-learning

Machine Learning Specialization by Andrew Ng in collaboration between DeepLearning.AI and Stanford Online in Coursera.

data-science deep-learning neural-networks tensorflow

Last synced: 05 Oct 2025

https://github.com/chicolucio/ifood-case-data-analyst

Teaching project for the Data Science course that I teach at Hashtag

classification-model clustering data-science python segmentation sklearn sklearn-pipeline teaching

Last synced: 07 Oct 2025

https://polis-community.github.io/red-dwarf/

A DIMensional REDuction library for stellarpunk democracy into the long haul. (Inspired by Pol.is)

civic-tech collective-intelligence data-science deliberative-democracy democracy dimensionality-reduction participatory-democracy polis

Last synced: 17 Apr 2025

https://github.com/dlopezyse/drug-repurposing-using-kge

💊 Drug repurposing using knowledge graph embeddings with a focus on vector-borne diseases

biotechnology data-science drug-repurposing health knowledge-graph machine-learning

Last synced: 28 Feb 2025

https://github.com/sdcastillo/PA-R-Study-Manual

An online study guide for the SOA's predictive analytics exam.

data-science data-visualization machine-learning predictive-modeling r-programming

Last synced: 06 May 2025

https://github.com/mmore500/outset

add zoom indicators, insets, and magnified panels to matplotlib/seaborn visualizations with ease!

data-science data-visualization matplotlib pypi-package python seaborn

Last synced: 30 Apr 2025

https://github.com/olekscode/examples-pca-tsne

Some examples of using PCA and t-SNE for dimensionality reduction in Python and R

data-science dimensionality-reduction examples pca t-sne

Last synced: 18 Mar 2025

https://github.com/n1ghtf1re/map-of-emergency-incidents

Emergency Map lets you effectively visualize multi-dimensional information through an intuitive interface. The code is easily modified for use in a variety of areas. Color mixing enhances the perception and analysis of information.

big-data big-data-analytics big-data-visualization bigdata color-mixing colors data data-analytics data-science data-visualization data-visualization-challenges data-visualization-simpler mysql open-source-project php student-project

Last synced: 18 Mar 2025

https://github.com/ryanlucas3/macrorandomforest

A modification of traditional random forest for time-series forecasting

data-science machine-learning random-forest time-series

Last synced: 10 Apr 2025

https://github.com/cpcloud/dpyr

Python dplyr operations for SQL databases and pandas DataFrames

data-science dplyr postgres python python-3 python-library python3 sql sqlalchemy sqlite3

Last synced: 09 Sep 2025

https://github.com/syamkakarla98/datascience_head_start

This repository focuses on building a path into data science.

data-analysis data-science data-visualization machine-learning machinelearning-python python3

Last synced: 03 May 2025

https://github.com/poopoothegorilla/fastframe

DataFrame project that utilizes Apache Arrow

apache-arrow data-science dataframe golang

Last synced: 12 Jun 2025

https://github.com/shuyib/chronic-kidney-disease-kaggle

Using machine learning models to predict if patients have chronic kidney disease based on a few features. The results of the models are also interpreted to make them more understandable to health practitioners.

data-cleaning-pipeline data-science data-transformation data-visualization diagnostics dimensionality-reduction feature-engineering feature-selection health-data-analysis health-data-science machine-learning machine-learning-algorithm machine-learning-algorithms model-interpretability preventative-medicine

Last synced: 19 Apr 2025

https://github.com/julienmalka/neuralnetwork

Small implementation of a neural network in Python

data-science machine-learning neural-network python

Last synced: 11 Apr 2025

https://github.com/tsdataclinic/TREC

Transit Resilience for Essential Commuting (TREC)

climate-change data-science transit-data

Last synced: 20 Jul 2025

https://github.com/open-risk/dataqualitytoolkit

Python toolkit for evaluating and visualizing the data quality of excel spreadsheets

data-quality data-quality-measurement data-science excel spreadsheet

Last synced: 23 Oct 2025

https://github.com/polyaxon/polyaxon-lib

Deep Learning and Reinforcement learning library for TensorFlow for building end to end models and experiments.

data-science deep-learning machine-learning reinforcement-learning tensorflow tensorflow-experiments

Last synced: 30 Sep 2025

https://github.com/SamEdwardes/pydatafaker

A python package to create fake data with relationships between tables.

data data-science fake-data python

Last synced: 09 Jul 2025

https://github.com/dariodip/rfd-discovery

This project, written in Python and Cython, deals with Discovery of Relaxed Functional Dependencies (RFDs) using a bottom-up approach.

artificial-intelligence cython data-science python python-3 university-project

Last synced: 08 Sep 2025

https://github.com/twipped/spiral

A bio-cycles tracker for all humans

biology data-science health mobile react-native transgender womens-health

Last synced: 10 Jul 2025

https://github.com/imsanjoykb/uber-rides-prediction-flask-deploy

This repository consists of files required to deploy a Machine Learning Web App created with Flask

data-science data-visualization deployment flask machine-learning ml-algorithms predictive-modeling uber-rides-prediction

Last synced: 30 Oct 2025

https://github.com/ritvik19/vizard

Intuitive, Interactive, Easy and Quick Visualizations for Data Science Projects

data-analysis data-science data-visualization

Last synced: 10 Apr 2025

https://github.com/bradleyboehmke/cinday-rug-iml-2018

Slides and other material for Cincinnati-Dayton useR presentation on interpretable machine learning with R

data-science interpretable-machine-learning machine-learning r shortcourse-material tutorial tutorial-code

Last synced: 13 Apr 2025

https://github.com/bcgov/ghg-emissions-indicator

R scripts for a GHG emissions indicator published on Environmental Reporting BC

data-science env r rstats

Last synced: 07 May 2025

https://github.com/shawn-shan/eru

High Level Framework for PyTorch

data-science deep-learning eru neural-network python pytorch

Last synced: 30 Apr 2025

https://github.com/oscarsaharoy/functionfit

generate functions by placing points on a graph

data-science regression

Last synced: 29 Oct 2025

https://github.com/psyplot/psyplot-gui

Graphical User Interface for the psyplot package

data-science gui interactive ipython psyplot qtconsole sphinx

Last synced: 02 May 2025

https://github.com/tugot17/data-science-blog

Data science blog, https://tugot17.github.io/data-science-blog/

blog data-science xai

Last synced: 11 Jul 2025

https://github.com/trilemmafoundation/trilemma-beta

Official repo for the Trilemma Beta Tournament

bitcoin data-science forecasting tournament

Last synced: 30 Apr 2025

https://github.com/thecoderpinar/big-tech-financial-insights

🚀 A comprehensive project analyzing Big Tech stock prices using time series analysis, volatility modeling, and macroeconomic indicators. Featuring interactive dashboards and automated reporting! 📈💼

data-analysis data-science finance machine-learning macroeconomics stock-analysis time-series-analysis volatility-modeling

Last synced: 03 Apr 2025

https://github.com/tushar2704/pyverse-exploring-python-frameworks

This repository is the ultimate guide to exploring and mastering Python libraries and frameworks: a collection of code and guides by me, Tushar!

artificial-intelligence data-analysis data-engineering data-science data-visualization machine-learning python streamlit-tushar2704 tushar2704 web-application

Last synced: 30 Oct 2025

https://github.com/tatevkaren/deep-learning-for-data-science

Deep Learning Case Studies with Tensorflow and Keras for Beginners-Advanced: ANN, CNN, RNN, Self-Organizing Maps, Boltzmann Machines, Stacked Autoencoders

ann artificial-intelligence artificial-neural-networks data-preprocessing data-science deep-learning ds keras modelling modelling-framework neural-networks numpy pandas python scikit-learn sklearn tensorflow

Last synced: 10 Apr 2025

https://github.com/zincware/znnl

A Python package for studying neural learning

data-science data-selection machinelearning mathematics physics

Last synced: 09 Aug 2025

https://github.com/mayer79/statistical_computing_material

Material for the lecture Statistical Computing

data-science machine-learning r statistics

Last synced: 01 May 2025

https://github.com/zoltan-nz/ci-cd-pipeline-template-for-data-projects

CI/CD pipeline template for data science projects using GitLab CI and Kubernetes

cd ci ci-cd data-science docker gitlab gitlab-runner kubernetes python

Last synced: 07 Mar 2026

https://github.com/albarsil/geneticml

A simple and lightweight genetic algorithm for optimization of any machine learning model

automl data-science genetic-algorithm machine-learning

Last synced: 13 Apr 2025

https://github.com/akbaritabar/dask-duckdb-dbeaver

Parallelised, out-of-memory data analysis using Dask in Python and DuckDB with DBeaver in SQL, using the publicly accessible ORCID 2019 XML files as an example

data-analysis data-science pandas parallel-computing python

Last synced: 08 Aug 2025

https://github.com/touppercase78/tiobe-index-ratings

Index Ratings for Popular Programming Languages from TIOBE

analysis data-science datasets index jupyter-notebook programming-languages python tiobe

Last synced: 01 Apr 2025

https://github.com/ramanks19/aiml-projects

Projects which were completed as part of assignments of Great Learning's PGP in Artificial Intelligence and Machine Learning

computer-vision data-science ensemble-machine-learning greatlearning neural-networks nlp-machine-learning recommendation-system supervised-learning unsupervised-learning

Last synced: 03 Jan 2026

https://github.com/rueedlinger/ml-resources

A curated list of statistics, data visualization and machine learning resources which I find useful, have read or want to read.

curated-list data-science data-visualization deep-learning machine-learning statistics

Last synced: 01 Apr 2025

https://github.com/rajveersinghcse/excelr-assignments

🙇 This GitHub repository hosts my internship assignment projects, tasks, and reports, showcasing my skills and contributions during the internship period.

assignments data-science excelr excelr-assignments excelr-assignments-github

Last synced: 04 May 2025

https://github.com/lenguyenthedat/dextra-mindef-2015

My solution for Dextra Data Science Challenge #44 (Singapore Ministry of Defense) https://challenges.dextra.sg/challenge/44

classification data-science machine-learning xgboost

Last synced: 02 Jul 2025

https://github.com/srohit0/ml-misc

Miscellaneous Machine Learning and Data Analysis Projects

colaboratory data-analysis data-science data-visualization google-colab machine-learning-algorithms

Last synced: 15 Apr 2025

https://github.com/ccao-data/model-condo-avm

Automated valuation model for all class 299 and 399 residential condominiums in Cook County

assessment condo data-science machine-learning model property-taxes r tidymodels

Last synced: 11 Apr 2025