An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/piesposito/tand

TanD - Train and Deploy is a no-code framework for automating the Machine Learning workflow.

data-science fastapi machine-learning mlflow pytorch sklearn workflow-automation

Last synced: 24 Oct 2025

https://github.com/tushar2704/data-portfolio

This repository showcases my skills and experience in the field of data analysis. Here, you will find a collection of projects and analyses that demonstrate my ability to extract insights and make data-driven decisions.

artificial-intelligence data-science dataanalysis postgresql python r sql streamlit-tushar2704 tushar2704

Last synced: 14 Mar 2026

https://github.com/eren-ck/finch

A Python implementation of "FINCH Clustering Algorithm (CVPR 2019)"

cluster-analysis clustering clustering-algorithm clustering-evaluation data-science machine-learning

Last synced: 04 Apr 2026

https://github.com/nrc-cnrc/nrc-gamma

Large labelled dataset of real-life gas meter images.

ai computer-vision creative-commons crowdsourcing data data-science datanalytics dataset iot machine-learning machinelearning opendata

Last synced: 05 Mar 2026

https://github.com/anshumansinha3301/matplotlib_visualizations

Some Graphs using Matplotlib in Python

data-science matplotlib python

Last synced: 07 Oct 2025

https://github.com/hoxo-m/deltatest

R Package for Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing

ab-testing data-science statistics

Last synced: 22 Oct 2025

https://github.com/robertvazan/sourceafis-visualization-java

Visualizations of biometric features in fingerprint templates produced by SourceAFIS and in algorithm transparency data captured during feature extraction and matching in SourceAFIS.

biometrics data-science feature-extraction fingerprint fingerprint-authentication minutia sourceafis visualization-library

Last synced: 14 Oct 2025

https://github.com/datumorphism/datumorphism.github.io

My knowledgebase on machine learning, data visualization, and some fun stuff.

artificial-intelligence data-science data-visualization giscus machine-learning statistics

Last synced: 24 Oct 2025

https://github.com/andrewhinh/captafied

Multimodal Table Understanding

data-science python

Last synced: 31 Jan 2026

https://github.com/kylegrealis/nascar.data

R package of NASCAR race results & other information

data-science data-visualization package r racing

Last synced: 25 Oct 2025

https://github.com/gi0na/r-ghypernet

R package for Generalised Hypergeometric Ensembles of Random Graphs (gHypEG)

data-mining data-science graphs network network-analysis random-graph-generation random-graphs

Last synced: 05 Feb 2026

https://github.com/aditeyabaral/kepler-exoplanet-analysis

Analysis of Kepler Objects of Interest using Machine Learning for Exoplanet Identification.

data-analytics data-science exoplanet-analysis exoplanets kepler machine-learning nasa space

Last synced: 16 Apr 2025

https://github.com/zincware/znnl

A Python package for studying neural learning

data-science data-selection machinelearning mathematics physics

Last synced: 09 Aug 2025

https://github.com/hoshibatista/base-of-ds

This repository serves as a foundation for projects in Data Science and Machine Learning.

clustering-algorithm data-science data-visualization machine-learning

Last synced: 05 Sep 2025

https://github.com/upsonic/server

Self-Driven Autonomous Python Libraries

data data-science gpt-4o library-management ml mlops python

Last synced: 22 Aug 2025

https://github.com/ahmed-maher77/wind-turbine-power-prediction-app-using-machine-learning

"Wind Power Predictor" is a machine learning project that forecasts turbine output using real-time data from Turkish wind farms. Its web app interface offers convenient access to predictions, enabling informed decisions for maximizing energy production and advancing renewable energy usage.

ai catboost data-analysis data-science flask html-css-javascript javascript machine-learning matplotlib numpy pandas predictive-modeling pwa python sklearn web web-development wind wind-turbine wind-turbine-operational-optimization

Last synced: 10 Apr 2025

https://github.com/giswqs/geog-312-2021

First Steps in GIS Programming (GEOG 312) at the University of Tennessee, Knoxville

data-science geopython geospatial gis jupyter mapping python

Last synced: 12 Jul 2025

https://github.com/niaid/r_intro

A Gentle Introduction to R, RStudio, and visualization

bcbb-training data-science machine-learning programming r visualization

Last synced: 28 Aug 2025

https://github.com/ramanks19/aiml-projects

Projects completed as assignments for Great Learning's PGP in Artificial Intelligence and Machine Learning

computer-vision data-science ensemble-machine-learning greatlearning neural-networks nlp-machine-learning recommendation-system supervised-learning unsupervised-learning

Last synced: 03 Jan 2026

https://github.com/sondosaabed/nics-firearm-background-checks-investigation

🔫 The data comes from the FBI's National Instant Criminal Background Check System. The NICS is used to determine whether a prospective buyer is eligible to buy firearms or explosives. 🔫

census-data criminal-background data-analyst-nanodegree data-science data-wrangling data-wrangling-data-vis data-wrangling-data-visualisation fbi matplotlib nanodegree numpy pandas python storytelling-with-data usa

Last synced: 01 Jul 2025

https://github.com/tnwei/nbread

Snappy previews of Jupyter notebooks from the command line, with ranger integration

data-science jupyter python ranger

Last synced: 22 Apr 2025

https://github.com/phanatagama/data-science

🚀 This repository contains Data Science notes in Jupyter notebooks, written in Python 3 while learning DS material.

big-data data-science image-processing matplotlib-pyplot numpy opencv pandas python3 scatter scipy

Last synced: 19 Apr 2025

https://github.com/albarsil/geneticml

A simple and lightweight genetic algorithm for optimization of any machine learning model

automl data-science genetic-algorithm machine-learning

Last synced: 13 Apr 2025

https://github.com/nagasaki45/dbdapy

Following "Doing Bayesian Data Analysis", in python

bayesian-data-analysis data-science pymc3

Last synced: 29 Jul 2025

https://github.com/lenguyenthedat/dextra-mindef-2015

My solution for Dextra Data Science Challenge #44 (Singapore Ministry of Defense) https://challenges.dextra.sg/challenge/44

classification data-science machine-learning xgboost

Last synced: 02 Jul 2025

https://github.com/pytask-dev/cookiecutter-pytask-project

A minimal cookiecutter template for a project with pytask.

cookiecutter data-science pytask research

Last synced: 26 Jul 2025

https://github.com/tushar2704/stats-mosaic

Statistical-Minds is a comprehensive GitHub repository that aims to provide a growing collection of curated content and projects centered around statistics and its intersection with data science, machine learning, and artificial intelligence.

aritificial-intelligence data-analytics data-science machine-learning statistical-learning statistical-methods statistics streamlit

Last synced: 07 Aug 2025

https://polis-community.github.io/red-dwarf/

A DIMensional REDuction library for stellarpunk democracy into the long haul. (Inspired by Pol.is)

civic-tech collective-intelligence data-science deliberative-democracy democracy dimensionality-reduction participatory-democracy polis

Last synced: 17 Apr 2025

https://github.com/santiagxf/mlproject-sample

Sample repository about how to structure an ML project using software engineering practices

data-science data-science-projects git machine-learning mlops

Last synced: 24 Apr 2025

https://github.com/pirocheto/phishing-url-detection

Train a machine learning model for Phishing URL Detection with mlops practices.

ai anti-phishing cybersecurity data-science machine-learning mlops phishing-detection

Last synced: 06 Apr 2025

https://github.com/samedwardes/pydatafaker

A python package to create fake data with relationships between tables.

data data-science fake-data python

Last synced: 23 Apr 2025

https://github.com/app-generator/devtool-data-converter

Open-Source Data Converter - CSV, XLS, DF | AppSeed

appseed-sample data-converter data-science

Last synced: 01 Aug 2025

https://github.com/sukanyabag/text-summarization-using-bert-gpt2-xlnet

This notebook leverages Transfer Learning Algorithms and standard NLP procedures to summarize a given paragraph meaningfully.

bert-model data-science gpt-2 huggingface-transformers machine-learning natural-language-processing textsummarization transfer-learning xlnet

Last synced: 24 Apr 2025

https://github.com/rueedlinger/ml-resources

A curated list of statistics, data visualization and machine learning resources which I find useful, have read or want to read.

curated-list data-science data-visualization deep-learning machine-learning statistics

Last synced: 01 Apr 2025

https://github.com/teddyoweh/cheat-model

NLP Text Binary Probabilistic Classification Model for predicting cheat statements

data-science machine-learning nlp tokenizer

Last synced: 23 Aug 2025

https://github.com/dsacms/deduplifhir

Prototype for basic deduplication and aggregation of eCQM data

ai cmsoss-tier3 data-science deduplication electron government healthcare poetry python

Last synced: 13 Apr 2025

https://github.com/dariodip/rfd-discovery

This project, written in Python and Cython, deals with the discovery of Relaxed Functional Dependencies (RFDs) using a bottom-up approach.

artificial-intelligence cython data-science python python-3 university-project

Last synced: 08 Sep 2025

https://github.com/syamkakarla98/datascience_head_start

This repository focuses on building a learning path for data science.

data-analysis data-science data-visualization machine-learning machinelearning-python python3

Last synced: 03 May 2025

https://github.com/thecoderpinar/big-tech-financial-insights

🚀 A comprehensive project analyzing Big Tech stock prices using time series analysis, volatility modeling, and macroeconomic indicators. Featuring interactive dashboards and automated reporting! 📈💼

data-analysis data-science finance machine-learning macroeconomics stock-analysis time-series-analysis volatility-modeling

Last synced: 03 Apr 2025

https://github.com/mmore500/outset

add zoom indicators, insets, and magnified panels to matplotlib/seaborn visualizations with ease!

data-science data-visualization matplotlib pypi-package python seaborn

Last synced: 30 Apr 2025

https://github.com/tatevkaren/deep-learning-for-data-science

Deep Learning Case Studies with Tensorflow and Keras for Beginners-Advanced: ANN, CNN, RNN, Self-Organizing Maps, Boltzmann Machines, Stacked Autoencoders

ann artificial-intelligence artificial-neural-networks data-preprocessing data-science deep-learning ds keras modelling modelling-framework neural-networks numpy pandas python scikit-learn sklearn tensorflow

Last synced: 10 Apr 2025

https://github.com/oscarsaharoy/functionfit

generate functions by placing points on a graph

data-science regression

Last synced: 29 Oct 2025

https://github.com/shawn-shan/eru

High Level Framework for PyTorch

data-science deep-learning eru neural-network python pytorch

Last synced: 30 Apr 2025

https://github.com/sdcastillo/PA-R-Study-Manual

An online study guide for the SOA's predictive analytics exam.

data-science data-visualization machine-learning predictive-modeling r-programming

Last synced: 06 May 2025

https://github.com/shuyib/chronic-kidney-disease-kaggle

Using machine learning models to predict whether patients have chronic kidney disease based on a few features. The model results are also interpreted to make them more understandable to health practitioners.

data-cleaning-pipeline data-science data-transformation data-visualization diagnostics dimensionality-reduction feature-engineering feature-selection health-data-analysis health-data-science machine-learning machine-learning-algorithm machine-learning-algorithms model-interpretability preventative-medicine

Last synced: 19 Apr 2025

https://github.com/twipped/spiral

A bio-cycles tracker for all humans

biology data-science health mobile react-native transgender womens-health

Last synced: 10 Jul 2025

https://github.com/ritvik19/vizard

Intuitive, Interactive, Easy and Quick Visualizations for Data Science Projects

data-analysis data-science data-visualization

Last synced: 10 Apr 2025

https://github.com/tsdataclinic/TREC

Transit Resilience for Essential Commuting (TREC)

climate-change data-science transit-data

Last synced: 20 Jul 2025

https://github.com/olekscode/examples-pca-tsne

Some examples of using PCA and t-SNE for dimensionality reduction in Python and R

data-science dimensionality-reduction examples pca t-sne

Last synced: 18 Mar 2025

https://github.com/n1ghtf1re/map-of-emergency-incidents

Emergency Map lets you effectively visualize multi-dimensional information through an intuitive interface. The code is easily modified for use in a variety of areas. The use of color mixing enhances the perception and analysis of information.

big-data big-data-analytics big-data-visualization bigdata color-mixing colors data data-analytics data-science data-visualization data-visualization-challenges data-visualization-simpler mysql open-source-project php student-project

Last synced: 18 Mar 2025

https://github.com/polyaxon/polyaxon-lib

Deep Learning and Reinforcement learning library for TensorFlow for building end to end models and experiments.

data-science deep-learning machine-learning reinforcement-learning tensorflow tensorflow-experiments

Last synced: 30 Sep 2025

https://github.com/open-risk/dataqualitytoolkit

Python toolkit for evaluating and visualizing the data quality of excel spreadsheets

data-quality data-quality-measurement data-science excel spreadsheet

Last synced: 23 Oct 2025

https://github.com/julienmalka/neuralnetwork

Small implementation of a neural network in Python

data-science machine-learning neural-network python

Last synced: 11 Apr 2025

https://github.com/tugot17/data-science-blog

Data science blog, https://tugot17.github.io/data-science-blog/

blog data-science xai

Last synced: 11 Jul 2025

https://github.com/psyplot/psyplot-gui

Graphical User Interface for the psyplot package

data-science gui interactive ipython psyplot qtconsole sphinx

Last synced: 02 May 2025

https://github.com/bcgov/ghg-emissions-indicator

R scripts for a GHG emissions indicator published on Environmental Reporting BC

data-science env r rstats

Last synced: 07 May 2025

https://github.com/cpcloud/dpyr

Python dplyr operations for SQL databases and pandas DataFrames

data-science dplyr postgres python python-3 python-library python3 sql sqlalchemy sqlite3

Last synced: 09 Sep 2025

https://github.com/trilemmafoundation/trilemma-beta

Official repo for the Trilemma Beta Tournament

bitcoin data-science forecasting tournament

Last synced: 30 Apr 2025

https://github.com/ryanlucas3/macrorandomforest

A modification of traditional random forest for time-series forecasting

data-science machine-learning random-forest time-series

Last synced: 10 Apr 2025

https://github.com/tushar2704/pyverse-exploring-python-frameworks

This repository is the ultimate guide to exploring and mastering Python libraries and frameworks: a collection of code and guides by me, Tushar!

artificial-intelligence data-analysis data-engineering data-science data-visualization machine-learning python streamlit-tushar2704 tushar2704 web-application

Last synced: 30 Oct 2025

https://github.com/mkesslerct/data_science_Python

An introductory course on data science with Python, taught at the Escuela Técnica Superior de Ingeniería de Telecomunicaciones of the Universidad Politécnica de Cartagena

data-science python python3

Last synced: 24 Jun 2026

https://github.com/touppercase78/tiobe-index-ratings

Index Ratings for Popular Programming Languages from TIOBE

analysis data-science datasets index jupyter-notebook programming-languages python tiobe

Last synced: 01 Apr 2025

https://github.com/ugurcanerdogan/effects-of-moon-cycles-on-cryptocurrencies

BBM469*DSCP - Data Science Capstone Project - Do lunar phases affect cryptocurrencies or not? Social media has lately claimed that moon phases have some effect on cryptocurrencies, but there is no research on it; it remains an informal observation. The goal of this project is a statistical investigation of whether the different phases of the moon have an effect on cryptocurrencies.

bbm469 cryptocurrency data-science dscp lunar-phases moon-cycles moon-phase statistical-analysis technical-analysis

Last synced: 18 Mar 2025

https://github.com/yash22222/ibm-csrbox-internship-project

The objective of the Data Analytics internship at CSRBOX is to provide interns with hands-on experience in applying data analytics techniques to real-world projects in the field of corporate social responsibility (CSR). Interns will gain practical skills in data collection, cleaning, analysis, visualization, and reporting, while working on projects

data-mining data-preprocessing data-science exploratory-data-analysis feature-engineering lemmatization machine-learning pandas pos-tagging random-forest random-forest-classifier scikit-learn sentiment-analysis web-scraping wordcloud

Last synced: 22 Apr 2025

https://github.com/srohit0/ml-misc

Miscellaneous Machine Learning and Data Analysis Projects

colaboratory data-analysis data-science data-visualization google-colab machine-learning-algorithms

Last synced: 15 Apr 2025

https://github.com/oneoffcoder/zava

Parallel coordinates with grand tour for exploratory data visualization of massive and high-dimensional data

angular d3 data-science exploratory-data-visualization grand-tour parallel-coordinates python typescript

Last synced: 06 Apr 2025

https://github.com/ccao-data/model-condo-avm

Automated valuation model for all class 299 and 399 residential condominiums in Cook County

assessment condo data-science machine-learning model property-taxes r tidymodels

Last synced: 11 Apr 2025

https://github.com/zoltan-nz/ci-cd-pipeline-template-for-data-projects

CI/CD pipeline template for data science projects using GitLab CI and Kubernetes

cd ci ci-cd data-science docker gitlab gitlab-runner kubernetes python

Last synced: 07 Mar 2026

https://github.com/poopoothegorilla/fastframe

DataFrame project that utilizes Apache Arrow

apache-arrow data-science dataframe golang

Last synced: 12 Jun 2025