An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/dimzachar/datatalksclub-projects

Streamlit-Powered DataTalksClub Project Analyzer: Interactive Insights at Your Fingertips

data-science gpt machine-learning openai python streamlit vizualisation

Last synced: 18 Jul 2025

https://github.com/brianruizy/2019-microsoft-iot-hackathon

🥇 1st place winner | Bump.IT - Pothole detection and mapping, using data science analysis methods, mobile phone telemetry, and computer vision, deployed through Azure.

computer-vision data-science geocoding internet-of-things pothole-detection

Last synced: 19 Mar 2025

https://github.com/tushar2704/sql-portfolio

Collection of personal SQL projects and queries I've worked on, showcasing my skills and expertise in database management, data analysis, and data manipulation using SQL.

data data-analytics data-science dataanalysis datamanipulation machine-learning mysql postgresql sql streamlit-tushar2704 tushar2704

Last synced: 07 May 2025

https://github.com/OGFris/GoStats

GoStats is a Go library for mathematical statistics, mostly used in ML domains; it covers most statistical measure functions.

data-science go golang gostats machine-learning math mathematics mit-license statistical-measures statistics stats

Last synced: 14 Mar 2025

https://github.com/alessandrocorradini/university-of-michigan-applied-data-science-with-python-specialization

Repository for the Applied Data Science with Python Specialization from University of Michigan on Coursera

coursera coursera-specialization data-science machine-learning mooc moocs

Last synced: 07 Sep 2025

https://github.com/bcgov/bcgroundwater

An R package to facilitate analysis and visualization of groundwater data from the British Columbia groundwater observation well network

data-science env r rstats

Last synced: 20 Jul 2025

https://github.com/amine-smahi/r-learning-journey

Some of the projects I made when starting to learn R for Data Science at university

afc cpa data-cleaning data-integration data-science datascience r r-language

Last synced: 18 Mar 2025

https://github.com/alastairrushworth/tdf

🚴🏅📊 Tour de France winners and stages data

data-science dataframe exploratory-data-analysis rstats tdf tour-de-france

Last synced: 13 Apr 2025

https://github.com/PySloth/pysloth

A Python Package for Probabilistic Prediction

data-analysis data-science machine-learning python statistics

Last synced: 11 May 2025

https://github.com/gyrdym/ml_preprocessing

Implementation of popular data preprocessing algorithms for Machine learning

data-preprocessing data-science machine-learning machine-learning-algorithms onehot-encoder ordinal-encoder

Last synced: 21 Mar 2025

https://github.com/tjpalanca/facebook-news-analysis

Analysis of Facebook News in the Philippines

analysis data data-science facebook news philippines

Last synced: 07 Mar 2026

https://github.com/stefen-taime/car-price-predictor

Predicting Car Prices with FastAPI, Streamlit, MLflow, Kafka, and Debezium: A Practical Demonstration

data data-science dataanalysis-projects engineering machine-learning mlops predictive-modeling

Last synced: 04 Aug 2025

https://github.com/rubydamodar/the-ultimate-pandas-bootcamp

Welcome to the Pandas for Data Science repository! This course is designed to take you from beginner to proficient in using Pandas, the powerful data manipulation library in Python. Whether you're just starting your data science journey or looking to sharpen your skills, this repository contains all the resources

beginner-friendly csv-data data-analysis data-cleaning data-manipulation data-science data-visualization dataframe exploratory-data-analysis jupyter-notebook machine-learning matplotlib numpy pandas python python-pandas series statistical-analysis time-series titanic-dataset

Last synced: 19 Apr 2025

https://github.com/systamental/cryptodatapy

CryptoDataPy is a python library that makes it easy to build high quality data pipelines for the analysis of cryptoassets

alternative-data cryptoassets data-science etl-pipeline market-data on-chain-data pandas python

Last synced: 06 Aug 2025

https://github.com/icaropires/pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

data-science distributed-computing distributed-systems ocr pandas-dataframe parallel parquet pdf pdf2image pdftotext pyarrow pytesseract pytesseract-ocr python python3 ray tesseract tesseract-ocr

Last synced: 13 Apr 2025

https://github.com/fbruzzesi/sklearn-smithy

Toolkit to forge scikit-learn compatible estimators

cli data-science machine-learning python scikit-learn webui

Last synced: 16 Sep 2025

https://github.com/eshikashah/skillship-internship-data-science-projects

Utilized this lockdown to do something productive. SkillShip Foundation provided an internship opportunity, and here's the outcome: the projects I made in these 2 months.

classification data-science internship machine-learning regression

Last synced: 28 Jul 2025

https://github.com/mohammed-majid/ml_roadmap

Comprehensive Machine Learning Roadmap

algorithms data-science deep-learning machine-learning roadmap

Last synced: 06 Mar 2025

https://github.com/psyplot/psy-view

An ncview-like GUI with psyplot

data-science gui netcdf psyplot visualization

Last synced: 20 Aug 2025

https://github.com/ahmetfurkandemir/online-istanbul-applied-data-science-102-bootcamp

Online Istanbul Applied Data Science 102 Bootcamp (Start: 15 August, Finish: 7 November)

bootcamp data-science deep-learning kodluyoruz machine-learning

Last synced: 15 Apr 2025

https://github.com/alvertogit/bigdata_docker

Big Data Docker Data Science Spark Spark4 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook

big-data data-science docker jupyter-lab jupyter-notebook machine-learning python scala spark spark4

Last synced: 10 Mar 2026

https://github.com/UtrechtUniversity/iBridges

A wrapper around the python-irodsclient to allow for easy interaction with iRODS servers.

data-analysis data-engineering data-science datascience irods-client

Last synced: 29 Jun 2025

https://github.com/squey/squey

Squey is a visualization software designed to interactively explore and understand large amounts of tabular data (this is the read-only mirror of https://gitlab.com/squey/squey)

cybersecurity data-analysis data-science data-visualization exploratory-data-visualizations parallel-coordinates parquet parquet-files parquet-viewer pcap timeseries timeseries-analysis visualization

Last synced: 08 Mar 2025

https://github.com/maastrichtlawtech/case-law-explorer

A network analysis software platform for analyzing Dutch and European court decisions.

case-law data-science network-analysis

Last synced: 23 Jan 2026

https://github.com/milos-agathon/crisp-topographical-map-with-r

In this repo, I'll show you how to programmatically access satellite imagery from several APIs to create a crisp topographical map of Italy. We will use a single interface to query the data without even downloading raster data to your local drive 😲. For a tutorial please visit https://milospopovic.net/crisp-topography-map-with-r/

data-science data-visualization gis maps r satellite-imagery topography

Last synced: 04 Apr 2026

https://github.com/cusyio/python4datascience

Teaching materials for the cusy training courses on Python-based data science workflows: https://cusy.io/en/seminars

data-science datascience dvc git ipython numpy pandas python

Last synced: 05 Sep 2025

https://github.com/zakroum-hicham/football-analysis-cv

This repository contains a computer vision/machine learning football project that uses YOLO for object detection, K-means for pixel segmentation, and perspective transformation to analyze player movements in football videos.

ai computer-vision data-science football-analytics kmeans-clustering machine-learning opencv yolov8

Last synced: 26 Mar 2025

https://github.com/vsimkus/torch-reparametrised-mixture-distribution

PyTorch implementation of the mixture distribution family with implicit reparametrisation gradients.

data-science gradients machine-learning mixture-distributions mixture-model mixture-of-gaussians pytorch variational-inference

Last synced: 10 Oct 2025

https://github.com/osl-pocs/skdata

Python tools for data analysis

data data-analysis data-science open-data python

Last synced: 23 Feb 2026

https://github.com/danlessa/coursera-xarray

Repository for the "Climate Geospatial Analysis with Python and Xarray" project on Coursera

climate-science course-project coursera data-science geospatial-analysis xarray

Last synced: 22 Jun 2025

https://github.com/tjmahr/polypoly

Helper functions for orthogonal polynomials in R

data-science r statistics

Last synced: 30 Apr 2025

https://github.com/mad-lab-fau/tpcp

Pipeline and Dataset helpers for complex algorithm evaluation.

algorithms biosignals data-management data-science machine-learning python

Last synced: 04 Feb 2026

https://github.com/iBridges-for-iRODS/iBridges

A wrapper around the python-irodsclient to allow for easy interaction with iRODS servers.

data-analysis data-engineering data-science datascience irods-client

Last synced: 14 Jul 2025

https://github.com/csinva/data-viz-utils

Functions for easily making publication-quality figures with matplotlib.

big-data data-analysis data-science data-visualization eda legend matplotlib python python3 scatterplot time-series

Last synced: 05 May 2025

https://github.com/umuthopeyildirim/flatironopensource

Flatiron School lessons for graduated students.

computer-science data-science javascript python ruby web-development

Last synced: 07 Mar 2026

https://github.com/mtpatter/mlflow-tutorial

Fully reproducible, Dockerized, step-by-step tutorial on training and serving a simple sklearn classifier model using mlflow. Detailed blog post published on Towards Data Science.

data-science machine-learning mlflow mlflow-docker mlops tutorial

Last synced: 04 May 2025

https://github.com/ozguraslank/flexml

Easy-to-use and flexible AutoML library for Python

automl data-science machine-learning python scikit-learn

Last synced: 18 Jul 2025

https://github.com/yufree/datadown

Fragments of Data Analysis (数据分析残卷)

bookdown chinese-simplified data-science r statistics

Last synced: 18 Mar 2025

https://github.com/robinlovelace/opengeohub2023

Content for lecture at OpenGeoHub 2023 on spatial data and the tidyverse

course data-science opengeohub osgeo practical r reproducible summer-school tidy-data

Last synced: 20 Mar 2025

https://github.com/duo-labs/datasci-ctf

A capture-the-flag exercise based on data analysis challenges

ctf data-science

Last synced: 30 Apr 2025

https://github.com/rezapace/komputasi-big-data

This repository contains materials and practical exercises for learning Python in the context of Big Data Computation. The focus is on analyzing and processing large datasets using various tools and techniques.

ai big data data-science git-reza gunadarma gundar komputasi-big-data

Last synced: 28 Sep 2025

https://github.com/sunnynguyen-ai/fraud-detection-system

Real-time fraud detection system using ensemble ML models, featuring streaming data processing, explainable AI with SHAP, and production-ready deployment with FastAPI and Docker.

data-science docker ensemble-models fastapi feature-engineering fraud-detection machine-learning mlops production-ml python random-forest real-time-ml shap streamlit xgboost

Last synced: 04 May 2026

https://github.com/scicloj/tablecloth.time

Tools for the processing and manipulation of time-series data in Clojure.

clojure data-processing data-science dataset scicloj tablecloth time-series

Last synced: 14 Apr 2025

https://github.com/dataship/frame

A DataFrame for Javascript

data-frame data-science javascript statistics

Last synced: 08 Jul 2025

https://github.com/991o2o9/smart-cardiologist

Intelligent Python service with FastAPI for real-time heart disease predictions using machine learning. Features AI-assisted consultations, user authentication, analysis history, RESTful API, and comprehensive error handling. Secure and scalable solution for healthcare applications.

api artificial-intelligence data-science fastapi healthcare healthcare-technology heart-disease machine-learning medical-ai medical-diagnosis prediction predictive-analytics pydantic python rest-api scikit-learn swagger uvicorn

Last synced: 30 Aug 2025

https://github.com/nirala96/bangalore-house-prediction-app

Predicts home prices in Bangalore. Built with Flutter, Flask, and Jupyter Notebook.

data-science datacleaning exploratory-data-analysis flask-api flutter jupyter-notebook linear-regression python

Last synced: 10 Mar 2026

https://github.com/facultyai/faculty

A Python library for interacting with the Faculty platform

data-science faculty-platform python

Last synced: 14 Apr 2025

https://github.com/jdvelasq/courses

Support material for courses, Facultad de Minas, Universidad Nacional de Colombia

analytics big-data big-data-analytics data-science training-materials

Last synced: 23 Aug 2025

https://github.com/astarte-platform/astarte_flow

Build data processing pipelines with Astarte Flow.

ai container containers data-science docker elixir iot kubernetes lua pipelines realtime

Last synced: 07 May 2025

https://github.com/mkearney/tfse

🛠 Useful R functions for various things

data-science functions mkearney-r-package r-language rstats utility

Last synced: 12 Apr 2025

https://github.com/amitkaps/multidim

Visualising Multi Dimensional Data

data-science data-visualization grammar python r visualization

Last synced: 01 Mar 2026

https://github.com/kennethleungty/data-centric-ai-competition

Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI

ai andrew-ng data-centric data-centric-ai data-science deep-learning machine-learning

Last synced: 09 Jul 2025

https://github.com/mauroluzzatto/explainy

explainy is a Python library for generating machine learning model explanations for humans

data-science explanation machine-learning machine-learning-explainability python scikit-learn

Last synced: 07 Oct 2025

https://github.com/ajayarunachalam/gui-pandas-ai

GUIPandasAI - Integrates generative AI capabilities into Pandas through a web interface, along with keyword-based data analysis services

ai chatgpt data data-analysis data-analytics data-science generative-ai gpt-3 gpt-4 llm pandas python streamlit web-app

Last synced: 06 Jul 2025

https://github.com/getindata/quickstart-ml-blueprints

Data science project development best practices and state-of-the-art open-source tooling forged into a set of solved ML use cases to serve as blueprints for efficient prototyping.

data-science machine-learning

Last synced: 09 Apr 2025

https://github.com/wibeasley/ranalysisskeleton

Files and settings commonly used in analysis projects with R

analysis data-science r

Last synced: 06 Oct 2025

https://github.com/brakmic/data-science-for-losers

📈 Articles on Data Science, Jupyter, and Pandas

data-science jupyter machine-learning python

Last synced: 23 Apr 2025