An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/junpenglao/planet_sakaar_data_science

A colourful collection of codes and notebooks, like Planet Sakaar

bayesian-inference data-science pymc3

Last synced: 06 May 2025

https://github.com/druths/xp

A framework (command-line tool + libraries) for creating flexible compute pipelines

data-science notebook pipeline research-tool workflow

Last synced: 27 Mar 2026

https://github.com/jason2brownlee/machinelearningmischief

Machine Learning Mischief: Examples from the dark side of data science

data-science ethics hacking machine-learning statistics

Last synced: 29 Jan 2026

https://github.com/fcakyon/instafake-dataset

Dataset for Instagram Fake and Automated Account Detection

bot classification data-science dataset fake instafake instagram machine-learning research

Last synced: 30 Apr 2025

https://github.com/stitchfix/mab

Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

data-science experimentation go golang multi-armed-bandit multi-armed-bandits multiarmed-bandits reinforcement-learning thompson thompson-sampling

Last synced: 16 Jul 2025

https://github.com/alinski29/stonks.jl

Julia library for standardizing financial data retrieval and storage from multiple APIs.

data data-mining data-science dataframe finance julia trading trading-algorithms

Last synced: 06 May 2025

https://github.com/datakitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, and continuous anomaly monitoring.

data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake

Last synced: 25 Feb 2026

https://github.com/ebran/grim

grim brings property graphs to the Nim language. Look around you: everything is a graph!

data-science data-structures graph graph-theory nim nim-lang property-graph

Last synced: 09 Apr 2025

https://github.com/backtick-se/cowait

Containerized distributed programming framework for Python

dask data-engineering data-science docker kubernetes python spark task-scheduler workflow-engine

Last synced: 14 Jan 2026

https://github.com/asad70/insider-trading

This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.

algotrading data-science extract-data insider-trading insiders tickers trading trading-strategies

Last synced: 27 Apr 2025

https://github.com/zincware/zntrack

Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.

data-science data-version-control developer-tools dvc git machine-learning python reproducibility

Last synced: 22 Jul 2025

https://github.com/lilaq-project/lilaq

Advanced data visualization.

data-science plotting typst visualization

Last synced: 05 Mar 2025

https://github.com/shlizee/NeuroAI

NeuroAI-UW seminar, a regular weekly seminar for the UW community, organized by NeuroAI Shlizerman Lab.

ai cvpr data-science deep-learning eccv icml neural-networks neurips neuroscience-methods recurrent-neural-networks sfn

Last synced: 01 May 2025

https://github.com/hunar4321/reweight-gpt

Reweight GPT - a simple neural network using transformer architecture for next character prediction

algorithms data-science gpt language-model machine-learning nerual-networks numpy pytorch

Last synced: 10 Apr 2025

https://github.com/dlab-berkeley/Python-Data-Wrangling-Legacy

D-Lab's 3-hour introduction to data wrangling in Python. Learn how to import and manipulate dataframes using pandas.

data-science pandas python

Last synced: 26 Apr 2025

https://github.com/PKU-DAIR/mindware

An efficient open-source AutoML system for automating machine learning lifecycle, including feature engineering, neural architecture search, and hyper-parameter tuning.

automl-algorithms automl-pipeline bayesian-optimization blackbox-optimization data-science deep-learning distributed-systems ensemble-learning hyper-parameter-optimization knobs-tuning machine-learning meta-learning neural-architecture-search python

Last synced: 09 May 2025

https://github.com/team-fastml/fastml

A Python package built on sklearn for running a series of classification algorithms faster and more easily.

algorithms data-science deep-learning machine-learning machine-learning-algorithms neural-network python

Last synced: 11 Apr 2025

https://github.com/aw-junaid/computer-science

Explore a collection of resources and projects in Computer Science, covering algorithms, data structures, programming languages, and emerging technologies. Ideal for learners and enthusiasts looking to enhance their knowledge and skills in the field.

algorithms assembly-language automata computer-architecture computer-networks computer-science computer-vision cpp cybersecurity data-science data-science-projects data-structures database game-development machine-learning networking operating-system python

Last synced: 26 Mar 2025

https://github.com/ActuariesInstitute/cookbook

Data and analytics cookbook for actuaries

actuarial analytics data-science hacktoberfest

Last synced: 20 Jul 2025

https://github.com/ulikoehler/uliengineering

A Python library for calculations performed in electronics engineering

data-analysis data-science electronics engineering python

Last synced: 05 Apr 2025

https://github.com/ahmed-mohamed-sn/olliePy

OlliePy is a Python package that helps data scientists explore their data and evaluate and analyse their machine learning experiments by utilising the power and structure of modern web applications. The data scientist only needs to provide the data and any required information, and OlliePy will generate the rest.

ai analytics charts dashboard data data-analytics data-science data-scientist eda error-analysis exploratory-data-analysis machine-learning python visualization

Last synced: 08 May 2025

https://github.com/BojarLab/glycowork

Package for processing and analyzing glycans and their role in biology.

bioinformatics computational-biology data-science glycans glycobiology machine-learning molecular-biology open-source python

Last synced: 28 Sep 2025

https://github.com/oborchers/medium_repo

This repository provides the code examples for the corresponding blog posts. In case you have questions, feel free to contact me directly.

blog blogging business-analytics data-science data-science-notebooks deep-learning machine-learning machine-learning-algorithms marketing natural-language-processing neural-networks predictive-analytics predictive-modeling

Last synced: 02 Aug 2025

https://github.com/iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

command-line-tool data data-engineering data-pipelines data-science devops machine-learning machine-learning-engineering mlops rust

Last synced: 28 Jun 2025

https://github.com/tirendazacademy/chatgpt-with-examples

This repo contains ChatGPT tutorials about data science, machine learning, deep learning, and Python. We show how to use ChatGPT with examples.

chat-gpt chatgpt chatgpt-api chatgpt-python chatgpt3 data-science deep-learning machine-learning

Last synced: 19 Apr 2025

https://github.com/elshor/dstools

JavaScript tools and utilities for the data scientist

data-science javascript

Last synced: 13 May 2025

https://github.com/zincware/ZnTrack

Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.

data-science data-version-control developer-tools dvc git machine-learning python reproducibility

Last synced: 07 May 2025

https://github.com/lter/lterdatasampler

LTER data samples to teach environmental data science

data-science ecology lter-science r r-package

Last synced: 06 Jul 2025

https://github.com/sparkfish/shabby-pages

ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions, appropriate for use in training models to reverse distortions and recover the original, denoised documents.

binarization born-digital computer-vision corpus data-science dataset denoising layout-detection

Last synced: 17 Aug 2025

https://github.com/nhsdigital/data-analytics-services

This repo collects the open-source work of the Analytics Service within NHS Digital Data Services

data-science health healthcare nhs nhs-digital nhs-digital-publication pyspark python python3 r rap reproducible-analytical-pipeline sql

Last synced: 24 Oct 2025

https://github.com/datakitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from data source to customer value, from any team development environment into production, across every tool, team, environment, and customer so that problems are detected, localized, and understood immediately.

data data-engineering data-observability data-science dataops pipleine-monitoring

Last synced: 01 Apr 2026

https://github.com/kaggledatasets/kaggledatasets

Collection of Kaggle Datasets ready to use for Everyone (Looking for contributors)

data-science datasets deep-learning kaggle keras machine-learning python pytorch scikit-learn tensorflow

Last synced: 20 Jun 2025

https://github.com/zeeshanahmad4/stock-prices-prediction-ml-flask-dashboard

This program predicts the price of GOOG stock for a specific day using the machine learning algorithms Support Vector Regression (SVR) and Linear Regression. Importing the flask module in the project is mandatory; an object of the Flask class is our WSGI application.

classification data-mining data-science data-visualization dataset flask flask-dashboard linear-regression ml prediction prediction-algorithm prediction-model predictive-analytics python stock-analysis stock-market stock-prices stock-prices-prediction stock-trading visualization

Last synced: 07 May 2025

https://github.com/madhurimarawat/semester-notes

A comprehensive, well-structured repository of B.Tech (Hons) CSE notes and learning resources, specializing in Artificial Intelligence and Data Science. Includes semester-wise notes, question papers, curated study guides, and indexed materials designed for efficient learning, revision, and academic reference.

artificial-intelligence btech-notes computer-networks computer-organization-architecture computer-science cse-notes data-science data-visualization database-management-system engineering-mathematics engineering-notes learning-resources machine-learning object-oriented-programming operating-systems probability-and-statistics python-for-data-science semester-notes study-materials theory-of-computation

Last synced: 07 Mar 2026

https://github.com/tatevkaren/artificial-neural-network-business_case_study

Business Case Study to predict customer churn rate based on Artificial Neural Network (ANN), with TensorFlow and Keras in Python. This is a customer churn analysis that contains training, testing, and evaluation of an ANN model. (Includes: Case Study Paper, Code)

ann ann-model artificial-neural-network artificial-neural-networks bank-customers case-study churn-analysis data-science deep-learning machine-learning prediction-model predictive-analytics python3 tensorflow-tutorials

Last synced: 02 May 2025

https://github.com/henestrosadev/sololearn

Compilation of all SoloLearn courses with their respective projects and practices and all 72 code challenges for all 7 supported languages.

code-challenge code-practice data-science programming-exercises programming-languages python sololearn sololearn-cert sololearn-solutions

Last synced: 16 Mar 2025

https://github.com/kennethleungty/end-to-end-automl-insurance

An End-to-End Implementation of AutoML with H2O, MLflow, FastAPI, and Streamlit for Insurance Cross-Sell

automl data-science fastapi h2o h2o-automl machine-learning mlflow mlops python streamlit

Last synced: 12 Jul 2025

https://github.com/mikeizbicki/cmc-csci046

CMC's Data Structures and Algorithms Course Materials

cmc computer-science course data-science python3

Last synced: 16 May 2025

https://github.com/rcdilorenzo/ecce

ML Prediction of Bible Topics and Passages (Python / React)

data-science fastapi fully-connected-network interactive-visualizations keras-tensorflow reactjs

Last synced: 18 Jan 2026

https://github.com/jacksonburns/astartes

Better Data Splits for Machine Learning

ai data-science machine-learning ml python sampling

Last synced: 21 Aug 2025

https://github.com/jrfiedler/causal_inference_julia_code

Julia code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science julia julialang

Last synced: 17 Mar 2026

https://github.com/daun-io/study-data-science

Practical data science notebooks that I used to study in 2016

data-science jupyter-notebook machine-learning tensorflow

Last synced: 13 May 2025

https://github.com/togethercomputer/open-data-scientist

Open AI data scientist agent that automates complex data analysis tasks using the ReAct framework. Execute Python code locally or in the cloud, upload datasets, and generate detailed analytical reports with minimal setup.

agents ai data-science llms

Last synced: 23 Jun 2025

https://github.com/jonathandinu/spark-ray-data-science

Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with Spark and Ray in the context of a data scientist's standard workflow.

artificial-intelligence data-science distributed-computing machine-learning python ray spark

Last synced: 08 May 2025

https://github.com/google-marketing-solutions/feedx

Transparent, robust and trustworthy A/B experimentation for Shopping feeds.

ab-testing data-science experimentation python shopping

Last synced: 12 Aug 2025

https://github.com/mosdeo/LKYDeepNN

Low-dependency (C++11 STL only), portable, header-only deep neural networks for embedded systems

back-propagation cross-entropy data-science data-visualization deep-learning multilayer-perceptron

Last synced: 13 Nov 2025

https://github.com/daun-io/Study-Data-Science

Practical data science notebooks that I used to study in 2016

data-science jupyter-notebook machine-learning tensorflow

Last synced: 19 Jul 2025

https://github.com/theengineeringworld/statistics-using-python

These files are part of the YouTube course "Statistics Using Python", offered by The Engineering World: http://youtube.com/theengineeringworld

cleaning data-analysis data-mining data-science data-visualization database jupyter-notebooks python python3 statistics

Last synced: 06 Sep 2025

https://github.com/ropensci/rdataretriever

R interface to the Data Retriever

data data-science database datasets r r-package rstats science

Last synced: 22 Oct 2025

https://github.com/maneprajakta/honours-in-data-science

Resources and Implementation Of Assignment For Honours In Data Science

assignment-solutions data-science honours resources sppu

Last synced: 17 Mar 2025

https://github.com/nolanbconaway/pitchfork-data

Analyses of over 18,000 Pitchfork reviews.

data-science ipynb jupyter music pitchfork

Last synced: 06 Sep 2025

https://github.com/giswqs/postgis

Spatial Data Management with PostgreSQL and PostGIS https://gishub.org/sdm

data-science database geospatial postgis postgres postgresql

Last synced: 01 Aug 2025

https://github.com/ericmjl/pyds-cli

Helping you manage your data science projects sanely.

data-science workflow-automation

Last synced: 02 Apr 2026

https://github.com/timkpaine/perspective-parquet

Parquet file reader and editor in JupyterLab, built with `perspective` for pivoting, filtering, aggregating, etc.

data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables

Last synced: 03 Sep 2025

https://github.com/pzaino/thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.

automation blue-team-tool content-detection content-discovery crawler crawling cyber-security cybersecurity cybersecurity-tools data-collection data-science distributed-systems golang indexer indexing reconnaissance red-team-tools scraping search-engine vulnerability-detection

Last synced: 06 Feb 2026

https://github.com/alext234/coronavirus-stats

Automatically scrape data and statistics on Coronavirus to make them easily accessible in CSV format

australia cdc china coronavirus covid-19 data-science europe health italy jupyter-notebook pipeline scraping-data singapore south-korea stats usa wuhan-virus

Last synced: 20 Feb 2026

https://github.com/credo-ai/credoai_lens

Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central gateway to assessments created in the open source community.

ai artificial-intelligence assessment data-science ethical-artificial-intelligence fairness-ai fairness-ml jupyter machine-learning ml python reporting responsible-ai visualization

Last synced: 02 Oct 2025

https://github.com/nneji123/credit-card-fraud-detection

Credit Card Fraud Detection App built with Streamlit, FastAPI and Docker.

credit-card data-science deployment docker docker-compose fastapi fraud-detection machine-learning streamlit

Last synced: 29 Oct 2025

https://github.com/pjaselin/cubist

A Python package for fitting Quinlan's Cubist regression model

data-science machine-learning python regression scikit-learn

Last synced: 10 Apr 2025