An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/tomasonjo/blogs

Jupyter notebooks that support my graph data science blog posts at https://bratanic-tomaz.medium.com/

data-science graph graph-algorithms neo4j

Last synced: 12 Apr 2025

https://github.com/google/uncertainty-baselines

High-quality implementations of standard and SOTA methods on a variety of tasks.

bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

Last synced: 14 May 2025

https://github.com/h2oai/h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform

data-science deep-learning h2o machine-learning python r tutorial

Last synced: 14 May 2025

https://github.com/pixeltable/pixeltable

Pixeltable — Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads.

ai artificial-intelligence chatbot computer-vision data-science database feature-engineering feature-store genai llm machine-learning ml mlops multimodal vector-database

Last synced: 01 May 2026

https://github.com/man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 12 Mar 2025

https://github.com/microsoft/RD-Agent

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.

agent ai automation data-mining data-science development llm research

Last synced: 24 Oct 2025

https://github.com/swanhubx/swanlab

⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics / veRL / MMEngine / Keras etc.

data-science deep-learning jax logging machine-learning mlops model-versioning python pytorch tensorboard tensorflow tracking transformers visualization

Last synced: 13 May 2025

https://github.com/khuyentran1401/efficient_python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 14 May 2025

https://github.com/CamDavidsonPilon/lifetimes

Lifetime value in Python

data-science python statistics

Last synced: 06 May 2025

https://github.com/denizyuret/Knet.jl

Koç University deep learning framework.

data-science deep-learning julia knet machine-learning neural-networks

Last synced: 04 May 2025

https://github.com/denizyuret/knet.jl

Koç University deep learning framework.

data-science deep-learning julia knet machine-learning neural-networks

Last synced: 14 May 2025

https://github.com/demidovakatya/vvedenie-mashinnoe-obuchenie

:memo: A collection of machine learning resources

collections data-mining data-science deep-learning machine-learning mooc neural-networks nlp russian university

Last synced: 26 Jan 2026

https://github.com/khuyentran1401/Efficient_Python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 29 Apr 2025

https://github.com/RamiAwar/dataline

Chat with your data - AI data analysis and visualization on CSV, Postgres, MySQL, Snowflake, SQLite...

ai chart data-science data-visualization llm sql

Last synced: 24 Jul 2025

https://github.com/ebay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 27 Jan 2026

https://github.com/eBay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 14 Apr 2025

https://github.com/googlecloudplatform/data-science-on-gcp

Source code accompanying the book Data Science on the Google Cloud Platform, by Valliappa Lakshmanan (O'Reilly, 2017)

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 14 Apr 2025

https://github.com/kotartemiy/pygooglenews

If Google News had a Python library

data-science google news python rss

Last synced: 14 May 2025

https://github.com/ebhy/budgetml

Deploy an ML inference service on a budget in less than 10 lines of code.

api data-science deployment fastapi inference machine-learning mlops

Last synced: 15 May 2025

https://github.com/ropensci/drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

data-science drake high-performance-computing makefile peer-reviewed pipeline r r-package reproducibility reproducible-research ropensci rstats workflow

Last synced: 13 May 2025

https://github.com/patmartin/dex

Dex: The Data Explorer, a data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and of publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 16 May 2025

https://github.com/PatMartin/Dex

Dex: The Data Explorer, a data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and of publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 04 May 2025

https://github.com/GoogleCloudPlatform/data-science-on-gcp

Source code accompanying the book Data Science on the Google Cloud Platform, by Valliappa Lakshmanan (O'Reilly, 2017)

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 19 Jul 2025

https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources

Unix, R, and Python tools for genomics and data science

bioinformatics cancer-genomics data-science

Last synced: 06 Oct 2025

https://github.com/ikatsov/tensor-house

A collection of reference Jupyter notebooks and demo AI/ML applications for enterprise use cases: marketing, pricing, supply chain, smart manufacturing, and more.

ai customer-analysis data-science deep-learning llm machine-learning marketing models personalization reinforcement-learning supply-chain

Last synced: 08 Apr 2025

https://github.com/jrfiedler/causal_inference_python_code

Python code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science python

Last synced: 16 May 2025

https://github.com/nok/sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

data-science machine-learning scikit-learn sklearn

Last synced: 15 May 2025

https://github.com/starpig1129/datagen

DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into crypto market intelligence. Learn more: https://datagen.digital/.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 14 May 2025

https://github.com/alan-turing-institute/clevercsv

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 13 May 2025

https://github.com/scikit-learn-contrib/MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 01 May 2025

https://github.com/scikit-learn-contrib/mapie

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 13 May 2025

https://github.com/starpig1129/AI-Data-Analysis-MultiAgent

DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into crypto market intelligence. Learn more: https://datagen.digital/.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 02 May 2025

https://github.com/davidadsp/generative_deep_learning_2nd_edition

The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.

chatgpt dalle2 data-science deep-learning diffusion-models generative-adversarial-network gpt-3 machine-learning python stable-diffusion tensorflow

Last synced: 06 Oct 2025

https://github.com/reiinakano/xcessiv

A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

automated-machine-learning data-science ensemble-learning hyperparameter-optimization machine-learning scikit-learn stacked-ensembles

Last synced: 15 May 2025

https://github.com/alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 26 Mar 2025

https://github.com/rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics

Last synced: 15 May 2025

https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition

The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.

chatgpt dalle2 data-science deep-learning diffusion-models generative-adversarial-network gpt-3 machine-learning python stable-diffusion tensorflow

Last synced: 01 May 2025

https://github.com/mandiant/threatpursuit-vm

Threat Pursuit Virtual Machine (VM): A fully customizable, open-source, Windows-based distribution focused on threat intelligence analysis and hunting, designed to get intel analysts, malware analysts, and threat hunters up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 30 Jun 2026

https://github.com/mandiant/ThreatPursuit-VM

Threat Pursuit Virtual Machine (VM): A fully customizable, open-source, Windows-based distribution focused on threat intelligence analysis and hunting, designed to get intel analysts, malware analysts, and threat hunters up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 12 Jul 2025

https://github.com/ramiawar/dataline

Chat with your data - AI data analysis and visualization on CSV, Postgres, MySQL, Snowflake, SQLite...

ai chart data-science data-visualization llm sql

Last synced: 14 May 2025

https://github.com/fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

automation data-pipelines data-science machine-learning mlflow mlops pandera pydantic python

Last synced: 14 May 2025

https://github.com/elixir-explorer/explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

data-science dataframes elixir rust

Last synced: 14 May 2025

https://github.com/devAmoghS/Machine-Learning-with-Python

Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. BONUS: Interview Bank coming up!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 19 Jul 2025

https://github.com/business-science/free_r_tips

Free R-Tips is a FREE Newsletter provided by Business Science. It comes with bite-sized code tutorials every week.

data-science newsletter tips tips-and-tricks

Last synced: 16 Sep 2025

https://github.com/devamoghs/machine-learning-with-python

Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. BONUS: Interview Bank coming up!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 14 May 2025

https://github.com/starpig1129/DATAGEN

DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into crypto market intelligence. Learn more: https://datagen.digital/.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 17 Nov 2025

https://github.com/xorbitsai/xorbits

Scalable Python DS & ML, in an API-compatible and lightning-fast way.

data-science distributed-systems lightgbm machine-learning ml numpy pandas python scalable xgboost

Last synced: 14 May 2025

https://github.com/okfn-brasil/querido-diario

📰 Brazilian government gazettes, accessible to everyone.

civic-tech data-science digital-public-goods dpg governments-gazettes govtech hacktoberfest open-data politics scraping sdg-16 spider

Last synced: 12 Apr 2025

https://github.com/zama-ai/concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

data-science fhe fully-homomorphic-encryption homomorphic-encryption machine-learning ppml privacy python scikit-learn tfhe torch

Last synced: 14 May 2025

https://github.com/teomewhy/teomerefs

Guide to technical references for a career in data

data data-science machine-learning python

Last synced: 14 May 2025

https://github.com/JuliaStats/Distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 08 May 2025

https://github.com/juliastats/distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 13 May 2025

https://github.com/sajal2692/data-science-portfolio

Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.

data-science keras machine-learning nlp pandas portfolio python scikit-learn

Last synced: 16 May 2025

https://github.com/shujian2015/freeml

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 25 Mar 2025

https://github.com/red-data-tools/pycall.rb

Calling Python functions from the Ruby language

data-science pycall python ruby rubydatascience rubyml

Last synced: 13 Feb 2026

https://github.com/Shujian2015/FreeML

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 05 May 2025