An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/iphysresearch/DataSciComp

A collection of popular Data Science Challenges/Competitions || Countdown timers to keep track of the entry deadlines.

challenge competition data-challenge data-science data-science-competitions project

Last synced: 02 Apr 2025

https://github.com/iphysresearch/datascicomp

A collection of popular Data Science Challenges/Competitions || Countdown timers to keep track of the entry deadlines.

challenge competition data-challenge data-science data-science-competitions project

Last synced: 17 Jan 2025

https://github.com/enzoampil/fastquant

fastquant: Backtest and optimize your ML trading strategies with only 3 lines of code!

algotrading backtesting cryptocurrency data-science financial-data-science machine-learning quantitative-finance stocks trading-strategies

Last synced: 14 May 2025

https://github.com/jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science ipython ipython-notebook machine-learning mllib notebook pyspark python spark

Last synced: 15 May 2025

https://github.com/hadley/stats337

Readings in applied data science

data-science teaching

Last synced: 16 May 2025

https://github.com/iamtodor/data-science-interview-questions-and-answers

Data science interview questions with answers. Not ideal (yet).

data-science interview-preparation interview-questions machine-learning

Last synced: 22 Mar 2025

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 13 May 2025

https://github.com/microsoft/responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

data-analysis data-science data-visualization error-analysis explainability explainable-ai explainable-ml fairness fairness-ai fairness-ml interpretability jupyter machine-learning machinelearning ml responsible-ai ui visualization widget widgets

Last synced: 13 May 2025

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 13 May 2025

https://github.com/nubank/fklearn

fklearn: Functional Machine Learning

data-analysis data-science machine-learning ml python

Last synced: 13 May 2025

https://github.com/tomasonjo/blogs

Jupyter notebooks that support my graph data science blog posts at https://bratanic-tomaz.medium.com/

data-science graph graph-algorithms neo4j

Last synced: 12 Apr 2025

https://github.com/google/uncertainty-baselines

High-quality implementations of standard and SOTA methods on a variety of tasks.

bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

Last synced: 14 May 2025

https://github.com/h2oai/h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform

data-science deep-learning h2o machine-learning python r tutorial

Last synced: 14 May 2025

https://github.com/man-group/ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 12 Mar 2025

https://github.com/microsoft/RD-Agent

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.

agent ai automation data-mining data-science development llm research

Last synced: 09 Feb 2025

https://github.com/swanhubx/swanlab

⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Swift / Ultralytics / veRL / MMEngine / Keras etc.

data-science deep-learning jax logging machine-learning mlops model-versioning python pytorch tensorboard tensorflow tracking transformers visualization

Last synced: 13 May 2025

https://github.com/khuyentran1401/efficient_python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 14 May 2025

https://github.com/CamDavidsonPilon/lifetimes

Lifetime value in Python

data-science python statistics

Last synced: 06 May 2025

https://github.com/denizyuret/knet.jl

Koç University deep learning framework.

data-science deep-learning julia knet machine-learning neural-networks

Last synced: 14 May 2025

https://github.com/denizyuret/Knet.jl

Koç University deep learning framework.

data-science deep-learning julia knet machine-learning neural-networks

Last synced: 04 May 2025

https://github.com/khuyentran1401/Efficient_Python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 29 Apr 2025

https://github.com/demidovakatya/vvedenie-mashinnoe-obuchenie

:memo: A collection of resources on machine learning

collections data-mining data-science deep-learning machine-learning mooc neural-networks nlp russian university

Last synced: 23 Mar 2025

https://github.com/ebay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 25 Mar 2025

https://github.com/eBay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 14 Apr 2025

https://github.com/googlecloudplatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 14 Apr 2025

https://github.com/kotartemiy/pygooglenews

If Google News had a Python library

data-science google news python rss

Last synced: 14 May 2025

https://github.com/ropensci/drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

data-science drake high-performance-computing makefile peer-reviewed pipeline r r-package reproducibility reproducible-research ropensci rstats workflow

Last synced: 13 May 2025

https://github.com/ebhy/budgetml

Deploy an ML inference service on a budget in less than 10 lines of code.

api data-science deployment fastapi inference machine-learning mlops

Last synced: 15 May 2025

https://github.com/patmartin/dex

Dex: The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 16 May 2025

https://github.com/PatMartin/Dex

Dex: The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 04 May 2025

https://github.com/GoogleCloudPlatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 27 Nov 2024

https://github.com/ikatsov/tensor-house

A collection of reference Jupyter notebooks and demo AI/ML applications for enterprise use cases: marketing, pricing, supply chain, smart manufacturing, and more.

ai customer-analysis data-science deep-learning llm machine-learning marketing models personalization reinforcement-learning supply-chain

Last synced: 08 Apr 2025

https://github.com/jrfiedler/causal_inference_python_code

Python code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science python

Last synced: 16 May 2025

https://github.com/nok/sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

data-science machine-learning scikit-learn sklearn

Last synced: 15 May 2025

https://github.com/starpig1129/datagen

DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into crypto market intelligence. Learn more: https://datagen.digital/.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 14 May 2025

https://github.com/alan-turing-institute/clevercsv

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 13 May 2025

https://github.com/scikit-learn-contrib/MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 01 May 2025

https://github.com/scikit-learn-contrib/mapie

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 13 May 2025

https://github.com/starpig1129/AI-Data-Analysis-MultiAgent

DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into crypto market intelligence. Learn more: https://datagen.digital/.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 02 May 2025

https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources

Unix, R and python tools for genomics and data science

bioinformatics cancer-genomics data-science

Last synced: 14 May 2025

https://github.com/reiinakano/xcessiv

A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

automated-machine-learning data-science ensemble-learning hyperparameter-optimization machine-learning scikit-learn stacked-ensembles

Last synced: 15 May 2025

https://github.com/davidadsp/generative_deep_learning_2nd_edition

The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.

chatgpt dalle2 data-science deep-learning diffusion-models generative-adversarial-network gpt-3 machine-learning python stable-diffusion tensorflow

Last synced: 15 May 2025

https://github.com/alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 26 Mar 2025

https://github.com/rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics

Last synced: 15 May 2025

https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition

The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.

chatgpt dalle2 data-science deep-learning diffusion-models generative-adversarial-network gpt-3 machine-learning python stable-diffusion tensorflow

Last synced: 01 May 2025

https://github.com/mandiant/ThreatPursuit-VM

Threat Pursuit Virtual Machine (VM): A fully customizable, open-sourced Windows-based distribution focused on threat intelligence analysis and hunting designed for intel and malware analysts as well as threat hunters to get up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 21 Nov 2024

https://github.com/mandiant/threatpursuit-vm

Threat Pursuit Virtual Machine (VM): A fully customizable, open-sourced Windows-based distribution focused on threat intelligence analysis and hunting designed for intel and malware analysts as well as threat hunters to get up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 23 Feb 2025

https://github.com/fireeye/ThreatPursuit-VM

Threat Pursuit Virtual Machine (VM): A fully customizable, open-sourced Windows-based distribution focused on threat intelligence analysis and hunting designed for intel and malware analysts as well as threat hunters to get up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 05 Dec 2024

https://github.com/ramiawar/dataline

Chat with your data - AI data analysis and visualization on CSV, Postgres, MySQL, Snowflake, SQLite...

ai chart data-science data-visualization llm sql

Last synced: 14 May 2025

https://github.com/fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

automation data-pipelines data-science machine-learning mlflow mlops pandera pydantic python

Last synced: 14 May 2025

https://github.com/elixir-explorer/explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

data-science dataframes elixir rust

Last synced: 14 May 2025

https://github.com/business-science/free_r_tips

Free R-Tips is a FREE Newsletter provided by Business Science. It comes with bite-sized code tutorials every week.

data-science newsletter tips tips-and-tricks

Last synced: 25 Mar 2025

https://github.com/devamoghs/machine-learning-with-python

Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. BONUS: Interview Bank coming up!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 14 May 2025

https://github.com/devAmoghS/Machine-Learning-with-Python

Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. BONUS: Interview Bank coming up!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 27 Nov 2024