An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/starpig1129/DATAGEN

DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing. Now expanding into crypto market intelligence. Learn more: https://datagen.digital/.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 19 Feb 2025

https://github.com/xorbitsai/xorbits

Scalable Python data science and ML, API-compatible and lightning fast.

data-science distributed-systems lightgbm machine-learning ml numpy pandas python scalable xgboost

Last synced: 14 May 2025

https://github.com/starpig1129/ai-data-analysis-multiagent

AI-Driven Research Assistant: An advanced multi-agent system for automating complex research processes. Leveraging LangChain, OpenAI GPT, and LangGraph, this tool streamlines hypothesis generation, data analysis, visualization, and report writing. Perfect for researchers and data scientists seeking to enhance their workflow and productivity.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 15 Feb 2025

https://github.com/okfn-brasil/querido-diario

📰 Brazilian government gazettes, accessible to everyone.

civic-tech data-science digital-public-goods dpg governments-gazettes govtech hacktoberfest open-data politics scraping sdg-16 spider

Last synced: 12 Apr 2025

https://github.com/zama-ai/concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

data-science fhe fully-homomorphic-encryption homomorphic-encryption machine-learning ppml privacy python scikit-learn tfhe torch

Last synced: 14 May 2025

https://github.com/teomewhy/teomerefs

Technical reference guide for a career in data

data data-science machine-learning python

Last synced: 14 May 2025

https://github.com/JuliaStats/Distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 08 May 2025

https://github.com/juliastats/distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 13 May 2025

https://github.com/sajal2692/data-science-portfolio

Portfolio of data science projects completed by me for academic, self-learning, and hobby purposes.

data-science keras machine-learning nlp pandas portfolio python scikit-learn

Last synced: 16 May 2025

https://github.com/shujian2015/freeml

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 25 Mar 2025

https://github.com/Shujian2015/FreeML

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 05 May 2025

https://github.com/elixir-nx/explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

data-science dataframes elixir rust

Last synced: 03 Mar 2025

https://github.com/novak-99/mlpp

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 16 May 2025

https://github.com/alishobeiri/thread

AI-powered Jupyter Notebook — use local AI to generate and edit code cells, automatically fix errors, and chat with your data

ai analysis analytics data-science jupyter jupyter-notebook jupyter-notebooks jupyterhub jupyterlab ollama python react reactjs

Last synced: 14 May 2025

https://github.com/novak-99/MLPP

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 20 Mar 2025

https://github.com/red-data-tools/pycall.rb

Calling Python functions from the Ruby language

data-science pycall python ruby rubydatascience rubyml

Last synced: 14 May 2025

https://github.com/mrkn/pycall.rb

Calling Python functions from the Ruby language

data-science pycall python ruby rubydatascience rubyml

Last synced: 13 Apr 2025

https://github.com/datumbox/datumbox-framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

big-data data-science java machine-learning nlp statistics

Last synced: 15 May 2025

https://github.com/TeoMeWhy/teomerefs

Technical reference guide for a career in data

data data-science machine-learning python

Last synced: 25 Mar 2025

https://github.com/makcedward/nlp

📝 This repository records my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 12 Apr 2025

https://github.com/squaredtechnologies/thread

AI-powered Jupyter Notebook — use local AI to generate and edit code cells, automatically fix errors, and chat with your data

ai analysis analytics data-science jupyter jupyter-notebook jupyter-notebooks jupyterhub jupyterlab ollama python react reactjs

Last synced: 06 Dec 2024

https://github.com/rhiever/datacleaner

A Python tool that automatically cleans data sets and readies them for analysis.

automation data-science machine-learning python

Last synced: 15 May 2025

https://github.com/egbertbouman/youtube-comment-downloader

Simple script for downloading YouTube comments without using the YouTube API

data-science data-scraper python youtube youtube-comments

Last synced: 15 May 2025

https://github.com/dataquestio/project-walkthroughs

Data science, machine learning, and web development project code for https://www.youtube.com/c/Dataquestio.

data-science machine-learning pandas python

Last synced: 08 Apr 2025

https://github.com/maxpumperla/deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"

alphago alphago-zero data-science deep-learning game-of-go games machine-learning neural-networks python

Last synced: 15 May 2025

https://github.com/mlr-org/mlr3

mlr3: Machine Learning in R - next generation

classification data-science machine-learning mlr3 r r-package regression

Last synced: 14 May 2025

https://github.com/dssg/hitchhikers-guide

The Hitchhiker's Guide to Data Science for Social Good

data-science dssg machine-learning training tutorial-exercises

Last synced: 03 May 2025

https://github.com/sematic-ai/sematic

An open-source ML pipeline development platform

ai data-science machine-learning ml ml-ops ml-pipeline ml-pipelines mlops pipeline python python3

Last synced: 14 May 2025

https://github.com/tidyverse/datascience-box

Data Science Course in a Box

data-science education r rstats teaching

Last synced: 15 May 2025

https://github.com/rstudio-education/datascience-box

Data Science Course in a Box

data-science education r rstats teaching

Last synced: 26 Mar 2025

https://github.com/mybridge/machine-learning-open-source

Monthly Series - Machine Learning Top 10 Open Source Projects

ai algorithm artificial-intelligence data-science machine-learning neural-network

Last synced: 19 Feb 2025

https://github.com/Mybridge/machine-learning-open-source

Monthly Series - Machine Learning Top 10 Open Source Projects

ai algorithm artificial-intelligence data-science machine-learning neural-network

Last synced: 22 Mar 2025

https://github.com/WenjieDu/PyPOTS

A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

classification clustering data-mining data-science deep-learning forecasting healthcare imputation incomplete industrial interpolation machine-learning missing-values missingness neural-network partially-observed-time-series pytorch science-research time-series time-series-analysis

Last synced: 01 Apr 2025

https://github.com/firmai/data-science-career

Career resources for data science, machine learning, big data, and business analytics

analytics big-data business-analytics business-intelligence career data-science machine-learning resources

Last synced: 06 May 2025

https://github.com/grailbio/reflow

A language and runtime for distributed, incremental data processing in the cloud

analysis-pipeline aws bioinformatics-pipeline cloud-computing data-science golang language runtime scientific-computing

Last synced: 15 Mar 2025

https://github.com/caserec/Datasets-for-Recommender-Systems

A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)

data-science database datasets public-data recommender-systems

Last synced: 28 Nov 2024

https://github.com/webartifex/intro-to-python

An intro to Python & programming for wanna-be data scientists

data-science introduction-to-programming jupyter python tutorial

Last synced: 11 May 2025

https://github.com/iamaziz/pydataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 16 May 2025

https://github.com/iamaziz/PyDataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 27 Nov 2024

https://github.com/probcomp/bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.

automatic-data-modeling data-science databases machine-learning probabilistic-programming

Last synced: 16 May 2025

https://github.com/fraunhoferportugal/tsfel

An intuitive library to extract features from time series.

classification colab-notebook data-science feature-engineering feature-extraction time-series

Last synced: 14 Mar 2025

https://github.com/youssefHosni/Practical-Machine-Learning

Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.

data-science machine-learning

Last synced: 17 Mar 2025

https://github.com/youssefhosni/practical-machine-learning

Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.

data-science machine-learning

Last synced: 12 Apr 2025

https://github.com/RamiAwar/dataline

Chat with your data - AI data analysis and visualization on CSV, Postgres, MySQL, Snowflake, SQLite...

ai chart data-science data-visualization llm sql

Last synced: 30 Nov 2024

https://github.com/wecoai/aideml

AIDE: AI-Driven Exploration in the Space of Code. State-of-the-art machine learning engineering agents that automate AI R&D.

ai data-science llm machine-learning

Last synced: 18 Jun 2025

https://github.com/epsilla-cloud/vectordb

Epsilla is a high-performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Last synced: 15 May 2025

https://github.com/chawlaavi/daily-dose-of-data-science

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn

Last synced: 04 Apr 2025

https://github.com/google/lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.

bayesian data-science econometrics marketing-science mmm

Last synced: 06 May 2025

https://github.com/empathy87/the-elements-of-statistical-learning-python-notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 13 Apr 2025

https://github.com/turicas/rows

A common, beautiful interface to tabular data, no matter the format

convert-data csv data data-science excel hacktoberfest python table tabular-data xls xlsx

Last synced: 14 May 2025

https://github.com/Kotlin/dataframe

Structured data processing in Kotlin

data-analysis data-science dataframe kotlin

Last synced: 11 Apr 2025

https://github.com/stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix

Last synced: 18 Jan 2025

https://github.com/GoogleCloudPlatform/DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 01 May 2025