An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/talegari/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse

Last synced: 03 Mar 2025

https://github.com/eclipse-zenoh-flow/zenoh-flow

zenoh-flow aims to provide a zenoh-based data-flow programming framework for computations that span from the cloud to the device.

autonomous-vehicles data-science dataflow-programming machine-learning robotics ros2 rust-lang

Last synced: 24 Dec 2025

https://github.com/talkpython/excel-to-python-course

Student materials and handouts for Excel to Python course

course data-science excel office pandas python video

Last synced: 21 Sep 2025

https://github.com/juanitorduz/btsa

Berlin Time Series Analysis Repository

data-science meetup python r statistics time-series-analysis

Last synced: 23 Jun 2025

https://github.com/elemento24/journey-with-artificial-intelligence

This repo consists of all the resources that can be referred to during one's Journey with Artificial Intelligence.

artificial-intelligence data-science deep-learning machine-learning python

Last synced: 08 Sep 2025

https://github.com/IlyaGusev/tgcontest

Telegram Data Clustering contest solution by Mindful Squirrel

classification clustering cpp data-science document-similarity fasttext machine-learning nlp

Last synced: 03 Apr 2025

https://github.com/AnonCatalyst/Coeus-OSINT-ToolBox

Coeus ๐ŸŒ is an OSINT ToolBox empowering users with tools for effective intelligence gathering from open sources. From social media monitoring ๐Ÿ“ฑ to data analysis ๐Ÿ“Š, it offers a centralized platform for seamless OSINT investigations.

data-science data-visualization database forensic-analysis forensics forensics-tools framework information-retrieval infosec osint osint-framework osint-python osint-resources osint-tool osint-toolkit people-search reconnaissance

Last synced: 06 May 2025

https://github.com/mld3/fiddle

FlexIble Data-Driven pipeLinE: a preprocessing pipeline that transforms structured EHR data into feature vectors to be used with ML algorithms. https://doi.org/10.1093/jamia/ocaa139

data-science electronic-health-records jamia machine-learning preprocessing

Last synced: 14 Jan 2026

https://github.com/jkoutsikakis/pytorch-wrapper

Provides a systematic and extensible way to build, train, evaluate, and tune deep learning models using PyTorch.

data-science deep-learning machine-learning neural-network python pytorch pytorch-wrapper tensor

Last synced: 04 Feb 2026

https://github.com/igerber/diff-diff

A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.

analytics causal-inference data-science difference-in-differences econometrics economics

Last synced: 20 Apr 2026

https://github.com/hazemabdelkawy/QuranGPT

Quran GPT is a project that leverages the power of the GPT-4 language model to generate meaningful embeddings for Quran verses. This project not only generates embeddings for the verses but also visualizes the distribution of these embeddings using t-SNE in a 3D scatter plot.

artificial-intelligence data-science data-visualization machine-learning nlp quran sunnah

Last synced: 01 Feb 2026

https://github.com/tiledb-inc/tiledb-vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

bioinformatics data-science genomics gwas python spark tiledb variant-calling vcf

Last synced: 05 Apr 2025

https://github.com/giswqs/manjaro-linux

Shell scripts for setting up Manjaro Linux for Python programming and deep learning

data-science deep-learning gis kde manjaro manjaro-linux notebook-jupyter python r remote-sensing shell-scripts tensorflow

Last synced: 12 May 2025

https://github.com/dkirkby/machinelearningstatistics

Machine learning and statistics for physicists

data-science machine-learning physics python statistics

Last synced: 06 Mar 2025

https://github.com/lyltj2010/DataMining

ๆ•ฐๆฎๆŒ–ๆŽ˜ๅผ€ๆบไนฆ

data-science datamining deeplearning machine-learning

Last synced: 23 Aug 2025

https://github.com/cedrickchee/data-science-notebooks

Data science Python notebooks: a collection of Jupyter notebooks on machine learning, deep learning, statistical inference, data analysis and visualization.

data-science deep-learning fastai kaggle keras machine-learning notebooks numpy pandas python pytorch tensorflow

Last synced: 07 May 2025

https://github.com/synthesized-io/fairlens

Identify bias and measure fairness of your data

bias data data-analysis data-science fairness ml pandas python statistics

Last synced: 24 Jun 2025

https://github.com/microsoft/coml

Interactive coding assistant for data scientists and machine learning developers, empowered by large language models.

automated-machine automl copilot data-science hyperparameter-optimization jupyter jupyter-lab large-language-models llm machine-learning

Last synced: 04 Apr 2025

https://github.com/empower-ai/sql-agent

AI agent that helps you do data analytics with natural language.

analytics bigquery chatgpt chatgpt-bot data data-analytics data-science mysql postgresql slack slack-bot slackbot

Last synced: 11 Apr 2025

https://github.com/asavinov/prosto

Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions: an alternative to map-reduce and join-groupby.

business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow

Last synced: 11 Apr 2025

https://github.com/questdb/time-series-streaming-analytics-template

Template to quickstart streaming analytics using Apache Kafka for ingestion, QuestDB for time-series storage and analytics, Grafana for near real-time dashboards, and Jupyter Notebook for data science

data-science grafana jupyter-notebook kafka kafka-connect monitoring pandas polars questdb telegraf timeseries timeseries-analysis timeseries-database timeseries-forecasting

Last synced: 27 Jun 2025

https://github.com/lsys/lexicalrichness

A module to compute textual lexical richness (aka lexical diversity).

data-mining data-science information-retrieval lexical-analysis lexical-analyzer linguistic-analysis natural-language natural-language-processing nlp python

Last synced: 09 Apr 2025

https://github.com/nuclio/nuclio-jupyter

Nuclio Function Automation for Python and Jupyter

data-science jupyter kubernetes nuclio python

Last synced: 06 Jan 2026

https://github.com/caioricciuti/duck-ui

Duck-UI is a web-based interface for interacting with DuckDB, a high-performance analytical database system. It features a SQL editor, data import/export, data explorer, query history, theme toggle, and keyboard shortcuts, all running seamlessly in the browser using DuckDB's WebAssembly (WASM) capabilities.

data-science data-visualization dataanalysis datanalytics duckdb local

Last synced: 04 Apr 2025

https://github.com/faizanzaheergit/studentperformanceprediction-ml

This is a simple machine learning project that uses classifiers to predict factors affecting student grades, using data from a CSV file.

ai-ml artificial-intelligence artificial-intelligence-projects csv-files data-science machine-learning machine-learning-projects ml-models python python3

Last synced: 10 Sep 2025

https://github.com/josephrp/datatonic

🌟 DataTonic: A Data-Capable AGI-style Agent Builder of Agents that creates swarms, runs commands, and securely processes and creates datasets, databases, visualisations, and analyses.

agent-builder agi autogen azure chroma data data-science data-visualization database memgpt semantic-kernel semantic-memory taskweaver

Last synced: 11 Oct 2025

https://github.com/khuyentran1401/machine-learning-pipeline

Example machine learning pipeline with MLflow and Hydra

data-science hydra machine-learning machine-learning-pipeline mlflow

Last synced: 13 Apr 2025

https://github.com/datacarpentry/semester-biology

Forkable teaching materials for course on working with data in R

biology data-carpentry data-science r spatial-data sql teaching-materials

Last synced: 11 Mar 2026

https://github.com/stanfordnlp/edu-convokit

Edu-ConvoKit: An Open-Source Framework for Education Conversation Data

data data-analysis data-science education language natural-language-processing

Last synced: 15 Apr 2025

https://github.com/ropensci/gittargets

Data version control for reproducible analysis pipelines in R with {targets}.

data-science data-version-control data-versioning r r-package reproducibility reproducible-research rstats targets workflow

Last synced: 21 Aug 2025

https://github.com/delsner/flask-angular-data-science

Repository for a data science starter app using Flask, Angular and Docker. https://medium.com/@dvelsner/deploying-a-simple-machine-learning-model-in-a-modern-web-application-flask-angular-docker-a657db075280

angular data-science docker flask machine-learning python sklearn typescript

Last synced: 24 Jul 2025

https://github.com/rogerfitz/tutorials

Git repo for articles on the Ergo Sum blog and the YouTube channel https://www.youtube.com/channel/UCiie9CN--dazA7iT2sry5FA

algorithmia data-science draft-kings fan-duel fivethirtyeight google-maps-api ocr python sports tech text-to-speech visualizations

Last synced: 05 Apr 2025

https://github.com/akgold/do4ds

A book on DevOps for Data Scientists with CRC Press.

data-science devops it python r

Last synced: 25 Apr 2025

https://github.com/scrapinghub/aile

Automatic Item List Extraction

data-science

Last synced: 03 Mar 2026

https://github.com/quantscious/finmlkit

An open-source, lightweight, and blazing-fast financial machine learning library built with Numba. Process raw trades, generate advanced bars, features, and labels for quantitative research.

data-engineering data-science data-structures feature-engineering feature-extraction financial-analysis financial-data financial-machine-learning numba python quant quantitative-finance quantitative-research

Last synced: 17 Mar 2026

https://github.com/zjuearthdata/geochemistrypi

An open-source, highly automated machine learning Python framework for data-driven geochemistry discovery

dash data-science fastapi flaml geochemistry mlflow nodejs ray reactjs scikit-learn typer

Last synced: 13 Dec 2025

https://github.com/sangaline/reverse-engineering-the-hacker-news-ranking-algorithm

An analysis of historical Hacker News data to determine the ranking algorithm

analysis data-science hacker-news

Last synced: 13 Apr 2025

https://github.com/fastai/fastgpu

A queue service for quickly developing scripts that use all your GPUs efficiently

data-science deep-learning gpus machine-learning python resource-management

Last synced: 18 Jun 2025

https://github.com/uc-r/uc-r.github.io

Main repository for R programming courses at the University of Cincinnati: courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.

classroom data-science data-wrangling machine-learning r tutorial tutorial-code visualization

Last synced: 26 Mar 2025

https://github.com/palashio/nylon

An intelligent, flexible grammar of machine learning.

auto-ml data-science grammar machine-learning

Last synced: 13 Apr 2025

https://github.com/TomasBeuzen/python-programming-for-data-science

Content from the University of British Columbia's Master of Data Science course DSCI 511.

data-manipulation data-science numpy pandas programming python teaching

Last synced: 18 Jul 2025

https://github.com/FlyRanch/figurefirst

A layout-first approach to figure making

data-science inkscape inkscape-extensions matplotlib plotting python svg

Last synced: 08 May 2025

https://github.com/frlender/jandas

A very much Pandas-like JavaScript library for data science

data-science dataframe indexing pandas series

Last synced: 17 Jan 2026

https://github.com/bcgov/bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue

bcdc citz data-science env r r-package rstats

Last synced: 04 Apr 2025

https://github.com/gitonthescene/csv-reconcile

A reconciliation service for OpenRefine serving data from a given CSV file.

data-science openrefine

Last synced: 03 Jan 2026

https://github.com/hongping-zh/circular-bias-detection

A comprehensive statistical framework for detecting circular reasoning bias in AI algorithm evaluation

ai-ethics bias-detection data-science llm machine-learning model-evaluation

Last synced: 07 Mar 2026

https://github.com/Dumbris/trunklucator

Python module for data scientists for quickly creating annotation projects.

active-learning annotation annotation-tool data-science machine-learning nlp

Last synced: 03 Apr 2025

https://github.com/krishkumar/createml-playgrounds

Create ML playgrounds for building machine learning models. For developers and data scientists.

apple classifier coreml createml data-science ios12 machine-learning model playground xcode

Last synced: 06 Oct 2025

https://github.com/seandavi/geoquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor

bioconductor bioinformatics data-science genomics ncbi-geo r rstats

Last synced: 04 Apr 2025

https://github.com/sigvt/vtuber-livechat-dataset

📊 VTuber 1B: Billion-scale Live Chat and Moderation Event Dataset

data-science dataset holodata hololive machine-learning nijisanji nlp sigvt statistics superchat vtuber youtube-livestream

Last synced: 23 Jun 2025

https://github.com/MLMI2-CSSI/foundry

Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry

chemistry data-science datasets machine-learning materials-science

Last synced: 15 Jul 2025

https://github.com/dspinellis/alexandria3k

Local relational access to openly-available publication data sets

bibliometric-analysis crossref data-science orcid scientometrics

Last synced: 04 Apr 2025