An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/datasets/commons

DataHub commons. Wiki catalog of interesting and important datasets

data datasets datasets-csv open-data open-datasets opendata

Last synced: 17 Jan 2026

https://github.com/epsilla-cloud/vectordb

Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Last synced: 15 May 2025

https://github.com/turicas/rows

A common, beautiful interface to tabular data, no matter the format

convert-data csv data data-science excel hacktoberfest python table tabular-data xls xlsx

Last synced: 14 May 2025

https://github.com/neumtry/neumai

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

ai chatgpt data data-engineering database embeddings etl llm llmops mlops ops pipeline python rag retrieval vector-database vectors

Last synced: 29 Oct 2025

https://github.com/arnaudmiribel/streamlit-extras

Discover, try, install and share Streamlit re-usable bits we call "extras"!

data python streamlit ui web

Last synced: 13 May 2025

https://github.com/jtkim-kaist/VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

acam attention bdnn data dnn lstm speech speech-activity-detection speech-recognition vad voice-activity-detection voice-detection

Last synced: 07 May 2025

https://github.com/oleg-agapov/data-engineering-book

Accumulated knowledge and experience in the field of Data Engineering

data data-engineering engineering

Last synced: 15 Apr 2025

https://github.com/fsprojects/fsharp.data

F# Data: Library for Data Access

csv data fsharp html http json typeprovider worldbank xml

Last synced: 16 Dec 2025

https://github.com/fsprojects/FSharp.Data

F# Data: Library for Data Access

csv data fsharp html http json typeprovider worldbank xml

Last synced: 13 Mar 2025

https://github.com/NeumTry/NeumAI

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

ai chatgpt data data-engineering database embeddings etl llm llmops mlops ops pipeline python rag retrieval vector-database vectors

Last synced: 11 Apr 2025

https://github.com/datazenit/sensei-grid

Simple and lightweight data grid in JS/HTML

data data-table datagrid grid javascript jqery lodash underscore

Last synced: 22 Oct 2025

https://github.com/hatnote/listen-to-wikipedia

Live, generative music from Wikipedia edits

data sonification sound visualization wikipedia

Last synced: 15 Apr 2025

https://github.com/richox/orz

A high-performance, general-purpose data compressor written in the crab-lang

compression crab-lang data

Last synced: 15 May 2025

https://github.com/huggingface/dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.

api-rest data datasets huggingface machine-learning nlp

Last synced: 14 Oct 2025

https://github.com/microsoft/MCW

Microsoft Cloud Workshop Project

ai apps azure data devops infra iot linux mcw sap

Last synced: 12 Apr 2025

https://github.com/bitbrain/pandora

Godot 4 addon for RPG data management such as items, inventories, spells, mobs, quests and NPCs.

data godot godot4 godotengine management rpg

Last synced: 15 May 2025

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we feed third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 30 Mar 2025

https://github.com/lakehq/sail

LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.

arrow big-data data datafusion pyspark python rust spark sql

Last synced: 12 Jun 2025

https://github.com/Flowframe/laravel-trend

Generate trends for your models. Easily generate charts or reports.

charts data hacktoberfest laravel package php reports trends

Last synced: 29 Apr 2025

https://github.com/pbeshai/tidy

Tidy up your data with JavaScript, inspired by dplyr and the tidyverse

data dplyr tidyverse wrangling

Last synced: 15 May 2025

https://github.com/mdn/data

This repository contains general data for Web technologies

css data json json-data

Last synced: 17 Jun 2025

https://github.com/prismarinejs/minecraft-data

Language independent module providing minecraft data for minecraft clients, servers and libraries.

data minecraft

Last synced: 29 Jun 2025

https://pbeshai.github.io/tidy/

Tidy up your data with JavaScript, inspired by dplyr and the tidyverse

data dplyr tidyverse wrangling

Last synced: 04 Apr 2025

https://github.com/breck7/pldb

PLDB: a Programming Language DataBase

data knowledge-graph programming-languages

Last synced: 14 Apr 2025

https://github.com/akfamily/aktools

AKTools is an elegant and simple HTTP API library for AKShare, built for AKSharers!

akshare asyncio data data-science fastapi openapi pydanti

Last synced: 14 May 2025

https://github.com/pdpipe/pdpipe

Easy pipelines for pandas DataFrames.

data data-science dataframe dataframes pandas pandas-dataframe pipeline

Last synced: 04 Aug 2025

https://github.com/PrismarineJS/minecraft-data

Language independent module providing minecraft data for minecraft clients, servers and libraries.

data minecraft

Last synced: 26 Apr 2025

https://github.com/dformoso/sklearn-classification

Data Science Notebook on a Classification Task, using sklearn and Tensorflow.

classification-task data docker jupyter learning machine machine-learning notebook roc roc-curve science sklearn tensorflow

Last synced: 04 Apr 2025

https://github.com/octoproject/octo-cli

CLI tool to expose data from any database as a serverless web service.

api data database faas go knative octo-cli openfaas serverless

Last synced: 15 Mar 2025

https://github.com/jokecamp/FootballData

A hodgepodge of JSON and CSV Football/Soccer data

data football opendata soccer

Last synced: 12 Apr 2025

https://github.com/Azure-Samples/modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

automatedtesting azure cicd data databricks datafactory dataops devops fabric

Last synced: 30 Jul 2025

https://github.com/pgflo/pg_flo

Stream, transform, and route PostgreSQL data in real-time.

data database etl go golang logical-replication postgres postgresql stream

Last synced: 15 May 2025

https://github.com/azure-samples/modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

automatedtesting azure cicd data databricks datafactory dataops devops fabric

Last synced: 14 May 2025

https://github.com/squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 16 May 2025

https://github.com/Squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 15 Mar 2025

https://github.com/gesistsa/rio

🐟 A Swiss-Army Knife for Data I/O

cran csv csvy data data-science excel io r rio sas spss stata

Last synced: 12 Dec 2025

https://github.com/dleitee/valid.js

A library for data validation.

data valid validation

Last synced: 08 Apr 2025

https://github.com/foxglove/mcap

MCAP is a modular, performant, and serialization-agnostic container file format, useful for pub/sub and robotics applications.

cpp data deserialization golang python robotics serialization swift typescript

Last synced: 05 Jan 2026

https://github.com/ngneat/query

🚀 Powerful asynchronous state management, server-state utilities and data fetching for Angular Applications

async cache data fetch http pagination query stale-while-revalidate update

Last synced: 14 May 2025

https://github.com/diskframe/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 16 Jun 2025

https://github.com/DiskFrame/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 14 Mar 2025

https://github.com/xiaodaigh/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 14 Mar 2025

https://github.com/essandess/isp-data-pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscation

crawling data data-analytics obfuscation privacy privacy-enhancing-technologies web

Last synced: 29 Dec 2025

https://github.com/kkulma/climate-change-data

:earth_africa: A curated list of APIs, open data and ML/AI projects on climate change

climate climate-analysis climate-change climate-data data data-science datascience hacktoberfest python r resources rstats

Last synced: 04 Apr 2025

https://github.com/semarketir/quranjson

Quran JSON ~ 6236 verses, 114 surah, 30 Juz

audio data islam json juz moslem muslim quran quran-json religion surah tajweed tajwid translation

Last synced: 14 May 2025

https://github.com/nkzw-tech/fate

fate is a modern data client for React.

async data react

Last synced: 17 Jan 2026

https://github.com/capitalone/datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

compare dask data data-science dataframes fugue numpy pandas polars pyspark python snowflake snowpark spark

Last synced: 14 May 2025

https://github.com/randomfractals/vscode-data-preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml

Last synced: 04 Apr 2025

https://github.com/RandomFractals/vscode-data-preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml

Last synced: 08 Apr 2025

https://github.com/jetify-com/tyson

🥊 TypeScript as a Configuration Language. TySON stands for TypeScript Object Notation

config configuration configuration-language data json ts tson typescript tyson

Last synced: 15 May 2025

https://github.com/ivanjosipovic/blazortable

Blazor Table Component with Sorting, Paging and Filtering

aspnet aspnetcore blazor data filtering grid nuget paging sorting table

Last synced: 06 Oct 2025

https://github.com/z3z1ma/dbt-osmosis

Provides automated YAML management and a streamlit workbench. Designed to optimize dev workflows.

cli data dbt documentation editor modelling sql testing

Last synced: 14 May 2025

https://github.com/marcocesarato/react-native-big-list

A high-performance list view for React Native that supports complex layouts and uses an API similar to FlatList, which makes it easy to swap in as a replacement. The implementation renders big lists with a recycler focused on performance and memory usage, so it can handle thousands of items in the list.

android big data expo fast flatlist ios javascript js large list massive memory performance react react-native react-native-big-list sticky-headers web

Last synced: 15 May 2025

https://github.com/IvanJosipovic/BlazorTable

Blazor Table Component with Sorting, Paging and Filtering

aspnet aspnetcore blazor data filtering grid nuget paging sorting table

Last synced: 25 Mar 2025

https://github.com/wolf-table/table

A web-based (canvas) JavaScript table

canvas data table typescript

Last synced: 16 May 2025

https://github.com/canner/wren-engine

🤖 The Semantic Engine for Model Context Protocol (MCP) Clients and AI Agents 🔥

agent agentic-ai ai business-intelligence data data-analysis data-analytics data-lake data-warehouse hacktoberfest llm mcp mcp-server semantic semantic-layer sql

Last synced: 22 Jan 2026

https://github.com/RunLLM/aqueduct

Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.

ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3

Last synced: 18 Apr 2025

https://github.com/microsoft/Reactors

🌱 Join a community of developers at Microsoft Reactor and connect with people, skills, and technology to build your career or personal learning. We offer free livestreams, on-demand content, and hybrid/in-person events daily around the world. Access our projects and code here.

ai azure cloud data data-science devops dotnet events iot live-streaming low-code meetup mixed-reality ml no-code nodejs personal-de python web

Last synced: 05 May 2025

https://github.com/CympleTech/esse

Encrypted peer-to-peer IM for data security. Own data, own privacy. (Rust+Flutter)

cross-platform cryptography data data-security flutter p2p rust web3

Last synced: 26 Apr 2025

https://github.com/CympleTech/ESSE

Encrypted peer-to-peer IM for data security. Own data, own privacy. (Rust+Flutter)

cross-platform cryptography data data-security flutter p2p rust web3

Last synced: 27 Apr 2025

https://github.com/cympletech/esse

Encrypted peer-to-peer IM for data security. Own data, own privacy. (Rust+Flutter)

cross-platform cryptography data data-security flutter p2p rust web3

Last synced: 05 Apr 2025

https://github.com/juji-io/editscript

A library to diff and patch Clojure/ClojureScript data structures

algorithm clojure clojurescript-data data data-diffing data-structures diff editscript patch tree-diffing

Last synced: 14 May 2025

https://github.com/koordinates/kart

Distributed version-control for geospatial and tabular data

data data-versioning geospatial geospatial-data gis version-control

Last synced: 01 May 2025

https://github.com/xebia-functional/fetch

Simple & Efficient data access for Scala and Scala.js

cats concurrency data data-fetching monads monix parallelism scala scala-js sequencing

Last synced: 11 Jan 2026

https://github.com/shakedzy/dython

A set of data tools in Python

analysis correlation data modeling plot python roc

Last synced: 21 Oct 2025

https://github.com/nobrainr/morphism

⚡ Type-safe data transformer for JavaScript, TypeScript & Node.js.

array automapper data flow fp functional functors javascript js mapper morphism morphisms object parser typescript

Last synced: 12 Dec 2025

https://github.com/514-labs/moosestack

The developer framework for building analytics into your app on top of ClickHouse, Redpanda and other high-performance analytical infrastructure

analytics data dataengineering deployment framework insights metrics python rust typescript

Last synced: 20 Jan 2026

https://github.com/serpro69/kotlin-faker

Port of a popular Ruby faker gem, written in Kotlin. Generate realistic-looking fake data such as names, addresses, banking details, and much more, for use in testing and data anonymization.

android android-development android-testing anonymisation anonymization anonymizer data faker faker-gem faker-generator faker-library faker-libs java jvm kotlin kotlin-faker kotlin-library test-automation testing testing-tools

Last synced: 15 May 2025

https://github.com/ozlerhakan/poiji

:candy: A library converting XLS and XLSX files to a list of Java objects based on Apache POI

apache apache-poi converter data deserialize excel java java-11 mapper mapping microsoft-excel parser performance poi poiji pojo unmarshall

Last synced: 02 Jan 2026

https://github.com/github/innovationgraph

GitHub Innovation Graph

data github open-data

Last synced: 15 May 2025

https://github.com/googleapis/python-bigquery-pandas

Google BigQuery connector for pandas

bigquery data pandas

Last synced: 05 Jan 2026

https://github.com/spotify/featran

A Scala feature transformation library for data science and machine learning

algebird breeze data flink ml scala scalding scio spark tensorflow xgboost

Last synced: 15 May 2025

https://github.com/paypal/data-contract-template

Template for a data contract used in a data mesh.

data data-contract data-engineering data-mesh

Last synced: 01 Mar 2025

https://github.com/Gmousse/dataframe-js

A JavaScript library providing a new data structure for data scientists and developers

data data-frame dataframe datascience datastructures functional groupby javascript manipulation matrix sql sql-syntax

Last synced: 15 Mar 2025