An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/man-group/arcticdb

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 04 May 2026

https://github.com/apache/gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

apache data ingestion management replication

Last synced: 29 Apr 2025

https://github.com/TigerResearch/TigerBot

TigerBot: A multi-language multi-task LLM

chinese data llama2 llm nlp

Last synced: 04 Apr 2025

https://github.com/tigerresearch/tigerbot

TigerBot: A multi-language multi-task LLM

chinese data llama2 llm nlp

Last synced: 14 Apr 2025

https://github.com/gsa/data

Assorted data from the General Services Administration.

data domains enterprise standards technology

Last synced: 02 Feb 2026

https://github.com/GSA/data

Assorted data from the General Services Administration.

data domains enterprise standards technology

Last synced: 28 Mar 2025

https://github.com/mara/mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

data data-integration etl pipeline postgresql python

Last synced: 14 May 2025

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 03 Feb 2026

https://github.com/onyx-platform/onyx

Distributed, masterless, high performance, fault tolerant data processing

batch clojure data distributed streaming

Last synced: 28 Sep 2025

https://github.com/mahmoud/glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

apis cli data data-transformation declarative dictionaries nested-structures python recursion utilities

Last synced: 16 May 2025

https://github.com/keajs/kea

Batteries Included State Management for React

data framework kea react react-component redux redux-saga redux-thunk sagas

Last synced: 12 Jan 2026

https://github.com/mariusandra/kea

Batteries Included State Management for React

data framework kea react react-component redux redux-saga redux-thunk sagas

Last synced: 05 Apr 2025

https://github.com/baidu/tera

An Internet-Scale Database.

baidu bigtable c-plus-plus data database hbase nosql storage

Last synced: 15 May 2025

https://github.com/rilldata/rill-developer

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit

Last synced: 08 Mar 2025

https://github.com/diffgram/diffgram

The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

annotation annotation-tool annotations data data-analytics data-annotation data-science datasets datastore deep-learning image-annotation kubernetes labeling machine-learning training-data video-annotation

Last synced: 14 Mar 2025

https://github.com/brimdata/zui

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui

Last synced: 12 Jun 2025

https://github.com/brimsec/brim

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui

Last synced: 25 Feb 2025

https://github.com/thbar/kiba

Data processing & ETL framework for Ruby

data etl etl-ruby ruby rubydatascience

Last synced: 09 Apr 2025

https://github.com/dformoso/deeplearning-mindmap

A mindmap summarising Deep Learning concepts.

cheatsheet data deep jupyter learning mindmap python science

Last synced: 16 May 2025

https://github.com/dataliterate/data-populator

A plugin for Sketch and Adobe XD to populate your design mockups with meaningful data. Goodbye Lorem Ipsum. Hello JSON.

adobe adobe-xd data data-populator design design-tool design-tools meaningful-data sketch sketchapp

Last synced: 15 May 2025

https://github.com/LazyAGI/LazyLLM

Easiest and laziest way to build multi-agent LLM applications.

agents ai-agent data deep-learning documentation-tool finetuning framework knowlege-graph langchain lazyllm llamaindex llm llms rag

Last synced: 06 May 2025

https://github.com/data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

data data-engineer data-engineering data-modeling data-pipelines database etl sql

Last synced: 14 May 2025

https://github.com/werneror/poetry

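A very comprehensive dataset of classical Chinese poetry, containing more than 850,000 poems from the pre-Qin period to modern times.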

chinese chinese-poetry csv data poetry

Last synced: 15 May 2025

https://github.com/nerevu/riko

A Python stream processing engine modeled after Yahoo! Pipes

asynchronous cli data etl featured functional-programming library parallelism rss stream-processing

Last synced: 15 May 2025

https://github.com/getdozer/dozer

Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.

api apis clickhouse data datawarehouse debe etl low-code postgres realtime rust snowflake sql streaming

Last synced: 11 Apr 2025

https://github.com/Werneror/Poetry

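A very comprehensive dataset of classical Chinese poetry, containing more than 850,000 poems from the pre-Qin period to modern times.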

chinese chinese-poetry csv data poetry

Last synced: 07 May 2025

https://github.com/ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

agents data data-pipelines elt etl llm python workflow

Last synced: 12 Oct 2025

https://github.com/odota/core

Open source Dota 2 data platform

api data docker dota hacktoberfest javascript nodejs

Last synced: 11 Apr 2025

https://iddan.github.io/react-spreadsheet/

Simple, customizable yet performant spreadsheet for React

csv data excel react spreadsheet

Last synced: 20 Oct 2025

https://github.com/roboyoshi/datacurator-filetree

A standard filetree for /r/datacurator [and /r/datahoarder]

classification data datastructures file-organization filetree template

Last synced: 27 Jan 2026

https://github.com/robustmq/robustmq

Next-generation unified communication infrastructure for AI, IoT, and big data

ai amqp data infra kafka message message-queue middleware mq mqtt mqtt-broker queue rust serverless storage streaming

Last synced: 01 Apr 2026

https://github.com/DataBrewery/cubes

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

cube data data-analysis data-warehouse multidimensional-analysis olap sql

Last synced: 26 Mar 2025

https://github.com/man-group/ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 12 Mar 2025

https://github.com/pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

cleaning-data data data-engineering dataframe hacktoberfest pandas pydata

Last synced: 18 Feb 2026

https://github.com/pablolec/recoverpy

Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal

cli console cybersecurity data data-recovery files forensics hacking linux macos pentesting python python3 recovery search search-interface terminal textual tool tui

Last synced: 05 Oct 2025

https://github.com/MaJerle/stm32-usart-uart-dma-rx-tx

STM32 examples for USART using DMA for efficient RX and TX transmission

bluepill buff buffer circular data dma dma-mode dma-tc receive ring ringbuff stm32 usart

Last synced: 17 Apr 2025

https://github.com/majerle/stm32-usart-uart-dma-rx-tx

STM32 examples for USART using DMA for efficient RX and TX transmission

bluepill buff buffer circular data dma dma-mode dma-tc receive ring ringbuff stm32 usart

Last synced: 12 Apr 2025

https://github.com/natescarlet/holiday-cn

📅🇨🇳 China's statutory public holiday data, automatically scraped daily from State Council announcements

china crawling data holiday natural-language-processing

Last synced: 14 May 2025

https://github.com/lazyagi/lazyllm

Easiest and laziest way to build multi-agent LLM applications.

agents ai-agent data deep-learning documentation-tool finetuning framework knowlege-graph langchain lazyllm llamaindex llm llms rag

Last synced: 26 Jan 2026

https://github.com/iddan/react-spreadsheet

Simple, customizable yet performant spreadsheet for React

csv data excel react spreadsheet

Last synced: 13 May 2025

https://github.com/Litlyx/litlyx

Powerful Analytics Solution. Set up in 30 seconds. Display all your data on a Simple, AI-powered dashboard. Fully self-hostable and GDPR compliant. Alternative to Google Analytics, MixPanel, Plausible, Umami & Matomo.

ai analytics angular charts data data-analysis data-visualization javascript metrics nextjs nodejs nuxt open-source react statistics typescript vue website

Last synced: 25 Aug 2025

https://github.com/NateScarlet/holiday-cn

📅🇨🇳 China's statutory public holiday data, automatically scraped daily from State Council announcements

china crawling data holiday natural-language-processing

Last synced: 26 Mar 2025

https://github.com/jldbc/pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)

baseball data python sabermetrics statcast

Last synced: 14 May 2025

https://github.com/uwdata/arquero

Query processing and transformation of array-backed data tables.

arrays data database dataframe query table transform

Last synced: 13 May 2025

https://github.com/quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data

data data-engineering data-version-control data-versioning parquet python serialization

Last synced: 13 May 2025

https://github.com/koaning/drawdata

Draw datasets from within Python notebooks.

data drawdata jupyter

Last synced: 07 Jan 2026

https://github.com/lotus-data/lotus

Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code.

ai-data-processing data llm llm-data-processing llm-document-processing pandas python semantic-operators semantic-search unstructured-data

Last synced: 19 Oct 2025

https://github.com/flipkart-incubator/proteus

Proteus: A JSON-based LayoutInflater for Android

android binding data data-binding dynamic-layout functions java json layout-engine proteus

Last synced: 16 May 2025

https://github.com/sheetjs/js-word

✒️ Word Processing Document Library

data doc docx word xml

Last synced: 27 Jan 2026

https://github.com/SheetJS/js-word

✒️ Word Processing Document Library

data doc docx word xml

Last synced: 01 Apr 2025

https://github.com/lakehq/sail

LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.

arrow artificial-intelligence big-data data data-engineering datafusion distributed-computing machine-learning pyspark python rust spark sql

Last synced: 14 Apr 2026

https://github.com/uwdata/mosaic

An extensible framework for linking databases and interactive views.

data duckdb interaction scalability visualization

Last synced: 02 May 2026

https://github.com/terriajs/terriajs

A library for building rich, web-based geospatial 2D & 3D data platforms.

3d-globe catalog cesium cesiumjs charts czml data javascript leaflet leafletjs terriajs webgl wms wps

Last synced: 13 May 2025

https://github.com/pomber/covid19

JSON time-series of coronavirus cases (confirmed, deaths and recovered) per country - updated daily

2019-ncov api coronavirus covid-19 data dataset json time-series

Last synced: 15 May 2025

https://github.com/TerriaJS/terriajs

A library for building rich, web-based geospatial data platforms.

3d-globe catalog cesium cesiumjs charts czml data javascript leaflet leafletjs terriajs webgl wms wps

Last synced: 14 Mar 2025

https://github.com/uber-archive/athenax

SQL-based streaming analytics platform at scale

analytics calcite data flink sql stream streaming uber

Last synced: 27 Sep 2025

https://github.com/uber-archive/AthenaX

SQL-based streaming analytics platform at scale

analytics calcite data flink sql stream streaming uber

Last synced: 27 Mar 2025

https://github.com/dbt-labs/metricflow

MetricFlow allows you to define, build, and maintain metrics in code.

analytics business-intelligence data data-modeling metrics pypi semantic-layer

Last synced: 13 May 2025

https://github.com/chartshq/muze

Composable data visualisation library for web with a data-first approach now powered by WebAssembly

area-chart barchart charts crosstab data data-visualization data-viz html5-charts interactive-charts javascript js-charts linechart pie-chart splom svg visualization wasm web webassembly

Last synced: 28 Sep 2025

https://github.com/projectnessie/nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

aws-lambda data git iceberg java spark

Last synced: 04 Feb 2026

https://github.com/PabloLec/RecoverPy

Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal

cli console cybersecurity data data-recovery files forensics hacking linux macos pentesting python python3 recovery search search-interface terminal textual tool tui

Last synced: 24 Mar 2025

https://github.com/Shopify/maintenance_tasks

A Rails engine for queueing and managing data migrations.

backfill data migration rails ruby

Last synced: 16 Jul 2025

https://github.com/teomewhy/teomerefs

A guide to technical references for a career in data

data data-science machine-learning python

Last synced: 14 May 2025

https://github.com/shopify/maintenance_tasks

A Rails engine for queueing and managing data migrations.

backfill data migration rails ruby

Last synced: 13 May 2025

https://github.com/lhotse-speech/lhotse

Tools for handling multimodal data in machine learning projects.

ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition

Last synced: 20 Apr 2026

https://github.com/litlyx/litlyx

Powerful Analytics Solution. Set up in 30 seconds. Display all your data on a Simple, AI-powered dashboard. Fully self-hostable and GDPR compliant. Alternative to Google Analytics, MixPanel, Plausible, Umami & Matomo.

ai analytics angular charts data data-analysis data-visualization javascript metrics nextjs nodejs nuxt open-source react statistics typescript vue website

Last synced: 14 May 2025

https://github.com/odota/web

React web interface for the OpenDota platform

data dota hacktoberfest javascript react redux ui visualization webpack

Last synced: 12 Apr 2025

https://github.com/taleshape-com/shaper

Visualize and share your data. All in SQL. Powered by DuckDB.

analytics dashboards data duckdb

Last synced: 21 Apr 2026

https://github.com/hibuz/dev-conf-replay

🍀 Replay 👀 links for videos of recent Korean IT seminars and developer 💻 conferences, collected in one place!

ai blockchain cloud coding conference data developer devops docs it korean meeup mobile mobility opensource programming readme replay summit tech

Last synced: 26 Feb 2026

https://github.com/TeoMeWhy/teomerefs

A guide to technical references for a career in data

data data-science machine-learning python

Last synced: 25 Mar 2025

https://github.com/sghall/resonance

▪️ Resonance | 5kb React animation library

animation charts d3 data data-driven-transitions graph react svg visualization

Last synced: 09 Apr 2025