An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/malloydata/malloy

Malloy is an experimental language for describing data relationships and transformations.

data data-visualization database malloy semantic-modeling sql

Last synced: 13 May 2025

https://github.com/GSA/data

Assorted data from the General Services Administration.

data domains enterprise standards technology

Last synced: 28 Mar 2025

https://github.com/mara/mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

data data-integration etl pipeline postgresql python

Last synced: 14 May 2025

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 12 May 2025

https://github.com/onyx-platform/onyx

Distributed, masterless, high performance, fault tolerant data processing

batch clojure data distributed streaming

Last synced: 28 Sep 2025

https://github.com/rilldata/rill

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit

Last synced: 07 Jan 2026

https://github.com/mahmoud/glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

apis cli data data-transformation declarative dictionaries nested-structures python recursion utilities

Last synced: 16 May 2025

https://github.com/keajs/kea

Batteries Included State Management for React

data framework kea react react-component redux redux-saga redux-thunk sagas

Last synced: 12 Jan 2026

https://github.com/mariusandra/kea

Batteries Included State Management for React

data framework kea react react-component redux redux-saga redux-thunk sagas

Last synced: 05 Apr 2025

https://github.com/baidu/tera

An Internet-Scale Database.

baidu bigtable c-plus-plus data database hbase nosql storage

Last synced: 15 May 2025

https://github.com/rilldata/rill-developer

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit

Last synced: 08 Mar 2025

https://github.com/man-group/arcticdb

ArcticDB is a high-performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 13 May 2025

https://github.com/diffgram/diffgram

The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

annotation annotation-tool annotations data data-analytics data-annotation data-science datasets datastore deep-learning image-annotation kubernetes labeling machine-learning training-data video-annotation

Last synced: 14 Mar 2025

https://github.com/brimsec/brim

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui

Last synced: 25 Feb 2025

https://github.com/brimdata/zui

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui

Last synced: 12 Jun 2025

https://github.com/thbar/kiba

Data processing & ETL framework for Ruby

data etl etl-ruby ruby rubydatascience

Last synced: 09 Apr 2025

https://github.com/deepnote/deepnote

Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data integrations. Use Python, R, and SQL locally in your favorite IDE, then scale to Deepnote cloud for real-time collaboration, Deepnote agent, and deployable data apps. https://deepnote.com/

artificial-intelligence data data-analysis data-science data-visualization deepnote eda jupyter jupyterhub jupyterlab machine-learning notebooks python r sql

Last synced: 05 Jan 2026

https://github.com/spider-rs/spider

A web crawler and scraper for Rust

crawler data headless-chrome indexer rust scraping spider

Last synced: 02 Jan 2026

https://github.com/dformoso/deeplearning-mindmap

A mindmap summarising Deep Learning concepts.

cheatsheet data deep jupyter learning mindmap python science

Last synced: 16 May 2025

https://github.com/dataliterate/data-populator

A plugin for Sketch and Adobe XD to populate your design mockups with meaningful data. Goodbye Lorem Ipsum. Hello JSON.

adobe adobe-xd data data-populator design design-tool design-tools meaningful-data sketch sketchapp

Last synced: 15 May 2025

https://github.com/LazyAGI/LazyLLM

The easiest and laziest way to build multi-agent LLM applications.

agents ai-agent data deep-learning documentation-tool finetuning framework knowlege-graph langchain lazyllm llamaindex llm llms rag

Last synced: 06 May 2025

https://github.com/data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

data data-engineer data-engineering data-modeling data-pipelines database etl sql

Last synced: 14 May 2025

https://github.com/werneror/poetry

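A very comprehensive collection of classical Chinese poetry data, containing more than 850,000 poems from the pre-Qin period to the modern era.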

chinese chinese-poetry csv data poetry

Last synced: 15 May 2025

https://github.com/nerevu/riko

A Python stream processing engine modeled after Yahoo! Pipes

asynchronous cli data etl featured functional-programming library parallelism rss stream-processing

Last synced: 15 May 2025

https://github.com/getdozer/dozer

Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.

api apis clickhouse data datawarehouse debe etl low-code postgres realtime rust snowflake sql streaming

Last synced: 11 Apr 2025

https://github.com/ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

agents data data-pipelines elt etl llm python workflow

Last synced: 12 Oct 2025

https://github.com/odota/core

Open source Dota 2 data platform

api data docker dota hacktoberfest javascript nodejs

Last synced: 11 Apr 2025

https://iddan.github.io/react-spreadsheet/

Simple, customizable yet performant spreadsheet for React

csv data excel react spreadsheet

Last synced: 20 Oct 2025

https://github.com/roboyoshi/datacurator-filetree

A standard file tree for r/datacurator (and r/datahoarder)

classification data datastructures file-organization filetree template

Last synced: 23 Mar 2025

https://github.com/DataBrewery/cubes

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

cube data data-analysis data-warehouse multidimensional-analysis olap sql

Last synced: 26 Mar 2025

https://github.com/pablolec/recoverpy

Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal

cli console cybersecurity data data-recovery files forensics hacking linux macos pentesting python python3 recovery search search-interface terminal textual tool tui

Last synced: 05 Oct 2025

https://github.com/MaJerle/stm32-usart-uart-dma-rx-tx

STM32 examples for USART using DMA for efficient RX and TX transmission

bluepill buff buffer circular data dma dma-mode dma-tc receive ring ringbuff stm32 usart

Last synced: 17 Apr 2025

https://github.com/natescarlet/holiday-cn

📅🇨🇳 China's statutory public holiday data, automatically scraped daily from State Council announcements

china crawling data holiday natural-language-processing

Last synced: 14 May 2025

https://github.com/iddan/react-spreadsheet

Simple, customizable yet performant spreadsheet for React

csv data excel react spreadsheet

Last synced: 13 May 2025

https://github.com/Litlyx/litlyx

Powerful Analytics Solution. Set up in 30 seconds. Display all your data on a simple, AI-powered dashboard. Fully self-hostable and GDPR compliant. Alternative to Google Analytics, MixPanel, Plausible, Umami & Matomo.

ai analytics angular charts data data-analysis data-visualization javascript metrics nextjs nodejs nuxt open-source react statistics typescript vue website

Last synced: 25 Aug 2025

https://github.com/pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of the R package janitor

cleaning-data data data-engineering dataframe hacktoberfest pandas pydata

Last synced: 02 Jan 2026

https://github.com/jldbc/pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)

baseball data python sabermetrics statcast

Last synced: 14 May 2025

https://github.com/uwdata/arquero

Query processing and transformation of array-backed data tables.

arrays data database dataframe query table transform

Last synced: 13 May 2025

https://github.com/quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data

data data-engineering data-version-control data-versioning parquet python serialization

Last synced: 13 May 2025

https://github.com/koaning/drawdata

Draw datasets from within Python notebooks.

data drawdata jupyter

Last synced: 07 Jan 2026

https://github.com/lotus-data/lotus

Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code

ai-data-processing data llm llm-data-processing llm-document-processing pandas python semantic-operators semantic-search unstructured-data

Last synced: 19 Oct 2025

https://github.com/SheetJS/js-word

✒️ Word Processing Document Library

data doc docx word xml

Last synced: 01 Apr 2025

https://github.com/flipkart-incubator/proteus

Proteus: A JSON-based LayoutInflater for Android

android binding data data-binding dynamic-layout functions java json layout-engine proteus

Last synced: 16 May 2025

https://github.com/terriajs/terriajs

A library for building rich, web-based geospatial 2D & 3D data platforms.

3d-globe catalog cesium cesiumjs charts czml data javascript leaflet leafletjs terriajs webgl wms wps

Last synced: 13 May 2025

https://github.com/pomber/covid19

JSON time-series of coronavirus cases (confirmed, deaths and recovered) per country - updated daily

2019-ncov api coronavirus covid-19 data dataset json time-series

Last synced: 15 May 2025

https://github.com/uber-archive/AthenaX

SQL-based streaming analytics platform at scale

analytics calcite data flink sql stream streaming uber

Last synced: 27 Mar 2025

https://github.com/dbt-labs/metricflow

MetricFlow allows you to define, build, and maintain metrics in code.

analytics business-intelligence data data-modeling metrics pypi semantic-layer

Last synced: 13 May 2025

https://github.com/chartshq/muze

Composable data visualisation library for the web, with a data-first approach, now powered by WebAssembly

area-chart barchart charts crosstab data data-visualization data-viz html5-charts interactive-charts javascript js-charts linechart pie-chart splom svg visualization wasm web webassembly

Last synced: 28 Sep 2025

https://github.com/projectnessie/nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

aws-lambda data git iceberg java spark

Last synced: 13 May 2025

https://github.com/Shopify/maintenance_tasks

A Rails engine for queueing and managing data migrations.

backfill data migration rails ruby

Last synced: 16 Jul 2025

https://github.com/teomewhy/teomerefs

A guide to technical references for a career in data

data data-science machine-learning python

Last synced: 14 May 2025

https://github.com/robustmq/robustmq

New generation of cloud-native and AI-native messaging infrastructure.

activemq amqp data http infra kafka message message-queue middleware mq mqtt mqtt-broker queue rabbitmq robustmq rocketmq rust serverless storage streaming

Last synced: 14 Sep 2025

https://github.com/odota/web

React web interface for the OpenDota platform

data dota hacktoberfest javascript react redux ui visualization webpack

Last synced: 12 Apr 2025

https://github.com/cocoindex-io/cocoindex

ETL framework to make your data AI-ready, with real-time incremental updates and support for custom logic that snaps together like Lego.

ai change-data-capture data data-engineering data-indexing data-infrastructure data-processing dataflow etl help-wanted indexing knowledge-graph llm pipeline python rag real-time rust semantic-search streaming

Last synced: 17 Jan 2026

https://github.com/hibuz/dev-conf-replay

🍀 Replay👀 links for recent Korean IT seminar and developer💻 conference videos, gathered in one place!

ai blockchain cloud coding conference data developer devops docs it korean meeup mobile mobility opensource programming readme replay summit tech

Last synced: 24 Mar 2025

https://github.com/lhotse-speech/lhotse

Tools for handling multimodal data in machine learning projects.

ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition

Last synced: 16 Nov 2025

https://github.com/uwdata/mosaic

An extensible framework for linking databases and interactive views.

data duckdb interaction scalability visualization

Last synced: 14 May 2025

https://github.com/sghall/resonance

◾ Resonance | 5kb React animation library

animation charts d3 data data-driven-transitions graph react svg visualization

Last synced: 09 Apr 2025

https://github.com/turicas/brasil.io

Backend of Brasil.IO (for the code of the data collection scripts, see the link on each dataset's page)

brasil brazil dados-abertos data hacktoberfest opendata python

Last synced: 14 May 2025

https://github.com/dcmoura/spyql

Query data on the command line with SQL-like SELECTs powered by Python expressions

command-line csv data json python sql text

Last synced: 21 Oct 2025