data
Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)
- GitHub: https://github.com/topics/data
- Wikipedia: https://en.wikipedia.org/wiki/Data
- Related Topics: datum
- Last updated: 2026-01-21 00:07:59 UTC
https://github.com/malloydata/malloy
Malloy is an experimental language for describing data relationships and transformations.
data data-visualization database malloy semantic-modeling sql
Last synced: 13 May 2025
https://github.com/GSA/data
Assorted data from the General Services Administration.
data domains enterprise standards technology
Last synced: 28 Mar 2025
https://github.com/mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
data data-integration etl pipeline postgresql python
Last synced: 14 May 2025
https://github.com/meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets
Last synced: 12 May 2025
https://github.com/onyx-platform/onyx
Distributed, masterless, high performance, fault tolerant data processing
batch clojure data distributed streaming
Last synced: 28 Sep 2025
https://github.com/rilldata/rill
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit
Last synced: 07 Jan 2026
https://github.com/mahmoud/glom
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
apis cli data data-transformation declarative dictionaries nested-structures python recursion utilities
Last synced: 16 May 2025
https://github.com/pretzelai/pretzelai
The modern replacement for Jupyter Notebooks
analytics artificial-intelligence business-intelligence businessintelligence dashboard data data-analysis data-analytics data-science data-visualization duckdb notebooks open-source prql reporting sql sql-editor sql-editor-online visualization wasm
Last synced: 14 May 2025
https://github.com/keajs/kea
Batteries Included State Management for React
data framework kea react react-component redux redux-saga redux-thunk sagas
Last synced: 12 Jan 2026
https://github.com/mariusandra/kea
Batteries Included State Management for React
data framework kea react react-component redux redux-saga redux-thunk sagas
Last synced: 05 Apr 2025
https://github.com/baidu/tera
An Internet-Scale Database.
baidu bigtable c-plus-plus data database hbase nosql storage
Last synced: 15 May 2025
https://github.com/rilldata/rill-developer
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit
Last synced: 08 Mar 2025
https://github.com/man-group/arcticdb
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading
Last synced: 13 May 2025
https://github.com/diffgram/diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
annotation annotation-tool annotations data data-analytics data-annotation data-science datasets datastore deep-learning image-annotation kubernetes labeling machine-learning training-data video-annotation
Last synced: 14 Mar 2025
https://github.com/jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
audio-dataset audio-datasets data dataset datasets noise voice voice-activity-detection voice-assistant voice-chat voice-commands voice-computing voice-control voice-conversion voice-dataset voice-datasets voice-recognition voice-synthesis
Last synced: 26 Mar 2025
https://github.com/brimsec/brim
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui
Last synced: 25 Feb 2025
https://github.com/brimdata/zui
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui
Last synced: 12 Jun 2025
https://github.com/juliadata/dataframes.jl
In-memory tabular data in Julia
data data-frame dataframes datasets hacktoberfest julia tabular-data
Last synced: 13 May 2025
https://github.com/thbar/kiba
Data processing & ETL framework for Ruby
data etl etl-ruby ruby rubydatascience
Last synced: 09 Apr 2025
https://github.com/deepnote/deepnote
Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data integrations. Use Python, R, and SQL locally in your favorite IDE, then scale to Deepnote cloud for real-time collaboration, Deepnote agent, and deployable data apps. https://deepnote.com/
artificial-intelligence data data-analysis data-science data-visualization deepnote eda jupyter jupyterhub jupyterlab machine-learning notebooks python r sql
Last synced: 05 Jan 2026
https://github.com/spider-rs/spider
A web crawler and scraper for Rust
crawler data headless-chrome indexer rust scraping spider
Last synced: 02 Jan 2026
https://github.com/dformoso/deeplearning-mindmap
A mindmap summarising Deep Learning concepts.
cheatsheet data deep jupyter learning mindmap python science
Last synced: 16 May 2025
https://github.com/dataliterate/data-populator
A plugin for Sketch and Adobe XD to populate your design mockups with meaningful data. Goodbye Lorem Ipsum. Hello JSON.
adobe adobe-xd data data-populator design design-tool design-tools meaningful-data sketch sketchapp
Last synced: 15 May 2025
https://github.com/LazyAGI/LazyLLM
Easiest and laziest way to build multi-agent LLM applications.
agents ai-agent data deep-learning documentation-tool finetuning framework knowlege-graph langchain lazyllm llamaindex llm llms rag
Last synced: 06 May 2025
https://github.com/data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
data data-engineer data-engineering data-modeling data-pipelines database etl sql
Last synced: 14 May 2025
https://github.com/werneror/poetry
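A very comprehensive dataset of classical Chinese poetry, containing more than 850,000 poems from the pre-Qin period to modern times.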
chinese chinese-poetry csv data poetry
Last synced: 15 May 2025
https://github.com/nerevu/riko
A Python stream processing engine modeled after Yahoo! Pipes
asynchronous cli data etl featured functional-programming library parallelism rss stream-processing
Last synced: 15 May 2025
https://github.com/san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
airflow airflow-operators aws aws-ec2 aws-s3 aws-sdk cassandra cassandra-database cloudformation cluster data data-engineering data-engineering-pipeline data-lake data-modeling data-warehouse etl-pipeline infrastructure postgres postgresql-database
Last synced: 15 Apr 2025
https://github.com/kantord/just-dashboard
📊 📋 Dashboards using YAML or JSON files
big-data business-intelligence chart csv d3 d3js dashboard data data-driven data-engineering data-science data-visualization gist github-gist json just-dashboard yaml
Last synced: 15 May 2025
https://github.com/getdozer/dozer
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.
api apis clickhouse data datawarehouse debe etl low-code postgres realtime rust snowflake sql streaming
Last synced: 11 Apr 2025
https://github.com/ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
agents data data-pipelines elt etl llm python workflow
Last synced: 12 Oct 2025
https://github.com/greyblake/nutype
Rust newtype with guarantees 🇺🇦 🦀
data data-structures invariance invariant invariants macro macros newtype rust rust-lang rust-library sanitization sanitizer typesafety validation validator web
Last synced: 23 Apr 2025
https://github.com/odota/core
Open source Dota 2 data platform
api data docker dota hacktoberfest javascript nodejs
Last synced: 11 Apr 2025
https://iddan.github.io/react-spreadsheet/
Simple, customizable yet performant spreadsheet for React
csv data excel react spreadsheet
Last synced: 20 Oct 2025
https://github.com/tanu-n-prabhu/python
This repository helps you understand Python from scratch.
data dataanalysis datascraping google-colab google-colab-notebook jupyter-notebook machine-learning numpy numpy-arrays pandas-dataframe prediction python python-3 python3
Last synced: 14 May 2025
https://github.com/roboyoshi/datacurator-filetree
A standard filetree for r/datacurator (and r/datahoarder)
classification data datastructures file-organization filetree template
Last synced: 23 Mar 2025
https://github.com/DataBrewery/cubes
[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis
cube data data-analysis data-warehouse multidimensional-analysis olap sql
Last synced: 26 Mar 2025
https://github.com/pablolec/recoverpy
Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal
cli console cybersecurity data data-recovery files forensics hacking linux macos pentesting python python3 recovery search search-interface terminal textual tool tui
Last synced: 05 Oct 2025
https://github.com/sepandhaghighi/pycm
Multi-class confusion matrix library in Python
accuracy ai artificial-intelligence classification confusion-matrix data data-analysis data-mining data-science deep-learning deeplearning evaluation machine-learning mathematics matrix ml multiclass-classification neural-network statistical-analysis statistics
Last synced: 13 May 2025
https://github.com/natescarlet/holiday-cn
📅🇨🇳 Statutory public holiday data for China, automatically scraped daily from State Council announcements
china crawling data holiday natural-language-processing
Last synced: 14 May 2025
https://github.com/AlisamTechnology/ATSCAN
Advanced dork Search & Mass Exploit Scanner
data dork engine exploitation lfi linux mass-exploitation-scanner ports portscan rfi scanner security server shell sqli system tools vulnerability-scanners web-application xss
Last synced: 26 Mar 2025
https://github.com/tensorbase/tensorbase
TensorBase is a new big data warehouse built with modern efforts.
analytics bigdata data data-infrastructure data-warehouse database engineering high-performance infrastructure modern rust rust-lang warehouse
Last synced: 06 Apr 2025
https://github.com/iddan/react-spreadsheet
Simple, customizable yet performant spreadsheet for React
csv data excel react spreadsheet
Last synced: 13 May 2025
https://github.com/Litlyx/litlyx
Powerful Analytics Solution. Setup in 30 seconds. Display all your data on a Simple, AI-powered dashboard. Fully self-hostable and GDPR compliant. Alternative to Google Analytics, MixPanel, Plausible, Umami & Matomo.
ai analytics angular charts data data-analysis data-visualization javascript metrics nextjs nodejs nuxt open-source react statistics typescript vue website
Last synced: 25 Aug 2025
https://github.com/pyjanitor-devs/pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
cleaning-data data data-engineering dataframe hacktoberfest pandas pydata
Last synced: 02 Jan 2026
https://github.com/jldbc/pybaseball
Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
baseball data python sabermetrics statcast
Last synced: 14 May 2025
https://github.com/skrub-data/skrub
Machine learning with dataframes
data data-analysis data-cleaning data-preparation data-preprocessing data-science data-wrangling dataframe dataframes dirty-data machine-learning
Last synced: 06 Jan 2026
https://github.com/data-forge/data-forge-ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
csv data data-analysis data-cleaning data-cleansing data-forge data-management data-manipulation data-munging data-visualization data-wrangling javascript json linq nodejs pandas visualization
Last synced: 13 May 2025
https://github.com/neherlab/covid19_scenarios
Models of COVID-19 outbreak trajectories and hospital demand
coronavirus covid covid-19 data hospital model modelling ncov neherlab open-source opensource outbreak population research sars-cov-2 science simulation ventilator
Last synced: 15 May 2025
https://github.com/quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
data data-engineering data-version-control data-versioning parquet python serialization
Last synced: 13 May 2025
https://github.com/koaning/drawdata
Draw datasets from within Python notebooks.
Last synced: 07 Jan 2026
https://github.com/lotus-data/lotus
Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code.
ai-data-processing data llm llm-data-processing llm-document-processing pandas python semantic-operators semantic-search unstructured-data
Last synced: 19 Oct 2025
https://github.com/flipkart-incubator/proteus
Proteus : A JSON based LayoutInflater for Android
android binding data data-binding dynamic-layout functions java json layout-engine proteus
Last synced: 16 May 2025
https://github.com/pomber/covid19
JSON time-series of coronavirus cases (confirmed, deaths and recovered) per country - updated daily
2019-ncov api coronavirus covid-19 data dataset json time-series
Last synced: 15 May 2025
https://github.com/dbt-labs/metricflow
MetricFlow allows you to define, build, and maintain metrics in code.
analytics business-intelligence data data-modeling metrics pypi semantic-layer
Last synced: 13 May 2025
https://github.com/chartshq/muze
Composable data visualisation library for web with a data-first approach now powered by WebAssembly
area-chart barchart charts crosstab data data-visualization data-viz html5-charts interactive-charts javascript js-charts linechart pie-chart splom svg visualization wasm web webassembly
Last synced: 28 Sep 2025
https://github.com/projectnessie/nessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
aws-lambda data git iceberg java spark
Last synced: 13 May 2025
https://github.com/teomewhy/teomerefs
Guide to technical references for a career in data
data data-science machine-learning python
Last synced: 14 May 2025
https://github.com/babyfish-ct/jimmer
The most advanced ORM for the JVM, for both Java and Kotlin
cache caffine data draft fetch graphql immer immutable immutable-collections immutable-datastructures java jdbc kotlin orm orm-framework orm-library orms redis redis-cache
Last synced: 14 May 2025
https://github.com/robustmq/robustmq
New generation of cloud-native and AI-native messaging infrastructure.
activemq amqp data http infra kafka message message-queue middleware mq mqtt mqtt-broker queue rabbitmq robustmq rocketmq rust serverless storage streaming
Last synced: 14 Sep 2025
https://github.com/elixirs/faker
Faker is a pure Elixir library for generating fake data.
data data-generator database developer-tools dummy elixir fake-content faker generator hacktoberfest phoenix qa seed seeding test testing testing-tools
Last synced: 12 May 2025
https://github.com/odota/web
React web interface for the OpenDota platform
data dota hacktoberfest javascript react redux ui visualization webpack
Last synced: 12 Apr 2025
https://github.com/cocoindex-io/cocoindex
ETL framework to make your data AI-ready, with real-time incremental updates and support for custom logic, like Lego.
ai change-data-capture data data-engineering data-indexing data-infrastructure data-processing dataflow etl help-wanted indexing knowledge-graph llm pipeline python rag real-time rust semantic-search streaming
Last synced: 17 Jan 2026
https://github.com/hibuz/dev-conf-replay
🍀 Replay 👀 links for recent Korean IT seminar and developer 💻 conference videos, collected in one place!
ai blockchain cloud coding conference data developer devops docs it korean meeup mobile mobility opensource programming readme replay summit tech
Last synced: 24 Mar 2025
https://github.com/lhotse-speech/lhotse
Tools for handling multimodal data in machine learning projects.
ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition
Last synced: 16 Nov 2025
https://github.com/ruc-datalab/deepanalyze
DeepAnalyze is the first agentic LLM for autonomous data science.
agent agentic agentic-ai ai ai-scientist chatbot chatgpt data data-analysis data-engineering data-science data-visualization database gpt llama llm qwen science structured-data vllm
Last synced: 11 Nov 2025
https://github.com/wieslawsoltes/core2d
A multi-platform data driven 2D diagram editor.
avalonia avaloniaui c-sharp data diagram editor graphics gui multi-platform shapes wysiwyg-editor xaml
Last synced: 14 May 2025
https://github.com/disclose/diodb
Open-source vulnerability disclosure and bug bounty program database
bug-bounty bug-bounty-hunters data disclosure-policy hackers legal responsible-disclosure safe-harbor-framework safety security-research simplicity vulnerability-disclosure
Last synced: 17 Jan 2026
https://github.com/NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 29 Jul 2025
https://github.com/uwdata/mosaic
An extensible framework for linking databases and interactive views.
data duckdb interaction scalability visualization
Last synced: 14 May 2025
https://github.com/sghall/resonance
◾ Resonance | 5kb React animation library
animation charts d3 data data-driven-transitions graph react svg visualization
Last synced: 09 Apr 2025
https://github.com/turicas/brasil.io
Backend for Brasil.IO (for the code of the data collection scripts, see the link on each dataset's page)
brasil brazil dados-abertos data hacktoberfest opendata python
Last synced: 14 May 2025
https://github.com/tigrisdata-archive/tigris
Tigris is an Open Source Serverless NoSQL Database and Search Platform.
consensus data database dynamodb elasticsearch foundationdb go golang kubernetes mongodb open-source opensearch real-time search search-engine streaming transactional-database
Last synced: 12 Jan 2026
https://github.com/latitude-dev/latitude
Developer-first embedded analytics
analytics business-intelligence dashboard data data-analysis data-analytics data-app data-engineering data-science data-visualization duckdb embedded-analytics exploratory-data-analysis javascript-framework open-source react self-hosted sql svelte tailwindcss
Last synced: 09 Nov 2025
https://github.com/dcmoura/spyql
Query data on the command line with SQL-like SELECTs powered by Python expressions
command-line csv data json python sql text
Last synced: 21 Oct 2025