data
Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)
- GitHub: https://github.com/topics/data
- Wikipedia: https://en.wikipedia.org/wiki/Data
- Related Topics: datum
- Last updated: 2026-01-21 00:07:59 UTC
https://github.com/datasets/commons
DataHub commons. Wiki catalog of interesting and important datasets
data datasets datasets-csv open-data open-datasets opendata
Last synced: 17 Jan 2026
https://github.com/epsilla-cloud/vectordb
Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/
ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search
Last synced: 15 May 2025
https://github.com/turicas/rows
A common, beautiful interface to tabular data, no matter the format
convert-data csv data data-science excel hacktoberfest python table tabular-data xls xlsx
Last synced: 14 May 2025
https://github.com/cosmicmind/graph
Graph is a semantic database that is used to create data-driven applications.
coredata cosmicmind data data-driven data-driven-design data-driven-workflows database graph graph-theory icloud icloud-sync semantic-database swift swift-3
Last synced: 16 May 2025
https://github.com/CosmicMind/Graph
Graph is a semantic database that is used to create data-driven applications.
coredata cosmicmind data data-driven data-driven-design data-driven-workflows database graph graph-theory icloud icloud-sync semantic-database swift swift-3
Last synced: 06 Aug 2025
https://github.com/data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
code-quality data data-prep data-preparation data-preprocessing data-preprocessing-pipelines datacuration datarecipes deduplication finetuning large-language-models large-scale-data-processing llm llmapps malware python ray spark
Last synced: 15 Dec 2025
https://github.com/neumtry/neumai
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
ai chatgpt data data-engineering database embeddings etl llm llmops mlops ops pipeline python rag retrieval vector-database vectors
Last synced: 29 Oct 2025
https://github.com/jtkim-kaist/VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
acam attention bdnn data dnn lstm speech speech-activity-detection speech-recognition vad voice-activity-detection voice-detection
Last synced: 07 May 2025
https://github.com/oleg-agapov/data-engineering-book
Accumulated knowledge and experience in the field of Data Engineering
data data-engineering engineering
Last synced: 15 Apr 2025
https://github.com/fsprojects/fsharp.data
F# Data: Library for Data Access
csv data fsharp html http json typeprovider worldbank xml
Last synced: 16 Dec 2025
https://github.com/mrsaeeddev/free-ai-resources
FREE AI Resources - Courses, Jobs, Blogs, AI Research, and many more - for everyone!
ai artificial-intelligence artificial-neural-networks data data-science data-science-learning data-science-projects deep-learning deep-neural-networks hacktoberfest hacktoberfest2020 machine-learning machine-learning-algorithms machinelearning reinforcement-learning research supervised-learning unsupervised-learning
Last synced: 08 Apr 2025
https://github.com/fsprojects/FSharp.Data
F# Data: Library for Data Access
csv data fsharp html http json typeprovider worldbank xml
Last synced: 13 Mar 2025
https://github.com/NeumTry/NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
ai chatgpt data data-engineering database embeddings etl llm llmops mlops ops pipeline python rag retrieval vector-database vectors
Last synced: 11 Apr 2025
https://github.com/datazenit/sensei-grid
Simple and lightweight data grid in JS/HTML
data data-table datagrid grid javascript jqery lodash underscore
Last synced: 22 Oct 2025
https://github.com/hatnote/listen-to-wikipedia
Live, generative music from Wikipedia edits
data sonification sound visualization wikipedia
Last synced: 15 Apr 2025
https://github.com/richox/orz
a high performance, general purpose data compressor written in the crab-lang
Last synced: 15 May 2025
https://github.com/huggingface/dataset-viewer
Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
api-rest data datasets huggingface machine-learning nlp
Last synced: 14 Oct 2025
https://github.com/bitbrain/pandora
Godot 4 addon for RPG data management such as items, inventories, spells, mobs, quests and NPCs.
data godot godot4 godotengine management rpg
Last synced: 15 May 2025
https://github.com/kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis
Last synced: 30 Mar 2025
https://github.com/lakehq/sail
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
arrow big-data data datafusion pyspark python rust spark sql
Last synced: 12 Jun 2025
https://github.com/Flowframe/laravel-trend
Generate trends for your models. Easily generate charts or reports.
charts data hacktoberfest laravel package php reports trends
Last synced: 29 Apr 2025
https://github.com/pbeshai/tidy
Tidy up your data with JavaScript, inspired by dplyr and the tidyverse
data dplyr tidyverse wrangling
Last synced: 15 May 2025
https://github.com/mdn/data
This repository contains general data for Web technologies
Last synced: 17 Jun 2025
https://github.com/prismarinejs/minecraft-data
Language independent module providing minecraft data for minecraft clients, servers and libraries.
Last synced: 29 Jun 2025
https://pbeshai.github.io/tidy/
Tidy up your data with JavaScript, inspired by dplyr and the tidyverse
data dplyr tidyverse wrangling
Last synced: 04 Apr 2025
https://github.com/unytics/bigfunctions
Supercharge BigQuery with BigFunctions
bigquery data data-analytics data-engineering data-visualization data-warehouse
Last synced: 23 Oct 2025
https://github.com/breck7/pldb
PLDB: a Programming Language DataBase
data knowledge-graph programming-languages
Last synced: 14 Apr 2025
https://github.com/piquette/finance-go
Financial markets data library implemented in Go.
cryptocurrency data finance financial-data financial-markets go-library golang options pandas scraper stock-data stock-market stock-trading
Last synced: 16 May 2025
https://github.com/akfamily/aktools
AKTools is an elegant and simple HTTP API library for AKShare, built for AKSharers!
akshare asyncio data data-science fastapi openapi pydanti
Last synced: 14 May 2025
https://github.com/pdpipe/pdpipe
Easy pipelines for pandas DataFrames.
data data-science dataframe dataframes pandas pandas-dataframe pipeline
Last synced: 04 Aug 2025
https://github.com/PrismarineJS/minecraft-data
Language independent module providing minecraft data for minecraft clients, servers and libraries.
Last synced: 26 Apr 2025
https://github.com/dformoso/sklearn-classification
Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
classification-task data docker jupyter learning machine machine-learning notebook roc roc-curve science sklearn tensorflow
Last synced: 04 Apr 2025
https://github.com/octoproject/octo-cli
CLI tool to expose data from any database as a serverless web service.
api data database faas go knative octo-cli openfaas serverless
Last synced: 15 Mar 2025
https://github.com/LexPredict/lexpredict-lexnlp
LexNLP by LexPredict
analytics contracts data law legal legaltech linguistics ml nlp
Last synced: 16 Mar 2025
https://github.com/jokecamp/FootballData
A hodgepodge of JSON and CSV Football/Soccer data
Last synced: 12 Apr 2025
https://github.com/fatiando/pooch
A friend to fetch your data files
data download-manager fatiando-a-terra ftp http python python3 scipy scipy-stack
Last synced: 13 May 2025
https://github.com/Azure-Samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
automatedtesting azure cicd data databricks datafactory dataops devops fabric
Last synced: 30 Jul 2025
https://github.com/addisonlynch/iexfinance
Python SDK for IEX Cloud
data finance pandas stock-data stock-prices stocks-api
Last synced: 08 Apr 2025
https://github.com/pgflo/pg_flo
Stream, transform, and route PostgreSQL data in real-time.
data database etl go golang logical-replication postgres postgresql stream
Last synced: 15 May 2025
https://github.com/azure-samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
automatedtesting azure cicd data databricks datafactory dataops devops fabric
Last synced: 14 May 2025
https://github.com/squarespace/datasheets
Read data from, write data to, and modify the formatting of Google Sheets
data data-analytics data-science dataframe google pandas python
Last synced: 16 May 2025
https://github.com/Squarespace/datasheets
Read data from, write data to, and modify the formatting of Google Sheets
data data-analytics data-science dataframe google pandas python
Last synced: 15 Mar 2025
https://github.com/kipdata/kitesql
SQL as a Function for Rust
data database embeddings myrocks oltp postgresql query-engine rust rust-lang sql sql-query sql-server sqlite sqlite-database web
Last synced: 15 May 2025
https://github.com/foxglove/mcap
MCAP is a modular, performant, and serialization-agnostic container file format, useful for pub/sub and robotics applications.
cpp data deserialization golang python robotics serialization swift typescript
Last synced: 05 Jan 2026
https://github.com/ngneat/query
Powerful asynchronous state management, server-state utilities and data fetching for Angular Applications
async cache data fetch http pagination query stale-while-revalidate update
Last synced: 14 May 2025
https://github.com/datacleaner/DataCleaner
The premier open source Data Quality solution
data data-analysis data-science database datacleaner dataquality desktop etl mdm profiling
Last synced: 27 Mar 2025
https://github.com/diskframe/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
data data-science large-dataset manipulation-data medium-data r
Last synced: 16 Jun 2025
https://github.com/DiskFrame/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
data data-science large-dataset manipulation-data medium-data r
Last synced: 14 Mar 2025
https://github.com/xiaodaigh/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
data data-science large-dataset manipulation-data medium-data r
Last synced: 14 Mar 2025
https://github.com/foundation/panini
A super simple flat file generator.
data flat-file gulp handlebars handlebars-helpers handlebars-partials hbs html javascript json panini partials yaml
Last synced: 15 May 2025
https://github.com/essandess/isp-data-pollution
ISP Data Pollution to Protect Private Browsing History with Obfuscation
crawling data data-analytics obfuscation privacy privacy-enhancing-technologies web
Last synced: 29 Dec 2025
https://github.com/KipData/KipSQL
SQL as a Function for Rust
data database embeddings myrocks oltp postgresql query-engine rust rust-lang sql sql-query sql-server sqlite sqlite-database web
Last synced: 24 Mar 2025
https://github.com/KipData/KiteSQL
SQL as a Function for Rust
data database embeddings myrocks oltp postgresql query-engine rust rust-lang sql sql-query sql-server sqlite sqlite-database web
Last synced: 29 Sep 2025
https://github.com/kkulma/climate-change-data
A curated list of APIs, open data and ML/AI projects on climate change
climate climate-analysis climate-change climate-data data data-science datascience hacktoberfest python r resources rstats
Last synced: 04 Apr 2025
https://github.com/semarketir/quranjson
Quran JSON ~ 6236 verses, 114 surah, 30 Juz
audio data islam json juz moslem muslim quran quran-json religion surah tajweed tajwid translation
Last synced: 14 May 2025
https://github.com/capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
compare dask data data-science dataframes fugue numpy pandas polars pyspark python snowflake snowpark spark
Last synced: 14 May 2025
https://github.com/randomfractals/vscode-data-preview
Data Preview extension for importing, viewing, slicing, dicing, charting & exporting large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml
Last synced: 04 Apr 2025
https://github.com/RandomFractals/vscode-data-preview
Data Preview extension for importing, viewing, slicing, dicing, charting & exporting large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml
Last synced: 08 Apr 2025
https://github.com/NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 20 Jul 2025
https://github.com/datawithbaraa/sql-ultimate-course
The most comprehensive SQL guide from a real-world expert! Learn everything from basics to advanced queries, optimizations, and real-world SQL
data data-analytics data-engineering data-science database database-management database-migrations databases learn-sql project query sql sql-database sql-injection sql-queries sql-query sql-server sql-server-database sql-server-management-studio sqlite
Last synced: 12 Oct 2025
https://github.com/jetify-com/tyson
TypeScript as a Configuration Language. TySON stands for TypeScript Object Notation
config configuration configuration-language data json ts tson typescript tyson
Last synced: 15 May 2025
https://github.com/z3z1ma/dbt-osmosis
Provides automated YAML management and a streamlit workbench. Designed to optimize dev workflows.
cli data dbt documentation editor modelling sql testing
Last synced: 14 May 2025
https://github.com/marcocesarato/react-native-big-list
A high-performance list view for React Native with support for complex layouts. Its usage is similar to FlatList, which makes replacement easy. This big-list implementation works with a recycler focused on performance and memory usage, so it can handle thousands of items in the list.
android big data expo fast flatlist ios javascript js large list massive memory performance react react-native react-native-big-list sticky-headers web
Last synced: 15 May 2025
https://github.com/canner/wren-engine
The Semantic Engine for Model Context Protocol (MCP) Clients and AI Agents
agent agentic-ai ai business-intelligence data data-analysis data-analytics data-lake data-warehouse hacktoberfest llm mcp mcp-server semantic semantic-layer sql
Last synced: 22 Jan 2026
https://github.com/RunLLM/aqueduct
Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.
ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3
Last synced: 18 Apr 2025
https://github.com/gouline/dbt-metabase
dbt + Metabase integration
analytics business-intelligence data data-modelling dbt elt metabase pypa python vizualisation
Last synced: 08 Jan 2026
https://github.com/kevinschaich/pyspark-cheatsheet
Quick reference guide to common patterns & functions in PySpark.
cheat cheatsheet cheatsheets data data-science docs documentation guide guides pyspark pyspark-tutorial quickstart reference references spark spark-sql
Last synced: 10 Apr 2025
https://github.com/microsoft/Reactors
Join a community of developers at Microsoft Reactor and connect with people, skills, and technology to build your career or personal learning. We offer free livestreams, on-demand content, and hybrid/in-person events daily around the world. Access our projects and code here.
ai azure cloud data data-science devops dotnet events iot live-streaming low-code meetup mixed-reality ml no-code nodejs personal-de python web
Last synced: 05 May 2025
https://github.com/CympleTech/esse
Encrypted peer-to-peer IM for data security. Own data, own privacy. (Rust+Flutter)
cross-platform cryptography data data-security flutter p2p rust web3
Last synced: 26 Apr 2025
https://github.com/CympleTech/ESSE
Encrypted peer-to-peer IM for data security. Own data, own privacy. (Rust+Flutter)
cross-platform cryptography data data-security flutter p2p rust web3
Last synced: 27 Apr 2025
https://github.com/cympletech/esse
Encrypted peer-to-peer IM for data security. Own data, own privacy. (Rust+Flutter)
cross-platform cryptography data data-security flutter p2p rust web3
Last synced: 05 Apr 2025
https://github.com/juji-io/editscript
A library to diff and patch Clojure/ClojureScript data structures
algorithm clojure clojurescript-data data data-diffing data-structures diff editscript patch tree-diffing
Last synced: 14 May 2025
https://github.com/koordinates/kart
Distributed version-control for geospatial and tabular data
data data-versioning geospatial geospatial-data gis version-control
Last synced: 01 May 2025
https://github.com/JuliaDataScience/JuliaDataScience
Book on Julia for Data Science
book data data-manipulation data-science data-visualization julia julia-language
Last synced: 20 Jul 2025
https://github.com/xebia-functional/fetch
Simple & Efficient data access for Scala and Scala.js
cats concurrency data data-fetching monads monix parallelism scala scala-js sequencing
Last synced: 11 Jan 2026
https://github.com/toddbirchard/plotlydash-flask-tutorial
Embed Plotly Dash into your Flask applications.
dashboard data data-analysis data-visualisation data-visualization flask flask-application pandas plotly plotly-dash python tutorial
Last synced: 04 Apr 2025
https://github.com/shakedzy/dython
A set of data tools in Python
analysis correlation data modeling plot python roc
Last synced: 21 Oct 2025
https://github.com/loseys/Oblivion
Data leak checker & OSINT Tool
blueteam cybersecurity data data-breach data-leak data-security email gui leak opensource osint password pentest pyqt5 pyside2 python security-team security-tools
Last synced: 11 Jul 2025
https://github.com/nobrainr/morphism
Type-safe data transformer for JavaScript, TypeScript & Node.js.
array automapper data flow fp functional functors javascript js mapper morphism morphisms object parser typescript
Last synced: 12 Dec 2025
https://github.com/514-labs/moosestack
The developer framework for building analytics into your app on top of ClickHouse, Redpanda and other high-performance analytical infrastructure
analytics data dataengineering deployment framework insights metrics python rust typescript
Last synced: 20 Jan 2026
https://github.com/juliadatascience/juliadatascience
Book on Julia for Data Science
book data data-manipulation data-science data-visualization julia julia-language
Last synced: 14 Apr 2025
https://github.com/juliadata/dataframesmeta.jl
Metaprogramming tools for DataFrames
data data-frame dataframes dataframesmeta datasets hacktoberfest julia tabular-data
Last synced: 14 May 2025
https://github.com/JuliaData/DataFramesMeta.jl
Metaprogramming tools for DataFrames
data data-frame dataframes dataframesmeta datasets hacktoberfest julia tabular-data
Last synced: 15 Mar 2025
https://github.com/serpro69/kotlin-faker
Port of the popular Ruby faker gem, written in Kotlin. Generate realistic-looking fake data such as names, addresses, banking details, and many more, which can be used for testing and data anonymization purposes.
android android-development android-testing anonymisation anonymization anonymizer data faker faker-gem faker-generator faker-library faker-libs java jvm kotlin kotlin-faker kotlin-library test-automation testing testing-tools
Last synced: 15 May 2025
https://github.com/yt-project/yt
Main yt repository
analysis astronomy astrophysics data data-visualization finite-element-analysis geophysics nuclear-engineering python scientific-computing scientific-visualization visualization
Last synced: 14 May 2025
https://github.com/ozlerhakan/poiji
A library converting XLS and XLSX files to a list of Java objects based on Apache POI
apache apache-poi converter data deserialize excel java java-11 mapper mapping microsoft-excel parser performance poi poiji pojo unmarshall
Last synced: 02 Jan 2026
https://github.com/googleapis/python-bigquery-pandas
Google BigQuery connector for pandas
Last synced: 05 Jan 2026
https://github.com/paypal/data-contract-template
Template for a data contract used in a data mesh.
data data-contract data-engineering data-mesh
Last synced: 01 Mar 2025
https://github.com/Gmousse/dataframe-js
A JavaScript library providing a new data structure for data scientists and developers
data data-frame dataframe datascience datastructures functional groupby javascript manipulation matrix sql sql-syntax
Last synced: 15 Mar 2025