data
Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)
- GitHub: https://github.com/topics/data
- Wikipedia: https://en.wikipedia.org/wiki/Data
- Related Topics: datum
- Last updated: 2026-06-20 00:07:41 UTC
https://github.com/ptiger10/pd
A fast, tested, and predictable way to clean, aggregate, and transform data
Last synced: 12 Jan 2026
https://github.com/jeffcore/covid-19-usa-by-state
CSV files of COVID-19 total daily confirmed cases and deaths in the USA by state and county. All data from Johns Hopkins & NYT.
confirmed-cases coronavirus coronavirus-tracking county covid-19 covid19 csv csv-files daily-files data deaths johns-hopkins nyt state usa
Last synced: 16 Jan 2026
https://github.com/rbren/vizzy
Data Visualization with LLMs
chatgpt data data-visualization llm
Last synced: 07 May 2025
https://github.com/evoluteur/kaggle-look-alike
Kaggle Data Explorer UI look-alike built in React.
data data-analysis data-engineering data-exploration data-mining data-platform data-science datascience exploratory-data-analysis explorer front-end frontend kaggle react spa
Last synced: 09 Apr 2025
https://github.com/ivailop7/healthkit-influxdb-grafana
Publish your Apple HealthKit data via a Python Flask HTTP endpoint to InfluxDB to plot in Grafana
analytics apple autoexport chart data flask grafana health healthkit http influxdb linux local mac plot python selfquant visualization windows workouts
Last synced: 30 Apr 2025
https://github.com/sungchun12/airflow-dbt-cloud
Examples of dbt Cloud pipelines in Airflow
airflow data dbt dbt-cloud schedule scheduler workflow-engine
Last synced: 04 Sep 2025
https://github.com/kristijorgji/goseeder
Go database seeder inspired by the Laravel/Lumen seeder and more
data database go seeder seeders table test-seeds testing
Last synced: 14 May 2025
https://github.com/spratiher9/sparkdataset
Instant search for and access to many datasets in PySpark.
benchmark benchmark-framework data data-analysis data-mining dataengineering dataset datasets easy-access-application instantsearch pyspark python python3 quickstart r spark standard
Last synced: 02 Aug 2025
https://github.com/travishorn/csval
Check CSV files against a set of validation rules.
cli csv data json-schema parser validation
Last synced: 09 Apr 2025
https://github.com/vijinho/epl_mysql_db
Free/open English Premier League results database from 1993-2017. Dump formats are MySQL and SQLite.
data dataset epl football-data mysql premierleague soccer
Last synced: 20 Mar 2025
https://github.com/tradewelltech/protarrow
Convert from protobuf to arrow and back
apache-arrow data protobuf python
Last synced: 16 Jan 2026
https://github.com/vincentauriau/tennis-prediction
Predicts the winner of a tennis match with machine learning
atp data data-science machine-learning tennis
Last synced: 22 Apr 2025
https://github.com/milangritta/Pragmatic-Guide-to-Geoparsing-Evaluation
Full resources supporting the publication "A Pragmatic Guide to Geoparsing Evaluation."
analysis data evaluation geocoder geocoding geography geoparser geoparsing google-cloud linguistics location machine-learning named-entity-recognition places spacy-nlp taxonomy toponym-resolution toponyms toponymy training-data
Last synced: 07 Apr 2025
https://github.com/microsoft/reconner
ReconNER: debug annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data.
Last synced: 31 Oct 2025
https://github.com/ocamlpro/directories
directories is an OCaml library that provides configuration, cache and data paths (and more!) following the suitable conventions on Linux, macOS and Windows. The following conventions are used: XDG Base Directory Specification and xdg-user-dirs on Linux, Known Folders on Windows, Standard Directories on macOS.
basedir cache config conventions data directories knownfolders linux macos ocaml standard standarddirectories windows xdg
Last synced: 12 Jun 2025
https://github.com/juliadata/dataapi.jl
A data-focused namespace for packages to share functions
Last synced: 11 Sep 2025
https://github.com/julianfaraway/faraway
R package, scripts and documentation supporting R books by Julian Faraway
Last synced: 21 Feb 2026
https://github.com/vatshayan/b.tech-project-rainfall-predication-in-india
Rainfall Prediction using Machine Learning. India Rainfall Prediction for 115 years. Rainfall Project with Code and Documents
artificial-intelligence btech-project data data-analysis data-mining data-science data-visualization datascience datasets final final-project final-year-project finalproject finalyearproject machine-learning machine-learning-algorithms machinelearning rainfall-prediction semester-project
Last synced: 28 Oct 2025
https://github.com/iamphytan/rosbag-tools
A ROS-agnostic toolbox for common rosbag operations
data data-management python python3 robotics ros1 ros2 rosbag
Last synced: 14 Apr 2025
https://github.com/critocrito/sugarcube
Monoidal data processes.
data data-mining data-preservation data-team human-rights javascript sugarcube
Last synced: 16 Mar 2025
https://github.com/juliaferraioli/opensource-timeline
This repository aims to collect events in open source history.
data history opendata opensource
Last synced: 10 Feb 2026
https://github.com/eidoslab/unitopatho
Dataset of 9536 H&E-stained patches for colorectal polyps classification and adenomas grading | ICIP21 https://doi.org/10.1109/ICIP42928.2021.9506198
cancer data health histopathological-image histopathology histopathology-images medical-image-processing medical-images neural-networks
Last synced: 12 Aug 2025
https://github.com/canclid/canto-filter
Cantonese text corpus filter
cantonese cantonese-language corpus corpus-data data nlp
Last synced: 27 Oct 2025
https://github.com/aiven/aiven-operator
Provision and manage Aiven Services from your Kubernetes cluster.
automation data databases kubernetes operator
Last synced: 09 Apr 2026
https://github.com/flother/rio2016
Data on the 11,500+ athletes and 306 events at the Rio Olympics. Includes medal tallies.
athletes data medals olympic-games olympics rio-de-janeiro rio2016
Last synced: 16 Mar 2026
https://github.com/mwouts/world_trade_data
World Integrated Trade Solution (WITS) API in Python
data statistics trade worldbank
Last synced: 03 Apr 2025
https://github.com/rxavier/econuy
Wrangling Uruguayan economic data so you don't have to.
Last synced: 17 Jan 2026
https://github.com/ctjacobs/git-rdm
A research data management plugin for the Git version control system.
curation data datasets git open-data open-science publishing research-data-management version-control
Last synced: 21 Jan 2026
https://github.com/stefen-taime/iceberg-dbt-trino-hive-modern-open-source-data-stack
To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a music streaming platform, let’s delve into the detailed workflow and benefits of each component.
data dbt hive iceberg modern trinodb
Last synced: 20 Oct 2025
https://github.com/webankblockchain/data-stash
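Data-Stash is a data warehouse component based on FISCO-BCOS. It parses a node's binlog to generate a full backup of that node's state, so that the node can separate hot and cold data and prune data.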
blockchain consortium data data-governance data-separation webank-blockchain
Last synced: 23 Jul 2025
https://github.com/ekmett/perhaps
A monad, perhaps.
data error-handling haskell monad monad-transformers
Last synced: 14 Apr 2025
https://github.com/fluhus/gostuff
Convenience packages for data science in Go.
data data-science data-structures go golang
Last synced: 12 Jan 2026
https://github.com/ihrke/pypillometry
Pupillometry and eyetracking with python
data data-analysis eye-tracking eyetracking pupillometry
Last synced: 10 Oct 2025
https://github.com/hodur-org/hodur-datomic-schema
Hodur is a domain modeling approach and collection of libraries for Clojure. By using Hodur you can define your domain model as data, parse and validate it, and then either consume your model via an API or use one of the many plugins to help you achieve mechanical results faster and in a purely functional manner.
clojure data database datomic modeling schema
Last synced: 12 Dec 2025
https://github.com/ropensci/weatherOz
An API Client for Australian Weather and Climate Data Resources
api-client australia climate data r rainfall rstats weather weather-api weather-forecast
Last synced: 20 Jul 2025
https://github.com/z3z1ma/cdf
A framework to manage data, continuously
data framework pipelines transformation
Last synced: 17 Mar 2025
https://github.com/matrix-msu/kora
The easiest way to manage and publish your data. Open-source, database-driven, online digital repository application for complex multimedia objects (text, images, audio, video). kora stores, manages, and delivers digital objects with corresponding metadata that enhances the research and educational value of the objects.
archive collections data laravel management matrix metadata msu mysql php repository schema
Last synced: 11 Jan 2026
https://github.com/inphyt/covid19-italy-integrated-surveillance-data
COVID-19 integrated surveillance data provided by the Italian Institute of Health and processed via UnrollingAverages.jl to deconvolve the weekly moving averages.
covid-19 covid19-data data data-analysis data-structures data-visualization data-wrangling database dataset epidemiological-data epidemiology italy italy-data italy-dataset open-data surveillance surveillance-data time-series time-series-analysis
Last synced: 26 Jul 2025
https://github.com/webankblockchain/data-reconcile
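Data-Reconcile is a blockchain-based reconciliation component. It provides a general-purpose data reconciliation solution built on blockchain smart-contract ledgers, along with a dynamically extensible reconciliation framework that supports custom development.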
blockchain consortium data data-governance reconcile webank-blockchain
Last synced: 09 Jul 2025
https://github.com/pkmn/smogon
Wrapper around Smogon's analyses and usage statistics
data git-scraping pokemon smogon
Last synced: 09 Apr 2025
https://github.com/ropenspain/infoelectoral
infoelectoral is an R library that helps retrieve and analyze official electoral results for Spain from the Ministry of the Interior. It allows you to download the results of general, European and municipal elections of any year at the polling station and municipality level.
data elecciones elections electoral infoelectoral r spain
Last synced: 14 Apr 2025
https://github.com/wakataw/pyproc
SPSE (Sistem Pengadaan Secara Elektronik) Python API Wrapper
data e-procurement lkpp lpse pengadaan python sedot spse
Last synced: 17 Jan 2026
https://github.com/brightway-lca/brightway2-io
Importing and exporting for the Brightway LCA framework
bw2 data life-cycle-assessment python
Last synced: 04 Apr 2025
https://github.com/pinecone-io/pinecone-datasets
An open-source dataset library for pre-embedded datasets: create your own data catalog, or use Pinecone's public datasets.
data database embeddings vector
Last synced: 29 Apr 2025
https://github.com/simranjeet97/llm-rag_finance_usecases
This repository contains real-life use cases of GenAI (LLM + RAG) in the finance domain. It covers many project use cases with theory and projects.
data data-augmentation-llm datascience-machinelearning finance financial-analysis fraud-detection fraud-detection-llm huggingface huggingface-transformers large-language-models llm llm-finance llm-rag-finance portfolio-analysis-llm python risk-analysis
Last synced: 11 Apr 2025
https://github.com/fiddlerwoaroof/data-lens
Functional utilities for Common Lisp
data data-transformation functional-programming lisp transducers
Last synced: 05 Feb 2026
https://github.com/iboxdb/db4o-gpl
New db4o GPL source code for Java 7+ and .NET Standard 2.0, Android, Xamarin..., the best database project to help you learn how to make databases
data database db4o embaddable java netstandard oodb
Last synced: 14 Jan 2026
https://github.com/getgrav/grav-plugin-data-manager
Grav Data Manager Plugin
data data-visualization grav grav-plugin
Last synced: 29 Apr 2025
https://github.com/ndgigliotti/shopify-spy
Extract structured data from Shopify websites.
crawler data data-acquisition data-science dropshipping ecommerce scrape scraper scraping scrapy shopify spider
Last synced: 26 Jan 2026
https://github.com/pawel-0/xdg-unused-data
A simple way to identify unused application data in user directories such as ~/.config and ~/.cache.
bash data linux unused xdg xdg-basedir
Last synced: 04 Sep 2025
https://github.com/pennlabs/penn-sdk-python
A Python module for the various services of Penn OpenData. Validated API token required.
data opendata python university-of-pennsylvania
Last synced: 31 Jul 2025
https://github.com/reymond-group/lore
WebGL engine for (big) data visualization.
3d-engine data data-science interactive visualization webgl
Last synced: 06 Mar 2026
https://github.com/htrgouvea/harpoon
[W.I.P] An ecosystem of crawlers for detecting leaks, sensitive data exposure, and attempted data exfiltration
bing data detect exfiltrate leak notify pastebin perl sensitive-data uranus
Last synced: 01 Mar 2026
https://github.com/tniedbala/secdatatools
Simple Python utility that downloads and extracts SEC financial statement data sets.
accounting analysis csv data dataset finance financial-statements securities tsv utility
Last synced: 23 Jan 2026
https://github.com/rodabt/vduckdb
A blazing-fast DuckDB wrapper built with the V language, making it easier to leverage its power in your projects.
data duckdb vlang wrapper-library
Last synced: 09 Aug 2025
https://github.com/pkmn/randbats
Pokémon Showdown's Random Battle sets
data git-scraping pokemon pokemon-showdown
Last synced: 29 Jul 2025
https://github.com/juliagraphics/namedcolors.jl
More color names than you ever knew you wanted
Last synced: 10 Sep 2025
https://github.com/aws-samples/data-for-saas-patterns
A collection of samples, best practices and reference architectures for implementing SaaS applications on AWS for databases and data services.
Last synced: 14 Apr 2025
https://github.com/climatewatch-vizzuality/climate-watch
Climate Watch: Data for Climate Action
climate data postgresql rails react
Last synced: 08 May 2025
https://github.com/mrpaulandrewltd/Microsoft-Data-Integration-Pipeline-Training
Training workshop content on Azure Data Factory and Azure Synapse Analytics Data Integration Pipelines
azure data data-factory integration pipelines procfwk synapse-analytics
Last synced: 31 Mar 2025
https://github.com/suchjs/such
A powerful, expandable, configurable fake data library that generates data exactly as you want.
data fake faker generation generator javascript json json-data mock mocking nodejs simulate simulation typescript
Last synced: 14 Apr 2025
https://github.com/180Protocol/180protocol
Confidential compute for sensitive data sharing and commercial collaboration
blockchain confidential-computing data data-analysis data-science decentralized-storage distributed dlt enclave filecoin intel-sgx ipfs java kotlin privacy-enhancing-technologies rewards-engine
Last synced: 20 Apr 2025
https://github.com/datawithbaraa/sql-data-analytics-project
This repository contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.
analytics business-analytics business-intelligence data data-analysis data-analyst data-analytics data-engineering data-science data-scientist database datascience query reporting sql sql-queries sql-query sql-server window-functions window-functions-in-sql
Last synced: 15 Apr 2025
https://github.com/ckan/ckanext-validation
CKAN extension for validating Data Packages using Table Schema.
Last synced: 06 Apr 2025
https://github.com/oobianom/shinyStorePlus
An R package that provides in-browser storage of persistent, synchronized Shiny input data using IndexedDB. Transfer browser link parameters to Shiny input or output values.
cran data data-structures r r-package shiny
Last synced: 05 Oct 2025
https://github.com/kennethleungty/image-metadata-exif
Read and modify image metadata in Python with exif
data exif exiftool image image-manipulation image-processing images metadata photos picture python
Last synced: 08 Apr 2026
https://github.com/randomfractals/observable-data-tools
Repository of web and code editor friendly Observable Data Tools 🛠️ and Notebooks 📚 in .js, .nb.json, .ojs, .omd, .html and .qmd document formats for Data Previews in a browser and in VSCode IDE with Observable JS extension, Quarto extension, and new Quarto publishing tools.
data data-notebooks data-tools diagrams editor jsnotebooks notebook quarto quartopub query sql summary tabular
Last synced: 01 Mar 2026
https://github.com/jnmclarty/validada
Another library for defensive data analysis.
checkset data data-analysis data-validation decorators pandas slice validation
Last synced: 24 Jan 2026
https://github.com/RealityBending/TemplateResults
A template for a data analysis folder that can be easily exported as a webpage or as Supplementary Materials
data open-science open-source pdf r reproducible rmarkdown scripts share statistics submit supplementary-material template webpage website word
Last synced: 30 Jul 2025
https://github.com/gher-uliege/physocean.jl
Utility functions for physical oceanography (properties of seawater, air-sea heat fluxes,...)
data density fluxes julia physical-oceanography sea-water
Last synced: 13 Oct 2025
https://github.com/asad70/stock-news-sentiment-analysis
This program uses Vader SentimentIntensityAnalyzer to calculate the news headline overall sentiment for a stock
data data-science data-visualization finviz finviz-scraper news news-analysis news-headline sentiment sentiment-analysis stock stock-analysis stock-news-analysis stocks vader-sentimentintensityanalyzer
Last synced: 27 Apr 2025
https://github.com/pepijn-devries/CopernicusMarine
Subset and download marine data from EU Copernicus Marine Service Information. Import data on the oceans' physical and biogeochemical state from Copernicus into R without the need for external software.
Last synced: 20 Jul 2025
https://github.com/nbremer/datasketches
A monthly collaboration project between Shirley & Nadieh
d3 d3js data data-art data-visualization
Last synced: 14 Aug 2025
https://github.com/arm-university/rpi-pico-projects-for-schools
Raspberry Pi Pico Projects for Schools: Explore cutting-edge topics in Computing, including Machine Learning and Internet of Things. Ages 16-18.
ai data datascience iot ml pico python raspberry-pi rpi
Last synced: 23 Apr 2025
https://github.com/garystafford/streaming-sales-generator
Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python
analytics apache-flink apache-kafka data kafka kafka-streams kstreams python spark-structured-streaming streaming-data
Last synced: 03 Aug 2025
https://github.com/ssamadgh/ModelAssistant
Elegant library to manage the interactions between view and model in Swift
collectionview controller core coredata data datasource interactor manager model mvc mvp mvvm swift tableview view viewmodel viper
Last synced: 06 Aug 2025
https://github.com/cahyadsn/db_rajaongkir
Province, city/regency, and district code data for RajaOngkir
data kabupaten kecamatan kode kota provinsi rajaongkir sql
Last synced: 07 Apr 2026
https://github.com/favstats/uaconflict_equipmentloss
This repo scrapes Oryxspioenkop (daily) to document and visualize equipment losses in the Russia-Ukraine war. https://www.oryxspioenkop.com/2022/02/attack-on-europe-documenting-equipment.html
conflict data data-visualization ukraine-invasion ukrainewar war
Last synced: 13 Aug 2025
https://github.com/ahuang11/ahlive
animate your data to life
ahlive animate animation data gif matplotlib xarray
Last synced: 17 Mar 2025
https://github.com/xability/maidr-legacy
[DEPRECATED prototype] Multimodal Access and Interactive Data Representation
ai blind braille chart data description image impairments llm low-vision multimodality plot representation science sonification tactile visual visualization
Last synced: 12 Feb 2026
https://github.com/lolleko/mesh-data-synthesizer
Uses Unreal Engine & Cesium to generate large synthetic datasets from 3D meshes. Enables machine learning tasks like Visual Place Recognition; read more in our paper: https://meshvpr.github.io
cesium data geospatial machine-learning mesh place-recognition synthesis synthesizer ue5 unreal-engine
Last synced: 28 Apr 2025
https://github.com/feup-infolab/dendro
"Open-source Dropbox" with added description features. It is a data storage and description platform designed to help researchers and other users to describe their data files, built on Linked Open Data and ontologies. Users can use Dendro to publish data to CKAN, Zenodo, DSpace or EUDAT's B2Share and others.
data dendro dendro-platform infolab invenio linked-data rdm research
Last synced: 13 Jul 2025