An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/itachi-uchiha581/auto-data

Auto Data is a library designed for quick and effortless creation of datasets tailored for fine-tuning Large Language Models (LLMs).

ai data finetuning-large-language-models finetuning-llms generative-ai llm llm-training python python3

Last synced: 20 Sep 2025

https://github.com/debruine/faux

R functions for simulating factorial datasets

data simulation

Last synced: 28 Aug 2025

https://github.com/ERDDAP/erddap

ERDDAP is a scientific data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP is a Free and Open Source (Apache and Apache-like) Java Servlet from NOAA NMFS SWFSC Environmental Research Division (ERD).

data environmental erddap noaa scientific server

Last synced: 08 May 2025

https://github.com/purarue/google_takeout_parser

A library/CLI tool to parse data out of your Google Takeout (History, Activity, YouTube, Locations, etc.)

backup data export google google-location-history google-takeout

Last synced: 14 Jun 2025

https://github.com/1n3/powerexfil

A collection of data exfiltration scripts for Red Team assessments.

data exfil exfiltration hacking powershell redteam redteaming script scripts tool tools

Last synced: 08 Aug 2025

https://github.com/joelgmsec/fakedatagen

Full Valid Fake Data Generator

data fake full generator valid

Last synced: 24 Apr 2025

https://github.com/zbrookle/dataframe_sql

A Python package that parses SQL and interprets it as methods that act upon existing pandas (or other types of) DataFrames that have been declared and registered

data dataframes pandas python sql

Last synced: 20 Aug 2025

https://github.com/ralyodio/humanparser

Parse a human name string into salutation, first name, middle name, last name, suffix.

data es6 javascript parsing scraping

Last synced: 13 Aug 2025

https://github.com/jason89521/daxus

Daxus is a server state management library for React that provides full control over data, leading to a better user experience.

cache data dedupe hook react revalidate server-state-management user-experience

Last synced: 23 Jun 2025

https://github.com/saschagobel/legislatoR

Interface to the Comparative Legislators Database

data dataset legislators parliament political-science politicians politics r wikipedia

Last synced: 13 Jul 2025

https://github.com/geostatsguy/geodatasets

Synthetic datasets for geoscience (geo)statistical modeling

data database spatial-data

Last synced: 26 Oct 2025

https://github.com/bukalapak/ktpextractor

This service takes a KTP image as input and extracts the data on the KTP as output. It is part of an open source project by the Data Scientists of Bukalapak.

data datascience

Last synced: 01 Aug 2025

https://github.com/vr-25/migrator

A backup solution and data migration utility for Android

android appdata backup data factoryreset magisk migation migrate titaniumbackup

Last synced: 08 Jul 2025

https://github.com/gabrieldim/advanced-programming

Generic programming, generic classes, maps, sets, abstract data types and so on.

abstarct class data data-type data-types generic generic-programming generics interface interfaces map set

Last synced: 10 Jul 2025

https://github.com/neurosnap/cofx

A node and javascript library that helps developers describe side-effects as data in a declarative, flexible API.

asynchronous cofx data javascript node promise side-effects yield

Last synced: 14 Apr 2025

https://github.com/airbytehq/airbyte-agent-connectors

🐙 Drop-in tools that give AI agents reliable, permission-aware access to external systems.

ai ai-agents airbyte anthropic connectors data enterprise gemini integrations langchain llm mcp open-source openai pydantic-ai rag

Last synced: 23 Jan 2026

https://github.com/anthonybudd/S4

S4 is 100% S3 compatible storage, accessed through Tor and distributed using IPFS.

data docker ipfs object-storage s3 s4 storage

Last synced: 07 Apr 2025

https://github.com/cncf/surveys

📝📊 CNCF Survey Data

cncf data surveys

Last synced: 26 Oct 2025

https://github.com/anchore/vunnel

Tool for collecting vulnerability data from various sources (used to build the grype database)

data grype hacktoberfest vulnerability

Last synced: 08 Jan 2026

https://github.com/synthesized-io/fairlens

Identify bias and measure fairness of your data

bias data data-analysis data-science fairness ml pandas python statistics

Last synced: 24 Jun 2025

https://github.com/mattphillips/jest-each

A parameterised testing library for Jest. https://www.npmjs.com/package/jest-each 🏃

data each jest parameterised test

Last synced: 13 Apr 2025

https://github.com/jobehi/isthistechdead

The place where your favourite framework will be resting

data metrics tech

Last synced: 19 Jun 2025

https://github.com/adzz/data_schema

Declarative schemas for data transformations.

data data-parsing elixir functional-programming types validation

Last synced: 20 Jul 2025

https://github.com/nucs/cryptocurrency-ticks-data

590 days of trade ticks on BTC/ETH/LTC/NEO to USDT

crypto data stock-data ticks

Last synced: 17 Aug 2025

https://github.com/googlecloudplatform/dlp-dataflow-deidentification

Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP

beam bigquery data dataflow dlp pii tokenization

Last synced: 11 Apr 2025

https://github.com/yuxqiu/modern-poetry

The most comprehensive database of modern Chinese poetry and foreign poetry

data json poems poetry translation

Last synced: 16 Jan 2026

https://github.com/contextdata/vectoretl

Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications

cohere data datapipeline etl etl-framework etl-pipeline openai pinecone python qdrant qdrant-vector-database unstructured vector-database weaviate

Last synced: 09 Apr 2025

https://github.com/saschagrunert/rain

Visualize vertical data inside your terminal 💦

data log logger rain rust terminal

Last synced: 09 Apr 2025

https://github.com/phyphox/phyphox-arduino

The phyphox BLE library to connect Arduino projects with the phyphox app to display data on the phone or use the phone's sensors on the Arduino

arduino ble bluetooth bluetooth-low-energy data phyphox sensors

Last synced: 16 Jan 2026

https://github.com/tmcw/simpleopendata

simple guidelines for publishing open data in useful formats

copleft copyright data formats government licensing open

Last synced: 12 Nov 2025

https://github.com/empower-ai/sql-agent

AI agent that helps you do data analytics with natural language.

analytics bigquery chatgpt chatgpt-bot data data-analytics data-science mysql postgresql slack slack-bot slackbot

Last synced: 11 Apr 2025

https://github.com/aws-solutions/automated-data-analytics-on-aws

The Automated Data Analytics on AWS solution provides an end-to-end data platform for ingesting, transforming, managing and querying datasets. This helps analysts and business users manage and gain insights from data without deep technical experience using Amazon Web Services (AWS).

analytics automated aws data

Last synced: 17 Apr 2025

https://github.com/josephrp/datatonic

🌟DataTonic: A Data-Capable AGI-style Agent Builder of Agents that creates swarms, runs commands, and securely processes and creates datasets, databases, visualisations, and analyses.

agent-builder agi autogen azure chroma data data-science data-visualization database memgpt semantic-kernel semantic-memory taskweaver

Last synced: 11 Oct 2025

https://github.com/leinstay/steamdb

JSON file of all games available on Steam with prices and additional data from Steam Spy, GameFAQs, Metacritic, IGDB and HLTB.

data gamefaqs games history hltb igdb json steam steamspy

Last synced: 22 Apr 2025

https://github.com/Azure/azure-data-labs-modules

A list of Terraform modules to build your Azure Data IaC templates.

analytics azure data github github-actions labs terraform terraform-modules

Last synced: 06 May 2025

https://github.com/joaocarmo/react-smart-data-table

A smart data table component for React meant to be configuration free

data data-table data-visualization plug-and-play react

Last synced: 13 Apr 2025

https://github.com/stanfordnlp/edu-convokit

Edu-ConvoKit: An Open-Source Framework for Education Conversation Data

data data-analysis data-science education language natural-language-processing

Last synced: 15 Apr 2025

https://github.com/JujuAdams/SNAP

Data format converters for GameMaker LTS 2022

array data gamemaker gamemaker-studio-2 gms2 ini json messagepack struct xml

Last synced: 01 Apr 2025

https://github.com/volorf/paster

Paste text data from the clipboard directly into Sketch text layers [Sketch plugin]

clipboard data plugin sketch sketch-plugin text

Last synced: 21 Mar 2025

https://github.com/richienb/ros-data-waster

The easiest way to waste your data.

data html waste

Last synced: 19 Jun 2025

https://github.com/slowkow/tftargets

:dart: Human transcription factor target genes from 6 databases in convenient R format.

bioinformatics data rstats transcription-factors

Last synced: 14 Apr 2025

https://github.com/azure/azure-data-labs-modules

A list of Terraform modules to build your Azure Data IaC templates.

analytics azure data github github-actions labs terraform terraform-modules

Last synced: 05 Jul 2025

https://github.com/Baukebrenninkmeijer/table-evaluator

Evaluate real and synthetic datasets against each other

data data-evaluation evaluation generation synthetic synthetic-data table-evaluator

Last synced: 02 May 2025

https://github.com/ropensci/opentripplanner

An R package to set up and use OpenTripPlanner (OTP) as a local or remote multimodal trip planner.

data isochrones java opentripplanner otp public-transport r routing transport transportation-planning

Last synced: 08 Oct 2025

https://github.com/open-discourse/open-discourse

Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).

bundestag corpus data hacktoberfest

Last synced: 14 Mar 2025

https://github.com/ContextData/VectorETL

Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications

cohere data datapipeline etl etl-framework etl-pipeline openai pinecone python qdrant qdrant-vector-database unstructured vector-database weaviate

Last synced: 22 Sep 2025

https://github.com/drivy/checker_jobs

Regression testing for data

data regression-testing ruby sidekiq

Last synced: 06 Apr 2025

https://github.com/JoelGMSec/FakeDataGen

Full Valid Fake Data Generator

data fake full generator valid

Last synced: 12 Jul 2025

https://github.com/hebilicious/vue-query-nuxt

A lightweight, 0 config Nuxt Module for Vue Query.

data data-fetching fetch nuxt react-query tanstack tanstack-query vue vue-query

Last synced: 04 Apr 2025

https://github.com/torkleyy/nitric

[ABANDONED] General-purpose data processing library. Mirror of https://gitlab.com/nitric/nitric

data ecs entity-component processing

Last synced: 20 Aug 2025

https://github.com/fityannugroho/idn-area-map

The map of Indonesia's administrative areas 🇮🇩🌏

data hacktoberfest idn-area indonesia island map nextjs tailwindcss wilayah

Last synced: 07 Apr 2025

https://github.com/jbzoo/data

Extended implementation of ArrayObject - a useful collection for any config in your system (write, read, store, change, validate, convert to other formats, etc.).

arrayobject config converts data filters ini jbzoo php yml

Last synced: 05 Apr 2025

https://github.com/opensource-observer/oss-directory

A curated directory of open source software (OSS) projects and their associated artifacts

data github open-source public-goods research

Last synced: 08 Oct 2025

https://github.com/uwdata/flechette

Fast, lightweight access to Apache Arrow data.

arrow data interchange

Last synced: 04 Apr 2025

https://github.com/wildflowai/platform

Model natural ecosystems 🌎🪸🐳

ai biodiversity conservation data ocean restoration

Last synced: 25 Nov 2025

https://github.com/finos/datahub

DataHub - Synthetic data library

data library pandas python sklearn synthetic

Last synced: 30 Sep 2025

https://github.com/ngxs-labs/data

NGXS Persistence API

data entity ngxs ngxs-persistence-api

Last synced: 24 Apr 2025

https://github.com/spine-tools/Spine-Toolbox

Spine Toolbox is an open source Python package to manage data, scenarios and workflows for modelling and simulation. You can have your local workflow, but work as a team through version control and SQL databases.

anaconda data energy miniconda python simulation-model spine-toolbox workflow

Last synced: 07 May 2025

https://github.com/visgl/deck.gl-data

Data for the examples of the data visualization library deck.gl (https://uber.github.io/deck.gl/#/)

data data-science data-visualization uber

Last synced: 12 Jun 2025

https://github.com/ashvin27/react-datatable

React-datatable is a component that lets you create a multifunctional table from a single component, like jQuery Datatable. It's fully customizable and easy to integrate into any React component. Bootstrap compatible.

data datatables datatables-plugin react react-data-table react-datagrid react-datatable react-table table

Last synced: 13 May 2025

https://github.com/smappnyu/youtube-data-api

A Python client for collecting and parsing public data from the YouTube Data API

api api-wrapper data python python-client research research-tool youtube youtube-api-v3 youtube-search

Last synced: 28 Oct 2025

https://github.com/turbot/steampipe-postgres-fdw

The Steampipe foreign data wrapper (FDW) is a zero-ETL product that provides Postgres foreign tables which translate queries into API calls to cloud services and APIs. It's bundled with Steampipe and also available as a set of standalone extensions for use in your own Postgres database.

aws azure data devsecops gcp golang hacktoberfest kubernetes postgres postgresql postgresql-fdw security sql steampipe steampipe-engine

Last synced: 07 May 2025

https://github.com/apple/dnikit

A Python toolkit for analyzing machine learning models and datasets.

ai bias compression data data-duplication fairness fairness-ml introspection machine-learning ml python

Last synced: 19 Oct 2025

https://github.com/queryverse/iterabletables.jl

Implementations of the TableTraits.jl interface for various packages

data julia queryverse

Last synced: 12 Apr 2025

https://github.com/purarue/hpi

Human Programming Interface - a way to unify, access and interact with all of my personal data [my modules]

data gdpr history lifelogging personal-api quantified-self

Last synced: 04 Jul 2025

https://github.com/opennem/opennem

Australian energy market data platform

aemo climate data energy nem nemweb openelectricity opennem superpower wem

Last synced: 12 Apr 2025

https://github.com/textileio/textile-facebook

[DEPRECATED] simple parsing tool to get your data out of a facebook export

data exporters photography privacy

Last synced: 05 Jan 2026

https://github.com/tirthajyoti/synthetic-data-gen

Various methods for generating synthetic data for data science and ML

classification data data-science machine-learning python regression symbolic-computation time-series

Last synced: 30 Apr 2025

https://github.com/mainakrepositor/datasets

A bunch of some 200 datasets. You can call it mini-kaggle :)

csv data data-science database datasets image-files mini-kaggle ml nlp-machine-learning tsv

Last synced: 01 Mar 2025

https://github.com/trailheadapps/coral-cloud

Sample application that showcases Data Cloud, Agents and Prompts.

agents ai cloud data prompt salesforce

Last synced: 05 Apr 2025

https://github.com/piquette/qtrn

A cli tool to streamline financial markets data analysis :wrench:

cli data data-science finance go golang options quotes scraper stock stock-analysis stock-market

Last synced: 15 May 2025

https://github.com/mydataharbor/mydataharbor

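:cn: MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware dedicated to moving data from any data source to any other data source. It helps users reliably, quickly, and stably perform near-real-time incremental synchronization or scheduled full synchronization of massive amounts of data. It is primarily positioned to serve real-time transaction systems, and can also be used for big data synchronization (the ETL domain).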

data data-sync elasticsearch etl java jdbc kafka mysql pipeline redis

Last synced: 19 Apr 2025

https://github.com/melroy89/metacritic_api

PHP Metacritic API - Mirror from my GitLab

api crawler data metacritic parser php scores scraper webscraping

Last synced: 13 May 2025

https://github.com/visivo-io/visivo

✨ Build dashboards with end-to-end version control. 🔋 CLI w/ batteries included, no infra required. Develop on your laptop for instant results, deploy changes safely (with automated checks), and keep every report trustworthy for stakeholders, analysts and agents 🤖

analytics bi bi-analytics bi-as-code business-intelligence data data-analysis data-visualization duckdb plotlyjs pydantic python reactjs sql

Last synced: 16 Oct 2025

https://github.com/cipherstash/jseql

Encrypt and protect data using industry standard algorithms, field level encryption, a unique data key per record, bulk encryption operations, and decryption level identity verification.

data data-security encryption javascript postgres postgresql security typescript

Last synced: 09 Apr 2025

https://github.com/countries/countries-data-json

ISO 3166 country information in JSON format to be included in other projects.

countries currency data iso-3166-1 iso-3166-2 iso-4217 json

Last synced: 18 Jan 2026

https://github.com/capitalone/dataCompareR

dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.

compare-data data data-analysis data-science r

Last synced: 30 Jul 2025

https://github.com/kaustubhhiware/facebook-archive

Just some fun you can have with facebook's archive data

data data-visualization facebook python

Last synced: 15 Apr 2025

https://github.com/rsheftel/raccoon

Python DataFrame with fast inserts and appends

data dataframe frame pandas

Last synced: 03 Apr 2025

https://github.com/pdil/usmap

🗺 Create US maps including Alaska and Hawaii in R

counties data fips geodata mapping r states usa

Last synced: 07 Apr 2025

https://github.com/geonetwork/geonetwork-ui

GeoNetwork UI is a suite of Applications made to provide a modern facade to your GeoNetwork 4 catalog. It also provides Web Components to embed various parts of your data catalog in third party websites.

angular data geonetwork gis ui webcomponents

Last synced: 10 Apr 2025