An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/Gmousse/dataframe-js

A JavaScript library providing a new data structure for data scientists and developers

data data-frame dataframe datascience datastructures functional groupby javascript manipulation matrix sql sql-syntax

Last synced: 15 Mar 2025

https://github.com/geopython/pygeoapi

pygeoapi is a Python server implementation of the OGC API suite of standards. The project emerged as part of the next generation OGC API efforts in 2018 and provides the capability for organizations to deploy a RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML. pygeoapi is open source and released under an MIT license.

api data geospatial ogc ogc-api osgeo pygeoapi

Last synced: 09 Apr 2025

https://github.com/elementary-data/dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

analytics analytics-engineering data data-lineage data-observability data-pipeline-monitoring data-pipelines data-reliability dbt dbt-artifacts dbt-packages dbt-tests

Last synced: 16 May 2025

https://github.com/Mohitkr95/Best-Data-Science-Resources

This repository contains the best hand-picked, free Data Science resources to equip you with industry-driven skills and an interview preparation kit.

ai artificial-intelligence artificial-intelligence-algorithms aws computer-vision data data-structures datascience deep-learning git github jupyter-notebook machine-learning mongodb natural-language-processing neural-network python sql statistics

Last synced: 07 May 2025

https://github.com/vincent-pradeilles/keypathkit

KeyPathKit is a library that provides the standard functions to manipulate data along with a call-syntax that relies on typed keypaths to make the call sites as short and clean as possible.

data sql swift

Last synced: 06 Apr 2025

https://github.com/bvaughn/suspense

Utilities for working with React Suspense

async caching data fetching loading react suspense

Last synced: 16 May 2025

https://github.com/ydataai/ydata-quality

Data Quality assessment with one line of code

data machine-learning pandas python quality-assessment

Last synced: 30 Apr 2025

https://github.com/DataScienceUB/introduction-datascience-python-book

Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications

analytics data data-science datascience machine-learning python sentiment-analysis

Last synced: 19 Jul 2025

https://github.com/randomfractals/geo-data-viewer

Geo Data Analytics tool for VSCode IDE with kepler.gl support to generate and view maps 🗺️ without any Python 🐍, IPyWidgets ⚙️, pandas 🐼, Jupyter notebooks 📚, or ReactJS ⚛️ app code.

data data-analytics geo keplergl map spatial tool viewer vscode

Last synced: 21 Mar 2025

https://github.com/reubano/meza

A Python toolkit for processing tabular data

csv data excel featured functional-programming library pandas tabular-data xlsx xml

Last synced: 14 May 2025

https://github.com/rebecca-vickery/data-science-learning-resources

A comprehensive list of free resources for learning data science

artificial-intelligence data data-science machine-learning python

Last synced: 26 Apr 2025

https://github.com/zhaoyachao/zdh_web

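Big data collection and extraction platform. zdh_web is the visual management platform for the zdh series of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.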

bigdata collection data data-collection datapipeline datax-web etl pipline scheduler spark sparketl

Last synced: 04 Apr 2025

https://github.com/kunalj101/Data-Science-Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 05 May 2025

https://github.com/cosmicmind/samples

Sample projects using Material, Graph, and Algorithm.

algorithm cosmicmind data data-structure database graph material material-design projects swift swift-3 us

Last synced: 29 Mar 2025

https://github.com/aiguofer/gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets

Last synced: 15 May 2025

https://github.com/markpflug/sylvan

A collection of .NET libraries, including the fastest general-purpose CSV parser for .NET.

csv data dotnet dotnet-core

Last synced: 14 May 2025

https://github.com/opengvlab/scalecua

ScaleCUA provides open-source computer use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android).

computer-use-agents data gui-agents models online-evaluation-suite scalecua

Last synced: 28 Oct 2025

https://github.com/datalayer/jupyter-ui

🪐 ⚛️ React.js components 💯% compatible with 🪐 Jupyter https://jupyter-ui-storybook.datalayer.tech

data data-product data-science data-visualisation datalayer ipywidgets jupyter jupyterlab lumino notebook reactjs ui

Last synced: 15 May 2025

https://github.com/quickbirdeng/surveykit

Android library to create beautiful surveys (aligned with ResearchKit on iOS)

android android-library clinical-trials data forms ios-researchkit-surveys kotlin kotlin-android medical poll research study-conduct survey

Last synced: 23 Jun 2025

https://github.com/thebowja/genshin-db

npm package with search functions for Genshin Impact data in all in-game languages. Data is parsed and organized directly from the GenshinData repo.

data genshin genshin-impact genshinimpact javascript json

Last synced: 14 May 2025

https://github.com/microsoft/ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

crawler data github github-api github-webhooks ospo

Last synced: 27 Sep 2025

https://github.com/semente/django-smuggler

Django Smuggler is a pluggable application for the Django web framework that helps you import and export fixtures via the automatically generated administration interface.

admin backup data database django fixtures python

Last synced: 29 Mar 2025

https://github.com/ka8725/migration_data

Migrate data along with schema migrations in Rails and keep them up to date.

data data-migration migrations rails

Last synced: 16 May 2025

https://github.com/rpbouman/huey

Light-weight, browser-based ROLAP pivot tables on top of DuckDB-WASM

data duckdb excel pivot-tables rolap small-data sql

Last synced: 24 Mar 2025

https://github.com/SheetJS/j

:x: Multi-format spreadsheet CLI (now merged into http://github.com/sheetjs/js-xlsx)

cli csv data dbf dif excel javascript markdown ods prn spreadsheet sylk xls xlsx

Last synced: 22 Jul 2025

https://github.com/princeton-nlp/less

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

data data-selection influence instruction-tuning llama llm mistral

Last synced: 05 Apr 2025

https://github.com/intenthq/anon

A UNIX Command To Anonymise Data

anonymity anonymization cli csv data go golang

Last synced: 26 Mar 2025

https://github.com/compas-dev/compas

Main library of the COMPAS framework and CAD integrations for Rhino/GH and Blender.

aec blender3d data datastructures geometry grasshopper3d rhino3d rpc

Last synced: 24 Oct 2025

https://github.com/sheetjs/js-crc32

:cyclone: JS standard CRC-32 and CRC32C implementation

bytes checksum crc crc-32 crc32c data

Last synced: 14 Apr 2025

https://github.com/openlists/ElectrophysiologyData

A list of openly available datasets in (mostly human) electrophysiology.

data ecog eeg electrophysiological-data electrophysiology lfp meg open-data open-science research

Last synced: 09 May 2025

https://github.com/mtahiraslan/data-analyst-roadmap

Based on my own experience, I think this roadmap will answer the main questions about becoming a data analyst from zero: which technologies and programming languages are worth knowing, which soft skills are needed, and how to start a professional career in this field.

blogs businessintelligence courses data dataanalysis dataanalyst excel interview mtahiraslan powerbi programming python resources resume roadmap softskills sql statistics tableau tutorials

Last synced: 14 Mar 2025

https://github.com/zutianbiao/baize

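Baize (白泽) automated operations system: configuration management, network probing, asset management, business management, CMDB, CD, DevOps, job orchestration, task orchestration, and other features. Monitoring, alerting, log analysis, and big data analysis are planned for the future.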

analyse ansible big-data cmdb crontab data devops django log monitor ops python

Last synced: 09 May 2025

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 14 Oct 2025

https://github.com/maxlath/wikibase-sdk

JS utils functions to query a Wikibase instance and simplify its results

api data simplify sparql wikibase wikidata

Last synced: 12 Dec 2025

https://github.com/splitgraph/sgr

sgr (command line client for Splitgraph) and the splitgraph Python library

data data-version-control developer-tools postgres postgresql python sql

Last synced: 13 Apr 2025

https://github.com/anilogia/animedb

A database listing anime works spanning roughly 100 years

anime data

Last synced: 02 Apr 2025

https://github.com/redis/riot

🧨 Get data in & out of Redis with RIOT

cli data database faker file generator rdbms redis stream streaming

Last synced: 15 May 2025

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 21 Oct 2025

https://github.com/micromata/http-fake-backend

Build a fake backend by providing the content of JSON files or JavaScript objects through configurable routes.

api backend data fake fake-data http http-server json json-files mock mocking mocking-server mocks node nodejs rest rest-api restful restful-api

Last synced: 24 Mar 2025

https://github.com/tokern/piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub

aws-athena aws-glue aws-redshift catalog data data-catalog database phi pii python snowflake

Last synced: 12 Apr 2025

https://github.com/ag-grid/ag-charts

AG Charts is a fully-featured and highly customizable JavaScript charting library. The professional choice for developers building enterprise applications

angular angular-charts angular-component angular-graphs charts data data-component graph graphs react react-charts react-graphs reactjs vue vuejs

Last synced: 14 May 2025

https://github.com/holtzy/react-graph-gallery

A set of graph examples showing how to make react and d3.js work together

d3js data dataviz react

Last synced: 24 Mar 2025

https://github.com/juliadata/tables.jl

An interface for tables in Julia

data hacktoberfest interfaces julia tables

Last synced: 14 May 2025

https://github.com/infuseai/artivc

A version control system to manage large files.

data machinelearning storage version-control

Last synced: 01 Aug 2025

https://github.com/yamafaktory/hypergraph

Hypergraph is a data structure library for creating a directed hypergraph in which a hyperedge can join any number of vertices.

data data-science data-structure data-structures hypergraph hypergraphs rust rust-lang rustlang

Last synced: 15 May 2025

https://github.com/flyteorg/flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.

automation data data-science extensible flyte flyte-tasks hacktoberfest mlops pypi python sdk spark workflows

Last synced: 12 Jan 2026

https://github.com/noobaa/noobaa-core

High-performance S3 gateway to any backend (file, s3-compatible, multi-clouds, caching, replication...)

data hybrid mirroring multi-cloud nodejs noobaa s3 storage tiering

Last synced: 16 May 2025

https://github.com/openml/openml-python

OpenML's Python API for a World of Data and More 💫

benchmarking data datascience machine-learning meta-learning openml python tabular-data

Last synced: 10 Apr 2025

https://github.com/ZumoLabs/zpy

Synthetic data for computer vision. An open source toolkit using Blender and Python.

ai blender blender-addon computer-vision data deep-learning ml python synthetic synthetic-data

Last synced: 11 May 2025

https://github.com/dodiku/AudioOwl

Fast and simple music and audio analysis using RNN in Python 🕵️‍♀️ 🥁

analysis audio audio-features beat-detection beats data feature-extraction machine-learning mir ml music music-information-retrieval pip pypi python rnn sample-rate tempo

Last synced: 14 Apr 2025

https://github.com/felipenoris/xlsx.jl

Excel file reader and writer for the Julia language.

data excel julia julia-language

Last synced: 15 May 2025

https://github.com/hodur-org/hodur-engine

Hodur is a domain modeling approach and collection of libraries for Clojure. By using Hodur you can define your domain model as data, parse and validate it, and then either consume your model via an API or use one of the many plugins to help you achieve mechanical results faster and in a purely functional manner.

clojure data modeling schema

Last synced: 12 Dec 2025

https://github.com/wll8/mockm

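Handles the various problems front-end developers face at the API stage, such as quickly generating APIs, creating data, and deploying pages; works out of the box and is easy to migrate. A framework based on Express. It can quickly generate APIs and create data, ready for deployment out of the box.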

api data dummy fake json mock mocking prototyping rest restfull sandbox server test testing

Last synced: 15 May 2025

https://github.com/stocknear/backend

Backend of stocknear - Open Source Stock Analysis

data data-science fastapi fastify finance javascript machine-learning nodejs pocketbase python redis

Last synced: 16 May 2025

https://github.com/nshiab/simple-data-analysis

Easy-to-use and high-performance JavaScript library for data analysis. Works with tabular and geospatial data.

analysis bun data data-analysis data-science duckdb geospatial javascript node node-js nodejs spatial spatial-analysis sql typescript

Last synced: 17 Jan 2026

https://github.com/shahinrostami/plotapi

Engaging visualisations, made easy.

data data-science data-visualization plotting python visualization

Last synced: 22 Jan 2026

https://github.com/solnic/drops

🛠️ Tools for working with data effectively - data contracts using types, schemas, domain validation rules, type-safe casting, and more.

data elixir elixir-lang elixir-library json schema validation

Last synced: 15 May 2025

https://github.com/adamhl8/filterql

A tiny query language for filtering structured data

data filter filtering filterql language query query-language typescript

Last synced: 04 Oct 2025

https://github.com/argoproj-labs/old-argo-dataflow

Dataflow is a Kubernetes-native platform for executing large parallel data-processing pipelines.

data jetstream kafka kubernetes pipeline

Last synced: 16 May 2025

https://github.com/spamscanner/spamscanner

Spam Scanner is a Node.js anti-spam, email filtering, and phishing prevention tool and service. Built for @ladjs, @forwardemail, @cabinjs, @breejs, and @lassjs.

anti-spam api data enron javascript node rspam rspamd scanner service set spam spam-classification spam-classifier spam-detection spam-filter spam-filtering spam-prevention spam-protection spamassassin

Last synced: 30 Sep 2025

https://github.com/Bears-R-Us/arkouda

Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale :bear:

chapel data data-analysis data-science distributed-computing eda hpc python

Last synced: 08 Jul 2025

https://github.com/markpflug/sylvan.data.excel

The fastest .NET library for reading Excel data files.

data dotnet dotnet-core excel xls xlsb xlsx

Last synced: 15 May 2025

https://github.com/empower-ai/dsensei

AI-powered key driver analysis tool that pinpoints the root cause behind metric fluctuations in one minute.

analytics business-analytics business-intelligence data data-analytics data-insights data-science

Last synced: 01 Aug 2025

https://github.com/pluginpal/strapi-plugin-config-sync

:recycle: CLI & GUI for continuous migration of config data across environments

config config-sync data dtap dump environments migrate strapi strapi-plugin sync

Last synced: 16 May 2025

https://github.com/oliver006/elasticsearch-test-data

Generate and upload test data to Elasticsearch for performance and load testing

data elasticsearch python test-data tornado

Last synced: 05 Apr 2025

https://github.com/houbb/nlp-hanzi-similar

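A hanzi similarity tool: computes the similarity between Chinese characters using an algorithm for visually similar characters. It can be used for correcting handwritten character recognition, text obfuscation, and so on.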

chinese data han nlp ocr word-correction

Last synced: 07 Apr 2025

https://github.com/martymac/fpart

Sort files and pack them into partitions

bigdata cpio data migration packing parallel rsync tar

Last synced: 21 Oct 2025