Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with data

A curated list of projects in awesome lists tagged with data.

https://github.com/tanstack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

async cache data fetch graphql hooks query react rest solid stale stale-while-revalidate svelte typescript update vue

Last synced: 16 Dec 2024

https://github.com/TanStack/query

🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

async cache data fetch graphql hooks query react rest stale stale-while-revalidate update

Last synced: 25 Oct 2024

https://github.com/metabase/metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋

analytics bi business-intelligence businessintelligence clojure dashboard data data-analysis data-visualization database metabase mysql postgres postgresql reporting slack sql-editor visualization

Last synced: 16 Dec 2024

https://github.com/run-llama/llama_index

LlamaIndex is a data framework for your LLM applications

agents application data fine-tuning framework llamaindex llm multi-agents rag vector-database

Last synced: 16 Dec 2024

https://github.com/sheetjs/sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

angular bun csv data database deno excel grid html html5 ios javascript json nodejs react spreadsheet table vue xlsx xml

Last synced: 16 Dec 2024

https://github.com/SheetJS/js-xlsx

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

angular bun csv data database deno excel grid html html5 ios javascript json nodejs react spreadsheet table vue xlsx xml

Last synced: 13 Nov 2024

https://github.com/SheetJS/sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

angular bun csv data database deno excel grid html html5 ios javascript json nodejs react spreadsheet table vue xlsx xml

Last synced: 25 Oct 2024

https://github.com/SheetJS/js-xls

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

angular bun csv data database deno excel grid html html5 ios javascript json nodejs react spreadsheet table vue xlsx xml

Last synced: 08 Sep 2024

https://github.com/mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping

Last synced: 16 Dec 2024

https://github.com/fivethirtyeight/data

Data and code behind the articles and graphics at FiveThirtyEight

data

Last synced: 16 Dec 2024

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 16 Dec 2024

https://github.com/prefecthq/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

automation data data-engineering data-ops data-science infrastructure ml-ops observability orchestration pipeline prefect python workflow workflow-engine

Last synced: 16 Dec 2024

https://github.com/prestodb/presto

The official home of the Presto distributed SQL query engine for big data

big-data data hadoop hive java lakehouse presto query sql

Last synced: 16 Dec 2024

https://github.com/PrefectHQ/prefect

Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines

automation data data-engineering data-ops data-science infrastructure ml-ops observability orchestration pipeline prefect python workflow workflow-engine

Last synced: 29 Oct 2024

https://github.com/sinaptik-ai/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 16 Dec 2024

https://github.com/Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 29 Oct 2024

https://github.com/faker-js/faker

Generate massive amounts of fake data in the browser and node.js

browser data fake faker javascript nodejs

Last synced: 16 Dec 2024

https://github.com/pwxcoo/chinese-xinhua

📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.

chinese chinese-characters chinese-language chinese-nlp chinese-simplified chinese-traditional data json json-data json-dataset python3 scraper

Last synced: 17 Dec 2024

https://github.com/apple/pkl

A configuration as code language with rich validation and tooling.

config configuration data functional java json kotlin language object-oriented pkl programming-language properties propertylist validation xml yaml

Last synced: 16 Dec 2024

https://github.com/prql/prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

data pipeline sql

Last synced: 16 Dec 2024

https://github.com/akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 16 Dec 2024

https://github.com/PRQL/prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

data pipeline sql

Last synced: 29 Oct 2024

https://github.com/bchavez/bogus

📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

bogus c-sharp csharp data data-access-layer data-generator database dotnet fake faker generator poco seed test-data

Last synced: 16 Dec 2024

https://github.com/rawgraphs/rawgraphs-app

A web interface to create custom vector-based visualizations on top of RAWGraphs core

d3js data data-visualization ddj design rawgraphs svg vector-graphics visualization

Last synced: 25 Oct 2024

https://github.com/bchavez/Bogus

📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

bogus c-sharp csharp data data-access-layer data-generator database dotnet fake faker generator poco seed test-data

Last synced: 27 Oct 2024

https://github.com/DataExpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

apachespark awesome bigdata data dataengineering sql

Last synced: 05 Nov 2024

https://github.com/mrdbourke/machine-learning-roadmap

A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.

data data-science deep-learning machine-learning

Last synced: 30 Nov 2024

https://github.com/dformoso/machine-learning-mindmap

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

cheatsheet data jupyter learning machine machine-learning mindmap python science

Last synced: 19 Dec 2024

https://github.com/flyteorg/flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

data data-analysis data-science dataops declarative fine-tuning flyte golang grpc hacktoberfest kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production python scale workflow

Last synced: 16 Dec 2024

https://github.com/axa-group/parsr

Transforms PDF, Documents and Images into Enriched Structured Data

data document extraction hacktoberfest images nlp ocr parsr pdf python typescript

Last synced: 17 Dec 2024

https://github.com/axa-group/Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

data document extraction hacktoberfest images nlp ocr parsr pdf python typescript

Last synced: 25 Oct 2024

https://github.com/countly/countly-server

Countly is a product analytics platform that helps teams track, analyze and act on their user actions and behaviour on mobile, web and desktop applications.

analytics coppa crash-analytics crash-reports dashboard data data-ownership data-privacy feature-flags gdpr hipaa insights mobile-analytics product-analytics product-management push-notifications remote-configuration tracking user-feedback web-analytics

Last synced: 17 Dec 2024

https://github.com/airbnb/knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.

data data-analysis data-science knowledge

Last synced: 17 Dec 2024

https://github.com/Countly/countly-server

Countly is a product analytics platform that helps teams track, analyze and act on their user actions and behaviour on mobile, web and desktop applications.

analytics coppa crash-analytics crash-reports dashboard data data-ownership data-privacy feature-flags gdpr hipaa insights mobile-analytics product-analytics product-management push-notifications remote-configuration tracking user-feedback web-analytics

Last synced: 24 Oct 2024

https://github.com/cue-lang/cue

The home of the CUE language! Validate and define text-based and dynamic configuration

configuration data kubernetes validation

Last synced: 21 Dec 2024

https://github.com/mdn/browser-compat-data

This repository contains compatibility data for Web technologies as displayed on MDN

compat compatibility data dataset json

Last synced: 16 Dec 2024

https://github.com/superduper-io/superduper

Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.

ai chatbot data database distributed-ml inference llm-inference llm-serving llmops ml mlops mongodb pretrained-models python pytorch rag semantic-search torch transformers vector-search

Last synced: 16 Dec 2024

https://github.com/brianvoe/gofakeit

Random fake data generator written in go

data fake generator go golang random seed

Last synced: 16 Dec 2024

https://github.com/brianvoe/Gofakeit

Random fake data generator written in go

data fake generator go golang random seed

Last synced: 24 Oct 2024

https://github.com/lk-geimfari/mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

data dataframe datascience dummy factory factory-boy fake fixtures generator json-generator mimesis mock pandas polars pytest-plugin python schema syntetic synthetic-data testing

Last synced: 16 Dec 2024

https://github.com/ckan/ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

api catalog ckan ckanext data digitalpublicgoods dpg open-data python sdg16

Last synced: 16 Dec 2024

https://github.com/tensorflow/datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

data dataset datasets jax machine-learning numpy tensorflow

Last synced: 17 Dec 2024

https://github.com/quartz/bad-data-guide

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

data documentation guide qz-things

Last synced: 03 Dec 2024

https://github.com/Quartz/bad-data-guide

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

data documentation guide qz-things

Last synced: 31 Oct 2024

https://github.com/glideapps/glide-data-grid

🚀 Glide Data Grid is a no-compromise, outrageously fast React data grid with rich rendering, first-class accessibility, and full TypeScript support.

accessible data datagrid grid javascript react reactjs table typescript

Last synced: 16 Dec 2024

https://github.com/jonschlinkert/gray-matter

Smarter YAML front matter parser, used by metalsmith, Gatsby, Netlify, Assemble, mapbox-gl, phenomic, vuejs vitepress, TinaCMS, Shopify Polaris, Ant Design, Astro, hashicorp, garden, slidev, saber, sourcegraph, and many others. Simple to use, and battle tested. Parses YAML by default but can also parse JSON Front Matter, Coffee Front Matter, TOML Front Matter, and has support for custom parsers. Please follow gray-matter's author: https://github.com/jonschlinkert

assemble config data front-matter front-matter-parsers frontmatter gatsby javascript jonschlinkert mapbox markdown matter metalsmith netlify node nodejs parse phenomic yaml

Last synced: 17 Dec 2024

https://github.com/tinyplex/tinybase

The reactive data store for local‑first apps.

data javascript react reactive typescript

Last synced: 29 Oct 2024

https://github.com/dtinit/data-transfer-project

The Data Transfer Project makes it easy for platforms to build interoperable user data portability features. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.

data data-portability portability transfer

Last synced: 18 Dec 2024

https://github.com/heroku/react-refetch

A simple, declarative, and composable way to fetch data for React components

api data fetch react rest

Last synced: 16 Dec 2024

https://github.com/ngneat/falso

All the Fake Data for All Your Real Needs 🙂

data fake fakedata mock mockdata random

Last synced: 16 Dec 2024

https://github.com/quadratichq/quadratic

Quadratic | Spreadsheet with Python, SQL, and AI

ai data data-analysis data-engineering data-science etl python quadratic spreadsheet sql wasm webgl

Last synced: 18 Dec 2024

https://github.com/uber/aresdb

A GPU-powered real-time analytics storage and query engine.

analytics cgo cuda data database golang gpu-programming query real-time storage

Last synced: 18 Dec 2024

https://github.com/weld-project/weld

High-performance runtime for data analytics applications

analytics code-generation data llvm machine-learning pandas performance rust stanford

Last synced: 19 Dec 2024

https://github.com/pydata/pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

data data-analysis dataset econdb economic-data fama-french finance financial-data fred html pandas pydata python stock-data

Last synced: 16 Dec 2024

https://github.com/montanaflynn/stats

A well tested and comprehensive Golang statistics library package with no dependencies.

algorithms analytics data go machine-learning math rounding statistics stats

Last synced: 16 Dec 2024

https://github.com/spotify/scio

A Scala API for Apache Beam and Google Cloud Dataflow.

batch beam bigquery data dataflow google-cloud ml scala scio streaming

Last synced: 17 Dec 2024

https://github.com/justinzm/gopup

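Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; Qianlima (rising-star) and unicorn companies; Xinwen Lianbo news transcripts; film and TV box-office data; university lists; epidemic data…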

covid19-data data data-analysis data-science datasets economic-data gopup index-data python

Last synced: 19 Dec 2024

https://github.com/dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

data data-engineering data-lake data-loading data-warehouse elt extract load python transform

Last synced: 29 Oct 2024

https://github.com/kayak/pypika

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

builder data functional python python3 pythonic query sql

Last synced: 16 Dec 2024

https://github.com/unsplash/datasets

🎁 5,400,000+ Unsplash images made available for research and machine learning

data dataset images keywords machine-learning photos research search-engine semantics unsplash

Last synced: 20 Dec 2024

https://github.com/apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly

Last synced: 21 Dec 2024

https://github.com/EntilZha/PyFunctional

Python library for creating data pipelines with chain functional programming

data datascience functional-programming pipeline python

Last synced: 29 Oct 2024

https://github.com/entilzha/pyfunctional

Python library for creating data pipelines with chain functional programming

data datascience functional-programming pipeline python

Last synced: 17 Dec 2024

https://github.com/deepinsight-ai/deepbi

LLM based data scientist, AI native data application. AI-driven infinite thinking redefines BI.

ai analysis bi csv data gpt gpt-4 llm mysql redis

Last synced: 19 Dec 2024

https://github.com/mito-ds/mito

The mitosheet package, trymito.io, and other public Mito code.

data data-analysis data-science data-visualization jupyter pandas python streamlit-component

Last synced: 17 Dec 2024

https://github.com/emirozer/fake2db

create custom test databases that are populated with fake data

data database fake-content faker python

Last synced: 25 Oct 2024

https://github.com/tigerresearch/tigerbot

TigerBot: A multi-language multi-task LLM

chinese data llama2 llm nlp

Last synced: 19 Dec 2024

https://github.com/TigerResearch/TigerBot

TigerBot: A multi-language multi-task LLM

chinese data llama2 llm nlp

Last synced: 05 Nov 2024

https://github.com/lukes/iso-3166-countries-with-regional-codes

ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

countries csv data dataset iso iso3166 iso3166-1 iso3166-2 json region-codes xml

Last synced: 17 Dec 2024

https://github.com/apache/incubator-gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

apache data ingestion management replication

Last synced: 21 Dec 2024

https://github.com/apache/gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

apache data ingestion management replication

Last synced: 17 Dec 2024

https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes

ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

countries csv data dataset iso iso3166 iso3166-1 iso3166-2 json region-codes xml

Last synced: 04 Nov 2024

https://github.com/gsa/data

Assorted data from the General Services Administration.

data domains enterprise standards technology

Last synced: 03 Dec 2024

https://github.com/visualize-ml/book6_first-course-in-data-science

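Book 6, 《数据有道》 (roughly, "The Way of Data") | Iris Book series (鸢尾花书): from basic arithmetic to machine learning. Criticism and corrections are welcome! Readers who submit many corrections will receive a complimentary copy as thanks.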

data data-science data-visualization feature-engineering machine-learning python

Last synced: 19 Dec 2024

https://github.com/GSA/data

Assorted data from the General Services Administration.

data domains enterprise standards technology

Last synced: 31 Oct 2024

https://github.com/mara/mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

data data-integration etl pipeline postgresql python

Last synced: 19 Dec 2024

https://github.com/onyx-platform/onyx

Distributed, masterless, high performance, fault tolerant data processing

batch clojure data distributed streaming

Last synced: 26 Sep 2024

https://github.com/malloydata/malloy

Malloy is an experimental language for describing data relationships and transformations.

data data-visualization database malloy semantic-modeling sql

Last synced: 17 Dec 2024