Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with data
A curated list of projects in awesome lists tagged with data.
https://github.com/tanstack/query
🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.
async cache data fetch graphql hooks query react rest solid stale stale-while-revalidate svelte typescript update vue
Last synced: 16 Dec 2024
https://github.com/metabase/metabase
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
analytics bi business-intelligence businessintelligence clojure dashboard data data-analysis data-visualization database metabase mysql postgres postgresql reporting slack sql-editor visualization
Last synced: 16 Dec 2024
https://github.com/run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
agents application data fine-tuning framework llamaindex llm multi-agents rag vector-database
Last synced: 16 Dec 2024
https://github.com/vercel/swr
React Hooks for Data Fetching
cache data data-fetching fetch hook hooks nextjs react react-native stale-while-revalidate suspense swr vercel
Last synced: 21 Dec 2024
https://github.com/mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping
Last synced: 16 Dec 2024
https://github.com/fivethirtyeight/data
Data and code behind the articles and graphics at FiveThirtyEight
Last synced: 16 Dec 2024
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 16 Dec 2024
https://github.com/prefecthq/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
automation data data-engineering data-ops data-science infrastructure ml-ops observability orchestration pipeline prefect python workflow workflow-engine
Last synced: 16 Dec 2024
https://github.com/PrefectHQ/prefect
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
automation data data-engineering data-ops data-science infrastructure ml-ops observability orchestration pipeline prefect python workflow workflow-engine
Last synced: 29 Oct 2024
https://github.com/sinaptik-ai/pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql
Last synced: 16 Dec 2024
https://github.com/newTendermint/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
awesome awesome-list bigdata data data-analytics data-science data-stream data-visualization data-warehouse database distributed-database series-database stream-processing streaming-data visualize-data
Last synced: 13 Dec 2024
https://github.com/faker-js/faker
Generate massive amounts of fake data in the browser and node.js
browser data fake faker javascript nodejs
Last synced: 16 Dec 2024
https://github.com/pwxcoo/chinese-xinhua
📙 Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), chengyu (idioms), words, and Chinese characters.
chinese chinese-characters chinese-language chinese-nlp chinese-simplified chinese-traditional data json json-data json-dataset python3 scraper
Last synced: 17 Dec 2024
https://github.com/apple/pkl
A configuration as code language with rich validation and tooling.
config configuration data functional java json kotlin language object-oriented pkl programming-language properties propertylist validation xml yaml
Last synced: 16 Dec 2024
https://github.com/prql/prql
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
Last synced: 16 Dec 2024
https://github.com/akfamily/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (An open-source financial data interface library.)
academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock
Last synced: 16 Dec 2024
https://github.com/bchavez/bogus
📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
bogus c-sharp csharp data data-access-layer data-generator database dotnet fake faker generator poco seed test-data
Last synced: 16 Dec 2024
https://github.com/rawgraphs/rawgraphs-app
A web interface to create custom vector-based visualizations on top of RAWGraphs core
d3js data data-visualization ddj design rawgraphs svg vector-graphics visualization
Last synced: 25 Oct 2024
https://github.com/DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
apachespark awesome bigdata data dataengineering sql
Last synced: 05 Nov 2024
https://github.com/mrdbourke/machine-learning-roadmap
A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.
data data-science deep-learning machine-learning
Last synced: 30 Nov 2024
https://github.com/mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
artificial-intelligence data data-engineering data-integration data-pipelines data-science dbt elt etl machine-learning orchestration pipeline pipelines python reverse-etl spark sql transformation
Last synced: 16 Dec 2024
https://github.com/snowplow/snowplow
The leader in Next-Generation Customer Data Infrastructure
analytics data data-collection data-pipeline marketing-analytics product-analytics snowplow snowplow-events snowplow-pipeline
Last synced: 16 Dec 2024
https://github.com/olifolkerd/tabulator
Interactive Tables and Data Grids for JavaScript
ajax cdnjs data grid grid-layout grid-system javascript jquery json list react sort table tabulator tabulator-table widget
Last synced: 16 Dec 2024
https://olifolkerd.github.io/tabulator/
Interactive Tables and Data Grids for JavaScript
ajax cdnjs data grid grid-layout grid-system javascript jquery json list react sort table tabulator tabulator-table widget
Last synced: 16 Nov 2024
https://github.com/dformoso/machine-learning-mindmap
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
cheatsheet data jupyter learning machine machine-learning mindmap python science
Last synced: 19 Dec 2024
https://github.com/cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 16 Dec 2024
https://github.com/flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
data data-analysis data-science dataops declarative fine-tuning flyte golang grpc hacktoberfest kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production python scale workflow
Last synced: 16 Dec 2024
https://github.com/axa-group/parsr
Transforms PDF, Documents and Images into Enriched Structured Data
data document extraction hacktoberfest images nlp ocr parsr pdf python typescript
Last synced: 17 Dec 2024
https://github.com/countly/countly-server
Countly is a product analytics platform that helps teams track, analyze and act on user actions and behaviour in mobile, web and desktop applications.
analytics coppa crash-analytics crash-reports dashboard data data-ownership data-privacy feature-flags gdpr hipaa insights mobile-analytics product-analytics product-management push-notifications remote-configuration tracking user-feedback web-analytics
Last synced: 17 Dec 2024
https://github.com/airbnb/knowledge-repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
data data-analysis data-science knowledge
Last synced: 17 Dec 2024
https://github.com/cue-lang/cue
The home of the CUE language! Validate and define text-based and dynamic configuration
configuration data kubernetes validation
Last synced: 21 Dec 2024
https://github.com/mdn/browser-compat-data
This repository contains compatibility data for Web technologies as displayed on MDN
compat compatibility data dataset json
Last synced: 16 Dec 2024
https://github.com/superduper-io/superduper
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
ai chatbot data database distributed-ml inference llm-inference llm-serving llmops ml mlops mongodb pretrained-models python pytorch rag semantic-search torch transformers vector-search
Last synced: 16 Dec 2024
https://github.com/lk-geimfari/mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
data dataframe datascience dummy factory factory-boy fake fixtures generator json-generator mimesis mock pandas polars pytest-plugin python schema syntetic synthetic-data testing
Last synced: 16 Dec 2024
https://github.com/ckan/ckan
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
api catalog ckan ckanext data digitalpublicgoods dpg open-data python sdg16
Last synced: 16 Dec 2024
https://github.com/tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
data dataset datasets jax machine-learning numpy tensorflow
Last synced: 17 Dec 2024
https://github.com/quartz/bad-data-guide
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
data documentation guide qz-things
Last synced: 03 Dec 2024
https://github.com/glideapps/glide-data-grid
🚀 Glide Data Grid is a no-compromise, outrageously fast React data grid with rich rendering, first-class accessibility, and full TypeScript support.
accessible data datagrid grid javascript react reactjs table typescript
Last synced: 16 Dec 2024
https://github.com/jonschlinkert/gray-matter
Smarter YAML front matter parser, used by metalsmith, Gatsby, Netlify, Assemble, mapbox-gl, phenomic, vuejs vitepress, TinaCMS, Shopify Polaris, Ant Design, Astro, hashicorp, garden, slidev, saber, sourcegraph, and many others. Simple to use, and battle tested. Parses YAML by default but can also parse JSON Front Matter, Coffee Front Matter, TOML Front Matter, and has support for custom parsers. Please follow gray-matter's author: https://github.com/jonschlinkert
assemble config data front-matter front-matter-parsers frontmatter gatsby javascript jonschlinkert mapbox markdown matter metalsmith netlify node nodejs parse phenomic yaml
Last synced: 17 Dec 2024
https://github.com/tinyplex/tinybase
The reactive data store for local‑first apps.
data javascript react reactive typescript
Last synced: 29 Oct 2024
https://github.com/ArroyoSystems/arroyo
Distributed stream processing engine in Rust
data data-stream-processing dev-tools infrastructure kafka rust sql stream-processing stream-processing-engine
Last synced: 30 Oct 2024
https://github.com/dtinit/data-transfer-project
The Data Transfer Project makes it easy for platforms to build interoperable user data portability features. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.
data data-portability portability transfer
Last synced: 18 Dec 2024
https://github.com/belval/textrecognitiondatagenerator
A synthetic data generator for text recognition
data dataset fake ocr synthetic text text-recognition training-set-generator
Last synced: 16 Dec 2024
https://github.com/docta-ai/docta
A Doctor for your data
data data-centric-ai data-centric-machine-learning data-curation data-diagnosis language-model rlhf
Last synced: 17 Dec 2024
https://github.com/superstreamlabs/memphis
Memphis.dev is a highly scalable and effortless data streaming platform
data data-engineering data-pipeline data-stream-processing data-streaming enrichment golang kubernetes message-broker message-bus message-queue messaging-queue microservices schema-registry
Last synced: 19 Dec 2024
https://github.com/quadratichq/quadratic
Quadratic | Spreadsheet with Python, SQL, and AI
ai data data-analysis data-engineering data-science etl python quadratic spreadsheet sql wasm webgl
Last synced: 18 Dec 2024
https://github.com/weld-project/weld
High-performance runtime for data analytics applications
analytics code-generation data llvm machine-learning pandas performance rust stanford
Last synced: 19 Dec 2024
https://github.com/pydata/pandas-datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
data data-analysis dataset econdb economic-data fama-french finance financial-data fred html pandas pydata python stock-data
Last synced: 16 Dec 2024
https://github.com/montanaflynn/stats
A well tested and comprehensive Golang statistics library package with no dependencies.
algorithms analytics data go machine-learning math rounding statistics stats
Last synced: 16 Dec 2024
https://github.com/datafold/data-diff
Compare tables within or across databases
data data-diffing data-engineering data-quality data-quality-monitoring data-science database databricks-sql dataengineering dataquality dbt mysql oracle-database postgres postgresql python rdbms snowflake sql trino
Last synced: 29 Oct 2024
https://github.com/justinzm/gopup
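Data interfaces: Baidu, Google, Toutiao and Weibo indices; macroeconomic data; interest-rate data; currency exchange rates; "Qianlima" and unicorn companies; Xinwen Lianbo news broadcast transcripts; film and TV box-office data; lists of universities; epidemic data…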
covid19-data data data-analysis data-science datasets economic-data gopup index-data python
Last synced: 19 Dec 2024
https://github.com/dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
data data-engineering data-lake data-loading data-warehouse elt extract load python transform
Last synced: 29 Oct 2024
https://github.com/kayak/pypika
PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
builder data functional python python3 pythonic query sql
Last synced: 16 Dec 2024
https://github.com/unsplash/datasets
🎁 5,400,000+ Unsplash images made available for research and machine learning
data dataset images keywords machine-learning photos research search-engine semantics unsplash
Last synced: 20 Dec 2024
https://github.com/apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly
Last synced: 21 Dec 2024
https://github.com/kanaries/graphic-walker
An open source alternative to Tableau. Embeddable visual analytics
bi data data-analysis data-mining data-visualization eda k6s kanaries low-code pivot-table react tableau tableau-alternative typescript vega vega-lite visualization
Last synced: 15 Dec 2024
https://github.com/EntilZha/PyFunctional
Python library for creating data pipelines with chain functional programming
data datascience functional-programming pipeline python
Last synced: 29 Oct 2024
https://github.com/mito-ds/mito
The mitosheet package, trymito.io, and other public Mito code.
data data-analysis data-science data-visualization jupyter pandas python streamlit-component
Last synced: 17 Dec 2024
https://github.com/emirozer/fake2db
create custom test databases that are populated with fake data
data database fake-content faker python
Last synced: 25 Oct 2024
https://github.com/approximatelabs/sketch
AI code-writing assistant that understands data content
ai codex copilot data data-science dataframe datasketch datasketches df ds gpt3 lambdaprompt pandas python sketches tabular-data
Last synced: 01 Nov 2024
https://github.com/benkeen/generatedata
A powerful, feature-rich, random test data generator.
data data-generation data-generator data-generators human-data json random random-generation randomization rest-api test-data test-data-generator testing
Last synced: 17 Dec 2024
https://github.com/apache/incubator-gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
apache data ingestion management replication
Last synced: 21 Dec 2024
https://github.com/apache/gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
apache data ingestion management replication
Last synced: 17 Dec 2024
https://github.com/github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
bert cnn data data-science datasets deep-learning machine-learning machine-learning-on-source-code ml natural-language-processing neural-networks nlp nlp-machine-learning open-data programming-language-theory python representation-learning rnn self-attention tensorflow
Last synced: 24 Oct 2024
https://github.com/colour-science/colour
Colour Science for Python
color color-science color-space color-spaces colorspace colorspaces colour colour-science colour-space colour-spaces colourspace colourspaces data dataset datasets python spectral-data spectral-dataset spectral-datasets
Last synced: 17 Dec 2024
https://github.com/gsa/data
Assorted data from the General Services Administration.
data domains enterprise standards technology
Last synced: 03 Dec 2024
https://github.com/visualize-ml/book6_first-course-in-data-science
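Book_6_《数据有道》 | Iris Book series: from basic arithmetic to machine learning. Criticism and corrections are welcome! Readers who report the most errors will receive a free book as thanks!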
data data-science data-visualization feature-engineering machine-learning python
Last synced: 19 Dec 2024
https://github.com/mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
data data-integration etl pipeline postgresql python
Last synced: 19 Dec 2024
https://github.com/onyx-platform/onyx
Distributed, masterless, high performance, fault tolerant data processing
batch clojure data distributed streaming
Last synced: 26 Sep 2024
https://github.com/pretzelai/pretzelai
The modern replacement for Jupyter Notebooks
analytics artificial-intelligence business-intelligence businessintelligence dashboard data data-analysis data-analytics data-science data-visualization duckdb notebooks open-source prql reporting sql sql-editor sql-editor-online visualization wasm
Last synced: 18 Dec 2024
https://github.com/malloydata/malloy
Malloy is an experimental language for describing data relationships and transformations.
data data-visualization database malloy semantic-modeling sql
Last synced: 17 Dec 2024