Projects in Awesome Lists tagged with parquet
A curated list of projects in awesome lists tagged with parquet.
https://github.com/apache/arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Last synced: 18 Jan 2026
https://github.com/multiprocessio/dsq
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
cli csv excel golang json openoffice-calc parquet sql tsv
Last synced: 14 May 2025
https://github.com/roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets
Last synced: 27 Mar 2025
https://github.com/apache/arrow-rs
Official Rust implementation of Apache Arrow
Last synced: 12 May 2025
https://github.com/dathere/qsv
Blazing-fast Data-Wrangling toolkit
ckan cli csv data-engineering data-wrangling dcat excel geocode libreoffice luau metadata opendata parquet polars postgresql sampling sql sqlite statistics timeseries
Last synced: 24 Dec 2025
https://github.com/rilldata/rill
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit
Last synced: 07 Jan 2026
https://github.com/rilldata/rill-developer
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit
Last synced: 08 Mar 2025
https://github.com/uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
deep-learning machine-learning parquet parquet-files pyarrow pyspark pytorch sysml tensorflow
Last synced: 10 Apr 2025
https://github.com/gchq/gaffer
A large-scale entity and relation database supporting aggregation of properties
accumulo aggregation big-data graph graph-database hadoop hbase parquet spark
Last synced: 12 May 2025
https://github.com/BemiHQ/BemiDB
Single-binary Postgres read replica optimized for analytics
analytics data-lakehouse data-movement data-warehouse duckdb iceberg olap parquet postgresql replication zero-etl
Last synced: 01 May 2025
https://github.com/quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
data data-engineering data-version-control data-versioning parquet python serialization
Last synced: 13 May 2025
https://github.com/mukunku/parquetviewer
Simple Windows desktop application for viewing & querying Apache Parquet files
apache-parquet big-data dot-net parquet windows-desktop
Last synced: 16 Jan 2026
https://github.com/tonbo-io/tonbo
A portable embedded database using Arrow.
arrow big-data database embedded-database htap lsm-tree offline-first parquet rust store-engine
Last synced: 05 Aug 2025
https://github.com/cinchoo/choetl
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml
Last synced: 12 Apr 2025
https://github.com/harisekhon/devops-python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
avro aws cloudformation devops docker dockerhub elasticsearch gcf gcp hadoop hbase hdfs json linux parquet pyspark python solr spark travis-ci
Last synced: 13 Jun 2025
https://github.com/developmentseed/lonboard
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
anywidget apache-arrow apache-parquet data-visualization deck-gl geoarrow geopandas geoparquet geospatial geospatial-analysis jupyter jupyter-widget lonboard map-visualization maps parquet python visualization webgl
Last synced: 14 May 2025
https://github.com/julien040/anyquery
Query anything (GitHub, Notion, +40 more) with SQL and let LLMs (ChatGPT, Claude) connect to it using MCP
ai analytics api business-intelligence chatgpt claude csv data-visualization database github go json llm mcp mysql notion parquet salesforce sql sqlite
Last synced: 15 Apr 2025
https://github.com/kylebarron/parquet-wasm
Rust-based WebAssembly bindings to read and write Apache Parquet data
apache-arrow apache-parquet arrow javascript parquet rust wasm webassembly
Last synced: 14 May 2025
https://github.com/DerwenAI/kglab
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
graph-algorithms graph-libraries graph-thinking inference json-ld knowledge-graph networkx owl pandas parquet probabilistic-soft-logic python-igraph pyvis r2rml-mapping rapids rdflib roamresearch shacl skos sparql
Last synced: 01 May 2025
https://github.com/ranaroussi/pystore
Fast data store for Pandas time-series data
dask database dataframe datastore pandas parquet timeseries
Last synced: 01 Apr 2025
https://github.com/RandomFractals/vscode-data-preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml
Last synced: 08 Apr 2025
https://github.com/paradedb/pg_analytics
DuckDB-powered data lake analytics from Postgres
analytics arrow big-data columnar database datafusion datalake deltalake duckdb iceberg lakehouse lakehouse-platform object-storage olap paradedb parquet postgres postgresql realtime-analytics sql
Last synced: 24 Mar 2025
https://github.com/netflix/iceberg
Iceberg is a table format for large, slow-moving tabular data
Last synced: 05 Aug 2025
https://github.com/crunchydata/pg_parquet
Copy to/from Parquet in S3 or Azure Blob Storage from within PostgreSQL
columnar data-ingestion data-migration parquet postgresql
Last synced: 08 Apr 2025
https://github.com/hyparam/hyparquet
Parquet file parser for JavaScript
hyparquet hyperparam javascript js parquet parquetjs parser snappy thrift
Last synced: 15 May 2025
https://github.com/apecloud/myduckserver
Unified MySQL, Postgres & FlightSQL Server, Powered by DuckDB.
analytics arrow business-analytics business-intelligence columnar-storage data-engineering data-science database duckdb htap mariadb mysql olap pandas parquet polars postgres replication sql zero-etl
Last synced: 15 May 2025
https://github.com/moshe/elasticsearch_loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
csv elasticsearch elasticsearch-loader json logstash parquet python
Last synced: 16 May 2025
https://github.com/skale-me/skale
High performance distributed data processing engine
aws-s3 azure-storage cluster machine-learning nodejs parquet skale
Last synced: 27 Mar 2025
https://github.com/ironsource/parquetjs
Fully asynchronous, pure JavaScript implementation of the Parquet file format
Last synced: 15 May 2025
https://github.com/jorgecarleitao/parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Last synced: 15 May 2025
https://github.com/spotify/ratatool
A tool for data sampling, data generation, and data diffing
avro bigquery parquet protobuf scala scalacheck
Last synced: 15 May 2025
https://github.com/gigapi/gigapi
GigAPI is a Timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐
api clickhouse-server data-lake database datalake duckdb duckdb-api duckdb-server ducklake fdap gigapipe golang lakehouse olap parquet qryn query-engine rest-api s3 sql
Last synced: 05 Oct 2025
https://github.com/manojkarthick/pqrs
Command line tool for inspecting Parquet files
Last synced: 16 May 2025
https://github.com/l1xnan/duckling
A fast viewer for CSV/Parquet files and databases such as DuckDB, SQLite, PostgreSQL, MySQL, ClickHouse, etc., based on Tauri
clickhouse duckdb mysql parquet postgresql rust sqlite tauri
Last synced: 16 May 2025
https://github.com/fraugster/parquet-go
Go package to read and write parquet files. parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
athena golang golang-package hacktoberfest hadoop parquet parquet-schema presto
Last synced: 16 Jan 2026
https://github.com/cldellow/sqlite-parquet-vtable
A SQLite vtable extension to read Parquet files
apache-arrow apache-parquet parquet sqlite sqlite3
Last synced: 09 Apr 2025
https://github.com/theseus-rs/rsql
Command line SQL interface for relational databases and common data file formats
cockroachdb command-line csv data database duckdb json mariadb mysql parquet postgres postgresql redshift snowflake sql sqlite sqlite3 sqlserver
Last synced: 16 May 2025
https://github.com/awslabs/amazon-s3-find-and-forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
amazon-s3 aws big-data ccpa data data-erasure data-lake gdpr parquet privacy right-to-be-forgotten s3
Last synced: 04 Apr 2025
https://github.com/pacman82/odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
Last synced: 14 Apr 2025
https://github.com/AI-Northstar-Tech/vector-io
Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets, and RAG platforms. Easily export, import, back up, re-embed (using any model), or access your vector data from any vector database or repository.
chromadb data-backup data-exploration-and-preprocessing data-export data-import datastax huggingface huggingface-datasets kdb lancedb milvus parquet pinecone qdrant turbopuffer vector-database vector-search-engine visualization zilliz
Last synced: 09 Mar 2025
https://github.com/RumbleDB/rumble
Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more
azure csv data-science dataframes delta-lake hdfs json jsoniq lakehouse machine-learning nested parquet query query-engine s3 scale schemaless spark svm text
Last synced: 20 Nov 2025
https://github.com/rumbledb/rumble
⛈️ RumbleDB 1.23.0 "Mountain Ash" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
avro azure csv data-science dataframes hdfs json jsoniq machine-learning nested parquet query query-engine s3 scale schemaless spark svm text yaml
Last synced: 03 Aug 2025
https://github.com/pgspider/parquet_s3_fdw
ParquetS3 Foreign Data Wrapper for PostgreSQL
fdw foreign-data-wrapper foreign-tables parquet parquets3-fdw postgresql postgresql-extension s3
Last synced: 16 Jan 2026
https://github.com/scikit-hep/awkward-0.x
Manipulate arrays of complex data structures as easily as NumPy.
analysis apache-arrow arrow big-data columnar columnar-storage hdf5 numpy parquet python python3 root root-cern scikit-hep
Last synced: 02 Oct 2025
https://github.com/chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api
Last synced: 13 Apr 2025
https://github.com/g-research/parquetsharp
ParquetSharp is a .NET library for reading and writing Apache Parquet files.
apache-arrow apache-parquet big-data columnar-storage csharp dotnet parquet
Last synced: 14 May 2025
https://github.com/spotify/magnolify
A collection of Magnolia add-on modules
avro bigquery bigtable cats datastore guava magnolia neo4j parquet protobuf scala scalacheck tensorflow
Last synced: 08 Apr 2025
https://github.com/hangxie/parquet-tools
A utility to deal with Parquet data
Last synced: 21 Oct 2025
https://github.com/DeepRec-AI/HybridBackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
deep-learning gpu hybrid-parallelism parquet recommender-system
Last synced: 29 Apr 2025
https://github.com/sunchao/parquet-rs
Apache Parquet implementation in Rust
Last synced: 17 Jul 2025
https://github.com/xiangpenghao/liquid-cache
10x lower latency for cloud-native DataFusion
arrow cache data-analytics datafusion object-store parquet query-engine
Last synced: 16 May 2025
https://github.com/mattf96s/quackdb
Open-source in-browser DuckDB SQL editor
apache-arrow comlink duckdb duckdb-wasm parquet remix remix-run shadcn sql sst
Last synced: 16 Apr 2025
https://github.com/xiangpenghao/parquet-viewer
View parquet files online
parquet parquet-viewer rust webassembly
Last synced: 05 Apr 2025
https://github.com/juliaio/parquet.jl
Julia implementation of Parquet columnar file format reader
columnar-storage julia parquet
Last synced: 16 May 2025
https://github.com/indix/schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
avro graphql-api json parquet schema-inference schema-registry spark tsv
Last synced: 27 Oct 2025
https://github.com/maxcountryman/warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
crawling duckdb parquet warc web-archiving
Last synced: 16 May 2025
https://github.com/youssef-harby/overturemapsdownloader
Overture Maps Downloader simplifies geospatial data manipulation by integrating the powerful DuckDB, Dask DataFrames, and GDAL/OGR open source tools.
duckdb duckdb-wasm gdal geoparque geospatial overture-maps parquet
Last synced: 25 Sep 2025
https://github.com/saurfang/sparksql-protobuf
Read SparkSQL parquet file as RDD[Protobuf]
Last synced: 23 Jun 2025
https://github.com/igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 regions and/or into ArrowIPC/Parquet dumps
apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world
Last synced: 05 Oct 2025
https://github.com/wylie102/duckdb.yazi
Yazi plugin that uses DuckDB to preview data files.
csv csv-viewer duckdb duckdb-community duckdb-database excel excel-viewer json json-viewer parquet parquet-viewer yazi yazi-plugin
Last synced: 12 Aug 2025
https://github.com/coady/graphique
GraphQL service for Arrow tables and Parquet data sets.
Last synced: 09 May 2025
https://github.com/firebolt-db/firebolt-core
Firebolt Core is a free, self-hosted edition of Firebolt's distributed query engine (https://www.firebolt.io/); it provides high-performance data warehousing capabilities that can be deployed anywhere from a single laptop to enterprise datacenters.
ai analytics big-data cloud-native database gcs iceberg jdbc parquet postgresql query-engine s3 self-hosted sql
Last synced: 02 Jul 2025
https://github.com/jerolba/parquet-carpet
Java Parquet serialization and deserialization library using Java 17 Records
Last synced: 17 Jan 2026
https://github.com/voluntas/duckdb-wasm-parquet
duckdb-wasm parquet playwright typescript vite
Last synced: 09 Apr 2025
https://github.com/spotify/gcs-tools
GCS support for avro-tools, parquet-tools and protobuf
avro gcp gcs gcs-connector google-storage parquet protobuf
Last synced: 09 Apr 2025
https://github.com/dacort/faker-cli
Command-line interface to quickly generate fake CSV and JSON data
aws csv deltalake faker-provider json parquet pyarrow
Last synced: 31 Oct 2025
https://github.com/exyi/pg2parquet
Export PostgreSQL table or query into Parquet file
Last synced: 04 Apr 2025
https://github.com/cldellow/csv2parquet
Convert a CSV to a parquet file.
apache-arrow apache-parquet csv parquet
Last synced: 21 Aug 2025
https://github.com/monix/monix-connect
A set of connectors for Monix. 🔛
aws connectors dynamodb elasticsearch google-cloud-storage hdfs mongodb monix parquet reactive-streams redis s3 scala sqs workflow
Last synced: 21 Jul 2025
https://github.com/zsvoboda/dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx
Last synced: 11 Sep 2025
https://github.com/apache/parquet-testing
Apache Parquet Testing
apache parquet parquet-testing
Last synced: 05 Apr 2025
https://github.com/strategicblue/parquet-floor
A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies
Last synced: 14 Jan 2026
https://github.com/dask-contrib/dask-deltatable
A Delta Lake reader for Dask
dask dask-dataframes delta-lake parquet python
Last synced: 11 Oct 2025
https://github.com/cldellow/datasette-parquet
Add DuckDB, Parquet, CSV and JSON lines support to Datasette
datasette datasette-plugin duckdb parquet
Last synced: 13 Apr 2025
https://github.com/timkpaine/perspective-parquet
Parquet file reader and editor in JupyterLab, built with `perspective` for pivoting, filtering, aggregating, etc.
data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables
Last synced: 03 Sep 2025
https://github.com/devinrsmith/deephaven-parquet-viewer
A browser-based Parquet file viewer
bigdata docker parquet parquet-viewer
Last synced: 14 Apr 2025