An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with parquet

A curated list of projects in awesome lists tagged with parquet.

https://github.com/apache/arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

arrow parquet

Last synced: 18 Jan 2026

https://github.com/multiprocessio/dsq

Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

cli csv excel golang json openoffice-calc parquet sql tsv

Last synced: 14 May 2025

https://github.com/roapi/roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets

Last synced: 27 Mar 2025

https://github.com/apache/parquet-java

Apache Parquet Java

apache parquet parquet-java

Last synced: 11 Jan 2026

https://github.com/apache/arrow-rs

Official Rust implementation of Apache Arrow

arrow parquet rust

Last synced: 12 May 2025

https://github.com/rilldata/rill

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit

Last synced: 07 Jan 2026

https://github.com/apache/drill

Apache Drill is a distributed MPP query layer for self-describing data

big-data drill hadoop hive java jdbc parquet sql

Last synced: 13 May 2025

https://github.com/apache/parquet-format

Apache Parquet Format

apache parquet parquet-format

Last synced: 14 May 2025

https://github.com/rilldata/rill-developer

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit

Last synced: 08 Mar 2025

https://github.com/uber/petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

deep-learning machine-learning parquet parquet-files pyarrow pyspark pytorch sysml tensorflow

Last synced: 10 Apr 2025

https://github.com/gchq/gaffer

A large-scale entity and relation database supporting aggregation of properties

accumulo aggregation big-data graph graph-database hadoop hbase parquet spark

Last synced: 12 May 2025

https://github.com/gchq/Gaffer

A large-scale entity and relation database supporting aggregation of properties

accumulo aggregation big-data graph graph-database hadoop hbase parquet spark

Last synced: 04 May 2025

https://github.com/BemiHQ/BemiDB

Single-binary Postgres read replica optimized for analytics

analytics data-lakehouse data-movement data-warehouse duckdb iceberg olap parquet postgresql replication zero-etl

Last synced: 01 May 2025

https://github.com/paradigmxyz/cryo

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes

crypto ethereum evm parquet rust

Last synced: 13 May 2025

https://github.com/quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data

data data-engineering data-version-control data-versioning parquet python serialization

Last synced: 13 May 2025

https://github.com/bigdatagenomics/adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

avro big-data bioinformatics genomics java parquet python r scala spark

Last synced: 19 Oct 2025

https://github.com/mukunku/parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

apache-parquet big-data dot-net parquet windows-desktop

Last synced: 16 Jan 2026

https://github.com/cinchoo/choetl

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml

Last synced: 12 Apr 2025

https://github.com/harisekhon/devops-python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

avro aws cloudformation devops docker dockerhub elasticsearch gcf gcp hadoop hbase hdfs json linux parquet pyspark python solr spark travis-ci

Last synced: 13 Jun 2025

https://github.com/HariSekhon/DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

avro aws cloudformation devops docker dockerhub elasticsearch gcf gcp hadoop hbase hdfs json linux parquet pyspark python solr spark travis-ci

Last synced: 11 Apr 2025

https://github.com/Cinchoo/ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml

Last synced: 14 Mar 2025

https://github.com/julien040/anyquery

Query anything (GitHub, Notion, +40 more) with SQL and let LLMs (ChatGPT, Claude) connect to them using MCP

ai analytics api business-intelligence chatgpt claude csv data-visualization database github go json llm mcp mysql notion parquet salesforce sql sqlite

Last synced: 15 Apr 2025

https://github.com/kylebarron/parquet-wasm

Rust-based WebAssembly bindings to read and write Apache Parquet data

apache-arrow apache-parquet arrow javascript parquet rust wasm webassembly

Last synced: 14 May 2025

https://github.com/DerwenAI/kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.

graph-algorithms graph-libraries graph-thinking inference json-ld knowledge-graph networkx owl pandas parquet probabilistic-soft-logic python-igraph pyvis r2rml-mapping rapids rdflib roamresearch shacl skos sparql

Last synced: 01 May 2025

https://github.com/ranaroussi/pystore

Fast data store for Pandas time-series data

dask database dataframe datastore pandas parquet timeseries

Last synced: 01 Apr 2025

https://github.com/RandomFractals/vscode-data-preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml

Last synced: 08 Apr 2025

https://github.com/randomfractals/vscode-data-preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml

Last synced: 04 Apr 2025

https://github.com/netflix/iceberg

Iceberg is a table format for large, slow-moving tabular data

avro hadoop parquet spark

Last synced: 05 Aug 2025

https://github.com/crunchydata/pg_parquet

Copy to/from Parquet in S3 or Azure Blob Storage from within PostgreSQL

columnar data-ingestion data-migration parquet postgresql

Last synced: 08 Apr 2025

https://github.com/apache/parquet-cpp

Apache Parquet

big-data java parquet

Last synced: 21 Oct 2025

https://github.com/CrunchyData/pg_parquet

Copy to/from Parquet in S3 or Azure Blob Storage from within PostgreSQL

columnar data-ingestion data-migration parquet postgresql

Last synced: 05 Mar 2025

https://github.com/hyparam/hyparquet

Parquet file parser for JavaScript

hyparquet hyperparam javascript js parquet parquetjs parser snappy thrift

Last synced: 15 May 2025

https://github.com/moshe/elasticsearch_loader

A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch

csv elasticsearch elasticsearch-loader json logstash parquet python

Last synced: 16 May 2025

https://github.com/skale-me/skale

High performance distributed data processing engine

aws-s3 azure-storage cluster machine-learning nodejs parquet skale

Last synced: 27 Mar 2025

https://github.com/ironsource/parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format

javascript nodejs parquet

Last synced: 15 May 2025

https://github.com/jorgecarleitao/parquet2

Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

parallelism parquet rust safe

Last synced: 15 May 2025

https://github.com/spotify/ratatool

A tool for data sampling, data generation, and data diffing

avro bigquery parquet protobuf scala scalacheck

Last synced: 15 May 2025

https://github.com/gigapi/gigapi

GigAPI is a Timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐

api clickhouse-server data-lake database datalake duckdb duckdb-api duckdb-server ducklake fdap gigapipe golang lakehouse olap parquet qryn query-engine rest-api s3 sql

Last synced: 05 Oct 2025

https://github.com/sksamuel/centurion

Kotlin Bigdata Toolkit

bigdata java kotlin orc parquet

Last synced: 18 Dec 2025

https://github.com/manojkarthick/pqrs

Command line tool for inspecting Parquet files

arrow parquet rust

Last synced: 16 May 2025

https://github.com/l1xnan/duckling

A fast viewer for CSV/Parquet files and databases such as DuckDB, SQLite, PostgreSQL, MySQL, ClickHouse, etc., based on Tauri

clickhouse duckdb mysql parquet postgresql rust sqlite tauri

Last synced: 16 May 2025

https://github.com/Eugene-Mark/bigdata-file-viewer

A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

avro bigdata hdfs orc parquet

Last synced: 20 Nov 2025

https://github.com/fraugster/parquet-go

Go package to read and write parquet files. parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.

athena golang golang-package hacktoberfest hadoop parquet parquet-schema presto

Last synced: 16 Jan 2026

https://github.com/cldellow/sqlite-parquet-vtable

A SQLite vtable extension to read Parquet files

apache-arrow apache-parquet parquet sqlite sqlite3

Last synced: 09 Apr 2025

https://github.com/theseus-rs/rsql

Command line SQL interface for relational databases and common data file formats

cockroachdb command-line csv data database duckdb json mariadb mysql parquet postgres postgresql redshift snowflake sql sqlite sqlite3 sqlserver

Last synced: 16 May 2025

https://github.com/awslabs/amazon-s3-find-and-forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

amazon-s3 aws big-data ccpa data data-erasure data-lake gdpr parquet privacy right-to-be-forgotten s3

Last synced: 04 Apr 2025

https://github.com/pacman82/odbc2parquet

A command line tool to query an ODBC data source and write the result into a parquet file.

odbc parquet

Last synced: 14 Apr 2025

https://github.com/AI-Northstar-Tech/vector-io

Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets, and RAG platforms. Easily export, import, back up, re-embed (using any model), or access your vector data from any vector database or repository.

chromadb data-backup data-exploration-and-preprocessing data-export data-import datastax huggingface huggingface-datasets kdb lancedb milvus parquet pinecone qdrant turbopuffer vector-database vector-search-engine visualization zilliz

Last synced: 09 Mar 2025

https://github.com/RumbleDB/rumble

Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more

azure csv data-science dataframes delta-lake hdfs json jsoniq lakehouse machine-learning nested parquet query query-engine s3 scale schemaless spark svm text

Last synced: 20 Nov 2025

https://github.com/rumbledb/rumble

⛈️ RumbleDB 1.23.0 "Mountain Ash" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

avro azure csv data-science dataframes hdfs json jsoniq machine-learning nested parquet query query-engine s3 scale schemaless spark svm text yaml

Last synced: 03 Aug 2025

https://github.com/scikit-hep/awkward-0.x

Manipulate arrays of complex data structures as easily as NumPy.

analysis apache-arrow arrow big-data columnar columnar-storage hdf5 numpy parquet python python3 root root-cern scikit-hep

Last synced: 02 Oct 2025

https://github.com/chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api

Last synced: 13 Apr 2025

https://github.com/Chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api

Last synced: 28 Apr 2025

https://github.com/g-research/parquetsharp

ParquetSharp is a .NET library for reading and writing Apache Parquet files.

apache-arrow apache-parquet big-data columnar-storage csharp dotnet parquet

Last synced: 14 May 2025

https://github.com/ktrueda/parquet-tools

Easy-to-install parquet-tools

cli parquet parquet-tools

Last synced: 21 Oct 2025

https://github.com/hangxie/parquet-tools

A utility to deal with Parquet data

parquet parquet-tools

Last synced: 21 Oct 2025

https://github.com/DeepRec-AI/HybridBackend

A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters

deep-learning gpu hybrid-parallelism parquet recommender-system

Last synced: 29 Apr 2025

https://github.com/deeprec-ai/hybridbackend

A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters

deep-learning gpu hybrid-parallelism parquet recommender-system

Last synced: 13 Dec 2025

https://github.com/sunchao/parquet-rs

Apache Parquet implementation in Rust

hadoop parquet rust

Last synced: 17 Jul 2025

https://github.com/51zero/eel-sdk

Big Data Toolkit for the JVM

big-data etl hadoop hive kafka kudu orc parquet scala

Last synced: 13 Apr 2025

https://github.com/xiangpenghao/liquid-cache

10x lower latency for cloud-native DataFusion

arrow cache data-analytics datafusion object-store parquet query-engine

Last synced: 16 May 2025

https://github.com/mattf96s/quackdb

Open-source in-browser DuckDB SQL editor

apache-arrow comlink duckdb duckdb-wasm parquet remix remix-run shadcn sql sst

Last synced: 16 Apr 2025

https://github.com/xiangpenghao/parquet-viewer

View parquet files online

parquet parquet-viewer rust webassembly

Last synced: 05 Apr 2025

https://github.com/juliaio/parquet.jl

Julia implementation of Parquet columnar file format reader

columnar-storage julia parquet

Last synced: 16 May 2025

https://github.com/indix/schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

avro graphql-api json parquet schema-inference schema-registry spark tsv

Last synced: 27 Oct 2025

https://github.com/maxcountryman/warc-parquet

🗄️ A simple CLI for converting WARC to Parquet.

crawling duckdb parquet warc web-archiving

Last synced: 16 May 2025

https://github.com/parsyl/parquet

A library for reading and writing parquet files.

dremel golang parquet reader writer

Last synced: 14 Mar 2025

https://github.com/youssef-harby/overturemapsdownloader

Overture Maps Downloader simplifies geospatial data manipulation by integrating the powerful DuckDB, Dask DataFrames, and GDAL/OGR open source tools.

duckdb duckdb-wasm gdal geoparque geospatial overture-maps parquet

Last synced: 25 Sep 2025

https://github.com/nao1215/filesql

filesql - sql driver for CSV, TSV, LTSV, Parquet, Excel with gzip, bzip2, xz, zstd support.

bzip2 csv excel go golang gzip hacktoberfest ltsv parquet sql sql-driver sql-query sqlite3 tsv xz zstd

Last synced: 24 Oct 2025

https://github.com/saurfang/sparksql-protobuf

Read SparkSQL parquet file as RDD[Protobuf]

parquet protobuf sparksql

Last synced: 23 Jun 2025

https://github.com/igor-suhorukov/openstreetmap_h3

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world/region PBF dump into an H3-partitioned PostGIS pgsnapshot (lossless) OSM schema representation and/or into ArrowIPC/Parquet dumps

apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world

Last synced: 05 Oct 2025

https://github.com/coady/graphique

GraphQL service for arrow tables and parquet data sets.

arrow graphql parquet

Last synced: 09 May 2025

https://github.com/firebolt-db/firebolt-core

Firebolt Core is a free, self-hosted edition of Firebolt's distributed query engine (https://www.firebolt.io/); it provides high-performance data warehousing capabilities that can be deployed anywhere from a single laptop to enterprise datacenters.

ai analytics big-data cloud-native database gcs iceberg jdbc parquet postgresql query-engine s3 self-hosted sql

Last synced: 02 Jul 2025

https://github.com/jerolba/parquet-carpet

Java Parquet serialization and deserialization library using Java 17 Records

java parquet

Last synced: 17 Jan 2026

https://github.com/spotify/gcs-tools

GCS support for avro-tools, parquet-tools and protobuf

avro gcp gcs gcs-connector google-storage parquet protobuf

Last synced: 09 Apr 2025

https://github.com/ddotta/parquetize

R package for converting databases of different formats to Parquet format

conversion convert converter csv parquet r r-package sas spss sqlite stata

Last synced: 30 Dec 2025

https://github.com/dacort/faker-cli

Command-line interface to quickly generate fake CSV and JSON data

aws csv deltalake faker-provider json parquet pyarrow

Last synced: 31 Oct 2025

https://github.com/exyi/pg2parquet

Export PostgreSQL table or query into Parquet file

parquet postgres postgresql

Last synced: 04 Apr 2025

https://github.com/cldellow/csv2parquet

Convert a CSV to a parquet file.

apache-arrow apache-parquet csv parquet

Last synced: 21 Aug 2025

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 11 Sep 2025

https://github.com/apache/parquet-testing

Apache Parquet Testing

apache parquet parquet-testing

Last synced: 05 Apr 2025

https://github.com/mattf96s/QuackDB

Open-source in-browser DuckDB SQL editor

apache-arrow comlink duckdb duckdb-wasm parquet remix remix-run shadcn sql sst

Last synced: 14 May 2025

https://github.com/strategicblue/parquet-floor

A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies

java parquet parquet-files

Last synced: 14 Jan 2026

https://github.com/dask-contrib/dask-deltatable

A Delta Lake reader for Dask

dask dask-dataframes delta-lake parquet python

Last synced: 11 Oct 2025

https://github.com/cldellow/datasette-parquet

Add DuckDB, Parquet, CSV and JSON lines support to Datasette

datasette datasette-plugin duckdb parquet

Last synced: 13 Apr 2025

https://github.com/datacoon/undatum

undatum: a command-line tool for data processing. Brings CSV simplicity to NDJSON, BSON, XML and other data files

bson cli command-line csv data dataset json jsonl jsonlines parquet

Last synced: 18 Jan 2026

https://github.com/timkpaine/perspective-parquet

Parquet file reader and editor in JupyterLab, built with `perspective` for pivoting, filtering, aggregating, etc.

data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables

Last synced: 03 Sep 2025

https://github.com/devinrsmith/deephaven-parquet-viewer

A browser-based Parquet file viewer

bigdata docker parquet parquet-viewer

Last synced: 14 Apr 2025