An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with parquet-files

A curated list of projects in awesome lists tagged with parquet-files.

https://github.com/uber/petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

deep-learning machine-learning parquet parquet-files pyarrow pyspark pytorch sysml tensorflow

Last synced: 10 Apr 2025

https://github.com/cinchoo/choetl

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml

Last synced: 12 Apr 2025

https://github.com/Cinchoo/ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml

Last synced: 14 Mar 2025

https://github.com/hrbrmstr/sergeant

💂 Tools to Transform and Query Data with 'Apache' 'Drill'

apache-drill dplyr drill parquet-files r r-cyber rstats sql

Last synced: 16 Mar 2025

https://github.com/mongodb-labs/mongo-arrow

MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.

apache-arrow arrow mongodb numpy-arrays pandas-dataframe parquet-files python

Last synced: 16 May 2025

https://github.com/minio/spark-select

A library for Spark DataFrame using MinIO Select API

amazon-s3 bigdata minio parquet-files pyspark sbt select spark spark-sql

Last synced: 20 Jun 2025

https://github.com/adrianulbona/osm-parquetizer

A converter from OSM PBF files to Parquet files

apache-spark converter openstreetmap parquet-files pbf

Last synced: 29 Oct 2025

https://github.com/igor-suhorukov/openstreetmap_h3

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps

apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world

Last synced: 05 Oct 2025

https://github.com/strategicblue/parquet-floor

A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies

java parquet parquet-files

Last synced: 14 Jan 2026

https://github.com/hannes/miniparquet

Library to read a subset of Parquet files

cpp cpp11 dependency-free parquet parquet-cpp parquet-files

Last synced: 15 Mar 2025

https://github.com/squey/squey

Squey is visualization software designed for interactively exploring and understanding large amounts of tabular data (this is the read-only mirror of https://gitlab.com/squey/squey)

cybersecurity data-analysis data-science data-visualization exploratory-data-visualizations parallel-coordinates parquet parquet-files parquet-viewer pcap timeseries timeseries-analysis visualization

Last synced: 08 Mar 2025

https://github.com/hrbrmstr/sergeant-caffeinated

💂 ☕️ Tools to Transform and Query Data with 'Apache' 'Drill'

dplyr drill jdbc parquet-files r rstats sql

Last synced: 18 Jul 2025

https://github.com/gaborcsardi/nanoparquet-cli

Command line Docker app to query and manipulate Parquet files

cli docker parquet-files

Last synced: 11 Feb 2026

https://github.com/adrigrillo/nycsparktaxi

Apache Spark application to get the top ten frequent routes and profitable areas

big-data nyc parquet-files python spark taxi

Last synced: 05 Apr 2025

https://github.com/domvwt/parquet-inspector

A command line tool for inspecting parquet files with PyArrow.

cli parquet parquet-cli parquet-files parquet-generator parquet-tools parquet-viewer

Last synced: 19 Sep 2025

https://github.com/dgtlss/parqbridge

ParqBridge focuses on zero PHP dependency bloat while still producing spec-compliant Parquet files by delegating the final write step to a tiny, embedded Python script using PyArrow (or any custom CLI you prefer). You keep full Laravel DX for configuration and Storage; we bridge your data to Parquet.

laravel laravel-framework laravel-package parquet parquet-files parquet-generator parquet-schema php php8 powerbi python

Last synced: 03 Oct 2025

https://github.com/cajuncoding/parquetfiles.blobhelpers

A simple library and console application illustrating how to read and load data into class models from Parquet files saved to Azure Blob Storage using Parquet .Net (parquet-dotnet). This is useful for E-L-T processes in which you need to load the data into memory, SQL Server (e.g. Azure SQL), or any other location where there is no built-in or default mechanism for working with Parquet data.

azure-blob azure-blob-storage azure-functions parquet parquet-data parquet-dotnet parquet-files parquet-tools

Last synced: 08 Mar 2026

https://github.com/ayushverma135/json-to-parquet-parser

Easily convert JSON data into Parquet format for efficient storage and analysis. Simplify data processing and analysis pipelines by converting JSON objects into optimized Parquet files.

json pandas parquet-files python

Last synced: 09 Jul 2025

https://github.com/alastairtree/crump

Python & CLI tool for getting data from files into a DB fast.

cdf-files csv-files parquet-files postgresql sqlite

Last synced: 04 Mar 2026

https://github.com/tee8z/noaa-oracle

NOAA data oracle that is queryable from the browser and can attest to events for a Bitcoin DLC in dlctix style

data duckdb-wasm noaa-weather parquet-files sql weather

Last synced: 17 Feb 2026

https://github.com/ostrokach/uniparc_xml_parser

UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).

bigquery bioinformatics csv-files parquet-files protein-domains protein-sequences

Last synced: 03 Jan 2026

https://github.com/rlesur/quarto-ojs-parquet-s3

A Quarto notebook requesting a parquet file stored in S3

minio parquet-files quarto s3-storage

Last synced: 25 Feb 2025

https://github.com/ffatahillah7/etl-python-joindataframe-topostgresql

ETL that transforms a dataset built by joining two datasets (CSV and Parquet) and loads it into PostgreSQL

join-tables pandasql parquet-files postgresql python sql sqlalchemy sqlalchemy-python

Last synced: 14 May 2025

https://github.com/jhylin/ml1-1_small_mols_in_chembl

Polars dataframe library and logistic regression in scikit-learn (update)

logistic-regression machine-learning parquet-files polars-dataframe scikit-learn

Last synced: 03 Jan 2026

https://github.com/slatawa/csv_parquet

Project showing the integration of upstream files into your data lake. We look at handling high-volume customized data formats and converting them into Parquet.

parquet-files pyspark python3

Last synced: 10 Aug 2025

https://github.com/varsha-vraj/airport_parking_toolkit

This toolkit is designed to simulate and manage airport parking events. It provides a command-line interface (CLI) for managing vehicles, zones, and parking events. It includes full integration with PostgreSQL for data storage, SQL for advanced queries, and Apache Spark for big data batch processing of parquet logs.

big-data cli dataengineering hadoop java parquet-files poetry postgresql pyspark python3 spark sqlalchemy

Last synced: 11 Aug 2025

https://github.com/yo-mah-ya/file_creator

Create files in formats such as "orc", "parquet", "xlsx", "json", and so on with Python

orcfile pandas parquet parquet-files python3

Last synced: 14 Mar 2025

https://github.com/ahbiels/fegtec

FegTec is a fictitious company that wants to transfer Parquet files containing customer data from the AWS cloud to Google Cloud

aws bucket cloudfunctions data-engineer gcp pandas parquet-files python transfer-data

Last synced: 27 Feb 2025

https://github.com/munz0908/parqbridge

🌉 Export Laravel database tables to Apache Parquet files effortlessly, using minimal dependencies and a simple artisan command for quick data handling.

laravel laravel-package parquet parquet-files parquet-generator parquet-schema php powerbi python

Last synced: 03 Sep 2025

https://github.com/mattpopovich/dataframeioperformancetesting

Tests the speed and file size of reading and writing DataFrames to/from disk with different file and compression types

csv csv-format feather file-io hdf5 hdf5-format orc orc-format pandas pandas-dataframe pandas-python parquet parquet-files parquet-format pickle pickle-file python python3

Last synced: 07 Oct 2025

https://github.com/dotm87/triaina

Big data project: information storage in HDFS

big-data hadoop-hdfs parquet-files

Last synced: 27 Jan 2026

https://github.com/alipsa/jparq

JDBC driver for parquet files

jdbc jdbc-driver parquet parquet-files parquet-tools

Last synced: 25 Oct 2025

https://github.com/yoshinariyamanaka/file_creator

Create files in formats such as "orc", "parquet", "xlsx", "json", and so on with Python

orcfile pandas parquet parquet-files python3

Last synced: 29 Dec 2025