Projects in Awesome Lists tagged with parquet-files
A curated list of projects in awesome lists tagged with parquet-files.
https://github.com/uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
deep-learning machine-learning parquet parquet-files pyarrow pyspark pytorch sysml tensorflow
Last synced: 10 Apr 2025
https://github.com/cinchoo/choetl
ETL framework for .NET (parser/writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)
avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml
Last synced: 12 Apr 2025
https://github.com/hrbrmstr/sergeant
💂 Tools to Transform and Query Data with 'Apache' 'Drill'
apache-drill dplyr drill parquet-files r r-cyber rstats sql
Last synced: 16 Mar 2025
https://github.com/mongodb-labs/mongo-arrow
MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.
apache-arrow arrow mongodb numpy-arrays pandas-dataframe parquet-files python
Last synced: 16 May 2025
https://github.com/minio/spark-select
A library for Spark DataFrame using MinIO Select API
amazon-s3 bigdata minio parquet-files pyspark sbt select spark spark-sql
Last synced: 20 Jun 2025
https://github.com/adrianulbona/osm-parquetizer
A converter from OSM PBF files to Parquet files
apache-spark converter openstreetmap parquet-files pbf
Last synced: 29 Oct 2025
https://github.com/igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world or region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into Arrow IPC/Parquet dumps
apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world
Last synced: 05 Oct 2025
https://github.com/strategicblue/parquet-floor
A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies
Last synced: 14 Jan 2026
https://github.com/hannes/miniparquet
Library to read a subset of Parquet files
cpp cpp11 dependency-free parquet parquet-cpp parquet-files
Last synced: 15 Mar 2025
https://github.com/data-lake-visualizer/vscode-parquet-visualizer
VS Code extension for querying Parquet and CSV files with SQL and visualizing them
csv csv-export csv-files explorer parquet parquet-files parquet-viewer query viewer visualizer vscode-extension
Last synced: 05 Jan 2026
https://github.com/grouzen/zio-apache-parquet
Scala ZIO-powered Apache Parquet library
apache-parquet big-data bigdata parquet parquet-files parquet-format parquet-tools scala zio zio-streams zio2
Last synced: 28 Aug 2025
https://github.com/squey/squey
Squey is a visualization software designed to interactively explore and understand large amounts of tabular data (this is the read-only mirror of https://gitlab.com/squey/squey)
cybersecurity data-analysis data-science data-visualization exploratory-data-visualizations parallel-coordinates parquet parquet-files parquet-viewer pcap timeseries timeseries-analysis visualization
Last synced: 08 Mar 2025
https://github.com/hrbrmstr/sergeant-caffeinated
💂 ☕️ Tools to Transform and Query Data with 'Apache' 'Drill'
dplyr drill jdbc parquet-files r rstats sql
Last synced: 18 Jul 2025
https://github.com/gaborcsardi/nanoparquet-cli
Command line Docker app to query and manipulate Parquet files
Last synced: 11 Feb 2026
https://github.com/adrigrillo/nycsparktaxi
Apache Spark application that finds the top ten most frequent routes and most profitable areas
big-data nyc parquet-files python spark taxi
Last synced: 05 Apr 2025
https://github.com/domvwt/parquet-inspector
A command line tool for inspecting parquet files with PyArrow.
cli parquet parquet-cli parquet-files parquet-generator parquet-tools parquet-viewer
Last synced: 19 Sep 2025
https://github.com/dgtlss/parqbridge
ParqBridge focuses on zero PHP dependency bloat while still producing spec-compliant Parquet files by delegating the final write step to a tiny, embedded Python script using PyArrow (or any custom CLI you prefer). You keep full Laravel DX for configuration and Storage; we bridge your data to Parquet.
laravel laravel-framework laravel-package parquet parquet-files parquet-generator parquet-schema php php8 powerbi python
Last synced: 03 Oct 2025
https://github.com/cajuncoding/parquetfiles.blobhelpers
A simple library and console application that illustrates how to read Parquet files saved to Azure Blob Storage and load the data into class models using Parquet.Net (parquet-dotnet). This is useful for E-L-T processes in which you need to load the data into memory, SQL Server (e.g. Azure SQL), or any other location that has no built-in or default mechanism for working with Parquet data.
azure-blob azure-blob-storage azure-functions parquet parquet-data parquet-dotnet parquet-files parquet-tools
Last synced: 08 Mar 2026
https://github.com/ayushverma135/json-to-parquet-parser
Easily convert JSON data into Parquet format for efficient storage and analysis. Simplify data processing and analysis pipelines by converting JSON objects into optimized Parquet files.
json pandas parquet-files python
Last synced: 09 Jul 2025
https://github.com/alastairtree/crump
Python & CLI tool for getting data from files into a DB fast.
cdf-files csv-files parquet-files postgresql sqlite
Last synced: 04 Mar 2026
https://github.com/tee8z/noaa-oracle
NOAA data oracle that is queryable from the browser and can attest to events for a Bitcoin DLC in dlctix style
data duckdb-wasm noaa-weather parquet-files sql weather
Last synced: 17 Feb 2026
https://github.com/ostrokach/uniparc_xml_parser
UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).
bigquery bioinformatics csv-files parquet-files protein-domains protein-sequences
Last synced: 03 Jan 2026
https://github.com/rlesur/quarto-ojs-parquet-s3
A Quarto notebook requesting a parquet file stored in S3
minio parquet-files quarto s3-storage
Last synced: 25 Feb 2025
https://github.com/srking501/csc8101_coursework
A summative coursework for CSC8101 Engineering for AI
apache-parquet apache-spark azure-databricks big-data big-data-analytics big-data-processing data-science databri databricks-notebooks delta-file nyc-taxi-dataset parquet-files pyspark
Last synced: 12 Feb 2026
https://github.com/ffatahillah7/etl-python-joindataframe-topostgresql
ETL pipeline that joins two datasets (CSV and Parquet) and loads the transformed result into PostgreSQL
join-tables pandasql parquet-files postgresql python sql sqlalchemy sqlalchemy-python
Last synced: 14 May 2025
https://github.com/jhylin/ml1-1_small_mols_in_chembl
Polars dataframe library and logistic regression in scikit-learn (update)
logistic-regression machine-learning parquet-files polars-dataframe scikit-learn
Last synced: 03 Jan 2026
https://github.com/slatawa/csv_parquet
Project showing the integration of upstream files into your data lake. It looks at handling high-volume, customized data formats and converting them into Parquet.
Last synced: 10 Aug 2025
https://github.com/varsha-vraj/airport_parking_toolkit
This toolkit is designed to simulate and manage airport parking events. It provides a command-line interface (CLI) for managing vehicles, zones, and parking events. It includes full integration with PostgreSQL for data storage, SQL for advanced queries, and Apache Spark for big data batch processing of parquet logs.
big-data cli dataengineering hadoop java parquet-files poetry postgresql pyspark python3 spark sqlalchemy
Last synced: 11 Aug 2025
https://github.com/yo-mah-ya/file_creator
Create files in formats such as "orc", "parquet", "xlsx", "json", and so on with Python
orcfile pandas parquet parquet-files python3
Last synced: 14 Mar 2025
https://github.com/ahbiels/fegtec
FegTec is a fictitious company that wants to transfer Parquet files containing customer data from the AWS cloud to Google Cloud
aws bucket cloudfunctions data-engineer gcp pandas parquet-files python transfer-data
Last synced: 27 Feb 2025
https://github.com/munz0908/parqbridge
🌉 Export Laravel database tables to Apache Parquet files effortlessly, using minimal dependencies and a simple artisan command for quick data handling.
laravel laravel-package parquet parquet-files parquet-generator parquet-schema php powerbi python
Last synced: 03 Sep 2025
https://github.com/mattpopovich/dataframeioperformancetesting
Tests the speed and file size of reading and writing DataFrames to/from disk with different file and compression types
csv csv-format feather file-io hdf5 hdf5-format orc orc-format pandas pandas-dataframe pandas-python parquet parquet-files parquet-format pickle pickle-file python python3
Last synced: 07 Oct 2025
https://github.com/dotm87/triaina
Big data project: information storage in HDFS
big-data hadoop-hdfs parquet-files
Last synced: 27 Jan 2026
https://github.com/hrmeetsingh/parquetreader
Parquet reader code in Java
java parquet parquet-files parquet-tools parquet-viewer
Last synced: 19 Oct 2025
https://github.com/alipsa/jparq
JDBC driver for parquet files
jdbc jdbc-driver parquet parquet-files parquet-tools
Last synced: 25 Oct 2025
https://github.com/yoshinariyamanaka/file_creator
Create files in formats such as "orc", "parquet", "xlsx", "json", and so on with Python
orcfile pandas parquet parquet-files python3
Last synced: 29 Dec 2025