https://github.com/sipemu/excel-to-parquet
A command-line tool written in Rust that converts Excel (XLSX) files to Parquet format.
- Host: GitHub
- URL: https://github.com/sipemu/excel-to-parquet
- Owner: sipemu
- Created: 2024-12-13T09:15:34.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-12-14T20:54:29.000Z (5 months ago)
- Last Synced: 2025-02-08T05:23:31.211Z (3 months ago)
- Topics: excel, parquet, rust
- Language: Rust
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Excel to Parquet Converter
A command-line tool written in Rust that converts Excel (XLSX) files to Parquet format. This tool is designed to be simple and efficient, making it easy to convert Excel data for use with modern data tools.
## Features
- Convert XLSX files to Parquet format
- Handle empty column names (auto-generates names like Field_0, Field_1, etc.)
- Option to skip leading rows, for files whose header is not in the first row
- Specify custom output directory
- Simple command-line interface
- No type inference yet: all data is stored as strings
- Only the first sheet is converted; selecting a specific sheet is not yet supported

## Installation
### From Source
Requires the Rust toolchain. Visit [rustup.rs](https://rustup.rs/) to install it.
```bash
# Clone the repository
git clone https://github.com/sipemu/excel-to-parquet
cd excel-to-parquet

# Build and install
cargo install --path .
```

## Usage
Basic usage:
```bash
excel-to-parquet input.xlsx
```

Skip first N rows:
```bash
excel-to-parquet -s 2 input.xlsx
```

Specify output directory:
```bash
excel-to-parquet -o /path/to/output input.xlsx
```

### Command Line Options
```
USAGE:
    excel-to-parquet [OPTIONS]

ARGS:
    Path to the input Excel file

OPTIONS:
    -h, --help           Print help information
    -s, --skip-rows      Number of rows to skip [default: 0]
    -o, --output-path    Output directory [default: .]
    -V, --version        Print version information
```

## Output
The output Parquet file will:
- Have the same name as the input file (with .parquet extension)
- Be saved in the specified output directory (or current directory if not specified)
- Store all values from the Excel file as strings

## Building from Source
```bash
# Debug build
cargo build

# Release build
cargo build --release
```

## Bash Script (Linux Only)
The repository includes a Bash script for bulk-converting Excel files to Parquet format. The `excel-to-parquet` executable must be in the same directory as the script.
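The script's contents are not reproduced in this README. As a rough illustration only, a bulk conversion loop of this kind might look like the sketch below. It is not the repository's `convert_excel_to_parquet.sh`: the function name `convert_all` and the `CONVERTER` variable are hypothetical names introduced here, and the sketch relies only on the `-s` and `-o` options documented above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the repository's convert_excel_to_parquet.sh.
# CONVERTER is an assumed override so the loop can be dry-run with a stub command.
CONVERTER="${CONVERTER:-./excel-to-parquet}"

# convert_all SOURCE_DIR TARGET_DIR [SKIP_ROWS]
# Runs the converter once per .xlsx file found directly in SOURCE_DIR.
convert_all() {
    local src="$1" dst="$2" skip="${3:-0}"
    mkdir -p "$dst"
    local f
    for f in "$src"/*.xlsx; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$f" ] || continue
        "$CONVERTER" -s "$skip" -o "$dst" "$f"
    done
}
```

Calling `convert_all ./excel_files ./parquet_files 2` in this sketch would invoke the converter for each `.xlsx` file with two rows skipped. The repository's own script is invoked as follows: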
```bash
# Convert all Excel files, skip 2 rows
./convert_excel_to_parquet.sh -s ./excel_files -t ./parquet_files -r 2

# Convert all Excel files, no rows skipped
./convert_excel_to_parquet.sh -s ./excel_files -t ./parquet_files

# Show help
./convert_excel_to_parquet.sh -h
```

## License
This project is licensed under the MIT License - see the LICENSE file for details.