An open API service indexing awesome lists of open source software.

https://github.com/sunchao/parquet-rs

Apache Parquet implementation in Rust
https://github.com/sunchao/parquet-rs

hadoop parquet rust

Last synced: 3 months ago
JSON representation

Apache Parquet implementation in Rust

Awesome Lists containing this project

README

          

# parquet-rs

[![Build Status](https://travis-ci.org/sunchao/parquet-rs.svg?branch=master)](https://travis-ci.org/sunchao/parquet-rs)
[![Coverage Status](https://coveralls.io/repos/github/sunchao/parquet-rs/badge.svg?branch=master)](https://coveralls.io/github/sunchao/parquet-rs?branch=master)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![](http://meritbadge.herokuapp.com/parquet)](https://crates.io/crates/parquet)
[![API docs](https://img.shields.io/badge/docs-0.4.1-blue.svg)](https://sunchao.github.io/parquet-rs/0.4.2/parquet/)
[![Master API docs](https://img.shields.io/badge/docs-master-green.svg)](https://sunchao.github.io/parquet-rs/master/parquet/)

An [Apache Parquet](https://parquet.apache.org/) implementation in Rust.

**NOTE: this project has merged into [Apache Arrow](https://arrow.apache.org/), and development will continue there.** **To file an issue or pull request, please file a [JIRA](https://issues.apache.org/jira/projects/ARROW/issues) in the Arrow project.**

## Usage
Add this to your Cargo.toml:
```toml
[dependencies]
parquet = "0.4"
```

and this to your crate root:
```rust
extern crate parquet;
```

Example usage of reading data:
```rust
use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};

let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
println!("{}", record);
}
```
See [crate documentation](https://sunchao.github.io/parquet-rs/master) on available API.

## Supported Parquet Version
- Parquet-format 2.4.0

To update Parquet format to a newer version, check if [parquet-format](https://github.com/sunchao/parquet-format-rs)
version is available. Then simply update version of `parquet-format` crate in Cargo.toml.

## Features
- [X] All encodings supported
- [X] All compression codecs supported
- [X] Read support
- [X] Primitive column value readers
- [X] Row record reader
- [ ] Arrow record reader
- [X] Statistics support
- [X] Write support
- [X] Primitive column value writers
- [ ] Row record writer
- [ ] Arrow record writer
- [ ] Predicate pushdown
- [ ] Parquet format 2.5 support
- [ ] HDFS support

## Requirements
- Rust nightly

See [Working with nightly Rust](https://github.com/rust-lang-nursery/rustup.rs/blob/master/README.md#working-with-nightly-rust)
to install nightly toolchain and set it as default.

## Build
Run `cargo build` or `cargo build --release` to build in release mode.
Some features take advantage of SSE4.2 instructions, which can be
enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the
`cargo build` command.

## Test
Run `cargo test` for unit tests.

## Binaries
The following binaries are provided (use `cargo install` to install them):
- **parquet-schema** for printing Parquet file schema and metadata.
`Usage: parquet-schema [verbose]`, where `file-path` is the path to a Parquet file,
and optional `verbose` is the boolean flag that allows to print full metadata or schema only
(when not specified only schema will be printed).

- **parquet-read** for reading records from a Parquet file.
`Usage: parquet-read [num-records]`, where `file-path` is the path to a Parquet file,
and `num-records` is the number of records to read from a file (when not specified all records will
be printed).

If you see `Library not loaded` error, please make sure `LD_LIBRARY_PATH` is set properly:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
```

## Benchmarks
Run `cargo bench` for benchmarks.

## Docs
To build documentation, run `cargo doc --no-deps`.
To compile and view in the browser, run `cargo doc --no-deps --open`.

## License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.