https://github.com/axsaucedo/rust-io-file-benchmark
- Host: GitHub
- URL: https://github.com/axsaucedo/rust-io-file-benchmark
- Owner: axsaucedo
- Created: 2019-03-05T14:26:15.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-03-10T11:01:11.000Z (almost 6 years ago)
- Last Synced: 2024-11-06T13:01:46.825Z (3 months ago)
- Language: Rust
- Size: 1.91 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-code-for-gamedev - rust-io-file-benchmark - a short report on the performance metrics obtained processing large files with a small rust/python script. (IO / benchmark)
README
# Rust benchmarking on Large File IO
This is a short report on the performance metrics obtained processing large files with a small rust/python script.
In this case, "large" means files that won't easily fit in memory (e.g. > 100GB) and therefore require streaming / buffers.
## Experiment overview
The experiment is as follows:
* 2GB text file containing text information about objects
* Every 3 consecutive lines hold the information for one object (i.e. 1 line = one attribute)
* Objects are separated by one blank line
## Objective
The objective is to read the file, iterate through its lines and write the results to CSV/TSV.
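As a rough illustration of this objective, the following is a minimal sketch and not the code from this repository. The function names `objects_to_rows` and `convert_file` are hypothetical, and it assumes that each output field is the full attribute line, with no quoting or escaping of commas.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// Streams `input` line by line, joins the lines of each object with ", "
/// and writes one row per object. A blank line ends the current object.
/// Returns the number of rows written.
pub fn objects_to_rows<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<u64> {
    let mut row = String::new(); // attributes collected for the current object
    let mut rows_written = 0u64;

    for line in input.lines() {
        let line = line?;
        if line.trim().is_empty() {
            // Blank separator: write out the current object, if there is one.
            if !row.is_empty() {
                writeln!(output, "{}", row)?;
                rows_written += 1;
                row.clear();
            }
        } else {
            if !row.is_empty() {
                row.push_str(", ");
            }
            row.push_str(line.trim_end());
        }
    }
    // The last object may not be followed by a blank line.
    if !row.is_empty() {
        writeln!(output, "{}", row)?;
        rows_written += 1;
    }
    output.flush()?;
    Ok(rows_written)
}

/// Hypothetical wrapper that connects the function to buffered file handles.
pub fn convert_file(src: &Path, dst: &Path) -> io::Result<u64> {
    let reader = BufReader::new(File::open(src)?);
    let writer = BufWriter::new(File::create(dst)?);
    objects_to_rows(reader, writer)
}
```

Accepting any `BufRead` and `Write` keeps the logic testable on in-memory data, while `convert_file` shows how it could be wired to real files.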
## Example
#### input file example
OBJECT 1 ATTR 1: CONTENT
OBJECT 1 ATTR 2: CONTENT
OBJECT 1 ATTR 3: CONTENT

OBJECT 2 ATTR 1: CONTENT
OBJECT 2 ATTR 2: CONTENT
OBJECT 2 ATTR 3: CONTENT

...etc
#### expected output file example
OBJECT 1 ATTR 1, OBJECT 1 ATTR 2, OBJECT 1 ATTR 3
OBJECT 2 ATTR 1, OBJECT 2 ATTR 2, OBJECT 2 ATTR 3
... etc
# Improvements
The Rust code currently has clear room for optimisation, as it performs several string operations.
Ideally the files would be processed in Rust as raw bytes (u8), which should accelerate the processing, but unfortunately BufReader didn't seem to provide functionality to read the files as bytes directly.
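For what it is worth, the standard library's `BufRead` trait (which `BufReader` implements) has a `read_until` method that reads raw bytes into a `Vec<u8>` up to a delimiter byte. The sketch below uses it to group lines without creating any `String`. The function name `objects_to_rows_bytes` is hypothetical, the ", " separator is an assumption based on the expected output example, and whether this is actually faster would need to be measured.

```rust
use std::io::{self, BufRead, Write};

/// Groups lines into rows while working on raw bytes only:
/// `read_until` fills a reusable `Vec<u8>`, so no UTF-8 validation
/// and no per-line `String` allocation takes place.
pub fn objects_to_rows_bytes<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut line: Vec<u8> = Vec::new(); // reused for every line
    let mut fields_in_row = 0usize;

    loop {
        line.clear();
        // Reads up to and including b'\n'; returns 0 at end of input.
        if input.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        // Strip the trailing "\n" or "\r\n".
        while line.ends_with(b"\n") || line.ends_with(b"\r") {
            line.pop();
        }
        if line.is_empty() {
            // Blank separator line: end the current row.
            if fields_in_row > 0 {
                output.write_all(b"\n")?;
                fields_in_row = 0;
            }
        } else {
            if fields_in_row > 0 {
                output.write_all(b", ")?;
            }
            output.write_all(&line)?;
            fields_in_row += 1;
        }
    }
    // The last object may not be followed by a blank line.
    if fields_in_row > 0 {
        output.write_all(b"\n")?;
    }
    output.flush()
}
```

In practice `input` would be a `BufReader` around the source `File` and `output` a `BufWriter` around the destination, so that the many small `write_all` calls are batched.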
# Results
The results are provided below.
## Python:
Simple Python implementation without any explicit buffering, using the native Python file IO readline / write.
real 2m16.087s
user 1m4.397s
sys 0m4.352s
## Rust 1.33 main.rs:
Rust implementation using BufReader and BufWriter, converting each line to a string, appending, and writing the result as bytes.
real 7m28.602s
user 7m19.379s
sys 0m5.094s
## Rust 1.33 main-vec.rs:
Rust implementation using BufReader and BufWriter, and using a vector to attempt a single string concatenation.
real 8m35.463s
user 8m24.227s
sys 0m5.918s
## Rust 1.33 main-copy.rs:
Rust implementation using plain File reading in bytes and copying it to another location without performing any processing.
real 41m12.918s
user 22m55.845s
sys 18m15.949s
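The large sys share in this last run would be consistent with many small reads and writes each turning into a system call, though that is an assumption, as main-copy.rs is not shown here. For comparison, a buffered copy could be sketched as follows. The names `copy_stream` and `copy_file_buffered` and the 1 MiB buffer size are illustrative choices, not taken from the repository.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Copies everything from `src` to `dst` and returns the number of bytes copied.
/// `io::copy` moves data in chunks rather than one byte at a time.
pub fn copy_stream<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<u64> {
    let copied = io::copy(&mut src, &mut dst)?;
    dst.flush()?;
    Ok(copied)
}

/// Hypothetical file-to-file wrapper with explicit 1 MiB buffers on both sides.
pub fn copy_file_buffered(src: &Path, dst: &Path) -> io::Result<u64> {
    let reader = BufReader::with_capacity(1 << 20, File::open(src)?);
    let writer = BufWriter::with_capacity(1 << 20, File::create(dst)?);
    copy_stream(reader, writer)
}
```

Timing this against the figures above would show whether buffering accounts for the difference.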