https://github.com/anergictcell/s3reader

A Rust library for random access to S3 objects

[![Build](https://github.com/anergictcell/s3reader/actions/workflows/build.yml/badge.svg)](https://github.com/anergictcell/s3reader/actions/workflows/build.yml)
[![crates.io](https://img.shields.io/crates/v/s3reader?color=#3fb911)](https://crates.io/crates/s3reader)
[![doc-rs](https://img.shields.io/docsrs/s3reader/latest)](https://docs.rs/s3reader/latest/s3reader/)

# S3Reader

A `Rust` library to read from S3 objects as if they were files on a local filesystem (almost). `S3Reader` implements both the `Read` and `Seek` traits, so you can place the cursor anywhere within an S3 object and read from any byte offset. This gives you random access to the bytes within S3 objects.

## Usage
Add this to your `Cargo.toml`:

```text
[dependencies]
s3reader = "1.0.0"
```

### Use `BufRead` to read line by line
```rust
use std::io::{BufRead, BufReader};

use s3reader::S3Reader;
use s3reader::S3ObjectUri;

fn read_lines_manually() -> std::io::Result<()> {
    let uri = S3ObjectUri::new("s3://my-bucket/path/to/huge/file").unwrap();
    let s3obj = S3Reader::open(uri).unwrap();

    // Buffer the reader so that each line does not need its own request
    let mut reader = BufReader::new(s3obj);

    // `read_line` appends to the string and keeps the trailing newline
    let mut line = String::new();
    let len = reader.read_line(&mut line).unwrap();
    println!("The first line >>{line}<< is {len} bytes long");

    let mut line2 = String::new();
    let len = reader.read_line(&mut line2).unwrap();
    println!("The next line >>{line2}<< is {len} bytes long");

    Ok(())
}

fn use_line_iterator() -> std::io::Result<()> {
    let uri = S3ObjectUri::new("s3://my-bucket/path/to/huge/file").unwrap();
    let s3obj = S3Reader::open(uri).unwrap();

    let reader = BufReader::new(s3obj);

    // `lines()` yields each line without its trailing newline
    let mut count = 0;
    for line in reader.lines() {
        println!("{}", line.unwrap());
        count += 1;
    }
    println!("Read {count} lines");

    Ok(())
}
```

### Use `Seek` to jump to positions
```rust
use std::io::{Read, Seek, SeekFrom};

use s3reader::S3Reader;
use s3reader::S3ObjectUri;

fn jump_within_file() -> std::io::Result<()> {
    let uri = S3ObjectUri::new("s3://my-bucket/path/to/huge/file").unwrap();
    let mut reader = S3Reader::open(uri).unwrap();

    // Total size of the S3 object in bytes
    let len = reader.len();

    // Seeking to `len` from the start is the same as seeking to the end
    let cursor_1 = reader.seek(SeekFrom::Start(len as u64)).unwrap();
    let cursor_2 = reader.seek(SeekFrom::End(0)).unwrap();
    assert_eq!(cursor_1, cursor_2);

    // Jump to byte offset 10 and read the next 100 bytes
    // (this assumes the object holds at least 110 bytes)
    reader.seek(SeekFrom::Start(10)).unwrap();
    let mut buf = [0; 100];
    let bytes = reader.read(&mut buf).unwrap();
    assert_eq!(buf.len(), 100);
    assert_eq!(bytes, 100);

    Ok(())
}
```
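
The seek-then-read pattern above can be wrapped in a small helper. The following is only a sketch: `read_range` is a hypothetical function, not part of `s3reader`, and it is written against the generic `Read + Seek` traits so it should work with any reader implementing them. It uses `read_exact` because `Read::read` is, in general, allowed to return fewer bytes than the buffer holds.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads exactly `len` bytes starting at byte `offset`.
/// Hypothetical helper for illustration, not part of the s3reader API.
fn read_range<R: Read + Seek>(reader: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    // Move the cursor to the requested absolute position
    reader.seek(SeekFrom::Start(offset))?;

    let mut buf = vec![0u8; len];
    // `read_exact` keeps reading until the buffer is full and fails with
    // `UnexpectedEof` if the source ends before that
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

Because the helper only depends on the standard traits, it can be tried locally with a `std::io::Cursor` over an in-memory buffer before pointing it at an S3 object.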

## Q/A
**Does this library really provide random access to S3 objects?**
Yes, according to this [StackOverflow answer](https://stackoverflow.com/questions/60176997/does-aws-s3-getobject-provide-random-access). S3 `GetObject` accepts an HTTP `Range` header, so only the requested bytes are transferred.

**Are the reads sync or async?**
The S3 SDK mostly uses async operations, but the `Read` and `Seek` traits require sync methods. Because of this, I'm using a blocking tokio runtime to wrap the async calls. This might not be the best solution, but it works well for me. Any suggestions for improvement are very welcome.

**Why is this useful?**
It depends on your use case. If you need to access arbitrary bytes in the middle of large files/S3 objects, this library is useful. For example, you can use it to stream mp4 files. It's also quite useful for some bioinformatics applications, where you might have a huge reference genome of several GB but only need the data for a few genes, amounting to only a few MB.
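
To make the genome use case more concrete, here is a minimal sketch of fetching single records from a large object by byte offset. Everything in it is an assumption for illustration: the `Span` type, the `fetch_record` function and the idea of a precomputed index mapping record names to byte spans are hypothetical and not part of `s3reader`. The function is generic over `Read + Seek`, so an in-memory `Cursor` can stand in for an `S3Reader` while testing.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom};

/// Byte span of one record inside a large object (hypothetical index entry).
struct Span {
    offset: u64,
    len: usize,
}

/// Looks up `name` in the index and reads only that record's bytes.
/// Returns `Ok(None)` if the index has no entry for `name`.
fn fetch_record<R: Read + Seek>(
    reader: &mut R,
    index: &HashMap<String, Span>,
    name: &str,
) -> io::Result<Option<Vec<u8>>> {
    let span = match index.get(name) {
        Some(span) => span,
        None => return Ok(None),
    };

    // Jump straight to the record; nothing before it is read
    reader.seek(SeekFrom::Start(span.offset))?;
    let mut buf = vec![0u8; span.len];
    reader.read_exact(&mut buf)?;
    Ok(Some(buf))
}
```

With such an index, only the few MB that are actually needed get read, regardless of how large the whole object is.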