https://github.com/nlevitt/warcio-rs
- Host: GitHub
- URL: https://github.com/nlevitt/warcio-rs
- Owner: nlevitt
- Created: 2023-01-07T06:04:39.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-03-05T09:00:07.000Z (almost 3 years ago)
- Last Synced: 2025-03-30T01:41:29.460Z (10 months ago)
- Language: Rust
- Size: 78.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# warcio-rs
## WARC library for Rust
warcio-rs is a Rust library for reading and writing [WARC 1.1][1] files. Input and output are streamed: the WARC record
body is a [`Read`][2], both when reading and when writing WARCs.
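
To illustrate what a streamed body means in practice, here is a minimal sketch that uses only the standard library. It drains any `Read` into a writer through a fixed-size buffer, so the whole body never has to sit in memory. The name `stream_body` is hypothetical and is not part of warcio-rs.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper (not part of warcio-rs): copies any `Read` body into
/// `out` through a fixed-size buffer and returns the number of bytes copied.
fn stream_body<R: Read, W: Write>(mut body: R, mut out: W) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match body.read(&mut buf) {
            Ok(0) => break, // end of body
            Ok(n) => n,
            // An interrupted read is not fatal; try again.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        out.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

Since `&mut R` implements `Read` whenever `R` does, a helper like this can borrow a body instead of consuming it. `std::io::copy` does the same job; the explicit loop is spelled out here only to show the chunked reads.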
## Sample code
See [examples][3] for more.
### Read a WARC
```rust
use std::fs::File;
use std::io::{BufRead, Read};
use std::str::from_utf8;
use warcio::{LendingIterator as _, WarcReader, WarcRecordHeaderName};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let f = File::open("example.warc.gz")?;
    let mut warc_reader = WarcReader::<Box<dyn BufRead>>::try_from(f)?;
    while let Some(record) = warc_reader.next()? {
        // more convenient api to come
        // Scan the record headers for Content-Type.
        let mut content_type: Option<&[u8]> = None;
        for header in &record.headers {
            match header.name {
                WarcRecordHeaderName::ContentType => content_type = Some(&header.value),
                _ => {}
            }
        }

        // Read up to 20 bytes from the start of the streamed body.
        let mut buf: [u8; 20] = [0; 20];
        let n = record.payload.read(&mut buf)?;
        println!(
            "content_type={:?} start of body: {:?}",
            from_utf8(content_type.unwrap_or(b""))?,
            &buf[0..n]
        );
    }
    Ok(())
}
```
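
A single `read` call may return fewer bytes than the buffer holds even when more data follows, so the 20-byte peek above is not guaranteed to fill its buffer. As a minimal sketch using only the standard library, a prefix can be collected with `take` and `read_to_end`. The helper name `read_prefix` is hypothetical and is not part of warcio-rs.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of warcio-rs): reads up to `limit` bytes
/// from `body`, continuing until the limit is reached or the body ends.
fn read_prefix<R: Read>(body: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut prefix = Vec::new();
    // `take` caps the reader at `limit` bytes; `read_to_end` loops over
    // short reads until that cap or end of input is hit.
    body.take(limit).read_to_end(&mut prefix)?;
    Ok(prefix)
}
```

Assuming `record.payload` implements `Read`, as the call to `read` in the example above suggests, it could be passed as `read_prefix(&mut record.payload, 20)?`.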
### Write a WARC
```rust
use chrono::Utc;
use std::fs::File;
use std::io::BufWriter;
use warcio::{WarcRecord, WarcRecordType, WarcRecordWrite as _, WarcWriter};

fn main() -> Result<(), std::io::Error> {
    let f = File::create("example.warc.gz")?;
    let mut warc_writer = WarcWriter::new(BufWriter::new(f), true);

    let payload = b"format: WARC File Format 1.1\r\n";
    let record: WarcRecord<&[u8]> = WarcRecord::builder()
        .generate_record_id()
        .warc_type(WarcRecordType::Warcinfo)
        .warc_date(Utc::now())
        .content_type(b"text/plain")
        .content_length(payload.len())
        .body(&payload[..])
        .build();
    warc_writer.write_record(record)?;

    Ok(())
}
```
[1]: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/
[2]: https://doc.rust-lang.org/std/io/trait.Read.html
[3]: https://github.com/nlevitt/warcio-rs/tree/master/examples