https://github.com/anvie/dotext

Simple Document File Text Extraction Library for Rust

Document File Text Extractor
=============================

[![Build Status](https://travis-ci.org/anvie/dotext.svg?branch=master)](https://travis-ci.org/anvie/dotext)
[![Build status](https://ci.appveyor.com/api/projects/status/rghm59ie4ax9655t?svg=true)](https://ci.appveyor.com/project/anvie/dotext)
[![Crates.io](https://img.shields.io/crates/v/dotext.svg)](https://crates.io/crates/dotext)

A simple Rust library for extracting readable text from specific document formats, such as Word documents (docx).
Only the formats listed below are currently supported; support for other formats is planned.

Supported Documents
-------------------------

- [x] Microsoft Word (docx)
- [x] Microsoft Excel (xlsx)
- [x] Microsoft PowerPoint (pptx)
- [x] OpenOffice Writer (odt)
- [x] OpenOffice Spreadsheet (ods)
- [x] OpenDocument Presentation (odp)
- [ ] PDF
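
The checklist above can be turned into a small extension check that runs before a file is handed to an extractor. The following is only a sketch using the standard library: `DocFormat` and `detect_format` are hypothetical names introduced here for illustration and are not part of dotext.

```rust
use std::path::Path;

/// Formats ticked in the checklist above.
/// Hypothetical enum for illustration; not part of dotext.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocFormat {
    Docx,
    Xlsx,
    Pptx,
    Odt,
    Ods,
    Odp,
}

/// Guess the document format from the file extension (case-insensitive).
/// Returns `None` for anything outside the supported list, including PDF.
pub fn detect_format<P: AsRef<Path>>(path: P) -> Option<DocFormat> {
    // A missing or non-UTF-8 extension short-circuits to `None`.
    let ext = path.as_ref().extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "docx" => Some(DocFormat::Docx),
        "xlsx" => Some(DocFormat::Xlsx),
        "pptx" => Some(DocFormat::Pptx),
        "odt" => Some(DocFormat::Odt),
        "ods" => Some(DocFormat::Ods),
        "odp" => Some(DocFormat::Odp),
        _ => None,
    }
}
```

A caller could match on the returned value to choose an extractor and report unsupported files early. Note that this inspects only the file name, not the file contents.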

Usage
------

```rust
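// Assumes the crate's glob import exposes `Docx` and its `open` method.
use dotext::*;
use std::io::Read; // provides `read_to_string`

fn main() {
    // Panics if the file is missing or cannot be opened as a docx.
    let mut file = Docx::open("samples/sample.docx").unwrap();
    let mut isi = String::new(); // receives the extracted text
    file.read_to_string(&mut isi).unwrap();
    println!("CONTENT:");
    println!("----------BEGIN----------");
    println!("{}", isi);
    println!("----------EOF----------");
}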
```
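
The usage snippet calls `read_to_string` on the opened document, which suggests the handle can be treated as a `std::io::Read`. Assuming that holds, the read-and-frame step can be factored into a helper that works with any reader and returns errors instead of discarding them. `framed_text` is a hypothetical name used for this sketch and is not part of dotext.

```rust
use std::io::{self, Read};

/// Read everything from `reader` and wrap it in the BEGIN/EOF markers
/// used in the usage snippet. Hypothetical helper; not part of dotext.
pub fn framed_text<R: Read>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    // Propagate I/O and invalid-UTF-8 errors to the caller.
    reader.read_to_string(&mut text)?;
    Ok(format!(
        "CONTENT:\n----------BEGIN----------\n{}\n----------EOF----------",
        text
    ))
}
```

Because the helper is generic over `Read`, it can be exercised with an in-memory `std::io::Cursor` without needing a sample document on disk.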

Test
-----

```bash
$ cargo test
```

Or run the example:

```bash
$ cargo run --example readdocx data/sample.docx
```

[] Robin Sy.