https://github.com/Barthandelous01/redocx

A blazing-fast C docx decoder

c docx microsoft-word

Host: GitHub
URL: https://github.com/Barthandelous01/redocx
Owner: Barthandelous01
License: bsd-3-clause
Created: 2020-03-02T16:27:05.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2021-01-03T21:37:53.000Z (over 4 years ago)
Last Synced: 2024-11-01T18:37:48.193Z (6 months ago)
Topics: c, docx, microsoft-word
Language: C
Size: 2.18 MB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-blazingly-fast - redocx - A blazing-fast C docx decoder (C)

README

# redocx
A blazing-fast `.docx` decoder

## Usage
`redocx` is super simple to use! Just run:
```bash
$ redocx -f path/to/input/file.docx [-o path/to/optional/output/file.txt]
```
## Installation
There are two dependencies for `redocx`: `libzip` and `libxml2`. Both of these libraries must also be registered with pkg-config. How those two things are installed varies based on your system. For example:

* macOS
```bash
brew install libxml2
brew install libzip
```

* Arch Linux
```bash
sudo pacman -S libxml2
sudo pacman -S libzip
```
Once those are installed, clone the repo, and move into the directory. Then follow the ritual:
```bash
./configure
make
sudo make install
```
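
If `./configure` cannot find one of the libraries, it can help to ask pkg-config directly what it sees. The sketch below is not part of `redocx`. It assumes the usual pkg-config module names, `libzip` and `libxml-2.0`, and the helper name `check_deps` is made up for illustration. Setting the `PKG_CONFIG` variable selects a different pkg-config binary.

```bash
#!/bin/sh
# check_deps MODULE...: report whether pkg-config can find each module.
# Returns 0 if every module was found, 1 otherwise.
# The module names passed below are assumptions; the configure script
# shipped with redocx may look the libraries up under other names.
check_deps() {
    missing=0
    for module in "$@"; do
        if ${PKG_CONFIG:-pkg-config} --exists "$module" 2>/dev/null; then
            echo "found:   $module"
        else
            echo "missing: $module"
            missing=1
        fi
    done
    return "$missing"
}

check_deps libzip libxml-2.0 || echo "Install or register the missing libraries, then rerun ./configure"
```

A library that is installed but reported as missing is usually a sign that its `.pc` file is not in a directory listed in `PKG_CONFIG_PATH`.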

## Inspiration
I have, like most developers, read the famous [netiquette article](http://www.catb.org/esr/faqs/smart-questions.html). While reading it, I was thinking, "Surely someone out there has made a Word document decoder for these hackers?! Why would they be annoyed, if they could just decode it?" So I got to searching. The best I was able to find was [this project](https://github.com/DecentM/undocx/blob/master/undocx) which, with all due respect to [the creator](https://github.com/DecentM), does not produce neat or readable output. Newlines, for example, are not carried over. So I set about making my own.
## Benchmarks
`redocx` lives up to its description as a "blazing-fast" decoder. For a small (~13 KB) Word document (around one average-length paragraph)... well, you can see the results yourself.

For a larger document, it takes slightly longer, but...

For a huge novel (115 KB):

It's faster than anything else out there for decoding text from a `.docx` archive.
It may also be interesting to note that I wrote a Rust version of this program. However, it was abandoned because `redocx` performed 13.55 times better on average.
Thanks to [sharkdp](https://github.com/sharkdp) for the [utility](https://github.com/sharkdp/hyperfine) used in the benchmarking.
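
To time `redocx` on your own documents, something like the following could be used. It is a rough sketch and not necessarily the setup used for the numbers above: the helper name `bench_docx` and the warmup count are made up, and it assumes `redocx` writes to standard output when `-o` is omitted.

```bash
#!/bin/sh
# bench_docx FILE: time redocx on FILE with hyperfine.
# Returns 1 if FILE does not exist or either tool is not on PATH.
# Note: the path is wrapped in single quotes, so it must not contain one itself.
bench_docx() {
    doc=$1
    if [ ! -f "$doc" ]; then
        echo "no such file: $doc" >&2
        return 1
    fi
    if ! command -v hyperfine >/dev/null 2>&1 || ! command -v redocx >/dev/null 2>&1; then
        echo "hyperfine and redocx must both be on PATH" >&2
        return 1
    fi
    # --warmup runs the command a few times first so disk caches are warm.
    hyperfine --warmup 3 "redocx -f '$doc'"
}
```

Call it as `bench_docx path/to/input/file.docx`.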
