https://github.com/semvis123/go-catdoc
Go-catdoc, get text and metadata from .doc files.
catdoc doc extraction go golang imhex-pattern metadata msdoc text
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/semvis123/go-catdoc
- Owner: semvis123
- License: mit
- Created: 2023-08-07T20:43:29.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-09T18:08:22.000Z (over 2 years ago)
- Last Synced: 2025-01-10T17:32:17.654Z (about 1 year ago)
- Topics: catdoc, doc, extraction, go, golang, imhex-pattern, metadata, msdoc, text
- Language: C
- Homepage:
- Size: 300 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
## Go-catdoc, get text and metadata from .doc files.
[GoDoc](https://godoc.org/github.com/semvis123/go-catdoc)
[Go workflow](https://github.com/semvis123/go-catdoc/actions/workflows/go.yml)
Uses wazero to run catdoc as WebAssembly in Go.
The catdoc source is slightly modified to support reading metadata from `.doc` files.
The `msdoc.hexpat` file is an ImHex pattern file that can parse the `summaryinformation` OLE object inside a `.doc` file.
To compile the WebAssembly binary, go to `./catdoc/src/` and run `make catdoc-wasm`.
To run the tests, run `go test ./...`.
Usage:
```go
// Both calls return an error; check err after each one in real code.
f, err := os.Open("test.doc")
text, err := gocatdoc.GetTextFromFile(f)
```