https://github.com/sheetjs/js-word
:black_nib: Word Processing Document Library
- Host: GitHub
- URL: https://github.com/sheetjs/js-word
- Owner: SheetJS
- License: apache-2.0
- Created: 2012-12-30T06:26:45.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2022-05-21T09:31:33.000Z (over 2 years ago)
- Last Synced: 2024-10-29T22:37:40.686Z (3 months ago)
- Topics: data, doc, docx, word, xml
- Language: Rich Text Format
- Homepage: http://wordjs.com
- Size: 76.8 MB
- Stars: 1,309
- Watchers: 49
- Forks: 206
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# [SheetJS js-word](http://wordjs.com)
Parser and writer for various word processing doc formats. Pure-JS cleanroom
implementation from official specifications, related documents, and test files.
Emphasis on parsing and writing robustness, cross-format feature compatibility
with a unified JS representation, and maximal browser compatibility.

## Test Files

Test files should be placed in the `test_files` directory, in the appropriate
subdirectory for the filetype. For example, DOCX files should be placed in
`test_files\docx\wordjs` and RTF files should be in `test_files\rtf\wordjs`.

Every test file should be accompanied by a plain text `.txt` representation
whose filename is the original filename with `.txt` appended. For example, the
DOCX file `test_files\docx\wordjs\foo.docx` pairs with the plain text file
`test_files\docx\wordjs\foo.docx.txt`.

**Generating Baselines using Word for Windows**

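Before generating baselines, it can help to see which test files still lack a `.txt` companion. The following is a minimal Node.js sketch of the naming rule, not part of the repository: `baselinePath` and `missingBaselines` are hypothetical helper names, and the sketch assumes that any file ending in `.txt` or `.skip` is a companion file rather than a test input.

```javascript
const fs = require("fs");
const path = require("path");

// Baseline path: the original filename with ".txt" appended
// (foo.docx -> foo.docx.txt).
function baselinePath(testFile) {
  return testFile + ".txt";
}

// Hypothetical helper: list the test files in `dir` that have neither a
// `.txt` baseline nor a `.skip` marker next to them.
// Assumption: files ending in `.txt` or `.skip` are companions, not inputs.
function missingBaselines(dir) {
  const names = fs.readdirSync(dir);
  const present = new Set(names);
  return names
    .filter((name) => !name.endsWith(".txt") && !name.endsWith(".skip"))
    .filter((name) => fs.statSync(path.join(dir, name)).isFile())
    .filter((name) => !present.has(baselinePath(name)))
    .filter((name) => !present.has(name + ".skip"))
    .sort();
}
```

For example, `missingBaselines("test_files/docx/wordjs")` would list the documents in that folder that have neither a baseline nor a skip marker.
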
0. Ensure you have PowerShell version 7.0 or greater
1. Run `Set-ExecutionPolicy RemoteSigned` OR `Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass` in an administrator PowerShell 7.0 session
2. Place the PowerShell script (`generate_txt.ps1`) in the root of the repo
3. Run `.\generate_txt.ps1 .\test_files\EXT_TYPE\FOLDER` (ex. `.\generate_txt.ps1 .\test_files\docx\apachepoi`)

On first run, if a test file does not have an accompanying `.txt` file, the
script will open Word and save the file as plaintext. Word will rapidly open
and close during this process.

The script will not attempt to open Word or try to generate `.txt` files if they
already exist. After a clean run, Word should not open on future runs.

The script will halt for documents that are broken in certain ways. Word will
display a prompt, stalling the automated process. Those documents can be
skipped by creating a `.skip` file as described below.

**Skipping Files**

The script will look for files with the `.skip` extension and skip processing
the base file. For example, if `test_files\docx\wordjs\Hello.docx.skip` exists,
the script will not attempt to process `test_files\docx\wordjs\Hello.docx`.

When the UI blocks (for example, on a VBA error with `ThisDocument`), the
corresponding `.skip` file should be created manually. The script merely tests
if the file exists, so the content is immaterial and a single letter suffices.

**Generating `.skip` files**

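When a marker has to be created manually, as for documents that block the Word UI, a small Node.js helper can do it instead of creating the file by hand. This is a minimal sketch rather than repository code: `skipPath`, `markSkipped`, and `isSkipped` are hypothetical names, and the only behavior assumed is that the existence of the `.skip` file is what matters, not its content.

```javascript
const fs = require("fs");

// Marker path: the test file's name with ".skip" appended
// (Hello.docx -> Hello.docx.skip).
function skipPath(testFile) {
  return testFile + ".skip";
}

// Hypothetical helper: create the marker if it is missing and return its path.
function markSkipped(testFile) {
  const marker = skipPath(testFile);
  if (!fs.existsSync(marker)) {
    // The content is never read; a single letter suffices.
    fs.writeFileSync(marker, "s");
  }
  return marker;
}

// True when a `.skip` marker exists next to the test file.
function isSkipped(testFile) {
  return fs.existsSync(skipPath(testFile));
}
```
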
The PowerShell script will attempt to open password-protected documents using
the password "WordJS". The script will not halt, but it will not generate a text
file. Instead, it writes a message to the terminal indicating the skip and
generates a `.skip` file when such a document is encountered.

## License

Please consult the attached LICENSE file for details. All rights not explicitly
granted by the Apache 2.0 License are reserved by the Original Author.

## References

- OSP-covered Specifications:
  - `MS-CFB`: Compound File Binary File Format
  - `MS-DOC`: Word (.doc) Binary File Format
  - `RTF`: Rich Text Format
- ISO/IEC 29500:2012(E) "Information technology — Document description and processing languages — Office Open XML File Formats"
- Open Document Format for Office Applications Version 1.3 (25 December 2019)