https://github.com/raineorshine/wordsoap
Clean up dirty HTML output from Microsoft Word
- Host: GitHub
- URL: https://github.com/raineorshine/wordsoap
- Owner: raineorshine
- License: isc
- Created: 2014-12-10T11:03:12.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2015-06-04T03:47:42.000Z (over 10 years ago)
- Last Synced: 2025-03-10T15:09:33.494Z (7 months ago)
- Language: HTML
- Homepage:
- Size: 305 KB
- Stars: 14
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: license
# wordsoap
[Build Status](https://travis-ci.org/metaraine/wordsoap)
[npm version](http://badge.fury.io/js/wordsoap)

> Clean up dirty HTML output from Microsoft Word
## Usage
### command line
```sh
$ npm install -g wordsoap
$ cat msword_garbage.html | wordsoap
```

### module

```sh
$ npm install --save wordsoap
```

```js
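var wordsoap = require('wordsoap')

// illustrative Word-style input (assumed sample; any Word-exported HTML string works)
var dirty = "<p class=\"MsoNormal\">Text</p>"
var clean = wordsoap(dirty) // cleaned HTML containing "Text"
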
// access individual regex strings
wordsoap.regexes.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>

// access individual regexes compiled with 'gi' flags
wordsoap.regexesCompiled.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>
```

## License
ISC © [Raine Lourie](https://github.com/metaraine)
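
## Example: compiling a regex string

The `regexes` and `regexesCompiled` properties shown under Usage pair a pattern string with a version compiled using the `gi` flags. As a minimal sketch of that idea, the snippet below compiles a pattern string and uses it to strip attributes from a tag. The pattern `attrPattern` and the helper `stripAttributes` are hypothetical and deliberately simplified. They are not part of wordsoap, and the snippet does not load the module.

```js
// Hypothetical, simplified pattern string (an assumption, not wordsoap's own):
// matches a single class, lang, or style attribute and its value.
var attrPattern = ' (?:class|lang|style)=(?:\'[^\']*\'|"[^"]*"|[^\\s>]+)'

// Compile the string with the 'gi' flags: global and case-insensitive.
var attrRegex = new RegExp(attrPattern, 'gi')

// Hypothetical helper: remove every matching attribute from the HTML string.
function stripAttributes(html) {
  return html.replace(attrRegex, '')
}

var dirty = '<p class="MsoNormal" style=\'margin:0\'>Text</p>'
var clean = stripAttributes(dirty)
console.log(clean) // <p>Text</p>
```

Keeping the pattern as a plain string makes it easy to combine or inspect, while the compiled form is ready to pass straight to `String.prototype.replace`.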