https://github.com/unfoldingword/string-punctuation-tokenizer
Small library that provides functions to tokenize a string into an array of words with or without punctuation
javascript nlp nlp-library scripture-open-components segmentation tokenizers
- Host: GitHub
- URL: https://github.com/unfoldingword/string-punctuation-tokenizer
- Owner: unfoldingWord
- License: mit
- Created: 2018-02-23T22:29:54.000Z (over 7 years ago)
- Default Branch: develop
- Last Pushed: 2023-08-09T14:35:40.000Z (almost 2 years ago)
- Last Synced: 2025-03-27T19:08:34.823Z (2 months ago)
- Topics: javascript, nlp, nlp-library, scripture-open-components, segmentation, tokenizers
- Language: JavaScript
- Homepage: https://string-punctuation-tokenizer.netlify.app/#/Tokenize
- Size: 2.14 MB
- Stars: 8
- Watchers: 8
- Forks: 1
- Open Issues: 21
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# string-punctuation-tokenizer
Small library that provides functions to tokenize a string into an array of words, with or without punctuation.
## Setup
`npm install string-punctuation-tokenizer`
## Usage
`var stringTokenizer = require('string-punctuation-tokenizer');`
or, with ES6 modules:
`import {tokenize} from 'string-punctuation-tokenizer';`
#### Tokenize with punctuation
```js
import {tokenize} from 'string-punctuation-tokenizer';
// With includePunctuation: true, each punctuation mark is returned as its own token.
const words = tokenize({text: 'Hello world, my name is Manny!', includePunctuation: true});
// words = ["Hello", "world", ",", "my", "name", "is", "Manny", "!"]
```
#### Tokenize without punctuation
```js
import {tokenize} from 'string-punctuation-tokenizer';
// includePunctuation is omitted here, so only the words are returned.
const words = tokenize({text: 'Hello world, my name is Manny!'});
// words = ["Hello", "world", "my", "name", "is", "Manny"]
```
### Documentation
See detailed documentation and live WYSIWYG playground here: https://string-punctuation-tokenizer.netlify.app/#/Tokenize
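To make the difference between the two modes shown under Usage concrete, here is a minimal regex-based sketch of the idea. It is not the library's implementation: `sketchTokenize` is a hypothetical name, and the rule it applies (runs of Unicode letters, marks, and digits are words; any other non-whitespace character is a one-character punctuation token) is an assumption chosen only to reproduce the two example outputs above. The real `tokenize` may treat apostrophes, numbers, and other edge cases differently.

```javascript
// Hypothetical sketch only; NOT the code used by string-punctuation-tokenizer.
// Assumption: a "word" is a run of Unicode letters, combining marks, or digits,
// and every other non-whitespace character is a single punctuation token.
const WORD_CHARS = '\\p{L}\\p{M}\\p{N}';
const TOKEN_PATTERN = new RegExp(`[${WORD_CHARS}]+|[^\\s${WORD_CHARS}]`, 'gu');
const IS_WORD = new RegExp(`[${WORD_CHARS}]`, 'u');

function sketchTokenize({text, includePunctuation = false}) {
  // Split the text into word tokens and single-character punctuation tokens.
  const tokens = text.match(TOKEN_PATTERN) || [];
  if (includePunctuation) {
    return tokens;
  }
  // Drop every token that contains no word character.
  return tokens.filter((token) => IS_WORD.test(token));
}

const sample = 'Hello world, my name is Manny!';
console.log(sketchTokenize({text: sample, includePunctuation: true}));
console.log(sketchTokenize({text: sample}));
```

Under these assumptions the first call keeps `,` and `!` as separate tokens and the second drops them, matching the two arrays shown in the Usage examples. For real text, rely on the library and its playground rather than this sketch.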