https://github.com/accraze/text2token
break down a corpus of text into lines and tokens
nlp nlp-parsing tokenizer
- Host: GitHub
- URL: https://github.com/accraze/text2token
- Owner: accraze
- Created: 2015-09-26T18:47:35.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2016-11-20T18:30:33.000Z (over 8 years ago)
- Last Synced: 2025-01-10T09:59:01.050Z (6 months ago)
- Topics: nlp, nlp-parsing, tokenizer
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/text2token
- Size: 12.7 KB
- Stars: 1
- Watchers: 3
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
## text2token
is a Node.js module that breaks down a corpus of text into lines and tokens.

## Install
```
$ npm install text2token
```

## Usage
The module has one method, `text2token`, which takes the path to a text file and returns an object with two properties: `lines`, a list of each line in the file, and `tokens`, a list of all unique tokens.
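To make the shape of that return value concrete, here is a minimal sketch that builds a similar object from an in-memory string. It is not the module's implementation: `linesAndTokens` is a hypothetical helper, and splitting on whitespace, skipping blank lines, and keeping tokens in first-seen order are all assumptions made for illustration.

```javascript
// Hypothetical helper, NOT part of text2token: it only illustrates the
// { lines, tokens } shape described above, using a string instead of a file.
function linesAndTokens(text) {
  // Split on LF or CRLF and skip blank lines (an assumption for this sketch).
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');

  // Split each line on runs of whitespace (also an assumption).
  // A Set drops repeated tokens while preserving first-seen order.
  const seen = new Set();
  for (const line of lines) {
    for (const token of line.trim().split(/\s+/)) {
      seen.add(token);
    }
  }

  return { lines, tokens: Array.from(seen) };
}

const sample = '© 2015 GitHub, Inc. Terms\nThe quick brown fox\nThe lazy dog';
const result = linesAndTokens(sample);

console.log(result.lines);
console.log(result.tokens);
```

In this sketch the repeated `The` appears only once in `result.tokens`, while `result.lines` keeps every non-blank line in order. The real module reads its input from a file path, as in the `node` session in this section.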
```
$ node
> var lib = require('text2token');
> var converted = lib.text2token('./src/bigtext.txt')
> converted.tokens
[ '©',
'2015',
'GitHub,',
'Inc.',
'Terms',
'Privacy',
'Security',
..........
> converted.lines
[ '© 2015 GitHub, Inc. Terms Privacy Security Contact Help',
'Status API Training Shop Blog About Pricing',
'The quick brown fox jumped over the lazy dog'
.......
```

MIT License 2015-2016 © Andy Craze & Contributors