Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/winkjs/wink-nlp-utils
NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.
bag-of-words natural-language-processing ngrams nlp phonetize sentence-boundary-detection stem stop-words tokenize
- Host: GitHub
- URL: https://github.com/winkjs/wink-nlp-utils
- Owner: winkjs
- License: mit
- Created: 2017-05-12T17:44:31.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-03-03T07:32:27.000Z (8 months ago)
- Last Synced: 2024-04-14T07:52:53.756Z (7 months ago)
- Topics: bag-of-words, natural-language-processing, ngrams, nlp, phonetize, sentence-boundary-detection, stem, stop-words, tokenize
- Language: JavaScript
- Homepage: http://winkjs.org/wink-nlp-utils/
- Size: 2.98 MB
- Stars: 111
- Watchers: 7
- Forks: 12
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# wink-nlp-utils
NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.
### [![Build Status](https://app.travis-ci.com/winkjs/wink-nlp-utils.svg?branch=master)](https://app.travis-ci.com/github/winkjs/wink-nlp-utils) [![Coverage Status](https://coveralls.io/repos/github/winkjs/wink-nlp-utils/badge.svg?branch=master)](https://coveralls.io/github/winkjs/wink-nlp-utils?branch=master) [![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/winkjs/Lobby)
Prepare raw text for Natural Language Processing (NLP) using **`wink-nlp-utils`**. It offers a set of [APIs](http://wink.org.in/wink-nlp-utils/) to work on [strings](http://wink.org.in/wink-nlp-utils/#string) such as names, sentences, paragraphs and [tokens](http://wink.org.in/wink-nlp-utils/#tokens) represented as an array of strings/words. They perform the required pre-processing for many ML tasks such as [semantic search](https://www.npmjs.com/package/wink-bm25-text-search), and [classification](https://www.npmjs.com/package/wink-naive-bayes-text-classifier).
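As a rough illustration of the string level and the token level described above, the following plain-JavaScript sketch turns a string into tokens and then into a bag of words. The helpers `simpleTokenize` and `simpleBagOfWords` are hypothetical names invented for this example; they are not part of `wink-nlp-utils` and do not attempt to match its behavior.

```javascript
// Plain-JavaScript sketch. These helpers are hypothetical and are NOT
// the wink-nlp-utils implementation.

// String-level step: lower-case the text and split it into word tokens.
function simpleTokenize( text ) {
  return text.toLowerCase().match( /[a-z0-9]+/g ) || [];
}

// Token-level step: count how often each token occurs (a bag of words).
function simpleBagOfWords( tokens ) {
  // A prototype-less object avoids clashes with keys such as 'constructor'.
  var bag = Object.create( null );
  tokens.forEach( function ( token ) {
    bag[ token ] = ( bag[ token ] || 0 ) + 1;
  } );
  return bag;
}

var tokens = simpleTokenize( 'Rain, rain go away!' );
console.log( tokens );
// -> [ 'rain', 'rain', 'go', 'away' ]
console.log( simpleBagOfWords( tokens ) );
// 'rain' is counted twice; 'go' and 'away' once each.
```

A bag of words like this is the kind of input that text classifiers and search indexes typically consume, which is why tokenization usually comes first.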
👉🏽 We recommend using winkNLP for core natural language processing tasks.
It performs Tokenization, Sentence Boundary Detection, and Named Entity Recognition at blazing-fast speeds. It supports text processing needs ranging from Sentiment Analysis, POS Tagging, Lemmatization, Stemming, Stop Word Removal, Negation Handling, and Bigrams to Frequency Table Creation and more.
WinkNLP features user-friendly declarative APIs for Iteration, Filtering, and Text Visualization, and runs on web browsers.
### Installation
Use [npm](https://www.npmjs.com/package/wink-nlp-utils) to install:
```
npm install wink-nlp-utils --save
```
### Getting Started
`wink-nlp-utils` provides over **36 utility functions** for Natural Language Processing tasks. Some representative examples are extracting a person's name from a string, composing a training corpus for a chat bot, sentence boundary detection, tokenization, and stop word removal:
```javascript
// Load wink-nlp-utils
var nlp = require( 'wink-nlp-utils' );

// Extract person's name from a string:
var name = nlp.string.extractPersonsName( 'Dr. Sarah Connor M. Tech., PhD. - AI' );
console.log( name );
// -> 'Sarah Connor'

// Compose all possible sentences from a string:
var str = '[I] [am having|have] [a] [problem|question]';
console.log( nlp.string.composeCorpus( str ) );
// -> [ 'I am having a problem',
//      'I am having a question',
//      'I have a problem',
//      'I have a question' ]

// Sentence Boundary Detection.
var para = 'AI Inc. is focussing on AI. I work for AI Inc. My mail is [email protected]';
console.log( nlp.string.sentences( para ) );
// -> [ 'AI Inc. is focussing on AI.',
//      'I work for AI Inc.',
//      'My mail is [email protected]' ]

// Tokenize a sentence.
var s = 'For details on wink, check out http://winkjs.org/ URL!';
console.log( nlp.string.tokenize( s, true ) );
// -> [ { value: 'For', tag: 'word' },
//      { value: 'details', tag: 'word' },
//      { value: 'on', tag: 'word' },
//      { value: 'wink', tag: 'word' },
//      { value: ',', tag: 'punctuation' },
//      { value: 'check', tag: 'word' },
//      { value: 'out', tag: 'word' },
//      { value: 'http://winkjs.org/', tag: 'url' },
//      { value: 'URL', tag: 'word' },
//      { value: '!', tag: 'punctuation' } ]

// Remove stop words:
var t = nlp.tokens.removeWords( [ 'mary', 'had', 'a', 'little', 'lamb' ] );
console.log( t );
// -> [ 'mary', 'little', 'lamb' ]
```
Try [experimenting with these examples on Runkit](https://npm.runkit.com/wink-nlp-utils) in the browser.
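The package description also mentions creating ngrams. As a minimal sketch of the idea, and not the package's own implementation, the snippet below builds n-grams over an array of tokens in plain JavaScript. The name `tokenNGrams` and its signature are assumptions made for this example; consult the API documentation for the functions that `wink-nlp-utils` actually exposes.

```javascript
// Hypothetical helper for illustration; NOT part of the wink-nlp-utils API.
// Returns every run of `n` consecutive tokens as its own array.
function tokenNGrams( tokens, n ) {
  var grams = [];
  // Guard against non-positive sizes, which have no meaningful n-grams.
  if ( n < 1 ) return grams;
  for ( var i = 0; i + n <= tokens.length; i += 1 ) {
    grams.push( tokens.slice( i, i + n ) );
  }
  return grams;
}

console.log( tokenNGrams( [ 'mary', 'had', 'a', 'little', 'lamb' ], 2 ) );
// -> [ [ 'mary', 'had' ],
//      [ 'had', 'a' ],
//      [ 'a', 'little' ],
//      [ 'little', 'lamb' ] ]
```

Because it needs no dependencies, this sketch can be pasted into any Node.js session to see how window size affects the number of n-grams produced.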
### Documentation
Check out the [wink NLP utilities API](http://winkjs.org/wink-nlp-utils/) documentation to learn more.
### Need Help?
If you spot a bug that has not yet been reported, raise a new [issue](https://github.com/winkjs/wink-nlp-utils/issues) or consider fixing it and sending a pull request.
### About wink
[Wink](http://winkjs.org/) is a family of open source packages for **Statistical Analysis**, **Natural Language Processing** and **Machine Learning** in Node.js. The code is **thoroughly documented** for easy human comprehension and has a **test coverage of ~100%**, making it reliable enough for building production-grade solutions.
### Copyright & License
**wink-nlp-utils** is copyright 2017-22 [GRAYPE Systems Private Limited](http://graype.in/).
It is licensed under the terms of the MIT License.