Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andreekeberg/ml-classify-text-js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/andreekeberg/ml-classify-text-js
- Owner: andreekeberg
- License: mit
- Created: 2020-08-26T20:27:29.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-04-21T06:15:27.000Z (9 months ago)
- Last Synced: 2024-10-02T02:42:13.607Z (4 months ago)
- Topics: artificial-intelligence, classification, classifier, cosine-similarity, labels, library, machine-learning, n-gram, n-grams, natural-language-processing, predictions, sentiment-analysis, similarity, text-classification, text-classifier, training
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/ml-classify-text
- Size: 87.9 KB
- Stars: 126
- Watchers: 7
- Forks: 12
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# 📄 ClassifyText (JS)
[![Version](https://img.shields.io/npm/v/ml-classify-text)](https://www.npmjs.com/package/ml-classify-text) [![Total Downloads](https://img.shields.io/npm/dt/ml-classify-text)](https://www.npmjs.com/package/ml-classify-text) [![License](https://img.shields.io/npm/l/ml-classify-text)](https://www.npmjs.com/package/ml-classify-text)
Use machine learning to classify text using [n-grams](https://en.wikipedia.org/wiki/N-gram) and [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
A minimal library for both the **browser** and **Node.js** that lets you train a model with a large number of text samples (and corresponding labels), and then use this model to quickly predict one or more appropriate labels for new text samples.
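To make the cosine similarity idea concrete, here is a minimal, self-contained sketch that compares two term-count vectors stored as plain objects. It is an illustration only, not the library's implementation: `cosineSimilarity`, `sampleCounts` and `positiveCounts` are hypothetical names, and the counts are written out by hand from the sentences used in the Basic usage section.

```javascript
// Hypothetical helper, not part of ml-classify-text:
// cosine similarity between two sparse term-count vectors ({ term: count }).
function cosineSimilarity(a, b) {
    const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key)

    let dot = 0
    let normA = 0
    let normB = 0

    for (const term of Object.keys(a)) {
        normA += a[term] * a[term]

        // Only terms present in both vectors contribute to the dot product
        if (hasOwn(b, term)) {
            dot += a[term] * b[term]
        }
    }

    for (const term of Object.keys(b)) {
        normB += b[term] * b[term]
    }

    // An empty vector has no direction, so report no similarity
    if (normA === 0 || normB === 0) {
        return 0
    }

    return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Unigram counts for 'It sure is pretty great!'
const sampleCounts = { it: 1, sure: 1, is: 1, pretty: 1, great: 1 }

// Unigram counts for the three "positive" training sentences combined
const positiveCounts = {
    this: 1, is: 2, great: 1, so: 1, cool: 1, wow: 1,
    i: 1, love: 1, it: 2, really: 1, amazing: 1
}

// 5 / sqrt(85), roughly 0.5423
console.log(cosineSimilarity(sampleCounts, positiveCounts))
```

The result is 5 / √85, which agrees with the confidence printed in the Basic usage example. That suggests, but does not prove, that the library scores predictions in a similar way.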
## Installation
**Using npm**
```
npm install ml-classify-text
```
**Using yarn**
```
yarn add ml-classify-text
```
## Getting started
**Import as an ES6 module**
```javascript
import Classifier from 'ml-classify-text'
```
**Import as a CommonJS module**
```javascript
const { Classifier } = require('ml-classify-text')
```
## Basic usage
### Setting up a new Classifier instance
```javascript
const classifier = new Classifier()
```
### Training a model
```javascript
const positive = [
    'This is great, so cool!',
    'Wow, I love it!',
    'It really is amazing'
]

const negative = [
    'This is really bad',
    'I hate it with a passion',
    'Just terrible!'
]

classifier.train(positive, 'positive')
classifier.train(negative, 'negative')
```
### Getting a prediction
```javascript
const predictions = classifier.predict('It sure is pretty great!')

if (predictions.length) {
    predictions.forEach((prediction) => {
        console.log(`${prediction.label} (${prediction.confidence})`)
    })
} else {
    console.log('No predictions returned')
}
```
Returning:
```
positive (0.5423261445466404)
```
## Advanced usage
### Configuration
The following configuration options can be passed either directly to a new [Model](docs/model.md) or indirectly through the [Classifier](docs/classifier.md) constructor.
#### Options
| Property | Type | Default | Description |
| -------------- | --------------------------- | ------- | ----------------------------------------------------------------------------------------------------- |
| **nGramMin** | `int` | `1` | Minimum n-gram size |
| **nGramMax** | `int` | `1` | Maximum n-gram size |
| **vocabulary** | `Array` \| `Set` \| `false` | `[]` | Terms mapped to indexes in the model data, set to `false` to store terms directly in the data entries |
| **data** | `Object` | `{}` | Key-value store of labels and training data vectors |
### Using n-grams
The default behavior is to split up texts by single words (known as a [bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model), or unigrams).
This has limitations: because word order is ignored, phrases and expressions cannot be matched correctly.
This is where [n-grams](https://en.wikipedia.org/wiki/N-gram) come in. When set to use more than one word per term, they act like a sliding window that moves across the text, yielding each continuous sequence of the specified number of words, which can greatly improve the accuracy of predictions.
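The sliding window can be sketched in a few lines of plain JavaScript. This is a rough illustration under stated assumptions, not the library's tokenizer: `countNGrams` is a hypothetical helper, and the normalization it applies (lowercasing and stripping everything except letters, digits and whitespace) is a guess at reasonable behavior.

```javascript
// Hypothetical helper, not part of ml-classify-text:
// counts word n-grams for every size from nGramMin to nGramMax.
function countNGrams(text, nGramMin = 1, nGramMax = 1) {
    // Assumed normalization: lowercase, drop punctuation, split on whitespace
    const words = text
        .toLowerCase()
        .replace(/[^\p{L}\p{N}\s]/gu, '')
        .split(/\s+/)
        .filter(Boolean)

    const counts = new Map()

    for (let size = nGramMin; size <= nGramMax; size++) {
        // Slide a window of `size` words across the text, one word at a time
        for (let start = 0; start + size <= words.length; start++) {
            const term = words.slice(start, start + size).join(' ')
            counts.set(term, (counts.get(term) || 0) + 1)
        }
    }

    return Object.fromEntries(counts)
}

// Same words in a different order: the unigram counts contain the same terms...
console.log(countNGrams('not good, really bad'))
console.log(countNGrams('not bad, really good'))

// ...while bigrams keep the distinction between 'not good' and 'not bad'
console.log(countNGrams('not good, really bad', 2, 2))
console.log(countNGrams('not bad, really good', 2, 2))
```

With unigrams the two sentences are indistinguishable apart from term order, which is exactly the bag-of-words limitation described above. With bigrams, only the shared middle window survives in both.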
#### Example of using n-grams with a size of 2 (bigrams)
```javascript
const classifier = new Classifier({
    nGramMin: 2,
    nGramMax: 2
})

const tokens = classifier.tokenize('I really dont like it')
console.log(tokens)
```
Returning:
```javascript
{
    'i really': 1,
    'really dont': 1,
    'dont like': 1,
    'like it': 1
}
```
### Serializing a model
After training a model on a large data set, you'll want to store the result so that you can later set up a new model from it and make predictions without retraining.
To do this, call the `serialize` method on your [Model](docs/model.md), then save the returned data structure to a file, send it to a server, or store it any other way you like.
```javascript
const model = classifier.model

console.log(model.serialize())
```
Returning:
```
{
    nGramMin: 1,
    nGramMax: 1,
    vocabulary: [
        'this', 'is', 'great',
        'so', 'cool', 'wow',
        'i', 'love', 'it',
        'really', 'amazing', 'bad',
        'hate', 'with', 'a',
        'passion', 'just', 'terrible'
    ],
    data: {
        positive: {
            '0': 1, '1': 2, '2': 1,
            '3': 1, '4': 1, '5': 1,
            '6': 1, '7': 1, '8': 2,
            '9': 1, '10': 1
        },
        negative: {
            '0': 1, '1': 1, '6': 1,
            '8': 1, '9': 1, '11': 1,
            '12': 1, '13': 1, '14': 1,
            '15': 1, '16': 1, '17': 1
        }
    }
}
```
## Documentation
- [Classifier](docs/Classifier.md)
- [Model](docs/Model.md)
- [Vocabulary](docs/Vocabulary.md)
- [Prediction](docs/Prediction.md)
## Contributing
Read the [contribution guidelines](CONTRIBUTING.md).
## Changelog
Refer to the [changelog](CHANGELOG.md) for a full history of the project.
## License
ClassifyText is licensed under the [MIT license](LICENSE).