
# text-classifier
Performs basic text classification using algorithms such as Naive-Bayes.

---
#### Installation
You may install text-classifier using composer:

> `composer require permafrost-dev/text-classifier`

Note: The higher the quality and completeness of the training data used to train the model, the more accurate the classifications will be.

***
#### Example - Email Address Classification

A common use case for text classification is determining whether an email message is spam. While that's beyond
the scope of this example, we can try to determine whether a given email address is spam based on its features.
*Note: all email addresses used for training/examples were randomly generated. If your email address somehow ended up
within the sample data, please contact packages@permafrost.dev and it will be promptly removed.*

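```php
<?php
// NOTE: the construction of $tc is missing from this excerpt; $tc is assumed
// to be an already-configured classifier instance.
$tc->trainFromFile(__DIR__ . '/email-train.txt');

$emails = [
    'blah44657457@whatever.rut',
    'john@gmail.com',
];

// Classify each address and print the predicted label.
foreach ($emails as $email) {
    echo "classification for '$email': " . $tc->classify($email) . PHP_EOL;
}
```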

Resulting output:

- `classification for 'blah44657457@whatever.rut': spam`
- `classification for 'john@gmail.com': valid`

This method can easily be applied to other areas of spam checking, such as classifying user-provided domain names.
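
To make "based on its features" more concrete, here is a minimal sketch of how an email address might be turned into feature tokens before training. The `emailFeatures()` helper, the token names, and the thresholds are all assumptions made for illustration; they are not part of text-classifier and may differ from what the package does internally.

```php
<?php
// Hypothetical helper (not part of text-classifier): turns an email address
// into coarse feature tokens that a Naive-Bayes model could count.
function emailFeatures(string $email): array
{
    $parts = explode('@', strtolower(trim($email)), 2);
    $local = $parts[0];
    $domain = $parts[1] ?? '';

    // Count the digits in the local part (the text before the "@").
    $digits = preg_match_all('/\d/', $local);

    // The top-level domain is whatever follows the last dot in the domain.
    $dotPos = strrpos($domain, '.');
    $tld = $dotPos === false ? '' : substr($domain, $dotPos + 1);

    return [
        'domain:' . $domain,
        'tld:' . $tld,
        // Bucket the digit count so similar addresses share a token.
        'digits:' . ($digits === 0 ? 'none' : ($digits <= 2 ? 'few' : 'many')),
        // Arbitrary illustrative threshold for a "long" local part.
        'length:' . (strlen($local) > 10 ? 'long' : 'short'),
    ];
}

print_r(emailFeatures('blah44657457@whatever.rut'));
```

Bucketing values such as the digit count means that addresses the model has never seen can still share tokens with the training data, which is what lets a count-based classifier generalize.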

***

#### Example - Sentiment Analysis

See `examples/sentiment.php` for a working demo.

```php
trainFromFile(__DIR__ . '/sentiment-train.txt');

$phrases = [
'this is fantastic',
'this is terrible',
];

foreach ($phrases as $phrase) {
echo $phrase . ' - ' . $textClassifier->classify($phrase) . PHP_EOL;
}
```

Resulting output:

- `this is fantastic - positive`
- `this is terrible - negative`
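
For readers curious how a Naive-Bayes classifier arrives at labels like these, the following is a minimal sketch of multinomial Naive Bayes with add-one smoothing. The `TinyNaiveBayes` class and its training sentences are invented for illustration; this is not the text-classifier implementation, and the package's internals may differ.

```php
<?php
// Illustrative multinomial Naive Bayes; NOT the text-classifier implementation.
final class TinyNaiveBayes
{
    private array $wordCounts = []; // label => [word => count]
    private array $docCounts = [];  // label => number of training samples
    private array $vocabulary = []; // word => true

    public function train(string $label, string $text): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($this->tokenize($text) as $word) {
            $this->wordCounts[$label][$word] = ($this->wordCounts[$label][$word] ?? 0) + 1;
            $this->vocabulary[$word] = true;
        }
    }

    public function classify(string $text): string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $words = $this->tokenize($text);
        $best = '';
        $bestScore = -INF;

        foreach ($this->docCounts as $label => $docs) {
            // Work in log space so many small probabilities do not underflow.
            $score = log($docs / $totalDocs);
            $labelTotal = array_sum($this->wordCounts[$label] ?? []);
            foreach ($words as $word) {
                $count = $this->wordCounts[$label][$word] ?? 0;
                // Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
                $score += log(($count + 1) / ($labelTotal + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $label;
            }
        }

        return $best;
    }

    private function tokenize(string $text): array
    {
        // Lowercase, then split on anything that is not a letter or digit.
        return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }
}

$nb = new TinyNaiveBayes();
$nb->train('positive', 'this is fantastic and great');
$nb->train('positive', 'what a great day');
$nb->train('negative', 'this is terrible and awful');
$nb->train('negative', 'what an awful day');

echo $nb->classify('this is great') . PHP_EOL;
echo $nb->classify('this is terrible') . PHP_EOL;
```

Because the score is built entirely from word counts seen during training, the quality and coverage of the training data directly determine how well the classifier performs.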

***

With more robust pre-processing and tokenizing, these methods can be applied to other data, such as determining whether
an email message is likely spam or whether a given article is of interest to a user based on basic preferences.

This approach only goes so far, however: more advanced machine learning techniques are recommended when highly accurate results are needed.
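
As a rough idea of what more robust pre-processing could involve, the sketch below lowercases the text, collapses URLs into a single placeholder token, strips punctuation, and drops a few stop words. The `preprocess()` function, the `urltoken` placeholder, and the stop-word list are assumptions chosen for illustration and are not part of text-classifier.

```php
<?php
// Hypothetical pre-processing step; the function name, placeholder token, and
// stop-word list are illustrative assumptions, not part of text-classifier.
function preprocess(string $text): array
{
    $text = strtolower($text);

    // Collapse URLs into one placeholder so every link counts as the same feature.
    $text = preg_replace('~https?://\S+~', ' urltoken ', $text);

    // Split on anything that is not a letter, digit, or apostrophe.
    $tokens = preg_split("/[^a-z0-9']+/", $text, -1, PREG_SPLIT_NO_EMPTY);

    // Remove very common words that carry little signal.
    $stopWords = ['a', 'an', 'and', 'is', 'the', 'this', 'to'];

    // array_values() reindexes the result after array_diff() leaves gaps.
    return array_values(array_diff($tokens, $stopWords));
}

print_r(preprocess('This is the BEST offer! Visit http://example.com/deal now'));
```

The resulting tokens could then be joined back into a string or counted directly, depending on what the classifier expects as input.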