Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/camspiers/statistical-classifier
A PHP implementation of a Naive Bayes statistical classifier, including a structure for building other classifiers, multiple data sources and multiple caching backends.
https://github.com/camspiers/statistical-classifier
Last synced: about 2 months ago
JSON representation
A PHP implementation of a Naive Bayes statistical classifier, including a structure for building other classifiers, multiple data sources and multiple caching backends.
- Host: GitHub
- URL: https://github.com/camspiers/statistical-classifier
- Owner: camspiers
- License: mit
- Created: 2012-09-29T22:18:06.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2016-09-26T20:55:35.000Z (almost 8 years ago)
- Last Synced: 2024-07-13T17:43:24.125Z (2 months ago)
- Language: PHP
- Homepage: http://camspiers.github.io/statistical-classifier/
- Size: 29.4 MB
- Stars: 175
- Watchers: 22
- Forks: 26
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# PHP Classifier
[![Build Status](https://travis-ci.org/camspiers/statistical-classifier.png?branch=master)](https://travis-ci.org/camspiers/statistical-classifier)
[![Latest Stable Version](https://poser.pugx.org/camspiers/statistical-classifier/v/stable.png)](https://packagist.org/packages/camspiers/statistical-classifier)PHP Classifier uses [semantic versioning](http://semver.org/), it is currently at major version 0, so the public API should not be considered stable.
# What is it?
PHP Classifier is a text classification library with a focus on reuse, customizability and performance.
Classifiers can be used for many purposes, but are particularly useful in detecting spam.## Features
* Complement Naive Bayes Classifier
* SVM (libsvm) Classifier
* Highly customizable (easily modify or build your own classifier)
* Command-line interface via separate library (phar archive)
* Multiple **data import types** to get your data into the classifier (Directory of files, Database queries, Json, Serialized arrays)
* Multiple types of **model caching**
* Compatible with HipHop VM# Installation
```bash
$ composer require camspiers/statistical-classifier
```## SVM Support
For SVM Support both libsvm and php-svm are required. For installation intructions refer to [php-svm](https://github.com/ianbarber/php-svm).
# Usage
## Non-cached Naive Bayes
```php
use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\DataSource\DataArray;$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');$classifier = new ComplementNaiveBayes($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
```## Non-cached SVM
```php
use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\DataSource\DataArray;$source = new DataArray()
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');$classifier = new SVM($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
```# Caching models
Caching models requires [maximebf/CacheCache](https://github.com/maximebf/CacheCache) which can be installed via packagist. Additional caching systems can be easily integrated.
## Cached Naive Bayes
```php
use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\Model\CachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');$model = new CachedModel(
'mycachename',
new CacheCache\Cache(
new CacheCache\Backends\File(
array(
'dir' => __DIR__
)
)
)
);$classifier = new ComplementNaiveBayes($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
```## Cached SVM
```php
use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\Model\SVMCachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');$model = new Model\SVMCachedModel(
__DIR__ . '/model.svm',
new CacheCache\Cache(
new CacheCache\Backends\File(
array(
'dir' => __DIR__
)
)
)
);$classifier = new SVM($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
```# Unit testing
statistical-classifier/ $ composer install --dev
statistical-classifier/ $ phpunit