Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/briancline/crm114-python
A Python module for The CRM-114 Discriminator, which handles learning and classification of text streams.
https://github.com/briancline/crm114-python
Last synced: about 1 month ago
JSON representation
A Python module for The CRM-114 Discriminator, which handles learning and classification of text streams.
- Host: GitHub
- URL: https://github.com/briancline/crm114-python
- Owner: briancline
- License: other
- Created: 2013-10-29T18:26:46.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2017-07-15T03:01:07.000Z (over 7 years ago)
- Last Synced: 2024-12-15T15:48:23.248Z (about 1 month ago)
- Language: Python
- Size: 205 KB
- Stars: 11
- Watchers: 4
- Forks: 7
- Open Issues: 9
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[CRM-114][1] Python Module
==========================This project is a Python module to interact with [The CRM-114
Discriminator][2], which handles learning and classification of text streams.
While written and used primarily in spam classification, CRM-114 handles text
streams of logs, data, etc. just as well with recorded accuracy rates
exceeding 99.9%. A wide variety of methods can be used with CRM-114, namely
regular expressions, approximate regular expressions, Hidden Markov Model,
Orthogonal Sparse Bigrams (OSB), winnow, general correlation,
K-Nearest-Neighbor, and bit entropy.Originally crafted by [Sam Deane][3] of [Elegant Chaos][4] and
[Born Sleepy][5], with ongoing improvements and maintenance by
[Brian Cline][6].This module provides a very simplified interface to CRM-114. It does not
attempt to expose all of CRM-114's power; instead it tries to hide almost all
of the gory details.Requirements
------------Python 2.7 is strongly recommended.
Naturally, the `crm` binary itself is required, and should be in your path.
Follow the instructions here for your operating system to install CRM-114.### Debian, Ubuntu, et al.
apt-get install crm114
### CentOS, Fedora, Red Hat, et al.
## Install and enable the EPEL repository package
rpm -Uvh http://mirror.steadfast.net/epel/6/i386/epel-release-6-8.noarch.rpm
yum install --enable-repo=epel crm114### Everyone else
## If you do not yet have libtre and its headers:
curl -O http://crm114.sourceforge.net/tarballs/tre-0.7.5.tar.gz
tar -zxf tre-*.tar.gz
cd tre-*
./configure --enable-static
make
sudo make install
cd ..curl -O http://crm114.sourceforge.net/tarballs/crm114-20100106-BlameMichelson.src.tar.gz
tar -zxf crm114-*.tar.gz
cd crm114*.src
make
sudo make install
cd ..Installation
------------This is really all you need:
sudo pip install crm114
Usage
-----To use the module, create an instance of the `Classifier` class, giving it the
path to a directory where the data files will be stored, and a list of all
possible category strings--or labels--under which text will be classified.c = Classifier('/path/to/my/data', ['good', 'bad'])
To teach the classifier object about some text, call the learn method passing
in a category (one of the categories that you previously provided), and the
text.c.learn('good', 'some good text')
c.learn('bad', 'some bad text')To find out what the classifier thinks about a body of text, call the classify
method, passing in the text. The result of this method is a pair: the first
item is a category best matching the text, and the second item is the
confidence/probability of that match.label, confidence = c.classify('some text')
For a rather contrived but possibly helpful detailed example, see
`tests/basic.py`.License
-------Released under the MIT License. See LICENSE file for details.
Original code licensed under GPLv2, re-licensed 29 October 2013 by Sam Deane
under the MIT license for further curation, maintenance, and packaging.[1]: http://en.wikipedia.org/wiki/CRM_114_(fictional_device)
[2]: http://crm114.sourceforge.net/
[3]: https://github.com/samdeane
[4]: http://www.elegantchaos.com/
[5]: http://bornsleepy.com/
[6]: https://github.com/briancline