https://github.com/pmundkur/libcrm114
C library version of CRM114, and a Python binding
- Host: GitHub
- URL: https://github.com/pmundkur/libcrm114
- Owner: pmundkur
- Created: 2011-11-22T02:16:36.000Z (almost 13 years ago)
- Default Branch: master
- Last Pushed: 2013-04-07T20:35:11.000Z (over 11 years ago)
- Last Synced: 2024-07-31T22:56:51.713Z (3 months ago)
- Language: C
- Homepage: http://crm114.sourceforge.net/wiki/doku.php?id=download
- Size: 383 KB
- Stars: 14
- Watchers: 4
- Forks: 7
- Open Issues: 2
Metadata Files:
- Readme: README
README
From http://crm114.sourceforge.net/wiki/doku.php?id=download:
CRM114 C-callable Library
This is the callable library version of CRM114. It has most of the
same classifiers as the standalone language (with some significant
improvements: one alpha tester says they saw a 10x speedup in their
application). This version is LGPLed (Library GPL) so you can link it
with your own code, whether open-source or proprietary. You still need
TRE (on Fedora, “yum install tre-devel”). Note that with improvements
come costs: libcrm114 classifiers are NOT compatible with standalone
CRM114 class files (necessary, because libcrm114 classifiers can work
even on systems that don't have filesystems, like embedded
processors). The code is now pretty stable and the API solidly
entrenched by use in several real products, so the API is unlikely to
change in unpleasant ways.

Advantages of libcrm114: It's much faster; everything is
in-memory. You can call everything directly from ANSI C. Because
everything is in memory, it's good for embedded systems where you
don't _have_ a unix-style file system to talk to. No arcane language
to learn, it's all just ANSI C. You can export classifiers as ASCII
“CSV-like” format so trained classifiers are 32/64-bit portable and
cross-platform Linux/Mac/Windows portable (the internal binary
classifier format is still tied to a particular architecture, but
that's never exported any more).

Disadvantages of libcrm114: Not all classifiers are currently
supported (in particular, Neural Net, Correlator, OSBF, and Winnow
are NOT yet supported). There's no crazy language, so you need to get
your data into memory on your own. You still need TRE. You do pay a
(not horrible) startup cost loading a classifier from an ASCII
CSV-like file, but since you can then reuse the classifier for as many
documents as you want, in the long term this cost is amortized down to
zero and you get significant speedup.

Dependencies
Debian/Ubuntu: libtre5, libtre-dev
Building
$ make && cd python && python setup.py build
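
Illustration: why a text export is portable

The portability point made above (a "CSV-like" ASCII export travels across 32/64-bit and across platforms, while the internal binary format does not) can be sketched in a few lines of Python. This is NOT libcrm114's actual file format or API. The function names export_text and import_text, the "hash,count" row layout, and the sample values are all assumptions made up for illustration.

```python
import csv
import io
import struct


def export_text(table):
    """Serialize {feature_hash: count} as 'hash,count' lines.

    Hypothetical layout for illustration; not libcrm114's real format.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for feature_hash, count in sorted(table.items()):
        writer.writerow([feature_hash, count])
    return buf.getvalue()


def import_text(text):
    """Rebuild the table from its text form; decimal digits parse the
    same way regardless of word size or byte order."""
    reader = csv.reader(io.StringIO(text))
    return {int(h): int(c) for h, c in reader}


# Made-up sample data standing in for a trained classifier's contents.
table = {2166136261: 3, 84696351: 1}

text = export_text(table)
restored = import_text(text)

# By contrast, a raw binary dump depends on byte order: the same 32-bit
# value produces different bytes on little- and big-endian layouts.
little = struct.pack("<I", 2166136261)
big = struct.pack(">I", 2166136261)
```

The text form round-trips exactly, while the two packed byte strings differ. That difference is the kind of architecture dependence that exporting as ASCII avoids, at the price of a parse step when the classifier is loaded.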