https://github.com/monkeymars/php-nlp-tools
Natural Language Processing Tools in PHP
- Host: GitHub
- URL: https://github.com/monkeymars/php-nlp-tools
- Owner: monkeymars
- License: wtfpl
- Fork: true (angeloskath/php-nlp-tools)
- Created: 2013-09-01T15:48:57.000Z (almost 13 years ago)
- Default Branch: master
- Last Pushed: 2013-08-27T21:13:51.000Z (almost 13 years ago)
- Last Synced: 2024-10-02T07:56:58.338Z (over 1 year ago)
- Language: PHP
- Size: 449 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.markdown
- License: LICENSE
README
[PHP NlpTools](http://php-nlp-tools.com/)
=============
NlpTools is a set of PHP 5.3+ classes for beginner to
semi-advanced natural language processing work.
Documentation
-------------
You can find documentation and code examples at the project's [homepage](http://php-nlp-tools.com/documentation/).
Contents
---------
### Classification Models ###
1. [Multinomial Naive Bayes](http://php-nlp-tools.com/documentation/bayesian-model.html)
2. [Maximum Entropy (Conditional Exponential model)](http://php-nlp-tools.com/documentation/maximum-entropy-model.html)
### Topic Modeling ###
LDA is still experimental and quite slow, but it works. [See an example](http://php-nlp-tools.com/posts/introducing-latent-dirichlet-allocation.html).
1. [Latent Dirichlet Allocation](http://php-nlp-tools.com/documentation/api/#NlpTools/Models/Lda)
### Clustering ###
Hierarchical clustering and Expectation Maximization are coming soon.
1. [K-Means](http://php-nlp-tools.com/documentation/clustering.html)
### Tokenizers ###
1. [WhitespaceTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Tokenizers/WhitespaceTokenizer)
2. [WhitespaceAndPunctuationTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Tokenizers/WhitespaceAndPunctuationTokenizer)
3. [ClassifierBasedTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Tokenizers/ClassifierBasedTokenizer)
This tokenizer allows us to build much more complex tokenizers
than the previous ones.
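
As a rough illustration of what the first two tokenizers in this list do, here is a minimal sketch in plain PHP (7+). It does not use NlpTools. The function names `whitespace_tokenize` and `whitespace_and_punctuation_tokenize` and both regular expressions are assumptions made for this example, so the library's own classes may split text differently.

```php
<?php
// Hypothetical helpers for illustration only; not part of NlpTools.

// Split on runs of whitespace and drop empty tokens.
function whitespace_tokenize(string $text): array
{
    return preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Split on whitespace, and additionally emit every punctuation or
// symbol character as a token of its own.
function whitespace_and_punctuation_tokenize(string $text): array
{
    // Either a run of letters/digits, or one character that is
    // neither a letter, a digit, nor whitespace.
    preg_match_all('/[\p{L}\p{N}]+|[^\p{L}\p{N}\s]/u', $text, $matches);
    return $matches[0];
}
```

For the input `"Hello, world!"`, the first function yields `Hello,` and `world!`, while the second yields `Hello`, `,`, `world` and `!`.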
### Documents ###
1. [TokensDocument](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/TokensDocument)
represents a bag of words model for a document.
2. [WordDocument](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/WordDocument)
represents a single word with the context of a larger document.
3. [TrainingDocument](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/TrainingDocument)
represents a document whose class is known.
4. [TrainingSet](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/TrainingSet)
represents a collection of TrainingDocuments.
### Feature factories ###
1. [FunctionFeatures](http://php-nlp-tools.com/documentation/api/#NlpTools/FeatureFactories/FunctionFeatures)
Allows the creation of a feature factory from a number of callables.
2. [DataAsFeatures](http://php-nlp-tools.com/documentation/api/#NlpTools/FeatureFactories/DataAsFeatures)
Simply returns the data as features.
### Similarity ###
1. [Jaccard Index](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/JaccardIndex)
2. [Cosine similarity](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/CosineSimilarity)
3. [Simhash](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/Simhash)
4. [Euclidean](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/Euclidean)
5. [HammingDistance](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/HammingDistance)
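
To make the first two measures concrete, the sketch below computes the Jaccard index and cosine similarity over arrays of tokens in plain PHP (7+). It is not the NlpTools implementation. The function names `jaccard_index` and `cosine_similarity`, and the choice to return `0.0` for empty input, are assumptions made for this example.

```php
<?php
// Hypothetical helpers for illustration only; not part of NlpTools.

// Jaccard index: |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
function jaccard_index(array $a, array $b): float
{
    $setA = array_unique($a);
    $setB = array_unique($b);
    $intersection = count(array_intersect($setA, $setB));
    $union = count(array_unique(array_merge($setA, $setB)));

    // Assumed convention: two empty sets have similarity 0.
    return $union === 0 ? 0.0 : $intersection / $union;
}

// Cosine similarity between the term-frequency vectors of two token lists.
function cosine_similarity(array $a, array $b): float
{
    $tfA = array_count_values($a);
    $tfB = array_count_values($b);

    // Dot product over the terms both vectors share.
    $dot = 0;
    foreach ($tfA as $term => $count) {
        if (isset($tfB[$term])) {
            $dot += $count * $tfB[$term];
        }
    }

    $square = function ($c) { return $c * $c; };
    $normA = sqrt(array_sum(array_map($square, $tfA)));
    $normB = sqrt(array_sum(array_map($square, $tfB)));

    // Assumed convention: a zero vector has similarity 0 to anything.
    if ($normA == 0 || $normB == 0) {
        return 0.0;
    }
    return $dot / ($normA * $normB);
}
```

For example, `jaccard_index(['a', 'b', 'c'], ['b', 'c', 'd'])` shares two of four distinct tokens and so evaluates to `0.5`, and `cosine_similarity` of two token lists with no terms in common evaluates to `0.0`.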
### Stemmers ###
1. [PorterStemmer](http://php-nlp-tools.com/documentation/api/#NlpTools/Stemmers/PorterStemmer)
2. [RegexStemmer](http://php-nlp-tools.com/documentation/api/#NlpTools/Stemmers/RegexStemmer)
### Optimizers (MaxEnt only) ###
1. [A gradient descent optimizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Optimizers/MaxentGradientDescent)
(written in PHP) for educational use.
It is a simple implementation for anyone wanting to know a bit
more about either GD or MaxEnt models.
2. A fast (faster than nltk-scipy), parallel gradient descent
optimizer written in [Go](http://golang.org/). This optimizer
resides in another [repo](https://github.com/angeloskath/nlp-maxent-optimizer)
and is used via the [external optimizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Optimizers/ExternalMaxentOptimizer).
TODO: At least write a readme for the optimizer written in Go.