Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/angeloskath/php-nlp-tools
Natural Language Processing Tools in PHP
Last synced: 11 days ago
- Host: GitHub
- URL: https://github.com/angeloskath/php-nlp-tools
- Owner: angeloskath
- License: wtfpl
- Created: 2012-08-22T23:53:43.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2024-07-22T11:11:32.000Z (4 months ago)
- Last Synced: 2024-09-07T07:47:56.162Z (2 months ago)
- Language: PHP
- Size: 896 KB
- Stars: 743
- Watchers: 62
- Forks: 152
- Open Issues: 14
Metadata Files:
- Readme: README.markdown
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- web-stuff - PHP NLP Tools - Natural Language Processing Tools in PHP (PHP)
README
[PHP NlpTools](http://php-nlp-tools.com/)
=============

NlpTools is a set of PHP 5.3+ classes for beginner to
semi-advanced natural language processing work.

Documentation
-------------

You can find documentation and code examples at the project's [homepage](http://php-nlp-tools.com/documentation/).

Contents
---------

### Classification Models ###

1. [Multinomial Naive Bayes](http://php-nlp-tools.com/documentation/bayesian-model.html)
2. [Maximum Entropy (Conditional Exponential model)](http://php-nlp-tools.com/documentation/maximum-entropy-model.html)

### Topic Modeling ###

LDA is still experimental and quite slow, but it works. [See an example](http://php-nlp-tools.com/posts/introducing-latent-dirichlet-allocation.html).

1. [Latent Dirichlet Allocation](http://php-nlp-tools.com/documentation/api/#NlpTools/Models/Lda)

### Clustering ###
1. [K-Means](http://php-nlp-tools.com/documentation/clustering.html)
2. [Hierarchical Agglomerative Clustering](http://php-nlp-tools.com/documentation/clustering.html)
    * SingleLink
    * CompleteLink
    * GroupAverage

### Tokenizers ###

1. [WhitespaceTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Tokenizers/WhitespaceTokenizer)
2. [WhitespaceAndPunctuationTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Tokenizers/WhitespaceAndPunctuationTokenizer)
3. [PennTreebankTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Tokenizers/PennTreebankTokenizer)
4. [RegexTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools\Tokenizers\RegexTokenizer)
5. [ClassifierBasedTokenizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Tokenizers/ClassifierBasedTokenizer)
   This tokenizer allows us to build far more complex tokenizers
   than the previous ones.

### Documents ###

1. [TokensDocument](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/TokensDocument)
   represents a bag of words model for a document.
2. [WordDocument](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/WordDocument)
   represents a single word with the context of a larger document.
3. [TrainingDocument](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/TrainingDocument)
   represents a document whose class is known.
4. [TrainingSet](http://php-nlp-tools.com/documentation/api/#NlpTools/Documents/TrainingSet)
   a collection of TrainingDocuments.

### Feature factories ###

1. [FunctionFeatures](http://php-nlp-tools.com/documentation/api/#NlpTools/FeatureFactories/FunctionFeatures)
   Allows the creation of a feature factory from a number of callables.
2. [DataAsFeatures](http://php-nlp-tools.com/documentation/api/#NlpTools/FeatureFactories/DataAsFeatures)
   Simply returns the data as features.

### Similarity ###

1. [Jaccard Index](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/JaccardIndex)
2. [Cosine similarity](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/CosineSimilarity)
3. [Simhash](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/Simhash)
4. [Euclidean](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/Euclidean)
5. [HammingDistance](http://php-nlp-tools.com/documentation/api/#NlpTools/Similarity/HammingDistance)

### Stemmers ###

1. [PorterStemmer](http://php-nlp-tools.com/documentation/api/#NlpTools/Stemmers/PorterStemmer)
2. [RegexStemmer](http://php-nlp-tools.com/documentation/api/#NlpTools/Stemmers/RegexStemmer)
3. [LancasterStemmer](http://php-nlp-tools.com/documentation/api/#NlpTools/Stemmers/LancasterStemmer)
4. [GreekStemmer](http://php-nlp-tools.com/documentation/api/#NlpTools/Stemmers/GreekStemmer)

### Optimizers (MaxEnt only) ###

1. [A gradient descent optimizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Optimizers/MaxentGradientDescent)
   (written in PHP) for educational use.
   It is a simple implementation for anyone wanting to know a bit
   more about either GD or MaxEnt models.
2. A fast (faster than nltk-scipy), parallel gradient descent
   optimizer written in [Go](http://golang.org/). This optimizer
   resides in another [repo](https://github.com/angeloskath/nlp-maxent-optimizer)
   and is used via the [external optimizer](http://php-nlp-tools.com/documentation/api/#NlpTools/Optimizers/ExternalMaxentOptimizer).
   TODO: At least write a readme for the optimizer written in Go.

### Other ###

1. Idf (inverse document frequency)
2. Stop words
3. Language-based normalizers
4. Classifier-based transformation for creating flexible preprocessing pipelines
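
To give a feel for the first two items in the list above, here is a minimal, library-independent sketch of stop word removal followed by inverse document frequency. It does not use NlpTools classes: the helpers `removeStopWords` and `inverseDocumentFrequency`, the stop word list, and the toy corpus are all hypothetical names and data chosen for illustration. The sketch assumes the common definition idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t; the library's own implementation may differ in details such as smoothing.

```php
<?php
// Hypothetical helper (not part of NlpTools): drops every token that
// appears in the given stop word list.
function removeStopWords(array $tokens, array $stopWords)
{
    $lookup = array_flip($stopWords);
    $kept = array_filter($tokens, function ($token) use ($lookup) {
        return !isset($lookup[$token]);
    });
    // Reindex so the result is a plain list of tokens.
    return array_values($kept);
}

// Hypothetical helper (not part of NlpTools): computes
// idf(t) = log(N / df(t)) over already tokenized documents.
function inverseDocumentFrequency(array $documents)
{
    $n = count($documents);
    $df = array();
    foreach ($documents as $tokens) {
        // Count each token at most once per document.
        foreach (array_unique($tokens) as $token) {
            $df[$token] = isset($df[$token]) ? $df[$token] + 1 : 1;
        }
    }
    $idf = array();
    foreach ($df as $token => $count) {
        $idf[$token] = log($n / $count);
    }
    return $idf;
}

// Toy corpus and stop word list, assumed purely for illustration.
$stopWords = array('the', 'a');
$corpus = array('the cat sat', 'the dog sat', 'a bird flew');

$documents = array();
foreach ($corpus as $text) {
    // Naive whitespace split stands in for a real tokenizer.
    $documents[] = removeStopWords(explode(' ', $text), $stopWords);
}

$idf = inverseDocumentFrequency($documents);
```

With this toy corpus, `sat` occurs in two of the three documents and so receives a lower weight than `cat`, which occurs in only one. A term present in every document would get a weight of zero under this definition.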