https://github.com/linguistic-dev/n-gram-extractor
A PHP Library to extract n-grams from a text. Simple preprocessing tools (cleaning, tokenizing) included.
natural-language-processing ngram ngram-analysis ngrams nlp php php-library php7 tokenization tokenize tokenized-sentences tokenizer
- Host: GitHub
- URL: https://github.com/linguistic-dev/n-gram-extractor
- Owner: linguistic-dev
- License: gpl-2.0
- Created: 2017-12-05T22:23:34.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2017-12-05T23:09:36.000Z (about 7 years ago)
- Last Synced: 2024-10-28T22:10:53.569Z (2 months ago)
- Topics: natural-language-processing, ngram, ngram-analysis, ngrams, nlp, php, php-library, php7, tokenization, tokenize, tokenized-sentences, tokenizer
- Language: PHP
- Size: 28.3 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NGramExtractor for PHP
## Installation
Simple install via Composer:
```
composer require linguistic/ngramextractor
```

## Usage
Coming soon.
## Example
```php
$tokenizer = new Tokenizer();
$tokenizer->addRemovalRule('/<\/?\w+[\s\w\=\"\/\#\-\:\.\_]*>/') # Removes HTML tags
    ->addRemovalRule('/[^a-z0-9]+/', ' ') # Replaces every run of characters other than a-z and 0-9 with a space
    ->setSeperator('/\s+/'); # Splits the text into tokens on whitespace
```

```php
$content = ""; # The text to tokenize
$stopwords = array(); # (optional) array of stopwords

$extractor = new NGramExtractor($content, $tokenizer, $stopwords);
$unigrams = $extractor->getNGrams(1); # Gets all n-grams in the text for n = 1

# Gets the unigrams together with their occurrence counts, keeping those that occur at least 3 times
$unigramsFiltered = NGramExtractor::limitByOccurance($extractor->getNGramCount(1, true), 3);
```
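
To make the terms above concrete, here is a minimal plain-PHP sketch of the three steps the example performs: tokenizing, building n-grams, and filtering by occurrence count. It does not use this library and is not its implementation. The helper names `sketchTokenize`, `sketchNGrams` and `sketchCountAtLeast` are hypothetical, and lowercasing the input before cleaning is an assumption of this sketch.

```php
<?php
// Hypothetical helpers for illustration only; none of them are part of the library.

/** Lowercase the text, strip HTML tags and punctuation, then split on whitespace. */
function sketchTokenize(string $text): array
{
    $text = strtolower(strip_tags($text));
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text);
    return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Slide a window of $n tokens over the list and join each window into one string. */
function sketchNGrams(array $tokens, int $n): array
{
    if ($n < 1) {
        return [];
    }
    $ngrams = [];
    for ($i = 0; $i + $n <= count($tokens); $i++) {
        $ngrams[] = implode(' ', array_slice($tokens, $i, $n));
    }
    return $ngrams;
}

/** Count each n-gram and keep only those that occur at least $min times. */
function sketchCountAtLeast(array $ngrams, int $min): array
{
    $counts = array_count_values($ngrams);
    return array_filter($counts, function ($count) use ($min) {
        return $count >= $min;
    });
}

$tokens = sketchTokenize('<p>The cat sat. The cat ran!</p>');
$bigrams = sketchNGrams($tokens, 2);
$frequent = sketchCountAtLeast($bigrams, 2);
print_r($frequent);
```

With this input, "the cat" is the only bigram that appears twice, so it is the only entry left after filtering.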
## Resources

* [Download of stopword lists for different languages](http://members.unine.ch/jacques.savoy/clef/index.html)