https://github.com/yooper/php-text-analysis

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
nlp php php-language php-text-analysis text-analysis tokenization

Last synced: 5 months ago
Host: GitHub
URL: https://github.com/yooper/php-text-analysis
Owner: yooper
License: mit
Created: 2012-05-21T02:36:54.000Z (about 14 years ago)
Default Branch: master
Last Pushed: 2024-12-28T11:55:17.000Z (over 1 year ago)
Last Synced: 2024-12-28T12:26:18.775Z (over 1 year ago)
Topics: nlp, php, php-language, php-text-analysis, text-analysis, tokenization
Language: PHP
Homepage: https://github.com/yooper/php-text-analysis/wiki
Size: 1.01 MB
Stars: 526
Watchers: 42
Forks: 87
Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

Awesome Lists containing this project

awesome-php-ml - yooper/php-text-analysis

README

php-text-analysis
=============

![alt text](https://travis-ci.org/yooper/php-text-analysis.svg?branch=master "Build status")

[![Latest Stable Version](https://poser.pugx.org/yooper/php-text-analysis/v/stable)](https://packagist.org/packages/yooper/php-text-analysis)

[![Total Downloads](https://poser.pugx.org/yooper/php-text-analysis/downloads)](https://packagist.org/packages/yooper/php-text-analysis)

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language. 

There are tools in this library that can perform:

* document classification

* sentiment analysis

* document comparison

* frequency analysis

* tokenization

* stemming

* collocations with Pointwise Mutual Information

* lexical diversity

* corpus analysis

* text summarization

All the documentation for this project can be found in the book and wiki. 

PHP Text Analysis Book & Wiki
=============

A book is in the works and your contributions are needed. You can find the book at https://github.com/yooper/php-text-analysis-book

Documentation for the library also resides in the wiki:

https://github.com/yooper/php-text-analysis/wiki

Installation Instructions
=============

Add PHP Text Analysis to your project with Composer:

```
composer require yooper/php-text-analysis
```

### Tokenization

```php

$tokens = tokenize($text);

```

To use a different tokenizer, pass the name of the tokenizer class as the second argument:

```php
$tokens = tokenize($text, \TextAnalysis\Tokenizers\PennTreeBankTokenizer::class);
```

The default tokenizer is **\TextAnalysis\Tokenizers\GeneralTokenizer::class**. Some tokenizers require parameters to be set upon instantiation.

### Normalization

By default, **normalize_tokens** uses the function **strtolower** to lowercase all the tokens. To customize the normalize function, pass in either a function or the name of a function as a string; it is applied to the tokens with array_map.

```php
// Default: lowercase every token with strtolower
$normalizedTokens = normalize_tokens($tokens);
```

```php
// Pass a function name as a string
$normalizedTokens = normalize_tokens($tokens, 'mb_strtolower');

// Or pass a closure
$normalizedTokens = normalize_tokens($tokens, function($token){ return mb_strtoupper($token); });
```
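What passing a function to array_map does can be sketched in plain PHP. This is a minimal illustration and not the library's code: `normalize_with` is a hypothetical helper name, and the sample tokens are made up. It uses `strtolower` and `strtoupper` so that it runs without the mbstring extension.

```php
<?php
// Illustrative sketch only; normalize_with() is a hypothetical helper,
// not part of php-text-analysis.
function normalize_with(array $tokens, ?callable $normalizer = null): array
{
    // Fall back to strtolower when no normalizer is given
    $normalizer ??= 'strtolower';

    // Apply the normalizer to every token
    return array_map($normalizer, $tokens);
}

$tokens = ['Marquette', 'Michigan', 'PHP'];

$lower = normalize_with($tokens);
$upper = normalize_with($tokens, function (string $token): string {
    return strtoupper($token);
});
```

Any callable that takes one string and returns one string fits the same slot, including a function name given as a string.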

### Frequency Distributions

The call to **freq_dist** returns a [FreqDist](https://github.com/yooper/php-text-analysis/blob/master/src/Analysis/FreqDist.php) instance. 

```php

$freqDist = freq_dist(tokenize($text));

```
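For intuition, the core of a frequency distribution is a count per distinct token. The sketch below shows that idea in plain PHP with `array_count_values`. It does not use or reproduce the FreqDist class; `token_counts` and `token_weights` are hypothetical names chosen for this example.

```php
<?php
// Illustrative sketch only; these helpers are hypothetical and are not
// the library's FreqDist implementation.
function token_counts(array $tokens): array
{
    // Count how often each distinct token occurs
    $counts = array_count_values($tokens);

    // Sort by count, highest first, keeping the token keys
    arsort($counts);

    return $counts;
}

function token_weights(array $counts): array
{
    $total = array_sum($counts);
    if ($total === 0) {
        return [];
    }

    // Relative frequency of each token
    return array_map(function (int $count) use ($total): float {
        return $count / $total;
    }, $counts);
}

$counts  = token_counts(['the', 'cat', 'sat', 'on', 'the', 'mat', 'the']);
$weights = token_weights($counts);
```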

### Ngram Generation

By default bigrams are generated.

```php

$bigrams = ngrams($tokens);

```

To customize the ngrams, pass the ngram size and a delimiter:

```php
// create trigrams with a pipe delimiter in between each word
$trigrams = ngrams($tokens, 3, '|');
```
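The sliding-window idea behind ngram generation can be sketched in plain PHP. This is an illustration under stated assumptions, not the library's implementation: `make_ngrams` is a hypothetical name, and its default space separator is a choice made for this example only.

```php
<?php
// Illustrative sketch only; make_ngrams() is a hypothetical helper and its
// default separator is an assumption, not the library's behavior.
function make_ngrams(array $tokens, int $n = 2, string $separator = ' '): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('n must be at least 1');
    }

    $ngrams = [];
    $lastStart = count($tokens) - $n;

    // Slide a window of $n tokens across the list, one position at a time
    for ($i = 0; $i <= $lastStart; $i++) {
        $ngrams[] = implode($separator, array_slice($tokens, $i, $n));
    }

    return $ngrams;
}

$tokens   = ['one', 'two', 'three', 'four'];
$bigrams  = make_ngrams($tokens);
$trigrams = make_ngrams($tokens, 3, '|');
```

A list shorter than the window size yields an empty array in this sketch.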

 

### Stemming

By default, the **stem** function uses the Porter Stemmer.

```php

$stemmedTokens = stem($tokens);

```

To use a different stemmer, pass the name of the stemmer class as the second argument:

```php

$stemmedTokens = stem($tokens, \TextAnalysis\Stemmers\MorphStemmer::class);

```

### Keyword Extraction with Rake

There is a shortcut function for using the Rake algorithm. You will need to clean your data before using it. The second parameter is the ngram size of the keywords to extract.

```php

$rake = rake($tokens, 3);

$results = $rake->getKeywordScores();

```

### Sentiment Analysis with Vader

Need sentiment analysis with PHP? Use Vader (https://github.com/cjhutto/vaderSentiment). The PHP implementation can be invoked easily; just normalize your data beforehand.

```php

$sentimentScores = vader($tokens);

```

### Document Classification with Naive Bayes

Need to do some document classification with PHP? Try the Naive Bayes implementation. An example of classifying movie reviews can be found in the unit tests.

```php

$nb = naive_bayes();

$nb->train('mexican', tokenize('taco nacho enchilada burrito'));        

$nb->train('american', tokenize('hamburger burger fries pop'));  

$nb->predict(tokenize('my favorite food is a burrito'));

```
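To show what training and prediction involve, here is a minimal multinomial Naive Bayes sketch in plain PHP with add-one (Laplace) smoothing. It is not the library's implementation: `TinyNaiveBayes` is a hypothetical class, its return format (label => log score, best first) is an assumption made for this example, and `explode` stands in for `tokenize`.

```php
<?php
// Illustrative sketch only; TinyNaiveBayes is hypothetical and is not the
// classifier shipped with php-text-analysis.
final class TinyNaiveBayes
{
    private array $docCounts = [];   // label => number of training documents
    private array $tokenCounts = []; // label => [token => count]
    private array $vocabulary = [];  // token => true

    public function train(string $label, array $tokens): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($tokens as $token) {
            $this->tokenCounts[$label][$token] =
                ($this->tokenCounts[$label][$token] ?? 0) + 1;
            $this->vocabulary[$token] = true;
        }
    }

    // Returns label => log score, sorted with the most likely label first
    public function predict(array $tokens): array
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $scores = [];

        foreach ($this->docCounts as $label => $docs) {
            $labelTotal = array_sum($this->tokenCounts[$label] ?? []);
            // Guard against a zero denominator when nothing was trained
            $denominator = max(1, $labelTotal + $vocabSize);

            // Log prior plus the sum of smoothed log likelihoods
            $score = log($docs / $totalDocs);
            foreach ($tokens as $token) {
                $count = $this->tokenCounts[$label][$token] ?? 0;
                $score += log(($count + 1) / $denominator);
            }
            $scores[$label] = $score;
        }

        arsort($scores);

        return $scores;
    }
}

$nb = new TinyNaiveBayes();
$nb->train('mexican', explode(' ', 'taco nacho enchilada burrito'));
$nb->train('american', explode(' ', 'hamburger burger fries pop'));

$scores = $nb->predict(explode(' ', 'my favorite food is a burrito'));
$best   = array_key_first($scores);
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps unseen words such as "favorite" from zeroing out a label.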
