{"id":43308407,"url":"https://github.com/yooper/php-text-analysis","last_synced_at":"2026-02-12T21:00:35.399Z","repository":{"id":3344662,"uuid":"4389291","full_name":"yooper/php-text-analysis","owner":"yooper","description":"PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language","archived":false,"fork":false,"pushed_at":"2024-12-28T11:55:17.000Z","size":1057,"stargazers_count":526,"open_issues_count":8,"forks_count":87,"subscribers_count":42,"default_branch":"master","last_synced_at":"2024-12-28T12:26:18.775Z","etag":null,"topics":["nlp","php","php-language","php-text-analysis","text-analysis","tokenization"],"latest_commit_sha":null,"homepage":"https://github.com/yooper/php-text-analysis/wiki","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yooper.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2012-05-21T02:36:54.000Z","updated_at":"2024-12-28T11:54:39.000Z","dependencies_parsed_at":"2023-12-25T10:50:44.305Z","dependency_job_id":null,"html_url":"https://github.com/yooper/php-text-analysis","commit_stats":{"total_commits":199,"total_committers":18,"mean_commits":"11.055555555555555","dds":0.3065326633165829,"last_synced_commit":"9b96d252f334f8ee35e067a0c7a40c24dc87a01d"},"previous_names":[],"tags_count":43,"template":false,"template_full_name":null,"purl":"pkg:github/yooper/php-text-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yooper%2Fphp-text-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/yooper%2Fphp-text-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yooper%2Fphp-text-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yooper%2Fphp-text-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yooper","download_url":"https://codeload.github.com/yooper/php-text-analysis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yooper%2Fphp-text-analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29381022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-12T20:34:40.886Z","status":"ssl_error","status_checked_at":"2026-02-12T20:23:00.490Z","response_time":55,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","php","php-language","php-text-analysis","text-analysis","tokenization"],"created_at":"2026-02-01T21:00:22.048Z","updated_at":"2026-02-12T21:00:35.384Z","avatar_url":"https://github.com/yooper.png","language":"PHP","funding_links":[],"categories":["Natural Language Processing"],"sub_categories":["Recommended core stack"],"readme":"php-text-analysis\n=============\n![alt text](https://travis-ci.org/yooper/php-text-analysis.svg?branch=master \"Build status\")\n\n[![Latest Stable 
Version](https://poser.pugx.org/yooper/php-text-analysis/v/stable)](https://packagist.org/packages/yooper/php-text-analysis)\n\n[![Total Downloads](https://poser.pugx.org/yooper/php-text-analysis/downloads)](https://packagist.org/packages/yooper/php-text-analysis)\n\nPHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language. \nThere are tools in this library that can perform:\n\n* document classification\n* sentiment analysis\n* document comparison\n* frequency analysis\n* tokenization\n* stemming\n* collocations with Pointwise Mutual Information\n* lexical diversity\n* corpus analysis\n* text summarization\n\nAll the documentation for this project can be found in the book and wiki. \n\nPHP Text Analysis Book \u0026 Wiki\n=============\n\nA book is in the works and your contributions are needed. You can find the book\nat https://github.com/yooper/php-text-analysis-book\n\n\nDocumentation for the library also resides in the wiki. \nhttps://github.com/yooper/php-text-analysis/wiki\n\n\nInstallation Instructions\n=============\n\nAdd PHP Text Analysis to your project:\n```\ncomposer require yooper/php-text-analysis\n```\n\n### Tokenization\n```php\n$tokens = tokenize($text);\n```\n\nYou can choose a different tokenizer by passing in the name of the tokenizer class\n```php\n$tokens = tokenize($text, \\TextAnalysis\\Tokenizers\\PennTreeBankTokenizer::class);\n```\nThe default tokenizer is **\\TextAnalysis\\Tokenizers\\GeneralTokenizer::class**. Some tokenizers require parameters to be set upon instantiation. \n\n### Normalization\nBy default, **normalize_tokens** uses the function **strtolower** to lowercase all the tokens. To customize\nthe normalization function, pass in either a function or a string to be used by array_map. 
\n\n```php\n$normalizedTokens = normalize_tokens($tokens); \n```\n\n```php\n$normalizedTokens = normalize_tokens($tokens, 'mb_strtolower');\n\n$normalizedTokens = normalize_tokens($tokens, function($token){ return mb_strtoupper($token); });\n```\n\n### Frequency Distributions\n\nThe call to **freq_dist** returns a [FreqDist](https://github.com/yooper/php-text-analysis/blob/master/src/Analysis/FreqDist.php) instance. \n```php\n$freqDist = freq_dist(tokenize($text));\n```\n\n### Ngram Generation\nBy default, bigrams are generated.\n```php\n$bigrams = ngrams($tokens);\n```\nCustomize the ngrams by passing the size and a delimiter\n```php\n// create trigrams with a pipe delimiter in between each word\n$trigrams = ngrams($tokens, 3, '|');\n```\n \n### Stemming\nBy default, the stem function uses the Porter Stemmer.\n```php\n$stemmedTokens = stem($tokens);\n```\nYou can choose a different stemmer by passing in the stemmer class name\n```php\n$stemmedTokens = stem($tokens, \\TextAnalysis\\Stemmers\\MorphStemmer::class);\n```\n\n### Keyword Extraction with Rake\nThere is a shortcut function for using the Rake algorithm. You will need to clean\nyour data before using it. The second parameter is the ngram size of the keywords to extract.\n```php\n$rake = rake($tokens, 3);\n$results = $rake-\u003egetKeywordScores();\n```\n\n### Sentiment Analysis with Vader\nNeed sentiment analysis with PHP? Use Vader, https://github.com/cjhutto/vaderSentiment .\nThe PHP implementation can be invoked easily. Just normalize your data beforehand.\n```php\n$sentimentScores = vader($tokens);\n```\n\n### Document Classification with Naive Bayes\nNeed to do some document classification with PHP? Try using the Naive Bayes\nimplementation. 
An example of classifying movie reviews can be found in the unit\ntests.\n\n```php\n$nb = naive_bayes();\n$nb-\u003etrain('mexican', tokenize('taco nacho enchilada burrito'));\n$nb-\u003etrain('american', tokenize('hamburger burger fries pop'));\n$nb-\u003epredict(tokenize('my favorite food is a burrito'));\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyooper%2Fphp-text-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyooper%2Fphp-text-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyooper%2Fphp-text-analysis/lists"}