{"id":13566637,"url":"https://github.com/Donatello-za/rake-php-plus","last_synced_at":"2025-04-04T00:31:47.419Z","repository":{"id":10843975,"uuid":"67216249","full_name":"Donatello-za/rake-php-plus","owner":"Donatello-za","description":"A keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction algorithm (RAKE).","archived":false,"fork":false,"pushed_at":"2023-05-04T07:35:38.000Z","size":220,"stargazers_count":252,"open_issues_count":3,"forks_count":47,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-05-15T14:54:21.643Z","etag":null,"topics":["extract","keyword","language","php","phrases","stopwords"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Donatello-za.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-09-02T11:12:36.000Z","updated_at":"2024-05-05T06:18:39.000Z","dependencies_parsed_at":"2022-08-07T06:00:41.594Z","dependency_job_id":"9275bc57-51f6-4def-ba95-4355628108a8","html_url":"https://github.com/Donatello-za/rake-php-plus","commit_stats":{"total_commits":70,"total_committers":13,"mean_commits":5.384615384615385,"dds":0.2571428571428571,"last_synced_commit":"e9e9c0862b3dc953d288e8f42c76e4ceaeca0619"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Donatello-za%2Frake-php-plus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Donatello-za%2Frake-php-plus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Donatello-za%2Frake-php
-plus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Donatello-za%2Frake-php-plus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Donatello-za","download_url":"https://codeload.github.com/Donatello-za/rake-php-plus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223075327,"owners_count":17083500,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extract","keyword","language","php","phrases","stopwords"],"created_at":"2024-08-01T13:02:13.710Z","updated_at":"2025-04-04T00:31:47.403Z","avatar_url":"https://github.com/Donatello-za.png","language":"PHP","funding_links":[],"categories":["PHP"],"sub_categories":[],"readme":"# rake-php-plus\nA keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction algorithm (RAKE).\n\n[![Latest Stable Version](https://poser.pugx.org/donatello-za/rake-php-plus/v/stable)](https://packagist.org/packages/donatello-za/rake-php-plus)\n[![Total Downloads](https://poser.pugx.org/donatello-za/rake-php-plus/downloads)](https://packagist.org/packages/donatello-za/rake-php-plus)\n[![License](https://poser.pugx.org/donatello-za/rake-php-plus/license)](https://packagist.org/packages/donatello-za/rake-php-plus)\n\n## Introduction\n\nKeywords describe the main topics expressed in a document or text. Keyword *extraction* is the automatic identification of these important words and phrases in a text. 
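\n\nRAKE splits the text into candidate phrases wherever a stopword or punctuation mark occurs, and then scores the candidates. Assuming the library follows the degree-to-frequency word score described by Rose et al. (2010), each word *w* is scored by its degree (the total length, in words, of all candidate phrases that contain it) divided by its frequency (the number of times it occurs in candidate phrases), and a phrase *P* is scored as the sum of its word scores. As a hand-worked check using the sample text from the examples below: \"minimal\" occurs in the two candidates \"minimal set\" and \"minimal generating sets\", while \"generating\" and \"sets\" each occur once, in the three-word candidate:\n\n```math\n\\begin{aligned}\n\\mathrm{score}(w) \u0026= \\frac{\\deg(w)}{\\mathrm{freq}(w)}, \\qquad \\mathrm{score}(P) = \\sum_{w \\in P} \\mathrm{score}(w) \\\\\n\\mathrm{score}(\\text{minimal generating sets}) \u0026= \\frac{2 + 3}{2} + \\frac{3}{1} + \\frac{3}{1} = 2.5 + 3 + 3 = 8.5\n\\end{aligned}\n```\n\nThis agrees with the score of 8.5 that `sortByScore('desc')-\u003escores()` reports for that phrase in Example 2.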
\n\nExtracted keywords can be used for things like:\n- Building a list of useful tags out of a larger text\n- Building search indexes and search engines\n- Grouping similar content by topic\n\nExtracted phrases can be used for things like:\n- Highlighting important areas of a larger text\n- Language or documentation analysis\n- Building intelligent searches based on contextual terms\n\nThis library provides an easy way for PHP developers to get a list of keywords and phrases from a string of text. \nIt is based on a smaller, unmaintained project called [RAKE-PHP](https://github.com/Richdark/RAKE-PHP) by Richard Filipčík, \nwhich is a translation of a Python implementation simply called [RAKE](https://github.com/aneesha/RAKE).\n\n\u003e *As described in: Rose, S., Engel, D., Cramer, N., \u0026 Cowley, W. (2010).\n[Automatic Keyword Extraction from Individual Documents](https://www.researchgate.net/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents).\nIn M. W. Berry \u0026 J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley \u0026 Sons.*\n\nThis package aims to provide the following benefits over the original [RAKE-PHP](https://github.com/Richdark/RAKE-PHP) package:\n\n1. [PSR-2](http://www.php-fig.org/psr/psr-2/) coding standards.\n2. [PSR-4](http://www.php-fig.org/psr/psr-4/) autoloading, making it [Composer](https://getcomposer.org) installable.\n3. Additional functionality such as method chaining.\n4. Multiple ways to provide source stopwords.\n5. Full unit test coverage.\n6. Performance improvements.\n7. Improved documentation.\n8. 
Easy language integration and multibyte string support.\n\n## Currently Supported Languages\n\n* Afrikaans (af_ZA)\n* Arabic (United Arab Emirates)/الإمارات العربية المتحدة (ar_AE)\n* Brazilian Portuguese/português do Brasil (pt_BR)\n* English US (en_US)\n* European Portuguese/português europeu (pt_PT)\n* French/le français (fr_FR)\n* German (Germany)/Deutsch (Deutschland) (de_DE)\n* Italian/italiano (it_IT)\n* Polish/język polski (pl_PL)\n* Russian/русский язык (ru_RU)\n* Sorani Kurdish/سۆرانی (ckb_IQ)\n* Spanish/español (es_AR)\n* Tamil/தமிழ் (ta_TA)\n* Turkish/Türkçe (tr_TR)\n* Persian/Farsi/فارسی (fa_IR)\n* Dutch/Nederlands (nl_NL)\n* Swedish/svenska (sv_SE)\n\n\u003e If your language is not listed here, it can be added; please see the section\ncalled **[How to add additional languages](#how-to-add-additional-languages)** at the bottom of the page.\n\n## Version\n\nv2.0.0\n\n## Special Thanks\n\n* [Sergey Kudashev](https://github.com/kudashevs): Big help with refactoring v2 of the library.\n* [Jarosław Wasilewski](https://github.com/Orajo): Polish language and improving multi-byte support.\n* [Lev Morozov](https://github.com/levmorozov): French and Russian languages.\n* [Igor Carvalho](https://github.com/Carvlho): Brazilian Portuguese language.\n* [Khoshbin Ali Ahmed](https://github.com/Xoshbin): Sorani Kurdish and Arabic languages.\n* [RhaPT](https://github.com/RhaPT): European Portuguese language.\n* [Peter Thaleikis](https://github.com/spekulatius): German language.\n* [Yusuf Usta](https://github.com/yusufusta): Turkish language.\n* [orthosie](https://github.com/orthosie): Tamil language.\n* [ScIEnzY](https://github.com/ScIEnzY): Italian language.\n* [Reza Rabbani](https://github.com/thrashzone13): Persian language.\n* [Anne van der Aar](https://github.com/annevanderaar): Dutch language.\n\n## Installation\n\n### Installing with Composer\n\nAlthough v1 of this library is compatible up to PHP 8.3 and will be\nmaintained for the foreseeable future, it is 
recommended to install\nv2 to get the benefits of what PHP 7.4 and above has to offer.\n\n```bash\n# Latest: PHP v7.4 to v8.3 support\n$ composer require donatello-za/rake-php-plus:^2.0\n\n# Older: PHP v5.4 to v8.3 support\n$ composer require donatello-za/rake-php-plus:^1.0\n```\n\nOr add the following to your `composer.json` and run `composer install`:\n\n```json\n{\n    \"require\": {\n        \"donatello-za/rake-php-plus\": \"^2.0\"\n    }\n}\n```\n\nThen load Composer's autoloader and import the class:\n\n```php\n\u003c?php\nrequire 'vendor/autoload.php';\n\nuse DonatelloZa\\RakePlus\\RakePlus;\n```\n\n### Installing older versions\n\nIf you find that a release breaks backward compatibility, you\ncan install an older version of RakePlus using Composer's version\nconstraints. For example, to install version 1.x of the library, \nwhich still supports PHP 5.4+, use:\n\n```bash\n$ composer require donatello-za/rake-php-plus:^1.0\n```\n\n*Note: The latest release of the v1 branch is v1.0.20*\n\n### Migrating from v1.x to v2.x\n\n1. Version 2.x of the library requires PHP 7.4 and above.\n\n2. The `StopwordArray` class has been renamed to `StopwordsArray`.\n   If you use this class directly, you will have to update the class name in your \n   own source code.\n\n3. If you use one of the `Stopwords*` provider classes directly, you\n   will have to provide the appropriate namespace for them, as they have been \n   moved to the `StopwordProviders` sub-folder and namespace; see the example below:\n\nPreviously you could use `StopwordsArray`, `StopwordsPatternFile` and\n`StopwordsPHP` without providing an additional namespace. Now you will
Now you will\nhave to include the namespaces, for example:\n\n```php\nuse DonatelloZa\\RakePlus\\StopwordProviders\\StopwordsArray;\nuse DonatelloZa\\RakePlus\\StopwordProviders\\StopwordsPatternFile;\nuse DonatelloZa\\RakePlus\\StopwordProviders\\StopwordsPHP;\n\n$stopwords = StopwordsArray::create(['zero', 'z', 'you\\'ve', 'yourselves', ...]);\n$stopwords = StopwordsPatternFile::create('/path/to/my/stopwords.pattern');\n$stopwords = StopwordsPHP::create('/path/to/my/stopwords.php');\n````\n\n# Usage Examples\n\n## Example 1\n\nCreates a new instance of RakePlus, extract the phrases and return the results. Assumes that the specified\ntext is English (US).\n\n```php\nuse DonatelloZa\\RakePlus\\RakePlus;\n\n$text = \"Criteria of compatibility of a system of linear Diophantine equations, \" .\n    \"strict inequations, and nonstrict inequations are considered. Upper bounds \" .\n    \"for components of a minimal set of solutions and algorithms of construction \" .\n    \"of minimal generating sets of solutions for all types of systems are given.\";\n\n$phrases = RakePlus::create($text)-\u003eget();\n\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e criteria\n    [1] =\u003e compatibility\n    [2] =\u003e system\n    [3] =\u003e linear diophantine equations\n    [4] =\u003e strict inequations\n    [5] =\u003e nonstrict inequations\n    [6] =\u003e considered\n    [7] =\u003e upper bounds\n    [8] =\u003e components\n    [9] =\u003e minimal set\n    [10] =\u003e solutions\n    [11] =\u003e algorithms\n    [12] =\u003e construction\n    [13] =\u003e minimal generating sets\n    [14] =\u003e types\n    [15] =\u003e systems\n)\n```\n\n## Example 2\n\nCreates a new instance of RakePlus, extract the phrases in different orders\nand also shows how to get the phrase scores.\n\n```php\nuse DonatelloZa\\RakePlus\\RakePlus;\n\n$text = \"Criteria of compatibility of a system of linear Diophantine equations, \" .\n    \"strict inequations, and nonstrict inequations are 
considered. Upper bounds \" .\n    \"for components of a minimal set of solutions and algorithms of construction \" .\n    \"of minimal generating sets of solutions for all types of systems are given.\";\n\n// Note: en_US is the default language.\n$rake = RakePlus::create($text, 'en_US');\n\n// 'asc' is optional and is the default sort order\n$phrases = $rake-\u003esort('asc')-\u003eget();\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e algorithms\n    [1] =\u003e compatibility\n    [2] =\u003e components\n    [3] =\u003e considered\n    [4] =\u003e construction\n    [5] =\u003e criteria\n    [6] =\u003e linear diophantine equations\n    [7] =\u003e minimal generating sets\n    [8] =\u003e minimal set\n    [9] =\u003e nonstrict inequations\n    [10] =\u003e solutions\n    [11] =\u003e strict inequations\n    [12] =\u003e system\n    [13] =\u003e systems\n    [14] =\u003e types\n    [15] =\u003e upper bounds\n)\n```\n\n```php\n// Sort in descending order\n$phrases = $rake-\u003esort('desc')-\u003eget();\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e upper bounds\n    [1] =\u003e types\n    [2] =\u003e systems\n    [3] =\u003e system\n    [4] =\u003e strict inequations\n    [5] =\u003e solutions\n    [6] =\u003e nonstrict inequations\n    [7] =\u003e minimal set\n    [8] =\u003e minimal generating sets\n    [9] =\u003e linear diophantine equations\n    [10] =\u003e criteria\n    [11] =\u003e construction\n    [12] =\u003e considered\n    [13] =\u003e components\n    [14] =\u003e compatibility\n    [15] =\u003e algorithms\n)\n```\n\n```php\n// Sort the phrases by score and return the scores\n$phrase_scores = $rake-\u003esortByScore('desc')-\u003escores();\nprint_r($phrase_scores);\n```\n\n```\nArray\n(\n    [linear diophantine equations] =\u003e 9\n    [minimal generating sets] =\u003e 8.5\n    [minimal set] =\u003e 4.5\n    [strict inequations] =\u003e 4\n    [nonstrict inequations] =\u003e 4\n    [upper bounds] =\u003e 4\n    [criteria] 
=\u003e 1\n    [compatibility] =\u003e 1\n    [system] =\u003e 1\n    [considered] =\u003e 1\n    [components] =\u003e 1\n    [solutions] =\u003e 1\n    [algorithms] =\u003e 1\n    [construction] =\u003e 1\n    [types] =\u003e 1\n    [systems] =\u003e 1\n)\n```\n\n```php\n// Extract phrases from a new string on the same RakePlus instance. Using the\n// same RakePlus instance is faster than creating a new instance, as the\n// language files do not have to be re-loaded and parsed.\n\n$text = \"A fast Fourier transform (FFT) algorithm computes...\";\n$phrases = $rake-\u003eextract($text)-\u003esort()-\u003eget();\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e algorithm computes\n    [1] =\u003e fast fourier transform\n    [2] =\u003e fft\n)\n```\n\n## Example 3\n\nCreate a new instance of RakePlus and extract the unique keywords from the phrases.\n\n```php\nuse DonatelloZa\\RakePlus\\RakePlus;\n\n$text = \"Criteria of compatibility of a system of linear Diophantine equations, \" .\n    \"strict inequations, and nonstrict inequations are considered. 
Upper bounds \" .\n    \"for components of a minimal set of solutions and algorithms of construction \" .\n    \"of minimal generating sets of solutions for all types of systems are given.\";\n\n$keywords = RakePlus::create($text)-\u003ekeywords();\nprint_r($keywords);\n```\n\n```\nArray\n(\n    [0] =\u003e criteria\n    [1] =\u003e compatibility\n    [2] =\u003e system\n    [3] =\u003e linear\n    [4] =\u003e diophantine\n    [5] =\u003e equations\n    [6] =\u003e strict\n    [7] =\u003e inequations\n    [8] =\u003e nonstrict\n    [9] =\u003e considered\n    [10] =\u003e upper\n    [11] =\u003e bounds\n    [12] =\u003e components\n    [13] =\u003e minimal\n    [14] =\u003e set\n    [15] =\u003e solutions\n    [16] =\u003e algorithms\n    [17] =\u003e construction\n    [18] =\u003e generating\n    [19] =\u003e sets\n    [20] =\u003e types\n    [21] =\u003e systems\n)\n```\n\n## Example 4\n\nCreate a new instance of RakePlus without using the static `RakePlus::create()` method.\n\n```php\nuse DonatelloZa\\RakePlus\\RakePlus;\n\n$text = \"Criteria of compatibility of a system of linear Diophantine equations, \" .\n    \"strict inequations, and nonstrict inequations are considered. 
Upper bounds \" .\n    \"for components of a minimal set of solutions and algorithms of construction \" .\n    \"of minimal generating sets of solutions for all types of systems are given.\";\n\n$rake = new RakePlus();\n$phrases = $rake-\u003eextract($text)-\u003eget();\n\n// Alternative method:\n$phrases = (new RakePlus($text))-\u003eget();\n```\n\n## Example 5\n\nYou can provide custom stopwords in four different ways:\n\n```php\nuse DonatelloZa\\RakePlus\\RakePlus;\nuse DonatelloZa\\RakePlus\\StopwordProviders\\StopwordsArray;\n\n// 1: The standard way (provide a language code).\n//    RakePlus will first look for ./lang/en_US.pattern and, if\n//    not found, it will look for ./lang/en_US.php.\n$rake = RakePlus::create($text, 'en_US');\n\n// 2: Pass an array containing stopwords (note: the sample stopwords are in reverse order).\n$rake = RakePlus::create($text, ['zero', 'z', 'you\\'ve', 'yourselves', ...]);\n\n// 3: Pass the path of a PHP or pattern file,\n//    see lang/en_US.php and lang/en_US.pattern for examples.\n$rake = RakePlus::create($text, '/path/to/my/stopwords.pattern');\n\n// 4: Create an instance of one of the stopword provider classes (or\n//    create your own) and pass that to RakePlus:\n$stopwords = StopwordsArray::create(['zero', 'z', 'you\\'ve', 'yourselves', ...]);\n$rake = RakePlus::create($text, $stopwords);\n```\n\n## Example 6\n\nYou can specify the minimum number of characters that a phrase/keyword\nmust have; anything shorter than the minimum is filtered out. 
The\ndefault is 0 (no minimum).\n\n```php\nuse DonatelloZa\\RakePlus\\RakePlus;\n\n$text = '6462 Little Crest Suite, 413 Lake Carlietown, WA 12643';\n\n// Without a minimum\n$phrases = RakePlus::create($text, 'en_US', 0)-\u003eget();\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e crest suite\n    [1] =\u003e 413 lake carlietown\n    [2] =\u003e wa 12643\n)\n```\n\n```php\n// With a minimum\n$phrases = RakePlus::create($text, 'en_US', 10)-\u003eget();\n\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e crest suite\n    [1] =\u003e 413 lake carlietown\n)\n```\n\n## Example 7\n\nYou can specify whether phrases/keywords that consist only of a number\nshould be filtered out. The default is to filter out\nnumerics.\n\n```php\nuse DonatelloZa\\RakePlus\\RakePlus;\n\n$text = '6462 Little Crest Suite, 413 Lake Carlietown, WA 12643';\n\n// Filter out numerics\n$phrases = RakePlus::create($text, 'en_US', 0, true)-\u003eget();\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e crest suite\n    [1] =\u003e 413 lake carlietown\n    [2] =\u003e wa 12643\n)\n```\n\n```php\n// Do not filter out numerics\n$phrases = RakePlus::create($text, 'en_US', 0, false)-\u003eget();\n\nprint_r($phrases);\n```\n\n```\nArray\n(\n    [0] =\u003e 6462\n    [1] =\u003e crest suite\n    [2] =\u003e 413 lake carlietown\n    [3] =\u003e wa 12643\n)\n```\n\n## How to add additional languages\n\nThe library requires a list of \"stopwords\" for each language. Stopwords are common words used in a language, such as \"and\", \"are\", \"or\", etc.\n\nThere are [stopwords for 50 languages](https://github.com/Donatello-za/stopwords-json#languages) (including the ones already supported) available in JSON format.\nIf your language is listed, you can easily import it into the library. 
To\ndo so, read the section below:\n\n**Using the stopwords extractor tool**\n\n\u003e Note: These instructions assume you are using Linux.\n\nWe will be using the Greek language as an example:\n\n1. Check whether your operating system has the Greek locale files. The Greek locale\n   code to look for is `el_GR`, so run the command `$ locale -a` to see if it is listed.\n2. If it is not listed, you'll need to generate it, so run:\n\n```sh\nsudo locale-gen el_GR\nsudo locale-gen el_GR.utf8\n```\n\n3. Go to the [list of stopword files](https://github.com/Donatello-za/stopwords-json#languages) and\nfind the Greek language; the file is called `el.json` and it contains 75 stopwords.\n4. Download the `el.json` file and store it somewhere on your system.\n5. In your terminal, go to the directory of the `rake-php-plus` library; it will \n   be under `vendor/donatello-za/rake-php-plus` if you used Composer to install it.\n\nWe now need to use the JSON file to create two new files: one will be a `.php` file\nthat contains the stopwords as a PHP array, and the other will be a `.pattern` file, which\nis a text file containing the stopwords as a regular expression:\n\n1. Extract and convert the .json file to a PHP file by running:\n\n```sh\n$ php ./console/extractor.php path/to/el.json --locale=el_GR --output=php \u003e ./some/dir/el_GR.php\n```\n\n2. Extract and convert the .json file to a .pattern file by running:\n\n```sh\n$ php ./console/extractor.php path/to/el.json --locale=el_GR --output=pattern \u003e ./some/dir/el_GR.pattern\n```\n\nThat is it! 
You can now use the new stopwords by specifying the file when creating an instance\nof the RakePlus class, for example:\n\n```php\n$rake = RakePlus::create($text, '/some/dir/el_GR.pattern');\n```\n\nor\n\n```php\n$rake = RakePlus::create($text, '/some/dir/el_GR.php');\n```\n\n**Contribute by Adding a Language**\n\nIf you want your language to be officially supported, you can fork this library,\ngenerate the `.pattern` and `.php` stopword files as described above, place them\nin the `./rake-php-plus/lang/` directory and submit them as a pull request.\n\nOnce your language is officially supported, you'll be able to specify the language code\nwithout having to specify the file to use, for example:\n\n```php\n$rake = RakePlus::create($text, 'el_GR');\n```\n\nRakePlus will always look for a `.pattern` file first and, if none is found, it will \nlook for a `.php` file in the `./lang/` directory.\n\n**I don't have a stopwords file for my language, what now?**\n\nIf your language is not covered in the [list of 50 languages here](https://github.com/Donatello-za/stopwords-json#languages),\nyou may have to find a list elsewhere; try searching for \"yourlanguage stopwords\". 
If you\nfind a list or decide to create your own, you can also place it in a plain text\nfile instead of a .json file and extract the stopwords using the extractor tool, for\nexample:\n\n```sh\n$ php ./console/extractor.php path/to/mystopwords.txt --locale=LOCAL_CODE --output=php \u003e ./some/dir/LOCAL_CODE.php\n$ php ./console/extractor.php path/to/mystopwords.txt --locale=LOCAL_CODE --output=pattern \u003e ./some/dir/LOCAL_CODE.pattern\n```\n\n*Remember to replace `LOCAL_CODE` with the locale code you wish to use.*\n\nHere is an example text file containing stopwords that was copied and pasted from a \nwebsite: [stopwords_en_US](./console/stopwords_en_US.txt)\n\n## To run tests\n\nUnit testing is performed using PHPUnit v11.2 running on PHP v8.3.0+.\n\n`./vendor/bin/phpunit tests`\n\n## License\n\nReleased under the MIT license (see LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDonatello-za%2Frake-php-plus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDonatello-za%2Frake-php-plus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDonatello-za%2Frake-php-plus/lists"}