Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/davidbelicza/php-science-textrank
:zap: :elephant: TextRank (resource-efficient and low-cost automatic text summarisation) for PHP
ai airtificialintelligence algorithm php science search summarization textrank
- Host: GitHub
- URL: https://github.com/davidbelicza/php-science-textrank
- Owner: DavidBelicza
- License: mit
- Created: 2016-08-05T19:22:06.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-12-29T15:31:17.000Z (11 months ago)
- Last Synced: 2024-05-17T19:02:35.481Z (6 months ago)
- Topics: ai, airtificialintelligence, algorithm, php, science, search, summarization, textrank
- Language: PHP
- Homepage: https://php.science/textrank/
- Size: 129 KB
- Stars: 240
- Watchers: 13
- Forks: 39
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
TextRank
This source code is an implementation of the TextRank algorithm in the PHP programming language, released under the MIT licence.
# TextRank vs. ChatGPT
GPT models such as ChatGPT are large language models that interpret context and generate new content from the given
input, at the cost of vast resources. TextRank, by contrast, is a low-cost text extraction algorithm. TextRank can
also be used as a pre-processor for a GPT model: it reduces the text size and so saves on resource consumption.

# TextRank or Automatic summarization
> Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a representative subset of the data, which contains the information of the entire set. Summarization technologies are used in a large number of sectors in industry today. - Wikipedia

The algorithm of this implementation is:
* Extracts sentences,
* Removes stopwords,
* Adds integer values to words by finding and counting the matching words,
* Weights the values of the words,
* Normalizes values to get the scores,
* Sorts by scores.

# Install to use it in your project
```
cd your-project-folder
composer require php-science/textrank
```

# Install for contributing
```
cd git-project-folder
docker-compose build
docker-compose up -d
composer install
composer test
```

# Examples
```php
use PhpScience\TextRank\TextRankFacade;
use PhpScience\TextRank\Tool\StopWords\English;

// String contains a long text, see the /res/sample1.txt file.
$text = "Lorem ipsum...";

$api = new TextRankFacade();

// English implementation for stopwords/junk words:
$stopWords = new English();
$api->setStopWords($stopWords);

// Array of the most important keywords:
$result = $api->getOnlyKeyWords($text);

// Array of the sentences from the most important part of the text:
$result = $api->getHighlights($text);

// Array of the most important sentences from the text:
$result = $api->summarizeTextBasic($text);
```
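To make the algorithm steps listed earlier more concrete (extract sentences, remove stopwords, count words, weight, normalize, sort), here is a minimal standalone sketch. It is not the library's code and does not use its classes: the function name `sketchRankSentences`, the regex-based sentence splitting, and scoring by plain word frequency are all assumptions chosen for illustration, and the library's actual weighting is likely to differ.

```php
<?php
/**
 * Hypothetical helper, NOT part of php-science/textrank.
 * Ranks sentences by the summed, normalized frequency of their words.
 *
 * @param string   $text      Input text.
 * @param string[] $stopWords Lower-case words to ignore.
 * @return array<string, float> Sentence => score, highest score first.
 */
function sketchRankSentences(string $text, array $stopWords): array
{
    // 1. Extract sentences: split on whitespace that follows ".", "!" or "?".
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // 2. Tokenize each sentence into lower-case words and remove stopwords.
    $tokens = [];
    foreach ($sentences as $i => $sentence) {
        preg_match_all('/[a-z]+/', strtolower($sentence), $matches);
        $tokens[$i] = array_values(array_diff($matches[0], $stopWords));
    }

    // 3. Give every word an integer value: its occurrence count in the text.
    $counts = array_count_values(array_merge([], ...$tokens));

    // 4. Weight and normalize: divide each count by the largest count.
    $max = $counts === [] ? 1 : max($counts);

    // 5. Score each sentence as the sum of its normalized word values.
    $scores = [];
    foreach ($sentences as $i => $sentence) {
        $score = 0.0;
        foreach ($tokens[$i] as $word) {
            $score += $counts[$word] / $max;
        }
        $scores[$sentence] = $score;
    }

    // 6. Sort by score, highest first, keeping the sentence keys.
    arsort($scores);

    return $scores;
}

// Usage with a tiny, made-up text and stopword list.
$text = 'PHP is a scripting language. TextRank ranks sentences in PHP. Cats sleep.';
$ranked = sketchRankSentences($text, ['is', 'a', 'in']);
echo array_key_first($ranked), PHP_EOL; // prints "TextRank ranks sentences in PHP."
```

In this sketch longer sentences tend to score higher because scores are summed rather than averaged; that is a deliberate simplification, not a statement about how the library behaves.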
More examples:
* [tests/TextRankFacadeTest.php](https://github.com/DavidBelicza/PHP-Science-TextRank/blob/master/tests/TextRankFacadeTest.php)
* https://php.science

# Authors, Contributors
Name | GitHub user
--- | ---
David Belicza | @DavidBelicza
Riccardo Marton | @riccardomarton
Syndesi | @Syndesi
vincentsch | @vincentsch
Andrew Welch | @khalwat
Andrey Astashov | @mvcaaa
Leo Toneff | @bragle
Willy Arisky | @willyarisky
Robert-Jan Keizer | @KeizerDev
Morty | @evil1morty
Sezer Fidancı | @SezerFidanci