https://github.com/flawed-hooman/text-summarisation
Summarise text using the various libraries available for Python: pyteaser, sumy, gensim, pytldr, XLNET, BERT, and GPT2.
- Host: GitHub
- URL: https://github.com/flawed-hooman/text-summarisation
- Owner: flawed-hooman
- License: apache-2.0
- Created: 2024-07-20T11:56:38.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-20T12:32:13.000Z (almost 2 years ago)
- Last Synced: 2024-07-20T13:23:44.904Z (almost 2 years ago)
- Topics: bert, gensim, gpt2, library, natural-language-processing, nlp, pyteaser, python, pytldr, sumy, text-summarization, xlnet
- Language: Python
- Homepage:
- Size: 71.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Extractive text summarization
Various ways to summarise text using the libraries available for Python:
1. pyteaser
2. sumy
3. gensim
4. pytldr
5. XLNET
6. BERT
7. GPT2
## INSTALLATION
pip install sumy
pip install gensim
pip install pyteaser
pip install pytldr
pip install bert-extractive-summarizer
pip install spacy==2.0.12
pip install transformers==2.2.0
## Pyteaser
Pyteaser has two functions:
- Summarize: takes a title and a text and summarizes the text
- SummarizeUrl: takes a URL and summarizes the content of that page
## Sumy
Sumy has various preprocessing and summarizer modules:
- sumytoken: tokenizes the text
- get_stop_words: returns the stop words to remove from the text
- stemmer: stems the words
- LexRankSummarizer: summarizes based on LexRank, a graph-based ranking of sentences
- LsaSummarizer: summarizes based on latent semantic analysis (LSA)
- LuhnSummarizer: summarizes based on Luhn's algorithm
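To make the idea behind Luhn's algorithm concrete, here is a minimal standard-library sketch. It is not sumy's implementation: the stop-word list, the naive sentence splitter, the thresholds, and the function names (`luhn_score`, `luhn_summarize`) are all assumptions made for illustration. Words that occur often are treated as significant, and each sentence is scored by its densest cluster of significant words.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; sumy provides its own lists).
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in", "is",
              "it", "of", "on", "or", "that", "the", "to", "was", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens, in order of appearance."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def significant_words(sentences, min_count=2):
    """Non-stop words that occur at least min_count times in the whole text."""
    counts = Counter(w for s in sentences for w in tokenize(s)
                     if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}


def luhn_score(words, significant, max_gap=4):
    """Best cluster score: (significant words in cluster)^2 / cluster length.

    A cluster is a run of words bracketed by significant words, with at most
    max_gap insignificant words between neighbouring significant ones.
    """
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 > max_gap:
            # Gap too wide: close the current cluster and start a new one.
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 0
        count += 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def luhn_summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    significant = significant_words(sentences)
    scores = [luhn_score(tokenize(s), significant) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i],
                 reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `luhn_summarize(text, 2)` returns the two sentences whose significant words are packed most tightly, which is the same kind of extractive output the summarizers above produce.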
## Gensim
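gensim has a summarization module (`gensim.summarization`) whose `summarize` function can be imported and used directly. Note that this module was removed in gensim 4.0, so a gensim 3.x release is required.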
## pytldr
pytldr is similar to sumy in that it bundles various NLP utilities, such as a tokenizer.
Here we use TextRankSummarizer, RelevanceSummarizer, and LsaSummarizer from pytldr.
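As an illustration of the idea behind a TextRank-style summarizer, the sketch below ranks sentences with a weighted PageRank over a word-overlap graph, using only the standard library. It is not pytldr's code: the similarity measure, the damping factor, the iteration count, and the names `textrank_scores` and `textrank_summarize` are assumptions chosen for the example.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    """Set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Shared words, normalised by the log of each sentence's word count."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    words = [word_set(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of every neighbour's score,
        # proportional to the weight of the edge between them.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summarize(text, n_sentences=2):
    """Return the n best-ranked sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i],
                 reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with any other sentence keeps only the baseline score `1 - damping`, so it is the last candidate for the summary.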
## XLNET
XLNet is an autoregressive language model, built on the Transformer architecture with recurrence, that outputs the joint probability of a sequence of tokens.
## BERT
Extractive text summarization means selecting the most relevant sentences from a large document so that the summary retains the most important information. BERT (Bidirectional Encoder Representations from Transformers) is a pretrained Transformer encoder whose contextual representations offer a rather advanced approach to NLP tasks such as this one.
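The `bert-extractive-summarizer` package describes its approach as embedding the sentences, clustering the embeddings, and keeping the sentences closest to the cluster centroids. The sketch below mimics that pipeline with the standard library only, under loud assumptions: bag-of-words count vectors stand in for BERT embeddings, the k-means routine uses a simple deterministic initialisation, and the names `embed`, `kmeans`, and `cluster_summarize` are hypothetical, not part of any library.

```python
import math
import re


def embed(sentences):
    """Stand-in for BERT embeddings: bag-of-words count vectors (assumption)."""
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    vocab = sorted({w for words in tokens for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for words in tokens:
        vec = [0.0] * len(vocab)
        for w in words:
            vec[index[w]] += 1.0
        vectors.append(vec)
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; assumes 1 <= k <= len(vectors)."""
    # Deterministic initialisation: k evenly spaced vectors as centroids.
    step = len(vectors) / k
    centroids = [list(vectors[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for vec in vectors:
            nearest = min(range(k), key=lambda c: distance(vec, centroids[c]))
            clusters[nearest].append(vec)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def cluster_summarize(sentences, k=2):
    """Pick, for each centroid, the sentence whose vector lies closest to it."""
    vectors = embed(sentences)
    centroids = kmeans(vectors, k)
    chosen = {min(range(len(vectors)),
                  key=lambda i: distance(vectors[i], c))
              for c in centroids}
    return [sentences[i] for i in sorted(chosen)]
```

With real BERT embeddings the clusters group sentences by meaning rather than by shared words, but the selection step is the same: one representative sentence per cluster, returned in document order.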
## GPT2
GPT-2, with 1.5 billion parameters, is a large transformer-based language model trained to predict the next word. This ability can be used to summarize text.
## Note:
Run main.py from the "for_python3" folder when using Python 3; otherwise, test by running "summarize.py" or the notebook named "Text Summarizer Notebook.ipynb".
PS: pytldr and pyteaser do not work with Python 3.