https://github.com/flawed-hooman/text-summarisation

Summarise text using the various libraries available for Python: pyteaser, sumy, gensim, pytldr, XLNET, BERT, and GPT2.

bert gensim gpt2 library natural-language-processing nlp pyteaser python pytldr sumy text-summarization xlnet


# Extractive text summarization
Various ways to summarise text using libraries available for Python:
1. pyteaser
2. sumy
3. gensim
4. pytldr
5. XLNET
6. BERT
7. GPT2

## INSTALLATION
pip install sumy

pip install gensim

pip install pyteaser

pip install pytldr

pip install bert-extractive-summarizer

pip install spacy==2.0.12

pip install transformers==2.2.0



## Pyteaser
PyTeaser exposes two functions:

Summarize: takes a title and a text and returns a summary of the text

SummarizeUrl: takes a URL and summarizes the content of the page at that URL
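One plausible reason a summarizer takes a title along with the text is that sentences sharing words with the title are more likely to be on topic. The sketch below illustrates only that idea in plain Python 3. It is not PyTeaser's code, and every name in it (`split_sentences`, `title_score`, `summarize_by_title`) is hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def title_score(title, sentence):
    """Fraction of distinct title words that also occur in the sentence."""
    title_words = set(words(title))
    if not title_words:
        return 0.0
    return len(title_words & set(words(sentence))) / len(title_words)


def summarize_by_title(title, text, count=1):
    """Return the `count` sentences with the highest title overlap, in document order."""
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: title_score(title, sentences[i]),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:count])]


if __name__ == "__main__":
    text = (
        "The weather was mild. "
        "Solar power saw record growth this year. "
        "Many people went hiking."
    )
    print(summarize_by_title("Solar power growth", text))
```

A real summarizer would combine this signal with others, such as sentence position or keyword frequency, rather than rely on the title alone.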


## Sumy
Sumy provides preprocessing utilities and several summarizers:

sumytoken: tokenizes the text

get_stop_words: supplies the stop-word list used to filter stop words out of the text

stemmer: stems the words, reducing them to their root form

LexRankSummarizer: ranks sentences with LexRank, a graph-based centrality measure

LsaSummarizer: summarizes using latent semantic analysis (LSA)

LuhnSummarizer: summarizes using Luhn's algorithm
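To show how tokenizing, stop-word filtering, and a scoring rule fit together, here is a minimal pure-Python sketch of a Luhn-style scorer. It is a simplification, not sumy's implementation: the stop-word list is a tiny illustrative one, stemming is omitted, and each sentence is treated as a single cluster of significant words. All function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not sumy's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "was", "are"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def significant_words(sentences, min_count=2):
    """Non-stop words that occur at least `min_count` times in the document."""
    counts = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOP_WORDS
    )
    return {w for w, c in counts.items() if c >= min_count}


def luhn_score(sentence, significant):
    """(number of significant words)^2 / length of the span that contains them."""
    tokens = tokenize(sentence)
    positions = [i for i, w in enumerate(tokens) if w in significant]
    if not positions:
        return 0.0
    span = positions[-1] - positions[0] + 1
    return len(positions) ** 2 / span


def luhn_summarize(text, count=1):
    """Return the `count` best-scoring sentences, in document order."""
    sentences = split_sentences(text)
    significant = significant_words(sentences)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: luhn_score(sentences[i], significant),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:count])]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats and dogs are popular pets. Dogs need walks. Birds sing."
    print(luhn_summarize(sample, count=1))
```

Luhn's original method also splits a sentence into several clusters when significant words are far apart; the single-span score above skips that step for brevity.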

## Gensim
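gensim provides a summarize function in its gensim.summarization module, which can be imported and called directly on the text. The module was removed in gensim 4.0, so an earlier release is required.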
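gensim documents its summarizer as a variation of TextRank, which ranks sentences by running a PageRank-style iteration over a sentence-similarity graph. The following pure-Python sketch shows that idea under simplifying assumptions: it uses the word-overlap similarity from the TextRank paper and a naive sentence splitter, whereas gensim's own similarity function and preprocessing differ. The function names are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def similarity(a, b):
    """Shared distinct words, normalised by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denominator = math.log(len(a)) + math.log(len(b))
    if denominator <= 0:
        return 0.0
    return len(set(a) & set(b)) / denominator


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank iteration over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summarize(text, count=2):
    """Return the `count` highest-ranked sentences, in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]


if __name__ == "__main__":
    sample = (
        "The river floods the valley every spring. "
        "Farmers in the valley fear the river floods. "
        "Spring brings floods to farmers near the river. "
        "My cat likes fish."
    )
    print(textrank_summarize(sample, count=2))
```

A sentence that shares no words with the rest of the document receives only the baseline score `1 - damping`, so it is the last to be selected.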

## pytldr
pytldr is similar to sumy: it bundles NLP utilities such as a tokenizer alongside its summarizers.

This project uses TextRankSummarizer, RelevanceSummarizer, and LsaSummarizer from pytldr.
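As an illustration of relevance-based selection, the sketch below repeatedly picks the sentence most similar (by cosine similarity) to the whole document, then removes that sentence's terms from the document vector so the next pick covers new material. This assumes the relevance summarizer follows the classic relevance-measure approach; it is not pytldr's code, it uses raw term counts, and all names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_summarize(text, count=1):
    """Greedily select sentences by relevance to the not-yet-covered document."""
    sentences = split_sentences(text)
    vectors = [Counter(tokenize(s)) for s in sentences]
    document = Counter()
    for vector in vectors:
        document.update(vector)

    remaining = set(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < count:
        best = max(sorted(remaining), key=lambda i: cosine(vectors[i], document))
        chosen.append(best)
        remaining.discard(best)
        # Drop the covered terms so later picks favour new information.
        for term in vectors[best]:
            document.pop(term, None)
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sample = (
        "Apples are red. Apples and pears are fruit. "
        "Pears are green. Fruit is healthy food."
    )
    print(relevance_summarize(sample, count=2))
```

Removing covered terms is what keeps the second pick from repeating the first: in the sample, the sentences about apples and pears lose most of their score once the first summary sentence has been chosen.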

## XLNET
XLNet is an autoregressive language model that models the joint probability of a sequence of tokens. It builds on the Transformer-XL architecture, which adds recurrence to the transformer.
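To make "joint probability of a sequence" concrete: an autoregressive model factorizes it with the chain rule, predicting each token from the tokens before it. The notation here (tokens $x_1, \dots, x_T$ and model parameters $\theta$) is introduced only for illustration.

$$
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \dots, x_{t-1})
$$

XLNet's training objective applies this factorization over permuted orderings of the sequence rather than only left to right.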

## BERT
Extractive text summarization selects the most relevant sentences from a large document so that the summary retains the most important information. BERT (Bidirectional Encoder Representations from Transformers) is a pretrained transformer encoder whose sentence representations can be used to choose those sentences.

## GPT2
GPT-2 is a large transformer-based language model; its largest version has 1.5 billion parameters. It is trained to predict the next word, and this ability can be used to summarize text.

## Note:
Run main.py from the "for_python3" folder when using Python 3; otherwise test by running "summarize.py" or the notebook named "Text Summarizer Notebook.ipynb".

PS: pytldr and pyteaser do not work on Python 3.