https://github.com/flawed-hooman/text-summarisation

Summarise text using the various libraries available for Python: pyteaser, sumy, gensim, pytldr, XLNET, BERT, and GPT2.

bert gensim gpt2 library natural-language-processing nlp pyteaser python pytldr sumy text-summarization xlnet


Host: GitHub
URL: https://github.com/flawed-hooman/text-summarisation
Owner: flawed-hooman
License: apache-2.0
Created: 2024-07-20T11:56:38.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-07-20T12:32:13.000Z (about 2 years ago)
Last Synced: 2024-07-20T13:23:44.904Z (about 2 years ago)
Topics: bert, gensim, gpt2, library, natural-language-processing, nlp, pyteaser, python, pytldr, sumy, text-summarization, xlnet
Language: Python
Homepage:
Size: 71.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Extractive text summarization
Various ways to summarise text using the libraries available for Python:
1. pyteaser
2. sumy
3. gensim
4. pytldr
5. XLNET
6. BERT
7. GPT2

## INSTALLATION
pip install sumy

pip install gensim

pip install pyteaser

pip install pytldr

pip install bert-extractive-summarizer

pip install spacy==2.0.12

pip install transformers==2.2.0

## Pyteaser
PyTeaser has two functions:

Summarize: takes a title and a text and summarizes the text

SummarizeUrl: takes a URL and summarizes the content found at that URL

## Sumy
Sumy provides several preprocessing utilities and summarizers:

sumytoken: tokenizes the text

get_stop_words: supplies the stop words to remove from the text

stemmer: stems the words

LexRankSummarizer: summarizes based on lexical ranking

LsaSummarizer: summarizes based on latent semantic analysis (LSA)

LuhnSummarizer: summarizes based on Luhn's algorithm
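
To make the Luhn approach concrete, here is a minimal, dependency-free sketch of Luhn-style sentence scoring. It is not sumy's implementation: the tiny stop-word list, the naive sentence splitter, the `min_freq` threshold, the `max_gap` of 4, and all function names are assumptions made for illustration, and stemming is left out for brevity.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real libraries ship full lists).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def luhn_score(words, significant, max_gap=4):
    """Score of the best cluster: (significant words in cluster)^2 / cluster length.

    A cluster is a run of words that starts and ends with a significant word
    and never has more than `max_gap` insignificant words in a row.
    """
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 > max_gap:
            # Gap too wide: close the current cluster and start a new one.
            best = max(best, count * count / (prev - start + 1))
            start, count = pos, 0
        count += 1
        prev = pos
    return max(best, count * count / (prev - start + 1))


def luhn_summarize(text, num_sentences=2, min_freq=2):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    freq = Counter(w for words in tokenized for w in words if w not in STOP_WORDS)
    # Words that occur at least `min_freq` times count as significant.
    significant = {w for w, c in freq.items() if c >= min_freq}
    scores = [luhn_score(words, significant) for words in tokenized]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:num_sentences])]
```

Calling `luhn_summarize(text, num_sentences=2)` returns the two sentences whose densest cluster of frequent words scores highest, which mirrors the preprocessing-then-ranking flow described above.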

## Gensim
gensim provides a `summarize` function (in its `gensim.summarization` module) that can be imported and used directly.

## pytldr
pytldr is similar to sumy: it also bundles NLP utilities such as a tokenizer alongside its summarizers.

Here we have used TextRankSummarizer, RelevanceSummarizer, and LsaSummarizer from pytldr.
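
As an illustration of the idea behind a TextRank-style summarizer, the following is a minimal, dependency-free sketch: sentences become graph nodes, word overlap gives the edge weights, and a PageRank-style iteration ranks the sentences. It is not pytldr's implementation; the function names, the naive tokenizer, the damping factor of 0.85, and the fixed iteration count are assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def similarity(a, b):
    """Shared distinct words divided by log|a| + log|b| (0.0 if undefined)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank_scores(tokenized, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(tokenized)
    weights = [
        [similarity(tokenized[i], tokenized[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # proportional to the weight of the connecting edge.
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summarize(text, num_sentences=2):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = textrank_scores([tokenize(s) for s in sentences])
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:num_sentences])]
```

A sentence that shares no words with the rest of the text has no edges, so it only keeps the baseline score of `1 - damping` and is unlikely to be selected.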

## XLNET
XLNet is an autoregressive language model that outputs the joint probability of a sequence of tokens. It is based on the Transformer architecture with recurrence.

## BERT
Extractive text summarization means selecting the most relevant passages from a large document while retaining its most important information. BERT (Bidirectional Encoder Representations from Transformers) offers a rather advanced approach to NLP tasks such as this one.

## GPT2
GPT-2, with 1.5 billion parameters, is a large transformer-based language model trained to predict the next word. We can use this ability to summarize text.

## Note:
When using Python 3, run main.py from the "for_python3" folder. Otherwise, test by running "summarize.py" or the notebook named "Text Summarizer Notebook.ipynb".

PS: pytldr and pyteaser do not work with Python 3.
