https://github.com/ta7ar/precis
Scrape news articles and summarize them using NLP
- Host: GitHub
- URL: https://github.com/ta7ar/precis
- Owner: Ta7ar
- Created: 2021-12-29T06:11:57.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-01-24T00:58:01.000Z (over 3 years ago)
- Last Synced: 2025-01-17T11:15:27.764Z (4 months ago)
- Topics: flask, javascript, latent-semantic-analysis, latent-semantic-indexing, mongodb, natural-language-processing, nltk, python, react, scikit-learn
- Language: Python
- Homepage: http://precis-news.herokuapp.com/
- Size: 1.03 MB
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 0
# Precis
Precis scrapes articles from news sites and summarizes them using Latent Semantic Analysis (LSA).
## Technical Stuff
* News articles are scraped with the Python library `BeautifulSoup4`.
* Text is preprocessed with Python `NLTK`: stop words and punctuation are removed, and the remaining words are stemmed.
* Latent Semantic Analysis/Indexing is performed by first computing a TF-IDF matrix from the corpus and then applying truncated SVD (Singular Value Decomposition) to that matrix.
  Both the TF-IDF and SVD computations use `Scikit-Learn`.
* The top sentences are selected with the Cross method, which ranks sentences by their combined weight across the concepts found by the SVD.
* The backend is a `Flask` server backed by `MongoDB`, and it serves a `React` frontend.
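A minimal sketch of the extraction step with `BeautifulSoup4`, assuming the headline is the first `<h1>` and the article body sits in `<p>` tags inside an `<article>` element. Precis's actual selectors are not documented, and real news sites usually need site-specific ones; `extract_article` is a hypothetical helper, and downloading the page (for example with `requests`) is left out.

```python
from bs4 import BeautifulSoup


def extract_article(html):
    """Return (title, body_text) from an article page.

    Assumption: the headline is the first <h1> and the body is the <p> tags
    inside <article>; real news sites need site-specific selectors.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else ""
    # Fall back to the whole page if there is no <article> element.
    container = soup.find("article") or soup
    paragraphs = [p.get_text(" ", strip=True) for p in container.find_all("p")]
    # Join non-empty paragraphs into one body string.
    return title, " ".join(p for p in paragraphs if p)
```

Restricting the search to `<article>` keeps navigation menus and footers, which often also use `<p>` tags, out of the text that gets summarized.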
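The preprocessing step might look roughly like this sketch. It uses NLTK's `RegexpTokenizer` (keeping alphabetic tokens only, which drops punctuation), NLTK's English stop-word list and `PorterStemmer`. The tokenizer and stemmer are assumptions, since the README does not name them, and `preprocess` is a hypothetical helper.

```python
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Alphabetic tokens only, so punctuation and digits are dropped
# (assumption: Precis's exact tokenizer is not documented).
_TOKENIZER = RegexpTokenizer(r"[A-Za-z]+")
_STEMMER = PorterStemmer()


def preprocess(sentence, stop_words=None):
    """Lowercase, tokenize, drop stop words and punctuation, then stem."""
    if stop_words is None:
        # NLTK's English list; needs a one-time nltk.download("stopwords").
        stop_words = set(stopwords.words("english"))
    tokens = _TOKENIZER.tokenize(sentence.lower())
    return [_STEMMER.stem(tok) for tok in tokens if tok not in stop_words]
```

For example, `preprocess("The cats are running!", stop_words={"the", "are"})` returns `['cat', 'run']`. Passing an explicit `stop_words` set avoids loading the NLTK corpus.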
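A sketch of the LSA step with scikit-learn, assuming each sentence of an article is treated as one document (so the TF-IDF matrix is sentences × terms) and that the sentences have already been cleaned. `concept_sentence_matrix` and the default of two concepts are illustrative choices, not taken from Precis.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def concept_sentence_matrix(sentences, n_concepts=2, random_state=0):
    """Return an (n_concepts x n_sentences) matrix of sentence loadings per concept.

    TF-IDF gives a sentences x terms matrix X. TruncatedSVD factors
    X ~ U_k S_k V_k^T; fit_transform returns U_k S_k, and dividing by the
    singular values recovers U_k, whose transpose plays the role of V^T in
    the terms x sentences LSA formulation.
    """
    tfidf = TfidfVectorizer().fit_transform(sentences)  # sparse, sentences x terms
    svd = TruncatedSVD(n_components=n_concepts, random_state=random_state)
    us = svd.fit_transform(tfidf)  # sentences x concepts, i.e. U_k * S_k
    u = us / svd.singular_values_  # rescale each concept column to unit length
    return u.T
```

Because scikit-learn puts sentences on the rows, the sentence side of the decomposition is `U` rather than the `V^T` of the terms × sentences convention common in the LSA summarization literature; transposing `U` gives the same concepts × sentences layout. Note that `TruncatedSVD` needs `n_components` to be smaller than the vocabulary size.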
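A sketch of one common formulation of the Cross method: for each concept (a row of a concepts × sentences matrix such as the LSA `V^T`), cells at or below the row average are set to zero, each sentence's remaining values are summed into a "length", and the longest sentences are chosen. Taking absolute values first (to sidestep the arbitrary sign of singular vectors) and returning indices in document order are assumptions of this sketch, and `cross_method` is a hypothetical name.

```python
import numpy as np


def cross_method(vt, n_sentences):
    """Pick sentence indices from a (concepts x sentences) matrix using the Cross method."""
    # Assumption: ignore the sign ambiguity of SVD by scoring magnitudes.
    scores = np.abs(np.asarray(vt, dtype=float))
    # For each concept (row), zero out cells at or below that row's average.
    row_means = scores.mean(axis=1, keepdims=True)
    scores = np.where(scores > row_means, scores, 0.0)
    # A sentence's "length" is its total remaining score across all concepts.
    lengths = scores.sum(axis=0)
    top = np.argsort(-lengths, kind="stable")[:n_sentences]
    # Return indices in document order so the summary reads naturally.
    return sorted(int(i) for i in top)
```

With `vt = [[0.9, 0.1, 0.2, 0.8], [0.1, 0.7, 0.6, 0.2]]`, the per-sentence lengths are 0.9, 0.7, 0.6 and 0.8, so `cross_method(vt, 2)` returns `[0, 3]`.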
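A minimal sketch of how a Flask endpoint could serve stored summaries to the React frontend. The route `/api/articles`, the `title`/`url`/`summary` fields and the `precis.articles` collection are hypothetical; `create_app` accepts any object with a pymongo-style `find(filter, projection)` so the app can be exercised without a running MongoDB.

```python
from flask import Flask, jsonify


def create_app(articles):
    """Build a Flask app that serves summaries from a MongoDB-like collection."""
    app = Flask(__name__)

    @app.route("/api/articles")
    def list_articles():
        # Exclude MongoDB's _id (an ObjectId), which jsonify cannot serialize.
        docs = articles.find({}, {"_id": 0, "title": 1, "url": 1, "summary": 1})
        return jsonify(list(docs))

    return app


if __name__ == "__main__":
    from pymongo import MongoClient

    # Hypothetical database and collection names.
    collection = MongoClient("mongodb://localhost:27017")["precis"]["articles"]
    create_app(collection).run(debug=True)
```

Injecting the collection rather than creating the client at import time keeps the route logic independent of the database connection, which makes it easy to test with a stand-in object.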