https://github.com/lioccoumd/news-sentiment-analysis
This scrapes a variety of news stations websites for article titles and evaluates them using distilBERT pretrained model for sentiment analysis.
https://github.com/lioccoumd/news-sentiment-analysis
beautiful datasets distilbert news nltk numpy python pytorch reques scraping-websites sentiment-analysis transformers
Last synced: 6 months ago
JSON representation
This scrapes a variety of news stations websites for article titles and evaluates them using distilBERT pretrained model for sentiment analysis.
- Host: GitHub
- URL: https://github.com/lioccoumd/news-sentiment-analysis
- Owner: LIoccoUMD
- Created: 2025-02-28T00:19:24.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-03-09T20:19:52.000Z (8 months ago)
- Last Synced: 2025-03-09T21:23:01.862Z (8 months ago)
- Topics: beautiful, datasets, distilbert, news, nltk, numpy, python, pytorch, reques, scraping-websites, sentiment-analysis, transformers
- Language: Python
- Homepage:
- Size: 30.3 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# News Sentiment Analyzer
This Python project scrapes article titles from the news sitemaps of CNN and Fox News, performs sentiment analysis on the headlines, and provides a breakdown of positive and negative sentiments. Using web scraping and natural language processing (NLP) techniques, it offers insights into the emotional tone of recent news coverage from these two major outlets. Analysis is performed on current events so as the articles are uploaded the model adjusts.
## Features
- Scrapes article titles from XML sitemaps:
- CNN: `https://www.cnn.com/sitemap/news.xml`
- Fox News: `https://www.foxnews.com/sitemap.xml?type=news`
- BBC: `https://www.bbc.com/sitemaps/https-sitemap-com-news-2.xml`
- Reuters: `https://www.reuters.com/arc/outboundfeeds/news-sitemap/?outputType=xml`
- Performs sentiment analysis using the `distilbert-base-uncased-finetuned-sst-2-english` model from Hugging Face's Transformers library.
- Optimizes performance with batch processing and GPU support (CUDA) when available.
- Outputs the count of positive and negative headlines for each news source.## Technologies Used
- **Python Libraries**:
- `requests` for HTTP requests
- `BeautifulSoup` for XML parsing
- `transformers` for sentiment analysis
- `torch` for GPU acceleration
- `datasets` for efficient data handling
- `nltk` for tokenization
- `numpy` for numerical operations
- **NLP**: Pre-trained DistilBERT model for sentiment classification
- **Web Scraping**: XML parsing with BeautifulSoup and `lxml`## How It Works
1. Fetches XML sitemap data using `requests`.
2. Extracts article titles with `BeautifulSoup`.
3. Converts titles into a `Dataset` object for batch processing.
4. Runs sentiment analysis to classify titles as "POSITIVE" or "NEGATIVE" with confidence scores.
5. Summarizes results by counting positive and negative sentiments for each news source.## Installation
1. Clone the repository: ```git clone https://github.com/LIoccoUMD/news-sentiment-analyzer.git```
2. Navigate to the directory: ``` cd news-sentiment-analyzer ```
3. Install the required dependencies: ```pip install -r requirements.txt```## Example Output
```
CUDA availability: True
Device set to use cuda:0
Device set to use cuda:0
Device set to use cuda:0
Device set to use cuda:0
CNN Count of POSITIVE scores: 91
CNN Count of NEGATIVE scores: 185
FOX Count of POSITIVE scores: 99
FOX Count of NEGATIVE scores: 223
BBC Count of POSITIVE scores: 389
BBC Count of NEGATIVE scores: 608
Reuters Count of POSITIVE scores: 11
Reuters Count of NEGATIVE scores: 39
```