https://github.com/devsenweb/ai-news-aggregator-app
A Python-powered news aggregation system that collects articles from RSS feeds, classifies them into topics using semantic similarity, summarizes content, deduplicates similar articles, and stores everything in Firebase Firestore with timeline organization.
- Host: GitHub
- URL: https://github.com/devsenweb/ai-news-aggregator-app
- Owner: devsenweb
- Created: 2025-05-13T17:59:01.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-13T22:10:14.000Z (about 1 year ago)
- Last Synced: 2026-05-03T10:47:33.915Z (2 months ago)
- Language: Python
- Homepage:
- Size: 18.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AI News Aggregator
A Python-based news aggregation system that fetches, classifies, and organizes news articles into topic-based timelines, storing them in Firebase Firestore.
## Features
- Fetches news articles from multiple RSS feeds
- Classifies articles into topics using semantic similarity
- Removes duplicate or near-duplicate articles
- Generates concise summaries of articles
- Organizes articles into chronological timelines by topic
- Stores results in Firebase Firestore with a structured schema
- Command-line interface for easy execution
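The threshold-based topic classification listed above can be illustrated with a minimal sketch. It is not the project's implementation: the `embed` function below is a toy word-count vector standing in for a real sentence embedding, and the names `embed`, `cosine_similarity`, `classify`, and the topic descriptions are all hypothetical. Only the idea (pick the most similar topic, reject it if the score falls below the threshold) and the 0.75 default are taken from this README.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(title: str, topics: Dict[str, str], threshold: float = 0.75) -> Optional[str]:
    """Return the best-matching topic name, or None if no topic clears the threshold."""
    article_vec = embed(title)
    best_topic, best_score = None, 0.0
    for name, description in topics.items():
        score = cosine_similarity(article_vec, embed(description))
        if score > best_score:
            best_topic, best_score = name, score
    return best_topic if best_score >= threshold else None


# Hypothetical topic descriptions, for illustration only.
TOPICS = {"space": "nasa rocket launch", "finance": "stock market rally"}
```

With these toy vectors, `classify("NASA rocket launch delayed", TOPICS)` selects `"space"`, while an unrelated headline returns `None` and would be left unclassified. A real embedding model would replace `embed` without changing the thresholding logic.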
## Installation
1. Clone the repository:
```bash
git clone https://github.com/devsenweb/ai-news-aggregator-app.git
cd ai-news-aggregator-app
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install the dependencies:
```bash
pip install -r requirements.txt
```
4. Set up environment variables:
   - Copy `.env.example` to `.env`
   - Update the values in `.env` with your Firebase credentials and other settings
## Firebase Setup
1. Create a new Firebase project at [Firebase Console](https://console.firebase.google.com/)
2. Enable Firestore Database
3. Go to Project Settings > Service Accounts
4. Generate a new private key and save it as `firebase-credentials.json` in the project root
5. Update the `FIREBASE_CREDENTIALS_PATH` in `.env` to point to this file
6. Get your Firebase database URL from Project Settings > General > Your Apps > Firebase SDK snippet
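Once the service account key from the steps above is in place, grouping articles into timelines and writing them to Firestore might look like the sketch below. This is an assumption-laden illustration, not the code in `firebase_service.py`: the function names, the `timelines` collection, and the document fields are hypothetical, since this README does not specify the schema. The `firebase_admin` calls used (`credentials.Certificate`, `initialize_app`, `firestore.client`) are part of the Firebase Admin Python SDK.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List


def build_timelines(articles: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group articles by topic and sort each group oldest-first."""
    timelines = defaultdict(list)
    for article in articles:
        timelines[article["topic"]].append(article)
    for items in timelines.values():
        items.sort(key=lambda a: a["published"])
    return dict(timelines)


def upload_timelines(timelines: Dict[str, List[Dict[str, Any]]], credentials_path: str) -> None:
    """Write one document per topic (hypothetical schema; call once per process)."""
    # Imported lazily so the grouping logic works without firebase-admin installed.
    import firebase_admin
    from firebase_admin import credentials, firestore

    cred = credentials.Certificate(credentials_path)
    firebase_admin.initialize_app(cred)
    db = firestore.client()
    for topic, items in timelines.items():
        # Assumes topic names are valid document IDs (for example, no "/").
        db.collection("timelines").document(topic).set({"topic": topic, "articles": items})


# Hypothetical sample data, for illustration only.
sample = [
    {"topic": "space", "title": "Launch succeeds", "published": datetime(2025, 5, 13, 18, 0)},
    {"topic": "finance", "title": "Markets open higher", "published": datetime(2025, 5, 13, 9, 30)},
    {"topic": "space", "title": "Launch scheduled", "published": datetime(2025, 5, 12, 8, 0)},
]
timelines = build_timelines(sample)
```

Calling `upload_timelines(timelines, "firebase-credentials.json")` would then attempt the write. Keeping the grouping step separate from the upload makes it possible to inspect the result without touching Firebase.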
## Usage
### Basic Usage
```bash
python -m ai_news_aggregator.cli run \
  --rss-feeds "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml,http://feeds.bbci.co.uk/news/rss.xml" \
  --firebase-credentials path/to/your/firebase-credentials.json \
  --firebase-db-url "https://your-project-id.firebaseio.com"
```
### Command Line Options
```
Options:
  --rss-feeds TEXT               Comma-separated list of RSS feed URLs  [required]
  --firebase-credentials PATH    Path to Firebase credentials JSON file  [required]
  --firebase-db-url TEXT         Firebase database URL  [required]
  --max-articles INTEGER         Maximum number of articles to process  [default: 50]
  --similarity-threshold FLOAT   Similarity threshold for topic classification (0.0 to 1.0)  [default: 0.75]
  --dedupe-threshold FLOAT       Similarity threshold for deduplication (0.0 to 1.0)  [default: 0.85]
  --summary-length INTEGER       Maximum length of article summaries  [default: 150]
  --dry-run                      Process articles but do not upload to Firebase  [default: False]
  --help                         Show this message and exit.
```
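To give a feel for what `--dedupe-threshold` controls, here is a minimal sketch of threshold-based deduplication. It swaps in Jaccard similarity over title tokens as a simple stand-in for whatever similarity measure `deduplicator.py` actually uses, so the function names and the scoring are assumptions; only the 0.85 default comes from the options above.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(titles: List[str], threshold: float = 0.85) -> List[str]:
    """Keep the first title of each group whose similarity reaches the threshold."""
    kept: List[str] = []
    kept_tokens: List[Set[str]] = []
    for title in titles:
        current = tokens(title)
        if any(jaccard(current, seen) >= threshold for seen in kept_tokens):
            continue  # near-duplicate of an article already kept
        kept.append(title)
        kept_tokens.append(current)
    return kept


# Hypothetical headlines, for illustration only.
headlines = [
    "Fed raises interest rates again",
    "Fed raises interest rates again!",
    "Fed raises rates",
    "Mars rover finds water",
]
```

At the default threshold, only the second headline is dropped, because it differs from the first by punctuation alone. Lowering the threshold to 0.5 also drops "Fed raises rates", which shows the trade-off: a lower value removes more rewrites of the same story but risks merging distinct articles.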
### Dry Run
To test the aggregator without uploading to Firebase:
```bash
python -m ai_news_aggregator.cli run \
  --rss-feeds "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml" \
  --firebase-credentials path/to/your/firebase-credentials.json \
  --firebase-db-url "https://your-project-id.firebaseio.com" \
  --dry-run
```
## Project Structure
```
ai-news-aggregator/
├── ai_news_aggregator/
│   ├── __init__.py
│   ├── cli.py                # Command-line interface
│   ├── deduplicator.py       # Article deduplication logic
│   ├── firebase_service.py   # Firebase Firestore interactions
│   ├── news_fetcher.py       # RSS feed fetching and parsing
│   ├── summarizer.py         # Article summarization
│   └── topic_classifier.py   # Topic classification
├── tests/                    # Unit tests
├── .env.example              # Example environment variables
├── .gitignore
├── README.md
└── requirements.txt          # Python dependencies
```
## Configuration
Edit the `.env` file to configure the application:
- `FIREBASE_CREDENTIALS_PATH`: Path to your Firebase service account JSON file
- `FIREBASE_DATABASE_URL`: Your Firebase database URL
- `RSS_FEEDS`: Comma-separated list of RSS feed URLs
- `SIMILARITY_THRESHOLD`: Threshold for topic classification (0.0 to 1.0)
- `SUMMARY_MAX_LENGTH`: Maximum length of generated summaries
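The variables above could be read and validated along the following lines. This is a sketch under assumptions: `load_settings` is a hypothetical helper, the fallback values of 0.75 and 150 are borrowed from the CLI defaults rather than confirmed for `.env`, and the code reads from the process environment, so the `.env` file is assumed to have been loaded into it already by whatever mechanism the project uses.

```python
import os
from typing import Any, Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read the documented variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # RSS_FEEDS is comma-separated; ignore surrounding whitespace and empty entries.
    feeds = [url.strip() for url in env.get("RSS_FEEDS", "").split(",") if url.strip()]

    # Assumed fallback: the CLI default for --similarity-threshold.
    threshold = float(env.get("SIMILARITY_THRESHOLD", "0.75"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("SIMILARITY_THRESHOLD must be between 0.0 and 1.0")

    return {
        "firebase_credentials_path": env.get("FIREBASE_CREDENTIALS_PATH"),
        "firebase_database_url": env.get("FIREBASE_DATABASE_URL"),
        "rss_feeds": feeds,
        "similarity_threshold": threshold,
        # Assumed fallback: the CLI default for --summary-length.
        "summary_max_length": int(env.get("SUMMARY_MAX_LENGTH", "150")),
    }


# Hypothetical values, for illustration only.
settings = load_settings({
    "RSS_FEEDS": "https://a.example/rss, https://b.example/rss ,",
    "SIMILARITY_THRESHOLD": "0.8",
})
```

Passing a plain dictionary instead of relying on `os.environ` keeps the parsing easy to test, and validating the threshold range up front surfaces a bad value before any feeds are fetched.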
## License
MIT License - see the [LICENSE](LICENSE) file for details.