https://github.com/razamehar/news-aggregator-and-categorizer-with-kafka-and-openai
- Host: GitHub
- URL: https://github.com/razamehar/news-aggregator-and-categorizer-with-kafka-and-openai
- Owner: razamehar
- License: mit
- Created: 2025-03-17T19:13:54.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-17T19:22:00.000Z (2 months ago)
- Last Synced: 2025-03-17T20:29:56.852Z (2 months ago)
- Topics: apache-kafka, langchain, large-language-models, openai, postgresql, python
- Language: PowerShell
- Homepage:
- Size: 3.59 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# News Aggregator and Categorizer with Kafka and OpenAI
This project is a news aggregation and categorization system that pulls news from multiple sources using Google RSS feeds. A producer sends the aggregated news to **Apache Kafka**. **OpenAI** is then used to categorize each news article into one of the following predefined topics:
- **General**
- **Politics**
- **Technology**
- **Sports**
- **Entertainment**

The categorized news articles are then consumed by a Kafka consumer and stored in a **PostgreSQL** database for easy retrieval and analysis.
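The producer side of this flow can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `parse_rss_items` and `publish_articles`, the `SAMPLE_RSS` feed, and the message layout are all assumptions made for the example. The Kafka client is passed in as a plain callable so the sketch runs without a broker.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

# Hypothetical sample feed used only for illustration; a real feed would be
# downloaded from an RSS URL.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example headline one</title>
      <link>https://example.com/one</link>
      <pubDate>Mon, 17 Mar 2025 19:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Example headline two</title>
      <link>https://example.com/two</link>
      <pubDate>Mon, 17 Mar 2025 19:05:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(rss_xml: str) -> list[dict]:
    """Extract title, link, and publication date from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "published": (item.findtext("pubDate") or "").strip(),
            }
        )
    return items


def publish_articles(articles: Iterable[dict], send: Callable[[bytes], None]) -> int:
    """Serialize each article as UTF-8 JSON and hand it to `send`.

    `send` stands in for the Kafka producer call (an assumption of this sketch).
    Returns the number of messages handed over.
    """
    count = 0
    for article in articles:
        send(json.dumps(article).encode("utf-8"))
        count += 1
    return count
```

With the `confluent-kafka` package, `send` could plausibly be `lambda value: producer.produce("news", value=value)` followed by `producer.flush()`, where the topic name `news` is a placeholder rather than the name used in this project.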
## Features
- **News Aggregation**: Pulls news from various sources using Google RSS feeds.
- **Real-time Categorization**: Categorizes news articles into topics using OpenAI.
- **Kafka Integration**: News articles are sent to Kafka for streaming and processing.
- **Database Storage**: Categorized articles are stored in a PostgreSQL database for efficient querying.
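The categorization step could look something like the sketch below. The prompt wording, the helper names (`build_prompt`, `normalize_category`, `categorize`), and the fallback to **General** for unrecognized replies are assumptions for illustration; the repository may handle this differently. The model call is injected as a callable (`ask_model`) so the logic can be exercised without an API key.

```python
from typing import Callable

# The five predefined topics listed in this README.
CATEGORIES = ("General", "Politics", "Technology", "Sports", "Entertainment")


def build_prompt(title: str) -> str:
    """Build a classification prompt for one headline (wording is hypothetical)."""
    options = ", ".join(CATEGORIES)
    return (
        "Classify the following news headline into exactly one of these "
        f"categories: {options}. Reply with the category name only.\n\n"
        f"Headline: {title}"
    )


def normalize_category(reply: str) -> str:
    """Map a raw model reply onto a known category.

    Surrounding whitespace, quotes, and periods are ignored and the match is
    case-insensitive. Unrecognized replies fall back to "General", which is an
    assumption of this sketch.
    """
    cleaned = reply.strip().strip(".\"'").casefold()
    for category in CATEGORIES:
        if cleaned == category.casefold():
            return category
    return "General"


def categorize(title: str, ask_model: Callable[[str], str]) -> str:
    """Ask the model for a category and normalize whatever it returns."""
    return normalize_category(ask_model(build_prompt(title)))
```

Normalizing the reply before storing it keeps the category column limited to the five known values even when the model answers with extra punctuation or an unexpected label.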
## Use Cases
- **Journalists**: Helps journalists to quickly access categorized news from various sources.
- **Researchers**: Enables researchers to analyze news trends and topics based on categorized data.
## Requirements
Before setting up the project, ensure you have the following installed:
- **PostgreSQL**: Database to store the categorized news articles.
- **Docker Desktop**: For running Kafka and other containerized services locally.
## Setup Instructions
1. **Create a Virtual Environment**
Start by creating a virtual environment for managing dependencies:
```bash
python -m venv venv
```
2. **Set Up Kafka**
Follow the instructions in the Confluent Kafka Setup Guide to install and configure Kafka on your machine.
3. **Install Required Dependencies**
After Kafka setup, install all required Python packages listed in `requirements.txt`:
```bash
pip install -r requirements.txt
```
## License
This project is licensed under the Raza Mehar License. For further details, refer to the LICENSE.md file.
## Contact
If you have any questions or need clarification, feel free to reach out to Raza Mehar at [email protected].