https://github.com/gabya06/content_summarization
AI-powered Streamlit app for summarizing Reddit posts and comparing the outputs of two transformer models: BART and T5-Large
ai google-cloud llm python streamlt text-summarizer
- Host: GitHub
- URL: https://github.com/gabya06/content_summarization
- Owner: Gabya06
- Created: 2025-06-10T02:28:17.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-07-01T22:05:46.000Z (9 months ago)
- Last Synced: 2025-07-01T23:20:23.941Z (9 months ago)
- Topics: ai, google-cloud, llm, python, streamlt, text-summarizer
- Language: Python
- Homepage: https://content-summarization-app-61465655650.us-east1.run.app
- Size: 646 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AI Transformer Model Comparison: Reddit Post Summarization
This is an **AI-powered Streamlit app** for summarizing Reddit posts and comparing the outputs of two transformer models: **BART** and **T5-Large**. It uses Google Gemini to generate a concise summary and highlight differences between the model-generated summaries.
* To learn more about the Hugging Face pipeline and the `BART` and `T5-Large` transformer models, see the walkthrough on [Medium.com](https://medium.com/@gabya06/t5-vs-bart-the-battle-of-the-summarization-models-c1e6d37e56ca).
* The data for this repo was collected with the Reddit API; you can read about that process [here](https://medium.com/@gabya06/automating-reddit-summaries-pulling-data-with-python-91afeb6acdb3).
---
## Features
- **Upload CSVs** with `BART` and `T5-Large` summaries for Reddit posts
- **AI-generated summary & comparison** using Google Gemini
- **Visualize sentiment** (with emoji) and number of comments (progress bar)
- **Interactive UI** built with Streamlit
- **Batch processing** of multiple posts
- **Docker-ready** for easy deployment
---
## Project Structure
```
content_summarization/
│
├── app.py # Streamlit app entry point
├── summarize.py # Summarization logic and ContentSummary class
├── requirements.txt # Python dependencies
├── Dockerfile # Docker build instructions
├── .env.example # Example environment variables
├── README.md # Project documentation
│
├── data/ # Sample data
│
└── tests/ # (TODO) Unit tests
```
---
## Setup
### 1. Clone the repository
```bash
git clone https://github.com/gabya06/content_summarization.git
cd content_summarization
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
### 3. Set up environment variables
Copy `.env.example` to `.env` and add your [Google Gemini API key](https://aistudio.google.com/app/apikey):
```
GEMINI_API_KEY=your-key-here
```
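Once the key is in `.env`, the app has to read it at startup. The sketch below shows one way to do that check with only the standard library. The helper name `get_gemini_api_key` is hypothetical and not taken from this repo, and the sketch assumes the variable has already been exported into the process environment (for example by Docker's `--env-file` flag or a loader such as `python-dotenv`).

```python
import os


def get_gemini_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    Hypothetical helper: the real app may load the key differently.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the placeholder from .env.example as "not configured".
    if not key or key == "your-key-here":
        raise RuntimeError(
            "GEMINI_API_KEY is not set; copy .env.example to .env and add your key."
        )
    return key
```

Failing early with a readable message tends to be easier to debug than an authentication error raised later by the API client.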
---
## Usage
### Run locally
```bash
streamlit run app.py
```
- Upload your BART and T5-Large CSV files in the sidebar.
- Select the number of posts to summarize.
- Click **Summarize and Compare** to view results.
### CSV Format
Your CSVs should have at least these columns:
- `title`
- `cleaned_text`
- `summary_bart` (for BART CSV)
- `summary_t5` (for T5-Large CSV)
- `sentiment`
- `num_comments`
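Before uploading, it can help to confirm that a file has the columns listed above. This is a minimal sketch using the standard-library `csv` module; the function name `missing_columns` and the `"bart"` / `"t5"` model keys are illustrative assumptions, not part of the app's code, and it assumes each CSV carries the shared columns plus its own summary column.

```python
import csv
import io

# Columns listed in the README; the summary column depends on the model.
COMMON_COLUMNS = ["title", "cleaned_text", "sentiment", "num_comments"]
SUMMARY_COLUMN = {"bart": "summary_bart", "t5": "summary_t5"}


def missing_columns(csv_file, model):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    required = COMMON_COLUMNS + [SUMMARY_COLUMN[model]]
    return [col for col in required if col not in header]


# Example with an in-memory CSV that has every column the BART file needs.
sample = io.StringIO(
    "title,cleaned_text,summary_bart,sentiment,num_comments\n"
    "Post A,some text,a short summary,positive,12\n"
)
print(missing_columns(sample, "bart"))  # []
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open("bart.csv", newline="")` (the filename here is a placeholder).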
---
## Docker Deployment
### Build and run with Docker
```bash
docker build -t content-summarization .
docker run -p 8501:8501 --env-file .env content-summarization
```
Then open [http://localhost:8501](http://localhost:8501) in your browser.
---
## Google Cloud
The app is also deployed on Google Cloud: [try the live version](https://content-summarization-app-61465655650.us-east1.run.app).
---
## Customization
- Edit `summarize.py` to change prompt logic or add more models.
- Tweak `app.py` for UI changes or new visualizations.
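As a starting point for changing the prompt logic, the sketch below shows how a comparison prompt could be assembled from one post's title and its two summaries. It is not the prompt used in `summarize.py`; the function name `build_comparison_prompt` and the wording of the prompt are assumptions made for illustration.

```python
def build_comparison_prompt(title, summary_bart, summary_t5):
    """Assemble a prompt asking an LLM to summarize and contrast two summaries.

    Hypothetical helper: the repo's actual prompt may be worded differently.
    """
    return (
        "You are comparing two machine-generated summaries of a Reddit post.\n"
        f"Post title: {title}\n\n"
        f"BART summary:\n{summary_bart}\n\n"
        f"T5-Large summary:\n{summary_t5}\n\n"
        "Write one concise summary of the post, then list the key "
        "differences between the two summaries."
    )
```

Keeping prompt construction in a small pure function like this makes it straightforward to unit test and to extend with a third model's summary later.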
---
## Acknowledgements
- [Google Gemini](https://aistudio.google.com/)
- [Streamlit](https://streamlit.io/)
- [Hugging Face Transformers](https://huggingface.co/transformers/)
---