
# SUO-AWS: AWS News Subscription Platform

A serverless application that automatically scrapes AWS blog announcements, categorizes them using AI or URL-based logic, and emails subscribed users based on their selected AWS service categories.

## Architecture Diagram

![Architecture Diagram](https://github.com/user-attachments/assets/c8bb473d-1c67-44de-83f2-c3f7576a3ed5)

## 🏗️ Architecture Overview

SUO-AWS (Stay Updated On AWS) is built using a serverless architecture with the following AWS services:

- **AWS Lambda** - Three main functions: scraper, categorizer, and notifier.
- **Amazon DynamoDB** - Three tables for storing blog posts, user subscriptions, and categories
- **Amazon Bedrock** - Optional AI-powered categorization and summarization using the Nova Pro model
- **Amazon S3** - Batch inference input/output storage
- **Amazon SNS** - Error notifications
- **API Gateway** - Direct integration with DynamoDB for user subscription management
- **AWS Step Functions** - Orchestrates the scraper, categorizer, and notifier workflow
- **Amazon EventBridge Scheduler** - Recurring, cron-based schedule that triggers the Step Functions workflow every 24 hours (see the sketch after this list)
- **Amazon Q** - Documentation and scripting
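
For illustration, here is a minimal boto3 sketch of how such a daily schedule could be wired to the state machine. The schedule name, state machine ARN, and role ARN are placeholders, not values from this repo:

```python
import boto3

scheduler = boto3.client("scheduler")

# All names and ARNs below are illustrative placeholders.
scheduler.create_schedule(
    Name="suo-aws-daily",
    ScheduleExpression="rate(24 hours)",  # or a cron(...) expression
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:suo-aws",
        "RoleArn": "arn:aws:iam::123456789012:role/suo-aws-scheduler-role",
    },
)
```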

## Manual Testing
For the manual testing steps, see:
- **[./Manual_README.md](./Manual_README.md)** - Manual testing guide

## 📁 Project Structure

> **Note:** The project structure below combines all functions, code, and scripts in one folder for documentation purposes. The SAM template is used only to deploy the Notifier Lambda function.

```
suo-aws/
├── functions/
│   ├── scraper/                  # Web scraping AWS blogs
│   │   ├── app.py                # Main Lambda handler
│   │   ├── manual_extractor.py   # Playwright-based scraper
│   │   └── requirements.txt
│   ├── categorizer/              # AI/URL-based categorization
│   │   ├── app.py                # Main Lambda handler
│   │   ├── batch_inference.py    # Bedrock batch processing
│   │   ├── url_categorizer.py    # URL-based categorization
│   │   └── requirements.txt
│   ├── notifier/                 # Email notifications
│   │   ├── app.js                # Main Lambda handler (Node.js)
│   │   ├── mail_setup.js         # Email configuration
│   │   └── package.json
│   └── subscription/             # User subscription API
│       └── app.js
├── template.yaml                 # SAM template
├── samconfig.toml                # SAM configuration
└── README.md
```

## 🚀 Deployment

### Prerequisites

- AWS SAM CLI installed
- AWS CLI configured with appropriate credentials
- Python 3.10+ installed
- Node.js 18+ installed
- Docker

### Quick Start

1. **Clone the repository**
```bash
git clone https://github.com/iAmSherifCodes/suo-aws.git
cd suo-aws
```

2. **Install Playwright dependencies (for scraper)**
```bash
cd functions/scraper
pip install playwright
playwright install chromium
cd ../..
```
3. **Deploy the scraper function**
```bash
cd functions/scraper
chmod +x create_iam_role.sh
./create_iam_role.sh
chmod +x deploy.sh
./deploy.sh
```

4. **Build the SAM application**
```bash
sam build
```

5. **Deploy the application**
```bash
sam deploy --guided
```

6. **Deploy the subscription function directly from the Console (optional)**
```bash
cd functions/subscription
```

7. **Deploy the categorizer function directly from the Console**
```bash
cd functions/categorizer
```

### Environment Variables

The application uses the following environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| `POSTS_TABLE` | DynamoDB table for blog posts | - |
| `USERS_TABLE` | DynamoDB table for users | - |
| `CATEGORIES_TABLE` | DynamoDB table for categories | - |
| `GENAI_MODEL` | Enable AI categorization | `false` |
| `BEDROCK_MODEL_ID` | Bedrock model for AI | `amazon.nova-pro-v1:0` |
| `EMAIL_USER` | SMTP username | - |
| `EMAIL_PASS` | SMTP password | - |
| `FROM_EMAIL` | Sender email address | - |
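
As an illustration, a handler might read these settings roughly like this (a sketch; the repo's actual parsing may differ):

```python
import os

# Defaults mirror the table above; required values fail fast if unset.
POSTS_TABLE = os.environ["POSTS_TABLE"]
USERS_TABLE = os.environ["USERS_TABLE"]
GENAI_MODEL = os.environ.get("GENAI_MODEL", "false").lower() == "true"
BEDROCK_MODEL_ID = os.environ.get("BEDROCK_MODEL_ID", "amazon.nova-pro-v1:0")
```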

## 🔧 How It Works

### 1. **Web Scraping (Scraper Function)**
- Uses Playwright to scrape AWS blogs
- Extracts posts for a specific date (default: previous day)
- Stores raw blog post data in DynamoDB
- Handles pagination and dynamic content loading
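
A condensed Python sketch of that flow, assuming Playwright's sync API; the blog URL, CSS selectors, and error handling are placeholders, not necessarily what `manual_extractor.py` uses:

```python
import uuid
import boto3
from playwright.sync_api import sync_playwright

posts_table = boto3.resource("dynamodb").Table("suo-aws-posts")

def scrape_posts(target_date: str) -> None:
    """Scrape AWS blog announcements for one date and store them."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://aws.amazon.com/blogs/", wait_until="networkidle")
        # ".blog-post" and its child selectors are illustrative assumptions.
        for card in page.query_selector_all(".blog-post"):
            if card.query_selector(".date").inner_text() != target_date:
                continue
            posts_table.put_item(Item={
                "id": str(uuid.uuid4()),
                "title": card.query_selector(".title").inner_text(),
                "url": card.query_selector("a").get_attribute("href"),
                "date": target_date,
                "processed": False,
            })
        browser.close()
```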

### 2. **Categorization (Categorizer Function)**
- **URL-based**: Extracts category from blog URL path
- **AI-powered**: Uses Amazon Bedrock (Amazon Nova Pro model) for intelligent categorization. (IAM note: the account had to raise a support ticket to get `CreateModelInvocationJob` authorized.)
- Supports batch processing for multiple posts using batch inference
- Updates posts with category information
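
For the URL-based path, the category can be read straight from the first path segment after `/blogs/`. A minimal sketch (the actual `url_categorizer.py` may normalize differently):

```python
from urllib.parse import urlparse

def category_from_url(url: str) -> str:
    """Derive a category from an AWS blog URL.

    e.g. https://aws.amazon.com/blogs/compute/some-post/ -> "compute"
    """
    parts = urlparse(url).path.strip("/").split("/")
    # Expect paths like /blogs/<category>/<slug>/
    if len(parts) >= 2 and parts[0] == "blogs":
        return parts[1]
    return "uncategorized"  # assumed fallback label
```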

### 3. **Notification (Notifier Function)**
- Retrieves categorized posts for a date
- Matches posts with user subscriptions
- Sends personalized emails using SMTP
- Handles error notifications via SNS
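
The notifier itself is written in Node.js (`app.js`, `mail_setup.js`); the following Python sketch only illustrates the match-and-send step (the SMTP host and message format are assumptions):

```python
import smtplib
from email.message import EmailMessage

def notify(users: list[dict], posts: list[dict],
           smtp_user: str, smtp_pass: str) -> None:
    """Match categorized posts to each user's subscriptions and email them."""
    with smtplib.SMTP_SSL("smtp.example.com") as smtp:  # placeholder host
        smtp.login(smtp_user, smtp_pass)
        for user in users:
            matches = [p for p in posts if p["category"] in user["categories"]]
            if not matches:
                continue
            msg = EmailMessage()
            msg["Subject"] = "Your AWS updates"
            msg["From"] = smtp_user
            msg["To"] = user["email"]
            msg.set_content("\n".join(f"{p['title']}: {p['url']}" for p in matches))
            smtp.send_message(msg)
```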

### 4. **Subscription Management**
- Direct API Gateway to DynamoDB integration
- RESTful API for user subscription management
- Input validation and CORS support

## 📊 Database Schema

### Posts Table (`suo-aws-posts`)
```json
{
  "id": "string (UUID)",
  "title": "string",
  "url": "string",
  "author": "string",
  "date": "string (MM/DD/YYYY)",
  "description": "string",
  "category": "string",
  "summary": "string (optional)",
  "processed": "boolean"
}
```

### Users Table (`aws-suo-users`)
```json
{
  "id": "string (UUID)",
  "email": "string",
  "name": "string",
  "categories": ["string"],
  "active": "boolean",
  "created_at": "string"
}
```

### Categories Table (`suo-categories`)
```json
{
  "id": "string (UUID)",
  "date": "string (MM/DD/YYYY)",
  "categories": ["string"]
}
```

## 🔌 API Usage

### Subscribe to Categories
```bash
curl -X POST https://pn9va5qd7k.execute-api.us-east-1.amazonaws.com/prod/subscribe \
  -H "Content-Type: application/json" \
  -d '{
    "name": "John Doe",
    "email": "john@example.com",
    "categories": ["compute", "database", "machine-learning"]
  }'
```

### Manual Function Invocation

**Scraper Function:**
```bash
# --cli-binary-format is required for JSON payloads on AWS CLI v2
aws lambda invoke \
  --function-name suo-aws-scraper \
  --cli-binary-format raw-in-base64-out \
  --payload '{"target_date": "06/25/2025"}' \
  response.json
```

**Categorizer Function:**
```bash
aws lambda invoke \
  --function-name suo-aws-categorizer \
  --cli-binary-format raw-in-base64-out \
  --payload '{"target_date": "06/25/2025"}' \
  response.json
```

**Notifier Function:**
```bash
aws lambda invoke \
  --function-name suo-aws-notifier \
  --cli-binary-format raw-in-base64-out \
  --payload '{}' \
  response.json
```

## 🎯 AWS Service Categories

The system recognizes these AWS service categories:
- `architecture`, `mt`, `gametech`, `aws-insights`, `awsmarketplace`, `big-data`
- `compute`, `containers`, `database`, `desktop-and-application-streaming`, `developer`, `devops`, `mobile`
- `networking-and-content-delivery`, `opensource`, `machine-learning`, `media`, `quantum-computing`, `robotics`
- `awsforsap`, `security`, `spatial`, `startups`, `storage`, `supply-chain`, `training-and-certification`
- And many more ...

## 🔍 Monitoring & Troubleshooting

### CloudWatch Logs
- Each Lambda function creates its own log group
- Log levels can be controlled via `LOG_LEVEL` environment variable

### Error Handling
- SNS notifications for critical errors (see the sketch after this list)
- Comprehensive error logging
- Graceful degradation for non-critical failures
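
For example, a critical failure might be reported roughly like this (a sketch; the topic ARN is a placeholder):

```python
import boto3

sns = boto3.client("sns")

def report_error(message: str) -> None:
    """Publish a critical error to the alerting topic."""
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:suo-aws-errors",  # placeholder
        Subject="SUO-AWS error",
        Message=message,
    )
```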

### Performance Optimization
- DynamoDB on-demand billing
- Lambda memory optimization (128MB default)
- Batch processing for AI operations
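
The batch-processing item refers to Bedrock batch inference (`batch_inference.py`), which is the `CreateModelInvocationJob` call noted earlier. A minimal boto3 sketch of submitting such a job; the S3 URIs and role ARN are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock")

def start_batch_categorization(job_name: str) -> str:
    """Submit a Bedrock batch inference job over records staged in S3."""
    response = bedrock.create_model_invocation_job(
        jobName=job_name,
        roleArn="arn:aws:iam::123456789012:role/suo-aws-bedrock-batch",  # placeholder
        modelId="amazon.nova-pro-v1:0",
        inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://example-bucket/batch-input/"}},
        outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://example-bucket/batch-output/"}},
    )
    return response["jobArn"]
```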

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🔗 Repository

**Public Repository URL:** https://github.com/iAmSherifCodes/suo-aws.git

---

*Built with ❤️ using AWS Serverless technologies*