https://github.com/george-swift/expender-backend
Serverless AI-powered expense tracker with receipt scanning, multi-tier quotas, and real-time processing. Built on AWS
ai aws expense-tracker graphql ocr python serverless terraform
- Host: GitHub
- URL: https://github.com/george-swift/expender-backend
- Owner: george-swift
- License: mit
- Created: 2025-07-31T23:29:24.000Z (11 months ago)
- Default Branch: feature/mvp
- Last Pushed: 2025-08-12T22:54:58.000Z (10 months ago)
- Last Synced: 2025-09-04T19:44:56.079Z (10 months ago)
- Topics: ai, aws, expense-tracker, graphql, ocr, python, serverless, terraform
- Language: Python
- Homepage: https://expender.vercel.app
- Size: 412 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
# Expender - Serverless Expense Management Platform
A comprehensive serverless expense tracking and management platform built on AWS. Expender combines AI-powered receipt scanning (SmartScan) with intelligent expense categorization, multi-tier user quotas, and real-time processing capabilities.
## Features
### Core Functionality
- **📊 Expense Management**: Create, read, update, and delete individual expenses
- **🗂️ Batch Operations**: Process multiple expenses simultaneously for bulk import/export workflows
- **📈 Data Export**: Export expenses to CSV format with customizable field selection
- **🔍 Smart Filtering**: Query expenses by date range, category, and other criteria using DynamoDB global secondary indexes (GSIs)
### SmartScan Technology
- **🤖 AI-Powered Receipt Scanning**: Automated receipt processing using AWS Textract for data extraction
- **🧠 Intelligent Categorization**: OpenAI GPT-powered expense categorization based on merchant and line items
- **📷 Multi-Format Support**: Process images (JPEG, PNG) and PDF receipts up to 5MB
- **💱 Multi-Currency Detection**: Automatic currency recognition across 18+ global currencies
- **⚡ Real-Time Processing**: GraphQL subscriptions for live SmartScan result updates
### User Management & Security
- **🔐 Clerk Authentication**: Secure user authentication and authorization via Clerk
- **📋 Multi-Tier Quotas**: Free (30 scans/month) and Pro (unlimited) plan management
- **🚫 Rate Limiting**: API Gateway throttling with tier-specific limits for optimal performance
- **🗂️ Data Isolation**: Complete user data separation with secure deletion workflows
### Infrastructure & Monitoring
- **⚡ Serverless Architecture**: 100% serverless with auto-scaling Lambda functions
- **📊 CloudWatch Integration**: Comprehensive logging and monitoring with correlation
- **🔄 Event-Driven Design**: EventBridge, DynamoDB Streams, and S3 triggers for reactive processing
- **🌍 Global CDN**: CloudFront distribution for optimized file delivery
- **🛡️ Production-Ready**: Comprehensive error handling and middleware for reliability
## Architecture Overview

_Comprehensive AWS serverless architecture for Expender, showing the complete data flow from user authentication, expense management, receipt processing and real-time notifications through to account lifecycle management. Created with Amazon Q_
### Serverless Components
**API Layer:**
- **AWS API Gateway**: RESTful API with Lambda authorization and comprehensive rate limiting
- **AWS Lambda**: Event-driven functions for business logic processing
- **AWS Chalice**: Python framework for rapid serverless API development
**Data Storage:**
- **DynamoDB**: NoSQL database with streams for real-time event processing
  - `expenses`: Main expense records with date/category GSIs
  - `smartscans`: Temporary SmartScan results with TTL
  - `quotas`: User plan management and usage tracking
- **S3**: Secure file storage for receipt images and documents
- **CloudFront**: Global CDN for optimized file delivery
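
As a rough illustration of how a date-range filter could map onto a GSI on the `expenses` table, the sketch below builds the arguments for a DynamoDB `Query` call. The index name `user-date-index` and the attribute names `userId` and `date` are assumptions made for this example; the real names are defined in the Terraform configuration.

```python
from datetime import date


def build_date_range_query(user_id: str, start: date, end: date) -> dict:
    """Build keyword arguments for a DynamoDB Query against a date GSI.

    The index and attribute names below are hypothetical placeholders.
    """
    return {
        "TableName": "expenses",
        "IndexName": "user-date-index",  # hypothetical GSI name
        # "date" is a DynamoDB reserved word, so it is aliased as #d.
        "KeyConditionExpression": "userId = :uid AND #d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "date"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            # ISO 8601 dates sort lexicographically, so BETWEEN works on strings.
            ":start": {"S": start.isoformat()},
            ":end": {"S": end.isoformat()},
        },
    }


params = build_date_range_query("user_123", date(2025, 8, 1), date(2025, 8, 31))
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("dynamodb").query(**params)`.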
**AI & Processing:**
- **AWS Textract**: Automated document text and data extraction
- **OpenAI GPT**: Intelligent expense categorization with structured JSON responses
- **AppSync GraphQL**: Real-time subscriptions for SmartScan processing updates
**Event-Driven Architecture:**
- **EventBridge**: Custom event bus for user lifecycle management
- **DynamoDB Streams**: Real-time data change processing with filtering
- **Step Functions**: Orchestrated user data deletion workflows
- **S3 Event Notifications**: Automatic receipt processing triggers
**Monitoring & Security:**
- **AWS X-Ray**: Distributed tracing for performance monitoring
- **CloudWatch Logs**: Centralized logging with structured correlation
- **IAM**: Fine-grained permissions with least-privilege access
- **Clerk Webhooks**: Secure user lifecycle event handling
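
Clerk delivers webhooks through Svix, which signs each request with HMAC-SHA256. The sketch below shows what verifying that signature involves using only the standard library. It is an illustration under stated assumptions, not the project's actual handler: the function names are hypothetical, the timestamp freshness check is omitted for brevity, and a real handler may simply use the official `svix` package.

```python
import base64
import hashlib
import hmac


def sign_svix_payload(secret: str, msg_id: str, timestamp: str, body: bytes) -> str:
    """Return the base64 HMAC-SHA256 signature for a Svix-style webhook."""
    # Signing secrets look like "whsec_<base64 key>"; the key is the decoded suffix.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify_svix_signature(
    secret: str, msg_id: str, timestamp: str, body: bytes, signature_header: str
) -> bool:
    """Check the svix-signature header, which may hold several entries."""
    expected = sign_svix_payload(secret, msg_id, timestamp, body).encode()
    # The header is a space-separated list of "<version>,<signature>" entries.
    for entry in signature_header.split():
        version, _, candidate = entry.partition(",")
        if version == "v1" and hmac.compare_digest(candidate.encode(), expected):
            return True
    return False
```

The `msg_id`, `timestamp`, and `signature_header` arguments correspond to the `svix-id`, `svix-timestamp`, and `svix-signature` request headers, and `secret` to the `clerk_webhook_signing_secret` value.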
## Prerequisites
Before setting up Expender, ensure you have the following installed:
- **Python 3.12+**: Required for AWS Chalice and application code
- **AWS CLI**: Configured with appropriate IAM permissions
- **Terraform**: Version 1.0.0+ for infrastructure provisioning
- **AWS Chalice**: Python serverless framework (`pip install chalice`)
- **Node.js**: For integration with [the frontend](https://github.com/george-swift/expender) (optional)
### Required AWS Services Access
- Lambda, API Gateway, DynamoDB, S3, CloudFront
- Textract, EventBridge, Step Functions, AppSync
- IAM, CloudWatch, X-Ray
### External Service Requirements
- **Clerk Account**: For user authentication ([clerk.com](https://clerk.com/))
- **OpenAI API Key**: For AI-powered categorization ([openai.com](https://openai.com/))
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/george-swift/expender-backend.git
cd expender-backend
```
### 2. Set Up Python Environment
```bash
# Create virtual environment
python3 -m venv venv
# Activate virtual environment (macOS/Linux)
source venv/bin/activate
# On Windows (Command Prompt), run this instead:
# venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
```
### 3. Install Terraform
```bash
# macOS (using Homebrew)
brew install terraform
# Or download from: https://www.terraform.io/downloads
```
### 4. Configure AWS CLI
```bash
aws configure
# Provide your AWS Access Key ID, Secret Access Key, and default region
```
## Configuration
### Environment Variables
Create Terraform variable files for each environment:
#### `vars_dev.tfvars`
```hcl
# AWS Configuration
aws_region = "us-east-1"
environment = "dev"
# Frontend URLs
frontend_app_url = "https://dev.app.url.here"
frontend_dev_app_url = "http://localhost:PORT"
# External Service Keys (use AWS Secrets Manager in production)
clerk_secret_key = "your-clerk-secret-key"
clerk_webhook_signing_secret = "your-clerk-webhook-secret"
openai_api_key = "your-openai-api-key"
smartscan_encryption_key = "your-32-character-encryption-key"
```
#### `vars_prod.tfvars`
```hcl
# AWS Configuration
aws_region = "us-east-1"
environment = "prod"
# Frontend URLs
frontend_app_url = "https://prod.app.url.here"
frontend_dev_app_url = "https://dev.app.url.here"
# External Service Keys (AWS Secrets Manager preferably)
clerk_secret_key = "your-production-clerk-secret-key"
clerk_webhook_signing_secret = "your-production-clerk-webhook-secret"
openai_api_key = "your-production-openai-api-key"
smartscan_encryption_key = "your-production-32-character-encryption-key"
```
### Chalice Configuration
The application uses Chalice's automatic configuration. See full settings in `.chalice/config.json`:
```json
{
  "version": "2.0",
  "app_name": "expender",
  "manage_iam_role": false,
  "iam_role_arn": "${aws_iam_role.api_service_role.arn}",
  "automatic_layer": true,
  "xray": true,
  "stages": {
    "dev": {},
    "prod": {}
  }
}
```
### Security Configuration
**Environment Variables (Never commit to Git):**
- Store sensitive values in `*.tfvars` files (excluded by `.gitignore`)
- Use AWS Secrets Manager for production environments (PREFERRED)
- Rotate keys regularly and follow security best practices
## Deployment
### Development Environment
Deploy to development using the provided Makefile:
```bash
# Deploy to development environment
make deploy-dev
```
This command:
1. Formats Python code with Black
2. Packages the Chalice application for Terraform
3. Applies infrastructure patches for multi-environment compatibility
4. Deploys infrastructure using Terraform
### Production Environment
```bash
# Deploy to production environment
make deploy-prod
```
### Manual Deployment Steps
If you prefer manual deployment:
```bash
# 1. Format code
black .
# 2. Package Chalice application
chalice package --pkg-format terraform . --stage dev
# 3. Patch Terraform configuration
python3 scripts/patch_chalice_tf.py
# 4. Initialize Terraform (first time only)
terraform init
# 5. Plan deployment
terraform plan -var-file=vars_dev.tfvars
# 6. Apply infrastructure
terraform apply -var-file=vars_dev.tfvars
```
### Clean Up Resources
```bash
# Remove generated files
make clean
# Destroy infrastructure (be careful!)
terraform destroy -var-file=vars_dev.tfvars
```
## Usage
### API Endpoints
The API is documented in the Swagger specification (`expender-dev-swagger-postman.json`). Key endpoints include:
#### Expense Management
```bash
# List user expenses
GET /expenses
Authorization: {clerk-jwt-token}

# Create single expense
POST /expenses
Authorization: {clerk-jwt-token}
Content-Type: application/json
{
  "amount": 25.99,
  "merchant": "Coffee Shop",
  "category": "Meals and Entertainment",
  "date": "2025-08-07",
  "currency": "USD",
  "description": "Team meeting coffee"
}

# Batch create expenses
POST /expenses/batch
Authorization: {clerk-jwt-token}
Content-Type: application/json
{
  "expenses": [
    {
      "amount": 15.50,
      "currency": "USD",
      "merchant": "Amazing Lunch Place",
      "category": "Meals and Entertainment"
    },
    {
      "amount": 8.99,
      "currency": "USD",
      "merchant": "Awesome Office Supplies",
      "category": "Office Supplies"
    }
  ]
}

# Update expense
PUT /expenses/{expense_id}
Authorization: {clerk-jwt-token}
Content-Type: application/json
{
  "amount": 27.99,
  "category": "Professional Services"
}

# Delete expense
DELETE /expenses/{expense_id}
Authorization: {clerk-jwt-token}

# Export expenses to CSV
POST /expenses/export
Authorization: {clerk-jwt-token}
Content-Type: application/json
{
  "format": "csv",
  "attributes": ["date", "merchant", "amount", "category"]
}
```
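
For clients written in Python, the create-expense call above could be assembled with the standard library as in the following sketch. `API_BASE_URL` is a placeholder, not the real endpoint; substitute the URL produced by your own deployment.

```python
import json
import urllib.request

API_BASE_URL = "https://api.example.com"  # hypothetical; use your deployed API URL


def build_create_expense_request(token: str, expense: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /expenses request."""
    return urllib.request.Request(
        url=f"{API_BASE_URL}/expenses",
        data=json.dumps(expense).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_expense_request(
    "clerk-jwt-token",
    {
        "amount": 25.99,
        "merchant": "Coffee Shop",
        "category": "Meals and Entertainment",
        "date": "2025-08-07",
        "currency": "USD",
    },
)
```

Passing `request` to `urllib.request.urlopen` sends it; the other endpoints follow the same pattern with a different path, method, and body.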
#### SmartScan Operations
```bash
# Initiate SmartScan (get upload URL)
POST /smartscans
Authorization: {clerk-jwt-token}
Content-Type: application/json
{
  "contentType": "image/jpeg"
}

# Response includes presigned S3 upload URL and fields
# Upload receipt image using the provided URL and fields
# Results will be available via GraphQL subscription
```
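
The upload step is a standard S3 presigned POST: the returned fields are sent as multipart form fields, with the file itself as the last part. The sketch below encodes such a body using only the standard library. The function name and the example field values are illustrative assumptions; the exact shape of the `/smartscans` response should be taken from the Swagger specification.

```python
import uuid


def encode_presigned_post(
    fields: dict[str, str], filename: str, content: bytes, content_type: str
) -> tuple[bytes, str]:
    """Encode a multipart/form-data body for an S3 presigned POST.

    Returns the body and the Content-Type header value to send with it.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # The presigned fields go first, exactly as returned by the API.
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # S3 requires the file to be the last field in the form.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type_header = encode_presigned_post(
    {"key": "receipts/example.jpg", "policy": "example-policy"},  # illustrative values
    "receipt.jpg",
    b"\xff\xd8\xff\xe0 fake jpeg bytes",
    "image/jpeg",
)
```

The resulting body would be sent with an HTTP POST to the presigned URL, using `content_type_header` as the `Content-Type` header.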
#### User Quotas
```bash
# Check user quota and usage
GET /quotas
Authorization: {clerk-jwt-token}
```
### GraphQL API (SmartScan Results)
```graphql
# Subscribe to SmartScan results
subscription GetSmartScanResult($userId: String!, $scanId: String!) {
  getSmartScanResult(userId: $userId, scanId: $scanId) {
    userId
    scanId
    result {
      merchant
      amount
      currency
      date
      category
      confidence
      lineItems {
        description
        amount
        quantity
      }
    }
    objectKey
    createdAt
  }
}
```
### Rate Limits
The API implements tiered rate limiting:
**Free Tier Users:**
- 1,000 requests/day
- 10 requests/second (burst: 20)
- 30 SmartScans/month
**Pro Tier Users:**
- 10,000 requests/day
- 50 requests/second (burst: 100)
- Unlimited SmartScans
**Endpoint-Specific Limits:**
- SmartScan: 5-10 requests/second (resource intensive)
- Export: 2-5 requests/second (memory intensive)
- Webhooks: 50-100 requests/second (external callbacks)
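
A client can mirror the per-tier figures above to warn users before they hit a limit. The sketch below is a client-side illustration only; the `TierLimits` and `scans_remaining` names are hypothetical, and the authoritative usage numbers come from `GET /quotas`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    requests_per_day: int
    requests_per_second: int
    burst: int
    scans_per_month: Optional[int]  # None means unlimited


# Values copied from the tier lists above.
TIERS = {
    "free": TierLimits(1_000, 10, 20, 30),
    "pro": TierLimits(10_000, 50, 100, None),
}


def scans_remaining(tier: str, scans_used: int) -> Optional[int]:
    """Return SmartScans left this month, or None when the tier is unlimited."""
    limit = TIERS[tier].scans_per_month
    if limit is None:
        return None
    return max(limit - scans_used, 0)
```

For example, a Free tier user who has used 12 scans has `scans_remaining("free", 12)` left, which evaluates to 18.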
## Troubleshooting
### Common Issues
#### 1. Deployment Failures
**CloudWatch Logs Role Error:**
```
Error: BadRequestException: CloudWatch Logs role ARN must be set
```
**Solution:** The deployment automatically creates the required CloudWatch role. Ensure your AWS account has API Gateway permissions.
#### 2. Authentication Issues
**Unauthorized API Calls:**
```json
{
"message": "Unauthorized"
}
```
**Solution:** Ensure you're passing a valid Clerk JWT token in the Authorization header.
#### 3. SmartScan Processing
**Textract Analysis Failures:**
- Ensure uploaded files are under 5MB
- Supported formats: JPEG, PNG, PDF
- Check file is properly uploaded to S3 before processing
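
Checking a file on the client before requesting an upload URL avoids most of these failures. The sketch below is a minimal pre-flight check. It assumes "5MB" means 5 × 1024 × 1024 bytes and that the API expects standard MIME types; the function name is hypothetical.

```python
MAX_RECEIPT_BYTES = 5 * 1024 * 1024  # assumption: "5MB" is read as 5 MiB
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "application/pdf"}


def receipt_upload_problems(content_type: str, size_bytes: int) -> list[str]:
    """Return a list of reasons the file would be rejected (empty if none)."""
    problems = []
    if content_type not in ALLOWED_CONTENT_TYPES:
        problems.append(f"unsupported content type: {content_type}")
    if size_bytes > MAX_RECEIPT_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; the limit is {MAX_RECEIPT_BYTES}"
        )
    return problems
```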
**AI Categorization Issues:**
- Verify OpenAI API key is valid and has sufficient credits
- Check CloudWatch logs for detailed error messages
#### 4. Rate Limiting
**Too Many Requests (429):**
- Check your user tier and current usage
- Implement exponential backoff in client applications
- Consider upgrading to Pro tier for higher limits
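
One way to implement the suggested exponential backoff is sketched below, using capped delays with full jitter. `RateLimited` is a hypothetical exception that your own request function would raise on an HTTP 429 response, and the default delays are illustrative rather than tuned to this API.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on HTTP 429 (hypothetical)."""


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it raises RateLimited."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2**attempt)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter exists so that tests can substitute a fake and run without actually waiting.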
### Debug Tools
**CloudWatch Logs:**
```bash
# View API logs
aws logs tail /aws/lambda/expender-dev --follow
# View specific function logs
aws logs tail /aws/lambda/expender-dev-bucket_event_handler --follow
```
**X-Ray Tracing:**
- Visit AWS X-Ray console for distributed trace analysis
- Track request flows across Lambda functions and AWS services
**DynamoDB Monitoring:**
- Monitor table metrics in CloudWatch
- Check stream processing in DynamoDB console
## Contributing
Contributions to Expender are welcome! Please see the [Contributing Guidelines](./CONTRIBUTING.md) for details on:
- Code of conduct
- Development workflow
- Pull request process
- Coding standards and best practices
- Testing requirements
## License
This project is licensed under the terms specified in the [LICENSE.md](./LICENSE.md) file. Please review the license before using or contributing to this project.
## Contact Information & Support
### Getting Help
**🐛 Bug Reports & Feature Requests:**
- [Open an issue](https://github.com/george-swift/expender-backend/issues) on GitHub
- Include detailed reproduction steps and environment information as outlined in [Contributing Guidelines](./CONTRIBUTING.md)
**📧 Direct Support:**
- Email: [support@expender.app](mailto:support@expender.app)
- Response time: 24-48 hours for general inquiries
**📚 Documentation:**
- API Documentation: Import `expender-swagger-postman.json` in Postman to see the collection.
- Architecture Details: See inline code documentation and comments
- Infrastructure: Comprehensive Terraform module documentation