https://github.com/novama/web-automator-js
Selenium and Playwright web-automation example in JavaScript, with a demo Docker container setup for AWS Lambda
- Host: GitHub
- URL: https://github.com/novama/web-automator-js
- Owner: novama
- License: mit
- Created: 2025-11-04T02:11:41.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-14T03:19:06.000Z (7 months ago)
- Last Synced: 2025-11-14T05:29:17.391Z (7 months ago)
- Topics: aws-lambda, docker, javascript, playwright, playwright-automation, playwright-javascript, selenium, selenium-webdriver
- Language: JavaScript
- Homepage:
- Size: 79.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Web Automator JS
A web automation framework supporting both **Selenium** and **Playwright**, with containerized **AWS Lambda** deployment capabilities. Perfect for web scraping, E2E testing, and browser automation at scale.
## Key Features
- **Dual Framework Support** - Choose between Selenium and Playwright
- **Docker Containerization** - Production-ready Lambda containers
- **Smart Environment Detection** - Automatic browser configuration
- **Health Monitoring** - Container health checks and auto-restart
- **Cross-Platform** - Works on Windows, macOS, and Linux
- **Serverless Ready** - Optimized for AWS Lambda deployment
- **Production Stable** - Comprehensive error handling and logging
## Architecture
```txt
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Lambda Event   │───▶│ Handler (index)  │───▶│ Browser Engine  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                        │
                                ▼                        ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │   Environment    │    │   Selenium /    │
                       │    Detection     │    │   Playwright    │
                       └──────────────────┘    └─────────────────┘
```
## Quick Start
### Prerequisites
- **Node.js** 18+ (22.x recommended)
- **Docker** (for containerized deployment)
- **Git** (for cloning the repository)
### Installation
```bash
# Clone the repository
git clone https://github.com/novama/web-automator-js.git
cd web-automator-js
# Install dependencies
npm install
# Setup automation drivers (choose one or both)
npm run setup:playwright # Install Playwright browsers
npm run setup:selenium # Install Selenium drivers
npm run setup:all # Install everything
```
### Basic Usage
#### Local Testing
```bash
# Test Playwright automation
npm run playwright
# Test Selenium automation
npm run selenium
# Test Lambda handler locally
npm run lambda
```
#### Docker Development
```bash
# Build and start containerized Lambda
npm run docker:build && npm run docker:start
# Test containerized Lambda
npm run docker:test
# Monitor container health
npm run docker:health
```
## Documentation
| Document | Description |
|----------|-------------|
| [Docker Lambda Setup](docs/DOCKER-LAMBDA-SETUP.md) | Complete containerization and deployment guide |
| [AWS Deployment](docs/AWS-DEPLOYMENT.md) | AWS Lambda deployment instructions |
| [Framework Comparison](docs/SELENIUM-VS-PLAYWRIGHT.md) | Selenium vs Playwright detailed comparison |
| [Container Status](docs/CONTAINER-STATUS.md) | Health monitoring and troubleshooting |
| [Lambda Handler](docs/LAMBDA-HANDLER.md) | Lambda function implementation guide |
## Project Structure
```text
web-automator-js/
├── src/
│   ├── automator/
│   │   ├── playwright/     # Playwright automation drivers
│   │   └── selenium/       # Selenium automation drivers
│   ├── common/utils/       # Shared utilities
│   └── examples/           # Example implementations
├── tests/
│   └── test-events/        # Lambda test event files
├── config/                 # Configuration files
├── scripts/                # Setup and utility scripts
├── Dockerfile              # Production container image
├── docker-compose.yml      # Local development environment
├── package.json            # Dependencies and npm scripts
└── index.js                # Main Lambda handler
```
## Available Commands
### Development Commands
```bash
npm run setup # Check and install dependencies
npm run playwright # Run Playwright example
npm run selenium # Run Selenium example
npm run lambda # Test Lambda handler locally
npm run clean # Clean output directories
```
### Docker Commands
```bash
npm run docker:build # Build Lambda container
npm run docker:start # Start Lambda service
npm run docker:test # Test containerized Lambda
npm run docker:health # Check container health
npm run docker:monitor # Continuous health monitoring
npm run docker:restart # Smart container restart
npm run docker:logs # View container logs
npm run docker:stop # Stop all containers
npm run docker:cleanup # Clean up Docker resources
```
## Framework Comparison
Both frameworks are supported with smart environment detection:
| Aspect | Selenium | Playwright |
|--------|----------|------------|
| **Speed** | Good | Excellent |
| **Reliability** | Good | Excellent |
| **AWS Lambda** | Supported | Optimized |
| **Setup** | Manual | Automatic |
**[View Detailed Comparison →](docs/SELENIUM-VS-PLAYWRIGHT.md)**
## Configuration
### Environment Variables
```bash
# Browser Configuration
HEADLESS=true # Run browsers in headless mode
BROWSER_TYPE=chromium # Browser type (chromium/firefox/webkit)
# AWS Lambda Detection (auto-detected)
AWS_LAMBDA_FUNCTION_NAME # Lambda function name
AWS_EXECUTION_ENV # AWS execution environment
NODE_ENV # Application environment
# Development Options
DEBUG=true # Enable debug logging
TIMEOUT=30000 # Default timeout in milliseconds
```
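The variables above could be folded into a single runtime configuration object along these lines. This is a minimal sketch, not the project's actual detection code: `resolveRuntimeConfig` and `parseBoolean` are hypothetical names, and forcing headless mode on Lambda is an assumption.

```javascript
// Sketch only: resolveRuntimeConfig and parseBoolean are hypothetical helpers,
// not part of the project's documented API.
function parseBoolean(value, fallback) {
  if (value === undefined || value === '') return fallback;
  return ['1', 'true', 'yes'].includes(String(value).toLowerCase());
}

function resolveRuntimeConfig(env = process.env) {
  // Either variable being present is treated as "running on AWS Lambda".
  const isLambda = Boolean(env.AWS_LAMBDA_FUNCTION_NAME || env.AWS_EXECUTION_ENV);
  const timeout = Number.parseInt(env.TIMEOUT, 10);

  return {
    isLambda,
    // Assumption: Lambda has no display, so headless is forced there.
    headless: isLambda ? true : parseBoolean(env.HEADLESS, true),
    browserType: env.BROWSER_TYPE || 'chromium',
    debug: parseBoolean(env.DEBUG, false),
    // Fall back to the documented default when TIMEOUT is missing or invalid.
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 30000,
  };
}

console.log(resolveRuntimeConfig({ HEADLESS: 'false', BROWSER_TYPE: 'firefox' }));
```

Passing the environment as an argument, rather than reading `process.env` directly, keeps the function easy to exercise with different settings.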
### Custom Configuration
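Create `config/config.json` for custom settings. In the sample below, `browser.timeout` is in milliseconds (matching `TIMEOUT` above), while `lambda.timeout` and `lambda.memorySize` appear to follow AWS Lambda's units of seconds and MB: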
```json
{
  "browser": {
    "headless": true,
    "timeout": 30000,
    "viewport": {
      "width": 1920,
      "height": 1080
    }
  },
  "lambda": {
    "timeout": 30,
    "memorySize": 1024
  }
}
```
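One way such a file might be consumed is to merge it over built-in defaults, so a partial `config.json` only overrides the keys it names. The sketch below assumes that behaviour; `loadConfig`, `deepMerge`, and `DEFAULT_CONFIG` are hypothetical names, and the defaults simply mirror the sample above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumed defaults, copied from the sample config.json above.
const DEFAULT_CONFIG = {
  browser: { headless: true, timeout: 30000, viewport: { width: 1920, height: 1080 } },
  lambda: { timeout: 30, memorySize: 1024 },
};

// Recursively merge plain objects; values in `override` win.
function deepMerge(base, override) {
  const result = { ...base };
  for (const [key, value] of Object.entries(override || {})) {
    const bothObjects =
      value && typeof value === 'object' && !Array.isArray(value) &&
      base[key] && typeof base[key] === 'object' && !Array.isArray(base[key]);
    result[key] = bothObjects ? deepMerge(base[key], value) : value;
  }
  return result;
}

// Hypothetical loader: a missing file simply yields the defaults.
function loadConfig(configPath = path.join(process.cwd(), 'config', 'config.json')) {
  if (!fs.existsSync(configPath)) return deepMerge(DEFAULT_CONFIG, {});
  const fileConfig = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  return deepMerge(DEFAULT_CONFIG, fileConfig);
}

// A partial override keeps the untouched defaults.
console.log(deepMerge(DEFAULT_CONFIG, { browser: { viewport: { width: 1280 } } }).browser);
```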
## Docker Deployment
### Local Development
```bash
# Start development environment
npm run docker:start
# Test with sample event
npm run docker:test
# Monitor health
npm run docker:monitor
```
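To send a custom event to the running container from Node.js, something like the following may work. It is a sketch under two assumptions that the README does not confirm: that the image exposes the AWS Lambda Runtime Interface Emulator, and that `docker-compose.yml` maps it to host port 9000. `buildInvokeRequest` and `invokeLocal` are hypothetical helper names.

```javascript
// Assumption: the Runtime Interface Emulator is reachable on host port 9000.
const DEFAULT_PORT = 9000;

// Build the URL and fetch options for a local invocation (pure, easy to test).
function buildInvokeRequest(event, port = DEFAULT_PORT) {
  return {
    url: `http://localhost:${port}/2015-03-31/functions/function/invocations`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(event),
    },
  };
}

// Send the event and return the parsed handler response.
// Relies on the global fetch available in Node.js 18+.
async function invokeLocal(event, port = DEFAULT_PORT) {
  const { url, options } = buildInvokeRequest(event, port);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Local invocation failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Example (requires the container to be running):
// invokeLocal({ url: 'https://example.com', selector: 'h1', action: 'getText' })
//   .then(console.log)
//   .catch(console.error);
```

If the compose file maps a different port, pass it as the second argument.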
### AWS Lambda Deployment
```bash
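# Build production image
docker build -t web-automator-lambda .
# Authenticate Docker with ECR (requires AWS CLI configured)
aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account}.dkr.ecr.{region}.amazonaws.com
# Tag and push to ECR
docker tag web-automator-lambda:latest {account}.dkr.ecr.{region}.amazonaws.com/web-automator:latest
docker push {account}.dkr.ecr.{region}.amazonaws.com/web-automator:latest
# Point the Lambda function at the pushed image
aws lambda update-function-code --function-name web-automator --image-uri {account}.dkr.ecr.{region}.amazonaws.com/web-automator:latest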
```
**[Complete Deployment Guide →](docs/AWS-DEPLOYMENT.md)**
## Testing
### Unit Tests
```bash
npm test # Run test suite (when implemented)
```
### Integration Tests
```bash
npm run docker:test # Test containerized Lambda
npm run docker:test:basic # Basic functionality test
```
### Health Monitoring
```bash
npm run docker:health # One-time health check
npm run docker:monitor # Continuous monitoring
```
## Examples
### Basic Web Scraping
```javascript
// Driver class exported by the project (path relative to the repository root)
const { playwrightDriver } = require('./src/automator/playwright/drivers/playwrightDriver');

async function scrapeTitle(url) {
  const driver = new playwrightDriver();
  await driver.initialize();
  try {
    const page = await driver.browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    // Always release the browser, even if navigation fails
    await driver.cleanup();
  }
}
```
### Lambda Handler Usage
```javascript
// Event format
const event = {
  "url": "https://example.com",
  "selector": "h1",
  "action": "getText",
  "timeout": 30000 // milliseconds
};

// Response format
const response = {
  "statusCode": 200,
  "body": {
    "success": true,
    "url": "https://example.com",
    "title": "Example Domain",
    "result": "Example Domain"
  }
};
```
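A handler that accepts the event shape above and produces the response shape above might be structured as follows. This is a sketch, not the project's `index.js`: `createHandler`, `validateEvent`, and the injected `runAction` function are hypothetical, the 400 and 500 responses are assumptions, and `getText` is the only action the README mentions.

```javascript
// Sketch only: these names are illustrative, not the project's index.js.
const SUPPORTED_ACTIONS = ['getText'];

function validateEvent(event) {
  const errors = [];
  try {
    new URL(event.url); // throws for missing or relative URLs
  } catch {
    errors.push('url must be an absolute URL');
  }
  if (!SUPPORTED_ACTIONS.includes(event.action)) {
    errors.push(`action must be one of: ${SUPPORTED_ACTIONS.join(', ')}`);
  }
  if (typeof event.selector !== 'string' || event.selector === '') {
    errors.push('selector must be a non-empty string');
  }
  return errors;
}

// runAction(event) does the browser work and resolves to { title, result }.
function createHandler(runAction) {
  return async function handler(event) {
    const input = event || {};
    const errors = validateEvent(input);
    if (errors.length > 0) {
      return { statusCode: 400, body: { success: false, errors } };
    }
    try {
      const { title, result } = await runAction(input);
      return { statusCode: 200, body: { success: true, url: input.url, title, result } };
    } catch (error) {
      return { statusCode: 500, body: { success: false, url: input.url, error: error.message } };
    }
  };
}

// Usage with a stubbed action in place of a real browser session.
const handler = createHandler(async () => ({ title: 'Example Domain', result: 'Example Domain' }));
handler({ url: 'https://example.com', selector: 'h1', action: 'getText', timeout: 30000 })
  .then((response) => console.log(response));
```

Injecting `runAction` keeps validation and response formatting testable without launching a browser; in a real handler it would wrap the Playwright or Selenium driver.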
## Troubleshooting
### Common Issues
**Container crashes with segmentation fault:**
```bash
npm run docker:restart # Smart restart with health checks
npm run docker:logs # Check error details
```
**Browser not found in Lambda:**
- Ensure the `@sparticuz/chromium` package is being used
- Check that the AWS environment is detected correctly
- Verify that the container image includes the browsers
**Network timeouts:**
- Increase timeout values in configuration
- Check Lambda function timeout settings
- Verify network connectivity
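When tuning timeouts, it can help to keep the browser timeout below the Lambda function timeout, so the handler has time to report an error before Lambda stops the invocation. The sketch below illustrates the idea; `withTimeout` and `navigationBudgetMs` are hypothetical helpers, and the 5-second reserve is an arbitrary assumption.

```javascript
// Hypothetical helper: reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Derive a navigation budget that leaves headroom before the Lambda timeout.
// lambdaTimeoutSec is in seconds; requestedMs and reserveMs are in milliseconds.
function navigationBudgetMs(lambdaTimeoutSec, requestedMs, reserveMs = 5000) {
  const ceiling = Math.max(lambdaTimeoutSec * 1000 - reserveMs, 0);
  return Math.min(requestedMs, ceiling);
}

console.log(navigationBudgetMs(30, 30000)); // 25000
```

Note that `withTimeout` only stops waiting; it does not cancel the underlying browser operation, so cleanup still has to close the page or browser.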
**[Complete Troubleshooting Guide →](docs/CONTAINER-STATUS.md)**
## Performance
### AWS Lambda Metrics
- **Cold Start**: ~3-5 seconds (Playwright) / ~8-12 seconds (Selenium)
- **Execution**: ~2-4 seconds per page (Playwright) / ~5-8 seconds (Selenium)
- **Memory Usage**: 256MB+ (Playwright) / 512MB+ (Selenium)
- **Container Size**: ~250MB (Playwright) / ~1.5GB (Selenium)
### Optimization Tips
- Use Playwright for better Lambda performance
- Enable connection pooling for multiple requests
- Implement intelligent caching strategies
- Use appropriate Lambda memory allocation
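One reading of the pooling and caching tips is to reuse a single browser instance across warm Lambda invocations instead of launching one per request. The sketch below shows that pattern with a fake launcher; `createBrowserCache` is a hypothetical helper, and `launch` stands for any async function that starts a browser (for example `() => chromium.launch()` with Playwright).

```javascript
// Sketch only: createBrowserCache is a hypothetical helper, not project code.
function createBrowserCache(launch) {
  let browserPromise = null;

  return {
    // Launch on first use, then hand back the same instance on warm invocations.
    get() {
      if (!browserPromise) {
        browserPromise = launch().catch((error) => {
          browserPromise = null; // allow a retry after a failed launch
          throw error;
        });
      }
      return browserPromise;
    },
    // Close the cached browser, e.g. after a crash or before shutdown.
    async reset() {
      const pending = browserPromise;
      browserPromise = null;
      if (pending) {
        const browser = await pending.catch(() => null);
        if (browser) await browser.close();
      }
    },
  };
}

// Usage with a fake launcher; module scope survives between warm invocations.
let launches = 0;
const cache = createBrowserCache(async () => ({ id: ++launches, close: async () => {} }));

async function handler() {
  const browser = await cache.get();
  return browser.id;
}

Promise.all([handler(), handler()]).then((ids) => console.log(ids)); // [ 1, 1 ]
```

Caching the promise rather than the resolved browser means concurrent callers share one launch. Pages or contexts should still be created and closed per request.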
## License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- **[Playwright Team](https://playwright.dev/)** - Modern web automation framework
- **[Selenium Project](https://selenium.dev/)** - Web automation standard
- **[@sparticuz/chromium](https://github.com/Sparticuz/chromium)** - Serverless Chromium builds
- **AWS Lambda Team** - Serverless compute platform