https://github.com/appugouda-architect/ai-webscraper-agent
AI-powered web-scraping agent that uses the Brightdata MCP Server, FastAPI, Python, a Streamlit UI, and an Anthropic LLM
- Host: GitHub
- URL: https://github.com/appugouda-architect/ai-webscraper-agent
- Owner: appugouda-architect
- Created: 2025-06-21T04:46:58.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-24T07:00:36.000Z (about 1 year ago)
- Last Synced: 2025-06-30T10:47:34.760Z (12 months ago)
- Topics: anthropic, fastapi, mcp, python, streamlit
- Language: Python
- Homepage:
- Size: 49.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🕸️ AI Webscraper Agent
An AI-powered web-scraping agent that uses the Brightdata MCP server to extract and summarize content from the web. Its modular architecture combines LLM reasoning, scraping through Brightdata, and a simple web interface.
---
## Tech Stack
- **Frontend**: [Streamlit](https://streamlit.io/)
- **Backend**: [FastAPI](https://fastapi.tiangolo.com/)
- **Language**: Python
- **Scraping**: [Brightdata MCP Server](https://brightdata.com/)
- **AI Model**: Anthropic LLM (Claude)
---
## Features
- Natural language interface to extract data from websites
- Uses Brightdata MCP for reliable web scraping
- LLM-powered summarization and reasoning
- Streamlit-based interactive frontend
- Async FastAPI backend integration
---
## Environment Variables
Create a `.env` file and configure the following:
```dotenv
# .env
# Environment Variables for AI Webscraper Agent
# Replace 'your_key_here' with your actual API keys
# Bright Data
API_TOKEN=your_key_here
WEB_UNLOCKER_ZONE=your_key_here
BROWSER_AUTH="your_browser_auth_token"
# Anthropic API key
ANTHROPIC_API_KEY=your_key_here
```
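A missing or placeholder value is an easy mistake to make here, so a small startup check can help. The following is a minimal sketch, not code from the repository: the helper name `missing_env_vars` is hypothetical, and it assumes the variables from the `.env` file have already been loaded into the process environment (whether and how the project does this is not stated in this README).

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("API_TOKEN", "WEB_UNLOCKER_ZONE", "BROWSER_AUTH", "ANTHROPIC_API_KEY")

# Placeholder values from the .env example; treated the same as "not set".
PLACEHOLDERS = {"", "your_key_here", "your_browser_auth_token"}


def missing_env_vars(env=None):
    """Return the required variable names that are unset or still placeholders.

    `env` is any mapping of names to values; it defaults to os.environ.
    (Hypothetical helper for illustration, not part of the project.)
    """
    env = os.environ if env is None else env
    return [
        name
        for name in REQUIRED_VARS
        # Strip whitespace and surrounding double quotes before comparing.
        if env.get(name, "").strip().strip('"') in PLACEHOLDERS
    ]
```

Calling `missing_env_vars()` before starting the backend returns an empty list when every variable has a real value, and otherwise lists the names that still need attention.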
## 📦 Installation
```bash
git clone https://github.com/appugouda-architect/ai-webscraper-agent.git
cd ai-webscraper-agent
uv pip install -r requirements.txt
```
## Run the App
Start the FastAPI backend server and the Streamlit app.
### Start the FastAPI backend server
```bash
uv run backend.py
```
### Start the Streamlit frontend
```bash
streamlit run frontend.py
```
## Example Usage
### Ask
```
Scrape the top 5 news headlines from https://bbc.com and summarize them.
```
### Response
```
1. Headline A - Summary
2. Headline B - Summary
3. Headline C - Summary
4. Headline D - Summary
...
```
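The same prompt could in principle be sent to the backend without the Streamlit UI. The sketch below only builds the HTTP request. Everything about the endpoint is an assumption, since this README does not document the backend's host, port, route, or payload shape: `BACKEND_URL`, the `/query` route, and the `{"prompt": ...}` body are hypothetical placeholders to adapt to whatever `backend.py` actually exposes.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port, and route are NOT documented in this README.
BACKEND_URL = "http://localhost:8000/query"


def build_query_request(prompt: str, url: str = BACKEND_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the user's prompt.

    The {"prompt": ...} payload shape is an assumption for illustration.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, the request could be sent with `urllib.request.urlopen(build_query_request("..."))`, once the URL and payload are adjusted to match the real route.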
## Agent Flow
[User Prompt] → [Streamlit UI] → [FastAPI Router] → [LLM Agent]
→ [Brightdata Tool via MCP] → [LLM Summarization] → [UI Response]
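
The flow above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not the project's implementation: `run_agent`, `scrape`, and `summarize` are hypothetical names, the scraping tool and the LLM are passed in as plain callables, and the URL is picked out with a regular expression, whereas in the real agent the LLM presumably decides which tool to call.

```python
import re

# Matches the first http(s) URL in the prompt (illustrative simplification).
URL_PATTERN = re.compile(r"https?://[^\s]+")


def run_agent(prompt, scrape, summarize):
    """Run one pass of the flow: find a URL, scrape it, summarize the result.

    scrape(url) -> str stands in for the Brightdata tool called via MCP.
    summarize(prompt, page_text) -> str stands in for the LLM call.
    Both are hypothetical callables supplied by the caller.
    """
    match = URL_PATTERN.search(prompt)
    if match is None:
        # No URL in the prompt: let the LLM stand-in answer without page content.
        return summarize(prompt, "")
    # Drop trailing punctuation that belongs to the sentence, not the URL.
    url = match.group(0).rstrip(".,;)")
    page_text = scrape(url)
    return summarize(prompt, page_text)
```

Passing the tool and the model as arguments keeps the sketch testable with stand-in functions and mirrors the separation between the scraping step and the summarization step in the diagram.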