https://github.com/appugouda-architect/ai-webscraper-agent

AI-powered web-scraping agent that uses the Brightdata MCP Server, FastAPI, Python, a Streamlit UI, and an Anthropic LLM.

# 🕸️ AI Webscraper Agent

An AI-powered web-scraping agent that uses the Brightdata MCP server to extract and summarize content from the web. Its architecture is modular: an LLM agent handles reasoning and summarization, Brightdata handles the scraping, and a Streamlit app provides a simple web interface.

---

## 🔧 Tech Stack

- **Frontend**: [Streamlit](https://streamlit.io/)
- **Backend**: [FastAPI](https://fastapi.tiangolo.com/)
- **Language**: Python
- **Scraping**: [Brightdata MCP Server](https://brightdata.com/)
- **AI Model**: Anthropic LLM (Claude)

---

## 🚀 Features

- Natural language interface to extract data from websites
- Uses Brightdata MCP for reliable web scraping
- LLM-powered summarization and reasoning
- Streamlit-based interactive frontend
- Async FastAPI backend integration

---

## Environment Variables

Create a `.env` file and set the following variables:

```dotenv
# .env
# Environment variables for AI Webscraper Agent
# Replace the placeholder values with your own credentials

# Bright Data
API_TOKEN=your_key_here
WEB_UNLOCKER_ZONE=your_key_here
BROWSER_AUTH="your_browser_auth_token"

# Anthropic API key
ANTHROPIC_API_KEY=your_key_here
```
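A startup check along these lines can catch a missing or unedited variable before the agent makes its first request. This is a minimal sketch, not code from the repository: the names `missing_env_vars`, `REQUIRED_VARS`, and `PLACEHOLDERS` are illustrative, and it assumes the variables are already present in the process environment (for example, loaded from `.env` by whatever loader the project uses).

```python
import os

# Variable names and placeholder values taken from the .env example above.
REQUIRED_VARS = ("API_TOKEN", "WEB_UNLOCKER_ZONE", "BROWSER_AUTH", "ANTHROPIC_API_KEY")
PLACEHOLDERS = {"", "your_key_here", "your_browser_auth_token"}


def missing_env_vars(env=None):
    """Return the required variable names that are unset or still placeholders.

    `env` is any mapping of names to values; it defaults to os.environ.
    (Hypothetical helper, not part of the project.)
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        # Strip whitespace and surrounding double quotes, as in BROWSER_AUTH="...".
        value = env.get(name, "").strip().strip('"')
        if value in PLACEHOLDERS:
            missing.append(name)
    return missing
```

Calling `missing_env_vars()` at the top of the backend entry point and exiting when the returned list is non-empty would turn a confusing authentication failure into a clear configuration message.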

## 📦 Installation

```bash
git clone https://github.com/appugouda-architect/ai-webscraper-agent.git
cd ai-webscraper-agent

uv pip install -r requirements.txt
```

## Run the App

Start the FastAPI backend server and the Streamlit app, each in its own terminal.

### Start the backend FastAPI server

```bash
uv run backend.py
```

### Start the frontend Streamlit app

```bash
streamlit run frontend.py
```
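With the backend running, it should also be possible to query it without the Streamlit UI. The sketch below only shows the general shape of such a request. The port `8000`, the `/query` route, and the `{"prompt": ...}` payload are all assumptions, since this README does not document the backend's routes; check `backend.py` for the real values.

```python
import json
import urllib.request


def build_query_request(prompt, base_url="http://localhost:8000", path="/query"):
    """Build a JSON POST request carrying the user's prompt.

    base_url, path, and the payload key are hypothetical placeholders,
    not the project's documented API.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Building and sending are kept separate so the request can be inspected before anything goes over the network.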

## Example Usage

### Ask

```
Scrape the top 5 news headlines from https://bbc.com and summarize them.
```

### Get Response

```
1. Headline A - Summary
2. Headline B - Summary
3. Headline C - Summary
4. Headline D - Summary
...
```

## Agent Flow

[User Prompt] ➡ [Streamlit UI] ➡ [FastAPI Router] ➡ [LLM Agent]
➡ [Brightdata Tool via MCP] ➡ [LLM Summarization] ➡ [UI Response]
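The backend portion of this flow can be sketched as three stages passed in as plain callables. This is a simplified illustration rather than the project's implementation: `run_agent` and the `fake_*` stand-ins are hypothetical names, and a real agent would replace them with calls to the Anthropic API and the Brightdata MCP tools, possibly looping over several tool calls instead of making one.

```python
from typing import Callable


def run_agent(
    prompt: str,
    plan: Callable[[str], str],
    scrape: Callable[[str], str],
    summarize: Callable[[str, str], str],
) -> str:
    """Run one request through the plan -> scrape -> summarize stages."""
    url = plan(prompt)                    # LLM agent decides what to fetch
    page_text = scrape(url)               # scraping tool, reached via MCP
    return summarize(prompt, page_text)   # LLM turns raw content into an answer


# Stand-in stages so the sketch runs without network access or API keys.
def fake_plan(prompt: str) -> str:
    # Pick the first whitespace-separated token that looks like a URL.
    return next(word for word in prompt.split() if word.startswith("http"))


def fake_scrape(url: str) -> str:
    return f"<content of {url}>"


def fake_summarize(prompt: str, page_text: str) -> str:
    return f"Summary of {page_text}"


answer = run_agent(
    "Scrape the top 5 news headlines from https://bbc.com and summarize them.",
    fake_plan,
    fake_scrape,
    fake_summarize,
)
print(answer)
```

Injecting the stages as arguments keeps each one replaceable and lets the flow be exercised offline, which fits the modular architecture described at the top of this README.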