https://github.com/appugouda-architect/ai-webscraper-agent

AI-powered web-scraping agent that uses the Brightdata MCP Server, FastAPI, Python, a Streamlit UI, and an Anthropic LLM
anthropic fastapi mcp python streamlit

Last synced: about 1 year ago
Host: GitHub
URL: https://github.com/appugouda-architect/ai-webscraper-agent
Owner: appugouda-architect
Created: 2025-06-21T04:46:58.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-24T07:00:36.000Z (about 1 year ago)
Last Synced: 2025-06-30T10:47:34.760Z (about 1 year ago)
Topics: anthropic, fastapi, mcp, python, streamlit
Language: Python
Homepage:
Size: 49.8 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 🕸️ AI Webscraper Agent

An AI-powered webscraping agent that uses the Brightdata MCP server to extract and summarize content from the web. Built with a modular architecture combining LLM reasoning, robust scraping, and a simple web interface.

---

## 🔧 Tech Stack

- **Frontend**: [Streamlit](https://streamlit.io/)
- **Backend**: [FastAPI](https://fastapi.tiangolo.com/)
- **Language**: Python
- **Scraping**: [Brightdata MCP Server](https://brightdata.com/)
- **AI Model**: Anthropic LLM (Claude)

---

## 🚀 Features

- Natural language interface to extract data from websites
- Uses Brightdata MCP for reliable web scraping
- LLM-powered summarization and reasoning
- Streamlit-based interactive frontend
- Async FastAPI backend integration

---

## Environment Variables

Create a `.env` file and set the following variables:

```dotenv
# .env
# Environment Variables for AI Webscraper Agent
# Replace 'your_key_here' with your actual API keys

# Bright Data
API_TOKEN=your_key_here
WEB_UNLOCKER_ZONE=your_key_here
BROWSER_AUTH="your_browser_auth_token"

# Anthropic API key
ANTHROPIC_API_KEY=your_key_here
```
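Before starting either process, it can help to check that none of these values is missing or still a placeholder. The following is a minimal sketch, not part of the repository: it assumes the variables have already been loaded into the process environment (the README does not say how the `.env` file is read), and the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ("API_TOKEN", "WEB_UNLOCKER_ZONE", "BROWSER_AUTH", "ANTHROPIC_API_KEY")

# Placeholder strings used in the template; a real value should replace them.
PLACEHOLDERS = {"your_key_here", "your_browser_auth_token"}


def missing_env_vars(env=None):
    """Return required variable names that are unset, empty, or still placeholders.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = (env.get(name) or "").strip().strip('"')
        if not value or value in PLACEHOLDERS:
            missing.append(name)
    return missing
```

Calling `missing_env_vars()` at startup and stopping when the list is non-empty gives a clearer failure than an authentication error later in the request.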

## 📦 Installation

```bash
git clone https://github.com/appugouda-architect/ai-webscraper-agent.git
cd ai-webscraper-agent

uv pip install -r requirements.txt
```

## Run the App

Start the FastAPI backend server, then the Streamlit app.

### Start the backend FastAPI server

```bash
uv run backend.py
```

### Start frontend Streamlit app

```bash
streamlit run frontend.py
```
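With both processes running, the frontend presumably forwards each prompt to the backend over HTTP. The sketch below only builds such a request so its shape can be inspected; the base URL `http://localhost:8000`, the `/query` path, and the `prompt` field are all assumptions for illustration, since the README does not document the backend's routes or port.

```python
import json
import urllib.request

# Assumed values: the README does not state the backend's port or route.
BACKEND_URL = "http://localhost:8000"
QUERY_PATH = "/query"  # hypothetical endpoint name


def build_query_request(prompt, base_url=BACKEND_URL):
    """Build (but do not send) a JSON POST request carrying the user's prompt."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url + QUERY_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; check `backend.py` for the actual route and payload before relying on these names.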

## Example Usage

### Ask

```
Scrape the top 5 news headlines from https://bbc.com and summarize them.
```

### Get Response

```
1. Headline A - Summary
2. Headline B - Summary
3. Headline C - Summary
4. Headline D - Summary
...
```

## Agent Flow

[User Prompt] ➡ [Streamlit UI] ➡ [FastAPI Router] ➡ [LLM Agent]
➡ [Brightdata Tool via MCP] ➡ [LLM Summarization] ➡ [UI Response]
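
The flow above can be sketched as a small function that wires the stages together. This is a simplified illustration under stated assumptions, not the repository's implementation: `run_agent`, `scrape`, and `summarize` are hypothetical names, the two callables stand in for the Brightdata MCP tool and the Anthropic LLM, and a regular expression stands in for the agent's decision about which URL to fetch.

```python
import re

# Stand-in for the LLM agent's tool-selection step: pull the first URL from the prompt.
URL_PATTERN = re.compile(r"https?://\S+")


def run_agent(prompt, scrape, summarize):
    """Run one prompt through a scrape-then-summarize pipeline.

    scrape(url) -> str            stands in for the Brightdata tool reached via MCP
    summarize(prompt, text) -> str stands in for the LLM summarization step
    """
    match = URL_PATTERN.search(prompt)
    if match is None:
        return "No URL found in the prompt."
    url = match.group(0).rstrip(".,;")  # drop trailing sentence punctuation
    page_text = scrape(url)
    return summarize(prompt, page_text)
```

Injecting the two stages as callables keeps the sketch testable without network access or API keys; in the real agent the LLM decides when and how to call the scraping tool.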
