https://github.com/qwertyfusion/web-scraper-python

WebScrap AI - A powerful AI-driven web scraping and summarization tool that extracts content from websites, YouTube videos, and search results. It processes the extracted data using Google's Gemini Flash 2.0 for intelligent summarization.

ai flask gemini llm nextjs python3 webscraping

Host: GitHub
URL: https://github.com/qwertyfusion/web-scraper-python
Owner: QwertyFusion
License: mit
Created: 2025-03-18T16:08:07.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-18T19:42:48.000Z (over 1 year ago)
Last Synced: 2025-03-18T19:45:39.807Z (over 1 year ago)
Topics: ai, flask, gemini, llm, nextjs, python3, webscraping
Language: TypeScript
Homepage:
Size: 495 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 🌐 WebScrap AI

![WebScrap AI](./preview/banner.png)

Scrape websites & YouTube videos effortlessly. Extract key insights, summaries, and data in seconds.

A powerful AI-driven web scraping and summarization tool that extracts content from websites, YouTube videos, and search results. It processes the extracted data using Google's Gemini Flash 2.0 for intelligent summarization. Built with Flask, Next.js, Tailwind CSS, and TypeScript for a seamless user experience. 🚀

---

## 🚀 Features

- 🌐 Scrape websites, YouTube transcripts, or perform keyword searches.
- 🤖 Uses **Gemini Flash 2.0** API for intelligent text processing.
- 🔎 DuckDuckGo-powered web search for relevant content.
- 🖥️ **Flask** backend with a **Next.js** frontend.
- 🎨 Styled using **Tailwind CSS**.
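
The scrape-then-summarize flow listed above can be illustrated with a minimal sketch. It assumes nothing about the project's actual backend code: `VisibleTextExtractor`, `extract_text`, and `build_summary_prompt` are hypothetical names, the prompt wording is invented for illustration, and only the Python standard library is used. Sending the prompt to Gemini is deliberately left out.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping <script>, <style>, and <noscript> content."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_summary_prompt(page_text: str, max_chars: int = 8000) -> str:
    """Hypothetical prompt format; truncates long pages before summarization."""
    return "Summarize the following page content:\n\n" + page_text[:max_chars]
```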

---

## 🖼️ Preview
![Home Page](./preview/home_page.png)
![Search Result](./preview/result.png)

---

## 📜 License

WebScrap AI is open-source and released under the **MIT License**.
See the [LICENSE](./LICENSE) file for more details.

---

## 🛠️ Get Started

### 1️⃣ Clone the Repository
```sh
git clone "https://github.com/QwertyFusion/web-scraper-python.git"
cd web-scraper-python
```

### 2️⃣ Backend Setup (Flask)

#### Navigate to Backend Folder
```sh
cd backend
```

#### Create and Activate Virtual Environment (venv)
```sh
python -m venv venv        # Create the virtual environment
source venv/bin/activate   # Activate on macOS/Linux
venv\Scripts\activate      # Activate on Windows (run this instead of the line above)
```

#### Install Dependencies
```sh
pip install -r requirements.txt
```

### 3️⃣ Frontend Setup (Next.js)

#### Navigate to Frontend Folder
```sh
cd ../frontend # From backend/; use "cd frontend" if you are in the repository root
```

#### Install Dependencies
```sh
npm install
```

### 4️⃣ Environment Variables

#### Create `.env` inside `backend/` for Flask Backend:
```env
GEMINI_API_KEY=your-gemini-api-key
```
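
How `app.py` actually loads this file is not shown in this README. As a rough sketch under that caveat, the backend could read the key at startup and fail fast when it is missing or still set to the placeholder. The helper names `parse_env_file` and `get_gemini_api_key` are hypothetical, and the parser below handles only plain `KEY=VALUE` lines.

```python
import os


def parse_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comment lines are ignored."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(env_path: str = ".env") -> str:
    """Return the key from the process environment, falling back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key and os.path.exists(env_path):
        key = parse_env_file(env_path).get("GEMINI_API_KEY")
    if not key or key == "your-gemini-api-key":
        raise RuntimeError("GEMINI_API_KEY is missing or still set to the placeholder")
    return key
```

Raising at startup surfaces a missing key immediately instead of on the first summarization request.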

#### Create `.env.local` inside `frontend/` for Next.js:
```env
# Change if the backend runs on a different URL
NEXT_PUBLIC_BACKEND_URL=http://127.0.0.1:5000
```

### 5️⃣ Run the Project

#### Start the Flask Backend from `backend/` directory
```sh
python app.py # Ensure the virtual environment is activated
```

#### Start the Next.js Frontend from `frontend/` directory
```sh
npm run dev # Runs the frontend on localhost:3000
```
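
If either server fails to come up, a quick connectivity check can help narrow down which one. The script below is a small sketch, not part of the repository: `is_listening` is a hypothetical helper, and the hosts and ports are the defaults mentioned in the steps above (Flask on `127.0.0.1:5000`, the Next.js dev server on `localhost:3000`).

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Defaults assumed from the setup steps; adjust if you changed either port.
    services = (
        ("Flask backend", "127.0.0.1", 5000),
        ("Next.js frontend", "localhost", 3000),
    )
    for name, host, port in services:
        status = "up" if is_listening(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {status}")
```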

Now, open your browser and go to **http://localhost:3000** to start using WebScrap AI! 🚀

---

## 🛠 Tools Used