https://github.com/qwertyfusion/web-scraper-python
WebScrap AI - A powerful AI-driven web scraping and summarization tool that extracts content from websites, YouTube videos, and search results. It processes the extracted data using Google's Gemini Flash 2.0 for intelligent summarization.
ai flask gemini llm nextjs python3 webscraping
- Host: GitHub
- URL: https://github.com/qwertyfusion/web-scraper-python
- Owner: QwertyFusion
- License: mit
- Created: 2025-03-18T16:08:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-18T19:42:48.000Z (over 1 year ago)
- Last Synced: 2025-03-18T19:45:39.807Z (over 1 year ago)
- Topics: ai, flask, gemini, llm, nextjs, python3, webscraping
- Language: TypeScript
- Homepage:
- Size: 495 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🌐 WebScrap AI

Scrape websites & YouTube videos effortlessly. Extract key insights, summaries, and data in seconds.

A powerful AI-driven web scraping and summarization tool that extracts content from websites, YouTube videos, and search results. It processes the extracted data using Google's Gemini Flash 2.0 for intelligent summarization. Built with Flask, Next.js, Tailwind CSS, and TypeScript for a seamless user experience. 🚀

---
## 🚀 Features
- 🌐 Scrape websites, YouTube transcripts, or perform keyword searches.
- 🤖 Uses **Gemini Flash 2.0** API for intelligent text processing.
- 🔎 DuckDuckGo-powered web search for relevant content.
- 🖥️ **Flask** backend with a **Next.js** frontend.
- 🎨 Styled using **Tailwind CSS**.
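
The three input modes listed above (website, YouTube, keyword search) imply some routing step. The following is a minimal sketch of how such routing could look, using only the Python standard library. The function name `classify_input` and the `YOUTUBE_HOSTS` set are hypothetical and not taken from the project's code.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube links; an illustrative assumption, not the project's list.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_input(text: str) -> str:
    """Return 'youtube', 'website', or 'search' for a piece of user input."""
    candidate = text.strip()
    parsed = urlparse(candidate)
    # Only http(s) URLs with a hostname count as links; anything else is a search query.
    if parsed.scheme in ("http", "https") and parsed.hostname:
        if parsed.hostname in YOUTUBE_HOSTS:
            return "youtube"
        return "website"
    return "search"


if __name__ == "__main__":
    for sample in ("https://youtu.be/abc123", "https://example.com/post", "flask tutorials"):
        print(sample, "->", classify_input(sample))
```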
---
## 📜 License
WebScrap AI is open-source and released under the **MIT License**.
See the [LICENSE](./LICENSE) file for more details.

---
## 🛠️ Get Started
### 1️⃣ Clone the Repository
```sh
git clone "https://github.com/QwertyFusion/web-scraper-python.git"
cd web-scraper-python
```
### 2️⃣ Backend Setup (Flask)
#### Navigate to Backend Folder
```sh
cd backend
```
#### Create and Activate Virtual Environment (venv)
```sh
python -m venv venv        # Create the virtual environment
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows (run this instead of the line above)
```
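To confirm that activation worked before installing anything, a small check like the one below can help. It relies on the standard-library behavior that `sys.prefix` differs from `sys.base_prefix` inside a `venv` environment. The helper name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this prints `False` after activation, the shell is probably still using the system interpreter.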
#### Install Dependencies
```sh
pip install -r requirements.txt
```
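After installation, a quick import check can show whether the dependencies landed in the active environment. This is a sketch under assumptions: the contents of `requirements.txt` are not shown in this README, so the module names below (`flask`, `bs4`) are guessed from the tools listed further down, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names guessed from the tools this README lists; the real
# requirements.txt may differ, so treat this list as an assumption.
ASSUMED_MODULES = ["flask", "bs4"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```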
### 3️⃣ Frontend Setup (Next.js)
#### Navigate to Frontend Folder
```sh
cd ../frontend # From backend/; use "cd frontend" if you are in the repository root
```
#### Install Dependencies
```sh
npm install
```
### 4️⃣ Environment Variables
#### Create `.env` inside `backend/` for Flask Backend:
```env
GEMINI_API_KEY=your-gemini-api-key
```
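How the backend actually reads this file is not shown here (a library such as python-dotenv is a common choice, but that is an assumption). The sketch below uses only the standard library to illustrate the idea: parse simple `KEY=value` lines and fail early with a clear message if `GEMINI_API_KEY` is missing or still the placeholder. Both function names are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comment lines."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_gemini_key(env: dict[str, str]) -> str:
    """Return GEMINI_API_KEY or raise when it is missing or still the placeholder."""
    key = env.get("GEMINI_API_KEY", "")
    if not key or key == "your-gemini-api-key":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    return key


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = read_env_file(str(env_path)) if env_path.exists() else {}
    # Values from the real environment take precedence over the file.
    merged = {**file_values, **os.environ}
    try:
        require_gemini_key(merged)
        print("GEMINI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Run it from the `backend/` directory so that the relative `.env` path resolves.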
#### Create `.env.local` inside `frontend/` for Next.js:
```env
# Change if the backend runs on a different URL
NEXT_PUBLIC_BACKEND_URL=http://127.0.0.1:5000
```
### 5️⃣ Run the Project
#### Start the Flask Backend from `backend/` directory
```sh
python app.py # Ensure the virtual environment is activated
```
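Before starting the frontend, it can be useful to confirm that the backend is accepting connections. The sketch below only checks that a TCP port is open. The default host and port come from the `NEXT_PUBLIC_BACKEND_URL` example in this README, and the helper name `backend_is_listening` is hypothetical.

```python
import socket


def backend_is_listening(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_listening())
```

An open port does not prove that the Flask app itself is healthy; it only rules out the most common "backend not started" case.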
#### Start the Next.js Frontend from `frontend/` directory
```sh
npm run dev # Runs the frontend on localhost:3000
```
Now, open your browser and go to **http://localhost:3000** to start using WebScrap AI! 🚀

---
## 🛠 Tools Used
- Visual Studio Code
- Next.js
- TypeScript
- Tailwind CSS
- Flask
- BeautifulSoup (Web Scraping)
- DuckDuckGo Search API
- YouTube Transcript API
- Gemini API (AI Processing)
- Git & GitHub (Version Control)
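
To illustrate the scraping step that precedes summarization, here is a minimal sketch of visible-text extraction. It deliberately uses the standard-library `html.parser` as a stand-in for BeautifulSoup so it runs without extra installs. It is not the project's implementation, and the names `VisibleTextParser` and `extract_text` and the `max_chars` limit are assumptions.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes while ignoring <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str, max_chars: int = 4000) -> str:
    """Return visible page text, truncated to max_chars (an arbitrary limit)."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><script>var x = 1;</script><p>Hello <b>world</b></p></body></html>"
    print(extract_text(page))
```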
---
## 🔗 Link to Tools
---
## 👨‍💻 Developer