{"id":26522464,"url":"https://github.com/qwertyfusion/web-scraper-python","last_synced_at":"2026-04-06T08:02:18.011Z","repository":{"id":283142046,"uuid":"950741988","full_name":"QwertyFusion/web-scraper-python","owner":"QwertyFusion","description":"WebScrap AI - A powerful AI-driven web scraping and summarization tool that extracts content from websites, YouTube videos, and search results. It processes the extracted data using Google's Gemini Flash 2.0 for intelligent summarization.","archived":false,"fork":false,"pushed_at":"2025-03-18T19:42:48.000Z","size":507,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-18T19:45:39.807Z","etag":null,"topics":["ai","flask","gemini","llm","nextjs","python3","webscraping"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QwertyFusion.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-18T16:08:07.000Z","updated_at":"2025-03-18T19:42:51.000Z","dependencies_parsed_at":"2025-03-18T19:55:52.522Z","dependency_job_id":null,"html_url":"https://github.com/QwertyFusion/web-scraper-python","commit_stats":null,"previous_names":["qwertyfusion/web-scrapper-python"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QwertyFusion%2Fweb-scraper-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QwertyFusion%2Fweb-scraper-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QwertyFusion%2Fweb-scraper-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QwertyFusion%2Fweb-scraper-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QwertyFusion","download_url":"https://codeload.github.com/QwertyFusion/web-scraper-python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244805235,"owners_count":20513238,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","flask","gemini","llm","nextjs","python3","webscraping"],"created_at":"2025-03-21T13:27:07.552Z","updated_at":"2025-12-30T23:58:05.428Z","avatar_url":"https://github.com/QwertyFusion.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🌐 WebScrap AI\n\n![WebScrap AI](./preview/banner.png)\n\n\u003cp align=\"center\"\u003e\u003cstrong\u003eScrape websites \u0026 YouTube videos effortlessly. Extract key insights, summaries, and data in seconds.\u003c/strong\u003e\u003c/p\u003e\n\nA powerful AI-driven web scraping and summarization tool that extracts content from websites, YouTube videos, and search results. It processes the extracted data using Google's Gemini 2.0 Flash for intelligent summarization. Built with Flask, Next.js, Tailwind CSS, and TypeScript for a seamless user experience. 
🚀\n\n---\n\n## 🚀 Features\n\n- 🌐 Scrape websites, YouTube transcripts, or perform keyword searches.\n- 🤖 Uses the **Gemini 2.0 Flash** API for intelligent text processing.\n- 🔎 DuckDuckGo-powered web search for relevant content.\n- 🖥️ **Flask** backend with a **Next.js** frontend.\n- 🎨 Styled using **Tailwind CSS**.\n\n---\n\n## 🖼️ Preview\n![Home Page](./preview/home_page.png) \n![Search Result](./preview/result.png)\n\n---\n\n## 📜 License  \n\nWebScrap AI is open-source and released under the **MIT License**.  \nSee the [LICENSE](./LICENSE) file for more details.\n\n---\n\n## 🛠️ Get Started\n\n### 1️⃣ Clone the Repository\n```sh\ngit clone \"https://github.com/QwertyFusion/web-scraper-python.git\"\ncd web-scraper-python\n```\n\n### 2️⃣ Backend Setup (Flask)\n\n#### Navigate to Backend Folder\n```sh\ncd backend\n```\n\n#### Create and Activate Virtual Environment (venv)\n```sh\npython -m venv venv  # Create the virtual environment\nsource venv/bin/activate  # Activate on macOS/Linux\nvenv\\Scripts\\activate  # Activate on Windows (instead of the line above)\n```\n\n#### Install Dependencies\n```sh\npip install -r requirements.txt\n```\n\n### 3️⃣ Frontend Setup (Next.js)\n\n#### Navigate to Frontend Folder\n```sh\ncd ../frontend  # From backend/; use 'cd frontend' if starting from the repository root\n```\n\n#### Install Dependencies\n```sh\nnpm install\n```\n\n### 4️⃣ Environment Variables\n\n#### Create `.env` inside `backend/` for the Flask Backend:\n```env\nGEMINI_API_KEY=your-gemini-api-key\n```\n\n#### Create `.env.local` inside `frontend/` for Next.js:\n```env\n# Change this if the backend runs on a different URL\nNEXT_PUBLIC_BACKEND_URL=http://127.0.0.1:5000\n```\n\n### 5️⃣ Run the Project\n\n#### Start the Flask Backend from the `backend/` directory\n```sh\npython app.py  # Ensure the virtual environment is activated\n```\n\n#### Start the Next.js Frontend from the `frontend/` directory\n```sh\nnpm run dev  # Runs the frontend on localhost:3000\n```\n\nNow, open your browser and go to **http://localhost:3000** to start using WebScrap AI! 
🚀\n\n---\n\n## 🛠 Tools Used  \n\n\u003col\u003e\n  \u003cli\u003eVisual Studio Code\u003c/li\u003e\n  \u003cli\u003eNext.js\u003c/li\u003e\n  \u003cli\u003eTypeScript\u003c/li\u003e\n  \u003cli\u003eTailwind CSS\u003c/li\u003e\n  \u003cli\u003eFlask\u003c/li\u003e\n  \u003cli\u003eBeautifulSoup (Web Scraping)\u003c/li\u003e\n  \u003cli\u003eDuckDuckGo Search API\u003c/li\u003e\n  \u003cli\u003eYouTube Transcript API\u003c/li\u003e\n  \u003cli\u003eGemini API (AI Processing)\u003c/li\u003e\n  \u003cli\u003eGit \u0026 GitHub (Version Control)\u003c/li\u003e\n\u003c/ol\u003e\n\n---\n\n## 🔗 Link to Tools  \n\n\u003cp align=\"left\"\u003e\n\u003ca href=\"https://code.visualstudio.com\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://www.vectorlogo.zone/logos/visualstudio_code/visualstudio_code-icon.svg\" alt=\"Visual Studio Code\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://nextjs.org/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://marcbruederlin.gallerycdn.vsassets.io/extensions/marcbruederlin/next-icons/0.1.0/1723747598319/Microsoft.VisualStudio.Services.Icons.Default\" alt=\"Next.js\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://www.typescriptlang.org/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/devicons/devicon/master/icons/typescript/typescript-original.svg\" alt=\"TypeScript\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://tailwindcss.com/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://www.vectorlogo.zone/logos/tailwindcss/tailwindcss-icon.svg\" alt=\"Tailwind CSS\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://flask.palletsprojects.com/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg 
src=\"https://play-lh.googleusercontent.com/ekpyJiZppMBBxCR5hva9Zz1pr3MYlFP-vWTYR3eIU7HOMAmg3jCJengHJ1GFgFMyyYc\" alt=\"Flask\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://www.crummy.com/software/BeautifulSoup/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://cdn-icons-png.flaticon.com/512/1348/1348781.png\" alt=\"BeautifulSoup\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://duckduckgo.com/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://cdn-llcdl.nitrocdn.com/QAgOfWkPLJQEZBsznqhKTXqQaWtXlbkU/assets/images/optimized/rev-f21cbe9/direction.com/wp-content/uploads/2023/05/duckduckgo.png\" alt=\"DuckDuckGo\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://developers.google.com/youtube/v3/docs/captions\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://upload.wikimedia.org/wikipedia/commons/e/ef/Youtube_logo.png\" alt=\"YouTube Transcript API\" width=\"40\" /\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://ai.google.dev/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://pipedream.com/s.v0/app_ArhjGP/logo/orig\" alt=\"Gemini API\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://git-scm.com/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://www.vectorlogo.zone/logos/git-scm/git-scm-icon.svg\" alt=\"Git\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://github.com/\" target=\"_blank\" rel=\"noreferrer\"\u003e\n  \u003cimg src=\"https://uxwing.com/wp-content/themes/uxwing/download/brands-and-social-media/github-white-icon.png\" alt=\"GitHub\" width=\"40\" height=\"40\"/\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n## 👨‍💻 Developer  \n\n\u003cul\u003e\n  \u003cli\u003e\u003ca 
href=\"https://github.com/QwertyFusion\"\u003e[@QwertyFusion]\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqwertyfusion%2Fweb-scraper-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqwertyfusion%2Fweb-scraper-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqwertyfusion%2Fweb-scraper-python/lists"}