Projects in Awesome Lists tagged with website-scraper

https://github.com/website-scraper/node-website-scraper

Download website to local directory (including all css, images, js, etc.)

hacktoberfest javascript nodejs scraper website-scraper

Last synced: 13 May 2025

https://github.com/goclone-dev/goclone

Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.

cloning crawler go golang website-cloner website-scraper

Last synced: 14 May 2025

https://github.com/imthaghost/goclone

Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.

cloning crawler go golang website-cloner website-scraper

Last synced: 24 Mar 2025

https://github.com/josephlimtech/linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

crawler crawling expressjs json linkedin linkedin-bot linkedin-crawler linkedin-profile linkedin-profile-scraper linkedin-scraper linkedin-scraping nodejs profile-data puppeteer scraper scrapers scraping scraping-websites spider website-scraper

Last synced: 04 Apr 2025

https://github.com/z0m31en7/uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 15 May 2025

https://github.com/z0m31en7/Uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 05 May 2025

https://github.com/website-scraper/website-scraper-puppeteer

Plugin for website-scraper that returns HTML for dynamic websites using Puppeteer

chrome chromium hacktoberfest javascript nodejs puppeteer scraper website-scraper

Last synced: 15 May 2025

https://github.com/Kooboo/Kooboo

CMS, WebSite, Application and Ecommerce Development Tool Using JavaScript

cms development javascript kooboo magento shopify templates web-application-platform website-builder website-development website-scraper wordpress

Last synced: 24 Mar 2025

https://github.com/kooboo/kooboo

CMS, WebSite, Application and Ecommerce Development Tool Using JavaScript

cms development javascript kooboo magento shopify templates web-application-platform website-builder website-development website-scraper wordpress

Last synced: 08 Apr 2025

https://github.com/erlange/wbm-dl

Wayback Machine Downloader. 🔥 Download your entire archived websites from the Internet Archive Wayback Machine.

command-line-app command-line-parser command-line-tool console console-app console-application csharp internet internet-archive internet-wayback-machine wayback-machine wayback-machine-downloader website-scraper

Last synced: 05 Apr 2025

https://github.com/xarantolus/Collect

A server to collect & archive websites that also supports video downloads

archive self-hosted video-downloader web-archiving webinterface website-archive website-scraper

Last synced: 10 May 2025

https://github.com/LexiestLeszek/scrapeGPT

ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.

crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper

Last synced: 07 Apr 2025

https://github.com/lexiestleszek/scrapegpt

ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.

crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper

Last synced: 11 Mar 2026

https://github.com/html2rss/html2rss-web

🕸 Generates and delivers RSS feeds via HTTP. Docker image available! Create your own feeds or get started quickly with the included configs.

builder docker feed feed-configs html2rss html2rss-configs roda rolling-release rss rss-aggregator rss-feed rss-feed-scraper ruby scraper serves webfeed webfeeds website-scraper

Last synced: 14 Mar 2025

https://github.com/xarantolus/collect

A server to collect & archive websites that also supports video downloads

archive self-hosted video-downloader web-archiving webinterface website-archive website-scraper

Last synced: 23 Apr 2025

https://github.com/shurco/goClone

🌱 goClone - clone websites in seconds

cloner cloning crawler crawling go goclone golang hacktoberfest scraping scraping-websites scrapper website-cloner website-scraper wp2static

Last synced: 05 May 2025

https://github.com/yuis-ice/jseval

Evaluate JavaScript on a URL through headless Chrome browser.

browser-automation cli-utilities cmdline command-line commandline-interface data-scraping datascraping eval evaluator headless-browser headless-browsers pupeteer scrapers scrapper scrapping web-browser web-crawling web-scrapping webscrapping website-scraper

Last synced: 11 Apr 2025

https://github.com/faheel/file-extensions

JSON collection of scraped file extensions, along with their description and type, from FileInfo.com

file-extensions fileinfo json python3 scraped-data scraper website-scraper

Last synced: 01 Mar 2026

https://github.com/jeanrauwers/followers-scraper-serverless

Now you can keep track of your followers from YouTube, Instagram and Twitter accounts - Followers scraper API on AWS serverless

aws aws-lambda aws-serverless followers-scraper instagram instagram-scraper instagramscraper lambda nodejs-lambda scraper twitter twitter-scraper twittersc typescript webscraper webscraper-api webscraping website-scraper youtube

Last synced: 10 Apr 2025

https://github.com/website-scraper/website-scraper-existing-directory

Plugin for website-scraper that allows saving resources to an existing directory

hacktoberfest javascript nodejs website-scraper

Last synced: 22 Apr 2025

https://github.com/orangmuda/SECTOOL

Search engine scraper tool (Bash)

crawler crawling scraper website-scraper

Last synced: 22 May 2026

https://github.com/dsc8x/node-scraper

Scraping websites made easy! A minimalistic yet powerful tool for collecting data from websites.

axios cheerio javascript node scraper scraping website-scraper

Last synced: 01 Apr 2025

https://github.com/dtflare/GPTparser

Use GPTparser with your OpenAI API to scrape & parse files into structured JSON files.

dataset-creation json-mode json-parser openai-api-chatbot website-scraper

Last synced: 14 Mar 2025

https://github.com/nigeld3v/Tumblr_Image_scrape

Download ALL the images (JPEG/GIF/PNG) from any Tumblr website! This project employs Python3 and BeautifulSoup4 to scrape a Tumblr site (with the URL provided by the user) and download, page by page, all the images from the site's posts. Ideal for archiving other people's Tumblrs <3

archive art beautifulsoup beautifulsoup4 blog blogging comics design fashion gif gifs graphics graphics-library image images scraper tumblr tumblr-image-scrape webcomics website-scraper

Last synced: 03 Apr 2025

https://github.com/codassassin/website-url-scraper

This is a website URL scraper built using Python.

url-finder url-parser website-scanner website-scraper

Last synced: 22 Jul 2025

https://github.com/methyldragon/news-anacrawler

Article Dataset Generator for Internet News Sites. Crawls news sites, analyses them with NLP (sentiment analysis), and pushes to a database.

dataset-generation jupyter-notebook python3 scraping script website-scraper

Last synced: 15 Jul 2025

https://github.com/tnytcoder/url_checker

Python Script To Verify URL Existence And Provide Basic Information

hacking-tool python requests termux termux-hacking termux-tools web-scraping website-scraper

Last synced: 13 Apr 2025

https://github.com/oceanside-chess/email-scraper

Scrape emails from a website using recursive crawling and anti-obfuscation techniques, then validate all addresses before saving them to a file.

bot email-extraction email-extractor email-scraper email-validation go go-package golang spider web-crawler web-scraper web-scraping web-scraping-software website-scraper

Last synced: 17 Mar 2026

https://github.com/zebbern/reconx

🕷️ | ReconX is a Live-Website Crawler made to gather critical information with an option to take a picture of each site crawled!

crawler hacking information-gathering information-retrieval information-security livedata opsec osint osint-tool pentest python python-crawler search-engine security security-tools website website-crawler website-scraper website-security

Last synced: 03 Jul 2025

https://github.com/hudson-newey/website-text-extractor

This is a project to systematically extract all readable text out of a web page (only works on very primitive pages at the moment)

reader-mode text text-classification text-processing website-scraper

Last synced: 16 Jun 2026

https://github.com/maheshpaulj/Manga-Scraper

Manga scraping tool made in Python. It fetches the manga page from the website, downloads it in JPG format, and saves it locally. This is basically web scraping.

manga-scraper python web-scraping website-scraper

Last synced: 29 Aug 2025

https://github.com/chorozon666/chad

Chad - Dorking / Website Vulnerability Tool

dork dorking dorking-target dorking-tool dorks google python scrapper scrapper-bot scrapper-script sql vulnerability-detection vulnerability-scanners website-scraper wordpress wordpress-site

Last synced: 27 Jan 2026

https://github.com/arif98741/deadlink-checker-python

A Python tool to crawl websites and check for broken/dead links with detailed reporting in both text and PDF formats.

crawler crawling python python3 website-scraper

Last synced: 18 Apr 2026

https://github.com/maheshpaulj/manga-scraper

Manga scraping tool made in Python. It fetches the manga page from the website, downloads it in JPG format, and saves it locally. This is basically web scraping.

manga-scraper python web-scraping website-scraper

Last synced: 15 May 2025

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 06 Feb 2026

https://github.com/ccrashzer0/reconnaissance_scanner

This scanner will allow you to collect information on your target site.

linux python-3 python3 reconnaissance script scripting-language scripts udemy-tutorial webscraping website-scraper

Last synced: 17 May 2026

https://github.com/tinram/login-spider

Spider through a website login and process the pages behind it.

log-in login login-spider python scraper spider website website-scraper

Last synced: 17 Aug 2025

https://github.com/maradotwebp/ra-reader

⬇️ A program created to scrape website data from pmg.ages.at, the Austrian register of pesticides.

files java ra-reader scraper scraping urlsession website-scraper xml

Last synced: 03 Oct 2025

https://github.com/developer-sumit/web-groper-python

WebGroper is a Python class designed to recursively scrape and download media files (images, PDFs, etc.) from a specified website directory, such as the /wp-content/uploads directory of a WordPress site.

package python-package web-scr web-scraping website-scraper wordpress-scraper wordpress-website-scraper

Last synced: 19 Feb 2026

https://github.com/methyldragon/fb_embedded_comment_scraper

A scraper for gathering data from Facebook's embedded comment widgets for all pages on any number of URLs! It bypasses the Facebook graph API (you don't need an access token) so there's little risk of throttling.

dataset-generation python3 scraping script website-scraper

Last synced: 17 Mar 2025

https://github.com/jmitander/jmscraper

Scrape web pages and effortlessly extract the data you need. Easy, robust, efficient, and intuitively user-friendly.

extract-data extract-media extract-metadata extractor scraping scraping-web scraping-websites webscraper webscraping website-scraper webtool

Last synced: 06 Sep 2025

https://github.com/linusrachlis/fringr2-scraper

Scraper for show info and performance times on the Toronto Fringe website. Used for linusrachlis/fringr2-fe

theater theatre toronto website-scraper

Last synced: 22 Jan 2026

https://github.com/i2rys/smtsaloaw

Simple module to scrape links on a website.

i2rys nodejs-scrape-website-module smtsaloaw website-links-module website-links-scraper website-scraper

Last synced: 24 Aug 2025

https://github.com/tkemza/webscrap

Webscrap, an automated tool for monitoring website responses, built for educational purposes—use responsibly!

website website-scraper websitescanner

Last synced: 28 Feb 2025

https://github.com/sharmadhiraj/web_scraper_php_goutte

Web Scraper with Goutte (PHP)

data-crawling goutte php scraper scrapy-crawler website-scraper

Last synced: 28 Mar 2025

https://github.com/masakudamatsu/site-search-chatbot

A powerful, open-source RAG (Retrieval-Augmented Generation) chatbot designed to provide a "NotebookLM-like" search experience for any specific website. Unlike generic search bars, it understands natural language queries and provides conversational answers with direct citations to the source pages.

embedding rag-chatbot search webapp website-scraper

Last synced: 02 Apr 2026