Projects in Awesome Lists tagged with website-scraper
A curated list of projects in awesome lists tagged with website-scraper.
https://github.com/website-scraper/node-website-scraper
Download website to local directory (including all css, images, js, etc.)
hacktoberfest javascript nodejs scraper website-scraper
Last synced: 13 May 2025
https://github.com/goclone-dev/goclone
Website Cloner - Utilizes powerful Go routines to clone websites to your computer within seconds.
cloning crawler go golang website-cloner website-scraper
Last synced: 14 May 2025
https://github.com/imthaghost/goclone
Website Cloner - Utilizes powerful Go routines to clone websites to your computer within seconds.
cloning crawler go golang website-cloner website-scraper
Last synced: 24 Mar 2025
https://github.com/josephlimtech/linkedin-profile-scraper-api
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.
crawler crawling expressjs json linkedin linkedin-bot linkedin-crawler linkedin-profile linkedin-profile-scraper linkedin-scraper linkedin-scraping nodejs profile-data puppeteer scraper scrapers scraping scraping-websites spider website-scraper
Last synced: 04 Apr 2025
https://github.com/z0m31en7/uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites
Last synced: 15 May 2025
https://github.com/z0m31en7/Uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites
Last synced: 05 May 2025
https://github.com/website-scraper/website-scraper-puppeteer
Plugin for website-scraper that returns HTML for dynamic websites using Puppeteer
chrome chromium hacktoberfest javascript nodejs puppeteer scraper website-scraper
Last synced: 15 May 2025
https://github.com/Kooboo/Kooboo
CMS, WebSite, Application and Ecommerce Development Tool Using JavaScript
cms development javascript kooboo magento shopify templates web-application-platform website-builder website-development website-scraper wordpress
Last synced: 24 Mar 2025
https://github.com/kooboo/kooboo
CMS, WebSite, Application and Ecommerce Development Tool Using JavaScript
cms development javascript kooboo magento shopify templates web-application-platform website-builder website-development website-scraper wordpress
Last synced: 08 Apr 2025
https://github.com/erlange/wbm-dl
Wayback Machine Downloader. 🔥 Download your entire archived websites from the Internet Archive Wayback Machine.
command-line-app command-line-parser command-line-tool console console-app console-application csharp internet internet-archive internet-wayback-machine wayback-machine wayback-machine-downloader website-scraper
Last synced: 05 Apr 2025
https://github.com/xarantolus/Collect
A server to collect & archive websites that also supports video downloads
archive self-hosted video-downloader web-archiving webinterface website-archive website-scraper
Last synced: 10 May 2025
https://github.com/LexiestLeszek/scrapeGPT
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.
crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper
Last synced: 07 Apr 2025
https://github.com/lexiestleszek/scrapegpt
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.
crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper
Last synced: 11 Mar 2026
https://github.com/html2rss/html2rss-web
🕸 Generates and delivers RSS feeds via HTTP. Docker image available! Create your own feeds or get started quickly with the included configs.
builder docker feed feed-configs html2rss html2rss-configs roda rolling-release rss rss-aggregator rss-feed rss-feed-scraper ruby scraper serves webfeed webfeeds website-scraper
Last synced: 14 Mar 2025
https://github.com/xarantolus/collect
A server to collect & archive websites that also supports video downloads
archive self-hosted video-downloader web-archiving webinterface website-archive website-scraper
Last synced: 23 Apr 2025
https://github.com/shurco/goClone
🌱 goClone - clone websites in seconds
cloner cloning crawler crawling go goclone golang hacktoberfest scraping scraping-websites scrapper website-cloner website-scraper wp2static
Last synced: 05 May 2025
https://github.com/yuis-ice/jseval
Evaluate JavaScript on a URL through headless Chrome browser.
browser-automation cli-utilities cmdline command-line commandline-interface data-scraping datascraping eval evaluator headless-browser headless-browsers pupeteer scrapers scrapper scrapping web-browser web-crawling web-scrapping webscrapping website-scraper
Last synced: 11 Apr 2025
https://github.com/faheel/file-extensions
JSON collection of scraped file extensions, along with their description and type, from FileInfo.com
file-extensions fileinfo json python3 scraped-data scraper website-scraper
Last synced: 01 Mar 2026
https://github.com/jeanrauwers/followers-scraper-serverless
Now you can keep track of your followers from YouTube, Instagram and Twitter accounts - Followers scraper API on AWS serverless
aws aws-lambda aws-serverless followers-scraper instagram instagram-scraper instagramscraper lambda nodejs-lambda scraper twitter twitter-scraper twittersc typescript webscraper webscraper-api webscraping website-scraper youtube
Last synced: 10 Apr 2025
https://github.com/website-scraper/website-scraper-existing-directory
Plugin for website-scraper that allows saving resources to an existing directory
hacktoberfest javascript nodejs website-scraper
Last synced: 22 Apr 2025
https://github.com/orangmuda/SECTOOL
sᴇᴀʀᴄʜ ᴇɴɢɪɴᴇ sᴄʀᴀᴘᴇʀ ᴛᴏᴏʟ (ʙᴀsʜ)
crawler crawling scraper website-scraper
Last synced: 22 May 2026
https://github.com/dsc8x/node-scraper
Scraping websites made easy! A minimalistic yet powerful tool for collecting data from websites.
axios cheerio javascript node scraper scraping website-scraper
Last synced: 01 Apr 2025
https://github.com/dtflare/GPTparser
Use GPTparser with your OpenAI API to scrape & parse files into structured JSON files.
dataset-creation json-mode json-parser openai-api-chatbot website-scraper
Last synced: 14 Mar 2025
https://github.com/nigeld3v/Tumblr_Image_scrape
Download ALL the images (JPEG/GIF/PNG) from any Tumblr website! This project uses Python3 and BeautifulSoup4 to scrape a Tumblr site (with the URL provided by the user) and download, page by page, all the images from the site's posts. Ideal for archiving other people's Tumblrs <3
archive art beautifulsoup beautifulsoup4 blog blogging comics design fashion gif gifs graphics graphics-library image images scraper tumblr tumblr-image-scrape webcomics website-scraper
Last synced: 03 Apr 2025
https://github.com/codassassin/website-url-scraper
A website URL scraper built using Python.
url-finder url-parser website-scanner website-scraper
Last synced: 22 Jul 2025
https://github.com/methyldragon/news-anacrawler
Article Dataset Generator for Internet News Sites. Crawls news sites, analyses them with NLP (sentiment analysis), and pushes to a database.
dataset-generation jupyter-notebook python3 scraping script website-scraper
Last synced: 15 Jul 2025
https://github.com/tnytcoder/url_checker
Python Script To Verify Url Existence And Provide Basic Information
hacking-tool python requests termux termux-hacking termux-tools web-scraping website-scraper
Last synced: 13 Apr 2025
https://github.com/oceanside-chess/email-scraper
Scrape emails from a website using recursive crawling, the best anti-obfuscation techniques, and validate all addresses before saving to a file.
bot email-extraction email-extractor email-scraper email-validation go go-package golang spider web-crawler web-scraper web-scraping web-scraping-software website-scraper
Last synced: 17 Mar 2026
https://github.com/zebbern/reconx
🕷️ | ReconX is a Live-Website Crawler made to gather critical information with an option to take a picture of each site crawled!
crawler hacking information-gathering information-retrieval information-security livedata opsec osint osint-tool pentest python python-crawler search-engine security security-tools website website-crawler website-scraper website-security
Last synced: 03 Jul 2025
https://github.com/hudson-newey/website-text-extractor
This is a project to systematically extract all readable text out of a web page (only works on very primitive pages at the moment)
reader-mode text text-classification text-processing website-scraper
Last synced: 16 Jun 2026
https://github.com/maheshpaulj/Manga-Scraper
Manga scraping tool written in Python. It fetches manga pages from the website, downloads them in JPG format, and saves them locally using basic web scraping.
manga-scraper python web-scraping website-scraper
Last synced: 29 Aug 2025
https://github.com/chorozon666/chad
Chad - Dorking / Website Vulnerability Tool
dork dorking dorking-target dorking-tool dorks google python scrapper scrapper-bot scrapper-script sql vulnerability-detection vulnerability-scanners website-scraper wordpress wordpress-site
Last synced: 27 Jan 2026
https://github.com/arif98741/deadlink-checker-python
A Python tool to crawl websites and check for broken/dead links with detailed reporting in both text and PDF formats.
crawler crawling python python3 website-scraper
Last synced: 18 Apr 2026
https://github.com/maheshpaulj/manga-scraper
Manga scraping tool written in Python. It fetches manga pages from the website, downloads them in JPG format, and saves them locally using basic web scraping.
manga-scraper python web-scraping website-scraper
Last synced: 15 May 2025
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 06 Feb 2026
https://github.com/ccrashzer0/reconnaissance_scanner
This scanner lets you collect information on your target site.
linux python-3 python3 reconnaissance script scripting-language scripts udemy-tutorial webscraping website-scraper
Last synced: 17 May 2026
https://github.com/tinram/login-spider
Spider through a website login and process the pages behind it.
log-in login login-spider python scraper spider website website-scraper
Last synced: 17 Aug 2025
https://github.com/maradotwebp/ra-reader
⬇️ A program created to scrape website data from pmg.ages.at, the Austrian register of pesticides.
files java ra-reader scraper scraping urlsession website-scraper xml
Last synced: 03 Oct 2025
https://github.com/developer-sumit/web-groper-python
WebGroper is a Python class designed to recursively scrape and download media files (images, PDFs, etc.) from a specified website directory, such as the /wp-content/uploads directory of a WordPress site.
package python-package web-scr web-scraping website-scraper wordpress-scraper wordpress-website-scraper
Last synced: 19 Feb 2026
https://github.com/methyldragon/fb_embedded_comment_scraper
A scraper for gathering data from Facebook's embedded comment widgets for all pages on any number of URLs! It bypasses the Facebook graph API (you don't need an access token) so there's little risk of throttling.
dataset-generation python3 scraping script website-scraper
Last synced: 17 Mar 2025
https://github.com/jmitander/jmscraper
Scrape web pages and effortlessly extract the data you need. Easy, robust, efficient, and intuitively user-friendly.
extract-data extract-media extract-metadata extractor scraping scraping-web scraping-websites webscraper webscraping website-scraper webtool
Last synced: 06 Sep 2025
https://github.com/linusrachlis/fringr2-scraper
Scraper for show info and performance times on the Toronto Fringe website. Used for linusrachlis/fringr2-fe
theater theatre toronto website-scraper
Last synced: 22 Jan 2026
https://github.com/i2rys/smtsaloaw
Simple module to scrape links on a website.
i2rys nodejs-scrape-website-module smtsaloaw website-links-module website-links-scraper website-scraper
Last synced: 24 Aug 2025
https://github.com/tkemza/webscrap
Webscrap, an automated tool for monitoring website responses, built for educational purposes—use responsibly!
website website-scraper websitescanner
Last synced: 28 Feb 2025
https://github.com/sharmadhiraj/web_scraper_php_goutte
Web Scraper with Goutte (PHP)
data-crawling goutte php scraper scrapy-crawler website-scraper
Last synced: 28 Mar 2025
https://github.com/masakudamatsu/site-search-chatbot
A powerful, open-source RAG (Retrieval-Augmented Generation) chatbot designed to provide a "NotebookLM-like" search experience for any specific website. Unlike generic search bars, it understands natural language queries and provides conversational answers with direct citations to the source pages.
embedding rag-chatbot search webapp website-scraper
Last synced: 02 Apr 2026
https://github.com/hoccyy/webgrabb
A simple and efficient tool written in Go for fetching and saving websites.
go golang golang-application golang-examples webscraper webscraping website-scraper
Last synced: 10 Jan 2026
https://github.com/abhinav-codealchemist/parsehub
Website Scraper
java javafx jsoup website-scraper
Last synced: 17 May 2026
https://github.com/youstinus/car-scrape
Scrapes website content, stores it in a sqlite3 database, and downloads the preview picture
pprof scraper sqlite3 sqlite3-database webpage-scraper website-scraper
Last synced: 03 Oct 2025
https://github.com/ashutoshsce/nys-liquor-authority
Directly scrape records from https://www.tran.sla.ny.gov/JSP/query/PublicQueryAdvanceSearchPage.jsp and present them in datatable format
datatable expressjs mongodb node nys-liquor-authority puppeteer vuejs website-scraper
Last synced: 27 Jun 2025
https://github.com/david0z/wayback-machine-downloader
Terminal UI tool for collective download of resources from Archive.org Wayback Machine
archive-org console-app console-application internet-archive internet-wayback-machine wayback-machine wayback-machine-downloader website-scraper
Last synced: 29 Oct 2025