Projects in Awesome Lists tagged with webscraper

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 14 May 2025

https://github.com/jaypyles/scraperr

Self-hosted webscraper.

opensource self-hosted webscraper

Last synced: 14 May 2025

https://github.com/jaypyles/Scraperr

Self-hosted webscraper.

opensource self-hosted webscraper

Last synced: 17 Jul 2025

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 15 May 2025

https://github.com/scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

antibot automation captcha-bypass crawler crawling crawling-python datascraping proxies python python-scraper scraper scraping scraping-python spider twitter-scraper web-crawler web-scraping web-scraping-python webscraper webscraping

Last synced: 11 Apr 2025

https://github.com/rootviii/proxy_requests

A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)

http http-get http-getter http-proxy http-proxy-middleware proxy proxy-list proxy-requests proxy-server python python-requests python3 recursion recursion-problem requests requests-module webscraper webscraper-api webscraping

Last synced: 02 Apr 2025

https://github.com/rootVIII/proxy_requests

A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)

http http-get http-getter http-proxy http-proxy-middleware proxy proxy-list proxy-requests proxy-server python python-requests python3 recursion recursion-problem requests requests-module webscraper webscraper-api webscraping

Last synced: 22 Mar 2025

https://github.com/salimk/rcrawler

An R web crawler and scraper

crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping

Last synced: 12 Apr 2025

https://github.com/salimk/Rcrawler

An R web crawler and scraper

crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping

Last synced: 14 Mar 2025

https://github.com/onepointAI/onepoint

An AI assistant tool that integrates coding, writing, and reading functions. For better alternatives see https://monica.im/desktop

ai all-in-one chatgpt coding electron gpt-35-turbo macos react reading toolkit webscraper xiaoai-tts

Last synced: 24 Mar 2025

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/mehmetozkaya/DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 18 Apr 2025

https://github.com/tbosak/mkfd

RSS feed builder created with Bun🥖 and Hono🔥- builds from webpages, email folders, and REST API calls.

bun bunjs contributors-welcome docker dockerfile dockerhub feed help-wanted hono honojs mkfd rss rss-generator rssfeed scraper self-hosted typescript webscraper

Last synced: 12 Apr 2025

https://github.com/curiouslearner/geeksforgeeksscrapper

Scrapes GeeksforGeeks (g4g) and creates a PDF

geeksforgeeks hacktoberfest pdf scrapper webscraper webscraping

Last synced: 07 May 2025

https://github.com/aliakhtari78/spotifyscraper

Spotify scraper that extracts all the information from Spotify and downloads MP3s with the song's cover

album-title crawler free infromation preview-mp3 python python3 scraper spotfiy spotify-crawler spotify-downloader spotify-scraper spotify-scraping spotify-songs spotify-web-player webscraper webscraping

Last synced: 09 Apr 2025

https://github.com/nmcassa/letterboxdpy

A letterboxd webscraper

api json letterboxd library movie movies python webscraper

Last synced: 10 May 2026

https://github.com/brandonrobertz/autoscrape-py

An automated, programming-free web scraper for interactive sites

data-journalism scraper selenium webscraper

Last synced: 02 Sep 2025

https://github.com/s-r-e-e-r-a-j/webextractor

WebExtractor is a powerful OSINT and ethical hacking tool developed in Python. It is used to extract email addresses, phone numbers, and links from a target website

bugbounty-tool emailscraper information-gathering information-gathering-tool information-gathering-tools informationgathering linkscraper linux osint osint-python osint-tool osint-tools phonenumber-scrapping reconaissance termux termux-tool webscraper webscraping

Last synced: 29 Apr 2026

https://github.com/A-Wheeto/Dashboard

A tkinter GUI collating various data

apis dashboard gui tkinter webscraper webscraping

Last synced: 29 Mar 2025

https://github.com/mirkoschubert/gdpr-cli

A command line tool for checking your website for GDPR compliance.

cli command-line-tool dsgvo gdpr metadata webscraper

Last synced: 04 May 2025

https://github.com/makeyourlifeeasier/wuxiaworld-2-ebook

This Python script downloads chapters from novels available on wuxiaworld.com and saves them in the .epub format

ebook ebook-downloader epub python python-3-6 webnovel webscraper wuxiaworld

Last synced: 09 Apr 2025

https://github.com/giuseppegambino/Scraping-TripAdvisor-with-Python-2020

Python implementation of TripAdvisor web scraping with Selenium, for the new 2019 website

python selenium tripadvisor tripadvisor-scraper tripadvisorreview webscraper webscraper-website webscraping

Last synced: 08 Apr 2025

https://github.com/zoranpandovski/bookingscraper

:earth_americas: :hotel: Scrape Booking.com :hotel: :earth_americas:

beautifulsoup booking python3 request scraper web-scraping webscraper webscraping

Last synced: 20 Sep 2025

https://github.com/tech-engine/goscrapy

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.

data-extraction go-scrapy golang goscraper scrapy spider web-crawler webscraper webscrapping

Last synced: 18 Jan 2026

https://github.com/3xploitguy/webscrape

A web scraper to scrape emails and phone numbers from websites.

bash-script webscraper webscraping

Last synced: 16 Aug 2025

https://github.com/jonathanvusich/pcpartpicker

This is an unofficial API for the website pcpartpicker.com.

pcpartpicker pip pypi python3 webscraper

Last synced: 27 Mar 2025

https://github.com/JonathanVusich/pcpartpicker

This is an unofficial API for the website pcpartpicker.com.

pcpartpicker pip pypi python3 webscraper

Last synced: 07 Apr 2025

https://github.com/daijro/searchifyx

Fast flashcard searcher study tool

education quizizz quizlet scraper webscraper webscraping

Last synced: 10 Jul 2025

https://github.com/boringppl/linkedin-profiles-scraping

Automatically scrape the web data of people's profiles on LinkedIn based on a specific search query

beautifulsoup beautifulsoup4 python python3 selenium selenium-webdriver webscraper webscraping webscraping-data webscrapper webscrapping

Last synced: 06 Mar 2026

https://github.com/daijro/SearchifyX

Fast flashcard searcher study tool

education quizizz quizlet scraper webscraper webscraping

Last synced: 08 Jul 2025

https://github.com/nunux-keeper/keeper-core-api

Nunux Keeper core API

content-curation content-management nodejs restful-api webscraper webscraping

Last synced: 05 Apr 2025

https://github.com/ryfeus/gcf-packs

Library packs for google cloud functions

chrome chromium cloud functions gcf gcf-packs gcp google numpy pandas selenium serverless serverless-architectures tensorflow webscraper

Last synced: 14 Oct 2025

https://github.com/daijro/EssayGen

Essay generator

bot essay essay-generation python scraper tor webscraper webscraping

Last synced: 18 Jul 2025

https://github.com/anouarbensaad/wsvuls

wsvuls - website vulnerability scanner that detects issues (outdated server software and insecure HTTP headers).

detection issues-tracker scanner tracker vulnerability vulnerability-scanners webscraper

Last synced: 17 Jun 2025

https://github.com/xtream1101/scraperx

Library for scraping websites or apis at any scale

framework library python python-library scrapers webscraper

Last synced: 04 Jul 2025

https://github.com/daijro/essaygen

Essay generator

bot essay essay-generation python scraper tor webscraper webscraping

Last synced: 21 Jul 2025

https://github.com/build-on-aws/bedrock-agents-webscraper

This repo provides guidance on setting up a Bedrock agent to scrape the web and search the internet via action groups

ai-agent amazon-bedrock bedrock-agent claude internet-search webscraper

Last synced: 10 Apr 2025

https://github.com/dusanmadar/scrapemeagain

Yet another Python web scraping application

anonymity asyncio docker privoxy python sqlite tor webscraper

Last synced: 24 Apr 2025

https://github.com/architrixs/wattpad2epub

Python script to scrape a Wattpad story and convert it to EPUB and HTML files. Easiest to use.

beautifulsoup4 ebooks epub html pypandoc pyperclip python python-script pythonscript requests wattpad wattpad-book wattpad-download wattpad-epub webscraper webscraping

Last synced: 21 Mar 2025

https://github.com/j4nn0/linkedin-web-scraper

Python web scraper for LinkedIn that collects company data (e.g. name, description, industry) and stores it in an .xls file

openpyxl python-excel python-web-scraper scraper scraping-websites scrapy scrapy-crawler scrapy-demo scrapy-spider scrapy-tutorial selenium selenium-python selenium-webdriver webscraper webscraper-api webscraper-website webscraping webscraping-search

Last synced: 14 Aug 2025

https://github.com/marcel0024/cococrawler

A declarative and easy-to-use web crawler and scraper in C#

cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper

Last synced: 10 Apr 2025

https://github.com/areebbeigh/anime-scraper

[partially working] Scrape and add anime episode stream URLs to uGet (Linux) or IDM (Windows) ~ Python3

anime anime-downloader anime-scraper chrome chrome-devtools downloader fun idm otaku python3 scraping stream stream-url streaming uget webscraper webscraping

Last synced: 25 Mar 2025

https://github.com/packet-sent/web-scrape-worker

Web scraper using Cloudflare Workers

cloudflare cloudflare-worker cloudflare-workers nodejs scraper webscraper

Last synced: 18 Jan 2026

https://github.com/paulseperformance/cryptopanic_scraper

cryptopanic news feed scraper

chromedriver crypto jupyter notebook python python3 selenium selenium-python selenium-webdriver webscraper webscraping

Last synced: 14 Apr 2025

https://github.com/oshekharo/link-crawler

Get movie and TV show streams by IMDB or TMDB.

imdb-api oshekher tmdb-api webscraper

Last synced: 26 Feb 2026

https://github.com/henr1ko/pixthief

Stealthy .NET 8 console tool that crawls pages or whole domains and downloads images with optional format conversion.

console-application crawler csharp dotnet http-client image-downloader web-scraper web-scraping webscraper win-x64 windows

Last synced: 15 May 2026

https://github.com/FastestMolasses/fBrowser

Helpful Selenium functions to make web-scraping easier and faster

helper-functions python3 selenium selenium-functions selenium-python webscraper webscraping

Last synced: 09 Jul 2025

https://github.com/PadishahIII/SecretScraper

SecretScraper is a web scraper that crawls through target websites, scrapes HTTP responses, and extracts secret information via regular expressions.

crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper

Last synced: 29 Jul 2025

https://github.com/ccrsxx/autobsi

Attendance script for My Best BSI

automation chromedriver python schedule selenium webscraper

Last synced: 05 Sep 2025

https://github.com/fastestmolasses/fbrowser

Helpful Selenium functions to make web-scraping easier and faster

helper-functions python3 selenium selenium-functions selenium-python webscraper webscraping

Last synced: 24 Mar 2025

https://github.com/antonengelhardt/kicktipp-bot

A bot which can submit tips for a Kicktipp competition based on quotes.

bot kicktipp python selenium selenium-python sportbetting sports webscraper

Last synced: 10 Apr 2025

https://github.com/jeanrauwers/followers-scraper-serverless

Now you can keep track of your followers from YouTube, Instagram and Twitter accounts - Followers scraper API on AWS serverless

aws aws-lambda aws-serverless followers-scraper instagram instagram-scraper instagramscraper lambda nodejs-lambda scraper twitter twitter-scraper twittersc typescript webscraper webscraper-api webscraping website-scraper youtube

Last synced: 10 Apr 2025

https://github.com/scrape-do/scrapedo-scrapers

Web scraping examples with Scrape.do 😎

antibot crawler datascraping proxies python scraper spider web-crawler web-scraping webscraper

Last synced: 12 Jun 2026

https://github.com/lntechnical2/webscrap-bot

Simple Telegram webscrap bot

telegram-bot webscraper

Last synced: 09 Jul 2025

https://github.com/code-yeongyu/trackpurchase

Scrape payment histories from various shopping platforms with just a few lines of code!

crawlwer puppeteer webcrawler webscraper webscraping

Last synced: 14 Aug 2025

https://github.com/hhhrrrttt222111/selenium_python

beatifulsoup beautifulsoup-library beautifulsoup4 chromedriver geckodriver hacktoberfest pycharm-ide python python-parser python-requests python-scraper requests-html scraping-python scraping-websites selenium selenium-python selenium-webdriver spiders webscraper webscraping

Last synced: 23 Oct 2025

https://github.com/dotnize/moodle-scrape

Easily scrape data from Moodle LMS sites

javascript lms moodle moodle-scrape moodle-scraper nodejs scraper web-scraper webscraper

Last synced: 15 Mar 2026

https://github.com/agenty/scrapingai

Build web scraping agents using AI to auto-extract the data from websites, capture screenshot, generate pdf from URL and web crawling with Agenty

crawler crawling datascraping extract-data scraping webscraper webscraping

Last synced: 12 Apr 2025

https://github.com/gamemann/how-to-use-selenium-and-beautifulsoup

A full lab and how-to guide on how to use Selenium paired with Beautiful Soup to parse and extract data from a website using Python.

beautifulsoup beautifulsoup4 bs4 firefox geckodriver node nodejs python react selenium selenium-python selenium-webdriver webscraper webscraping

Last synced: 25 Sep 2025

https://github.com/zoranpandovski/prodirectscraper

:necktie: Web scraper for http://www.prodirectselect.com/ :shoe:

python scraper scrapy scrapy-crawler scrapy-spider spider webscraper webscraping

Last synced: 07 Aug 2025

https://github.com/ddayto21/lead-scraper

Repository contains a web crawler that searches for emails in a webpage, along with a webscraping script that collects leads from various webpages online, filters those links based on some criteria, and adds the new links to a queue. All the HTML or some specific information is extracted to be processed by a different pipeline.

beautifulsoup4 python requests webcrawler webscraper yellow-pages

Last synced: 03 Sep 2025

https://github.com/gamemann/How-To-Use-Selenium-And-BeautifulSoup

A full lab and how-to guide on how to use Selenium paired with Beautiful Soup to parse and extract data from a website using Python.

beautifulsoup beautifulsoup4 bs4 firefox geckodriver node nodejs python react selenium selenium-python selenium-webdriver webscraper webscraping

Last synced: 24 Oct 2025

https://github.com/ebeagusamuel/ruby_capstone

A simple web scraper built with Ruby and the Nokogiri gem. It crawls a certain website and gets the prices and other data of cryptocurrencies. Rspec was used for testing.

nokogiri rspec ruby webscraper

Last synced: 17 Mar 2025

https://github.com/znitche/ao3-web-reader

ao3 scraper with web interface

ao3 python3 webscraper

Last synced: 16 Jan 2026

https://github.com/geminidsystems/googlenewsscraper

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper

Last synced: 13 Aug 2025

https://github.com/mustansirzia/serverless-link-preview

A serverless, scalable service to get website description and preview deployed on AWS Lambda.

aws aws-lambda expressjs faas javascript microservice nodejs serverless up webscraper

Last synced: 28 Jul 2025

https://github.com/dimitryzub/ecommerce-scraper-py

Scrape ecommerce websites such as Amazon, eBay, Walmart, Home Depot, Google Shopping from a single module in Python🐍

data datamining ecommerce ecommerce-website python python3 selectolax selenium serpapi webscraper webscraping

Last synced: 03 Sep 2025

https://github.com/armour/pixiv-spider

🕷 Let's dig through pixiv, hahaha (Web scraper for pixiv)

jarvis pixiv python spider webscraper

Last synced: 22 Jul 2025

https://github.com/mandarwagh9/web-scraper

This project is a Flask-based web application designed to scrape various types of content from a specified URL.

flask flask-application pytho webscraper webscraping

Last synced: 12 Oct 2025

https://github.com/dimitryzub/webscraping-py

Web Scraping scripts for all Google, other search engines, and other websites (currently outdated, something may not be working).

api bs4 data google-maps-api googleapi googlescraping googlesearchapi lxml parsel playwright python requests scraper scraping scrapy selenium webscraper webscraping webscraping-data webscraping-search

Last synced: 12 Aug 2025

https://github.com/eugen1j/aioscrapy

Python asynchronous library for web scraping

asyncio crawler python-crawler python37 webscraper

Last synced: 09 Oct 2025

https://github.com/sukhcha-in/dart_web_scraper

Powerful, easy-to-use scraper for web pages and APIs. Chain parsers and transforms to extract exactly the data you need.

htmlparser jsonparser parser parsing scraper scraping webscraper webscraping

Last synced: 22 Oct 2025

https://github.com/sabsar42/google-map-scrapper-streamlit-web

Google Map Business Scraper

business google-maps webscraper

Last synced: 14 Oct 2025

https://github.com/bjoern-hempel/php-web-crawler

A PHP class that crawls a given URL and recursively collects some data from it. The final representation is a JSON object.

crawler mit-license php recursive webcrawler webscraper xpath

Last synced: 11 Apr 2025

https://github.com/0xarchit/duckduckgo-webscraper

Basic Python-based webscraper that uses rotating proxies from free, working-only proxy lists (updated every 60 minutes): https://webscrape.0xcloud.workers.dev/?key=test&query=

proxy proxy-list proxylist webscraper webscraping

Last synced: 24 Dec 2025

https://github.com/pptacher/web_scraper

Book an appointment at a city hall on the paris.fr website to have your passport or ID issued. Reserve your appointment at a Paris district town hall (mairie d'arrondissement) in a few minutes to submit your passport or identity card application.

beautifulsoup browser-automation http libcurl paris re2 selenium webscraper webscraping

Last synced: 12 Apr 2025

https://github.com/palahsu/ScraperImages

Simple Web Images Scraper From Websites!

google-photos-download google-scraper google-scraping image-scraper imagescraper imagescraping scraper scraping scraping-python web-scraper webscraper webscraper-website

Last synced: 09 Jul 2025

https://github.com/reppon97/soccer-data-api

An easy-to-use Python web scraping package that gets soccer (football) data/stats as JSON.

beautifulsoup football football-data leagues python python3 soccer webscraper webscraping

Last synced: 06 Apr 2026

https://github.com/palahsu/scraperimages

Simple Web Images Scraper From Websites!

google-photos-download google-scraper google-scraping image-scraper imagescraper imagescraping scraper scraping scraping-python web-scraper webscraper webscraper-website

Last synced: 24 Apr 2025

https://github.com/ikp4success/shopasource

Easiest way to find the best lowest-priced products online.

async celery collection css data data-mining flask flask-sqlalchemy html javascript json postgresql python python3 quart scrapy spider spiders webscraper webscraping

Last synced: 07 Sep 2025

https://github.com/viveckh/qarecewebcrawler

This web crawler gathers the latest details, variations, imagery, and pricing information for a catalog of products, given their URLs, from their corresponding online stores and prepares files ready for upload to your e-commerce platform. It was built with the purpose of making product additions easier for e-commerce retailers.

catalog e-commerce ecommerce-platfrom ecommerce-retailers find-products imagery kyliejenner macys pricing-informations qarece-web-crawler scrapy sephora startup-code startup-resources startup-template startups webscraper woocommerce woocommerce-extension wordpress-plugin

Last synced: 19 Mar 2025

https://github.com/zembrodt/pymdb

Python package to both parse datasets provided by IMDb and scrape information from imdb.com

actor actress api cinema composer director film imdb imdb-api imdb-dataset imdb-movies movie-database moviedb-api movies movies-api pymdb tvdb webscraper webscrapping writer

Last synced: 07 Apr 2026

https://github.com/swapnanildutta/coronavirusdatabase

I used web scraping to collect the data and stored it in a .json file, then used the .json file to populate an SQLite database. I am also trying to make an API using Flask.

api hacktoberfest hacktoberfest2020 python3 sqlite webscraper

Last synced: 25 Apr 2025

https://github.com/xvertile/flux

Flux is a powerful tool designed to monitor proxy providers across the industry, analyzing response times, uptime, outgoing IPs, and more. With Flux, you can uncover the true performance and integrity of proxy providers, ensuring you're working with reliable data and not falling for misleading claims.

big-data monitoring proxies proxies-scraper residential-proxy webscraper webscraping

Last synced: 18 Mar 2025

https://github.com/ahmard/queliwrap

QueryList PHP web scraper wrapper

php querylist webcrawler webscraper

Last synced: 18 Mar 2025

https://github.com/knmn2000/manipalfeesfiasco

An autoretweet bot to help #ManipalFeesFiasco reach trending.

automatic autoretweet bot python scraper scraping scrapy selenium tweet twitter webscraper

Last synced: 11 Apr 2025

https://github.com/ishan-surana/metadatascraper

MetaDataScraper is a Python package designed to automate the extraction of follower counts and post details from a public Facebook page. It uses Selenium WebDriver for web automation and scraping. Official documentation at https://metadatascraper.readthedocs.io

facebook meta no-api no-login python-library python-package scraper webscraper