An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with webscraper

A curated list of projects in awesome lists tagged with webscraper.

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 14 May 2025

https://github.com/jaypyles/scraperr

Self-hosted webscraper.

opensource self-hosted webscraper

Last synced: 14 May 2025

https://github.com/jaypyles/Scraperr

Self-hosted webscraper.

opensource self-hosted webscraper

Last synced: 17 Jul 2025

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 15 May 2025

https://github.com/onepointAI/onepoint

An AI assistant tool that integrates coding, writing, and reading functions. For better alternatives see https://monica.im/desktop

ai all-in-one chatgpt coding electron gpt-35-turbo macos react reading toolkit webscraper xiaoai-tts

Last synced: 24 Mar 2025

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/mehmetozkaya/DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 18 Apr 2025

https://github.com/tbosak/mkfd

RSS feed builder created with Bun🥖 and Hono🔥 that builds feeds from webpages, email folders, and REST API calls.

bun bunjs contributors-welcome docker dockerfile dockerhub feed help-wanted hono honojs mkfd rss rss-generator rssfeed scraper self-hosted typescript webscraper

Last synced: 12 Apr 2025

https://github.com/brandonrobertz/autoscrape-py

An automated, programming-free web scraper for interactive sites

data-journalism scraper selenium webscraper

Last synced: 02 Sep 2025

https://github.com/A-Wheeto/Dashboard

A tkinter GUI collating various data

apis dashboard gui tkinter webscraper webscraping

Last synced: 29 Mar 2025

https://github.com/mirkoschubert/gdpr-cli

A command line tool for checking your website for GDPR compliance.

cli command-line-tool dsgvo gdpr metadata webscraper

Last synced: 04 May 2025

https://github.com/makeyourlifeeasier/wuxiaworld-2-ebook

This Python script downloads chapters from novels available on wuxiaworld.com and saves them in the .epub format.

ebook ebook-downloader epub python python-3-6 webnovel webscraper wuxiaworld

Last synced: 09 Apr 2025

https://github.com/zoranpandovski/bookingscraper

🌎 🏨 Scrape Booking.com 🏨 🌎

beautifulsoup booking python3 request scraper web-scraping webscraper webscraping

Last synced: 20 Sep 2025

https://github.com/giuseppegambino/Scraping-TripAdvisor-with-Python-2020

Python implementation of TripAdvisor web scraping with Selenium, targeting the new 2019 website

python selenium tripadvisor tripadvisor-scraper tripadvisorreview webscraper webscraper-website webscraping

Last synced: 08 Apr 2025

https://github.com/3xploitguy/webscrape

A web scraper to scrape emails and phone numbers from websites.

bash-script webscraper webscraping

Last synced: 16 Aug 2025

https://github.com/JonathanVusich/pcpartpicker

This is an unofficial API for the website pcpartpicker.com.

pcpartpicker pip pypi python3 webscraper

Last synced: 07 Apr 2025

https://github.com/jonathanvusich/pcpartpicker

This is an unofficial API for the website pcpartpicker.com.

pcpartpicker pip pypi python3 webscraper

Last synced: 27 Mar 2025

https://github.com/daijro/searchifyx

Fast flashcard searcher study tool

education quizizz quizlet scraper webscraper webscraping

Last synced: 10 Jul 2025

https://github.com/s-r-e-e-r-a-j/webextractor

WebExtractor is a powerful OSINT and ethical hacking tool developed in Python. It is used to extract email addresses, phone numbers, and links (both visible and hidden) from a target website

bugbounty-tool emailscraper linkscraper linux osint osint-python osint-tool phonenumber-scrapping termux termux-tool webscraper webscraping

Last synced: 26 Jun 2025

https://github.com/boringppl/linkedin-profiles-scraping

Automatically scrape the web data of people's profiles on LinkedIn based on a specific search query

beautifulsoup beautifulsoup4 python python3 selenium selenium-webdriver webscraper webscraping webscraping-data webscrapper webscrapping

Last synced: 29 Jul 2025

https://github.com/daijro/SearchifyX

Fast flashcard searcher study tool

education quizizz quizlet scraper webscraper webscraping

Last synced: 08 Jul 2025

https://github.com/anouarbensaad/wsvuls

wsvuls - website vulnerability scanner that detects issues (outdated server software and insecure HTTP headers).

detection issues-tracker scanner tracker vulnerability vulnerability-scanners webscraper

Last synced: 17 Jun 2025

https://github.com/xtream1101/scraperx

Library for scraping websites or apis at any scale

framework library python python-library scrapers webscraper

Last synced: 04 Jul 2025

https://github.com/build-on-aws/bedrock-agents-webscraper

This repo provides guidance on setting up a Bedrock agent to scrape the web and search the internet via action groups

ai-agent amazon-bedrock bedrock-agent claude internet-search webscraper

Last synced: 10 Apr 2025

https://github.com/dusanmadar/scrapemeagain

Yet another Python web scraping application

anonymity asyncio docker privoxy python sqlite tor webscraper

Last synced: 24 Apr 2025

https://github.com/architrixs/wattpad2epub

Python script to scrape a Wattpad story and convert it to EPUB and HTML files. Easiest to use.

beautifulsoup4 ebooks epub html pypandoc pyperclip python python-script pythonscript requests wattpad wattpad-book wattpad-download wattpad-epub webscraper webscraping

Last synced: 21 Mar 2025

https://github.com/areebbeigh/anime-scraper

[partially working] Scrape and add anime episode stream URLs to uGet (Linux) or IDM (Windows) ~ Python3

anime anime-downloader anime-scraper chrome chrome-devtools downloader fun idm otaku python3 scraping stream stream-url streaming uget webscraper webscraping

Last synced: 25 Mar 2025

https://github.com/FastestMolasses/fBrowser

Helpful Selenium functions to make web-scraping easier and faster

helper-functions python3 selenium selenium-functions selenium-python webscraper webscraping

Last synced: 09 Jul 2025

https://github.com/PadishahIII/SecretScraper

SecretScraper is a web scraper that crawls through target websites, scrapes the HTTP responses, and extracts secret information via regular expressions.

crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper

Last synced: 29 Jul 2025

https://github.com/ccrsxx/autobsi

Attendance script for My Best BSI

automation chromedriver python schedule selenium webscraper

Last synced: 05 Sep 2025

https://github.com/fastestmolasses/fbrowser

Helpful Selenium functions to make web-scraping easier and faster

helper-functions python3 selenium selenium-functions selenium-python webscraper webscraping

Last synced: 24 Mar 2025

https://github.com/antonengelhardt/kicktipp-bot

A bot which can submit tips for a Kicktipp competition based on betting odds.

bot kicktipp python selenium selenium-python sportbetting sports webscraper

Last synced: 10 Apr 2025

https://github.com/code-yeongyu/trackpurchase

Scrape your payment history from various shopping platforms with just a few lines of code!

crawlwer puppeteer webcrawler webscraper webscraping

Last synced: 14 Aug 2025

https://github.com/lntechnical2/webscrap-bot

Simple Telegram webscrap bot

telegram-bot webscraper

Last synced: 09 Jul 2025

https://github.com/agenty/scrapingai

Build web scraping agents using AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web with Agenty

crawler crawling datascraping extract-data scraping webscraper webscraping

Last synced: 12 Apr 2025

https://github.com/ddayto21/lead-scraper

This repository contains a web crawler that searches for emails in a webpage, along with a web scraping script that collects leads from various webpages online, filters those links based on some criteria, and adds the new links to a queue. All the HTML or some specific information is extracted to be processed by a different pipeline.

beautifulsoup4 python requests webcrawler webscraper yellow-pages

Last synced: 03 Sep 2025

https://github.com/zoranpandovski/prodirectscraper

👔 Web scraper for http://www.prodirectselect.com/ 👞

python scraper scrapy scrapy-crawler scrapy-spider spider webscraper webscraping

Last synced: 07 Aug 2025

https://github.com/gamemann/how-to-use-selenium-and-beautifulsoup

A full lab and how-to guide on how to use Selenium paired with Beautiful Soup to parse and extract data from a website using Python.

beautifulsoup beautifulsoup4 bs4 firefox geckodriver node nodejs python react selenium selenium-python selenium-webdriver webscraper webscraping

Last synced: 25 Sep 2025

https://github.com/gamemann/How-To-Use-Selenium-And-BeautifulSoup

A full lab and how-to guide on how to use Selenium paired with Beautiful Soup to parse and extract data from a website using Python.

beautifulsoup beautifulsoup4 bs4 firefox geckodriver node nodejs python react selenium selenium-python selenium-webdriver webscraper webscraping

Last synced: 24 Oct 2025

https://github.com/ebeagusamuel/ruby_capstone

A simple web scraper built with Ruby and the Nokogiri gem. It crawls a certain website and gets the prices and other data of cryptocurrencies. Rspec was used for testing.

nokogiri rspec ruby webscraper

Last synced: 17 Mar 2025

https://github.com/oshekharo/link-crawler

Get movie and TV show streams by IMDB, TMDB.

imdb-api oshekher tmdb-api webscraper

Last synced: 23 Feb 2025

https://github.com/mustansirzia/serverless-link-preview

A serverless, scalable service to get website description and preview deployed on AWS Lambda.

aws aws-lambda expressjs faas javascript microservice nodejs serverless up webscraper

Last synced: 28 Jul 2025

https://github.com/geminidsystems/googlenewsscraper

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper

Last synced: 13 Aug 2025

https://github.com/armour/pixiv-spider

🕷 Whoa, let's dig through pixiv, hahaha (Web scraper for pixiv)

jarvis pixiv python spider webscraper

Last synced: 22 Jul 2025

https://github.com/dimitryzub/ecommerce-scraper-py

Scrape ecommerce websites such as Amazon, eBay, Walmart, Home Depot, Google Shopping from a single module in Python🐍

data datamining ecommerce ecommerce-website python python3 selectolax selenium serpapi webscraper webscraping

Last synced: 03 Sep 2025

https://github.com/mandarwagh9/web-scraper

This project is a Flask-based web application designed to scrape various types of content from a specified URL.

flask flask-application pytho webscraper webscraping

Last synced: 12 Oct 2025

https://github.com/sukhcha-in/dart_web_scraper

Powerful, easy-to-use scraper for web pages and APIs. Chain parsers and transforms to extract exactly the data you need.

htmlparser jsonparser parser parsing scraper scraping webscraper webscraping

Last synced: 22 Oct 2025

https://github.com/eugen1j/aioscrapy

Python asynchronous library for web scraping

asyncio crawler python-crawler python37 webscraper

Last synced: 09 Oct 2025

https://github.com/dimitryzub/webscraping-py

Web Scraping scripts for all Google, other search engines, and other websites (currently outdated, something may not be working).

api bs4 data google-maps-api googleapi googlescraping googlesearchapi lxml parsel playwright python requests scraper scraping scrapy selenium webscraper webscraping webscraping-data webscraping-search

Last synced: 12 Aug 2025

https://github.com/bjoern-hempel/php-web-crawler

A PHP class that crawls a given URL and recursively collects some data from it. The final representation is a JSON object.

crawler mit-license php recursive webcrawler webscraper xpath

Last synced: 11 Apr 2025

https://github.com/0xarchit/duckduckgo-webscraper

Basic Python-based web scraper that uses rotating proxies from free, working-only proxy lists (updated every 60 minutes): https://webscrape.0xcloud.workers.dev/?key=test&query=

proxy proxy-list proxylist webscraper webscraping

Last synced: 24 Dec 2025

https://github.com/pptacher/web_scraper

Book an appointment at a city hall via the paris.fr website to have your passport or ID issued. Reserve an appointment at a Paris arrondissement town hall in a few minutes to submit your passport or identity card application.

beautifulsoup browser-automation http libcurl paris re2 selenium webscraper webscraping

Last synced: 12 Apr 2025

https://github.com/reppon97/soccer-data-api

An easy-to-use Python web scraping package that gets soccer (football) data/stats as JSON.

beautifulsoup football football-data leagues python python3 soccer webscraper webscraping

Last synced: 04 Oct 2025

https://github.com/viveckh/qarecewebcrawler

This web crawler gathers the latest details, variations, imagery, and pricing information for a catalog of products, given their URLs, from their corresponding online stores, and prepares files ready for upload to your e-commerce platform. It was built to make product additions easier for e-commerce retailers.

catalog e-commerce ecommerce-platfrom ecommerce-retailers find-products imagery kyliejenner macys pricing-informations qarece-web-crawler scrapy sephora startup-code startup-resources startup-template startups webscraper woocommerce woocommerce-extension wordpress-plugin

Last synced: 19 Mar 2025

https://github.com/swapnanildutta/coronavirusdatabase

I used web scraping to collect the data and stored it in a .json file, then used the .json file to populate an SQLite database. I am also trying to build an API using Flask.

api hacktoberfest hacktoberfest2020 python3 sqlite webscraper

Last synced: 25 Apr 2025

https://github.com/xvertile/flux

Flux is a powerful tool designed to monitor proxy providers across the industry, analyzing response times, uptime, outgoing IPs, and more. With Flux, you can uncover the true performance and integrity of proxy providers, ensuring you're working with reliable data and not falling for misleading claims.

big-data monitoring proxies proxies-scraper residential-proxy webscraper webscraping

Last synced: 18 Mar 2025

https://github.com/ahmard/queliwrap

QueryList PHP web scraper wrapper

php querylist webcrawler webscraper

Last synced: 18 Mar 2025

https://github.com/knmn2000/manipalfeesfiasco

An autoretweet bot to help #ManipalFeesFiasco reach trending.

automatic autoretweet bot python scraper scraping scrapy selenium tweet twitter webscraper

Last synced: 11 Apr 2025

https://github.com/raminmammadzada/ruby-web-scraper

A web scraper that helps individuals or companies find out which kinds of products are recommended more than others in a specific category.

rspec ruby scraper webscraper

Last synced: 22 Jun 2025

https://github.com/ishan-surana/metadatascraper

MetaDataScraper is a Python package designed to automate the extraction of follower counts and post details from a public Facebook page. It uses Selenium WebDriver for web automation and scraping. Official documentation at https://metadatascraper.readthedocs.io

facebook meta no-api no-login python-library python-package scraper webscraper

Last synced: 28 Apr 2025

https://github.com/oliverhennhoefer/quant

UNMAINTAINED | R-package providing access to fundamental data and valuation metrics for thousands of publicly traded companies worldwide.

finance fundamental-analysis gurufocus market-research quantitative-analysis quantitative-finance quantitative-research r r-package r-programming r-stats stock stock-analysis stock-data stock-market stock-trading stocks tipranks webscraper webscraping

Last synced: 13 Apr 2025

https://github.com/mertgunduz/proxified

Proxified is a proxy-list creator that scrapes data from hidemy.name.

csharp dotnet proxy webscraper

Last synced: 28 Aug 2025

https://github.com/funbeedev/trelloboardsautoexport

A web scraper script to automate JSON export of Trello boards. Useful for backing up all your Trello boards automatically.

autoschedule crontab python selenium-python trello trello-boards webscraper

Last synced: 21 Jun 2025

https://github.com/deveshsangwan/image-scraper

This is a Python program that downloads images from a given webpage.

chromedriver edgedriver hacktoberfest hacktoberfest2021 image-scraper python python3 selenium-webdriver webscraper

Last synced: 28 Oct 2025

https://github.com/sjrusso8/pga-scraper

🐍 Scrapy crawler for the PGA 2020 stats

golf python scrapy-crawler stats tabular-data webscraper

Last synced: 13 Apr 2025

https://github.com/amajji/web-scraping-with-scrapy-

This project aims to scrape a US government website using the Scrapy framework

scraper scraping scraping-websites scrapper scrapy webscraper webscraping

Last synced: 24 Sep 2025

https://github.com/michaelcurrin/html-screenshot-py

Take fullpage screenshots for a batch of URLs with this easy CLI tool

html image python screenshot selenium webscraper

Last synced: 21 Mar 2025

https://github.com/milind220/yahoo-scraper

A Python webscraper that gets data on the Hang Seng Index from Yahoo Finance and outputs it into an Excel file.

beautifulsoup4 python3 requests-python webscraper webscraping yahoo-finance

Last synced: 26 Oct 2025

https://github.com/lennart1978/picturescrape

Collect pictures from a website and download them - powered by Colly and Fyne GUI.

golang image webscraper

Last synced: 05 May 2025

https://github.com/manigandand/goscrapper

Web scraper in Go

go golang webscraper

Last synced: 25 Feb 2025

https://github.com/dimitryzub/py-google-scholar-organic-cite-to-csv-sqlite

Scrape historic Google Scholar Organic and Cite results to CSV and SQLite using Python and SerpApi.

csv data dataextraction datamining datascience datascraping dataset google googlescholar python scraper serpapi sqlite webscraper webscraping

Last synced: 14 Aug 2025

https://github.com/grayhat12/gray-scraper

Instagram comment scraper for a post

instagram py selenium-python selenium-webdriver webscraper

Last synced: 18 Oct 2025

https://github.com/aditeyabaral/newsnow

Automated document merging and extractive summarization of news articles

news nlp python-3 python-3-6 python3 python36 sklearn sklearn-vectorizer text-processing webscrape webscraper webscraping

Last synced: 09 Apr 2025