An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with web-scraping

A curated list of projects in awesome lists tagged with web-scraping.

https://github.com/scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

crawler crawling framework hacktoberfest python scraping web-scraping web-scraping-python

Last synced: 05 Jan 2026

https://github.com/dgtlmoon/changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Designed for simplicity: monitor which websites had a text change, for free. Also covers website defacement monitoring and price change notification.

back-in-stock change-alert change-detection change-monitoring changedetection monitoring notifications restock-monitor self-hosted url-monitor web-scraping website-change-detection website-change-detector website-change-monitor website-change-notification website-change-tracker website-defacement-monitoring website-monitor website-monitoring website-watcher

Last synced: 12 May 2025

https://github.com/apifytech/apify-js

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping

Last synced: 06 Jul 2025

https://github.com/apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping

Last synced: 03 Nov 2025

https://github.com/getmaxun/maxun

🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥

agents api automation browser browser-automation data-extraction no-code no-code-web-scraper playwright robotic-process-automation rpa scraper self-hosted web-agent web-automation web-scraper web-scraping web-scraping-agent webscraping website-to-api

Last synced: 04 Jan 2026

https://github.com/evil0ctal/douyin_tiktok_download_api

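🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.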

api async crawler douyin douyin-api douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi no-watermark online-parsing python pywebio scraper spider tiktok tiktok-api tiktok-scraper tiktok-signature web-scraping

Last synced: 12 May 2025

https://github.com/Evil0ctal/Douyin_TikTok_Download_API

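🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.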

api async crawler douyin douyin-api douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi no-watermark online-parsing python pywebio scraper spider tiktok tiktok-api tiktok-scraper tiktok-signature web-scraping

Last synced: 26 Mar 2025

https://github.com/apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation beautifulsoup crawler crawling hacktoberfest headless headless-chrome pip playwright python scraper scraping web-crawler web-crawling web-scraping

Last synced: 03 Nov 2025

https://github.com/D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath

Last synced: 13 May 2025

https://github.com/firecrawl/firecrawl-mcp-server

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

batch-processing claude content-extraction data-collection firecrawl firecrawl-ai javascript-rendering llm-tools mcp mcp-server model-context-protocol search-api web-crawler web-scraping

Last synced: 13 Nov 2025

https://github.com/lexiforest/curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

akamai-fingerprint curl curl-impersonate fingerprinting http http-client http2-fingerprint https ja3 ja3-fingerprint tls-fingerprint web-scraping

Last synced: 14 May 2025

https://github.com/snooppr/snoop

Snoop: a reconnaissance tool based on open-source data (OSINT world)

blueteam ctf geo geocoder infosec ip nickname osint parser pentest police redteam scanner scraping security termux username username-checker username-search web-scraping

Last synced: 14 May 2025

https://github.com/a9t9/rpa

Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use/LLM. Selenium IDE import/export.

anthropic anthropic-claude browser-automation browser-extension computer-use data-driven-tests imacros selenium-ide web-automation web-scraping

Last synced: 16 May 2025

https://github.com/tidyverse/rvest

Simple web scraping for R

html r web-scraping

Last synced: 16 Dec 2025

https://github.com/roach-php/core

The complete web scraping toolkit for PHP.

crawling php web-scraping

Last synced: 13 May 2025

https://github.com/gosom/google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.

distributed-scraper distributed-scraping golang google-maps google-maps-scraping web-scraper web-scraping

Last synced: 28 Dec 2025

https://github.com/rushter/selectolax

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

css html5 modest-engine parser python web-scraping

Last synced: 13 May 2025

https://github.com/A9T9/RPA

UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++

autohotkey automation browser-automation browser-extension data-driven-tests imacros opencv selenium-ide sikulix ui-tests uipath visual-recognition web-automation web-scraping webassembly

Last synced: 22 Mar 2025

https://github.com/intoli/user-agents

A JavaScript library for generating random user agents with data that's updated daily.

browser-automation browsers javascript navigator random randomization user-agent user-agent-spoofer web-scraping

Last synced: 13 May 2025

https://github.com/platonai/pulsarRPA

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.

ai-agents ai-crawler ai-rpa ai-scrarper crawler rpa scraper scraping web-crawler web-scraping

Last synced: 01 Apr 2025

https://github.com/postmodern/spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

crawler ruby scraper spider spider-links web web-crawler web-scraper web-scraping web-spider

Last synced: 13 May 2025

https://github.com/DataHenHQ/till

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

crawler man-in-the-middle mitm proxy-server scraper scraping web-scraping

Last synced: 15 Mar 2025

https://github.com/je-suis-tm/web-scraping

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

bloomberg data-scraper data-scraping financial-data financial-times futures futures-historical-data news-scraper news-websites newsletter options-data python-web-scraper reuters scrapper sraping wall-street-journal wallstreetbets web-scraper web-scrapers web-scraping

Last synced: 04 Apr 2025

https://github.com/gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

archiving cli crawler deno dockerfile nodejs scraping-websites single-file web-archiving web-crawler web-scraper web-scraping

Last synced: 15 May 2025

https://github.com/rebrowser/rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

automation bot bot-detection chrome chromedriver cloudflare crawler crawling datadome headless headless-chrome playwright puppeteer puppeteer-extra rebrowser scraping selenium stealth web-scraping webdriver

Last synced: 14 May 2025

https://github.com/tinyfish-io/agentql

AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at scale. Includes REST API, Python and JavaScript SDKs, browser debugger.

agent ai aiagent automation javascript playwright python rpa scraping web web-scraping web-scraping-colabs web-scraping-javascript web-scraping-python web-scrapping webagent

Last synced: 15 May 2025

https://github.com/alecxe/scrapy-fake-useragent

Random User-Agent middleware based on fake-useragent

python scrapy web-scraping

Last synced: 15 May 2025

https://github.com/dinubs/coolqlcool

Next.js server to query websites with GraphQL

graphql javascript nextjs schema web-scraping

Last synced: 04 Apr 2025

https://github.com/z0m31en7/uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 15 May 2025

https://github.com/z0m31en7/Uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 05 May 2025

https://github.com/oxylabs/how-to-scrape-google-scholar

A guide for extracting titles, authors, and citations from Google Scholar using Python and Oxylabs SERP Scraper API.

google-scholar google-scholar-scraper google-scholar-scrapper google-search-scraper python python-scraper scraper-api web-scraper web-scraping

Last synced: 15 May 2025

https://github.com/oxylabs/quick-start-guide

Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.

oxylabs scraper scraper-api scraper-python scrapers scraping scraping-websites web-scraper web-scraping

Last synced: 06 Jul 2025

https://github.com/austinoboyle/scrape-linkedin-selenium

`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.

linkedin python scrape scraper scraping selenium selenium-webdriver web-scraper web-scraping

Last synced: 04 Apr 2025

https://github.com/oxylabs/how-to-scrape-amazon-prices

Code for extracting best-selling items, search results, and currently available deals from Amazon using Python and Oxylabs E-Commerce Scraper API.

amazon amazon-scraper api python python-scraper scraper-api web-scraper web-scraping

Last synced: 16 May 2025

https://github.com/sangaline/wayback-machine-scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

archive-dot-org command-line-tool python wayback-archiver wayback-machine web-scraping

Last synced: 13 Apr 2025

https://github.com/roniemartinez/dude

dude (uncomplicated data extraction): a simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 16 Mar 2025

https://github.com/yusuzech/r-web-scraping-cheat-sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.

cheatsheet httr r rselenium rvest scrape-websites web-scraping webscraping

Last synced: 20 Apr 2025

https://github.com/City-Bureau/city-scrapers

Scrape, standardize and share public meetings from local government websites

city-scrapers open-data python scrapy web-scraping

Last synced: 07 Apr 2025

https://github.com/serpapi/nokolexbor

High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.

c-extension css html5 parser ruby serpapi web-scraping xpath

Last synced: 15 May 2025

https://github.com/deedy5/primp

🪞PRIMP (Python Requests IMPersonate). The fastest Python HTTP client that can impersonate web browsers

akamai fingerprint http http-client https impersonate ja3 ja4 python requests tls tls-client web-scraping

Last synced: 17 Aug 2025

https://github.com/web-agent-master/google-search

A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches. Local alternative to SERP APIs with MCP server integration.

ai google-search llm mcp-server web-scraping

Last synced: 06 Sep 2025

https://github.com/walissonsilva/web-scraping-python

🌐 Repository with the content (slides, examples, code) from the YouTube video series on Web Scraping with Python.

beautifulsoup python requests selenium web-scraping

Last synced: 28 Mar 2025

https://github.com/infinilabs/crawler

🕷️ An easy-to-use spider written in Golang (previously named GOPA).

crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider

Last synced: 06 Apr 2025

https://github.com/oxylabs/python-web-scraping-tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 16 May 2025

https://github.com/vdutts7/gpt4V-scraper

AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.

ai-agents browser-automation gpt-4-vision puppeteer web-scraping

Last synced: 06 Apr 2025

https://github.com/vdutts7/gpt4v-scraper

AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.

ai-agents browser-automation gpt-4-vision puppeteer web-scraping

Last synced: 09 Apr 2025

https://github.com/amoudgl/short-jokes-dataset

Python scripts for building 'Short Jokes' dataset, featured on Kaggle

beautiful-soup dataset humor jokes oneliners python scrapers web-scraping

Last synced: 03 Apr 2025

https://github.com/tuhinpal/imdb-api

Serverless IMDB API powered by Cloudflare Worker

cloudflare-worker cloudflare-workers hono honojs imdb imdb-api movie-list web-scraping

Last synced: 08 Jul 2025

https://github.com/roach-php/laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP.

crawling laravel php web-scraping

Last synced: 11 Apr 2025