Projects in Awesome Lists tagged with web-scraping
A curated list of projects in awesome lists tagged with web-scraping.
https://github.com/scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
crawler crawling framework hacktoberfest python scraping web-scraping web-scraping-python
Last synced: 05 Jan 2026
https://github.com/dgtlmoon/changedetection.io
The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
back-in-stock change-alert change-detection change-monitoring changedetection monitoring notifications restock-monitor self-hosted url-monitor web-scraping website-change-detection website-change-detector website-change-monitor website-change-notification website-change-tracker website-defacement-monitoring website-monitor website-monitoring website-watcher
Last synced: 12 May 2025
https://github.com/ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
ai ai-scraping automated-scraper crawler html-to-markdown llm markdown rag scraping scraping-python web-crawler web-crawlers web-scraping
Last synced: 17 Oct 2025
https://github.com/scrapegraphai/scrapegraph-ai
Python scraper based on AI
ai ai-scraping automated-scraper crawler html-to-markdown llm markdown rag scraping scraping-python web-crawler web-crawlers web-scraping
Last synced: 05 Jan 2026
https://github.com/apifytech/apify-js
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping
Last synced: 06 Jul 2025
https://github.com/apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping
Last synced: 03 Nov 2025
https://github.com/getmaxun/maxun
🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥
agents api automation browser browser-automation data-extraction no-code no-code-web-scraper playwright robotic-process-automation rpa scraper self-hosted web-agent web-automation web-scraper web-scraping web-scraping-agent webscraping website-to-api
Last synced: 04 Jan 2026
https://github.com/evil0ctal/douyin_tiktok_download_api
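🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.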
api async crawler douyin douyin-api douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi no-watermark online-parsing python pywebio scraper spider tiktok tiktok-api tiktok-scraper tiktok-signature web-scraping
Last synced: 12 May 2025
https://github.com/Evil0ctal/Douyin_TikTok_Download_API
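🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.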
api async crawler douyin douyin-api douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi no-watermark online-parsing python pywebio scraper spider tiktok tiktok-api tiktok-scraper tiktok-signature web-scraping
Last synced: 26 Mar 2025
https://github.com/seleniumbase/seleniumbase
Python APIs for web automation, testing, and bypassing bot-detection.
anti-detection behave bot-detection cdp chromedriver cloudflare-bypass e2e-testing pytest pytest-plugin python python-scraper selenium selenium-python seleniumbase test-automation web-automation web-scraping web-scraping-python webdriver webkit
Last synced: 30 Dec 2025
https://github.com/seleniumbase/SeleniumBase
Python APIs for web automation, testing, and bypassing bot-detection.
anti-detection behave bot-detection cdp chromedriver cloudflare-bypass e2e-testing pytest pytest-plugin python python-scraper selenium selenium-python seleniumbase test-automation web-automation web-scraping web-scraping-python webdriver webkit
Last synced: 26 Mar 2025
https://github.com/mherrmann/selenium-python-helium
Lighter web automation with Python
chrome firefox helium python python3 selenium selenium-python web-automation web-scraping webdriver
Last synced: 17 Aug 2025
https://github.com/mherrmann/helium
Lighter web automation with Python
chrome firefox helium python python3 selenium selenium-python web-automation web-scraping webdriver
Last synced: 13 May 2025
https://github.com/alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
ai artificial-intelligence automation crawler machine-learning python scrape scraper scraping web-scraping webautomation webscraping
Last synced: 13 May 2025
https://github.com/go-rod/rod
A Chrome DevTools Protocol driver for web automation and scraping.
automation cdp chrome-devtools chrome-devtools-protocol chrome-headless crawling devtools devtools-protocol go golang gorod headless rod scraper testing web web-scraping
Last synced: 15 May 2025
https://github.com/apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation beautifulsoup crawler crawling hacktoberfest headless headless-chrome pip playwright python scraper scraping web-crawler web-crawling web-scraping
Last synced: 03 Nov 2025
https://github.com/D4Vinci/Scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath
Last synced: 13 May 2025
https://github.com/firecrawl/firecrawl-mcp-server
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
batch-processing claude content-extraction data-collection firecrawl firecrawl-ai javascript-rendering llm-tools mcp mcp-server model-context-protocol search-api web-crawler web-scraping
Last synced: 13 Nov 2025
https://github.com/lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
akamai-fingerprint curl curl-impersonate fingerprinting http http-client http2-fingerprint https ja3 ja3-fingerprint tls-fingerprint web-scraping
Last synced: 14 May 2025
https://github.com/php-curl-class/php-curl-class
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
api api-client class client curl framework http http-client http-proxy json php php-curl php-curl-library proxy requests restful web-scraper web-scraping web-service xml
Last synced: 13 May 2025
https://github.com/adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
article-extractor corpus corpus-builder corpus-tools crawler html-to-markdown html2text news news-aggregator news-crawler nlp readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping
Last synced: 24 Dec 2025
https://github.com/d4vinci/scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath
Last synced: 13 May 2025
https://github.com/x4nth055/pythoncode-tutorials
The Python Code Tutorials
computer-vision ethical-hacking face-detection machine-learning natural-language-processing network-analysis network-programming network-security programming-tutorial python python-tutorials python3 scapy scapy-tutorials socket-programming text-classification tutorials web-scraping
Last synced: 12 May 2025
https://github.com/lorien/grab
Web Scraping Framework
asynchronous crawler crawling framework http-client network pycurl python python-library python3 scraping spider urllib3 web-scraping
Last synced: 14 May 2025
https://github.com/codingforentrepreneurs/30-Days-of-Python
Learn Python for the next 30 (or so) Days.
api automation csv fastapi flask jupyter pandas python python3 rest-api selenium selenium-webdriver tutorial web-scraping
Last synced: 14 Mar 2025
https://github.com/codingforentrepreneurs/30-days-of-python
Learn Python for the next 30 (or so) Days.
api automation csv fastapi flask jupyter pandas python python3 rest-api selenium selenium-webdriver tutorial web-scraping
Last synced: 13 Apr 2025
https://github.com/oxylabs/amazon-scraper
Free Trial Amazon Scraper API for extracting search, product, offer listing, reviews, questions and answers, best sellers, and sellers data.
amazon-aip amazon-api amazon-api-data amazon-price-tracker amazon-product-scraper amazon-products-api amazon-scrape-api amazon-scraper amazon-scraper-api amazon-scraping amazon-scraping-library e-commerce-api how-to-scrape-amazon price-scraper price-scraper-api python scrape-amazon scraping-amazon scraping-api web-scraping
Last synced: 14 May 2025
https://github.com/justmarkham/dat8
General Assembly's 2015 Data Science course in Washington, DC
clustering course data-analysis data-cleaning data-science data-visualization decision-trees ensemble-learning jupyter-notebook linear-regression logistic-regression machine-learning model-evaluation naive-bayes natural-language-processing pandas python regular-expressions scikit-learn web-scraping
Last synced: 15 May 2025
https://github.com/a9t9/rpa
Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use/LLM. Selenium IDE import/export.
anthropic anthropic-claude browser-automation browser-extension computer-use data-driven-tests imacros selenium-ide web-automation web-scraping
Last synced: 16 May 2025
https://github.com/roach-php/core
The complete web scraping toolkit for PHP.
Last synced: 13 May 2025
https://github.com/gosom/google-maps-scraper
Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.
distributed-scraper distributed-scraping golang google-maps google-maps-scraping web-scraper web-scraping
Last synced: 28 Dec 2025
https://github.com/rushter/selectolax
Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
css html5 modest-engine parser python web-scraping
Last synced: 13 May 2025
https://github.com/A9T9/RPA
UI.Vision: Open-Source RPA Software (formerly Kantu) - Modern Robotic Process Automation with Selenium IDE++
autohotkey automation browser-automation browser-extension data-driven-tests imacros opencv selenium-ide sikulix ui-tests uipath visual-recognition web-automation web-scraping webassembly
Last synced: 22 Mar 2025
https://github.com/juancarlospaco/faster-than-requests
Faster requests on Python 3
curl cython download-file faster-than-requests high-performance http-requests ndjson open-data python python-library python-requests python3 requests-toolbelt requests3 scrapy speed urllib urllib3 web-scraper web-scraping
Last synced: 14 May 2025
https://github.com/oxylabs/free-proxy-list
Claim a free proxy list with United States IP addresses and use it for your projects.
free-proxies free-proxies-for-web-scraping free-proxy free-proxy-ip free-proxy-list proxies proxies-http proxies-https proxies-list proxies-socks5 proxy proxy-list proxy-server proxypool web-scraping
Last synced: 02 Aug 2025
https://github.com/intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
browser-automation browsers javascript navigator random randomization user-agent user-agent-spoofer web-scraping
Last synced: 13 May 2025
https://github.com/vprusso/youtube_tutorials
Collection of scripts corresponding to LucidProgramming YouTube tutorials
ctci-solutions lucidprogramming python python-tutorial python3 python3-tutorial technical-interview web-scraping youtube-tutorial
Last synced: 08 Oct 2025
https://github.com/platonai/pulsarRPA
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.
ai-agents ai-crawler ai-rpa ai-scrarper crawler rpa scraper scraping web-crawler web-scraping
Last synced: 01 Apr 2025
https://github.com/postmodern/spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
crawler ruby scraper spider spider-links web web-crawler web-scraper web-scraping web-spider
Last synced: 13 May 2025
https://github.com/DataHenHQ/till
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
crawler man-in-the-middle mitm proxy-server scraper scraping web-scraping
Last synced: 15 Mar 2025
https://github.com/kaliiiiiiiiii-vinyzu/patchright
Undetected version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Last synced: 29 Dec 2025
https://github.com/kaliiiiiiiiii/Selenium-Driverless
a stealthy browser automation framework
automation detection-evasion driverless-chrome python python3 reverse-engineering scraping-python testing vulnerability-research web-scraping webdriver
Last synced: 08 Jul 2025
https://github.com/je-suis-tm/web-scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
bloomberg data-scraper data-scraping financial-data financial-times futures futures-historical-data news-scraper news-websites newsletter options-data python-web-scraper reuters scrapper sraping wall-street-journal wallstreetbets web-scraper web-scrapers web-scraping
Last synced: 04 Apr 2025
https://github.com/gildas-lormeau/single-file-cli
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
archiving cli crawler deno dockerfile nodejs scraping-websites single-file web-archiving web-crawler web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/kaliiiiiiiiii/selenium-driverless
a stealthy browser automation framework
automation detection-evasion driverless-chrome python python3 reverse-engineering scraping-python testing vulnerability-research web-scraping webdriver
Last synced: 14 May 2025
https://github.com/rebrowser/rebrowser-patches
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
automation bot bot-detection chrome chromedriver cloudflare crawler crawling datadome headless headless-chrome playwright puppeteer puppeteer-extra rebrowser scraping selenium stealth web-scraping webdriver
Last synced: 14 May 2025
https://github.com/tinyfish-io/agentql
AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at scale. Includes REST API, Python and JavaScript SDKs, browser debugger.
agent ai aiagent automation javascript playwright python rpa scraping web web-scraping web-scraping-colabs web-scraping-javascript web-scraping-python web-scrapping webagent
Last synced: 15 May 2025
https://github.com/alecxe/scrapy-fake-useragent
Random User-Agent middleware based on fake-useragent
Last synced: 15 May 2025
https://github.com/serpapi/google-search-results-python
Google Search Results via SERP API pip Python Package
bing-image google-crawler google-images python scraping serp-api serpapi web-scraping
Last synced: 11 Apr 2025
https://github.com/lit26/finvizfinance
Finviz analysis python library.
crypto earnings-calls financial-analysis forex fundament fundamental-analysis inside-trader outer-ratings pandas pypi screener stock-charts stock-news technical-analysis web-scraping
Last synced: 15 May 2025
https://github.com/dinubs/coolqlcool
Nextjs server to query websites with GraphQL
graphql javascript nextjs schema web-scraping
Last synced: 04 Apr 2025
https://github.com/z0m31en7/uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites
Last synced: 15 May 2025
https://github.com/z0m31en7/Uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites
Last synced: 05 May 2025
https://github.com/oxylabs/how-to-scrape-google-scholar
A guide for extracting titles, authors, and citations from Google Scholar using Python and Oxylabs SERP Scraper API.
google-scholar google-scholar-scraper google-scholar-scrapper google-search-scraper python python-scraper scraper-api web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/achuthasubhash/Complete-Life-Cycle-of-a-Data-Science-Project
Complete-Life-Cycle-of-a-Data-Science-Project
analysis data-analysis data-science dataset deep-learning eda exploratory-data-analysis feature-engineering federated-learning machine-learning nlp-models python python-library pytorch reinforcement-learning scraper supervised-learning transfer-learning unsupervised-learning web-scraping
Last synced: 06 May 2025
https://github.com/spekulatius/phpscraper
A universal web-util for PHP.
beautifulsoup chromium headless-chrome php php-crawler php-scraper php-spider php-spiders puppeteer pyppeteer scraper scraping scraping-websites scrapy web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/spekulatius/PHPScraper
A universal web-util for PHP.
beautifulsoup chromium headless-chrome php php-crawler php-scraper php-spider php-spiders puppeteer pyppeteer scraper scraping scraping-websites scrapy web-scraper web-scraping
Last synced: 14 Mar 2025
https://github.com/oxylabs/quick-start-guide
Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.
oxylabs scraper scraper-api scraper-python scrapers scraping scraping-websites web-scraper web-scraping
Last synced: 06 Jul 2025
https://github.com/programminghistorian/jekyll
Jekyll-based static site for The Programming Historian
api data-management data-manipulation data-mining dh digital-humanities exhibits linked-open-data mapping multi-lingual network-analysis open-educational-resources open-source pedagogy programming-historian python r-studio scraping text-analysis web-scraping
Last synced: 14 Mar 2025
https://github.com/AlexMathew/scrapple
A framework for creating semi-automatic web content extractors
beautifulsoup crawler css-selector extractor lxml python scrapers scraping scrapy selector selector-expression tutorial web-scraper web-scraping xpath-expression
Last synced: 29 Mar 2025
https://github.com/jaebradley/basketball_reference_web_scraper
NBA Stats API via Basketball Reference
basketball-reference nba python web-scraper web-scraping
Last synced: 14 May 2025
https://github.com/saifyxpro/headlessx
A lightweight, self-hosted headless browser automation platform. Designed as an alternative to Browserless, built for speed, privacy, and scalability.
automation automation-api automation-platform browser-automation browser-testing browserless chrome-headless chromedriver container-automation data-extraction headless headless-chrome headless-service playwright playwright-automation puppeteer scraping-service web-automation web-scraping
Last synced: 07 Oct 2025
https://github.com/austinoboyle/scrape-linkedin-selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
linkedin python scrape scraper scraping selenium selenium-webdriver web-scraper web-scraping
Last synced: 04 Apr 2025
https://github.com/oxylabs/how-to-scrape-amazon-prices
Code for extracting best-selling items, search results, and currently available deals from Amazon using Python and Oxylabs E-Commerce Scraper API.
amazon amazon-scraper api python python-scraper scraper-api web-scraper web-scraping
Last synced: 16 May 2025
https://github.com/scrapfly/scrapfly-scrapers
Scalable Python web scraping scripts for 40+ popular domains
antibot automation captcha-bypass crawler crawling crawling-python datascraping proxies python python-scraper scraper scraping scraping-python spider twitter-scraper web-crawler web-scraping web-scraping-python webscraper webscraping
Last synced: 11 Apr 2025
https://github.com/shaikhsajid1111/social-media-profile-scrapers
Fetch user's data across social media
facebook-scraper instagram-scraper medium-scraper pinterest pinterest-scrapper python quora-scraper reddit-scraper request scrapping-python selenium-python social-media tiktok-scraper twitter-scraper web-scraper web-scraping
Last synced: 05 Apr 2025
https://github.com/vida-nyu/ache
ACHE is a web crawler for domain-specific search.
domain-specific-search focused-crawler hacktoberfest web-crawler web-scraping web-search web-spider
Last synced: 04 Apr 2025
https://github.com/VIDA-NYU/ache
ACHE is a web crawler for domain-specific search.
domain-specific-search focused-crawler hacktoberfest web-crawler web-scraping web-search web-spider
Last synced: 03 Apr 2025
https://github.com/ViDA-NYU/ache
ACHE is a web crawler for domain-specific search.
domain-specific-search focused-crawler hacktoberfest web-crawler web-scraping web-search web-spider
Last synced: 25 Mar 2025
https://github.com/sangaline/wayback-machine-scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
archive-dot-org command-line-tool python wayback-archiver wayback-machine web-scraping
Last synced: 13 Apr 2025
https://github.com/roniemartinez/dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath
Last synced: 16 Mar 2025
https://github.com/kaliiiiiiiiii/undetected-playwright-python
Undetected version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Last synced: 27 Oct 2025
https://github.com/0x676e67/wreq
An ergonomic Rust HTTP Client with TLS fingerprint
akamai boringssl crawler fingerprint http http-client http2 https impersonate ja3 ja4 requests rust scraper tls tls-client tls-fingerprint web-scraper web-scraping websocket
Last synced: 02 Aug 2025
https://github.com/yusuzech/r-web-scraping-cheat-sheet
Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.
cheatsheet httr r rselenium rvest scrape-websites web-scraping webscraping
Last synced: 20 Apr 2025
https://github.com/flairnlp/fundus
A very simple news crawler with a funny name
cc-news commoncrawl corpus corpus-tools crawler datasets image-classification image-extraction news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping
Last synced: 14 May 2025
https://github.com/kaliiiiiiiiii-vinyzu/patchright-python
Undetected Python version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-by playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Last synced: 15 May 2025
https://github.com/crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
crawler crawling hacktoberfest php scraper scraping scraping-websites web-crawler web-crawling web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/lkuffo/web-scraping
More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup
beautifulsoup beautifulsoup4 lxml-etree scraping scraping-python scraping-websites scrapping-python scrapy scrapy-crawler scrapy-spider selenium selenium-python selenium-webdriver web-scraping webscraping
Last synced: 07 Apr 2025
https://github.com/City-Bureau/city-scrapers
Scrape, standardize and share public meetings from local government websites
city-scrapers open-data python scrapy web-scraping
Last synced: 07 Apr 2025
https://github.com/serpapi/nokolexbor
High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.
c-extension css html5 parser ruby serpapi web-scraping xpath
Last synced: 15 May 2025
https://github.com/shaikhsajid1111/twitter-scraper-selenium
Python package to scrape Twitter's front-end easily
automation contribution-welcome csv hacktoberfest json open-source pypi python python3 selenium social-media tweets twitter twitter-api twitter-bot twitter-hashtag twitter-profile twitter-profiles twitter-scraper web-scraping
Last synced: 07 Apr 2025
https://github.com/deedy5/primp
🪞PRIMP (Python Requests IMPersonate). The fastest python HTTP client that can impersonate web browsers
akamai fingerprint http http-client https impersonate ja3 ja4 python requests tls tls-client web-scraping
Last synced: 17 Aug 2025
https://github.com/web-agent-master/google-search
A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches. Local alternative to SERP APIs with MCP server integration.
ai google-search llm mcp-server web-scraping
Last synced: 06 Sep 2025
https://github.com/walissonsilva/web-scraping-python
🌐 Repository with the content (slides, examples, code) of the YouTube video series on Web Scraping with Python.
beautifulsoup python requests selenium web-scraping
Last synced: 28 Mar 2025
https://github.com/infinilabs/crawler
🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)
crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider
Last synced: 06 Apr 2025
https://github.com/yaroslaff/nudecrawler
Crawl telegra.ph searching for nudes!
crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping
Last synced: 04 Apr 2025
https://github.com/s0rg/crawley
The unix-way web crawler
cli crawler go golang golang-application pentest pentest-tool pentesting unix-way web-crawler web-scraping web-spider
Last synced: 16 May 2025
https://github.com/flairNLP/fundus
A very simple news crawler with a funny name
cc-news commoncrawl corpus crawler news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping
Last synced: 04 Mar 2025
https://github.com/davidteather/everything-web-scraping
Learn everything web scraping with David Teather Codes on YouTube
course courses everything hacktoberfest hacktoerfest project-based-learning project-based-learning-courses project-based-tutorials python python-web-scraper python3 reverse-engineering web-scraping web-scraping-python web-scraping-tutorial webscraping youtube-series
Last synced: 27 Oct 2025
https://github.com/oxylabs/python-web-scraping-tutorial
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.
amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping
Last synced: 16 May 2025
https://github.com/vdutts7/gpt4V-scraper
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
ai-agents browser-automation gpt-4-vision puppeteer web-scraping
Last synced: 06 Apr 2025
https://github.com/vdutts7/gpt4v-scraper
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
ai-agents browser-automation gpt-4-vision puppeteer web-scraping
Last synced: 09 Apr 2025
https://github.com/tirthajyoti/web-database-analytics
Web scraping and related analytics using Python tools
analytics beautifulsoup4 data-science data-wrangling database json json-parser natural-language-processing nlp python regular-expression sql sqlite3 web-scraping xml-parser
Last synced: 07 Apr 2025
https://github.com/tirthajyoti/Web-Database-Analytics
Web scraping and related analytics using Python tools
analytics beautifulsoup4 data-science data-wrangling database json json-parser natural-language-processing nlp python regular-expression sql sqlite3 web-scraping xml-parser
Last synced: 20 Apr 2025
https://github.com/amoudgl/short-jokes-dataset
Python scripts for building 'Short Jokes' dataset, featured on Kaggle
beautiful-soup dataset humor jokes oneliners python scrapers web-scraping
Last synced: 03 Apr 2025
https://github.com/tuhinpal/imdb-api
Serverless IMDB API powered by Cloudflare Worker
cloudflare-worker cloudflare-workers hono honojs imdb imdb-api movie-list web-scraping
Last synced: 08 Jul 2025
https://github.com/jrbadiabo/bet-on-sibyl
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
algorithms beautifulsoup machine-learning machine-learning-algorithms machinelearning predictive-analysis python python-2 scikit-learn selenium sports-stats sportsanalytics web-crawling web-scraping
Last synced: 13 Apr 2025
https://github.com/roach-php/laravel
Laravel adapter for Roach, the complete web scraping toolkit for PHP.
crawling laravel php web-scraping
Last synced: 11 Apr 2025