An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with webscraping

A curated list of projects in awesome lists tagged with webscraping.

https://github.com/huginn/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping

Last synced: 12 May 2025

https://github.com/mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping

Last synced: 12 May 2025

https://github.com/assafelovic/gpt-researcher

An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.

agent ai automation deepresearch llms mcp mcp-server python research search webscraping

Last synced: 25 Jan 2026

https://github.com/getmaxun/maxun

🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥

agents api automation browser browser-automation data-extraction no-code no-code-web-scraper playwright robotic-process-automation rpa scraper self-hosted web-agent web-automation web-scraper web-scraping web-scraping-agent webscraping website-to-api

Last synced: 23 Jan 2026

https://github.com/D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath

Last synced: 13 May 2025

https://github.com/niespodd/browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping

Last synced: 14 May 2025

https://github.com/d4vinci/scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath

Last synced: 15 Feb 2026

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 14 May 2025

https://github.com/reworkd/tarsier

Vision utilities for web interaction agents 👀

gpt4v llms ocr playwright pypi-package python selenium webscraping

Last synced: 13 May 2025

https://github.com/thewebscrapingclub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 08 Apr 2025

https://github.com/TheWebScrapingClub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 14 Mar 2025

https://github.com/jamesturk/scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.

gpt openai-api webscraping

Last synced: 15 May 2025

https://github.com/requests-cache/requests-cache

Persistent HTTP cache for python requests

cache dynamodb http mongodb performance redis requests sqlite web webscraping

Last synced: 11 Dec 2025

https://github.com/m8sec/crosslinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping

Last synced: 14 May 2025

https://github.com/m8sec/CrossLinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping

Last synced: 07 Apr 2025

https://github.com/holgerd77/django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

django python scraper scraping scrapy spider webscraping

Last synced: 15 May 2025

https://github.com/raznem/parsera

Lightweight library for scraping web-sites with LLMs

ai ai-scraping data-extraction llm opensource playwright python scraping webscraping

Last synced: 11 Apr 2025

https://github.com/maxhumber/gazpacho

🥫 The simple, fast, and modern web scraping library

gazpacho scraping webscraping

Last synced: 15 May 2025

https://github.com/Skallwar/suckit

Suck the InTernet

hacktoberfest rust webscraping

Last synced: 29 Mar 2025

https://github.com/skallwar/suckit

Suck the InTernet

hacktoberfest rust webscraping

Last synced: 10 May 2025

https://github.com/mov-cli/mov-cli

Watch everything from your terminal.

android cli hacktober ios linux scraping webscraping windows

Last synced: 10 Jul 2025

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 15 May 2025

https://github.com/openzim/zimit

Make a ZIM file from any Web site and surf offline!

docker scraper webscraping zim

Last synced: 21 Jan 2026

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 07 Apr 2025

https://github.com/z0m31en7/uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 15 May 2025

https://github.com/wodsuz/easyapplyjobsbot

A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto-fill additional questions, apply automatically!

ai apply-jobs automated automation bot challenge chatgpt find-jobs glassdoor glassdoor-scraper indeed job jobs linkedin list-jobs python3 selenium webscraping ziprecruiter

Last synced: 13 Apr 2025

https://github.com/z0m31en7/Uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 05 May 2025

https://github.com/adrianhajdin/pricewise

Dive into web scraping and build a Next.js 13 eCommerce price tracker within a single video that teaches you data scraping, cron jobs, sending emails, deployment, and more.

scraping webscraping

Last synced: 04 Apr 2025

https://github.com/jchao01/TradingView-data-scraper

Extract price and indicator data from TradingView charts to create ML datasets

algorithmic-trading data-mining json tradingview webscraping

Last synced: 26 Mar 2025

https://github.com/openaustralia/morph

Take the hassle out of web scraping

civictech docker webscraping

Last synced: 03 Apr 2025

https://github.com/EZ-hwh/AutoScraper

Official implementation of the paper "AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation"

webscraping

Last synced: 04 Oct 2025

https://github.com/roniemartinez/dude

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 16 Mar 2025

https://github.com/0xMassi/webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping

Last synced: 04 Apr 2026

https://github.com/stephanlensky/zendriver

A blazing fast, async-first, undetectable webscraping/web automation framework based on ultrafunkamsterdam/nodriver. Now with Docker support!

anti-bot async bot-detection browser browser-automation captcha chrome chrome-devtools-protocol chromedriver cloudflare cloudflare-bypass python selenium webdriver webscraping

Last synced: 16 May 2025

https://github.com/yusuzech/r-web-scraping-cheat-sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.

cheatsheet httr r rselenium rvest scrape-websites web-scraping webscraping

Last synced: 29 Apr 2026

https://github.com/davidteather/tiktokbot

A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg

api bot editing ffmpeg hacktoberfest tik tiktok tiktok-api tiktok-compilations tok trending-tiktok-videos unoffical video webscraping

Last synced: 13 Aug 2025

https://github.com/wodsuz/EasyApplyJobsBot

A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto-fill additional questions, apply automatically!

ai apply-jobs automated automation bot challenge chatgpt find-jobs glassdoor glassdoor-scraper indeed job jobs linkedin list-jobs python3 selenium webscraping ziprecruiter

Last synced: 03 Aug 2025

https://github.com/currentslab/extractnet

A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package

author-extraction content-extraction date-extraction machine-learning news news-articles news-extraction news-extractor python text-cleaning text-mining web-scraping webscraping

Last synced: 17 Mar 2026

https://github.com/felipeall/transfermarkt-api

API service to get data from Transfermarkt

fastapi football players scraper soccer transfermarkt webscraping

Last synced: 17 Jan 2026

https://github.com/oxylabs/python-web-scraping-tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 16 May 2025

https://github.com/Olney1/ChatGPT-OpenAI-Smart-Speaker

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

agents ai artificial-intelligence chatgpt gpt-4 langchain langsmith openai smarthome smartspeaker speech-recognition speech-to-text tavily text-to-speech vision vision-and-language webscraping

Last synced: 07 Apr 2025

https://github.com/browserutils/kooky

Go code to read cookies from browser cookie stores.

browser cookies firefox go golang google-chrome safari webscraping

Last synced: 01 Apr 2026

https://github.com/olney1/chatgpt-openai-smart-speaker

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

agents ai artificial-intelligence chatgpt gpt-4 langchain langsmith openai smarthome smartspeaker speech-recognition speech-to-text tavily text-to-speech vision vision-and-language webscraping

Last synced: 03 Oct 2025

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 28 Mar 2025

https://github.com/clueless-community/scrape-up

A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.

beautifulsoup hacktoberfest hacktoberfest2023 package pip python selenium webscraping

Last synced: 16 May 2025

https://github.com/ispras/web-scraper-chrome-extension

Web data extraction tool implemented as chrome extension

javascript scraping scraping-tool webscraping

Last synced: 16 May 2025

https://github.com/kboghe/NordVPN-switcher

Rotate between different NordVPN servers with ease. Works both on Linux and Windows without any required changes to your code!

nordvpn vpn webscraping

Last synced: 03 Apr 2025

https://github.com/cornelk/goscrape

Web scraper that can create an offline readable version of a website

go golang scraper webscraping

Last synced: 10 Apr 2025

https://github.com/paulrobello/par_scrape

AI assisted web scraping and data extraction

ai markdown webscraping

Last synced: 28 Jan 2026

https://github.com/pwlmc/imghash

Perceptual image hashing for Node.js

computer-vision image-processing imghash webscraping

Last synced: 25 Apr 2026

https://github.com/alexjc/weboptout

Opt-Out tool to check Copyright reservations in a way that even machines can understand.

command-line-tool copyright data-ops ml-pipeline opt-out robots-txt terms-of-service webscraping

Last synced: 13 Sep 2025

https://github.com/brucedone/clock

Visual web task scheduling system, slimmed down to a single binary file (yes, just one binary solves all the problems!)

dag-scheduling gocron scheduler task taskflow visual web web-scheduler webscraping

Last synced: 09 Apr 2025

https://github.com/owainlewis/falkor

Open Source web scraping API. Falkor turns web pages into queryable JSON

webscraping webscrapper

Last synced: 14 Apr 2025

https://github.com/decryptr/decryptr

An extensible API for breaking captchas

captcha r rstats tidyverse webscraping

Last synced: 25 Aug 2025

https://github.com/sshh12/llm_osint

LLM OSINT is a proof-of-concept method of using LLMs to gather information from the internet and then perform a task with this information.

gpt-4 llms osint webscraping

Last synced: 04 Apr 2025

https://github.com/driscoll42/ebayMarketAnalyzer

Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel

ebay python scraping-websites webscraping

Last synced: 07 Apr 2025

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core with Entity Framework Core output. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/mehmetozkaya/DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core with Entity Framework Core output. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 18 Apr 2025

https://github.com/guilhermecgs/ir

Project for automatically calculating income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor (electronic investor channel), CEI, selenium, bovespa, IRPF, IR, imposto de renda (income tax), finance, yahoo finance, acao (stock), fii, etf, python, crawler, webscraping, calculadora ir (income tax calculator)

acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping

Last synced: 26 Apr 2025

https://github.com/aeksco/aws-pdf-textract-pipeline

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript

aws aws-cdk aws-textract cdk cloudformation data-pipeline dynamodb jest lambda pdf puppeteer s3 serverless sns textract typescript webscraping

Last synced: 06 Oct 2025

https://github.com/dedsecinside/gotor

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

cli command-line command-line-tool docker go golang golang-server hacktoberfest http-server information-extraction osint osint-tools rest-api service tor torbot webcrawler webcrawling webscraping

Last synced: 09 Apr 2025

https://github.com/pwlmaciejewski/imghash

Perceptual image hashing for Node.js

computer-vision image-processing imghash webscraping

Last synced: 05 Apr 2025

https://github.com/s32x/anirip

🎬 A Crunchyroll show/season ripper

anime anime-downloader cli crunchyroll ffmpeg matroska video webscraping

Last synced: 14 Jan 2026

https://github.com/feddelegrand7/ralger

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

dataextraction r rstats webcrawling webscraper-website webscraping

Last synced: 06 Apr 2025