Projects in Awesome Lists tagged with webscraping
A curated list of projects in awesome lists tagged with webscraping.
https://github.com/huginn/huginn
Create agents that monitor and act on your behalf. Your agents are standing by!
agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping
Last synced: 12 May 2025
https://github.com/mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping
Last synced: 12 May 2025
https://github.com/assafelovic/gpt-researcher
An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.
agent ai automation deepresearch llms mcp mcp-server python research search webscraping
Last synced: 25 Jan 2026
https://github.com/getmaxun/maxun
🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥
agents api automation browser browser-automation data-extraction no-code no-code-web-scraper playwright robotic-process-automation rpa scraper self-hosted web-agent web-automation web-scraper web-scraping web-scraping-agent webscraping website-to-api
Last synced: 23 Jan 2026
https://github.com/alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
ai artificial-intelligence automation crawler machine-learning python scrape scraper scraping web-scraping webautomation webscraping
Last synced: 13 May 2025
https://github.com/D4Vinci/Scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath
Last synced: 13 May 2025
https://github.com/niespodd/browser-fingerprinting
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping
Last synced: 14 May 2025
https://github.com/autoscrape-labs/pydoll
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.
anti-detection asynchronous bot-detection browser-automation bypasscaptcha captcha-breaking cdp chromium playwright puppeteer python recaptcha-v3 selenium selenium-python turnstile-bypass webdriver webscraping
Last synced: 11 Apr 2026
https://github.com/d4vinci/scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath
Last synced: 15 Feb 2026
https://github.com/anaskhan96/soup
Web Scraper in Go, similar to BeautifulSoup
beautifulsoup go golang html-node web-scraper webscraper webscraping
Last synced: 14 May 2025
https://github.com/daijro/camoufox
🦊 Anti-detect browser
antidetect antidetect-browser fingerprint firefox networking playwright scraping webscraping
Last synced: 07 Jan 2026
https://github.com/reworkd/tarsier
Vision utilities for web interaction agents 👀
gpt4v llms ocr playwright pypi-package python selenium webscraping
Last synced: 13 May 2025
https://github.com/thewebscrapingclub/webscraping-from-0-to-hero
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
playwright python scrapy scrapy-spider scrapysplash webscraping
Last synced: 08 Apr 2025
https://github.com/TheWebScrapingClub/webscraping-from-0-to-hero
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
playwright python scrapy scrapy-spider scrapysplash webscraping
Last synced: 14 Mar 2025
https://github.com/jamesturk/scrapeghost
👻 Experimental library for scraping websites using OpenAI's GPT API.
Last synced: 15 May 2025
https://github.com/requests-cache/requests-cache
Persistent HTTP cache for python requests
cache dynamodb http mongodb performance redis requests sqlite web webscraping
Last synced: 11 Dec 2025
https://github.com/m8sec/crosslinked
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping
Last synced: 14 May 2025
https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python
Undetected Python version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-by playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Last synced: 15 May 2026
https://github.com/m8sec/CrossLinked
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping
Last synced: 07 Apr 2025
https://github.com/holgerd77/django-dynamic-scraper
Creating Scrapy scrapers via the Django admin interface
django python scraper scraping scrapy spider webscraping
Last synced: 15 May 2025
https://github.com/raznem/parsera
Lightweight library for scraping web-sites with LLMs
ai ai-scraping data-extraction llm opensource playwright python scraping webscraping
Last synced: 11 Apr 2025
https://github.com/kaliiiiiiiiii-vinyzu/patchright
Undetected version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Last synced: 13 Apr 2026
https://github.com/maxhumber/gazpacho
🥫 The simple, fast, and modern web scraping library
Last synced: 15 May 2025
https://github.com/Skallwar/suckit
Suck the InTernet
hacktoberfest rust webscraping
Last synced: 29 Mar 2025
https://github.com/skallwar/suckit
Suck the InTernet
hacktoberfest rust webscraping
Last synced: 10 May 2025
https://github.com/mov-cli/mov-cli
Watch everything from your terminal.
android cli hacktober ios linux scraping webscraping windows
Last synced: 10 Jul 2025
https://github.com/benibela/xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery
Last synced: 15 May 2025
https://github.com/godsscion/auto_job_applier_linkedin
Make your job hunt easy by automating your application process with this Auto Applier
auto-apply automatic-job-applier automation automation-selenium job-application job-search linkedin linkedin-job-scraper linkedin-jobs-scraper python python3 selenium selenium-python undetected-chromedriver webscraping
Last synced: 15 May 2025
https://github.com/openzim/zimit
Make a ZIM file from any Web site and surf offline!
docker scraper webscraping zim
Last synced: 21 Jan 2026
https://github.com/chris-greening/instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping
Last synced: 07 Apr 2025
https://github.com/z0m31en7/uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites
Last synced: 15 May 2025
https://github.com/wodsuz/easyapplyjobsbot
A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto fill additional questions, apply automatically!
ai apply-jobs automated automation bot challenge chatgpt find-jobs glassdoor glassdoor-scraper indeed job jobs linkedin list-jobs python3 selenium webscraping ziprecruiter
Last synced: 13 Apr 2025
https://github.com/z0m31en7/Uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites
Last synced: 05 May 2025
https://github.com/adrianhajdin/pricewise
Dive into web scraping and build a Next.js 13 eCommerce price tracker within a single video that teaches you data scraping, cron jobs, sending emails, deployment, and more.
Last synced: 04 Apr 2025
https://github.com/TheCodeMonks/NYTimes-App
🗽 A Simple Demonstration of the New York Times App 📱 using Jsoup web crawler with MVVM Architecture 🔥
android android-application android-architecture android-development coroutines hacktoberfest jetpack-android jetpack-datastore jetpack-navigation jsoup-android kotlin kotlin-android livedata material-design material-ui mvvm-android recyclerview room-persistence-library viewmodel webscraping
Last synced: 13 Apr 2025
https://github.com/thecodemonks/nytimes-app
🗽 A Simple Demonstration of the New York Times App 📱 using Jsoup web crawler with MVVM Architecture 🔥
android android-application android-architecture android-development coroutines hacktoberfest jetpack-android jetpack-datastore jetpack-navigation jsoup-android kotlin kotlin-android livedata material-design material-ui mvvm-android recyclerview room-persistence-library viewmodel webscraping
Last synced: 05 Apr 2025
https://github.com/jchao01/TradingView-data-scraper
Extract price and indicator data from TradingView charts to create ML datasets
algorithmic-trading data-mining json tradingview webscraping
Last synced: 26 Mar 2025
https://github.com/scrapfly/scrapfly-scrapers
Scalable Python web scraping scripts for 40+ popular domains
antibot automation captcha-bypass crawler crawling crawling-python datascraping proxies python python-scraper scraper scraping scraping-python spider twitter-scraper web-crawler web-scraping web-scraping-python webscraper webscraping
Last synced: 11 Apr 2025
https://github.com/EZ-hwh/AutoScraper
Official implementation of the paper "AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation"
Last synced: 04 Oct 2025
https://github.com/roniemartinez/dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath
Last synced: 16 Mar 2025
https://github.com/0xMassi/webclaw
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping
Last synced: 04 Apr 2026
https://github.com/vil/h4x-tools
Open source toolkit for scraping, OSINT and more.
data-gathering dirbuster email-osint h4x-tools hacking hacking-tool hacktools igscraper ip-scanner linux osint phone-number port-scanner python python-script python3 tools webhook-spammer webscraping websearch
Last synced: 08 Apr 2025
https://github.com/stephanlensky/zendriver
A blazing fast, async-first, undetectable webscraping/web automation framework based on ultrafunkamsterdam/nodriver. Now with Docker support!
anti-bot async bot-detection browser browser-automation captcha chrome chrome-devtools-protocol chromedriver cloudflare cloudflare-bypass python selenium webdriver webscraping
Last synced: 16 May 2025
https://github.com/kaliiiiiiiiii/undetected-playwright-python
Undetected version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Last synced: 27 Oct 2025
https://github.com/ZacharyHampton/HomeHarvest
Python package for scraping real estate property data
data finance mls properties proptech real-estate realtor redfin redfin-scraper scraper scraping webscraping zillow zillow-scraper
Last synced: 26 Oct 2025
https://github.com/rootviii/proxy_requests
A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)
http http-get http-getter http-proxy http-proxy-middleware proxy proxy-list proxy-requests proxy-server python python-requests python3 recursion recursion-problem requests requests-module webscraper webscraper-api webscraping
Last synced: 02 Apr 2025
https://github.com/rootVIII/proxy_requests
A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)
http http-get http-getter http-proxy http-proxy-middleware proxy proxy-list proxy-requests proxy-server python python-requests python3 recursion recursion-problem requests requests-module webscraper webscraper-api webscraping
Last synced: 22 Mar 2025
https://github.com/yusuzech/r-web-scraping-cheat-sheet
Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.
cheatsheet httr r rselenium rvest scrape-websites web-scraping webscraping
Last synced: 29 Apr 2026
https://github.com/kaliiiiiiiiii-vinyzu/patchright-python
Undetected Python version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-by playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping
Last synced: 06 Mar 2026
https://github.com/mthipparthi/operating-systems-three-easy-pieces
Operating Systems: Three Easy Pieces by Remzi
operating-system operating-system-learning python webscraping
Last synced: 04 Apr 2025
https://github.com/salimk/rcrawler
An R web crawler and scraper
crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping
Last synced: 12 Apr 2025
https://github.com/lkuffo/web-scraping
More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup
beautifulsoup beautifulsoup4 lxml-etree scraping scraping-python scraping-websites scrapping-python scrapy scrapy-crawler scrapy-spider selenium selenium-python selenium-webdriver web-scraping webscraping
Last synced: 07 Apr 2025
https://github.com/salimk/Rcrawler
An R web crawler and scraper
crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping
Last synced: 14 Mar 2025
https://github.com/davidteather/tiktokbot
A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg
api bot editing ffmpeg hacktoberfest tik tiktok tiktok-api tiktok-compilations tok trending-tiktok-videos unoffical video webscraping
Last synced: 13 Aug 2025
https://github.com/dmi3kno/polite
Be nice on the web
crawler memoise r r-package rate-limiter robotstxt rstats rvest scraper webscraping
Last synced: 22 Oct 2025
https://github.com/thecodemonks/nytimes-ios
🗽 NY Times is a minimal news 🗞 iOS app 📱 built to demonstrate the use of SwiftSoup and CoreData with SwiftUI 🔥
combine coredata coredata-swiftui dependency-injection hacktoberfest ios ios-app ios-app-development ios-open-source ios-swift mvvm-architecture singleton swift5 swiftsoup swiftui swiftui-example swiftui-learning unittest viewmodel webscraping
Last synced: 06 Apr 2025
https://github.com/TheCodeMonks/NYTimes-iOS
🗽 NY Times is a minimal news 🗞 iOS app 📱 built to demonstrate the use of SwiftSoup and CoreData with SwiftUI 🔥
combine coredata coredata-swiftui dependency-injection hacktoberfest ios ios-app ios-app-development ios-open-source ios-swift mvvm-architecture singleton swift5 swiftsoup swiftui swiftui-example swiftui-learning unittest viewmodel webscraping
Last synced: 03 May 2025
https://github.com/wodsuz/EasyApplyJobsBot
A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto fill additional questions, apply automatically!
ai apply-jobs automated automation bot challenge chatgpt find-jobs glassdoor glassdoor-scraper indeed job jobs linkedin list-jobs python3 selenium webscraping ziprecruiter
Last synced: 03 Aug 2025
https://github.com/N0-0NE-Dev/NoFasel
A streaming app with no ADs.
android entertainment foss free free-download hulu movie movies movies-downloader netflix open-source piracy react react-native streaming streaming-service webscraping
Last synced: 20 Apr 2025
https://github.com/GodsScion/Auto_job_applier_linkedIn
Make your job hunt easy by automating your application process with this Auto Applier
auto-apply automatic-job-applier automation automation-selenium job-application job-search linkedin linkedin-job-scraper linkedin-jobs-scraper python python3 selenium selenium-python undetected-chromedriver webscraping
Last synced: 11 Aug 2025
https://github.com/milaan9/91_python_mini_projects
covid-19-india ipython-to-pdf js-in-python mini-program mini-project mini-projects miniprogram py-to-exe python-digital-clock python-games python-mini-projects python-tutor python-tutorial-github python-tutorial-notebook python-tutorials python4beginner python4datascience python4everybody tutor-milaan9 webscraping
Last synced: 05 Apr 2025
https://github.com/yaroslaff/nudecrawler
Crawl telegra.ph searching for nudes!
crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping
Last synced: 04 Apr 2025
https://github.com/currentslab/extractnet
A fork of Dragnet that also extracts the author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
author-extraction content-extraction date-extraction machine-learning news news-articles news-extraction news-extractor python text-cleaning text-mining web-scraping webscraping
Last synced: 17 Mar 2026
https://github.com/felipeall/transfermarkt-api
API service to get data from Transfermarkt
fastapi football players scraper soccer transfermarkt webscraping
Last synced: 17 Jan 2026
https://github.com/davidteather/everything-web-scraping
Learn everything web scraping with David Teather Codes on YouTube
course courses everything hacktoberfest hacktoerfest project-based-learning project-based-learning-courses project-based-tutorials python python-web-scraper python3 reverse-engineering web-scraping web-scraping-python web-scraping-tutorial webscraping youtube-series
Last synced: 27 Oct 2025
https://github.com/oxylabs/python-web-scraping-tutorial
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.
amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping
Last synced: 16 May 2025
https://github.com/Olney1/ChatGPT-OpenAI-Smart-Speaker
This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.
agents ai artificial-intelligence chatgpt gpt-4 langchain langsmith openai smarthome smartspeaker speech-recognition speech-to-text tavily text-to-speech vision vision-and-language webscraping
Last synced: 07 Apr 2025
https://github.com/browserutils/kooky
Go code to read cookies from browser cookie stores.
browser cookies firefox go golang google-chrome safari webscraping
Last synced: 01 Apr 2026
https://github.com/olney1/chatgpt-openai-smart-speaker
This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.
agents ai artificial-intelligence chatgpt gpt-4 langchain langsmith openai smarthome smartspeaker speech-recognition speech-to-text tavily text-to-speech vision vision-and-language webscraping
Last synced: 03 Oct 2025
https://github.com/glaucocustodio/tanakai
Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.
chrome-headless crawler kimurai scraper scrapy webscraping
Last synced: 28 Mar 2025
https://github.com/clueless-community/scrape-up
A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.
beautifulsoup hacktoberfest hacktoberfest2023 package pip python selenium webscraping
Last synced: 16 May 2025
https://github.com/kaliiiiiiiiii-vinyzu/patchright-nodejs
Undetected NodeJS version of the Playwright testing and automation library.
automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-auto web-scraping webautomation webdriver webscraping
Last synced: 15 May 2025
https://github.com/ispras/web-scraper-chrome-extension
Web data extraction tool implemented as chrome extension
javascript scraping scraping-tool webscraping
Last synced: 16 May 2025
https://github.com/kboghe/NordVPN-switcher
Rotate between different NordVPN servers with ease. Works both on Linux and Windows without any required changes to your code!
Last synced: 03 Apr 2025
https://github.com/cornelk/goscrape
Web scraper that can create an offline readable version of a website
Last synced: 10 Apr 2025
https://github.com/paulrobello/par_scrape
AI assisted web scraping and data extraction
Last synced: 28 Jan 2026
https://github.com/hhhrrrttt222111/dorkify
Perform Google Dork search with Dorkify
dork dorkify google google-dorking google-dorks hacking hacktoberfest information-gathering osint osint-python python scraping web webscraping
Last synced: 12 May 2025
https://github.com/hhhrrrttt222111/Dorkify
Perform Google Dork search with Dorkify
dork dorkify google google-dorking google-dorks hacking hacktoberfest information-gathering osint osint-python python scraping web webscraping
Last synced: 12 Jul 2025
https://github.com/pwlmc/imghash
Perceptual image hashing for Node.js
computer-vision image-processing imghash webscraping
Last synced: 25 Apr 2026
https://github.com/alexjc/weboptout
Opt-Out tool to check Copyright reservations in a way that even machines can understand.
command-line-tool copyright data-ops ml-pipeline opt-out robots-txt terms-of-service webscraping
Last synced: 13 Sep 2025
https://github.com/brucedone/clock
A visual task scheduling system, slimmed down to a single binary file (a web-based visual task scheduler; just one binary solves all the problems!)
dag-scheduling gocron scheduler task taskflow visual web web-scheduler webscraping
Last synced: 09 Apr 2025
https://github.com/owainlewis/falkor
Open Source web scraping API. Falkor turns web pages into queryable JSON
Last synced: 14 Apr 2025
https://github.com/decryptr/decryptr
An extensible API for breaking captchas
captcha r rstats tidyverse webscraping
Last synced: 25 Aug 2025
https://github.com/sshh12/llm_osint
LLM OSINT is a proof-of-concept method of using LLMs to gather information from the internet and then perform a task with this information.
Last synced: 04 Apr 2025
https://github.com/driscoll42/ebayMarketAnalyzer
Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel
ebay python scraping-websites webscraping
Last synced: 07 Apr 2025
https://github.com/mehmetozkaya/dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping
Last synced: 11 May 2025
https://github.com/serpapi/clauneck
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
automation command-line command-line-tool data-extraction data-extractor email email-extract-with-proxy email-extraction email-extractor email-marketing email-scraper open-source ruby rubygem serp social-media-scraper web-crawling webscraping
Last synced: 06 Apr 2025
https://github.com/mehmetozkaya/DotnetCrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping
Last synced: 18 Apr 2025
https://github.com/ropensci/webchem
Chemical Information from the Web
cas-number chemical-information chemspider identifier r r-package ropensci rstats webscraping
Last synced: 22 Feb 2026
https://github.com/guilhermecgs/ir
Project for automatically calculating income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, imposto de renda, finance, yahoo finance, acao, fii, etf, python, crawler, webscraping, calculadora ir
acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping
Last synced: 26 Apr 2025
https://github.com/aeksco/aws-pdf-textract-pipeline
🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript
aws aws-cdk aws-textract cdk cloudformation data-pipeline dynamodb jest lambda pdf puppeteer s3 serverless sns textract typescript webscraping
Last synced: 06 Oct 2025
https://github.com/0xPrateek/Stardox
Github stargazers information gathering tool
beautifulsoup4 blackarch blackarch-packages github information-gathering-tool python3 recon stargazer stargazers webscraping
Last synced: 07 Apr 2025
https://github.com/dedsecinside/gotor
This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.
cli command-line command-line-tool docker go golang golang-server hacktoberfest http-server information-extraction osint osint-tools rest-api service tor torbot webcrawler webcrawling webscraping
Last synced: 09 Apr 2025
https://github.com/pwlmaciejewski/imghash
Perceptual image hashing for Node.js
computer-vision image-processing imghash webscraping
Last synced: 05 Apr 2025
https://github.com/nuhmanpk/WebScrapper
Powerful Telegram bot for web scraping and crawling. Fast, easy, and loved by thousands!
beautifulsoup4 crawler crawler-engine crawler-python hacktoberfest hacktoberfest-accepted hacktoberfest2023 pyrogram pyrogram-bot requests scraper scraping selenium telegram telegram-bot web-scraping webscraping webscrapper webscrapping webscrapping-python
Last synced: 22 Jul 2025
https://github.com/s32x/anirip
🎬 A Crunchyroll show/season ripper
anime anime-downloader cli crunchyroll ffmpeg matroska video webscraping
Last synced: 14 Jan 2026
https://github.com/feddelegrand7/ralger
ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.
dataextraction r rstats webcrawling webscraper-website webscraping
Last synced: 06 Apr 2025
https://github.com/nuhmanpk/webscrapper
Powerful Telegram bot for web scraping and crawling. Fast, easy, and loved by thousands!
beautifulsoup4 crawler crawler-engine crawler-python hacktoberfest hacktoberfest-accepted hacktoberfest2023 pyrogram pyrogram-bot requests scraper scraping selenium telegram telegram-bot web-scraping webscraping webscrapper webscrapping webscrapping-python
Last synced: 12 Apr 2025