Projects in Awesome Lists tagged with webscraping

https://github.com/huginn/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping

Last synced: 12 May 2025

https://github.com/mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping

Last synced: 12 May 2025

https://github.com/assafelovic/gpt-researcher

An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.

agent ai automation deepresearch llms mcp mcp-server python research search webscraping

Last synced: 25 Jan 2026

https://github.com/getmaxun/maxun

🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥

agents api automation browser browser-automation data-extraction no-code no-code-web-scraper playwright robotic-process-automation rpa scraper self-hosted web-agent web-automation web-scraper web-scraping web-scraping-agent webscraping website-to-api

Last synced: 23 Jan 2026

https://github.com/pystardust/ani-cli

A cli tool to browse and play anime

anime cli fzf linux mac posix rofi shell steamdeck syncplay terminal termux webscraping windows

Last synced: 26 Apr 2026

https://github.com/alirezamika/autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

ai artificial-intelligence automation crawler machine-learning python scrape scraper scraping web-scraping webautomation webscraping

Last synced: 13 May 2025

https://github.com/D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

ai ai-scraping automation crawler crawling crawling-python data data-extraction hacktoberfest playwright python python3 scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath

Last synced: 13 May 2025

https://github.com/niespodd/browser-fingerprinting

Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping

Last synced: 14 May 2025

https://github.com/autoscrape-labs/pydoll

Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.

anti-detection asynchronous bot-detection browser-automation bypasscaptcha captcha-breaking cdp chromium playwright puppeteer python recaptcha-v3 selenium selenium-python turnstile-bypass webdriver webscraping

Last synced: 11 Apr 2026

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 14 May 2025

https://github.com/daijro/camoufox

🦊 Anti-detect browser

antidetect antidetect-browser fingerprint firefox networking playwright scraping webscraping

Last synced: 07 Jan 2026

https://github.com/reworkd/tarsier

Vision utilities for web interaction agents 👀

gpt4v llms ocr playwright pypi-package python selenium webscraping

Last synced: 13 May 2025

https://github.com/thewebscrapingclub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 08 Apr 2025

https://github.com/jamesturk/scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.

gpt openai-api webscraping

Last synced: 15 May 2025

https://github.com/requests-cache/requests-cache

Persistent HTTP cache for python requests

cache dynamodb http mongodb performance redis requests sqlite web webscraping

Last synced: 11 Dec 2025

https://github.com/m8sec/crosslinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping

Last synced: 14 May 2025

https://github.com/Kaliiiiiiiiii-Vinyzu/patchright-python

Undetected Python version of the Playwright testing and automation library.

automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-by playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping

Last synced: 15 May 2026

https://github.com/holgerd77/django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

django python scraper scraping scrapy spider webscraping

Last synced: 15 May 2025

https://github.com/raznem/parsera

Lightweight library for scraping websites with LLMs

ai ai-scraping data-extraction llm opensource playwright python scraping webscraping

Last synced: 11 Apr 2025

https://github.com/kaliiiiiiiiii-vinyzu/patchright

Undetected version of the Playwright testing and automation library.

automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping

Last synced: 13 Apr 2026

https://github.com/maxhumber/gazpacho

🥫 The simple, fast, and modern web scraping library

gazpacho scraping webscraping

Last synced: 15 May 2025

https://github.com/Skallwar/suckit

Suck the InTernet

hacktoberfest rust webscraping

Last synced: 29 Mar 2025

https://github.com/mov-cli/mov-cli

Watch everything from your terminal.

android cli hacktober ios linux scraping webscraping windows

Last synced: 10 Jul 2025

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 15 May 2025

https://github.com/godsscion/auto_job_applier_linkedin

Make your job hunt easy by automating your application process with this Auto Applier

auto-apply automatic-job-applier automation automation-selenium job-application job-search linkedin linkedin-job-scraper linkedin-jobs-scraper python python3 selenium selenium-python undetected-chromedriver webscraping

Last synced: 15 May 2025

https://github.com/openzim/zimit

Make a ZIM file from any Web site and surf offline!

docker scraper webscraping zim

Last synced: 21 Jan 2026

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 07 Apr 2025

https://github.com/z0m31en7/uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 15 May 2025

https://github.com/wodsuz/easyapplyjobsbot

A Python bot that automatically applies to all Easy Apply jobs on LinkedIn, Glassdoor, etc., based on your preferences. Auto login, auto fill additional questions, apply automatically!

ai apply-jobs automated automation bot challenge chatgpt find-jobs glassdoor glassdoor-scraper indeed job jobs linkedin list-jobs python3 selenium webscraping ziprecruiter

Last synced: 13 Apr 2025

https://github.com/adrianhajdin/pricewise

Dive into web scraping and build a Next.js 13 eCommerce price tracker within a single video that teaches you data scraping, cron jobs, sending emails, deployment, and more.

scraping webscraping

Last synced: 04 Apr 2025

https://github.com/TheCodeMonks/NYTimes-App

🗽 A Simple Demonstration of the New York Times App 📱 using Jsoup web crawler with MVVM Architecture 🔥

android android-application android-architecture android-development coroutines hacktoberfest jetpack-android jetpack-datastore jetpack-navigation jsoup-android kotlin kotlin-android livedata material-design material-ui mvvm-android recyclerview room-persistence-library viewmodel webscraping

Last synced: 13 Apr 2025

https://github.com/jchao01/TradingView-data-scraper

Extract price and indicator data from TradingView charts to create ML datasets

algorithmic-trading data-mining json tradingview webscraping

Last synced: 26 Mar 2025

https://github.com/scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

antibot automation captcha-bypass crawler crawling crawling-python datascraping proxies python python-scraper scraper scraping scraping-python spider twitter-scraper web-crawler web-scraping web-scraping-python webscraper webscraping

Last synced: 11 Apr 2025

https://github.com/openaustralia/morph

Take the hassle out of web scraping

civictech docker webscraping

Last synced: 03 Apr 2025

https://github.com/EZ-hwh/AutoScraper

Official implementation of the paper "AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation"

webscraping

Last synced: 04 Oct 2025

https://github.com/roniemartinez/dude

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 16 Mar 2025

https://github.com/0xMassi/webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping

Last synced: 04 Apr 2026

https://github.com/vil/h4x-tools

Open source toolkit for scraping, OSINT and more.

data-gathering dirbuster email-osint h4x-tools hacking hacking-tool hacktools igscraper ip-scanner linux osint phone-number port-scanner python python-script python3 tools webhook-spammer webscraping websearch

Last synced: 08 Apr 2025

https://github.com/stephanlensky/zendriver

A blazing fast, async-first, undetectable webscraping/web automation framework based on ultrafunkamsterdam/nodriver. Now with Docker support!

anti-bot async bot-detection browser browser-automation captcha chrome chrome-devtools-protocol chromedriver cloudflare cloudflare-bypass python selenium webdriver webscraping

Last synced: 16 May 2025

https://github.com/kaliiiiiiiiii/undetected-playwright-python

Undetected version of the Playwright testing and automation library.

automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-automation web-scraping webautomation webdriver webscraping

Last synced: 27 Oct 2025

https://github.com/ZacharyHampton/HomeHarvest

Python package for scraping real estate property data

data finance mls properties proptech real-estate realtor redfin redfin-scraper scraper scraping webscraping zillow zillow-scraper

Last synced: 26 Oct 2025

https://github.com/rootviii/proxy_requests

A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)

http http-get http-getter http-proxy http-proxy-middleware proxy proxy-list proxy-requests proxy-server python python-requests python3 recursion recursion-problem requests requests-module webscraper webscraper-api webscraping

Last synced: 02 Apr 2025

https://github.com/yusuzech/r-web-scraping-cheat-sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.

cheatsheet httr r rselenium rvest scrape-websites web-scraping webscraping

Last synced: 29 Apr 2026

https://github.com/mthipparthi/operating-systems-three-easy-pieces

Operating Systems: Three Easy Pieces by Remzi

operating-system operating-system-learning python webscraping

Last synced: 04 Apr 2025

https://github.com/salimk/rcrawler

An R web crawler and scraper

crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping

Last synced: 12 Apr 2025

https://github.com/lkuffo/web-scraping

More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup

beautifulsoup beautifulsoup4 lxml-etree scraping scraping-python scraping-websites scrapping-python scrapy scrapy-crawler scrapy-spider selenium selenium-python selenium-webdriver web-scraping webscraping

Last synced: 07 Apr 2025

https://github.com/davidteather/tiktokbot

A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg

api bot editing ffmpeg hacktoberfest tik tiktok tiktok-api tiktok-compilations tok trending-tiktok-videos unoffical video webscraping

Last synced: 13 Aug 2025

https://github.com/dmi3kno/polite

Be nice on the web

crawler memoise r r-package rate-limiter robotstxt rstats rvest scraper webscraping

Last synced: 22 Oct 2025

https://github.com/thecodemonks/nytimes-ios

🗽 NY Times is a minimal news 🗞 iOS app 📱 built to demonstrate the use of SwiftSoup and CoreData with SwiftUI 🔥

combine coredata coredata-swiftui dependency-injection hacktoberfest ios ios-app ios-app-development ios-open-source ios-swift mvvm-architecture singleton swift5 swiftsoup swiftui swiftui-example swiftui-learning unittest viewmodel webscraping

Last synced: 06 Apr 2025

https://github.com/N0-0NE-Dev/NoFasel

A streaming app with no ads.

android entertainment foss free free-download hulu movie movies movies-downloader netflix open-source piracy react react-native streaming streaming-service webscraping

Last synced: 20 Apr 2025

https://github.com/milaan9/91_python_mini_projects

covid-19-india ipython-to-pdf js-in-python mini-program mini-project mini-projects miniprogram py-to-exe python-digital-clock python-games python-mini-projects python-tutor python-tutorial-github python-tutorial-notebook python-tutorials python4beginner python4datascience python4everybody tutor-milaan9 webscraping

Last synced: 05 Apr 2025

https://github.com/yaroslaff/nudecrawler

Crawl telegra.ph searching for nudes!

crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping

Last synced: 04 Apr 2025

https://github.com/currentslab/extractnet

A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package

author-extraction content-extraction date-extraction machine-learning news news-articles news-extraction news-extractor python text-cleaning text-mining web-scraping webscraping

Last synced: 17 Mar 2026

https://github.com/felipeall/transfermarkt-api

API service to get data from Transfermarkt

fastapi football players scraper soccer transfermarkt webscraping

Last synced: 17 Jan 2026

https://github.com/davidteather/everything-web-scraping

Learn everything web scraping with David Teather Codes on YouTube

course courses everything hacktoberfest hacktoerfest project-based-learning project-based-learning-courses project-based-tutorials python python-web-scraper python3 reverse-engineering web-scraping web-scraping-python web-scraping-tutorial webscraping youtube-series

Last synced: 27 Oct 2025

https://github.com/oxylabs/python-web-scraping-tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 16 May 2025

https://github.com/Olney1/ChatGPT-OpenAI-Smart-Speaker

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

agents ai artificial-intelligence chatgpt gpt-4 langchain langsmith openai smarthome smartspeaker speech-recognition speech-to-text tavily text-to-speech vision vision-and-language webscraping

Last synced: 07 Apr 2025

https://github.com/browserutils/kooky

Go code to read cookies from browser cookie stores.

browser cookies firefox go golang google-chrome safari webscraping

Last synced: 01 Apr 2026

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 28 Mar 2025

https://github.com/clueless-community/scrape-up

A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.

beautifulsoup hacktoberfest hacktoberfest2023 package pip python selenium webscraping

Last synced: 16 May 2025

https://github.com/kaliiiiiiiiii-vinyzu/patchright-nodejs

Undetected NodeJS version of the Playwright testing and automation library.

automation bot bots botting browser chrome chromedriver chromium cloudflare cloudflare-bypass playwright stealth undetectable undetected web-auto web-scraping webautomation webdriver webscraping

Last synced: 15 May 2025

https://github.com/ispras/web-scraper-chrome-extension

Web data extraction tool implemented as chrome extension

javascript scraping scraping-tool webscraping

Last synced: 16 May 2025

https://github.com/kboghe/NordVPN-switcher

Rotate between different NordVPN servers with ease. Works both on Linux and Windows without any required changes to your code!

nordvpn vpn webscraping

Last synced: 03 Apr 2025

https://github.com/cornelk/goscrape

Web scraper that can create an offline readable version of a website

go golang scraper webscraping

Last synced: 10 Apr 2025

https://github.com/paulrobello/par_scrape

AI assisted web scraping and data extraction

ai markdown webscraping

Last synced: 28 Jan 2026

https://github.com/hhhrrrttt222111/dorkify

Perform Google Dork search with Dorkify

dork dorkify google google-dorking google-dorks hacking hacktoberfest information-gathering osint osint-python python scraping web webscraping

Last synced: 12 May 2025

https://github.com/pwlmc/imghash

Perceptual image hashing for Node.js

computer-vision image-processing imghash webscraping

Last synced: 25 Apr 2026

https://github.com/alexjc/weboptout

Opt-Out tool to check Copyright reservations in a way that even machines can understand.

command-line-tool copyright data-ops ml-pipeline opt-out robots-txt terms-of-service webscraping

Last synced: 13 Sep 2025

https://github.com/brucedone/clock

Visual task scheduling system, streamlined into a single binary file (a web-based visual task scheduler; yes, just one binary solves all the problems!)

dag-scheduling gocron scheduler task taskflow visual web web-scheduler webscraping

Last synced: 09 Apr 2025

https://github.com/owainlewis/falkor

Open Source web scraping API. Falkor turns web pages into queryable JSON

webscraping webscrapper

Last synced: 14 Apr 2025

https://github.com/decryptr/decryptr

An extensible API for breaking captchas

captcha r rstats tidyverse webscraping

Last synced: 25 Aug 2025

https://github.com/sshh12/llm_osint

LLM OSINT is a proof-of-concept method of using LLMs to gather information from the internet and then perform a task with this information.

gpt-4 llms osint webscraping

Last synced: 04 Apr 2025

https://github.com/driscoll42/ebayMarketAnalyzer

Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel

ebay python scraping-websites webscraping

Last synced: 07 Apr 2025

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/serpapi/clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

automation command-line command-line-tool data-extraction data-extractor email email-extract-with-proxy email-extraction email-extractor email-marketing email-scraper open-source ruby rubygem serp social-media-scraper web-crawling webscraping

Last synced: 06 Apr 2025

https://github.com/ropensci/webchem

Chemical Information from the Web

cas-number chemical-information chemspider identifier r r-package ropensci rstats webscraping

Last synced: 22 Feb 2026

https://github.com/guilhermecgs/ir

Project for automatically calculating income tax on Bovespa trades. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, income tax, finance, yahoo finance, stocks, FII, ETF, python, crawler, webscraping, IR calculator

acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping

Last synced: 26 Apr 2025

https://github.com/aeksco/aws-pdf-textract-pipeline

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript

aws aws-cdk aws-textract cdk cloudformation data-pipeline dynamodb jest lambda pdf puppeteer s3 serverless sns textract typescript webscraping