Projects in Awesome Lists tagged with web-crawler
A curated list of projects in awesome lists tagged with web-crawler.
https://github.com/mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping
Last synced: 12 May 2025
https://github.com/ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
ai ai-scraping automated-scraper crawler html-to-markdown llm markdown rag scraping scraping-python web-crawler web-crawlers web-scraping
Last synced: 17 Oct 2025
https://github.com/scrapegraphai/scrapegraph-ai
Python scraper based on AI
ai ai-scraping automated-scraper crawler html-to-markdown llm markdown rag scraping scraping-python web-crawler web-crawlers web-scraping
Last synced: 15 Mar 2026
https://github.com/apifytech/apify-js
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping
Last synced: 06 Jul 2025
https://github.com/apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping
Last synced: 03 Nov 2025
https://github.com/crawlab-team/crawlab
Distributed web crawler admin platform for spider management, regardless of language or framework.
crawlab crawler crawling-tasks docker go platform scrapy scrapyd-ui spider spiders-management web-crawler webcrawler webspider
Last synced: 14 May 2025
https://github.com/ssssssss-team/spider-flow
A next-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing code.
crawler jsoup spider spider-flow web-crawler web-spider webcrawler webspider xpath
Last synced: 14 May 2025
https://github.com/adithya-s-k/omniparse
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
ingestion-api ocr omniparser parse-server parser-library vision-transformer web-crawler whisper-api
Last synced: 13 May 2025
https://github.com/firecrawl/firecrawl-mcp-server
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
batch-processing claude content-extraction data-collection firecrawl firecrawl-ai javascript-rendering llm-tools mcp mcp-server model-context-protocol search-api web-crawler web-scraping
Last synced: 07 Apr 2026
https://github.com/apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify automation beautifulsoup crawler crawling hacktoberfest headless headless-chrome pip playwright python scraper scraping web-crawler web-crawling web-scraping
Last synced: 06 Mar 2026
https://github.com/apache/nutch
Apache Nutch is an extensible and scalable web crawler
apache crawling hadoop java nutch web-crawler
Last synced: 13 May 2025
https://github.com/sjdirect/abot
Cross-platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
abot abot-nuget c-sharp crawler cross-platform csharp csharp-library javascript-renderer netcore netcore2 netcore3 netsta netstandard20 netstandard21 parsing pluggable spider spiders unit-testing web-crawler
Last synced: 13 May 2025
https://github.com/xianhu/pspider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
crawler multi-threading multiprocessing proxies python python-spider spider web-crawler web-spider
Last synced: 15 May 2025
https://github.com/xianhu/PSpider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
crawler multi-threading multiprocessing proxies python python-spider spider web-crawler web-spider
Last synced: 25 Mar 2025
https://github.com/microlinkhq/browserless
The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, extract text and HTML with a production-ready API.
automation browser-automation chromium lighthouse pdf-generation screenshot web-crawler web-scraping
Last synced: 06 Mar 2026
https://github.com/Algebra-FUN/WeReadScan
A crawler that scans books purchased on WeChat Read (微信读书) and downloads them as local PDFs.
book-downloader selenium web-crawler weread
Last synced: 17 Oct 2025
https://github.com/webrecorder/browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
crawler crawling wacz warc web-archiving web-crawler webrecorder
Last synced: 10 Feb 2026
https://github.com/apache/stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
apache-storm crawler distributed java stormcrawler web-crawler
Last synced: 13 Feb 2026
https://github.com/algebra-fun/wereadscan
A crawler that scans books purchased on WeChat Read (微信读书) and downloads them as local PDFs.
book-downloader selenium web-crawler weread
Last synced: 15 May 2025
https://github.com/apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
apache-storm crawler distributed java stormcrawler web-crawler
Last synced: 12 Apr 2025
https://github.com/platonai/pulsarRPA
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.
ai-agents ai-crawler ai-rpa ai-scrarper crawler rpa scraper scraping web-crawler web-scraping
Last synced: 01 Apr 2025
https://github.com/postmodern/spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
crawler ruby scraper spider spider-links web web-crawler web-scraper web-scraping web-spider
Last synced: 13 May 2025
https://github.com/gildas-lormeau/single-file-cli
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
archiving cli crawler deno dockerfile nodejs scraping-websites single-file web-archiving web-crawler web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/MarginaliaSearch/MarginaliaSearch
Internet search engine for text-oriented websites. Indexing the small, old and weird web.
alt-search indexer internet-search language-processing no-ai-used no-cloud search-engine small-web web-crawler
Last synced: 05 Apr 2025
https://github.com/PhialsBasement/LibreCrawl
Free desktop SEO crawler - open source alternative to Screaming Frog and similar tools. Crawl websites, analyze links, extract SEO data, and export results without subscription fees. Fully customizable and extensible!
desktop-app flask free open-source python seo seo-analysis web-crawler website-auditing
Last synced: 06 May 2026
https://github.com/scrapfly/scrapfly-scrapers
Scalable Python web scraping scripts for 40+ popular domains
antibot automation captcha-bypass crawler crawling crawling-python datascraping proxies python python-scraper scraper scraping scraping-python spider twitter-scraper web-crawler web-scraping web-scraping-python webscraper webscraping
Last synced: 11 Apr 2025
https://github.com/VIDA-NYU/ache
ACHE is a web crawler for domain-specific search.
domain-specific-search focused-crawler hacktoberfest web-crawler web-scraping web-search web-spider
Last synced: 03 Apr 2025
https://github.com/vida-nyu/ache
ACHE is a web crawler for domain-specific search.
domain-specific-search focused-crawler hacktoberfest web-crawler web-scraping web-search web-spider
Last synced: 04 Apr 2025
https://github.com/ViDA-NYU/ache
ACHE is a web crawler for domain-specific search.
domain-specific-search focused-crawler hacktoberfest web-crawler web-scraping web-search web-spider
Last synced: 25 Mar 2025
https://github.com/hyunwoongko/kochat
Open-source Korean chatbot framework
chatbot deep-learning deeplearning korean korean-chatbot sentence-classification sequance-tagging web-crawler
Last synced: 05 Apr 2025
https://github.com/0xMassi/webclaw
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
ai ai-agents ai-scraping cli crawler data-extraction html-to-markdown llm markdown mcp mcp-server rust scraper self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping webscraping
Last synced: 04 Apr 2026
https://github.com/USCDataScience/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
big-data distributed-systems information-retrieval nutch search search-engine solr spark tika web-crawler
Last synced: 25 Mar 2025
https://github.com/brendonboshell/supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
crawler distributed-crawler robot sitemap web-crawler
Last synced: 12 Jan 2026
https://github.com/crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
crawler crawling hacktoberfest php scraper scraping scraping-websites web-crawler web-crawling web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/rivermont/spidy
The simple, easy-to-use command-line web crawler.
crawler crawling python python3 web-crawler web-spider
Last synced: 16 Jan 2026
https://github.com/commoncrawl/news-crawl
News crawling with StormCrawler - stores content as WARC
apache-storm common-crawl commoncrawl crawler news storm-crawler warc web-crawler
Last synced: 12 Jun 2025
https://github.com/infinilabs/crawler
🕷️ An easy-to-use spider written in Golang (previously named GOPA).
crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider
Last synced: 11 Apr 2026
https://github.com/s0rg/crawley
The unix-way web crawler
cli crawler go golang golang-application pentest pentest-tool pentesting unix-way web-crawler web-scraping web-spider
Last synced: 16 May 2025
https://github.com/yields/ant
A web crawler for Go
go golang scraper spider web-crawler
Last synced: 16 May 2025
https://github.com/lucasxlu/LagouJob
Data Analysis & Mining for lagou.com
data-analysis data-mining lagou machine-learning nlp python3 web-crawler
Last synced: 18 Jul 2025
https://github.com/antchfx/antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
crawler crawling framework golang scraping web-crawler web-spider
Last synced: 14 Mar 2025
https://github.com/crawler-commons/crawler-commons
A set of reusable Java components that implement functionality common to any web crawler
java library open-source robots-txt robotstxt sitemaps web-crawler
Last synced: 06 Mar 2026
https://github.com/turnersoftware/infinitycrawler
A simple but powerful web crawler library for .NET
crawler robots-txt spider web-crawler web-crawling
Last synced: 21 Jun 2025
https://github.com/TurnerSoftware/InfinityCrawler
A simple but powerful web crawler library for .NET
crawler robots-txt spider web-crawler web-crawling
Last synced: 25 Mar 2025
https://github.com/crawlab-team/crawlab-lite
Lite version of Crawlab: a lightweight Crawlab crawler management platform.
crawlab crawler crawler-management crawling-tasks platform scrapy scrapy-ui scrapyd scrapyd-ui spider web-crawler
Last synced: 28 Jan 2026
https://github.com/mendableai/firecrawl-app-examples
🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.
ai ai-scraping data examples html-to-markdown llm markdown rag scrapers templates web-crawler
Last synced: 13 Apr 2025
https://github.com/elliotxx/zhihu-crawler-people
A simple distributed crawler for Zhihu, plus data analysis
crawler python python-crawler spider web-crawler web-spider
Last synced: 13 Apr 2025
https://github.com/gosom/scrapemate
Golang crawling and scraping framework
crawler go go-framework golang scraper spider web-crawler web-scraping
Last synced: 31 Jan 2026
https://github.com/Hecate2/Ignareo-ISML-auto-voter
Ignareo the Carillon, a web crawler/spider template built for extremely high concurrency, made for leprechauns. Carillons are the best web spiders; long live the golden years of leprechauns! (ISML = International Saimoe; the 2022 ISML was the last ISML.)
asyncio chtholly concurrency distributed gevent high-performance http ignareo isml microservice python sukamoka sukasuka tiat web-crawler web-spider
Last synced: 11 Apr 2025
https://github.com/saeeddhqan/evine
Interactive CLI Web Crawler
cli crawler data-mining fuzzing go golang osint scraper web-crawler
Last synced: 12 Jan 2026
https://github.com/norconex/crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
collector-fs collector-http crawler crawlers filesystem-crawler flexible java search-engine web-crawler
Last synced: 05 May 2026
https://github.com/madi-s/lead-generation
A Python script that lets people with no programming background generate leads at scale. This repo will collect various techniques used in lead generation.
chromedriver lead-generation leads leadscanner parser playwright python scraper web-crawler
Last synced: 06 Jul 2025
https://github.com/abaykan/CrawlBox
An easy way to brute-force web directories.
admin-finder crawler python web-crawler wordlist
Last synced: 26 Mar 2025
https://github.com/sjdirect/abotx
Cross-platform C# web crawler framework, headless browser, and parallel crawler. Please star this project! +1.
abotx abotx-website cross-platform csharp csharp-library framework headless headless-br headless-browser javascript-renderer netcore netcore3 netstan netstandard netstandard-libraries netstandard20 spider spiders spiders- web-crawler
Last synced: 09 Apr 2025
https://github.com/mazzzystar/proxy
A simple tool for fetching usable proxies from several websites.
proxies proxy-list proxypool web-crawler
Last synced: 09 Jan 2026
https://github.com/hominee/dyer
Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.
crawler rust rust-programming-language spider web-crawler web-framework web-scraping
Last synced: 11 Mar 2026
https://github.com/maxvalue/terpene-profile-parser-for-cannabis-strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
analysis aromatherapy bioinformatics biological-data biological-data-analysis cannabis cannabis-strains crawler data-science database health plants python python-3 scrapy terpene-profile terpenes web-crawler web-crawler-python web-crawling
Last synced: 22 Apr 2025
https://github.com/pinkpixel-dev/web-scout-mcp
A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI assistants to search the web and extract webpage content programmatically.
ai-assistant ai-tools cheerio content-extraction crawler duckduckgo duckduckgo-search google-search mcp mcp-server web-content web-crawler web-scraper web-scraping web-search web-search-agent
Last synced: 06 Mar 2026
https://github.com/kreuzberg-dev/kreuzcrawl
High-performance web crawling engine with bindings for 11 languages
crawling csharp elixir ffi golang java mcp php python ruby rust typescript wasm web-crawler web-scraping
Last synced: 24 May 2026
https://github.com/creekorful/bathyscaphe
Fast, highly configurable, cloud native dark web crawler.
architecture crawler crawling elasticsearch golang hidden-services kibana tor web-crawler
Last synced: 17 Mar 2025
https://github.com/viveckh/lilhomie
A machine learning project, implemented from scratch, that involves web scraping, data engineering, exploratory data analysis, and machine learning to predict housing prices in the New York Tri-State Area.
data-engineering eda housing-price-analysis housing-price-prediction machine-learning machine-learning-projects predictions random-forest-regressor scrapy-crawler spiders trulia web-crawler
Last synced: 09 Sep 2025
https://github.com/tech-engine/goscrapy
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
data-extraction go-scrapy golang goscraper scrapy spider web-crawler webscraper webscrapping
Last synced: 18 Jan 2026
https://github.com/scrapingant/amazon_scraper
Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt
amazon amazon-scraper amazon-scraping-library data-mining js node-js price-scraper price-scraping scrape-products scraper scraping scraping-api scraping-data scraping-python scraping-web scraping-websites web-crawler web-crawlers web-crawling
Last synced: 22 Aug 2025
https://github.com/ScrapingAnt/amazon_scraper
Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt
amazon amazon-scraper amazon-scraping-library data-mining js node-js price-scraper price-scraping scrape-products scraper scraping scraping-api scraping-data scraping-python scraping-web scraping-websites web-crawler web-crawlers web-crawling
Last synced: 06 Apr 2025
https://github.com/redcode-labs/unchain
A tool to find redirection chains in multiple URLs
golang reconnaissance redirection url url-redirection web-crawler
Last synced: 07 Apr 2025
https://github.com/redcode-labs/UnChain
A tool to find redirection chains in multiple URLs
golang reconnaissance redirection url url-redirection web-crawler
Last synced: 11 Jul 2025
https://github.com/spider-rs/spider-py
Spider ported to Python
crawler headless-chrome python scraper spider web-crawler
Last synced: 05 Apr 2025
https://github.com/mattdeitke/cvpr2019
Displays all accepted CVPR 2019 papers in an easy-to-parse format.
computer-vision cvpr2019 imagemagick lda python web-crawler web-crawler-python
Last synced: 13 Apr 2025
https://github.com/us/crw
Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search API with MCP server for AI agents. Drop-in Firecrawl-compatible API (/v1/scrape, /v1/crawl, /v1/search). 2.3x faster than Tavily, 1.5x faster than Firecrawl in 1K-URL benchmarks. 6 MB RAM, single binary. Self-host or use managed cloud.
ai ai-agents crawler data-extraction docker firecrawl firecrawl-alternative html-to-markdown llm markdown mcp mcp-server rust scraping-api self-hosted tavily-alternative web-crawler web-scraper web-scraping web-search-api
Last synced: 09 May 2026
https://github.com/scrapegraphai/scrapegraph-py
Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction
api json-schema python scrapegraph scraping sdk-js sdk-nodejs sdk-python web-crawler web-scraping web-scraping-python
Last synced: 21 Apr 2026
https://github.com/devopsgroup-io/siteshooter
📷 Automate full-website screenshots and PDF generation with multiple viewport support.
pdf-generation phantomjs salesforce screenshot seo sitemap web-crawler
Last synced: 13 Apr 2025
https://github.com/abo123456789/leek
Distributed task redisqueue (the simplest Python distributed function scheduling framework)
distribute-crawler kafka leek producer-consumer queue-tasks redis redisqueue sqlite3 thread-pool web-crawler
Last synced: 10 Mar 2026
https://github.com/graphlit/graphlit-mcp-server
Model Context Protocol (MCP) Server for Graphlit Platform
claude content-extraction content-ingestion data-collection llm-tools mcp-server model-context-protocol search-api unstructured-data web-crawler web-scraping
Last synced: 12 Oct 2025
https://github.com/cheng-lin-li/market-trend-prediction
A project for a knowledge-graph-building course. It combines historical stock prices with social media listening from customers to predict market trends in the Dow Jones Industrial Average (DJIA).
djia dow-jones-industrial-average facebook facebook-crawler jupyter knowledge-graph knowledge-graph-course lstm market-trend-prediction prediction python rnn semantic-web social-media-mining twitter twitter-crawler web-crawler yahoo-finance-api
Last synced: 03 May 2025
https://github.com/avilum/smart-url-fuzzer
Explore a domain's URLs quickly and efficiently using fuzzing techniques
fuzzers http pentest-scripts pentest-tool pentesting python python-script python3 script scripts security security-tools urls web-crawler web-scraping website whitehat
Last synced: 21 Mar 2025
https://github.com/scrapegraphai/scrapegraph-sdk
🕷️ Official Scrapegraph API SDK: Effortlessly extract content from any website. AI-powered. 🤖 Hassle-free web scraping made simple.
api scrapegraph scraping sdk-js sdk-nodejs sdk-python web-crawler web-scraping
Last synced: 26 Jun 2025
https://github.com/shenfe/puppeteer-service
🎠 Run headless Chrome (via Puppeteer) as a service.
headless-chrome puppeteer puppeteer-service web-crawler
Last synced: 14 Jun 2025
https://github.com/threenine/stop-web-crawlers-api
Stop Web Crawlers update API
Last synced: 13 Apr 2025
https://github.com/ahmedshahriar/youtube-comment-scraper
Dumps YouTube video comments to a CSV file from YouTube video links. The links can be supplied in a variable, a list, or a CSV file.
comment-parser csv data-mining-python data-science lxml pandas python python3 requests-library-python requests-module scraper scraping social-media web-crawler web-crawler-python web-scraping youtube youtube-crawler youtube-downloader youtube-scraper
Last synced: 09 Feb 2026
https://github.com/spk/maman
Rust web crawler that saves pages to Redis
crawler http spider web web-crawler
Last synced: 07 Oct 2025
https://github.com/spk/validate-website
Web crawler for checking the validity of your documents.
Last synced: 23 Apr 2025
https://github.com/laurentvv/crawl4ai-mcp
Web crawling tool that integrates with AI assistants via MCP (Model Context Protocol)
ai-tools crawl4ai mcp python3 web-crawler
Last synced: 24 Apr 2026
https://github.com/SylvainDe/ComicBookMaker
Script to fetch webcomics and use them to create ebooks.
beautiful-soup beautifulsoup comic comic-downloader comics download-comic ebook kindle mobi python web-crawler webcomic
Last synced: 21 Jul 2025
https://github.com/sylvainde/comicbookmaker
Script to fetch webcomics and use them to create ebooks.
beautiful-soup beautifulsoup comic comic-downloader comics download-comic ebook kindle mobi python web-crawler webcomic
Last synced: 17 Sep 2025
https://github.com/scrapingant/zoominfo_scraper
ZoomInfo scraper using rotating proxies and headless Chrome from ScrapingAnt
datamining leadgen leadgeneration python scraper scraping scraping-api scraping-data scraping-tool scraping-websites web-crawler web-crawler-python web-crawling web-harvesting zoominfo-client
Last synced: 11 Jun 2025
https://github.com/debugtalk/webcrawler
A web crawler based on requests-html, mainly intended for URL validation testing.
crawler requests-html web-crawler weblink
Last synced: 15 Apr 2025
https://github.com/leafrock/spiderx
A simple web-crawler development framework based on .Net Core.
csharp dotnetcore spider web-crawler
Last synced: 19 Apr 2025
https://github.com/scrapy/scrapy-bench
A CLI for benchmarking Scrapy.
benchmark-suite command-line-tool python scrapy scrapy-bench web-crawler
Last synced: 14 Apr 2025
https://github.com/sergio11/eclipserecon
🌑 EclipseRecon is a personal project developed during my cybersecurity learning journey 🛡️. It helps practice web reconnaissance 🌐 by identifying subdomains 🧩, site structures 🧭, and vulnerabilities 🐞 in a controlled environment 🧪.
blue-team bug-bounty cybersecurity ethical-hacking information-gathering owasp penetration-testing reconnaissance red-team scan-tools security security-analysis security-reporting security-tools subdomain-scanner vulnerability vulnerability-scanner web-application-security web-crawler web-security
Last synced: 06 Sep 2025
https://github.com/calebwin/frequent
A utility for crawling websites and building frequency lists of words
frequency-lists python web-crawler web-crawler-python word-frequency
Last synced: 09 Apr 2025
https://github.com/bartozzz/crawlerr
A simple and fully customizable web crawler/spider for Node.js with a server-side DOM. Comes with elegant, dead-simple APIs.
crawler jsdom nodejs scraper spider web-crawler
Last synced: 23 Apr 2025
https://github.com/HHN/crawler4j
Open Source Web Crawler for Java - A fork of yasserg/crawler4j
crawler crawler4j java spider web-crawler web-spider
Last synced: 05 Oct 2025
https://github.com/omkarcloud/botasaurus-starter
🚀 Official starter template for the Botasaurus scraping framework 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 23 Apr 2025
https://github.com/waynechang65/ptt-crawler
ptt-crawler is a web crawler module designed to scrape data from PTT.
api crawl crawler javascript nodejs ptt scrape scraper scraping spider typescript web-crawler webcrawler
Last synced: 08 Oct 2025
https://github.com/biraj21/web-wanderer
A multi-threaded web crawler written in Python, utilizing ThreadPoolExecutor and Playwright to efficiently crawl dynamically rendered web pages and download them.
data-extraction multithreading python web-crawler webcrawler
Last synced: 12 Jan 2026
https://github.com/hmarzban/pipe2time.ir
Web crawler for Time.ir that retrieves a JSON file: a Jalali, Qamari, and Miladi JSON calendar API.
calendar events ics jalali json-api miladi nodejs shamsi-calendar web-crawler
Last synced: 25 Jul 2025
https://github.com/bkeepers/spiderman
your friendly neighborhood web crawler
crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping
Last synced: 14 Oct 2025