Projects in Awesome Lists tagged with webcrawler
A curated list of projects in awesome lists tagged with webcrawler.
https://github.com/crawlab-team/crawlab
Distributed web crawler admin platform for managing spiders regardless of language or framework.
crawlab crawler crawling-tasks docker go platform scrapy scrapyd-ui spider spiders-management web-crawler webcrawler webspider
Last synced: 14 May 2025
https://github.com/ssssssss-team/spider-flow
A new-generation crawler platform that defines crawl flows graphically, so a crawler can be built without writing code.
crawler jsoup spider spider-flow web-crawler web-spider webcrawler webspider xpath
Last synced: 14 May 2025
https://github.com/generalnewsextractor/generalnewsextractor
General-purpose extractor for the main text of news web pages (Beta version).
Last synced: 14 May 2025
https://github.com/zorlan/skycaiji
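SkyCaiji (蓝天采集器) is a free, open-source crawler system that collects data through point-and-click rule editing. It runs locally, on virtual hosts, or on cloud servers; can collect almost any type of web page; integrates seamlessly with various CMS site builders; publishes data in real time without logging in; and is fully automated with no manual intervention. It is a fully cross-platform cloud crawler system for large-scale web data collection.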
crawler crawling php spider webcrawler
Last synced: 14 May 2025
https://github.com/amirgamil/apollo
A Unix-style personal search engine and web crawler for your digital footprint.
personal-search poseidon search unix-like webcrawler
Last synced: 08 Apr 2025
https://github.com/scrapinghub/scrapyrt
HTTP API for Scrapy spiders
crawler crawling hacktoberfest hacktoberfest2021 python scraper scrapy twisted webcrawler webcrawling
Last synced: 15 May 2025
https://github.com/3nock/spidersuite
Advanced web security spider/crawler
bugbounty cplusplus crawler gui information-gathering osint-tool pentest qt5 recon security-tools spider web-spider webcrawler
Last synced: 29 Oct 2025
https://github.com/z0m31en7/uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites
Last synced: 15 May 2025
https://github.com/jaeksoft/opensearchserver
Open-source Enterprise Grade Search Engine Software
crawler custom-search enterprise indexing java lucene ocr opensearchserver search search-engine synonyms webcrawler webcrawling
Last synced: 04 Apr 2025
https://github.com/kingname/sourcecodeofbook
Companion source code for the book 《Python爬虫开发 从入门到实战》 (Python Crawler Development: From Beginner to Practice).
python python3 requests scrapy webcrawler
Last synced: 05 Apr 2025
https://github.com/salimk/rcrawler
An R web crawler and scraper
crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping
Last synced: 12 Apr 2025
https://github.com/adrianosferreira/afrodite.json
The largest cookbook in the Portuguese language
javascript mongodb nodejs webcrawler
Last synced: 12 Apr 2025
https://github.com/mehmetozkaya/dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping
Last synced: 11 May 2025
https://github.com/dedsecinside/gotor
This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.
cli command-line command-line-tool docker go golang golang-server hacktoberfest http-server information-extraction osint osint-tools rest-api service tor torbot webcrawler webcrawling webscraping
Last synced: 09 Apr 2025
https://github.com/alex-on-ai/WebReaper
AI-native web scraper. Single binary with a bundled Claude Code skill. MIT-licensed alternative to Firecrawl.
ai-agents-automation claude-code crawler dotnet firecrawl-alternative llm markdown mcp parser parsing scraper scraping scraping-api scraping-web scraping-websites webcrawler webscraping
Last synced: 14 Jun 2026
https://github.com/voliveirajr/seleniumcrawler
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
asp-net python scraper scraping scraping-websites scrapper scrapy selenium selenium-webdriver webcrawler webcrawling
Last synced: 11 Oct 2025
https://github.com/pavlovtech/WebReaper
Web scraper, crawler and parser in C#. Designed as simple, declarative and scalable web scraping solution.
crawler datamining parser parsing scraper scraping scraping-api scraping-data scraping-tool scraping-web scraping-websites webcrawler webscraping
Last synced: 08 Apr 2025
https://github.com/aavache/llmwebcrawler
A Web Crawler based on LLMs implemented with Ray and Huggingface. The embeddings are saved into a vector database for fast clustering and retrieval. Use it for your RAG.
api distributed-computing fastapi huggingface large-language-models llm machine-learning milvus nlp pydantic python rag ray raylib transformer vector-database webcrawler webcrawling
Last synced: 23 Oct 2025
https://github.com/shenxiangzhuang/pythondataanalysis
The data and code used in my book.
data-science python3 webcrawler
Last synced: 08 Aug 2025
https://github.com/hfreire/browser-as-a-service
A web browser :earth_americas: hosted as a service, to render your JavaScript web pages as HTML
browser browser-as-a-service crawler docker github-actions javascript puppeteer rest-api scraper server webcrawler
Last synced: 11 Sep 2025
https://github.com/robsonbittencourt/gafanhoto
Bot for monitoring promotions on the Hardmob forum: http://www.hardmob.com.br/promocoes/
chatbot gafanhoto hardmob promocoes telegram webcrawler
Last synced: 10 Apr 2025
https://github.com/deuxhuithuit/algolia-webcrawler
Simple node worker that crawls sitemaps in order to keep an algolia index up-to-date
algolia algolia-webcrawler indexing javascript search-engine webcrawler
Last synced: 09 Jul 2025
https://github.com/Conso1eCowb0y/Deepminer
Deep web crawler and search engine
crawler crawling dark-web data-mining deepminer deepweb github hacking onion osint python-web-scraper python3 search-engine security security-tools spider the-onion-router tor tor-network webcrawler
Last synced: 20 Apr 2025
https://github.com/kshru9/web-crawler
A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures
concurrency concurrent-data-structure cpp crawler data-structures html-parser lock multithreading openssl pagerank pthread reader-writer-lock search-engine socket threading threadsafe webcrawler website-downloader
Last synced: 23 Mar 2025
https://github.com/opencharles/charles
Java web crawling library
dynamic selenium webcrawler webdriver
Last synced: 08 Apr 2025
https://github.com/parth-vader/fb-spider
Accepts a page name and shows latest posts and comments in a new browser window.
facebook-api graph graph-api spider webcrawler
Last synced: 15 Apr 2025
https://github.com/marcel0024/cococrawler
A declarative and easy-to-use web crawler and scraper in C#
cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper
Last synced: 10 Apr 2025
https://github.com/gdgd009xcd/RequestRecorder
A ZAPROXY Add-on that allows testing of web application vulnerabilities by recording complex multi-step sequences. You can test applications that need to access pages in a specific order, such as shopping carts or registration of member information.
activescan addon authentication csrf multistep multistep-form security security-testing security-tools vulnerability-scanners web-security webcrawler websecurity zap-extension zaproxy
Last synced: 31 Oct 2025
https://github.com/waynechang65/ptt-crawler
ptt-crawler is a web crawler module designed to scrape data from Ptt.
api crawl crawler javascript nodejs ptt scrape scraper scraping spider typescript web-crawler webcrawler
Last synced: 08 Oct 2025
https://github.com/biraj21/web-wanderer
A multi-threaded web crawler written in Python, utilizing ThreadPoolExecutor and Playwright to efficiently crawl dynamically rendered web pages and download them.
data-extraction multithreading python web-crawler webcrawler
Last synced: 12 Jan 2026
https://github.com/bkeepers/spiderman
your friendly neighborhood web crawler
crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping
Last synced: 14 Oct 2025
https://github.com/code-yeongyu/trackpurchase
Scrape payment history from various shopping platforms with just a few lines of code!
crawlwer puppeteer webcrawler webscraper webscraping
Last synced: 14 Aug 2025
https://github.com/ddayto21/lead-scraper
This repository contains a web crawler that searches for emails in a webpage, along with a web scraping script that collects leads from various webpages, filters those links based on some criteria, and adds the new links to a queue. All the HTML, or some specific information, is extracted to be processed by a different pipeline.
beautifulsoup4 python requests webcrawler webscraper yellow-pages
Last synced: 03 Sep 2025
https://github.com/raspi/scrapy-intel-ark
Web crawler for Intel ARK (ark.intel.com)
hardware intel python scrapy spider webcrawler
Last synced: 05 Oct 2025
https://github.com/yufree/scifetch
Web page crawling tools for PubMed, Google Scholar, and RSS
google-scholar pubmed r rss webcrawler
Last synced: 18 Mar 2025
https://github.com/deep5050/abosar
অবসর 📚 A collection of short Bengali stories web scraped from various Bengali eMagazines and eNewspapers.
bengali cron-jobs stories web-scraper web-scraping webcrawler
Last synced: 14 Jul 2025
https://github.com/geminidsystems/googlenewsscraper
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)
crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper
Last synced: 13 Aug 2025
https://github.com/jacraig/spidey
A multi-threaded web crawler library that is generic enough to allow different engines to be swapped in.
Last synced: 12 Aug 2025
https://github.com/nomomon/kamernet-puppeteer
:house: Automatic message sender to new adverts on Kamernet using puppeteer.
automation kamernet message-sender netherlands node nodejs puppeteer webcrawler
Last synced: 12 Apr 2025
https://github.com/cutta/eksiseyler
Sample MVP project that uses jsoup web crawling like an API
android dagger2 dagger2-mvp glide jsoup mvp retrofit2 rxandroid2 rxjava2 webcrawler
Last synced: 24 Jul 2025
https://github.com/bjoern-hempel/php-web-crawler
A PHP class that crawls a given URL and recursively collects data from it. The final representation is a JSON object.
crawler mit-license php recursive webcrawler webscraper xpath
Last synced: 11 Apr 2025
https://github.com/kingname/crawlerutility
Simplify the development of your webcrawler
python3 requests scrapy webcrawler
Last synced: 11 Jul 2025
https://github.com/michaelradu/web-crawler
A Web Crawler developed in Python.
crawler crawler-python crawlers python python-3 python-script python3 script scripting scripting-language scripts web web-crawler web-crawler-python web-crawlers web-crawling webcrawl webcrawler webcrawling
Last synced: 25 Jul 2025
https://github.com/0memo07/web-crawler
Web Crawler with Python
beautifulsoup4 bs4 crawler crawlers crawling crawling-python web-crawler web-crawler-python web-crawling webcrawler
Last synced: 24 Apr 2025
https://github.com/lewisakura/spiderboi
A web crawling library written in TypeScript.
spider typescript typescript3 web-crawler web-crawling web-spider webcrawler
Last synced: 12 Apr 2025
https://github.com/luizppa/web-crawler
A web crawler that collects and indexes web pages. Made with chilkat and gumbo parser.
chilkat cpp crawler webcrawler
Last synced: 17 Aug 2025
https://github.com/sreesh-mallya/bookmyshow-notify
A Python command-line app that notifies you when a show is available on bookmyshow.com.
beautifulsoup4 bookmyshow cli notifies python webcrawler
Last synced: 28 Jul 2025
https://github.com/madexploits/madrawler
Web crawler for finding easy endpoint
Last synced: 04 Jul 2025
https://github.com/ahmard/queliwrap
QueryList PHP web scraper wrapper
php querylist webcrawler webscraper
Last synced: 18 Mar 2025
https://github.com/vmarcosp/supervise-crawler
:male_detective: Supervise crawler
crawler esy ocaml reasonml webcrawler
Last synced: 13 May 2025
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 09 Apr 2025
https://github.com/lucasmendesl/mugiwara
:tophat: a simple web scraping to extract and download videos from animesproject.com
anime-downloader cli nodejs rxjs webcrawler webscraping
Last synced: 27 Feb 2026
https://github.com/shirokovnv/webcrawler
The service for crawling websites.
cassandra elixir-phoenix parser webcrawler
Last synced: 21 Jul 2025
https://github.com/moehmeni/ezweb
Easy to use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 06 Apr 2025
https://github.com/leelow/nightmare-screenshot-selector
👻 📷 A Nightmare plugin to easily take screenshots.
crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler
Last synced: 12 Apr 2025
https://github.com/leonardovff/socialbot
A robot that searches for pictures by hashtag on Facebook and Instagram
facebook hahstags instagram nodejs robot webcrawler
Last synced: 11 Apr 2025
https://github.com/robmch/mindfactory_crawling
A Python 3 Crawler for Mindfactory.de
crawler crawling data webcrawler webcrawling
Last synced: 07 May 2025
https://github.com/farkaskid/webcrawler
Simple and fast web crawler.
crawler go golang goroutines web webcrawler
Last synced: 14 Jan 2026
https://github.com/asabeneh/python
dictionaries loop python python3 regular-expression tuples webcrawler
Last synced: 13 Jun 2025
https://github.com/n3wjack/sitecrawler
A command-line based web crawler
crawler tool webcrawler webcrawling webdevelopment
Last synced: 07 Mar 2026
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 22 Apr 2025
https://github.com/datacollectionspecialist/web-crawler-in-python
Learn how to build a web crawler in Python with this step-by-step guide for 2025.
Last synced: 09 Mar 2026
https://github.com/simonsdave/cloudfeaster
Cloudfeaster Spider Development
docker python selenium-webdriver spider webcrawler
Last synced: 16 Mar 2026
https://github.com/elektrostudios/fhm-crawler-freehardmusic.com
Crawls download urls of albums from freehardmusic.com website
albums crawl crawler crawling desktop-app desktop-application dotnet music web-crawler web-crawling web-scraper web-scraping webcrawler webcrawling webscraper webscraping windows windows-app windowsapp winforms
Last synced: 19 Jul 2025
https://github.com/havardnyboe/dagenidag
Recreation of NRK's page 199 from Tekst-TV (teletext)
dagenidag nrk tekst-tv webcrawler
Last synced: 04 Aug 2025
https://github.com/odynvolk/bing-me-links
A simple node module for scraping Baidu, Bing, StartPage, Yahoo and Qwant
baidu bing javascript nodejs scraper startpage webcrawler yahoo
Last synced: 09 Oct 2025
https://github.com/victoralessander/smith
A toolkit that makes it easy to scrape the web.
beautifulsoup bot extract-information python python3 telegram webcrawler webscraping
Last synced: 15 Apr 2025
https://github.com/sadatrafsanjani/spider-web-crawler
A web crawler that implements the breadth-first search algorithm, built with Maven.
breadth-first-search jsoup webcrawler
Last synced: 15 May 2026
https://github.com/aimlpm/markcrawl
Fast Python web crawler for RAG and AI ingestion. Extracts clean Markdown from any site for LLMs and vector stores.
ai-agents anthropic-claude data-extraction gemini ingestion-pipeline llm markdown-extraction openai pgvector python rag sitemap-crawler structured-data supabase vector-database webcrawler
Last synced: 23 Apr 2026
https://github.com/bitebait/curry
🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real (USDxBRL) exchange rate at some stores in Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 15 Jan 2026
https://github.com/0000xffff/webgrab
web page: crawler / file scanner / downloader
crawler download downloader scrape scraper webcrawler
Last synced: 17 Apr 2026
https://github.com/mominurr/social-media-scraping
Social Media Scraping – Scrapes data from TikTok, LinkedIn, Facebook, and Twitter (X.com), including user profiles, posts, engagement metrics, and comments.
datascraping facebook-scraper linkedin-scraper pandas python scraper scraping selenium tiktok-scraper twitter-scraper webcrawler webcrawling webscraping
Last synced: 13 Apr 2026
https://github.com/agarwalkaushal/higher-education-recommendation
Higher Education Recommendation system using Python with Selenium API.
education pycharm-ide python recommender-system selenium-webdriver webcrawler
Last synced: 18 Feb 2026
https://github.com/raspi/scrapy-kuntavaalit2021-yle
Fetch YLE kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 25 Feb 2026
https://github.com/dearopen/django-easy-scraper
Django apps to scrape data from web page easily
automation django django-rest-framework python python3 webcrawler webcrawling webscraper webscraping
Last synced: 14 May 2026
https://github.com/moredure/drum
Golang implementation of the disk repository with update management (DRUM) framework as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in the paper "IRLbot: Scaling to 6 Billion Pages and Beyond"
Last synced: 24 May 2026
https://github.com/lucasmendesl/cinepolis-movies-extractor
A reactive command-line tool that extracts information from the Cinepolis Peru website
axios nodejs rxjs typescript webcrawler webscraping
Last synced: 17 Apr 2026
https://github.com/codera21/webcrawl-js
A simple web crawler built with axios and cheerio
axios cheerio javascript webcrawler webscraping
Last synced: 11 Nov 2025
https://github.com/th3-c0der/web-crawler
A simple WebCrawler for exploring and downloading content from web pages within a given domain/url.
th3-c0der th3-coder th3c0der th3coder tool tools web-tool webcrawl webcrawler webcrawlers webcrawling
Last synced: 19 Mar 2026
https://github.com/ibz-04/hudgent
Official code implementation for my Ready Tensor publication: an AI agent that retrieves data from an Islamic website and uses that data as alignment criteria when answering the user
ai-agent ai-alignment cython islamic-ai-agent open-source python search-agent turkish-nlp webcrawler whoosh
Last synced: 03 Oct 2025
https://github.com/jshyunbin/comment_crawler
Web crawler for online shopping mall comments using python selectolax and requests.
Last synced: 13 Oct 2025
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 11 Feb 2026
https://github.com/sgowdaks/nichirin
RAG and Webcrawler in a single package
llm rag retrieval-augmented-generation scraping webcrawler
Last synced: 26 Jan 2026
https://github.com/nikola352/cirilizator
Web app with tools for using Cyrillic script on the Serbian side of the Internet
flask postgresql python react webcrawler
Last synced: 10 Oct 2025
https://github.com/doomspork/maartz
A refactor of Maartz's web scraper. Context: https://twitter.com/maartz4/status/1248133734760615937
asynchronous-tasks elixir webcrawler
Last synced: 17 Feb 2026
https://github.com/galarzaa90/tibiakt
Kotlin library to fetch and parse Tibia.com pages.
jsoup jvm kotlin ktor tibia webcrawler
Last synced: 13 Jul 2025
https://github.com/nobrainghost/golamv2
Lightweight web crawler for emails, keywords, dead links, and dead domains, written in Go. Suitable for low-resource environments.
Last synced: 16 Jun 2025
https://github.com/rrmerugu/trawler
A data gathering/trawling framework to search and get information from web sources like bing
crawler-engine python search webcrawler
Last synced: 14 Jan 2026
https://github.com/elektrostudios/bt4g-torrent-magnet-scraper
Scrapes BT4G magnet links using configurable search and filtering rules.
bt4g command-line console-applications crawler dotnet magnet magnet-link scraper scraping searchengine torrent torrents vbnet web-crawler web-spider webcrawler webspider windows windows-10 windows-app
Last synced: 24 Jun 2026
https://github.com/gappeah/nike_web_crawler
This project involves web scraping Nike's product pages to extract product names, prices and links. The project showcases three different implementations of the web crawler using Selenium and BeautifulSoup. It also includes visualisation of the scraped data using Matplotlib and Seaborn.
beautifulsoup data-analysis data-visualization python selenium web-crawler web-scraper webcrawler webscraper webscraping webscraping-beautifulsoup
Last synced: 04 Jul 2025