Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-30 00:06:54 UTC
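The definition above says a crawler "systematically browses" the Web. One common way to do that is a breadth-first traversal with a visited set. The sketch below shows only that traversal. It assumes a hypothetical, caller-supplied `fetch_links(url)` function that downloads a page and returns its outgoing links, so the logic can run without network access. It is not taken from any project listed here.

```python
from __future__ import annotations

from collections import deque
from collections.abc import Callable, Iterable


def crawl(seed: str, fetch_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`; returns URLs in the order they were visited.

    `fetch_links` is a hypothetical callback (page URL -> linked URLs), injected
    so the traversal can be tested without real HTTP requests.
    """
    visited: set[str] = {seed}          # URLs already queued, never queued twice
    order: list[str] = []
    queue: deque[str] = deque([seed])
    while queue and len(order) < max_pages:
        url = queue.popleft()           # FIFO queue gives breadth-first order
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order
```

With a toy link graph standing in for the Web, `crawl("a", lambda u: web.get(u, []))` visits each page once, nearest pages first, and `max_pages` caps the crawl.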
https://github.com/kluhan/kraken
Kraken is a generic, mid-scale web crawler built specifically to crawl vertical data sources, such as YouTube or the Google Play Store.
celery crawler google-play-store python web-crawling
Last synced: 07 Sep 2025
https://github.com/saadali1996/goose-rest-api
REST API based on https://github.com/advancedlogic/GoOse for extracting article content
Last synced: 09 Mar 2026
https://github.com/liinen/vocalist-backend
Backend for vloom, implemented as a cloud service, using a dataset crawled from a karaoke website
connection-pool crawler express mysql ncloud-server pagination python3 selenium
Last synced: 13 Apr 2026
https://github.com/jofaval/webscraping
WebScraper: tools for scraping many websites that share the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 06 Oct 2025
https://github.com/microlinkhq/ua
Simple Redis primitives to incr() and top() user agents
crawler redis user-agent user-agent-parser
Last synced: 18 Mar 2026
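The incr()/top() idea above amounts to a frequency counter ranked by score. We have not checked the microlinkhq/ua API, so the sketch below is a hypothetical in-memory stand-in using Python's `collections.Counter`. In Redis the same pattern is usually built on a sorted set, with `ZINCRBY` to increment and `ZREVRANGE ... WITHSCORES` to read the top entries.

```python
from __future__ import annotations

from collections import Counter


class UserAgentCounter:
    """Hypothetical in-memory analogue of a Redis sorted set of user agents."""

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()

    def incr(self, user_agent: str, amount: int = 1) -> int:
        # Comparable to ZINCRBY: bump the score and return the new value.
        self._counts[user_agent] += amount
        return self._counts[user_agent]

    def top(self, n: int = 10) -> list[tuple[str, int]]:
        # Comparable to ZREVRANGE 0 n-1 WITHSCORES: highest counts first.
        return self._counts.most_common(n)
```

Keeping the counts in Redis instead of process memory is what lets several crawler or server instances share one tally.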
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 10 Mar 2025
https://github.com/arif98741/deadlink-checker-python
A Python tool to crawl websites and check for broken/dead links with detailed reporting in both text and PDF formats.
crawler crawling python python3 website-scraper
Last synced: 18 Apr 2026
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 25 Feb 2026
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 11 Feb 2026
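Respecting `robots.txt` means checking each URL against the site's rules before fetching it. treewalker itself is written in Erlang. The sketch below is not its code. It shows the same check with Python's standard `urllib.robotparser`, using an inline example `robots.txt` and a made-up agent name (`example-bot`) instead of downloading a real file.

```python
from urllib.robotparser import RobotFileParser

# Example rules (an assumption for illustration, not from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded


def allowed(url: str, agent: str = "example-bot") -> bool:
    """Return True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)
```

A crawler would call `allowed()` before every request and can use `parser.crawl_delay(agent)` to space out requests to the same host.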
https://github.com/sachin-kumar-2003/seocrawler
SEO Link Checker | Find Broken Links & Improve SEO. I have built an SEO Link Checker that helps businesses, marketers, and site owners scan their websites, detect broken or harmful links, and fix them quickly. This improves site health, user experience, and search rankings. Features: scan an entire website for broken internal and external links
beautifulsoup crawler fastapi reactjs seo seo-optimization
Last synced: 15 Apr 2026
https://github.com/dean9703111/shopee_find_mac
Find a cheap Mac that meets your required specs as quickly as possible
argparse crawler mac pip python python2 xlsxwriter
Last synced: 14 Apr 2026
https://github.com/nextlevelshit/fick
Fucking Incredible Command line King. Add CLI flavour to any website you like.
Last synced: 17 Feb 2026
https://github.com/jjlibra/bake-mediacrawler
NanmiCoder's self-media data crawling software
Last synced: 06 May 2025
https://github.com/exasol/error-code-crawler-maven-plugin
Validator and crawler for exasol-error-codes in Java code
catalog crawler error-handling error-report error-reporting exasol exasol-integration java unification
Last synced: 13 Oct 2025
https://github.com/carloocchiena/python_url_crawler
A script that, starting from a web page, iterates through all its links and appends them to a list; a rough way to collect all pages of a website.
beautifulsoup crawler python python3
Last synced: 12 Feb 2026
https://github.com/wangshouh/icourse163_script
A Python script for liking and commenting on China University MOOC (icourse163).
crawler icourse163 python requests
Last synced: 28 Mar 2025
https://github.com/qianbinbin/moebooru-crawler
Retrieve image links from moebooru-based sites, such as yande.re and konachan.com.
Last synced: 22 Oct 2025
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 12 Jan 2026
https://github.com/epigos/newsbot
A news bot written in Go for Dialogflow and Facebook messenger
autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed
Last synced: 22 Mar 2025
https://github.com/pinpox/go-random-downloader
Download HTML using "Random Page"
Last synced: 17 Aug 2025
https://github.com/bimmr/site-crawler
Chromium Extension: Crawl a website
chrome-extension crawler downloader sitemap
Last synced: 12 Mar 2026
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS, and Elasticsearch
angularjs crawler elasticsearch java search-engine
Last synced: 12 Apr 2026
https://github.com/eeriemyxi/nosori
Online image viewer for https://coomer.su and https://kemono.su
api coomer crawler docker image javascript kemono server typescript video viewer web
Last synced: 01 Aug 2025
https://github.com/jxeng/site-info-crawler
A tool for batch-crawling websites' titles, descriptions, and favicons.
Last synced: 30 May 2026
https://github.com/qin2dim/istockphoto-go
📸 Gracefully download datasets from iStockPhoto.
Last synced: 05 Apr 2025
https://github.com/spraakbanken/svt-crawler
Programme for crawling SVT's API for news articles and converting the data to XML.
Last synced: 07 Mar 2026
https://github.com/nbdy/prntscrngrb
prnt.sc / Lightshot crawler with nudity detection and text extraction into an SQLite database
crawler nudity-detection prntsc text-extraction
Last synced: 04 Oct 2025
https://github.com/tufayellus/linkedin-cv-downloader
Python-based GUI automation software for bulk-downloading LinkedIn CVs / LinkedIn résumés from a list of profile links
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 17 Mar 2025
https://github.com/spaceemotion/goodreads-browser
Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍
Last synced: 22 Jan 2026
https://github.com/sieep-coding/web-crawler
A simple web crawler implemented in Go.
Last synced: 09 Mar 2026
https://github.com/anyparser/anyparserjs
Anyparser Typescript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.
anyparser artificial-intelligence cache-augmented-generation crawler etl-pipeline graph-rag knowledgebase langchain microsoft-office microsoft-word ms-office n8n-nodes ocr pdf-extraction rag retrieval-augmented-generation text-extraction web-crawler
Last synced: 17 Feb 2026
https://github.com/joeri-abbo/python-credly-scraper
This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an
badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling
Last synced: 23 Sep 2025
https://github.com/panyanyany/vps_spider
VPS Spider powering https://findallvps.com
Last synced: 28 Feb 2025
https://github.com/brunojppb/airport-crawler
Simple and powerful CLI app to get worldwide airport information in JSON format
Last synced: 09 Jun 2026
https://github.com/amirhoseinsalimi/boxapi-python
Python client for https://boxapi.ir to crawl and read Instagram data.
crawler instagram instagram-api python python3
Last synced: 26 May 2026
https://github.com/akagi201/spy
A lightweight distributed web crawler
crawler distributed lightweight nsq
Last synced: 26 Feb 2025
https://github.com/tbarnes94/fortnite-weapons-bot
A bot that returns Fortnite weapon statistics based on input from Discord users. Written in TypeScript.
crawler discord discord-bot discord-js typescript2
Last synced: 06 Jan 2026
https://github.com/darealfreak/figure-tracker
Application that keeps watch over wishlisted figures on multiple sites and notifies you about auctions, sales, or sudden price drops
crawler figure-tracker monitoring
Last synced: 30 Mar 2025
https://github.com/eklem/browsercrawler
Crawls content from a site within the browser; a basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 02 Aug 2025
https://github.com/anzo52/jcrawl
Java web crawler
crawler java java-web-crawler web web-crawler
Last synced: 06 Mar 2026
https://github.com/yukito0209/is6941-ml-social-media
Code repository for the group project of the IS6941 Machine Learning & Social Media Analytics course, exploring applications of machine learning to social media data analysis.
bert city-university-of-hong-kong crawler data-collection llama machine-learning python sentiment-analysis social-media
Last synced: 01 Apr 2025
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for crawling Baidu Images, offering two ways to download images and making it easy for computer-vision researchers to build datasets.
Last synced: 04 Nov 2025
https://github.com/yidas/tw-stock-crawler-php
PHP Crawler for Taiwan Stock Data
crawler stock taiwan taiwan-stock-information taiwan-stock-market
Last synced: 25 Mar 2025
https://github.com/nzrsky/useragent-generator
High-performance User-Agent generator for Go. Zero-alloc bots, auto-updated browser versions from real usage data.
bot browser crawler go golang http scraping user-agent useragent
Last synced: 14 Apr 2026
https://github.com/erikmueller/jazmax
Crawl JAZ (Jahresarbeitszahl, seasonal performance factor) values for different heat pumps, depending on flow and return temperatures, from the JAZ calculator
crawler data-science efficiency green heatpump jaz
Last synced: 24 Mar 2025
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 18 Jun 2026
https://github.com/highbreed/web-crawler
A web crawler script that crawls the target website and lists its links
Last synced: 07 Jun 2026
https://github.com/lockblock-dev/crawlarr
Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.
Last synced: 18 Mar 2025
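Crawlarr's description names the core step of link following: find the anchor tags in a page and turn them into URLs to visit next. Crawlarr is written in Go; the sketch below is a Python illustration of that step only (not its code), using the standard `html.parser` and `urllib.parse` modules. `AnchorCollector` and `extract_links` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags in one HTML document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop the
                # #fragment so page.html and page.html#top count as one page.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    collector = AnchorCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Anchors without an `href` are skipped; a real crawler would usually also filter out schemes such as `mailto:` before queueing the results.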
https://github.com/ozakboy/taiwan-news-crawlers
.NET-based crawlers for Taiwan news (news data mapped to objects for ease of use)
crawler data-collection dataset-generation dotnet news taiwan webcrawlers
Last synced: 15 Apr 2025
https://github.com/eduardozepeda/go-web-crawler
A concurrent web crawler written in Go that looks for exposed .git and .env URIs.
crawler environment-variables git go pentesting security-audit
Last synced: 16 Apr 2026
https://github.com/cyberdolfi/serverrawler
ServerRawler is a Minecraft Server Crawler, written in Rust
crawler minecraft ratatui-rs rust seeker servercrawler serverseeker
Last synced: 04 Mar 2026
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 02 Mar 2026
https://github.com/ph-7/gettermails
GetterMails, Scraper
bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver
Last synced: 20 Apr 2026
https://github.com/maraf/staticsitecrawler
A simple util for crawling links from root URL and saving HTML documents.
Last synced: 21 Apr 2026
https://github.com/elky84/lol-crawler
Notifications when League of Legends (LoL) friends start and end a game.
crawler csharp docker dotnet web-crawler
Last synced: 07 May 2026
https://github.com/santhoshse7en/alcoholics-anonymous
Research project analysing public knowledge about Alcoholics Anonymous
aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api
Last synced: 07 May 2026
https://github.com/eduardosbcabral/desafio-tecnico-mp
Challenge: a file generator in C# that uses a web crawler and buffers to write the file to disk.
Last synced: 08 May 2026
https://github.com/ewertoncodes/mind-crawler
A simple api written in Rails to extract quotations from the Quotes to Scrape site.
Last synced: 14 May 2026
https://github.com/restuwahyu13/node-scraper-content
Example Node.js scraper for all programming content, using Puppeteer
crawler nodejs puppeter scrapper
Last synced: 14 May 2026
https://github.com/YGGverse/pulsarss
RSS Aggregator for Gemini Protocol
aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust
Last synced: 15 Jun 2026
https://github.com/viclafouch/pe-crawler
📌 An automated system that serves data extracted from the Google Help Center
crawler javascript nodejs postgresql sequelize
Last synced: 17 Apr 2026
https://github.com/shunk031/lineblogscraper
Scraper for LINE Blog in Scrapy
crawler lineblog scraper scrapy
Last synced: 17 Jun 2026
https://github.com/zanmato/shouting-robin
SEO Crawler focused on E-commerce
crawler developer-tools seo seo-tools
Last synced: 21 Jun 2026
https://github.com/nava45/simplempcrawler
Simple Multiprocessing Crawler in python
crawler multiprocessing python
Last synced: 22 Jun 2026
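Parallel crawling, as in the multiprocessing crawler above, usually means fetching several pages of the current frontier at once. The sketch below is not that repository's code. It swaps in threads (`concurrent.futures.ThreadPoolExecutor`) instead of processes, since fetching is mostly I/O-bound, and assumes a hypothetical `fetch_links(url)` callback. It returns every URL discovered within the `max_pages` budget.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


def crawl_concurrently(seeds: Iterable[str], fetch_links: Callable[[str], list[str]],
                       max_pages: int = 100, workers: int = 8) -> set[str]:
    """Level-by-level crawl in which each frontier is fetched in parallel.

    `fetch_links` is a hypothetical callback (page URL -> linked URLs).
    """
    seen: set[str] = set(seeds)
    frontier = list(seen)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(seen) < max_pages:
            next_frontier: list[str] = []
            # pool.map runs fetch_links concurrently and yields results in input order;
            # de-duplication happens here, in the single coordinating thread.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen
```

Keeping the `seen` set in the coordinating thread avoids any locking. A process-based version would need picklable fetch functions and would pay extra cost for inter-process communication.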
https://github.com/kahsolt/allchan
An image crawler for xChan(4chan/8ch/...) image board.
4chan 4chan-downloader 8chan crawler image-crawler
Last synced: 23 Jun 2026
https://github.com/anjackson/scrapy-url-frontier
A Scrapy module for URL Frontier integration
crawler frontier scrapy spider
Last synced: 23 Jun 2026
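A URL frontier decides which URL a crawler fetches next. At minimum it de-duplicates URLs and keeps the crawler polite to each host. The sketch below is a toy, single-process frontier written only to illustrate those two duties. It does not reflect the URL Frontier API or scrapy-url-frontier's code. `SimpleFrontier`, its methods, and the fixed per-host `delay` are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
from __future__ import annotations

import heapq
from collections import deque
from urllib.parse import urlsplit


class SimpleFrontier:
    """Toy frontier: drops duplicate URLs and waits `delay` seconds between hits to one host."""

    def __init__(self, delay: float = 1.0) -> None:
        self.delay = delay
        self.seen: set[str] = set()
        self.queues: dict[str, deque[str]] = {}      # pending URLs per host
        self.next_allowed: dict[str, float] = {}     # earliest next fetch per host
        self.ready: list[tuple[float, str]] = []     # heap of (time host is due, host)

    def add(self, url: str, now: float = 0.0) -> bool:
        if url in self.seen:
            return False
        self.seen.add(url)
        host = urlsplit(url).netloc
        if host not in self.queues:
            self.queues[host] = deque()
            due = max(now, self.next_allowed.get(host, now))
            heapq.heappush(self.ready, (due, host))
        self.queues[host].append(url)
        return True

    def next_url(self, now: float) -> str | None:
        """Return a URL whose host may be fetched at time `now`, or None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.next_allowed[host] = now + self.delay
        if self.queues[host]:
            heapq.heappush(self.ready, (now + self.delay, host))  # reschedule politely
        else:
            del self.queues[host]
        return url
```

Production frontiers add persistence, prioritisation, and per-host delays taken from `robots.txt`, but they keep the same shape of per-host queues plus a schedule.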
https://github.com/ysh329/stock-newspaper-crawler
[UNMAINTAINED] Crawl four kinds of finance newspaper corpora (from CCSTOCK.CN).
corpus crawled-data crawler database stock-newspaper-crawler
Last synced: 28 Apr 2026
https://github.com/johnvanderton/flysh
HTML web parser powered by jQuery and JSDOM
crawler crawler-engine dom dom-manipulation html javascript javascript-library jquery jsdom parser-library scraper typescript typescript-library web-crawler web-parser
Last synced: 03 Mar 2026
https://github.com/buaadreamer/buaastar
BUAA Star website: the major assignment for Beihang University's (BUAA) English-taught Python course, summer term 2021
crawler css flask html javascript python
Last synced: 28 Apr 2026
https://github.com/coverified/spider
A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)
akka crawler graphql hacktoberfest microservice spider
Last synced: 29 Apr 2026
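Following only URLs of the provided host domain(s) comes down to a scope check applied to every discovered link. The spider above is an Akka service; the sketch below is a Python illustration of such a check, not its code. It assumes an exact-host policy in which subdomains are out of scope unless listed explicitly, and it expects `allowed_hosts` in lowercase. `in_scope` is a hypothetical name.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def in_scope(url: str, allowed_hosts: set[str]) -> bool:
    """Return True if `url` is http(s) and its host is exactly one of `allowed_hosts`.

    Assumption: subdomains (e.g. blog.example.com) are NOT implicitly allowed.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname is lowercased and has any :port stripped.
    return (parts.hostname or "") in allowed_hosts
```

Whether to include subdomains is a policy decision; a looser variant could also accept hosts ending in `"." + allowed_host`.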
https://github.com/devkoriel/teslalarm-kr
🚀 Teslalarm KR Real-time, AI-powered Tesla news & price alerts tailored for the Korean market. Stay updated on price changes, new model releases, and more – delivered directly to your Telegram. 🔔 Join us and help revolutionize Tesla news in Korea!
Last synced: 04 Apr 2026
https://github.com/natshah/natshah-crawler
Natshah Crawler crawls a selected domain, including all its internal links and internal pages.
crawler database filter natshah-crawler
Last synced: 29 Apr 2026
https://github.com/polakosz/smf-scraper
You know, just for backup :smile: The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:
crawler csharp forum machines php scraper simple simplemachines smf
Last synced: 30 Apr 2026
https://github.com/mashukui/xhs_pic_tool
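Xiaohongshu image collection software developed in Python; supports downloading watermark-free images from Xiaohongshu notes and collecting note data, comment data, and more. Xiaohongshu crawler | Xiaohongshu watermark-free images | Xiaohongshu watermark-free download | Xiaohongshu comment crawler | Xiaohongshu collection tool | Xiaohongshu comment collection | Xiaohongshu collection software | Xiaohongshu data scraping | xiaohongshu | xhs | XHS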
crawler gui gui-application python-spider spider xhs xhs-downloader xhs-spider xiaohongshu xiaohongshu-downloader
Last synced: 04 Apr 2026
https://github.com/kapitanluffy/sunny-crawler
That moment when I tried learning things about "Big Data" and "Inverted Indexes"
big-data crawler inverted-index php search
Last synced: 30 Apr 2026
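An inverted index, the structure named above, maps each term to the documents that contain it, so a query only touches the postings for its own terms instead of scanning every page. sunny-crawler is written in PHP; the sketch below is a generic Python illustration with a deliberately simple tokenizer (lowercase alphanumeric runs) and AND-only queries. The function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of ASCII letters and digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """AND query: IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # .get avoids adding keys to the defaultdict
    return result
```

A crawler feeds `build_index` with the text of fetched pages keyed by URL; ranking (for example TF-IDF) would be layered on top of these postings.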
https://github.com/yuminn-k/crawling-tabelog
Crawling store information from tabelog
Last synced: 08 Jun 2026
https://github.com/manojahi/is-there-any-song-reference-in-article
Tells whether an article on a website contains any song references.
crawler lyrics-search python webscraping
Last synced: 28 Mar 2026
https://github.com/gnujoow/crawl-repo
Crawls basic information about GitHub repositories
crawler github github-api python3
Last synced: 03 May 2026
https://github.com/arshamroshannejad/scrapify
Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.
403-bypass arkose cloudflare crawler golang http-client scraper
Last synced: 18 Apr 2026
https://github.com/rebrowser/stubhub-dataset
StubHub secondary ticket market data: event listings with section, row, quantity, delivery type, ticket class, and 500+ venues across US, Canada, and Europe. Updated daily.
concert-tickets crawler data-collection data-science dataset event-tickets live-events open-data resale-tickets scraper secondary-market sports-tickets stubhub tickets web-scraping
Last synced: 03 May 2026
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 19 Apr 2026
https://github.com/ssv445/js-rendering-proxy-docker
JS-rendering proxy API for handling JavaScript-heavy websites in your crawler.
Last synced: 25 Apr 2026
https://github.com/alquipo/dragoongames-api
Dragoon Games Store - API (E-commerce)
crawler e-commerce-project graphql jsdom nodejs postgresql strapi-cms stripe
Last synced: 16 Apr 2026
https://github.com/enansari/guess-price-car
Car price estimation with machine learning, based on data from a car sales site; final project for Maktabkhooneh
crawler jadi machine-learning maktabkhoone maktabkhooneh python
Last synced: 10 May 2026
https://github.com/saketh7382/smartcrawler
Package for crawling items from web pages and storing them as JSON files
crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager
Last synced: 28 Apr 2026