Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-07-01 00:06:39 UTC
https://github.com/airtoxin/stackable-crawler
middleware based lightweight crawler framework
crawler javascript lightweight
Last synced: 13 Apr 2025
https://github.com/liinen/vocalist-backend
vloom backend implemented on a cloud service, with a dataset crawled from a karaoke website
connection-pool crawler express mysql ncloud-server pagination python3 selenium
Last synced: 13 Apr 2026
https://github.com/jemaf/stackoverflow-jobs
A wrapper for crawling data at Stack Overflow Jobs portal
crawler jobs python stack-overflow
Last synced: 14 Jan 2026
https://github.com/thiiagoms/dict-crawler
Simple crawler on UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 26 May 2026
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 18 Jun 2026
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS and Elasticsearch
angularjs crawler elasticsearch java search-engine
Last synced: 12 Apr 2026
https://github.com/telanflow/scrago
A micro crawler framework, implemented in Go.
crawler go micro-framework spider
Last synced: 25 Jun 2025
https://github.com/joelkoen/wls
Easily crawl multiple sitemaps and list URLs
Last synced: 12 Apr 2025
https://github.com/yukito0209/is6941-ml-social-media
Code repository for the IS6941 Machine Learning & Social Media Analytics course group project, exploring applications of machine learning in social media data analysis.
bert city-university-of-hong-kong crawler data-collection llama machine-learning python sentiment-analysis social-media
Last synced: 01 Apr 2025
https://github.com/omkarcloud/dentalkart-scraper
🚀 SCRAPE 1000'S OF PRODUCTS FROM DENTALKART 🤖
beautifulsoup crawler crawling crawling-framework crawling-python dentalkart dentalkart-product-scraper dentalkart-scraper dentalkart-scraping node-crawler scraper scraping scraping-framework scraping-python selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 07 Sep 2025
https://github.com/exasol/error-code-crawler-maven-plugin
Validator and crawler for exasol-error-codes in Java code
catalog crawler error-handling error-report error-reporting exasol exasol-integration java unification
Last synced: 13 Oct 2025
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 10 Mar 2025
https://github.com/qianbinbin/moebooru-crawler
Retrieve links of images from moebooru-based sites, like yande.re and konachan.com.
Last synced: 22 Oct 2025
https://github.com/sebi75/lightweight-sitemapper
A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies
Last synced: 21 Jan 2026
https://github.com/dean9703111/shopee_find_mac
Quickly find a cheap Mac that meets your required specs
argparse crawler mac pip python python2 xlsxwriter
Last synced: 14 Apr 2026
https://github.com/pedrohs1771/hyenzy-x-anime-scraper
A powerful all-in-one media scraper for Anime and Games with 4K Upscale (MPV) and Discord RPC.
anime-scrapper anime4k crawler discord-rpc game-downloader mpv-player playwright python upscale
Last synced: 30 May 2026
https://github.com/vmandic/tris-web-crawler
Tris is a simple NodeJS web crawler tool to help you collect links from visited links of a website's domain.
crawler data-tools nodejs scraping seo-tools web-scraper
Last synced: 20 May 2026
https://github.com/nakabonne/netsurfer
netsurfer is a very lightweight scraping framework
Last synced: 01 Apr 2025
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 23 Apr 2025
https://github.com/anyparser/anyparserjs
Anyparser Typescript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.
anyparser artificial-intelligence cache-augmented-generation crawler etl-pipeline graph-rag knowledgebase langchain microsoft-office microsoft-word ms-office n8n-nodes ocr pdf-extraction rag retrieval-augmented-generation text-extraction web-crawler
Last synced: 17 Feb 2026
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 13 Apr 2025
https://github.com/zabuzard/songcrawler
Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.
command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler
Last synced: 09 Jun 2026
https://github.com/jxeng/site-info-crawler
A tool for batch crawling website's title, description, favicon.
Last synced: 30 May 2026
https://github.com/kluhan/kraken
Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data sources, like YouTube or the Google Play Store.
celery crawler google-play-store python web-crawling
Last synced: 07 Sep 2025
https://github.com/zekrotja/r34-crawler
A simple CLI tool to fetch and download images from rule34.xxx
crawler go rest-api rule34 worker-pool xml
Last synced: 06 Mar 2026
https://github.com/travorlzh/temperature-analyzer
Python crawler that helps fetch temperature of Beijing, China
crawler homework python variance
Last synced: 25 Aug 2025
https://github.com/bimmr/site-crawler
Chromium Extension: Crawl a website
chrome-extension crawler downloader sitemap
Last synced: 12 Mar 2026
https://github.com/nbdy/prntscrngrb
prnt.sc / lightshot crawler, nudity detection and text extraction to a sqlite database
crawler nudity-detection prntsc text-extraction
Last synced: 04 Oct 2025
https://github.com/first-coding/django-and-web
This is a Django web project with separated front end and back end.
Last synced: 16 Feb 2026
https://github.com/nemmusu/free-vpn-downloader
This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.
automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn
Last synced: 07 Feb 2026
https://github.com/qiubits2007/xml-sitemap
Multi-domain XML sitemap generator with support for robots.txt, meta tags, email logging & search engine pinging
crawler generator gzip multi-domain php8 robots-txt seo seotools sitemap-builder sitemap-generator sitemap-xml
Last synced: 25 Feb 2026
https://github.com/lucky845/animetimeline
Uses a Python script to crawl an anime information timeline and save it as a Markdown file.
Last synced: 09 Jul 2025
https://github.com/supadata-ai/py
Official Python SDK for the Supadata API.
ai api crawler llm markdown scraping sdk transcript web-scraper youtube
Last synced: 22 Mar 2025
https://github.com/xiantang/mini_scrapy
A lightweight crawler framework modeled on Scrapy
crawler python3 requets scrapy
Last synced: 27 Mar 2025
https://github.com/jjlibra/bake-mediacrawler
NanmiCoder's self-media data crawling software
Last synced: 06 May 2025
https://github.com/reisdev/reads
Real Estate Agency Data Scraper
crawler python scraping scrapy selenium-python selenium-webdriver spider
Last synced: 31 Jan 2026
https://github.com/norconex/committer-neo4j
Implementation of Norconex Committer for Neo4j.
crawler neo4j neo4j-committer norconex-committer
Last synced: 19 Jan 2026
https://github.com/afsh7n/crawly-automation
Crawly Automation is a lightweight, modular, and extensible web crawling framework built on top of Puppeteer. Whether you need to scrape data, automate browser interactions, manage CAPTCHAs, or handle advanced data extraction, Crawly Automation simplifies the process.
automation crawler nodejs puppeteer webscraping
Last synced: 25 Feb 2026
https://github.com/spraakbanken/svt-crawler
Programme for crawling SVT's API for news articles and converting the data to XML.
Last synced: 07 Mar 2026
https://github.com/microlinkhq/ua
Simple Redis primitives to incr() and top() user agents
crawler redis user-agent user-agent-parser
Last synced: 18 Mar 2026
https://github.com/tufayellus/linkedin-cv-downloader
A Python based GUI automation software for downloading bulk LinkedIn CV / LinkedIn Resume from a list of profile links
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 17 Mar 2025
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for downloading images from Baidu Images, a useful tool for CV researchers making datasets!
Last synced: 04 Nov 2025
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 06 Feb 2026
https://github.com/fabrix-app/spool-scraper
Spool: Webscraper
cheerio crawler fabrix nodejs scraping spools typescript webscraper
Last synced: 09 May 2026
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 11 Feb 2026
https://github.com/superreal/octopus
Recursive and multi-threaded broken link checker
Last synced: 14 May 2026
https://github.com/darealfreak/figure-tracker
Application to keep watch on wished-for figures across multiple sites and notify you about auctions, sales or sudden price drops
crawler figure-tracker monitoring
Last synced: 30 Mar 2025
https://github.com/genfuture/cryptocurrency-scraper
Cryptocurrency Data Crawler 🚀 Updates CoinData every 12 hours. High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from the CoinGecko API. Collects market data and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 28 Jan 2026
https://github.com/polakosz/smf-scraper
You know, just for backup 😄 - The only, and so the best, Simple Machines Forum C# scraper on GitHub 🐱
crawler csharp forum machines php scraper simple simplemachines smf
Last synced: 30 Apr 2026
https://github.com/santhoshse7en/alcoholics-anonymous
Research project analysing public knowledge about Alcoholics Anonymous
aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api
Last synced: 07 May 2026
https://github.com/rebrowser/stubhub-dataset
StubHub secondary ticket market data: event listings with section, row, quantity, delivery type, ticket class, and 500+ venues across US, Canada, and Europe. Updated daily.
concert-tickets crawler data-collection data-science dataset event-tickets live-events open-data resale-tickets scraper secondary-market sports-tickets stubhub tickets web-scraping
Last synced: 03 May 2026
https://github.com/kapitanluffy/sunny-crawler
That moment when I tried learning things about "Big Data" and "Inverted Indexes"
big-data crawler inverted-index php search
Last synced: 30 Apr 2026
https://github.com/restuwahyu13/node-scraper-content
Example Node scraper for programming content, using Puppeteer
crawler nodejs puppeter scrapper
Last synced: 14 May 2026
https://github.com/ysh329/stock-newspaper-crawler
[UNMAINTAINED] Crawls four kinds of financial newspaper corpora (from CCSTOCK.CN).
corpus crawled-data crawler database stock-newspaper-crawler
Last synced: 28 Apr 2026
https://github.com/YGGverse/pulsarss
RSS Aggregator for Gemini Protocol
aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust
Last synced: 15 Jun 2026
https://github.com/yuminn-k/crawling-tabelog
Crawling store information from tabelog
Last synced: 08 Jun 2026
https://github.com/eduardozepeda/go-web-crawler
A concurrent web crawler written in Go that looks for exposed .git and .env URIs.
crawler environment-variables git go pentesting security-audit
Last synced: 16 Apr 2026
https://github.com/elky84/lol-crawler
Notifications when a LoL friend's game starts and ends.
crawler csharp docker dotnet web-crawler
Last synced: 07 May 2026
https://github.com/maraf/staticsitecrawler
A simple utility for crawling links from a root URL and saving the HTML documents.
Last synced: 21 Apr 2026
https://github.com/shunk031/lineblogscraper
Scraper for LINE Blog in Scrapy
crawler lineblog scraper scrapy
Last synced: 17 Jun 2026
https://github.com/arshamroshannejad/scrapify
Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.
403-bypass arkose cloudflare crawler golang http-client scraper
Last synced: 18 Apr 2026
https://github.com/johnvanderton/flysh
HTML web parser powered by jQuery and JSDOM
crawler crawler-engine dom dom-manipulation html javascript javascript-library jquery jsdom parser-library scraper typescript typescript-library web-crawler web-parser
Last synced: 03 Mar 2026
https://github.com/mashukui/xhs_pic_tool
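Xiaohongshu image collection software developed in Python; supports downloading watermark-free images from Xiaohongshu notes and collecting note data, comment data, and more. Xiaohongshu crawler | Xiaohongshu watermark-free images | Xiaohongshu watermark-free download | Xiaohongshu comment crawler | Xiaohongshu collection tool | Xiaohongshu comment collection | Xiaohongshu collection software | Xiaohongshu data scraping | xiaohongshu | xhs | XHS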
crawler gui gui-application python-spider spider xhs xhs-downloader xhs-spider xiaohongshu xiaohongshu-downloader
Last synced: 04 Apr 2026
https://github.com/buaadreamer/buaastar
Beihang Planet website: major assignment for Beihang University's English-taught Python course, summer term 2021
crawler css flask html javascript python
Last synced: 28 Apr 2026
https://github.com/coverified/spider
A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)
akka crawler graphql hacktoberfest microservice spider
Last synced: 29 Apr 2026
https://github.com/zanmato/shouting-robin
SEO Crawler focused on E-commerce
crawler developer-tools seo seo-tools
Last synced: 21 Jun 2026
https://github.com/manojahi/is-there-any-song-reference-in-article
Tells whether an article from a website contains any song references.
crawler lyrics-search python webscraping
Last synced: 28 Mar 2026
https://github.com/gnujoow/crawl-repo
Crawls basic info about GitHub repositories
crawler github github-api python3
Last synced: 03 May 2026
https://github.com/nava45/simplempcrawler
Simple Multiprocessing Crawler in python
crawler multiprocessing python
Last synced: 22 Jun 2026
https://github.com/viclafouch/pe-crawler
📌 An automated system that serves data extracted from the Google Help Center
crawler javascript nodejs postgresql sequelize
Last synced: 17 Apr 2026
https://github.com/kahsolt/allchan
An image crawler for xChan (4chan/8ch/...) image boards.
4chan 4chan-downloader 8chan crawler image-crawler
Last synced: 23 Jun 2026
https://github.com/anjackson/scrapy-url-frontier
A Scrapy module for URL Frontier integration
crawler frontier scrapy spider
Last synced: 23 Jun 2026
https://github.com/natshah/natshah-crawler
Natshah Crawler crawls a selected domain, including all its internal links and internal pages.
crawler database filter natshah-crawler
Last synced: 29 Apr 2026
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 02 Mar 2026
https://github.com/ph-7/gettermails
GetterMails, Scraper
bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver
Last synced: 20 Apr 2026
https://github.com/devkoriel/teslalarm-kr
🚀 Teslalarm KR Real-time, AI-powered Tesla news & price alerts tailored for the Korean market. Stay updated on price changes, new model releases, and more – delivered directly to your Telegram. 🔔 Join us and help revolutionize Tesla news in Korea!
Last synced: 04 Apr 2026
https://github.com/eduardosbcabral/desafio-tecnico-mp
Challenge - File generator in C# using a web crawler and buffers to write the file to disk.
Last synced: 08 May 2026
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 19 Apr 2026
https://github.com/cyberdolfi/serverrawler
ServerRawler is a Minecraft Server Crawler, written in Rust
crawler minecraft ratatui-rs rust seeker servercrawler serverseeker
Last synced: 04 Mar 2026
https://github.com/ewertoncodes/mind-crawler
A simple API written in Rails to extract quotations from the Quotes to Scrape site.
Last synced: 14 May 2026
https://github.com/davideferre/covid19-data-crawler-ita
COVID-19 Italian data crawler
coronavirus covid19 crawler hacktoberfest hacktoberfest2021 python
Last synced: 03 Jun 2026
https://github.com/dizys/weibo-crawler
A Node.js Weibo crawler
crawler nodejs typescript weibo-spider
Last synced: 19 Apr 2026
https://github.com/chenty2333/tiktok-youtube_commentscraper
This lightweight tool allows you to collect public comments from TikTok and YouTube videos in batches, either via direct video URLs or keyword-based search. It's useful for data analysis, sentiment analysis, opinion mining, text mining, and building datasets for machine learning tasks.
comment crawler nlp scraper sentiment-analysis tiktok youtube
Last synced: 20 Apr 2026
https://github.com/luukalindgren/jobposts-utu
Web site for a database that holds job post data of IT jobs.
crawler docker fastapi mariadb react virtual-machine
Last synced: 29 Apr 2026
https://github.com/zukahai/formosa-views
View Formosa employee profile, salary, bonus year
bonus-year crawler css formosa html javascript nodejs python salary views
Last synced: 29 Apr 2026