Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/spraakbanken/svt-crawler
Programme for crawling SVT's API for news articles and converting the data to XML.
Last synced: 07 Mar 2026
https://github.com/Anakeyn/website-contextual-links
Retrieving the contextual links of a website with R.
Last synced: 17 Jul 2025
https://github.com/skulltech/arachnid
Crawling Instagram for reasons.
crawler instagram instagram-scraper python3 scraper scrapy
Last synced: 13 Jun 2025
https://github.com/krishpranav/spider
A Ruby web spidering tool that can spider a single site, multiple domains, specific links, or crawl indefinitely.
crawler ruby spider web-crawler web-scraping
Last synced: 14 Feb 2026
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 13 Apr 2025
https://github.com/joelkoen/wls
Easily crawl multiple sitemaps and list URLs
Last synced: 12 Apr 2025
https://github.com/viper373/xovideos
A personalized video download tool built for users.
boto3 crawler downloader githubactions m3u8 mongodb mp4 pornhub python s3-storage
Last synced: 16 Jun 2025
https://github.com/anyparser/anyparserjs
Anyparser Typescript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.
anyparser artificial-intelligence cache-augmented-generation crawler etl-pipeline graph-rag knowledgebase langchain microsoft-office microsoft-word ms-office n8n-nodes ocr pdf-extraction rag retrieval-augmented-generation text-extraction web-crawler
Last synced: 17 Feb 2026
https://github.com/empire/go-tse
Go client for http://www.tsetmc.com/
crawler financial-data financial-data-analysis market-data stock stock-prices tehran-stock-exchange tse tsetmc
Last synced: 01 Jun 2026
https://github.com/keosariel/ramby
Ramby is a simple way to set up a web scraper.
beautifulsoup crawler python3 webscraping
Last synced: 27 Mar 2025
https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse
[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.
browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms
Last synced: 06 Feb 2026
https://github.com/genfuture/cryptocurrency-scraper
Cryptocurrency Data Crawler 🚀 Updates CoinData every 12 hours. High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from the CoinGecko API. Collects market data and blockchain details, with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 28 Jan 2026
https://github.com/lucky845/animetimeline
Uses a Python script to crawl an anime release schedule and save it as a Markdown file.
Last synced: 09 Jul 2025
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 06 Feb 2026
https://github.com/xiantang/mini_scrapy
A lightweight crawler framework modeled on Scrapy.
crawler python3 requets scrapy
Last synced: 27 Mar 2025
https://github.com/anzo52/jcrawl
Java web crawler
crawler java java-web-crawler web web-crawler
Last synced: 06 Mar 2026
https://github.com/becky-dai/flower-knowledge-graph-visualization
A full-stack knowledge graph visualization project.
crawler css django echarts html js knowledge-graph neo4j python
Last synced: 11 Mar 2026
https://github.com/zhuruili/spider
Some simple crawler code, updated from time to time. Hope it helps.
crawler drissionpage python requests
Last synced: 18 Mar 2025
https://github.com/afsh7n/crawly-automation
Crawly Automation is a lightweight, modular, and extensible web crawling framework built on top of Puppeteer. Whether you need to scrape data, automate browser interactions, manage CAPTCHAs, or handle advanced data extraction, Crawly Automation simplifies the process.
automation crawler nodejs puppeteer webscraping
Last synced: 25 Feb 2026
https://github.com/reisdev/reads
Real Estate Agency Data Scraper
crawler python scraping scrapy selenium-python selenium-webdriver spider
Last synced: 31 Jan 2026
https://github.com/agricolamz/2017_andan_course
Course for ANDAN Summer School about strings and texts in R
crawler language-detection r regular-expressions rstats string-distance string-manipulation strings teaching teaching-materials text-analysis tf-idf tidytext
Last synced: 14 Jun 2025
https://github.com/nemmusu/free-vpn-downloader
This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.
automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn
Last synced: 07 Feb 2026
https://github.com/qiubits2007/xml-sitemap
Multi-domain XML sitemap generator with support for robots.txt, meta tags, email logging & search engine pinging
crawler generator gzip multi-domain php8 robots-txt seo seotools sitemap-builder sitemap-generator sitemap-xml
Last synced: 25 Feb 2026
https://github.com/vmandic/tris-web-crawler
Tris is a simple Node.js web crawler that helps you collect links from the pages it visits within a website's domain.
crawler data-tools nodejs scraping seo-tools web-scraper
Last synced: 20 May 2026
https://github.com/camara94/crawlers
Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts the underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.
crawler python scraping scrapy spider
Last synced: 09 Apr 2025
https://github.com/panyanyany/vps_spider
VPS Spider powering https://findallvps.com
Last synced: 28 Feb 2025
https://github.com/basemax/fakefaces
This repository contains a crawler that downloads thousands of fake human face images from various sources on the internet. Additionally, the repository includes a dataset of thousands of face images of fake humans.
crawler crawler-php crawler-testing crawlers curl dataset datasets face face-fake faces fake-face fake-faces php php-curl
Last synced: 27 Apr 2026
https://github.com/benderpan/fakeagent.net
Fake Agent for .Net Standard.
agent crawler fake-agent http-headers
Last synced: 12 Apr 2025
https://github.com/yidas/tw-stock-crawler-php
PHP crawler for Taiwan stock data
crawler stock taiwan taiwan-stock-information taiwan-stock-market
Last synced: 25 Mar 2025
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 12 Jan 2026
https://github.com/travorlzh/temperature-analyzer
Python crawler that fetches temperature data for Beijing, China
crawler homework python variance
Last synced: 25 Aug 2025
https://github.com/wangshouh/icourse163_script
A Python script for liking and commenting on China University MOOC.
crawler icourse163 python requests
Last synced: 28 Mar 2025
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS and Elasticsearch
angularjs crawler elasticsearch java search-engine
Last synced: 12 Apr 2026
https://github.com/jxeng/site-info-crawler
A tool for batch-crawling websites' titles, descriptions, and favicons.
Last synced: 30 May 2026
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 05 Jul 2025
https://github.com/yukito0209/is6941-ml-social-media
Code repository for the IS6941 Machine Learning & Social Media Analytics course group project, exploring applications of machine learning in social media data analysis.
bert city-university-of-hong-kong crawler data-collection llama machine-learning python sentiment-analysis social-media
Last synced: 01 Apr 2025
https://github.com/arpan404/spidey
Spidey is a powerful asynchronous web crawler built in Python that can crawl websites and download files with specified extensions. It's designed to be efficient, configurable, and easy to use.
asynchronous crawler dataminer opensource python webscraper
Last synced: 04 Feb 2026
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing from the clothing brand "Supreme". This was a fun project with no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 10 Mar 2025
https://github.com/nakabonne/netsurfer
netsurfer is a very lightweight scraping framework
Last synced: 01 Apr 2025
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 25 Feb 2026
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 23 Apr 2025
https://github.com/raspi/scrapy-kuntavaalit2021-yle
Fetch YLE kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 26 Apr 2025
https://github.com/first-coding/django-and-web
This is a Django web project with a separated front end and back end.
Last synced: 16 Feb 2026
https://github.com/tikazyq/colly-crawlers
Crawlers using Golang-based web crawling framework Colly
Last synced: 15 Jun 2025
https://github.com/ozakboy/taiwan-news-crawlers
.NET-based crawlers for Taiwan news (data is returned as objects for ease of use)
crawler data-collection dataset-generation dotnet news taiwan webcrawlers
Last synced: 15 Apr 2025
https://github.com/omkarcloud/dentalkart-scraper
🚀 SCRAPE 1000'S OF PRODUCTS FROM DENTALKART 🤖
beautifulsoup crawler crawling crawling-framework crawling-python dentalkart dentalkart-product-scraper dentalkart-scraper dentalkart-scraping node-crawler scraper scraping scraping-framework scraping-python selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 07 Sep 2025
https://github.com/amirhoseinsalimi/boxapi-python
Python client for https://boxapi.ir to crawl and read Instagram data.
crawler instagram instagram-api python python3
Last synced: 26 May 2026
https://github.com/akagi201/spy
A lightweight distributed web crawler
crawler distributed lightweight nsq
Last synced: 26 Feb 2025
https://github.com/pedrohs1771/hyenzy-x-anime-scraper
A powerful all-in-one media scraper for Anime and Games with 4K Upscale (MPV) and Discord RPC.
anime-scrapper anime4k crawler discord-rpc game-downloader mpv-player playwright python upscale
Last synced: 30 May 2026
https://github.com/soakit/book-download
book-download
crawler html2epub nodejs novel-downloader
Last synced: 01 May 2026
https://github.com/microlinkhq/ua
Simple Redis primitives to incr() and top() user agents
crawler redis user-agent user-agent-parser
Last synced: 18 Mar 2026
https://github.com/saadali1996/goose-rest-api
https://github.com/advancedlogic/GoOse based REST API for article content extraction
Last synced: 09 Mar 2026
https://github.com/zabuzard/songcrawler
Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.
command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler
Last synced: 09 Jun 2026
https://github.com/maxbubblegum47/spotydump
Spotify Scraper combined with a Genius Scraper. Scrape artists from a given time period or region of the world and dump all their songs!
crawler dump genius lyrics python spotify unimore-informatica
Last synced: 22 Mar 2025
https://github.com/kluhan/kraken
Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data sources, like YouTube or the Google Play Store.
celery crawler google-play-store python web-crawling
Last synced: 07 Sep 2025
https://github.com/bimmr/site-crawler
Chromium Extension: Crawl a website
chrome-extension crawler downloader sitemap
Last synced: 12 Mar 2026
https://github.com/fernandod1/yahoo-finance-scraper
This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them to a local text file.
crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api
Last synced: 24 Aug 2025
https://github.com/ging-dev/sitemap-crawler
Collect links through the sitemap.xml or robots.txt
crawler php php8 sitemap sitemap-crawler
Last synced: 10 Jan 2026
https://github.com/kokseen1/chii
A minimal marketplace bot maker.
auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction
Last synced: 20 Aug 2025
https://github.com/pinpox/go-random-downloader
Download HTML using "Random Page"
Last synced: 17 Aug 2025
https://github.com/nbdy/prntscrngrb
prnt.sc / lightshot crawler, nudity detection and text extraction to a sqlite database
crawler nudity-detection prntsc text-extraction
Last synced: 04 Oct 2025
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 18 Jun 2026
https://github.com/eklem/browsercrawler
Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 02 Aug 2025
https://github.com/jofaval/webscraping
WebScraper providing tools to scrape tons of websites with the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 06 Oct 2025
https://github.com/eeriemyxi/nosori
Online image viewer for https://coomer.su and https://kemono.su
api coomer crawler docker image javascript kemono server typescript video viewer web
Last synced: 01 Aug 2025
https://github.com/litingyes/cobweb
Collect, store and distribute meaningful static data
apis bing-image bing-wallpapers crawler image random-image
Last synced: 31 Jul 2025
https://github.com/joeri-abbo/python-credly-scraper
This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an
badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling
Last synced: 23 Sep 2025
https://github.com/nextlevelshit/fick
Fucking Incredible Command line King. Add CLI flavour to any website you like.
Last synced: 17 Feb 2026
https://github.com/zekrotja/r34-crawler
A simple CLI tool to fetch and download images from rule34.xxx
crawler go rest-api rule34 worker-pool xml
Last synced: 06 Mar 2026
https://github.com/sieep-coding/web-crawler
A simple web crawler implemented in Go.
Last synced: 09 Mar 2026
https://github.com/destan0098/go-agent
Use this package to generate random user agents.
crawler security security-tools user-agent user-agents
Last synced: 20 Sep 2025
https://github.com/sachin-kumar-2003/seocrawler
SEO Link Checker | Find Broken Links & Improve SEO I have built an SEO Link Checker that helps businesses, marketers, and site owners scan their websites, detect broken or harmful links, and fix them fast. This improves site health, user experience, and search rankings. Features: -Scan entire website for broken internal and external links
beautifulsoup crawler fastapi reactjs seo seo-optimization
Last synced: 15 Apr 2026
https://github.com/jjlibra/bake-mediacrawler
NanmiCoder's self-media data crawling software
Last synced: 06 May 2025
https://github.com/tufayellus/linkedin-cv-downloader
A Python based GUI automation software for downloading bulk LinkedIn CV / LinkedIn Resume from a list of profile links
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 17 Mar 2025
https://github.com/marcus-v-freitas/crawlerbrazilgovdata
ASP.NET Core (.NET 5) project for extracting and parsing São Paulo government data, with integration with S3 buckets and AWS SQS queues, and persistence via EF Core in MySQL.
api-rest aspnetcore automapper aws crawler csharp efcore government-data htmlagilitypack linux memory-cache mysql net5 onion-architecture parallel-computing parser s3-bucket serilog sqs-queue swagger
Last synced: 17 Jan 2026
https://github.com/darealfreak/figure-tracker
Application that watches for wished-for figures across multiple sites and notifies you about auctions, sales, or sudden price drops
crawler figure-tracker monitoring
Last synced: 30 Mar 2025
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 11 Feb 2026
https://github.com/supadata-ai/py
Official Python SDK for the Supadata API.
ai api crawler llm markdown scraping sdk transcript web-scraper youtube
Last synced: 22 Mar 2025
https://github.com/lockblock-dev/crawlarr
Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.
Last synced: 18 Mar 2025
https://github.com/galaxiat/galaxiat.serve.seo
Node.js package to serve a React app and prerender paths (cron)
crawler cron puppeteer seo seo-optimization ssr
Last synced: 31 Jan 2026