Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-22 00:06:47 UTC
https://github.com/sadewadee/foxhound
Go scraping framework with native Camoufox anti-detection. Dual-mode fetching (TLS stealth + browser), 60+ identity profiles, human behavior simulation, adaptive parsing, 12-layer middleware, 9 export formats. 741 tests.
anti-detection camoufox crawler golang playwright proxy-rotation scraping stealth tls-fingerprint web-scraping
Last synced: 05 May 2026
https://github.com/coalee/hotword
Chatbot for crawling and plotting keywords in recent news.
beautifulsoup chatbot crawler dialogflow flask slackbot wordcloud
Last synced: 29 Apr 2026
https://github.com/maximebories/regexp-scraper
Advanced use of Puppeteer to scrape search engine results against a RegExp
crawler cyber-investigations dorking google-dorking osint phishing-sites puppeteer regex regexp scraper scraping search search-engine web-security
Last synced: 01 May 2026
https://github.com/fanyong920/crawlitem-puppeteer
An example of scraping product listings with Puppeteer
chromnium crawler javascript nodejs puppeteer scrapy
Last synced: 10 May 2026
https://github.com/schbenedikt/web-crawler
A simple web crawler using Python that stores the metadata of each web page in a database.
crawler database mariadb mysql python python-crawler web
Last synced: 14 Apr 2025
https://github.com/sefinek/known-bots-ip-whitelist
A whitelist of trusted IP addresses used by legitimate crawlers and services such as Googlebot, Bingbot, AhrefsBot, UptimeRobot, Pingdom, Cloudflare, Bunny CDN, Stripe, Shodan, FacebookBot, TelegramBot, etc.
bot bots crawler firewall goodbot goodbots googlebot ip-address ip-addresses ipset safe safe-bots safety security whitelist whitelist-bot whitelists
Last synced: 21 Jun 2025
https://github.com/elektrostudios/fhm-crawler-freehardmusic.com
Crawls download URLs of albums from the freehardmusic.com website
albums crawl crawler crawling desktop-app desktop-application dotnet music web-crawler web-crawling web-scraper web-scraping webcrawler webcrawling webscraper webscraping windows windows-app windowsapp winforms
Last synced: 19 Jul 2025
https://github.com/filipefilardi/crunchyroll_filters
Discover new anime by filtering the Crunchyroll database
anime anime-list crawler crunchyroll flask
Last synced: 30 May 2026
https://github.com/tryagi/firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk
Last synced: 12 Apr 2025
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from the Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 22 Apr 2025
https://github.com/yoonje/soongsil-notice-crawler
A program that crawls announcements from Soongsil University and its individual departments
Last synced: 19 Jun 2025
https://github.com/ruedigervoigt/salted
Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files
asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python
Last synced: 27 Oct 2025
https://github.com/zebbern/reconx
🕷️ | ReconX is a Live-Website Crawler made to gather critical information with an option to take a picture of each site crawled!
crawler hacking information-gathering information-retrieval information-security livedata opsec osint osint-tool pentest python python-crawler search-engine security security-tools website website-crawler website-scraper website-security
Last synced: 03 Jul 2025
https://github.com/yakuza8/coronavirus-timeseries-predictor
Timeseries analyzer for coronavirus with recurrent neural network
asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper
Last synced: 09 Apr 2025
https://github.com/mmqnym/etherscan_tracker
Shows how to track a wallet on etherscan.io
Last synced: 25 Dec 2025
https://github.com/igeligel/TeamFortressOutpostApi
:repeat: An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 05 May 2025
https://github.com/simoninithomas/news-crawler-parse-backend
This is a crawler made with Scrapy to crawl French news articles and send them to your Parse.com backend
Last synced: 29 Oct 2025
https://github.com/guessi/youtube-search-crawler
YouTube Search Results Crawler
Last synced: 11 Apr 2025
https://github.com/indatawetrust/reporter
Crawler queue creation tool for paging
Last synced: 05 May 2025
https://github.com/roccomuso/is-baidu
Verify that a request is from Baidu crawlers using DNS verification
baidu crawler dns ip js nodejs verification
Last synced: 19 May 2026
https://github.com/arthur3486/google-play-scraper-kotlin
Library for scraping of the application data from the Google Play Store.
crawler google-play google-play-store java kotlin kotlin-library scraper scraping
Last synced: 14 Jan 2026
https://github.com/aicore/app_info_extracter
This application would be used to extract information about apps from the internet
android appreview apps crawler googleplaystore
Last synced: 26 Oct 2025
https://github.com/vivekg13186/easy_web_crawler
Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.
Last synced: 24 Oct 2025
https://github.com/ozansz/github-crawler
A basic utility for crawling users and their e-mail addresses
Last synced: 15 May 2026
https://github.com/spa5k/quick-scraper
An easy, lightweight scraper built using typescript for good developer experience.
crawler dx easy-to-use esbuild scraper typescript
Last synced: 18 Aug 2025
https://github.com/dnlzrgz/winzig
A tiny search engine for personal use.
async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3
Last synced: 06 Apr 2025
https://github.com/rafaelglikis/sinama
Web scraping library
crawler crawling scraper scraping
Last synced: 12 Jan 2026
https://github.com/YektaDev/Krawler
A configurable HTML Crawler written in Kotlin (JVM), powered by Coroutines, Kotlin Serialization (JSON), Ktor Client, Exposed, and SQLite.
crawl crawler crawlers crawling
Last synced: 21 Oct 2025
https://github.com/chenmozhijin/mediawikiextractor
A Python script for extracting data from a MediaWiki website and saving it as JSON.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 14 Feb 2026
https://github.com/xiaoyvyv/androidcrawlerengine
A dynamic crawler plug-in for the Android platform based on Dex dynamic loading, which can dynamically load and execute the dex plug-in package, and can realize real-time updates of crawler and other functions.
android apk class crawler dex dynamic execute java jsoup jvm kotlin module okhttp pak plugin reflection scrapy spider web webmagic
Last synced: 22 Jan 2026
https://github.com/joshuaquek/docusite-to-pdf
Provide a URL and this will generate multiple PDF documents of the whole site within the bounds of the URL path. This code repo is for educational purposes only.
crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper
Last synced: 28 Feb 2026
https://github.com/zhoudaxia233/unilogo
A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.
Last synced: 02 Apr 2025
https://github.com/zebbern/regex-crawler
Regex web crawler that searches for custom regexes while crawling each site to find the information you're looking for!
bug-bounty bugbounty crawler information-gathering information-retrieval osint osint-tool pentest python regex regex-engine regex-match regex-pattern regex-tool toolkit tools website
Last synced: 14 Apr 2025
https://github.com/hrvadl/goweekly
Application for querying top articles from https://golangweekly.com/, translating them to Ukrainian, and sending them to a Telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 14 Feb 2026
https://github.com/basemax/jadi-net-blog
This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.
blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp
Last synced: 13 Oct 2025
https://github.com/basemax/film2serial-api-service-crawler
Crawls content and movies from a Persian site using PHP.
crawler crawler-movie crawler-php crawlers movie-crawler movie-database php php-crawler php7 php74
Last synced: 11 Oct 2025
https://github.com/roccomuso/is-duckduck
Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo
crawler duckduck duckduckbot duckduckgo ip js nodejs verify web
Last synced: 14 Sep 2025
https://github.com/hyeockjinkim/baekjoon-management
Management program of BoJ
Last synced: 20 Mar 2025
https://github.com/jmkim/stock-crawler
Universal Stock Crawler
crawler stock stock-market yahoo-finance
Last synced: 28 Jul 2025
https://github.com/scottstraughan/simple-python-url-crawler
Super simple Python3 website URL scraper/crawler. Multi-threaded.
crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple
Last synced: 29 Jul 2025
https://github.com/ericz99/go-crawler
Simple, lightweight crawler that will find all endpoints on any website.
Last synced: 07 Oct 2025
https://github.com/csjiabin/part-time-egg
part-time-egg
alinode cheerio crawler egg eggjs mongo mongoose nightmare schedule ssr ts typescript vue vue-router vue-server-renderer vuex webpack
Last synced: 11 Apr 2026
https://github.com/v-braun/hero-scrape
Find the hero (main) image of a URL
crawler fastimage hero hero-image opengraph webscraping
Last synced: 05 Mar 2025
https://github.com/n3wjack/sitecrawler
A command-line based web crawler
crawler tool webcrawler webcrawling webdevelopment
Last synced: 07 Mar 2026
https://github.com/ribeirogab/technology-insights
Program that uses data from Stack Overflow Insights 2020 to generate informative graphs.
crawler python scraping typescript
Last synced: 15 May 2025
https://github.com/misaka10843/yamibo-downloader
A manga downloader that can batch-download from the Yamibo (百合会) forum, with support for saving as CBZ
comic crawler downloader python yamibo
Last synced: 17 Jan 2026
https://github.com/tonnytg/webreq
A lightweight way to make GET and POST web requests easily using the standard library http. A helpful module for everyday use.
Last synced: 08 Feb 2026
https://github.com/ktont/curlas
a nodejs spider tool
chrome-extension crawler spider
Last synced: 22 Sep 2025
https://github.com/astef/artlebedev-dj-crawler
Listen online and download music from artlebedev.ru/dj
Last synced: 27 Mar 2026
https://github.com/testica/a3hrgo-sdk
a3HRgo SDK to automate your reports
a3hrgo crawler javascript puppeteer
Last synced: 25 Oct 2025
https://github.com/knourian/freelancer.com-category-scrapping
Scraping categories from Freelancer.com using Scrapy, with the number of projects for each category
crawler freelancer python3 scrapy web-crawler
Last synced: 12 Sep 2025
https://github.com/marcbperez/python-webcrawler
Crawls HTML pages for prices and other pieces of data.
Last synced: 13 Apr 2026
https://github.com/sergioburdisso/solidscraper
Easy to use JQuery-Like API for Web Scraping/Crawling.
crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping
Last synced: 07 Feb 2026
https://github.com/gill-singh-a/crawler
A program that crawls the web starting from a given page, looking for keywords by following the internal links it finds
crawler multithreading osint python python3 requests scraper
Last synced: 07 Jul 2025
https://github.com/mohammadrezaamani/squirrel
Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.
Last synced: 09 Feb 2026
https://github.com/vinouno/BilibiliDanmuCrawler
A Python project that crawls danmaku (bullet comments) from bilibili.com and generates a word cloud
Last synced: 16 Mar 2025
https://github.com/sanmak/queue-web-crawler
This application crawls a website using a queue that limits the number of allowed concurrent connections, finds all hyperlinks present within it, and saves them to a CSV file.
async chai crawler csv hyperlinks mocha nodejs queue scrapper web
Last synced: 12 Mar 2026
https://github.com/igeligel/teamfortressoutpostapi
:repeat: An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 14 Feb 2026
https://github.com/bitebait/curry
🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real (USDxBRL) exchange rate at some stores in Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 15 Jan 2026
https://github.com/georgea93/crawley
nodejs web crawler
crawler depth es6 javascript node nodejs nodejs-web-crawler npm npm-module npm-package robots-txt sitemap web yarn
Last synced: 11 Apr 2026
https://github.com/lykmapipo/producthunt-python-scrapy-scraper
Python Scrapy spiders that scrape data from producthunt.com
crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper
Last synced: 08 Apr 2025
https://github.com/agmmnn/nis-scraper
Scrapy script to scrape nisanyansozluk.com
Last synced: 08 Apr 2025
https://github.com/cmagnobarbosa/crawler_tiktok
Open tool to get TikTok Data - Crawler Tiktok
Last synced: 17 Jan 2026
https://github.com/mazzasaverio/structured-data-jobs
A data pipeline that scrapes job opportunities from company websites and uses OpenAI to structure the data. Initially focused on tech roles, but easily adaptable for any job type.
crawler docker llm logfire neon openai python uv
Last synced: 14 May 2026
https://github.com/chalkpe/dimibob-py
School meal data crawler for Korea Digital Media High School
beautifulsoup crawler dimigo python3
Last synced: 14 Jan 2026
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 27 Dec 2025
https://github.com/mikirasora/osuplayedbeatmapscrawler
A crawler that fetches and downloads the osu! beatmaps you have played
Last synced: 26 Mar 2026
https://github.com/rimiti/ping-urls
🏓 Ping URLs by batch.
cache crawler ping prerender prerendering seo
Last synced: 03 Jul 2025
https://github.com/zabuzard/mplogger
Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' to a database. It then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.
bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api
Last synced: 03 Feb 2026
https://github.com/Anakeyn/website-contextual-links
Retrieving the contextual links of a website with R.
Last synced: 17 Jul 2025
https://github.com/lockblock-dev/crawlarr
Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.
Last synced: 18 Mar 2025
https://github.com/nakabonne/staticcollector
Application to analyze static files of competing sites
Last synced: 19 May 2026