Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2026-06-23 00:06:44 UTC
https://github.com/ashwantmanikoth/intellilsearch
This is an AI-powered crawler that can search the web for information based on your input.
crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation
Last synced: 15 Apr 2026
https://github.com/tjdsneto/jcnet-crawler
Extract (scrape) movie schedule info from the JCNet movies page
Last synced: 11 Apr 2026
https://github.com/hoan02/novel-crawler
Tool for scraping story data to serve doctruyen.io.vn
Last synced: 13 Mar 2025
https://github.com/lucasromualdo/glassdoorcrawler
Python crawler that collects job listings from Glassdoor and exports them to Excel
cli crawler glassdoor openpyxl pandas python web-scraping
Last synced: 25 Feb 2026
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 21 Apr 2026
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 28 Feb 2025
https://github.com/artemnikitin/crawler
Example of a web crawler implemented in Go
Last synced: 22 Jun 2025
https://github.com/sanhphanvan96/php-training-crawler
Simple PHP crawler for training purposes
crawler docker docker-compose nginx php php-fpm
Last synced: 13 Apr 2026
https://github.com/joyceannie/moviespider
This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, and director.
crawler datascience python scrapy spider webscraper
Last synced: 24 Mar 2025
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 30 Jul 2025
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 18 Mar 2025
https://github.com/izh318/genie-music-artist-album-crawler
A Python script that crawls, in one pass, the album information of a specific artist registered on Genie Music.
Last synced: 08 Nov 2025
https://github.com/robin98sun/structured-web-data-crawler
crawler multi-thread structured-web-data
Last synced: 16 Mar 2025
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 09 Jan 2026
https://github.com/yangxuhui/requests-google
A simple Google-related parsing package
Last synced: 14 Jan 2026
https://github.com/k0nxt3d/web-scrapers
Web scraping scripts in PHP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 31 Dec 2025
https://github.com/usethisname1419/connectioncrawler
Crawls a website and checks for connections
connection crawler http-headers reporting website-analyzer
Last synced: 06 Jul 2025
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio
Last synced: 15 Mar 2025
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 14 May 2026
https://github.com/loko5ja/seed-gen
Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.
crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman
Last synced: 03 Apr 2025
https://github.com/azshurith/depth-crawler
A simple yet powerful Python web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.
Last synced: 20 Apr 2026
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 17 May 2026
https://github.com/sssshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 01 Mar 2025
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions / subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 31 Mar 2025
https://github.com/jplitza/urlsearch
Index typical webserver directory listings and then search for arbitrary terms
Last synced: 17 Mar 2025
https://github.com/m1/smap
smap is a site-mapping engine written in Go.
crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling
Last synced: 01 Jul 2025
https://github.com/tatamiya/gas-new-books-crawler
Crawls new book information from 版元ドットコム (https://www.hanmoto.com/)
Last synced: 30 Oct 2025
https://github.com/xprnvd/makdi
Website crawler created for pentest exercises like HTB.
crawler htb htb-scripts pentest python
Last synced: 20 Jul 2025
https://github.com/andrepradika/scrape-medrecruit.medworld.com
🛠 A Playwright-based web scraper that extracts job listings from MedRecruit, including job title, department, location, job type, duration, and job URL, saving the data to an Excel file.
Last synced: 17 Mar 2025
https://github.com/andrepradika/scrape-xpel.com
📌 A Playwright-based web scraper that extracts installer details from XPEL’s Installer Locator and saves them to CSV and Excel files.
Last synced: 17 Mar 2025
https://github.com/allancapistrano/anime-sheets
Crawler that fetches anime information and saves it to a spreadsheet.
anime crawler google-sheets google-sheets-api
Last synced: 16 Mar 2025
https://github.com/kartikmehta8/pycrawler
PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.
Last synced: 13 Sep 2025
https://github.com/pengkobe/my-web-crawler
Automatically pulls blog updates from bloggers. Developed with Angular 2.
Last synced: 18 May 2026
https://github.com/roc41d/http-web-crawler
HTTP web crawler with Node.js + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 13 Apr 2026
https://github.com/moojing/coinmarketcap-crypto-crawler
A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.
Last synced: 01 Apr 2025
https://github.com/apurvsikka/mediaverse
MediaVerse is a versatile search engine for various media types such as anime, books and drama
anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv
Last synced: 29 Mar 2025
https://github.com/jauharibill/animeindo-crawler
This crawler is for research use only. The creator takes no responsibility for any harmful usage.
Last synced: 08 Jul 2025
https://github.com/d7isme/pixiv-downloader-mod
Modded version of the Pixiv downloader extension from the Chrome Web Store, with premium features unlocked.
chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader
Last synced: 14 May 2026
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 10 May 2026
https://github.com/snwfdhmp/3gm-bot
Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 30 Oct 2025
https://github.com/ronierisonmaciel/crawler
A crawler built with BeautifulSoup that aims to extract information from websites in an efficient, structured way. BeautifulSoup is a Python library that simplifies parsing and extracting data from HTML and XML pages. The project collects and organizes relevant information.
beautifulsoup4 crawler crawling python python3
Last synced: 26 Mar 2025
https://github.com/kweonminsung/crawl2toast
Real-time toast notifications of crawled data, selected with CSS selectors (Windows only)
beautifulsoup4 crawler selenium tkinter toast-notifications
Last synced: 18 May 2026
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 01 Apr 2025
https://github.com/javapuppteernodejs/bypass-cloudflare-turnstile-crawl4ai
Learn how to integrate Crawl4AI with CapSolver to automatically solve Cloudflare Turnstile challenges.
automation capsolver captcha captcha-solver cloudflare-turnstile cloudflare-turnstile-bypass cloudflare-turnstile-solver crawl4ai crawler data-extraction python turnstile web-scraping
Last synced: 17 May 2026
https://github.com/istador/mediaindexer
Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.
Last synced: 03 Jan 2026
https://github.com/lfsc09/crawl-this-go
Simple CLI tool for crawling PDF documents and HTML pages
Last synced: 18 Jun 2025
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 19 Apr 2026
https://github.com/keizerzilla/ssh-hunter
Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).
Last synced: 10 Apr 2025
https://github.com/keizerzilla/search4dwango9
My attempt to help solve the DWANGO9 WAD mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 10 Apr 2025
https://github.com/vaenow/crawler-chromeless
A chromeless crawler for Coursera
chromeless coursera crawler puppeteer
Last synced: 18 May 2026
https://github.com/thejoin95/free-proxies.info
API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more
api crawler http-proxy proxy proxy-list python scraper
Last synced: 29 Oct 2025
https://github.com/javapuppteernodejs/bypass-awswaf-crawl4ai
Bypass AWS WAF with Crawl4AI & CapSolver: A personal developer's guide to seamless web scraping on WAF-protected sites, featuring API and browser extension integration examples.
automation aws aws-waf capsolver captcha captcha-solver crawl4ai crawler crawling python web-scraping
Last synced: 28 Dec 2025
https://github.com/moj124/web_crawler
web_crawler is an asynchronous gevent link crawler that maps all associated local links, constrained by the input webpage URL.
crawler crawler-python links-spider
Last synced: 13 Mar 2025
https://github.com/knguyen780/web-crawler
About crawling data
crawler jsoup-library scraper selenium-java
Last synced: 25 Jun 2025
https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez
Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.
beautifulsoup crawler immigration web
Last synced: 16 Jun 2025
https://github.com/datvodinh/laptop-price-prediction
An end-to-end data science project about laptop price prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 11 May 2025
https://github.com/jimut123/leaderbehaviour
Scrapy project that extracts the names of leaders and their misdeeds by scraping a news website.
crawler leaderbehaviour newsscraper scrapy timesofindia
Last synced: 16 Jan 2026
https://github.com/lilchen96/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 28 Dec 2025
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 15 Mar 2025
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 30 Mar 2025
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 14 Sep 2025
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Apr 2026
https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance
Crawler robot that follows a black line while avoiding obstacles in its way. Assignment for Mechatronics.
arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics
Last synced: 28 Apr 2026
https://github.com/rafaelmoraes003/tech-news
Analysis and manipulation of news data from a technology website obtained through data scraping using Python.
crawler data-scraping https mongodb parsel pymongo python web-scraping
Last synced: 05 May 2026
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 12 Jun 2026
https://github.com/ymdarake/otenki-crawler
Yet another weather data scraper.
Last synced: 02 Feb 2026
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MCQs Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 13 Apr 2026
https://github.com/ariefrahmansyah/crawler
Simple website crawler written in the Go programming language.
Last synced: 27 Mar 2025
https://github.com/laffrex/xiaolanben_crawler
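An efficient, stable data collection tool for the Xiaolanben (小蓝本) website. It automatically extracts information such as company and group products, media, and shareholders, supports intelligent pop-up handling and automated data classification, and is ultimately intended to make SRC information gathering easier.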
Last synced: 23 Mar 2025
https://github.com/murilobsd/icrop-csv
Icrop-csv automates the process of downloading reports.
Last synced: 17 Nov 2025
https://github.com/mehdieidi/offliner
Offliner is a tool that makes a website viewable offline. It's a concurrent web crawler that saves all the pages and static files in a directory.
concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread
Last synced: 14 Jan 2026
https://github.com/heitor57/astronomy-news
🔭📰 Astronomy News
crawler data-science news text-mining
Last synced: 06 Oct 2025
https://github.com/xyk2002/aqistudy-crawler
An asynchronous protocol-level crawler for air quality data from https://www.aqistudy.cn/historydata/; it fetches data quickly and saves it to a CSV file.
Last synced: 22 Aug 2025
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 18 May 2026
https://github.com/truongdd03/searchengine
A search engine written in C++.
cpp crawler search search-engine
Last synced: 06 Apr 2025
https://github.com/b3j4y/unidisk
A crawler that searches for keywords and compares the scores
comparison crawler nlp solr-client
Last synced: 17 Jan 2026
https://github.com/m-taghizadeh/persian_question_answering_voice2voice_ai
This repository hosts BonyadAI, a Persian question answering AI Model. We developed an initial web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. This AI model operates end-to-end, receiving user voice input and providing responses in Persian voice.
artificial-intelligence corpus-linguistics crawler deep-learning farsi farsi-datasets large-language-models machine-learning natural-language-processing persian python question-answering scraping-python speech-to-text text-to-speech transformer-architecture word2vec
Last synced: 04 May 2026