Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-05 00:06:18 UTC
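The "systematically browses" part of the definition above can be illustrated with a minimal sketch. It is not taken from any project listed on this page. It assumes a breadth-first visiting order, and the names `crawl`, `get_links`, `FAKE_WEB`, and `fake_get_links` are hypothetical. Instead of making network requests, it walks a small in-memory link graph, so a real crawler would replace `fake_get_links` with code that fetches a page and extracts its links.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str,
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Visit pages breadth-first from `seed` and return them in visit order."""
    visited: list[str] = []
    seen = {seed}              # URLs already queued, so no page is visited twice
    frontier = deque([seed])   # FIFO queue of URLs waiting to be visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}


def fake_get_links(url: str) -> list[str]:
    """Stand-in for fetching a page and parsing its links (an assumption)."""
    return FAKE_WEB.get(url, [])


if __name__ == "__main__":
    for page in crawl("https://example.com/", fake_get_links):
        print(page)
```

The `seen` set is what keeps the crawl from looping on pages that link back to each other, and `max_pages` bounds the work on a large site.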
https://github.com/markelog/map
Simple site map generator; supports a couple of reporters, depth levels, etc.
Last synced: 25 Nov 2024
https://github.com/jean-baptiste-camps/iiif-crawler
Interrogate IIIF servers and get images of manuscripts
crawler iiif iiif-image manuscripts
Last synced: 11 Oct 2024
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 13 Oct 2024
https://github.com/capturr/jsonld-extract
A damn simple tool to extract JSON-LD metadata from a webpage using a jQuery-like API (jQuery, Cheerio, CashDom ...).
cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript
Last synced: 28 Oct 2024
https://github.com/nobodxbodon/chromecrawlerwildspider
Chrome extension to crawl web pages by loading them into browser tabs in parallel.
chrome-extension crawler localstorage spider
Last synced: 30 Nov 2024
https://github.com/ajcerejeira/base.gov.pt
A crawler that fetches data from base.gov.pt
Last synced: 06 Nov 2024
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 06 Nov 2024
https://github.com/sayyid5416/links-extractor
Extract links from any file or website.
crawler extract-links extractor links-extraction scraper web-crawler web-scraper
Last synced: 28 Oct 2024
https://github.com/AmirAref/DivarCrawler
A script to crawl divar.ir and extract phone numbers
Last synced: 22 Nov 2024
https://github.com/twtrubiks/pttcrawlercontent
PTT Crawler Content in Python (PTT article crawler)
Last synced: 16 Nov 2024
https://github.com/chusiang/crawler-book-info
A crawler for quickly parsing book information
Last synced: 07 Nov 2024
https://github.com/arshadkazmi42/blc
Broken link checker
blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python
Last synced: 28 Oct 2024
https://github.com/luizppa/web-crawler
A web crawler that collects and indexes web pages. Made with chilkat and gumbo parser.
chilkat cpp crawler webcrawler
Last synced: 28 Oct 2024
https://github.com/sweeticelolly/sao_title_bot
A bot that generates cheeky paper titles
chrome-dr chromedriver crawler generator language-learning language-model numpy python robot scholar scholarly-articles selenium selenium-webdriver
Last synced: 24 Nov 2024
https://github.com/integralist/go-web-crawler
A web crawler built in the Go programming language
concurrency crawler go golang web-crawler
Last synced: 11 Oct 2024
https://github.com/simin75simin/libgencrawl
Crawl all books from a Library Genesis search
crawler free-software libgen python3 scraper
Last synced: 05 Nov 2024
https://github.com/rvegas/dota_crawler
Crawler for dotapedia. Fills a Mongo and a PG database with game data.
crawler dota dota2 flask mongodb postgresql python3 regex scrapy
Last synced: 01 Jan 2025
https://github.com/0memo07/web-crawler
Web Crawler with Python
beautifulsoup4 bs4 crawler crawlers crawling crawling-python web-crawler web-crawler-python web-crawling webcrawler
Last synced: 17 Nov 2024
https://github.com/hybridx/webscraper
Web crawler built with Beautiful Soup
crawler flask google-dorks javascript python3 search-engine
Last synced: 13 Dec 2024
https://github.com/vshawn/tutiempo_crawler
a crawler for climate data on en.tutiempo.net
climate-data crawler tutiempo-crawler
Last synced: 19 Nov 2024
https://github.com/juliandavidmr/raptor
Lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages, looking for websites to visit, with automatic indexing.
Last synced: 09 Nov 2024
https://github.com/inishchith/python-scripts
Some Scripts & Projects
crawler python-script python3 scripts youtube
Last synced: 19 Dec 2024
https://github.com/aprilnea/xjtlu
This is how to get all the network resources of XJTLU.
crawler gateway http-auth python spider web-crawler xjtlu
Last synced: 15 Nov 2024
https://github.com/kernelerr/pixivsync
Pixiv image download and sync tool
crawler pixiv pixiv-crawler python
Last synced: 19 Nov 2024
https://github.com/wenyalintw/job-scraper-bot
A fun Telegram bot made for a friend, deployed to Heroku
amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping
Last synced: 11 Nov 2024
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories, rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 19 Dec 2024
https://github.com/karambir/ugc-colleges
Python script to extract college names from the UGC India website.
college crawler extract html-parser python python-script ugc
Last synced: 12 Dec 2024
https://github.com/cr0hn/feed-to-exporter
Get an RSS feed and export it as a WordPress post
Last synced: 07 Nov 2024
https://github.com/giscafer/airlevel-crawler
A demo crawler for air-level.com
Last synced: 17 Nov 2024
https://github.com/frectonz/rampilo
A telegram crawler
crawler rust telegram telegram-crawler
Last synced: 14 Nov 2024
https://github.com/hktalent/scrapysite
ScrapySite: Go web crawler (spider), scraping, intelligence gathering
crawler elasticsearch go scraping site spider web
Last synced: 19 Nov 2024
https://github.com/vinouno/BilibiliDanmuCrawler
A Python project that crawls danmaku (bullet comments) from bilibili.com and generates word clouds
Last synced: 27 Oct 2024
https://github.com/holmofy/spring-spider
Spring Spider App Utility Library.
crawler java spider spring spring-spider
Last synced: 27 Oct 2024
https://github.com/zain-ul-din/lgu-crawler
LGU timetable Crawler
contribute crawler lahore-garrison-university lahore-garrison-university-timetable open-source
Last synced: 10 Dec 2024
https://github.com/librecodecoop/querido-diario-php
Brazilian government gazettes, accessible to everyone.
civic-tech crawler data-science gazette-crawler governments-gazettes govtech hacktoberfest open-data php php7 politics spider
Last synced: 29 Nov 2024
https://github.com/haxzie-xx/crode.js-node-web-crawler
Node.js Crawler built for open FTP sites for movie link collection.
Last synced: 19 Dec 2024
https://github.com/hxr16f/ss-grabber
Automation script for downloading user screenshots.
automation crawler downloader grabber lightshot screenshot script
Last synced: 27 Nov 2024
https://github.com/trudi-group/mc-crawler
A MobileCoin network crawler. Corresponding preprint available on arXiv (https://arxiv.org/pdf/2111.12364.pdf).
Last synced: 02 Dec 2024
https://github.com/sanmak/queue-web-crawler
This application crawls a website using a queue that limits the number of allowed concurrent connections, finds all hyperlinks present within the site, and saves them to a CSV file.
async chai crawler csv hyperlinks mocha nodejs queue scrapper web
Last synced: 28 Nov 2024
https://github.com/surelle-ha/dogma
Dogma is a CLI tool that enables interaction with the GitHub API for the purpose of searching .env files with specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.
Last synced: 10 Nov 2024
https://github.com/eished/tujigu_crawler
Multithreaded Node.js crawler for tujigu.com (图集谷)
Last synced: 02 Dec 2024
https://github.com/mirocow/yii2-crawler
Concurrent HTTP crawler for Yii2
concurrency crawler guzzle yii2-extension
Last synced: 16 Nov 2024
https://github.com/coghost/iparse
Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support
crawler parser parser-library python xkcd yaml
Last synced: 09 Nov 2024
https://github.com/vitorebatista/horoscopefree
Astrology REST API for daily horoscopes
crawler horoscope horoscope-crawler horoscopes-api
Last synced: 30 Nov 2024
https://github.com/mrrfv/webarchive
Crawls websites and saves found URLs to a file.
archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping
Last synced: 27 Oct 2024
https://github.com/danielmorell/se_bot_checker
Validate search engine user agents and IP addresses.
crawler googlebot python search-engine spider
Last synced: 15 Oct 2024
https://github.com/alishahbazi81/jobcrawler
Job crawler bot that finds jobs on job board platforms like LinkedIn, Glassdoor, and Indeed based on their post time and sends them to a Telegram channel
asp-net-core crawler jobs jobsearch telegram telegram-bot
Last synced: 11 Nov 2024
https://github.com/foolin/scrago
A simple, fast, extensible page-crawling framework for Golang
Last synced: 09 Nov 2024
https://github.com/leelow/nightmare-screenshot-selector
👻 📷 A Nightmare plugin to easily take screenshots.
crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler
Last synced: 15 Nov 2024
https://github.com/birkhofflee/blizzard_forum.js
An unofficial Node.js API for Blizzard Forums. (works in 2019)
Last synced: 18 Nov 2024
https://github.com/stopka/fedicrawl
Collect feeds to follow on Fediverse nodes.
crawler docker fediverse nodejs prisma typescript
Last synced: 05 Nov 2024
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 28 Oct 2024
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 12 Oct 2024
https://github.com/doroudi/imdb-crawler
imdb.com movie crawler built with Scrapy
crawler data-mining python scrapy
Last synced: 12 Dec 2024
https://github.com/omerdogan3/kitapp-crawler
Web crawler application for KitApp - gets data from booksellers and inserts it into a database.
book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler
Last synced: 13 Dec 2024
https://github.com/moehmeni/ezweb
Easy to use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 05 Nov 2024
https://github.com/robmch/mindfactory_crawling
A Python 3 Crawler for Mindfactory.de
crawler crawling data webcrawler webcrawling
Last synced: 17 Nov 2024
https://github.com/spencerlepine/readme-crawler
A Node.js web crawler that downloads README files and follows the links they contain. Fetches repositories from a valid GitHub URL.
crawler javascript node nodejs readme scraper web-crawler webcrawer
Last synced: 13 Nov 2024
https://github.com/zurdi15/nbz
Bot to automate internet browsing
automation bot browser-automation browsermob-proxy crawler selenium testing web
Last synced: 15 Oct 2024
https://github.com/yjyoon-dev/nara-crawler
Crawler for National Archives Catalog
Last synced: 20 Nov 2024
https://github.com/itszeeshan/crawlinit
A web crawler written in Python 3
appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web
Last synced: 12 Oct 2024
https://github.com/code-inside/sloader
Worker that loads and retrieves data from "slow" endpoints.
Last synced: 16 Nov 2024
https://github.com/liyifeng1994/go-crawler
A distributed crawler project based on Golang
crawler elastic elasticsearch golang
Last synced: 12 Nov 2024
https://github.com/manuel-lang/autonomous-semantic-search-engine
Submission for HackDataKIBots 2018 - Web crawler combined with document analysis
crawler hackathon machine-learning mannheim microsoft natural-language-processing natural-language-understanding nextiteration rnv semantic-search textract
Last synced: 13 Nov 2024
https://github.com/leomaurodesenv/smm-course-search
A package for searching courses - Super Mario Maker
bookmark-site crawler javascript json mario-game mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/vivekg13186/easy_web_crawler
Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.
Last synced: 09 Dec 2024
https://github.com/thiiagoms/dict-crawler
Simple crawler for the UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 15 Nov 2024