Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-16 00:05:55 UTC
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 14 Nov 2024
https://github.com/tsoliangwu0130/spotify-news
A Flask application that retrieves the latest news about the singer of the song you are currently playing on Spotify.
bootstrap crawler flask oauth2 python3 restful-api spotify-api
Last synced: 11 Nov 2024
https://github.com/shawon922/jobs-crawler
Crawl IT/Telecommunication jobs from bdjobs.com
beautifulsoup4 crawler python3
Last synced: 09 Nov 2024
https://github.com/spekulatius/spatie-crawler-cached-queue-example
Example to demonstrate the usage of cached queues across multiple requests.
crawler crawler-engine laravel php-crawler php-scraper queues spatie-crawler
Last synced: 12 Nov 2024
https://github.com/logocomune/botdetector
BotDetector is a golang library that detects Bot/Spider/Crawler from user agent
botdetector bots crawler go golang golang-library spider user-agent
Last synced: 11 Nov 2024
https://github.com/tosone/githubtraveler
Traverse all GitHub users, orgs, and repos.
Last synced: 06 Nov 2024
https://github.com/sabinbajracharya/Insta-crawler
Pulls data from instagram and saves it to Firebase for storage and Algolia for search
accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper
Last synced: 07 Nov 2024
https://github.com/trungdq88/movie-showtimes
Web Service & Android Application to look up Vietnam movie showtimes
crawler java movie-showtimes theater
Last synced: 31 Oct 2024
https://github.com/eight04/ptt-mail-backup
A BBS bot for fetching PTT in-site mail
bbs cli crawler ptt ptt-crawler python python3
Last synced: 28 Oct 2024
https://github.com/igeligel/BackpackLogin
:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.
bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2
Last synced: 13 Nov 2024
https://github.com/bbc2/discolinks
Command-line tool which checks a website for broken links.
broken-links crawler html http link-checker link-checkers link-checking validator web
Last synced: 28 Oct 2024
https://github.com/umihico/minigun-requests
Web scraping API to outsource tons of GET & xpath to cloud computing
crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping
Last synced: 15 Nov 2024
https://github.com/fanzeyi/torchic
A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.
Last synced: 21 Oct 2024
https://github.com/pawod/gis-berlin-rents
A web crawler for ImmobilienScout24.de, implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.
apartment-rents berlin crawler gis immobilienscout24
Last synced: 04 Nov 2024
https://github.com/sebobo/shel.crawler
Neos based crawler for nodes and sites
Last synced: 14 Oct 2024
https://github.com/luyadev/luya-module-crawler
Crawl a website and provide intelligent search results
crawler hacktoberfest intelligent-search luya search yii2
Last synced: 10 Oct 2024
https://github.com/yerkopalma/bash-crawler
:computer: Get a site's links with bash
Last synced: 13 Oct 2024
https://github.com/nakabonne/webcrawlerforserps
Web crawler that scrapes Google search results
Last synced: 24 Oct 2024
https://github.com/amirhoseinsb/Cloud_Player_V2
Use the cloudplayer tool to listen to music by the singer of your choice, at very high speed and without visiting a specific website.
cloud-player crawler crawling music music-player programming python url-player
Last synced: 04 Aug 2024
https://github.com/softmarshmallow/inked-news-crawler
🕷 Korean news source crawler (realtime & bulk)
crawler naver-news python3 scrapy
Last synced: 11 Oct 2024
https://github.com/bajins/scripts_python
Python scripts
crawler faker faker-generator python-3 python3 rclone rclone-client rclone-config rclone-configuration reptile reptile-image reptiles scraper spider
Last synced: 12 Nov 2024
https://github.com/yaroslaff/bulk-http-check
Very fast and simple concurrent HTTP client (3500 HTTP req/s)
bulk check concurrent connections crawler header http https multiple parallel spider status
Last synced: 07 Nov 2024
https://github.com/codenashwan/telegrambot_instadp
A simple Telegram bot for downloading Instagram profile photos
api crawler crawling instagram instagram-api instagram-bot instagramscraper laravel php scraper telegram telegram-api telegram-bot webhook
Last synced: 08 Nov 2024
https://github.com/omilab/internet-archive-link-extractor
Tool for extracting external links of a URL from Internet Archive snapshots
Last synced: 07 Aug 2024
https://github.com/appliedsoul/promise-crawler
Promise support for node-crawler (Web Crawler/Spider for NodeJS + server-side jQuery)
crawler node-crawler nodejs promise-node-crawler spider
Last synced: 08 Nov 2024
https://github.com/fedebotu/neurips2022-openreviewdata
Crawl & Visualize NeurIPS 2022 Data from OpenReview
crawler dataset neurips neurips-2022 openreview peer-review review scraper
Last synced: 06 Nov 2024
https://github.com/drogbadvc/crawlit
A web crawler based on Scrapy, with 2D visualization and PageRank
Last synced: 08 Nov 2024
https://github.com/duongdev/facebook-group-crawler
Facebook Groups Discussions Crawler
crawler facebook groups puppeteer
Last synced: 12 Nov 2024
https://github.com/dori-dev/quotes-crawler
Quotes crawler using scrapy and python.
crawler crawling python scraping-python scraping-websites scrapy scrapy-crawler scrapy-spider web-scraper
Last synced: 09 Nov 2024
https://github.com/thesp0nge/nightcrawler
A python program that crawls a website and tries to stress it, polluting forms with bogus data
crawler offensive-scripts offensive-security stress-test web-crawler web-crawling
Last synced: 12 Oct 2024
https://github.com/torhamdev/death-engine
A powerful recon tool
crawler death-engine directory-search google-dorks hacking-tool information-gathering pentesting pentesting-tools port-scanning python3 recon recon-tools scanner web-hacking web-penetration-testing webhacking webpentest whois
Last synced: 15 Nov 2024
https://github.com/windfarer/biu
biubiubiu~~ I'm a tiny web crawler framework
crawler python spider spider-framework web-crawler
Last synced: 28 Oct 2024
https://github.com/bfwg/node-tinycrawler
Tiny web crawler in a nutshell for Node.js
Last synced: 11 Oct 2024
https://github.com/the1812/bingwallpapers
A tool for downloading wallpapers from Bing.
Last synced: 04 Nov 2024
https://github.com/jean-baptiste-camps/iiif-crawler
Interrogate IIIF servers and get images of manuscripts
crawler iiif iiif-image manuscripts
Last synced: 11 Oct 2024
https://github.com/capturr/jsonld-extract
A damn simple tool to extract json-ld metadata from webpage using jquery like api (jQuery, Cheerio, CashDom ...).
cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript
Last synced: 28 Oct 2024
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 13 Oct 2024
https://github.com/dotenorio/freeloader-of-data
A simple crawler or scraper to get open graph and other meta data from any website.
crawler graph hacktoberfest meta-data open-graph scraper
Last synced: 25 Oct 2024
https://github.com/ajcerejeira/base.gov.pt
A crawler that fetches data from base.gov.pt
Last synced: 06 Nov 2024
https://github.com/simin75simin/libgencrawl
Crawl all books from a Library Genesis search
crawler free-software libgen python3 scraper
Last synced: 05 Nov 2024
https://github.com/chusiang/crawler-book-info
A crawler for quickly parsing book information
Last synced: 07 Nov 2024
https://github.com/arshadkazmi42/blc
Broken link checker
blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python
Last synced: 28 Oct 2024
https://github.com/luizppa/web-crawler
A web crawler that collects and indexes web pages. Made with chilkat and gumbo parser.
chilkat cpp crawler webcrawler
Last synced: 28 Oct 2024
https://github.com/sayyid5416/links-extractor
Extract links from any file or website.
crawler extract-links extractor links-extraction scraper web-crawler web-scraper
Last synced: 28 Oct 2024
https://github.com/AmirAref/DivarCrawler
A script to crawl divar.ir and extract phone numbers
Last synced: 05 Aug 2024
https://github.com/AmirAref/Torobot
An inline Telegram bot for easily accessing and searching torob.com products from Telegram.
crawler python python-telegram-bot scraper telegtam-bot
Last synced: 05 Aug 2024
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 06 Nov 2024
https://github.com/bugfishtm/bugfish-image-downloader
💾 Bugfish Image Downloader: Effortless web image downloads, subsite exploration, and HD selection. Windows app, .NET 4.5, no registry usage. Download now!
bugfish bugfish-software bugfishtm crawler downloader downloadmanager downloadtool gplv3 image imagedownloader imagedownloadertool imageprocessing portable-executable portableapps software utilityapp webscraping windows windows-desktop
Last synced: 06 Nov 2024
https://github.com/feedeo/youtube-channel-crawler
YouTube Channel :tv: Crawler
crawler youtube youtube-channel
Last synced: 11 Oct 2024
https://github.com/bernabe9/render-it
Render any JavaScript content to create static sites ready for SEO
crawler javascript prerender prerenderio puppeteer render seo seo-tools server-side-rendering static-site static-site-generator
Last synced: 07 Nov 2024
https://github.com/poyea/coronaflight-hkg
😷 Crawler and history manager for dangerous, coronavirus-infected flights to Hong Kong (VHHH)
corona coronaflight-hkg coronavirus coronavirus-analysis coronavirus-info coronavirus-tracker coronavirus-tracking crawl crawler crawlers crawling hacktoberfest hong-kong hongkong javascript json json-api node node-js nodejs
Last synced: 28 Oct 2024
https://github.com/karambir/ugc-colleges
Python Script to extract college names from UGC, India website.
college crawler extract html-parser python python-script ugc
Last synced: 24 Oct 2024
https://github.com/integralist/go-web-crawler
A web crawler built in the Go programming language
concurrency crawler go golang web-crawler
Last synced: 11 Oct 2024
https://github.com/baraja-core/webcrawler
Simple crawling websites by following links.
bot crawler crawling-websites fast php robot speed
Last synced: 06 Nov 2024
https://github.com/aprilnea/xjtlu
This is how to get all the network resources of XJTLU.
crawler gateway http-auth python spider web-crawler xjtlu
Last synced: 15 Nov 2024
https://github.com/foolin/scrago
A simple, fast, extensible page-crawling framework for Golang
Last synced: 09 Nov 2024
https://github.com/zurdi15/nbz
Bot to automate internet browsing
automation bot browser-automation browsermob-proxy crawler selenium testing web
Last synced: 15 Oct 2024
https://github.com/coghost/iparse
Extract HTML/JSON content identified by CSS selectors (with bs4), with YAML config support
crawler parser parser-library python xkcd yaml
Last synced: 09 Nov 2024
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 12 Oct 2024
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 28 Oct 2024
https://github.com/surelle-ha/dogma
Dogma is a CLI tool that enables interaction with the GitHub API for the purpose of searching .env files with specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.
Last synced: 10 Nov 2024
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories, rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 31 Oct 2024
https://github.com/danielmorell/se_bot_checker
Validate search engine user agents and IP addresses.
crawler googlebot python search-engine spider
Last synced: 15 Oct 2024
https://github.com/mrrfv/webarchive
Crawls websites and saves found URLs to a file.
archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping
Last synced: 27 Oct 2024
https://github.com/spencerlepine/readme-crawler
A Node.js web crawler to download README files and follow contained links. Fetch repositories from a valid GitHub URL
crawler javascript node nodejs readme scraper web-crawler webcrawer
Last synced: 13 Nov 2024
https://github.com/manuel-lang/autonomous-semantic-search-engine
Submission for HackDataKIBots 2018 - Web crawler combined with document analysis
crawler hackathon machine-learning mannheim microsoft natural-language-processing natural-language-understanding nextiteration rnv semantic-search textract
Last synced: 13 Nov 2024
https://github.com/leelow/nightmare-screenshot-selector
👻 📷 A Nightmare plugin to easily take screenshots.
crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler
Last synced: 15 Nov 2024
https://github.com/itszeeshan/crawlinit
A web crawler written in python3
appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web
Last synced: 12 Oct 2024
https://github.com/moehmeni/ezweb
Easy to use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 05 Nov 2024
https://github.com/frectonz/rampilo
A telegram crawler
crawler rust telegram telegram-crawler
Last synced: 14 Nov 2024
https://github.com/kernelerr/pixivsync
Pixiv image download and sync tool
crawler pixiv pixiv-crawler python
Last synced: 12 Oct 2024
https://github.com/wenyalintw/job-scraper-bot
A Telegram bot made for fun for a friend, deployed to Heroku
amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping
Last synced: 11 Nov 2024
https://github.com/vinouno/BilibiliDanmuCrawler
A Python project that crawls danmaku (bullet comments) from bilibili.com and generates a word cloud
Last synced: 27 Oct 2024
https://github.com/juliandavidmr/raptor
Lightweight tool for scanning web sites, works as spider. Once executed, starts scanning pages looking for websites to visit, with automatic indexing.
Last synced: 09 Nov 2024
https://github.com/stopka/fedicrawl
Collect feeds to follow on Fediverse nodes.
crawler docker fediverse nodejs prisma typescript
Last synced: 05 Nov 2024
https://github.com/leomaurodesenv/smm-course-search
A package for searching Super Mario Maker courses
bookmark-site crawler javascript json mario-game mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/alishahbazi81/jobcrawler
Job crawler bot that finds jobs on job board platforms such as LinkedIn, Glassdoor, and Indeed based on their post time and sends them to a Telegram channel
asp-net-core crawler jobs jobsearch telegram telegram-bot
Last synced: 11 Nov 2024
https://github.com/liyifeng1994/go-crawler
A distributed crawler project based on Golang
crawler elastic elasticsearch golang
Last synced: 12 Nov 2024