Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-12-25 00:05:56 UTC
https://github.com/akiosarkiz/manga-collector
The manga collector is a library designed to easily scrape manga content from various websites. This package is licensed under the MIT License and is fully test-covered.
Last synced: 20 Nov 2024
https://github.com/feedeo/youtube-channel-crawler
YouTube Channel 📺 Crawler
crawler youtube youtube-channel
Last synced: 11 Oct 2024
https://github.com/ajcerejeira/base.gov.pt
A crawler that fetches data from base.gov.pt
Last synced: 06 Nov 2024
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 06 Nov 2024
https://github.com/poyea/coronaflight-hkg
😷 Crawler and history manager for dangerous, coronavirus-infected flights to Hong Kong (VHHH)
corona coronaflight-hkg coronavirus coronavirus-analysis coronavirus-info coronavirus-tracker coronavirus-tracking crawl crawler crawlers crawling hacktoberfest hong-kong hongkong javascript json json-api node node-js nodejs
Last synced: 28 Oct 2024
https://github.com/baraja-core/webcrawler
Simple website crawling by following links.
bot crawler crawling-websites fast php robot speed
Last synced: 06 Nov 2024
https://github.com/AmirAref/Torobot
An inline Telegram bot for easily accessing and searching torob.com products from Telegram.
crawler python python-telegram-bot scraper telegtam-bot
Last synced: 22 Nov 2024
https://github.com/vshawn/tutiempo_crawler
a crawler for climate data on en.tutiempo.net
climate-data crawler tutiempo-crawler
Last synced: 19 Nov 2024
https://github.com/sayyid5416/links-extractor
Extract links from any file or website.
crawler extract-links extractor links-extraction scraper web-crawler web-scraper
Last synced: 28 Oct 2024
https://github.com/sweeticelolly/sao_title_bot
A bot that generates cheeky paper titles
chrome-dr chromedriver crawler generator language-learning language-model numpy python robot scholar scholarly-articles selenium selenium-webdriver
Last synced: 24 Nov 2024
https://github.com/jsrei/page-redirect-code-location-hook
JS reverse-engineering technique: a universal approach to locating the JS code that triggers page redirects
Last synced: 16 Nov 2024
https://github.com/dotenorio/freeloader-of-data
A simple crawler or scraper to get open graph and other meta data from any website.
crawler graph hacktoberfest meta-data open-graph scraper
Last synced: 25 Oct 2024
https://github.com/bernabe9/render-it
Render any JavaScript content to create static sites ready for SEO
crawler javascript prerender prerenderio puppeteer render seo seo-tools server-side-rendering static-site static-site-generator
Last synced: 07 Nov 2024
https://github.com/integralist/go-web-crawler
A web crawler built in the Go programming language
concurrency crawler go golang web-crawler
Last synced: 11 Oct 2024
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 13 Oct 2024
https://github.com/bitscoper/bitscoper_cyber_toolbox
A Flutter application consisting of TCP Port Scanner, Route Tracer, Pinger, File Hash Calculator, String Hash Calculator, Base Encoder, Morse Code Translator, Open Graph Protocol Data Extractor, Series URI Crawler, DNS Record Retriever, and WHOIS Retriever.
android calculator crawler cybersecurity dart decoder docker encoder extractor flutter github-action ios mac retriever scanner tracer translator web windows
Last synced: 05 Dec 2024
https://github.com/arshadkazmi42/blc
Broken link checker
blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python
Last synced: 28 Oct 2024
https://github.com/nobodxbodon/chromecrawlerwildspider
Chrome extension to crawl web pages by loading them into browser tabs in parallel.
chrome-extension crawler localstorage spider
Last synced: 30 Nov 2024
https://github.com/0memo07/web-crawler
Web Crawler with Python
beautifulsoup4 bs4 crawler crawlers crawling crawling-python web-crawler web-crawler-python web-crawling webcrawler
Last synced: 17 Nov 2024
https://github.com/synacktraa/crawl
Web crawler designed to efficiently retrieve unique href, script and form links from a web application.
bash crawler regex shell web-spidering
Last synced: 26 Nov 2024
https://github.com/coghost/iparse
Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support
crawler parser parser-library python xkcd yaml
Last synced: 09 Nov 2024
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories, rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 19 Dec 2024
https://github.com/mirocow/yii2-crawler
Http concurrent crawler for Yii2
concurrency crawler guzzle yii2-extension
Last synced: 16 Nov 2024
https://github.com/cr0hn/feed-to-exporter
Get RSS Feed and export as Wordpress Post
Last synced: 07 Nov 2024
https://github.com/vinouno/BilibiliDanmuCrawler
A Python project that crawls danmaku (bullet comments) from bilibili.com and generates a word cloud
Last synced: 27 Oct 2024
https://github.com/omerdogan3/kitapp-crawler
Web Crawler Application of KitApp - Gets data from booksellers and inserts it into a database.
book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler
Last synced: 13 Dec 2024
https://github.com/juliandavidmr/raptor
Lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages looking for websites to visit, with automatic indexing.
Last synced: 09 Nov 2024
https://github.com/sanmak/queue-web-crawler
This application crawls a website using a queue that limits the number of concurrent connections, finds all hyperlinks present within it, and saves them to a CSV file.
async chai crawler csv hyperlinks mocha nodejs queue scrapper web
Last synced: 28 Nov 2024
https://github.com/librecodecoop/querido-diario-php
Brazilian government gazettes, accessible to everyone.
civic-tech crawler data-science gazette-crawler governments-gazettes govtech hacktoberfest open-data php php7 politics spider
Last synced: 29 Nov 2024
https://github.com/haxzie-xx/crode.js-node-web-crawler
Node.js Crawler built for open FTP sites for movie link collection.
Last synced: 19 Dec 2024
https://github.com/holmofy/spring-spider
Spring Spider App Utility Library.
crawler java spider spring spring-spider
Last synced: 27 Oct 2024
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 28 Oct 2024
https://github.com/itszeeshan/crawlinit
A web crawler written in python3
appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web
Last synced: 12 Oct 2024
https://github.com/liyifeng1994/go-crawler
A distributed crawler project based on Golang
crawler elastic elasticsearch golang
Last synced: 12 Nov 2024
https://github.com/code-inside/sloader
Worker that loads and retrieves data from "slow" endpoints.
Last synced: 16 Nov 2024
https://github.com/yjyoon-dev/nara-crawler
Crawler for National Archives Catalog
Last synced: 20 Nov 2024
https://github.com/karambir/ugc-colleges
Python Script to extract college names from UGC, India website.
college crawler extract html-parser python python-script ugc
Last synced: 12 Dec 2024
https://github.com/spencerlepine/readme-crawler
A Node.js web crawler to download README files and follow contained links. Fetch repositories from a valid GitHub URL
crawler javascript node nodejs readme scraper web-crawler webcrawer
Last synced: 13 Nov 2024
https://github.com/hxr16f/ss-grabber
Automation script for downloading user screenshots.
automation crawler downloader grabber lightshot screenshot script
Last synced: 27 Nov 2024
https://github.com/manuel-lang/autonomous-semantic-search-engine
Submission for HackDataKIBots 2018 - Web crawler combined with document analysis
crawler hackathon machine-learning mannheim microsoft natural-language-processing natural-language-understanding nextiteration rnv semantic-search textract
Last synced: 13 Nov 2024
https://github.com/robmch/mindfactory_crawling
A Python 3 Crawler for Mindfactory.de
crawler crawling data webcrawler webcrawling
Last synced: 17 Nov 2024
https://github.com/zain-ul-din/lgu-crawler
LGU timetable Crawler
contribute crawler lahore-garrison-university lahore-garrison-university-timetable open-source
Last synced: 10 Dec 2024
https://github.com/surelle-ha/dogma
Dogma is a CLI tool that enables interaction with the GitHub API for the purpose of searching .env files with specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.
Last synced: 10 Nov 2024
https://github.com/birkhofflee/blizzard_forum.js
An unofficial Node.js API for Blizzard Forums. (works in 2019)
Last synced: 18 Nov 2024
https://github.com/trudi-group/mc-crawler
A MobileCoin network crawler. Corresponding preprint available on arXiv (https://arxiv.org/pdf/2111.12364.pdf).
Last synced: 02 Dec 2024
https://github.com/mrrfv/webarchive
Crawls websites and saves found URLs to a file.
archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping
Last synced: 27 Oct 2024
https://github.com/frectonz/rampilo
A telegram crawler
crawler rust telegram telegram-crawler
Last synced: 14 Nov 2024
https://github.com/kernelerr/pixivsync
Pixiv image download and sync tool
crawler pixiv pixiv-crawler python
Last synced: 19 Nov 2024
https://github.com/eished/tujigu_crawler
Multithreaded Node.js crawler for tujigu.com (图集谷); tujigu crawler
Last synced: 02 Dec 2024
https://github.com/wenyalintw/job-scraper-bot
A fun Telegram bot made for a friend, deployed to Heroku
amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping
Last synced: 11 Nov 2024
https://github.com/moehmeni/ezweb
Easy to use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 05 Nov 2024
https://github.com/alishahbazi81/jobcrawler
Job crawler bot that finds jobs on job board platforms like LinkedIn, Glassdoor, and Indeed based on their post time and sends them to a Telegram channel
asp-net-core crawler jobs jobsearch telegram telegram-bot
Last synced: 11 Nov 2024
https://github.com/danielmorell/se_bot_checker
Validate search engine user agents and IP addresses.
crawler googlebot python search-engine spider
Last synced: 15 Oct 2024
https://github.com/doroudi/imdb-crawler
imdb.com movies crawler in scrapy
crawler data-mining python scrapy
Last synced: 12 Dec 2024
https://github.com/leomaurodesenv/smm-course-search
A package for searching courses - Super Mario Maker
bookmark-site crawler javascript json mario-game mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/foolin/scrago
A simple, fast, extensible page-crawling framework for Golang
Last synced: 09 Nov 2024
https://github.com/stopka/fedicrawl
Collect feeds to follow on Fediverse nodes.
crawler docker fediverse nodejs prisma typescript
Last synced: 05 Nov 2024
https://github.com/giscafer/airlevel-crawler
a demo of crawler for air-level.com
Last synced: 17 Nov 2024
https://github.com/leelow/nightmare-screenshot-selector
👻 📷 A Nightmare plugin to easily take screenshots.
crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler
Last synced: 15 Nov 2024
https://github.com/vitorebatista/horoscopefree
A REST API for daily astrology horoscopes
crawler horoscope horoscope-crawler horoscopes-api
Last synced: 30 Nov 2024
https://github.com/inishchith/python-scripts
Some Scripts & Projects
crawler python-script python3 scripts youtube
Last synced: 19 Dec 2024
https://github.com/aprilnea/xjtlu
This is how to get all the network resources of XJTLU.
crawler gateway http-auth python spider web-crawler xjtlu
Last synced: 15 Nov 2024
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 12 Oct 2024
https://github.com/zurdi15/nbz
Bot to automate internet browsing
automation bot browser-automation browsermob-proxy crawler selenium testing web
Last synced: 15 Oct 2024
https://github.com/hktalent/scrapysite
ScrapySite, Go web crawler (spider), scraping, intelligence gathering
crawler elasticsearch go scraping site spider web
Last synced: 19 Nov 2024
https://github.com/hrvadl/goweekly
Application for querying top articles from https://golangweekly.com/, translating them to Ukrainian, and sending them to a Telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 13 Oct 2024
https://github.com/rimiti/ping-urls
🏓 Ping URLs by batch.
cache crawler ping prerender prerendering seo
Last synced: 28 Dec 2024
https://github.com/tokenmill/crawling-framework-example
Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.
crawler crawling-framework elasticsearch storm-crawler
Last synced: 10 Nov 2024
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/sergioburdisso/solidscraper
Easy to use JQuery-Like API for Web Scraping/Crawling.
crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping
Last synced: 23 Nov 2024