Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-16 00:05:55 UTC
https://github.com/root4loot/recrawl
A Web URL crawler written in Go
bugbounty crawler discovery enumeration go golang recon reconnaissance web
Last synced: 06 Nov 2024
https://github.com/matheuscas/pycnpj-crawler
Yet another module for extracting company data from the CNPJ
Last synced: 02 Oct 2024
https://github.com/hironsan/japanese-news-crawler
A fully automated Japanese news crawler built on top of the Scrapy framework
Last synced: 27 Oct 2024
https://github.com/twtrubiks/google-play-store-spider-selenium
Google-Play-Store-spider uses Selenium + Beautiful Soup in Python
beautifulsoup chrome crawler firefox python selenium spider sqlite
Last synced: 16 Nov 2024
https://github.com/sanix-darker/ziim
Let your CLI find available solutions online for the errors and exceptions your commands hit, with no need to open a browser and search yourself
cli crawler error-correcting-codes error-handling exception-handler exception-handling exceptions javascript python scraper stackoverflow stackoverflow-api stackoverflow-questions
Last synced: 14 Oct 2024
https://github.com/shawon922/jobs-crawler
Crawl IT/Telecommunication jobs from bdjobs.com
beautifulsoup4 crawler python3
Last synced: 09 Nov 2024
https://github.com/sabinbajracharya/Insta-crawler
Pulls data from Instagram and saves it to Firebase for storage and Algolia for search
accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper
Last synced: 07 Nov 2024
https://github.com/tosone/githubtraveler
Traverse all GitHub users, orgs, and repos.
Last synced: 06 Nov 2024
https://github.com/pceuropa/youtube-crawler
YouTube crawler & scraper based on Scrapy. Written in Python 3.
crawler csv mariadb python3 scraper scrapy sqlalchemy youtube
Last synced: 13 Nov 2024
https://github.com/sebobo/shel.crawler
Neos based crawler for nodes and sites
Last synced: 14 Oct 2024
https://github.com/pps-22-scooby/pps-22-scooby
Scala application for crawling and scraping web pages given as input, using special rules passed to it through a DSL.
crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers
Last synced: 14 Oct 2024
https://github.com/piotrpdev/webuy-cex-price-tracker
A Python script that gets the prices of certain CeX products and uploads them to Google Sheets
cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex
Last synced: 13 Nov 2024
https://github.com/piotrpdev/WeBuy-Cex-Price-Tracker
A Python script that gets the prices of certain CeX products and uploads them to Google Sheets
cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex
Last synced: 23 Oct 2024
https://github.com/trungdq88/movie-showtimes
Web Service & Android Application to look up Vietnam movie showtimes
crawler java movie-showtimes theater
Last synced: 31 Oct 2024
https://github.com/pawod/gis-berlin-rents
A web crawler for ImmobilienScout24.de, implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.
apartment-rents berlin crawler gis immobilienscout24
Last synced: 04 Nov 2024
https://github.com/umihico/minigun-requests
Web scraping API to outsource tons of GET & xpath to cloud computing
crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping
Last synced: 15 Nov 2024
https://github.com/twtrubiks/pttstatistics
Statistics on trending keywords in PTT board comments or article titles, in Python
crawler ptt ptt-hot-key python statistics
Last synced: 16 Nov 2024
https://github.com/eight04/ptt-mail-backup
A BBS bot for fetching PTT in-site mail
bbs cli crawler ptt ptt-crawler python python3
Last synced: 28 Oct 2024
https://github.com/logocomune/botdetector
BotDetector is a Go library that detects bots, spiders, and crawlers from the user agent
botdetector bots crawler go golang golang-library spider user-agent
Last synced: 11 Nov 2024
https://github.com/activatedgeek/winemag-dataset
Dataset of Wine Reviews from Wine Enthusiast Magazine :grapes: :wine_glass: :earth_asia:
crawler dataset python3 scrapy scrapy-spider vega-lite visualization wine wine-tasting
Last synced: 14 Oct 2024
https://github.com/igeligel/BackpackLogin
:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.
bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2
Last synced: 13 Nov 2024
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 14 Nov 2024
https://github.com/luyadev/luya-module-crawler
Crawl a website and provide intelligent search results
crawler hacktoberfest intelligent-search luya search yii2
Last synced: 10 Oct 2024
https://github.com/petersonjr/MetadataCrawler
A simple tool to extract metadata from relational databases
avro crawler database-schemas java jdbc metadata rdms relational-databases
Last synced: 13 Nov 2024
https://github.com/keul/allanon
A Web crawler that visits a predictable set of URLs and automatically downloads the resources you want from them
Last synced: 11 Nov 2024
https://github.com/spekulatius/spatie-crawler-cached-queue-example
Example to demonstrate the usage of cached queues across multiple requests.
crawler crawler-engine laravel php-crawler php-scraper queues spatie-crawler
Last synced: 12 Nov 2024
https://github.com/fanzeyi/torchic
A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.
Last synced: 21 Oct 2024
https://github.com/tsoliangwu0130/spotify-news
A Flask application to retrieve the singers' latest news according to your Spotify current playing song.
bootstrap crawler flask oauth2 python3 restful-api spotify-api
Last synced: 11 Nov 2024
https://github.com/bbc2/discolinks
Command-line tool which checks a website for broken links.
broken-links crawler html http link-checker link-checkers link-checking validator web
Last synced: 28 Oct 2024
https://github.com/bajins/scripts_python
Python scripts
crawler faker faker-generator python-3 python3 rclone rclone-client rclone-config rclone-configuration reptile reptile-image reptiles scraper spider
Last synced: 12 Nov 2024
https://github.com/the1812/bingwallpapers
A tool for downloading wallpapers from Bing.
Last synced: 04 Nov 2024
https://github.com/bfwg/node-tinycrawler
Tiny web crawler in a nutshell for Node.js
Last synced: 11 Oct 2024
https://github.com/softmarshmallow/inked-news-crawler
🕷 Korean news source crawler (realtime & bulk)
crawler naver-news python3 scrapy
Last synced: 11 Oct 2024
https://github.com/torhamdev/death-engine
A powerful recon tool
crawler death-engine directory-search google-dorks hacking-tool information-gathering pentesting pentesting-tools port-scanning python3 recon recon-tools scanner web-hacking web-penetration-testing webhacking webpentest whois
Last synced: 15 Nov 2024
https://github.com/insign/spatie-crawler-queue-with-laravel-model
Spatie's Crawler with Laravel Model as Queue
cache crawler eloquent laravel queues spatie spatie-crawler
Last synced: 16 Nov 2024
https://github.com/nakabonne/webcrawlerforserps
Web crawler that scrapes Google search results
Last synced: 24 Oct 2024
https://github.com/windfarer/biu
biubiubiu~~ I'm a tiny web crawler framework
crawler python spider spider-framework web-crawler
Last synced: 28 Oct 2024
https://github.com/yerkopalma/bash-crawler
:computer: Get a site's links with Bash
Last synced: 13 Oct 2024
https://github.com/amirhoseinsb/Cloud_Player_V2
You can use the cloudplayer tool to listen to the music of the singer you want without going to a specific website and at a very high speed.
cloud-player crawler crawling music music-player programming python url-player
Last synced: 04 Aug 2024
https://github.com/duongdev/facebook-group-crawler
Facebook Groups Discussions Crawler
crawler facebook groups puppeteer
Last synced: 12 Nov 2024
https://github.com/dori-dev/quotes-crawler
Quotes crawler using scrapy and python.
crawler crawling python scraping-python scraping-websites scrapy scrapy-crawler scrapy-spider web-scraper
Last synced: 09 Nov 2024
https://github.com/thesp0nge/nightcrawler
A Python program that crawls a website and tries to stress it by polluting forms with bogus data
crawler offensive-scripts offensive-security stress-test web-crawler web-crawling
Last synced: 12 Oct 2024
https://github.com/drogbadvc/crawlit
A web crawler based on Scrapy, with 2D visualization and PageRank
Last synced: 08 Nov 2024
https://github.com/appliedsoul/promise-crawler
Promise support for node-crawler (Web Crawler/Spider for NodeJS + server-side jQuery)
crawler node-crawler nodejs promise-node-crawler spider
Last synced: 08 Nov 2024
https://github.com/codenashwan/telegrambot_instadp
A simple Telegram bot for downloading Instagram profile photos
api crawler crawling instagram instagram-api instagram-bot instagramscraper laravel php scraper telegram telegram-api telegram-bot webhook
Last synced: 08 Nov 2024
https://github.com/yaroslaff/bulk-http-check
Very fast and simple concurrent HTTP client (3500 HTTP req/s)
bulk check concurrent connections crawler header http https multiple parallel spider status
Last synced: 07 Nov 2024
https://github.com/omilab/internet-archive-link-extractor
Tool for extracting external links of a URL from Internet Archive snapshots
Last synced: 07 Aug 2024
https://github.com/fedebotu/neurips2022-openreviewdata
Crawl & Visualize NeurIPS 2022 Data from OpenReview
crawler dataset neurips neurips-2022 openreview peer-review review scraper
Last synced: 06 Nov 2024
https://github.com/jsrei/page-redirect-code-location-hook
JS reverse-engineering technique: a universal approach to locating the JS code that triggers a page redirect
Last synced: 16 Nov 2024
https://github.com/yggverse/yggstate
Yggdrasil Network Explorer
analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate
Last synced: 06 Nov 2024
https://github.com/jean-baptiste-camps/iiif-crawler
Interrogate IIIF servers and get images of manuscripts
crawler iiif iiif-image manuscripts
Last synced: 11 Oct 2024
https://github.com/AmirAref/DivarCrawler
A script to crawl divar.ir and extract phone numbers
Last synced: 05 Aug 2024
https://github.com/capturr/jsonld-extract
A damn simple tool to extract JSON-LD metadata from a webpage using a jQuery-like API (jQuery, Cheerio, CashDom ...).
cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript
Last synced: 28 Oct 2024
https://github.com/karambir/ugc-colleges
Python script to extract college names from the UGC (India) website.
college crawler extract html-parser python python-script ugc
Last synced: 24 Oct 2024
https://github.com/feedeo/youtube-channel-crawler
YouTube Channel :tv: Crawler
crawler youtube youtube-channel
Last synced: 11 Oct 2024
https://github.com/dotenorio/freeloader-of-data
A simple crawler or scraper to get open graph and other meta data from any website.
crawler graph hacktoberfest meta-data open-graph scraper
Last synced: 25 Oct 2024
https://github.com/bernabe9/render-it
Render any JavaScript content to create static sites ready for SEO
crawler javascript prerender prerenderio puppeteer render seo seo-tools server-side-rendering static-site static-site-generator
Last synced: 07 Nov 2024
https://github.com/gabfl/sitecrawl
Simple Python module to crawl a website and extract URLs
crawl crawler crawler-python crawling-sites
Last synced: 13 Oct 2024
https://github.com/sayyid5416/links-extractor
Extract links from any file or website.
crawler extract-links extractor links-extraction scraper web-crawler web-scraper
Last synced: 28 Oct 2024
https://github.com/twtrubiks/pttcrawlercontent
PTT article content crawler in Python
Last synced: 16 Nov 2024
https://github.com/ajcerejeira/base.gov.pt
A crawler that fetches data from base.gov.pt
Last synced: 06 Nov 2024
https://github.com/bugfishtm/bugfish-image-downloader
💾 Bugfish Image Downloader: Effortless web image downloads, subsite exploration, and HD selection. Windows app, .NET 4.5, no registry usage. Download now!
bugfish bugfish-software bugfishtm crawler downloader downloadmanager downloadtool gplv3 image imagedownloader imagedownloadertool imageprocessing portable-executable portableapps software utilityapp webscraping windows windows-desktop
Last synced: 06 Nov 2024
https://github.com/luizppa/web-crawler
A web crawler that collects and indexes web pages. Made with chilkat and gumbo parser.
chilkat cpp crawler webcrawler
Last synced: 28 Oct 2024
https://github.com/poyea/coronaflight-hkg
😷 Crawler and history manager for dangerous, coronavirus-infected flights to Hong Kong (VHHH)
corona coronaflight-hkg coronavirus coronavirus-analysis coronavirus-info coronavirus-tracker coronavirus-tracking crawl crawler crawlers crawling hacktoberfest hong-kong hongkong javascript json json-api node node-js nodejs
Last synced: 28 Oct 2024
https://github.com/chusiang/crawler-book-info
A crawler for quickly parsing book information
Last synced: 07 Nov 2024
https://github.com/baraja-core/webcrawler
Simple website crawling by following links.
bot crawler crawling-websites fast php robot speed
Last synced: 06 Nov 2024
https://github.com/AmirAref/Torobot
An inline Telegram bot for easily accessing and searching torob.com products from Telegram.
crawler python python-telegram-bot scraper telegtam-bot
Last synced: 05 Aug 2024
https://github.com/arshadkazmi42/blc
Broken link checker
blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python
Last synced: 28 Oct 2024
https://github.com/simin75simin/libgencrawl
Crawl all books from a Library Genesis search
crawler free-software libgen python3 scraper
Last synced: 05 Nov 2024
https://github.com/integralist/go-web-crawler
A web crawler built in the Go programming language
concurrency crawler go golang web-crawler
Last synced: 11 Oct 2024
https://github.com/coghost/iparse
Extract HTML/JSON content identified by CSS selectors (with bs4), with YAML config support
crawler parser parser-library python xkcd yaml
Last synced: 09 Nov 2024
https://github.com/liyifeng1994/go-crawler
A distributed crawler project based on Go
crawler elastic elasticsearch golang
Last synced: 12 Nov 2024
https://github.com/stopka/fedicrawl
Collect feeds to follow on Fediverse nodes.
crawler docker fediverse nodejs prisma typescript
Last synced: 05 Nov 2024
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 28 Oct 2024
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories, rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 31 Oct 2024