Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-05 00:06:41 UTC
https://github.com/hfreire/browser-as-a-service
A web browser :earth_americas: hosted as a service, to render your JavaScript web pages as HTML
browser browser-as-a-service crawler docker github-actions javascript puppeteer rest-api scraper server webcrawler
Last synced: 26 Oct 2024
https://github.com/jaymon/wishlist
Read an Amazon wishlist programmatically with Python
amazon amazon-wishlist api crawler python scraper
Last synced: 27 Oct 2024
https://github.com/findopendata/findopendata
A search engine for Open Data
crawler dataset-search opendata
Last synced: 05 Aug 2024
https://github.com/he426100/alipay-crawler
Alipay bill crawler
alipay crawler selenium selenium-ide selenium-php selenium-webdriver
Last synced: 06 Nov 2024
https://github.com/d4vinci/scrapling
Lightning-Fast, Adaptive Web Scraping for Python
automation crawler crawling crawling-python css dom-manipulation hacktoberfest lxml playwright python python3 scraping selectors selenium stealth web-scraper web-scraping web-scraping-python webscraping xpath
Last synced: 31 Oct 2024
https://github.com/farishijazi/rarbgcli
RARBG command line interface for scraping the rarbg.to torrent search engine
crawler rarbg rarbg-torrentapi torrent torrents torrents-crawler
Last synced: 27 Oct 2024
https://github.com/a11ywatch/crawler
gRPC web crawler turbo charged for performance
a11ywatch crawler grpc scraper
Last synced: 13 Oct 2024
https://github.com/goldarowana/douyin-crawler
Douyin crawler. Crawls a user's posts and a user's likes through a mobile phone proxy
crawler douyin douyin-download java vertx
Last synced: 09 Oct 2024
https://github.com/sachaarbonel/scrapy.dart
Scrapy, a fast high-level web crawling & scraping framework for Dart and Flutter
Last synced: 28 Oct 2024
https://github.com/forsti0506/a11y-sitechecker
Automatic accessibility checker with website crawling + screenshots for easy use
accessibility accessibility-criteria accessibility-testing axe crawler hacktoberfest open-source puppeteer typescript typescript-library
Last synced: 31 Oct 2024
https://github.com/valerebron/usetube
Search and get data from YouTube, no Google account needed
crawler typescript video youtube youtube-api
Last synced: 14 Oct 2024
https://github.com/ReddyyZ/URLBrute-Py
Tool to brute-force website subdomains and directories.
brute-force bruteforcer crawler dir-scanner dirscanner dirsearch sub-domain-enumeration sub-domain-scanner
Last synced: 04 Aug 2024
https://github.com/murat/tors
⏬ Yet another torrent searching application for your command line
crawler ruby-gem torrent-downloader torrent-search-engine
Last synced: 28 Oct 2024
https://github.com/spider-rs/spider-py
Spider ported to Python
crawler headless-chrome python scraper spider web-crawler
Last synced: 05 Nov 2024
https://github.com/soruly/anilist-crawler
Crawl data from anilist API and store in MariaDB.
Last synced: 27 Oct 2024
https://github.com/liangWenPeng/scrapy-admin
A django admin site for scrapy
Last synced: 17 Aug 2024
https://github.com/mike442144/seenreq
Generate an object for testing whether a request has been sent, where the request is Mikeal's request.
crawler duplicates-removed post request spider url
Last synced: 27 Oct 2024
https://github.com/Conso1eCowb0y/Deepminer
Deep web crawler and search engine
crawler crawling dark-web data-mining deepminer deepweb github hacking onion osint python-web-scraper python3 search-engine security security-tools spider the-onion-router tor tor-network webcrawler
Last synced: 02 Aug 2024
https://github.com/spk/maman
Rust Web Crawler saving pages on Redis
crawler http spider web web-crawler
Last synced: 01 Nov 2024
https://github.com/riquellopes/fii
API for retrieving information about FII
crawler investiment mongodb nodejs
Last synced: 31 Oct 2024
https://github.com/golang-collection/go-crawler-distributed
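A distributed crawler project. It supports secondary development of customized page parsers, uses a microservice architecture overall, and sends messages asynchronously through a message queue. Frameworks used include redigo, gorm, goquery, easyjson, viper, amqp, zap, and go-micro. It is deployed in containers with Docker, and the intermediate crawler nodes support horizontal scaling.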
crawler docker elasticsearch go go-micro gocrawler microservice rabbitmq
Last synced: 04 Aug 2024
https://github.com/healeycodes/Broken-Link-Crawler
:robot: Python bot that crawls your website looking for dead stuff
Last synced: 26 Sep 2024
https://github.com/healeycodes/broken-link-crawler
:robot: Python bot that crawls your website looking for dead stuff
Last synced: 22 Oct 2024
https://github.com/axetroy/crawler
A crawler framework for Node.js
Last synced: 27 Oct 2024
https://github.com/elboletaire/php-crawler
:spider: A simple crawler (spider) written in PHP just for fun, with zero dependencies
Last synced: 31 Oct 2024
https://github.com/ronin-rb/ronin-web
ronin-web is a collection of useful web helper methods and commands.
cli crawler hacktoberfest helpers html proxy-server ronin-rb ruby server spider web xml
Last synced: 04 Nov 2024
https://github.com/charlespikachu/seleniumlogin
Log in to some websites using Selenium.
crawler selenium selenium-webdriver spider taobao
Last synced: 09 Oct 2024
https://github.com/ryuchen/deadpool
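This project is a crawler application that uses Celery as its main framework. Crawler tasks can be added flexibly, and crawls of multiple sites can run at the same time. All components natively support concurrency at scale and distributed operation, and together with Celery's native distributed invocation this achieves large-scale concurrency.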
celery crawler deadpool python3 spider taobao taobao-spider tmall tmall-spider
Last synced: 28 Oct 2024
https://github.com/p0dalirius/robotstester
This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.
bugbounty crawler pentesting python robots tool
Last synced: 29 Oct 2024
https://github.com/mrxujiang/crawel
A somewhat interesting crawler platform built with Apify, Node, and React
apify crawler node puppeteer react react-hooks umi umi3
Last synced: 14 Oct 2024
https://github.com/jonaslejon/lolcrawler
Headless web crawler for bugbounty and penetration-testing/redteaming
bugbounty crawler docker penetration-testing penetration-testing-tools redteam redteam-tools redteaming
Last synced: 04 Aug 2024
https://github.com/0xhjk/x12306
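12306 ticket-checking assistant: query all stations along a route with one click, board first and buy the remaining ticket afterwards, for a more worry-free trip.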
12306 12306buyticket 12306helper 12306qiang-piao crawler fk12306 helper reqeusts spider ticket train x12306
Last synced: 31 Oct 2024
https://github.com/kylemocode/medium-stat-box
Practical pinned gist which shows your latest Medium status 📌
awesome-pinned-gists crawler github-action github-gists medium-stats
Last synced: 02 Nov 2024
https://github.com/hackfengJam/ArticleSpider
Crawling zhihu, jobbole, lagou by Scrapy, and using Elasticsearch+Django to build a Search Engine website --- README_zh.md (including: implementation roadmap, distributed-crawler and coping with anti-crawling strategies).
crawler distributed-systems django elasticsearch scrapy
Last synced: 31 Oct 2024
https://github.com/apocelipes/schannel-qt5
A GUI client of schannel powered by therecipe/qt and golang
client-side crawler go golang goqt linux qcharts qt5
Last synced: 23 Oct 2024
https://github.com/haxzie-xx/instagram-downloader
Node.js/Express app to retrieve Instagram video/image download URLs
crawler downloader express instagram instagram-scraper nodejs
Last synced: 27 Oct 2024
https://github.com/veliovgroup/spiderable-middleware
🤖 Prerendering for JavaScript powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites based on top of front-end JavaScript frameworks
crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable
Last synced: 14 Oct 2024
https://github.com/jfreegman/toxcrawler
A Tox DHT network crawler
crawler dht dht-network tox toxcore
Last synced: 15 Oct 2024
https://github.com/VeliovGroup/spiderable-middleware
🤖 Prerendering for JavaScript powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites based on top of front-end JavaScript frameworks
crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable
Last synced: 04 Aug 2024
https://github.com/ph-7/crawling-emails
Very simple bash script to crawl email addresses from a specific website.
bash crawler email email-scraper scrape scrape-email scraper scraping shell wget
Last synced: 28 Oct 2024
https://github.com/code4everything/visual-spider
Welcome to try our brand-new desktop productivity tool RunFlow, https://myrest.top/myflow
crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization
Last synced: 29 Sep 2024
https://github.com/debugtalk/webcrawler
A web crawler based on requests-html, mainly intended for URL validation testing.
crawler requests-html web-crawler weblink
Last synced: 16 Oct 2024
https://github.com/gomjellie/pysaint
[deprecated] u-SAINT Python client
crawler sap soongsil unofficial
Last synced: 28 Oct 2024
https://github.com/mamal72/iranian-calendar-events
Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website
crawler events iranian jalali jalali-calendar persian
Last synced: 02 Nov 2024
https://github.com/kshru9/web-crawler
A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures
concurrency concurrent-data-structure cpp crawler data-structures html-parser lock multithreading openssl pagerank pthread reader-writer-lock search-engine socket threading threadsafe webcrawler website-downloader
Last synced: 28 Oct 2024
https://github.com/k1low/utsusemi
A tool to generate a static website by crawling the original site.
api aws aws-lambda crawler s3-website serverless serverless-framework
Last synced: 17 Oct 2024
https://github.com/k1LoW/utsusemi
A tool to generate a static website by crawling the original site.
api aws aws-lambda crawler s3-website serverless serverless-framework
Last synced: 04 Aug 2024
https://github.com/minhhungit/github-action-rss-crawler
Auto crawl RSS feeds using Github Action
crawler csharp github-actions litedb netcore rss rss-crawler rss-items
Last synced: 02 Aug 2024
https://github.com/pykong/pypergrabber
Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.
crawler email-inbox google-scholar pdf pmid pubmed python sci-hub scraper
Last synced: 16 Oct 2024
https://github.com/riptl/ytpriv
YT metadata exporter
big-data crawler csv datascience json video youtube
Last synced: 03 Aug 2024
https://github.com/ERap320/CrowLeer
Powerful C++ web crawler based on libcurl
Last synced: 03 Aug 2024
https://github.com/spider-rs/spider-nodejs
Spider ported to Node.js
crawler distributed-systems headless-chrome indexer nodejs scraper spider typescript
Last synced: 05 Nov 2024
https://github.com/alex-page/get-site-urls
🔗 Get all of the URLs from a website.
crawler sitemap-generator urls
Last synced: 27 Oct 2024
https://github.com/marcel0024/cococrawler
A declarative and easy-to-use web crawler and scraper in C#
cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper
Last synced: 12 Oct 2024
https://github.com/novemberde/serverless-crawler-demo
Serverless Architecture Crawler demo
aws crawler demo handson serverless
Last synced: 04 Aug 2024
https://github.com/Smartproxy/Python-scraper-tutorial
A short introduction to scraping with Python with given steps and an example scraper script.
beautifulsoup crawler data-mining data-science github-python json-database-python learning python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping
Last synced: 04 Aug 2024
https://github.com/bartozzz/crawlerr
A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.
crawler jsdom nodejs scraper spider web-crawler
Last synced: 20 Oct 2024
https://github.com/aliosm/kontests
Competitive programming contests schedule
a2oj atcoder codeforces codeforces-gym codeshef competitive-programming crawler csacademy hackerearth hackerrank kickstart leetcode topcoder
Last synced: 09 Oct 2024
https://github.com/italia/publiccode-crawler
publiccode.yml crawler for the Open Source software catalog of Developers Italia
crawler developers-italia hacktoberfest publiccode publiccodeyml
Last synced: 02 Aug 2024
https://github.com/mattwang44/uspto-patft-web-crawler
Crawler for fetching information of US Patents and PDF bulk download
crawler patent patent-crawler pyqt5 python3 uspto
Last synced: 02 Oct 2024
https://github.com/matheusfelipeog/froxy
Hide your IP with free proxies using Froxy 🔄
crawler free-proxy froxy hide-ip proxies proxies-scraper proxy python requests requests-module scraping
Last synced: 26 Oct 2024
https://github.com/alessandrodd/googleplay_api
Google Play Unofficial Python 3 API Library
android crawler googleplay googleplay-api playstore
Last synced: 27 Oct 2024
https://github.com/kagami/tistore
:camera: Tistory photo grabber
crawler cross-platform electron tistory
Last synced: 22 Oct 2024
https://github.com/ivan-sincek/chad
Search Google Dorks like Chad. / Broken link hijacking tool.
broken-link-hijacking bug-bounty crawler ethical-hacking google-dorking google-dorks offensive-security penetration-testing playwright python red-team-engagement scraper search-engine security social-media social-media-takeover threat-hunting threat-intelligence web web-penetration-testing
Last synced: 31 Oct 2024
https://github.com/feng19/spider_man
SpiderMan, a fast high-level web crawling & scraping framework for Elixir, based on Broadway.
crawler data-mining elixir erlang framework spider
Last synced: 29 Oct 2024
https://github.com/mechazawa/redbetter-wm2
Better.php crawler for Redacted that uses WhatManager
crawler flac redacted seedbox transcoding whatcd whatmanager
Last synced: 06 Nov 2024
https://github.com/rzo1/crawler4j
Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j
crawler crawler4j java spider web-crawler web-spider
Last synced: 29 Sep 2024
https://github.com/yokawasa/scrapy-azuresearch-crawler-samples
Scrapy as a Web Crawler for Azure Search Samples
azure azure-search crawler python python3 scrapy search
Last synced: 30 Oct 2024
https://github.com/RuedigerVoigt/exoskeleton
A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend
crawler crawling-framework database machine-learning mariadb network python python-3 scraping
Last synced: 01 Aug 2024
https://github.com/nvk681/gumo
A crawler that extracts data from a dynamic webpage. Written in Node.js.
crawler elasticsearch neo4j nodejs
Last synced: 11 Oct 2024
https://github.com/ruedigervoigt/exoskeleton
A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend
crawler crawling-framework database machine-learning mariadb network python python-3 scraping
Last synced: 15 Oct 2024
https://github.com/thaoshibe/crawl-original-google-images
Python scripts for crawling original images from Google Images
chrome-extension crawler crawling crawling-python google google-images pafy scraper youtube youtube-dl youtube-search
Last synced: 11 Oct 2024
https://github.com/capjamesg/indieweb-search
Source code for the IndieWeb search engine.
crawler indieweb search search-engine
Last synced: 03 Aug 2024
https://github.com/asing1001/movierater
A useful website for finding movie ratings in Chinese and English by crawling Yahoo, Ptt, and IMDb.
apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript
Last synced: 14 Oct 2024
https://github.com/Actomaton/ActoCrawler
🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.
Last synced: 09 Aug 2024
https://github.com/petehouston/udemy-crawler
Crawl Udemy course info and save it in JSON format.
crawler crawling node node-cli udemy udemy-api udemy-crawl
Last synced: 23 Oct 2024
https://github.com/waynechang65/ptt-crawler
ptt-crawler is a web crawler module designed to scrape data from Ptt.
crawler javascript nodejs ptt scraper scraping spider web-crawler webcrawler
Last synced: 19 Oct 2024
https://github.com/ArchiveTeam/WebArchiver
Decentralized web archiving
archiver archiving crawler decentralized python warc web webarchiving
Last synced: 06 Nov 2024
https://github.com/p0dalirius/crawlersuseragents
Python script to check whether there are any differences in an application's responses when the request comes from a search engine's crawler.
bugbounty crawler crawlers pentest request tool user-agent web
Last synced: 29 Oct 2024
https://github.com/paambaati/websight
🕷A simple but *really* fast crawler built with Node.js & TypeScript
coding-challenge crawler interview-questions javascript monzo nodejs typescript
Last synced: 15 Oct 2024
https://github.com/alinebastos/crawler
Web Crawler created with Node.js and Puppeteer
crawler fs javascript nodejs puppeteer scraping
Last synced: 05 Nov 2024
https://github.com/bkeepers/spiderman
your friendly neighborhood web crawler
crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping
Last synced: 23 Oct 2024
https://github.com/enijkamp/supermonkey
A crawler for automated Android UI testing.
Last synced: 22 Oct 2024