Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-14 00:05:40 UTC
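The definition above, a bot that systematically browses the Web, can be illustrated with a minimal breadth-first crawler. This is a sketch under stated assumptions, not the method of any project listed below. The `fetch` callable is a hypothetical stand-in for an HTTP client that returns HTML or None. The crawl stays on the seed URL's host and stops after a page budget.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl restricted to the seed's host.

    `fetch` is an assumed, caller-supplied function (not a real library API)
    that returns the HTML for a URL, or None if the page cannot be retrieved.
    Returns the URLs that were fetched successfully, in visit order.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])  # frontier of URLs waiting to be visited
    seen = {seed}          # every URL ever queued, to avoid revisits
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

For an offline try-out, pass a dict's `get` method as `fetch`, for example `crawl("https://example.com/", pages.get)`, where `pages` maps URLs to HTML strings. A FIFO queue gives breadth-first order. Swapping it for a priority queue is the usual way to focus a crawl.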
https://github.com/rajatomar788/pywebcopy
Locally saves webpages to your hard disk with images, CSS, JS, and links as-is.
archive-tool crawler html html-parser mirror python web webpage
Last synced: 04 Aug 2024
https://github.com/crawljax/crawljax
Crawljax
crawler crawling dom dynamic event-driven-crawling javascript test-generation web-analysis web-testing
Last synced: 29 Oct 2024
https://github.com/nanshihui/scan-t
A new Python-based crawler with additional features, including network fingerprint search
crawler netfingerprint python sybersecurity
Last synced: 03 Nov 2024
https://github.com/zhuyingda/webster
A reliable, high-level web crawling and scraping framework for Node.js.
automation-test automation-ui chromium crawler crawling headless-chrome javascript javascript-framework nodejs nodejs-framework puppeteer scraping-framework spider
Last synced: 10 Oct 2024
https://github.com/abhisharma404/vault
Swiss Army knife for hackers
crawler fuzzing hacking hacking-tool information-gathering lfi networking offensive-security osint pentesting port-scanner python rfi scanner scrapy security sqlite ssl-inspection vault xss-vulnerability
Last synced: 03 Nov 2024
https://github.com/jaeksoft/opensearchserver
Open-source Enterprise Grade Search Engine Software
crawler custom-search enterprise indexing java lucene ocr opensearchserver search search-engine synonyms webcrawler webcrawling
Last synced: 29 Oct 2024
https://github.com/dirtyfilthy/freshonions-torscraper
Fresh Onions is an open-source Tor spider and hidden-service onion crawler, hosted at zlal32teyptf4tvi.onion
crawler darknet hidden-services onion scraper spider tor
Last synced: 06 Nov 2024
https://github.com/AlexMathew/scrapple
A framework for creating semi-automatic web content extractors
beautifulsoup crawler css-selector extractor lxml python scrapers scraping scrapy selector selector-expression tutorial web-scraper web-scraping xpath-expression
Last synced: 31 Oct 2024
https://github.com/chushuai/wscan
Wscan is a web security scanner dedicated to making web security accessible to everyone.
cel-go chromedp crawler headless martian passive-vulnerability-scanner poc sql-injection subdomains testwaf vulnerability-scanner waf webscan wscan xss
Last synced: 04 Aug 2024
https://github.com/ChenZixinn/spider_reverse
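Crawler reverse-engineering case studies. Completed: TLS fingerprinting | 瑞数 (Ruishu) | 震坤行 (ZKH) | 网易易盾 (NetEase Yidun) | WeChat mini-program decompilation and reverse engineering (百达星系) | 同花顺 (Tonghuashun) | RPC decryption | 加速乐 (Jiasule) | 极验 (Geetest) slider CAPTCHA | 巨量算数 (Juliang Suanshu) | Boss直聘 (Boss Zhipin) | 企查查 (Qichacha) | 中国五矿 (China Minmetals) | QQ Music | 产业政策大数据平台 (Industrial Policy Big Data Platform) | 企知道 (Qizhidao) | 雪球网 (Xueqiu, acw_sc__v2) | 1688 | 七麦数据 (Qimai Data) | whggzy | 企名科技 (Qiming Technology) | mohurd | 艺恩数据 (EntGroup Data) | 欧科云链 (OKLink)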
crawler python requests spider
Last synced: 31 Oct 2024
https://github.com/yhy0/Jie
Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gathering, and exploitation, elevating it to an indispensable toolkit for both security professionals and penetration testers (aspirational).
apollo-exp crawler jie scan scanner security-copilot shiro-exp vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners
Last synced: 10 Sep 2024
https://github.com/shaohua0116/ICLR2020-OpenReviewData
Script that crawls metadata from the ICLR OpenReview webpage, with tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
conference crawler data-analysis iclr iclr2020 machine-learning visualization
Last synced: 07 Aug 2024
https://github.com/hect0x7/jmcomic-crawler-python
Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀
18comic crawler downloader github-actions jmcomic pypi python readthedocs
Last synced: 08 Nov 2024
https://github.com/andythefactory/newspaper4k
📰 Newspaper4k, a fork of the beloved Newspaper3k: extraction of articles, titles, and metadata from news websites.
articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping
Last synced: 07 Nov 2024
https://github.com/tasos-py/Search-Engines-Scraper
Search google, bing, yahoo, and other search engines with python
bing crawler google python scraper search-engine yahoo
Last synced: 04 Aug 2024
https://github.com/lixi5338619/lxbook
Code repository for the book 《爬虫逆向进阶实战》 (Advanced Practical Crawler Reverse Engineering)
android-resever crawler frida java javascript python smali spiders unidbg xposed
Last synced: 05 Nov 2024
https://github.com/gadfly0x/signature_algorithm
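Request-signing and encryption algorithms used by various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee (瑞幸咖啡), and bangkokair (Bangkok Airways).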
crawler reverse-engineering spider
Last synced: 11 Nov 2024
https://github.com/roniemartinez/dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath
Last synced: 11 Oct 2024
https://github.com/platonai/PulsarRPA
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
crawler data-mining data-science rpa scraper scraping web-automation web-crawler web-mining web-scraping web-sql
Last synced: 05 Nov 2024
https://github.com/lgraubner/sitemap-generator
Easily create XML sitemaps for your website.
crawler google seo sitemap sitemap-generator xml-sitemap
Last synced: 08 Aug 2024
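Sitemap tools like this one produce an XML `urlset` of `url`/`loc` entries in the sitemaps.org 0.9 namespace. The Python sketch below builds a minimal sitemap from a list of URLs. `build_sitemap` is a hypothetical helper, not the API of sitemap-generator. Optional per-URL tags such as `lastmod` are left out.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a minimal sitemap XML document listing `urls` (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in text content.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Write the declaration by hand so it does not depend on the locale.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(["https://example.com/", "https://example.com/?page=2&sort=new"]))
```

A crawling generator would feed this function the URLs it discovered on the site, such as the output of a same-host crawl.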
https://github.com/cyubuchen/free_proxy_website
A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies
crawler free-proxy-list ip proxy proxy-checker spider
Last synced: 03 Aug 2024
https://github.com/shaohua0116/ICLR2019-OpenReviewData
Script that crawls metadata from the ICLR OpenReview webpage, with tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
crawler crawling-python openreview tutorial
Last synced: 07 Aug 2024
https://github.com/smuyyh/crawlerforreader
An on-device web novel crawler for Android, based on jsoup and XPath
android bookreader crawler jsoup xpath
Last synced: 10 Nov 2024
https://github.com/brendonboshell/supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
crawler distributed-crawler robot sitemap web-crawler
Last synced: 25 Oct 2024
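Obeying robots.txt, as Supercrawler does, means checking each URL against the rules for the crawler's user agent before fetching it. The minimal sketch below uses Python's standard `urllib.robotparser`. It is not Supercrawler's own implementation. The robots.txt content and the `mybot` user agent are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content, made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    print(rules.can_fetch("mybot", "https://example.com/private/page"))  # False
    print(rules.can_fetch("mybot", "https://example.com/about"))  # True
    print(rules.crawl_delay("mybot"))  # 2
```

In a live crawler, `RobotFileParser.set_url()` followed by `read()` downloads the file instead. The `crawl_delay()` value can then drive the crawler's rate limit.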
https://github.com/microsoft/ghcrawler
Crawl GitHub APIs and store the discovered orgs, repos, commits, ...
crawler data github github-api github-webhooks ospo
Last synced: 25 Sep 2024
https://github.com/mhmdiaa/second-order
Second-order subdomain takeover scanner
crawler crawling infosec mapping penetration-testing penetration-testing-tools pentesting recon reconnaissance security security-tools web-application-security wordlist wordlist-generator
Last synced: 03 Nov 2024
https://github.com/Josue87/EmailFinder
Search emails from a domain through search engines
Last synced: 13 Nov 2024
https://github.com/scrapy-plugins/scrapy-crawlera
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
crawler crawler-detection plugin proxy scraping scrapy
Last synced: 05 Sep 2024
https://github.com/scrapy-plugins/scrapy-zyte-smartproxy
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
crawler crawler-detection plugin proxy scraping scrapy
Last synced: 12 Nov 2024
https://github.com/salimk/Rcrawler
An R web crawler and scraper
crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping
Last synced: 25 Oct 2024
https://github.com/Malwarize/webpalm
🕸️ Crawl in the web network
crawler crawling data data-science datamining go golang hack mining osint redteam spider tool
Last synced: 08 Nov 2024
https://github.com/xiyuan-fengyu/ppspider
A web spider framework built on Puppeteer. It provides flexible decorator-based task queuing and scheduling, convenient data storage (nedb/MongoDB), and data visualization and user interaction features.
angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript
Last synced: 10 Oct 2024
https://github.com/crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
crawler crawling hacktoberfest php scraper scraping scraping-websites web-crawler web-crawling web-scraper web-scraping
Last synced: 25 Oct 2024
https://github.com/rivermont/spidy
The simple, easy to use command line web crawler.
crawler crawling python python3 web-crawler web-spider
Last synced: 29 Oct 2024
https://github.com/dmi3kno/polite
Be nice on the web
crawler memoise r r-package rate-limiter robotstxt rstats rvest scraper webscraping
Last synced: 25 Oct 2024
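The rate limiting that polite advertises can be sketched as a per-host throttle that enforces a minimum delay between requests. This Python sketch is not polite's R implementation. It assumes a fixed delay per host and a single-threaded crawler. `PoliteThrottle` is a hypothetical name. The clock and sleep functions are injectable, so the behavior can be tested without real waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlparse


class PoliteThrottle:
    """Enforce a minimum delay (in seconds) between requests to the same host."""

    def __init__(self, delay: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.last_request: Dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str) -> float:
        """Block until the URL's host may be contacted again; return seconds slept."""
        host = urlparse(url).netloc
        now = self.clock()
        last = self.last_request.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, last + self.delay - now)
            if waited > 0:
                self.sleep(waited)
        # Record when this request actually goes out.
        self.last_request[host] = now + waited
        return waited
```

A crawler would call `throttle.wait(url)` immediately before each fetch. Keying on the host rather than the full URL lets requests to different sites proceed without waiting on each other.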
https://github.com/infinilabs/crawler
🕷️ An easy-to-use spider written in Golang (previously named GOPA).
crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider
Last synced: 09 Nov 2024
https://github.com/krypton-byte/tiktok-downloader
TikTok Downloader/Scraper using requests & bs4
asynchronous asyncio beautifulsoup bs4 crawler downloader flask krypton-byte lightweight nowm python python3 requests tiktok watermark web without
Last synced: 11 Nov 2024
https://github.com/dennis-tra/nebula
🌌 A network agnostic DHT crawler, monitor, and measurement tool that exposes timely information about DHT networks.
cid crawler filecoin golang hacktoberfest ipfs libp2p
Last synced: 06 Nov 2024
https://github.com/TikHubIO/TikHub-API-Python-SDK
High-performance asynchronous Douyin(抖音) TikTok Xiaohongshu(小红书) Kuaishou(快手) Weibo(微博) Instagram YouTube(油管) Twitter(X) Captcha Solver(验证码解决器) Temp Mail(临时邮箱) API(接口).
api captcha-solver crawler data-api douyin douyin-tiktok-api instagram kuaishou netease-cloud-music private-api scrapy tiktok twitter weibo xiaohongshu xiguashipin
Last synced: 29 Oct 2024
https://github.com/lgraubner/sitemap-generator-cli
Creates an XML-Sitemap by crawling a given site.
cli crawler google seo sitemap xml-sitemap
Last synced: 11 Nov 2024
https://github.com/mustafadalga/instagram-bot
An Instagram bot developed using the Selenium Framework
automation automation-selenium bot bulk-comments bulk-unfollow crawler crawling download-stories instagram instagram-api instagram-bot instagram-downloader instagram-without-api mass-liking python python3 selenium selenium-framework selenium-python selenium-webdriver
Last synced: 28 Sep 2024
https://github.com/yaroslaff/nudecrawler
Crawl telegra.ph searching for nudes!
crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping
Last synced: 07 Nov 2024
https://github.com/GraySilver/wencai
A wencai crawler: a Pythonic toolkit for the strategy backtesting API of iWencai (i问财).
crawler finance pandas quant quantitative-finance tushare wencai
Last synced: 30 Oct 2024
https://github.com/oppsec/pinkerton
🕵️ Pinkerton is a JavaScript file crawler and secret finder tool developed in Python
crawl crawler hacktoberfest javascript pentest python python3 redteam secrets
Last synced: 08 Nov 2024
https://github.com/devanshbatham/Gorecon
Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance: a tool that every pentester or bug hunter might want to consider adding to their arsenal.
admin-panel-finder backups-finder cmsdetecter configurationfiles crawler directory-bruteforce dns dnsrecon email-hunter geo-ip nameserver recon reconaissance reverse-dns scanner subdomain-enumeration subdomain-scanner subnet-lookup whois-lookup wordpress-scanner
Last synced: 04 Nov 2024
https://github.com/zhupingqi/RuiJi.Net
A crawler framework: distributed crawler and extractor
crawler extractor headless-chrome netcore owin scraper scrapy
Last synced: 13 Nov 2024
https://github.com/eight04/comiccrawler
An image crawler written in Python.
cli crawler gui image-crawler python tkinter
Last synced: 13 Nov 2024
https://github.com/Jasonnor/th-music-video-generator
Touhou Project random music video generator/player that crawls images and videos from websites to generate MVs.
crawler javascript music-video touhou web
Last synced: 11 Nov 2024
https://github.com/marshalx/telegram-crawler
🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
crawler crawling crawling-python parser telegram telegram-org telegram-updates
Last synced: 12 Nov 2024
https://github.com/algolia/algoliasearch-netlify
Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler
algolia algolia-crawler algoliasearch crawler jamstack netlify netlify-plugin search
Last synced: 12 Oct 2024
https://github.com/glaucocustodio/tanakai
Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.
chrome-headless crawler kimurai scraper scrapy webscraping
Last synced: 31 Oct 2024
https://github.com/lucasjinreal/weibo_terminator_workflow
Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!
crawler nlp scraper sentiment-analysis weibo-terminator
Last synced: 06 Nov 2024
https://github.com/antchfx/antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
crawler crawling framework golang scraping web-crawler web-spider
Last synced: 26 Oct 2024
https://github.com/zrashwani/arachnid
Crawl all unique internal links found on a given website and extract SEO-related information; supports JavaScript-based sites
Last synced: 29 Oct 2024
https://github.com/xyntax/filesensor
A crawler-based tool for dynamically detecting sensitive files
crawler fuzzing pentesting scrapy
Last synced: 07 Nov 2024
https://github.com/zntfdr/Selenops
A Swift Web Crawler 🕷
command-line-tool crawler scripting swift web
Last synced: 06 Aug 2024
https://github.com/dwisiswant0/galer
A fast tool to fetch URLs from HTML attributes by crawling.
crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls
Last synced: 28 Oct 2024
https://github.com/commoncrawl/news-crawl
News crawling with StormCrawler - stores content as WARC
apache-storm common-crawl commoncrawl crawler news storm-crawler warc web-crawler
Last synced: 03 Aug 2024
https://github.com/yangjianxin1/qqmusicspider
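A Scrapy-based QQ Music spider that crawls song information, lyrics, featured comments, and more. It also shares a corpus of 490,000+ music items from the top 6,400 mainland China, Hong Kong, and Taiwan artists on QQ Music.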
crawler music musicspider qqmusic scrapy
Last synced: 07 Nov 2024
https://github.com/s0rg/crawley
The unix-way web crawler
cli crawler go golang golang-application pentest pentest-tool pentesting unix-way web-crawler web-scraping web-spider
Last synced: 02 Nov 2024
https://github.com/kong36088/ZhihuSpider
A multithreaded Zhihu user crawler based on Python 3
crawler multi-threading python python3 spider zhihu
Last synced: 07 Aug 2024
https://github.com/ScottSloan/Bili23-Downloader
Download Bilibili videos, anime series, movies, documentaries, and other content
bilibili crawler linux macos python videodownloader windows wxpython
Last synced: 27 Oct 2024
https://github.com/lgh06/web-page-monitor
Website page change monitor: watches web pages for updates and sends change alerts.
change-alert change-detection change-monitor crawler monitor website-change-monitor website-monitoring
Last synced: 04 Aug 2024
https://github.com/dwisiswant0/gf-secrets
Secret and/or credential patterns used for gf.
alienvault-otx bugbounty crawler gau gf gitleaks infosec open-threat-exchange secrets-detection trufflehog trufflehog3 wayback wayback-machine waybackurl
Last synced: 28 Oct 2024
https://github.com/R4yGM/dorkscout
DorkScout: a Golang tool to automate Google dork scans against the entire internet or specific targets
bug-bounty crawler ghdb golang google-dorks osint scraper security
Last synced: 04 Aug 2024
https://github.com/kirralabs/indonesian-NLP-resources
Data resources for Indonesian-language NLP
corpus corpus-linguistics crawler dataset dependency-parser indonesian indonesian-language named-entity-recognition nlp parallel-corpus pos-tagging sentiment-analysis
Last synced: 08 Nov 2024
https://github.com/spatie/robots-txt
Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
Last synced: 10 Nov 2024
https://github.com/linkedtales/scrapedin-linkedin-crawler
Crawler for LinkedIn full profiles 2019
crawler linkedin linkedin-crawler
Last synced: 06 Nov 2024
https://github.com/crypto-crawler/crypto-crawler-rs
A rock-solid cryptocurrency crawler library.
crawler cryptocurrency websocket
Last synced: 28 Oct 2024
https://github.com/macacajs/NoSmoke
A cross-platform UI crawler that scans view trees, then generates and executes UI test cases.
android crawler ios macaca smoke-tests test-automation webdriver
Last synced: 08 Nov 2024
https://github.com/mgleon08/instagram-crawler
Crawl instagram photos, posts and videos for download.
crawler gem instagram instagram-crawler instagram-scraper ruby rubygems scraper
Last synced: 14 Aug 2024
https://github.com/ovnrain/javbus-api
A self-hosted JavBus API service
adults api api-server crawler docker javbus magnet nodejs spider typescript vercel vercel-deployment
Last synced: 13 Nov 2024
https://github.com/webysther/packagist-mirror
📦✂️📋📦 Create a mirror of packagist.org metadata for use locally with composer
composer composer-packages crawler mirror packagist packagist-mirror php
Last synced: 03 Nov 2024