Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-12-25 00:05:56 UTC
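As a rough illustration of the "systematically browses" idea in the definition above, here is a minimal breadth-first crawler sketch using only the Python standard library. It is not taken from any project listed on this page. The `crawl` function, the `fetch` callable, the `max_pages` limit, and the `example.test` pages are all hypothetical names chosen for this example. A real crawler would also need politeness controls (robots.txt, rate limits) and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a hypothetical callable mapping a URL to an HTML string,
    or to None when the page cannot be retrieved.
    """
    seen = {start_url}           # URLs already queued, to avoid revisits
    queue = deque([start_url])   # frontier of URLs still to fetch
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Usage with an in-memory "web" (assumed pages, no network access).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a>',
}
order = crawl("http://example.test/", PAGES.get)
print(order)
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so the same loop could be driven by an HTTP client, a headless browser, or a test fixture like the dictionary above.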
https://github.com/avidlearnerinprogress/python-automation-scripts
Simple yet powerful automation stuff.
beautifulsoup codetopdf comic-downloader crawler cricinfo cricket-api crime-data-scraper images imdb-webscrapping instagram instagram-scraper medium-downloader news-scraper pdf pdf-converter quora quora-crawler scraping-websites selenium-webdriver word-of-the-day
Last synced: 21 Dec 2024
https://github.com/avidLearnerInProgress/python-automation-scripts
Simple yet powerful automation stuff.
beautifulsoup codetopdf comic-downloader crawler cricinfo cricket-api crime-data-scraper images imdb-webscrapping instagram instagram-scraper medium-downloader news-scraper pdf pdf-converter quora quora-crawler scraping-websites selenium-webdriver word-of-the-day
Last synced: 10 Nov 2024
https://github.com/erma0/douyin
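Douyin crawler: collects public data such as account home pages, likes, favorites, original sounds, topics, search results, collections, posts, followings, and followers.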
Last synced: 29 Oct 2024
https://github.com/zhuyingda/webster
a reliable high-level web crawling & scraping framework for Node.js.
automation-test automation-ui chromium crawler crawling headless-chrome javascript javascript-framework nodejs nodejs-framework puppeteer scraping-framework spider
Last synced: 27 Dec 2024
https://github.com/crawljax/crawljax
Crawljax
crawler crawling dom dynamic event-driven-crawling javascript test-generation web-analysis web-testing
Last synced: 22 Dec 2024
https://github.com/nanshihui/scan-t
A new Python-based crawler with additional features, including network fingerprint search
crawler netfingerprint python sybersecurity
Last synced: 03 Nov 2024
https://github.com/nanshihui/Scan-T
A new Python-based crawler with additional features, including network fingerprint search
crawler netfingerprint python sybersecurity
Last synced: 13 Nov 2024
https://github.com/abhisharma404/vault
Swiss Army knife for hackers
crawler fuzzing hacking hacking-tool information-gathering lfi networking offensive-security osint pentesting port-scanner python rfi scanner scrapy security sqlite ssl-inspection vault xss-vulnerability
Last synced: 03 Nov 2024
https://github.com/chushuai/wscan
Wscan is a web security scanner that focuses on web security, dedicated to making web security accessible to everyone.
cel-go chromedp crawler headless martian passive-vulnerability-scanner poc sql-injection subdomains testwaf vulnerability-scanner waf webscan wscan xss
Last synced: 21 Nov 2024
https://github.com/jaeksoft/opensearchserver
Open-source Enterprise Grade Search Engine Software
crawler custom-search enterprise indexing java lucene ocr opensearchserver search search-engine synonyms webcrawler webcrawling
Last synced: 21 Dec 2024
https://github.com/dirtyfilthy/freshonions-torscraper
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
crawler darknet hidden-services onion scraper spider tor
Last synced: 06 Nov 2024
https://github.com/AlexMathew/scrapple
A framework for creating semi-automatic web content extractors
beautifulsoup crawler css-selector extractor lxml python scrapers scraping scrapy selector selector-expression tutorial web-scraper web-scraping xpath-expression
Last synced: 31 Oct 2024
https://github.com/ChenZixinn/spider_reverse
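Crawler reverse-engineering case studies. Completed: TLS fingerprinting | Ruishu (瑞数) | ZKH (震坤行) | NetEase Yidun | WeChat mini-program decompilation and reverse engineering (百达星系) | Tonghuashun | RPC decryption | Jiasule (加速乐) | GeeTest slider CAPTCHA | 巨量算数 | Boss Zhipin | Qichacha | China Minmetals | QQ Music | Industrial Policy Big Data Platform (产业政策大数据平台) | Qizhidao (企知道) | Xueqiu (acw_sc__v2) | 1688 | Qimai Data | whggzy | 企名科技 | mohurd | Endata (艺恩数据) | OKLink (欧科云链)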
crawler python requests spider
Last synced: 31 Oct 2024
https://github.com/yhy0/jie
Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gathering, and exploitation, elevating it to an indispensable toolkit for both security professionals and penetration testers. (expectations)
apollo-exp crawler jie scan scanner security-copilot shiro-exp vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners
Last synced: 21 Dec 2024
https://github.com/yhy0/Jie
Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gathering, and exploitation, elevating it to an indispensable toolkit for both security professionals and penetration testers. (expectations)
apollo-exp crawler jie scan scanner security-copilot shiro-exp vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners
Last synced: 10 Sep 2024
https://github.com/shaohua0116/ICLR2020-OpenReviewData
Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
conference crawler data-analysis iclr iclr2020 machine-learning visualization
Last synced: 27 Nov 2024
https://github.com/hect0x7/jmcomic-crawler-python
Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | GitHub Actions downloader for JMComic 🚀
18comic crawler downloader github-actions jmcomic pypi python readthedocs
Last synced: 21 Dec 2024
https://github.com/andythefactory/newspaper4k
📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping
Last synced: 20 Dec 2024
https://github.com/AndyTheFactory/newspaper4k
📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping
Last synced: 26 Oct 2024
https://github.com/tasos-py/Search-Engines-Scraper
Search Google, Bing, Yahoo, and other search engines with Python
bing crawler google python scraper search-engine yahoo
Last synced: 20 Nov 2024
https://github.com/cyubuchen/free_proxy_website
A collection of websites that provide free SOCKS/HTTPS/HTTP proxies
crawler free-proxy-list ip proxy proxy-checker spider
Last synced: 17 Nov 2024
https://github.com/gadfly0x/signature_algorithm
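Request-signing and encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), Luckin Coffee (瑞幸咖啡), and Bangkok Airways (bangkokair).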
crawler reverse-engineering spider
Last synced: 11 Nov 2024
https://github.com/roniemartinez/dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath
Last synced: 13 Dec 2024
https://github.com/lgraubner/sitemap-generator
Easily create XML sitemaps for your website.
crawler google seo sitemap sitemap-generator xml-sitemap
Last synced: 27 Nov 2024
https://github.com/platonai/PulsarRPA
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
crawler data-mining data-science rpa scraper scraping web-automation web-crawler web-mining web-scraping web-sql
Last synced: 05 Nov 2024
https://github.com/howie6879/magic_google
Google search results crawler: get the Google search results that you need
crawler google google-search spider
Last synced: 21 Dec 2024
https://github.com/smuyyh/crawlerforreader
An on-device web novel crawler for Android, based on jsoup and XPath
android bookreader crawler jsoup xpath
Last synced: 23 Dec 2024
https://github.com/rebrowser/rebrowser-patches
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
automation bot bot-detection chrome chromedriver cloudflare crawler crawling datadome headless headless-chrome playwright puppeteer puppeteer-extra rebrowser scraping selenium stealth web-scraping webdriver
Last synced: 21 Dec 2024
https://github.com/shaohua0116/ICLR2019-OpenReviewData
Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
crawler crawling-python openreview tutorial
Last synced: 27 Nov 2024
https://github.com/mhmdiaa/second-order
Second-order subdomain takeover scanner
crawler crawling infosec mapping penetration-testing penetration-testing-tools pentesting recon reconnaissance security security-tools web-application-security wordlist wordlist-generator
Last synced: 26 Dec 2024
https://github.com/brendonboshell/supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
crawler distributed-crawler robot sitemap web-crawler
Last synced: 25 Oct 2024
https://github.com/microsoft/ghcrawler
Crawl GitHub APIs and store the discovered orgs, repos, commits, ...
crawler data github github-api github-webhooks ospo
Last synced: 25 Sep 2024
https://github.com/chishui/jssoup
JavaScript + BeautifulSoup = JSSoup
beautifulsoup crawler html javascript nodejs parser react-native spider
Last synced: 23 Dec 2024
https://github.com/duzun/hquery.php
An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.
broken-html crawler css-selectors domcrawler fast hquery html html-parser invalid-html jquery-like jquery-selectors parser php psr-0 psr-4 scraper selectors xml xml-parser
Last synced: 21 Dec 2024
https://github.com/Josue87/EmailFinder
Search emails from a domain through search engines
Last synced: 13 Nov 2024
https://github.com/salimk/rcrawler
An R web crawler and scraper
crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping
Last synced: 24 Dec 2024
https://github.com/scrapy-plugins/scrapy-crawlera
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
crawler crawler-detection plugin proxy scraping scrapy
Last synced: 05 Sep 2024
https://github.com/scrapy-plugins/scrapy-zyte-smartproxy
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
crawler crawler-detection plugin proxy scraping scrapy
Last synced: 21 Dec 2024
https://github.com/salimk/Rcrawler
An R web crawler and scraper
crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping
Last synced: 25 Oct 2024
https://github.com/Malwarize/webpalm
🕸️ Crawl in the web network
crawler crawling data data-science datamining go golang hack mining osint redteam spider tool
Last synced: 08 Nov 2024
https://github.com/xiyuan-fengyu/ppspider
A web spider framework built on puppeteer. Supports decorator-based task queues and task scheduling, data storage with nedb or mongodb, and data visualization with user interaction.
angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript
Last synced: 21 Dec 2024
https://github.com/crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
crawler crawling hacktoberfest php scraper scraping scraping-websites web-crawler web-crawling web-scraper web-scraping
Last synced: 25 Oct 2024
https://github.com/rivermont/spidy
The simple, easy to use command line web crawler.
crawler crawling python python3 web-crawler web-spider
Last synced: 29 Oct 2024
https://github.com/dmi3kno/polite
Be nice on the web
crawler memoise r r-package rate-limiter robotstxt rstats rvest scraper webscraping
Last synced: 25 Oct 2024
https://github.com/dennis-tra/nebula
🌌 A network agnostic DHT crawler, monitor, and measurement tool that exposes timely information about DHT networks.
cid crawler filecoin golang hacktoberfest ipfs libp2p
Last synced: 21 Dec 2024
https://github.com/yangjianxin1/qqmusicspider
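A Scrapy-based QQ Music crawler (QQ Music Spider) that scrapes song information, lyrics, featured comments, and more. It also shares a music corpus of over 490,000 entries from the top 6,400 mainland Chinese, Hong Kong, and Taiwanese artists on QQ Music.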
crawler music musicspider qqmusic scrapy
Last synced: 23 Dec 2024
https://github.com/krypton-byte/tiktok-downloader
Tiktok Downloader/Scraper using requests & bs4
asynchronous asyncio beautifulsoup bs4 crawler downloader flask krypton-byte lightweight nowm python python3 requests tiktok watermark web without
Last synced: 22 Dec 2024
https://github.com/infinitbyte/gopa
🕷️ An easy-to-use spider written in Golang. (previously named GOPA.)
crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider
Last synced: 14 Dec 2024
https://github.com/infinilabs/crawler
🕷️ An easy-to-use spider written in Golang. (previously named GOPA.)
crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider
Last synced: 27 Dec 2024
https://github.com/TikHubIO/TikHub-API-Python-SDK
High-performance asynchronous API for Douyin (抖音), TikTok, Xiaohongshu (小红书), Kuaishou (快手), Weibo (微博), Instagram, YouTube, Twitter (X), a CAPTCHA solver, and temporary mail.
api captcha-solver crawler data-api douyin douyin-tiktok-api instagram kuaishou netease-cloud-music private-api scrapy tiktok twitter weibo xiaohongshu xiguashipin
Last synced: 29 Oct 2024
https://github.com/lgraubner/sitemap-generator-cli
Creates an XML-Sitemap by crawling a given site.
cli crawler google seo sitemap xml-sitemap
Last synced: 11 Nov 2024
https://github.com/twtrubiks/line-bot-tutorial
line-bot-tutorial using Python Flask
bot crawler heroku line ptt python-flask tutorial
Last synced: 22 Dec 2024
https://github.com/yaroslaff/nudecrawler
Crawl telegra.ph searching for nudes!
crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping
Last synced: 21 Dec 2024
https://github.com/flairnlp/fundus
A very simple news crawler with a funny name
cc-news commoncrawl corpus crawler news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping
Last synced: 22 Dec 2024
https://github.com/oppsec/pinkerton
🕵️ Pinkerton is a JavaScript file crawler and secret finder tool developed in Python
crawl crawler hacktoberfest javascript pentest python python3 redteam secrets
Last synced: 24 Dec 2024
https://github.com/mustafadalga/instagram-bot
An Instagram bot developed using the Selenium Framework
automation automation-selenium bot bulk-comments bulk-unfollow crawler crawling download-stories instagram instagram-api instagram-bot instagram-downloader instagram-without-api mass-liking python python3 selenium selenium-framework selenium-python selenium-webdriver
Last synced: 28 Sep 2024
https://github.com/GraySilver/wencai
A wencai crawler: a Pythonic toolkit for the strategy backtesting API of iWencai (i问财).
crawler finance pandas quant quantitative-finance tushare wencai
Last synced: 30 Oct 2024
https://github.com/oxylabs/python-web-scraping-tutorial
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.
amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping
Last synced: 22 Dec 2024
https://github.com/s0rg/crawley
The unix-way web crawler
cli crawler go golang golang-application pentest pentest-tool pentesting unix-way web-crawler web-scraping web-spider
Last synced: 25 Dec 2024
https://github.com/devanshbatham/Gorecon
Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance: a tool that every pentester or bug hunter might want to consider adding to their arsenal
admin-panel-finder backups-finder cmsdetecter configurationfiles crawler directory-bruteforce dns dnsrecon email-hunter geo-ip nameserver recon reconaissance reverse-dns scanner subdomain-enumeration subdomain-scanner subnet-lookup whois-lookup wordpress-scanner
Last synced: 04 Nov 2024
https://github.com/eight04/ComicCrawler
An image crawler written in Python.
cli crawler gui image-crawler python tkinter
Last synced: 07 Dec 2024
https://github.com/zhupingqi/RuiJi.Net
crawler framework, distributed crawler extractor
crawler extractor headless-chrome netcore owin scraper scrapy
Last synced: 13 Nov 2024
https://github.com/eight04/comiccrawler
An image crawler written in Python.
cli crawler gui image-crawler python tkinter
Last synced: 20 Dec 2024
https://github.com/Jasonnor/th-music-video-generator
Touhou Project random music video generator/player, crawling image and video from websites to generate MV.
crawler javascript music-video touhou web
Last synced: 11 Nov 2024
https://github.com/MarshalX/telegram-crawler
🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
crawler crawling crawling-python parser telegram telegram-org telegram-updates
Last synced: 19 Nov 2024
https://github.com/marshalx/telegram-crawler
🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
crawler crawling crawling-python parser telegram telegram-org telegram-updates
Last synced: 23 Dec 2024
https://github.com/algolia/algoliasearch-netlify
Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler
algolia algolia-crawler algoliasearch crawler jamstack netlify netlify-plugin search
Last synced: 21 Dec 2024
https://github.com/glaucocustodio/tanakai
Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.
chrome-headless crawler kimurai scraper scrapy webscraping
Last synced: 31 Oct 2024
https://github.com/antchfx/antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
crawler crawling framework golang scraping web-crawler web-spider
Last synced: 26 Oct 2024
https://github.com/lucasjinreal/weibo_terminator_workflow
Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!
crawler nlp scraper sentiment-analysis weibo-terminator
Last synced: 26 Dec 2024
https://github.com/dwisiswant0/galer
A fast tool to fetch URLs from HTML attributes by crawl-in.
crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls
Last synced: 25 Dec 2024
https://github.com/xyntax/filesensor
A crawler-based tool for dynamically detecting sensitive files
crawler fuzzing pentesting scrapy
Last synced: 25 Dec 2024
https://github.com/zntfdr/selenops
A Swift Web Crawler 🕷
command-line-tool crawler scripting swift web
Last synced: 26 Dec 2024
https://github.com/zrashwani/arachnid
Crawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites
Last synced: 29 Oct 2024
https://github.com/zntfdr/Selenops
A Swift Web Crawler 🕷
command-line-tool crawler scripting swift web
Last synced: 25 Nov 2024
https://github.com/commoncrawl/news-crawl
News crawling with StormCrawler - stores content as WARC
apache-storm common-crawl commoncrawl crawler news storm-crawler warc web-crawler
Last synced: 16 Nov 2024
https://github.com/vitorfs/woid
Simple news aggregator displaying top stories in real time
Last synced: 25 Dec 2024
https://github.com/kong36088/ZhihuSpider
A multi-threaded Zhihu user crawler, based on Python 3
crawler multi-threading python python3 spider zhihu
Last synced: 26 Nov 2024
https://github.com/sudheer-ranga/aliexpress-product-scraper
Get AliExpress product details as a JSON response, including feedback, variants, shipping info, description, images, etc.
aliexpress aliexpress-api aliexpress-crawler aliexpress-product-json aliexpress-product-scraper aliexpress-scraper aliexpress-spider crawler dropship dropshipping hacktoberfest hacktoberfest19 hacktoberfest2019 product-json product-reviews product-scraper scraper spider
Last synced: 24 Dec 2024
https://github.com/dwisiswant0/gf-secrets
Secret and/or credential patterns used for gf.
alienvault-otx bugbounty crawler gau gf gitleaks infosec open-threat-exchange secrets-detection trufflehog trufflehog3 wayback wayback-machine waybackurl
Last synced: 25 Dec 2024
https://github.com/ScottSloan/Bili23-Downloader
Download Bilibili videos, anime series, movies, documentaries, and other resources
bilibili crawler linux macos python videodownloader windows wxpython
Last synced: 27 Oct 2024
https://github.com/lgh06/web-page-monitor
Website page change monitor with alerts when pages are updated or changed.
change-alert change-detection change-monitor crawler monitor website-change-monitor website-monitoring
Last synced: 26 Dec 2024
https://github.com/R4yGM/dorkscout
DorkScout - Golang tool to automate Google dork scans against the entire internet or specific targets
bug-bounty crawler ghdb golang google-dorks osint scraper security
Last synced: 21 Nov 2024