Crawler | Ecosyste.ms: Awesome

https://github.com/vinaygopinath/ngMeta

Dynamic meta tags in your AngularJS single page application

angularjs crawler meta-tags opengraph seo ui-router

Last synced: 07 Aug 2024

https://github.com/beb7/gflare-tk

Open-Source Python Based SEO Web Crawler

crawler python robots-txt scraper seo seo-crawler tkinter

Last synced: 03 Aug 2024

https://github.com/luohaha/jlitespider

A lite distributed Java spider framework :-)

crawler distributed distributed-systems rabbitmq spider

Last synced: 03 Aug 2024

https://github.com/tijme/not-your-average-web-crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

bug-bounty callbacks crawler custom get post python request scanner scraper security spider vulnerability

Last synced: 04 Aug 2024

https://github.com/cwjokaka/bilibili_member_crawler

Bilibili user crawler. Hooray, it's a crawler!

bilibili crawler multithreading mysql python python3 queue requests spider web

Last synced: 27 Oct 2024

https://github.com/jin10086/pachong

Code for various crawlers

crawler python2

Last synced: 12 Oct 2024

https://github.com/liu233w/acm-statistics

An online tool (crawler) to analyze users' performance in online judges (coding competition websites). Supported OJ: POJ, HDU, HYSBZ, CodeForces, UVA, ICPC Live Archive, FZU, SPOJ, Timus (URAL), LeetCode_CN, CSU, LibreOJ, Luogu (洛谷), Nowcoder OJ (牛客OJ), Lutece (UESTC), AtCoder, AIZU, CodeChef, El Judge, BNUOJ, Codewars, UOJ, NBUT, 51Nod, DMOJ, VJudge

acm-icpc codechef-api codeforces-api crawler csharp docker javascript nodejs spoj-api vue

Last synced: 30 Oct 2024

https://github.com/clarketm/s3recon

Amazon S3 bucket finder and crawler.

crawler finder python recon s3 s3-bucket

Last synced: 04 Aug 2024

https://github.com/janreges/siteone-crawler

SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).

analyzer crawler crawling performance qa quality-assessment security seo seotools stress-testing swoole testing website

Last synced: 25 Oct 2024

https://github.com/twiny/spidy

Domain names collector - Crawl websites and collect domain names along with their availability status.

backlinks crawler domain expired-domain golang scraper seotools spider

Last synced: 05 Nov 2024

https://github.com/bartdag/pylinkvalidator

pylinkvalidator is a standalone and pure python link validator and crawler that traverses a web site and reports errors (e.g., 500 and 404 errors) encountered.

crawler link-checker networking python

Last synced: 31 Oct 2024

https://github.com/abaykan/CrawlBox

An easy way to brute-force web directories.

admin-finder crawler python web-crawler wordlist

Last synced: 30 Oct 2024

https://github.com/egoist/taki

Take a snapshot of any website.

crawler prerender snapshot

Last synced: 31 Oct 2024

https://github.com/JarryShaw/darc

Darkweb Crawler Project

crawler darkweb

Last synced: 30 Oct 2024

https://github.com/moranzcw/Zhihu-Spider

A multithreaded Python crawler that retrieves information from Zhihu users' profile pages.

crawler jupyter-notebook matplotlib python requests zhihu-spider

Last synced: 31 Oct 2024

https://github.com/karust/gogetcrawl

Extract web archive data using Wayback Machine and Common Crawl

commoncrawl concurrency crawler golang wayback-machine webarchive

Last synced: 01 Aug 2024

https://github.com/hominee/dyer

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

crawler rust rust-programming-language spider web-crawler web-framework web-scraping

Last synced: 01 Aug 2024

https://github.com/algolia/npm-search

🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satellite:

algolia couchdb crawler npm search sync yarn

Last synced: 01 Aug 2024

https://github.com/teal33t/poopak

POOPAK - TOR Hidden Service Crawler

crawler dark-web darknet deepweb docker flask hidden-services mongo osint redis tor tor-network

Last synced: 13 Oct 2024

https://github.com/tgiles/auto-lighthouse

A utility package for automating lighthouse reporting

audits auto-lighthouse crawler lighthouse-reports robots simplecrawler

Last synced: 01 Nov 2024

https://github.com/nuhmanpk/webscrapper

A simple and powerful all-in-one Telegram bot to scrape/crawl webpages using Requests, html5lib, and BeautifulSoup

beautifulsoup4 crawler crawler-engine crawler-python hacktoberfest hacktoberfest-accepted hacktoberfest2023 pyrogram pyrogram-bot requests scraper scraping selenium telegram telegram-bot web-scraping webscraping webscrapper webscrapping webscrapping-python

Last synced: 26 Oct 2024

https://github.com/karthikuj/sasori

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

automation crawler crawling dast dynamic endpoint-discovery infosec puppeteer scraping security

Last synced: 01 Nov 2024

https://github.com/duckduckgo/tracker-radar-collector

🕸 Modular, multithreaded, puppeteer-based crawler

crawler puppeteer tracker-radar

Last synced: 27 Oct 2024

https://github.com/WuLC/GoogleImagesDownloader

Enlarge a training dataset by searching Google for images matching specified keywords and downloading the images it returns

crawler google image keyword selenium

Last synced: 01 Aug 2024

https://github.com/JakePartusch/lumberjack

An automated website accessibility scanner and CLI

a11y accessibility axe cli crawler lumberjack

Last synced: 03 Aug 2024

https://github.com/alash3al/scraply

Scraply, a simple DOM scraper to fetch information from any HTML-based website

crawler crawling dom golang scraper scrapers scraping-websites scrapy server

Last synced: 04 Nov 2024

https://github.com/luckylittle/blinkist-m4a-downloader

Grabs all of the audio files from all of the Blinkist books

audiobooks blinkist books crawler data-archiving data-mining data-processing go golang scraper spider

Last synced: 27 Oct 2024

https://github.com/wxyyxc1992/declarative-crawler

Xenomorph Crawler, a Concise, Declarative and Observable Distributed Crawler (Node/Go/Java/Rust) for Web, RDB, and OS; it can also act as a Monitor (with Prometheus) or ETL for Infrastructure :dizzy: Multi-language executor, distributed crawler

crawler etl koa2 monitor nodejs react wx-code

Last synced: 04 Aug 2024

https://github.com/wx-chevalier/sentinel-crawler

Xenomorph Crawler, a Concise, Declarative and Observable Distributed Crawler (Node/Go/Java/Rust) for Web, RDB, and OS; it can also act as a Monitor (with Prometheus) or ETL for Infrastructure :dizzy: Multi-language executor, distributed crawler

crawler etl koa2 monitor nodejs react wx-code

Last synced: 14 Oct 2024

https://github.com/nasa-jpl-memex/memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data

ache anaconda apache crawler dashboard domain-discovery memex-explorer miniconda nutch tika

Last synced: 07 Aug 2024

https://github.com/greengerong/prerender-java

Java framework for Prerender

angular1 crawler java prerender prerendered-page seo

Last synced: 11 Oct 2024

https://github.com/glouw/andvaranaut

A dungeon crawler

crawl crawler dungeon

Last synced: 27 Oct 2024

https://github.com/ethereum/node-crawler

Attempts to crawl the Ethereum network of valid Ethereum execution nodes and visualizes them in a nice web dashboard.

crawler ethereum

Last synced: 07 Oct 2024

https://github.com/simfin/pdf-crawler

SimFin's open source PDF crawler

crawler crawling geckodriver pdf pdf-crawler puppeteer python selenium-webdriver

Last synced: 11 Oct 2024

https://github.com/hardikvasa/webb

Python: An all-in-one Web Crawler, Web Parser and Web Scraping library!

crawl-pages crawler python-library

Last synced: 26 Oct 2024

https://github.com/SeaQL/starfish-ql

✴️ An experimental graph database

crates-io crawler database graph hacktoberfest network rust sql visualization

Last synced: 02 Aug 2024

https://github.com/schollz/linkcrawler

Cross-platform persistent and distributed web crawler :link:

crawler hyperlinks web

Last synced: 18 Oct 2024

https://github.com/clemfromspace/scrapy-puppeteer

Scrapy + Puppeteer

crawler puppeteer python scraping scrapy

Last synced: 27 Oct 2024

https://github.com/lixi5338619/asyncpy

A lightweight, coroutine-based asynchronous web crawler framework built with asyncio and aiohttp

aiohttp asyncio asyncpy crawler python scrapy

Last synced: 05 Nov 2024

https://github.com/antoinevastel/bots-zoo

bot crawler crawling playwright puppeteer scraper scraping selenium user-agent useragent

Last synced: 02 Nov 2024

https://github.com/ducdev/aliexscrape

Get Aliexpress product details in JSON

aliexpress aliexpress-api aliexpress-crawler aliexpress-scraper aliexpress-spider crawler dropship dropshipping hacktoberfest hacktoberfest19 hacktoberfest2019 json scraper spider

Last synced: 27 Oct 2024

https://github.com/wuchunfu/ipproxypool

An IP proxy pool implemented in Golang. Technologies involved: go, gorm, proxy, proxypool, ip, crawler, mysql, viper, cobra

crawler go ip proxy proxy-server proxypool

Last synced: 30 Oct 2024

https://github.com/patrickschur/pappet

A command-line tool to crawl websites using puppeteer.

cli crawler pdf puppeteer screenshot

Last synced: 01 Aug 2024

https://github.com/foolin/pagser

Pagser is a simple, extensible, configurable library that parses HTML pages and deserializes them into structs, based on goquery and struct tags, for Golang crawlers

colly crawler deserialization go golang goquery html page parser scrapy

Last synced: 26 Oct 2024

https://github.com/pavlovtech/WebReaper

Web scraper, crawler and parser in C#. Designed as simple, declarative and scalable web scraping solution.

crawler datamining parser parsing scraper scraping scraping-api scraping-data scraping-tool scraping-web scraping-websites webcrawler webscraping

Last synced: 01 Aug 2024

https://github.com/kostas-pa/LFITester

LFITester is a Python3 program that automates the detection and exploitation of Local File Inclusion (LFI) vulnerabilities on a server.

bugbounty crawler cybersecurity enumeration exploitation fuzzing hacking lfi lfi-detection lfi-exploitation lfi-vulnerability penetration-testing penetration-testing-tools pentest-tool pentesting python web-hacking webhacking

Last synced: 04 Aug 2024

https://github.com/zhaow-de/rotating-tor-http-proxy

A multi-arch image that provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.

amd64 arm64 armv6 armv7 crawler docker-image dockerhub-image haproxy multi-platform privoxy-tor proxy tor

Last synced: 01 Aug 2024

https://github.com/medcl/gopa-abandoned

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

crawler golang lightweight spider

Last synced: 04 Aug 2024

https://github.com/creekorful/bathyscaphe

Fast, highly configurable, cloud native dark web crawler.

architecture crawler crawling elasticsearch golang hidden-services kibana tor web-crawler

Last synced: 27 Oct 2024

https://github.com/kurogai/deepweb-scappering

Discover hidden deepweb pages

crawler deepweb hacking hacking-tool internet kali python3 scappering scapre tor tor-network

Last synced: 27 Oct 2024

https://github.com/foo-git/rewe-discounts

Grabs current REWE discounts and saves them in a Markdown file

api crawler python rewe

Last synced: 26 Oct 2024

https://github.com/jefferyhus/es6-crawler-detect

:spider: This is an ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots/crawlers/spiders via the user agent.

bots crawler detection es6-javascript spider

Last synced: 27 Oct 2024

https://github.com/nietaki/crawlie

A simple Elixir library for writing decently-performing crawlers with minimum effort.

crawler elixir elixir-library genstage

Last synced: 26 Oct 2024

https://github.com/hueristiq/xcrawl3r

A command-line interface (CLI) based utility to recursively crawl webpages. It is designed to systematically browse webpages' URLs and follow links to discover linked webpages' URLs.

bug-bounty bug-bounty-tools contentdiscovery crawler ethical-hacking ethical-hacking-tools go golang penetration-testing penetration-testing-tools reconnaissance red-teaming red-teaming-tools web-security

Last synced: 02 Aug 2024

https://github.com/tobecrazy/seleniumdemo

Selenium automation test framework

container crawler docker docker-compose jenkins maven pip python selenium selenium-grid selenium-webdriver snapshot

Last synced: 11 Oct 2024

https://github.com/Randark-JMT/Bilibili_manga_download

A Bilibili Manga download tool with a graphical interface

bilibili crawler downloader pyside6 python python3 qt spider

Last synced: 27 Oct 2024

https://github.com/roccomuso/is-google

Verify that a request is from Google crawlers using Google's DNS verification steps

bot check crawler dns google ip js nodejs verify

Last synced: 27 Oct 2024
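Google's published verification procedure, which is-google's description refers to, is a forward-confirmed reverse DNS check: reverse-resolve the requesting IP, confirm that the hostname belongs to a Google crawler domain, then forward-resolve that hostname and confirm that it maps back to the same IP. The following is a minimal Python sketch of that general procedure, not the is-google package's code (which is a Node.js library). The function names and the `GOOGLE_SUFFIXES` list are assumptions made for illustration. The DNS resolvers are injectable so that the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Tuple

# Assumption: hostname suffixes Google documents for its crawlers. Check
# Google's current crawler-verification docs; some special-case fetchers
# use other domains (for example googleusercontent.com).
GOOGLE_SUFFIXES: Tuple[str, ...] = (".googlebot.com", ".google.com")


def _reverse_lookup(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliases, addresses); keep the hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> Iterable[str]:
    # getaddrinfo yields (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IPv4 or IPv6 address string.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_google_crawler(
    ip: str,
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], Iterable[str]] = _forward_lookup,
    suffixes: Tuple[str, ...] = GOOGLE_SUFFIXES,
) -> bool:
    """Forward-confirmed reverse DNS: PTR name must be Google's and resolve back to ip."""
    target = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return False
    # The leading dot in each suffix rejects look-alikes such as "notgooglebot.com".
    if not host.endswith(suffixes):
        return False
    try:
        addresses = forward(host)
    except OSError:  # the claimed hostname does not resolve
        return False
    # Drop any IPv6 zone suffix ("%eth0") before comparing parsed addresses.
    return any(ipaddress.ip_address(a.split("%")[0]) == target for a in addresses)
```

The forward lookup is what defeats spoofing. Anyone who controls the reverse DNS zone for their own IP range can publish a PTR record ending in `.googlebot.com`, but they cannot make Google's forward zone point back to their address. Each check costs two DNS lookups, so a real deployment would probably cache the result per IP.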

https://github.com/tensojka/instastories-backup

Backup your friends' Instagram Stories forever and get to keep them even after 24 hours.

backup crawler instagram instagram-stories python python-3-6 python3

Last synced: 12 Oct 2024

https://github.com/boris-code/feaplat

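A crawler management system with cluster support and elastic scaling. It supports running various frameworks and scripts, such as feapder, scrapy, selenium, and playwright.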

crawler feapder feaplat spider

Last synced: 15 Oct 2024

https://github.com/yuanxu-li/html-table-extractor

Extract data from HTML tables

beautifulsoup crawler extract-data html html-table scraping table

Last synced: 01 Aug 2024

https://github.com/ArchiveTeam/wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd

Last synced: 06 Aug 2024

https://github.com/lrlna/puppeteer-walker

a puppeteer walker 🕷 🕸

chrome crawler headless puppeteer spider walker

Last synced: 27 Oct 2024

https://github.com/kcubeterm/achoz

Search through all your personal data efficiently like web search.

crawler document-search filesearch search-engine websearch

Last synced: 07 Aug 2024

https://github.com/crawlzone/crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

automated-testing crawler crawling-framework middleware php web-scraping web-search

Last synced: 29 Oct 2024

https://github.com/feiskyer/scrapy-examples

Some scrapy and web.py examples

crawler python scrapy

Last synced: 02 Nov 2024

https://github.com/jannchie/simpyder

An ultra-fast, coroutine-based asynchronous Python crawler

crawler python spider

Last synced: 27 Oct 2024

https://github.com/tzw0745/tumblr-crawler-cli

Tumblr Download Tool with High Speed and Customization.

cli-app crawler python tumblr tumblr-downloader

Last synced: 05 Aug 2024

https://github.com/zhang2333/light-crawler

a simplified directed customizable website crawler

crawler node-js

Last synced: 03 Aug 2024

https://github.com/aufzayed/HydraRecon

All In One, Fast, Easy Recon Tool

bugbounty bugbounty-tool bugbountytips crawler hacking hacking-tools information-gathering open-source-intelligence osnit pentest pentest-tools pentesting python recon recon-tools

Last synced: 03 Aug 2024

https://github.com/jhao104/spider

python crawler spider

crawler python spider

Last synced: 28 Oct 2024

https://github.com/melroy89/metacritic_api

PHP Metacritic API - Mirror from my GitLab

api crawler data metacritic parser php scores scraper webscraping

Last synced: 03 Oct 2024

https://github.com/mzollin/qr-pirate

crawl QR-codes from search engines and look for bitcoin private keys

bitcoin bitcoin-wallet crawler cryptocurrency private-key python qr-code qrcode qrcode-reader

Last synced: 11 Oct 2024

https://github.com/ityouknow/python-crawler

Python Crawler

crawler python python-crawler

Last synced: 28 Oct 2024

https://github.com/alexfazio/devdocs-to-llm

Turn any developer documentation into a GPT

crawler crawling firecrawl scraper scraping

Last synced: 27 Oct 2024

https://github.com/liameno/librengine

Privacy Web Search Engine (not meta, own crawler)

cpp crawler encryption frontend privacy robots-txt rsa search-engine self-hosted spider websearch websearchengine

Last synced: 02 Aug 2024

https://github.com/lin-jun-xiang/chatgpt-line-bot

🤖Free ChatGPT Line Bot with Horoscope, Music Broadcast, Google Image Search...

chatbot chatgpt craw crawler cron gpt gpt-3 gpt4free linebot replit scraper

Last synced: 26 Oct 2024

https://github.com/absingh31/tor_spider

Python project to crawl and scrape the lesser-known deep web, also called the dark web. Just provide the onion link and get started.

crawler file-manager ioc python3 scraper scraping socks stem tor tor-config tor-spider

Last synced: 03 Aug 2024

https://github.com/saltyshiomix/nest-crawler

The easiest crawling and scraping module for NestJS

crawler nestjs nodejs scraper typescript

Last synced: 27 Oct 2024

https://github.com/schollz/crawdad

Cross-platform persistent and distributed web crawler :crab:

crawler golang redis web

Last synced: 18 Oct 2024

https://github.com/cho45/chemrtron

A document viewer with fuzzy-match incremental search.

crawler document-viewer electron increment javascript

Last synced: 31 Oct 2024

https://github.com/dannyben/snapcrawl

Crawl a website and take screenshots

capture crawler gem ruby screenshot

Last synced: 31 Oct 2024

https://github.com/mmerian/phpcrawl

Copy of http://phpcrawl.cuab.de/ for using with composer

composer crawler php phpcrawl

Last synced: 13 Oct 2024

https://github.com/drkostas/jobapplicationbot

A bot that automatically sends emails in response to new ads posted under any desired xe.gr search URL.

bot crawler email-sender python scraper

Last synced: 28 Oct 2024

https://github.com/bajins/tool-gin

A project built on the go-gin framework to reduce repetitive tasks, such as downloading certain tools

crawler gin gin-gonic golang key keygen mobaxterm-keygen navicat nginx-conf nginx-configuration python3 registry-workshop scraper shell spider xftp xmanager xshell

Last synced: 15 Oct 2024

https://github.com/lobehub/chat-plugin-web-crawler

🧩 / 🕸 WebsiteCrawler - This plugin automatically crawls the main content of a specified URL webpage and uses it as context input.

ai chatgpt crawler function-calling lobe-chat lobe-chat-plugin openai

Last synced: 01 Nov 2024

https://github.com/nicholaskajoh/devsearch

A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

crawler flask mongodb pagerank python scrapy search search-engine spider tf-idf

Last synced: 02 Aug 2024
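PageRank, which devsearch combines with TF-IDF to order results, scores each page by repeatedly redistributing rank along outgoing links. A damping factor mixes in a uniform "random jump" so that the iteration converges. The following is a minimal power-iteration sketch in Python, not devsearch's implementation. The function name, the `{page: [outlinks]}` input format, and the defaults (damping 0.85, tolerance `1e-10`) are illustrative assumptions. Pages with no outlinks spread their rank evenly over all pages, so the ranks always sum to 1.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list {page: [outlinks]}."""
    # Collect every page, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is shared evenly by all pages.
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in nodes}
        # Every other page splits its damped rank equally among its outlinks.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Stop once the total (L1) change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

In a search engine, the link graph is gathered by the crawler, and PageRank is computed offline. At query time, the stored score is then combined with a query-dependent relevance score such as TF-IDF. How the two scores are weighted is a design choice that this sketch leaves open.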

https://github.com/howie6879/talospider

talospider - A simple, lightweight scraping micro-framework

crawler crawling python spider web-spider

Last synced: 21 Oct 2024

https://github.com/shurco/goClone

🌱 goClone - clone websites in seconds

cloner cloning crawler go goclone golang hacktoberfest scraping scrapper website-cloner website-scraper wp2static

Last synced: 02 Aug 2024

https://github.com/eliashaeussler/cache-warmup

🔥 PHP library to warm up caches of URLs located in XML sitemaps

cache-warmup crawler php xml-sitemap

Last synced: 01 Nov 2024
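Warming caches from an XML sitemap, as cache-warmup does, comes down to two steps. First, read every `<loc>` entry from the sitemap. Then request each URL once so that any cache in front of the site stores the response. The following is a minimal Python sketch of those two steps, not the PHP library's API. The function names are hypothetical, requests run sequentially, and the only namespace handled is the sitemaps.org 0.9 namespace. A `<sitemapindex>` yields child sitemap URLs, which would still need to be fetched and parsed in turn.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List, Optional

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> value in a <urlset> or <sitemapindex> document.

    For a <sitemapindex>, the values are child sitemap URLs that still
    need to be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """GET the URL (reading the body) and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still count as "visited"
        return err.code


def warm_up(urls: Iterable[str],
            fetch: Optional[Callable[[str], int]] = None) -> Dict[str, int]:
    """Request each URL once, sequentially, and record its status code."""
    fetch = fetch or fetch_status
    return {url: fetch(url) for url in urls}
```

The fetcher is a parameter so that a caller can swap in a concurrent or rate-limited client, or a stub for testing. Network errors other than HTTP status errors (timeouts, DNS failures) propagate to the caller unchanged.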

https://github.com/roccomuso/price-monitoring

Node.js price monitoring library, leveraging the power of x-ray and nightmare.

alert comparison crawler javascript monitoring nodejs price-tracker

Last synced: 28 Oct 2024

https://github.com/findopendata/findopendata

A search engine for Open Data

crawler dataset-search opendata

Last synced: 05 Aug 2024

https://github.com/jaymon/wishlist

Read an Amazon wishlist programmatically with Python

amazon amazon-wishlist api crawler python scraper

Last synced: 27 Oct 2024

https://github.com/hfreire/browser-as-a-service

A web browser :earth_americas: hosted as a service, to render your JavaScript web pages as HTML

browser browser-as-a-service crawler docker github-actions javascript puppeteer rest-api scraper server webcrawler

Last synced: 26 Oct 2024

https://github.com/d4vinci/scrapling

Lightning-Fast, Adaptive Web Scraping for Python

automation crawler crawling crawling-python css dom-manipulation hacktoberfest lxml playwright python python3 scraping selectors selenium stealth web-scraper web-scraping web-scraping-python webscraping xpath

Last synced: 31 Oct 2024

https://github.com/farishijazi/rarbgcli

RARBG command line interface for scraping the rarbg.to torrent search engine

crawler rarbg rarbg-torrentapi torrent torrents torrents-crawler

Last synced: 27 Oct 2024