Crawler | Ecosyste.ms: Awesome

https://github.com/d4vinci/scrapling

Lightning-Fast, Adaptive Web Scraping for Python

automation crawler crawling crawling-python css dom-manipulation hacktoberfest lxml playwright python python3 scraping selectors selenium stealth web-scraper web-scraping web-scraping-python webscraping xpath

Last synced: 31 Oct 2024

https://github.com/farishijazi/rarbgcli

RARBG command line interface for scraping the rarbg.to torrent search engine

crawler rarbg rarbg-torrentapi torrent torrents torrents-crawler

Last synced: 27 Oct 2024

https://github.com/valerebron/usetube

Search and get data from YouTube; no Google account needed

crawler typescript video youtube youtube-api

Last synced: 14 Oct 2024

https://github.com/forsti0506/a11y-sitechecker

Automatic accessibility checker with website crawling + screenshots for easy use

accessibility accessibility-criteria accessibility-testing axe crawler hacktoberfest open-source puppeteer typescript typescript-library

Last synced: 31 Oct 2024

https://github.com/ReedD/crawler

Chromium / Puppeteer site crawler

bot chromium crawler puppeteer redis scraper

Last synced: 25 Oct 2024

https://github.com/a11ywatch/crawler

gRPC web crawler turbo charged for performance

a11ywatch crawler grpc scraper

Last synced: 13 Oct 2024

https://github.com/goldarowana/douyin-crawler

Douyin crawler. Crawls a user's posts and likes through a mobile phone proxy.

crawler douyin douyin-download java vertx

Last synced: 09 Oct 2024

https://github.com/sachaarbonel/scrapy.dart

Scrapy, a fast, high-level web crawling & scraping framework for Dart and Flutter

crawler dart scrapy

Last synced: 28 Oct 2024

https://github.com/ReddyyZ/URLBrute-Py

Tool to brute-force website subdomains and directories.

brute-force bruteforcer crawler dir-scanner dirscanner dirsearch sub-domain-enumeration sub-domain-scanner

Last synced: 04 Aug 2024

https://github.com/murat/tors

⏬ Yet another torrent searching application for your command line

crawler ruby-gem torrent-downloader torrent-search-engine

Last synced: 28 Oct 2024

https://github.com/spider-rs/spider-py

Spider ported to Python

crawler headless-chrome python scraper spider web-crawler

Last synced: 05 Nov 2024

https://github.com/soruly/anilist-crawler

Crawl data from anilist API and store in MariaDB.

anilist anime crawler

Last synced: 27 Oct 2024

https://github.com/mike442144/seenreq

Generates an object for testing whether a request has already been sent; "request" here is Mikeal's request library.

crawler duplicates-removed post request spider url

Last synced: 27 Oct 2024

https://github.com/jin10086/copyheaders

Conveniently copy request headers from the browser

crawler python tools

Last synced: 27 Oct 2024

https://github.com/Conso1eCowb0y/Deepminer

Deep web crawler and search engine

crawler crawling dark-web data-mining deepminer deepweb github hacking onion osint python-web-scraper python3 search-engine security security-tools spider the-onion-router tor tor-network webcrawler

Last synced: 02 Aug 2024

https://github.com/liangWenPeng/scrapy-admin

A django admin site for scrapy

crawler scrapy scrapyd spider

Last synced: 17 Aug 2024

https://github.com/riquellopes/fii

API for retrieving information about FII

crawler investiment mongodb nodejs

Last synced: 31 Oct 2024

https://github.com/spk/maman

Rust Web Crawler saving pages on Redis

crawler http spider web web-crawler

Last synced: 01 Nov 2024

https://github.com/golang-collection/go-crawler-distributed

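Distributed crawler project. It supports custom page parsers for secondary development and uses a microservice architecture overall, with messages sent asynchronously through a message queue. The frameworks used include redigo, gorm, goquery, easyjson, viper, amqp, zap, and go-micro. Deployment is containerized with Docker, and the intermediate crawler nodes support horizontal scaling.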

crawler docker elasticsearch go go-micro gocrawler microservice rabbitmq

Last synced: 04 Aug 2024

https://github.com/healeycodes/Broken-Link-Crawler

:robot: Python bot that crawls your website looking for dead stuff

bot crawler python

Last synced: 26 Sep 2024

https://github.com/healeycodes/broken-link-crawler

:robot: Python bot that crawls your website looking for dead stuff

bot crawler python

Last synced: 22 Oct 2024

https://github.com/kant2002/ncrawler

Web Crawler written in C#

crawler scrapper

Last synced: 22 Oct 2024

https://github.com/elboletaire/php-crawler

:spider: A simple crawler (spider) written in PHP just for fun, with zero dependencies

crawler php spider

Last synced: 31 Oct 2024

https://github.com/axetroy/crawler

Crawler framework for Node.js

crawler nodejs

Last synced: 27 Oct 2024

https://github.com/ronin-rb/ronin-web

ronin-web is a collection of useful web helper methods and commands.

cli crawler hacktoberfest helpers html proxy-server ronin-rb ruby server spider web xml

Last synced: 04 Nov 2024

https://github.com/ryuchen/deadpool

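This project is a crawler application built on Celery as its main framework. Crawler tasks can be added flexibly, and crawls of multiple sites can run at the same time. All components natively support large-scale concurrency and distribution, and combined with Celery's native distributed invocation this achieves large-scale concurrency.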

celery crawler deadpool python3 spider taobao taobao-spider tmall tmall-spider

Last synced: 28 Oct 2024

https://github.com/charlespikachu/seleniumlogin

Log in to some websites using Selenium.

crawler selenium selenium-webdriver spider taobao

Last synced: 09 Oct 2024

https://github.com/jonaslejon/lolcrawler

Headless web crawler for bugbounty and penetration-testing/redteaming

bugbounty crawler docker penetration-testing penetration-testing-tools redteam redteam-tools redteaming

Last synced: 04 Aug 2024

https://github.com/mrxujiang/crawel

A somewhat interesting crawler platform built with Apify, Node, and React

apify crawler node puppeteer react react-hooks umi umi3

Last synced: 14 Oct 2024

https://github.com/p0dalirius/robotstester

This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.

bugbounty crawler pentesting python robots tool

Last synced: 29 Oct 2024

https://github.com/himself65/luogucrawler

A Python crawler that scrapes various information from Luogu

crawler python python3

Last synced: 01 Oct 2024

https://github.com/bin-huang/nodespider

[DEPRECATED] Simple, flexible, delightful web crawler/spider package

async crawl crawler node pipeline promise spider web

Last synced: 27 Oct 2024

https://github.com/0xhjk/x12306

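12306 ticket lookup assistant: query all stations along the route with one click, board first and pay the fare difference afterwards, for a less stressful trip.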

12306 12306buyticket 12306helper 12306qiang-piao crawler fk12306 helper reqeusts spider ticket train x12306

Last synced: 31 Oct 2024

https://github.com/BitTheByte/Domainker

BugBounty Tool

bb bugbounty bugcrowd checker code crawler h1 hackerone hacking hacking-tool injection python rce response struts2 subdomain sudomains

Last synced: 23 Oct 2024

https://github.com/hackfengJam/ArticleSpider

Crawling zhihu, jobbole, lagou by Scrapy, and using Elasticsearch+Django to build a Search Engine website --- README_zh.md (including: implementation roadmap, distributed-crawler and coping with anti-crawling strategies).

crawler distributed-systems django elasticsearch scrapy

Last synced: 31 Oct 2024

https://github.com/kylemocode/medium-stat-box

Practical pinned gist that shows your latest Medium status 📌

awesome-pinned-gists crawler github-action github-gists medium-stats

Last synced: 02 Nov 2024

https://github.com/xiantang/spider

web crawler

crawler python3

Last synced: 15 Oct 2024

https://github.com/gamemann/bestbuy-parser

A personal tool using Python's Scrapy framework to scrape Best Buy's product pages for RTX 3080 TIs and notify if available/not sold out.

3080 automation best bestbuy bot buy crawler parser python python3 rtx scrapy ti

Last synced: 27 Oct 2024

https://github.com/haxzie-xx/instagram-downloader

Node.js/Express app to retrieve Instagram video/image download URLs

crawler downloader express instagram instagram-scraper nodejs

Last synced: 27 Oct 2024

https://github.com/apocelipes/schannel-qt5

A GUI client of schannel powered by therecipe/qt and golang

client-side crawler go golang goqt linux qcharts qt5

Last synced: 23 Oct 2024

https://github.com/jfreegman/toxcrawler

A Tox DHT network crawler

crawler dht dht-network tox toxcore

Last synced: 15 Oct 2024

https://github.com/veliovgroup/spiderable-middleware

🤖 Prerendering for JavaScript powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites based on top of front-end JavaScript frameworks

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 14 Oct 2024

https://github.com/VeliovGroup/spiderable-middleware

🤖 Prerendering for JavaScript powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites based on top of front-end JavaScript frameworks

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 04 Aug 2024

https://github.com/code4everything/visual-spider

Try our brand-new desktop productivity tool RunFlow: https://myrest.top/myflow

crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization

Last synced: 29 Sep 2024

https://github.com/ph-7/crawling-emails

Very simple bash script to crawl email addresses from a specific website.

bash crawler email email-scraper scrape scrape-email scraper scraping shell wget

Last synced: 28 Oct 2024

https://github.com/debugtalk/webcrawler

A web crawler based on requests-html, mainly intended for URL validation testing.

crawler requests-html web-crawler weblink

Last synced: 16 Oct 2024

https://github.com/fanhuaandluomu/sina_spider

Sina Weibo crawler: login, keyword search of posts, and post monitoring

crawler python-2 sina-spider

Last synced: 12 Oct 2024

https://github.com/gomjellie/pysaint

[deprecated] u-SAINT Python client

crawler sap soongsil unofficial

Last synced: 28 Oct 2024

https://github.com/mamal72/iranian-calendar-events

Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website

crawler events iranian jalali jalali-calendar persian

Last synced: 02 Nov 2024

https://github.com/kshru9/web-crawler

A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures

concurrency concurrent-data-structure cpp crawler data-structures html-parser lock multithreading openssl pagerank pthread reader-writer-lock search-engine socket threading threadsafe webcrawler website-downloader

Last synced: 28 Oct 2024

https://github.com/k1LoW/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 04 Aug 2024

https://github.com/minhhungit/github-action-rss-crawler

Automatically crawl RSS feeds using GitHub Actions

crawler csharp github-actions litedb netcore rss rss-crawler rss-items

Last synced: 02 Aug 2024

https://github.com/k1low/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 17 Oct 2024

https://github.com/pykong/pypergrabber

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

crawler email-inbox google-scholar pdf pmid pubmed python sci-hub scraper

Last synced: 16 Oct 2024

https://github.com/riptl/ytpriv

YT metadata exporter

big-data crawler csv datascience json video youtube

Last synced: 03 Aug 2024

https://github.com/ERap320/CrowLeer

Powerful C++ web crawler based on libcurl

cli crawler crawling download

Last synced: 03 Aug 2024

https://github.com/alex-page/get-site-urls

🔗 Get all of the URLs from a website.

crawler sitemap-generator urls

Last synced: 27 Oct 2024

https://github.com/marcel0024/cococrawler

A declarative and easy-to-use web crawler and scraper in C#

cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper

Last synced: 12 Oct 2024

https://github.com/spider-rs/spider-nodejs

Spider ported to Node.js

crawler distributed-systems headless-chrome indexer nodejs scraper spider typescript

Last synced: 05 Nov 2024

https://github.com/novemberde/serverless-crawler-demo

Serverless Architecture Crawler demo

aws crawler demo handson serverless

Last synced: 04 Aug 2024

https://github.com/matheusfelipeog/froxy

Hide your IP with free proxies using Froxy 🔄

crawler free-proxy froxy hide-ip proxies proxies-scraper proxy python requests requests-module scraping

Last synced: 26 Oct 2024

https://github.com/italia/publiccode-crawler

publiccode.yml crawler for the Open Source software catalog of Developers Italia

crawler developers-italia hacktoberfest publiccode publiccodeyml

Last synced: 02 Aug 2024

https://github.com/mattwang44/uspto-patft-web-crawler

Crawler for fetching information of US Patents and PDF bulk download

crawler patent patent-crawler pyqt5 python3 uspto

Last synced: 02 Oct 2024

https://github.com/Smartproxy/Python-scraper-tutorial

A short introduction to scraping with Python with given steps and an example scraper script.

beautifulsoup crawler data-mining data-science github-python json-database-python learning python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 04 Aug 2024

https://github.com/bartozzz/crawlerr

A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.

crawler jsdom nodejs scraper spider web-crawler

Last synced: 20 Oct 2024

https://github.com/aliosm/kontests

Competitive programming contests schedule

a2oj atcoder codeforces codeforces-gym codeshef competitive-programming crawler csacademy hackerearth hackerrank kickstart leetcode topcoder

Last synced: 09 Oct 2024

https://github.com/alessandrodd/googleplay_api

Google Play Unofficial Python 3 API Library

android crawler googleplay googleplay-api playstore

Last synced: 27 Oct 2024

https://github.com/ivan-sincek/chad

Search Google Dorks like Chad. / Broken link hijacking tool.

broken-link-hijacking bug-bounty crawler ethical-hacking google-dorking google-dorks offensive-security penetration-testing playwright python red-team-engagement scraper search-engine security social-media social-media-takeover threat-hunting threat-intelligence web web-penetration-testing

Last synced: 31 Oct 2024

https://github.com/kagami/tistore

:camera: Tistory photo grabber

crawler cross-platform electron tistory

Last synced: 22 Oct 2024

https://github.com/ysh329/douban-crawler

Scrapes information related to Douban groups (groups, users, posts).

crawler douban douban-crawler

Last synced: 23 Oct 2024

https://github.com/feng19/spider_man

SpiderMan, a fast, high-level web crawling & scraping framework for Elixir, based on Broadway.

crawler data-mining elixir erlang framework spider

Last synced: 29 Oct 2024

https://github.com/xiongwilee/techweekly

A highly configurable tool for sending technical weekly newsletters by email

crawler nodejs techweekly

Last synced: 18 Oct 2024

https://github.com/rzo1/crawler4j

Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j

crawler crawler4j java spider web-crawler web-spider

Last synced: 29 Sep 2024

https://github.com/alanshaw/libp2p-dht-scrape-aas

🧹 A libp2p DHT scraper as a service, allowing anyone to collect, consume, and use the data to generate useful reports & visualisations.

crawler dht kademlia libp2p p2p scraper

Last synced: 21 Oct 2024

https://github.com/capjamesg/indieweb-search

Source code for the IndieWeb search engine.

crawler indieweb search search-engine

Last synced: 03 Aug 2024

https://github.com/Actomaton/ActoCrawler

🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.

crawler swift

Last synced: 09 Aug 2024

https://github.com/thaoshibe/crawl-original-google-images

Python scripts for crawling original images from Google Images

chrome-extension crawler crawling crawling-python google google-images pafy scraper youtube youtube-dl youtube-search

Last synced: 11 Oct 2024

https://github.com/mendableai/firecrawl-py

Crawl and convert any website into clean markdown

ai crawler llm python scraper

Last synced: 13 Aug 2024

https://github.com/RuedigerVoigt/exoskeleton

A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend

crawler crawling-framework database machine-learning mariadb network python python-3 scraping

Last synced: 01 Aug 2024

https://github.com/ruedigervoigt/exoskeleton

A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend

crawler crawling-framework database machine-learning mariadb network python python-3 scraping

Last synced: 15 Oct 2024

https://github.com/nvk681/gumo

A crawler that extracts data from a dynamic webpage. Written in Node.js.

crawler elasticsearch neo4j nodejs

Last synced: 11 Oct 2024

https://github.com/yokawasa/scrapy-azuresearch-crawler-samples

Scrapy as a Web Crawler for Azure Search Samples

azure azure-search crawler python python3 scrapy search

Last synced: 30 Oct 2024

https://github.com/asing1001/movierater

A useful website for finding movie ratings in Chinese and English by crawling Yahoo, Ptt, and IMDb.

apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript

Last synced: 14 Oct 2024

https://github.com/petehouston/udemy-crawler

Crawls Udemy course info and saves it in JSON format.

crawler crawling node node-cli udemy udemy-api udemy-crawl

Last synced: 23 Oct 2024

https://github.com/waynechang65/ptt-crawler

ptt-crawler is a web crawler module designed to scrape data from Ptt.

crawler javascript nodejs ptt scraper scraping spider web-crawler webcrawler

Last synced: 19 Oct 2024

https://github.com/ArchiveTeam/WebArchiver

Decentralized web archiving

archiver archiving crawler decentralized python warc web webarchiving

Last synced: 01 Aug 2024

https://github.com/p0dalirius/crawlersuseragents

Python script to check whether an application's responses differ when the request comes from a search engine's crawler.

bugbounty crawler crawlers pentest request tool user-agent web

Last synced: 29 Oct 2024

https://github.com/loomisloud/onion-crawler

Tor website crawler (specific for Alphabay at the time)

crawler onion parser python tor

Last synced: 03 Aug 2024

https://github.com/tower1229/crawler

Nodejs crawler for cnbeta.com

crawler nodejs

Last synced: 14 Oct 2024

https://github.com/bkeepers/spiderman

your friendly neighborhood web crawler

crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping

Last synced: 23 Oct 2024

https://github.com/alinebastos/crawler

Web Crawler created with Node.js and Puppeteer

crawler fs javascript nodejs puppeteer scraping

Last synced: 05 Nov 2024

https://github.com/josecelano/my-favourite-appliances

Laravel CRUD sample

crawler crud laravel sample

Last synced: 29 Oct 2024

https://github.com/mauriceconrad/xml-parser

A Node.js XML DOM, Parser & Stringifier.

crawler crawling dom html html-parser html-parsing xml xml-parser xml-parsing xml-schema

Last synced: 28 Oct 2024

https://github.com/PadishahIII/SecretScraper

SecretScraper is a web scraper that crawls through target websites, scrapes HTTP responses, and extracts secret information via regular expressions.

crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper

Last synced: 13 Aug 2024

https://github.com/enijkamp/supermonkey

A crawler for automated Android UI testing.

ai android crawler

Last synced: 22 Oct 2024

https://github.com/paambaati/websight

🕷 A simple but *really* fast crawler built with Node.js & TypeScript

coding-challenge crawler interview-questions javascript monzo nodejs typescript

Last synced: 15 Oct 2024

https://github.com/racinmat/premium-downloader

crawler pornhub pornhub-downloader python

Last synced: 06 Nov 2024

https://github.com/pourmand1376/persiancrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 11 Oct 2024

https://github.com/vignif/crawler-google-scholar

This bot crawls and downloads statistics and pictures of researchers from Google Scholar.

crawler downloading-statistics google-scholar indexes statistics

Last synced: 01 Aug 2024

https://github.com/Knovour/json-web-crawler

Use JSON to list all elements (with CSS3 and jQuery selectors) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 03 Aug 2024