Crawler | Ecosyste.ms: Awesome

https://github.com/scrwdrv/siege-crawler

This CLI tool finds same-domain URLs on a web page and requests them to find even more URLs, until the server crashes (or the benchmark ends). It is used to test a server's maximum capacity or to find glitches that users might encounter.

benchmark cli crawler ddos debug siege tool

Last synced: 05 Apr 2025

https://github.com/turtiesocks/zendriver-rs

Async-first, undetectable browser automation in Rust via the Chrome DevTools Protocol. Stealth-by-default port of zendriver — no WebDriver, no JS shim.

anti-detection async automation bot browser-automation cdp chrome-devtools-protocol chromium cloudflare-bypass crawler headless-chrome playwright-alternative rust scraping stealth tokio undetectable-chromedriver web-scraping web-testing zendriver

Last synced: 13 Jun 2026

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 10 Jul 2025

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 10 Jul 2025

https://github.com/sachin21/dmm-crawler

Fetches DMM.R18's data with a crawler. All artworks for dojin and eroge are now crawlable.

crawler dmm dojin doujin gem ruby

Last synced: 12 Sep 2025

https://github.com/akashrajpurohit/node-crawler

Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs of the domain

crawler node-crawler nodejs url

Last synced: 27 Apr 2026

https://github.com/sudolife/shopify

An easy-to-use crawler to keep track of reviews of an app on Shopify.

crawler go golang shopify

Last synced: 16 May 2026

https://github.com/orsinium-labs/gpcc

Python library and CLI tool to fetch information from the GPC Browser (https://gpc-browser.gs1.org/)

crawler gpc gs1

Last synced: 19 Jun 2025

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 16 May 2026

https://github.com/pjullrich/link-crawler

Python crawler that reports broken links on a given website and its subpages

asyncio breadth-first-search broken-links crawler python

Last synced: 11 Jul 2025

https://github.com/matheusfelipeog/google-doodles

Map and download Google Doodles.

crawler google google-doodle python web-scraping

Last synced: 13 Jul 2025

https://github.com/tibiasolutions/sharp-parser

C# parser for Tibia.com information

crawler nuget parsed-data tibia tibia-parser

Last synced: 17 May 2026

https://github.com/uranusx86/dcard-crawler-analyzer

Gets Dcard & Meteor forum content and analyzes it.

crawl crawler dcard nlp python word-cloud word-count word-frequency

Last synced: 14 Jul 2025

https://github.com/songjiayang/china_repos

GitHub repo crawler

china crawler statistics

Last synced: 18 Jul 2025

https://github.com/iamgideonidoko/web-crawler-with-php

Sample implementation of web crawler in PHP

crawler php webcrawler

Last synced: 21 Mar 2025

https://github.com/bytejoseph/osintgit

OSINT investigation tool for Github

crawler email github github-to-email hacking hacking-tool hacktoberfest hacktoberfest2024 latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp

Last synced: 23 Jul 2025

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 09 Mar 2026

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 21 Mar 2025

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 03 Apr 2025

https://github.com/bing-su/arcalive-crawler-python

Arcalive crawler

crawler python

Last synced: 21 Jun 2026

https://github.com/droiddevgeeks/nodelearning

A Node.js learning demo that covers all the basics of Node.

crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign

Last synced: 05 Apr 2026

https://github.com/deployment-helper/api-template-crawler

API interface to crawl the templates

api crawler deployment-helper gcp gcp-cloud-run golang rest

Last synced: 01 Sep 2025

https://github.com/pavelsr/email-extractor

Fast email crawler

crawler email-crawler email-marketing perl telemarketing

Last synced: 18 Mar 2025

https://github.com/mahmoudgalalz/pupt

A starter for web crawling using Puppeteer

crawler nodejs scraping

Last synced: 17 May 2026

https://github.com/duaraghav8/larry-crawler

Kayako Twitter challenge

crawler fetch-tweets hashtag nodejs pagination tweets twitter-api

Last synced: 17 May 2026

https://github.com/rix4uni/pathcrawler

Discovers new paths by scanning HTML.

bug-bounty bugbounty bugbountytips crawler hacking infosec osint osint-resources osint-tool pathcrawler penetration-testing pentest-tool pentesting recon reconnaissance scrape security security-tools threat-intelligence

Last synced: 17 Feb 2026

https://github.com/skylightqp/namu2csv

A Namuwiki crawler that converts headers to CSV files for the KartRider wiki

crawler rust

Last synced: 24 Jun 2025

https://github.com/flavien-hugs/scrapy-test

Working with the Scrapy library. A mini script that extracts all the cartoon characters listed on Wikipedia.

crawler python scraping scrapy

Last synced: 29 Mar 2025

https://github.com/pierlauro/mdbubing

From WARC records to MongoDB documents

bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving

Last synced: 29 Mar 2025

https://github.com/richecr/pyhltv

Repository to extract information from the HLTV website.

crawler csgo hacktoberfest hltv hltv-api python3

Last synced: 21 May 2026

https://github.com/srx-2000/swaiter

A program that waits until a Selenium element has loaded (a Selenium element-wait program)

crawler selenium selenium-python

Last synced: 18 May 2026

https://github.com/opda0887/bahamut-crawler-to-gmail

Idea: use a Python crawler to fetch the latest forum posts from Bahamut boards, and use Gmail to send these messages to myself.

crawler crawler-python

Last synced: 21 Mar 2025

https://github.com/teal33t/base_crawler

Simple scaffold for selenium based crawler bots

crawler scaffold-template selenium selenium-python

Last synced: 18 May 2026

https://github.com/altescy/mincrawler

A minimal web crawler.

configurable crawler python scraping

Last synced: 21 Mar 2025

https://github.com/zhs007/lottery-crawler

A crawler based on jarvis-task, mainly used to crawl lottery data.

crawler jarvis-task

Last synced: 30 Oct 2025

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 18 May 2026

https://github.com/morungos/github-issue-crawler

Github crawler for public repositories, issues, and comments

crawler github issues

Last synced: 30 Apr 2026

https://github.com/basemax/kashan-university-phone-directory

This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.

crawler crawlers database html-scraper json kashan kashan-university scraper scraper-api scraper-html scrapers university university-of-kashan

Last synced: 18 May 2026

https://github.com/maxiroellplenty/gs-robot

Node.js tool to scrape gelbe-seiten

axios cheerio crawler gelbe-seiten nodejs scraper yargs

Last synced: 18 May 2026

https://github.com/hong539/acgbox_crawler

A web crawler for gamer.com.tw/acgbox

beautifulsoup4 crawler pandas python requests scrapy sqlalchemy web-crawler

Last synced: 05 Apr 2025

https://github.com/igorbrizack/web-scraper

HTML data-scraping application built in Python.

crawler pytest python3 scraper

Last synced: 08 May 2026

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 21 Mar 2025

https://github.com/tungct/golangcrawler

Crawler using goroutines in Golang

crawler go

Last synced: 07 Jun 2026

https://github.com/tungct/facebook-crawler

crawler facebook python

Last synced: 04 Mar 2025

https://github.com/khadkarajesh/aptoide

Aptoide app crawler using beautifulsoup

beautifulsoup4 crawler flask python3 web-application

Last synced: 19 May 2026

https://github.com/dineshsprabu/concurrent-web-crawler

Flexible and concurrent web crawler implemented in Go

concurrent-web-crawler crawler go-crawler spider web-crawler

Last synced: 12 Jan 2026

https://github.com/basemax/doostihaacrawler

A PHP crawler for Doostihaa.com (a database of thousands of movies).

crawler crawler-example crawler-php crawler-testing crawlers database-movie database-movies doostihaa doostihaa-com movie movie-database movie-database-api movie-database-website movie-db movies movies-database php php-crawler

Last synced: 13 Sep 2025

https://github.com/yordadev/fenrisjs

A Node.js application that scrapes all links from a given input and writes the results to one of two files (external or internal links) for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 10 May 2026

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 15 Jun 2026

https://github.com/rxcai/python3-weibo-crawler

A small Weibo crawler implemented in Python 3

crawler python python3 spider weibo

Last synced: 22 Mar 2025

https://github.com/eea/eea-crawler

EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).

airflow-dags crawler elasticsearch etl-pipeline indexing

Last synced: 20 May 2026

https://github.com/gozeon/weibo-crawler

Weibo crawler

crawler web-crawler

Last synced: 21 Mar 2025

https://github.com/meilisearch/actions

Meilisearch Github Actions

crawler meilisearch

Last synced: 26 Jun 2025

https://github.com/ccrashzer0/web_crawler

A Python-based web crawler

crawler internet python python3 webcrawler

Last synced: 22 Mar 2025

https://github.com/orafaelfragoso/itunes-crawler

Retrieves information about an artist by crawling the iTunes API and iTunes Page

api crawler itunes itunes-api

Last synced: 31 Jul 2025

https://github.com/lencx/hero-crawler

⚔️ Hero Info (King of Glory)

crawler hero

Last synced: 01 Jul 2025

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 28 Jul 2025

https://github.com/muhfalihr/pyxdtelebot

PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.

crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x

Last synced: 06 Apr 2025

https://github.com/krishealty/whoknows

All-in-one advanced and detailed web scanner with over 1,000 plug-ins.

bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance

Last synced: 11 Apr 2026

https://github.com/ryanking13/bellorin

Multi-threaded Social Media Crawler 🔍

crawler instagram social-media

Last synced: 29 Jun 2025

https://github.com/tasooshi/digslash

A site mapping and enumeration tool for Web applications analysis

crawler mapping sitemap spider

Last synced: 08 Apr 2026

https://github.com/shimech/pokemon-db-maker

Generate a Pokédex via web crawling

beautifulsoup crawler docker pokemon scraper

Last synced: 25 Jan 2026

https://github.com/rebrowser/autotrader-dataset

AutoTrader car listings database: new, used & CPO vehicles with make, model, trim, mileage, MSRP, KBB fair price range, deal rating, body style, fuel type, and seller state. Updated daily.

automotive autotrader car-listings car-prices crawler data-collection data-science dataset kbb open-data scraper used-cars vehicle-data web-scraping

Last synced: 03 May 2026

https://github.com/andrew-ld/wowroms-downloader

Download all ROMs from wowroms

aiohttp asyncio crawler python3

Last synced: 17 Jan 2026

https://github.com/toannd96/chromedp-example-login

chromedp crawler golang goquery

Last synced: 21 May 2026

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 21 May 2026

https://github.com/hwywl/mzitu-crawler

Crawls photos of girls from the mzitu website. Mind your nutrition.

crawler mzitu python

Last synced: 29 Apr 2026

https://github.com/lucasbotang/project_financial_markets_text_mining

Predict stock market movement based on news

crawler data-science natural-language-processing python

Last synced: 21 May 2026

https://github.com/pymarcus/webscrapingiii

A crawler that takes products from a list and goes through the Mercado Livre pages, collecting each product's price, name, and link.

crawler mercadolivre python webscraping

Last synced: 15 Sep 2025

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 08 Apr 2026

https://github.com/vlad1kudelko/2023.08.15-scraping

Crawler of cooking sites

cloudflare cloudflare-bypass crawler docker parsing python scraping selenium undetected-chromedriver

Last synced: 08 Apr 2026

https://github.com/nelcifranmagalhaes/web_crawler

A web crawler for all Naruto characters

anime beautifulsoup characters crawler naruto python

Last synced: 14 Jul 2025

https://github.com/khdxsohee/email-miner-pro

EMail Miner Pro is designed specifically for professionals scraping data from search engines like Google, ensuring that generic emails (e.g., Gmail, Yahoo) are correctly linked to their business websites found on the page.

chrome crawler crawling email email-extractor extension-chrome lead-generation miner scraper

Last synced: 03 Feb 2026

https://github.com/sonhm3029/crawl-data-bot

This project provides a base bot for crawling data from the web, including text and image data.

crawler google medical vietnamese

Last synced: 08 Mar 2026

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and historical numbers of Swisslos

crawler euromillions lotto rust swisslos

Last synced: 22 Mar 2025

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 16 Sep 2025

https://github.com/deventerprisesoftware/scrapi-sdk-dotnet

The only web scraping service you'll ever need that offers advanced features that are simple to use for efficient data extraction.

browser-automation crawler scraper-api web-scraping webscraper

Last synced: 22 May 2026

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 14 May 2025

https://github.com/im-perativa/public_crawler

A collection of crawler projects for Indonesian datasets

crawler indonesia indonesia-api scrapy

Last synced: 20 Mar 2025

https://github.com/tranbavinhson/crawler

Crawler built with Scrapy

crawler python scrapy

Last synced: 25 Jul 2025

https://github.com/konradlinkowski/wikipediafinder

Find words in a Wikipedia page

crawler scraper wikipedia

Last synced: 17 Sep 2025

https://github.com/willi-dev/dtcapp

dtcapp : distributed twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 18 Sep 2025

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor share details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Jul 2025

https://github.com/machu-gwu/crawlib-project

Tool set for crawler projects.

crawler framework mongodb python scrapy

Last synced: 20 Sep 2025

https://github.com/davidkhala/ml

classic AI index

crawler

Last synced: 17 Jan 2026

https://github.com/panakour/pkscraper

Extract structured data from the web

crawler crawling scraper scraping scraping-websites webcrawler

Last synced: 19 Feb 2026

https://github.com/fengdongfa1995/video-dl

Download videos from online video websites.

bilibili crawler pornhub python3 video

Last synced: 09 Apr 2026

https://github.com/leandrols/scliper

CLI tool for simple web scraping.

cli-scripts crawler golang scraping

Last synced: 01 Nov 2025

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 22 Sep 2025

https://github.com/beanwei/zmt-post-crawler

Crawls the ZMT platform site: enter an author ID to get their post list. This project was written for a friend.

crawler golang golang-ui

Last synced: 08 Nov 2025

https://github.com/programming-with-love/skyeyesystem

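SkyEye system: crawls trending-search data from various platforms every ten minutes and stores it in a database. Raw trending-search data is stored in MySQL, and word-frequency statistics are stored in Redis.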

crawler mysql redis skyeye skyeyewall springboot

Last synced: 25 Sep 2025

https://github.com/kangoo13/textbroker-author-article-picker

Bot that automatically locks an order into a Textbroker author account.

author-textbroker automation bot colly crawler go gocolly golang scrapper spider textbroker textbroker-author textbroker-order-picker textbroker-orders textbroker-scrapper

Last synced: 02 Aug 2025

https://github.com/arihantbansal/cybersec-python

Cybersec/CTF practice problems solved in Python

crawler cryptography ctf cybersecurity sockets webscraping

Last synced: 02 Aug 2025

https://github.com/udaykiran2017/seo-reports

📊 Generate and analyze SEO reports effortlessly to enhance your website's visibility and performance across search engines.

audit broken-links cli crawler extraction google-lighthouse hreflang-checker hreflang-matrix puppeteer scan-website searchengineoptimization seo seo-macroscope seo-manager seo-meta seo-optimization web-scraping webmaster

Last synced: 16 May 2026

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT posts from the command line.

crawler ptt python

Last synced: 08 Aug 2025

https://github.com/jfcherng/wiki-cgroup-crawler

This script fetches Wikipedia's public conversion group (CGroup) word lists and saves the results as external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 03 Oct 2025

https://github.com/mindfiredigital/deepscanbot

It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.

bot crawl crawler go golang google webcrawler

Last synced: 10 Aug 2025

https://github.com/ptthanh02/vietnam-news-crawler

crawler crawling-python newspaper text-data text-mining

Last synced: 11 Aug 2025

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 05 Oct 2025

https://github.com/win7user10/laraue.crawling

A set of tools for quickly writing crawlers on .NET

crawler csharp csharp-crawler parser

Last synced: 17 Aug 2025