Crawler | Ecosyste.ms: Awesome

https://github.com/s3rgeym/wscrap

Command line web scraping tool.

crawler scraping

Last synced: 09 Apr 2025

https://github.com/rabattkarte/free-domain-scanner

crawler dns domain domain-name domain-names go golang scanner whois

Last synced: 26 May 2026

https://github.com/ghsaboias/alpha-agent

An intelligent web research assistant that combines web crawling, search functionality, and AI-powered analysis using Anthropic's Claude API.

ai claude crawler search web

Last synced: 14 Mar 2025

https://github.com/ismoreirakt/spyder

The web is changing. Spyder sees it.

alerts automation crawler monitor

Last synced: 01 Mar 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 11 Mar 2025

https://github.com/thesurlydev/surly-spider

A command line interface for the spider library

crawl crawler rust spider surly surly-spider

Last synced: 16 Feb 2026

https://github.com/leshniak/robotstxt-debug

A tool for debugging robots.txt

crawler debugger indexing robots-txt seo seo-optimization seo-tools tester

Last synced: 25 Jun 2025

https://github.com/insectmk/douban-crawler

Crawler and data visualization for the Douban Movie Top 250

analysis crawler django echarts mysql python3 website

Last synced: 10 Mar 2026

https://github.com/dalthviz/csapp

Crawler/scraper for the Play Store

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 13 May 2026

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 29 May 2026

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/codegram01/go-ai-crawl

Golang Web Crawl with AI

ai chromedp crawler golang ollama

Last synced: 16 Apr 2026

https://github.com/guanbinrui/img-crawler

An image crawler.

crawler

Last synced: 10 Feb 2026

https://github.com/alphadev3296/scrap-www.floridabar.org

automation crawler csv playwriht python scraper selenium xlsx

Last synced: 26 Dec 2025

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 21 Apr 2026

https://github.com/huakunshen/cron-crawler-template

Web crawler cron job template running on GitHub Actions, capable of sending email notifications.

crawler github-actions python

Last synced: 15 May 2026

https://github.com/crosscutsaw/iscsicrawler

iscsicrawler is a Bash script that easily crawls files on iSCSI targets.

crawler iscsi iscsi-target iscsiadm

Last synced: 16 Jan 2026

https://github.com/moparisthebest/nginx-limit-crawlers

Rate-limit crawlers in nginx

ai crawler nginx

Last synced: 14 Mar 2025

https://github.com/jonesrussell/pipelinex

Firecrawl-style web intelligence pipeline powered by North Cloud

crawler pipeline vue

Last synced: 09 Mar 2026

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 09 Aug 2025

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 18 Jun 2026

https://github.com/fengzixu/crawlinganything

If you are interested in data, you should get started right away.

crawler python

Last synced: 15 Jun 2026

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is a software development kit for the DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 11 Jul 2025

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 01 May 2026

https://github.com/jiusanzhou/reaper

Distributed Elegant Scraper and Crawler Framework for Rust.

crawler data-scraping rust scraper spider

Last synced: 24 Jul 2025

https://github.com/alphabs/navercafeclient

.NET library for crawling Naver Cafe post lists

crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping

Last synced: 06 May 2026

https://github.com/robin98sun/structured-web-data-crawler

crawler multi-thread structured-web-data

Last synced: 16 Mar 2025

https://github.com/tech-espm/misc-webbot

This project aims to create personal assistants that reply to messages about specific issues.

classification-model crawler nlp

Last synced: 12 Jun 2026

https://github.com/Arman2409/data-falcon

Web crawler

crawler extract-data

Last synced: 02 Apr 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 09 Jan 2026

https://github.com/yangxuhui/requests-google

A simple Google-related parsing package

crawler google-api parsing

Last synced: 14 Jan 2026

https://github.com/k0nxt3d/web-scrapers

Web scraping scripts in PHP and Bash

bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget

Last synced: 31 Dec 2025

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 06 Jul 2025

https://github.com/fusetim/bitcrawler

Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also become a BitTorrent DHT crawler one day?

bittorrent crawler dht

Last synced: 30 Mar 2025

https://github.com/mikiw/reactweb3

Ethereum transaction crawler built in ReactJS.

blockchain crawler ethereum

Last synced: 14 May 2026

https://github.com/loko5ja/seed-gen

Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.

crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman

Last synced: 03 Apr 2025

https://github.com/chenbingwei1201/threads_scraper

A Python package for scraping Threads posts.

chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites

Last synced: 03 Feb 2026

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 17 May 2026

https://github.com/radityaharya/sitesweeper

Sitesweeper is a Python package that helps you automate your web scraping process, outputting pages to a file.

crawler pdf python website-crawler

Last synced: 27 Mar 2025

https://github.com/sssshefer/web-crawler-http

Basic web crawler that maps the link structure of a website

crawler jest jest-tests js

Last synced: 01 Mar 2025

https://github.com/diegojromerolopez/relwrac

A basic crawler developed with python and asyncio

asyncio crawler page-rank python

Last synced: 11 Nov 2025

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Aug 2025

https://github.com/orshahar91/crawler

Simple Web Crawler

crawler crawling-websites image-crawler java servlets webcrawler

Last synced: 11 Nov 2025

https://github.com/jmousqueton/check-broken-link

Multi-threaded Python tool for crawling and checking all internal links on a website, with live Rich dashboard, broken link export (CSV), and detailed source tracking.

check crawler error400 error404 error500 links

Last synced: 29 Aug 2025

https://github.com/ekojs/web-crawler

Web crawler for retrieving research titles from Google Scholar

crawler nodejs web-crawler

Last synced: 12 Apr 2026

https://github.com/orkan/tlc

Simple PHP/cURL/FlareSolverr framework with Logger, Cache and more!

crawler curl flaresolverr net scrap

Last synced: 27 Aug 2025

https://github.com/ferru97/jsketchfabcrawler

jSketchfabCrawler is a Java tool for automatically crawling model information from sketchfab.com

crawler data database java sketchfab sql

Last synced: 03 Jan 2026

https://github.com/kahsolt/qzone_mood_dumper

Dump your Qzone mood (说说) history to local SQL database storage

crawler dumper qzone-mood

Last synced: 25 Aug 2025

https://github.com/hoosnick/olx-parser

OLX Real Estate Parser

crawler olx

Last synced: 25 Aug 2025

https://github.com/allancapistrano/anime-sheets

Crawler that fetches anime information and saves it to a spreadsheet.

anime crawler google-sheets google-sheets-api

Last synced: 16 Mar 2025

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 31 May 2026

https://github.com/leegeunhyeok/python-gongucrawler

Python 3 crawler for images and detailed information from Gongu Madang (공유마당)

crawler python

Last synced: 24 Aug 2025

https://github.com/kevincolemaninc/mm-crawler

Scrapes MeetMe user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 07 May 2026

https://github.com/roc41d/http-web-crawler

HTTP web crawler built with Node.js and TDD

crawler http javascript jest jest-test nodejs webcrawler

Last synced: 13 Apr 2026

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 22 Aug 2025

https://github.com/moojing/coinmarketcap-crypto-crawler

A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.

crawler cryptocurrency

Last synced: 01 Apr 2025

https://github.com/wangluyu/trending-crawler

crawler python3 trending

Last synced: 10 Apr 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into a MongoDB database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 10 Apr 2026

https://github.com/luickk/vulnerability-crawler

Small Python program that analyzes random sites found on Google for vulnerabilities.

crawler xss

Last synced: 20 Aug 2025

https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb

Crawl google scholar profiles with selenium, store the extracted data in the MongoDB and serve the queries with FastAPI.

crawler fastapi google-scholar mongodb python selenium

Last synced: 16 Apr 2026

https://github.com/hong539/ip_lookup

IP lookup using public or private APIs

crawler ipv4 ipwhois python

Last synced: 19 Aug 2025

https://github.com/billy0402/tibame-python-data-analysis

A learning project from TibaMe Python data analysis course.

ai course crawler jupyter-notebook matplotlib pandas python requests

Last synced: 10 Apr 2026

https://github.com/ilovebacteria/digikala-api

This Python package queries the Digikala API to fetch product details.

crawler digikala pypi

Last synced: 11 Feb 2026

https://github.com/uinaf/lincrawl

Local-first Linear work-graph archive CLI

age-encryption archive cli crawler crawlkit linear sqlite

Last synced: 24 May 2026

https://github.com/casoon/astro-crawler-policy

Policy-first crawler control for Astro — generates robots.txt and llms.txt with presets, per-bot rules, AI crawler registry, and build-time audits.

ai-crawler astro astro-integration crawler llms-txt robots-txt seo typescript

Last synced: 24 May 2026

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 01 Apr 2025

https://github.com/jul10l1r4/objetive

This software is a mini-crawler that grabs text fragments from any website or IP that responds over HTTP or HTTPS.

bigdata crawler data-science security-tools web

Last synced: 12 Aug 2025

https://github.com/dylancl/sitemap-crawler

Verify the status of each url in a (hosted) sitemap XML file.

crawler parser scraper sitemap xml

Last synced: 04 Oct 2025

https://github.com/iamkushvanth/real-time-data-analysis-using-kafka

An end-to-end data engineering project on real-time stock market data using Kafka, built with Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.

athena aws aws-ec2 aws-s3 crawler glue kafka kafka-consumer python sql

Last synced: 18 Jun 2026

https://github.com/anshiii/pixder

🤔 A spider for pixiv.net

crawler pixiv spider

Last synced: 09 Aug 2025

https://github.com/keizerzilla/ssh-hunter

Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).

crawler raspberry-pi ssh

Last synced: 10 Apr 2025

https://github.com/keizerzilla/search4dwango9

My attempt to help solve the DWANGO9 WAD mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 10 Apr 2025

https://github.com/dyslab/otglite

Online TXT Grabee Lite Edition 🐝

crawler expressjs jquery nodejs sqlite3

Last synced: 09 Apr 2026

https://github.com/alonecandies/golwarc

All-in-One crawlers for Golang

crawler crawling go golang scraper scraping

Last synced: 12 Jan 2026

https://github.com/pixlcrashr/stwhh-mensa

Better STWHH Mensa menu data / interface / notifier

api crawler data food studierendenwerk-hamburg university website

Last synced: 07 Aug 2025

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 07 Aug 2025

https://github.com/whateverzpy/douban_comments

Coursework for the HITSZ Fall 2025 Introduction to Big Data course

bigdata crawler scrapy

Last synced: 01 Oct 2025

https://github.com/seart-group/github-keyword-crawler

A simple, easy-to-deploy script for mining mentions of keywords across various GitHub API endpoints

api-mining crawler dockerized github-api miner mongodb-database python-script

Last synced: 04 Aug 2025

https://github.com/tom-draper/wiki-crawl

A game of path finding through Wikipedia topics.

api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search

Last synced: 09 Mar 2026

https://github.com/juliocesarscheidt/stock-trader

aws-alb aws-ecs aws-xray crawler flask github-actions mongodb python rabbitmq terraform

Last synced: 09 Apr 2026

https://github.com/zenoyang/webcrawler

Some crawler code

crawler scrapy spider web-crawler

Last synced: 02 Aug 2025

https://github.com/Mahdijamebozorg/CryptoFundamentalAnalyzer

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 25 Sep 2025

https://github.com/dappros/site_crawler

Site crawler used in the Ethora platform as an option to import your specific business data into your AI agent chatbot.

crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing

Last synced: 20 Jan 2026

https://github.com/imrany/spindle

An open-source, lightweight web crawler and scraper. It can discover links on the web (crawler) and extract structured data from webpages (scraper).

crawler go golang scraper

Last synced: 24 Sep 2025

https://github.com/hackthedev/botnet

Tool to find IPs on the web, check SSH availability, and brute-force logins with a wordlist. For educational use only!

botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web

Last synced: 17 Mar 2025

https://github.com/basemax/github-repos-report-generator

A Python CLI tool to fetch all public repositories of a GitHub user, extracting repository details such as name, URL, description, top language, and tags. Outputs data in CSV, JSON, and HTML formats.

api api-github crawler csv export extract github github-api github-export github-exporter github-info html json py python

Last synced: 16 Apr 2026

https://github.com/cristiangreco/gcrawler

A simple (not concurrent) web crawler written in Java.

crawler java

Last synced: 30 Jul 2025

https://github.com/jenting/compare-drugstore-price

Compare prices across cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 27 Mar 2025

https://github.com/shashankgroovy/crawler

Python crawler

crawler python webcrawler

Last synced: 30 Jul 2025

https://github.com/izh318/genie-music-artist-album-crawler

A Python script that crawls all album information for a specific artist on Genie Music in one go.

crawler genie genie-music gui

Last synced: 08 Nov 2025

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 30 Jul 2025

https://github.com/allanbian1017/mbpprice

Used MacBook Pro information

crawler python

Last synced: 14 Jan 2026

https://github.com/tiennhm/crawl-sanfoundry-mcqs

Sanfoundry MCQs Crawler

beautifulsoup4 bs4 crawler csv flask python

Last synced: 13 Apr 2026

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 03 Jan 2026

https://github.com/sauerbraten/monzter

Link crawler with configurable maximum depth and rate limiting

crawler go golang web-crawler

Last synced: 23 May 2026

https://github.com/eneax/web-crawler

A web crawler built in Node.js

crawler javascript nodejs web-crawler

Last synced: 15 Apr 2026

https://github.com/mehdieidi/offliner

Offliner is a tool to make a website offline viewable. It's a concurrent web crawler which saves all the pages and static files in a directory.

concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread

Last synced: 14 Jan 2026

https://github.com/prorobot-ai/worker

A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).

crawler golang grpc-server scraper

Last synced: 29 Jul 2025

https://github.com/heitor57/astronomy-news

🔭📰 Astronomy News

crawler data-science news text-mining

Last synced: 06 Oct 2025