Crawler | Ecosyste.ms: Awesome

https://github.com/mindfiredigital/deepscanbot

It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.

bot crawl crawler go golang google webcrawler

Last synced: 10 Aug 2025

https://github.com/ptthanh02/vietnam-news-crawler

crawler crawling-python newspaper text-data text-mining

Last synced: 11 Aug 2025

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 28 Jul 2025

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 05 Oct 2025

https://github.com/win7user10/laraue.crawling

A set of tools for quickly writing crawlers on .NET

crawler csharp csharp-crawler parser

Last synced: 17 Aug 2025

https://github.com/lencx/hero-crawler

⚔️ Hero Info (King of Glory)

crawler hero

Last synced: 01 Jul 2025

https://github.com/salman0ansari/sitefetch

Fetch a site and extract its readable content as Markdown (to be used with AI models).

ai chatgpt crawler fetcher golang scraping

Last synced: 19 Aug 2025

https://github.com/orafaelfragoso/itunes-crawler

Retrieves information about an artist by crawling the iTunes API and iTunes Page

api crawler itunes itunes-api

Last synced: 31 Jul 2025

https://github.com/ccrashzer0/web_crawler

A Python-based web crawler

crawler internet python python3 webcrawler

Last synced: 22 Mar 2025

https://github.com/meilisearch/actions

Meilisearch GitHub Actions

crawler meilisearch

Last synced: 26 Jun 2025

https://github.com/gozeon/weibo-crawler

Weibo crawler

crawler web-crawler

Last synced: 21 Mar 2025

https://github.com/eea/eea-crawler

EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).

airflow-dags crawler elasticsearch etl-pipeline indexing

Last synced: 20 May 2026

https://github.com/rxcai/python3-weibo-crawler

A small Weibo crawler implemented in Python 3

crawler python python3 spider weibo

Last synced: 22 Mar 2025

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 15 Jun 2026

https://github.com/amirzenoozi/aparat-videos-dataset

Some simple information about Aparat videos for data scientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 17 May 2026

https://github.com/sinipelto/repo-license-crawler

Collects and summarizes license information on Python and NPM packages into output files.

crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3

Last synced: 09 May 2026

https://github.com/ipanalytics/crawlerscope

Interactive crawler IP intelligence dashboard for search, AI, and user-triggered fetchers.

ai-bots ai-crawlers bingbot bot-detection cidr crawler crawler-detection data-visualization github-pages googlebot gptbot ip-ranges nginx open-data osint robots-txt threat-intelligence waf web-security

Last synced: 09 Jun 2026

https://github.com/yordadev/fenrisjs

A Node.js application that scrapes all links from a given input and writes the results to one of two files, external or internal, for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 10 May 2026

https://github.com/rogerluo410/gcrawler

Google search crawler written in Ruby. Crawls each link's text and URL by keyword on Google.com.

crawler crawling google ruby

Last synced: 22 Jun 2026

https://github.com/basemax/doostihaacrawler

A PHP-implemented crawler for Doostihaa.com. (Database of thousands of movies)

crawler crawler-example crawler-php crawler-testing crawlers database-movie database-movies doostihaa doostihaa-com movie movie-database movie-database-api movie-database-website movie-db movies movies-database php php-crawler

Last synced: 13 Sep 2025

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 08 May 2026

https://github.com/dineshsprabu/concurrent-web-crawler

Flexible and concurrent web crawler implemented in Go

concurrent-web-crawler crawler go-crawler spider web-crawler

Last synced: 12 Jan 2026

https://github.com/khadkarajesh/aptoide

Aptoide app crawler using beautifulsoup

beautifulsoup4 crawler flask python3 web-application

Last synced: 19 May 2026

https://github.com/tungct/facebook-crawler

crawler facebook python

Last synced: 04 Mar 2025

https://github.com/tungct/golangcrawler

A crawler using goroutines in Go

crawler go

Last synced: 07 Jun 2026

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 14 Mar 2025

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples using OOP concepts, plus a Laravel project

crawler laravel php

Last synced: 25 Apr 2026

https://github.com/jiamingla/mvdis_i18n

Multilingual-friendly version of the motorcycle license exam booking site (unofficial)

crawler jquery koa koajs nodejs supertest

Last synced: 04 Jan 2026

https://github.com/zephyrpersonal/github-trending-crawler

Transforms GitHub trending repos into JSON data

cheerio crawler fetch github node repository spider trending

Last synced: 04 Jan 2026

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 21 Mar 2025

https://github.com/igorbrizack/web-scraper

HTML data scraping application, built in Python.

crawler pytest python3 scraper

Last synced: 08 May 2026

https://github.com/zhangwinning/go-crawler

crawler golang

Last synced: 30 Mar 2025

https://github.com/rogerchappel/crawldeck

Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.

agent-tools cli crawler local-first queue typescript

Last synced: 26 May 2026

https://github.com/hong539/acgbox_crawler

A web crawler for gamer.com.tw/acgbox

beautifulsoup4 crawler pandas python requests scrapy sqlalchemy web-crawler

Last synced: 05 Apr 2025

https://github.com/maxiroellplenty/gs-robot

Node.js tool to scrape gelbe-seiten

axios cheerio crawler gelbe-seiten nodejs scraper yargs

Last synced: 18 May 2026

https://github.com/basemax/kashan-university-phone-directory

This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.

crawler crawlers database html-scraper json kashan kashan-university scraper scraper-api scraper-html scrapers university university-of-kashan

Last synced: 18 May 2026

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 11 Jun 2026

https://github.com/ilsonlasmar/inovamind

Inovamind Challenge - Crawler in Ruby on Rails with Sidekiq + Redis

crawler rails5 sidekiq

Last synced: 12 Sep 2025

https://github.com/milouk/web-crawler

Phoneutria Crawler

crawler crawlers database internet jar java spider web web-crawler

Last synced: 21 Apr 2026

https://github.com/khoinguyen2k/web-crawler

About crawling data

crawler jsoup-library scraper selenium-java

Last synced: 06 Mar 2025

https://github.com/morungos/github-issue-crawler

GitHub crawler for public repositories, issues, and comments

crawler github issues

Last synced: 30 Apr 2026

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 18 May 2026

https://github.com/zhs007/lottery-crawler

A crawler based on jarvis-task, mainly used to crawl lottery data.

crawler jarvis-task

Last synced: 30 Oct 2025

https://github.com/altescy/mincrawler

A minimal web crawler.

configurable crawler python scraping

Last synced: 21 Mar 2025

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 26 May 2026

https://github.com/teal33t/base_crawler

Simple scaffold for selenium based crawler bots

crawler scaffold-template selenium selenium-python

Last synced: 18 May 2026

https://github.com/buttermiilk/sentakusha

simple (and badly written express.js) crawler for the washing machine game.

api crawler imagegeneration maimai

Last synced: 07 Apr 2025

https://github.com/opda0887/bahamut-crawler-to-gmail

Idea: use a Python crawler to fetch the latest forum posts from Bahamut boards, and send them to myself via Gmail.

crawler crawler-python

Last synced: 21 Mar 2025

https://github.com/srx-2000/swaiter

A program that waits until a Selenium element has loaded

crawler selenium selenium-python

Last synced: 18 May 2026

https://github.com/richecr/pyhltv

Repository to extract information from the HLTV website.

crawler csgo hacktoberfest hltv hltv-api python3

Last synced: 21 May 2026

https://github.com/pierlauro/mdbubing

From WARC records to MongoDB documents

bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving

Last synced: 29 Mar 2025

https://github.com/xcrypt0r/xcrawler

✂️ A crawling example for MapleStory in various languages, using multi-threading

crawler crawling multithreading parsing regexp

Last synced: 14 Jun 2025

https://github.com/mazzasaverio/lean-jobs-crawler

(Let's build) A lean, high-performance web crawler specializing in job posting extraction directly from company websites. Uses LLM for intelligent URL discovery and data extraction.

crawler docker llm logfire neon openai python uv

Last synced: 15 Mar 2025

https://github.com/konradlinkowski/mailcrawler

Crawler to find email addresses on websites

crawler scraper

Last synced: 05 Jan 2026

https://github.com/wafflecomposite/yggdrasil-crawler-python

Small Yggdrasil network crawler with CLI, written in Python3

crawler mesh-networks no-dependencies python python3 yggdrasil yggdrasil-api yggdrasil-network

Last synced: 17 Nov 2025

https://github.com/flavien-hugs/scrapy-test

Working with the Scrapy library. A mini script that extracts all the cartoon characters from Wikipedia.

crawler python scraping scrapy

Last synced: 29 Mar 2025

https://github.com/danoctavian/proxy-master

Manage a set of HTTP proxies

crawler http-proxy node-proxy-server

Last synced: 27 May 2026

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 27 May 2026

https://github.com/hedon954/go-crawler

A crawler system implemented in Go.

crawler go

Last synced: 15 Mar 2025

https://github.com/skylightqp/namu2csv

A Namuwiki crawler that converts headers to a CSV file for the KartRider wiki

crawler rust

Last synced: 24 Jun 2025

https://github.com/taurusolson/jobscraper

I am looking for a developer position in France

crawler

Last synced: 23 Jun 2025

https://github.com/rix4uni/pathcrawler

Discover new paths by scanning HTML.

bug-bounty bugbounty bugbountytips crawler hacking infosec osint osint-resources osint-tool pathcrawler penetration-testing pentest-tool pentesting recon reconnaissance scrape security security-tools threat-intelligence

Last synced: 17 Feb 2026

https://github.com/fnkr/gocrawl

Simple web crawler.

crawler http-client

Last synced: 23 Mar 2025

https://github.com/trixsec/zeuscrawler

The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.

crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper

Last synced: 07 Apr 2025

https://github.com/fiandev/otaku-crawler

A simple way to scrape and collect the anime list from otakudesu

anime bun crawler nodejs scraper

Last synced: 08 May 2026

https://github.com/duaraghav8/larry-crawler

Kayako Twitter challenge

crawler fetch-tweets hashtag nodejs pagination tweets twitter-api

Last synced: 17 May 2026

https://github.com/mahmoudgalalz/pupt

A starter for web crawling using Puppeteer

crawler nodejs scraping

Last synced: 17 May 2026

https://github.com/pavelsr/email-extractor

Fast email crawler

crawler email-crawler email-marketing perl telemarketing

Last synced: 18 Mar 2025

https://github.com/deployment-helper/api-template-crawler

API interface to crawl the templates

api crawler deployment-helper gcp gcp-cloud-run golang rest

Last synced: 01 Sep 2025

https://github.com/bac0id/wayback-machine-auto-save

A crawler that saves a list of web pages via Save Page Now on the Internet Archive's Wayback Machine.

crawler internet-archive python save-page-now wayback-machine

Last synced: 28 May 2026

https://github.com/mkfsn/chronos

A light cron-like container service - create cron job easily.

crawler cron cronjob golang

Last synced: 20 Jul 2025

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 15 May 2026

https://github.com/droiddevgeeks/nodelearning

This is a Node learning demo. It covers the basics of Node.

crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign

Last synced: 05 Apr 2026

https://github.com/bing-su/arcalive-crawler-python

Arcalive crawler

crawler python

Last synced: 21 Jun 2026

https://github.com/vinzdef/apartment-crawler

Crawler for housing in the Netherlands. It scrapes FB groups and Kamernet listings

amsterdam crawler fb-groups housing phantomjs

Last synced: 31 Mar 2025

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 03 Apr 2025

https://github.com/tanja-4732/od-get

A Rust tool for recursively crawling & downloading data from open directories

cli crawler open-directory open-directory-downloader rust

Last synced: 26 May 2026

https://github.com/bingxyz/blackcat

Query Black Cat logistics using a Telegram bot

crawler nodejs telegram-bot

Last synced: 21 May 2026

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 21 Mar 2025

https://github.com/liyun-li/meh-bot

Just a bot that clicks an image

bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api

Last synced: 20 Mar 2025

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 09 Mar 2026

https://github.com/bytejoseph/osintgit

OSINT investigation tool for GitHub

crawler email github github-to-email hacking hacking-tool hacktoberfest hacktoberfest2024 latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp

Last synced: 23 Jul 2025

https://github.com/iamgideonidoko/web-crawler-with-php

Sample implementation of web crawler in PHP

crawler php webcrawler

Last synced: 21 Mar 2025

https://github.com/fenying/huaban-crawler

A board-pins crawler for huaban.com, based on Node.js

crawler huaban

Last synced: 02 Jul 2025

https://github.com/geoffreybauduin/website-checker

Performs useful checks against a website, such as 404 error reporting, structured data validation...

crawler seo structured-data web-spider website

Last synced: 19 Apr 2025

https://github.com/songjiayang/china_repos

GitHub repo crawler

china crawler statistics

Last synced: 18 Jul 2025

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 24 Mar 2025

https://github.com/uranusx86/dcard-crawler-analyzer

Get Dcard & Meteor forum content and analyze it!

crawl crawler dcard nlp python word-cloud word-count word-frequency

Last synced: 14 Jul 2025

https://github.com/tibiasolutions/sharp-parser

Tibia.com information parser in C#

crawler nuget parsed-data tibia tibia-parser

Last synced: 17 May 2026

https://github.com/matheusfelipeog/google-doodles

Map and download Google Doodles.

crawler google google-doodle python web-scraping

Last synced: 13 Jul 2025

https://github.com/donuts-are-good/araknnid

GO GO TINY SPIDER!

crawler hacktoberfest search-engine spider

Last synced: 20 Nov 2025

https://github.com/pjullrich/link-crawler

Python crawler that reports broken links on a given website and its sub-pages

asyncio breadth-first-search broken-links crawler python

Last synced: 11 Jul 2025

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 16 May 2026

https://github.com/henkman/crawlers

:squirrel: some crawlers and downloaders

crawler

Last synced: 28 May 2026

https://github.com/homuchen/instagram-crawler

Instagram crawler

crawler instagram nodejs-crawler

Last synced: 24 Mar 2025

https://github.com/orsinium-labs/gpcc

Python library and CLI tool to fetch information from GPC Browser (https://gpc-browser.gs1.org/)

crawler gpc gs1

Last synced: 19 Jun 2025

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 27 Feb 2025