Crawler | Ecosyste.ms: Awesome

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 29 Mar 2025

https://github.com/zigai/crawlwright

Web crawling framework powered by Playwright

crawler crawling playwright python scraping wrighter

Last synced: 18 May 2026

https://github.com/xyk2002/aqistudy-crawler

An asynchronous crawler for the air quality data on https://www.aqistudy.cn/historydata/. It fetches the data quickly and saves it to CSV files.

aqistudy crawler python-3

Last synced: 22 Aug 2025

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 31 Mar 2025

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 11 May 2025

https://github.com/kweonminsung/crawl2toast

Real-time toast notifications of data crawled with CSS selectors (Windows only)

beautifulsoup4 crawler selenium tkinter toast-notifications

Last synced: 18 May 2026

https://github.com/knguyen780/web-crawler

A project for crawling data

crawler jsoup-library scraper selenium-java

Last synced: 25 Jun 2025

https://github.com/lfsc09/crawl-this-go

Simple CLI tool for crawling pdf documents and html pages

crawler go

Last synced: 18 Jun 2025

https://github.com/moj124/web_crawler

web_crawler is an asynchronous, gevent-based link crawler that maps all the local links within the scope of the input webpage URL.

crawler crawler-python links-spider

Last synced: 13 Mar 2025

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 21 Mar 2025

https://github.com/jefftriplett/pholcidae-demo

:spider: A Pholcidae demo for crawling/spidering a website

crawler csv pholcidae python scrapper scrapy-crawler spider toml

Last synced: 22 Jul 2025

https://github.com/zhou-chaoxian/ax-spider

A simple, powerful, and fast asynchronous Python crawler framework.

asyncio ax-spider crawler httpx python scrapy

Last synced: 18 Mar 2025

https://github.com/thejoin95/free-proxies.info

An API service for retrieving anonymous and non-anonymous proxies, filterable by latency, country, update time, and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 29 Oct 2025

https://github.com/vaenow/crawler-chromeless

A chromeless crawler for coursera

chromeless coursera crawler puppeteer

Last synced: 18 May 2026

https://github.com/jakubboucek/blog.cz-backup-robot

crawler

Last synced: 25 Feb 2025

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 28 May 2026

https://github.com/vivekg13186/lucas

A web crawler

crawler crawler-engine crawling-framework java

Last synced: 19 Apr 2026

https://github.com/evansuner/smartproxypool

Smart proxy pool that automatically fetches usable high-anonymity proxies

crawler fastapi proxy python

Last synced: 15 May 2026

https://github.com/javapuppteernodejs/bypass-cloudflare-turnstile-crawl4ai

Learn how to integrate Crawl4AI with CapSolver to automatically solve Cloudflare Turnstile challenges.

automation capsolver captcha captcha-solver cloudflare-turnstile cloudflare-turnstile-bypass cloudflare-turnstile-solver crawl4ai crawler data-extraction python turnstile web-scraping

Last synced: 17 May 2026

https://github.com/sxoxgxi/webcrawler

A multithreaded web crawler

crawler python webcrawling

Last synced: 28 Jul 2025

https://github.com/joyceannie/moviespider

This project crawls movie data from IMDb, using the Scrapy framework to extract relevant information such as movie title, datePublished, summary, genres, and director.

crawler datascience python scrapy spider webscraper

Last synced: 24 Mar 2025

https://github.com/snwfdhmp/3gm-bot

Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 30 Oct 2025

https://github.com/jianlizh429/crawler

crawler spider

Last synced: 03 Jul 2025

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 08 Apr 2025

https://github.com/balintpethe/laravel-universal-scraper

Universal Scraper for Laravel

crawler laravel scraper web-scraper

Last synced: 13 Jan 2026

https://github.com/apurvsikka/mediaverse

MediaVerse is a versatile search engine for various media types such as anime, books and drama

anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv

Last synced: 29 Mar 2025

https://github.com/gigio1023/comic-crawler

Comic crawler.

beautifulsoup crawler python3

Last synced: 05 Jul 2025

https://github.com/yuchenq/comp90055-project

The latest version of my project for COMP90055.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 16 Jul 2025

https://github.com/n3d1117/sisop17

Exercise for the Operating Systems exam (2017)

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 06 Apr 2025

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scrapers is way more fun!

crawler html-scraper scraper

Last synced: 24 Mar 2025

https://github.com/xprnvd/makdi

Website crawler created for pentest exercises like HTB.

crawler htb htb-scripts pentest python

Last synced: 20 Jul 2025

https://github.com/longluo/spider

My Python Spider / Crawler

crawler python spider twitter weibo weibo-crawler weibo-spider

Last synced: 11 Jun 2025

https://github.com/yggverse/yps

YPS - Yggdrasil Port Scanner

cli crawler network port scanner tcp tool udp yggdrasil

Last synced: 03 Jul 2025

https://github.com/tatamiya/gas-new-books-crawler

Crawls new book information from Hanmoto.com (版元ドットコム, https://www.hanmoto.com/)

crawler gas

Last synced: 30 Oct 2025

https://github.com/huyduc1602/uniapp-crawler

Crawl and translate the Uni-app documentation

crawler docker python

Last synced: 25 Jan 2026

https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez

Estimates when case results will arrive, for applicants who are waiting for their USCIS case results and have their receipt numbers at hand.

beautifulsoup crawler immigration web

Last synced: 16 Jun 2025

https://github.com/jimut123/leaderbehaviour

A Scrapy project that extracts the names of leaders and their misdeeds by scraping a news website!

crawler leaderbehaviour newsscraper scrapy timesofindia

Last synced: 16 Jan 2026

https://github.com/arefshojaei/spider

PHP web spider

bot crawler crawling php php-library php-tools php8 scraper scrapping spider web web-bot

Last synced: 13 Feb 2026

https://github.com/billy0402/python-application

A learning project from the book 'Python 技術者們'.

course crawler matplotlib opencv pandas python requests selenium sklearn

Last synced: 12 Apr 2026

https://github.com/polunlin/crawler

crawler news-crawler python python3

Last synced: 14 May 2026

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 16 May 2025

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 15 Mar 2025

https://github.com/Kissaki/website-downloader

A website crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 10 Mar 2025

https://github.com/dylancl/sitemap-crawler

Verify the status of each url in a (hosted) sitemap XML file.

crawler parser scraper sitemap xml

Last synced: 04 Oct 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Apr 2026

https://github.com/ma-pony/playwright-spider-utils

Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.

crawl crawler playwright python scrapy selenium spider spiderman

Last synced: 06 Jan 2026

https://github.com/azshurith/depth-crawler

A simple yet powerful Python web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.

crawler puppeteer python

Last synced: 20 Apr 2026

https://github.com/yyj08070631/web-spider

A web spider

crawler spider webspider

Last synced: 11 Sep 2025

https://github.com/amazingcoderpro/pythonup

Have fun with Python! For improving Python skills

crawler python

Last synced: 19 May 2026

https://github.com/rafaelmoraes003/tech-news

Analysis and manipulation of news data from a technology website obtained through data scraping using Python.

crawler data-scraping https mongodb parsel pymongo python web-scraping

Last synced: 05 May 2026

https://github.com/laffrex/xiaolanben_crawler

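An efficient, stable data collection tool for the Xiaolanben (小蓝本) website. It automatically extracts information such as company and group products, media, and shareholders, handles pop-ups intelligently, and automatically classifies and organizes the data. Its ultimate purpose is to make SRC information gathering easier.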

crawler pandas selenium src

Last synced: 23 Mar 2025

https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler

Stack Overflow tag generator using a web crawler.

crawler python

Last synced: 08 Apr 2025

https://github.com/eklem/vinmonopolet-crawler

Crawls Vinmonopolet data and indexes it into a norch search index

crawler dataset javascript norch search-engine

Last synced: 26 Mar 2025

https://github.com/recepkizilarslan/console-tourist

Tourist is a simple tool that collects console messages, errors, and failed requests from all your pages after the DOM loads, with authentication support.

console-log crawler crawling crawling-tool error-monitoring error-reporting qa qa-automation qatools

Last synced: 24 Feb 2026

https://github.com/coding-dream/aspider

A spider that runs on the Android platform

crawler jsoup spider

Last synced: 24 Jun 2025

https://github.com/murilobsd/icrop-csv

Icrop-csv, for automating the process of downloading reports.

crawler csv-export python3

Last synced: 17 Nov 2025

https://github.com/hoan02/novel-crawler

A tool that scrapes story data for doctruyen.io.vn

crawler python

Last synced: 13 Mar 2025

https://github.com/murilobsd/rakun

async crawler rust spider

Last synced: 17 Nov 2025

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 18 Mar 2025

https://github.com/andresayac/cuevana3

A Cuevana3 scraper that provides the latest movies and TV shows, dubbed in Latin American Spanish or subtitled.

crawler cuevana3 php scraper

Last synced: 05 Apr 2025

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 06 May 2026

https://github.com/pranavj1001/webcrawler

A simple Web Crawler

crawler java javascript nodejs web-crawler

Last synced: 11 May 2026

https://github.com/jjeffcaii/ok-spider

A simple web crawler similar to Scrapy

crawler nodejs scrapy spider

Last synced: 02 May 2026

https://github.com/pwcong/zhihuhook

A Zhihu hook: bite if you're willing.

crawler zhihu

Last synced: 08 Dec 2025

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 18 Oct 2025

https://github.com/kiranjisonawane143/blockchain-data-crawler

🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 06 May 2026

https://github.com/mawkler/go-web-crawler

Toy web server written in Go

crawler go

Last synced: 15 Aug 2025

https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp

A C# web crawler that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as each event's title, date, venue, and links.

anglesharp crawler minhaentrada

Last synced: 19 Jul 2025

https://github.com/earelin/jwraith

A Java clone of the Wraith website comparison tool.

crawler screenshots screenshots-comparison selenium webtest

Last synced: 17 May 2026

https://github.com/xjchenhao/crawler-hangzhou

A news crawler for Hangzhou Net (杭州网)

crawler hangzhou node

Last synced: 21 Feb 2026

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

A crawler that collects all of India's states and cities

cities crawler india php script states

Last synced: 29 May 2026

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media is downloaded, so the app can then reconstruct a Twitter profile in the front end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 17 May 2026

https://github.com/shivamsaraswat/webxcrawler

WebXCrawler is a fast static crawler to crawl a website and get all the links.

crawler crawling python scraping webcrawler webxcrawler

Last synced: 13 Feb 2026

https://github.com/viktorholk/ranged

A Rust-based web crawler and pattern matcher

crawler regex rust scraper web

Last synced: 30 Mar 2025

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 18 Jul 2025

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 16 Mar 2025

https://github.com/tetreum/puppeteer-for-crawling

Everyday crawling methods for Puppeteer

crawler crawling puppeteer

Last synced: 12 Apr 2026

https://github.com/iyowei/fs-deep-walk

Focused on deep scanning of a specified disk location.

crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker

Last synced: 20 May 2026

https://github.com/reineimi/va2crawl

Website crawler, validator and SEO optimizer

crawler seo-optimization seotools validator website-crawler

Last synced: 07 Jul 2025

https://github.com/bradsec/gomine

A Go CLI tool to quickly crawl and mine (download) specific file types from websites.

cli crawler golang terminal-based

Last synced: 09 Apr 2025

https://github.com/martincastroalvarez/web-to-pdf

Web crawlers using Python & Beautiful Soup

crawler python3 webcrawler

Last synced: 08 Apr 2025

https://github.com/tsaohucn/crawler_fb_user_group

A crawler that uses Selenium to crawl Facebook user groups

crawler facebook-user-groups rails ruby

Last synced: 16 May 2026

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 16 Jul 2025

https://github.com/chamzzzzzz/supersimplesoup

A Go package that implements a super simple, soup-like DOM API

beatifulsoup crawler crawler-go dom go golang html-parser

Last synced: 28 Jan 2026

https://github.com/kimi0230/pstocks

Crawling stock market data with Python

crawler numpy pandas python python3 stocks

Last synced: 07 Apr 2026

https://github.com/rowyio/llm-web-crawler

Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.

ai automation crawler llm lowcode nocode scraper web web-crawler workflow

Last synced: 15 Jul 2025

https://github.com/raspi/scrapy-amigaremix

amiga crawler music python scrapy spider

Last synced: 28 Apr 2026

https://github.com/nick121212/crawler.v5

crawler nodejs

Last synced: 16 May 2026

https://github.com/montenegrodr/letmecrawl

Curated free proxies

crawler proxy proxy-server proxypool scraper

Last synced: 18 Jan 2026

https://github.com/gn00678465/crawler

A Python CLI tool that uses the Firecrawl API to crawl web pages, with support for multiple output formats.

crawler pythone

Last synced: 06 Feb 2026

https://github.com/dinofizz/sitemapper

sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.

astradb cassandra concurrency crawler go golang kubernetes nats sitemap