Crawler | Ecosyste.ms: Awesome

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 20 Oct 2025

https://github.com/snuzi/devblogs-aggregator

The backend aggregator project of DevBlogs.net

aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news

Last synced: 23 Jan 2026

https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper

Crawls the GameFAQs website for video games exclusive to the specified platforms and generates a Markdown table of the crawled titles.

console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox

Last synced: 09 May 2026

https://github.com/kgruiz/stealth-crawler

Asynchronous headless-Chrome web crawler that discovers internal links and optionally saves HTML, Markdown, screenshots, or PDFs. Built for scripting, inspection, and automation.

asyncio cli crawler headless-chrome html-scraper pydoll python web-crawler

Last synced: 25 Oct 2025

https://github.com/jonasrenault/cprex

Chemical Properties Relation Extraction

chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers

Last synced: 23 Feb 2026

https://github.com/bigmeech/mangaka

Crawl scanlation websites for manga pages

comic crawler manga scanlation webtoon

Last synced: 23 Jan 2026

https://github.com/68publishers/crawler-client-php

🕸️ PHP Client for https://github.com/68publishers/crawler

crawler crawling php scraper scraping

Last synced: 23 Jan 2026

https://github.com/bkdev98/ebooks-crawler

Ebook crawler for personal use, built with ReactJS.

crawler material-ui nodejs reactjs

Last synced: 12 Apr 2026

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 27 Oct 2025

https://github.com/dimo414/pycrawl

Simple Python web crawler, primarily designed for inspecting and diagnosing your own website

crawler python

Last synced: 28 Oct 2025

https://github.com/amirespahbodi/url_crawler

Async Web Crawler for Website Title and Favicon

crawler fastapi pydantic python3 sqlalchemy

Last synced: 15 Apr 2026

https://github.com/citiususc/polypus

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

analytics bigdata crawler scraper sentiment-analysis twitter

Last synced: 09 Feb 2026

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 09 Feb 2026

https://github.com/mc256/node-static-webpage-crawler

Downloads an entire website with its directory structure.

cache-server crawler nodejs static-site

Last synced: 16 Apr 2026

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 27 Feb 2026

https://github.com/jongwony/boardgame_finder

A project that crawls the entire board game category of Namuwiki so that specific filters can be applied.

asyncio crawler namuwiki python38

Last synced: 27 Feb 2026

https://github.com/dhsagaryt/multisearch

Search efficiently across different platforms with ease. Type your query and choose from multiple search engines, streamlining your experience.

browser crawler internet search search-algorithm search-engine searchbar searchengine webcrawler

Last synced: 14 Feb 2026

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific words (regex patterns).

crawler crawler-python crawling-python python3

Last synced: 15 Feb 2026

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 08 Feb 2026

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 28 May 2026

https://github.com/zhou-chaoxian/ax-spider

A simple, powerful, and fast asynchronous Python crawler framework.

asyncio ax-spider crawler httpx python scrapy

Last synced: 18 Mar 2025

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 16 Mar 2025

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 31 Mar 2025

https://github.com/linjonh/videowebsidesparser

This project parses a video website to remove ads.

crawler parser python

Last synced: 13 Jun 2025

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrols in open vacancies on the SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 03 Mar 2026

https://github.com/danielemoraschi/go-sitemap-app

crawler golang sitemap sitemap-generator

Last synced: 29 Apr 2026

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 11 Mar 2026

https://github.com/johanbook/node-web-crawler

Nodejs CLI for web crawling

cli crawler nodejs typescript

Last synced: 11 Apr 2026

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 14 Apr 2026

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 10 May 2026

https://github.com/rsheremeta/web-crawler

A tiny web crawler that looks for links, then extracts and prints them concurrently to the terminal.

crawler go golang web-crawler webcrawler

Last synced: 12 Jun 2026

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/dubniczky/bad-robot

A Python crawler that disregards robots.txt rules and downloads disallowed resources.

crawler osint-python osint-tool python robots-txt

Last synced: 31 Mar 2025

https://github.com/dubniczky/webmap

Website mapping crawler implemented in Python

crawler mapping mapping-tools package python scraping security

Last synced: 31 Mar 2025

https://github.com/sedrubal/webcrawler

Crawl sites and search for security issues.

crawler script security website-auditing

Last synced: 17 Mar 2025

https://github.com/basemax/crawler-news-currency-gold-coins

PHP crawler that fetches Persian news about currency, coins, and gold.

crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler

Last synced: 05 Jul 2025

https://github.com/solracsf/perplexitybot-ips

Collected PerplexityBot IPs

bots crawler ip ipset perplexity

Last synced: 15 Feb 2026

https://github.com/der3318/daily-pixiv

Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations

crawler line-notify pixiv workflow

Last synced: 03 Mar 2025

https://github.com/basemax/okala-store-ids

A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.

crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala

Last synced: 10 Jun 2025

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 09 May 2026

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 02 Jul 2025

https://github.com/jpleorx/tagblender

A simple java API to retrieve hashtags from https://www.tagblender.net/

api crawler hashtags java jsoup parser

Last synced: 20 Mar 2025

https://github.com/edumucelli/rubybikes

A set of Bike Sharing System parsers in Ruby

bike-sharing crawler ruby

Last synced: 12 Apr 2025

https://github.com/leonardopinho/instagramfeed

Image list based on a tag for the Instagram feed.

crawler instagram python

Last synced: 28 Mar 2025

https://github.com/waived/pastebin-ripper

Scrape all pastes from pastebin page + sub-pages

crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper

Last synced: 24 Jun 2025

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 02 Feb 2026

https://github.com/luanpotter/series-api

A simple IMDB crawler feeding a Series API

api crawler imdb json rest series

Last synced: 15 Feb 2026

https://github.com/mnoalett/cscrawler

BSc degree thesis - crawler for www.couchsurfing.org

bsc-thesis couchsurfing crawler data-analysis database python

Last synced: 02 May 2026

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls the official circles at Ibaraki University from the university's website

crawler python

Last synced: 25 Mar 2025

https://github.com/w3labkr/ipynb-scraper

A collection of frequently used Jupyter notebook code.

crawler ipynb jupyter jupyter-notebook python scrapper

Last synced: 19 Apr 2026

https://github.com/hvtuananh/twitter_crawler

Daemon to call and get tweets from Twitter Public Stream API

crawler java streaming-api tweets twitter twitter-crawler

Last synced: 11 Mar 2025

https://github.com/cls1991/gank.io-go

A simple crawler for fetching pictures from http://gank.io, implemented in golang.

crawler gankio goquery pictures

Last synced: 27 Feb 2025

https://github.com/precioux/pacman

AI Course Projects - Fall 2022

adversial artificial-intelligence bfs-search crawler csp dfs mdp pacman-agent pacman-game pacman-projects reinforcement-learning ucs

Last synced: 28 May 2026

https://github.com/tjdsneto/jcnet-crawler

Extract (scrape) movie schedule info from the JCNet movies page

crawler scraping

Last synced: 11 Apr 2026

https://github.com/sanhphanvan96/php-training-crawler

Simple PHP crawler for training purposes

crawler docker docker-compose nginx php php-fpm

Last synced: 13 Apr 2026

https://github.com/ariefrahmansyah/crawler

Simple website crawler using Go programming language.

crawler go

Last synced: 27 Mar 2025

https://github.com/joyceannie/moviespider

This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, and director.

crawler datascience python scrapy spider webscraper

Last synced: 24 Mar 2025

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 11 Apr 2026

https://github.com/ericc-ch/crawldown

Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.

crawler markdown scraper

Last synced: 05 Jul 2025

https://github.com/bwh1270/allrecipes-scraper

crawler food-computing scraper scraping scrapy

Last synced: 02 Jul 2026

https://github.com/not-raspberry/aio_crawler

AIO single website crawler

asyncio crawler python3

Last synced: 23 Mar 2025

https://github.com/lilchen96/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 28 Dec 2025

https://github.com/matheusfaustino/jazzmaster_crawler

A crawler for fetching the audio programs of a radio program called Jazzmaster

crawler python scrapy

Last synced: 14 Jun 2025

https://github.com/roele/roast

A JVM Data Crawler

cli crawler jvm

Last synced: 16 May 2025

https://github.com/taleblou/brokenlinkchecker_python

This Python web crawler traverses a website, verifies resource links (CSS, JS, images, videos, iframes), and identifies broken links with HTTP errors (400-599)

crawler http links python resources website

Last synced: 03 Apr 2025

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 15 Mar 2025

https://github.com/faridfr/dribbble-crawler-php

Dribbble crawler with PHP

crawler dribbble dribbble-crawler php php-crawler user-interface

Last synced: 17 Mar 2025

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler for Coursera video captions / subtitles

caption chromeless coursera crawler crx subtitle

Last synced: 31 Mar 2025

https://github.com/kimseogyu/crawling-music-ranks

Music chart ranking crawling code

crawler jest typescript

Last synced: 07 Apr 2025

https://github.com/m1/smap

smap is a site-mapping engine written in Go.

crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling

Last synced: 01 Jul 2025

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 13 Sep 2025

https://github.com/pengkobe/my-web-crawler

Automatically pulls blog updates from bloggers. Developed with Angular 2.

crawler nodejs

Last synced: 18 May 2026

https://github.com/kweonminsung/crawl2toast

Real-time toast notification of crawled data with CSS selectors (Windows only)

beautifulsoup4 crawler selenium tkinter toast-notifications

Last synced: 18 May 2026

https://github.com/lfsc09/crawl-this-go

Simple CLI tool for crawling PDF documents and HTML pages

crawler go

Last synced: 18 Jun 2025

https://github.com/jakubboucek/blog.cz-backup-robot

crawler

Last synced: 25 Feb 2025

https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez

Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.

beautifulsoup crawler immigration web

Last synced: 16 Jun 2025

https://github.com/jimut123/leaderbehaviour

Scrapy project that extracts the names of leaders and their misdeeds by scraping a news website!

crawler leaderbehaviour newsscraper scrapy timesofindia

Last synced: 16 Jan 2026

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links in a specific depth.

crawler flask-api flask-restful

Last synced: 02 Apr 2025

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 16 May 2025

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 15 Mar 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Apr 2026

https://github.com/rafaelmoraes003/tech-news

Analysis and manipulation of news data from a technology website obtained through data scraping using Python.

crawler data-scraping https mongodb parsel pymongo python web-scraping

Last synced: 05 May 2026

https://github.com/laffrex/xiaolanben_crawler

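An efficient, stable data collection tool for the Xiaolanben (小蓝本) website. It automatically extracts information such as company and group products, media, and shareholders, supports intelligent pop-up handling and automated data classification and organization, and is ultimately intended to make SRC information gathering easier.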

crawler pandas selenium src

Last synced: 23 Mar 2025

https://github.com/murilobsd/icrop-csv

Icrop-csv automates the process of downloading reports.

crawler csv-export python3

Last synced: 17 Nov 2025

https://github.com/murilobsd/rakun

async crawler rust spider

Last synced: 17 Nov 2025

https://github.com/xjchenhao/crawler-hangzhou

A news crawler for the Hangzhou Net (杭州网) website

crawler hangzhou node

Last synced: 21 Feb 2026

https://github.com/viktorholk/ranged

A Rust-based web crawler and pattern matcher

crawler regex rust scraper web

Last synced: 30 Mar 2025

https://github.com/dinofizz/sitemapper

sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.

astradb cassandra concurrency crawler go golang kubernetes nats sitemap