An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
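As a rough illustration of what "systematically browses" means, the sketch below performs a breadth-first crawl with a frontier queue and a set of already-seen URLs. It is a minimal sketch, not the method of any project listed here: the `crawl` function, the `get_links` callback, and the `FAKE_WEB` link graph (with `example.test` URLs) are all hypothetical, and an in-memory dictionary stands in for real HTTP fetching and HTML parsing.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str,
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Visit pages breadth-first from `seed`; return URLs in visit order."""
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}              # URLs already queued, to avoid revisiting
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web": page URL -> outgoing links.
FAKE_WEB = {
    "https://example.test/": ["https://example.test/a", "https://example.test/b"],
    "https://example.test/a": ["https://example.test/b", "https://example.test/"],
    "https://example.test/b": ["https://example.test/c"],
    "https://example.test/c": [],
}


def fake_get_links(url: str) -> list[str]:
    """Stand-in for fetching a page and extracting its links."""
    return FAKE_WEB.get(url, [])


order = crawl("https://example.test/", fake_get_links)
print(order)
```

A real crawler would replace `fake_get_links` with code that downloads each page and parses its links, and would normally also honour robots.txt and rate limits.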

https://github.com/hoan02/novel-crawler

Tool for scraping story data for doctruyen.io.vn

crawler python

Last synced: 13 Mar 2025

https://github.com/tjdsneto/jcnet-crawler

Extract (scrape) movie schedule info from the JCNet movies page

crawler scraping

Last synced: 11 Apr 2026

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 18 Mar 2025

https://github.com/andresayac/cuevana3

Cuevana3 scraper is a content provider for the latest movies and TV shows, dubbed or subtitled in Latin American Spanish.

crawler cuevana3 php scraper

Last synced: 05 Apr 2025

https://github.com/mnoalett/cscrawler

BSc degree thesis - crawler for www.couchsurfing.org

bsc-thesis couchsurfing crawler data-analysis database python

Last synced: 02 May 2026

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 16 Jan 2026

https://github.com/jplitza/urlsearch

Index typical webserver directory listings and then search for arbitrary terms

crawler search

Last synced: 17 Mar 2025

https://github.com/waived/pastebin-ripper

Scrape all pastes from pastebin page + sub-pages

crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper

Last synced: 24 Jun 2025

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 18 Oct 2025

https://github.com/leonardopinho/instagramfeed

Image list based on a tag for the Instagram feed.

crawler instagram python

Last synced: 28 Mar 2025

https://github.com/edumucelli/rubybikes

A set of Bike Sharing System parsers in Ruby

bike-sharing crawler ruby

Last synced: 12 Apr 2025

https://github.com/jpleorx/tagblender

A simple java API to retrieve hashtags from https://www.tagblender.net/

api crawler hashtags java jsoup parser

Last synced: 20 Mar 2025

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 02 Jul 2025

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/brighteyekid/rendermw

Zero-dependency dynamic rendering middleware for Express. No Puppeteer. No external services. No cost. Bots get semantic HTML. Users get your SPA.

angular bots crawler dynamic-rendering express expressjs indexing middleware nodejs open-graph prerender react seo spa typescript vue

Last synced: 24 Jun 2026

https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp

Web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page HTML and retrieves information such as event title, date, venue, and links.

anglesharp crawler minhaentrada

Last synced: 19 Jul 2025

https://github.com/basemax/okala-store-ids

A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.

crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala

Last synced: 10 Jun 2025

https://github.com/sedrubal/webcrawler

Crawl sites and search for security issues.

crawler script security website-auditing

Last synced: 17 Mar 2025

https://github.com/earelin/jwraith

A Java clone of the Wraith website comparison tool.

crawler screenshots screenshots-comparison selenium webtest

Last synced: 17 May 2026

https://github.com/dubniczky/webmap

Website mapping crawler implemented in python

crawler mapping mapping-tools package python scraping security

Last synced: 31 Mar 2025

https://github.com/dubniczky/bad-robot

This is a python crawler that disregards robots.txt rules and downloads disallowed resources

crawler osint-python osint-tool python robots-txt

Last synced: 31 Mar 2025

https://github.com/rsheremeta/web-crawler

A tiny web crawler that looks for links, then extracts and prints them concurrently to the terminal

crawler go golang web-crawler webcrawler

Last synced: 12 Jun 2026

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 14 Apr 2026

https://github.com/johanbook/node-web-crawler

Nodejs CLI for web crawling

cli crawler nodejs typescript

Last synced: 11 Apr 2026

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media are downloaded. The application can then reconstruct a Twitter profile in the front end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 17 May 2026

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 18 Jul 2025

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrols in open vacancies on the SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 03 Mar 2026

https://github.com/tetreum/puppeteer-for-crawling

Daily use crawling methods for puppeteer

crawler crawling puppeteer

Last synced: 12 Apr 2026

https://github.com/maddevsio/spiderwoman

"Vertical" crawler whose main goal is to count links (resolved, e.g. from bit.ly) to external domains from all pages of the given resources

big-data count-links crawler golang

Last synced: 19 May 2026

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 31 Mar 2025

https://github.com/tsaohucn/crawler_fb_user_group

A crawler that uses Selenium to crawl Facebook user groups

crawler facebook-user-groups rails ruby

Last synced: 16 May 2026

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 16 Jul 2025

https://github.com/kimi0230/pstocks

Stock market crawler in Python

crawler numpy pandas python python3 stocks

Last synced: 07 Apr 2026

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 07 Jun 2026

https://github.com/rowyio/llm-web-crawler

Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.

ai automation crawler llm lowcode nocode scraper web web-crawler workflow

Last synced: 15 Jul 2025

https://github.com/rayspock/go-web-crawler

A web crawler that fetches all the links from a given website via goroutines.

concurrency crawler golang goroutine

Last synced: 10 Jun 2026

https://github.com/zhou-chaoxian/ax-spider

A simple, powerful, and fast asynchronous Python crawler framework.

asyncio ax-spider crawler httpx python scrapy

Last synced: 18 Mar 2025

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 28 May 2026

https://github.com/crosscutsaw/iscsicrawler

iscsicrawler is a bash script that crawls files in the iscsi targets with ease.

crawler iscsi iscsi-target iscsiadm

Last synced: 16 Jan 2026

https://github.com/d7isme/pixiv-downloader-mod

Modded version of the pixiv downloader extension from the Chrome Web Store, with the premium feature unlocked.

chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader

Last synced: 14 May 2026

https://github.com/tylpk1216/favorite-youtube-to-video

Download your favorite YouTube videos with PHP

crawler php tool youtube

Last synced: 16 May 2026

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 21 Mar 2025

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 08 Feb 2026

https://github.com/jefftriplett/pholcidae-demo

:spider: A Pholcidae demo for crawling/spidering a website

crawler csv pholcidae python scrapper scrapy-crawler spider toml

Last synced: 22 Jul 2025

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 08 Apr 2025

https://github.com/balintpethe/laravel-universal-scraper

Universal Scraper for Laravel

crawler laravel scraper web-scraper

Last synced: 13 Jan 2026

https://github.com/yuchenq/comp90055-project

This is the latest version of my project for Comp90055.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 16 Jul 2025

https://github.com/moparisthebest/nginx-limit-crawlers

rate limit crawlers in nginx

ai crawler nginx

Last synced: 14 Mar 2025

https://github.com/jjpaulo2/crawler-financeiro

Python module that extracts public data on pension plans from the SUSEP portal.

crawler docker ocr python selenium tesseract

Last synced: 11 Jul 2025

https://github.com/evansuner/smartproxypool

Smart proxy: automatically obtains working high-anonymity proxies

crawler fastapi proxy python

Last synced: 15 May 2026

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scrapers is way more fun!

crawler html-scraper scraper

Last synced: 24 Mar 2025

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 11 Jul 2025

https://github.com/longluo/spider

My Python Spider / Crawler

crawler python spider twitter weibo weibo-crawler weibo-spider

Last synced: 11 Jun 2025

https://github.com/blarc/windsurf-crawler

A simple crawler that collects windsurf board offers from different sites.

crawler windsurf

Last synced: 10 Sep 2025

https://github.com/yggverse/yps

YPS - Yggdrasil Port Scanner

cli crawler network port scanner tcp tool udp yggdrasil

Last synced: 03 Jul 2025

https://github.com/jregistr/laker-parser

Small program to scrape and sanitize scheduling data.

crawler gradle htmlunit lakers oswego scraping suny

Last synced: 16 May 2026

https://github.com/billy0402/python-application

A learning project from the book 'Python 技術者們'.

course crawler matplotlib opencv pandas python requests selenium sklearn

Last synced: 12 Apr 2026

https://github.com/mattmoony/webcrawler.py

A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 29 Apr 2026

https://github.com/Kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 10 Mar 2025

https://github.com/ma-pony/playwright-spider-utils

Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.

crawl crawler playwright python scrapy selenium spider spiderman

Last synced: 06 Jan 2026

https://github.com/jonesrussell/pipelinex

Firecrawl-style web intelligence pipeline powered by North Cloud

crawler pipeline vue

Last synced: 09 Mar 2026

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 09 Aug 2025

https://github.com/sxoxgxi/webcrawler

A multi threaded web crawler

crawler python webcrawling

Last synced: 28 Jul 2025

https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler

StackOverFlow Tag Generator Using a WebCrawler.

crawler python

Last synced: 08 Apr 2025

https://github.com/eklem/vinmonopolet-crawler

Crawling Vinmonopolet-data and indexing it to a norch search index

crawler dataset javascript norch search-engine

Last synced: 26 Mar 2025

https://github.com/n3d1117/sisop17

Exercise for the Operating Systems exam - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 06 Apr 2025

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 18 Jun 2026

https://github.com/ronierisonmaciel/crawler

A crawler built with BeautifulSoup that aims to extract information from websites in an efficient, structured way. BeautifulSoup is a Python library that makes it easier to parse and extract data from HTML and XML pages. The project makes it possible to collect and organize relevant information.

beautifulsoup4 crawler crawling python python3

Last synced: 26 Mar 2025

https://github.com/fengzixu/crawlinganything

If you are interested in data, you should take action right away

crawler python

Last synced: 15 Jun 2026

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is a software development kit for the DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 11 Jul 2025

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 02 May 2026

https://github.com/pwcong/zhihuhook

Zhihu hook: those who are willing take the bait.

crawler zhihu

Last synced: 08 Dec 2025

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 01 May 2026

https://github.com/amazingcoderpro/pythonup

Have fun with Python! For improving Python skills

crawler python

Last synced: 19 May 2026

https://github.com/jiusanzhou/reaper

Distributed Elegant Scraper and Crawler Framework for Rust.

crawler data-scraping rust scraper spider

Last synced: 24 Jul 2025

https://github.com/govau/warcraider

Convert WARC files into Avro for big data processing

avro bigquery crawler rust warc

Last synced: 16 May 2026

https://github.com/burakkaygusuz/web-security-scanner

A Java-based web security browser that detects common web vulnerabilities such as SQL Injection, XSS and sensitive information disclosure.

crawler java vulnerability-scanner web-security xss

Last synced: 16 May 2026

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler that collects all of India's states and cities

cities crawler india php script states

Last synced: 29 May 2026

https://github.com/lilchen96/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 28 Dec 2025

https://github.com/alphabs/navercafeclient

.NET library for crawling Naver Cafe post lists

crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping

Last synced: 06 May 2026

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 16 Mar 2025

https://github.com/engineer2b/cure_crawl

Crawler for the Cure Afvalbeheer waste collection calendar

afval afvalwijzer browser crawler kalender

Last synced: 22 Oct 2025

https://github.com/tech-espm/misc-webbot

This project aims to create personal assistants that reply to messages about specific issues.

classification-model crawler nlp

Last synced: 12 Jun 2026

https://github.com/reineimi/va2crawl

Website crawler, validator and SEO optimizer

crawler seo-optimization seotools validator website-crawler

Last synced: 07 Jul 2025

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 06 May 2026

https://github.com/bradsec/gomine

A Go CLI tool to quickly crawl and mine (download) specific file types from websites.

cli crawler golang terminal-based

Last synced: 09 Apr 2025

https://github.com/fusetim/bitcrawler

Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also be a BitTorrent DHT crawler one day?

bittorrent crawler dht

Last synced: 30 Mar 2025

https://github.com/patrickschababerle/schabbi-webscraper

Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.

crawler puppeteer scraper scraping web-crawler

Last synced: 04 Apr 2025

https://github.com/diegojromerolopez/relwrac

A basic crawler developed with python and asyncio

asyncio crawler page-rank python

Last synced: 11 Nov 2025