An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
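The phrase "systematically browses" usually means a graph traversal over links. Below is a minimal breadth-first crawl sketch in Python. It assumes a hypothetical `get_links` callback that stands in for fetching a page and extracting its outgoing links, and a `max_pages` budget. It illustrates the general technique only and is not how any particular project listed below works.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl starting at `seed`; returns URLs in visit order.

    `get_links` is a hypothetical stand-in for "HTTP GET the page and parse
    its links"; a real crawler would also honor robots.txt and rate limits.
    """
    visited: list[str] = []
    seen = {seed}              # every URL ever queued, so nothing is queued twice
    frontier = deque([seed])   # FIFO queue gives breadth-first order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    # Toy link graph standing in for the Web.
    graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
    print(crawl("a", lambda u: graph.get(u, [])))  # ['a', 'b', 'c', 'd']
```

Swapping the `deque` for a priority queue turns this into a best-first crawler, which is how many crawlers prioritize important pages.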

https://github.com/dnknth/robot.py

Simple web spider

crawler curio python

Last synced: 23 Jul 2025

https://github.com/jonesrussell/north-cloud

A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.

content crawler publisher

Last synced: 25 Jan 2026

https://github.com/kettou/silentscraper

SilentScraper is a web scraping solution built with advanced stealth protocols. It runs undetected in the background, bypassing anti-scraping mechanisms to collect structured data at scale. Its lightweight architecture mimics human browsing patterns, rotates IP addresses, spoofs user agents, and more.

beautifulsoup beautifulsoup4 crawler datastructures datastructures-algorithms python webautomation webscraper webscraping

Last synced: 23 Jul 2025

https://github.com/huyduc1602/uniapp-crawler

Crawls and translates the Uni-app documentation

crawler docker python

Last synced: 25 Jan 2026

https://github.com/fscotto/noahcrawler

A simple web crawler written in Java to support a database of Italian regions.

crawler java jsoup-library

Last synced: 14 Sep 2025

https://github.com/allancapistrano/steam.py

An API wrapper for Steam written in Python.

crawler python steam

Last synced: 16 Mar 2025

https://github.com/reineimi/va2crawl

Website crawler, validator and SEO optimizer

crawler seo-optimization seotools validator website-crawler

Last synced: 07 Jul 2025

https://github.com/tisfeng/bing-dict

A Bing command-line dictionary that fetches Bing Dictionary query results with a crawler.

bing-dictionary command-line crawler nodejs

Last synced: 13 May 2026

https://github.com/daviddavo/blogspot-crawler

Crawler for Blogspot and Blogger built with Beautiful Soup

crawler hacktoberfest python

Last synced: 19 Apr 2026

https://github.com/ekojs/web-crawler

Web crawler for retrieving research titles from Google Scholar

crawler nodejs web-crawler

Last synced: 12 Apr 2026

https://github.com/bradsec/gomine

A Go CLI tool to quickly crawl and mine (download) specific file types from websites.

cli crawler golang terminal-based

Last synced: 09 Apr 2025

https://github.com/artemnikitin/crawler

Example of a web crawler implemented in Go

crawler go golang

Last synced: 22 Jun 2025

https://github.com/ryoii/hook

A declarative Java crawler framework

crawler declarative java java-crawler-framework jdk11

Last synced: 18 Mar 2025

https://github.com/jmousqueton/check-broken-link

Multi-threaded Python tool for crawling and checking all internal links on a website, with live Rich dashboard, broken link export (CSV), and detailed source tracking.

check crawler error400 error404 error500 links

Last synced: 29 Aug 2025

https://github.com/suconghou/sitemap

A simple sitemap generator and page crawler

crawler html-parser nim-lang scraper sitemap spiders

Last synced: 15 May 2026

https://github.com/truongdd03/searchengine

A search engine written in C++.

cpp crawler search search-engine

Last synced: 06 Apr 2025

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 23 Mar 2025

https://github.com/patrik-fredon/python_wallpaper_crawler

Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.

crawler crawling-python image image-recognition images python scraping-websites scrapper selenium-python uv

Last synced: 14 Sep 2025

https://github.com/peterbencze/silene

Silene is an open source web crawler framework built upon Pyppeteer.

crawler framework pypp python scraper webcrawler

Last synced: 12 Jan 2026

https://github.com/pyohei/rirakkuma-crawller

Crawler for my hobby.🐻

crawler python rirakkuma

Last synced: 29 Nov 2025

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 30 Mar 2025

https://github.com/gxjansen/website-to-pdf

Creates a PDF based on the content of a website or subdomain

claude-3-sonnet crawler python3

Last synced: 30 Mar 2025

https://github.com/kestarumper/imagecrawler

Downloads images from a given URL

crawler image-downloader

Last synced: 28 Jun 2025

https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance

Crawler robot that follows a black line while avoiding obstacles in its path. Assignment for Mechatronics

arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics

Last synced: 28 Apr 2026

https://github.com/eklem/vinmonopolet-crawler

Crawls Vinmonopolet data and indexes it into a norch search index

crawler dataset javascript norch search-engine

Last synced: 26 Mar 2025

https://github.com/sgeisler/fishbones2epub

Fetches the Fishbones novel and outputs an EPUB

crawler ebook epub python-3-6

Last synced: 22 Mar 2025

https://github.com/surister/scrupy

Python library for building web crawlers that aims to be powerful yet simple.

crawler crawling-framework crawling-python http library python scraping

Last synced: 15 May 2026

https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler

Stack Overflow tag generator using a web crawler.

crawler python

Last synced: 08 Apr 2025

https://github.com/mccranky83/aistudy-docs-crawler

Crawler for the Shanghai primary and secondary school digital teaching system

crawler hoarding puppeteer

Last synced: 07 Apr 2025

https://github.com/wilmsn/simple_deye_crawler

A simple crawler that reads data from a Deye inverter via its status web page

crawler deye fhem inverter shell-script

Last synced: 27 May 2026

https://github.com/yaoshanliang/linkedinspider

Crawl job information from LinkedIn for data analysis

big-data crawler python social-network-analysis

Last synced: 30 Mar 2025

https://github.com/davelongdev/link-report-crawler

A web crawler using Node.js that crawls a site and returns a report showing all internal links.

crawler crawling javascript seo seo-tools

Last synced: 16 Jun 2025

https://github.com/martincastroalvarez/web-to-pdf

Web crawlers using Python & Beautiful Soup

crawler python3 webcrawler

Last synced: 08 Apr 2025

https://github.com/boatraceventureproject/boatracescraper

The BVP Crawler package for Boatrace.

boatrace crawler php php-library php8

Last synced: 17 Mar 2025

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 07 Jun 2026

https://github.com/mizcausevic-dev/procurement-pulse-engine

The crawl + aggregate engine behind the AI Procurement Pulse. Probes a universe of vendor domains for the 11 Kinetic Gain Protocol Suite documents and produces the quarterly issue dataset. Issue #1: the zero baseline.

ai-governance ai-procurement-pulse crawler data-journalism javascript kinetic-gain-protocol-suite procurement research well-known

Last synced: 01 Jun 2026

https://github.com/nblthree/python-url-crawler

Simple web crawler

crawler python3

Last synced: 25 Mar 2025

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 27 Jun 2025

https://github.com/licoy/win4000-images-crawler

Crawls and downloads wallpaper images from win4000.com using Scrapy

crawler python scraper

Last synced: 28 Mar 2025

https://github.com/mstephen19/apify-click-events

Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to

apify apify-sdk crawler scraper web-automation

Last synced: 23 Aug 2025

https://github.com/humbertodias/go-nie-crawler

Simple crawler that extracts useful information from sede.administracionespublicas.gob.es.

crawler golang

Last synced: 03 Mar 2025

https://github.com/mustafadalga/website-crawler

A web crawler script that lists links by scanning the target website.

crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling

Last synced: 20 Apr 2026

https://github.com/sc0vu/gocrawl

Simple crawler written in Go

crawler golang

Last synced: 23 Jul 2025

https://github.com/nextlevelshit/adonis-crawler

A free web crawler built on top of the incredible AdonisJS framework

adonisjs crawler javascript nodejs regex spider websocket

Last synced: 22 May 2026

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: crawler for bills of the 20th National Assembly

crawler python scraper

Last synced: 15 Oct 2025

https://github.com/kh4ru/crusoecrawler

A python crawler to download 3DS Roms from Hshop

3ds crawler hshop python roms

Last synced: 25 Mar 2025

https://github.com/ghsaboias/alpha-agent

An intelligent web research assistant that combines web crawling, search functionality, and AI-powered analysis using Anthropic's Claude API.

ai claude crawler search web

Last synced: 14 Mar 2025

https://github.com/ma-pony/playwright-spider-utils

Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.

crawl crawler playwright python scrapy selenium spider spiderman

Last synced: 06 Jan 2026

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 15 Oct 2025

https://github.com/mizcausevic-dev/aeo-crawler

BFS crawler for AEO Protocol v0.1 declaration graphs. Seed an origin, follow primary_source URIs, emit JSON Lines records of every fetch. Built on aeo-sdk-go. Concurrent, depth-limited, budget-capped, stdlib-only HTTP.

aeo aeo-protocol ai-governance answer-engine-optimization crawler entity-graph go-cli golang kinetic-gain-protocol-suite protocol-implementation well-known

Last synced: 01 Jun 2026

https://github.com/marshalw/crawler

Crawler project

crawler javascript nodejs

Last synced: 22 Jan 2026

https://github.com/filipsedivy/tachometer-check

🚘 MDČR: odometer check

crawler czech-republic mdcr

Last synced: 11 Jan 2026

https://github.com/mawkler/go-web-crawler

Toy web server written in Go

crawler go

Last synced: 15 Aug 2025

https://github.com/stephanebruckert/gocrawl

Crawls every page and asset of a web domain

crawler python

Last synced: 16 Oct 2025

https://github.com/kiranjisonawane143/blockchain-data-crawler

🔍 Discover and extract valuable data from blockchain networks efficiently with this easy-to-use data crawler.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 06 May 2026

https://github.com/Kissaki/website-downloader

A website crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 10 Mar 2025

https://github.com/claudio-code/nap-web-crawler

A crawler that finds broken links in framework and language documentation

crawler

Last synced: 07 Jul 2025

https://github.com/roele/roast

A JVM Data Crawler

cli crawler jvm

Last synced: 16 May 2025

https://github.com/foolishway/blog-crawler

blog-crawler crawls blogs according to your configuration file.

blogs config crawler

Last synced: 22 Jan 2026

https://github.com/zfael/scrape-it-all

Modular web scraper for Node.JS

crawler scraper scraping scraping-websites web-scraping

Last synced: 04 Feb 2026

https://github.com/terminaldweller/crawley

A creepy crawler that runs as a sleepy daemon.

crawler daemon python3

Last synced: 04 Jul 2025

https://github.com/yokoyang/baidu-crawler

tieba_crawler

crawler

Last synced: 16 Jun 2025

https://github.com/zenixls2/2chpreprocess

Dump messages from 2ch with some preprocessing for ML analysis

2ch crawler python

Last synced: 26 Mar 2025

https://github.com/asmrcodez-yt/google-extensions-scraper

🚀 Download free and open-source Chrome extensions for web scraping! Extract data from various websites effortlessly with our latest .crx releases.

chrom codez crawler extension free linkedin omid opensource scraper thecodez web-scraper

Last synced: 17 Oct 2025

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 06 May 2026

https://github.com/thesurlydev/surly-spider

A command line interface for the spider library

crawl crawler rust spider surly surly-spider

Last synced: 16 Feb 2026

https://github.com/berecat/selenium_facebook_scraper

A simple Python 3 script that downloads a user's friend list from Facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 24 Jul 2025

https://github.com/tca166/ck2-history-extractor

A tool for creating an encyclopedia from your CK2 savefile

ck2 crawler crusader-kings-2

Last synced: 02 Apr 2025

https://github.com/billy0402/python-application

A learning project from the book 'Python 技術者們'.

course crawler matplotlib opencv pandas python requests selenium sklearn

Last synced: 12 Apr 2026

https://github.com/amazingcoderpro/pythonup

Mastering Python! For improving Python skills

crawler python

Last synced: 19 May 2026

https://github.com/manu-sh/http_normalizer

HTTP URL normalization for web crawlers

crawler http spider url-normalization

Last synced: 12 Jun 2025
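URL normalization lets a crawler recognize that two spellings of a URL point to the same resource, so it fetches the page only once. The sketch below uses Python's `urllib.parse` to apply a few common, semantics-preserving normalizations: lowercase scheme and host, drop the default port, use `/` for an empty path, and strip the fragment. This rule set is an assumption for illustration and is not the rule set of the http_normalizer project above. It also discards any `user:password@` credentials.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the target (assumed set).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""   # .hostname is already lowercased
    if ":" in host:               # IPv6 literal: restore the brackets
        host = f"[{host}]"
    netloc = host
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{port}"
    path = parts.path or "/"      # "http://x" and "http://x/" are equivalent
    # The fragment is client-side only, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


if __name__ == "__main__":
    print(normalize_url("HTTP://Example.COM:80/a?x=1#frag"))  # http://example.com/a?x=1
```

A crawler would typically store `normalize_url(link)` in its "seen" set instead of the raw link.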

https://github.com/ilovebacteria/digikala-api

This Python package queries the Digikala API and retrieves product details.

crawler digikala pypi

Last synced: 11 Feb 2026

https://github.com/tigercosmos/web-crawler

Web crawler in a Java Maven project

crawler

Last synced: 12 Jun 2025

https://github.com/insectmk/douban-crawler

Douban Movie Top 250 crawler and data visualization

analysis crawler django echarts mysql python3 website

Last synced: 10 Mar 2026

https://github.com/webdevcave/directory-crawler-php

Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.

crawler crawling directory path php php-library

Last synced: 12 Feb 2026

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Aug 2025

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 29 May 2026

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/itechbear/robotstxt

A Java clone of Google's robots.txt parser: https://github.com/google/robotstxt

crawler google-robotst-parser java robotstxt

Last synced: 14 Jan 2026
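Porting Google's parser specifically, instead of using a generic one, matters because robots.txt parsers disagree on rule precedence. RFC 9309, which Google's parser follows, applies the most specific (longest) matching rule. Python's standard `urllib.robotparser` applies the first matching rule in file order. The sketch below shows that difference. The robots.txt content and the `is_allowed` helper are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Under longest-match (RFC 9309 / Google) semantics, /docs/public/page is ALLOWED,
# because "Allow: /docs/public/" is more specific than "Disallow: /docs/".
ROBOTS = """\
User-agent: *
Disallow: /docs/
Allow: /docs/public/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: check `url` against robots.txt text with the stdlib parser."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    rp.modified()  # mark the rules as read so can_fetch() does not refuse everything
    return rp.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The stdlib hits "Disallow: /docs/" first, so it reports False here.
    print(is_allowed(ROBOTS, "MyBot", "https://example.com/docs/public/page"))
```

A crawler that must agree with Googlebot's interpretation therefore needs longest-match semantics, which is the motivation for a faithful port.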

https://github.com/vuchkov/forbes-billionairs-list

Forbes Billionaires List Crawler: PHP, MySQL, headless browser, etc.

crawler headless-chrome php scraper website

Last synced: 29 Apr 2026

https://github.com/jannchie/go-probe

HTML and JSON data crawler based on Golang. Simple and fast, very easy to use.

collector crawler fetcher golang spider

Last synced: 09 Apr 2025

https://github.com/juangesino/ah-bonus-crawler

React + Express application that crawls Albert Heijn's promotions.

crawler crawling express expressjs headless-chrome nodejs react reactjs

Last synced: 06 May 2026

https://github.com/basemax/okala-database-crawler

A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.

crawler crawler-php curl data json okala okala-com okalacom php php-crawler scraper

Last synced: 01 May 2026

https://github.com/engageintellect/scrapers

A repository of web scrapers using Python & Scrapy

crawler python scrapy spider

Last synced: 31 Mar 2025

https://github.com/sevenecks/web-crawler

Crawls a website, finds pages and links, maps the relationships between them, and reports 404 and other errors

404 checker crawler site web

Last synced: 21 Jun 2025