An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
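The systematic browsing described above is commonly implemented as a breadth-first traversal of the link graph, starting from seed URLs and keeping a set of already-seen URLs so no page is queued twice. The Python sketch below is a minimal illustration under stated assumptions. `fetch_links` is a hypothetical stand-in for "download the page and extract its links", and it is backed here by an in-memory dictionary (`GRAPH`) rather than real HTTP requests.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], list[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from the seed URLs; returns URLs in visit order.

    `fetch_links` is a hypothetical callable that returns the outgoing
    links of a page (a real crawler would fetch and parse HTML here).
    """
    frontier = deque(dict.fromkeys(seeds))  # FIFO queue => breadth-first order
    seen = set(frontier)                    # everything ever queued
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:            # skip URLs already queued or visited
                seen.add(link)
                frontier.append(link)
    return visited


# Toy link graph standing in for the Web (hypothetical URLs).
GRAPH = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

if __name__ == "__main__":
    print(crawl(["https://example.com/"], lambda u: GRAPH.get(u, [])))
```

A FIFO queue gives breadth-first order, so pages close to the seeds are fetched first. `max_pages` bounds the crawl, because real link graphs are effectively unbounded.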

https://github.com/mustafadalga/website-crawler

A web crawler script that scans a target website and lists its links.

crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling

Last synced: 20 Apr 2026

https://github.com/kh4ru/crusoecrawler

A Python crawler to download 3DS ROMs from Hshop

3ds crawler hshop python roms

Last synced: 25 Mar 2025

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 02 Feb 2026

https://github.com/ariefrahmansyah/crawler

Simple website crawler written in the Go programming language.

crawler go

Last synced: 27 Mar 2025

https://github.com/filipsedivy/tachometer-check

🚘 MDČR - odometer reading check

crawler czech-republic mdcr

Last synced: 11 Jan 2026

https://github.com/taleblou/brokenlinkchecker_python

This Python web crawler traverses a website, verifies resource links (CSS, JS, images, videos, iframes), and identifies broken links that return HTTP error status codes (400-599).

crawler http links python resources website

Last synced: 03 Apr 2025
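The resource links that such a checker verifies can be collected with Python's built-in `html.parser`. The sketch below is a minimal illustration and is not taken from the linked project. The tag-to-attribute mapping `RESOURCE_ATTRS` and the helper `is_broken` are assumptions that mirror the description above: URLs come from `href`/`src` attributes, and a status code in 400-599 counts as broken.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute holding a resource URL (assumed, non-exhaustive mapping).
RESOURCE_ATTRS = {
    "a": "href",
    "link": "href",    # e.g. CSS stylesheets
    "script": "src",   # JavaScript
    "img": "src",
    "video": "src",
    "source": "src",   # <source> inside <video>/<audio>
    "iframe": "src",
}


class ResourceLinkParser(HTMLParser):
    """Collects resource URLs from HTML, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))


def is_broken(status: int) -> bool:
    """4xx client errors and 5xx server errors count as broken."""
    return 400 <= status <= 599


if __name__ == "__main__":
    page = ResourceLinkParser("https://example.com/docs/index.html")
    page.feed('<link rel="stylesheet" href="/style.css">'
              '<script src="app.js"></script>'
              '<img src="https://cdn.example.com/a.png">')
    print(page.links)
```

Relative URLs are resolved with `urljoin` so that each collected link can be requested directly. Checking each link would then mean sending a request and passing its status code to `is_broken`.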

https://github.com/zfael/scrape-it-all

Modular web scraper for Node.js

crawler scraper scraping scraping-websites web-scraping

Last synced: 04 Feb 2026

https://github.com/terminaldweller/crawley

A creepy crawler that runs as a sleepy daemon.

crawler daemon python3

Last synced: 04 Jul 2025

https://github.com/tca166/ck2-history-extractor

A tool for creating an encyclopedia from your CK2 savefile

ck2 crawler crusader-kings-2

Last synced: 02 Apr 2025

https://github.com/manu-sh/http_normalizer

HTTP URL normalization for web crawlers

crawler http spider url-normalization

Last synced: 12 Jun 2025
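URL normalization lets a crawler recognize that syntactically different URLs refer to the same resource, so that it does not fetch the same resource twice. The following Python sketch is an illustration only and is not based on the linked project's code. It applies a few common RFC 3986 normalizations using only the standard library:

* lowercase the scheme and host;
* drop the default port;
* uppercase percent-encoding hex digits;
* resolve `.` and `..` segments;
* use `/` for an empty path;
* drop the fragment.

```python
import re
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
_PCT = re.compile(r"%[0-9a-fA-F]{2}")


def _upper_pct(s: str) -> str:
    # Percent-encoding hex digits are case-insensitive; uppercase them.
    return _PCT.sub(lambda m: m.group(0).upper(), s)


def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' in an absolute path (simplified RFC 3986 5.2.4)."""
    segments = path.split("/")
    out: list[str] = []
    for seg in segments:
        if seg == "..":
            if len(out) > 1:          # never pop the leading "" (root)
                out.pop()
        elif seg != ".":
            out.append(seg)
    if segments[-1] in (".", ".."):   # "/a/b/.." resolves to "/a/"
        out.append("")
    return "/".join(out)


def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if ":" in host:                   # IPv6 literal: restore the brackets
        host = f"[{host}]"
    netloc = host
    if parts.username is not None:    # keep userinfo if present
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = f"{userinfo}@{host}"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"
    path = remove_dot_segments(_upper_pct(parts.path)) or "/"
    query = _upper_pct(parts.query)
    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


if __name__ == "__main__":
    print(normalize_url("HTTP://Example.COM:80/a/./b/../c?x=%2f#frag"))
```

Some normalizations, such as sorting query parameters or removing a trailing slash, can change which resource a server returns. This sketch therefore leaves them out.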

https://github.com/tigercosmos/web-crawler

Web crawler in Java (Maven project)

crawler

Last synced: 12 Jun 2025

https://github.com/itechbear/robotstxt

A Java clone of Google's robots.txt parser: https://github.com/google/robotstxt

crawler google-robotst-parser java robotstxt

Last synced: 14 Jan 2026
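For comparison, Python's standard library ships its own robots.txt parser, `urllib.robotparser`. It is a separate implementation from Google's parser and the Java clone above, and its rule-matching behavior on edge cases may differ. The minimal sketch below parses rules from an in-memory string instead of fetching `/robots.txt`. The user-agent names `MyCrawler` and `BadBot` and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block BadBot entirely, block /private/ for everyone else.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory instead of read()

if __name__ == "__main__":
    for agent, url in [
        ("MyCrawler", "https://example.com/index.html"),
        ("MyCrawler", "https://example.com/private/data"),
        ("BadBot", "https://example.com/index.html"),
    ]:
        print(agent, url, parser.can_fetch(agent, url))
```

This should print `True`, `False`, `False`. `MyCrawler` falls through to the `*` group, while `BadBot` matches its own group, which disallows everything.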

https://github.com/vuchkov/forbes-billionairs-list

Forbes Billionaires List Crawler - PHP, MySQL, Headless browser, etc.

crawler headless-chrome php scraper website

Last synced: 29 Apr 2026

https://github.com/jannchie/go-probe

HTML and JSON data crawler written in Go. Simple, fast, and easy to use.

collector crawler fetcher golang spider

Last synced: 09 Apr 2025

https://github.com/s3rgeym/wscrap

Command line web scraping tool.

crawler scraping

Last synced: 09 Apr 2025

https://github.com/engageintellect/scrapers

A repository of web scrapers using Python & Scrapy

crawler python scrapy spider

Last synced: 31 Mar 2025

https://github.com/splorg/sage

A scraper to get every quote from a book off of Goodreads.

books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping

Last synced: 12 Jun 2025

https://github.com/46319943/ganji_community

Crawls residential community information for each city from Ganji.com (赶集网).

crawler ganji ganjispider

Last synced: 18 Jan 2026

https://github.com/dalthviz/csapp

Crawler/scraper for the Play Store

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 13 May 2026

https://github.com/casatrick/solana-transaction-crawler

Crawl and parse Solana transactions

crawler parser rust solana transaction

Last synced: 20 Jun 2026

https://github.com/athulmurali/flickr-api-docs-crawler

A Python-based crawler that extracts API documentation and writes it to a JSON file. A documentation page can then be built from the JSON file using Docusaurus.

api beautifulsoup4 crawler documentation python3

Last synced: 18 Jun 2026

https://github.com/huakunshen/cron-crawler-template

Web crawler cron job template that runs on GitHub Actions. Capable of sending email notifications.

crawler github-actions python

Last synced: 15 May 2026

https://github.com/billy0402/scrapy-tutorial

A learning project from the book 'Scrapy一本就精通' ("Master Scrapy with One Book").

course crawler docker mongodb mysql proxy python redis scrapy splash sqlite ubuntu

Last synced: 13 Apr 2026

https://github.com/amirsorouri00/crawler

Public Python 2 PageRank projects that have been ported to Python 3.

crawler page-rank python

Last synced: 05 Sep 2025

https://github.com/c17an/grade-tracer

👨‍💻 Crawler that tracks grade changes at Korea Aerospace University (항공대) 🏑

concurrently crawler es6 express nodejs nodemon puppeteer react

Last synced: 13 Apr 2026

https://github.com/gabrielolobo/crawley

This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.

crawler poetry python scrapping

Last synced: 22 Jun 2025

https://github.com/ggteixeira/motorcycle-simulator

A toy project that fetches motorcycle prices from OLX and runs some calculations for prospective buyers.

crawler motorcycle olx scraper

Last synced: 28 Feb 2025

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 16 Mar 2025

https://github.com/linjonh/videowebsidesparser

This project parses a video website to remove ads.

crawler parser python

Last synced: 13 Jun 2025

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 11 Mar 2026

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen data for the 2021 Finnish municipal elections (kuntavaalit 2021)

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma data for the 2021 Finnish municipal elections (kuntavaalit 2021)

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia data for the 2021 Finnish municipal elections (kuntavaalit 2021)

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/radityaharya/sitesweeper

Sitesweeper is a Python package that helps automate web scraping, outputting pages to a file.

crawler pdf python website-crawler

Last synced: 27 Mar 2025

https://github.com/basemax/crawler-news-currency-gold-coins

PHP crawler that collects Persian news related to currency, coins, and gold.

crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler

Last synced: 05 Jul 2025

https://github.com/der3318/daily-pixiv

Integrated workflow: LINE notifications of top-ranked Pixiv illustrations

crawler line-notify pixiv workflow

Last synced: 03 Mar 2025

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 09 May 2026

https://github.com/hackthedev/botnet

Tool to find IPs on the web, check SSH availability, and brute-force logins with a wordlist. For educational use only!

botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web

Last synced: 17 Mar 2025

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls the official student clubs (circles) of Ibaraki University from the university's website

crawler python

Last synced: 25 Mar 2025

https://github.com/w3labkr/ipynb-scraper

A collection of frequently used Jupyter notebook code.

crawler ipynb jupyter jupyter-notebook python scrapper

Last synced: 19 Apr 2026

https://github.com/hvtuananh/twitter_crawler

Daemon that fetches tweets from the Twitter public stream API

crawler java streaming-api tweets twitter twitter-crawler

Last synced: 11 Mar 2025

https://github.com/cls1991/gank.io-go

A simple crawler for fetching pictures from http://gank.io, implemented in golang.

crawler gankio goquery pictures

Last synced: 27 Feb 2025

https://github.com/ericc-ch/crawldown

Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.

crawler markdown scraper

Last synced: 05 Jul 2025

https://github.com/matheusfaustino/jazzmaster_crawler

A crawler that fetches the audio episodes of a radio program called Jazzmaster

crawler python scrapy

Last synced: 14 Jun 2025

https://github.com/jenting/compare-drugstore-price

Compare prices across cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 27 Mar 2025

https://github.com/marcosvbras/twitton

A simple Python library that makes the Twitter Search API easy to use

crawler crawling python spider twitter twitter-api

Last synced: 27 Mar 2025

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all links found up to a specified depth.

crawler flask-api flask-restful

Last synced: 02 Apr 2025

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 13 Jun 2026

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 09 Apr 2025

https://github.com/ismoreirakt/spyder

The web is changing. Spyder sees it.

alerts automation crawler monitor

Last synced: 01 Mar 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

Ugly scripts to connect a Raspberry Pi to a VPN and mount a network share so the documents on it can be crawled periodically

crawler samba vpn

Last synced: 11 Mar 2025

https://github.com/codegram01/go-ai-crawl

Go web crawler with AI

ai chromedp crawler golang ollama

Last synced: 16 Apr 2026

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 21 Apr 2026

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 28 Feb 2025

https://github.com/yosh1/mio-crawler

A crawler that retrieves data usage from IIJmio.

crawler iijmio mio ruby

Last synced: 10 May 2026

https://github.com/Arman2409/data-falcon

Web crawler

crawler extract-data

Last synced: 02 Apr 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 09 Jan 2026

https://github.com/yangxuhui/requests-google

A simple Google-related parsing package

crawler google-api parsing

Last synced: 14 Jan 2026

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 06 Jul 2025

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in React.

blockchain crawler ethereum

Last synced: 14 May 2026

https://github.com/loko5ja/seed-gen

Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.

crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman

Last synced: 03 Apr 2025

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 17 May 2026

https://github.com/sssshefer/web-crawler-http

Basic web crawler that maps the linking structure of a website

crawler jest jest-tests js

Last synced: 01 Mar 2025

https://github.com/aweirddev/air-web

A lightweight package for crawling the web with minimal code.

crawl crawler markdown scrape scraper web

Last synced: 25 Jan 2026

https://github.com/allancapistrano/anime-sheets

Crawler that fetches anime information and saves it to a spreadsheet.

anime crawler google-sheets google-sheets-api

Last synced: 16 Mar 2025

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 31 May 2026

https://github.com/roc41d/http-web-crawler

HTTP web crawler with Node.js + TDD

crawler http javascript jest jest-test nodejs webcrawler

Last synced: 13 Apr 2026

https://github.com/moojing/coinmarketcap-crypto-crawler

A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.

crawler cryptocurrency

Last synced: 01 Apr 2025

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 01 Apr 2025

https://github.com/keizerzilla/ssh-hunter

Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).

crawler raspberry-pi ssh

Last synced: 10 Apr 2025

https://github.com/keizerzilla/search4dwango9

My attempt to help solve the DWANGO9 WAD mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 10 Apr 2025

https://github.com/mevljas/gov.si-crawler-playwright

A standalone crawler that crawls only .gov.si web sites using Playwright.

crawler multithreading playwright sqlachemy

Last synced: 19 Jan 2026

https://github.com/allanbian1017/mbpprice

Information on used MacBook Pros

crawler python

Last synced: 14 Jan 2026

https://github.com/mehdieidi/offliner

Offliner is a tool to make a website offline viewable. It's a concurrent web crawler which saves all the pages and static files in a directory.

concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread

Last synced: 14 Jan 2026

https://github.com/heitor57/astronomy-news

🔭📰 Astronomy News

crawler data-science news text-mining

Last synced: 06 Oct 2025

https://github.com/b3j4y/unidisk

A crawler that searches for keywords and compares their scores

comparison crawler nlp solr-client

Last synced: 17 Jan 2026

https://github.com/semoal/pythoncrawler

Python crawler with XML-RPC & BeautifulSoup

beautifulsoup crawler python wordpress xmlrpc

Last synced: 15 Apr 2026

https://github.com/heyihuang826/ncku_course

Efficiently and reliably scrapes course information from National Cheng Kung University on a regular basis (if you choose to store data on OneDrive). The collected data is organized into Excel files and can be uploaded automatically to OneDrive or saved locally (to your personal computer or a GitHub repo).

captcha crawler onedrive

Last synced: 01 Mar 2026

https://github.com/constaf79/pycn

🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.

cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web

Last synced: 14 May 2026

https://github.com/dasantonym/node-cesspoll

💩 Turd Miner Node Module

crawler news poopetry potty-humour

Last synced: 28 Oct 2025

https://github.com/nyarla/net-paranoid-go

(WIP) Paranoid helpers for crawlers that handle untrusted web content

crawler filtering golang helper

Last synced: 14 Jan 2026

https://github.com/btlmd/asahi_nikkei_news_crawler

News crawler for Nihon Keizai Shimbun (Nikkei) and Asahi Shimbun

crawler

Last synced: 07 Oct 2025

https://github.com/greytabby/grawl

Simple web crawler for learning.

crawler

Last synced: 14 Jan 2026

https://github.com/huyduc1602/uniapp-crawler

Crawl and translate the Uni-app documentation

crawler docker python

Last synced: 25 Jan 2026

https://github.com/viko16/hatcher

🐣 [WIP] Provides APIs through simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 08 Oct 2025

https://github.com/romangw/lukki

Completely free code for a webcrawling bot.

crawler python web-scraping web-scraping-python

Last synced: 08 Oct 2025

https://github.com/killianmeersman/wander

Convenient scraping library for Gophers

crawler data-mining golang scraper spider

Last synced: 14 Jan 2026

https://github.com/bernieyangmh/check-link

Checks an entire website and identifies broken links.

checkurl crawler golang

Last synced: 14 Jan 2026