An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/yggverse/pulsarss

RSS Aggregator for Gemini Protocol

aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust

Last synced: 13 Feb 2026

https://github.com/codegram01/go-ai-crawl

Golang Web Crawl with AI

ai chromedp crawler golang ollama

Last synced: 16 Apr 2026

https://github.com/ashwantmanikoth/intellilsearch

This is an AI-powered crawler that can search the web for information based on your input.

crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation

Last synced: 15 Apr 2026

https://github.com/dinofizz/sitemapper

sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.

astradb cassandra concurrency crawler go golang kubernetes nats sitemap

Last synced: 16 Jan 2026
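
The sitemapper entry above describes a depth-limited crawl that emits JSON listing each internal URL with the internal links found on it. A minimal Python sketch of that idea follows. It is not sitemapper's code (that project is written in Go): the names `build_site_map`, `internal_links`, and `fake_fetch`, the `example.test` pages, and the breadth-first strategy are all assumptions made for illustration, and the fetch step is replaced by an in-memory table so the example runs without network access.

```python
import json
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url, html):
    """Return absolute, de-duplicated links that stay on base_url's host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urldefrag(urljoin(base_url, href)).url
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def build_site_map(start_url, fetch, max_depth):
    """Breadth-first crawl; pages deeper than max_depth are not visited."""
    site_map = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in site_map:
            continue
        links = internal_links(url, fetch(url))
        site_map[url] = links
        if depth < max_depth:
            for link in links:
                if link not in site_map:
                    queue.append((link, depth + 1))
    return site_map


# Hypothetical in-memory site standing in for real HTTP responses.
PAGES = {
    "https://example.test/": '<a href="/a">A</a> <a href="https://other.test/">ext</a>',
    "https://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "https://example.test/b": '<a href="/c">C</a>',
}


def fake_fetch(url):
    """Return the stored HTML for url, or an empty page if unknown."""
    return PAGES.get(url, "")


if __name__ == "__main__":
    site_map = build_site_map("https://example.test/", fake_fetch, max_depth=1)
    print(json.dumps(site_map, indent=2))
```

Passing the fetch function as a parameter keeps the traversal separate from I/O, so a real HTTP client or a concurrent worker pool could be swapped in without changing the map-building logic.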

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshots of websites based on headless Chrome (Puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 21 Apr 2026

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 28 Feb 2025

https://github.com/chamzzzzzz/supersimplesoup

A Go package that implements a super simple soup-like DOM API

beatifulsoup crawler crawler-go dom go golang html-parser

Last synced: 28 Jan 2026

https://github.com/shivamsaraswat/webxcrawler

WebXCrawler is a fast static crawler to crawl a website and get all the links.

crawler crawling python scraping webcrawler webxcrawler

Last synced: 13 Feb 2026

https://github.com/luminovrym/crawler-tools-js

Crawler Tools Js is an application used for scraping data from a website

crawler crawler-js data js web-scraping

Last synced: 08 Sep 2025

https://github.com/gn00678465/crawler

A Python CLI tool that uses the Firecrawl API for web crawling, with support for multiple output formats.

crawler pythone

Last synced: 06 Feb 2026

https://github.com/rsheremeta/web-crawler

A tiny web crawler that looks for links, extracts them, and prints them concurrently to the terminal output

crawler go golang web-crawler webcrawler

Last synced: 12 Jun 2026

https://github.com/danielfillol/ab2l_crawler

Crawler for AB2L radar

brazil crawler lawtech legaltech

Last synced: 28 Jan 2026

https://github.com/dubniczky/bad-robot

This is a python crawler that disregards robots.txt rules and downloads disallowed resources

crawler osint-python osint-tool python robots-txt

Last synced: 31 Mar 2025

https://github.com/zenixls2/2chpreprocess

Dump messages from 2ch with some preprocessing for ML analysis

2ch crawler python

Last synced: 26 Mar 2025

https://github.com/dubniczky/webmap

Website mapping crawler implemented in python

crawler mapping mapping-tools package python scraping security

Last synced: 31 Mar 2025

https://github.com/Arman2409/data-falcon

Web crawler

crawler extract-data

Last synced: 02 Apr 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 09 Jan 2026

https://github.com/yangxuhui/requests-google

A simple Google-related parsing package

crawler google-api parsing

Last synced: 14 Jan 2026

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 06 Jul 2025

https://github.com/evangelos-karavas/arduino-crawler-line-follower-obstacle-avoidance

Crawler robot that follows a black line while avoiding obstacles in its way. Assignment for Mechatronics

arduino-uno autonomous-vehicles cpp crawler infrared-sensors mechatronics path-planning robotics

Last synced: 28 Apr 2026

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in ReactJs.

blockchain crawler ethereum

Last synced: 14 May 2026

https://github.com/loko5ja/seed-gen

Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.

crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman

Last synced: 03 Apr 2025

https://github.com/pjt3591oo/spider-base_crawler

Building a crawler based on Scrapy

crawler python scrapy spider

Last synced: 16 May 2025

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 17 May 2026

https://github.com/russellsteadman/netscrape

A Node.js framework for creating good bots

bot crawler crawling exclusion rfc9309 scraper scraping web-scraping

Last synced: 20 Jun 2026

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 01 Mar 2025

https://github.com/sedrubal/webcrawler

Crawl sites and search for security issues.

crawler script security website-auditing

Last synced: 17 Mar 2025

https://github.com/tisfeng/bing-dict

A Bing command-line dictionary that obtains query results from Bing Dictionary with a crawler.

bing-dictionary command-line crawler nodejs

Last synced: 13 May 2026

https://github.com/basemax/okala-store-ids

A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.

crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala

Last synced: 10 Jun 2025

https://github.com/suconghou/sitemap

a simple sitemap generator and page crawler

crawler html-parser nim-lang scraper sitemap spiders

Last synced: 15 May 2026

https://github.com/tatamiya/gas-new-books-crawler

Crawls new book information from 版元ドットコム (https://www.hanmoto.com/)

crawler gas

Last synced: 30 Oct 2025

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 18 Jun 2026

https://github.com/fusetim/bitcrawler

Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also be a BitTorrent DHT crawler one day?

bittorrent crawler dht

Last synced: 30 Mar 2025

https://github.com/surister/scrupy

Python library for creating web crawlers that aims to be powerful yet simple.

crawler crawling-framework crawling-python http library python scraping

Last synced: 15 May 2026

https://github.com/jplitza/urlsearch

Index typical webserver directory listings and then search for arbitrary terms

crawler search

Last synced: 17 Mar 2025

https://github.com/allancapistrano/anime-sheets

Crawler that fetches anime information and saves it to a spreadsheet.

anime crawler google-sheets google-sheets-api

Last synced: 16 Mar 2025

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 31 May 2026

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 02 Jul 2025

https://github.com/roc41d/http-web-crawler

HTTP web crawler with Node.js + TDD

crawler http javascript jest jest-test nodejs webcrawler

Last synced: 13 Apr 2026

https://github.com/jpleorx/tagblender

A simple java API to retrieve hashtags from https://www.tagblender.net/

api crawler hashtags java jsoup parser

Last synced: 20 Mar 2025

https://github.com/moojing/coinmarketcap-crypto-crawler

A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.

crawler cryptocurrency

Last synced: 01 Apr 2025

https://github.com/yokoyang/baidu-crawler

tieba_crawler

crawler

Last synced: 16 Jun 2025

https://github.com/madret/selenium_crawler

Selenium Webcrawler based on the chromedriver.

chromedriver crawler human-like selenium selenium-webdriver webcrawler

Last synced: 15 Apr 2026

https://github.com/izh318/genie-music-artist-album-crawler

A Python script that crawls, in one pass, the album information of a specific artist registered on Genie Music.

crawler genie genie-music gui

Last synced: 08 Nov 2025

https://github.com/miiraak/scrapc

C# WinForms - crawler and scraper for web content

crawler csharp html scraper url web windows-forms

Last synced: 29 Jan 2026

https://github.com/edumucelli/rubybikes

A set of Bike Sharing System parsers in Ruby

bike-sharing crawler ruby

Last synced: 12 Apr 2025

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 15 Mar 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Apr 2026

https://github.com/anshiii/pixder

🤔 A spider for pixiv.net

crawler pixiv spider

Last synced: 09 Aug 2025

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 01 Apr 2025

https://github.com/leonardopinho/instagramfeed

Image list based on a tag for the Instagram feed.

crawler instagram python

Last synced: 28 Mar 2025

https://github.com/xprnvd/makdi

Website crawler created for pentest exercises like HTB.

crawler htb htb-scripts pentest python

Last synced: 20 Jul 2025

https://github.com/fscotto/noahcrawler

A simple web crawler written in Java to support a database of Italian regions.

crawler java jsoup-library

Last synced: 14 Sep 2025

https://github.com/keizerzilla/ssh-hunter

Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).

crawler raspberry-pi ssh

Last synced: 10 Apr 2025

https://github.com/keizerzilla/search4dwango9

My attempt to help solve the DWANGO9 WAD mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 10 Apr 2025

https://github.com/dyslab/otglite

Online TXT Grabee Lite Edition :bee:

crawler expressjs jquery nodejs sqlite3

Last synced: 09 Apr 2026

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 30 Jul 2025

https://github.com/tech-espm/misc-webbot

This project aims to create personal assistants that reply to messages about specific issues.

classification-model crawler nlp

Last synced: 12 Jun 2026

https://github.com/waived/pastebin-ripper

Scrape all pastes from pastebin page + sub-pages

crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper

Last synced: 24 Jun 2025

https://github.com/apurvsikka/mediaverse

MediaVerse is a versatile search engine for various media types such as anime, books and drama

anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv

Last synced: 29 Mar 2025

https://github.com/snwfdhmp/3gm-bot

Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 30 Oct 2025

https://github.com/alonecandies/golwarc

All-in-One crawlers for Golang

crawler crawling go golang scraper scraping

Last synced: 12 Jan 2026

https://github.com/sevenecks/web-crawler

crawl a website, find pages, find links, find relationships between them and report on 404 and other errors

404 checker crawler site web

Last synced: 21 Jun 2025

https://github.com/mnoalett/cscrawler

BSc degree thesis - crawler for www.couchsurfing.org

bsc-thesis couchsurfing crawler data-analysis database python

Last synced: 02 May 2026

https://github.com/rafaelmoraes003/tech-news

Analysis and manipulation of news data from a technology website obtained through data scraping using Python.

crawler data-scraping https mongodb parsel pymongo python web-scraping

Last synced: 05 May 2026

https://github.com/laffrex/xiaolanben_crawler

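An efficient, stable data collection tool for the Xiaolanben (小蓝本) website. It automatically extracts company and group information such as products, media, and shareholders, and supports intelligent popup handling and automated data classification. Its ultimate purpose is to make SRC information gathering easier.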

crawler pandas selenium src

Last synced: 23 Mar 2025

https://github.com/gustavooferreira/wcrawler

Simple Web Crawler CLI tool with "minimal" dependencies

cli crawler golang graph html links web

Last synced: 31 Jan 2026

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 21 Mar 2025

https://github.com/andrepradika/scrape-medrecruit.medworld.com

🛠 A Playwright-based web scraper that extracts job listings from MedRecruit, including job title, department, location, job type, duration, and job URL, saving the data to an Excel file.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/nextlevelshit/adonis-crawler

A free web crawler on top of the incredible AdonisJS Framework

adonisjs crawler javascript nodejs regex spider websocket

Last synced: 22 May 2026

https://github.com/murilobsd/icrop-csv

Icrop-csv automates the process of downloading reports.

crawler csv-export python3

Last synced: 17 Nov 2025

https://github.com/andrepradika/scrape-xpel.com

📌 A Playwright-based web scraper that extracts installer details from XPEL’s Installer Locator and saves them to CSV and Excel files.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/allanbian1017/mbpprice

Second-hand MacBook Pro information

crawler python

Last synced: 14 Jan 2026

https://github.com/jauharibill/animeindo-crawler

This crawler is for research use only. The creator takes no responsibility for any harmful usage.

crawler python3 scrapy

Last synced: 08 Jul 2025

https://github.com/alphabs/navercafeclient

.NET library for crawling Naver Cafe post lists

crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping

Last synced: 06 May 2026

https://github.com/mehdieidi/offliner

Offliner is a tool to make a website offline viewable. It's a concurrent web crawler which saves all the pages and static files in a directory.

concurrency concurrent concurrent-programming crawler go golang goroutine multiprocessing multithreading process scraper thread

Last synced: 14 Jan 2026

https://github.com/tjdsneto/jcnet-crawler

Extract (scrape) movie schedule info from JCNet movies page

crawler scraping

Last synced: 11 Apr 2026

https://github.com/heitor57/astronomy-news

:telescope::newspaper: Astronomy News

crawler data-science news text-mining

Last synced: 06 Oct 2025

https://github.com/pixlcrashr/stwhh-mensa

Better STWHH Mensa menu data / interface / notifier

api crawler data food studierendenwerk-hamburg university website

Last synced: 07 Aug 2025

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 27 Jun 2025

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 10 May 2026

https://github.com/b3j4y/unidisk

A Crawler to search for keywords and compare the score

comparison crawler nlp solr-client

Last synced: 17 Jan 2026

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 07 Aug 2025

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/gxjansen/website-to-pdf

Creates a PDF based on the content of a website/subdomain

claude-3-sonnet crawler python3

Last synced: 30 Mar 2025

https://github.com/semoal/pythoncrawler

Python crawler with XML-RPC and BeautifulSoup

beautifulsoup crawler python wordpress xmlrpc

Last synced: 15 Apr 2026

https://github.com/vaenow/crawler-chromeless

A chromeless crawler for coursera

chromeless coursera crawler puppeteer

Last synced: 18 May 2026