An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/sachin21/dmm-crawler

Fetch DMM.R18 data with a crawler. All dojin and eroge art is now crawlable.

crawler dmm dojin doujin gem ruby

Last synced: 12 Sep 2025

https://github.com/juan-kabbali/glassdoor-linkedin-web-scrapper

CLI application that acts as a web scraper to retrieve Glassdoor and LinkedIn information

crawler webscraping

Last synced: 29 Jan 2026

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 10 Jul 2025

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 10 Jul 2025

https://github.com/zituocn/ziva

A golang crawler framework

crawler go golang

Last synced: 18 Jan 2026

https://github.com/turtiesocks/zendriver-rs

Async-first, undetectable browser automation in Rust via the Chrome DevTools Protocol. Stealth-by-default port of zendriver — no WebDriver, no JS shim.

anti-detection async automation bot browser-automation cdp chrome-devtools-protocol chromium cloudflare-bypass crawler headless-chrome playwright-alternative rust scraping stealth tokio undetectable-chromedriver web-scraping web-testing zendriver

Last synced: 13 Jun 2026

https://github.com/scrwdrv/siege-crawler

This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs, until the server crashes (or the benchmark ends). It is used to test the maximum capacity of a server or to find glitches that users might encounter.

benchmark cli crawler ddos debug siege tool

Last synced: 05 Apr 2025

https://github.com/khilnani/spidey.py

Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.

cli crawler python scaper web-spider

Last synced: 25 Mar 2025

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher, retries request until completed, provides user agent🚀🚀

crawler http proxy rails ruby

Last synced: 07 May 2026

https://github.com/noarche/darknoisy

Same as my Noisy, but on the Tor network. Logs links. Crawls onion sites.

crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks

Last synced: 08 Sep 2025

https://github.com/hamidrabedi/digikala-crawler

A crawler for Digikala built with the Django framework, Selenium, and a REST API. Also scrapes data from the gathered URLs.

crawler digikala digikala-crawler django python scraper

Last synced: 16 May 2026

https://github.com/arshadkazmi42/gh-crawl

Crawler for GitHub repositories. Finds all the broken links in the repositories

bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python

Last synced: 20 Jan 2026

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 28 Mar 2025

https://github.com/jimmy-ly00/dhe-prime-grabber

Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.

certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3

Last synced: 26 Dec 2025

https://github.com/sreejoy/crawlerfriend

A lightweight crawler that returns search results in HTML form or in dictionary form, given URLs and keywords.

crawler python-crawler python-scraper python27 scrapper

Last synced: 12 Jun 2025

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler based on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 14 Feb 2026

https://github.com/godbout/htmlpagedom

jQuery-inspired DOM manipulation extension for Symfony's Crawler

crawler dom html htmlpagedom php symfony

Last synced: 14 Jan 2026

https://github.com/exp-codes/pyzone-crawler

Qzone (QQ空间) crawler (Python version)

crawler programming

Last synced: 03 Apr 2025

https://github.com/greatdrake/contributecounter

Crawl Wikipedia for contributors

crawler python scraping

Last synced: 02 Apr 2025

https://github.com/victorpre/erlich

Erlich Bachman - Hacker Hostel

chatbot crawler elixir housing umbrella

Last synced: 28 Mar 2025

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 09 Apr 2025

https://github.com/basemax/rondircrawler

A crawler for extracting a list of top sim cards and tel numbers from the Rond.ir website. (PHP)

crawle-php crawler crawler-testing crawlers crawlers-php php php-crawler rondir

Last synced: 03 Apr 2025

https://github.com/abdus/scrape-web

A simple web scraper for Node.js

crawler web-scraping web-scrapper

Last synced: 25 Mar 2025

https://github.com/developerjosh/gogo-crawler

The tool kit for making an anime website with a database full of anime

crawler crawler-js gogoanime gogoanime-api gogoanime-scraper

Last synced: 07 Aug 2025

https://github.com/thiagopanini/datadelivery

An open source Terraform module that provides a complete infrastructure toolkit so that users can start exploring Analytics services on AWS.

analytics athena aws catalog crawler data datamesh glue s3 terraform

Last synced: 29 Nov 2025

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 27 Mar 2025

https://github.com/baerwang/sec_craw

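A crawler that makes it easy for security researchers to get daily security digests. Current sources include 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and lab blogs; updates are ongoing.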

crawler security security-tools threat threat-intelligence

Last synced: 04 Jul 2025

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawls Wikipedia with Python, powered by BeautifulSoup4. Supports GUI and CUI.

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 05 Sep 2025

https://github.com/phanikmr/linkcrawler

LinkCrawler is a Python module that takes a URL (e.g. http://python.org), fetches the corresponding web page, and parses all the links on that page into a repository of links. It then fetches the content of any URL in that repository, parses the links from the new content into the repository, and repeats this for all links in the repository until it is stopped or a given number of links have been fetched.

async crawler linkcrawler parse python scrapy spider

Last synced: 07 Feb 2026
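
The crawl loop described in the linkcrawler entry above (fetch a page, parse its links into a repository, repeat until a link limit is reached) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not linkcrawler's actual code or API; the names `LinkParser`, `fetch_html`, `crawl`, and `max_pages` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher (hypothetical helper): download a page as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, fetch=fetch_html):
    """Breadth-first crawl; returns the URLs fetched, in fetch order."""
    queue = deque([start_url])  # the "repository" of links still to fetch
    seen = {start_url}          # every URL ever queued, to avoid repeats
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be fetched
        fetched.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched
```

Calling `crawl("http://python.org", max_pages=5)` would fetch at most five pages. The `fetch` parameter is there so the loop can be exercised without network access; a real crawler would also want to respect robots.txt and rate limits.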

https://github.com/brianmacintosh/wikicrawler

Sandbox project for manipulating Wikimedia wikis

c-sharp crawler mediawiki-bot wikipedia-bot

Last synced: 11 Jul 2025

https://github.com/seanowenhayes/recipe-scraper

A simple scraper that uses Puppeteer to scrape recipes and more from the web

crawler crawling data recipes scraping

Last synced: 22 Feb 2026

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 04 May 2026

https://github.com/curegit/nominium

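A crawler system that sends notifications, by email and other means, about newly listed items on consumer-to-consumer marketplace sites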

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 12 Mar 2025

https://github.com/appliedsoul/crawlmatic

Static and dynamic website crawling library: a common promise-based wrapper around the node-crawler and hccrawler libraries.

crawler scraper

Last synced: 24 Jul 2025

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 08 Apr 2025

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, crawler backend in rust

backend-server crawler data-analysis rust

Last synced: 05 Feb 2026

https://github.com/roswelly/solana-transaction-crawler

Crawl and parse Solana transactions

crawler parser rust solana transaction

Last synced: 15 May 2026

https://github.com/javokhirbek1999/tez-spider

Distributed music scraper built in Go

concurrent crawler distributed-systems music-scraper

Last synced: 17 Jan 2026

https://github.com/injectrl/xhspicextractor

A tool for extracting original images from Xiaohongshu (小红书)

crawler dotnet7 minimalapi okteto xiaohongshu

Last synced: 20 Jun 2026

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 15 Feb 2026

https://github.com/wondervictor/spiderman

2017 Software Course Project

crawler distribute-crawler zhihu-crawler

Last synced: 21 Apr 2026

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 09 Apr 2025

https://github.com/anyparser/anyparser_core

Anyparser Python SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.

cache-augmented-generation crawler crewai etl-framework etl-pipeline knowledge-graph knowledgebase langchain langgraph llamaindex ms-office n8n ocr openai pdf python rag retrieval-augmented-generation search-engine typescript

Last synced: 05 Oct 2025

https://github.com/thiiagoms/car-stealth

REST API for all cars that were stolen

api cars crawler student

Last synced: 16 Jun 2025

https://github.com/raphaelm22/crawling

A set of crawlers that look for something on the internet and send a notification when they succeed.

caesb crawler growth-suplements gsuplementos

Last synced: 06 Mar 2026

https://github.com/filsuin/linkedin-crawler

A Python tool for automating job searches on LinkedIn based on user-defined keywords.

crawler crawler-python linkedin offer

Last synced: 16 Jun 2025

https://github.com/lucaaszsx/spyder

A powerful schema-based web scraping library for Node.js built for fast, structured, and reliable data extraction.

cheerio crawler data dom dom-manipulation html json json-ld parser scraper web xml

Last synced: 11 Jun 2026

https://github.com/pnguyen215/instagram-crawler

Instagram Crawler is a Python script to download posts from a specified Instagram account.

crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler

Last synced: 12 Jun 2026

https://github.com/pythoript/pgn-scraper

PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.

7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip

Last synced: 16 Mar 2025

https://github.com/mmqnym/pyppeteer-use-case

Shows how to crawl the web with pyppeteer

crawl crawler pyppeteer python

Last synced: 24 Dec 2025

https://github.com/marcinrek/sauron

Basic page crawler written in Node.js

crawler json node-js nodejs requests

Last synced: 28 Apr 2025

https://github.com/supratikchatterjee16/serp_bot

A generic SERP bot that can be used with just about any search engine.

bot crawler python requests scraping search serp user-agent-spoofer

Last synced: 14 Dec 2025

https://github.com/dimitar0528/crawlitics

An AI-powered Next.js and Python-based ecommerce web crawler, scraper and data-analyst platform that transforms scattered product data into clear market insights.

crawler nextjs product-analysis python scraper

Last synced: 08 Sep 2025

https://github.com/fa7ad/aiub-notes-dl

Download all notes from AIUB's portal

aiub beautifulsoup4 crawler

Last synced: 12 Mar 2025

https://github.com/beomi/pycon2017

PyCon 2017 talk materials: <Web Crawlers from the Ground Up>

crawler pyconkr python

Last synced: 10 Jan 2026

https://github.com/oglinuk/goccer

Go Concurrent Crawler Library

concurrency crawler go library

Last synced: 06 Jul 2025

https://github.com/ambersun1234/lotto_crawler

web crawler for fetching Taiwan lottery history data

crawler python3

Last synced: 15 Jun 2025

https://github.com/moontai0724/auto-notify-pu-courses-quota

A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.

crawler javascript nodejs

Last synced: 15 May 2026

https://github.com/kbychkov/simplecrawler-app

The GUI for Simplecrawler

crawler simplecrawler spider

Last synced: 12 Jun 2025

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 16 Apr 2026

https://github.com/andmerk93/scrapy_parser_pep

A learning project built on Scrapy; parses PEPs and outputs the results in two formats

crawler scrapy

Last synced: 17 Mar 2025

https://github.com/dangdungcntt/crawl-fb-v2

Simple script to detect emails and phone numbers in Facebook comments.

crawler facebook

Last synced: 26 Apr 2026

https://github.com/weaming/simple-crawler

my simple crawler

crawler

Last synced: 13 Jun 2025

https://github.com/raphaelalmeidamartins/python-tech-news

Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course

crawler crawler-python data-science pytest python

Last synced: 22 May 2026

https://github.com/maxgio92/package-crawler

A package crawler for the best-known Linux distros

crawler go linux package

Last synced: 20 Apr 2026

https://github.com/greycloudss/greave

Greave is a fast, multi-mode scanner for locating sensitive information in both local filesystems and Confluence pages.

armourer confluence crawler python reconnaissance security

Last synced: 07 Oct 2025

https://github.com/yowenter/career-roadmap

Oh, how I hate this living death which has swallowed all my teens, if I am cursed with any, will be worn away!

career crawler findjob job-crawler roadmap search-engine

Last synced: 10 Apr 2025

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 08 Oct 2025

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 14 Apr 2026

https://github.com/zabuzard/wslotter

WSlotter is a Selenium driven tool for assigning to events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 10 Oct 2025

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 27 Mar 2025

https://github.com/rflcnunes/crawler_email_py

In this project I'm creating a web crawler to check email boxes and handle incoming messages.

aws-bucket aws-bucket-s3 aws-s3 crawler crawler-python email python rabbitmq

Last synced: 10 Aug 2025

https://github.com/mdazlaanzubair/amazon-scraper-api

A web scraper that crawls Amazon to extract product information and return it in JSON format.

amazon crawler expressjs json-api nodejs webscraping

Last synced: 14 Apr 2026

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 19 Jan 2026

https://github.com/wangzekaihhhh/f2_web_app

A Douyin data collection and backup tool for Feiniu fnOS, with a web management interface and FPK packaging support.

crawler douyin fnos nas python

Last synced: 13 Mar 2026

https://github.com/afuntw/misc-crawler

Some small crawlers for specific websites

crawler

Last synced: 14 Oct 2025

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 15 May 2026

https://github.com/elky84/stock-crawler

Naver Stock Crawler & Mock Invest

asp-net asp-net-core crawler csharp dotnet

Last synced: 18 Apr 2026

https://github.com/dean9703111/humandesign_nodejs

Uses a Node.js crawler to scrape the information from a Human Design web page, then saves it to a cloud-hosted Google spreadsheet

crawler googlesheetapi googlesheets nodejs

Last synced: 15 May 2026

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and historical numbers of Swisslos

crawler euromillions lotto rust swisslos

Last synced: 22 Mar 2025

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 14 Apr 2026

https://github.com/bujosa/aldebaran

Example of using App Engine with Python 3, ThreadPool, and web scraping

appengine crawler flask gcp python3 thread-pool

Last synced: 19 Oct 2025

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 20 Oct 2025

https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper

Crawls exclusive video games released for the platforms specified on GameFAQs website to generate a table in Markdown format with the crawled titles.

console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox

Last synced: 09 May 2026

https://github.com/kgruiz/stealth-crawler

Asynchronous headless-Chrome web crawler that discovers internal links and optionally saves HTML, Markdown, screenshots, or PDFs. Built for scripting, inspection, and automation.

asyncio cli crawler headless-chrome html-scraper pydoll python web-crawler

Last synced: 25 Oct 2025

https://github.com/bigmeech/mangaka

Crawl scanlation websites for manga pages

comic crawler manga scanlation webtoon

Last synced: 23 Jan 2026

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 19 Apr 2026

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 14 May 2026

https://github.com/68publishers/crawler-client-php

🕸️ PHP Client for https://github.com/68publishers/crawler

crawler crawling php scraper scraping

Last synced: 23 Jan 2026