An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/salman0ansari/sitefetch

Fetch a site and extract its readable content as Markdown (to be used with AI models).

ai chatgpt crawler fetcher golang scraping

Last synced: 19 Aug 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some simple information about Aparat videos for data scientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 17 May 2026

https://github.com/rogerluo410/gcrawler

Google search crawler written in Ruby. Crawls the text and URL of each result link for given keywords on Google.com.

crawler crawling google ruby

Last synced: 22 Jun 2026

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 08 May 2026

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 14 Mar 2025

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples using OOP concepts, plus a Laravel project

crawler laravel php

Last synced: 25 Apr 2026

https://github.com/jiamingla/mvdis_i18n

Multilingual-friendly version of the motorcycle driver's license exam booking site (unofficial)

crawler jquery koa koajs nodejs supertest

Last synced: 04 Jan 2026

https://github.com/zephyrpersonal/github-trending-crawler

Transforms GitHub trending repositories into JSON data

cheerio crawler fetch github node repository spider trending

Last synced: 04 Jan 2026

https://github.com/rogerchappel/crawldeck

Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.

agent-tools cli crawler local-first queue typescript

Last synced: 26 May 2026

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 11 Jun 2026

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 26 May 2026

https://github.com/buttermiilk/sentakusha

simple (and badly written express.js) crawler for the washing machine game.

api crawler imagegeneration maimai

Last synced: 07 Apr 2025

https://github.com/xcrypt0r/xcrawler

✂️ A crawling example for maplestory with various languages using multi-threading

crawler crawling multithreading parsing regexp

Last synced: 14 Jun 2025

https://github.com/mazzasaverio/lean-jobs-crawler

(Let's build) A lean, high-performance web crawler specializing in job posting extraction directly from company websites. Uses LLM for intelligent URL discovery and data extraction.

crawler docker llm logfire neon openai python uv

Last synced: 15 Mar 2025

https://github.com/konradlinkowski/mailcrawler

Crawler that finds email addresses on websites

crawler scraper

Last synced: 05 Jan 2026

https://github.com/danoctavian/proxy-master

manage a set of http proxies

crawler http-proxy node-proxy-server

Last synced: 27 May 2026

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 27 May 2026

https://github.com/hedon954/go-crawler

A crawler system implemented in Go.

crawler go

Last synced: 15 Mar 2025

https://github.com/taurusolson/jobscraper

I am looking for a developer position in France

crawler

Last synced: 23 Jun 2025

https://github.com/fnkr/gocrawl

Simple web crawler.

crawler http-client

Last synced: 23 Mar 2025

https://github.com/trixsec/zeuscrawler

The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.

crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper

Last synced: 07 Apr 2025

https://github.com/fiandev/otaku-crawler

simple way to scrape and collect anime list from otakudesu

anime bun crawler nodejs scraper

Last synced: 08 May 2026

https://github.com/bac0id/wayback-machine-auto-save

A crawler that saves a list of web pages to the Internet Archive's Wayback Machine via Save Page Now.

crawler internet-archive python save-page-now wayback-machine

Last synced: 28 May 2026

https://github.com/mkfsn/chronos

A light cron-like container service - create cron job easily.

crawler cron cronjob golang

Last synced: 20 Jul 2025

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 15 May 2026

https://github.com/vinzdef/apartment-crawler

Crawler for housing in the Netherlands. It scrapes FB groups and Kamernet listings

amsterdam crawler fb-groups housing phantomjs

Last synced: 31 Mar 2025

https://github.com/tanja-4732/od-get

A Rust tool for recursively crawling & downloading data from open directories

cli crawler open-directory open-directory-downloader rust

Last synced: 26 May 2026

https://github.com/bingxyz/blackcat

Look up Black Cat logistics (黑貓物流) deliveries using a Telegram bot

crawler nodejs telegram-bot

Last synced: 21 May 2026

https://github.com/fenying/huaban-crawler

A board-pins crawler for huaban.com, based on Node.js

crawler huaban

Last synced: 02 Jul 2025

https://github.com/geoffreybauduin/website-checker

Performs useful checks against a website, such as 404 error reporting, structured data validation...

crawler seo structured-data web-spider website

Last synced: 19 Apr 2025

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 24 Mar 2025

https://github.com/henkman/crawlers

:squirrel: some crawlers and downloaders

crawler

Last synced: 28 May 2026

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 27 Feb 2025

https://github.com/vindecodex/automated-crawler-wget

Using wget to crawl site

crawler shell-script

Last synced: 03 Sep 2025

https://github.com/juan-kabbali/glassdoor-linkedin-web-scrapper

CLI application that acts as a web scraper to retrieve Glassdoor and LinkedIn information

crawler webscraping

Last synced: 29 Jan 2026

https://github.com/zituocn/ziva

A golang crawler framework

crawler go golang

Last synced: 18 Jan 2026

https://github.com/khilnani/spidey.py

Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.

cli crawler python scaper web-spider

Last synced: 25 Mar 2025

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher, retries request until completed, provides user agent🚀🚀

crawler http proxy rails ruby

Last synced: 07 May 2026

https://github.com/noarche/darknoisy

Same as my Noisy but on TOR network. Logs links. Crawls onion sites.

crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks

Last synced: 08 Sep 2025

https://github.com/jimmy-ly00/dhe-prime-grabber

Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.

certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3

Last synced: 26 Dec 2025

https://github.com/sreejoy/crawlerfriend

A lightweight crawler that returns search results as HTML or as a dictionary, given URLs and keywords.

crawler python-crawler python-scraper python27 scrapper

Last synced: 12 Jun 2025

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler based on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 14 Feb 2026

https://github.com/godbout/htmlpagedom

jQuery-inspired DOM manipulation extension for Symfony's Crawler

crawler dom html htmlpagedom php symfony

Last synced: 14 Jan 2026

https://github.com/greatdrake/contributecounter

Crawl Wikipedia for contributors

crawler python scraping

Last synced: 02 Apr 2025

https://github.com/victorpre/erlich

Erlich Bachman - Hacker Hostel

chatbot crawler elixir housing umbrella

Last synced: 28 Mar 2025

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 09 Apr 2025

https://github.com/basemax/rondircrawler

A crawler for extracting a list of top sim cards and tel numbers from the Rond.ir website. (PHP)

crawle-php crawler crawler-testing crawlers crawlers-php php php-crawler rondir

Last synced: 03 Apr 2025

https://github.com/abdus/scrape-web

A simple web scraper for Node.js

crawler web-scraping web-scrapper

Last synced: 25 Mar 2025

https://github.com/developerjosh/gogo-crawler

The tool kit for making an anime website with a database full of anime

crawler crawler-js gogoanime gogoanime-api gogoanime-scraper

Last synced: 07 Aug 2025

https://github.com/thiagopanini/datadelivery

An open source Terraform module that provides a complete infrastructure toolkit so that users can begin exploring Analytics services on AWS.

analytics athena aws catalog crawler data datamesh glue s3 terraform

Last synced: 29 Nov 2025

https://github.com/baerwang/sec_craw

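A crawler that helps security researchers collect daily security reports. It currently covers 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and other lab blogs, and is updated continuously.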

crawler security security-tools threat threat-intelligence

Last synced: 04 Jul 2025

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawls Wikipedia with Python and BeautifulSoup4; supports both GUI and CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 05 Sep 2025

https://github.com/phanikmr/linkcrawler

LinkCrawler is a Python module that takes a URL on the web (e.g. http://python.org), fetches the web page at that URL, and parses all the links on that page into a repository of links. Next, it fetches the contents of any URL from the repository just created, parses the links from this new content into the repository, and continues this process for all links in the repository until stopped or until a given number of links have been fetched.

async crawler linkcrawler parse python scrapy spider

Last synced: 07 Feb 2026
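
The fetch, parse, and repeat loop in the LinkCrawler description above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: the names `LinkParser`, `crawl`, `fetch`, and `max_pages` are hypothetical, and `fetch` is injected so the sketch runs against an in-memory set of pages instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Fetch pages breadth-first from start_url and return the link repository.

    `fetch` is a hypothetical callable that maps a URL to its HTML text,
    or to None when the page is unavailable.
    """
    repository = [start_url]  # every link discovered, in discovery order
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                repository.append(absolute)
                queue.append(absolute)
    return repository


# Made-up in-memory pages standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/c">C</a>',
}
links = crawl("http://example.test/", PAGES.get, max_pages=2)
```

Passing a real HTTP client as `fetch` would turn the sketch into a network crawler; the `max_pages` cap plays the role of the "given number of links" stop condition.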

https://github.com/brianmacintosh/wikicrawler

Sandbox project for manipulating Wikimedia wikis

c-sharp crawler mediawiki-bot wikipedia-bot

Last synced: 11 Jul 2025

https://github.com/seanowenhayes/recipe-scraper

A simple scraper that uses Puppeteer to scrape recipes and more from the web

crawler crawling data recipes scraping

Last synced: 22 Feb 2026

https://github.com/appliedsoul/crawlmatic

Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.

crawler scraper

Last synced: 24 Jul 2025

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 08 Apr 2025

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, crawler backend in rust

backend-server crawler data-analysis rust

Last synced: 05 Feb 2026

https://github.com/javokhirbek1999/tez-spider

Distributed music scraper built in Go

concurrent crawler distributed-systems music-scraper

Last synced: 17 Jan 2026

https://github.com/injectrl/xhspicextractor

A tool for extracting original images from Xiaohongshu

crawler dotnet7 minimalapi okteto xiaohongshu

Last synced: 20 Jun 2026

https://github.com/wondervictor/spiderman

2017 Software Course Project

crawler distribute-crawler zhihu-crawler

Last synced: 21 Apr 2026

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 09 Apr 2025

https://github.com/anyparser/anyparser_core

Anyparser Python SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.

cache-augmented-generation crawler crewai etl-framework etl-pipeline knowledge-graph knowledgebase langchain langgraph llamaindex ms-office n8n ocr openai pdf python rag retrieval-augmented-generation search-engine typescript

Last synced: 05 Oct 2025

https://github.com/pnguyen215/instagram-crawler

Instagram Crawler is a Python script to download posts from a specified Instagram account.

crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler

Last synced: 12 Jun 2026

https://github.com/pythoript/pgn-scraper

PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.

7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip

Last synced: 16 Mar 2025

https://github.com/mmqnym/pyppeteer-use-case

Shows how to crawl the web with pyppeteer

crawl crawler pyppeteer python

Last synced: 24 Dec 2025

https://github.com/marcinrek/sauron

Basic page crawler written in Node.js

crawler json node-js nodejs requests

Last synced: 28 Apr 2025

https://github.com/beomi/pycon2017

PyCon 2017 presentation materials: <Web Crawlers from the Ground Up>

crawler pyconkr python

Last synced: 10 Jan 2026

https://github.com/oglinuk/goccer

Go Concurrent Crawler Library

concurrency crawler go library

Last synced: 06 Jul 2025

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 16 Apr 2026

https://github.com/andmerk93/scrapy_parser_pep

A Scrapy learning project that parses PEPs and outputs the results in two formats

crawler scrapy

Last synced: 17 Mar 2025

https://github.com/dangdungcntt/crawl-fb-v2

Simple script to detect email addresses and phone numbers in Facebook comments.

crawler facebook

Last synced: 26 Apr 2026

https://github.com/greycloudss/greave

Greave is a fast, multi-mode scanner for locating sensitive information in both local filesystems and Confluence pages.

armourer confluence crawler python reconnaissance security

Last synced: 07 Oct 2025

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 08 Oct 2025

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 14 Apr 2026

https://github.com/zabuzard/wslotter

WSlotter is a Selenium-driven tool for signing up for events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 10 Oct 2025

https://github.com/mdazlaanzubair/amazon-scraper-api

A web scraper that crawls Amazon to extract product information and return it in JSON format.

amazon crawler expressjs json-api nodejs webscraping

Last synced: 14 Apr 2026

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 19 Jan 2026

https://github.com/wangzekaihhhh/f2_web_app

A Douyin data collection and backup tool for Feiniu fnOS, with a web management interface and FPK packaging support.

crawler douyin fnos nas python

Last synced: 13 Mar 2026

https://github.com/afuntw/misc-crawler

Some small crawlers for specific websites

crawler

Last synced: 14 Oct 2025

https://github.com/basemax/my-site-url-finders

A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.

crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder

Last synced: 15 Oct 2025
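
The filtering step in the my-site-url-finders description above (stay on the same domain, skip unwanted paths and file types) might look roughly like the following. This is a hedged sketch only: `should_visit`, `SKIPPED_EXTENSIONS`, and `SKIPPED_PATH_PREFIXES` are hypothetical names, and the exclusion lists are made-up placeholders, since the listing does not say which paths or file types the project actually skips.

```python
from urllib.parse import urlparse

# Hypothetical exclusion lists; the project's real ones are not given here.
SKIPPED_EXTENSIONS = (".jpg", ".png", ".pdf", ".zip")
SKIPPED_PATH_PREFIXES = ("/admin", "/login")


def should_visit(url, root_url):
    """Return True if url is on root_url's host and is not excluded."""
    target = urlparse(url)
    root = urlparse(root_url)
    if target.scheme not in ("http", "https"):
        return False  # skips mailto:, javascript:, and similar links
    if target.netloc.lower() != root.netloc.lower():
        return False  # different domain
    path = target.path.lower()
    if path.endswith(SKIPPED_EXTENSIONS):
        return False  # unwanted file type
    if path.startswith(SKIPPED_PATH_PREFIXES):
        return False  # unwanted path
    return True


candidates = [
    "https://example.test/blog/post",
    "https://other.test/blog",
    "https://example.test/logo.PNG",
    "https://example.test/admin/users",
]
kept = [u for u in candidates if should_visit(u, "https://example.test/")]
```

A recursive crawler would call such a predicate on every discovered link before queueing it, which keeps the final URL list limited to pages within the starting site.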

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 14 Apr 2026

https://github.com/bujosa/aldebaran

Example of using App Engine with Python 3, ThreadPool, and web scraping

appengine crawler flask gcp python3 thread-pool

Last synced: 19 Oct 2025

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 20 Oct 2025

https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper

Crawls the GameFAQs website for exclusive video games released on the specified platforms and generates a Markdown table of the crawled titles.

console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox

Last synced: 09 May 2026

https://github.com/kgruiz/stealth-crawler

Asynchronous headless-Chrome web crawler that discovers internal links and optionally saves HTML, Markdown, screenshots, or PDFs. Built for scripting, inspection, and automation.

asyncio cli crawler headless-chrome html-scraper pydoll python web-crawler

Last synced: 25 Oct 2025