An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
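A crawler's "systematic" browsing is usually a graph traversal over links. Below is a minimal sketch that assumes breadth-first order and a page cap. `get_links` is a hypothetical callback standing in for the HTTP fetch and HTML link extraction a real crawler would do. The toy `WEB` link graph is also made up for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def crawl(seed: str, get_links: Callable[[str], List[str]], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl starting at `seed`.

    `get_links` (hypothetical) stands in for fetching a page and extracting
    its outgoing links; a real crawler would issue an HTTP request and parse
    HTML here. Returns URLs in the order they were visited.
    """
    seen: Set[str] = {seed}        # URLs already queued, so none is visited twice
    frontier = deque([seed])       # FIFO queue yields breadth-first order
    order: List[str] = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# A toy in-memory "web" used instead of real HTTP requests (hypothetical data).
WEB: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d", "a"],
    "c": ["d"],
    "d": [],
}

if __name__ == "__main__":
    print(crawl("a", lambda u: WEB.get(u, [])))  # ['a', 'b', 'c', 'd']
```

Swapping the `deque` for a stack (`pop()` instead of `popleft()`) turns this into a depth-first crawl. Production crawlers typically add politeness delays, `robots.txt` checks, and URL normalization on top of this loop.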

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 17 Apr 2026

https://github.com/ma-pony/deepspider

Intelligent Web Scraping Platform: an AI-powered crawler agent built on DeepAgents + Patchright

ai-agent anti-detect automation captcha crawler javascript reverse-engineering web-scraping

Last synced: 03 Apr 2026

https://github.com/949886/pixiv-crawler

Crawls Pixiv illustration info into a local MySQL database.

crawler mysql pixiv

Last synced: 17 Apr 2026

https://github.com/yassilah/nuxt-crawler

Automatic crawler & search for Nuxt SSG.

algolia crawler nuxt search ssg

Last synced: 17 Apr 2026

https://github.com/dizys/weibo-crawler

A Node.js Weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 19 Apr 2026

https://github.com/karantyagi/web-crawler

BFS and DFS implementations for a Wikipedia crawler

beautifulsoup crawler

Last synced: 05 Jun 2026

https://github.com/chenty2333/tiktok-youtube_commentscraper

A lightweight tool for collecting public comments from TikTok and YouTube videos in bulk, either via direct video URLs or keyword-based search. It is useful for data analysis, opinion mining, sentiment analysis, text mining, and building datasets for machine learning tasks.

comment crawler nlp scraper sentiment-analysis tiktok youtube

Last synced: 20 Apr 2026

https://github.com/bl4ck0w1/swmap

Service Worker security scanner that maps scope, caching, routes, and Workbox behavior into actionable risks; static-first, with optional AST and headless analysis.

app-sec bug-bounty crawler dynamic-analysis penetration-testing playwright pwa recon security-tools service-worker static-analysis web-security work-box

Last synced: 21 Apr 2026

https://github.com/ndoolan360/go-crawler

A simple web crawling program written in Go in an afternoon. 🕷️🕸️

afternoon-project crawler scraper

Last synced: 21 Apr 2026

https://github.com/krishpranav/gocralwer

An awesome crawler made in Go

crawler

Last synced: 24 Apr 2026

https://github.com/nagilum/focus

Simple CLI tool, written in C#, to crawl a site and log the responses.

cli crawl crawler csharp playwright

Last synced: 24 Apr 2026

https://github.com/ryanchao2012/okbot

A conversation retrieval engine based on the PTT corpus

chatbot crawler django ptt

Last synced: 24 Apr 2026

https://github.com/ssv445/js-rendering-proxy-docker

A JS-rendering proxy API for handling JavaScript-heavy websites in your crawler.

crawler proxy puppeteer

Last synced: 25 Apr 2026

https://github.com/m98/email-extractor-crawler

A minimal Node crawler that finds email addresses in a website's content: it follows links within the website and searches each page's content for an email address.

crawler email javascript lowdb node-crawler nodejs scraper

Last synced: 25 Apr 2026

https://github.com/suddi/fundscraper

Collection of web crawlers to scrape fund data using Scrapy

crawler funds scraper scrapy

Last synced: 06 Jun 2026

https://github.com/rbkgh/dailytext-crawler

Crawls jw.org to retrieve the daily text.

crawler dailytext java jsoup jw

Last synced: 06 Jun 2026

https://github.com/tsaohucn/crawler_fb_group

A crawler for Facebook groups that uses Selenium.

crawler facebook-groups rails ruby

Last synced: 27 Apr 2026

https://github.com/dan3002/imdb-crawler

A powerful Python-based web crawler that collects comprehensive movie information from IMDb using both the GraphQL API and web scraping techniques. It can gather detailed movie data, including basic information, reviews, and ratings, for any type of movie based on customizable filters.

crawler imdb imdb-dataset selenium

Last synced: 27 Apr 2026

https://github.com/luthfan98/screenshoot-crawl-web-automation

Automated full-website screenshot capture and internal link crawler using Puppeteer. Organized output with full-page screenshots, link discovery, retries, and AEST timestamp logs.

automation crawler nodejs puppeteer puppeteer-screenshot web-scraping

Last synced: 27 Apr 2026

https://github.com/saketh7382/smartcrawler

A package for crawling items from web pages and storing them as JSON files.

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 28 Apr 2026

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 29 Apr 2026

https://github.com/fi1a/crawler

PHP crawler

crawler php

Last synced: 29 Apr 2026

https://github.com/luukalindgren/jobposts-utu

Website for a database of IT job postings.

crawler docker fastapi mariadb react virtual-machine

Last synced: 29 Apr 2026

https://github.com/zukahai/formosa-views

View Formosa employee profiles, salaries, and yearly bonuses.

bonus-year crawler css formosa html javascript nodejs python salary views

Last synced: 29 Apr 2026

https://github.com/redco/goose-phantom-environment

An environment for Goose parser that allows it to run in PhantomJS.

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 30 Apr 2026

https://github.com/manku27/webscrapping

Crawls and scrapes a website for rental listings matching custom criteria that the website itself didn't support, and directly scrapes key details such as the property owner's phone number for quick use.

beautifulsoup crawler python scraper

Last synced: 30 Apr 2026

https://github.com/amirsorouri00/dsl-se

This is an MVP built for the "Search Engine and Data Mining" course. The idea behind this project comes from the forked project, whose link is provided

container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine

Last synced: 01 May 2026

https://github.com/christopher-besch/therapy_search

Computes call times from arztsuche-bw and puts them into a calendar.

appointments calendar crawler gatsby therapy time-management typescript

Last synced: 01 May 2026

https://github.com/lukasherz/22fs-sc-twitter-crawler

Used for a research project in social computing at UZH (FS22).

crawler crawling database twitter twitter-api-v2

Last synced: 02 May 2026

https://github.com/linux0hat/cpp-web-crawler

Explore the web.

cpp crawler sqlite3

Last synced: 09 Jun 2026

https://github.com/ammirsm/data-grabber-cnn-twitter

Basic setup to get data from twitter and CNN with a keyword.

cnn crawler django scrapyd twitter

Last synced: 02 May 2026

https://github.com/comigor/balances

Your checking and savings account balances at banks and brokers.

balance bank broker crawler node

Last synced: 02 May 2026

https://github.com/basemax/buskool.com-crawler

This repository contains a PHP-based crawler and scraper designed to fetch and download all product data from the Buskool website (باسکول). The crawler is designed to handle large-scale data scraping efficiently and stores the collected data in JSON format.

buskool buskoolcom crawler crawler-php php php-crawler

Last synced: 03 May 2026

https://github.com/rebrowser/seatgeek-dataset

SeatGeek ticket marketplace data: events with taxonomy and schedule status, listings with section/row and deal bucket, 15K+ performers, 12K+ venues with capacity and coordinates. Updated daily.

concerts crawler data-collection data-science dataset deal-score events open-data scraper seatgeek sports ticket-prices tickets web-scraping

Last synced: 03 May 2026

https://github.com/wshwbluebird/trider

A small crawler framework written in Go

crawler golang

Last synced: 09 Jun 2026

https://github.com/priyakdey/github-api-crawler

A crawler that crawls and saves the APIs listed in the Public APIs GitHub repo - https://github.com/public-apis/public-apis. See the README for details.

api crawler mongo python3

Last synced: 04 May 2026

https://github.com/muhfalihr/pycrawlconnect

A project to connect crawled data to Kafka and monitor it using Elasticsearch. Still under development, please understand. Haha :)

apache-kafka beginners books crawl crawler crawling crawling-python elasticsearch indonesian instagram movie news python-script python3 social-media twitter x

Last synced: 04 May 2026

https://github.com/kevinchiu2k/kevinchiu2k.github.io

My blog, built on Hexo's Icarus theme

crawler markdown python ssh-eky youtube

Last synced: 04 May 2026

https://github.com/excaliburhan/littlenews

A news app built with Electron

crawler electron rss-feed

Last synced: 04 May 2026

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 06 May 2026

https://github.com/openpj/manifoldcf-sdk

Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions

apache crawler docker ecm extensions integrations manifoldcf migration sdk search

Last synced: 07 May 2026

https://github.com/leomaurodesenv/smm-maker-profile

A package for fetching Super Mario Maker maker profiles

crawler javascript json mario-maker nodejs

Last synced: 08 May 2026

https://github.com/alexzhangs/stockdb

Stock data collecting and analyzing

crawler django pandas scrapy stock tushare

Last synced: 10 May 2026

https://github.com/efishery/wpi-kkp-crawler

A crawler for fishery prices on wpi.kkp.go.id

crawler kkp wpi

Last synced: 10 May 2026

https://github.com/par7133/splash-bot-crawler

Splash Bot creates splashes of your websites on the fly. GPL license 🔥

bot crawler gallery open-source opensource php splash

Last synced: 10 May 2026

https://github.com/octcarp/sustech_cs209a-java2_f24_proj

(Spring Boot + Vue 3) Stack Overflow data crawling and visualization: our project for CS209A 2024 Fall: Computer System Design and Applications A (a.k.a. Java 2), SUSTech. Taught by Dr. Yida Tao @yidatao.

crawler spring-boot stackexchange sustech visualization

Last synced: 10 May 2026

https://github.com/enansari/guess-price-car

Car price estimation with machine learning, based on data from a car sales site | final project of Maktabkhooneh

crawler jadi machine-learning maktabkhoone maktabkhooneh python

Last synced: 10 May 2026

https://github.com/alizdavoodi/mcpdocsearch

This project provides a toolset to crawl websites, generate Markdown documentation, and make that documentation searchable via a Model Context Protocol (MCP) server, designed for integration with tools like Cursor.

crawler mcp mcp-server

Last synced: 13 May 2026

https://github.com/siddhantsharma24/web-scraping-application-jsoup

A web crawler application built with the Jsoup library for scraping information from a webpage.

crawler java jsoup jsoup-crawler scraping

Last synced: 13 Jun 2026

https://github.com/siddhantsharma24/stock-market-scraper-jsoup

A web crawler application built with the Jsoup library for scraping stock market data from a webpage.

crawler java jsoup jsoup-html jsoup-library web-scraping

Last synced: 13 Jun 2026

https://github.com/danielemoraschi/go-sitemap-common

Simple Go sitemap generator and crawler.

crawler golang sitemap sitemap-generator

Last synced: 17 Jun 2026

https://github.com/sinkaroid/webnovelcrawler

Simple PHP cURL and getRequest script to grab light novels and web novels, then build a parser with Dompdf.

crawler dompdf webnovel

Last synced: 18 Jun 2026

https://github.com/alatiera/ellinofreneia-crawler

Crawler of ellinofreneianet.gr for offline content consumption

crawler ellinofreneia

Last synced: 19 Jun 2026

https://github.com/chen0040/ios-stock-tracker

Stock tracker implemented using Objective-C for iOS

crawler ios-app objective-c stock-prices

Last synced: 20 Jun 2026

https://github.com/basemax/my-site-url-finders

A simple Python-based web crawler that extracts and filters URLs from a given website while avoiding unwanted paths and file types. The crawler follows links recursively within the same domain and provides a clean list of URLs found across the website.

crawler find-url py py-crawler python python-crawler sitemap sitemap-generator url-find url-finder

Last synced: 15 Oct 2025

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 16 Sep 2025

https://github.com/nelcifranmagalhaes/web_crawler

A web crawler for all Naruto characters

anime beautifulsoup characters crawler naruto python

Last synced: 14 Jul 2025

https://github.com/deventerprisesoftware/scrapi-sdk-dotnet

The only web scraping service you'll ever need that offers advanced features that are simple to use for efficient data extraction.

browser-automation crawler scraper-api web-scraping webscraper

Last synced: 22 May 2026

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 08 Apr 2026

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 14 May 2025

https://github.com/pymarcus/webscrapingiii

A crawler that takes products from a list and goes through Mercado Livre's pages, collecting their prices, names, and links to access them.

crawler mercadolivre python webscraping

Last synced: 15 Sep 2025

https://github.com/im-perativa/public_crawler

A collection of crawler projects for Indonesian datasets

crawler indonesia indonesia-api scrapy

Last synced: 20 Mar 2025

https://github.com/hwywl/mzitu-crawler

Crawls photos of girls from the mzitu website. Mind your health.

crawler mzitu python

Last synced: 29 Apr 2026

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 25 Jul 2025

https://github.com/konradlinkowski/wikipediafinder

Find words in a Wikipedia page

crawler scraper wikipedia

Last synced: 17 Sep 2025

https://github.com/willi-dev/dtcapp

dtcapp : distributed twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 18 Sep 2025

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 21 May 2026

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor share details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Jul 2025

https://github.com/machu-gwu/crawlib-project

A toolset for crawler projects.

crawler framework mongodb python scrapy

Last synced: 20 Sep 2025

https://github.com/davidkhala/ml

classic AI index

crawler

Last synced: 17 Jan 2026

https://github.com/panakour/pkscraper

Extract structured data from the web

crawler crawling scraper scraping scraping-websites webcrawler

Last synced: 19 Feb 2026

https://github.com/fengdongfa1995/video-dl

Download videos from online video websites.

bilibili crawler pornhub python3 video

Last synced: 09 Apr 2026

https://github.com/leandrols/scliper

A CLI tool for simple web scraping.

cli-scripts crawler golang scraping

Last synced: 01 Nov 2025

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 22 Sep 2025

https://github.com/andrew-ld/wowroms-downloader

Download all ROMs from wowroms

aiohttp asyncio crawler python3

Last synced: 17 Jan 2026

https://github.com/beanwei/zmt-post-crawler

Crawls the ZMT platform site: enter an author ID to get that author's post list. This project was written for a friend.

crawler golang golang-ui

Last synced: 08 Nov 2025

https://github.com/rebrowser/autotrader-dataset

AutoTrader car listings database: new, used & CPO vehicles with make, model, trim, mileage, MSRP, KBB fair price range, deal rating, body style, fuel type, and seller state. Updated daily.

automotive autotrader car-listings car-prices crawler data-collection data-science dataset kbb open-data scraper used-cars vehicle-data web-scraping

Last synced: 03 May 2026

https://github.com/dhsagaryt/multisearch

Search efficiently across different platforms with ease. Type your query and choose from multiple search engines, streamlining your experience.

browser crawler internet search search-algorithm search-engine searchbar searchengine webcrawler

Last synced: 14 Feb 2026

https://github.com/programming-with-love/skyeyesystem

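Sky Eye system: crawls trending-search data from various platforms every ten minutes and stores it, with the raw trending-search data saved to MySQL and word-frequency statistics saved to Redis.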

crawler mysql redis skyeye skyeyewall springboot

Last synced: 25 Sep 2025

https://github.com/shimech/pokemon-db-maker

Generates a Pokédex via web crawling

beautifulsoup crawler docker pokemon scraper

Last synced: 25 Jan 2026

https://github.com/tasooshi/digslash

A site mapping and enumeration tool for Web applications analysis

crawler mapping sitemap spider

Last synced: 08 Apr 2026

https://github.com/arihantbansal/cybersec-python

Cybersec/CTF practice problems solved in Python

crawler cryptography ctf cybersecurity sockets webscraping

Last synced: 02 Aug 2025

https://github.com/ryanking13/bellorin

Multi-threaded Social Media Crawler 🔍

crawler instagram social-media

Last synced: 29 Jun 2025

https://github.com/udaykiran2017/seo-reports

📊 Generate and analyze SEO reports effortlessly to enhance your website's visibility and performance across search engines.

audit broken-links cli crawler extraction google-lighthouse hreflang-checker hreflang-matrix puppeteer scan-website searchengineoptimization seo seo-macroscope seo-manager seo-meta seo-optimization web-scraping webmaster

Last synced: 16 May 2026

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT posts from the command line.

crawler ptt python

Last synced: 08 Aug 2025

https://github.com/muhfalihr/pyxdtelebot

PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.

crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x

Last synced: 06 Apr 2025

https://github.com/jfcherng/wiki-cgroup-crawler

This script crawls Wikipedia's public conversion group (CGroup) word lists and saves the results to external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 03 Oct 2025