An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/ericc-ch/crawldown

Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.

crawler markdown scraper

Last synced: 05 Jul 2025

https://github.com/lucasromualdo/glassdoorcrawler

Crawler em Python para coletar vagas do Glassdoor e exportar para Excel

cli crawler glassdoor openpyxl pandas python web-scraping

Last synced: 25 Feb 2026

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/matheusfaustino/jazzmaster_crawler

It is a crawling for getting the audio programs from a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 14 Jun 2025

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 20 Mar 2025

https://github.com/jayzhan211/python-crawler-startups

python crawler learning

crawler python

Last synced: 20 Mar 2025

https://github.com/kkamara/nodejs-scraper

(2022) Use JavaScript technologies to crawl and click buttons on websites with GUI.

bot crawler nodejs scraper spider

Last synced: 25 Jan 2026

https://github.com/tetreum/xupopter_runner

Executes crawling recipes coming from Xupopter Chrome Extension.

crawler scrapper scrapping webscraper

Last synced: 08 Aug 2025

https://github.com/tetreum/xupopter_client

Simple interface to manage Xupopter recipes aswell as it's runners.

crawler scrapper scrapping webscraper

Last synced: 04 Apr 2025

https://github.com/luanpotter/series-api

A simple IMDB crawler feeding a Series API

api crawler imdb json rest series

Last synced: 15 Feb 2026

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links in a specific depth.

crawler flask-api flask-restful

Last synced: 02 Apr 2025

https://github.com/krishpranav/gozap

⚡️ Multiple target ZAP Scanning made in go

cli crawler go go-crawler golang zap

Last synced: 27 Mar 2025

https://github.com/jonesrussell/north-cloud

A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.

content crawler publisher

Last synced: 25 Jan 2026

https://github.com/g-ongenae/morphalou-crawler

A Crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 25 Feb 2025

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/iamkushvanth/real-time-data-analysis-using-kafka

In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.

athena aws aws-ec2 aws-s3 crawler glue kafka kafka-consumer python sql

Last synced: 18 Jun 2026

https://github.com/daviddavo/blogspot-crawler

Crawler for blogspot and blogger with beautifulsoup

crawler hacktoberfest python

Last synced: 19 Apr 2026

https://github.com/boatraceventureproject/boatracescraper

The BVP Crawler package for Boatrace.

boatrace crawler php php-library php8

Last synced: 17 Mar 2025

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 13 Jun 2026

https://github.com/fmind/fincrawl

Crawl documents, metadata, and files from financial institutions

crawler documents finance python scrapy

Last synced: 30 Apr 2026

https://github.com/grayhat12/grawler

A web based Crawler that takes two inputs(search item, number of sites to search)and curently displays Readable Content in Text Format but the Code can be modified to display the HTML code.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 27 Mar 2025

https://github.com/lolyratul025/web-email-bundler

A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.

crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping

Last synced: 22 May 2026

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 09 Apr 2025

https://github.com/ark930/douban-movie-crawler

豆瓣影评爬虫

crawler douban movie python

Last synced: 18 Mar 2025

https://github.com/ismoreirakt/spyder

The web is changing. Spyder sees it.

alerts automation crawler monitor

Last synced: 01 Mar 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 11 Mar 2025

https://github.com/jofaval/open-graph-visualizer

Web Scraping showcase of how crawlers retrieve site's details through the Open Graph Protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 08 Sep 2025

https://github.com/onetail/crawler-with-kafka-docker

homework to crawler and anaylsis

analysis crawler kafka-docker

Last synced: 18 Mar 2025

https://github.com/codegram01/go-ai-crawl

Golang Web Crawl with AI

ai chromedp crawler golang ollama

Last synced: 16 Apr 2026

https://github.com/crosscutsaw/iscsicrawler

iscsicrawler is a bash script that crawls files in the iscsi targets with ease.

crawler iscsi iscsi-target iscsiadm

Last synced: 16 Jan 2026

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshot of websites based on headless chrome (puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 21 Apr 2026

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scraps corpora, automatically cleans it up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 28 Feb 2025

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 30 Jul 2025

https://github.com/xoraus/revieworacle

The proposed system assists users in deciding which product to buy. It gathers reviews along with the details from multiple websites, which sell the product. Other than that the system is trained to analyze the polarity of the product.

ai crawler datascience machinelearning scrappy selenium-webdriver

Last synced: 07 May 2026

https://github.com/dnknth/robot.py

Simple web spider

crawler curio python

Last synced: 23 Jul 2025

https://github.com/waived/google-drive-crawler

Proxy-based crawler to expose public (shared) Google Drive links

crawler crawler-python file-crawler google-drive-api shared-folders web-spider

Last synced: 27 Mar 2025

https://github.com/matheusfaustino/phrawl

Phrawl: A web crawling framework in PHP (or it seems so)

crawler crawling crawling-framework php scraper wip

Last synced: 08 Sep 2025

https://github.com/avsbharadwaj/web_crawler

A basic web crawler that prints out the links and description present on a website rescursively

crawler web

Last synced: 21 Apr 2026

https://github.com/kofj/octopus

Octopus an open source software to collect data from web pages.

crawler

Last synced: 15 May 2026

https://github.com/joaooliveirapro/trawlergo

TrawlerGo 🐛 is a basic HTTP crawler written in Go, designed to efficiently discover all URLs within a specified domain while capturing related HTTP request information.

crawler go golang http

Last synced: 09 Jun 2026

https://github.com/Arman2409/data-falcon

Web crawler

crawler extract-data

Last synced: 02 Apr 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 09 Jan 2026

https://github.com/yangxuhui/requests-google

A simple google related Parsing Package

crawler google-api parsing

Last synced: 14 Jan 2026

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 06 Jul 2025

https://github.com/jonasrenault/pubchem-api-crawler

Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.

chemistry crawler molecular-formula pubchem python

Last synced: 15 May 2026

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in ReactJs.

blockchain crawler ethereum

Last synced: 14 May 2026

https://github.com/loko5ja/seed-gen

Seed-gen is an innovative tool designed to generate unique and creative seed phrases for cryptocurrency wallets. With a focus on security and usability, it ensures that users have robust, memorable keys for safeguarding their digital assets efficiently.

crawler crypto crypto-2025 crypto-bot crypto-finder crypto-recovery ethereum-bruteforce laravel lost-btc-wallet-finder mnemonic-generator seed-crypto seed-recovery seed-tool yeoman

Last synced: 03 Apr 2025

https://github.com/jackfsuia/chats-crawler

Discourse chat data crawling and on-the-way parsing straight for LLM instruction finetuning. 论坛数据爬取和解析,直接用于对话微调。

crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser

Last synced: 09 Jul 2025

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 17 May 2026

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 01 Mar 2025

https://github.com/moparisthebest/nginx-limit-crawlers

rate limit crawlers in nginx

ai crawler nginx

Last synced: 14 Mar 2025

https://github.com/timchen10001/crawler-711-taiwan

Crawler for Python to scrapping updated informations of 711

711 crawler python python3 taiwan

Last synced: 27 Mar 2025

https://github.com/iamtonmoy0/sitemap-crawler

site map crawler with golang and goquery

crawler

Last synced: 23 Feb 2025

https://github.com/allancapistrano/anime-sheets

Crawler que pega as informações dos animes e salva numa planilha.

anime crawler google-sheets google-sheets-api

Last synced: 16 Mar 2025

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 31 May 2026

https://github.com/truongdd03/searchengine

A search engine written in c++.

cpp crawler search search-engine

Last synced: 06 Apr 2025

https://github.com/pixlcrashr/stwhh-mensa

Better STWHH Mensa menu data / interface / notifier

api crawler data food studierendenwerk-hamburg university website

Last synced: 07 Aug 2025

https://github.com/roc41d/http-web-crawler

Http web crawler with Nodejs + TDD

crawler http javascript jest jest-test nodejs webcrawler

Last synced: 13 Apr 2026

https://github.com/jonesrussell/pipelinex

Firecrawl-style web intelligence pipeline powered by North Cloud

crawler pipeline vue

Last synced: 09 Mar 2026

https://github.com/moojing/coinmarketcap-crypto-crawler

A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.

crawler cryptocurrency

Last synced: 01 Apr 2025

https://github.com/luminovrym/crawler-tools-js

Crawler Tools Js adalah sebuah aplikasi yang digunakan untuk scrapping data pada sebuah web

crawler crawler-js data js web-scraping

Last synced: 08 Sep 2025

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 09 Aug 2025

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 30 Mar 2025

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 28 Jun 2025

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 01 Apr 2025

https://github.com/raspi/scrapy-vgmusic

Crawler for vgmusic web site

crawler game midi music python scrapy spider

Last synced: 16 Apr 2026

https://github.com/tsaohucn/crawler_fb_page

This is crawler use selenium for facebook pages

crawler facebook-page rails ruby selenium

Last synced: 09 May 2026

https://github.com/allotmentandy/socialmedialinkextractor

php laravel package to extract social media links from an array of links for my spider, used as part of a spider for checking londinium.com website links

crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube

Last synced: 09 May 2026

https://github.com/basemax/okala-product-ids

A PHP script to fetch and save product IDs from Okala's online store API across multiple categories and store branches.

crawler crawler-okala crawler-php crawlers data database ids ir iran json okala okala-crawler php php-crawler product

Last synced: 09 May 2026

https://github.com/xenia101/coro.na

A Web Map Service for the Corona-virus

coronavirus covid-19 crawler crawling flask flask-api json python requests urllib

Last synced: 09 May 2026

https://github.com/catbraaain/search-crawl

Search the web and crawl content stealthily, with optional extraction using LLMs.

crawl crawler fastapi playwright scrape scraping searxng

Last synced: 09 May 2026

https://github.com/victorbaumgartner/electron-crawler-ui

Desktop app with axios electron to crawl websites accross multiple servers

app axios crawler desktop electronjs macos multiple-servers multithreading

Last synced: 09 May 2026

https://github.com/metehan777/http-header-link-graph

Publish a site's link graph & heading map in HTTP response headers. Crawl 65k pages in 99 seconds without parsing one byte of HTML. Companion code for the SEO Week 2026 NYC experiment.

aeo answer-engine-optimization cloudflare-workers crawler generative-engine-optimization geo http-headers link-graph python rust seo site-architecture technical-seo

Last synced: 03 Jun 2026

https://github.com/nsalvacao/cli-plugins

OpenAPI for CLIs — Crawl any CLI's --help output and generate structured Claude Code plugins with expert command knowledge

ai-agent claude-code cli cli-reference crawler developer-tools help-parser llm plugin python

Last synced: 04 Mar 2026

https://github.com/machinecyc/lotteryinsight

Use crawler to collect Taiwan Lotto data, and save data into local MySQL server.

crawler data docker lottery mysql-database python3 taiwan

Last synced: 09 May 2026

https://github.com/lopins/article-crawler

一个简单的网页文章爬取工具,可以自定义抽取自己所需要的字段内容,简单容易上手。

article crawler ftp mysql python sqlite3

Last synced: 10 May 2026

https://github.com/khanof89/twitter_scraper

Scrape tweet details from user profile using selenium

crawler scraper selenium twitter twitter-bot

Last synced: 11 May 2026

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 12 May 2026

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 10 Jun 2026

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 12 May 2026

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 13 May 2026

https://github.com/maxonary/simple-crawler

Streamlit Webscraper

crawler streamlit webscraping

Last synced: 20 Jun 2026

https://github.com/nextlevelshit/node-crawl

Webcrawler for nodejs

crawl crawler javascript nodejs

Last synced: 14 May 2026

https://github.com/manchittlab/TheCrawler

Open-source web scraper + LLM-powered structured extraction. PDF/DOCX, markdown, JSON-LD, microdata, commerce data, forms, 16 analytics-tracker detection. Structured errors with retryable flags. Adaptive Cheerio->Playwright. CLI, npm, REST API, and MCP server. AGPL-3.0.

agpl apify cheerio crawler llm markdown mcp mcp-server model-context-protocol nodejs playwright rag scraper typescript web-scraping

Last synced: 20 Jun 2026

https://github.com/soenneker/soenneker.playwrights.crawler

A configurable Playwright crawler with rich stealth and control options.

browser chrome chromium crawl crawler csharp dotnet playwright playwrightcrawler playwrights scrape scraper stealth util

Last synced: 14 Jun 2026

https://github.com/vhdm/twitter-hashtag-crawler

Twitter hashtag crawler by selenium, without using the Twitter API ;)

crawler python tor twitter

Last synced: 14 Jun 2026

https://github.com/tri613/nespresso

A mobile version for nespresso coffee website :coffee:

crawler nespresso node-js

Last synced: 15 Jun 2026