An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/meilisearch/actions

Meilisearch Github Actions

crawler meilisearch

Last synced: 26 Jun 2025

https://github.com/rogerchappel/crawldeck

Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.

agent-tools cli crawler local-first queue typescript

Last synced: 26 May 2026

https://github.com/gozeon/weibo-crawler

微博爬虫

crawler web-crawler

Last synced: 21 Mar 2025

https://github.com/eea/eea-crawler

EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).

airflow-dags crawler elasticsearch etl-pipeline indexing

Last synced: 20 May 2026

https://github.com/rxcai/python3-weibo-crawler

基于Python3实现的微博小爬虫

crawler python python3 spider weibo

Last synced: 22 Mar 2025

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 15 Jun 2026

https://github.com/yordadev/fenrisjs

A NodeJS application that scrapes any links from a given input and outputs the results nicely into one of two files, external or internal file for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 10 May 2026

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 11 Jun 2026

https://github.com/muhfalihr/pyxdtelebot

PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.

crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x

Last synced: 06 Apr 2025

https://github.com/dineshsprabu/concurrent-web-crawler

Flexible and concurrent web crawler implemented in 'go'

concurrent-web-crawler crawler go-crawler spider web-crawler

Last synced: 12 Jan 2026

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 11 Mar 2026

https://github.com/yuchenq/comp90055-project

This is the lastest version of my project belong to Comp90055.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 16 Jul 2025

https://github.com/ferru97/jsketchfabcrawler

jSketchfabCrawler is a java for the automatic crawling of model's information from sketchfab.com

crawler data database java sketchfab sql

Last synced: 03 Jan 2026

https://github.com/sgeisler/fishbones2epub

fetches the fishbones novel and outputs an epub

crawler ebook epub python-3-6

Last synced: 22 Mar 2025

https://github.com/andresayac/cuevana3

Cuevana3 scraper is a content provider of the latest in the world of movies and tv show in Latin Spanish dub or subtitled.

crawler cuevana3 php scraper

Last synced: 05 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/balintpethe/laravel-universal-scraper

Universal Scraper for Laravel

crawler laravel scraper web-scraper

Last synced: 13 Jan 2026

https://github.com/fmind/fincrawl

Crawl documents, metadata, and files from financial institutions

crawler documents finance python scrapy

Last synced: 30 Apr 2026

https://github.com/grayhat12/grawler

A web based Crawler that takes two inputs(search item, number of sites to search)and curently displays Readable Content in Text Format but the Code can be modified to display the HTML code.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 27 Mar 2025

https://github.com/basemax/crawler-news-currency-gold-coins

PHP Crawler to get Persian news related to currency coin and gold.

crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler

Last synced: 05 Jul 2025

https://github.com/amazingcoderpro/pythonup

玩转Python!for improving python skills

crawler python

Last synced: 19 May 2026

https://github.com/der3318/daily-pixiv

Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations

crawler line-notify pixiv workflow

Last synced: 03 Mar 2025

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 08 Apr 2025

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 09 May 2026

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 18 Mar 2025

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 08 Feb 2026

https://github.com/orkan/tlc

Simple PHP/cURL/FlareSolverr framework with Logger, Cache and more!

crawler curl flaresolverr net scrap

Last synced: 27 Aug 2025

https://github.com/hoan02/novel-crawler

Tool cào dữ liệu truyện để phục vụ cho doctruyen.io.vn

crawler python

Last synced: 13 Mar 2025

https://github.com/constaf79/pycn

🔗 Simplify your cryptocurrency tasks with pycoin, a Python library providing essential utilities for Bitcoin and alt-coins, ensuring seamless transactions and operations.

cnc-machine cnc-milling-controller cnn cnn-model cnn-processors computer-vision crawler edge-detection fun image-classification image-processing library neural-network pillow pycnc python raspberry-pi web

Last synced: 14 May 2026

https://github.com/jonesrussell/north-cloud

A full-stack content intelligence pipeline that crawls, classifies, and routes news articles in real time for downstream consumers.

content crawler publisher

Last synced: 25 Jan 2026

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls official circles in Ibaraki University from university's website

crawler python

Last synced: 25 Mar 2025

https://github.com/w3labkr/ipynb-scraper

A collection of frequently used Jupiter notebook code.

crawler ipynb jupyter jupyter-notebook python scrapper

Last synced: 19 Apr 2026

https://github.com/hvtuananh/twitter_crawler

Daemon to call and get tweets from Twitter Public Stream API

crawler java streaming-api tweets twitter twitter-crawler

Last synced: 11 Mar 2025

https://github.com/cls1991/gank.io-go

A simple crawler for fetching pictures from http://gank.io, implemented in golang.

crawler gankio goquery pictures

Last synced: 27 Feb 2025

https://github.com/coding-dream/aspider

A spider run on Android Platform

crawler jsoup spider

Last synced: 24 Jun 2025

https://github.com/dasantonym/node-cesspoll

:poop: Turd Miner Node Module

crawler news poopetry potty-humour

Last synced: 28 Oct 2025

https://github.com/casoon/astro-crawler-policy

Policy-first crawler control for Astro — generates robots.txt and llms.txt with presets, per-bot rules, AI crawler registry, and build-time audits.

ai-crawler astro astro-integration crawler llms-txt robots-txt seo typescript

Last synced: 24 May 2026

https://github.com/lucasromualdo/glassdoorcrawler

Crawler em Python para coletar vagas do Glassdoor e exportar para Excel

cli crawler glassdoor openpyxl pandas python web-scraping

Last synced: 25 Feb 2026

https://github.com/ark930/douban-movie-crawler

豆瓣影评爬虫

crawler douban movie python

Last synced: 18 Mar 2025

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 28 May 2026

https://github.com/ericc-ch/crawldown

Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.

crawler markdown scraper

Last synced: 05 Jul 2025

https://github.com/kettou/silentscraper

SilentScraper is a web scraping solution built with advanced stealth protocols. It operates undetectably in the background, bypassing anti-scraping mechanisms to collect structured data at scale. It's lightwight architecture mimics humans browsing patterns, rotating IP addresses, spoofing user agents, and more

beautifulsoup beautifulsoup4 crawler datastructures datastructures-algorithms python webautomation webscraper webscraping

Last synced: 23 Jul 2025

https://github.com/d7isme/pixiv-downloader-mod

Modded extension of the pixiv downloader on chrome webstore with premium feature unlocked.

chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader

Last synced: 14 May 2026

https://github.com/huyduc1602/uniapp-crawler

Crawl và Dịch tài liệu Uni-app

crawler docker python

Last synced: 25 Jan 2026

https://github.com/matheusfaustino/jazzmaster_crawler

It is a crawling for getting the audio programs from a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 14 Jun 2025

https://github.com/jofaval/open-graph-visualizer

Web Scraping showcase of how crawlers retrieve site's details through the Open Graph Protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 08 Sep 2025

https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez

Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.

beautifulsoup crawler immigration web

Last synced: 16 Jun 2025

https://github.com/jul10l1r4/objetive

This software is a mini-crawler that aims to grab some text parts from some website or ip that responds http*

bigdata crawler data-science security-tools web

Last synced: 12 Aug 2025

https://github.com/mt4110/postal_converter_ja

High-performance Japanese Postal Code Converter & API. Auto-updating, DB-agnostic (MySQL/PostgreSQL), written in Rust & Next.js.日本郵便局のデータを自動更新機能付き、Rustの非同期クローリングシステム。最加速で最新の郵便番号データの更新化がされます。

api crawler docker mysql nextjs nix postgresql react rust

Last synced: 13 Feb 2026

https://github.com/truongdd03/searchengine

A search engine written in c++.

cpp crawler search search-engine

Last synced: 06 Apr 2025

https://github.com/onetail/crawler-with-kafka-docker

homework to crawler and anaylsis

analysis crawler kafka-docker

Last synced: 18 Mar 2025

https://github.com/zhou-chaoxian/ax-spider

A simple, powerful, and fast asynchronous Python crawler framework.

asyncio ax-spider crawler httpx python scrapy

Last synced: 18 Mar 2025

https://github.com/ekojs/web-crawler

Web Crawler untuk mengambil judul penelitian pada Google Scholar

crawler nodejs web-crawler

Last synced: 12 Apr 2026

https://github.com/licoy/win4000-images-crawler

基于scrapy爬取&下载win4000.com的图片壁纸

crawler python scraper

Last synced: 28 Mar 2025

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 30 Mar 2025

https://github.com/dylancl/sitemap-crawler

Verify the status of each url in a (hosted) sitemap XML file.

crawler parser scraper sitemap xml

Last synced: 04 Oct 2025

https://github.com/waived/google-drive-crawler

Proxy-based crawler to expose public (shared) Google Drive links

crawler crawler-python file-crawler google-drive-api shared-folders web-spider

Last synced: 27 Mar 2025

https://github.com/yyj08070631/web-spider

一个网络蜘蛛

crawler spider webspider

Last synced: 11 Sep 2025

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links in a specific depth.

crawler flask-api flask-restful

Last synced: 02 Apr 2025

https://github.com/matheusfaustino/phrawl

Phrawl: A web crawling framework in PHP (or it seems so)

crawler crawling crawling-framework php scraper wip

Last synced: 08 Sep 2025

https://github.com/avsbharadwaj/web_crawler

A basic web crawler that prints out the links and description present on a website rescursively

crawler web

Last synced: 21 Apr 2026

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 28 Jun 2025

https://github.com/kofj/octopus

Octopus an open source software to collect data from web pages.

crawler

Last synced: 15 May 2026

https://github.com/azshurith/depth-crawler

A simple yet powerful Python web crawler that explores a given domain up to a specified depth and outputs a JSON sitemap of URLs and page titles.

crawler puppeteer python

Last synced: 20 Apr 2026

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 31 Mar 2025

https://github.com/sanskar107/c-subject-predictor

Predicts topic of a code.

crawler nlp rnn

Last synced: 14 Mar 2025

https://github.com/jmousqueton/check-broken-link

Multi-threaded Python tool for crawling and checking all internal links on a website, with live Rich dashboard, broken link export (CSV), and detailed source tracking.

check crawler error400 error404 error500 links

Last synced: 29 Aug 2025

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Aug 2025

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 13 Jun 2026

https://github.com/joaooliveirapro/trawlergo

TrawlerGo 🐛 is a basic HTTP crawler written in Go, designed to efficiently discover all URLs within a specified domain while capturing related HTTP request information.

crawler go golang http

Last synced: 09 Jun 2026

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrol in open vacancies in SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 03 Mar 2026

https://github.com/jonasrenault/pubchem-api-crawler

Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.

chemistry crawler molecular-formula pubchem python

Last synced: 15 May 2026

https://github.com/johanbook/node-web-crawler

Nodejs CLI for web crawling

cli crawler nodejs typescript

Last synced: 11 Apr 2026

https://github.com/diegojromerolopez/relwrac

A basic crawler developed with python and asyncio

asyncio crawler page-rank python

Last synced: 11 Nov 2025

https://github.com/jackfsuia/chats-crawler

Discourse chat data crawling and on-the-way parsing straight for LLM instruction finetuning. 论坛数据爬取和解析,直接用于对话微调。

crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser

Last synced: 09 Jul 2025

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 09 Apr 2025

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Jan 2026

https://github.com/martincastroalvarez/web-to-pdf

Web crawlers using Python & Beautiful Soup

crawler python3 webcrawler

Last synced: 08 Apr 2025

https://github.com/jimut123/leaderbehaviour

Scrapy project to get and extract the names of Leaders, their misdeed by scraping news website!

crawler leaderbehaviour newsscraper scrapy timesofindia

Last synced: 16 Jan 2026

https://github.com/timchen10001/crawler-711-taiwan

Crawler for Python to scrapping updated informations of 711

711 crawler python python3 taiwan

Last synced: 27 Mar 2025

https://github.com/ismoreirakt/spyder

The web is changing. Spyder sees it.

alerts automation crawler monitor

Last synced: 01 Mar 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 11 Mar 2025

https://github.com/pmuens/crawler

Multi-threaded Web crawler with support for custom fetching and persisting logic

crawler crawler-engine rust rust-lang web-crawler web-crawling

Last synced: 15 May 2025

https://github.com/iamkushvanth/real-time-data-analysis-using-kafka

In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.

athena aws aws-ec2 aws-s3 crawler glue kafka kafka-consumer python sql

Last synced: 18 Jun 2026

https://github.com/iamtonmoy0/sitemap-crawler

site map crawler with golang and goquery

crawler

Last synced: 23 Feb 2025

https://github.com/recepkizilarslan/console-tourist

Tourist is a simple tool that allows you to collect console messages, errors, unsuccessful requests of all your pages after the DOM loading with authentication support.

console-log crawler crawling crawling-tool error-monitoring error-reporting qa qa-automation qatools

Last synced: 24 Feb 2026

https://github.com/rayspock/go-web-crawler

A web crawler to fetch all the links from a given website via go routines.

concurrency crawler golang goroutine

Last synced: 10 Jun 2026

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 14 Apr 2026