https://github.com/devramsean0/wikipedia-crawler

A robots.txt-respecting web crawler that tracks how far the network of domains linked from Wikipedia extends.
Last synced: 10 months ago

Host: GitHub
URL: https://github.com/devramsean0/wikipedia-crawler
Owner: devramsean0
License: MIT
Created: 2023-10-04T11:26:13.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-08-19T13:27:02.000Z (10 months ago)
Last Synced: 2025-08-19T15:29:56.080Z (10 months ago)
Topics: wikipedia-scraper
Language: TypeScript
Homepage:
Size: 42 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# wikipedia-crawler
This is a robots.txt-compliant crawler that tries to find out how deep the network of links and pages stemming from "https://wikipedia.org/" goes.
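
The two ideas in that description, honouring robots.txt and measuring how deep the link network goes, can be illustrated with a minimal sketch. This is not the repository's code: the function names `parseDisallowRules`, `isPathAllowed`, and `maxLinkDepth` are hypothetical, the robots.txt handling is deliberately simplified (only `Disallow` prefixes for the wildcard user agent, no `Allow` rules or wildcards), and the link graph is an in-memory map rather than pages fetched over the network.

```typescript
// Hypothetical helpers for illustration only; not taken from the repository.

/** Collect the Disallow path prefixes that apply to the wildcard user agent ("*"). */
function parseDisallowRules(robotsTxt: string): string[] {
  const rules: string[] = [];
  let appliesToAll = false;      // is the current group addressed to "*"?
  let lastWasUserAgent = false;  // consecutive User-agent lines share one group
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (line === "" || sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      appliesToAll = lastWasUserAgent ? appliesToAll || value === "*" : value === "*";
      lastWasUserAgent = true;
    } else {
      lastWasUserAgent = false;
      // An empty Disallow value means "nothing is disallowed", so skip it.
      if (field === "disallow" && appliesToAll && value !== "") rules.push(value);
    }
  }
  return rules;
}

/** Simplified check: a path is blocked if it starts with any Disallow prefix. */
function isPathAllowed(path: string, disallowRules: string[]): boolean {
  return !disallowRules.some((prefix) => path.startsWith(prefix));
}

/**
 * Breadth-first walk over a link graph, returning the greatest depth reached
 * from `start`. `isAllowed` stands in for a per-URL robots.txt check.
 */
function maxLinkDepth(
  links: Map<string, string[]>,
  start: string,
  isAllowed: (url: string) => boolean,
): number {
  const depthOf = new Map<string, number>([[start, 0]]);
  const queue: string[] = [start];
  let deepest = 0;
  while (queue.length > 0) {
    const url = queue.shift();
    if (url === undefined) break;
    const depth = depthOf.get(url) ?? 0;
    for (const next of links.get(url) ?? []) {
      if (depthOf.has(next) || !isAllowed(next)) continue; // visited or blocked
      depthOf.set(next, depth + 1);
      deepest = Math.max(deepest, depth + 1);
      queue.push(next);
    }
  }
  return deepest;
}
```

A real crawler would fetch each host's robots.txt once, cache the parsed rules, and pass a check built on `isPathAllowed` as the `isAllowed` callback. Breadth-first order is used here because it records the shortest link distance from the start page to every page it reaches.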
