Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteer users who install an extension in their browsers. When a user visits a webpage, the URL is anonymously added to the Archive.org database.
JSON representation
- Host: GitHub
- URL: https://github.com/hudson-newey/user-web-crawler
- Owner: hudson-newey
- License: unlicense
- Created: 2020-04-28T08:19:04.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-08T14:53:16.000Z (over 1 year ago)
- Last Synced: 2024-06-21T17:49:44.057Z (5 months ago)
- Topics: archive, crawler, open-internet
- Language: Go
- Homepage:
- Size: 26.4 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# User Web Crawler
The user web crawler is a website indexer built from the websites a browser navigates to.
## How it works
The user web crawler works through volunteer users who install an extension in their browsers. When a user visits a webpage, its URL is anonymously added to an upstream database that holds all the unique webpages.
_Note: There is currently no centralized database that the data is pushed to. To start logging data, you will need to set up your own backend service._
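Since the repository leaves the backend unspecified, below is a minimal sketch, in Go (the project's language), of what such an upstream service might look like. The `/log` endpoint, the port, and the in-memory map are assumptions for illustration, not part of this project:

```go
// Illustrative backend sketch - not part of this repository.
// Accepts crawled URLs via POST /log and keeps the set of unique
// pages in memory. Endpoint name, port, and storage are assumptions.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	var mu sync.Mutex
	seen := make(map[string]struct{}) // unique webpages only

	http.HandleFunc("/log", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(r.Body) // request body carries the bare URL
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}

		mu.Lock()
		_, dup := seen[string(body)]
		seen[string(body)] = struct{}{}
		mu.Unlock()

		if dup {
			fmt.Fprintln(w, "already indexed")
			return
		}
		fmt.Fprintln(w, "indexed")
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Deduplicating on the server keeps the upstream store limited to unique webpages, matching the description above.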
## Usage
```sh
go run server.go
```
Install the [Tampermonkey](https://github.com/Tampermonkey/tampermonkey) browser extension.
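The extension side is a Tampermonkey userscript (JavaScript) that reports each visited URL. Purely for illustration, a hypothetical Go client exercising the same assumed `/log` endpoint from the sketch above could look like:

```go
// Hypothetical client illustrating the push protocol: report a visited
// URL to the upstream service. In the real setup the browser userscript
// does this; the /log endpoint and port are the same assumptions as above.
package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	visited := "https://example.com/some-page"

	// POST the bare URL as the request body, mirroring the assumed endpoint.
	resp, err := http.Post("http://localhost:8080/log", "text/plain",
		strings.NewReader(visited))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("server replied with status:", resp.Status)
}
```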
Run the following Python 3 script when you want to push your collected data to the upstream database:
```sh
python3 ./commit.py
```