Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteer users who install an extension in their browsers. When a user visits a webpage, the URL is anonymously added to the Archive.org database.
JSON representation
- Host: GitHub
- URL: https://github.com/hudson-newey/user-web-crawler
- Owner: hudson-newey
- License: unlicense
- Created: 2020-04-28T08:19:04.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-08T14:53:16.000Z (over 1 year ago)
- Last Synced: 2024-06-21T17:49:44.057Z (5 months ago)
- Topics: archive, crawler, open-internet
- Language: Go
- Homepage:
- Size: 26.4 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# User Web Crawler
The user web crawler is a website indexer built from the websites a browser navigates to.
## How it works
The user web crawler works through volunteer users who install an extension in their browsers. When a user visits a webpage, its URL is anonymously added to an upstream database that holds all the unique webpages.
_Note: There is currently no centralized database that the data is pushed to. To start logging data, you will need to set up your own backend service._
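Since the repository leaves the backend unspecified, below is a minimal sketch, in Go (the project's language), of what such an upstream service might look like. The `/log` endpoint, the port, and the in-memory map are assumptions for illustration, not part of this project:

```go
// Illustrative backend sketch - not part of this repository.
// Accepts crawled URLs via POST /log and keeps the set of unique
// pages in memory. Endpoint name, port, and storage are assumptions.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	var mu sync.Mutex
	seen := make(map[string]struct{}) // unique webpages only

	http.HandleFunc("/log", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(r.Body) // request body carries the bare URL
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}

		mu.Lock()
		_, dup := seen[string(body)]
		seen[string(body)] = struct{}{}
		mu.Unlock()

		if dup {
			fmt.Fprintln(w, "already indexed")
			return
		}
		fmt.Fprintln(w, "indexed")
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Deduplicating on the server keeps the upstream store limited to unique webpages, matching the description above.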
## Usage
```sh
go run server.go
```
Install the [Tampermonkey](https://github.com/Tampermonkey/tampermonkey) browser extension.
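The extension side is a Tampermonkey userscript (JavaScript) that reports each visited URL. Purely for illustration, a hypothetical Go client exercising the same assumed `/log` endpoint from the sketch above could look like:

```go
// Hypothetical client illustrating the push protocol: report a visited
// URL to the upstream service. In the real setup the browser userscript
// does this; the /log endpoint and port are the same assumptions as above.
package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	visited := "https://example.com/some-page"

	// POST the bare URL as the request body, mirroring the assumed endpoint.
	resp, err := http.Post("http://localhost:8080/log", "text/plain",
		strings.NewReader(visited))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("server replied with status:", resp.Status)
}
```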
Run the following Python 3 script when you want to push your collected data to the upstream database:
```sh
python3 ./commit.py
```