https://github.com/notreeceharris/www-crawler
🕷️⚡ A lightning-fast web crawler, designed to crawl the entire internet.
indexer internet-crawler webcrawler
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/notreeceharris/www-crawler
- Owner: NotReeceHarris
- License: gpl-3.0
- Created: 2024-06-03T15:48:03.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-05T16:01:01.000Z (about 2 years ago)
- Last Synced: 2025-03-01T17:12:00.219Z (over 1 year ago)
- Topics: indexer, internet-crawler, webcrawler
- Language: Go
- Homepage:
- Size: 98.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🕷️ www-crawler
A lightning-fast ⚡ web crawler, designed to crawl the entire internet.
# Setup and Build
To build from source, you will need to have Go installed. If you use the pre-built binaries, this step is not necessary.
```bash
# Individual steps
$ git clone https://github.com/NotReeceHarris/www-crawler
$ cd www-crawler
$ go build -o crawler ./src/.
$ ./crawler
# One command
$ git clone https://github.com/NotReeceHarris/www-crawler && cd www-crawler && go build -o crawler ./src/. && ./crawler
```
On Windows, you will need a 64-bit GCC toolchain. Additionally, set the environment variable **`CGO_ENABLED=1`** when building.
# Database structure
```db
Table domains {
  id integer [primary key]
  domain text
}

Table paths {
  id integer [primary key]
  domain integer
  path text
  secure bool
  httpCode text
  scanned bool
  onHold bool
}

Table links {
  id integer [primary key]
  parent integer
  child integer
}

Table emails {
  id integer [primary key]
  email integer
  path integer
}

Ref: paths.domain > domains.id
Ref: emails.path > paths.id
Ref: links.parent > paths.id
Ref: links.child > paths.id
```
https://dbdiagram.io/d/665ed3e4b65d9338797257df
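
To make the schema above more concrete, here is a minimal Go sketch of how its rows might be represented and how a URL could be split into the `domains.domain`, `paths.path` and `paths.secure` columns. It is not taken from the crawler's source: the struct names and the `SplitURL` helper are hypothetical, and it assumes `secure` means "fetched over https". Field types follow the schema as written, so `Email.Email` stays an integer.

```go
package main

import (
	"fmt"
	"net/url"
)

// Domain mirrors a row of the domains table.
type Domain struct {
	ID     int
	Domain string
}

// Path mirrors a row of the paths table.
type Path struct {
	ID       int
	Domain   int // references domains.id
	Path     string
	Secure   bool // assumption: true when the URL uses https
	HTTPCode string
	Scanned  bool
	OnHold   bool
}

// Link mirrors a row of the links table: an edge from one path to another.
type Link struct {
	ID     int
	Parent int // references paths.id
	Child  int // references paths.id
}

// Email mirrors a row of the emails table; email is an integer in the schema.
type Email struct {
	ID    int
	Email int
	Path  int // references paths.id
}

// SplitURL is a hypothetical helper that breaks an absolute URL into the
// values stored in domains.domain, paths.path and paths.secure.
func SplitURL(raw string) (domain, path string, secure bool, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", false, err
	}
	if u.Host == "" {
		return "", "", false, fmt.Errorf("no host in %q", raw)
	}
	path = u.EscapedPath()
	if path == "" {
		path = "/" // treat a bare host as its root path
	}
	return u.Hostname(), path, u.Scheme == "https", nil
}

func main() {
	domain, path, secure, err := SplitURL("https://example.com/about")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	d := Domain{ID: 1, Domain: domain}
	p := Path{ID: 1, Domain: d.ID, Path: path, Secure: secure}
	fmt.Printf("%+v\n%+v\n", d, p)
}
```

Storing the host once in `domains` and pointing each `paths` row at it keeps repeated hostnames out of the much larger paths table, and `links` then only needs two integer ids per edge.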