https://github.com/farghul/googlebot
Discover if Googlebots are dominating your NGINX logs.
- Host: GitHub
- URL: https://github.com/farghul/googlebot
- Owner: farghul
- License: unlicense
- Created: 2024-07-02T22:36:22.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-16T01:10:03.000Z (over 1 year ago)
- Last Synced: 2025-06-04T20:56:57.856Z (about 1 year ago)
- Topics: googlebot, nginx, rust
- Language: Rust
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# Googlebot Finder
Googlebot Finder downloads, unzips, filters, and analyzes log files to highlight NGINX requests from Googlebots.

## Prerequisites
The following variables must be declared in a `tasks/vars.rs` file:
- SERVERS: Array of applicable servers.
- IDENTITY: SSH credentials plus folder path (e.g. `username@server:/folder/`).
- PREFIX: Base path to store all files.
- TARGET: The site URL to investigate.
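
As a rough illustration, a `tasks/vars.rs` file might look like the sketch below. The types (`&str` constants and a fixed-size array) and all values are assumptions made for this example; the project's actual declarations may differ, so adjust them to match what the tasks expect.

```rust
// tasks/vars.rs (hypothetical values; replace every one with your own)

/// Servers whose NGINX logs should be collected (placeholder names).
pub const SERVERS: [&str; 2] = ["server_one", "server_two"];

/// SSH credentials plus the remote folder path holding the logs.
pub const IDENTITY: &str = "username@server:/folder/";

/// Base path under which all downloaded and generated files are stored.
pub const PREFIX: &str = "/home/username/logs/";

/// The site URL to investigate (placeholder domain).
pub const TARGET: &str = "www.example.com";
```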
## Run
Navigate to the folder containing your *src* folder and run:
``` zsh
./googlebot [task] [month]
```
## Example
``` zsh
./googlebot filter july
```
Available tasks:
- **download**: Download the zipped (.gz) log files from the named server.
- **unzip**: Decompress the .gz files previously downloaded.
- **filter**: Create a file containing only hits to the target site.
- **divide**: Divide the filtered file into one file containing Googlebot hits and another containing everything else.
- **capture**: Capture all existing search strings.
- **analyze**: Discover if search strings are repeated.
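
The **divide** and **analyze** steps can be pictured with the minimal sketch below. It is not the project's implementation: `is_googlebot`, `divide`, and `repeated` are hypothetical helpers, and matching on the substring "googlebot" (case-insensitive) is an assumed rule that relies on the user agent being part of each log line.

```rust
use std::collections::HashMap;

/// Returns true when a log line mentions Googlebot anywhere (case-insensitive).
/// Assumption: the user agent appears somewhere in each access-log line.
fn is_googlebot(line: &str) -> bool {
    line.to_ascii_lowercase().contains("googlebot")
}

/// Splits log lines into (Googlebot hits, everything else).
fn divide<'a>(lines: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    lines.iter().copied().partition(|line| is_googlebot(line))
}

/// Returns every string that occurs more than once, with its count,
/// ordered by descending count and then alphabetically.
fn repeated<'a>(items: &[&'a str]) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for &item in items {
        *counts.entry(item).or_insert(0) += 1;
    }
    let mut result: Vec<(&'a str, usize)> =
        counts.into_iter().filter(|(_, n)| *n > 1).collect();
    result.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    result
}
```

Calling `divide` on the filtered lines yields the two groups that would be written to separate files, and `repeated` applied to the captured search strings lists the ones that recur.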
**Note**: Tasks depend on a *PREFIX/server_name/type/month* file structure (e.g. `~/iss/unzipped/june/`) and assume that compressed log files follow the *nginx_access.log-20230922.gz* naming scheme.
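
To make the layout and naming scheme concrete, here is a small sketch. The helper names `stage_dir` and `log_date` are hypothetical and not taken from the project, and the reading of the trailing digits as a `YYYYMMDD` date stamp is an assumption based on the example file name.

```rust
use std::path::PathBuf;

/// Builds the PREFIX/server_name/type/month directory path.
fn stage_dir(prefix: &str, server: &str, kind: &str, month: &str) -> PathBuf {
    [prefix, server, kind, month].iter().collect()
}

/// Extracts the eight-digit stamp from a name such as
/// `nginx_access.log-20230922.gz`; returns None for any other shape.
fn log_date(file_name: &str) -> Option<&str> {
    let stamp = file_name
        .strip_prefix("nginx_access.log-")?
        .strip_suffix(".gz")?;
    let valid = stamp.len() == 8 && stamp.bytes().all(|b| b.is_ascii_digit());
    valid.then_some(stamp)
}
```

With these helpers, `stage_dir("~", "iss", "unzipped", "june")` corresponds to the `~/iss/unzipped/june` example above. Rust does not expand `~` on its own, so a real PREFIX would likely be an absolute path.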
## License
Code is distributed under [The Unlicense](https://github.com/farghul/googlebot/blob/main/LICENSE.md) and is in the public domain.