https://github.com/slmt/google-load-parser
A parser to extract the CPU usage timeline from Google's cluster-usage trace data.
https://github.com/slmt/google-load-parser
cluster google monitoring parser resources
Last synced: 8 months ago
JSON representation
A parser to extract the CPU usage timeline from Google's cluster-usage trace data.
- Host: GitHub
- URL: https://github.com/slmt/google-load-parser
- Owner: SLMT
- License: mit
- Created: 2019-04-01T08:11:13.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-09-24T09:28:56.000Z (over 3 years ago)
- Last Synced: 2025-08-25T11:59:42.092Z (9 months ago)
- Topics: cluster, google, monitoring, parser, resources
- Language: Rust
- Size: 14.6 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Google Load Parser
This program is designed to parse and transfer the resource usage files downloaded from [Google cluster-usage traces][1] into the timeline sequences of CPU usage for each machine. Note that we only need the files in `task_usage` directory of the trace.
## Requirements
In order to build the source, you need `cargo`, which can be downloaded via [rustup][2], the Rust toolchain installer.
## Building
Use the following command to build the program. We recommend to build the program in `RELEASE` mode, which makes the program run much faster.
```bash
> cargo build --release
```
Then, you will find the executable `google-load-parser.exe` in `target/release`.
## Usage
```txt
google-load-parser v1.1.0
SLMT
Parse the load trace of Google's testing cluster
USAGE:
google-load-parser.exe [SUBCOMMAND]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
SUBCOMMANDS:
help Prints this message or the help of the given subcommand(s)
transfer Transfers the files in the given directory to daily timeline files
trim Trims the files in the given directory to leave only necessary data
```
You can add `RUST_LOG=DEBUG` in front of the command to show debugging information.
```bash
> RUST_LOG=DEBUG google-load-parser.exe [SUBCOMMAND]
```
This program currently supports two sub commands:
- `trim`
- `transfer`
### Trimming
```txt
Trims the files in the given directory to leave only necessary data
USAGE:
google-load-parser.exe trim
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
ARGS:
the directory containing input files
the directory for placing output files
```
Since the downloaded files are compressed as `gz` files, we need to decompress the files before processing them. This command decompresses the `gz` files in the given `[INPUT DIR]` directory and trims unnecessary information from the files such that the output files in `[OUTPUT DIR]` have only the necessary information we need.
An example of the content of one of decompressed file:
```csv
5612000000,5700000000,4665712499,369,4820204869,0.03143,0.05389,0.06946,0.005997,0.006645,0.05408,7.629e-05,0.0003834,0.2415,0.002571,2.911,,0,0,0.02457
5612000000,5700000000,4665712499,369,4820204869,0.03143,0.05389,0.06946,0.005997,0.006645,0.05408,7.629e-05,0.0003834,0.2415,0.002571,2.911,,0,0,0.02457
5612000000,5700000000,4665712499,798,3349189123,0.02698,0.06714,0.07715,0.004219,0.004868,0.06726,7.915e-05,0.0003681,0.27,0.00293,3.285,0.008261,0,0,0.01608
...
```
After performing the trimming:
```csv
5612000000,5700000000,3349189123,0.02698
5612000000,5700000000,372630265,0.04114
5612000000,5700000000,1437119225,0.07275
...
```
This can greatly reduces the size of files.
### Transferring
```txt
Transfers the files in the given directory to daily timeline files
USAGE:
google-load-parser.exe transfer [SLOT LENGTH]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
ARGS:
the directory containing input files
the directory for placing output files
the length of time slot (in seconds) [default: 60]
```
This command processes the trimmed files in the given `[INPUT DIR]` directory into the daily timeline files, where each row represents the changing of CPU usage of a machine in the form of timeline in a day. You can specify the length of a time slot in seconds, which defines the sample rate of the timeline.
Note that the program maps machine ids from its original domain into [1~15000] in order to save the size of the files.
## License
MIT
[1]: https://github.com/google/cluster-data
[2]: https://rustup.rs/