https://github.com/amd/node-scraper
https://github.com/amd/node-scraper
Last synced: 5 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/amd/node-scraper
- Owner: amd
- License: mit
- Created: 2025-06-05T16:12:11.000Z (about 1 year ago)
- Default Branch: development
- Last Pushed: 2025-06-25T13:43:19.000Z (12 months ago)
- Last Synced: 2025-06-25T14:43:41.441Z (12 months ago)
- Language: Python
- Size: 390 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
README
# Node Scraper
Node Scraper is a tool which performs automated data collection and analysis for the purposes of
system debug.
## Table of Contents
- [Installation](#installation)
- [Install From Source](#install-from-source)
- [CLI Usage](#cli-usage)
- [Execution Methods](#execution-methods)
- [Example: Remote Execution](#example-remote-execution)
- [Example: connection_config.json](#example-connection_configjson)
- [Subcommands](#subcommands)
- ['describe' subcommand](#describe-subcommand)
- ['run-plugins' sub command](#run-plugins-sub-command)
- ['gen-plugin-config' sub command](#gen-plugin-config-sub-command)
- ['summary' sub command](#summary-sub-command)
- [Configs](#configs)
- [Global args](#global-args)
- [Plugin config: `--plugin-configs` command](#plugin-config---plugin-configs-command)
- [Reference config: `gen-reference-config` command](#reference-config-gen-reference-config-command)
- **Extending Node Scraper (integration & external plugins)** → See [EXTENDING.md](EXTENDING.md)
- **Full view of the plugins with the associated collectors & analyzers as well as the commands
invoked by collectors** -> See [docs/PLUGIN_DOC.md](docs/PLUGIN_DOC.md)
## Installation
### Install From Source
Node Scraper requires Python 3.9+ for installation. After cloning this repository,
call dev-setup.sh script with 'source'. This script creates an editable install of Node Scraper in
a python virtual environment and also configures the pre-commit hooks for the project.
```sh
source dev-setup.sh
```
Alternatively, follow these manual steps:
### 1. Virtual Environment (Optional)
```sh
python3 -m venv venv
source venv/bin/activate
```
On Debian/Ubuntu, you may need: `sudo apt install python3-venv`
### 2. Install from Source (Required)
```sh
python3 -m pip install --editable .[dev] --upgrade
```
This installs Node Scraper in editable mode with development dependencies. To verify: `node-scraper --help`
### 3. Git Hooks (Optional)
```sh
pre-commit install
```
Sets up pre-commit hooks for code quality checks. On Debian/Ubuntu, you may need: `sudo apt install pre-commit`
## CLI Usage
The Node Scraper CLI can be used to run Node Scraper plugins on a target system. The following CLI
options are available:
```sh
usage: node-scraper [-h] [--sys-name STRING] [--sys-location {LOCAL,REMOTE}] [--sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}] [--sys-sku STRING]
[--sys-platform STRING] [--plugin-configs [STRING ...]] [--system-config STRING] [--connection-config STRING] [--log-path STRING]
[--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}] [--gen-reference-config] [--skip-sudo]
{summary,run-plugins,describe,gen-plugin-config} ...
node scraper CLI
positional arguments:
{summary,run-plugins,describe,gen-plugin-config}
Subcommands
summary Generates summary csv file
run-plugins Run a series of plugins
describe Display details on a built-in config or plugin
gen-plugin-config Generate a config for a plugin or list of plugins
options:
-h, --help show this help message and exit
--sys-name STRING System name (default: )
--sys-location {LOCAL,REMOTE}
Location of target system (default: LOCAL)
--sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}
Specify system interaction level, used to determine the type of actions that plugins can perform (default: INTERACTIVE)
--sys-sku STRING Manually specify SKU of system (default: None)
--sys-platform STRING
Specify system platform (default: None)
--plugin-configs [STRING ...]
built-in config names or paths to plugin config JSONs. Available built-in configs: NodeStatus (default: None)
--system-config STRING
Path to system config json (default: None)
--connection-config STRING
Path to connection config json (default: None)
--log-path STRING Specifies local path for node scraper logs, use 'None' to disable logging (default: .)
--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
Change python log level (default: INFO)
--gen-reference-config
Generate reference config from system. Writes to ./reference_config.json. (default: False)
--skip-sudo Skip plugins that require sudo permissions (default: False)
```
### Execution Methods
Node Scraper can operate in two modes: LOCAL and REMOTE, determined by the `--sys-location` argument.
- **LOCAL** (default): Node Scraper is installed and run directly on the target system. All data collection and plugin execution occur locally.
- **REMOTE**: Node Scraper runs on your local machine but targets a remote system over SSH. In this mode, Node Scraper does not need to be installed on the remote system; all commands are executed remotely via SSH.
To use remote execution, specify `--sys-location REMOTE` and provide a connection configuration file with `--connection-config`.
#### Example: Remote Execution
```sh
node-scraper --sys-name --sys-location REMOTE --connection-config ./connection_config.json run-plugins DmesgPlugin
```
##### Example: connection_config.json
```json
{
"InBandConnectionManager": {
"hostname": "remote_host.example.com",
"port": 22,
"username": "myuser",
"password": "mypassword",
"key_filename": "/path/to/private/key"
}
}
```
**Notes:**
- If using SSH keys, specify `key_filename` instead of `password`.
- The remote user must have permissions to run the requested plugins and access required files. If needed, use the `--skip-sudo` argument to skip plugins requiring sudo.
### Subcommands
Plugins to run can be specified in two ways, using a plugin JSON config file or using the
'run-plugins' sub command. These two options are not mutually exclusive and can be used together.
#### **'describe' subcommand**
You can use the `describe` subcommand to display details about built-in configs or plugins.
List all built-in configs:
```sh
node-scraper describe config
```
Show details for a specific built-in config
```sh
node-scraper describe config
```
List all available plugins**
```sh
node-scraper describe plugin
```
Show details for a specific plugin
```sh
node-scraper describe plugin
```
#### **'run-plugins' sub command**
The plugins to run and their associated arguments can also be specified directly on the CLI using
the 'run-plugins' sub-command. Using this sub-command you can specify a plugin name followed by
the arguments for that particular plugin. Multiple plugins can be specified at once.
You can view the available arguments for a particular plugin by running
`node-scraper run-plugins -h`:
```sh
usage: node-scraper run-plugins BiosPlugin [-h] [--collection {True,False}] [--analysis {True,False}] [--system-interaction-level STRING]
[--data STRING] [--exp-bios-version [STRING ...]] [--regex-match {True,False}]
options:
-h, --help show this help message and exit
--collection {True,False}
--analysis {True,False}
--system-interaction-level STRING
--data STRING
--exp-bios-version [STRING ...]
--regex-match {True,False}
```
Examples
Run a single plugin
```sh
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123
```
Run multiple plugins
```sh
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123 RocmPlugin --exp-rocm TestRocm123
```
Run plugins without specifying args (plugin defaults will be used)
```sh
node-scraper run-plugins BiosPlugin RocmPlugin
```
Use plugin configs and 'run-plugins'
```sh
node-scraper run-plugins BiosPlugin
```
#### **'gen-plugin-config' sub command**
The 'gen-plugin-config' sub command can be used to generate a plugin config JSON file for a plugin
or list of plugins that can then be customized. Plugin arguments which have default values will be
prepopulated in the JSON file, arguments without default values will have a value of 'null'.
Examples
Generate a config for the DmesgPlugin:
```sh
node-scraper gen-plugin-config --plugins DmesgPlugin
```
This would produce the following config:
```json
{
"global_args": {},
"plugins": {
"DmesgPlugin": {
"collection": true,
"analysis": true,
"system_interaction_level": "INTERACTIVE",
"data": null,
"analysis_args": {
"analysis_range_start": null,
"analysis_range_end": null,
"check_unknown_dmesg_errors": true,
"exclude_category": null,
"interval_to_collapse_event": 60,
"num_timestamps": 3
}
}
},
"result_collators": {}
}
```
**Running DmesgPlugin with a dmesg log file:**
Instead of collecting dmesg from the system, you can analyze a pre-existing dmesg log file using the `--data` argument:
```sh
node-scraper --run-plugins DmesgPlugin --data /path/to/dmesg.log --collection False
```
This will skip the collection phase and directly analyze the provided dmesg.log file.
**Custom Error Regex Example:**
You can extend the built-in error detection with custom regex patterns. Create a config file with custom error patterns:
```json
{
"global_args": {},
"plugins": {
"DmesgPlugin": {
"analysis_args": {
"check_unknown_dmesg_errors": false,
"interval_to_collapse_event": 60,
"num_timestamps": 3,
"error_regex": [
{
"regex": "MY_CUSTOM_ERROR.*",
"message": "My Custom Error Detected",
"event_category": "SW_DRIVER",
"event_priority": 3
},
{
"regex": "APPLICATION_CRASH: .*",
"message": "Application Crash",
"event_category": "SW_DRIVER",
"event_priority": 4
}
]
}
}
},
"result_collators": {}
}
```
Save this to `dmesg_custom_config.json` and run:
```sh
node-scraper --plugin-configs dmesg_custom_config.json run-plugins DmesgPlugin
```
#### **'summary' sub command**
The 'summary' subcommand can be used to combine results from multiple runs of node-scraper to a
single summary.csv file. Sample run:
```sh
node-scraper summary --search-path /
```
This will generate a new file '//summary.csv' file. This file will
contain the results from all 'nodescraper.csv' files from '/'.
### Configs
A plugin JSON config should follow the structure of the plugin config model defined here.
The globals field is a dictionary of global key-value pairs; values in globals will be passed to
any plugin that supports the corresponding key. The plugins field should be a dictionary mapping
plugin names to sub-dictionaries of plugin arguments. Lastly, the result_collators attribute is
used to define result collator classes that will be run on the plugin results. By default, the CLI
adds the TableSummary result collator, which prints a summary of each plugin’s results in a
tabular format to the console.
```json
{
"globals_args": {},
"plugins": {
"BiosPlugin": {
"analysis_args": {
"exp_bios_version": "TestBios123"
}
},
"RocmPlugin": {
"analysis_args": {
"exp_rocm_version": "TestRocm123"
}
}
}
}
```
#### Global args
Global args can be used to skip sudo plugins or enable/disble either collection or analysis.
Below is an example that skips sudo requiring plugins and disables analysis.
```json
"global_args": {
"collection_args": {
"skip_sudo" : 1
},
"collection" : 1,
"analysis" : 0
},
```
#### Plugin config: **'--plugin-configs' command**
A plugin config can be used to compare the system data against the config specifications:
```sh
node-scraper --plugin-configs plugin_config.json
```
Here is an example of a comprehensive plugin config that specifies analyzer args for each plugin:
```json
{
"global_args": {},
"plugins": {
"BiosPlugin": {
"analysis_args": {
"exp_bios_version": "3.5"
}
},
"CmdlinePlugin": {
"analysis_args": {
"cmdline": "imgurl=test NODE=nodename selinux=0 serial console=ttyS1,115200 console=tty0",
"required_cmdline" : "selinux=0"
}
},
"DkmsPlugin": {
"analysis_args": {
"dkms_status": "amdgpu/6.11",
"dkms_version" : "dkms-3.1",
"regex_match" : true
}
},
"KernelPlugin": {
"analysis_args": {
"exp_kernel": "5.11-generic"
}
},
"OsPlugin": {
"analysis_args": {
"exp_os": "Ubuntu 22.04.2 LTS"
}
},
"PackagePlugin": {
"analysis_args": {
"exp_package_ver": {
"gcc": "11.4.0"
},
"regex_match": false
}
},
"RocmPlugin": {
"analysis_args": {
"exp_rocm": "6.5"
}
}
},
"result_collators": {},
"name": "plugin_config",
"desc": "My golden config"
}
```
#### Reference config: **'gen-reference-config' command**
This command can be used to generate a reference config that is populated with current system
configurations. Plugins that use analyzer args (where applicable) will be populated with system
data.
Sample command:
```sh
node-scraper --gen-reference-config run-plugins BiosPlugin OsPlugin
```
This will generate the following config:
```json
{
"global_args": {},
"plugins": {
"BiosPlugin": {
"analysis_args": {
"exp_bios_version": [
"M17"
],
"regex_match": false
}
},
"OsPlugin": {
"analysis_args": {
"exp_os": [
"8.10"
],
"exact_match": true
}
}
},
"result_collators": {}
```
This config can later be used on a different platform for comparison, using the steps at #2:
```sh
node-scraper --plugin-configs reference_config.json
```
An alternate way to generate a reference config is by using log files from a previous run. The
example below uses log files from 'scraper_logs_/':
```sh
node-scraper gen-plugin-config --gen-reference-config-from-logs scraper_logs_/ --output-path custom_output_dir
```
This will generate a reference config that includes plugins with logged results in
'scraper_log_' and save the new config to 'custom_output_dir/reference_config.json'.