https://github.com/masoncl/rsched
Kernel process scheduler statistics
# rsched - Rust Scheduling Delay Tracker
A BPF-based tool written in Rust that tracks kernel metrics related to
scheduling. The goal is to give a high level overview of kernel process
scheduling decisions with relatively low overhead.
## Overview
rsched uses tracepoints and performance counters to monitor:
- **Scheduling delays**: Time spent waiting in the runqueue
- **Time slice analysis**: Duration and preemption statistics
- **Runqueue depth**: Number of tasks waiting when processes are woken
- **Wakeup delays**: Time from sched_waking to sched_switch
- **schedstats**: System-wide counters read from /proc/schedstat
- **CPU cycles and instructions**: Both user and kernel mode
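The central measurement, scheduling delay, is the gap between a task being woken and the moment it actually gets a CPU. A minimal model of that bookkeeping, written in Python rather than the tool's actual BPF C (the handler names here are illustrative, not rsched's real code):

```python
# Simplified model of runqueue-delay tracking: stamp the clock at
# sched_wakeup, subtract it when sched_switch puts the task on a CPU.
# This mirrors the tracepoint flow, not rsched's implementation.

wakeup_ts = {}   # pid -> nanosecond timestamp of last wakeup
delays_us = []   # observed runqueue delays, in microseconds

def on_sched_wakeup(pid, ts_ns):
    wakeup_ts[pid] = ts_ns

def on_sched_switch(next_pid, ts_ns):
    start = wakeup_ts.pop(next_pid, None)
    if start is not None:
        delays_us.append((ts_ns - start) / 1000)

# Tiny synthetic trace: task 42 woken at t=0, scheduled at t=250us.
on_sched_wakeup(42, 0)
on_sched_switch(42, 250_000)
print(delays_us)  # [250.0]
```

With `-w`, rsched additionally starts the clock at sched_waking, capturing the earlier portion of the wakeup path as well.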
## Prerequisites
- Linux kernel 5.4+ with BPF support
- Rust toolchain
- libbpf development files
- Root privileges (required for BPF)
## Installation
```bash
# Install dependencies (Debian/Ubuntu package names)
sudo apt install libbpf-dev clang llvm

# Clone and build
git clone https://github.com/masoncl/rsched
cd rsched
cargo build --release
```
### Generate vmlinux.h
rsched ships with a vmlinux.h from libbpf-tools. If it is out of date for your
kernel, regenerate it:
```bash
bpftool btf dump file /sys/kernel/btf/vmlinux format c > src/bpf/vmlinux.h
```
## Usage
```bash
sudo ./target/release/rsched [OPTIONS]
```
### Options
- `-d, --detailed`: Show detailed per-process and per-CPU output
- `-i, --interval <SECS>`: Update interval in seconds (default: 1)
- `-r, --run-time <SECS>`: Run for the specified duration, then exit
- `-c, --comm <REGEX>`: Filter by command name regex
- `-p, --pid <PID>`: Filter by specific PID
- `-l, --min-latency <USEC>`: Minimum latency threshold in microseconds
- `-C, --no-collapse`: Don't collapse/aggregate by command name
- `-w, --trace-sched-waking`: Trace sched_waking events (adds overhead)
- `-s, --schedstat`: Enable schedstats and output them
- `-m, --cpu-metrics`: Report cycles and IPC per process
### Examples
```bash
# Basic usage - shows grouped output
sudo ./target/release/rsched
# Run for 30 seconds then exit
sudo ./target/release/rsched -r 30
# Show detailed per-process statistics
sudo ./target/release/rsched -d
# Filter for processes with >100μs latency
sudo ./target/release/rsched -l 100
# Track only processes matching "schbench"
sudo ./target/release/rsched -c "schbench.*"
# Show each schbench worker pid as its own entry (no aggregation)
sudo ./target/release/rsched -C -c "schbench.*"
# Enable waking delay tracking for 60 seconds
sudo ./target/release/rsched -w -r 60
```
## Output Interpretation
### Scheduling Delays
Shows time processes spend in the runqueue waiting to be scheduled:
- **p50/p90/p99**: Percentile latencies in microseconds
- **COUNT**: Number of scheduling events
- Grouped by latency ranges in default mode
```
High (1-10ms) (1 entries):
COMMAND PROCS p50 p90 p99 COUNT PIDs
schbench-msg 4 5 1446 3879 43026 2190965,2190966,2190967,...(+1)
Very High (>10ms) (2 entries):
COMMAND PROCS p50 p90 p99 COUNT PIDs
schbench-worker 1024 3479 20397 62750 483127 2190969,2190970,2190971,...(+1021)
schbench 1 9 14563 24576 34 2190964
```
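The p50/p90/p99 columns are estimated from histograms rather than raw samples. The sketch below shows percentile estimation over power-of-two latency buckets, a layout common in BPF tools (an assumption here, not rsched's verified internals):

```python
# Estimate a percentile from a log2 histogram, where buckets[i] counts
# samples in the range [2**i, 2**(i+1)) microseconds.

def percentile_from_hist(buckets, pct):
    total = sum(buckets)
    if total == 0:
        return 0
    target = total * pct / 100
    seen = 0
    for i, count in enumerate(buckets):
        seen += count
        if seen >= target:
            return 2 ** i  # lower edge of the bucket holding the percentile
    return 2 ** (len(buckets) - 1)

hist = [0, 10, 50, 30, 8, 2]  # mostly 4-16us samples
print(percentile_from_hist(hist, 50))  # 4
```

Histograms keep the BPF side cheap (a fixed array of counters per process) at the cost of percentile resolution bounded by bucket width.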
### Time Slice Statistics
Tracks how long processes run before context switching:
- **VOLUNTARY**: Process yielded the CPU voluntarily, with p50/p90 time slice lengths
- **PREEMPTED**: Process was forcibly preempted, with p50/p90 time slice lengths
- **PREEMPT%**: Percentage of switches that were involuntary
```
COMMAND PROCS INVOL_COUNT VOLUNTARY(p50/p90) PREEMPTED(p50/p90) PREEMPT%
schbench-msg 4 413 - 94307/123612 100.0 %
schbench-worker 1024 386 5/9 14/19 0.0 %
schbench 1 0 18/50 - 0.0 %
```
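The PREEMPT% column follows directly from the two switch counts. Assuming it is the share of involuntary switches over all switches (consistent with the sample output above), the arithmetic is:

```python
# PREEMPT% = involuntary switches / total switches. A switch is voluntary
# when the previous task blocked; otherwise the scheduler preempted it.

def preempt_pct(voluntary, preempted):
    total = voluntary + preempted
    return 100.0 * preempted / total if total else 0.0

print(f"{preempt_pct(0, 413):.1f} %")    # 100.0 %
print(f"{preempt_pct(413, 0):.1f} %")    # 0.0 %
```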
### Per-process sleep duration
Time spent sleeping between sched_switch and sched_wakeup
```
Medium (100μs-1ms) (1 entries):
COMMAND PROCS p50 p90 p99 COUNT
schbench-worker 1024 215 384 453 46712685
Very High (>10ms) (1 entries):
COMMAND PROCS p50 p90 p99 COUNT
schbench 1 757304 990321 990321 9
```
### Runqueue Depth
Shows `rq->nr_running` when each process was woken
```
COMMAND PROCS p50 p90 p99 COUNT
schbench-worker 1024 1 1 2 93226439
schbench 1 1 1 1 20
```
### schedstats
Schedstat deltas are reported per update interval (`-i N`):
```
=== System-wide Schedstat Metrics (deltas) ===
alb_count 3 | lb_hot_gained_idle 0 | lb_nobusyg_idle 290796
alb_failed 0 | lb_hot_gained_newly_idle 0 | lb_nobusyg_newly_idle 8625
alb_pushed 3 | lb_hot_gained_not_idle 0 | lb_nobusyg_not_idle 329
lb_balance_idle 302433 | lb_imbalance_load_idle 0 | lb_nobusyq_idle 32
lb_balance_newly_idle 8984 | lb_imbalance_load_newly_idle 0 | lb_nobusyq_newly_idle 99
lb_balance_not_idle 344 | lb_imbalance_load_not_idle 0 | lb_nobusyq_not_idle 0
lb_count_idle 303118 | lb_imbalance_misfit_idle 0 | sbe_balanced 0
lb_count_newly_idle 12249 | lb_imbalance_misfit_newly_idle 0 | sbe_cnt 0
lb_count_not_idle 344 | lb_imbalance_misfit_not_idle 0 | sbe_pushed 0
lb_failed_idle 535 | lb_imbalance_task_idle 2328 | sbf_balanced 0
lb_failed_newly_idle 3110 | lb_imbalance_task_newly_idle 39145 | sbf_cnt 0
lb_failed_not_idle 0 | lb_imbalance_task_not_idle 0 | sbf_pushed 0
lb_gained_idle 159 | lb_imbalance_util_idle 0 | ttwu_move_affine 3263
lb_gained_newly_idle 159 | lb_imbalance_util_newly_idle 0 | ttwu_move_balance 0
lb_gained_not_idle 0 | lb_imbalance_util_not_idle 0 | ttwu_wake_remote 22703335
=== CPU Field Totals (deltas) ===
yld_count 2329 | sched_count 0 | sched_goidle 45296934
ttwu_count 22544931 | ttwu_local 22729275 | rq_cpu_time 25947
rq_run_delay usec 4 | rq_pcount 21582151769 |
```
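The values in /proc/schedstat are monotonically increasing counters, so per-interval output means subtracting the previous snapshot from the current one. A minimal delta computation over parsed field dicts (field names taken from the sample output above; the parsing itself is omitted):

```python
# Diff two schedstat snapshots to get per-interval deltas.

def schedstat_deltas(prev, curr):
    return {name: curr[name] - prev[name] for name in curr}

prev = {"ttwu_count": 1_000_000, "yld_count": 100}
curr = {"ttwu_count": 23_544_931, "yld_count": 2_429}
print(schedstat_deltas(prev, curr))
# {'ttwu_count': 22544931, 'yld_count': 2329}
```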
### Per-CPU idle duration
```
Very Short (<100μs): CPUs 4-15,126-141
Aggregate: p50=6 p90=16 p99=39 count=26704057
Short (100μs-1ms): CPUs 16-125,142-251
Aggregate: p50=73 p90=270 p99=433 count=19646721
```
### Per-CPU Scheduling Delay Statistics
```
Very Low (<10μs): CPUs 0-251
Aggregate: p50=5 p90=9 p99=9 count=46512458
```
### CPU Performance Counters
Global counters are reported as cycles per second; per-process cycle counts are per timeslice:
```
Global: User 5628.1M cycles/sec (IPC: 0.27), Kernel 24109.4M cycles/sec (IPC: 1.39)
CPU Performance by Usage Group (cycles are per timeslice):
Medium (100M-1G cycles p99) (1 entries):
COMMAND PROCS USER CYC(p50/p99) KERN CYC(p50/p99) U-IPC K-IPC PIDs
schbench-msg 4 24.4M/33.3M 192.2M/266.6M 0.12 0.71 3658281,3658282,3658283,...(+1)
Very Low (<10M cycles p99) (2 entries):
COMMAND PROCS USER CYC(p50/p99) KERN CYC(p50/p99) U-IPC K-IPC PIDs
schbench 1 52.4K/65.5K 14.3K/65.5K 3.49 3.32 3658280
schbench-worker 1024 1.5K/3.9K 6.1K/11.5K 0.30 1.64 3658285,3658286,3658287,...(+1021)
```
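IPC (instructions per cycle) is simply retired instructions divided by elapsed cycles, both read from hardware performance counters; values well below 1 usually indicate stalls (cache misses, branch mispredicts), while values above 1 indicate good superscalar throughput. The arithmetic:

```python
# IPC = instructions retired / cycles elapsed, guarding against an
# empty counter window.

def ipc(instructions, cycles):
    return instructions / cycles if cycles else 0.0

# Matching the sample's user IPC of 0.27: e.g. ~270M instructions
# retired over ~1000M cycles (illustrative numbers, not measured data).
print(f"IPC: {ipc(270_000_000, 1_000_000_000):.2f}")  # IPC: 0.27
```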
## Performance Notes
- Minimal overhead in default mode
- Enabling `-w` (sched_waking tracing) increases overhead
## License
GPL-2.0 (with code from the Linux kernel)