# Bisecting a kernel scheduler regression

This is a write-up of me investigating the bug [Why does Linux's scheduler put two threads onto the same physical core on processors with HyperThreading?](http://stackoverflow.com/questions/29422073/why-does-linuxs-scheduler-put-two-threads-onto-the-same-physical-core-on-proces).

There's apparently a regression introduced between Linux v3.12
and v3.13:
On a machine with 2 physical cores and 4 virtual cores via HyperThreading (`Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz` in a Lenovo ThinkPad X220 in my case), Linux would *sometimes* (in around 30% of the cases) schedule 2 busy (100% CPU) threads onto the *same* physical core instead of scheduling them onto different physical cores, resulting in reduced performance.

To reproduce, run this command multiple times; its run time
differs noticeably when the "bad" scheduling happens:

```
stress-ng -c2 --cpu-method ackermann --cpu-ops 10
```
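
While it runs, you can check in a second terminal which logical CPUs the two busy workers landed on, and which logical CPUs are HyperThread siblings of the same physical core. A minimal sketch, assuming the usual sysfs topology layout (paths may differ on other systems):

```
# Show which logical CPU (PSR column) each stress-ng process/thread is on:
ps -eLo pid,tid,psr,comm | grep stress-ng

# Show which logical CPUs are siblings of the same physical core:
grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
```

If both busy workers show PSR values that appear in the same `thread_siblings_list` entry, the bad scheduling happened.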

Not all machines are affected by this bug; it seems to happen
only with specific processor models.

On the linked StackOverflow page, Greg Glockner [found](http://stackoverflow.com/questions/29422073/why-does-linuxs-scheduler-put-two-threads-onto-the-same-physical-core-on-proces#comment60808046_29422073) that this bug wasn't present in Ubuntu 12.04, but was present in 14.04. He investigated further and told me that the bug must have been introduced between v3.12 and v3.13. I could confirm that on those two kernel versions.

So I bisected the kernel to find the exact commit that caused it.

## How you can help

I'd like to create a list of all CPUs on which this bug occurs / doesn't occur.

Please run `test.py` on your machines, and file an issue containing your finding (get your CPU info from `/proc/cpuinfo`; of course you have to test with a kernel >= 3.13). You may have to change the `relative_stdev` threshold if your machine doesn't have 1 CPU socket with 2 physical cores and 4 virtual cores.
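
For the CPU info, the `model name` line from `/proc/cpuinfo` together with the kernel version you tested is enough, e.g.:

```
grep -m1 "model name" /proc/cpuinfo
uname -r
```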

Bug present on:

* `Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz`

Bug not present on:

* `Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz`

## Making a classifier to tell whether the bug is present

As mentioned above, I could tell whether the bug is present by running `stress-ng -c2` a couple of times and looking at how quickly it finished.

I made a little Python script `test.py` that runs this a couple of times on the machine in question and inspects the relative standard deviation of the run durations; if it is large, the bug is present (because the distribution of run times is bimodal).
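
The idea behind it can be sketched in shell (this is not the actual `test.py`; the threshold of 0.1 is just illustrative, and it assumes GNU time at `/usr/bin/time` and a `stress-ng` that supports `--quiet`):

```
# Time 10 runs and compute the relative standard deviation of the durations.
for i in $(seq 10); do
  /usr/bin/time -f "%e" stress-ng --quiet -c2 --cpu-method ackermann --cpu-ops 10
done 2>&1 | awk '
  { n++; sum += $1; sumsq += $1 * $1 }
  END {
    mean = sum / n
    relstdev = sqrt(sumsq / n - mean * mean) / mean
    printf "relative stdev: %.3f -> bug %s\n", relstdev, (relstdev > 0.1 ? "likely present" : "likely absent")
  }'
```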

## Kernel setup

```
sudo adduser linux-bisect
sudo touch /boot/kimage-test
sudo chown linux-bisect /boot/kimage-test
```

In `sudo visudo`, add an entry `linux-bisect ALL=NOPASSWD:/sbin/reboot` so that the user can reboot the machine without being asked for a password.

Set up the grub config similarly to what's described at http://moi.vonos.net/linux/bisecting-a-linux-kernel/, but since `sudo update-grub` wouldn't pick up my `50_test` file, I just edited `/boot/grub/grub.cfg` manually (see `grub-menuentry.txt`) and made sure not to run `update-grub` so that it wouldn't get overridden.

In the kernel tree:

```
git checkout v3.12
make localyesconfig
## Press Enter a lot (unfortunately one can't pipe `yes ""` into that)
cp .config ../linux-bad-core-scheduling-investigation/x220-kernel-config
```

We'll use that saved config as a starting point for `make oldconfig` for each commit that gets tested.

## First try, on the machine itself

```
cd /linux/linux # That's where I put the kernel on the machine itself
make distclean
cp .test-config .config
yes "" | make oldconfig
make EXTRAVERSION=-test -j5
cp arch/x86/boot/bzImage /boot/kimage-test
```

Reboot (selecting our custom grub entry) and check (`uname -a`) that the test kernel is running and everything works.

To make my machine boot, I had to create an initrd; for that I used `sudo update-initramfs -u -k test`. I only did this once, and used the same initrd during the entire bisection, since reproducing the bug was independent of the initrd.

(Since my partitions are typically full-disk-encrypted, I had to remove those that would block automatic booting from `/etc/crypttab` and `/etc/fstab`, change `/etc/uswsusp.conf` and `/etc/initramfs-tools/conf.d/resume` so that uswsusp would no longer try to resume from the encrypted swap partition (not needed for the test), and generate a new initrd after changing those files - otherwise I would keep getting prompts at boot, preventing automatic bisection.)

## The bisection

Since we have to reboot into each kernel to test it, the bisection script has to run on a different machine, compile the kernel there, copy it over via SSH, run the test, and check the result.

Use `ssh-copy-id linux-bisect@targetmachine` so that the bisection script can copy the kernel over and run the classification script without prompting.

```
git bisect good v3.12
git bisect bad v3.13
git bisect run ~/src/linux-bad-core-scheduling-investigation/bisect-script.sh
```
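
`bisect-script.sh` is what gets run for each commit that `git bisect` checks out. Roughly, such a script has to do the following (this sketch is not the actual script from this repo; paths, hostnames, and the location of `test.py` on the target machine are just examples):

```
#!/usr/bin/env bash
# Sketch of a bisect driver: build, deploy, reboot, wait, classify.
set -eu

# Build the kernel for the commit that `git bisect` checked out.
cp ~/src/linux-bad-core-scheduling-investigation/x220-kernel-config .config
yes "" | make oldconfig
make EXTRAVERSION=-test -j5

# Install it on the target machine and reboot
# (the custom grub entry is assumed to be the default boot entry).
scp arch/x86/boot/bzImage linux-bisect@targetmachine:/boot/kimage-test
ssh linux-bisect@targetmachine 'sudo /sbin/reboot' || true  # connection drops on reboot

# Wait for the machine to come back up.
until ssh -o ConnectTimeout=5 linux-bisect@targetmachine true; do sleep 10; done

# Run the classifier; `git bisect run` treats exit code 0 as "good"
# and exit codes 1-127 (except 125) as "bad".
ssh linux-bisect@targetmachine './test.py'
```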

While this process is fully automatic in theory, I had to watch it, because sometimes Linux wouldn't boot (which I could solve by restarting the machine manually, then it just worked), or wouldn't shut down (which I could solve by force-resetting the machine while the script was waiting for it to come up -- I used Ctrl-Z to get some more time for doing that).

Note that my `bisect-script.sh` isn't perfect: for example, it doesn't check whether the compilation or the remote execution failed. To be really proper, it should use exit code 125 to make `git bisect run` skip non-compiling commits, and an exit code above 127 to abort when the other machine can't be reached. But luckily, all bisected commits compiled fine, and I dealt with non-booting issues on the fly as described above.
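
Concretely, those checks could look roughly like this (again just a sketch with example hostnames):

```
# Skip commits that don't compile instead of marking them bad:
make EXTRAVERSION=-test -j5 || exit 125

# Abort the whole bisection (exit code > 127) if the target machine is unreachable:
ssh -o ConnectTimeout=5 linux-bisect@targetmachine true || exit 255
```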

## Result

The bad commit is:

`37dc6b50cee97954c4e6edcd5b1fa614b76038ee - sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus`

When reverting it on top of `v3.13`, the revert conflicts with a line from commit `5d4cf996`.
**That commit indicates that somebody found a performance problem, which they partially fixed while acknowledging that "there may be an additional bug or a common root cause".** Maybe what I found is what they were looking for?

I resolved the conflict with https://github.com/nh2/linux/commit/627fb019acaa245768980f90a65449e6b2774e59.
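
To reproduce the conflict and the revert yourself, the steps in the kernel tree are roughly (branch name is just an example):

```
git checkout -b revert-test v3.13
git revert 37dc6b50cee97954c4e6edcd5b1fa614b76038ee
# -> conflicts with a line later touched by 5d4cf996; resolve it, then:
git revert --continue
```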

## Next

I'll report this bug to LKML to find out if it can be fixed in recent kernels.