https://github.com/dasj/emergency-kexec
Kexec into an in-memory emergency system
- Host: GitHub
- URL: https://github.com/dasj/emergency-kexec
- Owner: dasJ
- License: lgpl-3.0
- Created: 2019-03-02T17:33:05.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-03-01T10:01:37.000Z (over 2 years ago)
- Last Synced: 2024-10-11T23:36:16.794Z (26 days ago)
- Topics: emergency, kexec, linux, nix, nixos, recovery, wcgw
- Language: Nix
- Size: 18.6 KB
- Stars: 30
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# emergency-kexec
Okay, your system is completely broken, and you need to umount `/` or something like that.
What do you do?

## Motivation
One of our servers had a broken root filesystem (btrfs, don't judge me).
Online recovery was not possible, so the filesystem needed to be unmounted, which is not possible for the root fs while the system is running from it.
Additionally, as errors were detected, the kernel decided to mount it read-only and didn't let me remount it as `rw`.
IPMI? Yes, I had the password in my password store but not the username.
So the only logical solution was to kexec into an emergency system.
This code is what I used.
It recovers all IP addresses as well as SSH host and user keys from the old system and kexecs into a new one - entirely in-memory.

## What it does
The `emergency` script (found in the repository root) will SSH over and execute the following things:
1. Build the recovery image (a `.tar.xz` with a small nix store and a `kexec` script) from the files in this repository locally on the machine you're executing this code on
   1. The system configuration is found in `configuration.nix`
   2. Some `kexec`-related features are imported from `kexec.nix`
   3. The scripts will be included to be used in the `kexec` script (see below)
2. Try to `mkdir` `/nix` and `/tmp`. If they don't already exist and your root fs is read-only, you have a problem this project can't fix
3. Mount a fresh `tmpfs` on `/tmp` because there might not be one already
4. `scp` the emergency image over and extract it
5. Mount the nix store from the emergency image over `/nix` using `overlayfs`
6. Run the kexec script

The `kexec` script (found in `kexec.nix`) will do the following:
1. Prepare a second initrd
2. Put your SSH host keys into the initrd
3. Put all of your SSH user keys into the initrd
4. Fetch all your IP addresses and routes and put them into the initrd
5. Pack the second initrd and append it to the default NixOS initrd from the emergency image
6. `kexec` into the kernel from the emergency image while using the new initrd
7. In case you didn't already notice: **This will crash your currently running system, so maybe it's a good idea to gracefully shut down remaining daemons if that's still possible**

The script that is packed into the initrd of the new system will do the following:
1. Place the SSH host key
2. Place the SSH user keys
3. Place a script for the IP addresses which will be executed using `networking.localCommands` so the interfaces are available

If you set the environment variable `EMERGENCY_DUMP_NETWORK` to `1`, all IPs, routes, and nameservers will be placed in the `emergency_ips`, `emergency_routes`, and `emergency_nameservers` files, respectively.
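
To make the "fetch all your IP addresses and routes" step more concrete, here is a minimal sketch of how the output of `ip` could be turned into commands that a later script replays. This is not the code from `kexec.nix`; the function names `addrs_to_commands` and `routes_to_commands` are hypothetical, and the sketch assumes the usual one-line output format of iproute2.

```shell
#!/bin/sh
# Hypothetical helper (not taken from this repository):
# reads `ip -o addr show` output on stdin and prints `ip addr add` commands.
addrs_to_commands() {
  awk '
    # One-line mode fields: index, interface, family, address/prefix, ...
    ($3 == "inet" || $3 == "inet6") && $2 != "lo" {
      # Skip IPv6 link-local addresses; the kernel assigns them again itself.
      if ($3 == "inet6" && $4 ~ /^fe80:/) next
      printf "ip addr add %s dev %s\n", $4, $2
    }
  '
}

# Hypothetical helper (not taken from this repository):
# reads `ip route show` output on stdin and prints `ip route add` commands.
routes_to_commands() {
  awk '
    # Connected routes (proto kernel) reappear once the addresses are added.
    /proto kernel/ { next }
    NF { print "ip route add " $0 }
  '
}
```

On the broken host, this could be used as `ip -o addr show | addrs_to_commands` and `ip route show | routes_to_commands`, with the combined output saved as the script that the new system runs once its interfaces exist. Routes carrying state flags such as `linkdown` would need extra filtering, which this sketch leaves out.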
## How to use
```
$ ./emergency root@somehost
# or
$ ./emergency somebody@somehost
```

## Disclaimer and license
If it doesn't work for you, I'm sorry.
I can probably not help you, but if you're able to fix something, feel free to create a PR.

The code is based on [clever's](https://github.com/cleverca22) kexec nix-test (found [here](https://github.com/cleverca22/nix-tests/tree/master/kexec)).
The code is licensed under the [LGPL3](LICENSE).