# The LVM Operator - part of LVMS
## [Official LVMS Product Documentation](https://docs.openshift.com/container-platform/latest/storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.html)
For the latest information about usage and installation of LVMS (Logical Volume Manager Storage) in OpenShift, please use the official product documentation linked above.
## Overview
Use the LVM Operator with `LVMCluster` custom resources to deploy and manage LVM storage on OpenShift clusters.
The LVM Operator leverages the [TopoLVM CSI Driver](https://github.com/topolvm/topolvm) on the backend to dynamically create LVM physical volumes, volume groups and logical volumes, and binds them to `PersistentVolumeClaim` resources.
This allows applications running on the cluster to consume storage from LVM logical volumes backed by the TopoLVM CSI Driver.

The LVM Operator, together with the TopoLVM CSI Driver, the Volume Group Manager, and other related components, makes up the Logical Volume Manager Storage (LVMS) solution.
Here is a brief overview of how the Operator works. See [here](docs/design/architecture.md) for the architecture diagram.
```mermaid
graph LR
LVMOperator((LVMOperator))-->|Manages| LVMCluster
LVMOperator-->|Manages| StorageClass
StorageClass-->|Creates| PersistentVolumeA
StorageClass-->|Creates| PersistentVolumeB
PersistentVolumeA-->LV1
PersistentVolumeB-->LV2
LVMCluster-->|Comprised of|Disk1((Disk1))
LVMCluster-->|Comprised of|Disk2((Disk2))
LVMCluster-->|Comprised of|Disk3((Disk3))
subgraph Logical Volume Manager
Disk1-->|Abstracted|PV1
Disk2-->|Abstracted|PV2
Disk3-->|Abstracted|PV3
PV1-->VG
PV2-->VG
PV3-->VG
LV1-->VG
LV2-->VG
end
```

- [Deploying the LVM Operator](#deploying-the-lvm-operator)
* [Using the pre-built images](#using-the-pre-built-images)
* [Building the Operator yourself](#building-the-operator-yourself)
* [Deploying the Operator](#deploying-the-operator)
* [Inspecting the storage objects on the node](#inspecting-the-storage-objects-on-the-node)
* [Testing the Operator](#testing-the-operator)
- [Cleanup](#cleanup)
- [Metrics](#metrics)
- [Known Limitations](#known-limitations)
* [Dynamic Device Discovery](#dynamic-device-discovery)
* [Unsupported Device Types](#unsupported-device-types)
* [Single LVMCluster support](#single-lvmcluster-support)
* [Upgrades from v4.10 and v4.11](#upgrades-from-v410-and-v411)
* [Missing native LVM RAID Configuration support](#missing-native-lvm-raid-configuration-support)
* [Missing LV-level encryption support](#missing-lv-level-encryption-support)
* [Snapshotting and Cloning in Multi-Node Topologies](#snapshotting-and-cloning-in-multi-node-topologies)
* [Validation of `LVMCluster` CRs outside the `openshift-storage` namespace](#validation-of-lvmcluster-crs-outside-the-openshift-storage-namespace)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)

## Deploying the LVM Operator
Due to the absence of a CI pipeline that builds this repository, you will need to either build it yourself or use a pre-built image that has been made available. Please note that the pre-built image may not be in sync with the current state of the repository.
### Using the pre-built images
If you are comfortable using the pre-built images, simply proceed with the [deployment steps](#deploying-the-operator).
### Building the Operator yourself
To build the Operator, install Docker or Podman and log into your registry.
1. Set the following environment variables to the repository where you want to host your image:
```bash
$ export IMAGE_REGISTRY=
$ export REGISTRY_NAMESPACE=
$ export IMAGE_TAG=
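# Example values (illustrative only; substitute your own registry, namespace, and tag):
#   export IMAGE_REGISTRY=quay.io
#   export REGISTRY_NAMESPACE=your-namespace
#   export IMAGE_TAG=latest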
```

2. Build and push the container image:
```bash
$ make docker-build docker-push
```

#### Building the Operator for OLM deployment
If you intend to deploy the Operator using the Operator Lifecycle Manager (OLM), there are some additional steps you should follow.
1. Build and push the bundle image:
```bash
$ make bundle-build bundle-push
```

2. Build and push the catalog image:
```bash
$ make catalog-build catalog-push
```

Ensure that the OpenShift cluster has read access to that repository. Once this is complete, you are ready to proceed with the next steps.
### Deploying the Operator
You can begin the deployment by running the following command:
```bash
$ make deploy
```

#### Deploying the Operator with OLM
You can begin the deployment using the Operator Lifecycle Manager (OLM) by running the following command:
```bash
$ make deploy-with-olm
```

The process involves the creation of several resources to deploy the Operator using OLM. These include a custom `CatalogSource` to define the Operator source, the `openshift-storage` namespace to contain the Operator components, an `OperatorGroup` to manage the lifecycle of the Operator, a `Subscription` to subscribe to the Operator catalog in the `openshift-storage` namespace, and finally, the creation of a `ClusterServiceVersion` to describe the Operator's capabilities and requirements.
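As an optional check, you can list the OLM resources that were created in the `openshift-storage` namespace (illustrative; the exact resource names depend on the Makefile defaults):

```bash
$ oc get operatorgroup,subscription -n openshift-storage
```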
Wait until the `ClusterServiceVersion` (CSV) reaches the `Succeeded` status:
```bash
$ kubectl get csv -n openshift-storage
NAME                   DISPLAY       VERSION   REPLACES   PHASE
lvms-operator.v0.0.1   LVM Storage   0.0.1                Succeeded
```

After the previous command has completed successfully, switch over to the `openshift-storage` namespace:
```bash
$ oc project openshift-storage
```

Wait until all pods have started running:
```bash
$ oc get pods -w
```

Once all pods are running, create a sample `LVMCluster` custom resource (CR):
```bash
$ oc create -n openshift-storage -f https://github.com/openshift/lvm-operator/raw/main/config/samples/lvm_v1alpha1_lvmcluster.yaml
```

After the CR is deployed, the following actions are executed (a sketch of such a CR is shown after this list):
- A Logical Volume Manager (LVM) volume group named `vg1` is created, utilizing all available disks on the cluster.
- A thin pool named `thin-pool-1` is created within `vg1`, with a size equivalent to 90% of `vg1`.
- The TopoLVM Container Storage Interface (CSI) plugin is deployed.
- A storage class and a volume snapshot class are created, both named `lvms-vg1`. This facilitates storage provisioning for OpenShift workloads. The storage class is configured with the `WaitForFirstConsumer` volume binding mode that is utilized in a multi-node configuration to optimize the scheduling of pod placement. This strategy prioritizes the allocation of pods to nodes with the greatest amount of available storage capacity.
- The LVMS system also creates two additional internal CRs to support its functionality:
* `LVMVolumeGroup` is generated and managed by LVMS to monitor the individual volume groups across multiple nodes in the cluster.
* `LVMVolumeGroupNodeStatus` is created by the [Volume Group Manager](docs/design/vg-manager.md). This CR is used to monitor the status of volume groups on individual nodes in the cluster.
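For reference, here is a minimal sketch of an `LVMCluster` CR along the lines of the bundled sample (it uses the `lvm.topolvm.io/v1alpha1` API; exact fields and defaults can differ between releases, so treat the values as illustrative):

```yaml
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      # A single device class backed by volume group "vg1", using all available disks
      - name: vg1
        default: true
        thinPoolConfig:
          # Thin pool carved out of vg1, sized at 90% of the volume group
          name: thin-pool-1
          sizePercent: 90
          overprovisionRatio: 10   # illustrative value
```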
Wait until the `LVMCluster` reaches the `Ready` status:

```bash
$ oc get lvmclusters.lvm.topolvm.io my-lvmcluster
NAME            STATUS
my-lvmcluster   Ready
```

Wait until all pods are active:
```bash
$ oc get pods -w
```

Once all the pods have been launched, LVMS is ready to manage your logical volumes and make them available for use in your applications.
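As an optional sanity check (assuming the sample configuration above), you can verify that the storage class and volume snapshot class named `lvms-vg1` exist:

```bash
$ oc get storageclass lvms-vg1
$ oc get volumesnapshotclass lvms-vg1
```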
### Inspecting the storage objects on the node
Prior to the deployment of the Logical Volume Manager Storage (LVMS), there are no pre-existing LVM physical volumes, volume groups, or logical volumes associated with the disks.
```bash
sh-4.4# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdb         8:16   0 893.8G  0 disk
|-sdb1      8:17   0     1M  0 part
|-sdb2      8:18   0   127M  0 part
|-sdb3      8:19   0   384M  0 part /boot
`-sdb4      8:20   0 893.3G  0 part /sysroot
sr0        11:0    1   987M  0 rom
nvme0n1   259:0    0   1.5T  0 disk
nvme1n1   259:1    0   1.5T  0 disk
nvme2n1   259:2    0   1.5T  0 disk
sh-4.4# pvs
sh-4.4# vgs
sh-4.4# lvs
```

After successful deployment, the necessary LVM physical volumes, volume groups, and thin pools are created on the host.
```bash
sh-4.4# pvs
  PV           VG  Fmt  Attr PSize  PFree
  /dev/nvme0n1 vg1 lvm2 a--  <1.46t <1.46t
  /dev/nvme1n1 vg1 lvm2 a--  <1.46t <1.46t
  /dev/nvme2n1 vg1 lvm2 a--  <1.46t <1.46t
sh-4.4# vgs
  VG  #PV #LV #SN Attr   VSize  VFree
  vg1   3   0   0 wz--n- <4.37t <4.37t
sh-4.4# lvs
  LV          VG  Attr       LSize  Pool Origin Data%  Meta%
  thin-pool-1 vg1 twi-a-tz-- <3.93t             0.00   1.19
```

### Testing the Operator
Once you have completed [the deployment steps](#deploying-the-operator), you can proceed to create a basic test application that will consume storage.
To initiate the process, create a Persistent Volume Claim (PVC):
```bash
$ cat <
```
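A minimal sketch of a PVC that requests storage from the `lvms-vg1` storage class created earlier might look like the following (the claim name, namespace, and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvms-test-pvc           # illustrative name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: lvms-vg1    # storage class created by the sample LVMCluster
  resources:
    requests:
      storage: 1Gi              # illustrative size
```

Because the storage class uses the `WaitForFirstConsumer` volume binding mode, the claim stays `Pending` until a pod that consumes it is scheduled.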
## Known Limitations

### Unsupported Device Types

6. …
   - *Filter:* `…` returns mount points for the device in the 4th or 10th field.

7. **ROM Devices:**
   - *Condition:* Devices of type `rom` are unsupported.
   - *Why:* Such devices are designed for static data storage and lack the necessary read-write capabilities essential for dynamic operations performed by LVMS.
   - *Filter:* `type` is set to `rom`.

8. **LVM Partitions:**
   - *Condition:* Devices of type `LVM` partition are unsupported.
   - *Why:* These partitions are already dedicated to LVM and are managed as part of an existing volume group.
   - *Filter:* `type` is set to `lvm`.

9. **Loop Devices:**
   - *Condition:* Loop devices must not be used if they are already in use by Kubernetes.
   - *Why:* When loop devices are utilized by Kubernetes, they are likely configured for specific tasks or processes managed by the Kubernetes environment. Integrating loop devices that are already in use by Kubernetes into LVMS can lead to potential conflicts and interference with the Kubernetes system.
   - *Filter:* `type` is set to `loop`, and `losetup -O BACK-FILE --json` returns a `back-file` which contains `plugins/kubernetes.io`.

Devices meeting any of these conditions are filtered out for LVMS operations.
_NOTE: It is strongly recommended to perform a thorough wipe of a device before using it within LVMS to proactively prevent unintended behaviors or potential issues._
### Single LVMCluster support
LVMS does not support the reconciliation of multiple LVMCluster custom resources simultaneously.
### Upgrades from v4.10 and v4.11
It is not possible to upgrade from release-4.10 and release-4.11 to a newer version due to a breaking change that has been implemented. For further information on this matter, consult [the relevant documentation](https://github.com/topolvm/topolvm/blob/main/docs/proposals/rename-group.md).
### Missing native LVM RAID Configuration support
Currently, LVM Operator forces all LVMClusters to work with a thinly provisioned volume in order to support Snapshotting and Cloning on PVCs.
This is backed by an LVM Logical Volume of type `thin`, which is reflected in the LVM flags as an attribute.
When trying to use LVM's built-in RAID capabilities, this conflicts with the `thin` attribute, because the same flag also indicates whether a volume is part of an LVM RAID configuration (the `r` or `R` flag).
This means that the only way to support a RAID configuration from within LVM would be to convert two RAID arrays into a thin pool with `lvconvert`, after which the RAID is no longer recognized by LVM (due to said conflict in the volume attributes).
While this would enable initial synchronization and redundancy, all repair and extend operations would no longer respect the RAID topology in the volume group, and operations like `lvconvert --repair` are no longer supported at all.
This makes it a complex situation to recover from.

Instead of LVM-based RAID, we recommend using the [`mdraid`](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/managing_storage_devices/managing-raid_managing-storage-devices#linux-raid-subsystems_managing-raid) subsystem in Linux.
Simply create a RAID array with `mdadm` and then use this in your `deviceSelector` within `LVMCluster`:

1. For a simple RAID1, you could use `mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1`.
2. Then you can reference `/dev/md0` in the `deviceSelector` as normal (see the sketch after this list).
3. Any recovery and syncing will then happen with `mdraid`: [Replacing Disks](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/managing_storage_devices/managing-raid_managing-storage-devices#replacing-a-failed-disk-in-raid_managing-raid) and [Repairing](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/managing_storage_devices/managing-raid_managing-storage-devices#repairing-raid-disks_managing-raid) work transparently to LVMS and can be handled by a sysadmin of the node.

_NOTE: Currently, RAID arrays created with `mdraid` are not automatically recognized when not using any `deviceSelector`, so they MUST be specified explicitly._
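A sketch of how the array could then be referenced, extending the `LVMCluster` sketch shown earlier (illustrative only; it assumes the same `deviceSelector` schema as the sample CR):

```yaml
# Fragment of an LVMCluster spec
spec:
  storage:
    deviceClasses:
      - name: vg1
        deviceSelector:
          paths:
            - /dev/md0   # the mdraid array must be listed explicitly
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90
```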
### Missing LV-level encryption support
Currently, the LVM Operator does not have native LV-level encryption support. Instead, you can encrypt the entire disk or individual partitions and use them within `LVMCluster`. This way, all LVs created by LVMS on this disk are encrypted out of the box.
Here is an example `MachineConfig` that can be used to configure encrypted partitions during an OpenShift installation:
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 98-encrypted-disk-partition-master
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      disks:
        - device: /dev/nvme0n1
          wipeTable: false
          partitions:
            - sizeMiB: 204800
              startMiB: 600000
              label: application
              number: 5
      luks:
        - clevis:
            tpm2: true
          device: /dev/disk/by-partlabel/application
          name: application
          options:
            - --cipher
            - aes-cbc-essiv:sha256
          wipeVolume: true
```

Then, the path to the encrypted partition `/dev/mapper/application` can be specified in the `deviceSelector`.
For non-OpenShift clusters, you can encrypt a disk using LUKS with `cryptsetup`, and then use this in your `deviceSelector` within `LVMCluster`:
1. Set up the `/dev/sdb` device for encryption. This will also remove all the data on the device:
```bash
cryptsetup -y -v luksFormat /dev/sdb
```

You'll be prompted to set a passphrase to unlock the volume.
2. Create a logical device-mapper device named `encrypted`, mounted to the LUKS-encrypted device:
```bash
cryptsetup luksOpen /dev/sdb encrypted
```

You'll be prompted to enter the passphrase you set when creating the volume.
3. You can now reference `/dev/mapper/encrypted` in the `deviceSelector`.
### Snapshotting and Cloning in Multi-Node Topologies
In general, since LVMCluster does not ensure data replication, `VolumeSnapshots` and their consumption are always limited to the original dataSource.
Thus, snapshots must be created on the same node as the original data, and all pods relying on a PVC that uses the snapshot data have to be scheduled on the node that contained the original `LogicalVolume` in TopoLVM.

It should be noted that snapshotting is based on thin-pool snapshots from upstream TopoLVM and is still considered [experimental in upstream](https://github.com/topolvm/topolvm/discussions/737).
This is because in multi-node Kubernetes clusters the scheduler decides pod placement logically across nodes (the node topology from the native Kubernetes scheduler determines where Pods are deployed),
and it cannot always be guaranteed that snapshots are provisioned on the same node as the original data (which is based on the CSI topology known by TopoLVM) if the `PersistentVolumeClaim` is not created upfront.

If you are unsure what to make of this, always make sure that the original `PersistentVolumeClaim` that you want to have snapshots of is already created and `Bound`.
With these prerequisites, it can be guaranteed that all follow-up `VolumeSnapshot` objects, as well as `PersistentVolumeClaim` objects depending on the original one, are scheduled correctly.
The easiest way to achieve this is to use precreated `PersistentVolumeClaims` and a non-ephemeral `StatefulSet` for your workload.

_NOTE: All of the above also applies to cloning the `PersistentVolumeClaims` directly by using the original `PersistentVolumeClaim` as the data source instead of using a Snapshot._
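As an illustration of the recommended flow, here is a sketch of a `VolumeSnapshot` that uses the `lvms-vg1` volume snapshot class created by the sample configuration and a hypothetical, already-`Bound` claim named `lvms-test-pvc`:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: lvms-test-snapshot           # illustrative name
  namespace: default
spec:
  volumeSnapshotClassName: lvms-vg1  # snapshot class created by the sample LVMCluster
  source:
    persistentVolumeClaimName: lvms-test-pvc  # must already be created and Bound on the target node
```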
### Validation of `LVMCluster` CRs outside the `openshift-storage` namespace
When an `LVMCluster` CR is created outside the `openshift-storage` namespace after installing the Operator via `ClusterServiceVersion`, the Operator will not be able to validate the CR.
This is because the `ValidatingWebhookConfiguration` is restricted to the `openshift-storage` namespace and does not have access to the `LVMCluster` CRs in other namespaces.
Thus, the Operator will not be able to prevent the creation of invalid `LVMCluster` CRs outside the `openshift-storage` namespace.
However, it will also not pick such a CR up; it is simply ignored. This is because Operator Lifecycle Manager (OLM) does not allow the creation of a `ClusterServiceVersion` with installMode `OwnNamespace` while also not restricting the webhook configuration.
Validation in the `openshift-storage` namespace is processed normally.

## Troubleshooting
See the [troubleshooting guide](docs/troubleshooting.md).
## Contributing
See the [contribution guide](CONTRIBUTING.md).