dnf-plugin-protected-kmods

<tldr>dnf-plugin-protected-kmods is now available in EPEL!</tldr>

I don’t think I ever posted about it, but nine months ago (exactly, which I just realized as I’m writing these words), I joined CIQ as a Senior Systems Engineer. One of my early tasks was to help one of our customers put together Rocky Linux images that their customers could use, and one of the requirements from their HPC customers was that the latest Intel irdma kernel module be available.

While packaging up the kernel module as an external kmod was easy enough, the question was asked, “What if the kernel ABI changes?” Their HPC customers wanted to use the upstream Rocky kernel, which, as a rebuild of RHEL, has the same kABI guarantees as RHEL. There is a list of symbols that are (mostly) guaranteed not to change during a point release, but the Intel irdma driver requires symbols that aren’t on that list.

I did some investigation, and, in the lifespan of Rocky 8.10 (roughly 15 months), there have been just under 60 kernel releases, with only 3 or 4 of them breaking symbols required by the Intel irdma driver. This meant that we could build the kmod when 8.10 came out and, thanks to weak-updates, the kernel module would automatically be available for each newer kernel as it was released, right up until a release broke one of the symbols the kmod depends on. At that point, we would need to bump the release and rebuild the kmod. The new build would be compatible with that kernel, and with any later kernels, until the kABI broke again.
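A rough Python sketch of that reasoning, under a deliberately simplified model: each kernel release is a dict mapping the symbols it exports to their kABI hashes, and a new kmod build is needed only when a kernel changes the hash of a symbol the current build was linked against. The release names, symbol names, and hashes are hypothetical, and plan_builds() is an illustrative helper, not part of any real packaging tool.

```python
# Sketch: which kernel releases need a fresh kmod build when weak-updates lets
# one build serve every later kernel until the kABI breaks?
# Assumption (illustrative only): a kernel is a dict of exported symbol -> kABI
# hash; release names, symbols, and hashes are all made up.
from typing import Dict, List, Tuple

Symbols = Dict[str, str]


def is_compatible(built_against: Symbols, kernel: Symbols) -> bool:
    """True if the kernel exports every symbol with the hash the build used."""
    return all(kernel.get(sym) == h for sym, h in built_against.items())


def plan_builds(kernels: List[Tuple[str, Symbols]], required: List[str]) -> List[str]:
    """Return the releases a new kmod build (release bump) must target.

    Kernels are given oldest first. A missing required symbol raises KeyError,
    since the kmod could not be built against that kernel at all.
    """
    builds: List[str] = []
    built_against: Symbols = {}
    for release, exported in kernels:
        if not builds or not is_compatible(built_against, exported):
            # Rebuild, recording the hashes this build is linked against.
            built_against = {sym: exported[sym] for sym in required}
            builds.append(release)
    return builds


kernels = [
    ("kernel-A", {"sym_a": "0x1111", "sym_b": "0x2222"}),
    ("kernel-B", {"sym_a": "0x1111", "sym_b": "0x2222"}),
    ("kernel-C", {"sym_a": "0x1111", "sym_b": "0x3333"}),  # kABI break on sym_b
    ("kernel-D", {"sym_a": "0x1111", "sym_b": "0x3333"}),
]
print(plan_builds(kernels, ["sym_a", "sym_b"]))  # ['kernel-A', 'kernel-C']
```

Adding more compatible releases after kernel-C doesn’t add builds; only another hash change on sym_a or sym_b would.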

When doing the original packaging for the kernel, Red Hat had the wisdom to add a custom dependency generator that automatically generates a “Provides:” in the RPM for each symbol exported by the kernel, along with a hashed signature of its structure. This means that kmod RPMs can be built to “Require:” each symbol they need, ensuring that a kmod can’t be installed on a system without a matching kernel also being installed.
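The matching idea can be sketched in a few lines of Python. This assumes the dependencies take the “kernel(SYMBOL) = 0xHASH” form used by RHEL-family kernels; the symbol names and hash values are made up, and parse_ksyms() and unmet() are hypothetical helpers, not RPM or DNF APIs. RPM performs this matching itself during dependency resolution; the sketch only illustrates the principle.

```python
# Sketch: matching a kmod's "Requires:" against a kernel's "Provides:".
# Assumption: dependencies look like "kernel(SYMBOL) = 0xHASH"; all symbol
# names and hashes below are made up for illustration.
import re
from typing import Dict, Iterable

# Matches e.g. "kernel(sym_a) = 0x1111aaaa" and captures symbol and hash.
KSYM_RE = re.compile(r"^kernel\((?P<sym>[^)]+)\)\s*=\s*(?P<hash>0x[0-9a-fA-F]+)$")


def parse_ksyms(deps: Iterable[str]) -> Dict[str, str]:
    """Collect symbol -> hash pairs, ignoring unrelated dependencies."""
    out: Dict[str, str] = {}
    for dep in deps:
        m = KSYM_RE.match(dep.strip())
        if m:
            out[m.group("sym")] = m.group("hash").lower()
    return out


def unmet(kmod_requires: Iterable[str], kernel_provides: Iterable[str]) -> Dict[str, str]:
    """Return required symbols the kernel lacks or provides with a different hash."""
    need = parse_ksyms(kmod_requires)
    have = parse_ksyms(kernel_provides)
    return {sym: h for sym, h in need.items() if have.get(sym) != h}


requires = ["kernel(sym_a) = 0x1111aaaa", "kernel(sym_b) = 0x2222bbbb", "kmod-common"]
provides = ["kernel(sym_a) = 0x1111aaaa", "kernel(sym_b) = 0x9999ffff", "kernel = 4.18.0"]
print(unmet(requires, provides))  # {'sym_b': '0x2222bbbb'}
```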

These symbol dependencies would seem to solve the whole “make sure kmods and kernels match” problem, except for one minor detail: You can have more than one kernel installed on your system.

Picture this. You have a system, and you install a kernel on it, and then install the Intel irdma/idpf driver, which makes your fancy network card work. A little while later, you update to the latest kernel and reboot… only to find your network card won’t work anymore!

What’s happened is that the kernel update changed one of the symbols required by the Intel irdma kmod, breaking the kABI. The kmod RPM has a dependency on the symbols it needs, but the kernel is special (that’s for you, Maple!): it’s one of the few packages that can have multiple versions installed at the same time. Those symbols are still provided by the previous kernel, which remains installed even though it’s no longer the booted kernel, so the dependency is satisfied. The fix is as easy as booting back into the previous kernel and waiting for an updated Intel kmod, but this is most definitely not a good customer experience.
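The gap between “the dependencies are satisfied” and “the module works” can be shown with a small Python model. It is a simplification, assuming each installed kernel is a dict of exported symbol to kABI hash (values made up): RPM resolves each “Requires:” independently and accepts any installed package as the provider, whereas the module has to match the kernel that actually booted.

```python
# Sketch: why the RPM dependency check passes while the booted kernel breaks.
# Assumption: each installed kernel is modeled as a dict of symbol -> kABI hash;
# kernel names and hash values are made up.
from typing import Dict

Symbols = Dict[str, str]


def rpm_deps_satisfied(required: Symbols, installed: Dict[str, Symbols]) -> bool:
    """Each required symbol only needs to be provided by *some* installed kernel."""
    return all(
        any(provides.get(sym) == h for provides in installed.values())
        for sym, h in required.items()
    )


def loads_on(required: Symbols, kernel: Symbols) -> bool:
    """Simplified: the module only works if this kernel has every symbol/hash."""
    return all(kernel.get(sym) == h for sym, h in required.items())


installed = {
    "kernel-old": {"sym_a": "0x1111", "sym_b": "0x2222"},
    "kernel-new": {"sym_a": "0x1111", "sym_b": "0x3333"},  # kABI break on sym_b
}
kmod_requires = {"sym_a": "0x1111", "sym_b": "0x2222"}

print(rpm_deps_satisfied(kmod_requires, installed))      # True
print(loads_on(kmod_requires, installed["kernel-new"]))  # False
```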

What we really need is a safety net, a way to temporarily block the kernel from being updated until a matching kmod is available in the repositories. This is where dnf-plugin-protected-kmods comes in. When configured to protect a kmod, this DNF plugin will exclude any kernel RPM that doesn’t provide all the symbols required by the kmod RPM.

This means that, in the example above, the updated kernel would not have appeared as an available update until a matching Intel irdma/idpf kmod was also available; in the meantime, a warning would have appeared indicating that the kernel was being blocked.
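The filtering idea can be sketched in Python under a simplified model. This is not the plugin’s actual code and doesn’t use the DNF plugin API; candidate kernels and available kmod builds are plain dicts of symbol to kABI hash, and filter_kernels(), kmod-example, and all hashes are hypothetical. A kernel is kept only if, for every protected kmod, at least one available build of that kmod has all of its required symbols provided by the kernel.

```python
# Sketch of the kernel-filtering idea (simplified model, not the real plugin).
# Assumption: kernels and kmod builds are dicts of symbol -> kABI hash;
# every name and hash here is made up.
import sys
from typing import Dict, List, Tuple

Symbols = Dict[str, str]


def kernel_satisfies(build: Symbols, kernel: Symbols) -> bool:
    """True if the kernel provides every symbol this kmod build requires."""
    return all(kernel.get(sym) == h for sym, h in build.items())


def filter_kernels(
    kernels: Dict[str, Symbols],
    protected: Dict[str, List[Symbols]],
) -> Tuple[List[str], List[str]]:
    """Split candidate kernels into (allowed, blocked), warning on each block."""
    allowed: List[str] = []
    blocked: List[str] = []
    for name, provides in kernels.items():
        # Protected kmods with no available build that works on this kernel.
        missing = [
            kmod for kmod, builds in protected.items()
            if not any(kernel_satisfies(b, provides) for b in builds)
        ]
        if missing:
            blocked.append(name)
            print(f"WARNING: excluding {name}: no matching build of "
                  f"{', '.join(missing)}", file=sys.stderr)
        else:
            allowed.append(name)
    return allowed, blocked


kernels = {
    "kernel-1": {"sym_a": "0x1111", "sym_b": "0x2222"},
    "kernel-2": {"sym_a": "0x1111", "sym_b": "0x3333"},  # breaks sym_b
}
protected = {"kmod-example": [{"sym_a": "0x1111", "sym_b": "0x2222"}]}

print(filter_kernels(kernels, protected))  # (['kernel-1'], ['kernel-2'])
```

Once a rebuilt kmod-example that requires sym_b = 0x3333 shows up among the available builds, kernel-2 passes the filter and becomes an available update again.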

NVIDIA originally came up with the idea when they created yum-plugin-nvidia-driver, but it was very specifically designed with the NVIDIA kmods and their requirements in mind, so I forked it and made it more generic, updating it to filter based on the kernel’s “Provides:” and the kmod’s “Requires:”.

Our customer has been using this plugin for over six months, and it has functioned as expected. The DNF kmods we’re building for CIQ SIG/Cloud Next (a story for another day) are also built to support it, and they carry a “Recommends:” dependency on it, so the plugin is installed alongside the kmods by default.

Since this plugin is useful not just to CIQ, but also to the wider Enterprise Linux community, I started working on packaging it up at this year’s Flock to Fedora conference (thanks for sending me, CIQ!), and, thanks to a review from Jonathan Wright (from AlmaLinux) with support from Neal Gompa, it’s now available in EPEL.

Note that there is no DNF 5 version available yet, and, given the lack of kABI guarantees in the Fedora kernel, there isn’t much point in having it in Fedora proper.

And I do want to emphasize that, out of the box, the plugin doesn’t actually do anything. For it to protect a kmod, a drop-in configuration file is required as described in the documentation.