Multiseat systems and the NVIDIA binary driver

Building mesa

Ever since our school switched to Fedora on the desktop, I’ve either used the onboard Intel graphics or AMD Radeon cards, since both are supported out of the box in Fedora. With our multiseat systems, we now need three external video cards on top of the onboard graphics on each system, so we’ve bought a large number of Radeon cards over the last few years.

Unfortunately, our local supplier has greatly reduced the number of AMD cards that they stock. In their latest price lists, they have a grand total of two Radeon cards in our price range, and one of them is almost seven years old!

This has led me to take a second look at NVIDIA cards, and I’m slowly coming back around to the concept of buying them and maybe even using their binary drivers. Our needs have changed since we first started using Linux, and NVIDIA’s binary driver does offer some unique benefits.

As we’ve started teaching 3D modeling using Blender, render time has become a real bottleneck for some of our students. We allow students to use the computers before and after school, but some of them don’t have much flexibility in their transportation and need to get their rendering done during the school breaks. Having two or three students all trying to render at the same time on a single multiseat system can lead to a sluggish system and very slow rendering. The easiest way to fix this is to do the rendering on the GPU, which Blender does support, but only using NVIDIA’s binary driver.

So about a month ago, I ordered a cheap NVIDIA card for testing purposes. I swapped it with an AMD card on one of our multiseat systems and powered it up. Fedora recognized the card using the open-source nouveau driver and everything just worked. Beautiful!

Then, a few hours later, I noticed the system had frozen. I rebooted it, and, after a few hours, it had frozen again. I moved the NVIDIA card into a different system, and, after a few hours, it froze while the original system just kept running.

Some research showed that the nouveau driver sometimes has issues with multiple video cards on the same system. There was some talk about extracting the binary driver’s firmware and using it in nouveau, but I decided to see if I could get the binary driver working without breaking our other Intel and AMD seats.

The first thing I did was upgrade the test system to Fedora 25 in hopes of taking advantage of the work done to make mesa and the NVIDIA binary driver coexist. I then installed the binary NVIDIA drivers from this repository (mainly because the maintainer’s version of Blender already has the CUDA kernels compiled in). The NVIDIA seat came up just fine, but I quickly found that mesa in Fedora 25 isn’t built with libglvnd enabled, so all of the seats based on open drivers didn’t come up. (libglvnd is a shim that sits between your applications and either the mesa or the NVIDIA OpenGL implementation, depending on which card you’re using.) But, even when it was enabled, I ran into this bug, so I ended up extending this patch so it would also work with Gallium drivers and applying it.

This took me several steps closer, but apparently the X11 GLX module is not part of libglvnd, and NVIDIA sets the Files section in xorg.conf to use its own GLX module (which, oddly enough, doesn’t work with the open drivers). I finally worked around this via the ugly hack of creating two different xorg.conf.d directories and telling lightdm to use the NVIDIA one when loading the NVIDIA seat.
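To give a rough idea of what that hack looks like, here is a minimal Python sketch that generates a per-seat lightdm section pointing one seat at its own Xorg config directory. It is not my actual configuration: the seat name `seat-1` and the directory `/etc/X11/xorg.conf.d-nvidia` are made up for illustration, and it assumes lightdm's `[Seat:NAME]` section syntax, its `xserver-command` option, and Xorg's `-configdir` flag.

```python
import configparser
import io

# Hypothetical mapping of seat name -> Xorg config directory for that seat.
# Seats that are not listed keep lightdm's default X server command.
NVIDIA_SEATS = {
    "seat-1": "/etc/X11/xorg.conf.d-nvidia",
}


def render_lightdm_seats(seat_dirs):
    """Return lightdm.conf text with one [Seat:NAME] section per entry.

    Each section overrides xserver-command so that X is started with
    -configdir pointing at the directory given for that seat.
    """
    parser = configparser.ConfigParser()
    for seat, confdir in seat_dirs.items():
        parser[f"Seat:{seat}"] = {
            "xserver-command": f"X -configdir {confdir}",
        }
    buf = io.StringIO()
    # lightdm.conf is conventionally written as key=value without spaces.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_lightdm_seats(NVIDIA_SEATS), end="")
```

The generated text would go into a lightdm configuration file, with the NVIDIA-specific Files section living only in the second directory so the Intel and AMD seats never load NVIDIA's GLX module.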

Voilà! We now have a multiseat system with one Intel built-in card using the mesa driver, two AMD cards using the mesa Gallium driver, and one NVIDIA card using the NVIDIA binary driver. And it only cost me eight hours and my sanity.

So what needs to happen to make this Just Work™? Either libglvnd needs to also include the X11 GLX module or we need a different shim to accomplish the same thing. And Fedora needs to build mesa with libglvnd enabled (but not until this bug is fixed!)

My mesa build is here and the source rpm is here. There is a manual “Provides: libGL.so.1()(64bit)” in there that isn’t technically correct, but I really didn’t want to recompile negativo17’s libglvnd to add it in and my mesa build requires that libglvnd implementation.

My xorg configs are here and my lightdm configuration is here. Please note that the xorg configs have my specific PCI paths; yours may differ.

And I do plan to write a script to automate the xorg and lightdm configs. I’ll update this post when I’ve done so.

Sidenote: As I was looking through my old posts to see if I had anything on NVIDIA, I came across a comment by Seth Vidal. He was an excellent example of what the Fedora community is all about, and I really miss him.

Update: Configuration has become much simpler. An updated post is here.

Notes on a mass upgrade to Fedora 23

Fedora 23

One of the hardest parts of running Fedora in a school setting is keeping on top of the upgrades, and I ended up falling a few months behind. Fedora 23 was released back in November, and it took me until February to start the upgrade process.

For our provisioning process, we’ve switched from a custom koji instance to ansible (with our plays on GitHub), and this release was the first time I was really able to take advantage of it. I changed our default kickstart to point to the Fedora 23 repositories, installed it on a test system, ran ansible on it, and voilà, I had a working Fedora 23 setup, running perfectly with all our school’s customizations. It was the easiest upgrade experience I’ve ever had!

Well, mostly.

As usual, the moment you think everything is perfect is the moment everything goes wrong. On our multiseat systems, we have three external AMD graphics cards along with the internal Intel graphics. The first bug I noticed was that the Intel card wasn’t doing any graphics acceleration. It turns out that VGA arbitration is automatically turned on if you have more than one video card, and Intel cards don’t support it in DRI2. DRI3 does handle arbitration just fine, but it was (and still is) disabled in the latest xorg-x11-drv-intel in the updates repository. Luckily for me, there’s a build in koji that re-enables DRI3. Problem solved.

The second bug was…odd. While we use gnome-shell as the default desktop environment in the school, we use lightdm for logging in, mainly because of its flexibility. We run xscreensaver in the login screen (and only in the login screen) to make it clear which computers are off, which are on, and which are logged in. GDM doesn’t support xscreensaver, but lightdm does. And this brings us back to the bug. On the Intel seat, moving the mouse or pressing a key would stop the screensaver as expected, but the screen would remain black except for the username control. It seems that the “VisibilityNotify” event isn’t being honored by the driver (though don’t ask me why it should be passed down to the driver). I filed a bug, and then finally figured out that fading xscreensaver back in works around the problem.

The third bug is even stranger. On the teacher’s machine, we have a small script that starts x11vnc (giving no control to anyone connecting to it) so the teacher can give a demonstration to the students. But after installing Fedora 23 on the teacher’s machine, the demo kept showing the same three frames over and over. The teacher’s system isn’t multiseat and is using the built-in Intel graphics, so, oddly enough, disabling DRI3 fixed the problem. I filed another bug.
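For anyone wanting to try the same workaround, one possible way to cap the DRI level is an xorg.conf.d snippet for the Intel driver. The Python sketch below only generates such a snippet; it assumes the xf86-video-intel driver's `DRI` option, which accepts a maximum level such as "2" or "3". The identifier string and the function name are placeholders, and this is not necessarily how our ansible plays do it.

```python
def intel_dri_snippet(level, identifier="Intel Graphics"):
    """Return an xorg.conf.d Device section capping the Intel DRI level.

    level: 2 to fall back to DRI2, 3 to allow DRI3.
    identifier: free-form label for the Device section (placeholder).
    """
    if level not in (2, 3):
        raise ValueError("level must be 2 or 3")
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver "intel"',
        f'    Option "DRI" "{level}"',
        'EndSection',
        '',  # trailing newline at end of file
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Snippet for the non-multiseat teacher machine: stay on DRI2.
    print(intel_dri_snippet(2), end="")
```

Note that the option can only lower the level the driver was built with; on a build with DRI3 compiled out, asking for "3" would not bring it back.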

When upgrading the staff room systems, I ran into a bug in which cups runs screaming into the night (ok, slight exaggeration) if you have a server announcing printers over both the old cups and new dnssd protocols. Since we don’t have any pre-F21 systems any more, I’ve just disabled the old cups protocol on the server.

And, finally, my principal, who teaches computers to grades 11 and 12, came in to ask me why LibreOffice was crashing for a couple (and only a couple) of his students when they were formatting cells on a spreadsheet that he gave them. After some fancy footwork involving rm’d .config/libreoffice directories and files saved into random odd formats and then back into ods, we finally managed to format the cells without a crash. Lovely.

All this brings me back to ansible. In each of the bugs that required changes to the workstations, all I had to do was update the ansible scripts and push the changes out. Talk about painless! Ansible has made this job so much easier!

And I do want to finish by saying that these bugs are part of the reason that I love Fedora. With Fedora, I have the freedom to fix these problems myself. For both the cups bug and the xscreensaver bug, I was able to dig into the source code to start tracking down where the problem lay and come up with a workaround. And if I can just get the LibreOffice bug to reproduce, I could get a crash dump off of it and possibly figure it out too. Hurrah for source code!