Config Caching Filesystem (ccfs)

One of the problems we’ve had to deal with on our servers is high load on the fileserver that holds the user directories. I haven’t worked out if it’s because we’re using standard workstation hardware for our servers, or if it’s a btrfs problem.

The strange thing is that the load will shoot up at random times when the network shouldn’t be that taxed, and then be fine when every computer in the school has someone logged into it.

Anyhow, we reached a point where the load on the server hit something like 60 and the workstations would lock up for sixty seconds (or more) while waiting for the NFS server to respond again. This seemed to happen most often when all of the students in the computer room opened Firefox at the same time.

In a fit of desperation, I threw together a Python FUSE filesystem that I have cunningly called the Config Caching Filesystem (or ccfs for short). The concept is simple. A user’s home directory at /netshare/users/[username] is essentially bind-mounted to /home/[username] using ccfs.

The thing that separates ccfs from a simple fuse bind-mount is that every time a configuration file (one that starts with a “.”) is opened for writing, it is copied to a per-user cache directory in /tmp and opened for writing there. When the user logs out, /home/[username] is unmounted, and all of the files in the cache are copied back to /netshare/users/[username] using rsync. Any normal files are written directly to /netshare/users/[username], bypassing the cache.
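To make the routing rule concrete, here is a minimal sketch of the decision ccfs has to make on every open. It is not the actual ccfs code: the function names and arguments are made up for illustration, and two points are assumptions. The first is that a file counts as configuration if any component of its path starts with a dot (so files inside .mozilla/ qualify). The second is that once a file is in the cache, later reads go to the cached copy.

```python
import os
import shutil


def is_config_path(rel_path):
    """Return True if any path component starts with a dot.

    Assumption: files inside dot-directories (e.g. .mozilla/) count as
    configuration files, not just dotfiles at the top of the home directory.
    """
    parts = rel_path.strip("/").split("/")
    return any(p.startswith(".") for p in parts if p not in ("", ".", ".."))


def resolve_for_open(rel_path, backing_root, cache_root, writing):
    """Pick the real file that an open() of rel_path should touch.

    backing_root stands in for /netshare/users/[username]; cache_root is the
    per-user cache directory under /tmp. Both names are hypothetical.
    """
    rel = rel_path.lstrip("/")
    backing = os.path.join(backing_root, rel)
    cached = os.path.join(cache_root, rel)

    # Normal files always go straight to the file server.
    if not is_config_path(rel):
        return backing

    # Already cached: keep using the cached copy so reads see earlier writes.
    if os.path.exists(cached):
        return cached

    # First write to a config file: copy it into the cache and use that copy.
    if writing:
        os.makedirs(os.path.dirname(cached), exist_ok=True)
        if os.path.exists(backing):
            shutil.copy2(backing, cached)
        return cached

    # Read-only access to an uncached config file can use the server copy.
    return backing
```

A FUSE open handler would call something like resolve_for_open() with writing set whenever the open flags include write access, and then open whatever path comes back.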

Now the only time the server is being written to is when someone actually saves a file or when they log out. The load on the server rarely goes above five, and even then it’s only when everyone is logging out simultaneously, and the server recovers quickly.

A few bugs have cropped up, but I think I’ve got the main ones. The biggest bug was that some students were resetting their desktops when the system didn’t log out quickly enough and were getting corrupted configuration directories, mainly for Firefox. I fixed that by using --delay-updates with rsync, so you either get the fully updated configuration files or you’re left with the configuration files that were there when you logged in.
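For the curious, the logout sync could look roughly like the sketch below. The function names are hypothetical, and the -a (archive) flag is my assumption about what else you would want; only --delay-updates comes from the description above. With that option, rsync stages each updated file in a holding directory and renames them all into place at the end of the transfer, which is what narrows the window for a half-written configuration directory.

```python
import subprocess


def build_sync_command(cache_dir, home_dir):
    """Build the rsync command that copies the cache back to the server.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = cache_dir.rstrip("/") + "/"
    dst = home_dir.rstrip("/") + "/"
    # -a is an assumed choice (preserve permissions, times, etc.).
    return ["rsync", "-a", "--delay-updates", src, dst]


def sync_cache(cache_dir, home_dir):
    """Run the sync; raises CalledProcessError if rsync fails."""
    subprocess.run(build_sync_command(cache_dir, home_dir), check=True)
```

This would be called after /home/[username] has been unmounted, with the per-user cache directory as the source and /netshare/users/[username] as the destination.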

I do think this solution is a bit hacky, but it’s had a great effect on the responsiveness of our workstations, so I’ll just have to live with it.

Ccfs is available here for those interested, but if it breaks, you get to keep both pieces.

Jetpack credit: Fly with U.S. poster by Tom Whalen. Used under CC BY-NC-ND

Catalyst vs. Mesa (round 2)

In February, I wrote about my frustration with AMD’s binary Catalyst drivers and my switch to the free Mesa drivers for my media/gaming system. During the summer, I updated to Fedora 13 and continued to enjoy the reliability of free drivers.

Then, a problem. Sometime in September, some of the rendering in XBMC started to be corrupted. Movies played fine and the picture slideshow continued to work correctly, but any controls rendered on top of the movie or slideshow seemed to be missing the background texture and instead rendered as a very light grey. With white text rendered on top of that, the controls were pretty much unreadable.

Upgrading mesa didn’t work. Neither did upgrading the kernel or XBMC. And a full upgrade to Fedora 14 didn’t help either. Given the insanity of getting everything else up and running at the beginning of the school year, this was the point that I stopped. After all, our main use of the system is to watch movies or the pictures slideshow, and, though annoying, the bug wasn’t a show stopper.

With some of the free time we had for Christmas, I decided to try to tackle the bug. I figured the easiest way would be to go back to the Catalyst drivers and see if the rendering was still screwed up. Sure enough, the Catalyst drivers fixed the rendering. I then tried to open Nexuiz full-screen over XBMC. The display froze. One reboot later I tried again. And the display froze again.

After several hours of trying different kernel and xorg.conf options, I was ready to put the computer in the middle of the Damascus highway during rush-hour traffic.

Then I had an epiphany. In Fedora 15, the r600 driver will switch from the standard Mesa driver to the new Gallium3D driver. So I installed fedora-release-rawhide and then did a:

yum update mesa-* libdrm-* xorg-x11-* gdm-*

One reboot later and XBMC is rendered correctly in all of its glory, all of my games run correctly over it, and I still don’t have to worry about keeping Catalyst up to date.

Closed drivers: 0

Open drivers: 2

Note: Some may wonder why I updated gdm. For some reason, the old version interacts with X in such a way that X crashes and I’m left with the boot screen and can only ssh into the system. It seems to be somewhat related to this bug, and updating gdm fixes it.