btrfs on the server

As mentioned back here and here, our current server setup looks something like this:

Current server configuration

One thing not noted in the diagram is that fileserver, our DNS server, LDAP server, web server, and a few others all run as virtual machines on storage-server01 and storage-server02.

The drawback to this is that when disk I/O gets heavy, our virtual machines start struggling, even though they're on separate hard drives.

Another problem with our current system is that we don’t have a good method of backup. Replication, yes, but if a student accidentally runs rm ./ -rf in their home directory, it’s gone.

So, with a bit of time over the summer after I’ve set up the school’s Fedora 13 image, I thought I’d tackle these problems. We now have three new “servers” (well, 2GB desktop systems with lots of big hard drives shoved in them). Our data has been split into three parts, and each server is primary for one part and backup for another.

The advantage? Now our virtual machines have full use of the (now misnamed) storage-server01 and storage-server02, both of which are still running CentOS 5.5. Our three new datastore servers, running Fedora 13, share the load that was previously being put on a single storage server.

But this doesn’t solve the backup problem. A few years back, I experimented with LVM snapshots, but they were just way too slow. Ever since then, though, I’ve been very interested in the idea of snapshots, and btrfs has them for free (at least in terms of extra I/O, and I’m not too worried about space). Btrfs also handles multiple devices just fine, which means goodbye LVM. With btrfs, our new setup looks something like this:

New server configuration

I have hit a couple of problems, though. By default, btrfs will RAID1 metadata if you have more than one device in a btrfs filesystem. I’m not sure whether my problem was related to this, but when I tried to manually balance the user filesystem, which was spread across a 2TB and a 1TB disk, I got -ENOSPC, a kernel panic, and a filesystem that was essentially read-only. This happened when the data on the drive was under 800GB (though most of the files are small hidden files in our users’ home directories). After checking out the btrfs wiki, I upgraded the kernel to the latest 2.6.34 available from koji (at that point in time), and then copied the data over to a newly created filesystem with RAID0 metadata and data (after all, my drives are already RAID1 using DRBD). A subsequent manual balance had no problems at all.
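For reference, the recreate-and-rebalance step looks something like this. The device names and mount point here are made up, and the balance subcommand syntax varies between btrfs-progs versions:

```shell
# Hypothetical device names; the real pool was a 2TB and a 1TB disk.
# -m raid0 overrides the multi-device default of RAID1 metadata, and
# -d raid0 stripes the data too (DRBD already mirrors these disks).
mkfs.btrfs -m raid0 -d raid0 /dev/sdb /dev/sdc

mount /dev/sdb /srv/users

# Spread the existing data evenly across both devices.
btrfs filesystem balance /srv/users
```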

The second problem is not so easily solved. I wanted to do a speed comparison between our new configuration and our old one, so I ran bonnie++ on all of the computers in our main computer lab. I set it up so each computer ran its instance in a different directory on the NFS share (/networld/bonnie/$HOSTNAME).
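The per-workstation invocation was along these lines (the user name is just an example; -d sets bonnie++'s working directory and -u the unprivileged user to run as):

```shell
# Run on each lab workstation; /networld is the NFS-mounted share.
mkdir -p /networld/bonnie/$HOSTNAME
bonnie++ -d /networld/bonnie/$HOSTNAME -u nobody
```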

Yes, I knew it would take a while (and stress-test the server), but that’s the point, right? The server froze after a few minutes. No hard drive activity. No network activity. The flashing cursor on the display stopped flashing (and, yes, it was in runlevel 3). Num lock and caps lock didn’t toggle their lights. Nothing in any logs. Frozen dead.

I rebooted the server, and tried the latest 2.6.33 kernel. After a few minutes of the stress test, it was doing a great imitation of an ice cube. I tried a 2.6.35 Fedora 14 kernel rebuilt for Fedora 13 that I had discarded because of a major drop in DRBD sync speed. This time the stress test barely made it 30 seconds.

So where does that leave me? Tomorrow I plan on running the stress test on our old CentOS server. If it freezes too, then I’m not going to worry too much. It hasn’t ever frozen like that with normal use, so I’ll just put it down to NFS disliking 30+ computers writing gigabytes of data at the same time. I did file this bug report, but I’m not sure if I’ll hear anything on it. It’s kind of hard to track down a problem if there aren’t any error messages on screen or in the logs.

The good news is that I do have daily snapshots set up, shared read-only over NFS, that get deleted after a week. So now we have replication and backups.
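The snapshot rotation is simple enough to run from a daily cron job. Here's a rough sketch of the sort of script involved; the paths are hypothetical, and the snapshots are exported read-only through /etc/exports rather than at creation time:

```shell
#!/bin/bash
# Daily snapshot rotation with one-week retention (paths are examples).
POOL=/srv/users
SNAPDIR=$POOL/snapshots

# Take today's snapshot, named by date (e.g. snapshots/2010-08-26).
btrfs subvolume snapshot "$POOL" "$SNAPDIR/$(date +%F)"

# Delete snapshots older than a week; %F dates sort lexicographically,
# so a plain string comparison is enough.
WEEK_AGO=$(date -d '7 days ago' +%F)
for snap in "$SNAPDIR"/*; do
    if [[ $(basename "$snap") < $WEEK_AGO ]]; then
        btrfs subvolume delete "$snap"
    fi
done
```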

I’d like to keep this configuration, but that depends on whether the server freeze bug will show up in real-world use. If it does, we’ll go back to CentOS on the three servers, and probably use ext4 as the base filesystem.

Update: 08/26/2010 After adding a few boot options, I finally got the logs of the freeze from the server. It looks like it’s a combination of relatively low RAM and either a lousy network card design or a poor driver. Switching the motherboard has mitigated the problem, and I’m hoping to get some more up-to-date servers with loads more RAM.

Better Building

As I mentioned in my last post, I’m setting up the computer system in our sister school in Ain Zhalta, up in the mountains, and last summer I set up the computer system in our sister school down in Tyre.

This includes both servers and workstations, and, being the lazy sysadmin that I am, I prefer not to reinvent the wheel for each place. My method last summer was to build rpms for most of the school-specific configuration settings, which allows me to make small changes and have them pulled in automatically.

The one problem I’ve hit is that there are some packages that have to be different between the two (and now three) schools. For example, the package lesbg-gdm-gconf contains the gconf settings so our login says “Welcome to the LES Loueizeh computer system”. Somehow, I don’t think Tyre or Ain Zhalta will appreciate having that showing on their welcome screen. Each school also has a different logo, and, again, the other schools don’t want our logo on their backgrounds.

So, what I really need is a way of organizing my rpms so that the common ones get passed to all the schools while the per-school ones only get passed to their school. Hmm. Think, think, what software is available in Fedora that could do that…

Enter koji. I had already set up a koji build system to help track down the disappearing deltarpms bug (yes, the bug is still there, but that’s for another day), and the hardest part was getting the SSL certs right.

I set up a koji instance on our dedicated server (now yum-upgraded to Fedora 13, see this post for more details) by following these instructions, and now have a nice centralized build system for our schools at http://koji.lesbg.com.

The beauty of koji is that it handles inheritance. For Fedora 13, I’ve created one parent tag, dist-f13, and three child tags, dist-f13-lesbg, dist-f13-lest, and dist-f13-lesaz. All of the common packages are built to the dist-f13 tag, while the school-specific packages are built to their respective tags. Every night, I generate three repositories (lesbg, lest, and lesaz), and each repository has the correct rpms for that school. What could be easier than that?
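The tag hierarchy takes only a handful of commands to set up. The tag names below are the ones from this post, but the build-tag name is an assumption on my part, and the exact CLI flags may differ between koji versions:

```shell
# Parent tag for everything common to the schools...
koji add-tag dist-f13

# ...and one child tag per school; children inherit the parent's packages.
koji add-tag --parent dist-f13 dist-f13-lesbg
koji add-tag --parent dist-f13 dist-f13-lest
koji add-tag --parent dist-f13 dist-f13-lesaz

# Build targets: common packages land in dist-f13, school-specific
# packages land in their own child tag (build tag name is hypothetical).
koji add-target dist-f13       dist-f13-build dist-f13
koji add-target dist-f13-lesbg dist-f13-build dist-f13-lesbg
```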

There are a few caveats, though. First, our dedicated server is slow. It’s an old Celeron with a whole 1GB of RAM (through HiVelocity), so I’ve had to compromise on a few little things. For one, we run both the x86_64 and i386 Fedora distributions, but our server is i386 only. This means that, at least for the moment, all of our packages have to be noarch.

Second, a normal part of the build process is to merge the upstream Fedora repositories with the local packages after each build (so they can be used to build the next package). On our server, this takes almost two hours. So I’ve modified it so that the build repository doesn’t include the local packages, and that mess is now gone. The downside is that I can’t BuildRequire any local packages, but, seeing as they’re all supposed to be configuration anyway, that hasn’t been a problem yet (and I don’t expect it ever will be).
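One way to get that effect in koji is to attach the upstream Fedora repository to the build tag as an external repo, so newRepo never has to merge thousands of upstream packages with the local ones. This is a sketch only: the mirror URL and build-tag name are placeholders, and I can't vouch for the exact flag spelling across koji versions:

```shell
# Point the build tag at an upstream mirror instead of merging it in
# (mirror URL and tag name are hypothetical).
koji add-external-repo -t dist-f13-build fedora-13 \
    http://mirror.example.com/fedora/releases/13/Everything/\$arch/os/
```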

Anyhow, aside from some small glitches that seem to reflect more on the slow hardware available, koji has done the trick and done it nicely. With our current setup, I can now add another organization with a minimal amount of fuss, and that’s just what I was looking for! Thanks koji devs!

Gears credit: Gears gears cogs bits n pieces by Elsie esq. Used under CC BY