It’s been months since my last post, so I was planning to sit down and write a post about how we’re using podman and ostree in my work. But, as I sat down, I realized that I just can’t write about that right now.
This month has been difficult on many levels. Here in Ireland, as in much of the world, we’re unable to leave our homes except to buy necessities and exercise (within a 2km radius of our home). We’ve lived in Ireland for almost two years now, and it has become home for us. I’ve worked remotely before and normally enjoy the quiet of working from home, but, given our current circumstances, I just want to leave.
You see, nine days ago, Kristina Dieter, my sister-in-law, passed away from cancer. She was 37. She married my brother, Jason, when they were both nineteen, and next year would have been their 20th anniversary. She was an amazing sister-in-law, did an incredible job of raising their four kids (though I suppose my brother gets credit for that too), and was a light of encouragement to those around her. There’s so much more that I could say, but I think it’s best to just link to what my brother wrote on Instagram.
I’ve been using Docker (packaged as moby-engine in Fedora) on my home server for quite a while now to run Nextcloud, Home Assistant, and a few other services.
When I upgraded to Fedora 31, which enables cgroups v2 by default, I got quite a surprise: Docker can’t handle cgroups v2, so it no longer works out of the box. The fix is easy enough, but I took this as the kick in the pants I needed to switch over to podman.
The process was fairly straightforward, but there were a couple of gotchas that I wanted to document and a couple of podman features that I wanted to take advantage of.
No more daemon
This is both a feature and a gotcha when switching from Docker to podman. The Docker daemon, which has traditionally run as root, is an obvious attack vector, so its removal is a pretty compelling feature. Without a daemon, however, containers no longer start automatically on boot.
The (maybe not-so) obvious workaround is to treat each container as a service and use an obscure tool called systemd to manage the container lifecycle. Podman will even generate a systemd unit for you (podman generate systemd), if that’s what you want.
Unfortunately, there were a couple of things I was looking for that podman’s auto-generated services just didn’t cover. The first was container creation; I wanted a service that would create the container if it didn’t exist. The second was auto-updates. I wanted my containers to automatically update to the latest version on boot.
Just yesterday, Valentin Rothberg published a post on how to do the first, but, unfortunately, his post didn’t exist when I was trying to do this a month ago, so I had to wing it. I have shamelessly stolen a few of his ideas, though, to simplify my services.
The one other major feature I wanted was something close to rootless containers: containers that are started by root but immediately drop privileges, so that root in the container is not the same as root on the host.
The systemd file I came up with looks something like this:
Most of this is pretty similar to what Valentin posted, but I want to highlight a few changes that are specific to my goals:
I pull the image before starting the container. The - prefix on the ExecStartPre lines tells systemd to ignore a failing command, so if the pull fails for whatever reason, the service still starts.
If there’s a container called nextcloud running before the service starts, we stop and remove it. There can be only one.
When we actually call podman run, we don’t pass the -d (detached) flag, and the service is Type=simple rather than Type=forking. I want my container logs in the journal, tied to their service, and I haven’t worked out how to do that with a forking service.
The --uidmap and --gidmap flags map UIDs and GIDs 0-4998 in the container to 110000-114998 on the host. Because many images use UID/GID 65534 for nobody, I map that ID separately to 114999 on the host. These flags let my containers think they’re running as root when they’re not, which should help protect my system in the unlikely event that an attacker breaks out of a container.
The --tty flag is needed because, without it, mapping container UID 0 with --uidmap causes read/write problems with /dev/stdin, /dev/stdout, and /dev/stderr.
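To make the points above concrete, here is a minimal Python sketch that renders a unit file with the same shape: pull first, clear out any old container, then run podman in the foreground with the ID mapping. It is not the unit from this post; the /usr/bin/podman path, the image name, and the helper names host_id, map_flags, and render_unit are assumptions for illustration. It relies on podman’s container_id:host_id:amount syntax for --uidmap and --gidmap, so 0:110000:4999 covers container IDs 0-4998.

```python
"""Render a systemd unit for one podman container (hypothetical helper).

Assumptions, not the author's actual file: podman lives at /usr/bin/podman,
the image name is a placeholder, and the ID ranges follow the mapping
described above.
"""

PODMAN = "/usr/bin/podman"  # assumed install path

# (container_start, host_start, count) triples, matching podman's
# --uidmap/--gidmap syntax container_id:host_id:amount.
ID_MAP = [
    (0, 110000, 4999),   # container 0-4998  -> host 110000-114998
    (65534, 114999, 1),  # container nobody  -> host 114999
]


def host_id(container_id: int) -> int:
    """Return the host UID/GID that a container UID/GID maps to."""
    for start, host_start, count in ID_MAP:
        if start <= container_id < start + count:
            return host_start + (container_id - start)
    raise ValueError(f"container id {container_id} is not mapped")


def map_flags() -> list[str]:
    """Build matching --uidmap/--gidmap flags from ID_MAP."""
    flags = []
    for start, host_start, count in ID_MAP:
        spec = f"{start}:{host_start}:{count}"
        flags += [f"--uidmap={spec}", f"--gidmap={spec}"]
    return flags


def render_unit(name: str, image: str) -> str:
    """Return unit-file text: pull, clear old container, run in foreground."""
    run = " ".join(
        [PODMAN, "run", "--rm", f"--name={name}", "--tty", *map_flags(), image]
    )
    return "\n".join([
        "[Unit]",
        f"Description=Podman container {name}",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        "Type=simple",
        # A leading "-" tells systemd to ignore a failing command.
        f"ExecStartPre=-{PODMAN} pull {image}",
        f"ExecStartPre=-{PODMAN} stop {name}",
        f"ExecStartPre=-{PODMAN} rm {name}",
        # No -d: podman stays in the foreground, so output goes to the journal.
        f"ExecStart={run}",
        f"ExecStop={PODMAN} stop {name}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


if __name__ == "__main__":
    print(render_unit("nextcloud", "docker.io/library/nextcloud:latest"))
```

Writing the output to a file such as /etc/systemd/system/nextcloud.service and running systemctl daemon-reload followed by systemctl enable --now nextcloud would install it. Keeping the ID ranges in one table means the flags and the host_id arithmetic can’t drift apart.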
Runtime path bug
After running this setup for a few weeks, I noticed that I kept losing the container state. I found a related bug report, investigated further, and realized that the container state for a system service should live in /run/crun rather than /run/user/0/crun, and that the latter, root’s per-user runtime directory, was getting wiped whenever I logged out after logging into my server as root (I use root as my own account on it).
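The distinction comes down to where the state directory lives. Here is a small Python sketch, assuming systemd-logind’s usual behaviour of removing /run/user/<uid> once that user’s last session ends (unless lingering is enabled), that flags state paths which won’t survive a logout. The helper name is_session_scoped is hypothetical.

```python
"""Flag container state directories that logind may delete (sketch).

Assumption: systemd-logind removes /run/user/<uid> after that user's last
session ends (unless lingering is enabled), so state kept there can vanish
on logout.
"""
from pathlib import PurePosixPath

USER_RUNTIME_ROOT = PurePosixPath("/run/user")


def is_session_scoped(state_dir: str) -> bool:
    """True if state_dir sits inside a per-user runtime dir like /run/user/0."""
    path = PurePosixPath(state_dir)
    try:
        rel = path.relative_to(USER_RUNTIME_ROOT)
    except ValueError:
        return False  # not under /run/user at all
    # The first component after /run/user should be a numeric UID.
    return len(rel.parts) >= 1 and rel.parts[0].isdigit()


if __name__ == "__main__":
    for candidate in ("/run/crun", "/run/user/0/crun"):
        where = "per-user, may be wiped on logout" if is_session_scoped(candidate) else "system-wide"
        print(f"{candidate}: {where}")
```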