When I upgraded to Fedora 31, I ran into quite the surprise: because it can't handle cgroups v2, Docker is no longer supported out of the box. The fix is easy enough, but I took this as the kick in the pants I needed to switch over to podman.
The process was fairly straightforward, but there were a couple of gotchas that I wanted to document and a couple of podman features that I wanted to take advantage of.
No more daemon
This is both a feature and a gotcha when switching from Docker to podman. The Docker daemon, which has traditionally run as root, is an obvious attack vector, and its removal is a pretty compelling feature. Without a daemon, though, containers will no longer start automatically on boot.
The (maybe not-so) obvious workaround is to treat each container as a service and use an obscure tool called systemd to manage the container lifecycle. Podman will even go to the trouble of generating a systemd service for you, if that’s what you want.
Unfortunately, there were a couple of things I was looking for that podman’s auto-generated services just didn’t cover. The first was container creation; I wanted a service that would create the container if it didn’t exist. The second was auto-updates. I wanted my containers to automatically update to the latest version on boot.
Just yesterday, Valentin Rothberg published a post on how to do the first, but, unfortunately, his post didn’t exist when I was trying to do this a month ago, so I had to wing it. I have shamelessly stolen a few of his ideas, though, to simplify my services.
The one other major feature I wanted was rootless containers, specifically containers that would be started by root, but would immediately drop privileges so root in the container is not the same as root on the host.
The systemd file I came up with looks something like this:
[Unit]
Description=Nextcloud
Wants=mariadb.service network-online.target
After=mariadb.service network-online.target

[Service]
Restart=on-failure
ExecStartPre=-/usr/bin/podman pull docker.io/library/nextcloud:stable
ExecStartPre=-/usr/bin/podman rm -f nextcloud
ExecStart=/usr/bin/podman run \
    --name nextcloud \
    --uidmap 0:110000:4999 \
    --gidmap 0:110000:4999 \
    --uidmap 65534:114999:1 \
    --gidmap 65534:114999:1 \
    --add-host mariadb:10.88.1.2 \
    --hostname nextcloud \
    --conmon-pidfile=/run/nextcloud.pid \
    --tty \
    -p 127.0.0.1:8888:80 \
    -v /var/lib/nextcloud/data:/var/www/html:Z \
    docker.io/library/nextcloud:stable
ExecStop=/usr/bin/podman rm -f nextcloud
KillMode=none
PIDFile=/run/nextcloud.pid

[Install]
WantedBy=multi-user.target
Most of this is pretty similar to what Valentin posted, but I want to highlight a few changes that are specific to my goals:
I’m pulling the image before starting the service. The - at the beginning of the ExecStartPre lines means that, if the pull fails for whatever reason, we will still start the service.
If there’s a container called nextcloud running before the service starts, we stop and remove it. There can be only one.
When we actually run podman run, we don’t use the -d (detached) flag, and the service is a simple service rather than a forking one. The reason for this is that I want my container logs to be in the journal, tied to their service, and I haven’t worked out how to do that with a forking service.
The --uidmap and --gidmap flags are used to map the uids and gids from 0-4998 in the container to 110000-114998 on the host. Because a number of containers have nobody mapped to uid/gid 65534, I then specially map that uid/gid to 114999 on the host. Using these flags allows my containers to think they’re running as root when they’re not, and should help protect my system in the off chance that an attacker manages to break out of the container.
The --tty flag is used because we get read/write problems with --uidmap 0 without this flag.
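To make the mapping arithmetic concrete, here is a minimal shell sketch that translates a container uid into the host uid it lands on, using the same container:host:length triples passed to --uidmap in the unit file. The map_id function is a hypothetical helper that only mimics the arithmetic; it is not how podman implements the mapping. The uid 33 for www-data is an assumption based on Debian-based images.

```shell
#!/bin/sh
# Hypothetical helper: translate a container uid/gid to the host id
# using "container_start:host_start:length" triples, the same
# format passed to --uidmap and --gidmap.
map_id() {
    id="$1"
    shift
    for triple in "$@"; do
        c_start=${triple%%:*}   # first field: start of the range in the container
        rest=${triple#*:}
        h_start=${rest%%:*}     # second field: start of the range on the host
        length=${rest#*:}       # third field: number of ids in the range
        if [ "$id" -ge "$c_start" ] && [ "$id" -lt $((c_start + length)) ]; then
            echo $((h_start + id - c_start))
            return 0
        fi
    done
    # No range covers this id, so it has no counterpart on the host.
    echo "unmapped"
    return 1
}

map_id 0     0:110000:4999 65534:114999:1   # root in the container
map_id 33    0:110000:4999 65534:114999:1   # www-data, assuming uid 33
map_id 65534 0:110000:4999 65534:114999:1   # nobody
```

Under those assumptions the three calls print 110000, 110033 and 114999. An id that falls outside both ranges, such as 5000, has no mapping at all.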
Runtime path bug
After running the above setup for a few weeks, I noticed that I kept losing the container state. I found a related bug report, investigated further, and realized that the container state for a system service should be in /run/crun rather than /run/user/0/crun, and that the latter directory was getting wiped when I’d log out after logging into my server as root (because root is my own account).
With the podman fix described in the last section, my containers are now working to my satisfaction.
The real joy is when I run the following:
# podman exec nextcloud ps ax -o user,pid,stat,start,time,command
USER       PID STAT  STARTED     TIME COMMAND
root         1 Ss+  11:57:10 00:00:00 apache2 -DFOREGROUND
www-data    23 S+   11:57:11 00:00:06 apache2 -DFOREGROUND
www-data    24 S+   11:57:11 00:00:04 apache2 -DFOREGROUND
www-data    25 S+   11:57:11 00:00:03 apache2 -DFOREGROUND
www-data    26 S+   11:57:11 00:00:04 apache2 -DFOREGROUND
www-data    27 S+   11:57:11 00:00:02 apache2 -DFOREGROUND
www-data    28 S+   11:57:16 00:00:03 apache2 -DFOREGROUND
www-data    29 S+   11:58:18 00:00:02 apache2 -DFOREGROUND
www-data    34 S+   12:01:07 00:00:05 apache2 -DFOREGROUND

# ps ax -o user,pid,stat,start,time,command
USER       PID STAT  STARTED     TIME COMMAND
...
110000   64235 Ss+  11:57:10 00:00:00 apache2 -DFOREGROUND
110033   64336 S+   11:57:11 00:00:06 apache2 -DFOREGROUND
110033   64337 S+   11:57:11 00:00:04 apache2 -DFOREGROUND
110033   64338 S+   11:57:11 00:00:03 apache2 -DFOREGROUND
110033   64339 S+   11:57:11 00:00:04 apache2 -DFOREGROUND
110033   64340 S+   11:57:11 00:00:02 apache2 -DFOREGROUND
110033   64343 S+   11:57:16 00:00:03 apache2 -DFOREGROUND
110033   64359 S+   11:58:18 00:00:02 apache2 -DFOREGROUND
110033   64402 S+   12:01:07 00:00:05 apache2 -DFOREGROUND
...
Seeing both the root and www-data uids mapped to something with more restricted access makes me a very happy sysadmin.
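The same check can be made without eyeballing ps output. The kernel exposes each process's user namespace mapping in /proc/<pid>/uid_map, one "container_start host_start length" triple per line. The sketch below uses a hypothetical host_to_container function to read a file in that format and translate a host uid back into the uid seen inside the container. The sample file is hand-written to mirror the --uidmap flags from the unit file; it is not captured from a real system.

```shell
#!/bin/sh
# Hypothetical helper: given a file in /proc/<pid>/uid_map format
# ("container_start host_start length" per line), translate a host
# uid back to the uid seen inside the container.
host_to_container() {
    map_file="$1"
    host_id="$2"
    while read -r c_start h_start length; do
        if [ "$host_id" -ge "$h_start" ] && [ "$host_id" -lt $((h_start + length)) ]; then
            echo $((c_start + host_id - h_start))
            return 0
        fi
    done < "$map_file"
    # The host uid is not part of any range mapped into the container.
    echo "unmapped"
    return 1
}

# Hand-written sample mirroring the --uidmap flags from the unit file.
sample_map=$(mktemp)
printf '%s\n' '0 110000 4999' '65534 114999 1' > "$sample_map"

host_to_container "$sample_map" 110000   # owner of the apache2 parent process
host_to_container "$sample_map" 110033   # owner of the apache2 workers
```

With the sample file, the two calls print 0 and 33. On a real host, the map file would be /proc/<pid>/uid_map for a process running inside the container; something like podman inspect --format '{{.State.Pid}}' nextcloud should give a suitable pid, though I'd treat that as a starting point rather than a tested recipe.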