Image-Based Linux Summit Berlin 24th September 2024
# Attendee’s projects
# systemd mkosi SUSE: MicroOS/Tumbleweed Red Hat:
image-builder/osbuild ,
bootc , systemd, systemd-boot Microsoft: confidential containers, Flatcar, Azure Boost, Mariner/Azure Linux Edgeless Systems: Constellation,
Contrast (confidential containers),
uplosi NixOS: systemd-{repart,stub,boot,cryptenroll}, tvix-store,
Lanzaboote , UKIs,
NixOS Bootspec , PE binary manipulation in Rust Canonical: Ubuntu Core, snaps, foundations/bootloader/package management System Transparency: UKI, UAPI specs, systemd-cryptenroll, remote attestation, TPM eventlog, mkosi Freedesktop-sdk GNOME OS Pengutronix: OE/Yocto, nspawn, RAUC, verity, OPTEE Arch Linux: sbctl/go-uefi, signstar, mkinitcpio, arch-install-scripts PostmarketOS Summary of Last Year’s Progress
# Flatcar:Sysexts widely adopted: In-system (shipped with OS), optional but system dependent (updated in lock-step with OS, like podman/ZFS), Custom / user provided (updated from custom servers via sysupdate) Composable OS is more flexible, some traction with contributors, even vendors (Akamai) Mutable sysexts feature available in systemd-256 and newer. Slowly trickling into distros. OS test suite keeps finding bugs in LTS kernel patch-level updates 😅 NixOS:Systemd-repart integration to build NixOS images Edgeless:Not much progress on
package manager lockfile proposal Talk about reproducible OS image builds with mkosi @ fosdem (
repo ) uplosi tool for uploading OS images to cloud provider, can pre-calc PCRs for measured boot (started pre-systemd-measure) openSUSE MicroOSsystemd soft-reboot with transactional-update (switching btrfs root subvolume) Full disk encryption with systemd-pcrlock / signed policies (TPM and FIDO2) Working on BLS support on all architectures Systemd-boot integration in YaST2 Installer and all relevant tooling GNOME OSSignificant investment in integrating systemd-homed with GNOME (WIP) systemd-sysupdate integration with the GUI app center (GNOME Software) Various improvements to systemd-sysupdate (optional components, bug fixes) Integration of systemd-sysext into GNOME’s CI pipelines - testing system components as easy as Flatpaks for apps Switch to ukify, drop dracut We now use systemd main branch systemdTons of stuff, see ASG Thursday LPCUapi wishlist room
https://uapi-group.org/kernel-features/ contributions welcome (via PR) Session on initrd changesInitrd are quite large with UKIs, need to rm -rf content, tons of work to do, unnecessary We have some agreement on umounting initrd from the host New root filesystem type as the immutable / (single inode) that can stay around, initrd mounted on top, so it can be removed Page-aligned erofs images packed in uncompressed cpio, kernel could zero-copy use them, so we avoid copying/uncompressing on boot No agreement on immutable via erofs mkosiNow fully unprivileged (fs generation via systemd-repart)BTRFS work ongoing to do offline compressing/subvolumes No more bubblewrap/newuidmap mkosi-sandbox instead of bubblewrap More scripts (version, config) Mkosi-initrd as an easy to use tool to build initrds locally Various new distributions OpenSSL engine support for HSM signing (signed) shim, systemd-boot, grub support Sysupdate support Red Hat Image Builder / osbuild https://uapi-group.org FOSDEM CFP open, will prep a proposal Image Based devroom Kernel: IPE LSM is now upstream (in v6.12), image-based distros can do code integritynot yet for scripts, there’s ongoing work for that Topics
# Dual-booting DPS distros, DPS partition ownership (
https://github.com/uapi-group/specifications/pull/117 )New installer for GNOME OS, no longer supports /etc/fstab
Cannot identify which root/usr partition belongs to which distribution home/srv are shareable two instances of the same os is not supported rootfs discovery is done on pattern matching on labels (implemented in sysupdated, not in the spec yet) corresponding /var partition is found via hashing of the machine-idrelated: mkosi images with integrated /var currently needs a fixed machine-id Q: Who has adopted DPS in production?Emil: not adopted in SteamOS, as not the full partition set could be discovered We can add label-based matching/filtering. There is an open PR for that. Need backward compat, if there are no labels do the autodiscovery People do want to use F39 and want to try F40 and maybe go back, so multiple versions of the same OS can be useful Pick the rootfs, then the rest is derived (home shared, var hashed by machine-id) Comment: It’s hard to specify root=
with UKISolution: new credential, TPM locked /etc/os-release
add new field, or re-purpose IMAGE_ID Generic, stateless OpenPGP verification of distribution artifacts (repositories, packages, installation media, updates, etc.)Links: Tons of pitfalls, gpg not standard compliant anymore, is not stateless with regards to keyrings Would be great to have stateless (no keyring) thingy for signatures why not PKCS#7 Apt has global trusted.gpg.d
directory, being phased out, specified inline in the src line seems similar to how distros handle TLS certificates differently systemd currently could use this for verity authentication and downloading suggestion: write a UAPI spec (agnostic with regards to OpenPGP or PKCS#7 or SSH?) Spec says where to put keys, how to name them Key need a purpose, path specifies it, so that a key cannot be used to add trust to another key, only to a repo Should support the usual /usr/
, /etc/
, /run/
priorities Directory structure should be used to find the right key (apt key or sysupdate key), but key policy should be inside the key (signing key or bla key)find key type from the file name suffix Not turn it into a Certificate Store thing, like openssl Have a way to find a private key corresponding to a public key Reserve some OIDs for usage, or define usage extension fields would like to add a security feature (BPF LSM) to reject any access to an unauthenticated FSdm-verity/dm-crypt or fixed list (sys, proc, …) also reject any dev nodes outside of /devchromeos has something similar Reject AF_UNIX
sockets in /etc/
or /usr/
Any other “no-brainer” security policies we should add globally? TODO: file RFE on systemd to track list of policies BPF LSMs are often closed-source currently Make program visible in BPF FS when loaded, so you can do ConditionPathExists=/sys/filesystem/bpf
(TODO: fix this) Think about the relation between FIDO2 + TPM2 (tpm2+fido2 as PIN, TPM2 and FIDO2 as recovery key)No two factor auth, it’s either TPM or FIDO Chromeos makes the tpm send a challenge to the fido device, then the TPM reveals the encryption keycould likely also be used with PKCS#7 HSMs SSS (Shamir Secret Sharing) support was being worked on, but keys are combined in software on the CPU, so the spooks can get at you SSH agent could also use this scheme TPM policy expresses that there must be a challenge-answer Q: use this for remote proof of identity?we have PCR15, which includes machine-id and could have other identity identifiers immutable without significantly raising the bar for contributorshttps://gitlab.com/postmarketOS/postmarketos/-/issues/62#note_2095562422 How to deliver an immutable system without raising complexity Need to figure out how to build imageImage build efficiency: composefs Q: how about sysexts?Thilo: there are plans to load sysexts very early (initramfs). Also can create systexts on the fly - using mutable overlays and capturing the writes in a separate “working” directory, then building a new sysext from that perhaps do an overlayfs and keep a list of user-installed packages? This completely breaks the security modelFor developers, it makes sense for a “break glass” mechanism to do things locally For users, the update mechanism should be designed with security in mind and do transactional updates Think hard on whether the immutable system is the right approach for the use case a way would be to use local keys to build images locally. but uses the full image workflow. They currently use apk and install packages into an overlayfs, purely additive. the overlayfs would become incompatible on update, so nuke it. This kind of issue will become bigger when moving to UKI.Seems like using measured boot is much more interesting than secure boot. Perhaps mkosi approach, keep a list of packages and generate a locally signed sysext. then refresh the sysextJust build the full image locally, no need for sysext if you can sign everything yourself maybe have a “developer mode” like chromeos. there is an entry in the systemd todo listgnome os is looking at this as well have an explicit knob in the TPM for this NIXOS has a similar mode On qualcomm phones there is no way to have a fTPM in TrustZone systemd on muslhttps://catfox.life/2024/09/05/porting-systemd-to-musl-libc-powered-linux/ pmOS is for smartphones, they have a high number of contributors relative to the number of users Tons of patches We merged some patchesthey (adelie folks) use s6 parts to support group shadow stuff that’s required by systemd have a shim library when using systemd on muslpidfd_spawn
, shadow/gshadow, printf
string formatter, detected by pkgconfigMaintain shadow parser and printf in parallel with musl needs to be done by libc experts systemd adds more nss stuff (userdb), might be considered advanced use cases locale support in musl is problematic (tests fail)there is conditional execution of tests in systemd already (e.g. when running in a container) Q: how is pmOS handling networkA: NetworkManager, due to switching between mobile and wifi GUIs are also deeply integrated with NetworkManager be careful with musl on memory constrained systems, as they don’t release memory back to the OS Missing functionality for managing /etc/
as a writable confextPersisting /etc/machine-id
Deriving the machine-id for /var/
, even though /var/
is probably storing the mutable /etc/
context Handling presets. Store them in /usr/
somehow? Possible solution: Just mount /var/
from initrd? (Who does this actually affect, and do they use image-based / confexts / whatever?) Move mounts that exist on /etc/
when reapplying, writable /etc/
on top of the stack Overlayfs works by using the superblock for each elements, pins it, and doesn’t look at the VFS at all, so it needs to be handled by the confext layer How to handle cases where only /var/
should be persisted?repart uses the ephemeral machine-id to create /var/
there is now also systemd.machine-id=firmware
Could add a mechanism to persist machine-id via tpm (NV index, allows factory reset) or system credential could also use the /var/
partition UUID as machine-id?problematic from the security perspective, needs some authentication perhaps reopen the discussion to mount /var/
early (in the initramfs)MicroOS already mounts /var/
(and /etc/
overlay) in the initrd things are a bit tricky around soft-reboot because there is no initrd Support for CC-based (confidential computing) runtime measurements (i.e. “PCRs”) in systemd-stub, systemd-pcr*Most systems have vTPM Should systemd support this if it detects CVM?problematic for Intel, as they do it differently compared to everyone elselooks like they will move to a TPM API as well separate PCRs are required for disk encryption via TPM policies, not needed for remote attestation sd stub already supports some CC measurement log Use case: runtime generalised measurementsStuff we build in systemd are not ima based Industry largely is accepting that vTPM is the interface to provide to the OSIf userspace uses the 4 intel registers to map to the 4 PCRs systemd uses, this could be hacked to work for very minimal/fixed userspace, tpm might not be needed, as everything can be measured at the start Unpriv container support/mountfsdnewuidmap is bad We have mountfsd/nsresourced Doesn’t work with containers in the home directory Extend mountfsd to get file descriptor of directory instead of image file Predefine 65k users for containers, no dynamic, no system specific, always the same, statically mountfsd enforces that the setup matches, then checks the parent inode to check that the caller owns it, then set up id mapped mount Dynamic ranges get mapped to this static range Homed already uses id mapping, kernel doesn’t allow nesting. This will be fixed in the kernel to allow multiple mappings. FreeIPA has existing systems with large amounts of mapping, want to make it configurableBut if OCI tarballs have to be built offline, no way to synchronise them Mountfsd also needs to learn to delete them, as unpriv users cannot delete files they do not own Once available this may be used for mkosi too IPC is via a varlink unix socket, so they could be used in containers as well via a bind mount Partition resizing discussionShould repart start resizing filesystems on its own? There’s nothing to resize vfat. What do we do if we want to grow the ESP? Gnome OS wants to use repart for the installer don’t resize ESP. use xbootloader Android uses dm-linear to runtime-merge partitions DPS GPT flag for extension partition to concatenate them with dm-linearDifficult to implement in repart for the allocation algorithm related: fuchsia has a
blobfs , which makes everything accessible via a rootfs Confext dm-crypt+dm-verity loopEnd goal: being able to reason about every single inode on the system, with different owners/signers per component Agree with this, an issue with composefs because of the multiple layers Hermetic /usr/
+ Hermetic /etc/
Easy parts done for MicroOS Some projects can’t have their files split, some refuse patches (some quite rudely) glibc would take patches, but they don’t want to do the work and it needs to be reimplemented in glibc MicroOS currently targets putting everything in /usr/etc/
and only using that as a fallback if sysadmins don’t put a file in /etc/
(
libnss_usrfiles ) tmpfiles + /usr/share/factory/etc/
+ symlinks seems to work great suggest using volatile /etc and have an allow list for specific stuff for the base os we should be fine now (on fedora), except /etc/services
MicroOS doesn’t have /etc/services
any more related: /etc/protocols
/etc/shells
is read by many things directlyuse tmpfiles to create a symlink for glibc, most important is ldconfig, services, nsswitch.conf https://github.com/authselect/authselect/pull/380 it would be nice if overlayfs had a way to “reveal” (config) files from the lower layersyou can diff between upper and lower layers Do we need a arewehermeticyet page? (there’s no “.yet” tld though) also relates to random top-level paths (mount points in /
, container usecases, /opt/
) OS Factory ResetWhat to do with user content on the ESP (credentials, self-signed UKIs)? Original patch set for factory reset became purge, which then tried to delete home directories, which made some people unhappy Need to factory reset TPMThere is an API for userspace to request this on reboot There should be a way to request factory reset from the boot menuthat then triggers repart factory reset, TPM reset and something like bootctl factory-reset Should we have global directories to sort things by owner? Undecidable whether e.g. nVidia drivers should be removed or not, might be required for graphics to work, but maybe the factory reset was initiated to get rid of them let’s add a vendor directory that does not get deleted systemd-pcrlockCanonical wrote a whitepaper on PCR-based unlock, but it has not been published yet TODO: ask to get it published SUSE has adopted pcrlockTheir older Dell laptops don’t support NV index policies The problem will resolve itself by waiting, until then we need graceful fallbacks Order of measurement is deterministic by PCR, but not globally across all PCRs Golden measurements supported, provided by vendor (in /usr/
), derive the rest from hardwaresome laptop vendors provide PCR0 values for LFVS, but the actual measurement values would be better Ideally would supply measurements in the json format for other PCRs touched to the firmware pcrlock is not very user friendlyShould have documentation pointing users to the obvious defaults that they should pick --lock=auto
option like cryptenrollpcrlock status, like bootctl show what local hardware supports pcrlock in combination with signed pcr policies is fixed in v257 via key sharding Recovery key fallbackWindows doesn’t do much user-friendly stuff, nobody knows their bitlocker recovery key GNOME OS: show GUI when measurements change, to tell the user ‘something’ happened, which PCR is different pcrlock will find a measurement that it can’t make sense of in the log No verb to machine-readable output, can be added This is the remote attestation problem, as you cannot trust the output of a machine booting a compromised kernel There was a bluetooth based project to tie this to a phone (FOSDEM ?? Ultrablue) Delivering credentials via systemd-creds to initrd for non-UKI systems (type#1)?Define UEFI protocol to pass sidecars to sd-stub, which will look for it together with the filesystem-based ones Ostree adds a new menu entry on each system update Btrfs creates a new snapshot on any change to the rootfs (update, config changes, etc) Nixos creates a new entry every time the system config changes Can a credential be used instead of the kernel command line?Problem is GUI: we need a chooser in the menu By opening up the kernel command line choice, we need to ensure we are not opening up a security hole to let a user access the root user Action Items
# file RFE on systemd to track list of BPF LSM filesystem lockdown policies Make program visible in BPF FS when loaded, so you can do ConditionPathExists=/sys/filesystem/bpf
ask for Canonical whitepaper on PCR-based unlock to be published writeup for LWN, send draft to group CFP for FOSDEM for image-based devroom organise 4th summit next year mastodon posts about systemd v257 features add stuff to kernel features wishlist file “user friendly pcrlock” issue uki.d
vs uki.vendor.d
dropins for factory resetcreate PR for certificate directory structure make suggestions on other projects to invite (use Signal)