Linux System Configuration Data Mechanism
The key to making a read-only, immutable root filesystem useful is to combine it with a mechanism that allows some modifications and saves them in non-volatile storage across reboots. These can be as simple as the hostname and timezone settings, or as complex as the set of enabled system services, timers and sockets, or the set of SSL certificates the device knows about.
This works by combining a technical measure that accepts modifications to a given file with an operator that controls a specific set of modifiable files and directories.
Making things writable again
To make a file writable, the immutable system image can contain one of the following objects:
1. A symbolic link pointing to a well-known, stable, writable replacement.
For example, `/etc/hostname` might be a symbolic link to `/run/sysota-etc/hostname`, where another mechanism persists changes to the hostname and restores them during early boot.
Symbolic links may be problematic if the software writing to the file creates a temporary file and then atomically renames it into place. The rename replaces the symbolic link itself rather than its target. Unless the software carries additional patches, which must then be maintained in the distribution, the entire directory must be writable for this to work.
2. A bind mount pointing to a well-known, stable, writable replacement.
This is very similar to the symlink approach, with the following essential differences:
- the file cannot be unlinked
- software that inspects the file type sees a regular file instead of a symbolic link
- it consumes an entry in the mount table
Bind mounts are very flexible, but they add complexity with regard to mount event propagation and can make the mount table cluttered or convoluted.
3. An overlay filesystem.
This approach combines a set of lower directories that are not modified (e.g. one of the System Root partitions) with one upper directory that stores all the writes.
Overlay filesystems are a well-established technology at this point, but they still have some shortcomings. For example, AppArmor does not work well with overlayfs, which may affect some sandboxing technologies.
We could use an overlay on strategic locations such as `/etc`, while other locations such as `/var` are handled with a single large bind mount.
4. A FUSE filesystem.
We could mount a custom FUSE filesystem over `/etc` that redirects access to specific locations. This could be more elegant than a swarm of bind mounts. It would require custom engineering and carries a small performance penalty, but for files in `/etc` I do not think that would be a problem.
I would only consider this if we absolutely have to, as it is clearly a large and complex piece to implement if it is meant to manage `/etc`. We could also use an existing solution, if one exists. One main advantage of this approach would be relative simplicity for user space. It would behave like a normal file system, and we could flexibly redirect access to read-only files or to writable files, or even synthesize files on the fly.
State Operators
State operators are a proposed way to manage state in a specific location in the file system. An operator is a program that implements a specific interface and is registered in a central registry as the manager of a specific file or file hierarchy.
In general, each file and directory in the mutable space is managed by exactly one operator. The system must ensure there is no ambiguity as to which operator is responsible for each file.
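One way to guarantee that ownership is unambiguous is sketched below in Go. The `Locations` type and its methods are hypothetical, as is the operator name `reference-copy`. Duplicate registrations for the same path are rejected. The sketch also assumes that the most specific registration wins, so `/etc/hostname` can have its own operator inside an `/etc` hierarchy owned by another one. This design does not fix that rule; it is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"path"
)

// Locations records which operator manages which file or directory in the
// mutable space. A directory entry covers the whole hierarchy below it.
type Locations struct {
	owners map[string]string // cleaned path -> operator name
}

func NewLocations() *Locations {
	return &Locations{owners: make(map[string]string)}
}

// Register assigns loc to operator. Registering the same location twice is
// rejected, so no entry can have two competing owners.
func (l *Locations) Register(loc, operator string) error {
	loc = path.Clean(loc)
	if prev, ok := l.owners[loc]; ok {
		return fmt.Errorf("%s is already managed by %s", loc, prev)
	}
	l.owners[loc] = operator
	return nil
}

// OperatorFor returns the operator responsible for p by walking up from p
// towards the root. The most specific registration wins (an assumption), so
// the answer is always unique.
func (l *Locations) OperatorFor(p string) (string, bool) {
	for p = path.Clean(p); ; p = path.Dir(p) {
		if op, ok := l.owners[p]; ok {
			return op, true
		}
		if p == "/" || p == "." {
			return "", false
		}
	}
}

func main() {
	locs := NewLocations()
	for loc, op := range map[string]string{
		"/etc":          "reference-copy", // hypothetical operator name
		"/etc/hostname": "system-config",
	} {
		if err := locs.Register(loc, op); err != nil {
			panic(err)
		}
	}
	for _, p := range []string{"/etc/hostname", "/etc/ssl/certs/ca.pem", "/var/lib/foo"} {
		op, ok := locs.OperatorFor(p)
		fmt.Printf("%s -> %q (managed: %v)\n", p, op, ok)
	}
}
```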
Operators give us the flexibility to control what happens to a modification. For example, an operator could parse a modified `/etc/hostname`, store the actual value in a dedicated location and re-create the file on boot, making the file mutable but also ephemeral. Another operator could simply discard all modifications, re-creating the file from some data source. This idea allows us to use a regular writable filesystem, ephemeral or not, and have a way to both manage and model the data. This last aspect is important when the data format changes and the update system must create a representation that agrees with the software in the root filesystem.
This is why it is essential for operators to be able to both read and write the data. A hypothetical operator could parse `/etc/hostname` and `/etc/timezone` and store both in an internal format managed by the OTA system. The operator can then render the internally stored data into the appropriate format.
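A minimal Go sketch of such an operator's parse and render steps follows. The `SystemConfig` struct stands in for the OTA system's internal format, and `Parse` and `Render` are hypothetical names. Parsing skips blank lines and `#` comment lines, which hostname(5) permits in `/etc/hostname`. Treating `/etc/timezone` the same way is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SystemConfig is a hypothetical internal format owned by the OTA system,
// independent of how the files in /etc happen to be laid out.
type SystemConfig struct {
	Hostname string
	Timezone string
}

// firstLine returns the first non-empty, non-comment line, trimmed.
func firstLine(data string) string {
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line != "" && !strings.HasPrefix(line, "#") {
			return line
		}
	}
	return ""
}

// Parse reads the contents of /etc/hostname and /etc/timezone.
func Parse(hostname, timezone string) (SystemConfig, error) {
	cfg := SystemConfig{Hostname: firstLine(hostname), Timezone: firstLine(timezone)}
	if cfg.Hostname == "" {
		return cfg, errors.New("empty hostname")
	}
	return cfg, nil
}

// Render produces the file contents for path from the internal state.
func (c SystemConfig) Render(path string) (string, error) {
	switch path {
	case "/etc/hostname":
		return c.Hostname + "\n", nil
	case "/etc/timezone":
		return c.Timezone + "\n", nil
	}
	return "", fmt.Errorf("no renderer for %s", path)
}

func main() {
	cfg, err := Parse("# set by installer\ndevice-01\n", "Europe/Warsaw\n")
	if err != nil {
		panic(err)
	}
	out, err := cfg.Render("/etc/hostname")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v %q\n", cfg, out)
}
```

The stored form is a struct rather than the raw file. A newer root filesystem could therefore render the same values into a different file format.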
Operator API
```go
type StateOperator interface {
	// UnmarshalDirectory inspects the directory alone, not its children, and
	// stores what it finds in internal state.
	// A directory is always unmarshaled before any of its children.
	UnmarshalDirectory(path string) error
	// UnmarshalFile inspects the state of a given file and stores it in internal state.
	UnmarshalFile(path string) error
	// MarshalDirectory creates or re-creates a directory at a given path.
	MarshalDirectory(path string) error
	// MarshalFile creates or re-creates a file at a given path.
	MarshalFile(path string) error
}
```
For example, a sample operator could handle `/etc/hostname` and `/etc/timezone` by storing them in a “registry” (whatever that is). This could be defined declaratively as follows:
```yaml
operators:
  system-config:
    registry:
      "/etc/hostname": "system.hostname"
      "/etc/timezone": "system.timezone"
locations:
  "/etc/hostname": system-config
  "/etc/timezone": system-config
```
In code it could look something like this:
```go
reg := NewRegistry("...")
sysConfig := NewRegistryOperator(reg, map[string]string{
	"/etc/hostname": "system.hostname",
	"/etc/timezone": "system.timezone",
})

// Walk all of /etc, picking the right operator for each file we see.
// Using the locations map, we eventually reach /etc/hostname and /etc/timezone,
// which are unmarshaled and stored in the registry.
for _, path := range []string{"/etc/hostname", "/etc/timezone"} {
	if err := sysConfig.UnmarshalFile(path); err != nil {
		return err // on error, the collected data could be discarded
	}
}

// The registry can now be saved.
// /etc/hostname is re-constructed by this specific operator.
if err := sysConfig.MarshalFile("/etc/hostname"); err != nil {
	return err
}
```
Proposed state operators
I think there are three operators we would actually need:
- Files that are parsed, stored in the registry and re-created: we can migrate their format
- Files that are stashed entirely without being parsed: there is no way to migrate those
- Files that are never stored and are always restored from a reference copy (e.g. the system root)