Bootloader Protocol

I want to describe the “bootloader protocol”, which is the model of interaction between the System OTA service and the platform boot loader.

The protocol allows the OTA service to discover and distinguish update slots (the available spaces in the A/B update model). Having written an update to the inactive slot, the OTA service can configure the boot loader for a one-off boot into that slot.

On a subsequent boot, the boot loader can be configured to boot into that slot permanently, thus swapping the roles of the active and inactive slots.
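To make the slot lifecycle concrete, here is a minimal Go sketch of the A/B state described above. The `abBoot` type and its methods are hypothetical and exist only for illustration: the one-off boot is modelled as a request that the next boot consumes, and the commit as a swap of the active and inactive roles.

```go
package main

import "fmt"

// abBoot is a hypothetical, in-memory model of an A/B boot loader. It is not
// the SystemOTA code; it only illustrates a one-off ("try") boot followed by
// a commit that swaps the roles of the active and inactive slots.
type abBoot struct {
	active   string // slot booted by default
	inactive string // slot that receives the next update
	try      string // slot requested for the next boot only; "" if none
}

// trySwitch requests a one-off boot into slot.
func (b *abBoot) trySwitch(slot string) { b.try = slot }

// boot simulates a reboot and returns the slot that was booted. A pending
// try request is consumed, so it affects exactly one boot.
func (b *abBoot) boot() string {
	if b.try != "" {
		slot := b.try
		b.try = ""
		return slot
	}
	return b.active
}

// commit makes slot the permanent choice, swapping active and inactive.
func (b *abBoot) commit(slot string) {
	if slot == b.inactive {
		b.active, b.inactive = b.inactive, b.active
	}
}

func main() {
	b := &abBoot{active: "A", inactive: "B"}

	b.trySwitch(b.inactive)
	fmt.Println(b.boot()) // B: one-off boot into the updated slot
	fmt.Println(b.boot()) // A: without a commit, the next boot reverts

	b.trySwitch(b.inactive)
	b.commit(b.boot())
	fmt.Println(b.boot()) // B: the committed slot is now active
}
```

The important property is that an uncommitted try boot leaves the default untouched, so a failed update falls back to the previous slot on the next reboot.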

How this is implemented is up to the platform. Integration with System OTA is exposed as native code that runs as part of the service process. It could be tailored code that understands a given boot loader and does everything internally, or glue code that runs system hooks implemented in arbitrary ways.

I’m working on a prototype that implements this for the Raspberry Pi 4 using the native Pi bootloader, that is, without U-Boot in the way. This will let me exercise the proposed initial API and see whether anything needs to change.

The current rough proposal, expressed as a Go interface, is:

```go
package boot

// Slot describes a bootable system partition in an A/B scheme
type Slot struct {
	// XXX: needs more design
	DevName string
	TryMode bool
}

// RebootFlags modifies the behavior of Protocol.Reboot
type RebootFlags int

// RebootTryMode indicates that the system should use one-time boot configuration
const RebootTryMode RebootFlags = 1 << iota

// Protocol encapsulates the interaction between the OTA system and the boot loader
type Protocol interface {
	// QueryActive returns the slot that is used for booting
	QueryActive() (*Slot, error)
	// QueryInactive returns the slot that is not used for booting
	QueryInactive() (*Slot, error)

	// TrySwitch configures the boot loader for a one-off boot using the given
	// slot. On the next boot invoked with RebootTryMode, the system will boot
	// into the try slot.
	TrySwitch(*Slot) error
	// Reboot gracefully reboots the system
	Reboot(RebootFlags) error
	// CommitSwitch re-configures a try-mode slot for continuous use
	CommitSwitch(*Slot) error
}
```

EDIT: ^ that interface has changed slightly; see the following post for the current interface.

I’ve proposed the bootloader protocol as merge request !7 (boot: define the "boot protocol" interface) in OSTC / OHOS / components / SystemOTA on GitLab.

There is also a first implementation, for the Raspberry Pi, proposed as draft merge request !8 (boot: implement the boot protocol for the Raspberry Pi bootloader). It is currently marked as a draft because it depends on the bootloader protocol itself and also on a Raspberry Pi specific config.txt model, proposed in merge request !4 (picfg: model Raspberry Pi "config.txt" file).
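For readers unfamiliar with the Pi's config.txt: it is a list of `key=value` settings grouped under conditional filters in square brackets, such as `[all]`, `[pi4]` or `[tryboot]`. The sketch below is a hypothetical, simplified reader that keeps each setting together with its filter; it is not the picfg model from !4, and real files have more quirks than it handles.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Entry is one setting from config.txt plus the conditional filter section
// ([all], [pi4], [tryboot], ...) under which it appears.
type Entry struct {
	Section string
	Key     string
	Value   string
}

// parseConfig is a hypothetical, simplified reader for config.txt; it is not
// the picfg model from !4. Comments and blank lines are skipped, and
// directives such as "include extra.txt" that use a space instead of '='
// are split on the first space.
func parseConfig(text string) ([]Entry, error) {
	var entries []Entry
	section := "all" // settings before any filter apply to all boards
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			section = line[1 : len(line)-1]
		default:
			key, value, ok := strings.Cut(line, "=")
			if !ok {
				key, value, _ = strings.Cut(line, " ")
			}
			entries = append(entries, Entry{
				Section: section,
				Key:     strings.TrimSpace(key),
				Value:   strings.TrimSpace(value),
			})
		}
	}
	return entries, sc.Err()
}

func main() {
	cfg := "# boot config\nkernel=kernel8.img\n[tryboot]\nos_prefix=b/\n[all]\narm_64bit=1\n"
	entries, err := parseConfig(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("[%s] %s=%s\n", e.Section, e.Key, e.Value)
	}
}
```

Keeping the filter with each entry is what lets a boot protocol implementation edit settings that only apply to a try boot without touching the regular configuration.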

Please have a look and poke holes at it.

Quite a bit of that has been merged already, and we found our first bug. The boot protocol was passing two separate arguments to reboot, "0" and "tryboot", when it should pass a single argument, "0 tryboot". This was caught by exploratory testing on a real system. The reboot call is also the only part of the integration test that is mocked out, because the test does not yet run on real hardware.

Separately, I suspect we also need RollbackSwitch(), mainly as a way to clean up files that are no longer used but could confuse anyone inspecting the system.

CC @agherzan: I will follow up with RollbackSwitch or CancelTrySwitch to complete this picture. I’ve filed issue #12 (Add CancelTrySwitch method to boot.Protocol) in OSTC / OHOS / components / SystemOTA on GitLab to capture that. It is not urgent, but something to pick up before we forget.

We should look at integrating with RAUC using its custom bootloader support; see rauc/bootchooser.c at commit 9e13bdd51192f83e6cd5d3b0c88eaad1b3988696 in rauc/rauc on GitHub.

EDIT: I will check out RAUC and see what kind of mismatch, if any, exists between bootloader.Protocol and RAUC’s custom hook-based system.

I’ve added CancelSwitch (née CancelTrySwitch) in merge request !34 (boot,cmd,service: add CancelSwitch to boot protocol) in OSTC / OHOS / components / SystemOTA on GitLab. This closes issue #12 (Add CancelTrySwitch method to boot.Protocol).
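A minimal sketch of what cancelling a pending switch means, in the spirit of the clean-up motivation above: the try configuration written by `TrySwitch` is removed, so nothing stale is left on the boot partition. `memBoot`, the `tryboot.txt` name and both method signatures are assumptions for illustration, not the API added in !34.

```go
package main

import "fmt"

// memBoot is a hypothetical, in-memory boot partition used to illustrate
// CancelSwitch. TrySwitch writes a one-off configuration file and
// CancelSwitch removes it, so no stale try configuration is left behind.
// The file name and both method signatures are assumptions, not the API
// from !34.
type memBoot struct {
	files map[string]string // file name -> contents
}

const tryFile = "tryboot.txt" // illustrative name for the one-off config

// TrySwitch records which device the next (one-off) boot should use.
func (m *memBoot) TrySwitch(devName string) error {
	m.files[tryFile] = devName
	return nil
}

// CancelSwitch discards a pending try configuration. It is idempotent:
// cancelling when nothing is pending is not an error.
func (m *memBoot) CancelSwitch() error {
	delete(m.files, tryFile)
	return nil
}

func main() {
	m := &memBoot{files: map[string]string{"config.txt": "kernel=kernel8.img"}}
	_ = m.TrySwitch("/dev/mmcblk0p3")
	fmt.Println(len(m.files)) // 2
	_ = m.CancelSwitch()
	fmt.Println(len(m.files)) // 1
}
```

Making `CancelSwitch` idempotent is a design choice: it lets the service call it unconditionally during clean-up, at the cost of not flagging a cancel that had nothing to cancel.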

I’ll use Monday to look at RAUC integration scripts.