LAVA and Spread in All Scenarios OS CI loop

We use two nifty tools in our CI. Spread is a full-system test distribution tool with very convenient debugging features, and LAVA is an automated, distributed system for deploying operating systems onto physical and virtual hardware, running tests and reporting the results.

We want to use Spread mainly for the development process, since it's really convenient and easy to kick-start, and we want to use LAVA in our CI loop, integrated with gitlab. These two have one thing in common: a test definition (in other words, a YAML file with the set of commands we want to run as part of the test), but those test definitions have different formats in LAVA and Spread. Since we wouldn't like to maintain two sets of definitions, our solution will be to extend the Spread tool a bit so that it can generate LAVA test definitions on the fly. The command line would look something like:

spread -lava ... 

The output would be a LAVA test definition ready to be used by a LAVA test job.
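For a sense of the target format, the generated definition might look roughly like this (a hypothetical sketch; the metadata values and script names are made up, only the metadata/run structure follows LAVA's test definition format):

```yaml
# Hypothetical output of `spread -lava` for one task (names illustrative).
metadata:
  format: Lava-Test Test Definition 1.0
  name: sysota-task-smoke
  description: "Generated from a spread task"
run:
  steps:
    - ./project-prepare.sh
    - ./suite-prepare.sh
    - ./task-smoke.sh
    - ./suite-restore.sh
    - ./project-restore.sh
```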

I was thinking we could do something like this. Inside a project tree we would have a fixed spread.yaml file. Inside it we would define a lava backend, which could have any number of systems. Each of those would translate to a text template that contains a full-blown LAVA job preamble and has a way to reference test cases from the spread model.

This could look something like this:

project: sysota
path: ...
backends:
    lava:
        systems:
            - avenger96:
                  template: lava/avenger96.yaml.tmpl
suites: ....

When invoked with any of the lava back-ends, spread would load the template (based on the text/template Go package), expose the data model to the template and render the result. In addition, the backend would create a set of shell scripts that describe the individual steps through the progression of a spread job. Those would start with project prepare, continue with suite prepare, task prepare, task restore and so on. The set of jobs would be linearized into a set of distinct scripts that encapsulate the shell commands described by the task.yaml and spread.yaml files, combined with any of the computed environment variables that spread normally provides.

The entire set of files could be archived into a tar.gz file and uploaded to LAVA as a complete standalone job definition. The job could obviously take advantage of anything that GitLab provides, like artifact URLs and any other context-specific configuration.
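As a sketch of that packaging step, assuming the converter writes its output under a directory such as lava-out/ (the directory name and file layout are hypothetical, not spread's actual output):

```shell
# Bundle the generated job definition and scripts into one artifact.
# The lava-out/ layout is an assumption for illustration only.
mkdir -p lava-out
printf 'echo task\n' > lava-out/task-example.sh   # stand-in generated script
chmod +x lava-out/task-example.sh
tar -czf lava-job.tar.gz -C lava-out .
tar -tzf lava-job.tar.gz   # list the archive contents
```

The resulting lava-job.tar.gz is what a CI job would upload as an artifact.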

I’d like to keep things simple. Maybe I don’t understand spread enough and you don’t understand lava enough to find a middle ground, but we’re getting there I think. The basic concept in gitlab should look like this:

  • First CI job builds the code and uploads build artifacts
  • Second CI job runs spread, which generates the LAVA test definition and any additional scripts it might need to run the tests on the DUT, and uploads those as a tarball artifact.
  • Third gitlab job submits the LAVA job(s) based on configuration
  • Fourth job is a manual (blocking) job which collects the LAVA test results; it is run via a callback from LAVA when all the LAVA jobs submitted in the previous step are finished.

It’s all already there aside from the spread conversion CI job.
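A minimal .gitlab-ci.yml sketch of those four stages could look like this (job names, script commands and the lava-submit/lava-collect helpers are all hypothetical; only the stage/artifact wiring is the point):

```yaml
stages: [build, convert, submit, collect]

build:
  stage: build
  script: [make]                                     # placeholder build step
  artifacts:
    paths: [build/]

convert:
  stage: convert
  script: [spread -lava-compile]                     # hypothetical flag
  artifacts:
    paths: [lava-job.tar.gz]

submit:
  stage: submit
  script: [./scripts/lava-submit lava-job.tar.gz]    # hypothetical helper

collect:
  stage: collect
  when: manual                                       # unblocked via LAVA callback
  script: [./scripts/lava-collect]                   # hypothetical helper
```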

So the job configurations need to be maintained (the separate repo linked above is a good starting point), since each device will have different deploy and boot steps.

Example qemu:

device_type: qemu
actions:
- deploy:
    to: tmpfs
    images:
      rootfs:
        url: https://some.image.url/
- boot:
    method: qemu
    media: tmpfs
    timeout:
      minutes: 10
    auto_login:
      login_prompt: 'qemux86-64 login:'
    prompts:
    - 'root@qemux86-64:'

Example avenger96:

device_type: avenger96
actions:
- deploy:
    to: tftp
    kernel:
      url: http://some_url/uImage
      type: uimage
    dtb:
      url: http://some_url/stm32mp157a-av96.dtb
    nfsrootfs:
      url: http://some_url/allscenarios-image-base-tests-stm32mp1-av96-20210511022205.rootfs.tar.xz
      compression: xz
- boot:
    method: u-boot
    commands: nfs
    auto_login:
      login_prompt: 'login:'
      username: root
    prompts:
    - 'root@stm32mp1-av96:'

And only the test action will actually be shared. The test definition can be shared too, provided the execution environment is the same (bash vs some other shell) and provided we run the same set of tests on the separate devices.
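That shared test action could then be identical in both jobs, something like this (the repository URL and path are hypothetical placeholders):

```yaml
- test:
    definitions:
      - from: git
        repository: https://git.example.com/sysota.git   # hypothetical URL
        path: lava/generated-tests.yaml                   # hypothetical path
        name: spread-tests
```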

So this spread job should just do something like:

spread -lava-compile ...

or similar, and the output should be a LAVA test definition and/or a script to be executed by it (if necessary maybe we can do it all inline).

Hope that makes things closer to clarity :slight_smile:

That is exactly what I meant. I did not mean to imply spread would submit the job, only that it would generate it.


There was some work in this area recently. It started with @chase-qi submitting Add support for converting to lava test definitions (!1) · Merge requests · OSTC / tools / oh-spread · GitLab, which allowed us to look at what this might look like and to have a conversation about how to proceed.

I’ve discussed this pull request with Chase in a video call and then together with @stevanr and Chase in our weekly Linaro sync. I don’t think we were immediately in agreement, mostly because I am stubborn about getting things right even if it takes somewhat longer.

Here’s what we, I hope, agreed to:

  1. The converter will retain spread semantics, so that developers can expect equivalent behaviour locally, when running spread directly, and in CI, where lava executes all the tests
  2. The converter will support a subset of the features and will actively identify unsupported features and refuse to work if one is used by the project.
  3. Projects opt into using LAVA by declaring a lava backend in their spread.yaml. The converter exports those jobs implicitly, as if it were invoked with spread lava:

The most contentious and complex aspect relates to the spread prepare/restore logic, and the flexible way it can be defined at nearly every level. There’s no direct equivalent in LAVA and we argued about whether we need to support that feature or not.

For some context, a spread project defines a set of tasks. Apart from the initial deployment logic, when the system is prepared, everything else is a sequence of tasks that execute in (random) order. Spread allows each task to define an execute script, which should be exactly what the developer wants to see tested, as well as prepare and restore scripts, which prepare the system to execute the task and restore it after executing the task, respectively. There is no immediate equivalent in LAVA, where everything is just a flat sequence of steps. In addition, spread has semantics that describe what happens when each of those scripts fails. If the prepare script fails, the execute script does not run and the restore script is started. If the execute script fails, the restore script is started. If the restore script fails, spread assumes the system is now corrupted and stops using it. There is also the debug script, which spread runs if it is defined and anything related to a task has failed. The debug script runs before the restore script.

The second problem with how to map that to LAVA concepts is that, for convenience, spread allows prepare-each and restore-each scripts to be defined at nearly every level of the project: from project-wide, to backend, to test suite. This lets a developer easily ensure that some piece of code runs before or after every task contained in a given sub-tree. The question is how to map that to LAVA with its linear run steps script.

My proposal is to do the following:

  • Project prepare and restore are converted to a synthetic test, at the very start and very end of the test actions
  • Suite prepare and restore are similarly converted to synthetic tests. As we iterate over the project structure and visit subsequent test suites, we emit the relevant prepare and restore scripts around all the tasks contained in a given suite.
  • Now for tasks, this is where the magic happens! We keep track of the current suite as we traverse and emit a LAVA test that contains, in order: the entire aggregated prepare-each script, the task prepare, the task execute, the task restore, and the entire aggregated restore-each script. (I simplified this by ignoring debug scripts but they are not fundamentally any different.)

We use shell trap and if-then-else logic to capture the relation and behaviour of each spread task. It is roughly, in pseudo-code, as follows:

set -e            # stop on any error in simple statements
trap restore EXIT # on exit, run restore
trap debug ERR    # on error, run debug (it runs before restore)

prepare
execute

That’s it
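Fleshed out, a generated per-task script could follow a skeleton like the one below. This is a minimal sketch, not the converter's actual output; the prepare/execute/restore/debug bodies are placeholders that the converter would inline from the real spread scripts.

```shell
# Placeholder hooks; the converter would inline the real script bodies here.
prepare() { echo prepare; }
execute() { echo execute; }
restore() { echo restore; }
debug()   { echo debug; }

run_task() (
    # Subshell keeps the EXIT trap and set -e scoped to this one task.
    set -e
    on_exit() {
        status=$?
        if [ "$status" -ne 0 ]; then
            debug || true      # debug runs only after a failure, before restore
        fi
        restore                # restore runs no matter what
        exit "$status"         # preserve the task's overall status
    }
    trap on_exit EXIT
    prepare
    execute
)

run_task
```

If prepare or execute fails, execution jumps straight to the EXIT trap, so debug and restore still run and the script's exit status reflects the failure, matching the spread semantics described above.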

All the scripts need to have the correct environment variables and shell functions. There are three sources:

  • shell functions MATCH, NOMATCH and REBOOT - they are constants that come from spread
  • intrinsic spread variables that inform the task about the backend, suite and so on
  • declared variables that come from the environment: section inside tasks, projects and suites.
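As an illustration, the generated preamble could render those pieces roughly like this. This is a sketch: the MATCH/NOMATCH bodies approximate spread's grep-based helpers but are not necessarily identical to spread's own definitions, the REBOOT stub is my assumption, and the variable values are made up.

```shell
# Sketch of a generated script preamble (semantics approximated from spread).
MATCH() {
    grep -q -E "$@" || { echo "pattern not found: $*" >&2; return 1; }
}
NOMATCH() {
    if grep -q -E "$@"; then
        echo "unexpected pattern found: $*" >&2
        return 1
    fi
}
REBOOT() {
    # No direct equivalent in a linear LAVA job; fail loudly for now.
    echo "REBOOT is not supported by the LAVA conversion" >&2
    return 1
}

# Intrinsic spread variables (illustrative values):
export SPREAD_BACKEND=lava
export SPREAD_SYSTEM=avenger96
export SPREAD_SUITE=tests/

# Declared variables from the environment: sections would follow here.
```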

The last element of the puzzle is spread variants, where a single task.yaml becomes a set of uniquely named tasks. Spread offers a way to get all the variables with their correct values.

In pseudo code, the project traversal logic is:

emit project prepare
for each suite:
   emit suite prepare
   for each task:
      for each variant of current task:
          emit aggregated prepare
          emit task execute
          emit aggregated restore
   emit suite restore
emit project restore

The word emit implies that a test definition, with all the variables, is created.
The word aggregated implies that all the -each scripts that apply to a given task are combined.

I’ve started to walk on this path with a pull request adding LAVA types: lava: add types describing LAVA concepts (!4) · Merge requests · OSTC / tools / oh-spread · GitLab

I will follow this with the next smallest logical step, working with Chase to review and land all the pieces as we make progress.

That’s it. Let’s get this done.

This is getting out of hand. Every time we move one step forward, we also take two steps back. I’m afraid that at this stage of the ASOS project, “being stubborn to get things right” is not really the way to go. We have to focus on our goal at this point. So what is our goal? Currently, we don’t have proper CI and we can’t enforce any kind of testing in gitlab on our developers. Our CI in manifest has been broken for 4 days straight as we speak. Also, it’s the third week in a row that this has happened.

And here we are, trying to develop something that will help a small niche of developers (only those working on the linux target) utilize both spread and LAVA in their testing. I was ok with us spending some time on this but, looking at the time we have already spent and the pace we’re currently progressing at (with regard to our meetings, discussions, MRs etc.), I think this should be put on hold for now. Don’t get me wrong, I’d really like to see this brought to fruition as much as you would, but the priority for this should be lowered by a fair bit.

Wrt your point 3, I generally agree with this, with the caveat that we should still discuss the initial subset of features that we want to support a bit more. Otherwise the design looks really good, great work!!

I don’t see how you reached this conclusion. It’s certainly not that in my opinion. There are several goals in flight. They support different needs of our project:

  • support packaging and integration - building recipes, images, some basic testing
  • support development of new solutions - this is mainly the OTA stack
  • support running CTS - this is the ACTS repository and everything inside

Two of those rely on spread to describe and execute tests today. I specifically chose spread for the linux work because I know how immensely important it is for the development of complex system software. The plan I laid out allows us to scale that technology to on-device testing through LAVA. I don’t think it is practical to use LAVA directly for that, mainly because it’s not something one can run locally with ease. In practice, either it’s easy or people will just not use it and not take advantage of it.

Spread is extremely practical for writing regression tests: the tests are in the same repository as the code that gets exercised. I think LAVA shines at integrating third-party test suites or custom scenarios that run as batch background activity and can aid kernel and toolchain developers with feedback from a wide range of devices.

But back to your statement. I’m unclear as to why exactly you think this is getting out of hand. This is exactly what I described several weeks ago. There is nothing new here. I just described it in detail to show what needs to be done.

As for CI, I laid out tasks to move the ACTS tests, with spread, to the manifest level before the LAVA conversion. Those were not done, so we missed the compiler change breaking the ACTS tests by, presumably (this is unclear at this time), breaking ssh.

Can you propose how we can get proper CI? I’m really happy to see improvements to what we do.

As for manifest, I didn’t know about that. I just realized there’s an unset variable that makes things not build. If you knew about it, why didn’t you bring it up before? Let’s get that fixed and carry on with our work.

A couple of notes here, though I somewhat agree with your logic. You can’t force anything on developers, you know that, however easy it is to use. The only thing you can really force on them is the CI loop. Tests don’t pass? The MR can’t be merged. Period. What we need here is a high-throughput CI system running ACTS, and that should be our primary goal (talking about point 3 to keep things on topic).

The reliance on spread in acts is only partially correct. Only jffs2, afaik, needs some preparation/restore scripts, and that can be handled in LAVA quite easily. The reliance on spread in OTA was introduced a few weeks ago, again, for your own convenience (correct me if I’m wrong). I understand it’s much easier for you to set up spread than lava jobs and I’m totally ok with that.

Now, don’t get me wrong. I want to reiterate that I have nothing against us doing this. It’s the priority that concerns me. If we can’t agree on this let’s escalate and get a third party opinion, I’m completely fine with that.

There are actually a lot of new things here. Support for the WHOLE prepare/restore logic was never mentioned. Generating the test job definition was never agreed upon. These are just two examples off the top of my head.
The development time went from my one man-week estimate to, IMO, 1+ man-months (if we also count our time exchanging angry forum posts :slight_smile: ). It’s the time/benefit ratio that makes me think this is getting out of hand. If the estimates prior to our discussions had been clearer, I would have considered the priority before we started executing.

I hope this clears things up a bit about why I have such strong opinions on this moving forward.

This is on the verge of being toxic. I don’t think pointing fingers will help us in any way here. I’m sorry if you felt like I’m blaming you for the CI issues. I’m not, I’d just like us to fix things and move forward.

I’m not talking about the scheduled pipeline, this is the first time I’ve seen it as well. I made a typo: I was talking about xts_acts and how we’re not able to merge anything b/c of the spread SSH issue that bero fixed (hopefully) today. It happened just after the hisilicon taurus issue.

I’m sorry, that was not my intent.

Just to be clear, this was CI catching a bug that went unnoticed through another repository. This is a success story even though it caused us some downtime.

We can help developers set the right standard. We can help them by bringing tools that help them do their job instead of getting in the way. I’ve used lots of CI tools in my life but I’ve not seen anything like spread before and I know it is transformative when you apply it right.

You’ve mentioned CI blocking merges. That’s not the point. CI showing a failure the developer didn’t see means either the developer didn’t think about checking or didn’t bother to check. We cannot fix the first, but I strongly believe that being able to check is key. CI, as people often use it, does not let you check until you push. People either push noise that swamps CI, or push infrequently, hoping to get that green tick after spending lots of time testing locally, if that is possible at all.

Spread changes this balance by providing a way to check effortlessly while we develop. It’s the right blend of unit tests and integration tests, one that lets you have confidence while you work on the code. I want to bring this value to our team because I’ve seen it work wonders before.

That’s true, but that’s also what we do. We do what’s convenient. If you want, we can start a mini project to get all of ACTS on top of lava natively, as an intermediate step to validate all the bits in place, before we have the spread integration. If it’s only a matter of writing a job and getting some test data out, and we can time-box it to a few days, then let’s do that.

In our current work ACTS is both important and not really important. It is not integrated into meta-ohos and manifest CI yet. We know it is full of failures because the code was meant for LiteOS, not Linux, originally and lots of assumptions are just wrong.

I think that it is way more important to support the OTA effort and the upcoming work on blueprints than to support ACTS. I look at ACTS as a background activity that we have some fixed, constant amount of effort devoted to. I don’t see it, realistically, being any kind of bar that we measure ourselves against. On one hand it’s just broken on Linux, on the other hand we have a small fraction of it working. The “rest” of ACTS is nothing but unit tests for specific libraries that are a part of Open Harmony that we don’t even have in any of our images. I understand the organizational importance of ACTS, but it has no impact on day-to-day development yet, and won’t have in the next 3 to 6 months IMO.

I understand you are worried about the scope of this spread conversion task, and that’s exactly why I’m hopping in to get it done. I can get it working quickly because I understand spread, and I hope to understand LAVA well enough to connect the dots.

Right now we have almost no “serious” development work going on. We just don’t write that much software. We focus our attention on the integration of existing software. I want us to be ready for when that changes. At that moment I fully want to use spread for all testing on Linux, so that we can write much better software with it than without it. If it takes an iteration for us to get that working, I think that’s time very well spent for the months-times-people ahead.

@stevanr I have the exporter working now, with all the features except variables. I didn’t touch variables yet, but I bet it’s all going to fall into place today. I recall @chase-qi mentioned there are lava parameters that we can use to express them; can you comment on the lava types PR to point out where they go and where they are specified?

Here’s a dummy export from xts-acts: Ubuntu Pastebin

Please have a look and point out any issues.