IP Toolchain QA

Please put your questions about the IP toolchain in this thread. We will use them to drive a training/tutorial session

This is a wiki post, so you can edit.

  1. Where can I find documentation of the IP toolchain's output format? My specific use case is to cross-check it with the security pass.

It’s about synchronization between the two pipelines: we need to define a common JSON schema and file names. Will do next week.
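As a starting point for that discussion, here is a minimal sketch of what such a shared record could look like; all field names and the file-naming convention below are assumptions, not the agreed schema.

    # Illustrative only: field names and the file-name convention are assumptions,
    # not the agreed schema (to be defined next week).
    import json

    record = {
        "package": "busybox",           # recipe/package name (example)
        "version": "1.33.1",            # upstream version (example)
        "recipe": "busybox_1.33.1.bb",  # originating Yocto recipe (example)
        "source_archive": "busybox-1.33.1.tar.bz2",
        "licenses": ["GPL-2.0-only"],   # SPDX identifiers
        "security_status": None,        # to be filled in by the security pass
    }

    # One file per package, named <package>-<version>.ip.json (assumed convention)
    with open(f"{record['package']}-{record['version']}.ip.json", "w") as f:
        json.dump(record, f, indent=2)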

  1. What are the hardware infrastructure requirements for running the IP toolchain as part of the Continuous Integration process?

One dedicated gitlab-runner machine (8 or 16 cores, 16 or 32 GB RAM, 1 or 2 TB persistent storage). Persistent storage is needed for the SPDX/JSON data pool (which also keeps history).

  1. Do we plan to host the whole IP toolchain in OSTC infrastructure?

Yes. We can plan to start the migration after the first week of September.

  1. If the answer to the previous question is ‘yes’, are there any components that are currently hosted outside OSTC infra and require migration?

Yes, the data pool (persistent storage disk).

  1. What is the scanning strategy (e.g. do we plan to scan each commit, scan each merge request, run nightly scans, or use a manual trigger)?

I’ll add some counter-questions for the training session.

  1. We need to find the right place/repo from which to run the IP toolchain pipeline. The most natural solution would be the manifest repo (or a mirror of it), but some additional data would need to be added as a YAML or JSON file alongside default.xml. Here’s an example/proposal:
flavours:
      linux:
        machines:
          - qemux86-64
          - qemux86
          - seco-intel-b68
          - stm32mp1-av96
          - seco-imx8mm-c61
          - raspberrypi4-64
        images:
          - allscenarios-image-base
          - allscenarios-image-base-tests
          - allscenarios-image-extra
          - allscenarios-image-extra-tests
        configs:
          seco-intel-b68:
            - CONFIG_SERIAL_OF_PLATFORM = "y"
          seco-imx8mm-c61:
            - ACCEPT_FSL_EULA = "1"
      zephyr:
        machines:
          - qemu-x86
          - qemu-cortex-m3
          - 96b-nitrogen
          - 96b-avenger96
          - nrf52840dk-nrf52840
          - arduino-nano-33-ble
        images:
          - zephyr-philosophers
        configs:
      freertos:
        machines:
          - qemuarmv5
        images:
          - freertos-demo

Such a file would have to be kept up to date by developers so that the IP pipeline can automatically build all images for all available machines and flavours. Does that make sense?
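As a minimal sketch of how the IP pipeline could consume such a file to expand the build matrix (assuming the structure proposed above and that PyYAML is available):

    # Sketch only: expand the proposed flavours file into a build matrix.
    # File path and top-level structure follow the proposal above.
    import yaml

    with open(".ostc-ci/machines-and-flavours.yaml") as f:
        data = yaml.safe_load(f)

    for flavour, spec in data["flavours"].items():
        machines = spec.get("machines") or []
        images = spec.get("images") or []
        configs = spec.get("configs") or {}
        for machine in machines:
            for image in images:
                extra = configs.get(machine) or []
                print(f"{flavour}: bitbake {image} (MACHINE={machine}, extra configs: {extra})")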

  1. We should decide on a naming policy for branches (other than master and develop) and/or for MRs for which we want to run the pipeline. Or do we want to run it for every MR? Or only when it’s ready to merge? Always keeping in mind that the pipeline may take a long time to execute, so we cannot run it too frequently (see the sketch after this list for one possible gating policy).
  1. see .ostc-ci/machines-and-flavours.yaml · develop · OSTC / OHOS / meta-ohos · GitLab
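Below is a minimal sketch of one possible gating policy for the naming question above. The ip/ branch prefix and the ip-scan MR label are assumed conventions, not agreed policy; the environment variables are GitLab’s predefined CI variables.

    # Sketch of a possible gating script for the compliance pipeline.
    # The "ip/" branch prefix and the "ip-scan" MR label are assumptions.
    import os
    import re
    import sys

    ref = os.environ.get("CI_COMMIT_REF_NAME", "")
    labels = os.environ.get("CI_MERGE_REQUEST_LABELS", "").split(",")

    run = (
        ref in ("master", "develop")
        or re.match(r"^ip/", ref) is not None   # assumed branch naming convention
        or "ip-scan" in labels                  # assumed MR label to opt in
    )

    # Exit 0 to run the pipeline, 1 to skip it (e.g. from an early CI job).
    sys.exit(0 if run else 1)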

IP Compliance Pipeline design (2021-08-20 meeting)

The Context

a. The compliance pipeline needs to build all available flavours/distros, machines and images (and possibly variants too, in the near future), extract metadata and source files (through the Tinfoil bb library; a rough sketch follows after point b) and aggregate them before uploading them to Fossology, in order to reduce complexity and ease the audit team’s work (more details in this thread). This, plus the fact that automated scanners (both ScanCode and Fossology’s integrated scanners) require a lot of machine time to complete, means that the compliance pipeline regularly consumes a lot of machine time (~30 h for a complete scan from scratch, ~4-5 h for an average scan when only a few new components have been added to the project).
b. The compliance process is asynchronous, because the automated scanner results need to be validated in Fossology by the audit team, and the final results are collected by the compliance pipeline at a later stage.
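As a rough illustration of the metadata-extraction step mentioned in point a: the snippet below must run inside an initialized BitBake build environment, the recipe name is just an example, and error handling is omitted.

    # Rough illustration of extracting recipe metadata via the Tinfoil bb library.
    # Must run inside an initialized build environment (oe-init-build-env).
    import bb.tinfoil

    with bb.tinfoil.Tinfoil() as tinfoil:
        tinfoil.prepare()                        # parse configuration and recipes
        rd = tinfoil.parse_recipe("busybox")     # datastore for one example recipe
        print(rd.getVar("PN"), rd.getVar("PV"))  # package name and version
        print(rd.getVar("SRC_URI"))              # upstream source location(s)
        print(rd.getVar("LICENSE"))              # declared license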

The Problem

Points a and b above mean that the compliance pipeline cannot be run too frequently, not only to avoid overloading the machines (which could be solved by process optimization or by adding more horsepower) but also to avoid overloading the audit team (which is not solvable by simply adding more resources; in any case, adding human resources would take some months for a number of reasons, so it could happen only after the Jasmine release).

So assessing the correct timing for scheduling the compliance pipeline is key.

Also, relying only on a periodic schedule (e.g. every 2 weeks) and/or on developers manually triggering the compliance pipeline whenever they find it appropriate is not a viable solution: we would risk both having the pipeline not triggered when it is really needed (i.e. when new software components that do require an audit are added) and having it triggered when it is not needed (i.e. when only trivial modifications have been made).

So we need to define an optimal strategy to automatically trigger the compliance pipeline.

Prospected Solution: Split the Pipeline in Two Parts

After some discussion, we came up with the following possible solution.

  1. There should be a first part of the pipeline that just builds all images (leveraging the existing private SSTATE cache, if possible) and runs only aliens4friends’ debian matcher tool on the upstream source of each new software package/recipe (we do not care about Yocto patches at this stage). If there is a good debian match (e.g. > 80%), or if the package is included in a whitelist, there is no need to proceed further (the new package will be scanned at a later stage, e.g. via a periodically scheduled pipeline every 2 weeks). If there is a bad (or no) debian match, or if the package is included in a blacklist, the second part of the pipeline (the “real” compliance pipeline) will be triggered. This first pipeline should be triggered on every commit to the meta-ohos develop branch, as well as on every MR into the develop branch, and should provide artifacts (a json file?) that developers can read to see which component changes have been introduced by their commit(s) and to check whether the second (the “real”) compliance pipeline has been triggered and why. A sketch of this gating logic follows the list below.

    internal note: this is a good reason to keep both the “old” debian matcher and the “new” debian snapshot matcher in a4f: the first one (less accurate, not reproducible, but faster) could be used in the first part of the pipeline, while the second one (more accurate, always reproducible, but significantly slower) could be used in the second part of the pipeline.

  2. The second pipeline (the “real” one) will perform all the steps of aliens4friends’ workflow, and will be triggered:

    • by the first pipeline, but also
    • by a periodic scheduler and
    • manually by developers when needed (great power, great responsibility: use with care!)
  3. There will also be another pipeline, running periodically (e.g. nightly), that will perform just the final two steps of aliens4friends’ workflow (fossy and harvest), in order to update the JSON stats for the dashboard and therefore monitor the audit work progress on Fossology.
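The following is a minimal sketch of the gating logic described in point 1. Only the 80% threshold and the whitelist/blacklist idea come from the discussion above; the data structures, example package names and the artifact format are assumptions.

    # Sketch of the first-stage gating logic. Example packages and the artifact
    # layout are assumptions; only the threshold and list concepts come from
    # the meeting notes.
    import json

    MATCH_THRESHOLD = 0.80
    WHITELIST = {"busybox"}   # hypothetical: packages that never need the full pipeline
    BLACKLIST = {"ffmpeg"}    # hypothetical: packages that always need the full pipeline

    def needs_full_pipeline(pkg: str, debian_match_score: float) -> bool:
        """Return True if the second ("real") compliance pipeline must run."""
        if pkg in BLACKLIST:
            return True
        if pkg in WHITELIST:
            return False
        return debian_match_score < MATCH_THRESHOLD

    # Example artifact for developers: which new packages triggered stage two and why.
    new_packages = {"zlib": 0.97, "some-new-lib": 0.42}   # hypothetical matcher output
    report = {
        pkg: {"debian_match": score, "full_scan": needs_full_pipeline(pkg, score)}
        for pkg, score in new_packages.items()
    }
    print(json.dumps(report, indent=2))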

Environment hashing

As Andrei correctly pointed out, all parts of the pipeline would require some sort of “environment hashing” (modeled on SSTATE) in order to have unique identifiers of what we are checking, scanning and/or auditing. This topic will be further explored in a meeting with Andrei.
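A rough sketch of what such an environment hash could look like, loosely modeled on how SSTATE hashes build inputs. The choice of inputs below is an assumption and is exactly what needs to be defined in the meeting with Andrei.

    # Sketch only: derive a stable identifier from the inputs that define what
    # is being checked/scanned/audited. The selected inputs are assumptions.
    import hashlib
    import json

    def environment_hash(flavour: str, machine: str, image: str,
                         recipe_versions: dict, extra_configs: list) -> str:
        payload = json.dumps(
            {
                "flavour": flavour,
                "machine": machine,
                "image": image,
                "recipes": dict(sorted(recipe_versions.items())),
                "configs": sorted(extra_configs),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    print(environment_hash("linux", "qemux86-64", "allscenarios-image-base",
                           {"busybox": "1.33.1"}, ['ACCEPT_FSL_EULA = "1"']))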

Roadmap

We will define a development roadmap next week, together with NOI Techpark.