Improvements in testing and building: GitLab CI and reproducible builds

Over the last couple of months, we have made some significant changes to two important parts of the Qubes development process: testing and building.

What are continuous integration (CI) and reproducible builds?

Automated testing is a major part of the software development process. It spares developers many, many hours of manual testing that would still miss some bugs and other problems. In Qubes development, we’re using an approach called “continuous integration” (CI), in which local changes made by the developers are frequently merged and tested remotely, using dedicated automated testing solutions. This is very important both for maintaining consistent code quality (all changes are tested) and for making development easier for the developers. Testing Qubes is not easy. Since Qubes is an entire operating system, doing the testing on the same system in which you’re developing is a bit like building a rocket landing system en route to Mars — not impossible, but very stressful.

The second area of improvement is the build process. The term “reproducible builds” refers to a process in which the same source code always compiles into exactly the same binary (for example, a package used to install a program via a package manager like dnf or apt). Why is this difficult to achieve? After all, computers are not random. Shouldn’t builds be reproducible by default, without requiring special effort to make them deterministic? Unfortunately, it’s not that simple. There are thousands of variables influencing the way binaries are built, ranging from the time of day to the availability of remote servers and locale settings.

Ensuring that binaries are built the same way every time is surprisingly difficult. However, the effort is worth the security benefits. To understand these benefits, imagine that an attacker wishes to feed unsuspecting users a compromised package. The attacker knows that the source code is public, so any malicious code he inserts into it would be highly exposed and at risk of detection. On the other hand, he reasons, compromising the build infrastructure would allow him to surreptitiously insert malicious changes that would make it into the resultant package. Since the source code remains untouched, his malicious changes are less likely to be detected. This is where the value of reproducible builds comes in. If the build process is reproducible, then we will immediately notice that building a package from the untouched source code results in a package that is different from the compromised one. This would be a major red flag that would prompt an immediate security investigation.

GitLab-CI migration

As many of you are aware, we migrated from Travis-CI to GitLab-CI late last year. While the direct reason was a change in the Travis-CI terms of service, GitLab-CI gives us many additional benefits. Just to name a few:

A modern execution environment with native Docker support: We can use whatever base environment we like. We are no longer constrained to specific (not so fresh) Ubuntu versions.
Much more flexible job definitions, including dependencies among them: We use this to split jobs into smaller pieces that can run in parallel and reduce duplication among them.
Out-of-the-box support for caching and artifacts: Another feature allowing for a great speed-up of our tests. A specific build environment can be stored with a pre-populated cache, for example avoiding the need to create a chroot environment each time.
Higher time limits and the ability to connect our own workers: This allows us to automatically test bigger components like the Linux kernel (which previously didn’t fit into Travis-CI’s hard time limit).

The actual migration was a massive undertaking, with the GitLab-CI configuration spread across 50 files with over 1,000 lines in total. We have opened and merged over 90 pull requests in the process. This was mainly done by Frédéric Pierret.

We still host the actual code on GitHub. We use GitLab only for CI. This mode of operation is supported natively by GitLab, but this support is quite limited. Most importantly, it does not support testing pull requests made from repository forks, which is the vast majority of our pull requests (if not all of them). For this reason, Frédéric ended up creating our own integration, which takes care of uploading the code to GitLab, scheduling CI jobs, and reporting the results back to GitHub. In fact, we are not the only project that has taken this route. Several other similar projects, such as check_pr, coq bot, and labhub, have also done so.

In the process, our CI has gained some nice extra features:

Control via special comments in pull requests:
- PipelineRetry — restarts a CI job
- PipelineRefresh — refreshes job status in GitHub (should normally not be needed)
- testdeploy — in a website-related pull request, takes the website built in GitLab-CI and publishes it under a temporary address (works only after the GitLab-CI build completes and only for selected users)
More tests, specifically for reproducible builds of packages

We also have a nice summary page with all the recent jobs states, which is updated daily.

Reproducible builds

Our current short-term goal for reproducible builds in Qubes OS is to integrate what is realistically possible in Debian. Specifically, we aim to get our Debian templates to the state in which packages can be installed only when configured rebuilders confirm they really came from the source code we publish. (A “rebuilder” is a program that takes a binary and its purported source code as inputs, along with any applicable metadata, and attempts to build an identical binary from the source code. The goal is to check whether the binary was really compiled from its claimed source code.) For this task, we focus specifically on Debian Bullseye (testing version), which includes most of the reproducibility work done by the Debian community.

To start, we had to bring our packages to the state in which they could be built reproducibly. GitLab-CI massively helped here with identifying issues. One of the CI jobs we added is running the reprotest tool on our packages. This tool builds the package twice in two different build environments and compares the results. If there are any differences, the job automatically runs the diffoscope tool, which provides a detailed, in-depth report. Now, all our packages for Debian Bullseye are reproducible, and we are actively monitoring it with our CI. This is designated by the “reproducible” badge on the CI summary page.

The next step on the journey is to enable an arbitrary user or organization to challenge the official packages — to take the sources, together with the buildinfo file describing how it was built and to verify that they result in the same binary package. Theoretically, there is a debrebuild.pl script in Debian’s devscripts package that does exactly this. We started by adding missing parts to it, but it turned out there were several limitations making it unsuitable for our use case. The primary limitation is support for only one source repository. (We need two: original Debian and Qubes.) Since debrebuild.pl, as the name suggests, is written in Perl, it isn’t very friendly for contributing to. For these reasons, we decided to write our own similar tool in Python. While this was a bit of work, the end result is very nice: about 900 lines of code, with proper structure using classes. We are in the process of upstreaming this code back to Debian, potentially to replace the original debrebuild.pl tool.

To ease the rebuilder setup, we are also in the process of creating a simple rebuilder orchestrator that monitors specified repositories and does the package build verification automatically when necessary. The goal is to give anyone who wants to verify our (or others’) packages an easy way to get started. The decision of who to trust is separate, but in order to have a choice, rebuilders must exist in the first place. This also includes scenarios in which users or organizations want to verify packages for themselves.

When the package rebuild is done, and it matches the official package, the next step is to announce this result so other users can take it into account when deciding whether to install the package or not. For this purpose, we are using the in-toto framework with its APT integration, apt-transport-in-toto. This step requires publishing signed hashes of rebuilt packages in a specific format (so-called “link” files). We have integrated this into our debrebuild tool (and the original debrebuild.pl too). The result can be seen at Frédéric’s rebuilder site.

The next step is to configure apt-transport-in-toto to actually use these proofs, but this step requires the user to decide who to trust. The configuration process is currently described in the README of rebuilder orchestrator. It includes steps to use both Frédéric’s rebuilder as well as to create a custom configuration. When enabled, every installed package is checked to ensure that a rebuilder can confirm that the package was really compiled from its purported source code. For example, installing the tzdata package looks like this:

root@debian-11:~# apt-get install tzdata
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
  tzdata
1 upgraded, 0 newly installed, 0 to remove and 842 not upgraded.
Need to get 284 kB of archives.
After this operation, 4,096 B of additional disk space will be used.
Get:1 intoto://deb.debian.org/debian bullseye/main amd64 tzdata all 2021a-1 [284 kB]
(...)
In-toto verification for '/var/cache/apt/archives/partial/tzdata_2021a-1_all.deb' passed! :)
Fetched 284 kB in 15s (19.3 kB/s)
Reading changelogs... Done
(...)

Reproducible RPMs

While we started with Debian, our goal is to have all of our packages reproducibly built and independently verified. Reproducible builds support in the RPM ecosystem is mainly driven by OpenSUSE, but Fedora and other RPM-based distributions benefit from this too. In some cases, this means that Fedora just needs some configuration to be adjusted in order to achieve a reproducible build, while in other cases this requires developing missing tools.

First of all, there is a set of tools that is very helpful while working on reproducible builds, specifically:

diffoscope — compares files in depth; an indispensable tool when working with packages and other compound file formats
reprotest — builds a package twice and compares the results
disorderfs — intentionally disturbs the way in which the filesystem is visible to applications, especially the order in which files are listed

While diffoscope was already available in Fedora, the other two weren’t. Frédéric started by adding RPM support to reprotest and then packaging both disorderfs and reprotest for Fedora. Thanks to these efforts, all three tools are now available in the default Fedora repositories.

Next, similarly to Debian, we developed a tool that takes an official package and tries to reproduce it. We call it rpmreproduce. While working on it, we noticed that there is no single official format for an RPM package build environment description, something that has existed in Debian for a long time as the buildinfo format. After discussing it a bit with the community, we submitted a buildinfo proposal and implementation to the upstream RPM project. At this stage, the only thing left to do is to add it to the rebuilder orchestrator mentioned earlier.

Having a package rebuilder running for RPM packages is one thing. Actually using it to verify packages during installation is another matter. In Debian, we used an existing APT plugin (apt-transport-in-toto), but for Fedora no such thing existed. To fill this gap, we have developed dnf-plugin-in-toto which works on a very similar basis. The DNF plugin architecture is a bit different from that of APT. It allows DNF to request a rebuild confirmation (based on the package hash from the repository metadata) before actually downloading package. There is still some work to do on this plugin, but the basic functionality is there already. Here’s an example:

# dnf install --enablerepo=qubes*testing qubes-u2f
(...)
Prepare in-toto verification for 'qubes-u2f-1.2.8-1.fc33.noarch'
Create verification directory '/tmp/tmpfei9swbm'
Request in-toto metadata from 1 rebuilder(s) (DNF global_info)
Request in-toto metadata from https://mirror.notset.fr/qubes/rebuild/yum/r4.1/vm/sources/qubes-u2f/1.2.8-1.fc33/metadata
Successfully downloaded in-toto metadata 'rebuild.8deb0bef.link' from rebuilder 'https://mirror.notset.fr/qubes/rebuild/yum/r4.1/vm/'
Copy final product to verification directory
Load in-toto layout '/home/user/dnf-transport-in-toto/data/root.layout' (DNF global_info)
Load in-toto layout key(s) '['9fa64b92f95e706bf28e2ca6484010b5cdc576e2']' (DNF global_info)
Use gpg keyring '/home/user/dnf-transport-in-toto/data/gnupg' (DNF global_info)
Run in-toto verification
In-toto verification for 'qubes-u2f-1.2.8-1.fc33.noarch' passed! :)
Dependencies resolved.
=======================================================================================
 Package                Arch    Version            Repository                      Size
=======================================================================================
Installing:
 qubes-u2f              noarch  1.2.8-1.fc33       qubes-vm-r4.1-current-testing  264 k
Installing dependencies:
 hidapi                 x86_64  0.9.0-4.fc33       fedora                          45 k
 python3-cryptography   x86_64  3.2.1-2.fc33       updates                        546 k
 python3-hidapi         x86_64  0.9.0.post2-2.fc33 fedora                          50 k
 python3-u2flib-host    noarch  3.0.3-9.fc33       qubes-vm-r4.1-current-testing   49 k

Transaction Summary
=======================================================================================
Install  5 Packages

Total download size: 954 k
Installed size: 3.6 M

Next steps

As explained above, some parts still need finishing, as well as cleanups and proper documentation. But we are very close to the point where every Debian package we build can be independently verified. The very same tooling we’ve made can be used to verify native Debian packages, too, which should also be helpful for non-Qubes Debian users. Similar progress has already been made for Fedora, although some more work is needed on the Fedora side to allow reproducing native (not only Qubes-related) packages.

In the broader future, our ultimate goal is to make all parts of Qubes OS reproducible, including templates and the installation image. Reproducible packages are the first step toward this goal, which incidentally is also the most valuable step to our users and the broader community.

Acknowledgements

This work is possible thanks to generous support from Mozilla Open Source Support (MOSS).

I’d like also to thank GitLab for granting us a free GitLab Gold license, which enabled much higher service quotas.

News

Categories Statistics