Qubes Architecture Next Steps: The New Qrexec Policy System
This is the second article in the “What’s new in Qubes 4.1?” series. You can find the previous one (about GUI Domains) here. While the introduction of GUI domains is a big, singular feature, the changes to qrexec are more complex and varied — but also very important.
What is qrexec?
You might have been using Qubes for a while and never encountered the word “qrexec,” but it is one of the crucial components in the system. Qubes provides isolation and compartmentalization. Separate qubes are separate worlds, and what the user does in one qube should not impact another qube. That is, of course, not realistic. In the real world, we often need and want to do things like copying and pasting, sending files, routing internet traffic, and simply synchronizing the time. This is where qrexec comes in: It is an RPC (remote procedure call) mechanism that allows one qube to do something inside another qube. Of course, allowing everything all the time would be extremely dangerous. Thus, a part of qrexec called “qrexec policy” is also used to enforce who can do what and where. Furthermore, we want to be able to audit what was done, and this is provided with the logging capabilities of qrexec. The how is controlled by qrexec services, which are executed by qrexec and also must be designed in a secure and resilient way. This post focuses mainly on qrexec itself, but qrexec services will make a brief appearance.
Overview of changes
Most of this post will be very technical. If you don’t care about writing your own qrexec policies and services, feel free to just read this overview.
Before we get into nitty-gritty technical details, here’s a brief, less technical overview of Qubes 4.1 qrexec changes and what they mean for users and developers:
- A new qrexec policy format. (The old format is still supported, but the new one is very much superior, allowing for easier-to-read policies, more qube-centered customization, better auditing, and more.)
- Big performance improvements: bigger data chunk sizes for faster transfers of large amounts of data, qrexec policy daemon for faster policy evaluation and call setup, resulting in up to seven-fold faster qrexec service calls.
- Support for socket services: better performance for services that can use a socket-based implementation with significantly faster setup and connection times.
- Policy notifications that make any abnormal behavior easier to detect and any problems resulting from incorrect permissions easier to solve.
There is one other big upcoming change (which is not yet fully implemented and will arrive after Qubes 4.1): We are working on a qrexec policy API, that is, a set of qrexec services that will allow for managing qrexec policies without manually editing policy files. It’s another step toward separating the user from dom0 and protecting the vulnerable system internals by isolating them from the outside world to the greatest extent possible.
New policy format
In Qubes 4.0 and earlier, policies were stored as multiple files, one for each service. While changing permissions for a single action was easy, managing permissions for an entire qube was very cumbersome. The new Qubes 4.1 policy format completely overhauls this approach and introduces several convenience features to make policy management easier and more secure.
What if you’ve already spent time carefully crafting your policies in Qubes 4.0? Don’t worry. The old policy format will still be supported until at least Qubes 5.0.
You can find details of the development of the new policy format here.
Policy files
The biggest difference between the old and new policy formats is that, under the new format, the entire policy is a single entity. It is not divided into separate, per-service fragments. While it can be stored in multiple files located in multiple places, the files are equivalent, and each can describe policies for multiple services. In other words, we are moving from a set of tables to one big, flat table.
The policy files, stored previously in /etc/qubes-rpc/policy/
, are now
located in /etc/qubes/policy.d/
. Furthermore, each file must have a
.policy
extension, and any temporary files will no longer cause
issues. Under the old format, there were dozens files:
/etc/qubes-rpc/policy/
├── admin.backup.Cancel
├── admin.backup.Execute
├── admin.backup.Info
├── include
│ ├── admin-local-ro
│ ├── admin-local-rwx
│ ├── admin-global-ro
│ ├── admin-global-rwx
├── …
├── qubes.Gpg
├── qubes.StartApp
├── …
├── whonix.SdwdateStatus
Under the new format, we simply have:
/etc/qubes/policy.d/
├── 35-compat.policy
├── 90-admin-default.policy
├── 90-default.policy
├── include
│ ├── admin-local-ro
│ ├── admin-local-rwx
│ ├── admin-global-ro
│ ├── admin-global-rwx
(and any additional user-defined files)
Here’s an example of an old policy file (qubes.GetDate.policy
):
## Note that policy parsing stops at the first match,
## so adding anything below "$anyvm $anyvm action" line will have no effect
## Please use a single # to start your custom comments
$tag:anon-vm $anyvm deny
$anyvm $anyvm allow,target=dom0
Here’s an example of the beginning of a new policy file
(90-default.policy
):
## Do not modify this file, create a new policy file with lower number in the
## filename instead. For example `30-user.policy`.
###
### Default qrexec policy
###
## File format:
## service-name|* +argument|* source destination action [options]
## Note that policy parsing stops at the first match.
# policy.RegisterArgument should be allowed only for specific arguments.
policy.RegisterArgument * @anyvm dom0 deny
# WARNING: The qubes.ConnectTCP service is dangerous and allows any
# qube to access any other qube TCP port. It should be restricted
# only to restricted qubes. This is why the default policy is 'deny'
# Example of policy: qubes.ConnectTCP +22 mytcp-client @default allow target=mytcp-server
qubes.ConnectTCP * @anyvm @anyvm deny
# VM advertise its supported features
qubes.FeaturesRequest * @anyvm dom0 allow
# Windows VM advertise installed Qubes Windows Tools
qubes.NotifyTools * @anyvm dom0 allow
# File copy/move
qubes.Filecopy * @anyvm @anyvm ask
# Get current date/time
qubes.GetDate * @tag:anon-vm @anyvm deny
qubes.GetDate * @anyvm @anyvm allow target=dom0
Under the new format, the policy for a single qube is much easier to find, parse, and edit. It’s all contained in a single file, and by using wildcards (see below), it’s much easier to read and understand. For example, it’s trivial now to forbid all calls for a given qube or forbid all with a list of exceptions. This allows for a more qube-centric approach to policies, and while the results are theoretically the same, there’s a huge difference between a couple of rules, all in one places, and dozens of rules spread across dozens of files. It’s especially important for Admin API policies. (The Admin API provides qrexec calls for querying system state. There are a lot of them, and they are needed by system qubes such as GUI domains. Under the old policy format, it was quite arduous to edit them when each call had its own policy file.)
Now that the policy is a single entity, it is parsed as a whole. If
there are any syntax errors, the parser will refuse to load anything (in
order to prevent any unintended permission grants). The system is
designed to “failed closed”: An empty policy results in all qrexec calls
being denied. While this protects users from their own mistakes, users
should still be careful when editing policies so that the system remains
functional. This is similar to the approach that tools such as sudo
take, and we decided that it makes for a safer system on the whole. In
the future, Qubes will also include a Policy API (see below) that will
include calls for modifying the policy in a safe way that prevents
syntax errors.
New column syntax
The full specification can be found here.
Old policy files had three columns: source qube, target qube, and action (allow/deny/ask with optional arguments). Because the new policy format no longer includes the service name in the file name, we had to change this a bit to a five-column format: service name, service argument, source qube, target qube, and action.
qrexec.Service +ARGUMENT SRCQUBE DSTQUBE {allow|deny|ask} [PARAM=VALUE [PARAM=VALUE ...]]
Under the old format, you could use wildcards only in the source qube
and target qube fields. Under the new format, you can also use wildcards
in the service name and argument fields. For example, to forbid all
calls from the work
qube to the personal
qube, you could use this
line:
* * work personal deny
You could also deny all calls from work
except for calls to work-pub
:
* * work work-pub allow
* * work * deny
This, along with the new file structure, results in much clearer, qube-centered policy rules.
A minor change is the replacement of commas with spaces in the parameter list. For example, a ConnectTCP policy could look like this:
qubes.ConnectTCP * mytcp-client @default allow target=mytcp-server
Security in symbols
The old policy format used the $
character in multiple keywords, such
as $default
and $dispvm
. Due to a potential issue described in
Qubes Security Bulletin #38,
we have been getting rid of $
. (In brief, interaction between Bash
shell parameter expansion and $
characters in policy files could
theoretically lead to security problems.) For backward compatibility,
$
is still supported in policy files, but it is being internally
turned into @
before processing, and in the new policy format, we only
use @
. In other words, $default
in Qubes 4.0 will now be @default
,
$dispvm
turns into @dispvm
, and so on. For example, to update all
TemplateVMs through sys-whonix
, you would use:
qubes.UpdatesProxy * @type:TemplateVM @default allow target=sys-whonix
In all client tools, $
will still be automatically translated into
@
, so you don’t have to change any existing scripts. However, we
highly recommend using only @
. Legacy support for $
will continue
throughout the Qubes 4.x series and end in Qubes 5.0.
Includes
The old format supported a simple $include
directive. In Qubes 4.1,
the inclusion capabilities are extended, and we now have four different
kinds of includes. Due to the aforementioned QSB #38 change, the
keywords no longer start with $
, but rather with an exclamation point
(!
). The supported include directives are:
!include path_to_file
— The simplest command. Includes the entire contents of the file and parses it according to new policy syntax rules. The file can contain policy rules for one or more services, for one or more qubes.!include-dir path_to_directory
— Includes all.policy
files from the directory, parsing them according to new policy syntax rules.!include-service service argument path_to_file
— Allows for the inclusion of a policy in a format similar to the old one (without a service or an argument) by filling in the missing service and argument fields from the new format with parameters provided toinclude-service
. Either of those (or both) can be a wildcard (*
), as they can be in all new policy files. (This might seem like a directive useful only for backward compatibility, but it also has uses in the Admin API policies used for implementing GUI domains, among other things.)!compat-4.0
— An explicit backward compatibility directive that takes the old policy files from their old locations and converts them internally (usinginclude-service
) to be a part of the new policy.
Backward compatibility
One of the new policy files is 35-compat.policy
. It loads the old
policy files and parses them in a way compatible with the new policy
format.
Before the old policy format is completely phased out in Qubes 5.0, we
will provide a conversion tool. (This tool may not be available in Qubes
4.1 on release.) Since there will be legacy support for the old policy
format with 35-compat.policy
and the !compat-4.0
include directive,
it is not necessary for users to update their current custom policies
for Qubes 4.1.
Performance improvements
In Qubes 4.1, the max qrexec data chunk size has been increased from 4kB
to 64kB. This makes large data transfers significantly faster, for
example, copying big files or streaming video through qvm-usb
from
sys-usb
to a qube with an ongoing video call. (Details can be found
here.)
We have also significantly increased the maximum service and argument
length. In Qubes 4.0, it was just 63 characters for the combined
service+arguments. Now, it is 65,000 characters (a bit less than 64kB,
due to an internal header). The main reason for this change — apart
from sheer convenience — is the qubes.VMShell
service. It’s an
extremely dangerous service that gives complete shell access.
Unsurprisingly, it’s not allowed by default. In Qubes 4.0,
qubes.VMShell
just provides a shell stdin/stdout, so handling it with
a single command was extremely cumbersome and often required a
prodigious number of quotation marks. Even worse, the command was not
stored in any log file, and qrexec policy did not control what commands
could be executed, opening up a big, unauditable, dangerous space for
exploitation. By increasing the service+argument length (and governing
the argument under qrexec policy, as described above), we are able to
introduce a new, more accountable, safer service. In Qubes 4.1, this new
service is called qubes.VMExec
, and the complete command to be
executed, along with its arguments, is logged. This allows for precise
control, as the policy can allow only certain commands, not everything
that can be executed in a shell. Moreover, all the arguments are
explicitly defined and unambiguous, freeing the user from quotation mark
purgatory. (The syntax is described
here.) Other
services will also benefit, such as the U2F implementation that, due to
previous limitations, required some awkward wrangling of the public key
hash. (Details can be found
here.)
There are also some minor improvements and streamlining coming up: for
example, a new call that allows access to all qube properties at once,
which allows qvm-ls
(and similar tools) to work significantly faster.
Instead of making around five Admin API calls per qube, qvm-ls
will
now only make one, leading to a five-fold speed increase.
Policy daemon
In Qubes 4.0, performing a qrexec call (with all its assorted policy
checking) was a fairly slow process. Even in the best case (simple
allow, no asking user for input), it consisted of several processes:
loading the whole policy again, loading Python modules required to parse
it, executing the call itself…. The setup alone took about 300 ms (for
a typical system). In the case of a lone operation, such as copying a
file, it’s hardly a problem, but for frequent operations it could be a
disaster. Tunneling a TCP connection via qrexec (the qubes.ConnectTCP
service) would mean a 300 ms lag for establishing the connection — a
complete nightmare for the server! Qubes Clipboard, which also uses
qrexec and is managed by a policy (which allows for blocking copy or
paste for certain qubes), needed 300 ms for each operation, which can be
a bit of a problem for particularly fast typists. The biggest problem
was, however, the Admin API. The whole idea of an Admin API (read more
here)
requires a lot of qrexec calls. If we want to implement a GUI domain
that manages certain parts of the system instead of having the user
directly access dom0, we need the Admin API to be much, much faster. For
example, without the changes outlined below, starting the Qube Manager
on a default, out-of-the-box Qubes installation with a GUI domain took
37 seconds. This is just unacceptable.
To speed up qrexec calls, Qubes 4.1 introduces a policy daemon. It is a process that loads the policy into memory and keeps it there (only reloading when the policy files change), answering qrexec queries about whether a given operation should be accepted or denied without spawning new processes or loading more files and Python modules. The results are quite satisfying. Starting the Qube Manager in a default Qubes installation with a GUI domain takes less than 5 seconds — a more than seven-fold improvement.
Socket services
Another problematic bottleneck for qrexec was that each service was a
new process. In many cases, this process just connects to an existing
service. A particularly egregious example is the Admin API. An Admin API
call is just a connection to dom0’s qubesd daemon, but in Qubes 4.0 each
of its calls was a separate process, going through a bunch of shell
scripts to return a simple chunk of data. For a program like qvm-ls
,
which has to access a lot of data from the Admin API, it was a
performance nightmare.
A qrexec service is a file residing in /etc/qubes-rpc/
. In Qubes 4.0,
it has to be an executable or a symlink to one. In Qubes 4.1, however,
it can also be a symlink to a unix socket. Via this socket, you can
simply connect to an existing process that will handle your call. With
each socket connection, this running service receives a new connection
with a metadata header (containing information on who wants to do what
and where, a service name just in case, and a terminating 0 byte)
followed by the same client data as before. Thus, even though it’s a
single process that handles multiple calls, it knows where each call
came from and what it asked for.
Performance gains on both speed and memory benchmarks are significant. For example, we have an icon rendering service, which is responsible for making sure each of your program icons bears the color of its qube. In Qubes 4.0, each invocation of this service was a separate per-qube processes in dom0, each one loading Gtk and other libraries for about 32MB of RAM. In Qubes 4.1, this has been converted into a single-process, socket-based service. Now, ten running qubes need only this one socket-based service process, for a single instance of 32MB RAM usage, instead of ten times that in Qubes 4.0. The Admin API also uses a socket service, which — combined with the policy daemon described above — dropped call execution time from 300 ms to about 50 ms.
More information about socket services can be found here.
Notifications
Creating a qrexec policy daemon opened another convenient opportunity:
adding notifications for some policy actions. By default, notifications
are shown only when a deny occurs, unless it was due to an ask
action. (If the user was already asked for consent, it is not necessary
to show the notification.) However, the new policy format allows for an
optional notify=no
parameter with deny rules. Allow rules support
a notify=yes
parameter that will show a notification on accepted
qrexec calls.
For example, this rule will cause notifications to be shown whenever a
qubes.VMExec
service call is made from the work
qube to the
personal
qube:
qubes.VMExec * work personal allow notify=yes
And the above screenshot could be accomplished with:
qubes.Gpg * work work-gpg allow notify=yes
qubes.Filecopy * untrusted * deny
Notifications alert the user that something unexpected might have happened. They can signal a misconfigured policy (when calls are denied while the user would prefer them to be allowed) or a misbehaving or malicious qube (when unexpected calls are being made and denied due to a correctly configured policy).
The introduction of GUI domains makes the subject of qrexec notifications a bit more complex. Since there can be more than one GUI domain, which one should receive notifications? For now, we’ve decided that the GUI domain serving the source qube (the one making the call) will show notifications. There are some conceivable setups in which a different GUI domain would show notifications. While such setups will not be supported in Qubes 4.1, they may appear in future versions.
More information about notifications can be found here.
Policy API (in development)
As we mentioned above, one of our big development goals is separating the user from the most vulnerable part of the system: dom0. Doing so improves compartmentalization and security while lowering both the attack surface and the chance of accidentally committing a compromising mistake.
In order to completely isolate dom0, however, we would have to provide qrexec interfaces for everything that the user may need to access and configure — including the qrexec policy itself. At the moment, we are working on a broad Policy API that allows for editing the whole policy. The Policy API will also enforce the correctness of any policy changes, preventing users from making syntactically invalid policy changes.
We also have some ideas for implementing limited access, for example, having a deprivileged user who is limited to modifying certain services or controlling access to certain qubes. The design for this is not yet final. In its current state, we are thinking about supporting two kinds of roles: an API administrator with total access, and an operator with access to only a well-defined subset of the policy. The implementation of any such ideas will definitely not be a part of Qubes 4.1.
Current work on the Policy API design draft and a proof of concept can be found here.