Compiler flags to be used for All Scenarios OS

We’re investigating what compiler flags to use for All Scenarios OS - for performance, debuggability and security.
Since performance and increased security are sometimes mutually exclusive, we will need a compiler flag selection setting. That way, we can enable additional checks in debug builds and maximum security in builds where security is key and performance doesn’t really matter (e.g. the doorlock blueprint), without sacrificing performance where it is needed and maximum security is a secondary concern.

  • -O2 is probably the most reasonable optimization level - it optimizes reasonably well without making too many tradeoffs on performance vs. code size (-O3 tends to be overly aggressive with inlining). Where performance doesn’t matter (e.g. UIs that spend all their time waiting for user input), we can use -Os or even -Oz (clang only) on a per-recipe basis to reduce code size a little more and fit into smaller devices.
  • -flto Link-time optimizations – this allows for faster code, smaller binaries, and potentially increased security by enabling control-flow integrity checks. The only drawback is CPU and in particular memory requirements at build time. Some code may need patches to work with -flto (in the worst case, LTO can be disabled on a per-package basis), in particular when using gcc (gcc LTO tends to omit more symbols, even ones that are actually needed).
    clang only: There is a light version of -flto, activated by -flto=thin, that decreases build time significantly, but optimizes slightly worse. If you’re using a lower end build machine, -flto=thin may be for you.
  • -march=cputype -mtune=cputype Per-target options to make sure we can make use of all available CPU features. Yocto already does a good job of adding those, but we should double-check that all our added targets are configured the right way (especially for target hardware not supported upstream).
  • -fstack-protector-strong protects from stack smashing attacks, but it comes with a relatively high overhead added to every function call and stack allocation. It should definitely be enabled for packages that deal with input from the network, but should be excluded from relatively “safe” packages that need highest possible performance, such as mesa when including a graphics stack.
    clang only: -fsanitize=safe-stack can be added as a different protection from stack smashing attacks. It puts return addresses etc. on a separate, safe stack, so overwriting the normal, “unsafe” stack doesn’t affect them.
  • --param=ssp-buffer-size=4 enables stack smashing protection (see -fstack-protector-strong) even for functions with smaller buffers, increasing the effect of -fstack-protector-strong (on both security and performance).
  • -fstack-clash-protection Adds additional code to detect stack overflow attacks (for the inner workings, see Software Security: Stack Clash Protection in gcc - the title of that page says gcc, but it applies equally to clang). It adds some overhead, but is still in the tolerable range.
  • -fPIE -pie Makes executables position independent - i.e. address space can be randomized, making “off-the-shelf” exploits of security bugs harder. The overhead is mostly in startup time, so this should be enabled even when not in paranoid mode. On some architectures (32-bit x86), the overhead is somewhat higher since one less register is available (and 32-bit x86 is register starved) in PIC/PIE code.
  • -Wformat -Wformat-security -Wformat-nonliteral -Werror=format-security -Werror=format-nonliteral Adds stricter checks of format strings passed to printf and printf-like functions. This doesn’t have any runtime overhead (it just increases compile time a little), so it should be enabled unconditionally. Some recipes may need to override it because -Wformat-security can generate false positives.
  • -Werror=array-bounds Errors out instead of warning when the compiler sees a potential buffer overflow in an array. There is no runtime overhead. The only drawback is the possibility of false positives. This should be enabled.
  • -Wstrict-aliasing=2 -Werror=strict-aliasing Tries to detect aliasing violations at compile time instead of potentially causing problems at runtime. This is nice to have, but tends to trigger a number of errors in anything using the BSD socket API. The usual per-package workaround is -fno-strict-aliasing.
  • -fcf-protection Enables Control Flow Enforcement on some processors. Where supported, this blocks some ROP and JOP attacks with relatively low overhead. This flag works only on relatively current (Tiger Lake+) Intel processors; therefore it is not relevant to most targets. It should not be enabled for older Intel or AMD processors since it increases code size at no gain, but it should be added to specific targets along with -march=…
  • -Wl,-z,relro Marks the ELF relocation table read-only. This is always a good idea and should be enabled.
  • -Wl,-z,now Resolves all calls into shared libraries at startup time instead of runtime. This makes application startup a little slower, but improves security. This should be enabled by default.
  • -Wl,-z,noexecstack Marks the stack non-executable. This may break a few applications that rely on executing code from the stack (a rare requirement), but prevents a number of buffer overrun attacks. It doesn’t have any overhead, so it should be enabled by default (and just overridden in a recipe if we need to ship an application that relies on executable stack). Most code that fails to build with -Wl,-z,noexecstack is actually assembly files that can be “fixed” by just adding a .note.GNU-stack section.
  • -Wl,--hash-style=gnu Switches the linker’s hash table from the old sysv format to the newer gnu format, resulting in faster startup and slightly smaller binaries. No drawbacks, except losing support for ancient dynamic linkers.
  • -Wl,--as-needed Only link to shared libraries if they’re actually being used. The only drawback is having to take better care of the order in which object files and shared libraries are specified; since many distributions already use -Wl,--as-needed, most code that broke with -Wl,--as-needed has already been fixed.
  • -D_FORTIFY_SOURCE=2 This define adds some extra checks for buffer overruns that have low overhead and therefore should be used where possible; however, its implementation is specific to glibc and has no effect on smaller libc implementations (with the possible exception of some compiler builtins). Enabling it doesn’t hurt, but it has no effect on builds in our default configurations. It may still be worth enabling it and running a glibc based build once in a while to see if anything triggers the assertions added by this flag.
  • -D_GLIBCXX_ASSERTIONS Similar to -D_FORTIFY_SOURCE=2, this define adds some extra checks for buffer overruns in libstdc++. Overhead for -D_GLIBCXX_ASSERTIONS is significantly higher than overhead for -D_FORTIFY_SOURCE=2, so this flag is probably closer to the paranoid category. It also doesn’t have any effect on libc++.
  • clang only -ftrivial-auto-var-init=pattern Initializes uninitialized variables with a pattern, preventing data leakage from uninitialized variables. This has medium overhead.
  • -fno-delete-null-pointer-checks Makes sure the compiler never optimizes a null pointer check away. Deleting null pointer checks assumes a pointer is non-NULL after it has been dereferenced (because accessing memory at NULL will always fail). Assuming that a pointer that has been dereferenced can’t be NULL in userland seems to be a safe assumption; therefore there is probably no need to enable this flag in anything but kernels (the Linux kernel enables it by default).
  • clang only -mspeculative-load-hardening Prevents spectre v1 style attacks by creating a data dependency between data access and predicate state – this has very high overhead and should probably not be enabled outside of the kernel except where maximum security is needed and performance doesn’t matter.
  • clang only -mretpoline Prevents spectre v2 style attacks by using return prediction instead of branch prediction – this has very high overhead and should probably not be enabled outside of the kernel except where maximum security is needed and performance doesn’t matter.
  • gcc only -mindirect-branch, -mfunction-return These flags are similar to clang’s -mspeculative-load-hardening and -mretpoline - with the same advantages and drawbacks. They mitigate spectre style vulnerabilities at a significant overhead.
  • -fsanitize=address -fsanitize-address-use-after-scope Detects a number of address errors that may well be security problems, such as out-of-bounds accesses to heap, stack and globals, use-after-free, use-after-return, use-after-scope, double-free, invalid free and memory leaks (experimental). However, it comes at a high performance penalty (2-3x) and more than doubles memory use. It should therefore probably not be used in production builds. It is also mutually exclusive with -fsanitize=memory.
  • -fsanitize=memory Detects a number of memory handling errors that may well be security problems. However, it comes at a high performance penalty (3x) and more than doubles memory use. It should therefore probably not be used in production builds. It is also mutually exclusive with -fsanitize=address. Provides even better results (and even less performance) when used with -O1 -fno-omit-frame-pointer -fno-optimize-sibling-calls.
  • -fsanitize=thread Detects data races. Performance and memory overhead is typically 10x-15x – so this should not be used in production builds.
  • -fno-semantic-interposition, -Wl,-Bsymbolic and -Wl,-Bsymbolic-functions The linker flags -Bsymbolic and -Bsymbolic-functions bind references to symbols (with -Bsymbolic-functions, only function symbols) to the definition inside the binary being built. When used while linking a shared library, this makes it impossible to override a symbol in another shared library or a binary using it. This is sometimes beneficial (for performance: better inlining etc; for security: makes it impossible to LD_PRELOAD an exploitable library, etc) - but sometimes harmful (breaks LD_PRELOAD, breaks some libraries that want some of their symbols overloaded such as libxfont1). Problems introduced by this are usually subtle and hard to detect; if this is added to default flags, it should be done at the beginning of a cycle to increase the likelihood of finding drawbacks before any release. -fno-semantic-interposition is similar on the compiler side - it tells the compiler that if interposition happens for functions, the overwriting function will have precisely the same semantics (and side effects) so calls don’t have to be extra careful preserving information that might change, but does not change in the known implementation. Similarly if interposition happens for variables, the constructor of the variable is assumed to be the same.
  • -fvisibility=hidden and -fvisibility-inlines-hidden Defaults to making symbols hidden (can’t be referenced outside of the current binary – e.g. function not exported in a shared library). This makes the symbol table smaller (improving startup time and code size) and allows for extra optimizations (removing a symbol entirely if all uses are inlined). The drawback is that some shared libraries need patching to declare what symbols they want to export. The good news is that this is a compile time error, so failures are easy to detect.
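
Several of the bullets above mention per-recipe or per-package exceptions (dropping -flto where it breaks, -Os for UIs, no stack protector for mesa, -fno-strict-aliasing for BSD socket users). A minimal Python sketch of how such overrides could be layered on top of distro-wide defaults follows. This is not how Yocto/BitBake implements it; `RecipeOverride`, `flags_for_recipe`, the shortened `DEFAULTS` list and every recipe name starting with `example-` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecipeOverride:
    """Flags to drop from, and add to, the distro-wide defaults for one recipe."""
    remove: Sequence[str] = ()
    add: Sequence[str] = ()


def flags_for_recipe(defaults: Sequence[str],
                     override: Optional[RecipeOverride] = None) -> list:
    """Return the default flags with one recipe's removals and additions applied."""
    if override is None:
        return list(defaults)
    flags = [f for f in defaults if f not in override.remove]
    for f in override.add:
        if f not in flags:  # avoid passing the same flag twice
            flags.append(f)
    return flags


# Shortened excerpt of the defaults discussed above (assumption: illustration only).
DEFAULTS = [
    "-O2", "-flto",
    "-Wstrict-aliasing=2", "-Werror=strict-aliasing",
    "-fstack-protector-strong", "-fPIE", "-pie",
]

# Only "mesa" is named in the text; the other recipe names are made up.
OVERRIDES = {
    "mesa": RecipeOverride(remove=("-fstack-protector-strong",)),
    "example-ui": RecipeOverride(remove=("-O2",), add=("-Os",)),
    "example-socket-daemon": RecipeOverride(add=("-fno-strict-aliasing",)),
    "example-lto-broken": RecipeOverride(remove=("-flto",)),
}

if __name__ == "__main__":
    for recipe, override in OVERRIDES.items():
        print(recipe, " ".join(flags_for_recipe(DEFAULTS, override)))
```

Keeping the exceptions in one table like this would also make it easy to review which packages opt out of which protection.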

Summary
Recommended default flags for a maximum performance build (either clang or gcc):
-O2 -flto -Wformat -Wformat-security -Wformat-nonliteral -Werror=format-security -Werror=format-nonliteral -Werror=array-bounds -Wstrict-aliasing=2 -Werror=strict-aliasing -Wl,-z,noexecstack,--hash-style=gnu,--as-needed -fvisibility=hidden -fvisibility-inlines-hidden -fno-semantic-interposition -Wl,-Bsymbolic,-Bsymbolic-functions
If code size isn’t a concern, replacing -O2 with -O3 is an option. If some standard compliance can be sacrificed, even -Ofast may be worth considering. Note that -ffast-math (implied by -Ofast) is known to break some code (e.g. Qt’s QVariant as of Qt 5.15.3 and 6.1.0) in subtle ways. This is not a bug - the purpose of -Ofast is relaxing standards compliance where ignoring standards can help optimize. When using -Ofast, do even more QA than usual.
Recommended default flags for a normal build (either clang or gcc):
-O2 -flto -Wformat -Wformat-security -Wformat-nonliteral -Werror=format-security -Werror=format-nonliteral -Werror=array-bounds -Wstrict-aliasing=2 -Werror=strict-aliasing -Wl,-z,relro,-z,now,-z,noexecstack,--hash-style=gnu,--as-needed -fvisibility=hidden -fvisibility-inlines-hidden -fstack-protector-strong --param=ssp-buffer-size=4 -fstack-clash-protection -fPIE -pie
Recommended default flags for a paranoid build (clang only):
-O2 -flto -Wformat -Wformat-security -Wformat-nonliteral -Werror=format-security -Werror=format-nonliteral -Werror=array-bounds -Wstrict-aliasing=2 -Werror=strict-aliasing -Wl,-z,relro,-z,now,-z,noexecstack,--hash-style=gnu,--as-needed -fvisibility=hidden -fvisibility-inlines-hidden -fstack-protector-strong --param=ssp-buffer-size=4 -fstack-clash-protection -fPIE -pie -ftrivial-auto-var-init=pattern -mspeculative-load-hardening -mretpoline
Recommended default flags for a paranoid build (gcc only):
-O2 -flto -Wformat -Wformat-security -Wformat-nonliteral -Werror=format-security -Werror=format-nonliteral -Werror=array-bounds -Wstrict-aliasing=2 -Werror=strict-aliasing -Wl,-z,relro,-z,now,-z,noexecstack,--hash-style=gnu,--as-needed -fvisibility=hidden -fvisibility-inlines-hidden -fstack-protector-strong --param=ssp-buffer-size=4 -fstack-clash-protection -fPIE -pie -ftrivial-auto-var-init=pattern -mindirect-branch -mfunction-return
Of course, in any build type, target CPU specific options like -march=…, -mtune=…, -mfpmath=sse, … should be added for increased performance.
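
The four flag sets in this summary share most of their content, so the differences are easy to overlook. Below is a rough Python sketch that builds each set from a common base and reports what changes between two modes. The flag values are copied from the summary; the Python names (`COMMON`, `MODES`, `diff_modes`, the mode keys) are hypothetical. As an assumption for readability, the combined -Wl,a,b,c options are split into one -Wl,… option per linker flag, so the ordering differs slightly from the lines above.

```python
# Sketch only: flag values come from the summary; all Python names are
# hypothetical and not part of any build system API.

# Flags that appear in every recommended build mode.
COMMON = [
    "-O2", "-flto",
    "-Wformat", "-Wformat-security", "-Wformat-nonliteral",
    "-Werror=format-security", "-Werror=format-nonliteral",
    "-Werror=array-bounds",
    "-Wstrict-aliasing=2", "-Werror=strict-aliasing",
    "-Wl,-z,noexecstack", "-Wl,--hash-style=gnu", "-Wl,--as-needed",
    "-fvisibility=hidden", "-fvisibility-inlines-hidden",
]

# Only in the maximum performance build.
PERFORMANCE_ONLY = [
    "-fno-semantic-interposition", "-Wl,-Bsymbolic", "-Wl,-Bsymbolic-functions",
]

# Added by the normal build and kept by both paranoid builds.
HARDENING = [
    "-Wl,-z,relro", "-Wl,-z,now",
    "-fstack-protector-strong", "--param=ssp-buffer-size=4",
    "-fstack-clash-protection", "-fPIE", "-pie",
]

PARANOID_CLANG = [
    "-ftrivial-auto-var-init=pattern", "-mspeculative-load-hardening", "-mretpoline",
]
PARANOID_GCC = [
    "-ftrivial-auto-var-init=pattern", "-mindirect-branch", "-mfunction-return",
]

MODES = {
    "performance": COMMON + PERFORMANCE_ONLY,
    "normal": COMMON + HARDENING,
    "paranoid-clang": COMMON + HARDENING + PARANOID_CLANG,
    "paranoid-gcc": COMMON + HARDENING + PARANOID_GCC,
}


def diff_modes(old: str, new: str):
    """Return (added, removed) flags when switching from mode `old` to mode `new`."""
    old_flags, new_flags = MODES[old], MODES[new]
    added = [f for f in new_flags if f not in old_flags]
    removed = [f for f in old_flags if f not in new_flags]
    return added, removed


if __name__ == "__main__":
    for old, new in [("performance", "normal"),
                     ("normal", "paranoid-clang"),
                     ("normal", "paranoid-gcc")]:
        added, removed = diff_modes(old, new)
        print(f"{old} -> {new}")
        print("  added:  ", " ".join(added) or "(none)")
        print("  removed:", " ".join(removed) or "(none)")
```

Target CPU specific options (-march=…, -mtune=…) are left out on purpose, since they vary per target rather than per mode.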

This is a complete document. The last section would be easier to read if it better showed the differences in recommended options between the modes (e.g. in italics, bold…).

This can have a rather dramatic positive impact on performance. I’m totally for enabling that. This was recently covered on Hacker News: “Issue 38980: Compile libpython with -fno-semantic-interposition” (Python tracker).

Yes, I’m in favor as well, but I’m not sure if it should be done now or after Jasmine, given we’re already about to make an Alpha release and some of the possible issues caused by -Bsymbolic are very easy to miss. This can be something extremely subtle, like an overloaded function call calling the original function, essentially resulting in an uninitialized variable or worse; that’s what’s known to break libxfont1.
We can give it a try, but we should keep in mind we’re adding a potentially dangerous flag there if we see any kind of weird breakage.
But someone has to do it first and fix/document breakages… Might as well be us before that other company claims they did it first.

We should enable these flags in an “experimental” build to get some CI going.

I think we should enable that early. CI should be going up piece by piece and we will be able to catch those things when there is more testing.