OpenBSD Journal

Next steps toward mimmutable, from deraadt@

Contributed by Peter N. M. Hansteen from the unmute the immutable dept.

In a recent message to the tech mailing list, Theo de Raadt (deraadt@) summarized the state of the new memory protections work. The thread also includes a followup from Otto Moerbeek (otto@) on consequent changes to the memory allocation mechanisms.

Theo writes,

From: "Theo de Raadt" <deraadt () openbsd ! org>
Date: Fri, 18 Nov 2022 03:10:05 +0000
To: openbsd-tech
Subject: More on mimmutable


I am getting close to having the big final step of mimmutable in the tree.
Here's a refresher on how it works, what's already done, and the next
bit to land.

     The mimmutable() system call changes currently mapped pages in the region
     to be marked immutable, which means their protection or mapping may not
     be changed in the future.  mmap(2), mprotect(2), and munmap(2) to pages
     marked immutable will return with error EPERM.

That's the system call.  In reality, almost no programs call it.
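
As an illustration that is not part of the original message, the manual page semantics quoted above can be modeled in a few lines of Python. This is only a toy sketch: the `ToyAddressSpace` class, its per-page dictionary, and the 4096-byte page size are assumptions made for the example, and say nothing about how the kernel actually stores this state.

```python
import errno

PAGE = 4096  # assumed page size for this toy model


class ToyAddressSpace:
    """Toy model of per-page protections (hypothetical, not kernel code)."""

    def __init__(self):
        self.prot = {}          # page number -> protection string, e.g. "rw-"
        self.immutable = set()  # page numbers that have been marked immutable

    def _pages(self, addr, length):
        # Page numbers touched by the byte range [addr, addr + length).
        return range(addr // PAGE, (addr + length + PAGE - 1) // PAGE)

    def _check_mutable(self, addr, length):
        # Any overlap with an immutable page fails with EPERM.
        if any(p in self.immutable for p in self._pages(addr, length)):
            raise PermissionError(errno.EPERM, "region is immutable")

    def mmap(self, addr, length, prot):
        self._check_mutable(addr, length)
        for p in self._pages(addr, length):
            self.prot[p] = prot

    def mprotect(self, addr, length, prot):
        self._check_mutable(addr, length)
        for p in self._pages(addr, length):
            if p in self.prot:
                self.prot[p] = prot

    def munmap(self, addr, length):
        self._check_mutable(addr, length)
        for p in self._pages(addr, length):
            self.prot.pop(p, None)

    def mimmutable(self, addr, length):
        # Only currently mapped pages in the region become immutable.
        for p in self._pages(addr, length):
            if p in self.prot:
                self.immutable.add(p)


space = ToyAddressSpace()
space.mmap(0x1000, 2 * PAGE, "rw-")
space.mprotect(0x1000, PAGE, "r--")   # allowed: the page is still mutable
space.mimmutable(0x1000, PAGE)
try:
    space.mprotect(0x1000, PAGE, "rw-")
except PermissionError as err:
    print("mprotect refused, errno is EPERM:", err.errno == errno.EPERM)
```

In the model, as in the description, there is deliberately no operation that removes a page from the immutable set.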

Let me start by explaining a process's address space, starting with the simplest
programs and then heading into more complicated cases.

A process runtime has a
  - stack (rwS permissions, S being the annotation used at system call entry
    to ensure the "sp" register points to stack, and thus prevent a class
    of ROP pivot methods),
  - a stack-guard (for growing the stack in case rlimits are changed, this
    is permission NONE),
  - a signal trampoline page, randomly placed, which will perform sigreturn(2),
    (permission rwe, e being the annotation used at system call entry to ensure
    the "pc" register points at a region allowed to system calls, thus preventing
    attackers from uploading direct system call instruction code)

Those objects are automatically marked immutable by the kernel.

On to static executables.  The kernel loads a static ELF binary into
memory as a text segment (rx permission), followed by a data segment (rw
permission), a bss (zero'd data, rw permission), and a rodata segment
(ro permission).  The order of these varies per architecture.  There is
an overlay of this called the "GNU_RELRO", which is pretty uhm special.
I've created a new overlay called "OPENBSD_MUTABLE", which is
page-aligned and must not be made immutable.  As it happens, these two
special regions are the only part of the image load that cannot be marked
immutable, so the kernel proceeds to mark everything else immutable.

When that static program starts running, it will run the crt0 ("c run time")
startup code, which can make some small changes in the GNU_RELRO region,
and then mark it immutable.

So this static executable is completely immutable, except for the
OPENBSD_MUTABLE region.  This annotation is used in one place now, deep
inside libc's malloc(3) code, where a piece of code flips a data structure
between readonly and read-write as a security measure.  That does not become
immutable.  It happens to be a page.

There is another ugly old wart called "text relocations", and I won't
get into it except to say the kernel recognizes such binaries, and skips
some immutabilities, but of course crt0 finishes the job. 

I want to speak a bit about the mechanism.  Inside the kernel,
immutability is applied to all the regions.  And then the exceptions are
marked mutable.  The kernel is allowed to reverse setting immutability,
but userland cannot.  This will come up later.

Now let's talk about dynamic executables.  The same applies as above for
the main program, but then the kernel also loads another object into memory:
/usr/libexec/ -- the shared library linker.  And instead of starting
to run the main program, execution starts in the shared library linker.

The shared library linker ELF image contains similar objects.  There is a
GNU_RELRO, which the kernel cannot mark immutable.  There is no OPENBSD_MUTABLE
because we don't request creation of one.  There is a special "boot.text"
section that the shared library linker unmaps upon startup, as a security
measure, and the kernel ignores that region.  All the other regions of
the shared library linker are marked immutable by the kernel automatically.

Now the shared library linker starts to execute, and its first job is to
fix its own relro section, handle text relocations in the dynamic binary, repair
some permissions, and then mark itself and the main program immutable.
Completely immutable, except for the OPENBSD_MUTABLE page in malloc(3).

I was really surprised we got to this point without blowing up the ports
tree in a major way.  Some of these warts were found along the way and
changed the direction a little.  And I don't want to talk about sigaltstack
right now.

The shared library linker is now responsible for loading the shared
libraries required by the
program.  Shared libraries have the same pieces as regular programs, so
they are loaded into memory, but here we hit a problem.  In the kernel we
could simplistically mark all the regions as immutable, and then reverse
it for the special cases.  Userland code cannot do that.  So we have to
keep track of the sections we want to be immutable, as well as the regions
that are immutable, and then subtract the differences, and apply the
immutables very late in the shared library loading process.  I call this
code the clipping engine, and I had a lot of bugs in it.
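
The "subtract the differences" step can be pictured as plain interval subtraction. The following Python sketch is an assumption-laden illustration, not the clipping engine from the shared library linker: the function name `clip`, the half-open `(start, end)` tuples, and the example addresses are all hypothetical.

```python
def clip(immutables, mutables):
    """Return the parts of `immutables` not covered by any range in `mutables`.

    All ranges are half-open (start, end) address tuples.
    """
    result = []
    for start, end in sorted(immutables):
        pieces = [(start, end)]
        for m_start, m_end in mutables:
            next_pieces = []
            for p_start, p_end in pieces:
                if m_end <= p_start or p_end <= m_start:
                    next_pieces.append((p_start, p_end))    # no overlap, keep
                    continue
                if p_start < m_start:
                    next_pieces.append((p_start, m_start))  # part before the hole
                if m_end < p_end:
                    next_pieces.append((m_end, p_end))      # part after the hole
            pieces = next_pieces
        result.extend(pieces)
    return result


# "1 immutable minus 2 mutables", with made-up addresses:
library = [(0x1000, 0x9000)]
exceptions = [(0x3000, 0x4000), (0x6000, 0x7000)]
for start, end in clip(library, exceptions):
    print(f"would mark immutable: {start:#x}-{end:#x}")
```

Because userland cannot undo immutability, a loader following this approach would only make the immutable calls on the clipped result, once all the exceptions are known.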

There is another way shared libraries are loaded:  via the dlopen()
call later at runtime.  With the flag RTLD_NODELETE, libraries can
be marked immutable.  Otherwise we must assume they will later be
dlclose()'d and unmapped, and immutable isn't allowed.

Most of our architectures load objects (programs, libraries) in
linear fashion, but OpenBSD/i386 has an old line-in-the-sand pre-NX layout
that separates the code and data by 512MB, and this less simple
memory layout results in the clipper actually needing to do more than
"1 immutable minus 2 mutables", it is actually closer to "6 immutables minus 3".

But all of it is now working on all the architectures.

The results can be inspected using the procmap(8) command.  To use this,
you must temporarily enable sysctl kern.allowkmem=1, in /etc/sysctl.conf,
and reboot, because this setting cannot be done after single-user boot.

Here is a cheat-sheet from the manual page:

     In this format the column labeled "rwxSeIpc" comprises:

           rwx     permissions for the mapping
           S       mapping is marked stack
           e       mapping is allowed system call entry points
           I       mapping is immutable (rwx protection may not be changed)
           p       shared/private flag
           c       mapping needs to be copied on write (`+') or has already
                   been copied (`-')

Using i386 (because the addresses are smaller), this is a cat binary.
I will manually delete entries which aren't relevant for various reasons
(mostly, malloc or mmap allocations)

# procmap -p $pid
Start    End         Size  Offset   rwxSeIpc  RWX  I/W/A Dev     Inode - File
0cda9000-0cda9fff       4k 00000000 r-x-eIp+ (rwx) 1/0/1 00:00       0 -   [ uvm_aobj ]
161d8000-161dcfff      20k 00000000 r----Ip+ (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
161dd000-161f2fff      88k 00004000 r-x-eIp+ (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
361d8000-361d8fff       4k 00000000 r----Ip- (rwx) 1/0/0 00:00       0 -   [ anon ]
361d9000-361dafff       8k 0001a000 rw---Ip- (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
361db000-361dbfff       4k 0001c000 r-----p- (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
cd6c8000-cefc7fff   25600k 00000000 -----Ip+ (rwx) 1/0/0 00:00       0 -   [ ]
cefc8000-cf7c7fff    8192k 00000000 rw-S-Ip- (rwx) 1/0/0 00:00       0 -   [ ]
 total               8436k

Everything is immutable except for the malloc 4k page (361db000) which
was marked "OPENBSD_MUTABLE".  The first 4k object is r-x-eIp+, that's
the signal trampoline.  Then the rodata, text, the GNU_RELRO showing itself,
and then data and bss.  The stackguard and stack are at the end.
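
As a small illustration that is not part of the original message, the "rwxSeIpc" column of the /bin/cat listing above can be checked mechanically. This Python sketch assumes the column layout shown in the cheat-sheet (position 5 is the "I" flag); the regular expression and the dictionary keys are choices made for the example, not anything procmap(8) provides.

```python
import re

# Lines copied from the /bin/cat listing above.
LISTING = """\
0cda9000-0cda9fff       4k 00000000 r-x-eIp+ (rwx) 1/0/1 00:00       0 -   [ uvm_aobj ]
161d8000-161dcfff      20k 00000000 r----Ip+ (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
161dd000-161f2fff      88k 00004000 r-x-eIp+ (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
361d8000-361d8fff       4k 00000000 r----Ip- (rwx) 1/0/0 00:00       0 -   [ anon ]
361d9000-361dafff       8k 0001a000 rw---Ip- (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
361db000-361dbfff       4k 0001c000 r-----p- (rwx) 1/0/0 04:00   77777 - /bin/cat [0xd587f000]
cd6c8000-cefc7fff   25600k 00000000 -----Ip+ (rwx) 1/0/0 00:00       0 -   [ ]
cefc8000-cf7c7fff    8192k 00000000 rw-S-Ip- (rwx) 1/0/0 00:00       0 -   [ ]
"""

# start-end, size in k, offset, then the 8-character rwxSeIpc column.
LINE_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\d+)k\s+[0-9a-f]+\s+(\S{8})\s")


def parse_line(line):
    """Parse one listing line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    start, end, size_k, flags = m.groups()
    return {
        "start": int(start, 16),
        "end": int(end, 16),
        "size_k": int(size_k),
        "flags": flags,
        "stack": flags[3] == "S",          # mapping is marked stack
        "syscall_entry": flags[4] == "e",  # system call entry allowed
        "immutable": flags[5] == "I",      # mapping is immutable
    }


entries = [e for e in map(parse_line, LISTING.splitlines()) if e is not None]
mutable = [format(e["start"], "08x") for e in entries if not e["immutable"]]
print(mutable)  # prints ['361db000']
```

The only mapping without the "I" flag is the 4k page at 361db000, matching the description of the OPENBSD_MUTABLE malloc page.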

I will avoid the OpenBSD/i386 overlapping mapping confusion, and switch
to a more linear-mapped architecture to show the dynamic binary sed(1),
which uses one shared library.  Sorry, it is a 64-bit architecture, so the addresses
are huge:

Start            End                 Size  Offset           rwxSeIpc  RWX  I/W/A Dev     Inode - File
00000b35973ef000-00000b35973f1fff      12k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:05  596337 - /usr/bin/sed [0xfffffd83c2256200]
00000b35973f2000-00000b35973f7fff      24k 0000000000002000 r-x--Ip+ (rwx) 1/0/0 04:05  596337 - /usr/bin/sed [0xfffffd83c2256200]
00000b35973f8000-00000b35973f8fff       4k 0000000000000000 r----Ip- (rwx) 1/0/0 00:00       0 -   [ anon ]
00000b35973f9000-00000b35973f9fff       4k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 -   [ anon ]
00000b35973fa000-00000b35973fafff       4k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 -   [ anon ]
00000b37a4262000-00000b37a4298fff     220k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:05  362882 - /usr/lib/ [0xfffffd83bff74b10]
00000b37a4299000-00000b37a433dfff     660k 0000000000036000 r-x-eIp+ (rwx) 1/0/0 04:05  362882 - /usr/lib/ [0xfffffd83bff74b10]
00000b37a433e000-00000b37a433efff       4k 00000000000da000 r----Ip- (rwx) 1/0/0 04:05  362882 - /usr/lib/ [0xfffffd83bff74b10]
00000b37a433f000-00000b37a4344fff      24k 00000000000db000 r----Ip- (rwx) 1/0/0 04:05  362882 - /usr/lib/ [0xfffffd83bff74b10]
00000b37a4345000-00000b37a4346fff       8k 00000000000e0000 rw---Ip- (rwx) 1/0/0 04:05  362882 - /usr/lib/ [0xfffffd83bff74b10]
00000b37a4347000-00000b37a4347fff       4k 00000000000e2000 r-----p- (rwx) 1/0/0 04:05  362882 - /usr/lib/ [0xfffffd83bff74b10]
00000b37a4348000-00000b37a4355fff      56k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 -   [ anon ]
00000b37ce440000-00000b37ce444fff      20k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:04 1866250 - /var/run/ [0xfffffd83bebe6378]
00000b37e8eb8000-00000b37e8ebafff      12k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:05  518403 - /usr/libexec/ [0xfffffd83c089d290]
00000b37e8ebb000-00000b37e8ebcfff       8k 0000000000000000 -----Ip+ (rwx) 1/0/0 00:00       0 -   [ anon ]
00000b37e8ebd000-00000b37e8ec8fff      48k 0000000000005000 r-x-eIp+ (rwx) 1/0/0 04:05  518403 - /usr/libexec/ [0xfffffd83c089d290]
00000b37e8fb8000-00000b37e8fb8fff       4k 0000000000011000 r----Ip- (rwx) 1/0/0 04:05  518403 - /usr/libexec/ [0xfffffd83c089d290]
00000b37e8fb9000-00000b37e8fb9fff       4k 0000000000012000 rw---Ip- (rwx) 1/0/0 04:05  518403 - /usr/libexec/ [0xfffffd83c089d290]
00000b37e8fba000-00000b37e8fbafff       4k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 -   [ anon ]
00000b3819cfc000-00000b3819cfcfff       4k 0000000000000000 r----Is- (r--) 1/0/1 00:00       0 -   [ uvm_aobj ]
00000b3850ed3000-00000b3850ed3fff       4k 0000000000000000 r-x-eIp+ (rwx) 1/0/1 00:00       0 -   [ uvm_aobj ]
00007f7ffdecb000-00007f7fff7cafff   25600k 0000000000000000 -----Ip+ (rwx) 1/0/0 00:00       0 -   [ ]
00007f7fff7cb000-00007f7ffffcafff    8192k 0000000000000000 rw-S-Ip- (rwx) 1/0/0 00:00       0 -   [ ]

sed is once again early, leading with rodata, then text (but the main program
has no system calls!), then another readonly which is surely GNU_RELRO, followed
by the data and bss.  libc happens to be the next region: rodata, code, GNU_RELRO
which is two regions and quite large, followed by the malloc(3) mutable page, and
then libc's data and bss.  The shared library linker has mapped its database,
and has marked it immutable by itself.  Later on, the shared library linker itself
shows up, mapped in the same rodata/text/relro/data/bss order.  At the end
we see the stackguard and the stack.

These are small programs.  I've spent some time looking at chrome and emacs,
and they are looking really clean.

I think there is a small glitch on hppa.  Maybe there are others.  We have
two linkers in the tree (ld.bfd from binutils, and ld.lld from clang).  It
is possible that some of the obscure architectures have layout issues that
are either linker bugs, or handling bugs in my code.  But for now it seems
good enough to move forward and see if there are any other issues found in
the ports tree ecosystem.

while Otto noted that

From: Otto Moerbeek <otto () drijf ! net>
Date: Fri, 18 Nov 2022 06:42:30 +0000
To: openbsd-tech
Subject: Re: More on mimmutable

On Thu, Nov 17, 2022 at 08:10:05PM -0700, Theo de Raadt wrote:

> So this static executable is completely immutable, except for the
> OPENBSD_MUTABLE region.  This annotation is used in one place now, deep
> inside libc's malloc(3) code, where a piece of code flips a data structure
> between readonly and read-write as a security measure.  That does not become
> immutable.  It happens to be a page.

This will change.

I have code ready to change the init of that malloc data structure so
that the one page will be modified to contain the right data, then made
readonly and then made immutable.


So we have exciting times ahead in snapshots land, and we are looking forward to the next release.


