Contributed by Peter N. M. Hansteen from the unmute the immutable dept.

On the tech mailing list, Theo de Raadt (deraadt@) summarized the state of the new memory protections work. The thread also includes a followup from Otto Moerbeek (otto@) on consequent changes to the memory allocation mechanisms.

Theo writes,
From: "Theo de Raadt" <deraadt () openbsd ! org>
Date: Fri, 18 Nov 2022 03:10:05 +0000
To: openbsd-tech
Subject: More on mimmutable [LONG]

I am getting close to having the big final step of mimmutable in the tree. Here's a refresher on how it works, what's already done, and the next bit to land.

DESCRIPTION
     The mimmutable() system call changes currently mapped pages in the region to be marked immutable, which means their protection or mapping may not be changed in the future. mmap(2), mprotect(2), and munmap(2) to pages marked immutable will return with error EPERM.
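As an illustration (not part of Theo's mail), the rule in the DESCRIPTION can be pictured with a toy page-table model in Python. This is only a sketch under stated assumptions: the class `AddressSpace`, the helper `attempt`, and the page numbers are hypothetical, pages are plain integers, and the model assumes a call fails as a whole if any page in its range is immutable. It does not model other error cases of the real system calls.

```python
import errno


class AddressSpace:
    """Toy model: page number -> [protection string, immutable flag]."""

    def __init__(self):
        self.pages = {}

    def _check_mutable(self, start, npages):
        # Assumption of this sketch: one immutable page fails the whole call.
        for page in range(start, start + npages):
            if page in self.pages and self.pages[page][1]:
                raise PermissionError(errno.EPERM, "page is immutable")

    def mmap(self, start, npages, prot):
        self._check_mutable(start, npages)
        for page in range(start, start + npages):
            self.pages[page] = [prot, False]

    def mprotect(self, start, npages, prot):
        self._check_mutable(start, npages)
        for page in range(start, start + npages):
            if page in self.pages:
                self.pages[page][0] = prot

    def munmap(self, start, npages):
        self._check_mutable(start, npages)
        for page in range(start, start + npages):
            self.pages.pop(page, None)

    def mimmutable(self, start, npages):
        # Only currently mapped pages in the region are affected.
        for page in range(start, start + npages):
            if page in self.pages:
                self.pages[page][1] = True


def attempt(call, *args):
    """Run a call and return 0 on success or the errno on failure."""
    try:
        call(*args)
        return 0
    except PermissionError as err:
        return err.errno


space = AddressSpace()
space.mmap(0, 4, "rw-")
space.mimmutable(0, 2)  # pages 0 and 1 become immutable
results = {
    "mprotect immutable": attempt(space.mprotect, 0, 1, "r--"),
    "munmap immutable": attempt(space.munmap, 1, 1),
    "mmap over immutable": attempt(space.mmap, 0, 4, "r-x"),
    "mprotect mutable": attempt(space.mprotect, 2, 2, "r--"),
}
print(results)
```

In this model the first three calls touch an immutable page and report EPERM, while the last one only touches pages 2 and 3 and succeeds.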
That's the system call. In reality, almost no programs call it.

Let me start by explaining a process's address space, starting with the simplest programs and then heading into more complicated cases.

A process runtime has a

 - stack (rwS permissions, S being the annotation used at system call entry to ensure the "sp" register points to stack, and thus prevent a class of ROP pivot methods),

 - a stack-guard (for growing the stack in case rlimits are changed, this is permission NONE)

 - a signal trampoline page, randomly placed, which will perform sigreturn(2), (permission rxe, e being the annotation used at system call entry to ensure the "pc" register points at a region allowed to make system calls, thus preventing attackers from uploading direct system call instruction code)

Those objects are automatically marked immutable by the kernel.

On to static executables. The kernel loads a static ELF binary into memory as a text segment (rx permission), followed by a data segment (rw permission), a bss (zero'd data, rw permission), and a rodata segment (ro permission). The order of these varies per architecture. There is an overlay of this called the "GNU_RELRO", which is pretty uhm special. I've created a new overlay called "OPENBSD_MUTABLE", which is page-aligned and must not be made immutable. As it happens, these two special regions are the only part of the image load that cannot be marked immutable, so the kernel proceeds to mark everything else immutable.

When that static program starts running, it will run the crt0 ("c run time") startup code, which can make some small changes in the GNU_RELRO region, and then mark it immutable.

So this static executable is completely immutable, except for the OPENBSD_MUTABLE region. This annotation is used in one place now, deep inside libc's malloc(3) code, where a piece of code flips a data structure between readonly and read-write as a security measure. That does not become immutable. It happens to be a page.
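To make the two-stage start of a static executable easier to follow, here is a small Python sketch (an illustration, not code from the mail). The `Region` class, the function names and the flat list of regions are hypothetical simplifications: in a real image GNU_RELRO and OPENBSD_MUTABLE are overlays on the loaded segments, not separate entries, and the initial "rw-" protection of GNU_RELRO is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    prot: str
    immutable: bool = False


def kernel_load(regions, exceptions=("GNU_RELRO", "OPENBSD_MUTABLE")):
    """Kernel step: mark every region immutable, then undo the exceptions."""
    for region in regions:
        region.immutable = True
    for region in regions:
        if region.name in exceptions:
            region.immutable = False


def crt0_start(regions):
    """crt0 step: finish its changes to GNU_RELRO, then make it immutable."""
    for region in regions:
        if region.name == "GNU_RELRO":
            region.prot = "r--"  # assumed: read-only once startup fixes are done
            region.immutable = True


image = [
    Region("text", "r-x"),
    Region("rodata", "r--"),
    Region("GNU_RELRO", "rw-"),
    Region("data", "rw-"),
    Region("bss", "rw-"),
    Region("OPENBSD_MUTABLE", "rw-"),
]

kernel_load(image)
after_kernel = [r.name for r in image if not r.immutable]
crt0_start(image)
after_crt0 = [r.name for r in image if not r.immutable]

print(after_kernel)
print(after_crt0)
```

After the kernel step two regions are still mutable. After the crt0 step only the OPENBSD_MUTABLE region is left, which matches the outcome described for static binaries.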
There is another ugly old wart called "text relocations", and I won't get into it except to say the kernel recognizes such binaries, and skips some immutabilities, but of course crt0 finishes the job.

I want to speak a bit about the mechanism. Inside the kernel, immutability is applied to all the regions. And then the exceptions are marked mutable. The kernel is allowed to reverse setting immutability, but userland cannot. This will come up later.

Now let's talk about dynamic executables. The same applies as above for the main program, but then the kernel also loads another object into memory: /usr/libexec/ld.so -- the shared library linker. And instead of starting to run the main program, execution starts in the shared library linker.

The shared library linker ELF image contains similar objects. There is a GNU_RELRO, which the kernel cannot mark immutable. There is no OPENBSD_MUTABLE because we don't request creation of one. There is a special "boot.text" section that the shared library linker unmaps upon startup, as a security measure, and the kernel ignores that region. All the other regions of ld.so are marked immutable by the kernel automatically.

Now ld.so starts to execute, and the first job it does is to fix its own relro section, handle text relocations in the dynamic binary, repair some permissions, and then mark itself and the main program immutable. Completely immutable, except for the OPENBSD_MUTABLE page in malloc(3).

I was really surprised we got to this point without blowing up the ports tree in a major way. Some of these warts were found along the way and changed the direction a little. And I don't want to talk about sigaltstack right now.

ld.so is now responsible for loading the shared libraries required by the program. Shared libraries have the same pieces as regular programs, so they are loaded into memory, but here we hit a problem.
In the kernel we could simplistically mark all the regions as immutable, and then reverse it for the special cases. Userland code cannot do that. So we have to keep track of the sections we want to be immutable, as well as the regions that are mutable, and then subtract the differences, and apply the immutables very late in the shared library loading process. I call this code the clipping engine, and I had a lot of bugs in it.

There is another way shared libraries are loaded: via the dlopen() call later at runtime. With the flag RTLD_NODELETE, libraries can be marked immutable. Otherwise we must assume they will later be dlclose()'d and unmapped, and immutable isn't allowed.

Most of our architectures load objects (programs, ld.so, libraries) in linear fashion, but OpenBSD/i386 has an old line-in-the-sand pre-NX layout that separates the code and data by 512MB, and this less simple memory layout results in the clipper actually needing to do more than "1 immutable minus 2 mutables", it is actually closer to "6 immutables minus 3". But all of it is now working on all the architectures.

The results can be inspected using the procmap(8) command. To use this, you must temporarily enable sysctl kern.allowkmem=1, in /etc/sysctl.conf, and reboot, because this setting cannot be done after single-user boot.

Here is a cheat-sheet from the manual page:

     In this format the column labeled "rwxSeIpc" comprises:

     rwx   permissions for the mapping
     S     mapping is marked stack
     e     mapping is allowed system call entry points
     I     mapping is immutable (rwx protection may not be changed)
     p     shared/private flag
     c     mapping needs to be copied on write (`+') or has already been copied (`-')

Using i386 (because the addresses are smaller), this is a cat binary.
I will manually delete entries which aren't relevant for various reasons (mostly, malloc or mmap allocations)

# procmap -p $pid
Start    End         Size Offset   rwxSeIpc RWX   I/W/A Dev   Inode - File
0cda9000-0cda9fff      4k 00000000 r-x-eIp+ (rwx) 1/0/1 00:00     0 - [ uvm_aobj ]
161d8000-161dcfff     20k 00000000 r----Ip+ (rwx) 1/0/0 04:00 77777 - /bin/cat [0xd587f000]
161dd000-161f2fff     88k 00004000 r-x-eIp+ (rwx) 1/0/0 04:00 77777 - /bin/cat [0xd587f000]
361d8000-361d8fff      4k 00000000 r----Ip- (rwx) 1/0/0 00:00     0 - [ anon ]
361d9000-361dafff      8k 0001a000 rw---Ip- (rwx) 1/0/0 04:00 77777 - /bin/cat [0xd587f000]
361db000-361dbfff      4k 0001c000 r-----p- (rwx) 1/0/0 04:00 77777 - /bin/cat [0xd587f000]
cd6c8000-cefc7fff  25600k 00000000 -----Ip+ (rwx) 1/0/0 00:00     0 - [ ]
cefc8000-cf7c7fff   8192k 00000000 rw-S-Ip- (rwx) 1/0/0 00:00     0 - [ ]
total               8436k

Everything is immutable except for the malloc 4k page (361db000) which was marked "OPENBSD_MUTABLE". The first 4k object is r-x-eIp+, that's the signal trampoline. Then the rodata, text, the GNU_RELRO showing itself, and then data and bss. The stackguard and stack are at the end.
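For readers who want to pick such a listing apart mechanically, the following Python sketch (an illustration, not part of the mail) decodes the "rwxSeIpc" column of a few lines copied from the cat listing. The function names are hypothetical, and the field positions are simply read off the listing as shown here, not taken from any documented procmap output format.

```python
def decode_flags(flags):
    """Decode the 8-character "rwxSeIpc" column of a procmap -p line."""
    if len(flags) != 8:
        raise ValueError("expected 8 flag characters, got %r" % flags)
    return {
        "read": flags[0] == "r",
        "write": flags[1] == "w",
        "exec": flags[2] == "x",
        "stack": flags[3] == "S",
        "syscall_entry": flags[4] == "e",
        "immutable": flags[5] == "I",
        "sharing": flags[6],  # shared/private flag, kept as printed
        "cow": flags[7],      # '+' or '-', kept as printed
    }


def parse_line(line):
    """Split one listing line into the fields used in this sketch."""
    fields = line.split()
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "size": fields[1],
        "flags": decode_flags(fields[3]),
        "name": " ".join(fields[9:]),
    }


listing = [
    "161dd000-161f2fff 88k 00004000 r-x-eIp+ (rwx) 1/0/0 04:00 77777 - /bin/cat [0xd587f000]",
    "361db000-361dbfff 4k 0001c000 r-----p- (rwx) 1/0/0 04:00 77777 - /bin/cat [0xd587f000]",
    "cefc8000-cf7c7fff 8192k 00000000 rw-S-Ip- (rwx) 1/0/0 00:00 0 - [ ]",
]

entries = [parse_line(line) for line in listing]
mutable = [hex(e["start"]) for e in entries if not e["flags"]["immutable"]]
print(mutable)
```

Filtering on the I flag in this way singles out the one mapping at 361db000, the malloc page called out in the text.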
I will avoid the OpenBSD/i386 overlapping mapping confusion, and switch to a more linear-mapped architecture to show the dynamic binary sed(1), which uses one shared library. Sorry, it is a 64-bit architecture so the addresses are huge:

Start            End                 Size Offset           rwxSeIpc RWX   I/W/A Dev     Inode - File
00000b35973ef000-00000b35973f1fff     12k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:05  596337 - /usr/bin/sed [0xfffffd83c2256200]
00000b35973f2000-00000b35973f7fff     24k 0000000000002000 r-x--Ip+ (rwx) 1/0/0 04:05  596337 - /usr/bin/sed [0xfffffd83c2256200]
00000b35973f8000-00000b35973f8fff      4k 0000000000000000 r----Ip- (rwx) 1/0/0 00:00       0 - [ anon ]
00000b35973f9000-00000b35973f9fff      4k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 - [ anon ]
00000b35973fa000-00000b35973fafff      4k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 - [ anon ]
00000b37a4262000-00000b37a4298fff    220k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:05  362882 - /usr/lib/libc.so.96.4 [0xfffffd83bff74b10]
00000b37a4299000-00000b37a433dfff    660k 0000000000036000 r-x-eIp+ (rwx) 1/0/0 04:05  362882 - /usr/lib/libc.so.96.4 [0xfffffd83bff74b10]
00000b37a433e000-00000b37a433efff      4k 00000000000da000 r----Ip- (rwx) 1/0/0 04:05  362882 - /usr/lib/libc.so.96.4 [0xfffffd83bff74b10]
00000b37a433f000-00000b37a4344fff     24k 00000000000db000 r----Ip- (rwx) 1/0/0 04:05  362882 - /usr/lib/libc.so.96.4 [0xfffffd83bff74b10]
00000b37a4345000-00000b37a4346fff      8k 00000000000e0000 rw---Ip- (rwx) 1/0/0 04:05  362882 - /usr/lib/libc.so.96.4 [0xfffffd83bff74b10]
00000b37a4347000-00000b37a4347fff      4k 00000000000e2000 r-----p- (rwx) 1/0/0 04:05  362882 - /usr/lib/libc.so.96.4 [0xfffffd83bff74b10]
00000b37a4348000-00000b37a4355fff     56k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 - [ anon ]
00000b37ce440000-00000b37ce444fff     20k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:04 1866250 - /var/run/ld.so.hints [0xfffffd83bebe6378]
00000b37e8eb8000-00000b37e8ebafff     12k 0000000000000000 r----Ip+ (rwx) 1/0/0 04:05  518403 - /usr/libexec/ld.so [0xfffffd83c089d290]
00000b37e8ebb000-00000b37e8ebcfff      8k 0000000000000000 -----Ip+ (rwx) 1/0/0 00:00       0 - [ anon ]
00000b37e8ebd000-00000b37e8ec8fff     48k 0000000000005000 r-x-eIp+ (rwx) 1/0/0 04:05  518403 - /usr/libexec/ld.so [0xfffffd83c089d290]
00000b37e8fb8000-00000b37e8fb8fff      4k 0000000000011000 r----Ip- (rwx) 1/0/0 04:05  518403 - /usr/libexec/ld.so [0xfffffd83c089d290]
00000b37e8fb9000-00000b37e8fb9fff      4k 0000000000012000 rw---Ip- (rwx) 1/0/0 04:05  518403 - /usr/libexec/ld.so [0xfffffd83c089d290]
00000b37e8fba000-00000b37e8fbafff      4k 0000000000000000 rw---Ip- (rwx) 1/0/0 00:00       0 - [ anon ]
00000b3819cfc000-00000b3819cfcfff      4k 0000000000000000 r----Is- (r--) 1/0/1 00:00       0 - [ uvm_aobj ]
00000b3850ed3000-00000b3850ed3fff      4k 0000000000000000 r-x-eIp+ (rwx) 1/0/1 00:00       0 - [ uvm_aobj ]
00007f7ffdecb000-00007f7fff7cafff  25600k 0000000000000000 -----Ip+ (rwx) 1/0/0 00:00       0 - [ ]
00007f7fff7cb000-00007f7ffffcafff   8192k 0000000000000000 rw-S-Ip- (rwx) 1/0/0 00:00       0 - [ ]

sed is once again early, leading with rodata, then text (but the main program has no system calls!), then another readonly which is surely GNU_RELRO, followed by the data and bss.

libc.so happens to be the next region, rodata, code, GNU_RELRO which is two regions and quite large, followed by the malloc(3) mutable page, and then libc's data and bss.

The shared library linker has mapped its database ld.so.hints, and has marked it immutable by itself.

Later on, the ld.so itself shows up, mapped in the same rodata/text/relro/data/bss order.

At the end we see the stackguard and the stack.

These are small programs. I've spent some time looking at chrome and emacs, and they are looking really clean.

I think there is a small glitch on hppa. Maybe there are others. We have two linkers in the tree (ld.bfd from binutils, and ld.lld from clang). It is possible that some of the obscure architectures have layout issues that are either linker bugs, or handling bugs in my code. But for now it seems good enough to move forward and see if there are any other issues found in the ports tree ecosystem.
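The "immutables minus mutables" bookkeeping that Theo calls the clipping engine is, at its core, interval subtraction. The Python sketch below is an illustration only and makes no claim about how ld.so implements it: the function name `clip`, the half-open (start, end) tuples and the addresses are all made up for the example.

```python
def clip(immutables, mutables):
    """Subtract half-open mutable ranges from half-open immutable ranges.

    Returns the sorted list of (start, end) pieces that may be made immutable.
    """
    result = []
    for start, end in sorted(immutables):
        pieces = [(start, end)]
        for m_start, m_end in mutables:
            next_pieces = []
            for p_start, p_end in pieces:
                if m_end <= p_start or p_end <= m_start:
                    next_pieces.append((p_start, p_end))  # no overlap, keep
                    continue
                if p_start < m_start:
                    next_pieces.append((p_start, m_start))  # part below the hole
                if m_end < p_end:
                    next_pieces.append((m_end, p_end))  # part above the hole
            pieces = next_pieces
        result.extend(pieces)
    return sorted(result)


# "1 immutable minus 2 mutables" with made-up addresses.
wanted_immutable = [(0x1000, 0x9000)]
must_stay_mutable = [(0x3000, 0x4000), (0x6000, 0x7000)]
clipped = clip(wanted_immutable, must_stay_mutable)
print([(hex(s), hex(e)) for s, e in clipped])
```

With these inputs the single wanted range is cut into three pieces around the two holes. A split layout such as the i386 one described in the mail would simply pass more ranges in both lists.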
while Otto noted that
From: Otto Moerbeek <otto () drijf ! net>
Date: Fri, 18 Nov 2022 06:42:30 +0000
To: openbsd-tech
Subject: Re: More on mimmutable

On Thu, Nov 17, 2022 at 08:10:05PM -0700, Theo de Raadt wrote:

> So this static executable is completely immutable, except for the
> OPENBSD_MUTABLE region. This annotation is used in one place now, deep
> inside libc's malloc(3) code, where a piece of code flips a data structure
> between readonly and read-write as a security measure. That does not become
> immutable. It happens to be a page.

This will change. I have code ready to change the init of that malloc data structure so that the one page will be modified to contain the right data, then made readonly and then made immutable.

	-Otto
So we have exciting times ahead in snapshots land, and we are looking forward to the next release.