Contributed by pitrh on from the guard my RET, you dept.
This year I went to BSDCan in Ottawa. I spent much of it in the 'hallway track', and had extended conversations with various people regarding our existing security mitigations and hopes for new ones in the future. I spoke a lot with Todd Mortimer (mortimer@). Apparently I told him that I felt return-address protection was impossible, so a few weeks later he sent a clang diff to address that issue...
The first diff is for amd64 and i386 only -- in theory RISC architectures can follow this approach soon.
The mechanism is like a userland 'stackghost' in the function prologue and epilogue. The prologue XORs the return address at the top of the stack with the stack pointer value itself. This perturbs the return address by introducing bits from ASLR. The function epilogue undoes the transform immediately before the RET instruction. ROP attack methods are impacted because existing gadgets are transformed to consist of "<gadget artifacts> <mangle ret address> RET". That pivots the return sequence off the ROP chain in a highly unpredictable and inconvenient fashion.
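The transform described above can be modelled in a few lines of Python. This is a toy simulation under stated assumptions, not the clang diff itself: the addresses are invented for illustration, and `mangle` is a hypothetical name for the XOR step that the prologue and epilogue both perform.

```python
MASK64 = (1 << 64) - 1


def mangle(ret_addr, stack_ptr):
    """XOR the saved return address with the stack pointer.

    XOR is its own inverse, so the same step serves as prologue and epilogue.
    """
    return (ret_addr ^ stack_ptr) & MASK64


# Hypothetical values; a real stack pointer is randomized by ASLR.
STACK_PTR = 0x00007F7FFFFE3A58
CALLER = 0x0000000000401234  # legitimate return address
GADGET = 0x0000000000405678  # address an attacker wants RET to reach

# Normal call: the prologue mangles, the epilogue unmangles, RET sees CALLER.
slot = mangle(CALLER, STACK_PTR)      # prologue
normal_ret = mangle(slot, STACK_PTR)  # epilogue

# Attack: a plain gadget address sits in the slot when the epilogue runs
# (overwritten after the prologue, or the prologue was skipped by jumping
# into the middle of the function), so the epilogue scrambles it instead.
attacked_ret = mangle(GADGET, STACK_PTR)

print(hex(normal_ret))
print(hex(attacked_ret))
```

In this model the attacker only reaches `GADGET` by writing `GADGET ^ STACK_PTR` into the slot, which requires knowing the stack pointer value.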
The compiler diff handles this for all the C code, but the assembly functions have to be done by hand. I did this work first for amd64, and more recently for i386. I've fixed most of the functions and only a handful of complex ones remain.
For those who know about polymorphism and pop/jmp or JOP, we believe once standard-RET is solved those concerns become easier to address separately in the future. In any case a substantial reduction of gadgets is powerful.
For those worried about introducing worse polymorphism with these "xor; ret" epilogues themselves, the nested gadgets for 64bit and 32bit variations are +1 "xor %esp,(%rsp); ret", +2 "and $0x24,%al; ret" and +3 "and $0xc3,%al; int3". Not bad.
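The nested gadgets can be checked by decoding the epilogue bytes at each offset. The sketch below assumes the amd64 epilogue is encoded as `48 31 24 24 c3` ("xor %rsp,(%rsp); ret") and is followed by an `int3` padding byte (`cc`); both the trailing byte and the tiny `decode` helper are assumptions made for illustration, and the helper understands only the few encodings that appear here.

```python
# Assumed encoding of "xor %rsp,(%rsp); ret" plus one int3 padding byte.
EPILOGUE = bytes.fromhex("48312424c3cc")


def decode(code):
    """Decode only the handful of x86 encodings needed for this example."""
    out = []
    i = 0
    while i < len(code):
        if code[i:i + 4] == b"\x48\x31\x24\x24":
            out.append("xor %rsp,(%rsp)")
            i += 4
        elif code[i:i + 3] == b"\x31\x24\x24":
            out.append("xor %esp,(%rsp)")
            i += 3
        elif code[i] == 0x24 and i + 1 < len(code):
            # Opcode 0x24 is "AND AL, imm8": the next byte is the immediate.
            out.append("and $0x%02x,%%al" % code[i + 1])
            i += 2
        elif code[i] == 0xC3:
            out.append("ret")
            break
        elif code[i] == 0xCC:
            out.append("int3")
            break
        else:
            out.append("(unknown)")
            break
    return "; ".join(out)


for offset in range(4):
    print("+%d" % offset, decode(EPILOGUE[offset:]))
```

Under these assumptions, the +1 and +2 entries still end in `ret`, while the +3 entry swallows the `ret` byte as an immediate and runs into `int3`.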
Over the last two weeks, we have received help and advice to ensure debuggers (gdb, egdb, ddb, lldb) can still handle these transformed callframes. Also in the kernel, we discovered we must use a smaller XOR, because otherwise userland addresses are generated, and we cannot rely on SMEP as it is a really new feature of the architecture. There were also issues with pthreads and dlsym, which led to a series of uplifts around __builtin_return_address and DWARF CFI.
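One way to see why a full-width XOR is a problem in the kernel is sketched below. On amd64, kernel text and kernel stacks both live in the upper half of the address space, so XORing two such values cancels the high bits. The addresses are invented, and reading "smaller XOR" as "XOR only the low 32 bits" is an assumption made for this illustration, not something confirmed by the message.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

# Hypothetical amd64 kernel values: both are upper-half addresses.
KERNEL_RET = 0xFFFFFFFF81234567
KERNEL_SP = 0xFFFF800022334FF8


def is_userland(addr):
    """True if addr falls in the lower canonical half of the address space."""
    return addr < (1 << 47)


# Full 64-bit XOR: the high bits cancel and the result looks like a
# userland address.
full = (KERNEL_RET ^ KERNEL_SP) & MASK64

# Smaller XOR (assumed: low 32 bits only): the high bits are untouched,
# so the result stays in kernel space.
small = KERNEL_RET ^ (KERNEL_SP & MASK32)

print(hex(full), is_userland(full))
print(hex(small), is_userland(small))
```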
Application of this diff doesn't require anything special; a system can simply be built twice. Or take a shortcut by building and installing gnu/usr.bin/clang first, then doing a full build.
We are at the point where userland and base are fully working without regressions, and the remaining impacts are in a few larger ports which directly access the return address (for a variety of reasons).
So work needs to continue with handling the RET-addr swizzle in those ports, and then we can move forward.
[followed by the diff]
You can find the full message with the diff here, or if you're already on tech@, in a mailbox near you.
By Wolfram (89.166.208.39) on
By Damien Couderc (91.135.188.215) on
As far as I understand it, the return address is modified at the beginning of the call and is restored to its original state right before returning from the call.
So if someone tries to abuse the code by jumping into the middle of a function, the return address will be "restored" without ever having been modified at the beginning. The result is an unpredictable return address, which makes the ROP attack unusable.
By Darren Tucker (dtucker) on
(I'm probably oversimplifying, but here goes):
Traditional buffer overflow exploits rely on jamming some executable code into memory then getting that executed somehow, e.g. by overwriting the return address of the stack frame to return to your code. Various techniques make this harder these days, e.g. marking writable memory as non-executable ("W^X").
Because of this, exploit writers came up with a way to use existing pieces of code in a process which are already marked as executable which is known as "Return Oriented Programming" (ROP). You find snippets of code that each do a small part of the exploit and end in a "return" instruction (these are known as "gadgets") that together do the thing you want. You then cook up a fake stack with frames that say "return to this address then this address then this address" to chain the gadgets together, overwrite the real stack with this and when the function returns it'll follow the chain and do what you want.
What this change does is: instead of storing the plain return address on the stack when making a function call, it stores the return address scrambled with a value that is unknown to the attacker, and unscrambles it later, just before it is used. Since the attacker can't predict the scrambling value, it should be harder for them to cook up this fake stack that makes the code do what they want, because when the frames get unscrambled and used they will point off to unpredictable places instead of their intended gadgets.
By Noryungi (noryungi) on
I am very very far from being a kernel hacker, and it definitely clarified the whole thing for me. Much appreciated.