Contributed by Peter N. M. Hansteen from the best laid plans of pufferfish and ... dept.
In a message to tech@ titled "openat(2) is mostly useless, sadly", Theo de Raadt (deraadt@) describes how the openat(2) family of system calls has failed to live up to expectations in practice, and he proposes changes that may improve the situation.
Theo writes,
List:    openbsd-tech
Subject: openat(2) is mostly useless, sadly
From:    "Theo de Raadt" <deraadt () openbsd ! org>
Date:    2025-05-28 14:03:29

The family of system calls related to openat(2) are mostly useless in practice, rarely used. When they are used it is often ineffectively or even with performance-reducing results.

    int openat(int fd, const char *path, int flags, ...);

These are the others:

    sys_fstatat
    sys_utimensat
    sys_chflagsat
    sys_pathconfat
    sys_faccessat
    sys_fchmodat
    sys_fchownat
    sys_linkat
    sys_mkdirat
    sys_mkfifoat
    sys_mknodat
    sys_readlinkat
    sys_renameat
    sys_symlinkat
    sys_unlinkat

The idea is that you can open a directory as fd, typically using O_DIRECTORY, and then do relative accesses. This will reduce lookups, and corresponding locking operations in the kernel.

In practice two things get in the way, as POSIX specs say:

    The openat() function shall be equivalent to the open() function
    except in the case where path specifies a relative path.

1) What if it is not a relative path, meaning /etc/passwd? openat(herefd, "/etc/passwd", O_RDONLY) will open that file and completely ignore herefd.

2) What if the relative path is upwards, meaning "../../../../something". It walks up the path, and opens it.

To keep it simple, these calls were not designed to assist any security model.

Both FreeBSD and Linux have designed variations which do this. Since all the *at(2) functions have a flags parameter, their strategy was to add an additional flag which didn't allow upwards traversal.

I think that misses the point, and have a different proposal.

Let's create directoryfd's which cannot traverse upwards. Mark the object, instead of requiring a programmer to put a flag on every system call acting upon the object.

Two operational flags are added, O_BELOW and F_BELOW. Creating such a locked directory fd is done with either

    dirfd = open("path", O_DIRECTORY | O_BELOW);

or you can lock a pre-existing dirfd:

    fcntl(dirfd, F_BELOW);

This dirfd has two characteristics. Absolute accesses always fail with ENOENT. Relative accesses that attempt to traverse upwards fail with ENOENT. You can openat(dirfd, ".") but you cannot openat(dirfd, ".."). Code using readdir() or similar must be careful because they will be provided with "." and ".." but operations on ".." will now fail.

---

An interesting use case shows up that this is a tiny bit like a chroot() system call allowed for non-root users. You can

    dirfd = open("path", O_DIRECTORY | O_BELOW);
    fchdir(dirfd);

Your process is now contained inside that directory. This does not have the classic risks that prevented providing chroot() to regular processes (meaning, the opening of absolute paths inside the chroot could confuse library functions because they are now inspecting the user-created files, and the consequences of this were considered too grave). Absolute path accesses with open() start at the process current directory, and now fail.

I have not explored this regular user chroot-like thing extensively yet. Some semantic changes may be desired. There's a chance that this becomes something we want to use in many daemons instead of chroot().

This is just a draft. The main idea comes out of review one program which uses openat() strangely, and wondering if we can do pathname containment better in the kernel. This can work nicely alongside unveil(), but it is cheaper because the kernel doesn't need to hold references to vnodes like unveil() does.

Index: […]

and the rest of the message is the diff (against -current) that implements the draft proposal.
What do you think? As a developer, what would this mean for the code you write and maintain? Testing and feedback are welcome, as always.