Contributed by rueda on from the puffy punting ports and keys dept.
The hackathon started with me loosing my baseball cap in a bus in Poland and then having my phone wipe itself clean (including 2 factor auth).
My phone overheats which results in a soft reboots. That would be a very boring thing except someone at Google decided that after a reboot it's good to keep the screen unlocked and ask the user for the unlock code to decrypt the drive. Unfortunately my left leg doesn't know the unlock code so it keyed it in wrong 10 times resulting in a total wipe of the device on the first day on the way to the pub - that also happens to be my 2 factor auth phone.
With all external signals being cut off I started hacking on ports. Some time ago I ported devel/luaposix & devel/lua-bit32 without noticing that jsg@ did the same thing a few week before me. We exchanged some emails back and forth but couldn't get to an agreement on how to move forward with them. During the hackathon we happened to sit next to each other so after a 10 minute chat I came up with a mixed version of our changes that we both agreed on. I also imported graphics/fotowall which was waiting on the ports@ list due to the tree being locked. After that I tested some ports and started working on a new one for tome4 which is a rather old but very popular rogue-like.The first thing that ToME needs is premake4 which is a build system similar to CMake but exposing the full Lua language for build configuration. Preparing a port for devel/premake4 took a whole day as upstream bundles a full copy of the Lua sources without any OpenBSD patches. I did some Makefile surgery and that port now properly uses the one provided by our ports system, devel/premake4 now sits on ports@ waiting for OK's.
When the build system was done I started porting the game itself. Unfortunately it takes the same approach as devel/premake4 and bundles 22 dependencies that are heavily entangled with it's build system. I gave it a full day and gave up in the evening as it looked like the proper approach is to go back & forward with upstream patches as maintaining them in our tree would be insane. To prevent wasting more hackathon time I put that port on a todo list for when I get back home.
This marks the 3 days I spent on ports and untangling dependencies I decided to finally fix my athn usb dongle. I reported a problem with it back in December 2015 and stsp@ was kind enough to point me to a possible solution. I read the athn usb driver, looked at iwm and started implementing a watchdog handler in a separate process context using the taskq API. I also managed to find a reliable test that reproduces the timeout on my athn device. In order to do so I have to bring up an SSH tunnel that forwards a remote jenkins instance to my local port 8080. Then accessing that page via Google Chrome leads to the device timing out. I have yet no idea why that specific case causes the device to time out but stsp@ was pretty excited when I managed to reproduce the same behavior on the exact same model of the athn wifi dongle that he owned.
When the diff was ready and working I sent it out to stsp@ for a review. We had a chat and I learned about the different context a driver code can be run in. The ioctls device interrupts, the taskq process context and the USB task context. Apparently the change I did needed more work to prevent those different context stepping on each other, the rewrite was based on adding proper waits in various places.
stsp@ seemed quite happy that someone decided to dive into the USB/Wifi/Networking stack and quickly dumped a full bag of USB wifi dongles for me to hack on when I'm done with the one I own. People also started quickly backing away when the conversations about those areas started so I might have gotten myself into an 'interesting' part of the codebase I took it as a sign and bought a Cthulu figurine to put on the desk while hacking.
After a few iteration of the diff stsp@ felt pretty happy with the diff and sent it up for a review to mpi@. When mpi@ learned that I'm getting my feet wet in the USB stack he also wanted to dump a bag of USB devices on me but my desk was already overflowing with USB wifi dongles. Regarding diff feedback mpi@ raised our awarness to the existance of an USB task API that should be used instead of the tasq API as it already handles properly serialising events with the USB stack. So I went on to re-implement the diff again.
The usbdi task interface isn't documented yet in OpenBSD so I made a mental note to hack on it later. Fortunately it's pretty similar to taskq and a lot of the drivers serve as nice examples of usage so it wasn't that hard to re-implement the watchdog with it.
Again after a few iterations mpi@ felt quite happy but then stsp@ (or him himself) remembered that the watchdog is already running in a separate process context and the watchdog comment was just outdated and misleading. So I proceeded to rip out all that code and implemented the 3 lines that are required to bring the device down & back up again when it times out...
Then we started to think what happens when someone yanks the device while the watchdog is rebooting it from a different context. I added a 10 second sleep in the drivers watchdog handler, triggered my bug and yanked the device out while it was rebooting. Panic.
The first one was quickly identified and fixed by mpi@. Before I managed to reboot there was already a diff sitting in my mailbox to test. I also learned a very nice trick to get the line number from a normal kernel by comparing it to the disassembly of a debug kernel. I don't think I will be able to replicate that process out of memory but knowing it's there should help a lot when needing to learn how to do it exactly
With the fix applied I re-did the whole test again (this time with a debug kernel to avoid wasting time). Panic.
The second one was from the networking stack. My wait time was pretty long so the TCP stack timers kicked in. Again mpi@ sends a diff before I finish rebooting. One kernel recompile later. Panic.
The third one happened a bit deeper in the networking stack. The kernel wanted to remove cached routes to an interface that was already yanked out. This resulted in a kernel assertion failing and dropping me down to ddb. mpi@ and bluhm@ both committed some more fixes. With those in place I rebooted and proceeded with a re-test. Panic.
The fourth one was in the athn driver itself. The interface was half cleaned up (the fields of the ifp data structure were freed but not the interface itself) so when the watchdog tried to access it caused the crash.
Again mpi@, stsp@ and blumh@ had a talk and devised a target solution but it's not yet in so I can't yet commit my final version of the watchdog driver.
It's funny how that diff evolved to reach the slim option.
I also had a talk with stsp@ and he supplied me with some Wifi docs, which should prepare me better to help with implementing 11n support in more drivers. That pretty much sums up my adventure in 'getting started with the kernel'.
Great thanks to mpi@, stsp@ and blumh@ for their patience
In between I met up with edd@ to exchange GPG key signatures. We also extended my key expiry date and he started working on bumps to GPG utility ports. During this we found an issue with one of the CVE patches which resulted in the default value of PATCH_DEBUG being set to yes in order for people not to miss that a patch might have been applied by luck when matching a different part of the codebase.
Fun thing about changing my private key at the hackathon is not having a backup for it. During the driver changes the panics lead to a lot of 'harsh' reboots. I had a pretty big o-shit moment when fsck gave up on the first pass. Fortunately it was possible to recover. I would hate to have an additional device wiped in the same week.
It was really amazing to meet everyone in person. There were a ton of interesting discussions going on. History lessons about the OpenBSD past, present & the future. Everyone was very nice, very friendly & willing to share and pass on knowledge.
I would like to especially thank avsm@ and GemmaG for organizing the event (the dorms were amazing), the OpenBSD foundation for sponsoring our stay and Koparo for sponsoring my travel.
This was my first hackathon and I am looking forward to attending more of them!
Regards, awolk@
final diff
Index: if_athn_usb.c =================================================================== RCS file: /cvs/src/sys/dev/usb/if_athn_usb.c,v retrieving revision 1.42 diff -u -p -r1.42 if_athn_usb.c --- if_athn_usb.c 11 Dec 2015 16:07:02 -0000 1.42 +++ if_athn_usb.c 5 Sep 2016 10:22:39 -0000 @@ -2104,7 +2104,8 @@ athn_usb_watchdog(struct ifnet *ifp) if (sc->sc_tx_timer > 0) { if (--sc->sc_tx_timer == 0) { printf("%s: device timeout\n", sc->sc_dev.dv_xname); - /* athn_usb_init(ifp); XXX needs a process context! */ + athn_usb_stop(ifp); + athn_usb_init(ifp); ifp->if_oerrors++; return;}diff evolution
$ wc -l athn-watchdog* 112 athn-watchdog.diff 131 athn-watchdog.1.diff 163 athn-watchdog.2.diff 159 athn-watchdog.3.diff 212 athn-watchdog.4.diff 31 athn-watchdog.5.diff-sleeper 26 athn-watchdog.6.diff 17 athn-watchdog.7.diff
Thanks for the report and all the great work, Adam! We're looking forward to snapshots with those things in them!
(Comments are closed)
By Anonymous Coward (62.210.105.116) on
By Daniel M (danielm) daniel@melameth.com on
By Anonymous Coward (45.56.158.14) on