Contributed by dlg from the kernel-hackers dept.
I finished a major reworking of the ami(4) driver about two weeks ago, the major goal of which was to streamline the code paths for getting a command from the operating system onto the hardware and off again. I've already described how the paths that commands take on and off the hardware have been split up from one huge, generic, and hairy path into several lightweight and specific paths. However, just this morning marco and I figured out how to improve the interactivity of your system when running with a MegaRAID controller.
ami(4) is a SCSI driver, so its job is to translate requests from the SCSI midlayer (i.e., the scsibus driver) in the OpenBSD kernel into commands that go onto the hardware. The SCSI midlayer gets requests from sd, which gets requests from the block layer, which gets requests from the filesystem, which gets requests from applications running in userspace. The problem here is that userland apps can generate a large number of requests that are eventually turned into a metric buttload of io commands that ami has to deal with. The MegaRAID in my box at home can deal with 126 commands at once, and using iogen (or even find) it isn't hard to generate enough io requests to do just that.
So for the fun of it just imagine that the midlayer is trying to push 126 requests onto ami all at once.
In the old ami code what would happen is that for every command that came into the driver we would busy wait till the hardware was ready to accept a new command, then push it onto the hardware, then return to the midlayer. The midlayer would then immediately issue a new command to ami, which would then busy wait again, and so on. Imagine doing that 126 times in a row.
Now realise that all of this is happening in the kernel, meaning that nothing else can run until it's done. The result is that you get these really noticeable pauses when ami puts all these commands on the hardware, and the main culprit is the busy waits that ami does while it waits for the hardware to become ready for a new command. In the worst cases I've seen ami lock up the machine for 3 or 4 seconds when this happens.
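To make the cost of that pattern concrete, here is a rough userland sketch of a busy-waiting submit path. It is not the real ami(4) code: `fake_hw`, `hw_ready`, `old_submit` and `old_submit_burst` are made-up names, and the "hardware" is just a counter that stays busy for a fixed number of polls after each command.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the controller; not the real ami(4) structures. */
struct fake_hw {
	int busy_polls;     /* polls left before the hardware is ready again */
	int polls_per_cmd;  /* polls the hardware stays busy after a command */
	int submitted;      /* commands accepted so far */
	long spins;         /* iterations wasted in busy-wait loops */
};

/* One poll of the pretend "ready" bit; each poll moves the hardware along. */
static bool
hw_ready(struct fake_hw *hw)
{
	if (hw->busy_polls > 0) {
		hw->busy_polls--;
		return false;
	}
	return true;
}

/* Old style: spin until the hardware is ready, then post the command. */
static void
old_submit(struct fake_hw *hw)
{
	while (!hw_ready(hw))
		hw->spins++;            /* nothing else runs while we spin */
	hw->submitted++;
	hw->busy_polls = hw->polls_per_cmd;
}

/* The midlayer pushing a burst of commands back to back. */
static long
old_submit_burst(struct fake_hw *hw, int ncmds)
{
	for (int i = 0; i < ncmds; i++)
		old_submit(hw);
	return hw->spins;
}
```

With `polls_per_cmd` set to 1000 (an arbitrary number chosen for the sketch), a burst of 126 commands spins for 1000 polls before every command except the first, and all of that happens before control goes back to the caller.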
Getting rid of these busy waits has been the main reason I've been reworking the ami driver. So here is what happens now when the midlayer tries to put the same 126 commands on the hardware.
The new code will take a request from the SCSI midlayer and put it on a worklist. If the hardware is too busy to take that command right now, ami will schedule a timeout to be run at the next tick of the clock interrupt, and try it again from there. After scheduling the timeout it returns to the midlayer which pushes another command into the driver. Each time the midlayer gives ami a command, we just add it to the list and try to put it on the hardware. Eventually, the midlayer will stop giving ami commands and it will return control to the rest of the system.
Now ami has a list of commands that it has to put on the hardware. At the next clock interrupt it tries to submit commands from the worklist onto the hardware. If the hardware gets busy again and won't take more commands from the worklist, we schedule another timeout and try them again from there.
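The worklist-plus-timeout flow described above can be sketched as a small userland simulation. Everything here is an assumption made for illustration: `struct sim`, `run_worklist`, `submit` and `clock_tick` are hypothetical names, the worklist is reduced to a counter, and the pretend hardware simply accepts a fixed number of commands per clock tick before reporting busy.

```c
#include <stdbool.h>

/* Toy model of the driver state; not the real ami(4) softc. */
struct sim {
	int queued;           /* commands waiting on the worklist */
	int on_hw;            /* commands the hardware has accepted */
	int hw_budget;        /* commands the hardware will take before going busy */
	int budget_per_tick;  /* budget restored at each tick (assumption) */
	bool timeout_armed;   /* a retry is scheduled for the next tick */
	int ticks;            /* clock ticks seen so far */
};

/* Push as much of the worklist as the hardware will take, never spinning. */
static void
run_worklist(struct sim *s)
{
	while (s->queued > 0 && s->hw_budget > 0) {
		s->queued--;
		s->hw_budget--;
		s->on_hw++;
	}
	if (s->queued > 0)
		s->timeout_armed = true;   /* hardware busy: retry at next tick */
}

/* Entry point used by the pretend midlayer: queue, try once, return. */
static void
submit(struct sim *s)
{
	s->queued++;
	run_worklist(s);
}

/* Clock interrupt: the hardware has caught up a bit; run a pending retry. */
static void
clock_tick(struct sim *s)
{
	s->ticks++;
	s->hw_budget = s->budget_per_tick;
	if (s->timeout_armed) {
		s->timeout_armed = false;
		run_worklist(s);
	}
}
```

Calling `submit` 126 times with a budget of 10 puts 10 commands on the pretend hardware and leaves 116 on the worklist with a retry armed. Each `submit` call returns straight away, and the remaining commands drain over the following calls to `clock_tick`, with other work free to run in between.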
Notice that now we're not busy waiting for the hardware to become ready for the command? Instead we're trying to use the hardware every time the clock ticks. This means that other stuff can run in between the clock ticks which results in an improvement in the interactivity of the system.
Most of this work was finished about two weeks ago, but unfortunately I misunderstood how the timeout API worked. I was accidentally scheduling the retries to happen after 0 ticks of the clock which meant that the current clock interrupt would keep running the new scheduling attempts. This in turn led to the same behaviour I was trying to avoid because we were basically looping until all the commands had been put on the hardware. It wasn't till marco@ was reading the code that he pointed out what I'd done wrong. He committed the fix to it this morning.
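The effect of a 0-tick retry can be shown with a toy timeout queue. This is a simplified model, not the kernel's timeout implementation: `sim_timeout_add`, `retry_handler`, `sim_clock_tick` and `worst_runs_per_tick` are hypothetical names, and the hardware is assumed to stay busy for a fixed number of handler runs.

```c
#include <stdbool.h>

struct sim_timeout {
	bool pending;
	int due;                   /* absolute tick at which to fire */
};

struct sim_state {
	int now;                   /* current tick */
	int hw_busy_runs;          /* handler runs for which hardware stays busy */
	int retry_delay;           /* ticks passed when rescheduling */
	int max_runs_in_one_tick;  /* worst case seen in a single tick */
	struct sim_timeout to;
};

/* Schedule the timeout "ticks" ticks from now; 0 means it is already due. */
static void
sim_timeout_add(struct sim_state *s, int ticks)
{
	s->to.pending = true;
	s->to.due = s->now + ticks;
}

/* Retry handler: if the hardware is still busy, schedule another attempt. */
static void
retry_handler(struct sim_state *s)
{
	if (s->hw_busy_runs > 0) {
		s->hw_busy_runs--;
		sim_timeout_add(s, s->retry_delay);
	}
}

/* Clock interrupt: run the timeout for as long as it is pending and due. */
static void
sim_clock_tick(struct sim_state *s)
{
	int runs = 0;

	s->now++;
	while (s->to.pending && s->to.due <= s->now) {
		s->to.pending = false;
		retry_handler(s);
		runs++;
	}
	if (runs > s->max_runs_in_one_tick)
		s->max_runs_in_one_tick = runs;
}

/* Run until no retry is pending; report the most handler runs in one tick. */
static int
worst_runs_per_tick(int retry_delay, int busy_runs)
{
	struct sim_state s = {0};

	s.retry_delay = retry_delay;
	s.hw_busy_runs = busy_runs;
	sim_timeout_add(&s, 1);    /* first retry is due at the next tick */
	while (s.to.pending)
		sim_clock_tick(&s);
	return s.max_runs_in_one_tick;
}
```

In this model, a retry delay of 0 makes every rescheduled attempt due immediately, so all of them run inside the same tick, which is the loop-until-done behaviour described above. A delay of 1 gives exactly one attempt per tick.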
So now ami(4) is streamlined and doesn't busy wait.