Contributed by merdely, from the riscrisc dept.
After seeing Miod Vallat's (miod@) commit to source-changes@ today, I thought it'd be nice to ask Miod a few questions about mvme88k and the work he did to enable multiprocessor support for the architecture. (More information about this hardware platform is available on the OpenBSD mvme88k page.) Here is the commit message:
CVSROOT: /cvs
Module name: src
Changes by: miod@ 2007/11/09 11:15:22
Modified files:
    distrib/notes/mvme88k: contents hardware
    etc/etc.mvme88k: Makefile.inc
Log message:
    Build and advertize bsd.mp on mvme88k.
The interview follows...
ME: Is there anything interestingly unique or different about mvme88k from other architectures?
Miod: Well, yes and no. At first glance, this looks like a "boring" RISC design. However, although this was Motorola's first RISC processor, it built on experience from existing RISC processors (such as mips), and it shows.
The m88k has very few instructions (between 50 and 60, I don't remember the exact number), and a very simple instruction set, which is very easy to understand or write.
Another interesting thing is that it has no separate floating point registers: any regular (32 bit) register, or any pair of registers (thus 64 bit), can be used as a floating point operand.
The first generation (88100) was also designed with multiprocessor systems in mind: the processor itself does not have internal caches or an MMU, but instead defers this to so-called 88200 CMMU units, which implement both functions (hence their name). This was also intended to allow for low-cost MMU-less systems, and NCD built many X terminals with an 88100 and no MMU.
All the CMMUs sit on the same so-called M-Bus with the processor, and work closely with it to do their MMU work. And since they all sit on the same bus, they can monitor each other to achieve good cache coherency across processors, and they can also make themselves visible to all processors: this allows a given processor to perform cache invalidation or MMU operations on a remote processor (really, on the CMMU associated with a remote processor) without needing to interrupt the remote processor. This makes multiprocessor implementation, from a software point of view, much easier than on more standard architectures.
However, the second generation of the m88k family (88110) went back to putting the cache and MMU inside the processor, so - when 88110 systems eventually work under OpenBSD - the SMP code will be very different (and will require processors to interrupt each other to perform the remote cache and MMU operations, since the "visibility" trick is no longer available).
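To make the contrast Miod describes a little more concrete, here is a toy Python model of the two styles of remote MMU invalidation. It is only a sketch under loud assumptions: the class and function names (`Cmmu`, `Cpu`, `shootdown_external`, `shootdown_ipi`) are hypothetical, the "TLB" is a plain dictionary, and none of this is taken from the OpenBSD sources or the real hardware interfaces. It only shows that in the first style the target processor is never interrupted, while in the second it has to be.

```python
# Toy model only: every name here is hypothetical, not OpenBSD or hardware API.

class Cmmu:
    """Stand-in for a combined cache/MMU unit holding address translations."""

    def __init__(self):
        # One made-up translation: virtual page -> physical page.
        self.tlb = {0x1000: 0x8000}

    def invalidate(self, va):
        # Drop the translation for this virtual address, if present.
        self.tlb.pop(va, None)


class Cpu:
    """A processor with its associated MMU and an interrupt counter."""

    def __init__(self):
        self.mmu = Cmmu()
        self.interrupts_taken = 0

    def handle_ipi(self, va):
        # The processor is interrupted and performs the invalidation itself.
        self.interrupts_taken += 1
        self.mmu.invalidate(va)


def shootdown_external(cpus, target, va):
    # 88100/88200 style (assumed model): the initiator reaches the target's
    # CMMU directly over the shared bus; the target CPU is not interrupted.
    cpus[target].mmu.invalidate(va)


def shootdown_ipi(cpus, target, va):
    # 88110 style (assumed model): the MMU is inside the processor, so the
    # target must take an interprocessor interrupt to do the work.
    cpus[target].handle_ipi(va)


def demo():
    external = [Cpu(), Cpu()]
    shootdown_external(external, 1, 0x1000)

    internal = [Cpu(), Cpu()]
    shootdown_ipi(internal, 1, 0x1000)

    return (external[1].interrupts_taken, internal[1].interrupts_taken,
            external[1].mmu.tlb, internal[1].mmu.tlb)
```

Calling `demo()` ends with the translation gone in both cases; the only difference in this model is whether the target processor's interrupt counter moved.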
ME: What do you use the architecture for?
Miod: I keep telling people the main reason is for heating purposes, but this joke is really getting old. I guess that, aside from "we can do it, so why not?", this kind of uncommon platform is a good playground to tinker with the kernel.
Also, by being a slow architecture, it does not exhibit the same bugs as faster machines: some subtle timing bugs (races) only show up (or show up more easily) on fast systems, some only (or more easily) on slow ones. By spotting "low speed" problems, I can find and fix kernel bugs, which benefits everyone.
Also, for a long time the MVME188 systems were the only multiprocessor systems I had, so being able to run them multiuser would have allowed me to help work on machine independent multiprocessor issues - and I still intend to do this, but of course, since then I've got a pair of SMP sparc64 systems.
ME: What attracts you to this architecture?
Miod: When I got my first m88k systems, seven years ago, the toolchain had been broken for a few years, the kernel (compiled with the old gcc 2.8 toolchain) was not very stable, and there was only one person hacking on the mvme88k port (Steve Murphree), and he did not have much spare time to work on this.
It was obvious that, for this architecture not to be dropped from OpenBSD in a somewhat near future, there was work to do. And since the system was not reliable, there was no risk of breaking things. So it was fun to slowly climb the ladder back to the "supported platform" OpenBSD citizenship.
ME: How long did it take to get multiprocessor support working in it?
Miod: I am not sure how to answer this. Having only one processor running on my MVME188 systems has always felt like a waste, so I really wanted OpenBSD to run on all processors, eventually. Of course there were more important hurdles to address first.
Then, as usual, all it takes is a combination of mood and time. I have always been in the mood to work on this, so the mood was not a problem. Time was.
In early 2005, the company I was working for folded, and I was unemployed for several months. Other reasons caused me to not have access to my machines (except for a sparc laptop) for a few months as well. So since I had plenty of time on my hands and needed to occupy my mind, I wrote the SMP support for mvme88k in a few days. Of course I could not test it at all, it was just sitting in a tree.
Then, when I got my machines back, I gave it a shot, and of course it did not work: there were a few simple bugs. Once they were fixed, there was a very subtle bug which caused the kernel to panic as soon as it was trying to start init(8). That was quite the party crusher: all I had was these extra "cpu#" lines in the dmesg, and the system would not even run for half a second after interrupts were enabled.
Even running the SMP kernel on a single processor machine (just to rule out bugs in interprocessor communication, or the lack of it) would die in the same frustrating way.
I spent a lot of time adding traces and lots of debugging information, in order to track this down, but to no avail. During every release cycle, I would try once or twice to hunt this bug, spend a few days dissecting traces, lose hair on it, and give up.
Then, almost two years after the first bits of the SMP were committed, and while still facing the same bug, I had the idea of adding a new debugging check in some strategic place, and when it fired, I eventually saw the light and understood where the bug was coming from. And of course, this was one of these nasty bugs which get fixed with a one-liner and make you whack your forehead and say "Doh!" when you are looking at the diff.
So for the first time, the mvme88k bsd.mp was running single user. And a few minutes later, it was running multiuser. And a few minutes later, it was panicking. But there had been progress, I knew I was close to a stable MP system. A few fixes here and there, and a few discussions with other developers to get good ideas and it was done.
Thus the total time to get this platform working in multiprocessor mode was about 4 weeks, cumulative. But it spanned two and a half years.
ME: Were there any recent commits (cpu_switchto?) that allowed you to move forward with MP support?
Miod: I don't know. The cpu_switchto change, which was necessary for sparc64 SMP to work, actually made mvme88k SMP worse, because this change has an indirect requirement on the way the secondary processors are started.
Unlike all other multiprocessor platforms that OpenBSD runs on at the moment, mvme88k systems start the secondary processors very late, and I had to move some parts of their initialization earlier in the boot process.
Apart from this, cpu_switchto did not help, since the bug which prevented me from making progress for two years was still there.
ME: What kind of performance improvement are you seeing for things like building a kernel (with make -j?) with bsd.mp?
Miod: I don't know yet - keep in mind that I have only a few hours of stable uptime, and then I am still testing new changes. I would like to compare build times between a 33MHz MVME187 and a 4x25MHz MVME188, but there are a few Makefiles in the tree which do not work yet with our make in -j mode. As for the kernel compilation, building the kernel with make -j4 is probably not 4 times faster, but it's definitely better than not using the j option (-:
ME: Is this ready for general use or is there still work to be done?
Miod: After the next round of commits (a pair of commits, actually) which I am going to put in tonight, HEAD should be ready (that's why I enabled bsd.mp in the release builds).
There is still work left, since there are a few things I want to change, in order to improve performance of the multiprocessor kernels, but this is less vital.
[It was a trio of commits, actually: 1, 2 and 3]
ME: How can the community help you?
Miod: Well, people who happen to have MVME188 systems can help with testing, once the next snapshot featuring bsd.mp is available. I don't think there are many such people, though (-:
People can otherwise help by paying for my beer tab (or my electricity bill, but the beer tab is less expensive), or by supporting the OpenBSD project in general, by buying OpenBSD merchandise, or donating to the project.
ME: Anything else you can think of?
Miod: Right now? cvs diff, cvs ci, and then going to sleep (only slept 4 hours last night).
I'd like to thank Miod very much for taking the time to respond to my questions. Please support Miod by buying him beer when possible and support continued OpenBSD developments like this through CD sales and donations.