Contributed by marco
Something I really like in my OpenBSD work is that I get to play with lots of odd hardware, each platform having unique peculiarities, such as non-coherent virtually addressed caches, register windows, or externally controllable MMUs.
This gives me a fair share of janitor and QA work, which benefits the project as a whole. But sometimes, I need to take a break from this. When I take breaks, I am usually playing with old or obscure hardware no one is interested in, just for the sake of it. I can't help but do this - I must have the computer tinkerer's gene. Sometimes getting a long dead machine to display a blurred ``Hello World'' on a dying-of-old-age 100lbs monitor with a unique connector nobody remembers the pinout of makes me happy for the next ten months. Sometimes, I need a greater achievement.
During the second half of April, I had the chance to have more spare time than expected, so I could afford tinkering with something different, for a change. But since that spare time was unexpected, I had no idea what to do!
If you are following source-changes@, or simply checking ``who-does-what'' in OpenBSD, you probably know that I am responsible for most of the m88k-based work. This processor is a very elegant RISC design; before Motorola killed it in favour of the PowerPC processor, several lines of workstations were built on top of this processor. The three most successful and best-known lines were the Data General AViiON, the OMRON Luna88k, and Motorola's own m88k-based VME boards (MVME181, MVME187, MVME188, MVME197). OpenBSD was initially ported to Motorola's boards; then Kenji Aoyama ported it to the Luna88k, but nothing had happened for the AViiONs yet.
This was not exactly surprising; while Motorola's MVME boards are somewhat common among Unix hobbyists, and Lunas are still common in Japan, the AViiONs are often unheard of. The lack of any open source operating system available for these machines probably did not help!
I'll confess I was not interested in AViiONs myself, despite being a bit familiar with DG/UX which I used for some time in '99. My todo-list is large enough already!
But things changed when Chris Tribo, eagerly waiting for an operating system for his AViiONs, decided to publish some technical information about the machines he had available, borrowed from Data General technical books. Out of curiosity, I had a look, and it did not take me long to realize Data General had built the AViiON machines around a modified Motorola MVME188 design. A design I had extensive documentation for (including schematics!), which was supported by OpenBSD/mvme88k and which I knew well.
With these similarities, I had to give the port a try. Except I had no hardware, and wasn't interested. Plus I had no time. Until last month...
So I looked at this as a challenge: with the partial documentation I had access to, and my knowledge of the MVME188 design, would I be able to get a kernel to run?
Ok, I lied. I did not even ask myself this question. I had thought about this for two years; it was time to write code. So I took an OpenBSD/mvme88k kernel, changed its name to OpenBSD/dg88k, removed all the non-MVME188 parts, and started changing the various register addresses to match the AViiON 400 addresses. This even compiled, but that kernel had no drivers other than for the serial ports (using the same chip as on MVME188, of course), so the kernel would not be very useful.
Some more reading, a few guesses, and I had something for the on-board ethernet - enough to get a diskless system running over serial console, to begin with (actually, not really - it was obvious the interrupt vector for the ethernet controller was hardcoded in the design, yet I had no idea which value it was, but the first ``spurious interrupt'' message would tell me). Then I needed to replace the Motorola PROM calls with their AViiON equivalents, and I had a kernel ready to go. Except I had no bootloader for it!
At the moment, OpenBSD/luna88k and OpenBSD/mvme88k use a.out format binaries. But the AViiON PROM expects COFF binaries! So, in addition to the kernel, I needed to write a quick and dirty a.out to COFF converter.
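To give an idea of what such a converter has to deal with, here is a minimal sketch of its first step: decoding the a.out header and emitting a COFF file header. This is not the actual tool. The function names are made up for illustration, the classic BSD a.out header (eight big-endian 32-bit words) and the standard 20-byte COFF file header are assumed, and the magic number, section count and flags are left as parameters because their correct values depend on what the PROM expects.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic BSD a.out header: eight 32-bit words (big-endian on m88k). */
struct aout_hdr {
	uint32_t midmag, text, data, bss, syms, entry, trsize, drsize;
};

#define AOUT_HDR_SIZE		32
#define COFF_FILEHDR_SIZE	20

static uint32_t
get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static void
put_be32(uint8_t *p, uint32_t v)
{
	put_be16(p, (uint16_t)(v >> 16));
	put_be16(p + 2, (uint16_t)v);
}

/* Decode the a.out header at the start of the image; -1 if too short. */
int
aout_parse(const uint8_t *buf, size_t len, struct aout_hdr *h)
{
	if (len < AOUT_HDR_SIZE)
		return -1;
	h->midmag = get_be32(buf + 0);
	h->text = get_be32(buf + 4);
	h->data = get_be32(buf + 8);
	h->bss = get_be32(buf + 12);
	h->syms = get_be32(buf + 16);
	h->entry = get_be32(buf + 20);
	h->trsize = get_be32(buf + 24);
	h->drsize = get_be32(buf + 28);
	return 0;
}

/* Emit a COFF file header for a stripped image (no symbol table). */
void
coff_filehdr_write(uint8_t *out, uint16_t magic, uint16_t nscns,
    uint16_t opthdr, uint16_t flags)
{
	put_be16(out + 0, magic);	/* f_magic: supplied by the caller */
	put_be16(out + 2, nscns);	/* f_nscns: number of sections */
	put_be32(out + 4, 0);		/* f_timdat: no timestamp */
	put_be32(out + 8, 0);		/* f_symptr: no symbols */
	put_be32(out + 12, 0);		/* f_nsyms */
	put_be16(out + 16, opthdr);	/* f_opthdr: optional header size */
	put_be16(out + 18, flags);	/* f_flags */
}
```

A complete converter would still have to write the optional header (entry point, segment sizes and addresses), one section header per segment, and then copy the text and data segments to the file offsets those headers announce.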
Surprisingly, although changing the mvme88k code to become dg88k was sometimes tedious work, and writing this tool was just a matter of 300 simple lines of code, I had more fun playing with the kernel sources. So, before starting on this tool, which would be the first real work done for the port, I contacted Chris Tribo, asking him if he would be interested in testing my work.
Fortunately for me, Chris agreed, and helped me with more documentation, which let me understand the few minor changes Data General had made against the original MVME188 design.
I gave Chris a kernel image converted to COFF. He tried to netboot it. Needless to say, he was not impressed by the results - the kernel froze before even writing anything on the console!
But there was no reason to despair or give up. I had the whole day to carefully proofread my code and try to find why it would misbehave; I would then upload a kernel in the evening, and thanks to living in different timezones, Chris could download it and play with it while I was asleep, mailing me his results during the night, and I could discover the results for breakfast. This is not a fast way to test, but there was no deadline anyway!
One day was enough for me to spot my stupidest errors, so every day Chris would test a new kernel which would print more things and eventually die. And every day I would fix stupid bugs or typos, and get further. Until it was time to enable virtual memory. The kernel would freeze immediately after, and it had no reason to.
At this point, I spent nearly a week experimenting with different things and ideas, usually causing severe regressions because they were not good ideas. Eventually, I decided I would stop using the PROM routines to output characters and directly talk to the serial port. The kernel was then able to run substantially further, but froze as well. The freeze conveniently happened to coincide with a PROM call between two printf() calls, thus proving the PROM call to be the bad citizen. This meant that, while running with virtual memory enabled, the kernel was not respecting some of the PROM's assumptions. I had to move all the code which needed to use PROM calls (for example, to get the onboard ethernet address, which is stored in NVRAM) before enabling virtual memory.
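Talking directly to the serial port instead of going through the PROM usually boils down to a polled output routine. The sketch below assumes a made-up register layout (one status register with a ``transmitter ready'' bit, and one transmit data register); the real chip has its own registers, and the names used here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical UART layout; not the registers of the actual AViiON chip. */
struct uart_regs {
	volatile uint8_t status;	/* bit 0 set: transmitter accepts a byte */
	volatile uint8_t txdata;	/* write a byte here to send it */
};

#define UART_TX_READY	0x01

void
uart_putc(struct uart_regs *uart, char c)
{
	/* Busy-wait until the transmitter is ready: no interrupts, no PROM. */
	while ((uart->status & UART_TX_READY) == 0)
		continue;
	uart->txdata = (uint8_t)c;
}

void
uart_puts(struct uart_regs *uart, const char *s)
{
	while (*s != '\0') {
		if (*s == '\n')
			uart_putc(uart, '\r');	/* send CR before each LF */
		uart_putc(uart, *s++);
	}
}
```

Since such a routine depends on nothing but the device registers being reachable, it keeps working at points where a PROM call would not, which is what makes it useful for narrowing down a freeze.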
The kernel thanked us by running to the end of the device autoconfiguration. It was not yet able to know it had been booted from the network, and was asking for its root device. Chris told it ``le0'', pressed enter... and got greeted by a panic from the interrupt handling code.
I was puzzled. The interrupt handling code was the same as on MVME188, except for the on-board interrupt source being different. Yet reserved bits in the interrupt status register were set, while they should have been masked, and obviously the kernel did not know how to handle a fictitious interrupt source!
I added a few traces, until I realized that it was obvious that some hardware registers I was reading to determine which interrupts the kernel could service were read as 0xffffffff (i.e. all bits set to one), because they were in fact write-only, unlike on MVME188! All I needed was to cache in global memory the values which had been written in these interrupt mask registers, and instead of reading back the hardware address, the kernel would use the cached value. Hoping this would be enough, I uploaded a new kernel for Chris and went to sleep.
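This kind of fix is commonly known as a shadow (or soft copy) register. A minimal sketch, assuming a single 32-bit write-only mask register; the names are illustrative and not taken from the OpenBSD sources, and the register pointer is passed in so the sketch can also run against ordinary memory:

```c
#include <stdint.h>

/* Hypothetical write-only interrupt mask register. */
static volatile uint32_t *intr_mask_hw;

/* Software copy of the last value written to the register. */
static uint32_t intr_mask_shadow;

void
intr_mask_attach(volatile uint32_t *reg)
{
	intr_mask_hw = reg;
	intr_mask_shadow = 0;
	*intr_mask_hw = 0;		/* start with no source enabled */
}

/* Every write goes through here, so the shadow never goes stale. */
static void
intr_mask_write(uint32_t mask)
{
	intr_mask_shadow = mask;
	*intr_mask_hw = mask;
}

void
intr_enable(uint32_t bits)
{
	intr_mask_write(intr_mask_shadow | bits);
}

void
intr_disable(uint32_t bits)
{
	intr_mask_write(intr_mask_shadow & ~bits);
}

/* Never read the hardware back: use the cached value instead. */
uint32_t
intr_mask_read(void)
{
	return intr_mask_shadow;
}

/* Keep only the pending sources the kernel has actually enabled. */
uint32_t
intr_pending(uint32_t status)
{
	return status & intr_mask_shadow;
}
```

The important design point is that nothing else in the kernel writes to the register directly; as soon as one code path bypasses intr_mask_write(), the cached value and the hardware disagree.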
When I got up the next morning, my mailbox contained a mail whose subject said ``Welcome to multiuser''. Chris was happily running multiuser but diskless on his machine. It was time to clean the code and commit it. The CVS tree shall now document all my future progress on this port!
By Anonymous Coward (159.148.95.9) on
Alpha releases before 3.8 was quite usable. Now on 3.9 it is fairly impossible to compile kernel or make build without hitting it.
Comments
By Anonymous Coward (68.104.1.58) on
By Mathieu Sauve-Frankel (70.81.113.49) msf@openbsd.org on
> Alpha releases before 3.8 was quite usable. Now on 3.9 it is fairly impossible to compile kernel or make build without hitting it.
Thanks for the suggestion anonymous whiner... we'll get right on that!
Comments
By Anonymous Coward (159.148.95.9) on
np.
By Anonymous Coward (125.63.151.148) on
> > Alpha releases before 3.8 was quite usable. Now on 3.9 it is fairly impossible to compile kernel or make build without hitting it.
>
> Thanks for the suggestion anonymous whiner... we'll get right on that!
>
So much for quality software.
Comments
By Anonymous Coward (204.209.209.129) on
> > > Alpha releases before 3.8 was quite usable. Now on 3.9 it is fairly impossible to compile kernel or make build without hitting it.
> >
> > Thanks for the suggestion anonymous whiner... we'll get right on that!
> >
>
> So much for quality software.
So much for quality bug reports.
By Miod Vallat (80.65.224.82) miod@ on
> Alpha releases before 3.8 was quite usable. Now on 3.9 it is fairly impossible to compile kernel or make build without hitting it.
I'd rather say 3.2 was the last release for which the odds of hitting it were so low no non-developers noticed it.
This problem is far from trivial: it is a memory corruption in the kernel which eventually affects the scheduler process queues, which causes the wrong struct proc to be scheduled (with the panics as an evil side effect of dereferencing wrong pointers).
On most platforms, we can play with page protections to eventually spot the offending access, at a reasonable performance hit while the issue is being narrowed down.
Unfortunately, on alpha, for performance reasons as well as TLB friendliness, all kernel memory accesses are done in a special directly translated window - think of it as a several *gigabytes* page in which all the physical memory is mapped.
Because of this, faulty memory accesses from the kernel, as long as they end up in existing physical memory, will not be caught by anything.
And finding ways around this (for example, to force scheduler data NOT to be addressed through this window) without having to change too many parts of the kernel is non-trivial.
That said, this bug is not being ignored; several developers are thinking about it and trying new ideas, but unfortunately with no success so far, which can be pretty disappointing, and does not help motivation on fixing this particular bug.
By Anonymous Coward (212.43.240.148) on
Now I hope there will be some undeadlier around that could give you an AViiON (or did you already find one?).
Comments
By Miod Vallat (80.65.224.82) miod@ on
Unfortunately I still haven't found any, but making some publicity about this port will hopefully increase the odds of finding someone willing to part with his AViiON in the near future...
Comments
By Anonymous Coward (134.58.253.131) on
By Chas (147.154.235.52) on
I just checked ebay and all that's listed is power supplies and RAM.
Comments
By Martin (84.245.180.170) martin@oneiros.de on www.oneiros.de
Comments
By gwyllion (134.58.253.131) on
Freaking expensive :)
By Miod Vallat (80.65.224.82) miod@ on
Looks like a pc-based high end modern AViiON server with its CLARiiON disk array.
By Anonymous Coward (62.252.32.11) on
I'll just smile and nod and say this: well done :).
Comments
By Jared G (24.250.75.36) on
>
> I'll just smile and nod and say this: well done :).
Same here. Exactly how does one get to work at this level? I would love to try coding some kernel internals but I just get discouraged when I try reading code without a good idea of where to start. Does anyone have any suggestions?
Comments
By Philipp (89.49.217.33) pb@ on
How about reading through
http://www.cs.ualberta.ca/~pawel/COURSES/379/cmput379.html
(all slides on the bottom of the page)
I'd recommend that one does not start on dealing with HW before
kernel/userland concepts are really clear.
For starters, hmm, maybe the sensors framework code, or the very
initial pf diff (pf.c 1.1 :>)
Comments
By Jared G (24.250.75.36) on
> How about reading through
> http://www.cs.ualberta.ca/~pawel/COURSES/379/cmput379.html
> (all slides on the bottom of the page)
>
> I'd recommend that one does not start on dealing with HW before
> kernel/userland concepts are really clear.
>
> For starters, hmm, maybe the sensors framework code, or the very
> initial pf diff (pf.c 1.1 :>)
>
Thanks for the advice. Now that I think about it, undergrad courses in OS design may be a great place to start digging for information. Usually there will also be some assignments posted to help gauge understanding.
By Anonymous Coward (83.147.128.114) on
Like everything else, it's basically just a matter of doing it. Few people are motivated enough to keep at it though. You need to keep going despite frustration. It's a question of putting in the work and not giving up, rather than magic.
Comments
By Anonymous Coward (70.179.123.124) on
I'm reading through kernel source (uipc_mbuf2?.[ch] to be specific), trying to understand what's there, and then starting to find code where mbuf functions are called from.
Going from 1) understanding the function to 2) understanding the use; not going to say that I'm the king of C reading, but it's been illustrative thus far.
HTH
By Chad Loder (69.109.56.181) on
Comments
By Miod Vallat (80.65.224.82) miod@ on
This is hard to know as I don't keep track of all the bugs I find, plus not all of those are found while explicitly working on a particular port.
As a more concrete example, working on the aviion port made me spot (and fix) two bugs in the MVME188 support in OpenBSD/mvme88k, as well as several optimizations.
By gwyllion (134.58.253.131) on
Comments
By Anonymous Coward (83.147.128.114) on
Maybe because aviion is the name of the actual machine, rather than the company that produces it (dg).
By Miod Vallat (80.65.224.82) miod@ on
``dg88k'' was the name Chris coined when starting to document the hardware, which matches the ``manufacturer + architecture name'' commonly encountered, and follows the ``luna88k'' and ``mvme88k'' trend.
However, when the architecture name is not meaningful because the manufacturer only used one, or the other ones are so rare no one remembers them (who, here, remembers sgi producing m68k-based irises, or has ever seen one?), it does not make sense to carry it around. This is why our sgi port is called sgi, not sgimips (as in NetBSD), since there will probably never be an sgi68k port (and if there is, it should rather be named iris68k or iris).
luna88k and mvme88k had to embed ``88k'' since the original Omron Luna machines were m68k-based (and are supported by NetBSD/luna68k), while Motorola has produced m68k, m88k and powerpc-based VME modules, hence the mvme68k, mvme88k and (not ready for primetime) mvmeppc ports.
Of course, there is no rule written in stone (after all, with this logic, the alpha port should have been decalpha, with the vax port being decvax), so the last call is up to the port maintainer. And in this case, it is clear to me that ``aviion'' is much better than ``dg88k'' to describe the older AViiON hardware. With this name, the worst which can happen is people with modern (x86) AViiON hardware trying the wrong port. With ``dg88k'', no one will ever try this port...
By Anonymous Coward (81.57.42.108) on
Maybe like you renaming this blog a blob; of course it can't be a blob: as the lyrics.html page points out, "Blobs cannot be supported by developers." ;)
By Fábio Olivé Leite (15.227.249.72) on
I hope you DO get to see the sun often enough!