OpenBSD Journal

Loongson port about to make 4.7

Contributed by weerd on from the here-be-dragons dept.

Some time ago, we featured an article with a request for hardware. Specifically, Jasper Lievisse Adriaanse (jasper@) was looking for a Lemote Yeeloong and Otto Moerbeek (otto@) had recently received one from a donor to work on the Loongson port. Jasper had his Yeeloong sponsored by two donors and received his machine less than two weeks after the article was posted.

Quite a few commits have hit the tree since then, mostly from otto and miod for src/ and jasper for ports/, and it looks like the 4.7 release will feature an OpenBSD/Loongson port that should work on the Lemote Yeeloong, the Lemote Fuloong and the EMTEC Gdium.

Undeadly followed up on the donations and asked Miod Vallat (miod@), Otto and Jasper about the porting efforts. Please read on for their story:

Otto got his Yeeloong late January, as a donation. He's been working on getting OpenBSD running properly on the Loongson machines, recently focussing on the installation process.

I received my donated Yeeloong in the last week of January. At that point in time the port wasn't stable enough to be self-hosting, so I set up a cross-building environment first. Apart from the processor bugs Miod can tell horror stories about, the GNU C compiler had problems building some of the sources. I started investigating this problem and it turned out to be a bug in the ProPolice code. The fix I applied also fixed similar problems on the sgi platform, which shows nicely how a new port can cause other platforms to progress.

After this fix and Miod's work on avoiding the processor bugs, we had a self-hosting environment. There were some rough edges, most of which had to do with the PMON (boot environment) shipped with the Lemote machines. This particular version of PMON is broken in so many ways it's hard to get started telling you about it. But most importantly, it can only load a kernel or a second-stage bootloader via netboot or from an ext2 filesystem. I spent quite some time trying to make PMON load from a FAT or ISO 9660 filesystem, but without success.

After creating a working RAMDISK configuration used to create bsd.rd, I decided that to be able to hack and build snapshots at the same time, I needed another machine, so I ordered a Fuloong 2F. Getting this one to work was tricky, mostly because its framebuffer is not yet supported and the serial console code was missing; the Yeeloong machine used to do the initial port does not have a serial port. After Miod helped to get the interrupt routing right, my code to set up the serial console started working and the Fuloong 2F turned into a supported machine.

I then concentrated on the installation procedure, which needed a way to create an ext2 filesystem. So I ported newfs_ext2fs from NetBSD, and wrote the machine-specific parts of the install script. This script will automatically either use an existing ext2 partition to place the bootloader on, or create a small ext2 partition to hold it. The bootloader then accesses the ffs root filesystem to load the kernel.
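As a rough illustration of the choice described above, the following sketch models the decision in Python. It is not the actual install script (which is not shown in this article); the `Partition` type, the partition labels and the 32MB size are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed size for the small boot partition; the article gives no figure.
BOOT_PART_MB = 32


@dataclass
class Partition:
    name: str     # hypothetical label such as "wd0i"
    fstype: str   # e.g. "ext2fs", "4.2BSD"
    size_mb: int


def pick_boot_partition(parts: List[Partition]) -> Optional[Partition]:
    """Return the first existing ext2 partition, or None if there is none."""
    for part in parts:
        if part.fstype == "ext2fs":
            return part
    return None


def plan_bootloader_install(parts: List[Partition]) -> List[str]:
    """Describe the steps needed to get the bootloader where PMON can read it."""
    existing = pick_boot_partition(parts)
    if existing is not None:
        # Reuse what is already there.
        return [f"copy bootloader to existing ext2 partition {existing.name}"]
    # Otherwise a small ext2 partition has to be created first.
    return [
        f"create {BOOT_PART_MB}MB partition",
        "run newfs_ext2fs on the new partition",
        "copy bootloader to the new ext2 partition",
    ]


if __name__ == "__main__":
    disk = [Partition("wd0a", "4.2BSD", 1000)]
    for step in plan_bootloader_install(disk):
        print(step)
```

Either way the result is the same: PMON finds the bootloader on ext2, and the bootloader in turn reads the kernel from the ffs root filesystem.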

It was, and is, good fun working on all this, committing to all corners of the OpenBSD tree.

Jasper, a longtime ports developer, has been working on Loongson too. Here's what he has to say:

Right after I got my Yeeloong I installed OpenBSD, and the installation procedure was not nearly as smooth then as it is now.

As soon as OpenBSD had been installed I checked out the ports tree and assembled a list of the ports I felt were needed most, which resulted in the first 100 or so packages for mips64el. (mips64el is the application architecture of the Loongson processors.) After a bit more than a week I copied out the first full ports build for mips64el. As on our other mips64 platform, sgi, we suffer from some rather big fallout, due to the fact that we don't have Python and GTK+2. These issues are not trivial as they require fixes for binutils and low-level floating point emulation. So that's a "to be continued" part.

On the other hand, quite a few ports have already been fixed, and even more ports will be fixed after the ports tree unlocks again.

Along the way I've fixed an issue where a machine with 2GB of RAM would report having 4096TB, which made the kernel rather upset.

Another developer working on the Loongson, doing much of the low-level grunt work, is Miod Vallat. He found some nice undocumented bugs (turns out these were documented, but in Chinese) during the initial phase of the porting effort:

I had been bribed with a Lemote Yeeloong machine last spring; but as usual, my spare time is close to zilch, so I did not really start working on it until July, and even then, I kept being distracted with other duties. Eventually the hardware hackathon in November allowed me to settle down and spend quality time with this machine.

At the end of the hackathon I had the kernel booting up to the point it was asking for a filesystem to mount as root and an init(8) binary to run.

Then regular life resumed, but gentle pressure from other developers eventually caused me to commit this work and slowly start working on the userland bits. Matthieu Herrb (matthieu@) started working on userland with the aim of getting X running as soon as possible, and Otto Moerbeek joined the party a few days later. But our systems would not run stably - after a few hours, or sometimes only a few minutes, they would freeze solid.

So I put my debug hat on and started to research this. It didn't take long to figure out a reproducible way to trigger the freezes in less than a minute; then I started adding extra sanity checks and guard code to the kernel, to try and gather as much information as possible about the problem.

At first I suspected a subtle race condition in my interrupt handling code, which would cause interrupts not to be re-enabled after being serviced; but after carefully reading this code many times, I couldn't find such a bug (and there wasn't any, really).

I ended up losing the few hairs I had, adding code deep down in the exception handling code, to figure out in which state the kernel would hang. This allowed me to figure out that the freezes were always happening while servicing a clock interrupt, while a disk controller interrupt was pending or had just been serviced (one more reason to suspect a race).

These are the times when you'd give everything you have to get a logic analyzer for two minutes. Unfortunately, I no longer work in a place which has logic analyzers, and even then, had I had access to an analyzer, there is no analyzer probe connector on the Yeeloong laptop.

But I did not have such a luxury, and this machine has no serial port. At some point my debugging code was drawing small bars of different colours in the margins of the screen, and when the kernel would freeze I'd gather state information from this meager display. It was sort of a morse code, but with colours!
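The article does not say how the state was mapped to colours, so the following is only one possible scheme, sketched in Python: each bar carries two bits, and a reader who knows the (made-up) palette can turn a row of bars back into a number. The palette, the bar count and the function names are assumptions for illustration.

```python
# Hypothetical palette: each colour stands for one 2-bit value (0..3).
PALETTE = ["black", "red", "green", "blue"]


def encode_state(value: int, bars: int = 8) -> list:
    """Split a value into 2-bit chunks, most significant first, one colour per bar."""
    if not 0 <= value < 4 ** bars:
        raise ValueError("value does not fit in the available bars")
    return [PALETTE[(value >> (2 * i)) & 0b11] for i in reversed(range(bars))]


def decode_state(colours: list) -> int:
    """Read the bars back, as someone staring at a frozen screen would."""
    value = 0
    for colour in colours:
        value = (value << 2) | PALETTE.index(colour)
    return value


if __name__ == "__main__":
    bars = encode_state(0x1B, bars=4)
    print(bars)
    print(hex(decode_state(bars)))
```

With four colours, eight bars are enough for a 16-bit value, such as a small state code or the low bits of a register.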

At this point, I started to get desperate and tried anything to get a kernel to survive my test. I tried running with the cache disabled; it was slow, and it didn't help. I tried running diskless; it didn't help either. I tried disabling all interrupt-capable devices but the Ethernet interface, and guess what? It did not help. I went further and tried to disable Loongson-specific functionality, and to my surprise, although I did not get a reliable kernel, it would take much more time for it to freeze. But then I was also changing timing, so the ``subtle race in the interrupt codepath'' theory would still stand.

Fortunately, at this point, I stumbled upon an archived message from the binutils mailing list, where a Loongson engineer was discussing changes to the assembler to work around processor misfeatures... the description of the erratum was quite vague, but it matched exactly the symptoms I was seeing.

It turns out that this processor has a so-called ``Branch Target Buffer'', which is a cache of the last few recently executed branches through registers (i.e. where the address to branch to is not set in stone in the code, but is held in a register, for example when invoking a function pointer... such as an interrupt handling routine). Since branch misprediction causes a 20 pipeline cycle penalty on this particular processor, it is important to try to prevent suffering such penalties. To do so, this processor has a cache of the last 16 branch addresses, and it will use it to fetch and decode the instructions at the branch address in advance. If the branch is not taken, these instructions will be canceled in the processor pipeline.
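To make the idea concrete, here is a toy model of such a buffer in Python. Only the size of 16 entries comes from the text; indexing by the address of the branch instruction and evicting the oldest entry are assumptions, and the real hardware may well work differently.

```python
from collections import OrderedDict
from typing import Optional

BTB_ENTRIES = 16  # size given in the text


class BranchTargetBuffer:
    """Toy model: remembers the last target seen for each register branch."""

    def __init__(self, entries: int = BTB_ENTRIES) -> None:
        self.entries = entries
        self.table: "OrderedDict[int, int]" = OrderedDict()

    def predict(self, branch_pc: int) -> Optional[int]:
        """Return the remembered target, or None if this branch is unknown."""
        return self.table.get(branch_pc)

    def record(self, branch_pc: int, target: int) -> None:
        """Remember where the branch actually went, evicting the oldest entry."""
        self.table.pop(branch_pc, None)      # refresh position if already present
        self.table[branch_pc] = target
        if len(self.table) > self.entries:
            self.table.popitem(last=False)   # drop the oldest entry


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    btb.record(0x8000_1000, 0x8000_4000)
    print(btb.predict(0x8000_1000) == 0x8000_4000)  # a later visit gets a prediction
    print(btb.predict(0x8000_2000))                 # unknown branch: no prediction
```

A prediction lets the pipeline start fetching at the remembered target before the register value is known, which is what avoids the misprediction penalty when the guess is right.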

So far, so good, every modern RISC processor has something similar to this.

Now, the Loongson designers decided that, if the instructions to be canceled are loads from memory, it will not hurt to let the memory loads complete, in order to fill a cache line from memory; the rationale behind this being that the odds of this particular memory being used soon are high, even if the branch was not taken yet.

Unfortunately, this load is not always correctly ignored, and the processor can end up keeping the memory bus locked (according to the Loongson information).

This sounded too horrible to be true. Yet it was worth a try. The suggested workaround was to add extra code around branches to confuse the BTB matching logic, but this looked fishy to me. I decided to go with something guaranteed to work: forcing a BTB clear before every branch through a register.
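The reasoning behind that choice can be shown with a small Python model. It is a sketch only: the article does not say how the clear is performed on the real processor, and the dictionary standing in for the BTB, the function name and the sample trace are all invented for the example. The point is simply that a buffer emptied before each register branch has nothing left to speculate on.

```python
def count_speculative_fetches(branches, clear_before_branch: bool) -> int:
    """Count how often the model would fetch ahead from a remembered target.

    `branches` is a sequence of (branch_pc, actual_target) pairs. This is a
    toy model, not the Loongson 2F microarchitecture.
    """
    btb = {}            # remembered targets: branch_pc -> last target
    speculative = 0
    for branch_pc, target in branches:
        if clear_before_branch:
            btb.clear()             # the workaround: forget everything first
        if branch_pc in btb:
            speculative += 1        # hardware would prefetch from btb[branch_pc]
        btb[branch_pc] = target
    return speculative


if __name__ == "__main__":
    trace = [
        (0x100, 0x2000),
        (0x100, 0x3000),
        (0x200, 0x2000),
        (0x100, 0x2000),
    ]
    print(count_speculative_fetches(trace, clear_before_branch=False))
    print(count_speculative_fetches(trace, clear_before_branch=True))
```

In this model the workaround trades away every prediction for register branches in exchange for never issuing a speculative fetch; how much that costs on real hardware is not stated in the article.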

And, as you might have guessed, since then, our kernels have been rock solid, and developers have been able to work on fixing userland bugs, getting X to run, and building and fixing ports for this machine.

In retrospect, a logic analyzer would have exposed this bug in no time. And although my workarounds were going in the right direction, I would never have suspected such a horrible erratum. I am glad this problem is over, but I still want my hair back.

As you can see, your donation can go a long way towards getting new hardware support into OpenBSD. Thanks go to Jasper, Otto and Miod for taking the time to tell us about their work and of course for working on the Loongson. Note that Paul Irofti (pirofti@) very recently also added his request for a Loongson machine to want.html to work on suspend / resume on these machines, so if you missed the chance to send hardware to The Netherlands, perhaps you can send some to Romania.



Comments
  1. By Richard Toohey (richardtoohey) richardtoohey@paradise.net.nz on

    Thanks for the write-up - the time you take out to do so is much appreciated.

    Doubt I'll ever get near one of these machines, but an interesting read!

    Thank you.

  2. By phessler (phessler) spambox@theapt.org on http://theapt.org

    I've been working with jasper@ on ports for the loongson/mips64el arch, and I have a fix for python in the queue (needs a bit more testing). I've been using it somewhat as a desktop, but it's not quite there yet. Several of the ports I use on a daily basis don't yet work (gaim, firefox), but we're fixing up all of the dependencies we can (give us a break, it's been a supported platform for almost a month ;) ).

    4.7 will be pretty nice on the Loongson, with all of the major pieces on the Lemote Yeeloong laptops functional, and a decent (if not extensive) set of packages.

    Comments
    1. By phessler (phessler) on http://theapt.org

      > I've been working with jasper@ on ports for the loongson/mips64el arch, and I have a fix for python in the queue (needs a bit more testing).

      python is now fixed in-tree. The next package build should include it, or wait for your favorite CVS mirror to synchronize.

  3. By Otto Moerbeek (otto) otto@drijf.net on http://www.drijf.net

    Work is continuing.... Another gcc bug was squashed and I'm working on battery information and rudimentary apm for the Yeeloong.
    [otto@rocal:2]$ apm 
    Battery state: high, 98% remaining, 116 minutes life estimate
    A/C adapter state: not connected
    Performance adjustment mode: manual (797 MHz)
    [otto@rocal:3]$ 
    
    
    Note that speed throttling will require some more work, not to speak of suspend/resume.
    
    

    Comments
    1. By Paul 'WEiRD' de Weerd (weerd) on http://www.weirdnet.nl/openbsd/

      > Work is continuing.... Another gcc bug was squashed and I'm working on battery information and rudimentary apm for the Yeeloong.
      >
      > [otto@rocal:2]$ apm
      > Battery state: high, 98% remaining, 116 minutes life estimate
      > A/C adapter state: not connected
      > Performance adjustment mode: manual (797 MHz)
      > [otto@rocal:3]$
      >
      >
      > Note that speed throttling will require some more work, not to speak of suspend/resume.

      Yeah .. development is happening so fast right now, you can't have an article proofread without there being new developments to discuss :)

      Thanks for your work, Otto, Miod, Jasper .. but also Matthieu, Peter and the others !

    2. By Motley Fool (MotleyFool) on

      Can you relate its performance to any Intel x86 processors?

      thanks for the work

      Comments
      1. By Otto Moerbeek (otto) on http://www.drijf.net

        > Can you relate its performance to any Intel x86 processors?
        >
        > thanks for the work

        I have no other netbooks to compare to. Comparing it to my amd64 desktop with its giant caches is a bit unfair.... Anyway, a make build takes between 6 and 7 hours. That should give you a rough idea of the general performance.

  4. By Lawrence Teo (lteo) lteo.openbsd1 ! calyptix.com on http://labs.calyptix.com/

    It would be great if one of the developers could post a dmesg for us mere mortals to drool over ;-)

    Comments
    1. By Otto Moerbeek (otto) on http://www.drijf.net

      Yeeloong:
      [ using 423272 bytes of bsd ELF symbol table ]
      Copyright (c) 1982, 1986, 1989, 1991, 1993
              The Regents of the University of California.  All rights reserved.
      Copyright (c) 1995-2010 OpenBSD. All rights reserved.  http://www.OpenBSD.org
      
      OpenBSD 4.7-beta (GENERIC) #65: Thu Feb 25 11:20:00 CET 2010
          otto@rocal.intra.drijf.net:/usr/src/sys/arch/loongson/compile/GENERIC
      real mem = 1073741824 (1024MB)
      avail mem = 1043480576 (995MB)
      mainbus0 at root: Lemote Yeeloong
      cpu0 at mainbus0: STC Loongson2F CPU 797 MHz, STC Loongson2F FPU
      cpu0: cache L1-I 64KB D 64KB 4 way, L2 512KB 4 way
      clock0 at mainbus0: ticker on int5 using count register
      bonito0 at mainbus0: memory and PCI-X controller, rev 1
      pci0 at bonito0 bus 0
      rl0 at pci0 dev 7 function 0 "Realtek 8139" rev 0x10: irq 5, address 00:23:8b:f2:b4:5b
      rlphy0 at rl0 phy 0: RTL internal PHY
      smfb0 at pci0 dev 8 function 0 "Silicon Motion LynxEM+" rev 0xb0
      wsdisplay0 at smfb0 mux 1: console (std, vt100 emulation)
      ohci0 at pci0 dev 9 function 0 "NEC USB" rev 0x44: irq 7, version 1.0
      ehci0 at pci0 dev 9 function 1 "NEC USB" rev 0x05: irq 7
      usb0 at ehci0: USB revision 2.0
      uhub0 at usb0 "NEC EHCI root hub" rev 2.00/1.00 addr 1
      glxpcib0 at pci0 dev 14 function 0 "AMD CS5536 ISA" rev 0x03: rev 3, 32-bit 3579545Hz timer, watchdog, gpio
      isa0 at glxpcib0
      pckbc0 at isa0 port 0x60/5
      pckbd0 at pckbc0 (kbd slot)
      pckbc0: using irq 1 for kbd slot
      wskbd0 at pckbd0: console keyboard, using wsdisplay0
      pmsi0 at pckbc0 (aux slot)
      pckbc0: using irq 12 for aux slot
      wsmouse0 at pmsi0 mux 0
      mcclock0 at isa0 port 0x70/2: mc146818 or compatible
      ykbec0 at isa0 port 0x381/3
      gpio at glxpcib0 not configured
      pciide0 at pci0 dev 14 function 2 "AMD CS5536 IDE" rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
      wd0 at pciide0 channel 0 drive 0: 
      wd0: 16-sector PIO, LBA48, 152627MB, 312581808 sectors
      wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
      pciide0: channel 1 ignored (disabled)
      auglx0 at pci0 dev 14 function 3 "AMD CS5536 Audio" rev 0x01: isa irq 9, CS5536 AC97
      ac97: codec id 0x414c4760 (Avance Logic ALC655 rev 0)
      audio0 at auglx0
      ohci1 at pci0 dev 14 function 4 "AMD CS5536 USB" rev 0x02: isa irq 11, version 1.0, legacy support
      ehci1 at pci0 dev 14 function 5 "AMD CS5536 USB" rev 0x02: isa irq 11
      usb1 at ehci1: USB revision 2.0
      uhub1 at usb1 "AMD EHCI root hub" rev 2.00/1.00 addr 1
      usb2 at ohci0: USB revision 1.0
      uhub2 at usb2 "NEC OHCI root hub" rev 1.00/1.00 addr 1
      usb3 at ohci1: USB revision 1.0
      uhub3 at usb3 "AMD OHCI root hub" rev 1.00/1.00 addr 1
      umass0 at uhub1 port 1 configuration 1 interface 0 "Generic USB2.0-CRW" rev 2.00/58.87 addr 2
      umass0: using SCSI over Bulk-Only
      scsibus0 at umass0: 2 targets, initiator 0
      sd0 at scsibus0 targ 1 lun 0:  SCSI0 0/direct removable
      sd0: drive offline
      urtw0 at uhub1 port 4 "Realtek RTL8187B" rev 2.00/2.00 addr 3
      urtw0: RTL8187B rev E, address 00:17:c4:4e:09:d7
      vscsi0 at root
      scsibus1 at vscsi0: 256 targets
      softraid0 at root
      pmon bootpath: /dev/disk/wd0
      boot device: wd0
      root on wd0a swap on wd0b dump on wd0b
      
      Fuloong 2F:
      [ using 423024 bytes of bsd ELF symbol table ]
      Copyright (c) 1982, 1986, 1989, 1991, 1993
              The Regents of the University of California.  All rights reserved.
      Copyright (c) 1995-2010 OpenBSD. All rights reserved.  http://www.OpenBSD.org
      
      OpenBSD 4.7-beta (GENERIC) #10: Tue Feb 23 20:55:15 CET 2010
          otto@fubar.intra.drijf.net:/usr/src/sys/arch/loongson/compile/GENERIC
      real mem = 536870912 (512MB)
      avail mem = 518914048 (494MB)
      mainbus0 at root: Lemote Fuloong
      cpu0 at mainbus0: STC Loongson2F CPU 797 MHz, STC Loongson2F FPU
      cpu0: cache L1-I 64KB D 64KB 4 way, L2 512KB 4 way
      clock0 at mainbus0: ticker on int5 using count register
      bonito0 at mainbus0: memory and PCI-X controller, rev 1
      pci0 at bonito0 bus 0
      re0 at pci0 dev 6 function 0 "Realtek 8169" rev 0x10: RTL8169/8110SCd (0x1800), irq 4, address 00:23:9e:00:0f:32
      rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 2
      sisfb0 at pci0 dev 8 function 0 "SiS 315 Pro VGA" rev 0x00SEQ 0005 <- 86
      CRTC 0012 -> 8f
      CRTC 0007 -> 1f
      SEQ 000a -> 30
      CRTC 0001 -> 4f
      SEQ 000b -> 00
      CRTC 000d -> 00
      CRTC 000c -> 00
      SEQ 000d -> 00
      SEQ 0037 -> 00
      FBADDR 00000000
      SEQ 0006 -> 02
      : 640x400x8 frame buffer
      wsdisplay0 at sisfb0 mux 1
      wsdisplay0: screen 0 added (std, vt100 emulation)
      glxpcib0 at pci0 dev 14 function 0 "AMD CS5536 ISA" rev 0x03: rev 3, 32-bit 3579545Hz timer, watchdog, gpio
      isa0 at glxpcib0
      com0 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
      com0: console
      mcclock0 at isa0 port 0x70/2: mc146818 or compatible
      gpio at glxpcib0 not configured
      pciide0 at pci0 dev 14 function 2 "AMD CS5536 IDE" rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
      wd0 at pciide0 channel 0 drive 0: 
      wd0: 16-sector PIO, LBA48, 152627MB, 312581808 sectors
      wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
      pciide0: channel 1 ignored (disabled)
      auglx0 at pci0 dev 14 function 3 "AMD CS5536 Audio" rev 0x01: isa irq 9, CS5536 AC97
      ac97: codec id 0x414c4760 (Avance Logic ALC655 rev 0)
      audio0 at auglx0
      ohci0 at pci0 dev 14 function 4 "AMD CS5536 USB" rev 0x02: isa irq 11, version 1.0, legacy support
      ehci0 at pci0 dev 14 function 5 "AMD CS5536 USB" rev 0x02: isa irq 11
      usb0 at ehci0: USB revision 2.0
      uhub0 at usb0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
      usb1 at ohci0: USB revision 1.0
      uhub1 at usb1 "AMD OHCI root hub" rev 1.00/1.00 addr 1
      vscsi0 at root
      scsibus0 at vscsi0: 256 targets
      softraid0 at root
      pmon bootpath: /dev/disk/wd0
      boot device: wd0
      root on wd0a swap on wd0b dump on wd0b
      
      The debug blurp is mio's work in progress to support the framebuffer.

      Comments
      1. By Otto Moerbeek (otto) on http://www.drijf.net

        That should be Miod's. Sorry about that.

  5. By Ted Walther (TedWalther) ted@reactor-core.org on http://reactor-core.org/

    This is the kind of coverage that keeps me hooked on OpenBSD. LWN.net is pretty good too. Every time I see this sort of openness and transparency, I'm motivated to donate more $$ to the project. Is the OpenBSD Journal being funded by OpenBSD directly? If not, who do I send the money to? This kind of news coverage is really refreshing and I don't mind saying so with my wallet.

    Ted

    Comments
    1. By phessler (phessler) on http://theapt.org

      > This is the kind of coverage that keeps me hooked on OpenBSD. LWN.net is pretty good too. Every time I see this sort of openness and transparency, I'm motivated to donate more $$ to the project. Is the OpenBSD Journal being funded by OpenBSD directly? If not, who do I send the money to? This kind of news coverage is really refreshing and I don't mind saying so with my wallet.
      >
      > Ted

      The OpenBSD Journal has no funding, all of this is volunteer time. The best thing you can do is donate directly to OpenBSD, as that allows the developers to do more cool stuff for Undeadly to report on ;).

  6. By Brynet (Brynet) brynet@gmail.com on

    Owain Ainsworth (oga@) needs an x86 machine with Intel GM45 (..or later) graphics.

    http://marc.info/?l=openbsd-cvs&m=126711097909004&w=2

    He appears to be working on some fancy things in that area of the tree, so if you can donate I'm sure he would really appreciate it.
