OpenBSD Journal

Old OpenBSD box needs help

Contributed by jose from the Old-OpenBSD-box-needs-help dept.

Having been away for a few days on vacation up north, I returned home to find Arrigo Triulzi had written:
"I've been desperately trying to understand this problem on a box which for a number of reasons I am not allowed to upgrade (it is protected from the wild world outside so it isn't a key requirement)."

"Fundamentally the box is a PII, 266MHz with 128Mb RAM and IDE disk. Nothing to write home about. It runs OpenBSD 2.7 with all the applicable patches.

After a certain amount of time, which is sadly variable, all network ports lock up. Through traffic (it is gatewaying) works fine, but no new processes seem to be able to start (for example, sms_client runs from cron on it to send a GSM SMS message on certain events).

I've totally run out of ideas. Can anyone help me debug this issue?

Most grateful.

Arrigo"

This is the kind of thing that seems to stump me every now and then, and I can't seem to figure out how to debug it. Anyone have any advice?


Comments
  1. By Stefan Johansson (stefanjo@telia.com)

    I had similar problems with 2.7 (and 2.8, iirc) and could never get more than a couple of days of uptime. I never bothered to hunt down the bug, though (I don't even know how to). After an upgrade to 2.9 it has worked perfectly.

  2. By andrei tanase (tanase@alioth.net)

    1. any dumped cores from the services that are trying to start?

    2. resolver problem. i wasn't able to get any new connections (ssh) to an obsd router after the interfaces were up and running, because i had no /etc/hosts entries and/or named set up and sshd was trying to resolve the addresses of my connections over the new networks, to log them or something. already established connections worked. it may not be exactly this, but i seem to remember some bugs in the resolver routines were found in 3.0?

    3. run tcpdump on the relevant interfaces and see if the SYNs are being received (i bet they are). check filtering/nat rules, logs. yeah i know, DOH!

    4. --debug, -log-all, whatever, and comb /var/log. if you've already done that, do it again.

    5. compile a new kernel with option DIAGNOSTIC, DDB, KMEMSTATS, KTRACE, whatever is available for 2.7 kernel debugging, if it isn't done already. it may be a weird hardware problem. i once had a pc do a panic: cpu_switch() at EXACTLY 77 minutes after boot-up because of a faulty dimm (i think...).

    6. if nothing works, call exorcists, priests, shamans, buddhist priests, voodoo people and have them do their thing on your box. maybe it's bad juju.

    7. finally, if nothing works, start persuading the right people for a new machine, a new bsd CD, or both.

    8. maybe you're running low on memory?

  3. By francisco (http://www.blackant.net/)

    I've totally run out of ideas.

    well, what have you tried?

    a better description would help: is it processes that can't be created, or is it ports that can't be opened? did this just start happening or has it always happened? if it just started, did you make any recent changes? if it just started happening and you have made no changes, check your hardware. do the log files report nothing unusual? are you running GENERIC or did you add RANDOM_HANG_PATCH?

    you could always set the machine to reboot at a time interval less than the shortest time to hang.

  4. By Anonymous Coward

    Don't run a 2.5 year old version of OpenBSD and ask for help.

  5. By Tony S

    Are you running a custom kernel?

    If so, put back the
    option I386_CPU

    in the kernel config file and recompile.

    /Tony S
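
    For reference, a sketch of how the CPU options might look in a custom i386 kernel config file once I386_CPU is restored. The three other option lines are an assumption based on the stock GENERIC config of that era, not something taken from the poster's file:

    ```
    # CPU classes the kernel is built to support.
    # I386_CPU is the line to put back; the others are assumed from GENERIC.
    option		I386_CPU
    option		I486_CPU
    option		I586_CPU
    option		I686_CPU
    ```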

  6. By jose (http://www.monkey.org/~jose/)

    when it gets stuck, drop into the kernel debugger (the ddb> prompt) and have a look around: ddb may be invoked from the console by the key sequence Ctrl-Alt-Esc when the sysctl(8) name ddb.console is set to 1. then start poking around with the "show" commands (man ddb). you may find that you've exhausted some resource (and it's not being freed), e.g. memory or threads.

  7. By Arrigo Triulzi (http://www.alchemistowl.org/arrigo)

    So far there have been a number of ideas, to summarise:

    1) put back the I386_CPU option (which is commented out in my custom kernel),
    2) upgrade to 2.8 or better (except that the machine is remote, so I am wondering if there is a "failsafe" mechanism to do so),
    3) ddb (if someone has a scheme to wire the serial console into a Lantronix LRS16 console concentrator - DB9 to RJ45),
    4) set up a cron job to reboot the machine - this would be fine except that it is not always a "good time" to reboot the machine.

    I can try 1) relatively easily by scheduling a reboot "out of hours". My only question about 1), out of curiosity, is "why does it work?"

    There were a couple of comments about "this has been fixed in later revisions", to which my reply is "could you please tell me what `this' stands for in this context?". That way we can all learn something about this problem, and the answer remains in the archives.

  8. By Arrigo Triulzi (http://www.alchemistowl.org/arrigo)

    ...kindly detailed by Miod. I am reproducing it here because it is hidden away in a thread.

    Fundamentally Miod concurs with the "option I386_CPU" remark and adds the motivation which is found at:

    http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/i386/conf/Makefile.i386?rev=1.26&content-type=text/x-cvsweb-markup

    The description there is remarkably similar to what I see in practice, including the resource starvation effect.

    Thanks to all!

  9. By Anonymous Coward

    Try increasing NMBCLUSTERS to 8192. I had the same problem with 2.7 without this option.
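
    As a sketch, the corresponding line in the kernel config file would look something like the following, assuming the option is spelled NMBCLUSTERS as it is in the kernel sources. The value is the one suggested above; the kernel has to be rebuilt and the machine rebooted for it to take effect:

    ```
    # Raise the number of mbuf clusters available to the network stack.
    option		NMBCLUSTERS=8192
    ```

    Running netstat -m before a hang shows how many mbuf clusters are in use, which should indicate whether this limit is actually being reached.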

  10. By Daniel Brandt (dbr@linux.se)

    Here is a remote upgrade procedure for OpenBSD which I followed myself. Someone might already have pointed you to it, but just in case you didn't get it: I'd hate to see yet another person switch to some other platform because of arrogant replies to a request for help. I wanted to test it in a non-production environment in case I ever had to leave a headless box somewhere remote, so I tried it, and it worked for me. The instructions are here: http://tash.pintday.org/hack/docs/remote-upgrade-howto.shtml

    The instructions should be easy enough to adjust if you only want to go to 2.9, so that you don't have to rewrite the firewall rules from ipf to pf. Hope this works for you.

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]