Contributed by jose on from the Old-OpenBSD-box-needs-help dept.
"I've been desperately trying to understand this problem on a box which for a number of reasons I am not allowed to upgrade (it is protected from the wild world outside so it isn't a key requirement)."
"Fundamentally the box is a PII, 266MHz with 128MB RAM and IDE disk. Nothing to write home about. It runs OpenBSD 2.7 with all the applicable patches.
This is the kind of thing that seems to stump me every now and then, and I can't seem to figure out how to debug it. Anyone have any advice?
After a certain amount of time, which is sadly variable, all network ports lock up. Through traffic (it is gatewaying) works fine, but no new processes seem to be able to start (for example there is sms_client as a cron job on it to send a GSM SMS message on certain events).
I've totally run out of ideas. Can anyone help me debug this issue?
Most grateful.
Arrigo "
By Stefan Johansson () stefanjo@telia.com on mailto:stefanjo@telia.com
By andrei tanase () tanase@alioth.net on mailto:tanase@alioth.net
2. resolver problem. i wasn't able to get any more connections (ssh) to an OpenBSD router after the interfaces were up and running, because i had no /etc/hosts entries and/or named set up, and sshd was trying to resolve my connections over the new networks to log me or something. already established connections worked. it may not be exactly this, but i seem to remember some bugs in the resolver routines were found in 3.0?
3. run tcpdump on the relevant interfaces and see if the SYNs are being received (i bet they are). check filtering/nat rules, logs. yeah i know, DOH!
4. --debug, -log-all, whatever, and comb /var/log. if you've already done that, do it again.
5. compile a new kernel with option DIAGNOSTIC, DDB, KMEMSTATS, KTRACE, whatever is available for 2.7 kernel debugging, if it isn't done already. it may be a weird hardware problem. i once had a pc do a panic: cpu_switch() at EXACTLY 77 minutes after boot-up because of a faulty dimm (i think...).
6. if nothing works, call exorcists, priests, shamans, buddhist priests, voodoo people and have them do their thing on your box. maybe it's bad juju.
7. finally, if nothing works, start persuading the right people for a new machine / new bsd CD or both.
8. maybe you're running low on memory?
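One way to check the resolver theory in point 2 is to time a reverse lookup of a client address from the box itself. The following is only a rough sketch, assuming a Python 3 interpreter is available on the machine (or on a similar test box); the function name time_reverse_lookup and the address 192.0.2.10 are made up for illustration and are not from the original post.

```python
import socket
import time


def time_reverse_lookup(addr, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds taken) for a reverse lookup of addr.

    The resolver argument is a hypothetical hook so the timing logic can be
    exercised without touching real DNS; by default it performs the same kind
    of reverse lookup a daemon might do when logging a new connection.
    """
    start = time.monotonic()
    try:
        hostname = resolver(addr)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return hostname, time.monotonic() - start
```

Calling time_reverse_lookup("192.0.2.10") with a real client address in place of the placeholder and seeing it take many seconds would point towards the resolver; a quick answer (or a quick failure) would suggest looking elsewhere.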
By francisco () on http://www.blackant.net/
well, what have you tried?
a better description would help: is it processes that can't be created, or is it ports that can't be opened? did this just start happening or has it always happened? if it just started, did you make any recent changes? if it just started happening and you have made no changes, check your hardware. log files report nothing unusual? you're running GENERIC or did you add RANDOM_HANG_PATCH?
you could always set the machine to reboot at a time interval less than the shortest time to hang.
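To tell the two cases apart (processes that can't be created versus ports that can't be opened), something along these lines could be left on the box and run once it is in the hung state. This is a minimal sketch, assuming a Python 3 interpreter is installed there; can_start_process and can_open_socket are hypothetical helper names, not existing tools.

```python
import socket
import subprocess
import sys


def can_start_process():
    """True if a trivial child process can be spawned and exits cleanly."""
    try:
        result = subprocess.run([sys.executable, "-c", "pass"], timeout=10)
    except (OSError, subprocess.SubprocessError):
        # Covers fork/exec failures as well as the child timing out.
        return False
    return result.returncode == 0


def can_open_socket():
    """True if a loopback TCP listener can be created and connected to."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick one
            server.listen(1)
            with socket.create_connection(server.getsockname(), timeout=5):
                pass
    except OSError:
        return False
    return True
```

If can_start_process() fails while can_open_socket() succeeds, the problem is more likely on the process side than in the network stack, and the other way round.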
By Anonymous Coward () on
By Tony S () on
Are you running a custom kernel ?
If so, put back the
option I386_CPU
in the kernel config file and recompile.
/Tony S
By jose () on http://www.monkey.org/~jose/
By Arrigo Triulzi () on http://www.alchemistowl.org/arrigo
1) put back the I386 definition (which is commented out in my custom kernel),
2) upgrade to 2.8 or better (except that the machine is remote so I am wondering if there is a "failsafe" mechanism to do so),
3) ddb (if someone has a scheme to wire the serial console into a Lantronix LRS16 console concentrator - DB9 to RJ45),
4) set up a cron job to reboot the machine - this would be fine except that it is not always a "good time" to reboot the machine.
I can try 1) relatively easily by scheduling a reboot "out of hours". My only question about 1), out of curiosity, is "why does it work?"
There were a couple of comments about "this has been fixed in later revisions" to which my reply is "could you please tell me what `this' stands for in this context?". At least we can all learn something about this problem and it remains in the archives.
By Arrigo Triulzi () on http://www.alchemistowl.org/arrigo
Fundamentally Miod concurs with the "option I386_CPU" remark and adds the motivation which is found at:
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/i386/conf/Makefile.i386?rev=1.26&content-type=text/x-cvsweb-markup
The description there is remarkably similar to what is seen in practice, including the resource starvation effect.
Thanks to all!
By Anonymous Coward () on
By Daniel Brandt () dbr@linux.se on mailto:dbr@linux.se