Contributed by merdely from the what-no-tim-tams? dept.
Mark Uemura (mtu@) brings us Part 3 of his series from the Network Hackathon featuring Damien Miller (djm@):
Network Hackathon (Part 3) - May 5-10, 2008, Ito, Japan
Damien Miller (djm@) is one of the most amiable guys that I've ever met. I thought that he was one of the best speakers at AsiaBSDcon last year when he gave his "Security Measures in OpenSSH" talk (slides, paper). It was an excellent presentation. I was also fascinated by how he was able to connect with the audience. He is so humble, polite and smart. I am starting to think that this is an Australian trait. :-)
The third edition of the n2k8 series continues below.
"de facto" standard for SSH, which is to say it is perhaps one of the most important applications in the world that everyone relies on for security. There is a lot of responsibility that comes with this. Expectations are high and vulnerabilities are not just really embarrassing but intolerable.
As Damien was giving his talk, I remember thinking to myself, "How can anyone belittle their efforts?" even when he was highlighting some of the past vulnerabilities in OpenSSH. I am sure that this was uncomfortable to talk about, and yet he showed how they were dealt with in a timely manner and what changes were made to prevent these kinds of vulnerabilities from happening again. After the talk, you couldn't help but think, "Man, this guy is brilliant and I'm awfully glad that he's one of the guys that I can rely on to keep OpenSSH free, functional and secure!" If you ever get a chance to hear Damien speak, don't miss it.
Damien was perhaps the quietest person during the hackathon but as you will see from his report below, he was by no means idle. Here is what Damien had to say about his work at the hackathon:
"n2k8 was a fantastic event - the organisation and location were fantastic and everyone had a great time. Here's what I worked on:
Early at the hackathon I was investigating TCP performance and playing with some patches from Markus Friedl (markus@) to automatically resize the TCP send buffers to better utilise connections with large bandwidth x delay products. Initially I did some testing with an instrumented netcat [nc(1)] that printed some statistics every second, but I soon decided that I needed a better tool. There are quite a few good benchmarking tools around, but I wanted something that would give me more visibility into how the kernel TCP implementation was behaving during the transfer and I wanted something more elegant than a kernel littered with printf() calls to print out the interesting variables.
So was born tcpbench(1). This tool is a simple client/server that sends data over a TCP stream as quickly as possible, while printing some bandwidth statistics. tcpbench adds to this simple base the ability to sample, via kvm(3), and display most of the kernel variables related to a TCP session: so_snd, so_rcv, inpcb and tcpcb. This provides much more visibility into what TCP is doing as it sends and receives data and will hopefully help us as we further improve the stack.
OpenSSH has supported multiple shell/login/file transfer sessions over a single SSH connection for some years. The number of such sessions has been fixed at 10 for all this time. At n2k8 I made this limit run-time controllable using a new MaxSessions knob in sshd_config(5). This is helpful for administrators who would like to raise the limit a little. It can also be used to entirely disable logins and shell execution while leaving port-forwarding intact, by specifying "MaxSessions 0". Another use is to disable connection multiplexing from the server with "MaxSessions 1".
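The three uses described above would look something like this in sshd_config(5) (the values here are illustrative; pick one setting, not all three):

```
MaxSessions 20    # raise the per-connection session limit
MaxSessions 1     # allow sessions but disable connection multiplexing
MaxSessions 0     # no shell/login sessions; port-forwarding still works
```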
The reason that MaxSessions was fixed at a small number is that each session uses a surprising number of file descriptors, and it was quite easy to make the server run out by raising it. Moreover, when the server ran out of file descriptors it would not handle the situation gracefully - in most cases it would fatally error or leak file descriptors as it tried to clean up. As part of making MaxSessions dynamic, I performed an audit of sshd to improve this and have fixed all the cases of bad behaviour I could find. It is still strongly recommended to ensure that the value of MaxSessions does not lead to file descriptor exhaustion in the server because there may be more incorrect behaviour lurking around. The best way to test is to run fstat(1) on a sshd that has MaxSessions sessions active and compare it to the ulimits that sshd was started with.
The fd audit replaced quite a few fatal() error calls with graceful notifications. Unfortunately, the ssh client did not expect to be told that its shell/login session had been refused and would hang in such cases. So some further work was done to make ssh request confirmation for all sessions it started, to check the confirmation replies when they were sent back by the server and to provide some feedback, in the form of an error message, when things went wrong.
Very late on Friday night, Markus and I fixed one of the oldest bugs in OpenSSH's bugzilla database. When a command that has been remotely executed over an SSH protocol 2 connection had its local output file descriptor closed, this closure was not being signalled to the remote command. To see why this is important, consider the case of:
cat /dev/zero | true
In this simple shell pipeline, the "true" command would quickly exit and close its stdin file descriptor. The "cat" command would instantly see this as a closure of its stdout and stop writing. In the ssh equivalent:
ssh remotehost "cat /dev/zero" | true
The closure of ssh's stdout was being noticed, but was not being reported to the remote sshd that was running "cat"; therefore "cat" would merrily continue running and sending data.
In the SSH protocol, all shell, login and forwarding activity occurs inside a "channel", a protocol abstraction that allows them to coexist and share the SSH transport (over TCP) connection. Fixing this was a little tricky because it turned out that, while the SSH 2 channel protocol could signal to fully close a channel, or that the local end would not send any more data, there was no standard way of signalling to the remote end that it should not send any more data. Interestingly, SSH protocol 1 did support this, and this bug did not manifest there.
The solution (implemented by Markus) was to implement a new channel protocol request, "eow@openssh.com". The SSH protocol has a cool extension mechanism that allows vendors to define new requests in a namespace defined by their Internet domain, so this extension is trivially backwards-compatible - it will just be ignored by SSH implementations that do not support it. OpenSSH-CVS will send this message to its peer when it sees an output file descriptor close; the peer (assuming it is running OpenSSH-CVS too) will take this message as an indication that the output side of the channel has closed and that it should close its local input file descriptor. This makes OpenSSH a bit more transparent and "shell-like" in its behaviour.
This alone wasn't enough to fix the bug, however: because OpenSSH used a bidirectional socketpair(2) to communicate with its child process, the same file descriptor was being used for both input and output, and closing it would close both. Markus tried to use shutdown() to half-close the socketpair, but it didn't produce the signalling semantics that we wanted (we need to investigate this further). Fortunately, there was a solution: using pipes instead of a socketpair. OpenSSH used to use pipes, but we switched to a socketpair some years ago because it saved a couple of file descriptors per login/shell session (socketpairs are bidirectional everywhere, but pipes are not guaranteed to be). Markus resurrected the pipes code and made it the default again. This gave us the close semantics that we needed.
At this stage we were almost ready to commit the patch, but I noticed a problem: sometimes ssh would fatally error right at the point that it was exiting. The problem was intermittent, but we narrowed it down to a test case:
ssh "od /sbin/isakmpd; echo ok 1>&2" | true
The first part of the remote command ("od /sbin/isakmpd") was just to generate some output that should be truncated when the output channel closed. The second part ("echo ok 1>&2") was to ensure that the extended (stderr) channel was left open. After much debugging it turned out that there was a race condition in the code that manages the fiddly interdependencies between closing a channel and closing the login/shell session that uses it. This race caused the remote end to select() on an already closed file descriptor and error out. Markus tightened some of the conditions used to decide when to close the stderr part of the channel and fixed the crash.
Fixing this bug took too many hours late at night, but I guess that if it was easy, then we would have done it years ago...
Another bug, almost as old, was that OpenSSH would try only the first address returned by the DNS when making a forwarding connection to a hostname; if the connection was refused, any additional addresses were ignored.
OpenSSH uses non-blocking connect() to make sure that a slow port forwarding connection does not stall the whole SSH connection, so fixing this was more than just trying each address sequentially, but it was still quite straightforward: keep a copy of the linked list of addresses returned by getaddrinfo(), and try the subsequent ones in cases where earlier ones failed.
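The "try every address" pattern looks roughly like this (a blocking-connect sketch for brevity; OpenSSH drives its non-blocking connect()s from the event loop, retrying the next address when one fails):

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Walk the whole getaddrinfo(3) list instead of giving up after the
 * first entry; return a connected fd or -1 if every address failed. */
static int connect_any(const char *host, const char *port) {
    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    int fd = -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                        /* success: keep this fd */
        close(fd);                        /* failed: try the next one */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

int main(void) {
    /* Set up a loopback listener so connect_any() has something to hit. */
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bind(ls, (struct sockaddr *)&sa, sizeof(sa));
    socklen_t len = sizeof(sa);
    getsockname(ls, (struct sockaddr *)&sa, &len);
    listen(ls, 1);

    char port[8];
    snprintf(port, sizeof(port), "%d", ntohs(sa.sin_port));
    int fd = connect_any("127.0.0.1", port);
    puts(fd != -1 ? "connected" : "all addresses failed");
    return fd != -1 ? 0 : 1;
}
```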
Early at n2k8, Markus and I discussed improving our TCP performance. Our TCP implementation is very conservative in sizing its send and receive buffers and does no auto-tuning, so it does not perform well on high bandwidth x delay product links (e.g. those from Australia to Canada). It quickly became apparent that I wasn't nearly familiar enough with the TCP code or the literature on how to tune it, so I spent quite a bit of the hackathon tinkering with the stack and some patches from Markus, as well as reading some papers on automatic tuning techniques. Hopefully I'll be wiser and ready to write some code in time for the general OpenBSD hackathon.
I also spent some time looking at the remaining two unadorned linear congruential generator-based ID generators in the tree (both in IPv6), to see if they can be replaced or improved. Neither has yet been found vulnerable to the guessing/precomputation attacks recently reported against the BIND resolver or IP ID generators, but we don't want to wait for an attack before changing them. One of these is pretty easy to fix, but the other will be a little more challenging."
(n2k8 hackathon summary to be continued)
Thank you again, Mark, for taking the time to share these articles with us.