PF and Linux NFS

Contributed by jose on 2003-02-12 from the fixing-what-is-broken dept.

(yet another anonymous) sends us this story:

"KernelTrap has an interesting story about how PF's scrub functionality conflicts with the Linux NFS implementation. Actually, it is probably better put the other way around, as the article explains Linux NFS does an odd thing...
The story explains, "essentially, the Linux NFS implementation with UDP PMTU discovery enabled sets the "don't fragment" bit on fragmented packets, which PF's packet normalization functionality determines to be improper and drops. PF author Daniel Hartmeier notes that by disabling PF's "scrub" option on the protocols/ports in question, you can allow the Linux NFS client/server to work as its authors intended."
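
To make the flag combination in question concrete, here is a minimal Python sketch of how a DF-on-fragment packet can be recognised from the IPv4 header. It is not PF's code. The names `is_df_fragment` and `make_header` are made up for this illustration; only the bit masks follow the IPv4 header layout.

```python
import struct

IP_DF = 0x4000       # "don't fragment" flag
IP_MF = 0x2000       # "more fragments" flag
IP_OFFMASK = 0x1FFF  # fragment offset, in units of 8 bytes


def is_df_fragment(ip_header: bytes) -> bool:
    """Return True if this IPv4 header belongs to a fragment that also has DF set."""
    # Bytes 6-7 of the IPv4 header carry the flags and the fragment offset.
    (flags_off,) = struct.unpack_from("!H", ip_header, 6)
    is_fragment = bool(flags_off & IP_MF) or (flags_off & IP_OFFMASK) != 0
    return is_fragment and bool(flags_off & IP_DF)


def make_header(flags_off: int) -> bytes:
    """Hypothetical helper: a 20-byte header with only version/IHL and flags/offset set."""
    header = bytearray(20)
    header[0] = 0x45  # IPv4, header length of 5 words
    struct.pack_into("!H", header, 6, flags_off)
    return bytes(header)


print(is_df_fragment(make_header(IP_DF | IP_MF)))  # first fragment with DF set
print(is_df_fragment(make_header(IP_DF)))          # complete packet with DF set
```

A complete packet with DF set is ordinary PMTU discovery. Only the DF plus MF (or DF plus non-zero offset) combination is the case the story is about.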

As usual, Daniel and KernelTrap do an excellent job of bringing very useful, solid facts to the forefront.

Comments

By Gimlet () on 2003-02-12 06:34

You know, I learn something every time I read something from Daniel Hartmeier. I never thought about fragments needing to be reassembled for stateful connections to work, but it certainly makes sense now that I think about it.

Can we get this guy to write some more white papers? I know, I know, the more docs he writes, the less code...
Comments
1. By Shane () on 2003-02-12 06:50
  
  Personally, I think Daniel Hartmeier would make a great OpenBSD public relations guy. He clearly knows his stuff and he's great at explaining concepts and answering questions. It's really cool when you ask a pf question on the newsgroup and Mr. pf himself responds. I'd like to hear more from him. It's too bad that a role like that would mean less code, I think he'd do a fine job.
  
  I also like reading what the other guys write (mostly what I read on misc@). I always look forward to something by Theo especially. I think it's rare to find someone so willing to stand up for his beliefs. I'm glad Theo has the guts to do what's right. I admire that.
  Comments
  1. By Matt () on 2003-02-12 13:06
    
    I have written to Daniel's personal email account with feature requests in the past and he replied in a concise and well thought out manner. He shot my request down, but he stated his reasons and showed me why my idea was impractical. Overall he seemed a very professional and likeable guy, needless to say I was impressed.
    
    Keep up the good work Daniel!
    
    Comments
    
    By Michael Anuzis () on 2003-02-12 13:30
    
    & let's not forget my personal favorite; ie. his advice on annoying spammers =)
    
    http://www.benzedrine.cx/relaydb.html
By Shane () on 2003-02-12 15:47

"RFCs are not laws that cannot be broken when common sense must prevail. For example, nobody currently ships a system that does RFC compliant URG handling, you wouldn't be able to talk to any other stack if that were the case."

So this guy just makes up stuff to make his life easier? It seems like he made a stupid implementation decision and is now just trying to rationalize it. Not the kind of guy I want working on my software.

"And common sense here dictates that without being able to set DF on fragmented frames, UDP path mtu discovery is basically impossible and at best useless."

So, how do other implementations of NFS do path mtu discovery? I assume the FreeBSD, NetBSD, and OpenBSD implementations don't have any problems because everyone basically asks, "You're using Linux, right?" If the Linux implementation is the only one that's having trouble, then it's obvious where the problem lies. This guy should take the hint.

"These weird BSD firewalls are the only systems blocking these packets, and I'm not going to give up UDP pmtu discovery for the sake of making these systems happy."

Weird BSD firewalls? pf does the RIGHT thing and it's weird? That doesn't make sense.
Comments
1. By Dries Schellekens () on 2003-02-12 15:53 http://www.kerneltrap.org/node.php?id=579&cid=2386
  
  Daniel finally figured out why they are setting DF on fragments:
  
  "The missing piece in the puzzle was the fact that certain protocols like NFS can't split transactions/operations into smaller packets, they need to send the entire transaction in one single (complete) IP packet. This size might exceed any real MTU, so it will get fragmented first. And only afterwards PMTU discovery gets applied to the fragments. Hence, DF on fragments. This scheme is not explicitly covered by the RFCs, but I agree that it's a logical conclusion."
  Comments
  1. By Shane () on 2003-02-12 16:55
    
    Hm, okay. But, that still doesn't explain why other systems aren't affected.
    
    Comments
    
    By Daniel Hartmeier () daniel@benzedrine.cx on 2003-02-12 19:28 mailto:daniel@benzedrine.cx
    
    As I understand it, the BSDs (or at least OpenBSD) just don't do PMTU discovery in that case.
    
    If NFS sends, say, a 16kB transaction, the OpenBSD stack will fragment that packet according to the interface's MTU, usually 1500 bytes on a LAN, and not set DF. If the PMTU should be lower, an intermediate hop may further fragment those fragments.
    
    For instance, if the PMTU should be 600 bytes, an intermediate hop will break one 1500 byte fragment into three smaller ones. I guess that's less optimal than the endpoint doing PMTU discovery and sending 600 byte fragments itself (which can cause less IP header overhead).
    
    But NFS is used mostly on local networks where the PMTU is equal to all interfaces' MTU, so I'm not sure PMTU discovery actually improves performance in many (or even most) cases.
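
The fragment counts in the comment above can be checked with a small Python sketch. It assumes a 20-byte IPv4 header without options, an 8-byte UDP header, and reads "16kB" as 16384 bytes; `fragment_sizes` is a name invented for this example and does not come from any stack.

```python
from typing import List

IP_HEADER_LEN = 20  # assumption: IPv4 header without options


def fragment_sizes(total_len: int, mtu: int) -> List[int]:
    """Total lengths (header included) of the pieces one IPv4 packet is split into."""
    if total_len <= mtu:
        return [total_len]
    payload = total_len - IP_HEADER_LEN
    # Every piece except the last must carry a multiple of 8 payload bytes.
    per_piece = (mtu - IP_HEADER_LEN) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(per_piece, payload)
        sizes.append(chunk + IP_HEADER_LEN)
        payload -= chunk
    return sizes


# One 1500-byte fragment crossing a hop with a 600-byte MTU.
print(fragment_sizes(1500, 600))

# A 16384-byte NFS transaction plus UDP and IP headers.
datagram = 16384 + 8 + IP_HEADER_LEN
via_router = sum(len(fragment_sizes(size, 600)) for size in fragment_sizes(datagram, 1500))
direct = len(fragment_sizes(datagram, 600))
print(via_router, direct)
```

Under these assumptions the 1500-byte fragment becomes three pieces (596, 596 and 348 bytes), and the whole transaction ends up as 34 packets when a router refragments, against 29 when the sender fragments for 600 bytes itself. That is the header overhead difference mentioned above.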
2. By Gimlet () on 2003-02-12 17:20
  
  Okay, so if the RFCs don't work...why not submit an RFC with a *better* design? Isn't that the whole point?
  
  But this way we play the old game of "my non-spec protocol doesn't work - it's the damn firewall!" Did the Linux NFS guys design H.323????
3. By _azure () on 2003-02-12 17:47
  
  This makes my head hurt.
  Over the last few years I've had countless problems getting my ISP's cable head-end gear to play nicely with various BSD firewall implementations. Of course it's never my software or equipment that's at fault -- but of course it's never any other software or equipment that develops a problem with their latest "new" setup.
  Then again, this is technology that places groups of hundreds of users, all facing the Internet with static IP addresses, on a single segment.
By Anonymous Coward () on 2003-02-12 17:50

Was that a Linux that used no UDP checksums???
What PMTU - these guys want to blame somebody else, not understand the protocols behind their optical mice and wireless keyboards - trolls
By Anonymous Coward () on 2003-02-12 19:26

"Datagram" is UDP packet (not the in-frame fragment) as per RFC 768
PMTU discovery answers DF "datagrams" with ICMP type 3 code 4 as per RFC 1191, thus the Linux packet is badly broken (a contradiction: DF together with MF) and correctly filtered.
Please point to other relevant RFCs if I missed something or am completely wrong.
Comments
1. By Daniel Hartmeier () daniel@benzedrine.cx on 2003-02-12 19:39 mailto:daniel@benzedrine.cx
  
  Compare the use of "datagram" with RFC 791 (IP), and how it's used in the section "Fragmentation and Reassembly".
  
  Especially the wording in the "An Example Fragmentation Procedure". It's very careful about copying the MF bit from the original datagram header into the fragments. Obviously, this is done to address the case where the original datagram is already a fragment (otherwise MF wouldn't be set, and it wouldn't need special attention).
  
  So fragmenting a fragment further is perfectly legitimate and well documented. But PMTU discovery (RFC 1191) doesn't explicitly address doing the discovery for fragments (setting DF on fragments to trigger ICMP errors and discover the PMTU).
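
The MF handling described above can be sketched in a few lines of Python. This is only an illustration of the rule (the last piece inherits the MF bit of whatever was split), not the RFC 791 procedure verbatim; `Fragment` and `refragment` are hypothetical names, and offsets are kept in bytes rather than 8-byte units for readability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    offset: int           # payload offset in bytes within the original datagram
    length: int           # payload bytes carried by this piece
    more_fragments: bool  # the MF bit


def refragment(piece: Fragment, max_payload: int) -> List[Fragment]:
    """Split a datagram or an existing fragment into smaller fragments."""
    step = max_payload // 8 * 8  # all but the last piece carry a multiple of 8 bytes
    if step <= 0:
        raise ValueError("max_payload must be at least 8 bytes")
    pieces = []
    pos = 0
    while pos < piece.length:
        size = min(step, piece.length - pos)
        is_last = pos + size == piece.length
        # Earlier pieces always get MF; the last one copies MF from the input,
        # which matters exactly when the input was itself a fragment.
        mf = piece.more_fragments if is_last else True
        pieces.append(Fragment(piece.offset + pos, size, mf))
        pos += size
    return pieces


# A middle fragment (MF set) split for a 600-byte MTU: every piece keeps MF.
for p in refragment(Fragment(offset=1480, length=1480, more_fragments=True), 580):
    print(p)
```

Splitting a complete datagram instead (MF clear on input) leaves MF clear on the final piece only, which is why the copy step needs no special case for either input.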
  Comments
  1. By Anonymous Coward () on 2003-02-12 20:23
    
    Yes! You're right... I have checked with Solaris 7, 8, 9 and AIX 4.1.5, 4.3.3, 5.1 clients; no one hit problems, on mailing lists or web message boards... (my usual setup is NetBSD _without_ ipf, with NFS and Samba holding mostly patches, and a bunch of clients (referred to as servers in the other, management's, scope) that get patched directly from here).
    
    From RFC 1191: "...In the IP architecture, the choice of what size datagram to send is made by a protocol at a layer above IP..." It seems to be the later and more explicit text, at least lifting "datagram" to a higher layer (TCP/UDP/whatever), rather than referring to a "larger fragment to be fragmented further" as in RFC 791's scope.
    
    RFC 791's definition of DF follows: "...If the Don't Fragment flag (DF) bit is set, then internet fragmentation of this datagram is NOT permitted, although it may be discarded..."
  2. By Dries Schellekens () on 2003-02-12 20:44
    
    "So fragmenting a fragment further is perfectly legitimate and well documented. But PMTU discovery (RFC 1191) doesn't explicitly address doing the discovery for fragments (setting DF on fragments to trigger ICMP errors and discover the PMTU)."
    
    Does the OpenBSD stack send ICMP errors when it tries to fragment fragments with DF set? Which OSes do so? (I guess Linux does so, else they wouldn't be doing PMTU discovery on already fragmented UDP packets).
    
    Comments
    
    By Daniel Hartmeier () daniel@benzedrine.cx on 2003-02-12 20:54 mailto:daniel@benzedrine.cx
    
    Yes, it does, and it doesn't handle fragments differently from complete packets in this regard.
    
    The RFC would allow to silently drop the packet/fragment, but all stacks I know do send an ICMP 'need frag' message.
    
    Comments
    
    By Even More Anonymous () on 2003-02-12 21:19
    
    I just wanted to emphasize that the word "datagram" has different meanings in different RFCs, and PMTU just serves as a hint for the MTU (to be smaller than the interface MTU), nothing more, nothing less. Thus a Linux NFS client, by setting DF, is completely aware that UDP datagrams can be {Dropped|Processed-as-expected}, even if there is no way to tell the difference between the first two on the client side... just wait and say, phew, it is a timeout - Dazed and Confused - let's blame somebody else...
    
    By Dries Schellekens () on 2003-02-12 23:40
    
    "The RFC would allow to silently drop the packet/fragment, but all stacks I know do send an ICMP 'need frag' message."
    
    So basically the statement of Mike Frantzen doesn't hold:
    
    "Why does scrub drop MF|DF fragments? Because it is not clear whether the end host will reassemble those packets. Some people consider fragments with the Don't Fragment bit set to be perfectly logical, others of us don't know what the hell it means. That folks, is an ambiguity and is exactly what the scrubber is tasked to prevent."
    
    So the standard behaviour of routers is to forward these fragments (and not drop them, according to the RFC) or send an ICMP 'need frag' when they need to fragment them further.
    
    Comments
    
    By Daniel Hartmeier () daniel@benzedrine.cx on 2003-02-13 12:44 mailto:daniel@benzedrine.cx
    
    There's two separate issues here:
    
    a) what happens when a fragment with DF exceeds an MTU on a router. The RFC allows to drop it silently or with an ICMP error.
    
    b) what happens when a fragment with DF arrives at an endpoint. The RFC doesn't address DF on fragments, only DF on datagrams (which can be interpreted as complete datagrams or fragments), and says they should be reassembled.
    
    The first issue was never in question, but the second is. Mike says there are endpoints handling this differently, which means that such a fragment causes ambiguity, which normalization must resolve.
    
    I can't name examples for each interpretation, but then I basically just use OpenBSD, so I trust Mike on this.
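
Issue (a) above, the router side, can be written down as a short Python sketch. It models only the variant in which the router answers with an ICMP error (a silent drop is also allowed), and it deliberately takes no "is this a fragment" argument, since fragments and complete packets are handled alike here. `Action` and `router_action` are names invented for this example.

```python
from enum import Enum

ICMP_DEST_UNREACHABLE = 3  # ICMP type
ICMP_FRAG_NEEDED = 4       # ICMP code: fragmentation needed and DF set


class Action(Enum):
    FORWARD = "forward unchanged"
    FRAGMENT = "fragment further and forward"
    DROP_NEED_FRAG = "drop, send ICMP type 3 code 4"


def router_action(total_len: int, df: bool, next_hop_mtu: int) -> Action:
    """Decide what a forwarding hop does with a packet or fragment."""
    if total_len <= next_hop_mtu:
        return Action.FORWARD
    if df:
        # Too big and not allowed to split: this error is what PMTU discovery relies on.
        return Action.DROP_NEED_FRAG
    return Action.FRAGMENT


print(router_action(1500, df=True, next_hop_mtu=600).value)
print(router_action(1500, df=False, next_hop_mtu=600).value)
```

Issue (b), what an endpoint does when such a fragment arrives, is not modelled here, because that is exactly the part the discussion leaves open.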
By Anonymous Coward () on 2003-02-15 01:18

Is anyone really surprised?

Considering, this is the same OS that brought us "Sending fragments in reverse order..."

Credits

Copyright © 2004-2008 Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to April 2nd 2004 as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]