Contributed by jose on from the fixing-what-is-broken dept.
"KernelTrap has an interesting story about how PF's scrub functionality conflicts with the Linux NFS implementation. Actually, it is probably better put the other way around since, as the article explains, Linux NFS does an odd thing... As usual, Daniel and KernelTrap do an excellent job of bringing very useful, solid facts to the forefront. The story explains, 'essentially, the Linux NFS implementation with UDP PMTU discovery enabled sets the "don't fragment" bit on fragmented packets, which PF's packet normalization functionality determines to be improper and drops. PF author Daniel Hartmeier notes that by disabling PF's "scrub" option on the protocols/ports in question, you can allow the Linux NFS client/server to work as its authors intended.'"
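To make the problem combination concrete: in an IPv4 header, DF and MF share one 16-bit flags/fragment-offset word (RFC 791), and a packet is a fragment when MF is set or its offset is non-zero. Below is a minimal Python sketch of detecting "DF on a fragment". The helper names (`is_df_fragment`, `make_header`) are hypothetical, and the header builder fills in only the fields the check actually reads.

```python
import struct

# Masks for the flags/fragment-offset word of an IPv4 header (RFC 791):
# bit 0 reserved, bit 1 DF, bit 2 MF, low 13 bits = offset in 8-byte units.
IP_DF = 0x4000
IP_MF = 0x2000
IP_OFFMASK = 0x1FFF


def frag_field(header: bytes) -> int:
    """Return the flags/fragment-offset word (bytes 6-7) of an IPv4 header."""
    (word,) = struct.unpack_from("!H", header, 6)
    return word


def is_fragment(header: bytes) -> bool:
    """A packet is a fragment if MF is set or its offset is non-zero."""
    word = frag_field(header)
    return bool(word & IP_MF) or (word & IP_OFFMASK) != 0


def is_df_fragment(header: bytes) -> bool:
    """True for the case in the story: DF set on a fragment."""
    return bool(frag_field(header) & IP_DF) and is_fragment(header)


def make_header(flags_off: int) -> bytes:
    """Hypothetical helper: a 20-byte IPv4 header (checksum left at 0)."""
    # version/IHL, TOS, total length, id, flags/offset, TTL, proto (17 = UDP),
    # checksum, source address, destination address
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 1, flags_off,
                       64, 17, 0, bytes(4), bytes(4))


print(is_df_fragment(make_header(IP_DF | IP_MF)))  # first fragment with DF: True
print(is_df_fragment(make_header(IP_DF)))          # unfragmented with DF: False
print(is_df_fragment(make_header(IP_DF | 185)))    # last fragment with DF: True
```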
By Gimlet () on
Can we get this guy to write some more white papers? I know, I know, the more docs he writes, the less code...
By Shane () on
I also like reading what the other guys write (mostly what I read on misc@). I especially look forward to anything by Theo. I think it's rare to find someone so willing to stand up for his beliefs. I'm glad Theo has the guts to do what's right. I admire that.
By Matt () on
Keep up the good work Daniel!
By Michael Anuzis () on
http://www.benzedrine.cx/relaydb.html
By Shane () on
So this guy just makes up stuff to make his life easier? It seems like he made a stupid implementation decision and is now just trying to rationalize it. Not the kind of guy I want working on my software.
And common sense here dictates that without being able to set DF on fragmented frames, UDP path MTU discovery is basically impossible and at best useless.
So, how do other implementations of NFS do path MTU discovery? I assume the FreeBSD, NetBSD, and OpenBSD implementations don't have any problems because everyone basically asks, "You're using Linux, right?" If the Linux implementation is the only one that's having trouble, then it's obvious where the problem lies. This guy should take the hint.
These weird BSD firewalls are the only systems blocking these packets, and I'm not going to give up UDP PMTU discovery for the sake of making these systems happy.
Weird BSD firewalls? pf does the RIGHT thing and it's weird? That doesn't make sense.
By Dries Schellekens () on http://www.kerneltrap.org/node.php?id=579&cid=2386
The missing piece in the puzzle was the fact that certain protocols like NFS can't split transactions/operations into smaller packets; they need to send the entire transaction in one single (complete) IP packet. This size might exceed any real MTU, so the packet gets fragmented first, and only afterwards is PMTU discovery applied to the fragments. Hence, DF on fragments. This scheme is not explicitly covered by the RFCs, but I agree that it's a logical conclusion.
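A minimal sketch of the scheme described above, assuming a 20-byte IP header without options: one large IP payload is split at the link MTU, and if `df_on_fragments` is set, DF is placed on every resulting fragment. The function and field names are hypothetical; this is an illustration, not the Linux kernel's code.

```python
from typing import List, NamedTuple

IP_HEADER_LEN = 20  # assumption: no IP options


class Fragment(NamedTuple):
    offset: int  # byte offset of this fragment's data in the original payload
    length: int  # data bytes carried (excluding the IP header)
    mf: bool     # "more fragments" flag
    df: bool     # "don't fragment" flag


def fragment_datagram(payload_len: int, mtu: int,
                      df_on_fragments: bool) -> List[Fragment]:
    """Split one IP payload (e.g. UDP header plus NFS request) into
    fragments that fit `mtu`. All but the last fragment must carry a
    multiple of 8 data bytes, since offsets are in 8-byte units."""
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8
    frags = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        last = offset + length == payload_len
        frags.append(Fragment(offset, length, mf=not last, df=df_on_fragments))
        offset += length
    return frags


# Assumed example: 16384 bytes of NFS data plus an 8-byte UDP header.
with_df = fragment_datagram(16392, 1500, df_on_fragments=True)
print(len(with_df), with_df[-1])
```

With `df_on_fragments=False` the same split yields ordinary fragments without DF.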
By Shane () on
By Daniel Hartmeier () daniel@benzedrine.cx on mailto:daniel@benzedrine.cx
If NFS sends, say, a 16kB transaction, the OpenBSD stack will fragment that packet according to the interface's MTU, usually 1500 bytes on a LAN, and not set DF. If the PMTU should be lower, an intermediate hop may further fragment those fragments.
For instance, if the PMTU should be 600 bytes, an intermediate hop will break each 1500-byte fragment into three smaller ones. I guess that's less optimal than the endpoint doing PMTU discovery and sending 600-byte fragments itself (which means less IP header overhead).
But NFS is used mostly on local networks where the PMTU is equal to all interfaces' MTU, so I'm not sure PMTU discovery actually improves performance in many (or even most) cases.
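The arithmetic in the 600-byte example can be checked with a short sketch. It assumes 20-byte IP headers and treats a 16 kB NFS request as 16384 data bytes plus an 8-byte UDP header (16392 bytes of IP payload); both are illustrative assumptions, and the function names are hypothetical.

```python
IP_HEADER_LEN = 20  # assumption: no IP options


def count_fragments(data_len: int, mtu: int) -> int:
    """Fragments needed to carry data_len bytes through a link of size mtu."""
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8  # 8-byte aligned data per fragment
    return -(-data_len // max_data)            # ceiling division


def refragmented_count(data_len: int, first_mtu: int, path_mtu: int) -> int:
    """Fragments after the sender splits at first_mtu (no DF) and a later
    hop splits each of those pieces again at path_mtu."""
    first_max = (first_mtu - IP_HEADER_LEN) // 8 * 8
    total = 0
    remaining = data_len
    while remaining > 0:
        piece = min(first_max, remaining)
        total += count_fragments(piece, path_mtu)
        remaining -= piece
    return total


print(count_fragments(1480, 600))              # 3: one 1500-byte fragment at a 600 MTU
print(refragmented_count(16392, 1500, 600))    # 34: split at 1500, then at 600
print(count_fragments(16392, 600))             # 29: sender splits at 600 directly
```

Under these assumptions, refragmenting at the hop produces 34 fragments (680 bytes of IP headers) versus 29 (580 bytes) when the sender fragments at the 600-byte PMTU itself.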
By Gimlet () on
But this way we play the old game of "my non-spec protocol doesn't work - it's the damn firewall!" Did the Linux NFS guys design H.323????
By _azure () on
This makes my head hurt.
Over the last few years I've had countless problems getting my ISP's cable head-end gear to play nicely with various BSD firewall implementations. Of course it's never my software or equipment that's at fault; but of course, it's never any other software or equipment that develops a problem with their latest "new" setup.
Then again, this is technology that places groups of hundreds of users, all facing the Internet with static IP addresses, on a single segment.
By Anonymous Coward () on
What PMTU? These guys want to blame somebody else rather than understand the protocols behind their optical mice and wireless keyboards. Trolls.
By Anonymous Coward () on
PMTU discovery relies on routers responding with ICMP type 3, code 4 to DF "datagrams", as per RFC 1191, so the Linux packet is badly broken (a contradiction between the DF and MF bits) and is correctly filtered.
Please point to other relevant RFCs if I missed something or am completely wrong.
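For reference, the ICMP message PMTU discovery relies on is Destination Unreachable (type 3) with code 4 (fragmentation needed and DF set). RFC 1191 places the next-hop MTU in the low 16 bits of the header's second word, followed by the offending IP header and the first 64 bits of its data. A hedged sketch of building one, assuming the offending packet is passed in as raw bytes; the function names are hypothetical.

```python
import struct

ICMP_UNREACH = 3           # Destination Unreachable
ICMP_UNREACH_NEEDFRAG = 4  # fragmentation needed and DF set


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (the standard Internet checksum)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def need_frag_message(next_hop_mtu: int, offending: bytes) -> bytes:
    """Type, code, checksum, 16 unused bits, next-hop MTU, then the
    offending IP header plus the first 8 bytes of its data."""
    ihl = (offending[0] & 0x0F) * 4
    quoted = offending[: ihl + 8]
    body = struct.pack("!BBHHH", ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
                       0, 0, next_hop_mtu) + quoted
    csum = internet_checksum(body)  # computed with the checksum field zeroed
    return body[:2] + struct.pack("!H", csum) + body[4:]
```

A receiver can validate such a message by checking that `internet_checksum(msg)` comes out as 0.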
By Daniel Hartmeier () daniel@benzedrine.cx on mailto:daniel@benzedrine.cx
Note especially the wording in the "An Example Fragmentation Procedure" section. It's very careful about copying the MF bit from the original datagram header into the fragments. Obviously, this is done to address the case where the original datagram is already a fragment (otherwise MF wouldn't be set, and it wouldn't need special attention).
So fragmenting a fragment further is perfectly legitimate and well documented. But PMTU discovery (RFC 1191) doesn't explicitly address doing the discovery for fragments (setting DF on fragments to trigger ICMP errors and discover the PMTU).
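The shape of RFC 791's example procedure, applied to a packet that may already be a fragment, can be sketched as follows. Assumptions: no IP options (so the option-copying steps do nothing), offsets in 8-byte units, and an empty result standing in for "discard" when DF is set. `fragment` is a hypothetical name. The line to watch is the recursive call, where the trailing piece inherits the original MF bit.

```python
from typing import List, Tuple

IHL_BYTES = 20  # assumption: no IP options


def fragment(fo: int, mf: bool, df: bool, data_len: int,
             mtu: int) -> List[Tuple[int, int, bool]]:
    """Return (offset in 8-byte units, data length, MF) for each piece.
    fo and mf are the original header's values (OFO and OMF), so the
    input can itself be a fragment."""
    if IHL_BYTES + data_len <= mtu:
        return [(fo, data_len, mf)]  # fits: send unchanged
    if df:
        return []  # DF set and too big: discard
    nfb = (mtu - IHL_BYTES) // 8  # number of 8-byte blocks in the first piece
    first = (fo, nfb * 8, True)   # the first piece always gets MF = 1
    # The rest starts at OFO + NFB, keeps the original MF (OMF), and goes
    # through the same size test again.
    return [first] + fragment(fo + nfb, mf, df, data_len - nfb * 8, mtu)


# A 1500-byte fragment (1480 data bytes, MF set) crossing a 600-byte link:
print(fragment(0, True, False, 1480, 600))
```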
By Anonymous Coward () on
By Dries Schellekens () on
Does the OpenBSD stack send ICMP errors when it tries to fragment fragments with DF set? Which OSes do so? (I guess Linux does so, else they wouldn't be doing PMTU discovery on already fragmented UDP packets).
By Daniel Hartmeier () daniel@benzedrine.cx on mailto:daniel@benzedrine.cx
The RFC would allow silently dropping the packet/fragment, but all stacks I know of do send an ICMP 'need frag' message.
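A minimal model of the router-side decision, where a hypothetical `send_icmp` switch stands in for the choice between a silent drop and an ICMP 'need frag' reply. The names are illustrative only, and the model treats complete datagrams and fragments the same way.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    FRAGMENT = "fragment"
    DROP_SILENTLY = "drop"
    DROP_AND_SEND_NEEDFRAG = "icmp-needfrag"


def forward_decision(total_len: int, df: bool, next_hop_mtu: int,
                     send_icmp: bool = True) -> Action:
    """What a router does with a packet (complete datagram or fragment)
    on an outgoing link with the given MTU."""
    if total_len <= next_hop_mtu:
        return Action.FORWARD
    if not df:
        return Action.FRAGMENT  # refragmenting a fragment is allowed
    # DF set and too big: the RFC permits a silent drop, but common
    # stacks send ICMP type 3, code 4 instead.
    return Action.DROP_AND_SEND_NEEDFRAG if send_icmp else Action.DROP_SILENTLY
```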
By Even More Anonymous () on
By Dries Schellekens () on
By Daniel Hartmeier () daniel@benzedrine.cx on mailto:daniel@benzedrine.cx
a) what happens when a fragment with DF exceeds an MTU on a router. The RFC allows dropping it silently or with an ICMP error.
b) what happens when a fragment with DF arrives at an endpoint. The RFC doesn't address DF on fragments, only DF on datagrams (which can be interpreted as complete datagrams or fragments), and says they should be reassembled.
The first issue was never in question, but the second is. Mike says there are endpoints handling this differently, which means that such a fragment causes ambiguity, which normalization must resolve.
I can't name examples for each interpretation, but then I basically just use OpenBSD, so I trust Mike on this.
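A normalizer has to resolve this ambiguity one way for everything behind it. Below is a minimal sketch of two possible policies, assuming that only the flags/fragment-offset word is inspected. One policy drops fragments carrying DF, which is the behavior the story describes. The other clears DF and passes them (PF's scrub has a `no-df` option along these lines). The function and enum names are hypothetical; this is not PF's code.

```python
from enum import Enum

IP_DF = 0x4000
IP_MF = 0x2000
IP_OFFMASK = 0x1FFF


class Verdict(Enum):
    PASS = "pass"
    DROP = "drop"


def normalize(frag_field: int, clear_df: bool = False):
    """Return (verdict, possibly rewritten flags/offset word) so that every
    endpoint behind the normalizer sees one interpretation."""
    is_fragment = bool(frag_field & IP_MF) or (frag_field & IP_OFFMASK) != 0
    if is_fragment and frag_field & IP_DF:
        if clear_df:
            # Clearing DF changes the header, so a real normalizer must
            # also update the IP header checksum (not modeled here).
            return Verdict.PASS, frag_field & ~IP_DF
        return Verdict.DROP, frag_field
    return Verdict.PASS, frag_field
```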
By Anonymous Coward () on
Considering this is the same OS that brought us "Sending fragments in reverse order..."