2020-08-29 PMTUD black holes still exist with IPv6

From Wikistix

So, I've just spent a few hours debugging a hanging HTTPS download over TCP to an IPv6 host (from a large internet company I'll leave unnamed), which turned out to be caused by a PMTUD black hole. I have some history debugging those (details below), but I'm surprised yet again that this is still an issue. The reason is somewhat simpler than it was 12 years ago when I debugged this with IPv4, but it still has the same main cause: LLC PPPoE.

The issue is that PPPoE adds an 8 byte header to a standard Ethernet frame, which means the interface MTU is reduced from 1500 to 1492 bytes. This means that the MSS of an IPv6 TCP connection must also be reduced from 1440 to 1432 bytes (for IPv4, from 1460 to 1452). For this to work in a NAT scenario, or, indeed, a routed IPv4/IPv6 scenario, PMTUD is relied on to determine the appropriate MTU (and MSS). However, within the carrier network, there may be an MTU change occurring between pieces of equipment (such as a DSLAM) that operate only at layer 2 and hence are unable to participate in PMTUD. Additionally, carriers tend to disable fragmentation, ignore the client MRU during PPPoE negotiation, and use a full 1500 byte MTU. And, just to make matters worse, MSS is only negotiated by, and only applies to, TCP, meaning ICMP, UDP, IPsec and other IP protocols may still break.
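The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for the example, and the header sizes assume no IP options, no IPv6 extension headers and no TCP options.

```python
# Sizes in bytes. The IP and TCP figures assume bare headers:
# no IP options, no IPv6 extension headers, no TCP options.
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8      # 6 byte PPPoE header + 2 byte PPP protocol ID
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def tcp_mss(mtu: int, ip_header: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of size mtu."""
    return mtu - ip_header - TCP_HEADER


# MTU left for IP once PPPoE has taken its share of the Ethernet payload.
pppoe_mtu = ETHERNET_MTU - PPPOE_OVERHEAD

for name, ip_header in (("IPv4", IPV4_HEADER), ("IPv6", IPV6_HEADER)):
    # Show the MSS on plain Ethernet next to the MSS over PPPoE.
    print(name, tcp_mss(ETHERNET_MTU, ip_header), "->", tcp_mss(pppoe_mtu, ip_header))
```

For IPv6 this gives 1440 on plain Ethernet and 1432 over PPPoE; the IPv4 equivalents are 1460 and 1452.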

As discussed on the Exetel forum, this results in "baby giants" (RFC 4638), where slightly oversized Ethernet frames of 1508 bytes may be seen by the customer. These may be dropped by Ethernet hubs/switches, host NICs, or operating system kernels.
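Where the 1508 comes from, as a rough Python sketch. It assumes the figure refers to the Ethernet payload (a full 1500 byte IP packet plus the 8 bytes of PPPoE/PPP overhead), not counting the Ethernet header or FCS. The function names are hypothetical.

```python
STANDARD_ETHERNET_PAYLOAD = 1500  # classic Ethernet payload limit, in bytes
PPPOE_OVERHEAD = 8                # 6 byte PPPoE header + 2 byte PPP protocol ID


def pppoe_ethernet_payload(ip_packet_len: int) -> int:
    """Ethernet payload size needed to carry an IP packet inside PPPoE."""
    return ip_packet_len + PPPOE_OVERHEAD


def is_baby_giant(ethernet_payload_len: int) -> bool:
    """True if the payload exceeds what a strict 1500 byte device accepts."""
    return ethernet_payload_len > STANDARD_ETHERNET_PAYLOAD


# A carrier sending full 1500 byte IP packets over PPPoE needs 1508 bytes.
print(pppoe_ethernet_payload(1500), is_baby_giant(pppoe_ethernet_payload(1500)))
# A packet that respects the 1492 byte PPPoE MTU fits exactly.
print(pppoe_ethernet_payload(1492), is_baby_giant(pppoe_ethernet_payload(1492)))
```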

My solution 12 years ago was to patch my NetBSD kernel and upgrade my Ethernet switch and host NIC. Generally, gigabit Ethernet devices or devices supporting VLANs are sufficient to support jumbo frames. This fixed the behaviour I was seeing with ICMP and UDP (and IPsec).

For my IPv6 issue this time around, I simply added MSS clamping for IPv6 in my NetBSD npf configuration:

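# External (PPPoE) interface and its IPv6 address(es)
$ext_if = "pppoe0"
$ext_v6 = inet6(pppoe0)

# Clamp the TCP MSS to 1432 bytes (1492 byte MTU less 40 byte IPv6 and 20 byte TCP headers)
procedure "norm" { normalize: "max-mss" 1432 }

group "external" on $ext_if {
    # Outbound IPv6 TCP not sourced from this router's own address gets its MSS clamped
    pass stateful out final family inet6 proto tcp from ! $ext_v6 to any apply "norm"
    ....
}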
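To make concrete what MSS clamping does to a packet, here is a minimal Python sketch that lowers the MSS option inside the TCP option bytes of a SYN. It is not how npf implements it; the function name `clamp_mss` and the sample option bytes are invented for the illustration, and a real packet filter would also have to fix up the TCP checksum after rewriting the option.

```python
import struct

TCPOPT_EOL = 0  # end of option list
TCPOPT_NOP = 1  # single byte padding
TCPOPT_MSS = 2  # maximum segment size: kind, length 4, 16 bit value


def clamp_mss(options: bytes, max_mss: int) -> bytes:
    """Return a copy of the TCP option bytes with any MSS above max_mss lowered."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, leave the rest untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, leave the rest untouched
        if kind == TCPOPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)


# Sample SYN options: MSS 1440, NOP, window scale 7.
syn_options = bytes([2, 4]) + struct.pack("!H", 1440) + bytes([1, 3, 3, 7])
clamped = clamp_mss(syn_options, 1432)
print(struct.unpack_from("!H", clamped, 2)[0])
```

An MSS that is already at or below the limit passes through unchanged, which is why clamping is harmless for peers that already advertise something smaller.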

With this change, all the TCP connections negotiated a 1432 byte MSS and proceeded to work. Most large internet services already use a lower MTU (and hence MSS) specifically to work around issues like this (e.g. google.com appears to negotiate an MSS of 1360 as I check). I'll be chasing up the issue I found, and hopefully their MTU can also be reduced.
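One rough way to check an MSS like this from a client is to ask the kernel after connecting. The sketch below uses the standard `TCP_MAXSEG` socket option; the helper names are hypothetical, the option is not available on every platform, and the value the OS reports can be a little lower than the MSS in the SYN (for example when TCP timestamps take up header space). `implied_mtu` simply inverts the MSS arithmetic, assuming bare IPv6 and TCP headers by default.

```python
import socket

IPV6_HEADER = 40  # bytes, no extension headers
TCP_HEADER = 20   # bytes, no TCP options


def implied_mtu(mss: int, ip_header: int = IPV6_HEADER) -> int:
    """MTU that a given MSS corresponds to, assuming bare IP and TCP headers."""
    return mss + ip_header + TCP_HEADER


def connection_mss(host: str, port: int = 443) -> int:
    """Connect to host:port and return the MSS the kernel reports for the socket."""
    with socket.create_connection((host, port), timeout=10) as sock:
        return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)


# Example (needs network access; may connect over IPv4 or IPv6):
#     print(connection_mss("google.com"))
print(implied_mtu(1432))  # the clamped MSS corresponds to the PPPoE MTU
print(implied_mtu(1360))  # MTU implied by an MSS of 1360, if seen over IPv6
```

Under those assumptions an MSS of 1360 over IPv6 corresponds to an MTU of 1420, comfortably below the 1492 bytes that PPPoE leaves.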

See Also

  • test-ipv6.com
  • MTU and "baby giants" (RFC4638)? on the Exetel forum.
  • Multiprotocol Encapsulation over ATM Adaptation Layer 5
  • A Method for Transmitting PPP Over Ethernet (PPPoE)
  • Accommodating a Maximum Transit Unit/Maximum Receive Unit (MTU/MRU) Greater Than 1492 in the Point-to-Point Protocol over Ethernet (PPPoE)
  • Path MTU Discovery for IP version 6
  • NetBSD patch for if_ether.h to allow baby giants.
  • NetBSD problem report kern/39203 PPPoE issues with broken MTU/MRU implementations.