- 1 What is path MTU discovery?
- 2 Why path MTU discovery fails
- 3 Manual MTU discovery
- 4 MTU and PPPoE
- 5 Links
What is path MTU discovery?
Path MTU Discovery (PMTUD) is a standardized technique in computer networking for determining the maximum transmission unit (MTU) size on the network path between two Internet Protocol (IP) hosts, usually with the goal of avoiding IP fragmentation. Path MTU Discovery works by setting the Don’t Fragment (DF) option bit in the IP headers of outgoing packets. Then, any device along the path whose MTU is smaller than the packet will drop it, and send back an Internet Control Message Protocol (ICMP) Fragmentation Needed (Type 3, Code 4) message containing its MTU, allowing the source host to reduce its Path MTU appropriately. The process repeats until the MTU is small enough to traverse the entire path without fragmentation.
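The repeat-until-it-fits loop described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: the function name `discover_path_mtu` and its parameters are made up for this example, every ICMP message is assumed to arrive, and each hop is reduced to a single MTU value.

```python
def discover_path_mtu(link_mtus, initial_mtu):
    """Simulate PMTUD over a path given as a list of per-hop link MTUs.

    Returns (path_mtu, attempts). Assumes every ICMP Fragmentation
    Needed message reaches the sender (no black holes).
    """
    path_mtu = initial_mtu
    attempts = 0
    while True:
        attempts += 1
        # The first hop whose MTU is smaller than the DF-marked packet drops it.
        bottleneck = next((mtu for mtu in link_mtus if mtu < path_mtu), None)
        if bottleneck is None:
            # The packet fit through every hop, so discovery is finished.
            return path_mtu, attempts
        # The ICMP reply carries that hop's MTU; the sender shrinks to it and retries.
        path_mtu = bottleneck


if __name__ == "__main__":
    print(discover_path_mtu([1500, 1200, 800], 1500))
```

Each smaller link met along the way costs one dropped packet and one retry, which is why the order of the links affects the number of attempts but not the final result.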
When a host needs to transmit data out an interface, it references the interface’s Maximum Transmission Unit (MTU) to determine how much data it can put into each packet… Unfortunately, not all links which compose the Internet have the same MTU… When a router decides to forward an IPv4 packet out an interface, but determines that the packet size exceeds the interface’s MTU, the router must fragment the packet to transmit it as two (or more) individual pieces, each within the link MTU. Fragmentation is expensive both in router resources and in bandwidth utilization; new headers must be generated and attached to each fragment…. To utilize a path in the most efficient manner possible, hosts must find the path MTU; this is the smallest MTU of any link in the path to the distant end. For example, for two hosts communicating across three routed links with independent MTUs of 1500, 800, and 1200 bytes, the smallest (800 bytes) must be assumed by each end host to avoid fragmentation.
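To make the cost of fragmentation concrete, here is a rough sketch that computes the path MTU for the three-link example and the fragments a router would produce if a full 1500-byte packet had to cross the 800-byte link. The helper names are hypothetical, and the code assumes a plain 20-byte IPv4 header with no options. Fragment data must be a multiple of 8 bytes in every fragment except the last, which is why the first fragment is slightly smaller than the link MTU.

```python
IPV4_HEADER = 20  # bytes; assumes an IPv4 header without options


def path_mtu(link_mtus):
    """The path MTU is the smallest MTU of any link on the path."""
    return min(link_mtus)


def fragment_sizes(packet_size, link_mtu):
    """Return the total size of each IPv4 fragment, headers included."""
    if packet_size <= link_mtu:
        return [packet_size]
    # Data carried per fragment, rounded down to a multiple of 8 bytes.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("link MTU too small to carry any data")
    data = packet_size - IPV4_HEADER
    sizes = []
    while data > 0:
        chunk = min(per_fragment, data)
        sizes.append(chunk + IPV4_HEADER)  # every fragment gets its own header
        data -= chunk
    return sizes


if __name__ == "__main__":
    links = [1500, 800, 1200]
    print(path_mtu(links))
    print(fragment_sizes(1500, path_mtu(links)))
```

In this example the single 1500-byte packet becomes two packets totalling 1520 bytes, the extra 20 bytes being the second IP header.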
Why path MTU discovery fails
It’s impossible to know the MTU of each link through which a packet might travel. RFC 1191 defines path MTU discovery, a simple process through which a host can detect a path MTU smaller than its interface MTU. Two components are key to this process: the Don’t Fragment (DF) bit of the IP header, and a subcode of the ICMP Destination Unreachable message, Fragmentation Needed… Setting the DF bit in an IP packet prevents a router from performing fragmentation when it encounters an MTU less than the packet size. Instead, the packet is discarded and an ICMP Fragmentation Needed message is sent to the originating host.
But this can go horribly wrong. Some networks block ICMP, so the “Fragmentation Needed” message never reaches the server that sent the oversized packet:
The packet has the DF bit set, telling Joe’s ISP to drop the packet and send the server an ICMP saying what packet size will fit. The ISP sends an ICMP saying the largest size is 1492 bytes… However, if the server does not get that ICMP, things go downhill fast. The server is expecting an acknowledgement from Joe’s computer, but Joe’s computer didn’t get the packet, so the acknowledgment never comes. After a while, the server gives up waiting and sends the same 1500-byte packet again. Joe’s ISP sends back another ICMP. The server doesn’t get it again and tries sending the 1500-byte packet a third time, then a fourth, a fifth, and so on. Meanwhile Joe’s computer can’t tell this is happening and is waiting for a response from the server. Eventually, it gives up and sends the server a connection reset. It reports a network failure to Joe, who is left wondering what happened. He may soon discover he can access nearly all other sites, just not the one he wanted.
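The routers and firewalls that discard these ICMP messages, or that drop oversized packets without sending one at all, are called ‘black holes’, because the sender never learns why its packets disappear.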
- about black hole routers: support.microsoft.com/kb/159211
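The difference between a working path and a black hole can be shown with a toy model of the story above. This is a deliberately simplified sketch, not how a real TCP stack behaves: the function `send_with_pmtud`, its parameters, and the fixed retry limit are assumptions made for illustration.

```python
def send_with_pmtud(packet_size, bottleneck_mtu, icmp_delivered, max_retries=5):
    """Model a sender pushing one DF-marked packet through a bottleneck link.

    Returns (delivered, attempts). When icmp_delivered is False, the ICMP
    Fragmentation Needed replies are lost on the way back (a black hole).
    """
    size = packet_size
    for attempt in range(1, max_retries + 1):
        if size <= bottleneck_mtu:
            return True, attempt
        # The bottleneck router drops the packet and sends an ICMP reply.
        if icmp_delivered:
            size = bottleneck_mtu  # sender learns the MTU and shrinks the packet
        # Otherwise the sender learns nothing and retransmits the same size.
    return False, max_retries


if __name__ == "__main__":
    print(send_with_pmtud(1500, 1492, icmp_delivered=True))
    print(send_with_pmtud(1500, 1492, icmp_delivered=False))
```

With ICMP delivered, the second attempt gets through. Without it, every retry is the same 1500-byte packet and the transfer never succeeds.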
Manual MTU discovery
traceroute -F <host> <size>
Sets the ‘do not fragment’ bit on the probe packets and specifies their size, which lets you test the maximum packet size the path will carry.
apt-get install iputils-tracepath
tracepath -n <host>
This will report the path MTU discovered at each hop along the network route.
ping -M do -s <size> <host>
This will ping the host with a packet of the given size with the ‘do not fragment’ bit set. If the ping doesn’t work, then you know that you need a lower MTU. Note, however, that the size you specify here is not the same as the size you would specify for the MTU. The size given to ping counts only the payload, and the IP header (20 bytes) and ICMP header (8 bytes) are added on top, so the largest ping size that works will be 28 bytes lower than the MTU. Over a PPPoE connection, it will be 36 bytes lower than the 1500-byte Ethernet MTU, because PPPoE takes up another 8 bytes.
So, if the ‘do not fragment’ ping test works with 1400 bytes, then you could set the MTU to 1428. Over a PPPoE connection, those packets occupy 1436 bytes on the underlying Ethernet link.
To test if 1500 works, for example:
ping -M do -s 1472 riseup.net
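Trying sizes by hand gets tedious, so the search can be automated. The following is a minimal sketch, assuming the Linux iputils `ping` used above; the function names `ping_df` and `largest_payload` are invented for this example. The binary search takes any probe function, so it can be tried out without touching the network.

```python
import subprocess

IP_ICMP_OVERHEAD = 28  # 20-byte IP header + 8-byte ICMP header


def ping_df(host, payload):
    """Send one ping with DF set; True if a reply came back.

    Assumes Linux iputils ping (-M do, -s, -c, -W).
    """
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    Returns None if even the smallest size fails (host unreachable).
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    best = largest_payload(lambda size: ping_df("riseup.net", size))
    if best is not None:
        print("largest payload:", best, "-> path MTU:", best + IP_ICMP_OVERHEAD)
```

A single lost reply can make the search settle on a value that is too low, so it is worth confirming the result with a few manual pings.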
MTU and PPPoE
PPPoE uses 8 bytes of every packet, so the effective MTU over a 1500-byte link is really 1492.
When using ping to test MTU, it gets a little more complicated, because the IP and ICMP headers (28 bytes together) count toward the packet size but not toward the ping payload size you specify.
TCP, IP, PPPoE, MTU magic numbers (from www.dslreports.com/faq/695)
| Bytes | Meaning |
|---|---|
| 1500 | The biggest-sized IP packet that can normally traverse the Internet without getting fragmented. Typical MTU for non-PPPoE, non-VPN connections. |
| 1492 | The maximum MTU recommended for Internet PPPoE implementations. |
| 1472 | The maximum ping data payload before fragmentation errors are received on non-PPPoE, non-VPN connections. |
| 1460 | TCP data size (MSS) when MTU is 1500 and not using PPPoE. |
| 1464 | The maximum ping data payload before fragmentation errors are received when using a PPPoE-connected machine. |
| 1452 | TCP data size (MSS) when MTU is 1492 and using PPPoE. |
| 576 | Typically recommended as the MTU for dial-up type applications, leaving 536 bytes of TCP data. |
| 48 | The sum of IP, TCP, and PPPoE headers. |
| 40 | The sum of IP and TCP headers. |
| 36 | The sum of IP, ICMP, and PPPoE headers. |
| 28 | The sum of IP and ICMP headers. |
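All of the numbers in the table follow from four header sizes, so they can be recomputed rather than memorized. The sketch below assumes headers without options (20-byte IP, 20-byte TCP, 8-byte ICMP) and the 8-byte PPPoE overhead mentioned above; the constant and function names are chosen for this example only.

```python
IP_HEADER = 20       # bytes, IPv4 without options
TCP_HEADER = 20      # bytes, TCP without options
ICMP_HEADER = 8      # bytes, ICMP echo header
PPPOE_OVERHEAD = 8   # bytes taken from every Ethernet frame by PPPoE
ETHERNET_MTU = 1500


def pppoe_mtu(link_mtu=ETHERNET_MTU):
    """MTU left for IP packets once PPPoE has taken its share."""
    return link_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP data size per packet for a given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def max_ping_payload(mtu):
    """Largest ping -s value that fits in one unfragmented packet."""
    return mtu - IP_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for label, mtu in (("plain Ethernet", ETHERNET_MTU), ("PPPoE", pppoe_mtu())):
        print(label, "MTU", mtu, "MSS", tcp_mss(mtu), "ping payload", max_ping_payload(mtu))
```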