Using PMTUD for a higher DNS responsiveness

June 4, 2013 by willem

Motiviation

In May 2011 we were notified (from a Japan based enthusiast) that our site wasn’t reachable over IPv6 unless the user lowered the MTU on his machine. This triggered interest in the “Path MTU Discovery black holes” problem [6]  and lead to a study [2] executed by Maikel de Boer and Jeffrey Bosma, two students from the University of Amsterdam (UvA), at NLnet Labs in June/July 2012.

During that study we learned that PMTU black-holes are especially problematic for stateless protocols, such as DNS over UDP. However, we also noticed that the IPv6 ICMP error messages (that realize the PMTUD) carry as much of the provoking packet as possible. We realised that for DNS, the state carried in the PMTUD ICMPv6 messages, might be enough for DNS servers to participate in PMTUD nevertheless.

To explore that potential we initiated another study executed by two UvA students, Hanieh Bagheri and Victor Boteanu, at NLnet Labs in January/February 2013 [1]. Below an overview of some of the considerations and steps taken in this study. We also created a proof of concept software program [5] that can extend any nameserver with PMTUD (on Linux) by listening and writing on a raw sockets on the same host as that nameserver.

Fragmentation, PMTUD and Packet-Too-Big (PTB)

IPv6 changed the way fragmentation is managed. With IPv4, fragmentation (and reassembly) of DNS-UDP packets was handled transparently by the network. With IPv6, only end-points may fragment. The size of the fragments (i.e. the smallest MTU of the links on the path between two end-points) should be detected by Path MTU Discovery (PMTUD).

PMTUD operates as follows: When a packet arrives on a link at a router somewhere on the internet, and the link to the next hop is too small for the packet to fit, the router returns a ICMPv6 Packet-Too-Big message to the sender of the packet. In the packet is, besides the MTU of the next/smaller link, as much of the causative packet as possible.

Name servers don’t do PMTUD

When an intermediate router returns an ICMPv6 Packet-Too-Big (PTB) message in response to an unfitting DNS answer, only on the OS layer of the PTB receiving name server the MTU for that destination (the resolver) is learned. The (stateless) name server does not resend the answer and the resolver has to requery.

A method has been proposed that tries to prevent PMTUD by always using the minimum MTU of 1280 [3]. However, this increases the likelihood of fragmented DNS answers. Earlier research has shown that +-10% of all end-points/resolvers discard IPv6 fragments [2][4].

A closer look at the headers in a Packet-Too-Big message

Router IPv6 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |           Flow Label                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Payload Length        |  Next Header  |   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                Source Address = Router      IPv6 address      /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/           Destination Address = Name server IPv6 address      /
ICMPv6 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Type = PTB   |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              MTU                              |
Name server IPv6 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |           Flow Label                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Payload Length        |  Next Header  |   Hop Limit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                Source Address = Name server IPv6 address      /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/           Destination Address = Requester   IPv6 address      /
Optional Fragment Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next = UDP   |   Reserved    |      Fragment Offset    |   |M|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/                        Identification                         /
UDP Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source port          |      Destination port         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |          Checksum             |
Quite a bit of the original DNS answer:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+Q+-+-+-+-+A+T+R+R+-+A+C+-+-+-+-+
|              ID               |R|Opcode |A|C|D|A|Z|D|D| RCODE |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            QDCOUNT            |           ANCOUNT             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            NSCOUNT            |           ARCOUNT             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/            Query                                              /

ICMPv6 PTB messages contain as much of the invoking packet as possible without the whole packet exceeding the minimum MTU (of 1280). In theory it should carry at least 1176 of the original DNS answer (1280 – 40 IPv6 header of router – 8 ICMPv6 header – 40 IPv6 header of nameserver – 8 possible fragmentation header – 8 UDP header = 1176). The DNS question is embedded in a DNS answer near the beginning in the question section. A PTB message is thus very likely to carry both the destination of the request as the request (the question) itself. Enough information to answer it again in smaller packets.

Oberservations and Approach

To deliver a name server software independent solution, we implemented PMTUD for DNS  in a process separate from the name server. The process has to run on the same host and employs raw sockets to listen and react to PTB messages.

Care should be taken when responding to PTB messages. With insufficient caution opportunities for cache poisoning and amplification attacks are easily opened. This is caused by the fact that a malicious attacker may construct a PTB message indistinguishable from a real PTB targeted at the name server.

A summary of approaches follow, each with a brief analysis of the consequences.

Consider simply setting the TC (truncated) bit and retransmit what is left of the answer. This is the most simple and semantically correct approach. However, it is very easy for a malicious attacker to forge a requester’s source address. An attacker does not even have to actually spoof the source address; It just pretends it is an router on the path between a (supposedly) requester and the name sever and creates a false “Name server IPv6” header and everything below it. By simply trusting and retransmitting the answer in a PTB message, we allow third parties to send out any DNS answer from our name server’s source address; and in doing so open a cache poisoning attack vector. (Although clients are unlikely to interpret a truncated message and will most likely rerequest over TCP).

DNS answers contain the query for which it is an answer (including query ID and flags). It may be extracted from the answer and reinjected (with raw sockets). However, EDNS0 information, like the for the requester acceptable message size and the DO (DNSSEC OK) bit, is almost certainly missing because it is situated at the end of the original message (which is torn off). We can assume that it was present in the original request though, because otherwise it would not have provoked a big response.

Setting the acceptable message size to a prevailing value, like 4096, might seem just, however this would enable a malicious attacker to provoke a bigger answer than the size of the forged PTB message and thereby open an very easy opportunity for amplification attacks. No actual source address spoofing required. Therefore, care must be taken that only answers smaller or equal than the PTB message size will be given. A reinjected request must restrict the size of the answer by setting the EDNS0 acceptable message size to the size of the PTB. This approach is implemented in the proof-of-concept program [5].

Tests and measurements
We have assessed the proof-of-concept program using RIPE Atlas; a global network of probes that measure Internet connectivity and reachability. Using RIPE Atlas we were able to send DNS queries to a name server at NLnet Labs from 863 different IPv6 vantage points, all from different networks at different locations in the world.

The name server at NLnet Labs served a zone with RR’s that, when queried directly over IPv6, would generate a precisely sized answers:

  • 1280.gorilla.nlnetlabs.nl TXT produces an 1280 bytes packet answer
  • 1500.gorilla.nlnetlabs.nl TXT produces
    • a 1500 bytes packet when not fragmenting to the minimum MTU, or when
    • fragmented to the minimum MTU,
      a fragment of 1280 bytes and one of 180 bytes
  • 1600.gorilla.nlnetlabs.nl TXT procudes
    • fragmented to the maximum MTU,
      a fragment of 1496 bytes and one of 160 bytes
    • fragmented to the minimum MTU,
      a fragment of 1280 bytes and one of 376 bytes.

Fragmenting to minimum MTU could be turned on or off in the name server as desired.

We identified for each probe how the network behaves with different types of messages. We started with a baseline measurement by querying for small answers (1280.gorilla.nlnetlabs.nl) to see which probes can reach our name server. 863 IPv6 probes successfully received the answer.

Then we identified probes with a PMTU smaller  than 1496 by answering the queries with large packets. We let the probes query for 1600.gorilla.nlnetlabs.nl TXT with “fragmenting to minimum MTU” on our name server turned off.
439 probes did not receive an answer. Those probes have a PMTU smaller than 1496.

Then we identified the probes having fragment filtering firewalls by sending a big answer in smaller packets/fragments. Again we let the probes query for 1600.gorilla.nlnetlabs.nl TXT, but this time with “fragmenting to minimum MTU” on our name server turned on.
68 probes did not receive an answer at all. Those probes are behind fragment filtering firewalls

Besides counting the answers received by probes, we also analysed traffic to the name server. One noticeable thing in those captures was that: 18 of the 68 fragment filtering probes,  sent an ICMPv6 “Administratively Prohibited” message in response to receiving a fragment. It appears some firewalls are friendly enough to report that they drop the message this way. Just like PTB, “Administratively Prohibited” also has most of the DNS answer in its payload. We have adapted our proof-of-concept program to respond to “Administratively Prohibited” too.

Finally we tested the proof-of-concept program by querying from the probes for 1500.gorilla.nlnetlabs.nl TXT with “fragmenting to minimum MTU” on our name server turned off. 23 probes still did not receive an answer at all. These must be the probes that are on a smaller PMTU for which no PTB messages are sent.  The remaining probes returned PTB only occasionally or returned a non-standard sized PTB too small to extract a query from.

In the table below an overview is given of the taken measurements and the answers that could be counted. Note that for each measurement, each probe queried the name server several times (with 20 minutes interval). So it is possible for a probe to count for receiving a good answer, and receiving no answer.

measurement msg size max. pkt size # probes # answered # no answer
baseline 1280 1280 863 863 (100.0%) 0 (0.0%)
PMTU < 1500 1600 1500 861 422 (49.0%) 441 (51.1%)
fragment filters 1600 1280 861 795 (92.3%) 84 (9.8%)

With proof-of-concept program running:

unfragmented 1500 1500 828 805 (97.2%) 35 (4.2%)

All measurements were performed in the period 25 May – 2 June 2013.

Conclusion

Current name servers do not respond to PTB messages. As a result clients with smaller PMTUs will not receive an answer from an initial query. They will still receive a fragmented answer when a second query is sent within the PMTU expiry timeout (10 minutes). Responding to PTB will serve the initial queries too and will also serve clients that query for big answers infrequently.

Also, on RIPE atlas, responding to PTB is gives better DNS responsiveness than avoiding PTB, because fragments are more frequently filtered than PTB messages are not sent. Of the probes that did not receive an initial answer, 64% will receive one when responding to PTBs. Also, when answers exist in the 1232-1452 size range, more queries will be answered without fragmenting, resulting in an even bigger responsiveness increase.

References

  1. H. Bagheri, V. Boteanu, ”Making do with what we’ve got: Using PMTUD for a higher DNS responsiveness” (February 2013)
    http://www.delaat.net/rp/2012-2013/p55/report.pdf
  2. M. de Boer, J. Bosma, “Discovering Path MTU black holes on the Internet using RIPE Atlas”, (July 2012)
    http://nlnetlabs.nl/downloads/publications/pmtu-black-holes-msc-thesis.pdf
  3. M. Andrews, “DNS and UDP fragmentation”, draft-andrews-dnsext-udp-fragmentation-01, (January 2012)
    http://tools.ietf.org/html/draft-andrews-dnsext-udp-fragmentation-01
  4. J. van den Broek, R. van Rijswijk, A. Pras, A. Sperotto, “DNSSEC and firewalls – Deployment problems and solutions”, Private Communication, Pending Publication, (2012)
    Results are presented here: https://ripe65.ripe.net/presentations/167-20120926_-_RIPE65_-_Amsterdam_-_DNSSEC_reco_draft.pdf
  5. The PMTUD for DNS activating Proof-Of-Concept script
    http://www.nlnetlabs.nl/downloads/pmtud4dns.py
  6. K. Lahley, “TCP Problems with Path MTU Discovery”, RFC2923 (September 2000)
    http://tools.ietf.org/html/rfc2923
  7. W. Toorop, “Using Path MTU Discovery (PMTUD) for a higher DNS responsiveness” presentation slides at the 5th CENTR R&D workshop, Amsterdam (June 2013).
    https://www.centr.org/system/files/agenda/attachment/rd5-toorop_using_path_mtu_discovery-20130604.pdf

 

Wed Sep 25 2013

© Stichting NLnet Labs

Science Park 400, 1098 XH Amsterdam, The Netherlands

labs@nlnetlabs.nl, subsidised by NLnet and SIDN.