RRL SLIP and Response Spoofing

September 16, 2013 by wouter

The recent disclosure by ANSSI (CVE-2013-5661) notes problems with RRL Slip and response spoofing. This document explains explains the tradeoffs. Other documents with advice:

Note that the security advise is about trade-offs between the vulnerability to reflective DoS versus the likelihood of individuals being cache poisoned and as such a generic operational DNS trade-off. There are no specific vulnerabilities in the NSD implementation; rather the vulnerability is caused by the network throttling dropping answers.

NSD has response rate limiting (RRL) implemented. This exists in NSD3 and NSD4, when configured with –enable-ratelimit. The rate limiting uses SLIP to send back truncated replies and drop other replies. The default slip rate is 2. The slip rate is randomized, and it is therefore difficult to predict exactly which response is going to be truncated and which response is going to be dropped.

When the zones served with NSD have DNSSEC signatures, it would be best to use the default slip rate of 2. Spoofing can be countered with DNSSEC validation of the signatures. And reflective DoS is countered with the RRL slip rate of 2. The slip rate of 2 causes reflective DoS attacks to lose half their bandwidth, and protects the target, while legitimate clients that are falsely identified as spoofing targets (false positives) experience delays in receiving answers.

When the zones that are loaded are not protected with DNSSEC, the choices are less optimal. The RRL slip rate of 2 solves reflection, but response spoofing, as the (ANSSI report) notes, is a problem. You can also choose an RRL slip rate of 1, which truncates every response, and the possibility to spoof responses as reported by ANSSI is removed. But with RRL slip 1 the server acts as a reflector for spoofed traffic. Albeit as a reflector that does not change the size of that traffic, so without amplification.

NLnet Labs recommends DNSSEC for DNS data protection, including detection of spoofing. We realize that operators of authoritative name servers may not be able to influence the operators of recursive name servers to turn on validation. Turning on DNSSEC on your zones allows the recursive name server operators to make their choice while a slip value of 2 decreases the attractiveness of the global DNS system as a DoS amplification tool.

NSD4 TCP Performance

July 8, 2013 by wouter

For NSD 4 the TCP performance was optimised, with different socket handling compared to NSD 3. This article discusses a TCP performance test for NSD 4. In previous blog contributions, general (UDP) performance was measured and memory usage was analysed for NSD 4.

The TCP performance was measured by taking the average qps reported by the dnstcpbench tool from the PowerDNS source distribution. (Thanks for a great tool!) The timeout was set to 100 msec. On FreeBSD the system sends connection resets when a TCP connection cannot be established, and in this situation the tool overreports the qps. To mitigate this the qps was scaled back by multiplying by the fraction of succeeded tcp queries. The scaled back qps is close to the median qps that is also reported by the tool. For Linux, such scaling was not performed, and the average and median are close together.

You can click to enlarge these charts:


The highest TCP queries per second performance on Linux is about 14k qps by Yadifa, and then followed by NSD 4. On FreeBSD performance is higher, about 16k qps, and NSD 4 is fastest, with Knot and then Yadifa following with about 14k qps. Notice how Bind performs at 12k qps on FreeBSD, and 8k qps on Linux. PowerDNS remains at about the same speed. NSD 4 has higher TCP qps than NSD 3, on Linux and on FreeBSD.

On FreeBSD, software that cannot handle the load produces connection errors. The number of connection errors goes down when more threads are used by NSD 3, and Knot. For other software the thread count does not really influence this connection error count, but it does increase the qps performance. For Yadifa the qps performance degrades substantially when more threads are used, and it has a large number of connection errors because of that.  In general the connection errors are caused by a lack of performance, and a TCP connection cannot be established. Linux apparently deals differently with this (turns it into timeouts), this may cause some qps reporting differences between the OSes. In both cases the charts represent the average successful TCP qps.

The same pattern as for UDP can be seen with the number of threads, for NSD 4, the best Linux performance uses 2 cpu, and performance increases better and higher on FreeBSD, but the optimum here is 3 cpu instead of 4 cpu for UDP. Other software similarly benefits from more CPU power. It turns out that the PowerDNS option to get extra distribution threads adds UDP workers and not TCP workers, this is why performance does not scale up in these charts for PowerDNS. Yadifa performance goes down for both Linux and FreeBSD when more threads are in use.

The zone served in these experiments is a synthetic root zone (as used in previous tests), with 1 million random queries for unsigned delegations. PowerDNS uses its zonefile backend. The same test system(s) as used in the previous measurements are used, a PowerEdge 1950 with 4 cores at 2 GHz is running the DNS server.

NSD4 High Memory Usage

July 5, 2013 by wouter

NSD 4 is currently in beta and we are expecting a release candidate soon. This is the second of a series of blog-posts in which we describe some findings that may help you to optimize your NSD4 installation. In the first article we talked about general performance, this article muses about memory usage. (This article is based on the forthcoming nsd-4.0.0b5)

NSD4 Memory usage

The memory intensive architectural trade-off between pre-compiling answers and high speed serving of packets has been part of the NSD design since its first incarnation almost a decade ago.

With NSD 4 we continued the pre-compilation philosophy.  It even seems that, compared to NSD 3, NSD 4 uses more memory. Why? How?


Memory is being consumed to achieve speed improvements, but also for improvements such that administrators can update the zones served without needing the restart that so prominently featured NSD 3; NSD 4 can update, add, and remove zones without a restart. NSD 4 can receive IXFR (incremental zone transfers) and apply them in a time that depends on the size of that transfer, independent of zone size. Besides, during an update the database as stored on disk (nsd.db) is updated. On incremental updates of NSEC3 signed zones the nsec3-precompiled answers are all updated as well. All these features, that improve usability and speed imply that disk usage and memory usage increased compared to NSD 3.


To compare the memory a Dell PowerEdge 1950 with 8 GB of RAM, a large HDD and 2 GHz Xeon CPU (the same machine used for the performance tests earlier) was used to load the .NL zone (the authoritative Dutch top-level-domain) from June 2013. This is a fairly large zone, its zonefile is about 1.5 GB. It has 5.3 million delegations. It is signed with DNSSEC, and uses NSEC3 (opt-out), and has about 28% signed delegations. This means, with the nsec3 domains for the signed delegations, it has 5.3 * 1.28 = 6.8 million domain names with associated resource records.

The figure below shows the memory use of the daemon. ‘Rss’ represents the resident memory used by the daemon after starting. ‘Rss other’ is measured by tracking the total system memory usage. ‘Compiler’ represent the memory used by a zone compiler (if the software has such) and added onto it. This assumes you run the zone compiler and the DNS server on the same machine. If swap space is used we add it separately. Finally, the virtual memory usage (‘vsz extra’) is also added onto the bar, that entry reflects the size of the memory-mapped I/O to the nsd.db for NSD 4. Note that the memory-mapped I/O does not need to reside in core-memory.


We configured our measurement machine with 8 GB of RAM and we observe that the NL zone barely fits with NSD 4 (16 GB would be a better and realistic configuration). Bind and Yadifa can easily serve the zone from 8 GB of core memory. The zone compiler of Knot runs into swap space because it becomes very big. NSD 4 causes swap space to be used for a different reason, it (barely) fits in the 8 GB (about 7 GB), but its heavy use of memory mapped I/O causes the Linux kernel to make space in RAM by swapping other stuff to disk. The 8 GB of RAM is insufficient, you can start the daemon, but provisioning for operations it is too tight for common tasks, such as reloading the zone from zonefile and processing a (large) AXFR. However, because of it’s new design, NSD 4 could actually work in this RAM if it handled only relatively small IXFR updates.

The NSD 4 usage is the main daemon, plus a very small xfrd (xfrd now uses less than it did in NSD 3). The main daemon uses more memory for an increase in speed and also for better NSEC3 zone update processing. The virtual memory is the memory mapped nsd.db file. The kernel uses its virtual memory cache mechanism to handle this I/O, and you can provision for less than the total nsd.db file (at the cost of update processing speed). Realistic provisioning for NSD 4 here is about 10% of the virtual space to 100% of the virtual space, somewhere between 9 GB – 17 GB. It would be wise to add another multiple of memory on this for large zone changes, which because NSD keeps serving the old zone while it is busy setting up the new version of the zone, uses about twice the memory for that zone, so add another 6-7 Gb (the rss) for this (AXFR, zonefile change).

The NSD 3 usage is the base daemon, plus a xfrd (the other process), plus zonec. For continued operations another (same sized) chunk should be added for nsdc update, that updates the zonefiles and cleans up the nsd.db. This causes NSD 3 to use more memory in its provision than is necessary to run NSD 4 with a low disk I/O provision. This is because NSD 4 does not have zonec and nsdc update, this has been folded into the main daemon, and is performed during reload tasks (the daemon keeps serving DNS), and is what causes the disk structures to be much larger.

NSD 4 comes with a tool that tells you estimates for the size of RAM and disk needed for a zone. For this zone it indicates that 6.9 GB is used for RAM and 11 GB is used for nsd.db. The tool estimates that about 8 GB to about 17 GB could be used to run the NL zone (with 10% – 100% of the nsd.db memory mapped). As an aside, you build the nsd-mem tool by ‘make nsd-mem’ in the source repository.

NSEC3, memory and performance

NSD4 uses precompiled NSEC3 answers. Without pre-compilation of NSEC3, providing answers that proof the non-existence of a query (NXDOMAIN proof) involve a number of hash-calculations that bog down the performance of the name server. Obviously this precompiled data takes memory but results in NSD 4 answering queries much faster, as it is not CPU-bound by the nsec3 hashing. The precompilation means hashing all the names in the zone, something that takes 60-80 seconds on our measurement machine for the NL zone. In NSD 4 to handle new zone updates quickly, it keeps administration to incrementally update its precompiled NSEC3 data. This means IXFR updates to NSEC3 zones are handled by hashing the names affected by the update and not the entire zone. Note that NSD 4 does not allocate NSEC3 memory for NSEC (non-NSEC3) and unsigned zones, and this could make it use less memory than NSD 3 for non-NSEC3 zones.

If the NL zone was signed with NSEC, with the same key sizes, then the zonefile file would become 2.7 GB for the 5.3 million delegations. The memory usage goes up because there is no opt-out, but goes down because there is no nsec3 administration. The nsd-mem tool calculates 6.0 GB RAM and 10.6 GB disk usage and estimates 7.8 – 16.6 GB. This is nearly identical to the NSEC3 case, slightly less on the RAM and disk usage. NSD 3 uses 4.5 GB (rss) + 4.5 GB (other proc) + about 4 GB (zonec). And omitting the nsdc update usage this is already 13 GB for NSD 3.

Starting the server

In NSD 4 a restart of the daemon should only be necessary for system reasons (kernel updates). With its nsd-control tool you can change the other configuration on the fly without a restart. NSD 3 needed to zonec and restart the daemon to serve a new zone and NSD 4 does not need to do so.

This shows the speed of starting the daemon:

read_speedFor NSD 4 you can compile a new zone without restart, and while serving the old zone. Its zone compiler also has to write the 11 GB nsd.db to disk, and this makes it slower than the NSD 3 zone compiler (it is the same parser). The Knot compiler is likely from before its recent Ragel updates that speed it up. The initial start for NSD 4 measures the time to read the NL zone from the 11 GB nsd.db, this would happen after a system restart for example.

The stop time for NSD 3 and 4 is 0, below one second. For the other daemons this is curiously slow. But these numbers are very small compared to the system start numbers.

Thus, if you get a fresh zonefile and want to start, you can use the left bar for NSD 4, add up the two bars for NSD 3, and add up the two bars for Knot. For a system restart, the daemon start value gives the time needed to setup the daemon memory.

Summary: From NSD 3 to NSD 4

If you are running NSD 3 today and you do not experience any memory issues, such as extensive swapping, during the full serving-updating-zone-compiling cycle you should not experience any problems migrating to NSD 4. This is mainly due to the fact that what a significant fraction of the memory use in NSD 4, is memory-mapped to disk and is not accessed for serving answers to DNS queries.

However, we do advice you to run the nsd-mem tool that ships with NSD 4 to test your actual requirements. That will give you an exact calculation of your core-needs.


The software tested is NSD 4.0.0b5, NSD 3.2.15, Bind 9.9.2-P1, Knot 1.2.0, and Yadifa 1.0.2-2337. The OS is Linux 3.9, the file system is ext4 on hdd.

NSD4 Performance Measurements

by wouter

NSD 4 is currently in beta and we are expecting a release candidate soon. This is the first of a series of blog-posts in which we describe some findings that may help you to optimize your NSD4 installation. The article also serves as an explanation for differences that may show up in various benchmarks.

NSD4 Optimisation

The NSD4 code has been optimised: The latest beta(4.0.0b5) has a couple of optimizations (and beta bug fixes).  We tested the results of our efforts on NLnet Labs’ DISTEL testlab and performed a number of speed measurements.  Several other common open source nameservers are also tested.

A quick view on the results, the figures below show the query load in kqps (1000 of queries per second) for which the different nameserver implementations still manage to answer 100% of all queries. Higher queries rates lead to packets being dropped. Some servers had 99.9% responses on lower qps but then recovered to 100% queries answered at higher query rate. This may be the result of test measurement instability and was ignored.


Similar results as reported by the Knot and Yadifa teams are found, but delving deeper into the performance measurements reveals some subtleties in behavior that bias the results. Knot and Yadifa show very similar or better performance than NSD when they are configured on Linux based servers to use exactly 4 out of 4 cpus. Of course, this is strange and we searched further to look at what caused these outcomes, and it turns out to be related to the number of threads (and processes) and the choice of operating system.

NSD3 has about the same performance as Knot (on Linux). Yadifa is a little faster than Knot. NSD4 is faster than NSD3, and with optimizations in implemented in beta5 even more so.

Knot and Yadifa use a threaded model, where they have threads in one single process that service the DNS requests.  Bind can also be compiled with thread support, which was done here for comparison. It seems that Bind can scale up its performance in both Linux and FreeBSD with more threads, up to 3x more performance. NSD is different to the other implementations in that it uses processes that service the DNS requests instead of threads. This is where operating systems differences start to matter. Operating systems differ in their  threads and processes implementations and in their implementations of the network code.

FreeBSD can increase its packet output when the number of threads is increased, and it can also increase its packet output when the number of processes is increased. However, Linux treats threads and processes very differently, and in both cases, using a number of workers equal to the number of CPU cores is not optimal. The ksoftirqd Linux irq (interrupt) handler uses up the remainder of the four CPU cores when the server uses less than all four cores, likely the implementation of irq handling is a source of measured differences. Interrupts are caused by incoming packets and handling the interrupts from the network card under high load needs a lot of processing power. FreeBSD can push out more packets on the same hardware configuration, with the same software.

The optimal choice of the number of CPU cores to devote to DNS processing depends on the software. On FreeBSD, use as many cores as installed in the system. On Linux, use less than the total number of cores, 2 out of 4 cores for NSD. And 3 out of 4 cores for Yadifa. On Linux, Bind and Knot benefit from using 4 out of 4 cores.

Our Measurements: DISTEL Test Setup

Measurements were carried out using a modified DISTEL testlab setup. The configuration of the DISTEL testlab is with a number of (mostly identical) machines. There is a Player, a Server and a number of replay machines.


The player controls the action, this is scripted. The control is performed over ssh over the control network. The player starts the server software on the server machine, listening on the private LAN. A set of queries is replayed from the replay machines. The resulting replies are captured with tcpdump. Because the PowerEdge 1950 on the server is capable of replying with up to 140.000-160.000 qps on Linux and 220.000 qps on FreeBSD, multiple replay machines are necessary to send and record traffic to the server. Each replay machine sends 1/5 of the query traffic. This adds up to the total qps for the server machine. Test code on the replay machines measures if the actual sent packets correspond with the intended query rate (with timers). The test is run for a fixed time period, so that faster query rates still take the same time period. The maximum qps for this setup is around 430-440 kqps, but the measurements are up to 400kqps. Instability of the outcome seems to increase a little for the higher speeds (esp. for speeds above 350k which cause trouble for some (weaker) replay machines). In any case, the instability is several percent of the response rate percentage.

The detailed graphs for Linux 3.9 (click to enlarge):


The detailed graphs for FreeBSD 9.1 (click to enlarge):


The bar graphs at the beginning of this post are based on these detailed plots, analyzing where 100% responses occur.

The software tested is BIND 9.9.2-P1, NSD 3.2.15, NSD 4.0.0b4, NSD 4.0.0b5, Knot-1.2.0 and Yadifa 1.0.2-2337. The server hardware is a Dell PowerEdge 1950, 2 x 64-bit Intel Xeon CPU 2.0 GHz, thus 4 cores in total, 4 MB cache, 1333 MHz FSB. The Ethernet is the on-board Broadcom NetXtreme II BCM5708 1000Base-T interface. Settings are left at their defaults, if possible. The zone that is loaded is an artificial (test) root zone that contains around 500 delegations, it is not signed with DNSSEC. This zone is an old zone, created before the root was signed (and it is the same zone as previously used for measurements). The order of the queries is random and there are no queries that result in nxdomains.

Future work

In preparation for the release of NSD4 we are measuring the behavior for larger zones in terms of performance and memory usage.

Further Reading:


Using PMTUD for a higher DNS responsiveness

June 4, 2013 by willem


In May 2011 we were notified (from a Japan based enthusiast) that our site wasn’t reachable over IPv6 unless the user lowered the MTU on his machine. This triggered interest in the “Path MTU Discovery black holes” problem [6]  and lead to a study [2] executed by Maikel de Boer and Jeffrey Bosma, two students from the University of Amsterdam (UvA), at NLnet Labs in June/July 2012.

During that study we learned that PMTU black-holes are especially problematic for stateless protocols, such as DNS over UDP. However, we also noticed that the IPv6 ICMP error messages (that realize the PMTUD) carry as much of the provoking packet as possible. We realised that for DNS, the state carried in the PMTUD ICMPv6 messages, might be enough for DNS servers to participate in PMTUD nevertheless.

To explore that potential we initiated another study executed by two UvA students, Hanieh Bagheri and Victor Boteanu, at NLnet Labs in January/February 2013 [1]. Below an overview of some of the considerations and steps taken in this study. We also created a proof of concept software program [5] that can extend any nameserver with PMTUD (on Linux) by listening and writing on a raw sockets on the same host as that nameserver.

Fragmentation, PMTUD and Packet-Too-Big (PTB)

IPv6 changed the way fragmentation is managed. With IPv4, fragmentation (and reassembly) of DNS-UDP packets was handled transparently by the network. With IPv6, only end-points may fragment. The size of the fragments (i.e. the smallest MTU of the links on the path between two end-points) should be detected by Path MTU Discovery (PMTUD).

PMTUD operates as follows: When a packet arrives on a link at a router somewhere on the internet, and the link to the next hop is too small for the packet to fit, the router returns a ICMPv6 Packet-Too-Big message to the sender of the packet. In the packet is, besides the MTU of the next/smaller link, as much of the causative packet as possible.

Name servers don’t do PMTUD

When an intermediate router returns an ICMPv6 Packet-Too-Big (PTB) message in response to an unfitting DNS answer, only on the OS layer of the PTB receiving name server the MTU for that destination (the resolver) is learned. The (stateless) name server does not resend the answer and the resolver has to requery.

A method has been proposed that tries to prevent PMTUD by always using the minimum MTU of 1280 [3]. However, this increases the likelihood of fragmented DNS answers. Earlier research has shown that +-10% of all end-points/resolvers discard IPv6 fragments [2][4].

A closer look at the headers in a Packet-Too-Big message

Router IPv6 Header:
|Version| Traffic Class |           Flow Label                  |
|         Payload Length        |  Next Header  |   Hop Limit   |
/                Source Address = Router      IPv6 address      /
/           Destination Address = Name server IPv6 address      /
ICMPv6 Header:
|  Type = PTB   |     Code      |          Checksum             |
|                              MTU                              |
Name server IPv6 Header:
|Version| Traffic Class |           Flow Label                  |
|         Payload Length        |  Next Header  |   Hop Limit   |
/                Source Address = Name server IPv6 address      /
/           Destination Address = Requester   IPv6 address      /
Optional Fragment Header:
|  Next = UDP   |   Reserved    |      Fragment Offset    |   |M|
/                        Identification                         /
UDP Header:
|          Source port          |      Destination port         |
|            Length             |          Checksum             |
Quite a bit of the original DNS answer:
|              ID               |R|Opcode |A|C|D|A|Z|D|D| RCODE |
|            QDCOUNT            |           ANCOUNT             |
|            NSCOUNT            |           ARCOUNT             |
/            Query                                              /

ICMPv6 PTB messages contain as much of the invoking packet as possible without the whole packet exceeding the minimum MTU (of 1280). In theory it should carry at least 1176 of the original DNS answer (1280 – 40 IPv6 header of router – 8 ICMPv6 header – 40 IPv6 header of nameserver – 8 possible fragmentation header – 8 UDP header = 1176). The DNS question is embedded in a DNS answer near the beginning in the question section. A PTB message is thus very likely to carry both the destination of the request as the request (the question) itself. Enough information to answer it again in smaller packets.

Oberservations and Approach

To deliver a name server software independent solution, we implemented PMTUD for DNS  in a process separate from the name server. The process has to run on the same host and employs raw sockets to listen and react to PTB messages.

Care should be taken when responding to PTB messages. With insufficient caution opportunities for cache poisoning and amplification attacks are easily opened. This is caused by the fact that a malicious attacker may construct a PTB message indistinguishable from a real PTB targeted at the name server.

A summary of approaches follow, each with a brief analysis of the consequences.

Consider simply setting the TC (truncated) bit and retransmit what is left of the answer. This is the most simple and semantically correct approach. However, it is very easy for a malicious attacker to forge a requester’s source address. An attacker does not even have to actually spoof the source address; It just pretends it is an router on the path between a (supposedly) requester and the name sever and creates a false “Name server IPv6” header and everything below it. By simply trusting and retransmitting the answer in a PTB message, we allow third parties to send out any DNS answer from our name server’s source address; and in doing so open a cache poisoning attack vector. (Although clients are unlikely to interpret a truncated message and will most likely rerequest over TCP).

DNS answers contain the query for which it is an answer (including query ID and flags). It may be extracted from the answer and reinjected (with raw sockets). However, EDNS0 information, like the for the requester acceptable message size and the DO (DNSSEC OK) bit, is almost certainly missing because it is situated at the end of the original message (which is torn off). We can assume that it was present in the original request though, because otherwise it would not have provoked a big response.

Setting the acceptable message size to a prevailing value, like 4096, might seem just, however this would enable a malicious attacker to provoke a bigger answer than the size of the forged PTB message and thereby open an very easy opportunity for amplification attacks. No actual source address spoofing required. Therefore, care must be taken that only answers smaller or equal than the PTB message size will be given. A reinjected request must restrict the size of the answer by setting the EDNS0 acceptable message size to the size of the PTB. This approach is implemented in the proof-of-concept program [5].

Tests and measurements
We have assessed the proof-of-concept program using RIPE Atlas; a global network of probes that measure Internet connectivity and reachability. Using RIPE Atlas we were able to send DNS queries to a name server at NLnet Labs from 863 different IPv6 vantage points, all from different networks at different locations in the world.

The name server at NLnet Labs served a zone with RR’s that, when queried directly over IPv6, would generate a precisely sized answers:

  • 1280.gorilla.nlnetlabs.nl TXT produces an 1280 bytes packet answer
  • 1500.gorilla.nlnetlabs.nl TXT produces
    • a 1500 bytes packet when not fragmenting to the minimum MTU, or when
    • fragmented to the minimum MTU,
      a fragment of 1280 bytes and one of 180 bytes
  • 1600.gorilla.nlnetlabs.nl TXT procudes
    • fragmented to the maximum MTU,
      a fragment of 1496 bytes and one of 160 bytes
    • fragmented to the minimum MTU,
      a fragment of 1280 bytes and one of 376 bytes.

Fragmenting to minimum MTU could be turned on or off in the name server as desired.

We identified for each probe how the network behaves with different types of messages. We started with a baseline measurement by querying for small answers (1280.gorilla.nlnetlabs.nl) to see which probes can reach our name server. 863 IPv6 probes successfully received the answer.

Then we identified probes with a PMTU smaller  than 1496 by answering the queries with large packets. We let the probes query for 1600.gorilla.nlnetlabs.nl TXT with “fragmenting to minimum MTU” on our name server turned off.
439 probes did not receive an answer. Those probes have a PMTU smaller than 1496.

Then we identified the probes having fragment filtering firewalls by sending a big answer in smaller packets/fragments. Again we let the probes query for 1600.gorilla.nlnetlabs.nl TXT, but this time with “fragmenting to minimum MTU” on our name server turned on.
68 probes did not receive an answer at all. Those probes are behind fragment filtering firewalls

Besides counting the answers received by probes, we also analysed traffic to the name server. One noticeable thing in those captures was that: 18 of the 68 fragment filtering probes,  sent an ICMPv6 “Administratively Prohibited” message in response to receiving a fragment. It appears some firewalls are friendly enough to report that they drop the message this way. Just like PTB, “Administratively Prohibited” also has most of the DNS answer in its payload. We have adapted our proof-of-concept program to respond to “Administratively Prohibited” too.

Finally we tested the proof-of-concept program by querying from the probes for 1500.gorilla.nlnetlabs.nl TXT with “fragmenting to minimum MTU” on our name server turned off. 23 probes still did not receive an answer at all. These must be the probes that are on a smaller PMTU for which no PTB messages are sent.  The remaining probes returned PTB only occasionally or returned a non-standard sized PTB too small to extract a query from.

In the table below an overview is given of the taken measurements and the answers that could be counted. Note that for each measurement, each probe queried the name server several times (with 20 minutes interval). So it is possible for a probe to count for receiving a good answer, and receiving no answer.

measurement msg size max. pkt size # probes # answered # no answer
baseline 1280 1280 863 863 (100.0%) 0 (0.0%)
PMTU < 1500 1600 1500 861 422 (49.0%) 441 (51.1%)
fragment filters 1600 1280 861 795 (92.3%) 84 (9.8%)

With proof-of-concept program running:

unfragmented 1500 1500 828 805 (97.2%) 35 (4.2%)

All measurements were performed in the period 25 May – 2 June 2013.


Current name servers do not respond to PTB messages. As a result clients with smaller PMTUs will not receive an answer from an initial query. They will still receive a fragmented answer when a second query is sent within the PMTU expiry timeout (10 minutes). Responding to PTB will serve the initial queries too and will also serve clients that query for big answers infrequently.

Also, on RIPE atlas, responding to PTB is gives better DNS responsiveness than avoiding PTB, because fragments are more frequently filtered than PTB messages are not sent. Of the probes that did not receive an initial answer, 64% will receive one when responding to PTBs. Also, when answers exist in the 1232-1452 size range, more queries will be answered without fragmenting, resulting in an even bigger responsiveness increase.


  1. H. Bagheri, V. Boteanu, ”Making do with what we’ve got: Using PMTUD for a higher DNS responsiveness” (February 2013)
  2. M. de Boer, J. Bosma, “Discovering Path MTU black holes on the Internet using RIPE Atlas”, (July 2012)
  3. M. Andrews, “DNS and UDP fragmentation”, draft-andrews-dnsext-udp-fragmentation-01, (January 2012)
  4. J. van den Broek, R. van Rijswijk, A. Pras, A. Sperotto, “DNSSEC and firewalls – Deployment problems and solutions”, Private Communication, Pending Publication, (2012)
    Results are presented here: https://ripe65.ripe.net/presentations/167-20120926_-_RIPE65_-_Amsterdam_-_DNSSEC_reco_draft.pdf
  5. The PMTUD for DNS activating Proof-Of-Concept script
  6. K. Lahley, “TCP Problems with Path MTU Discovery”, RFC2923 (September 2000)
  7. W. Toorop, “Using Path MTU Discovery (PMTUD) for a higher DNS responsiveness” presentation slides at the 5th CENTR R&D workshop, Amsterdam (June 2013).


Open Recursor Blocked

April 19, 2013 by wouter

We have blocked an open recursive DNS nameserver running at NLnet Labs. This was due to abuse traffic, reflected traffic.

Two different types of abuse traffic were pointed at this server:

  • Queries of type ANY for large DNSSEC data. Sporadic bursts of about 3-5 qps, to one or two target IPv4 addresses at the same time.
  • Queries for NXDOMAIN responses, sporadic bursts of fairly low qps, with different query names for every query.

This is a low traffic volume. For a sizable Denial-Of-Service stream many more recursive resolvers must have been sent such query streams. The second type also has different query names for every query, which together with the low traffic volume would bypass RRL rate limiting.

A sample of the traffic with different query names:

Apr 10 23:08:12 : 192.x.x.x kelfmdaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:12 : 192.x.x.x fcbajpaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x iediclaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x pfkgckaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x fjcjbdaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x dcdefaaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x eemcblaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x ocadmmaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x gblhefaaaaerv0000diaaaaaaafaejam. A IN
Apr 10 23:08:13 : 192.x.x.x mmhjaaaaaaerv0000diaaaaaaafaejam. A IN

At NLnet Labs we host this open resolver for use by the dnssec-trigger project. It is used as one of the last fallback strategies for the retrieval of DNSSEC signed DNS data. The legitimate traffic is very low to this resolver, perhaps 1 qps. Dnssec-trigger is a project that helps enable DNSSEC validation on laptop and desktop computers. Dnssec-trigger probes the environment and selects a method to retrieve DNSSEC data, if possible it uses local methods, such as the DHCP supplied DNS resolver, or contacting the DNS servers on the internet over UDP. Therefore it does not contact the open resolver unless there is an alternative.

We have blocked UDP traffic for port 53 towards this open resolver. Other ports are left open so that the resolver can use UDP on other ports to retrieve DNS data from the internet itself. Dnssec-trigger uses the TCP and SSL ports for contacting this server, because if UDP works at some internet location, then dnssec-trigger uses that to contact different DNS servers.

The firewall rule that we enabled to block UDP traffic to port 53 on the open resolver:

broer = "{, 2001:7b8:206:1:bb:: }"
block in quick on $ext_if proto { udp } from any to $broer port 53 no state

The open resolver cannot be contacted for queries on UDP, but remains usable on TCP, and on ports 80 and 443. This functionality is used by the dnssec-trigger project, so the resolver remains usable for DNSSEC deployment.


NSD 4 migration and features

December 18, 2012 by wouter

This post describes migration to NSD4 and the new features of NSD4. An overview of the NSD 4 project is here.


The old NSD3 config file can be used without changes for NSD4. There are new config statements and some old statements are gone (and ignored by NSD4).

The nsd.db file has a new format that allows read and write. Thus the nsd.db file needs to be re-created in NSD4 format. This takes place when you start NSD4 the first time; it creates the nsd.db file. NSD4 needs write permission on the nsd.db directory for that. If you need to rollback to NSD3, run the old zonec to recreate the NSD3 nsd.db file (use nsdc rebuild).

The cron job for nsdc patch is no longer needed, because nsd.db is updated on-the-fly. It can be removed.

If you favor cron jobs, you can have a cron job that does “nsd-control write”. This would periodically write the contents of changed zones to their zonefile.

nsdc is removed, reload with kill -HUP $pid or use nsd-control reload. The SIGHUP makes NSD4 check zone file timestamps and reload changed zones. nsd-control reload is the same. SIGTERM stops NSD.

You probably want to install and enable some of the new NSD 4 features, such as set-up nsd-control and statistics. And you may want to use the new pattern config options.

Removed config options

difffile: ixfr.db is gone. This setting is no longer applicable, because the ixfr.db file is no longer used. Files are created in /tmp (configurable) now. The value is ignored by NSD4 if given in nsd.conf.

New config options

zonelistfile: zone.list. This file contains a plain text listing of the dynamically added zones and their pattern. It is read and written by NSD while it is running.

xfrdir: /tmp. This directory is used to store temporary zone transfer files. They are stored in a unique subdirectory that has few access permissions.

tcp-count: 100. This option already exists in NSD3, but in NSD4 you can increase it above 1024, like 2048, to have higher TCP capacity.

remote-control: this is a new section in the config file that configures the nsd-control remote control utility. It is very similar to unbound’s remote control configuration. With control-enable: yes you can enable it, it is disabled by default. It is bound to the loopback interface by default. See the manpage or sample config for the list of options, it is possible to set the port number and keyfile paths, and configure it to be accessible from the outside.

pattern: these allow you to bundle a set of zone config statements. Then for a zone you can include-pattern: “nameofpattern” to apply those config statements. patterns can also include other patterns. This is needed to allow the user to specify the config  statement pattern for a newly added zone. But you can also use it to organise the configuration.

zone: These already exist in NSD3 and work similarly. For NSD4, they create a zone that cannot be dynamically removed because it is hardcoded in the nsd.conf file. Zones that are dynamically added can also be dynamically removed, but those zones are in the  zones.list file. The zone can have the normal zone config statements, and it can also use include-pattern to apply config statements from a pattern to it.

The nsd-control Utility

You can control the NSD4 daemon with signals, SIGHUP, SIGUSR1 and SIGTERM, if you want. It reloads on SIGHUP and this includes parsing and loading changed zone files. More commands are available via the nsd-control utility. It connects over SSL with the daemon and sends the command to it, and prints the result.

To enable nsd-control you have to create the private and public keys with nsd-control-setup, run this setup script as root. Then edit nsd.conf and set remote-control: control-enable: yes in the config file. Then you should be able to use nsd-control; the nsd-control status command is a simple check if everything works.

reload [zone] : without a zone name it checks if zone files have changed, if so, loads them. If you specify the zone name (nsd-control reload example.com) it will load that zone.

repattern : this rereads the nsd.conf file without a restart. Only the zone configuration and ratelimits are updated from it. Other settings, file paths, chroot location, interfaces, and port numbers, cannot be applied and need a restart. During the restart NSD will have the permissions to bind port 53 and chroot again.

log_reopen : also done on SIGHUP, but this controls more exactly that only the logfile is reopened.

stats and stats_noreset : print statistics.

addzone name pattern : adds a new zone to the running server. If it has a zonefile this file is read in and served. If it is a slave zone, a zone transfer is attempted.

delzone name : removes zone.

write [zone] : write a zone contents from nsd.db to its zonefile in text format. writes all changed zones, but if you specify a particular zone, it writes that zone only.

notify [zone] : for master zones, send notifies to its slaves. If you specify a name, only that zone, otherwise all master zones.

transfer [zone] : for slave zones, attempt a zone transfer from the masters. If you specify a name, only that zone, otherwise all slave zones.

force_transfer [zone] : same as transfer but uses full zone transfer with AXFR and does not perform a serial number check.


With nsd-control you can get a list of statistics from NSD on demand. This makes it easier to integrate NSD into a statistics collection system. You have to compile with –enable-bind8-stats for this. In source/contrib/nsd_munin_ is an example munin plugin.

Other features

  • Performance increase.
  • Support a high zone count.
  • Faster zone transfers.
  • Add and remove zones without a restart.
  • Can reread zone configuration from config file without a restart.
  • Higher TCP service levels, more sockets.
  • Detect which zone files have changed.
  • Stores nsec3prehash on disk, and calculates incrementally after IXFR.
  • Domain tree does not have the small leak of domain nodes.

More documentation

The nsd(8) man page, the nsd.conf(5) man page, the nsd-control(8) man page.

The man pages are installed when you install the beta.

NSD 4.0 Beta: NSD4 sees the light..

by olaf

EDIT (10 jan 2013): added beta2 link.

We are proud to announce a beta version of NSD4.0.

With this beta release NSD4.0 is feature complete. Earlier we described our high-level plans with NSD4;  below we describe the features that are available in NSD4.0 and some of what we have on the TODO list for 4.1 and 4.2; while in another post the differences in configuration and migration methodology is described.

NSD 4.0 highlights

Stable: DNS Protocol Logic

The implementation of the DNS protocol logic itself has not been touched: You may expect the same query-answer behavior as with NSD3.

Internals: Radix Tree

We have modified the internal data structure to use a radix tree. We have not done (nor are we aware of) serious benchmarks, but early reports of operators that have been doing rudimentary tests indicate that the performance of NSD4 has significantly (2 digits percentage) improved over NSD3.

Response Rate Limiting (RRL)

Inspired and based on the work by Vixie and Schryver we have implemented DNS Response Rate Limiting. The NSD RRL implementation includes most of the features as described in the Vixie and Schryver paper. The implementation is done independently from scratch (following the biological diversity requirement that is at the basis of NSD) and in some details we have made different choices in order to fit to the NSD internal architecture. More details here.

Support to operate in ‘Many Zones’ environments

We have introduced a number of changes that allows better integration of environments that want to use NSD in situations where many zones are served. This is achieved through the integration of the zone compiler as an NSD process, by adding patterns that allow the specification of common configuration options among zones, such as the location of a primary server. Using the nsd-control program, zones can be added and tied to these patterns during run-time. There is no need to stop the server, recompile, and reload in order to add new zones.

In addition, we have improved the performance of the XFR transfer code by two orders of magnitude, making the operation of NSD4 in a highly dynamic environment more effective.

We believe that this makes NSD4 suitable as a secondary server for hosting and similar environments.

NSD 4.x

We continue to work on NSD4.1 and NSD4.2.

For these versions we plan to:

  • Review logging and statistics.
  • Review incremental updates.
  • Review the specifics of the response rate limiting implementation. On this issue we are looking for community feedback, mostly through the rate limits mailing list that was specifically created for discussing DNS rate limiting.



We would like to stress that this is a beta release and we are still going through final documentation and tests. That said, we welcome reports from folk that are running their code in their test or production environments. If you want to participate in beta tests, at your own risk off course, the latest version of the code is available from

sha1 ad899c3795ca5311a1fea0d38f61026338a5ff60.

There is a new beta2 version:

sha1 e093d1519bf2e3f3c458ccf41aec45dce6a84a84
sha256 966bd0a7cdc29654df6579904d6833abfcd913428d68801f49853db7867e86a5

ISOC.NL nominatie.

November 1, 2012 by olaf

This posting contains the text that we send to the jury after our nomination for an award from the Dutch ISOC chapter in the category Security and Privacy. Try google-translate for an awful translation of this text.

Over NLnet Labs

Stichting NLnet Labs is een algemeen nut betogende instelling die zich bezig houdt met de ontwikkeling van Open Source Software en Open Standaarden ten behoeve van het Open Internet. We concentreren ons op de technologieën die nodig zijn om de openheid het Internet te behouden; technologieën die het ‘netwerk van netwerken’ efficiënt en veilig laten draaien; technologieën die een groter algemeen belang dienen.

Sinds onze oprichting in 1999 hebben we ons bezig gehouden met de ontwikkeling van DNSSEC. DNSSEC is technologie die een belangrijke component van het Internet beveiligd en waarop nieuwe veiligheidsmechanismen op kunnen worden gebaseerd. Een voorbeeld van een dergelijke innovatie is ‘DANE’, op DNS gebaseerde certificaat signalering die de barrière voor incidenten als de Diginotar-affaire een stuk hoger legt [zie recente presentatie].

Over dnssec-trigger

dnssec-trigger is software waarmee we proberen een van de problemen van DNSSEC roll-out  weg te nemen: hoe breng je de beveiliging die DNSSEC biedt uiteindelijk naar de eindgebruiker. Dnssec-trigger overbrugt dus de zogenaamde last-mile bij het brengen van DNSSEC naar de eindgebruiker en is daarmee de eerste en vooralsnog enige tool die dit OS-breed en applicatie onafhankelijk levert.

dnssec-trigger integreert Unbound, onze high-performance recursieve nameserver, met automatische configuratie die voor de gebruiker uitzoekt of DNSSEC in het netwerk ook echt te gebruiken is. dnssec-trigger doet maximale inspanning om de eindgebruiker DNSSEC te leveren: De software houdt rekening met nomadische gebruikers die verschillende netwerken met verschillende eigenschappen gebruiken – en bijvoorbeeld tegen inlogschermen (een zg. ‘captive portal’) in een hotelnetwerk aanlopen.

dnssec-trigger past in onze filosofie om kwalitatief hoogwaardige toepassingen te leveren waarmee praktische drempels met betrekking tot de uitrol van belangrijke globale verbeteringen van de Internet infrastructuur worden weggenomen. Met het leveren van dit soort software doen we waardevolle ervaring op die we terugvoeren in het IETF-standaardisatieproces en delen met de (globale) internet gemeenschap.

De dnssec-trigger software vind je op http://www.nlnetlabs.nl/projects/dnssec-trigger/ of in je favoriete software repository.

Over de Nominatie

NLnet Labs houdt zich bezig met de ontwikkeling en uitrol van technologie die misschien niet direct het predicaat sexy heeft, maar die wel een essentiële bijdrage levert aan het open en veilig houden van het Internet. Ons werk is 100% van private donaties afhankelijk. We zijn dus blij met de aandacht die de nominatie voor dnssec-trigger ons oplevert.

DNS Response Rate Limiting as implemented in NSD.

October 11, 2012 by wouter

(Note 10 Oct 2012: Rate limiting is worked on at this time, and is being tested, it is not available in NSD production code yet).

(Update 10 Dec 2012 : changed title to indicate it is based on Vixie and Schryver’s work)

Rate Limits

Rate limiting is seen as a way to combat reflection attacks, where spoofed UDP packets are bounced off DNS servers to attack a target, and rate limits stop these high volume query streams. BCP38 (source IP checks on originating networks, bcp38) is the real fix, but has not seen sufficient deployment. We would not want to complicate our software needlessly; the rate limiting must not become a target itself. But the knowledge that the DNS server has, makes rate limiting easier to implement. We chose an implementation that should be easy to separate from the DNS server, if desired. Our approach is based on RRL by Vixie and Schryver (RRL) and Tony Finch’s analysis (fanf).

The RRL (response rate limiting) is implemented in the NSD server, for convenience. This makes server operations easier and implementation faster, and it saves time classifying the response. This code is enabled with the –enable-ratelimit option for configure. We plan to implement RRL for NSD4 and NSD3. There is a whitelist option available for allowing higher traffic for some parts.

The approach is meant to be different from others, for code diversity goals, thus we do not use the code from Vixie and Schryver, but we use similar false positive prevention mechanisms. The NSD RRL code uses a fixed-size hashtable, and smoothed averaged qps rates per bucket. If two different source netblocks (/24 ip4, /64 ip6) hash to the same bucket, it is reinitialized, causing a potential false positive to turn into a false negative. Additionally, the SLIP mechanism from Vixie and Schryver is used to give TCP fallback in case of a false positive.


Wed Sep 25 2013

© Stichting NLnet Labs

Science Park 400, 1098 XH Amsterdam, The Netherlands

labs@nlnetlabs.nl, subsidised by NLnet and SIDN.