Posted: Tue Sep 09, 2008 11:39 am Post subject: 1 of n Name Servers DOWN = No Problem?
We deal with many customers and see all kinds of things that can bring a website DOWN. A common misunderstanding is that a single DOWN name server causes no problems, because the other name servers can still be queried. Although DNS is designed to fail over automatically, in real life some name resolution requests will still fail while one name server is DOWN. We have explained this to our customers many times, so it is worth writing up for everyone's benefit.
Suppose you have a domain name (yourdomain.com) with DNS hosted on 2 name servers (ns1.dnshost.com and ns2.dnshost.com). Assume that ns1.dnshost.com is currently DOWN.
This is what happens when you try to resolve yourdomain.com. Your PC asks your ISP's name server for the IP address of yourdomain.com (i.e. your OS resolver -> your ISP's BIND). To resolve yourdomain.com, your ISP's name server queries one of the domain's name servers, chosen more or less at random (ns1.dnshost.com OR ns2.dnshost.com). If that name server is DOWN, the ISP's name server has to wait for a timeout and then retry against the other name servers. This can take long enough that your OS's resolver times out, so yourdomain.com goes unresolved for you. Since 1 of 2 name servers is DOWN in this case, roughly 50% of lookups start at the dead server and risk failing. Even with 10 name servers, as long as one of them is DOWN, some of your users will feel it (although a smaller share, about 10%).
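The 50% and 10% figures can be reproduced with a few lines of Python. This is only a rough sketch: it assumes the resolver picks its first name server uniformly at random (real resolvers may prefer servers that answered quickly in the past), and the function name is made up for illustration.

```python
from fractions import Fraction

def first_pick_down_probability(total_servers: int, down_servers: int = 1) -> Fraction:
    """Chance that a uniformly random first pick lands on a DOWN name server."""
    if total_servers <= 0 or not 0 <= down_servers <= total_servers:
        raise ValueError("need 0 <= down_servers <= total_servers and total_servers > 0")
    return Fraction(down_servers, total_servers)

if __name__ == "__main__":
    for n in (2, 10):
        p = first_pick_down_probability(n)
        print(f"{n} name servers, 1 DOWN: {float(p):.0%} of lookups start at the dead server")
```

Adding more name servers shrinks the affected share but never brings it to zero while any one of them stays DOWN.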
An excerpt from RFC2182:
|First, the only way the resolvers can determine that these addresses are, in fact, unreachable, is to try them. They then need to wait on a lack of response timeout (or occasionally an ICMP error response) to know that the address cannot be used. Further, even that is generally indistinguishable from a simple packet loss, so the sequence must be repeated, several times, to give any real evidence of an unreachable server. All of this probing and timeout may take sufficiently long that the original client program or user will decide that no answer is available, leading to an apparent failure of the zone.
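The timing argument in the excerpt can be illustrated with a toy model in Python. All the numbers here are assumptions chosen for illustration (a 2 second timeout per attempt, 3 attempts per server, a 5 second client timeout, a 50 ms round trip); actual values depend on the resolver and the operating system, and the function names are hypothetical.

```python
from typing import Optional, Sequence

def time_to_answer(servers_up: Sequence[bool],
                   per_try_timeout: float = 2.0,   # assumed seconds per attempt
                   tries_per_server: int = 3,      # assumed attempts before giving up on a server
                   rtt: float = 0.05) -> Optional[float]:
    """Seconds until the ISP's resolver gets an answer, trying servers in the
    given order. Returns None if every server is DOWN."""
    elapsed = 0.0
    for up in servers_up:
        if up:
            return elapsed + rtt
        # A DOWN server only reveals itself through repeated timeouts.
        elapsed += per_try_timeout * tries_per_server
    return None

def client_sees_failure(servers_up: Sequence[bool],
                        client_timeout: float = 5.0,  # assumed OS resolver timeout
                        **resolver_settings: float) -> bool:
    """True if the client gives up before the resolver has an answer."""
    t = time_to_answer(servers_up, **resolver_settings)
    return t is None or t > client_timeout

if __name__ == "__main__":
    print("dead server tried first:", client_sees_failure([False, True]))
    print("live server tried first:", client_sees_failure([True, False]))
```

Under these assumed settings, a lookup that starts at the dead server spends 6 seconds probing it, which is longer than the client is willing to wait, even though the second server would have answered.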
If you use our monitoring service and your domain runs into the problem described above, our system will show your website as intermittently UP and DOWN, because name resolution can fail at random at each monitoring location.