Caffeinated DNS Monitoring and the AT&T DNS Outage

An article by Dotcom-Monitor, “Caffeinated DNS Monitoring and the AT&T DNS Outage,” published on SpeedAwarenessMonth.com and covering the AT&T domain name server (DNS) outage of Aug. 15, 2012, demonstrates why a non-cached method of DNS monitoring results in a faster time-to-repair (TTR), and can even mean zero downtime from a DNS issue.

The full article is available at SpeedAwarenessMonth.com; the basics, however, include the following:

To Cache or Not to Cache – That is the DNS Monitoring Question

First, it is not widely known that external HTTP-request website monitoring, like coffee at your local java joint, comes in different “grades”: cache-based and non-cached. Dotcom-Monitor employs non-cached monitoring, which propagates through the full DNS resolution process with each monitoring instance. Cache-based monitoring (used by many basic monitoring services) does not propagate through the DNS process and therefore misses DNS issues.
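
To make the distinction concrete, here is a minimal sketch of the two monitoring “grades,” assuming Python with the third-party dnspython package (2.x); the hostname is a placeholder, and neither function represents Dotcom-Monitor’s actual implementation:

    # Cache-based vs. non-cached DNS checks (illustrative sketch only).
    import dns.resolver

    _cached_ips = None  # a cache-based monitor reuses this between cycles

    def cached_check(hostname):
        """Resolve once and keep reusing the remembered IPs; a later DNS
        failure stays invisible because no new lookup is ever made."""
        global _cached_ips
        if _cached_ips is None:
            _cached_ips = [rdata.to_text()
                           for rdata in dns.resolver.resolve(hostname, "A")]
        return _cached_ips

    def non_cached_check(hostname):
        """Resolve the name from scratch on every monitoring cycle, so a
        broken or missing DNS answer raises an error immediately."""
        return [rdata.to_text()
                for rdata in dns.resolver.resolve(hostname, "A")]

    print(non_cached_check("www.example.com"))

In the cached version, an AT&T-style failure (a name server that stops answering) goes unnoticed for as long as the remembered IP keeps working; the non-cached version fails, and can alert, on the very next cycle.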

How to Effectively Monitor for the next DNS Outage Situation

In the case of the AT&T DNS outage, several key factors help speed up Time-to-Repair (TTR), or avoid downtime altogether:

  • Error detection method: Use a monitoring solution that employs a non-cache method, propagating DNS queries all the way through the root name servers with each monitoring instance (a rough sketch of this approach follows this list). A cache-method service caches DNS results and therefore may not detect a secondary DNS issue at all, or may take days or even weeks to detect it.
  • Frequency of monitoring: Use a faster frequency of non-cache monitoring, such as every minute rather than once per hour. The sooner the non-cached monitoring solution detects the failure and alerts the administrator of a website relying on the failing DNS service, the sooner a switch can be made to a fail-over DNS provider.
  • Value of the Time-to-Live (TTL) setting: The smaller the TTL value the DNS administrator uses to persist cached IP answers from the primary authoritative name server, the faster a fail-over to another DNS provider can take effect (see the TTL sketch after this list). The TTL is typically set to 86,400 seconds (one day) or more; for disaster recovery planning it can be set as low as 300 seconds (five minutes), although the lower the setting, the higher the load on the authoritative name server.
  • Diagnostics: Choose a monitoring solution that provides diagnostics, such as an automatic trace-route captured at the time the DNS problem is detected (keep in mind that many basic monitoring services do not provide any diagnostic information).
  • Repair: Continue monitoring during the error condition to further pinpoint the issue, and send the monitoring results to your DNS provider. You can also run free manual DNS trace-routes at www.dotcom-monitor.com/WebTools/trace.asp (select Trace Style “DNS”) to verify the issue as needed.
  • Prevent: Keep an eye on “soft error” DNS issues, such as DNS slowdowns and intermittent DNS outages, so you can take action before a “soft error” becomes a “hard error” such as customer-facing downtime.
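
As a rough illustration of the error-detection, frequency, and “soft error” points above, the sketch below walks the DNS delegation chain from a root server on a fixed interval, flagging both hard failures and slow responses. It again assumes dnspython; the hostname, root server, interval, and latency threshold are illustrative values, not a description of any particular monitoring product:

    import time
    import dns.message
    import dns.query
    import dns.rdatatype

    ROOT_SERVER = "198.41.0.4"     # a.root-servers.net
    HOSTNAME = "www.example.com"   # placeholder domain
    INTERVAL_SECONDS = 60          # 1-minute monitoring frequency
    SLOW_THRESHOLD = 2.0           # seconds; slower answers count as "soft errors"

    def trace_resolution(hostname, server_ip):
        """Follow referrals from the given server down to an answer,
        querying each name server directly instead of using any cache."""
        while True:
            query = dns.message.make_query(hostname, dns.rdatatype.A)
            response = dns.query.udp(query, server_ip, timeout=5)
            if response.answer:                    # reached an answer
                return [rdata.to_text() for rdata in response.answer[0]]
            # Follow the delegation via a glue A record from the additional
            # section; a full tracer would also resolve glueless NS names.
            glue = [rdata for rrset in response.additional for rdata in rrset
                    if rdata.rdtype == dns.rdatatype.A]
            if not glue:
                raise RuntimeError(f"no usable referral from {server_ip}")
            server_ip = glue[0].to_text()

    while True:
        started = time.monotonic()
        try:
            ips = trace_resolution(HOSTNAME, ROOT_SERVER)
            elapsed = time.monotonic() - started
            status = "SOFT ERROR (slow)" if elapsed > SLOW_THRESHOLD else "OK"
            print(f"{status}: resolved in {elapsed:.2f}s -> {ips}")
        except Exception as err:                   # hard error: alert, keep monitoring
            print(f"DNS FAILURE: {err}")
        time.sleep(INTERVAL_SECONDS)

A loop like this would surface an outage on the very next monitoring interval, rather than only after cached answers expired.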
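
For the TTL point, a short sketch (again assuming dnspython, with an illustrative domain and the 300-second disaster-recovery target mentioned above) shows how to read the TTL an answer carries, which is the worst-case time clients may keep serving the old IP after a fail-over:

    import dns.resolver

    # Note: a recursive resolver may report a decremented TTL (time left in
    # its cache); query the authoritative server directly for the configured value.
    answer = dns.resolver.resolve("www.example.com", "A")
    ttl = answer.rrset.ttl
    print(f"Current TTL: {ttl} seconds")
    if ttl > 300:
        print(f"Caches may hold the old answer for up to {ttl // 60} minutes "
              "after a fail-over; consider lowering the TTL in advance.")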

Thanks, I’ll take the Caffeinated Double Depth Charge, Non-cached

It’s clear, then, that a combination of non-cached monitoring and the other factors above limits downtime exposure from issues like the AT&T DNS outage of Aug. 15, 2012. Furthermore, a non-cached method of DNS monitoring is a critical factor in a faster TTR, and even zero downtime.

Finally, it is important to remember that TTR determines the loss due to downtime. In other words, the longer it takes in total to detect, diagnose, and repair a DNS problem, the worse the impact of the DNS issue. Conversely, the more a monitoring solution shortens TTR, the more the loss is reduced, or avoided entirely.

Like a good, strong cup of caffeinated coffee, a non-cache method can make the difference between a downtime day and a fast, productive day.

For more on the AT&T DNS outage see our article, Doing DNS Monitoring Right: The AT&T DNS Outage.
