When DNS errors are triggered during web monitoring, it is rarely easy to quickly identify and understand the exact issue behind the connection errors. A lot goes on behind the scenes (in just a matter of seconds) that we take for granted. So, when DNS issues occur, the impact may be felt immediately across a wide variety of resources and by both internal and external users. DNS errors on a website degrade the user experience, which discourages users and costs the business revenue. Furthermore, if DNS configurations are improperly set, or issues go unchecked, it can lead to a loss of domain authority.
In this article, we will walk you through some DNS error troubleshooting steps.
For common recommendations on troubleshooting errors, please visit the Troubleshooting Monitoring Errors article.
What is DNS and How Does it Work?
DNS stands for Domain Name System, and it acts as a directory for the Internet. It is what allows all of us to access websites and connects devices across the Internet. It acts as a translator between humans and machines. Everything from mobile phones and laptops to servers, routers, and networking hardware relies on the DNS. Think of the DNS as a large, spread-out tree, with various layers and groups, or zones. We will talk more about these DNS layers in the next few sections.
How DNS Works: Domain Names
Domain names, like google.com or dotcom-monitor.com, are the human-readable forms of domains. These domain names are managed by a group called the Internet Corporation for Assigned Names and Numbers, or ICANN for short. Instead of having to remember the long string of numbers that makes up an IP address, it is quicker and easier to remember a domain name, type it into our browser, and have the DNS translate it into its IP address and find it for us. However, there are multiple pieces that make up a domain name, and depending on how they are structured, they can tell us different things about that domain.
The top-level domain, or TLD, is the piece at the end of the domain name. So, if we use google.com as our example, the top-level domain is .com. These top-level domains are located in what is called the DNS root zone. You’ve likely come across some of the more common top-level domains, such as .org, .edu, .gov, and .net. Top-level domains are also used to distinguish domains by country, so each country is given a specific country code, like .pr for Puerto Rico or .cn for China. As of mid-2020, there were over 1,500 different top-level domains.
The second-level domain, or SLD, is the portion before the top-level domain, so sticking with our google.com example, the second-level domain would be the google portion.
Websites can be expanded further by creating subdomains. Subdomains are primarily used to separate and differentiate specific content from the main domain. Subdomains help with SEO and provide a better way for users to navigate through resources. Within the domain structure, a subdomain comes before the second-level domain, so using our google.com example, some of its subdomains include news.google.com and photos.google.com.
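The naming pieces described above can be illustrated with a short Python sketch. It assumes a single-label top-level domain such as .com (multi-label suffixes like .co.uk would need a public suffix list), and the function name split_domain is hypothetical, chosen only for this example.

```python
def split_domain(hostname: str) -> dict:
    """Split a hostname into subdomain, second-level domain, and TLD.

    Assumption: the TLD is a single label (e.g. "com"), which does not
    hold for multi-label suffixes such as "co.uk".
    """
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a second-level domain and a TLD")
    return {
        "subdomain": ".".join(labels[:-2]),  # empty string when absent
        "second_level": labels[-2],
        "tld": labels[-1],
    }


print(split_domain("news.google.com"))
```

For google.com the subdomain entry is simply an empty string, since the name has only a second-level domain and a TLD.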
So, now that we’ve covered a bit about the specific aspects of domain names, let us cover the process of what happens when we type in a domain address.
How DNS Works: DNS Servers
Once you enter the address in your browser, the DNS is put to work and the process begins behind the scenes. The first thing that happens is that your request may go through a series of DNS servers (if there is no caching involved). These servers do not hold the HTML files, images, videos, etc., that make up the website; they locate the IP address of the web server that provides that content to your browser. There are four types of DNS servers: recursive resolvers, root nameservers, TLD nameservers, and authoritative nameservers.
DNS Recursive Resolver
The recursive resolver, also sometimes referred to as the DNS recursor, is first on the list of stops. This server is typically provided by the Internet Service Provider (ISP) and sits between the client/browser and the nameservers. After receiving the DNS request, the recursive resolver either responds with a cached answer or sends the request on to the next levels: first the root nameserver, then the TLD nameserver, and finally the authoritative nameserver. After the authoritative nameserver sends the response to the recursive resolver, the recursive resolver sends the final response to the client/browser.
If caching was implemented and the request was made within the TTL (time to live) period, then the request stops with recursive resolver and it responds to the client/browser, providing a quicker response and better user experience. Due to this reason, recursive resolvers are referred to as non-authoritative DNS servers as they only provide a response based upon the last good cache from the authoritative nameservers.
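The TTL behaviour described above can be sketched as a small in-memory cache. This is an illustration only, not how any particular resolver is implemented; the class name TtlCache and the injectable clock parameter are assumptions made so the expiry logic is easy to follow and test.

```python
import time


class TtlCache:
    """Tiny in-memory cache that forgets answers once their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be simulated
        self._entries = {}       # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds):
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # never cached: a full lookup is needed
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: force a fresh lookup
            return None
        return ip
```

A resolver built on such a cache would answer from get() while the entry is fresh and fall back to the nameserver chain only when get() returns None.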
DNS Root Nameservers
There are 13 root nameservers operated and managed by 12 organizations around the world. However, that does not mean there are only 13 servers that can be accessed. There are multiple instances of each server around the world to provide faster response times, depending on the location of a user. Root servers are assigned a letter from A to M, which identifies the organization that operates that root server. These root servers are operated by various organizations, universities, and government and military groups like the DoD, Army, and NASA. For more information on root servers and their locations, visit root-servers.org.
Once the DNS root server receives the request from the recursive resolver, it directs the request to the proper TLD nameserver based upon the top-level domain extensions we mentioned previously, like .com, .org, .edu, etc. There is a TLD nameserver for every website extension.
Last in the DNS process is the authoritative nameserver, which is where the IP address is stored. If the authoritative server holds the record being requested, it will respond to the recursive resolver with the IP address for the initial domain request. The authoritative nameserver for a domain is set through the domain name registrar. Once the authoritative nameserver details are set by the registrar, the registrar tells the TLD nameserver authority to update its records with the details of the authoritative nameserver. That way, the TLD nameserver knows which authoritative nameserver will provide the IP address of the requested website. The authoritative nameservers are the source of the DNS resource records, which we will talk about in more detail below.
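The chain from root nameserver to TLD nameserver to authoritative nameserver can be walked through with a toy model. The dictionaries below are hypothetical in-memory stand-ins for the three tiers (no network traffic is involved), and the server names and the 192.0.2.10 documentation address are made up for the example.

```python
# Hypothetical stand-ins for the three nameserver tiers.
ROOT = {"com": "tld-com"}                                      # TLD -> TLD nameserver
TLD_SERVERS = {"tld-com": {"example.com": "ns1.example.com"}}  # domain -> authoritative
AUTHORITATIVE = {"ns1.example.com": {"www.example.com": "192.0.2.10"}}


def resolve(hostname):
    """Follow root -> TLD -> authoritative; return (ip, servers visited)."""
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])
    path = ["root"]

    tld_server = ROOT.get(tld)
    if tld_server is None:
        return None, path            # the root knows no such TLD
    path.append(tld_server)

    auth_server = TLD_SERVERS[tld_server].get(domain)
    if auth_server is None:
        return None, path            # the TLD has no delegation for the domain
    path.append(auth_server)

    return AUTHORITATIVE[auth_server].get(name), path


print(resolve("www.example.com"))
```

The returned path makes it visible at which tier a lookup stopped, which mirrors what a DNS trace shows for a real query.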
Why Do DNS Errors Occur?
In general, the most common causes of DNS errors, like DNS timeouts or DNS misconfigurations, lie with the DNS service provider, and the impact on users can vary depending on the severity of the error. The DNS is a vast and complex network. Some issues may be hardware related and can be resolved quickly, but some errors may require more in-depth DNS troubleshooting, resulting in additional resources and time to fix.
For example, DNS tree propagation can take a considerable amount of time. When the DNS chain to the authoritative nameserver where the IP address of the host is stored is long, the DNS resolution time increases. In this case, a DNS timeout error can occur because the time needed to complete DNS resolution exceeds the monitoring timeout limit.
If DNS resolving takes more than 11 seconds (allowed by default), the DNS timeout error will be generated. Note that this is not necessarily a website availability issue, but a long timeout that was terminated.
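The distinction between a slow lookup and an unavailable site can be sketched as follows. This is a hedged illustration, not Dotcom-Monitor's implementation: the function names are hypothetical, the resolver is any callable you supply, and only the 11-second default is taken from the text above.

```python
import concurrent.futures
import time


def resolve_with_timeout(resolver, hostname, timeout_seconds=11.0):
    """Run resolver(hostname) and report a timeout instead of waiting on."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, hostname)
    try:
        return {"status": "ok", "answer": future.result(timeout=timeout_seconds)}
    except concurrent.futures.TimeoutError:
        # The lookup was cut off; the site itself may still be reachable.
        return {"status": "dns_timeout", "answer": None}
    finally:
        pool.shutdown(wait=False)    # do not block on the slow lookup


def slow_resolver(hostname):
    time.sleep(0.2)                  # stands in for a long DNS chain
    return "192.0.2.10"


print(resolve_with_timeout(slow_resolver, "example.com", timeout_seconds=0.05))
```

Raising the limit above the time the resolver needs turns the same lookup into a successful one, which is why a timeout alone does not prove the website is down.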
DNS Troubleshooting Tips & Best Practices
One of the first troubleshooting steps we recommend is checking the DNS tree to understand at which point a DNS server issue or timeout occurred. We recommend pulling an Online Report for the monitoring device and, on the Log tab, using the built-in DNS Trace option.
To build a DNS tree in real-time, use the Dotcom-Monitor DNS trace tool. On the DNS Trace tab, enter the IP and location and start the test. The propagation is started from the root servers as it is executed for the Device Cached or Non-Cached DNS modes.
Troubleshooting DNS Errors
To troubleshoot a DNS error, review the error description provided for each server node of the DNS tree as shown in the picture above (step 2).
If you experience DNS errors often, you can set up a separate DNS task for the domain in question (the Verify Response On option set to First Responding) and set up a specific monitoring frequency to check the domain name resolution into an IP.
Other areas to investigate when troubleshooting DNS errors include network devices, browsers and/or devices, DNS records, service provider, and latency issues. It is good practice to have a checklist of items to troubleshoot. Like troubleshooting anything, it is best to start with the item(s) that are easiest to review and check and then move down the list to the more difficult and time-consuming items, such as troubleshooting misconfigured DNS records or reviewing latency issues.
DNS Response and Error Codes
While there are over 20 DNS response codes that can result from a DNS query, users will typically only ever see a handful of these over time. These response codes do not necessarily mean there is an error, but many response/error codes do indicate an error was encountered. We will talk about some of the more common DNS response codes, what they mean, and steps you can take to resolve them.
DNS Response Code 0: NoError
The NoError response indicates the DNS query that was processed came back valid. However, it does not necessarily mean there aren’t any issues. The NoError response won’t indicate performance-specific issues. For example, if the DNS query took longer than expected, there could be a network issue even though the query came back as NoError. In cases like this, it is important to look through all the DNS servers the query goes through and that they are configured properly. For more information about the NoError response code, see RFC6895, Section 2.3 and RFC1035.
DNS Response Code 1: FormErr
The FormErr response code refers to format errors. Like other response codes, format errors can occur due to firewall issues or issues with networking hardware, such as a router that is shortening, or truncating, DNS responses that exceed a specific size limit. For more information on FormErr response codes, see RFC6895, Section 2.3 and RFC1035.
DNS Response Code 2: SERVFail
The SERVFail response code indicates that the DNS server can’t give you a response to your query. Many domains use multiple authoritative DNS nameservers, so SERVFail responses aren’t common, and the DNS query should work on a different server. When they do occur, it could be due to a networking glitch with the DNS server or possibly a firewall blocking the domain. SERVFail responses are rare but should be investigated if they occur persistently. For more information on the SERVFail response code, see RFC6895, Section 2.3 and RFC1035.
DNS Response Code 3: NXDomain
The NXDomain response code indicates a non-existent domain. It means that the DNS query responded that the requested domain doesn’t exist, and the query failed. If it hadn’t failed, the response would have been NoError. There are a few reasons that an NXDomain response could come back. One reason is human error: the query included a typo, which is easily fixed. Another reason could be a network or security issue, like DNS hijacking. This type of issue requires much more in-depth troubleshooting to find the root cause. For more information on the NXDomain response code, see RFC6895, Section 2.3 and RFC1035.
DNS Response Code 4: Not Implemented
The Not Implemented response code indicates that a server received a request, such as an update, but did not carry it out, either because it did not recognize the opcode (operation code or command) or because it recognized the opcode but does not implement it. See RFC6895, Section 2.3 and RFC1035 for more information about the Not Implemented response code.
DNS Response Code 5: Refused
The Refused response code means that, due to a policy issue, the DNS nameserver refuses to perform a specific operation. For example, the requestor may not have the right credentials or authorization to carry out the request. For more information about the Refused response code, see RFC6895, Section 2.3 and RFC1035.
DNS Response Code 6: YXDomain
DNS Response Code 7: YXRRSet
The YXRRSet response code means that a Resource Record Set (RRSet) exists but should not exist. The DNS allows for many record sets. For example, A records are what map a domain to a specific IPv4 address. A set of A records for a given domain is called an RRSet. See RFC6895, Section 2.3 and RFC2136 for more information.
DNS Response Code 8: NXRRSet
DNS Response Code 9: NotAuth
Under the NotAuth response code, there are two different descriptions, Server Not Authoritative for zone and Not Authorized. These response codes indicate the primary connection a domain uses to validate a zone is invalid. Within DNS, authorizations can be configured by zone. See RFC6895, Section 2.3, RFC2136, and RFC2845 for more information.
DNS Response Code 10: NotZone
The NotZone response code is used in defining the prerequisites for a specific zone. If there are any RRs that aren’t part of the domain, the NotZone response code is returned. See RFC6895, Section 2.3 and RFC2136 for more information.
DNS Response Codes 11-15: Unassigned
DNS response codes 11-15 are currently in unassigned status.
DNS Response Code 16: BADVERS & BADSIG
There are two DNS response codes that have the same RCode of 16. The BADVERS response code means that it encountered a bad OPT version. OPT refers to the mechanism to add responses as a message in the additional data section of a DNS message. Initially, DNS messages were carried through the UDP (User Datagram Protocol) but were bound by size limitations (512 bytes). OPT was introduced to work with the previous protocol, but also extend the space for DNS messages.
The BADSIG response code means a bad transaction signature (TSIG) or a failure to verify the signature. A TSIG allows for authenticating updates to databases and DNS endpoints. Unlike DNS queries, updates to the DNS must be authenticated. Along with the BADSIG response, the server also sends the NotAuth response code. See RFC6895, Section 2.3, RFC6891, and RFC2845 for more information.
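To make the OPT mechanism more concrete, here is a minimal sketch of how an OPT pseudo-record is laid out on the wire and how the extended response code is assembled, following the field layout in RFC6891. The function names are hypothetical, and the 1232-byte payload size is only an assumed, commonly used value.

```python
import struct


def build_opt_record(udp_payload_size=1232, version=0, extended_rcode=0):
    """Encode an EDNS(0) OPT pseudo-record that carries no options."""
    return (
        b"\x00"                                # NAME: the root (empty) domain
        + struct.pack("!H", 41)                # TYPE: 41 is OPT
        + struct.pack("!H", udp_payload_size)  # CLASS field reused for UDP size
        + struct.pack("!BBH", extended_rcode, version, 0)  # TTL field: ext-rcode, version, flags
        + struct.pack("!H", 0)                 # RDLENGTH: zero, no options
    )


def full_rcode(header_rcode, extended_rcode):
    """Combine the 4-bit header RCODE with the upper 8 bits from OPT."""
    return (extended_rcode << 4) | header_rcode


print(len(build_opt_record()), full_rcode(0, 1))
```

Because the header only has room for four bits, a value such as 16 (BADVERS) is expressed as an extended part of 1 combined with a header part of 0.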
DNS Response Code 17: BADKEY
The BADKEY response code is in response to a request that included a TSIG, but the key was not recognized by the server. When this occurs, the BADKEY response, as well as the NotAuth response, is generated by the server. For more information, see RFC6895, Section 2.3 and RFC2845.
DNS Response Code 18: BADTIME
The BADTIME response code relates to time synchronization and helps prevent replay attacks. It indicates that a TSIG request was received by the server, but outside of the time window specified by the request. Along with the BADTIME response, the NotAuth response code would also be sent. See RFC6895, Section 2.3 and RFC2845 for more information.
DNS Response Code 19: BADMODE
Within the DNS, there is a record type called TKEY (Transaction Key). The TKEY is used for establishing secret keys between servers and resolvers. The BADMODE response code indicates that a server received a TKEY request but doesn’t support one of the pre-defined modes/values. For more information, see RFC6895, Section 2.3 and RFC2930.
DNS Response Code 20: BADNAME
The BADNAME response code is associated with the Name field, and more specifically with key names. According to the IETF, only one octet string of keying material may be in place for any key name. The BADNAME response occurs when there has been an attempt to use a new name for the existing key name. This response code would also be sent if a server has no record of a particular key name. See RFC6895, Section 2.3 and RFC2930 for more information.
DNS Response Code 21: BADALG
A DNS response code of BADALG indicates that the server received a request for an algorithm that it did not support. The algorithm in this instance is a field within the TKEY resource record and helps to define how secret key data is utilized and shared. For more information, see RFC6895, Section 2.3 and RFC2930.
DNS Response Code 22: BADTRUNC
The BADTRUNC response code indicates a truncation-related issue when a message authentication code, or MAC, is used. When the MAC is shorter than what the truncation policy allows, the BADTRUNC and NotAuth response codes are returned. See RFC6895, Section 2.3 and RFC4635 for more information.
DNS Response Code 23: BADCOOKIE
Cookies were introduced to help mitigate DNS attacks, like cache poisoning, that DNSSEC (Domain Name System Security Extensions) was unable to prevent. The cookie is made up of the client and server IP, as well as a secret that should be at least 64 bits long. When DNS and client-side cookies are used, the BADCOOKIE response indicates a missing or malformed cookie. See RFC6895, Section 2.3 for more information.
There are additional placeholder response codes, but they are either unassigned, used for private use, or reserved. To read more about the DNS error and response codes, and to see the full list of DNS Parameters, visit the IANA.org website.
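As an illustration of where the basic response codes live, the sketch below reads the RCODE from the 12-byte header of a raw DNS message. The function name is hypothetical, and the table uses the IANA spellings (ServFail, NotImp), which differ slightly from the headings above. Codes 16 and higher do not fit in this 4-bit header field and are carried through OPT, TSIG, or TKEY data instead, so they are not covered here.

```python
import struct

# Names for the values that fit in the 4-bit header field (0-10 are assigned).
RCODE_NAMES = {
    0: "NoError", 1: "FormErr", 2: "ServFail", 3: "NXDomain", 4: "NotImp",
    5: "Refused", 6: "YXDomain", 7: "YXRRSet", 8: "NXRRSet", 9: "NotAuth",
    10: "NotZone",
}


def header_rcode(message: bytes) -> str:
    """Return the RCODE name from the header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS message needs at least a 12-byte header")
    _message_id, flags = struct.unpack("!HH", message[:4])
    rcode = flags & 0x000F           # RCODE is the low 4 bits of the flags word
    return RCODE_NAMES.get(rcode, f"Unassigned({rcode})")


# A response header with QR, RD, and RA set and RCODE 3.
sample = struct.pack("!HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)
print(header_rcode(sample))
```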
Troubleshooting DNS Timeout
If a DNS server timeout is the issue, we recommend contacting your provider to check whether any DNS servers on their side have degraded performance. An overloaded DNS server can also slow down the response time to a name request.
If there are no DNS server performance issues, use one of the suggested approaches:
- Change the DNS resolve mode to TTL Cached.
- Use an External DNS Server.
To deal with DNS timeout errors, you can change the resolving mode from Device Cached to TTL Cached.
By default, in the Device Cached mode, Dotcom-Monitor makes a full resolution from root DNS servers without any caching on every check. In other words, this is the most reliable approach because the entire DNS chain, starting from the root DNS server, is checked. However, the disadvantage of this approach is that it increases device execution time and in the case of a long DNS tree, can lead to timeout issues.
On the other hand, the TTL Cached mode mimics a DNS lookup as it is executed on an actual user’s computer. To resolve a website IP address, the DNS lookup information is cached locally on a user’s computer. On the first request for a domain’s IP address, the DNS record is saved to the cache and used on subsequent requests to the domain, which speeds up the process by skipping all the lookup steps of the DNS resolution process and improves the overall user experience.
Similarly, a local DNS server with an installed monitoring agent is used to pre-cache lookup information in the TTL Cached mode. The DNS records are saved in the local server’s cache during TTL. Depending on what the TTL is for a specific host (usually around a day), it will rarely be requested. Therefore, the possibility of getting a timeout is reduced significantly.
To use a public caching service, such as Google (8.8.8.8, 8.8.4.4) or Cloudflare (1.1.1.1), change the DNS mode to External DNS Server. Dotcom-Monitor will fetch IP addresses from a public service’s DNS cache. This option will reduce DNS resolving time and can help to troubleshoot a timeout issue.
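For readers who want to see what asking an external resolver looks like at the protocol level, here is a rough sketch using only the Python standard library. It is not how Dotcom-Monitor performs the lookup; the function names are hypothetical, and 8.8.8.8 (Google Public DNS) is used as an assumed default server.

```python
import socket
import struct


def build_a_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN
    return header + question


def query_external(hostname, server="8.8.8.8", timeout_seconds=5.0):
    """Send the query over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_seconds)
        sock.sendto(build_a_query(hostname), (server, 53))
        response, _address = sock.recvfrom(4096)
        return response
```

Calling query_external("example.com") requires network access and raises a timeout error if the server does not answer in time, which is the same symptom a monitoring task would report.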
DNS Resource Records
DNS resource records (RRs) are essentially what hold the information for everything related to your domain. Simply put, DNS records help point users in the right direction to anything related to your domain. For example, DNS records are important for directing emails to the right email server or the right domain to the right web server, and so on. There are well over 40 different types of resource records (and many more that have been retired) that cover a wide array of categories and functions, from mail records and website records to information records. We will cover all the DNS resource record types supported by the Dotcom-Monitor platform.
Dotcom-Monitor Supported DNS Resource Record Types
IPv4 Address Record: A
The address record type specifies an IPv4 address that corresponds with your domain and subdomains. There are two fields associated with address records: IP address and domain. For example, an IP address for google.com is 188.8.131.52. Instead of having to remember this set of numbers and key in the numeric IPv4 address, address records allow users to just type google.com and be directed to the correct site. There can also be address records for subdomains with different IP addresses. In terms of business goals, subdomains are useful for companies that may operate in different countries.
Subdomains are an effective way to direct customers/users to their respective country-specific site or if your site has a lot of information that needs to be organized in a more efficient, user friendly way. Subdomains can also be used to differentiate between subsets of customers, B2C versus B2B audiences. If we use the example of google.com again, they have subdomains for their image site (images.google.com), maps (maps.google.com), news (news.google.com), and so forth. Furthermore, using address records, subdomains can be setup on different servers/web hosts. The benefit being that if your main site goes down, you do not lose everything. For more information about address records, see RFC1035, Section 3.4.1
IPv6 Address Record: AAAA
Like the aforementioned address records, the AAAA records are used to associate an IPv6 address with a host name. IPv6 is the sixth version of the IP (Internet Protocol) standard. IPv6 functions the same as IPv4, providing unique IP addresses to devices. However, one of the key differences is that IPv4 uses a 32-bit IP address, while IPv6 uses a 128-bit IP address. When IPv4 was introduced, it was able to support almost 4.3 billion unique IP addresses. However, demand has already outgrown that number, so IPv6 was introduced as the next generation to support the continuous growth in demand for IP addresses. For more information about IPv6 address records, see RFC3596, Section 2.1.
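The difference in address size is easy to check with Python's standard ipaddress module. The two addresses below come from the ranges reserved for documentation and are used purely as examples.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")      # documentation-range IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")    # documentation-range IPv6 address

# max_prefixlen is the address width in bits, so 2 ** width is the address count.
print(v4.version, v4.max_prefixlen, 2 ** v4.max_prefixlen)
print(v6.version, v6.max_prefixlen, 2 ** v6.max_prefixlen)
```

The 32-bit width gives 4,294,967,296 possible IPv4 addresses, which is the "almost 4.3 billion" figure mentioned above.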
Canonical Name Record: CNAME
Canonical name records are used in redirection, or alias, for a domain or subdomain. A canonical name record is made up of a couple of fields, the Name and Destination fields. As with different hosting providers, there may also be some custom fields associated with canonical name records. A common usage of the canonical name record is setting up your website domain with the www and one without. This is to ensure that users get pointed to the same domain regardless of how they enter your URL. It can also be used with setting up email, such as Microsoft Exchange. There may be additional canonical records required for email, mobile, SIP (Session Initiation Protocol), etc., that are used to point to your domain. For more information on canonical name records, see RFC1035, Section 3.3.1
Mail Exchange Record: MX
The mail exchange records are related to your email server/hosting provider and help to manage the delivery of emails. Typically, there are multiple mail exchange records in your DNS for any email server/hosting provider to serve as a backup if the primary record is unavailable. Email providers typically have their own respective mail exchange records; however, Microsoft Exchange mail exchange records can vary. For more information on mail exchange records, see RFC1035, Section 3.3.9
There are also several fields that make up mail exchange records, including your domain, mail server, priority, and TTL fields. The priority field is used when there are multiple mail exchange records: the lower the number, the higher the priority. Priorities are typically used to distinguish primary and backup email servers. TTL (Time to Live) refers to the caching mechanism that tells the sender’s email server how long to wait (defined in seconds) before contacting the receiver’s DNS server to check for any changes to the record. For more information about mail exchange records, see RFC1035, Section 3.3.9 and RFC7505.
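The priority rule can be shown in a few lines. In this sketch the host names are made up, and the function name delivery_order is hypothetical; real mail servers also randomize between records of equal priority, which this simplified version does not do.

```python
def delivery_order(mx_records):
    """Return mail servers in the order a sender would try them.

    Each record is a (priority, host) tuple; lower numbers are tried first.
    """
    return [host for _priority, host in sorted(mx_records)]


records = [
    (20, "backup.mail.example.com"),
    (10, "primary.mail.example.com"),
]
print(delivery_order(records))
```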
Name Authority Pointer: NAPTR
Name Authority Pointer records are typically used with applications that require Internet telephony or when mapping users and servers within the SIP protocol. NAPTR records are used with SRV records to find out which services are available for a domain/hostname, such as SIP, email, or web, etc. For example, SIP can be run over different protocols, but the address record (A), won’t tell you which protocols to use. For more information about the name authority pointer record, see RFC2915.
Name Server Record: NS
The name server is where all your DNS records are stored. Specifically, name server records help users to access your website through your domain rather than a complex IP address. Name servers point your domain to where your site is hosted. Domain registration and site hosting can be managed by separate entities, but usually, domain registrars can supply web hosting (or vice versa). If you want to change any other records, such as website or email related records, it must be done where your name server is located. Name server records are typically stored with your hosting company. The actual name server names are usually created by default by your hosting provider but can be changed if necessary. This allows site owners to change their web host without changing the domain registrar, for example. For more information about name server records, see RFC1035, Section 3.3.11
Domain Name Pointer Record: PTR
Pointer records, also referred to as rDNS (reverse DNS), are used for reverse DNS lookups. Within the DNS, there can be multiple domain names pointed to a single IP address, or multiple domains hosted on the same web server. However, with a pointer record, or reverse DNS lookup, you point, or associate, an IP address to a specific domain name. Like other DNS resource records, pointer records consist of various fields, including name, TTL, class, type, and data. The most common use for reverse DNS lookups is associated with traceroute and ping networking tools, as well as with SMTP to trace header fields. For additional information about pointer records, see RFC1035, Section 3.3.12.
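A reverse lookup queries a specially formed name rather than the IP address itself. The standard ipaddress module can build that name, as in this short example that uses documentation-range addresses.

```python
import ipaddress

# The PTR record for an IPv4 address lives under in-addr.arpa,
# with the octets written in reverse order.
ipv4_pointer = ipaddress.ip_address("192.0.2.10").reverse_pointer
print(ipv4_pointer)

# IPv6 addresses use ip6.arpa, one reversed hex digit per label.
ipv6_pointer = ipaddress.ip_address("2001:db8::1").reverse_pointer
print(ipv6_pointer)
```

To perform the actual lookup against live DNS, the standard library's socket.gethostbyaddr can be used, which requires network access.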
Start of Authority Record: SOA
The start of authority resource record gives domain administrators a way to provide additional information about the domain or zone, such as update frequency, administrator contact information, expiry, TTL, etc. A DNS zone is considered the space that is managed by an administrator or company. Within DNS zones, there are additional levels, or zones, where a variety of records and attributes can be managed and organized that point back, and through, to the top-level domain. For more information about the start of authority records, see RFC1035, Section 3.3.13 and RFC2308.
Sender Policy Framework Record: SPF
The Sender Policy Framework record is used to define which SMTP servers can send email for your domain. It helps to prevent your domain from being used by spammers or blacklisted senders. It is also important for maintaining domain authority and trust with other email servers. If your domain is blacklisted, it is likely emails will be rejected by other email servers. Implementing an SPF record is done by simply adding a text record (TXT) to a DNS zone that specifies which IP addresses are approved to send email from the domain. For more information on the Sender Policy Framework, see RFC7208.
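As a rough illustration of how an SPF text record lists approved addresses, the sketch below checks a sender IP against the ip4 and ip6 mechanisms only. It is deliberately incomplete: include, a, mx, redirect, and explicit qualifiers are ignored, so it is not a full RFC7208 evaluator. The function name and the example record are hypothetical.

```python
import ipaddress


def spf_allows(spf_record: str, sender_ip: str) -> bool:
    """Check sender_ip against the ip4:/ip6: mechanisms of an SPF record."""
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    address = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        prefix, _, value = term.partition(":")
        if prefix.lower() in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if address.version == network.version and address in network:
                return True
    return False                     # no listed range matched


record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all"
print(spf_allows(record, "192.0.2.55"), spf_allows(record, "198.51.100.1"))
```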
Service Record: SRV
Like the Name Authority Pointer records, the Service record is primarily used in Internet telephony to help define or discover services such as SIP. The Service record follows a specific format that includes and identifies the following information: name of the service, transport protocol (TCP or UDP), domain name, TTL, class, type of record (which is SRV), priority of the target host, weight (the higher the weight, the better the chance of being picked by the client), port used, and target. For more information about the Service record, see RFC2782.
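The interaction between priority and weight can be sketched as below. This is a simplified take on the selection described in RFC2782, not a complete implementation; the function name, the tuple layout, and the SIP host names are assumptions for the example.

```python
import random


def pick_srv_target(records, rng=random):
    """Pick one record from (priority, weight, port, target) tuples.

    The lowest priority value wins; within that group the choice is
    random, biased by weight.
    """
    lowest = min(record[0] for record in records)
    candidates = [record for record in records if record[0] == lowest]
    weights = [record[1] for record in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)    # all weights zero: pick uniformly
    return rng.choices(candidates, weights=weights, k=1)[0]


srv_records = [
    (10, 60, 5060, "sip1.example.com"),
    (10, 40, 5060, "sip2.example.com"),
    (20, 100, 5060, "sip-backup.example.com"),
]
print(pick_srv_target(srv_records))
```

The backup host is only considered once no record with the lower priority value remains in the list.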
Text Record: TXT
Text records are used for a variety of functions, such as verifying your domain ownership with third-party services or email automation services. Text records consist of a couple of fields, Name and Content. The Name field is typically related to your domain. The Content field is used to provide the verification codes, so the information in this field varies depending on the service used and can be completely random. For more information on text records, see RFC1035, Section 3.3.14.
The Importance of DNS Monitoring
Setting up and configuring DNS monitoring is important for many reasons, but the primary reason is to ensure that any network and website outages or slow response times are kept to a minimum and don’t impact the user experience. More importantly, when they do happen, they can be identified quickly to prevent more users from being affected. Users that can’t access your websites are going to be frustrated and quickly go elsewhere to find what they were looking for. You want to be sure that you and your teams are notified immediately if something goes wrong so you can begin working on a fix as quickly as possible. DNS monitoring is also important for the following factors:
Meeting Service Level Agreements
Not only is DNS monitoring good for catching configuration issues, but it can also be used to ensure that your service providers are meeting their SLAs and that uptime is within the agreed-upon thresholds. If not, DNS monitoring can show you where the issues lie and provide you with the information to start the process of resolving the problem. In some cases, it could be a third-party service that you have no control over.
DNS hijacking, also referred to as DNS poisoning or DNS spoofing, refers to the process of changing the DNS cache so that it points to a different website for malicious purposes. Attackers can impersonate the authoritative nameservers and respond with fake information, pointing unsuspecting users to fake sites.
DDoS attacks are an attempt to bring down a website by flooding a server, service, website, or network with traffic. There are several types of DDoS attacks that can affect different components, but a DDoS attack that abuses DNS servers is called a DNS amplification attack. Essentially, what occurs is that an attacker creates a request for not just a domain, but everything related to that domain, and repeats that with multiple queries and multiple open DNS resolvers. DNS monitoring ensures that if any tampering with your DNS records or configuration issues are occurring, alerts are sent to the appropriate teams so they can be dealt with immediately, helping to protect your company from potential risks or loss of revenue and users.
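One simple way to picture a tampering check is to compare the records you expect with the records a lookup actually returned. The sketch below does only that comparison; it is not Dotcom-Monitor's alerting logic, the function name is hypothetical, and the observed data would in practice come from real DNS queries.

```python
def find_record_drift(expected, observed):
    """Compare expected and observed records keyed by (name, record type)."""
    drift = {}
    for key in expected.keys() | observed.keys():
        want = set(expected.get(key, ()))
        got = set(observed.get(key, ()))
        if want != got:
            drift[key] = {
                "missing": sorted(want - got),      # expected but not returned
                "unexpected": sorted(got - want),   # returned but not expected
            }
    return drift


expected_records = {("www.example.com", "A"): ["192.0.2.10"]}
observed_records = {("www.example.com", "A"): ["203.0.113.66"]}
print(find_record_drift(expected_records, observed_records))
```

An empty result means the observed records match the baseline, while any entry is a candidate for an alert.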
Building and Maintaining Customer Trust
Trust is a major factor in why customers choose one product or service over another. If your website is experiencing issues, whether or not it is an issue you have control over, customers may view that as a reflection of your products or services. Their view will be that your site is unreliable, so how can they trust that your products or services are any different?
Monitoring DNS Performance: Good for Business and the Bottom Line
Although DNS far too often gets overlooked and taken for granted, it is a critical and essential piece for customers and businesses alike, which is why DNS monitoring is so important for any business that has an online presence. With DNS monitoring from Dotcom-Monitor, you can monitor DNS availability and health 24/7 from the locations around the world where your customers reside and get alerted to any errors that occur along the DNS tree.