Service Interruption Summary (2021/05/03) and Capacity Plans
Quad9 was the target of a distributed denial of service attack that began at 16:10 UTC on May 3 and lasted around ninety minutes at the most-affected sites. Service was not degraded at the majority of our locations, and most cities saw no disruption at all. However, Quad9 users served by some of our largest POP locations in North America and Europe, and to a lesser degree in Asia, may have seen a high rate of DNS resolution failures or slow responses for part or all of that period.
In brief, the attack was short-lived but significant, and it was focused on a few large cities where we have dense interconnections with other networks. Quad9 was already working on a significant expansion that will make attacks like this less disruptive in the future, both by adding resources in major cities and by increasing the number of locations from which we serve DNS requests.
Background of DDoS Attack Traffic Flows
Like all large networks, Quad9 interconnects with the rest of the Internet via peering for the majority of packet exchanges. Peering is the bilateral exchange of traffic between our network and other Internet networks, such as those of the Internet Service Providers who supply Internet bandwidth to Quad9’s users. When a user sends a query to Quad9’s servers, the user’s ISP chooses a nearby Internet Exchange Point (IXP) and delivers the query to us there. We send the reply back to them, and they forward it back to the user. Many of the largest ISPs have dedicated interconnections with our network at many IXPs, usually at ten to one hundred gigabits per second in each location.
In a distributed denial of service (DDoS) attack, the attacker depends upon “botted” machines: computers that belong to regular users in their homes and offices, but which have been infected by malware that misappropriates the users’ Internet bandwidth to send attack traffic. That attack traffic competes with legitimate traffic, and if there is enough of it relative to the legitimate traffic, the invalid packets squeeze the legitimate traffic out, and the attack succeeds by denying service to legitimate users. (Quad9 actually defends end users against many variants of these attacks by blocking the malware command and control servers that allow the authors or operators of these DDoS networks to undertake their attacks. However, we can’t compel anyone to use Quad9, so most systems today are still unprotected and can be used to generate attack traffic if they become infected.)
Smaller ISPs, and ISPs which are not aggressively upgrading their infrastructure, may depend upon shared ports at the IXPs, or may depend upon a longer-path transit provider to reach our networks. In these cases, the legitimate DNS queries that users send us can be mixed together with significant volumes of attack traffic before reaching constrained bottlenecks. In these bottlenecks, legitimate packets are discarded along with attack traffic in the contention to pass through to our network.
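A minimal sketch of the contention described above, assuming a congested link that drops packets without regard to their origin, so legitimate and attack traffic lose the same share. Real routers queue and drop in more complex ways; the function name and all figures here are hypothetical illustrations, not measurements from this attack.

```python
def delivered_legit_fraction(legit_gbps: float, attack_gbps: float,
                             capacity_gbps: float) -> float:
    """Fraction of legitimate traffic that survives a congested link.

    Simplified (hypothetical) model: when offered load exceeds capacity,
    the link drops packets indiscriminately, so every traffic class loses
    the same proportion of its packets.
    """
    offered = legit_gbps + attack_gbps
    if offered <= capacity_gbps:
        return 1.0  # no congestion: nothing is dropped
    return capacity_gbps / offered


# Shared 10 Gbps port: 2 Gbps of real queries mixed with 38 Gbps of attack.
print(delivered_legit_fraction(2, 38, 10))  # 0.25
# Dedicated interconnection carrying no attack traffic.
print(delivered_legit_fraction(2, 0, 10))   # 1.0
```

In this toy case only a quarter of the legitimate queries on the shared port get through, while the same users on an uncongested dedicated interconnection lose nothing.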
This particular attack was a volumetric reflection attack, which exploits weaknesses in protocols and insufficiently secured servers to amplify traffic from botted machines before it reaches the targeted network. It is unclear why the attack was launched, but that is not unusual, and determining intent is rarely easy. Early analysis indicates that it possibly used CLDAP amplification, a type of attack which is well understood but unfortunately still effective, possibly combined with other amplification methods. Our routing and filtering infrastructure absorbed the traffic wherever it arrived, so none of the attack traffic reached our actual servers, which saw no unusual traffic during the attack. However, contention at capacity bottlenecks farther out in the Internet, outside of our control, meant that many legitimate users were denied access to those servers while the attack was under way.
Quad9 is under constant low-level attack. Some of those attacks are at the protocol level: malformed DNS packets, high volumes of queries to specific domains or hosts, “ping flood” attacks, or traffic that intentionally or unintentionally tries to make our systems and software fail or produce unexpected effects against some third party. These are typically absorbed without incident, because we have a large number of locations (roughly 170) and capacity at each location is normally sufficient to withstand these events without legitimate clients noticing any issues. However, DDoS events are becoming more challenging, with hundreds or even thousands of gigabits per second of attack traffic, and in the most concentrated forms some sites may experience packet loss while others remain operational. The patterns of attack traffic change depending on where the attack originates, and there are a few ways that Quad9 is working to reduce the chance that an attack causes outages.
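The arithmetic behind reflection amplification can be sketched as follows. In a reflection attack, each bot sends small requests with a forged source address (the victim's) to open reflectors, and each reflector sends a much larger reply to the victim. The bot count, per-bot rate, and amplification factor below are hypothetical; public advisories generally place CLDAP amplification in the tens, but the real factor depends on the reflector and the query.

```python
def reflected_attack_gbps(bots: int, per_bot_mbps: float,
                          amplification: float) -> float:
    """Estimate the reflected attack volume reaching the victim, in Gbps.

    Each bot sends spoofed requests at per_bot_mbps; reflectors answer the
    victim with responses `amplification` times larger than the requests.
    """
    request_gbps = bots * per_bot_mbps / 1000.0  # total spoofed request rate
    return request_gbps * amplification


# Hypothetical: 10,000 bots at 1 Mbps each, amplification factor of 50.
print(reflected_attack_gbps(10_000, 1.0, 50))  # 500.0
```

Under these assumptions, only 10 Gbps of requests turns into roughly 500 Gbps arriving at the target, far more than a single 10 to 100 Gbps interconnection can carry.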
Peer Locally, Peer Often
The Internet functions best as a cooperative model: network operators and content publishers work together to improve the experience for end users. To reduce the impact of these types of attacks, we hope that network operators will cooperate to build a more richly interconnected Internet.
In this case, the users who were unaffected were those whose Internet service providers were not hosting significant numbers of CLDAP reflectors and were connected to us directly, over interconnections of sufficient capacity distributed across a suitable geography. We do not charge for interconnection or for our services, and anyone is welcome to meet at any shared public Internet exchange point and interconnect freely with our peering partner PCH at 10 Gbps or 100 Gbps. Doing so, in as many locations as possible, is what helps keep DDoS attacks from succeeding.
In locations like Frankfurt, we see a significant volume of traffic from operators in whose countries Quad9 has equipment, but many of those same operators do not peer with Quad9 or our peering partners in their own nation. We see this as a missed opportunity, and a clear case where such failures to peer locally cause fragility in the network in ways that are not obvious at first glance. Failure to peer locally by larger networks is also often an abuse of a market-dominant position by operators within their own country and an invitation to national communications regulators to step in and correct that abuse, and none of us really want an Internet architected by regulators. When interconnections are distributed as widely as possible, DDoS attacks like this one become more dissipated rather than concentrated.
This instance was a good example of how that cooperation and distribution can improve performance even under adverse attack conditions: our national and regional IX locations and the networks that connect to them saw fewer or no interruptions due to DDoS traffic overload, but some "peering hub" locations with inordinately dense volumes of interconnected networks were swamped with traffic. Hotspots on the network are obvious points of focused trouble during a DDoS, and wider peering at more diverse locations would help alleviate those issues in many cases, or at least reduce the number of users who would see the effects of an attack.
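A minimal sketch of why concentration matters, assuming every site has the same interconnect capacity and ignoring legitimate traffic for simplicity. The site names, shares, and capacities are hypothetical; in practice routing, not the operator, decides where attack traffic lands, based on where the source networks interconnect.

```python
def site_loads(total_attack_gbps: float, share_by_site: dict) -> dict:
    """Split a total attack volume across sites according to traffic shares."""
    return {site: total_attack_gbps * share
            for site, share in share_by_site.items()}


def overloaded_sites(loads: dict, capacity_gbps: float) -> list:
    """Return the sites whose attack load exceeds their interconnect capacity."""
    return sorted(site for site, load in loads.items() if load > capacity_gbps)


# Hypothetical scenario: 300 Gbps of attack traffic, 100 Gbps per site.
concentrated = site_loads(300, {"hub": 0.8, "site-a": 0.05, "site-b": 0.05,
                                "site-c": 0.05, "site-d": 0.05})
dispersed = site_loads(300, {name: 1 / 6 for name in
                             ["hub", "site-a", "site-b", "site-c",
                              "site-d", "site-e"]})
print(overloaded_sites(concentrated, 100))  # ['hub']
print(overloaded_sites(dispersed, 100))     # []
```

The same total volume swamps a dense "peering hub" when most of it arrives there (240 Gbps against 100 Gbps of capacity), but stays within capacity everywhere when it is spread across six sites at 50 Gbps each.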
We realize that peering is not always an option due to policies, costs, or political realities. Even when peering is well distributed, some interconnection models still create bottlenecks, such as very “hot” connections to web hosting providers that may generate disproportionately large amounts of attack traffic, or networks that have only a very local presence and few IX locations at which to interconnect. To counter this, we are constantly working to make Quad9 more robust even where other networks have difficulty interconnecting directly or where density is naturally high (see below). Still, we hope that network operators will peer locally: it improves the experience for everyone’s user base, not just for Quad9 DNS traffic but for all traffic destinations and origins.
How Quad9 Works To Stay Ahead of DDoS Attacks
While there are mitigation techniques that can be applied to volumetric DDoS attacks, a successful defense is mostly a matter of being large enough to continue servicing all legitimate requests while simultaneously ingesting and discarding the attack traffic - there is no substitute for "more" as a strategy. In this context, "more" means larger capacity ports, in more IX locations, with more machines, and with more transit providers and more peering interconnections.
Quad9 is working on delivering all five of these solutions:
- In our largest cities by query volume, Quad9 is upgrading port capacity between our servers and our transit and peering partners as well as installing additional equipment.
- The majority of our largest locations were already slated for either significant server capacity upgrades (~2x) or interconnect upgrades (~10x) or both in the next 30-60 days.
- Our peering partner PCH has recently announced the first of many interconnection upgrades which move from 10G to 100G circuits at the largest IX locations, which provides more capacity upstream of our equipment and allows for more peering sessions to be established at higher traffic volumes.
- Quad9 continues to expand both geographically and across more IX locations: In the last two months, the service has been activated in 6 new locations, and we have another 30 locations that are pending deployment or activation in roughly the next 60 days which will bring Quad9 into more than 200 locations worldwide.
- We have several announcements pending about expansion with new transit and peering sponsorships which will significantly reduce latency, add bandwidth capacity and DDoS resiliency, and increase our server footprint massively and which will be the most important changes in our network in several years - we'll have a posting on that shortly on our blog as well. If your network or hosting firm has a multi-continental footprint and a robust BGP community structure for transit ASNs, we would be very interested in discussing opportunities for sponsorship that would allow Quad9 to continue to expand.
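To illustrate why the interconnect upgrades listed above matter, here is a minimal sketch of how much attack traffic a site could absorb before its ports saturate. It assumes traffic is spread evenly across ports and that the ports, not the servers, are the bottleneck (consistent with attack traffic being filtered before it reaches the servers). The port counts and traffic figures are hypothetical.

```python
def max_absorbable_attack_gbps(ports: int, port_gbps: float,
                               legit_gbps: float) -> float:
    """Attack volume a site can absorb before its ports saturate.

    Hypothetical model: total port capacity minus the legitimate load,
    floored at zero when legitimate traffic alone fills the ports.
    """
    return max(0.0, float(ports * port_gbps - legit_gbps))


# Hypothetical site with 5 Gbps of legitimate traffic.
print(max_absorbable_attack_gbps(2, 10, 5))   # 15.0 with two 10G ports
print(max_absorbable_attack_gbps(2, 100, 5))  # 195.0 with two 100G ports
```

Under these assumptions, a tenfold increase in port speed raises the absorbable attack volume by more than tenfold, because the legitimate load becomes a smaller share of total capacity. Upstream capacity, such as the PCH circuits, must grow as well for the extra headroom to be usable.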
We have been anticipating growth both in our customer base and in these types of attacks, but no network is entirely insulated from DDoS effects. We can reduce the negative results, but we cannot prevent the attacks. We are pursuing what we believe are the right solutions from both a short-term and a long-term perspective to stay ahead of attackers and of the natural growth of recursive DNS traffic, but last week's events in some cities outpaced the resources we had available to fully absorb a concentrated attack. The team here at Quad9 is working long hours and getting a huge amount done to expand the network and service, which we hope will put us ahead of these issues in the near future.
What You Can Do: Alternate Secondary Quad9 Addresses
To improve resilience in the future, we urge all users of Quad9 services to also configure our alternate secondary addresses for any Quad9 DNS services they use. Configuring both the primary and alternate addresses allows client DNS resolvers to switch between addresses when one network path has a problem, even if that problem is confined to a certain city, provider, or a single home network. This built-in resiliency is part of the DNS protocol, but it must be configured to work correctly. If your devices or network are using 9.9.9.9, please make sure that 149.112.112.112 and 2620:fe::fe are also configured in the resolver list for any systems using Quad9 for DNS resolution, as that may make future network issues less noticeable. See our Service Addresses page for a full list.
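The benefit of listing more than one address can be sketched as the failover loop that many stub resolvers implement: try each configured server in order, with a per-server timeout, and use the first one that answers. This is a simplified model, not the code of any particular operating system; real resolvers add retries, rotation, and sometimes parallel queries, and their exact behaviour varies. The function names are hypothetical; `build_query` produces a standard RFC 1035 query for an A record, and 9.9.9.9 and 149.112.112.112 are Quad9's primary and secondary IPv4 service addresses.

```python
import secrets
import socket
import struct


def build_query(name: str, qid: int) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def query_udp(server: str, name: str, timeout: float) -> bytes:
    """Send one query over UDP port 53; raise OSError on timeout or bad reply."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    qid = secrets.randbits(16)  # random query ID, as real resolvers use
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        data, _ = sock.recvfrom(4096)
    if len(data) < 2 or struct.unpack("!H", data[:2])[0] != qid:
        raise OSError("mismatched or truncated DNS response")
    return data


def resolve_with_fallback(servers, name, query_fn=query_udp, timeout=2.0):
    """Try each configured resolver in order; return (server, reply)."""
    errors = []
    for server in servers:
        try:
            return server, query_fn(server, name, timeout)
        except OSError as exc:  # socket timeouts are a subclass of OSError
            errors.append((server, exc))
    raise RuntimeError(f"all resolvers failed: {errors}")
```

Called as `resolve_with_fallback(["9.9.9.9", "149.112.112.112", "2620:fe::fe"], "example.com")`, the sketch returns the answer from 9.9.9.9 when it is reachable and otherwise moves on to the next address after the two-second timeout. With only one address configured, the same outage would surface to the user as a resolution failure.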
Want to help? Consider donating to Quad9 - we are a non-profit whose focus is on protecting end users' privacy and security. Your sponsorship goes directly and only to those goals. We rely upon our user community to help us fund upgrades that help us improve uptime, improve the security that we provide, and maintain privacy for your DNS data.