20:34 UTC | 13:34 PT
Access and performance issues with Pods 5, 6, and 9 have been resolved.
19:39 UTC | 12:39 PT
Access has been restored to Pods 5, 6, and 9.
19:37 UTC | 12:37 PT
We've restored service to Pods 5, 6, and 9 and are seeing improvements across all pods.
19:32 UTC | 12:32 PT
The issues affecting Pods 5, 6, and 9 have been resolved, and performance has returned to normal.
19:15 UTC | 12:15 PT
Pods 5, 6, and 9 are beginning to come back up and we are seeing improvements across those pods.
17:33 UTC | 10:33 PT
We’re seeing access improvements across Pods 4, 5, and 6. If you’re still having issues, please let us know!
17:09 UTC | 10:09 PT
We’re receiving reports of intermittent slowness on Pods 4, 5, and 6. Investigation is underway.
At 9:57 AM PDT, our monitoring began to report connection problems to our Agent Interface and Help Center in our US data centers, and we began investigating immediately. At 10:17 AM PDT, we received an emergency service notification from our distributed denial of service (DDoS) mitigation vendor, which protects us from malicious DDoS attacks. The notification stated that the vendor was experiencing problems in its London, Washington DC, and San Jose network centers. We quickly confirmed that this event was the source of our customers' connection problems and that the impact was widespread.
We immediately contacted the vendor, who recommended that we redirect our network traffic because they could not confirm when the disruption would be resolved. Based on this information and the significant customer impact, we decided to bypass the vendor and route all network traffic directly to our data centers.
We rerouted traffic between 10:35 AM PDT and 1:12 PM PDT, switching it over in batches. As a result, the impact on individual customers varied: some customers saw service return as early as 10:40 AM PDT, and all customers had service restored by 1:12 PM PDT.
This incident also affected our Customer Advocacy (Support) team's ability to respond to email, voice, and chat tickets submitted by customers between 10:45 AM and 12:10 PM PDT.
- We have opened a case with our DDoS mitigation vendor to determine the root cause of this service issue. We do not expect to receive an official root cause analysis (RCA) until next week at the earliest.
- We are monitoring the vendor's service status to decide when to resume routing our traffic through their service. The vendor is continuing to analyze the incident across its network.
- We are evaluating whether any further changes, urgent or otherwise, are needed across our own infrastructure; should the need arise, we will schedule an emergency change. Any maintenance, whether emergency or scheduled, will be communicated in this forum and in your Zendesk account once it is officially scheduled.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.