15:24 UTC | 08:24 PT
We believe we have identified the cause of the issues on all of the impacted Pods and have implemented a fix.
15:00 UTC | 08:00 PT
We have found the cause and resolved the latency issues for Pods 3, 4, 5, and 6. We are starting to see improvement and will keep you updated.
14:17 UTC | 07:17 PT
We are still investigating the connectivity issues impacting Pods 3, 4, 5, 6, and 14. More information to follow.
13:59 UTC | 06:59 PT
We are currently experiencing delays on some of our services, affecting Pods 3, 4, 5, and 6. More information to follow.
A tool we use to scan incoming email for spam relies on a third-party service. Our account with that third-party service expired and was disabled, causing DNS lookup failures on our inbound mail servers and resulting in mail queue backups on Pods 3, 4, 5, 6, and 14. A secondary issue arose on Pod 14: the spike in DNS query retries overwhelmed undersized dnscache nodes, degrading service for Pod 14 customers across all aspects of Zendesk. Pods 3, 4, 5, and 6 recovered quickly after the third-party account was re-enabled, and Pod 14 recovered after the dnscache instances were resized to a more appropriate instance size. To prevent this from happening again, we will assess our service dependencies such as DNS, update and scale up dnscache, add additional DNS monitoring, and review our vendor management practices.
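As an illustration of the kind of additional DNS monitoring described above, the sketch below performs a simple resolver health check: it resolves a set of hostnames and flags failures or slow lookups. The hostnames and the latency threshold are placeholder assumptions for illustration only, not the actual endpoints or tooling used here.

```python
# Minimal DNS health-check sketch. HOSTS and TIMEOUT_SECONDS are assumed
# placeholders, not the real spam-scanning or mail-server endpoints.
import socket
import time

HOSTS = ["example-rbl.example.com", "mail-scanner.example.com"]
TIMEOUT_SECONDS = 2.0  # lookups slower than this are treated as a warning sign

def check_dns(host: str) -> tuple[bool, float]:
    """Resolve a hostname and return (success, elapsed_seconds)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, None)
        return True, time.monotonic() - start
    except socket.gaierror:
        return False, time.monotonic() - start

if __name__ == "__main__":
    for host in HOSTS:
        ok, elapsed = check_dns(host)
        status = "OK" if ok and elapsed < TIMEOUT_SECONDS else "ALERT"
        print(f"{status}: {host} resolved={ok} in {elapsed:.3f}s")
```

In practice a check like this would run on a schedule and feed an alerting system, so that resolver failures or retry spikes surface before mail queues back up.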
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.