19:08 UTC | 11:08 PT
The issues impacting Pods 5, 6, and 9 have been resolved.
18:42 UTC | 10:42 PT
We’re continuing to observe improvements across Pods 5/6/9. We will provide any updates once we have new information to report.
18:10 UTC | 10:10 PT
We’re seeing service availability improve across Pods 5/6/9. We’re continuing to investigate in the meantime.
17:42 UTC | 09:42 PT
Issues are improving in Pod 6, with no recent reports in Pods 5 and 9. We are continuing to investigate the situation and will update shortly.
17:16 UTC | 09:16 PT
We are seeing recent reports of issues in Pod 6. We are working to remediate the issue and will update shortly.
16:45 UTC | 08:45 PT
We’re noting improvements across impacted pods. Please attempt a browser refresh, and let us know if you’re still experiencing issues.
16:15 UTC | 08:15 PT
We're seeing improvements in Pods 5, 6, and 9. We are continuing to monitor and will update shortly.
15:46 UTC | 07:46 PT
Our team continues to work on resolving the failing DNS resolution in Pods 5, 6, and 9, which we’ve identified as the root cause.
15:12 UTC | 07:12 PT
We have identified the root cause of the performance issues on Pods 5, 6, and 9, and are working on a resolution.
14:48 UTC | 06:48 PT
We are experiencing service availability issues in Pods 5, 6, and 9. This is impacting all services in these pods.
During this incident, customers in Pods 5, 6, and 9 experienced service disruptions, including delays in email processing, "502 Bad Gateway" errors, slow loading times in the agent interface, and general app issues. This incident occurred after the DNS configuration for these pods was modified to be more fault tolerant. The configuration change introduced a circular dependency between two vital processes, which caused several services to misbehave on restart. This particular circular dependency was difficult to predict and did not manifest during testing. To help prevent this sort of issue from recurring, we're taking several measures, including addressing an issue where configuration changes incorrectly cause services to restart, and ensuring that control gaps in our change processes are remediated.
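To illustrate the class of failure described above: when two processes each require the other to be available before starting, neither can come up cleanly after a restart. The sketch below is a hypothetical example (the service names and dependency checker are invented for illustration, not Zendesk's actual processes); it shows how a cycle in a startup-dependency graph can be detected with a depth-first search.

```python
# Hypothetical illustration of a circular startup dependency.
# Service names ("dns-resolver", "config-service") are invented examples.

def find_cycle(deps):
    """Return a list of services forming a dependency cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {service: WHITE for service in deps}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in deps.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                # Found a back edge: the cycle is the stack from dep onward.
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for service in deps:
        if color[service] == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return None

# Two processes that each depend on the other being up first:
deps = {
    "dns-resolver": ["config-service"],   # resolver reads its config first
    "config-service": ["dns-resolver"],   # config service needs DNS lookups
}
print(find_cycle(deps))  # ['dns-resolver', 'config-service', 'dns-resolver']
```

Running a check like this against declared service dependencies before rolling out a configuration change is one way such a cycle could be caught in testing rather than in production.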
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.