14:35 UTC | 06:35 PT
The performance issues affecting customers in Pods 5, 6 & 9 are now resolved.
14:26 UTC | 06:26 PT
We are seeing performance improve on Pods 5, 6 & 9. Thank you for your patience.
14:11 UTC | 06:11 PT
We have found the cause of our issue and are working hard to finish stabilising our service.
13:54 UTC | 05:54 PT
We have performed a failover on network devices in Pods 5, 6 & 9. Performance has improved. We continue to monitor.
13:43 UTC | 05:43 PT
We continue to investigate the ongoing service disruption affecting Pods 5, 6 & 9.
13:13 UTC | 05:13 PT
We apologise for the service interruption affecting Pods 5, 6 & 9. We are working to restore service. Next update in 30 minutes.
13:02 UTC | 05:02 PT
We continue to work on a service disruption affecting some customers in pods 5, 6 and 9.
12:29 UTC | 04:29 PT
We are currently experiencing a service disruption on all pods. These issues could impact Talk, Chat, Help Center and UI services.
12:10 UTC | 04:10 PT
We are currently investigating reports of dropped calls and access issues with Help Center in all pods.
At around 12:00 PM UTC (4:00 AM PST) we began to receive service failure alerts and customer reports of issues affecting the Help Center application in all pods. These reports were corroborated by Zendesk engineers and our monitoring systems. Our investigation identified that the incident was triggered by a bug in a code deploy. The bug caused the number of concurrent connections to increase across all Zendesk pods. We quickly correlated the deploy with the subsequent behavior and rolled back the change in response.
Following the code rollback, we continued to observe instability across the service platform, primarily in pods 5, 6 & 9. As part of our normal incident response practice, we investigated each tier of our network infrastructure, including our ISP connectivity and general Internet stability. To clear any lingering connections and restore connectivity, we began a controlled failover of our edge network devices. At 2:10 PM UTC (6:10 AM PST) all services returned to normal.
We are continuing to investigate the precise traffic characteristics and behavior that led to this service impact.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.