18:28 UTC | 11:28 PT
The issue affecting access for instances on Pods 5, 6, 9, and 13 has been resolved.
18:12 UTC | 11:12 PT
The issue affecting access for instances on Pods 5, 6, 9, and 13 has stabilized as we continue to monitor performance.
17:38 UTC | 10:38 PT
We are mitigating access issues and beginning to see improvements across Pods 5, 6, and 9.
15:49 UTC | 08:49 PT
We are seeing improvement in slowness and access issues for Pods 5, 6, and 13, but we're continuing to investigate.
15:28 UTC | 08:28 PT
We are currently investigating performance issues affecting Pods 5 and 6. More information to come.
This incident began with system monitoring alerts and customer reports of green screens. Upon investigation, we determined that these were caused by an issue with the networking infrastructure connecting our data centers to our AWS pods. In response, we performed a load balancer and firewall failover in our Virginia data center. As a result, all connections traversing our firewall were reset. The resulting surge of reconnection requests, on top of ordinary new requests, exceeded session limits and prolonged the service interruption.
To reduce the load on the recovering service, we identified sources of high traffic and began redirecting them to other data centers. In particular, we identified our Embeddables service as a significant source of traffic to the impacted data center. Redirecting that traffic to another data center reduced the load on the Virginia data center and allowed it to fully recover.
In our post-mortem review, we identified several remediation items, including: upgrading the Virginia data center's firewall infrastructure; updating our risk assessment of corrective actions taken during service incidents; an architectural review, with changes, to better distribute requests from our Embeddables service; and changes to the configuration of our system status page so that it can handle large spikes in traffic.
We regret the inconvenience and impact on your business and thank you for your patience as we worked through this incident.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.