22:00 UTC | 14:00 PT
This incident is now resolved and services are back to normal. Our Operations, Engineering, and Advocacy teams worked together to identify high resource utilization in one of our backend key/value stores. We further determined that its behavior had changed considerably after a software deploy earlier in the day. To remedy the situation, the suspect deploy was rolled back and the resulting data in the backend system was managed down until resource utilization returned to normal.
Investigation continues at this time, but all services are back to normal.
20:55 UTC | 12:55 PT
We believe we have found the root cause of the performance issues with Talk and Support in pods 3, 5, 6, and 9. We are working towards a resolution.
20:16 UTC | 12:16 PT
We have confirmed performance and stability issues affecting Talk and Support customers. Investigation continues.
19:39 UTC | 11:39 PT
We are receiving reports of performance and stability issues with Talk and Support. Investigation is currently underway.
A code deploy caused a large number of failed jobs in the queue, causing the queue to consume a large amount of memory, which degraded queue workers and other services. We are working to improve tests around the code that was rolled back, as well as to improve monitoring of high resource utilization on the queue.
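As a rough illustration of the kind of monitoring described above (not taken from the incident report), the sketch below assumes a Redis-backed job queue; the key name, thresholds, and connection details are hypothetical.

```python
# Minimal sketch of a queue resource-utilization check, assuming a Redis-backed
# job queue. The report only mentions a "backend key/value store"; the key name
# and thresholds below are hypothetical placeholders.
import redis

FAILED_QUEUE_KEY = "queue:failed"   # hypothetical list holding failed jobs
MAX_FAILED_JOBS = 10_000            # hypothetical alert threshold
MAX_MEMORY_FRACTION = 0.80          # alert when used memory exceeds 80% of maxmemory


def check_queue_health(client: redis.Redis) -> list[str]:
    """Return alert messages when the queue shows high resource utilization."""
    alerts = []

    # A growing backlog of failed jobs is an early sign of the memory pressure
    # described in the root cause above.
    failed_jobs = client.llen(FAILED_QUEUE_KEY)
    if failed_jobs > MAX_FAILED_JOBS:
        alerts.append(f"failed job backlog: {failed_jobs} jobs in {FAILED_QUEUE_KEY}")

    # Compare used memory against the configured maxmemory limit, if one is set.
    info = client.info("memory")
    used = info["used_memory"]
    max_memory = info.get("maxmemory", 0)
    if max_memory and used / max_memory > MAX_MEMORY_FRACTION:
        alerts.append(f"memory utilization: {used}/{max_memory} bytes used")

    return alerts


if __name__ == "__main__":
    client = redis.Redis(host="localhost", port=6379)
    for alert in check_queue_health(client):
        print("ALERT:", alert)
```

A periodic check along these lines, wired into an alerting system, is one way high queue memory usage could be caught before it degrades queue workers and dependent services.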
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.