SUMMARY
11:56 UTC | 03:56 PT
We can confirm issues affecting performance on Pods 17, 18, 19 & 20 are now resolved.
11:20 UTC | 03:20 PT
We’re seeing improvement with performance & access issues on Pods 17, 18, 19 & 20. Please let us know if you continue to experience issues.
11:03 UTC | 03:03 PT
Customers on Pods 17, 18, 19 & 20 may experience issues accessing Zendesk products. We are investigating.
POST-MORTEM
On November 29, 2018, at 10:38 UTC / 02:38 PT, we started receiving reports of performance issues with Support and Guide (and, to a lesser extent, Talk) on Pods 17, 18, 19 and 20. At times, Support and Guide were completely down for a subset of accounts.
The cause was a caching server that was upgraded automatically and restarted unattended. The restart erased the cache, which then had to be regenerated. The temporary lack of cache on this server impacted other services, which in turn directly impacted Support, Guide and Talk.
To mitigate the impact on our customers in the short term, we ran specific commands on the impacted hosts to help regenerate the cache and get things running smoothly again.
In the long term, critical systems will be blacklisted from unattended upgrades to avoid such restarts and the resulting cache loss, which could directly impact our customers.
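As a rough illustration of the long-term fix, the sketch below shows one way an upgrade job could hold back critical packages before an unattended run. It is not our actual tooling: the pattern list, the package names and the `split_upgrades` helper are all hypothetical and are used only to show the idea.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for packages that must never be upgraded unattended.
CRITICAL_PATTERNS = ["cache-server*"]


def split_upgrades(candidates, blacklist=CRITICAL_PATTERNS):
    """Split upgrade candidates into (allowed, held) lists.

    A package is held back when its name matches any blacklist pattern,
    so it is only upgraded (and restarted) during a supervised change.
    """
    allowed, held = [], []
    for pkg in candidates:
        if any(fnmatchcase(pkg, pattern) for pattern in blacklist):
            held.append(pkg)
        else:
            allowed.append(pkg)
    return allowed, held


if __name__ == "__main__":
    allowed, held = split_upgrades(["cache-server", "openssl", "curl"])
    print("upgrade automatically:", allowed)
    print("hold for supervised upgrade:", held)
```

Holding packages by pattern rather than by exact name means related packages (for example, a hypothetical `cache-server-tools`) are held back as well, at the cost of occasionally holding more than strictly needed.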
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.
Post Mortem posted December 4, 2018