16:30 UTC | 09:30 PT
Service incident resolved at 9:30 PT.
15:58 UTC | 08:58 PT
We are seeing improvements in our voice service in pods 5, 6, and 9. We are monitoring closely.
15:35 UTC | 08:35 PT
We are experiencing issues completing some calls in pods 5, 6, and 9. We’re investigating.
Our Voice team notified us that voice calls were being dropped and that outbound connections were timing out. We identified that our firewall was being saturated by ingress traffic from S3.
Further investigation showed that a recent bug fix on our Translations App Servers required all of the affected client nodes to download a backlog of files. These concurrent downloads, combined with the large number of files, appear to have saturated the firewall.
We manually killed all in-progress downloads and then rolled the downloads out slowly until all clients were updated.
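As an illustration only (this is not the tooling used during the incident), a slow roll of this kind can be sketched as syncing clients in small batches, so that no more than a fixed number of downloads run at once. The names `slow_roll`, `sync_client`, and `batch_size` are hypothetical, and the sketch assumes each client can be synced by a single blocking call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def slow_roll(
    clients: Iterable[str],
    sync_client: Callable[[str], None],  # hypothetical per-client download step
    batch_size: int = 2,                 # assumed cap on concurrent downloads
) -> List[List[str]]:
    """Sync clients in small batches so at most `batch_size` downloads overlap.

    Returns the batches in the order they were processed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    pending = list(clients)
    batches: List[List[str]] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        # Each batch finishes completely before the next one starts,
        # which caps the number of concurrent download streams.
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            # list() forces evaluation so any exception surfaces here.
            list(pool.map(sync_client, batch))
        batches.append(batch)
    return batches
```

A smaller `batch_size` lengthens the rollout but lowers the peak ingress load; a suitable value would depend on firewall capacity, which this sketch does not measure.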
To prevent this issue from recurring, we will bring new firewalls into production to assist with cutover and refactor the sync design of the Translations App Servers.
FOR MORE INFORMATION
For current system status information about your Zendesk, check out our system status page. During an incident, you can also receive status updates by following @ZendeskOps on Twitter. The summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.