As of 15:45 GMT / 07:45 AM PST we are working to resolve the following service incident:
We are receiving reports of latency in various Zendesk services. We are investigating it.
16:20 GMT / 08:20 AM
We are still investigating the issue creating latency in various Zendesk services for some of our customers. More info to come.
16:55 GMT / 08:55 AM
Latency is localized to our East Coast data center. Customers are reporting intermittent voice and access issues.
17:29 GMT / 09:29 AM
Still investigating performance issues at East Coast data center, including some reports of email delays.
18:08 GMT / 10:08 AM
Still investigating a fix for performance issues at East Coast data center. Next update in 30 minutes.
18:50 GMT / 10:50 AM
Following changes intended to resolve issues, we are seeing improved performance in East Coast data centers.
19:12 GMT / 11:12 AM
System monitoring and customer reports confirm that the performance issue has been resolved. A post-mortem will be posted here shortly.
An issue in communication between clusters in a web service and one of our US-based data centres caused service discovery for our ELK stack endpoint (which provides the syslog destination for our syslog-ng configurations) to fail intermittently on the data centre's servers.
With the service discovery DNS record failing intermittently, syslog-ng entered a failure state in which it exhausted resources on the affected servers while trying to resolve the DNS record for the ELK endpoint. This, in turn, reduced the availability of many services on our internal network, resulting in green screens and generally slow functionality for our customers.
To fix this problem, we put a workaround in place: we removed DNS service discovery from the syslog-ng configuration and replaced it with a plain DNS record in our internal zone.
Better monitoring of cluster health and inter-cluster communication can be added to our monitoring/metrics system to improve detection of this failure scenario.
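As an illustration only, the kind of check described above could be sketched as a small probe that resolves the logging endpoint's DNS name several times and reports the fraction of failed lookups. This is not our production monitoring: the hostname `elk.example.internal`, the function names, and the 20% alert threshold are all hypothetical, and a real probe would also pause between attempts and ship the result to a metrics system.

```python
import socket
from typing import Callable, Sequence


def resolve(hostname: str) -> Sequence[str]:
    """Resolve a hostname with the system resolver.

    Raises OSError (socket.gaierror is a subclass) when resolution fails.
    """
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def failure_rate(
    hostname: str,
    attempts: int = 10,
    resolver: Callable[[str], Sequence[str]] = resolve,
) -> float:
    """Return the fraction of resolution attempts that failed.

    An attempt counts as failed if the resolver raises OSError
    or returns no addresses.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = 0
    for _ in range(attempts):
        try:
            if not resolver(hostname):
                failures += 1
        except OSError:
            failures += 1
    return failures / attempts


def should_alert(rate: float, threshold: float = 0.2) -> bool:
    """Flag the endpoint when the failure rate reaches the threshold.

    The 0.2 default is an arbitrary, hypothetical value.
    """
    return rate >= threshold


if __name__ == "__main__":
    # Hypothetical endpoint name, used only for illustration.
    rate = failure_rate("elk.example.internal", attempts=5)
    print(f"failure rate: {rate:.0%}, alert: {should_alert(rate)}")
```

Measuring a failure rate over several attempts, rather than a single lookup, is what lets a probe like this catch an intermittent failure of the kind described in this post-mortem.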
FOR MORE INFORMATION
Please subscribe to this article for regular updates until the issue is resolved. If you aren't subscribed to our Twitter feed, we encourage you to do so in order to get the most current information about any service issues. We also record all site outages on our system status page where you can see the past 12 months of service uptime. If you have questions about this issue, please open a ticket with us by sending a note to firstname.lastname@example.org.