As of 14:31 GMT / 06:31 PST we are working to resolve the following service incident:
We are investigating performance issues affecting some customers. More info shortly.
14:47 GMT / 06:47 PST:
We continue to investigate the performance issues affecting some customers. More info ASAP.
15:03 GMT / 07:03 PST:
Our operations team continues to investigate issues with Zendesk services. More information shortly.
15:22 GMT / 07:22 PST:
We continue to investigate issues with Zendesk services affecting some customers.
15:38 GMT / 07:38 PST:
We have identified the issue causing service disruption for some customers. Our operations team is working on a fix.
15:56 GMT / 07:56 PST:
Our operations team continues to work on a fix for the performance issues affecting some customers. More information shortly.
16:19 GMT / 08:19 PST:
Mitigation efforts from our operations team continue for performance-impacted customers. More information soon.
16:52 GMT / 08:52 PST:
Efforts continue by our operations team to mitigate problems impacting some customers.
17:33 GMT / 09:33 PST:
Efforts are ongoing by our operations engineers to alleviate problems experienced by some of our customers.
18:07 GMT / 10:07 PST:
Changes made by our operations engineers are improving performance for impacted customers. We are monitoring the situation; next update in 30 minutes.
18:43 GMT / 10:43 PST:
Performance has improved for impacted customers; however, our operations team continues to monitor the situation.
19:12 GMT / 11:12 PST:
Performance has stabilized and is back to normal for impacted accounts. We are still reviewing the incident to ensure complete recovery.
20:45 GMT / 12:45 PST:
All services have been restored.
This incident affected our data centers on the East Coast of the US and resulted in a widespread outage affecting multiple services and customers. The issue was caused by a small number of customers who had recently integrated their mobile applications with the Zendesk Mobile SDK and released their updated applications at around the same time. The volume of requests generated as these applications were updated overwhelmed our firewall and, in turn, prevented customers whose accounts were based in the affected data centers from accessing them.
Our Operations team performed multiple investigative steps to identify the issue once our monitoring systems reported it. The incident was prolonged because several customers were using the Zendesk Mobile SDK at the same time, which made it harder to identify the source of the traffic and meant we had to remediate the outage step by step. Once the issue was found and the customers identified, we proactively routed the specific traffic to restore services to the affected data centers and reached out to those customers to resolve the root cause.
It is important to note that no email was lost at any time during this outage. All emails that arrived during this period were queued and processed once services returned to normal.
We are still in the post-outage review process to determine exactly how the Mobile SDK could inadvertently affect such a wide range of services. One item we are currently implementing is the capability to identify the direct contributor to unintentional but disruptive traffic automatically, rather than through time-consuming manual processes. We are also investigating ways to increase throughput at the firewall level that work for our unique environment.
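As a rough illustration of what automated identification of a traffic contributor could look like, the sketch below counts requests per source and reports any source that exceeds a share of total traffic. It is a minimal example and not a description of our actual tooling: the `Request` record, its `account` and `client` fields, the `top_contributors` function, and the 20% threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Request(NamedTuple):
    # Hypothetical log record; the field names are assumptions for illustration.
    account: str  # customer account the request belongs to
    client: str   # client identifier, e.g. a mobile app's user agent


def top_contributors(
    requests: Iterable[Request], min_share: float = 0.2
) -> List[Tuple[str, str, float]]:
    """Return (account, client, share) for sources at or above min_share of traffic."""
    counts = Counter((r.account, r.client) for r in requests)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() yields sources ordered from highest to lowest request count.
    return [
        (account, client, count / total)
        for (account, client), count in counts.most_common()
        if count / total >= min_share
    ]


# Made-up sample: one mobile client generates most of the traffic.
sample = (
    [Request("acme", "mobile-app/2.0")] * 6
    + [Request("globex", "web")] * 3
    + [Request("initech", "web")] * 1
)
flagged = top_contributors(sample)
```

With this sample, `flagged` lists the `acme` mobile client first, followed by `globex`, while `initech` falls below the threshold. In practice the input would be a recent window of firewall or load balancer logs, and the threshold would need tuning to normal traffic patterns.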
FOR MORE INFORMATION
Please subscribe to this article for regular updates until the issue is resolved. If you aren't subscribed to our Twitter feed, we encourage you to do so in order to get the most current information about any service issues. We also record all site outages on our system status page where you can see the past 12 months of service uptime. If you have questions about this issue, please open a ticket with us by sending a note to email@example.com.