ReportingCloud: Impact of the Azure Virtual Machine Maintenance Reboot Strategy

Some ReportingCloud customers experienced significant downtime because of the Microsoft Azure forced reboot strategy. In this article, we would like to explain what happened.

A large number of computer processors are vulnerable to at least one of two exploits named Meltdown and Spectre. Google, Microsoft and others have been working behind the scenes to create patches for their operating systems and infrastructures such as Azure.

ReportingCloud is hosted on Microsoft Azure Virtual Machines in different Availability Sets. Availability sets ensure that virtual machines are distributed across isolated hardware nodes. In case of a reboot (or problem), only some VMs are impacted.
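To illustrate why availability sets normally limit the impact of a reboot, here is a minimal sketch in Python. It is a simplified model, not the Azure API: the VM names, the round-robin placement, and the functions `assign_update_domains` and `available_during_reboot` are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_update_domains(vm_names, domain_count):
    """Spread VMs round-robin across update domains (hypothetical model)."""
    domains = defaultdict(list)
    for index, name in enumerate(vm_names):
        domains[index % domain_count].append(name)
    return dict(domains)


def available_during_reboot(domains, rebooting_domain):
    """Return the VMs that keep running while one domain is rebooted."""
    return sorted(
        vm
        for domain, vms in domains.items()
        if domain != rebooting_domain
        for vm in vms
    )


# Hypothetical VM names; the real ReportingCloud topology is not public.
vms = ["rc-web-1", "rc-web-2", "rc-web-3", "rc-web-4"]
domains = assign_update_domains(vms, domain_count=2)

for domain in sorted(domains):
    still_up = available_during_reboot(domains, domain)
    print(f"Rebooting domain {domain}: still available -> {still_up}")
```

As long as only one domain is rebooted at a time, at least one VM stays available. If several domains are rebooted or stopped at once, this guarantee no longer holds.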

On December 28th, Microsoft released a security warning with a detailed Planned Virtual Machine Maintenance Reboot Strategy. In essence, this strategy advised users to reboot all VMs by January 10th. After that date, a reboot would be forced. So far, so good. We planned to reboot our VMs over the weekend, after hours.

Then, on January 3rd, the security vulnerability was publicly disclosed and Microsoft released another security warning with a changed strategy: they started to reboot all VMs at 3:30pm PST on January 3rd. The plan was to reboot VMs in different availability sets separately. This probably worked for many VMs, but not for ours. The result was an overall ReportingCloud downtime (the first downtime ever) of approximately 45 minutes on January 4th, 2018.

At that time, the Azure portal was unresponsive and the VM health information it displayed was wrong. Our VMs were working and all ReportingCloud endpoint servers were responsive, yet the portal showed the power states of some VMs as Failed and Deallocating. Evidently, the Azure portal was completely overloaded. We expected issues like these and monitored the states of these VMs. As expected, the already rebooted VMs were then stopped automatically, which caused another downtime for some of our users.

We are very sorry for any inconvenience this downtime has caused. Over the last few hours, we have checked all VMs and all ReportingCloud users and carefully monitored the health of our endpoints, and we are confident that everything is up and running without any problems. If you are still facing issues, please let us know.
