ReportingCloud: Impact of the Azure Virtual Machine Maintenance Reboot Strategy
Microsoft Azure force-rebooted virtual machines across availability sets to address the Meltdown and Spectre CPU vulnerabilities, causing approximately 45 minutes of unplanned ReportingCloud downtime. The Azure portal displayed incorrect VM power states during the incident.

A large number of computer processors are vulnerable to at least one of two exploits named Meltdown and Spectre. Google, Microsoft and others have been working behind the scenes to create patches for their operating systems and infrastructures such as Azure.
ReportingCloud is hosted on Microsoft Azure Virtual Machines in different Availability Sets. Availability sets ensure that virtual machines are distributed across isolated hardware nodes, so that a reboot or hardware problem affects only some of the VMs at any one time.
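The isolation an availability set provides can be pictured with a small toy model. The sketch below is only an illustration: the round-robin placement, the VM names, and the domain count are assumptions made for the example, and it does not call any Azure API.

```python
from collections import defaultdict


def assign_update_domains(vm_names, domain_count=5):
    """Spread VMs across update domains round-robin (an assumed placement policy)."""
    domains = defaultdict(list)
    for index, name in enumerate(vm_names):
        domains[index % domain_count].append(name)
    return dict(domains)


def available_during_reboot(domains, rebooting_domain):
    """Return the VMs that stay up while a single update domain reboots."""
    return sorted(
        name
        for domain, names in domains.items()
        if domain != rebooting_domain
        for name in names
    )


if __name__ == "__main__":
    vms = ["web-1", "web-2", "web-3", "web-4"]  # hypothetical VM names
    domains = assign_update_domains(vms, domain_count=2)
    # Rebooting domain 0 leaves the VMs in domain 1 available.
    print(available_during_reboot(domains, rebooting_domain=0))
```

As long as only one domain is rebooted at a time, the service keeps running on the remaining VMs. An outage like the one described in this post only occurs when VMs in several domains go down together.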
On December 28th, Microsoft released a security warning with a detailed Planned Virtual Machine Maintenance Reboot Strategy. In essence, this strategy advised users to reboot all VMs by January 10th; after that date, a reboot would be forced. So far so good. We planned to reboot our VMs after hours over the weekend.
Then, on January 3rd, the security vulnerability was disclosed and Microsoft released another security warning with a changed strategy. They started to reboot all VMs at 3:30 PM PST on January 3rd. The plan was to reboot VMs in different availability sets separately. This probably worked for many VMs, but not for ours. It caused an overall ReportingCloud downtime (the first downtime ever) of approximately 45 minutes on January 4th, 2018.
At that time, the Azure portal was unresponsive and the information it showed about the health of VMs was wrong. Our VMs were running and all ReportingCloud endpoint servers were responsive, yet the Azure portal displayed the power states of some VMs as Failed and Deallocating. The Azure portal was evidently overloaded. We had expected issues like these and monitored the states of these VMs. As expected, the VMs that had already been rebooted were then stopped automatically, which caused another downtime for some of our users.
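Because the portal's power states could not be trusted, probing the endpoints directly was the more reliable signal. The following is a minimal sketch of such a check, not our actual monitoring setup: the function names and the example URLs are hypothetical, and the probe simply treats any 2xx HTTP response as healthy.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, HTTP error codes, and malformed URLs
        # all count as "not healthy" in this sketch.
        return False


def check_endpoints(urls, probe=http_probe):
    """Probe every endpoint directly instead of relying on a management portal."""
    return {url: probe(url) for url in urls}


def all_healthy(results):
    """True only if at least one endpoint was checked and all of them responded."""
    return bool(results) and all(results.values())


if __name__ == "__main__":
    # Placeholder URLs; replace them with real health-check endpoints.
    endpoints = ["https://example.com/", "https://example.org/"]
    results = check_endpoints(endpoints)
    for url, healthy in results.items():
        print(url, "OK" if healthy else "DOWN")
    print("all healthy:", all_healthy(results))
```

Passing the probe in as a parameter keeps the check easy to test without network access and makes it simple to swap in a different health criterion.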
We are very sorry for any inconvenience this downtime has caused. We checked all VMs and all ReportingCloud users, and carefully monitored the health of our endpoints over the last few hours. We are very confident that everything is up and running without any problems. If you are still facing issues, please let us know.
