A large number of computer processors are vulnerable to at least one of two exploits named Meltdown and Spectre. Google, Microsoft and others have been working behind the scenes to create patches for their operating systems and their infrastructure, such as Azure.
ReportingCloud is hosted on Microsoft Azure Virtual Machines in different Availability Sets. Availability Sets ensure that virtual machines are distributed across isolated hardware nodes. If a node is rebooted or fails, only some of the VMs are affected.
On December 28th, Microsoft released a security warning with a detailed Planned Virtual Machine Maintenance Reboot Strategy. In short, this strategy advised users to reboot all of their VMs before January 10th. After that date, a reboot would be forced. So far, so good. We planned to reboot our VMs over the weekend, outside business hours.
Then, on January 3rd, the security vulnerability was disclosed and Microsoft released another security warning with a changed strategy. Microsoft started rebooting all VMs at 3:30pm PST on January 3rd. The plan was to reboot VMs in different Availability Sets separately. This probably worked for many VMs, but not for ours. As a result, ReportingCloud experienced a complete outage of approximately 45 minutes on January 4th, 2018, the first downtime in its history.
At that time, the Azure portal was not responsive at all and the information it showed about the health of our VMs was wrong. Our VMs were working and all ReportingCloud endpoint servers were responsive, yet the Azure portal displayed the power states of some of these VMs as Failed and Deallocating. Obviously, the Azure portal was completely overloaded. We had expected issues like these and were monitoring the states of these VMs. As expected, some of the VMs that had already been rebooted were stopped automatically, which caused another downtime for some of our users.
We are very sorry for any inconvenience this downtime has caused. Over the last few hours, we have checked all VMs and all ReportingCloud user accounts and carefully monitored the health of our endpoints. We are very confident that everything is up and running without any problems. If you are still experiencing issues, please let us know.