Thursday, April 21, 2016

Single Virtual Machine in Azure - How to achieve maximum availability with minimum cost?

By now you probably know that if you host a single VM in Azure, Microsoft provides no SLA for it.

What does that mean? It means that if this machine goes down, the time needed to bring it back up is not guaranteed.

Disclaimer:
After reading the whole article, you may say that this solution does not use a single VM, and you would be correct. This article suggests an alternative at almost the cost of one VM, and if you originally decided to use a single VM, it answers a few questions that help justify that decision. I hope it will be a useful read.

Why can the VM go down?
  • Hardware failure on the rack in the datacenter where the VM resides
  • The whole datacenter going down (for example due to a power grid failure, natural calamity, war or terrorist attack)
  • AppFabric internal maintenance (more on this below)


What is AppFabric Internal Maintenance?
AppFabric may need to upgrade its environment for major updates. This happened three times in 2015 in Azure China (as confirmed by Azure China support staff). Downtime during such an upgrade is typically up to about 15 minutes, but even this is not guaranteed, and the maintenance can happen during the day (at peak times) rather than when no one is using your application.
You are notified of scheduled maintenance 1 to 2 weeks in advance.
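Using the figures from the Azure China observation (three maintenance events in a year, roughly 15 minutes each), a quick calculation shows what planned maintenance alone costs in availability. This is a sketch under those assumptions only; it ignores unplanned failures such as hardware faults, which have no bound at all for a single VM. The function names are illustrative.

```python
# Estimate availability if planned maintenance were the only source of downtime.
# Assumed inputs: ~3 events per year, ~15 minutes each (observed, not guaranteed).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def maintenance_downtime(events_per_year: int, minutes_per_event: float) -> float:
    """Total planned downtime per year, in minutes."""
    return events_per_year * minutes_per_event


def availability_percent(downtime_minutes: float,
                         period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Share of the period during which the VM is up, as a percentage."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    downtime = maintenance_downtime(3, 15)
    print(f"Planned downtime: {downtime} minutes per year")
    print(f"Availability from maintenance alone: {availability_percent(downtime):.4f}%")
```

With three 15-minute events this comes to 45 minutes a year, or about 99.991% availability, before counting any unplanned outage.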

How to achieve maximum availability with minimum cost?
We could answer it now, but let's first combine it with one more question.

Why should you backup your Azure VMs?
  • The chance of it becoming corrupted while you upgrade the custom software installed on it
  • The chance of it becoming corrupted while you upgrade third-party software on it, e.g. the OS or SQL Server 2014
  • A malicious employee or hacker erasing key system files


What are the options when a VM gets corrupted and we want to fix it?
  • Reinstall the OS, if that is what got corrupted
  • Reinstall your custom software, if that caused the corruption. You would most likely get the latest version from the Release branch of your source control repository.
  • Reinstall any accompanying software if the VM crashed; e.g. if your custom software is Drupal based, it may require you to install Linux, Apache, Drupal, MySQL, etc.


The key question is how long it takes to bring your VM back up. Are you okay with that? If not, you may need to back up your VM (that is, keep a copy of it) and bring the backup up when the original VM crashes.

Where can you keep your Azure VM backup?
  • On-premises
  • In a page blob in Azure Storage, which is replicated three times
  • In Azure Backup Service, which allows point-in-time restores and provides a UI to manage backups. As of April 2016, backups are not cross-region, so if all datacenters in a region go down, I would assume that your backup would be lost as well.


Let’s come back to our original question and look at it together with the need for backup:

How to achieve maximum availability with minimum cost?
By using two VMs instead of one, but paying for only one.

How?

Put two VMs in one availability set. Then, from the Azure portal, shut down one of the VMs so that it is Stopped (Deallocated). Microsoft does not charge compute costs for a deallocated VM.
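If you prefer scripting to the portal, the start and deallocate steps can be sketched in Python. This is a minimal sketch assuming the current Azure Resource Manager SDK for Python (`azure-mgmt-compute`), which is a modern equivalent rather than the classic portal workflow described here; the helper names `power_state`, `get_power_state` and `swap_to_standby` are hypothetical. The functions take an already-constructed client, so the logic can be read (and tested) without credentials.

```python
# Hypothetical helpers around an azure-mgmt-compute ComputeManagementClient.
# The client is passed in; nothing here imports or authenticates against Azure.
from typing import Iterable, Optional


def power_state(status_codes: Iterable[str]) -> Optional[str]:
    """Extract e.g. 'running' or 'deallocated' from instance-view status codes."""
    for code in status_codes:
        if code.startswith("PowerState/"):
            return code.split("/", 1)[1]
    return None


def get_power_state(compute, resource_group: str, vm_name: str) -> Optional[str]:
    """Read the VM's instance view and return its power state, if reported."""
    view = compute.virtual_machines.instance_view(resource_group, vm_name)
    return power_state(s.code for s in (view.statuses or []))


def swap_to_standby(compute, resource_group: str, primary: str, standby: str) -> None:
    """Start the standby VM first, then deallocate the primary.

    Deallocation stops compute billing; the VHD in storage is still billed.
    begin_* calls return pollers; .result() waits for each operation to finish.
    """
    compute.virtual_machines.begin_start(resource_group, standby).result()
    compute.virtual_machines.begin_deallocate(resource_group, primary).result()
```

With a real client, created as `ComputeManagementClient(DefaultAzureCredential(), subscription_id)` from the `azure-mgmt-compute` and `azure-identity` packages, `get_power_state(client, "my-rg", "vm-standby")` (hypothetical names) should report `deallocated` for the standby. Starting the standby before deallocating the primary keeps one VM running during the swap whenever the primary is still healthy.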

Benefits:
  • With two VMs, you get an SLA of 99.95% availability from Microsoft
  • The above SLA saves you from an entire datacenter crash
  • On a scheduled AppFabric maintenance day, you can bring your second VM online to keep your service available throughout that day.
  • The fastest possible VM restore in case of a crash: simply bring the other VM online.
  • And you don’t need to worry about taking backups or paying for Azure Backup Service!


How much extra do you need to pay anyway?
  • The VHD of the shut-down VM still takes up space in page blob storage, and you need to pay for that. But page blob storage is so cheap that the cost is negligible.
  • If AppFabric maintenance happens three times in a year, you pay for three extra days of the second VM, which is the cost of keeping your service highly available.
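The extra yearly cost follows the same two components as the bullets above: storage for the standby VHD all year, plus compute only on the days the standby runs. This is a minimal sketch; every price in the example is a hypothetical placeholder, not an Azure list price, so look up the current rates for your region and VM size.

```python
# Rough yearly cost of the "second VM, normally deallocated" approach.
# All prices are caller-supplied; the example numbers below are made up.
HOURS_PER_DAY = 24
MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = 365


def standby_extra_cost(vm_price_per_hour: float,
                       billed_disk_gb: float,
                       storage_price_per_gb_month: float,
                       standby_days_per_year: float = 3) -> float:
    """Extra yearly cost of keeping a deallocated standby VM.

    Storage for the standby VHD is paid every month; compute is paid only
    for the days the standby is started (e.g. maintenance days).
    """
    storage = billed_disk_gb * storage_price_per_gb_month * MONTHS_PER_YEAR
    compute = vm_price_per_hour * HOURS_PER_DAY * standby_days_per_year
    return storage + compute


def always_on_second_vm_cost(vm_price_per_hour: float) -> float:
    """Yearly compute cost if the second VM ran all the time instead."""
    return vm_price_per_hour * HOURS_PER_DAY * DAYS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical inputs: 0.10/hour VM, 30 GB billed, 0.05 per GB-month.
    extra = standby_extra_cost(0.10, 30, 0.05)
    always_on = always_on_second_vm_cost(0.10)
    print(f"Deallocated standby: ~{extra:.2f} per year")
    print(f"Always-on second VM: ~{always_on:.2f} per year (compute only)")
```

With these made-up numbers the standby costs about 25.20 a year (18.00 storage plus 7.20 compute), against about 876.00 a year in compute for keeping the second VM running all the time.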


So that’s it. I believe this is the cheapest way to get high availability plus backup for what would otherwise be a single-VM deployment. If you see issues with this solution, please post your comments and I will update the article.

Some more FAQs on using a single VM:

Q. When we shut down the second VM and bring it up again, will the public IP change?
A. The public IP will not change, because you have two VMs in the same cloud service and at least one of them stays running.

Q. When we shut down (deallocate) the VM, will all the software on it be lost, so that we need to reinstall it?
A. No. The VHD still exists and everything on it is retained. Only the contents of the temporary disk are lost when a VM is deallocated.

Q. In case of a planned outage, will the VM be back up within a maximum of X minutes? What is X?
A. During the planned maintenance window, each virtual machine (VM) that is not in an availability set may be rebooted. The VM will have approximately 15 minutes of total downtime. The temporary disk and Azure storage disks are preserved during the maintenance. Microsoft informs you of the maintenance at least one to two weeks before the outage, and the maintenance will be executed within twelve (12) hours of the announced start time. Note that VMs that belong to an availability set, and Cloud Services web and worker role instances, will not be impacted by this maintenance operation.
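Given the twelve-hour execution window in the answer above, you can work out when to start the standby VM for a maintenance day and how long to keep it running. This is a sketch: the twelve-hour window comes from the support answer rather than from any API, and the one-hour lead time and the example date are hypothetical.

```python
from datetime import datetime, timedelta

# Taken from the support answer: maintenance runs within 12 hours of the announced start.
MAINTENANCE_WINDOW = timedelta(hours=12)


def standby_schedule(announced_start: datetime,
                     lead_time: timedelta = timedelta(hours=1)):
    """Return (start_standby_at, keep_standby_until) for one maintenance event.

    lead_time is a hypothetical margin for starting and checking the standby VM
    before the window opens; keep it running at least until the window closes.
    """
    start_at = announced_start - lead_time
    keep_until = announced_start + MAINTENANCE_WINDOW
    return start_at, keep_until


if __name__ == "__main__":
    start, keep_until = standby_schedule(datetime(2016, 5, 10, 9, 0))
    print("Start standby VM at:", start)
    print("Keep it running until at least:", keep_until)
    print("Standby hours:", (keep_until - start).total_seconds() / 3600)
```

For an announced start of 09:00, this starts the standby at 08:00 and keeps it running until at least 21:00, i.e. 13 hours of extra compute for that maintenance event.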