8 Things to Consider When Comparing Other IaaS Providers to Microsoft Azure

Choosing the right Infrastructure-as-a-Service (IaaS) vendor can be a daunting task. Even though someone else is handling the purchasing, installation, and maintenance of the infrastructure, there are still dozens of considerations to weigh before landing on the right choice. Every scenario is different, and it would be nearly impossible to cover every possible bullet point one should consider, but these 8 items should get you started.

1. Hardware Versions

One of the most important factors of IaaS is the infrastructure itself. Does this vendor offer current hardware? How quickly does the vendor refresh its hardware? How easy is the move from old hardware to new? Every vendor likely has a refresh cadence, but move onto new hardware too early and it may be unstable. Too late, and you may miss out on new features or workload optimizations. Moving to new hardware should be relatively easy for any vendor standardized on a major virtualization platform such as VMware, Hyper-V, etc.

How it looks in Azure: 

Microsoft does a pretty great job of regularly rolling out new models of server hardware every few months. Generally, this presents itself in two ways: newer processor models that are often both faster and cheaper, and more choice in the form of different CPU/RAM/storage combinations. A good example is the new Fv2 series of virtual machines that became generally available in late October. This particular family of VM is powered by the Intel® Xeon® Platinum 8168 (Skylake) processor, which was released by Intel in July of this year. Only 3 months elapsed from the time these processors hit the market until they were generally available in Azure. Changing to a new VM size in Azure is generally as easy as a few clicks in the portal and a reboot. This is not the case 100% of the time, but by and large it applies.
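If you prefer to script the change, a resize can be done programmatically as well. Below is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-compute); the subscription ID, resource group, VM name, and target size are placeholders, and the begin_* method name assumes a recent SDK release.

```python
# Minimal sketch: resize an existing VM to a newer series with the Azure SDK for Python.
# Assumes azure-identity and a recent azure-mgmt-compute are installed; names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"
resource_group = "my-rg"          # placeholder
vm_name = "my-vm"                 # placeholder

compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Fetch the current VM definition, change only the size, and push the update.
vm = compute.virtual_machines.get(resource_group, vm_name)
vm.hardware_profile.vm_size = "Standard_F4s_v2"   # a newer Fv2-series size, for example
compute.virtual_machines.begin_create_or_update(resource_group, vm_name, vm).result()
```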

 

2. Stated Performance Levels

Resources are positioned in all sorts of ways on the market. You may encounter everything from “click here to buy a server” all the way to extremely specific technical detail that reads like a foreign language to the uninitiated. However, these details can prove vital to real-world performance and the resulting experience when working in the environment. Check with the vendors who make the software you plan to run for the system requirements their applications need. Comparing those requirements to the IaaS provider’s published specifications will begin to paint a picture of whether or not your workloads will run properly on their infrastructure.

How it looks in Azure:

Granular data detailing the performance limits and thresholds of VMs and storage in Azure is readily available. There is platform-specific knowledge that also applies when trying to reach certain performance goals. For instance, you may need a level of IOPS that is only achievable by striping multiple virtual disks together within the OS on an Azure VM. Combining the official documentation with a little know-how should quickly get you on your way to right-sizing the environment for your desired workload.
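As a rough illustration of that right-sizing exercise, the sketch below estimates how many disks you would need to stripe to hit an IOPS target and checks the result against a VM-level cap. The numbers are placeholders for illustration, not official Azure limits.

```python
import math

# Hypothetical workload requirement and documented per-disk/per-VM limits (placeholder values).
required_iops = 12000            # what the application vendor says it needs
iops_per_disk = 5000             # documented limit for the chosen disk tier
vm_uncached_iops_limit = 25600   # documented limit for the chosen VM size

disks_needed = math.ceil(required_iops / iops_per_disk)
print(f"Stripe at least {disks_needed} disks to reach {required_iops} IOPS")

# The VM size itself also caps aggregate disk performance, so check that limit too.
if required_iops > vm_uncached_iops_limit:
    print("Warning: the VM size caps IOPS below the requirement; pick a larger size")
```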

 

3. Service Level Agreement (SLA)

A computing infrastructure does no good if it is down. In fact, you should be ascertaining how much downtime your company can withstand, along with how much data can acceptably be lost in a disaster scenario, as part of your disaster recovery and business continuity plans. Figuring out those plans will provide a clearer picture of acceptable SLA levels. An SLA is essentially a guarantee that you will receive a certain level of service. When dealing with computing infrastructures, the most common measure is the amount of time the infrastructure is online, expressed as a percentage. For instance, 99.999% uptime (or “five nines”) would account for only 5.26 minutes of total downtime per year. A 99.95% uptime would jump to 4.38 hours of downtime per year.
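To make those figures concrete, here is a small worked example that converts an uptime percentage into allowable downtime per year; it reproduces the numbers quoted above.

```python
# Worked example: convert an uptime SLA percentage into allowable downtime per year.
HOURS_PER_YEAR = 365.25 * 24   # 8,766 hours

def downtime_per_year(uptime_percent: float) -> float:
    """Return the maximum yearly downtime, in hours, permitted by an uptime SLA."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)

for sla in (99.999, 99.95, 99.9):
    hours = downtime_per_year(sla)
    print(f"{sla}% uptime -> {hours * 60:.2f} minutes ({hours:.2f} hours) per year")
# 99.999% -> ~5.26 minutes, 99.95% -> ~4.38 hours, matching the figures above.
```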

How it looks in Azure:

Every component of Azure that is generally available has an uptime SLA. However, the particular amount of uptime varies by component and the configuration of those components. For instance, the SLA of a VM is higher when using Premium Storage rather than Standard Storage. For a long time, Azure has been the only public cloud with a monetarily backed SLA (and to my knowledge, as of this writing, still is). This means that downtime caused by an outage in Azure itself, beyond what the SLA allows, results in a refund for the excess downtime. As you can imagine, this is excellent motivation for Microsoft to keep everything up and running.

 

4. Capacity

All IaaS vendors must plan for capacity, which can be a daunting task. Essentially, vendors must predict how many resources will need to be on hand to meet demand at any given time. Too few resources and performance may suffer, or downtime can occur. Too many resources, and capital is wasted. Capacity planning is truly the tightrope that all providers must walk. If you are an agile company with non-static needs, or you may experience explosive growth, you need to ensure your vendor can grow alongside you.

How it looks in Azure:

Azure is one of only a handful of hyper-scale public clouds in the world (along with AWS, GCP, etc.). What this means is that capacity is essentially on-demand and limitless from the perspective of your company. In reality, all of these clouds must plan for capacity just like any other vendor. With Azure’s meteoric growth, there have been a couple of incidents where capacity was low in some regions at some points in time, but generally these were remedied quickly. Assuming the capacity is available, resources can be instantiated quickly thanks to the Infrastructure-as-Code model of Azure.

 

5. Power & Networking Redundancy

A computing infrastructure is made up of more than just servers. Power, cooling, networking, storage, and physical location also play a part. To ensure the environment stays online, providers must plan for anomalous power activity (i.e., brownouts and blackouts), as well as failures in networking and cooling. The environment can make or break the reliability and performance of any computing infrastructure.

How it looks in Azure:

Azure makes use of Availability Sets to combat this problem. Availability Sets are logical groupings of similar VMs that instruct the back-end systems in Azure to spread the participating VMs across multiple fault and update domains. A fault domain is a rack with its own power and network connections, so a failure of either does not affect a VM that has been placed on a different rack. An update domain ensures that the underlying hosts running the VMs are not patched at the same time, so the common service those VMs provide stays online. While Availability Sets work well for VMs in a single building within a single region, Microsoft has announced Availability Zones in preview. These will provide power, networking, and patch fault tolerance across multiple data centers within an Azure region, in essence protecting against faults that take down an entire data center rather than a single rack.
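To illustrate the placement concept (not Azure’s actual placement algorithm), the sketch below spreads a hypothetical set of VMs round-robin across a few fault and update domains, which is conceptually what an Availability Set asks the platform to do. The domain counts and VM names are made up for the example.

```python
# Conceptual sketch only: round-robin placement of VMs across fault and update domains,
# roughly what an Availability Set requests from the platform. Counts are illustrative.
FAULT_DOMAINS = 3     # separate racks (independent power and networking)
UPDATE_DOMAINS = 5    # groups of hosts that are patched/rebooted together

vms = [f"web-vm-{i}" for i in range(6)]   # hypothetical VM names

for i, vm in enumerate(vms):
    fd = i % FAULT_DOMAINS
    ud = i % UPDATE_DOMAINS
    print(f"{vm}: fault domain {fd}, update domain {ud}")
# Any single rack failure or host-patching pass takes down only a subset of the VMs,
# so the service they collectively host stays online.
```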

 

6. Storage Redundancy

The entire purpose of computing is to create and process data. As a result, data is without a doubt the most important part of your business. Ensuring that data is available, durable, and recoverable is a key objective of any company doing business in the digital age. The keys here are keeping data durable (multiple copies) while maintaining performance. Most vendors will mirror their data at the hardware level as part of the storage appliance (likely a SAN). Don’t forget about your backup data, which should also be made durable, again without sacrificing performance (which can affect your recovery time objective).

How it looks in Azure:

Microsoft offers multiple tiers of storage in Azure, covering both durability and performance. Speaking of durability, the default tier of storage in Azure is Locally Redundant Storage (LRS). LRS keeps 3 copies of your data in the same Azure region. You can step this up by moving your storage to Geo-Redundant Storage (GRS), which keeps 3 copies in the same Azure region and another 3 copies in a separate Azure region, for a grand total of 6 copies. There is an additional tier, Read-Access GRS (RA-GRS), which makes the data in the alternate region available for immediate read in case you need to fail over to the off-site copy right away.
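A quick way to keep these options straight is to lay them out side by side; the summary below simply encodes the copy counts and regions described above.

```python
# Summary of the redundancy tiers described above: copies kept and where they live.
redundancy_options = {
    "LRS":    {"copies": 3, "regions": 1, "readable_secondary": False},
    "GRS":    {"copies": 6, "regions": 2, "readable_secondary": False},
    "RA-GRS": {"copies": 6, "regions": 2, "readable_secondary": True},
}

for tier, details in redundancy_options.items():
    print(f"{tier}: {details['copies']} copies across {details['regions']} region(s), "
          f"readable secondary: {details['readable_secondary']}")
```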

 

7. Deployment Speed & Methodology

Providers run the gamut in how they offer deployment of new resources into their clouds. This could range anywhere from manual deployment to fully automated deployments, and deployment times could range from seconds to days. Again, consider how often your resource needs will change and how quickly you need those resources. Once that decision is made, you are equipped to home in on your needs in this area.

How it looks in Azure: 

As mentioned previously, Azure follows an Infrastructure-as-Code model, built on Azure Resource Manager. This platform provides automated, template-based deployment that can be used to repeatably create resources in a reliable and consistent manner. Given that essentially everything in Azure is software-defined, entire environments can be deployed without any complicated or time-consuming preparatory configuration.
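As a sketch of that template-driven model, the snippet below submits a minimal ARM template (declaring a single storage account) through the Azure SDK for Python. The subscription, resource group, and storage account name are placeholders, and the method names assume a recent azure-mgmt-resource release.

```python
# Minimal sketch of a template-based deployment with the Azure SDK for Python.
# Names are placeholders; assumes azure-identity and a recent azure-mgmt-resource release.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

subscription_id = "<subscription-id>"
resource_group = "my-rg"   # placeholder, assumed to exist already

# A tiny ARM template that declares one storage account.
template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [{
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2021-09-01",
        "name": "mystorageacct12345",   # placeholder, must be globally unique
        "location": "eastus",
        "sku": {"name": "Standard_LRS"},
        "kind": "StorageV2",
    }],
}

client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)
deployment = Deployment(
    properties=DeploymentProperties(mode="Incremental", template=template)
)
client.deployments.begin_create_or_update(resource_group, "demo-deployment", deployment).result()
```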

 

8. Network Performance

The network is the backbone of any infrastructure. As global-scale applications have become the norm, so has the need for hundreds or thousands of servers working together seamlessly. Network bottlenecks can rob an N-tier application of all its performance, make a web app unusable, or even cause data loss. While most small to medium offices will use 1-gigabit networks on premises, the standard these days for an infrastructure provider is 10 gigabit or better.

How it looks in Azure:

Virtual Networks in Azure generally operate at 10 Gbps, but often much faster. This depends on a few factors. Each VM family and size has a maximum network throughput limit, as well as a limit on the number of Network Interface Cards (NICs) that can be attached to the VM. Teaming NICs can achieve higher throughput, assuming the VM size still has headroom within its maximum throughput limit. Azure also supports InfiniBand networking and Accelerated Networking to get even more performance out of your cross-VM communication.
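As with storage IOPS, it pays to sanity-check expected traffic against the documented per-VM network cap before settling on a size; the figures below are placeholders rather than official limits.

```python
# Back-of-the-envelope check of expected traffic against a VM size's documented network cap.
vm_max_throughput_mbps = 12_500   # documented cap for the chosen VM size (placeholder)
expected_traffic_mbps = 9_000     # estimated peak app + replication traffic (placeholder)
# Note: adding NICs helps with connectivity and segmentation, but aggregate throughput
# is still bounded by the per-VM cap for the chosen size.

headroom = vm_max_throughput_mbps - expected_traffic_mbps
print(f"Headroom at peak: {headroom} Mbps "
      f"({headroom / vm_max_throughput_mbps:.0%} of the per-VM limit)")
if headroom < 0:
    print("Expected traffic exceeds the VM cap; pick a larger size or rework the tiers")
```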

 

Conclusion

This is not an exhaustive list of everything one should consider when comparing IaaS providers to Azure, but it does give you an idea of where to start and hopefully gets you thinking about how to best support your business’s infrastructure needs in the future.