Details of the cluster overcommit algorithm in System Center 2012 R2 Virtual Machine Manager

Applies to: Microsoft System Center 2012 R2 Virtual Machine Manager

This article describes four approaches for the cluster overcommit algorithm in Microsoft System Center 2012 R2 Virtual Machine Manager (VMM 2012 R2). You can find an example here to help you understand the algorithm.

Overview of the approaches in the algorithm


The goal of the VMM 2012 R2 cluster overcommit check is to determine whether any virtual machines would fail to restart if R nodes (the cluster reserve) failed concurrently. The cluster is assumed to be overcommitted until proven otherwise. Four approaches are tried. If any approach shows that the cluster is not overcommitted, the cluster state is set to OK. Otherwise, the cluster state is set to Overcommitted.

The four approaches can be visualized in a table as follows:
| Check method | Proof method | Slot method |
|---|---|---|
| Simple check | Proof-simple | Slot-simple |
| Full complexity check | Proof-full | Slot-full |

Note: The full complexity check is only a marginal refinement over the simple check, and the simple proof check (proof-simple) offers very similar results.

Calculations and algorithms


Value definitions and precalculations in the algorithm

Important: The AdditionalMemory and AvailableSlots values cannot be calculated from the values of a single host alone. They also depend on the value of LargestClusterVM or SlotSize, which must be taken from the failing (source) hosts. In a simple check, this value equals the largest HA virtual machine in the cluster. In a full complexity check, it equals the largest HA virtual machine in the current set of failing hosts. Because some hosts fail and other hosts receive their workload, the available space on each receiving host must be calculated by using the LargestClusterVM or SlotSize of the failing hosts, not of the receiving host. Otherwise, the calculations of available space are incorrect.

Cluster values

The following table shows the definitions of the cluster values:
| Value name | Definition |
|---|---|
| N | The total number of hosts in the cluster |
| R | The cluster reserve value (that is, the maximum number of concurrent failures to model) |
| H | The remaining healthy hosts to be used as targets for failover (H = N - R) |
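As a small illustration of how these cluster values relate, the following Python sketch computes H and the number of failing-host sets that the full complexity checks would iterate over. It assumes that "each set of R failing hosts" means every combination of R hosts out of N, that is, C(N, R) sets. The helper name `cluster_values` is hypothetical.

```python
from math import comb


def cluster_values(n: int, r: int) -> tuple[int, int]:
    """Return (H, number of R-host failure sets) for a cluster of N hosts.

    Hypothetical helper: assumes the full complexity checks enumerate every
    combination of R failing hosts, which is C(N, R) sets.
    """
    if not 0 <= r < n:
        raise ValueError("expected 0 <= R < N")
    h = n - r  # remaining healthy hosts that act as failover targets
    return h, comb(n, r)


# A 16-node cluster with a cluster reserve of 2 nodes.
print(cluster_values(16, 2))  # (14, 120)
```

Under this assumption, the number of sets grows quickly with N and R, whereas the simple checks make a single pass over the hosts.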

Host values

The following values are precalculated for each host, with memory values in megabytes (MB). Values that depend on LargestClusterVM or SlotSize are recalculated in each iteration of the full complexity checks.
| Value name | Definition |
|---|---|
| AvailableMemory | The total usable host memory for failed-over virtual machines to use. AvailableMemory = Host Total Memory - Existing VMs - Host Reserve, where Existing VMs is the memory used by the virtual machines already on the host |
| AdditionalMemory | The fill line after which the host will no longer be able to start the largest virtual machine that is failing over. AdditionalMemory = Max(AvailableMemory - LargestClusterVM, 0) |
| HAVMs | The total memory of the high availability (HA) virtual machines on this host |
| AvailableSlots | The number of failing-over virtual machines that this host is guaranteed to be able to start. AvailableSlots = AvailableMemory / SlotSize, rounded down |
| UsedSlots | The number of HA virtual machines on this host |
| TotalSlots | The total number of slots on this host. TotalSlots = AvailableSlots + UsedSlots |
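The per-host formulas in this table can be written out as a minimal Python sketch. It assumes that all memory values are integers in MB and that the caller supplies LargestClusterVM and SlotSize taken from the failing hosts. The `HostValues` type and the `host_values` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Sequence


class HostValues(NamedTuple):
    available_memory: int
    additional_memory: int
    havms: int
    available_slots: int
    used_slots: int
    total_slots: int


def host_values(
    total_memory: int,
    existing_vms: int,
    host_reserve: int,
    ha_vms: Sequence[int],
    largest_cluster_vm: int,
    slot_size: int,
) -> HostValues:
    """Compute the precalculated values for one host (all memory in MB).

    existing_vms is the memory used by all VMs already on the host;
    ha_vms lists the memory of each HA VM on the host.
    """
    available_memory = total_memory - existing_vms - host_reserve
    # Fill line: above this, the largest failing-over VM no longer fits.
    additional_memory = max(available_memory - largest_cluster_vm, 0)
    havms = sum(ha_vms)  # total memory of the HA VMs on this host
    available_slots = available_memory // slot_size  # rounded down
    used_slots = len(ha_vms)  # one slot per HA VM
    return HostValues(
        available_memory,
        additional_memory,
        havms,
        available_slots,
        used_slots,
        available_slots + used_slots,
    )


# 64 GB host, 20 GB of existing VMs (16 GB of them HA), 4 GB host reserve.
print(host_values(65536, 20480, 4096, [8192, 4096, 4096], 8192, 8192))
```

For this host, AvailableMemory is 40960 MB, AdditionalMemory is 32768 MB, and the host offers 5 available slots plus 3 used slots.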

Notes
  • A 64 megabyte (MB) buffer is added to each virtual machine's memory to account for hypervisor overhead.
  • Stopped, saved-state, paused, and running virtual machines (VMs) are all counted, because a tenant user might start a stopped virtual machine, and that must be accounted for when the algorithm calculates overcommits.
  • If dynamic memory virtual machines are present in the cluster, their current memory demand is used.
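The accounting rules in these notes can be sketched in Python with a hypothetical `VM` record. The state labels, the field names, and the assumption that a virtual machine without dynamic memory is counted at its assigned memory are illustrative choices, not taken from VMM.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

HYPERVISOR_BUFFER_MB = 64  # per-VM allowance for hypervisor overhead


@dataclass
class VM:
    state: str  # e.g. "Running", "Stopped", "Saved", "Paused"
    assigned_memory_mb: int
    dynamic_memory: bool = False
    memory_demand_mb: Optional[int] = None  # current demand, dynamic memory only


def accounted_memory_mb(vm: VM) -> int:
    """Memory charged to one VM by the overcommit calculation (sketch)."""
    if vm.dynamic_memory and vm.memory_demand_mb is not None:
        base = vm.memory_demand_mb  # dynamic memory: use current demand
    else:
        base = vm.assigned_memory_mb  # assumption: static VMs use assigned memory
    return base + HYPERVISOR_BUFFER_MB


def total_vm_memory_mb(vms: Iterable[VM]) -> int:
    # The state is deliberately not checked: stopped, saved, paused, and
    # running VMs are all counted, because a stopped VM may be started later.
    return sum(accounted_memory_mb(vm) for vm in vms)


vms = [
    VM("Running", 4096),
    VM("Stopped", 2048),
    VM("Running", 1024, dynamic_memory=True, memory_demand_mb=3000),
]
print(total_vm_memory_mb(vms))  # 9336
```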

Algorithms in the four approaches

Slot-simple

The slot-simple approach works as follows:
  • SlotSize = largest HA virtual machine in the cluster.
  • Calculate the AvailableSlots, UsedSlots, and TotalSlots values for each host.
  • TotalSlotsRemaining = sum of the smallest H values of TotalSlots.
  • If Sum(UsedSlots) <= TotalSlotsRemaining, the cluster is not overcommitted.
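The slot-simple steps can be expressed as a minimal Python sketch. It assumes a hypothetical `Host` record that holds each host's precalculated AvailableMemory and the memory (in MB) of each of its HA virtual machines. Treating a cluster with no HA virtual machines as not overcommitted is also an assumption, because the steps do not cover that case.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    available_memory: int  # AvailableMemory in MB (precalculated)
    ha_vms: list[int] = field(default_factory=list)  # memory of each HA VM, MB


def slot_simple_ok(hosts: list[Host], r: int) -> bool:
    """Return True if slot-simple proves the cluster is not overcommitted."""
    h = len(hosts) - r
    all_ha_vms = [vm for host in hosts for vm in host.ha_vms]
    if not all_ha_vms:
        return True  # assumption: nothing needs to fail over
    slot_size = max(all_ha_vms)  # largest HA VM in the cluster
    total_slots = [
        host.available_memory // slot_size + len(host.ha_vms)  # Available + Used
        for host in hosts
    ]
    # Worst case: the H remaining hosts are the ones with the fewest slots.
    total_slots_remaining = sum(sorted(total_slots)[:h])
    used_slots = sum(len(host.ha_vms) for host in hosts)
    return used_slots <= total_slots_remaining


cluster = [Host(8192, [4096, 2048]), Host(4096, [4096]), Host(8192, [1024])]
print(slot_simple_ok(cluster, r=1))  # True
```

Here SlotSize is 4096 MB, the hosts have 4, 2, and 3 total slots, and the two smallest values (2 + 3 = 5) cover the 4 used slots.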

Slot-full

This approach is iterated over each set of R failing hosts. The slot-full approach works as follows:
  • SlotSize = largest HA virtual machine on the R failing hosts.
  • Calculate the AvailableSlots, UsedSlots, and TotalSlots values for each host.
  • TotalSlotsRemaining = sum of TotalSlots on all non-failing hosts.
  • Compare the Sum(UsedSlots) and TotalSlotsRemaining values:
    • If Sum(UsedSlots) > TotalSlotsRemaining, the cluster may be overcommitted.
    • If Sum(UsedSlots) <= TotalSlotsRemaining for every set of failing hosts, the cluster is not overcommitted.
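The slot-full iteration can be sketched in Python with a hypothetical `Host` record (precalculated AvailableMemory plus the memory of each HA virtual machine, in MB). The sketch enumerates each set of R failing hosts with `itertools.combinations` and stops at the first set that may be overcommitted. Skipping a failing set that holds no HA virtual machines is an assumption.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Host:
    available_memory: int  # AvailableMemory in MB (precalculated)
    ha_vms: list[int] = field(default_factory=list)  # memory of each HA VM, MB


def slot_full_ok(hosts: list[Host], r: int) -> bool:
    """Return True if slot-full proves the cluster is not overcommitted."""
    used_slots = sum(len(host.ha_vms) for host in hosts)  # Sum(UsedSlots)
    for failing in combinations(range(len(hosts)), r):
        failing_vms = [vm for i in failing for vm in hosts[i].ha_vms]
        if not failing_vms:
            continue  # assumption: this set orphans no HA VMs
        slot_size = max(failing_vms)  # largest HA VM on the failing hosts
        total_slots_remaining = sum(
            host.available_memory // slot_size + len(host.ha_vms)
            for i, host in enumerate(hosts)
            if i not in failing
        )
        if used_slots > total_slots_remaining:
            return False  # may be overcommitted: stop immediately
    return True


cluster = [Host(8192, [4096, 2048]), Host(4096, [4096]), Host(8192, [1024])]
print(slot_full_ok(cluster, r=1))  # True
```

Because SlotSize is recalculated per set, a failing host that holds only small virtual machines yields more slots on the receiving hosts than the cluster-wide SlotSize of slot-simple would.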

Proof-simple

The proof-simple approach works as follows:
  • LargestClusterVM = largest HA virtual machine in the cluster.
  • Calculate AdditionalMemory and HAVMs for all hosts.
  • TotalAdditionalSpace = sum of the smallest H values of AdditionalMemory.
  • TotalOrphanedVMs = (sum of the largest R values of HAVMs) - LargestClusterVM.
  • Compare the values:
    • If TotalOrphanedVMs <= TotalAdditionalSpace, the cluster is not overcommitted, unless the following special case applies.
    • If TotalOrphanedVMs = 0, LargestClusterVM > 0, and TotalAdditionalSpace = 0, the cluster may be overcommitted. This case takes precedence over the previous rule, which 0 <= 0 would otherwise satisfy.
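The proof-simple steps can be sketched in Python with a hypothetical `Host` record (precalculated AvailableMemory plus the memory of each HA virtual machine, in MB). The special case is checked first because, when both totals are 0, the comparison TotalOrphanedVMs <= TotalAdditionalSpace would otherwise pass trivially. Treating a cluster without HA virtual machines as not overcommitted is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    available_memory: int  # AvailableMemory in MB (precalculated)
    ha_vms: list[int] = field(default_factory=list)  # memory of each HA VM, MB


def proof_simple_ok(hosts: list[Host], r: int) -> bool:
    """Return True if proof-simple proves the cluster is not overcommitted."""
    h = len(hosts) - r
    all_ha_vms = [vm for host in hosts for vm in host.ha_vms]
    if not all_ha_vms:
        return True  # assumption: nothing needs to fail over
    largest = max(all_ha_vms)  # LargestClusterVM
    additional = [max(host.available_memory - largest, 0) for host in hosts]
    havms = [sum(host.ha_vms) for host in hosts]
    total_additional_space = sum(sorted(additional)[:h])  # smallest H values
    total_orphaned = sum(sorted(havms, reverse=True)[:r]) - largest  # largest R
    if total_orphaned == 0 and largest > 0 and total_additional_space == 0:
        return False  # special case: may be overcommitted
    return total_orphaned <= total_additional_space


cluster = [Host(8192, [4096, 2048]), Host(4096, [4096]), Host(8192, [1024])]
print(proof_simple_ok(cluster, r=1))  # True
```

In this example, AdditionalMemory is 4096, 0, and 4096 MB, so TotalAdditionalSpace is 4096 MB, while TotalOrphanedVMs is 6144 - 4096 = 2048 MB.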

Proof-full

This approach is iterated over each set of R failing hosts. The proof-full approach works as follows:
  • LargestClusterVM = largest HA virtual machine on the R failing hosts.
  • Calculate AdditionalMemory and HAVMs for all hosts.
  • TotalAdditionalSpace = sum of AdditionalMemory on the non-failing hosts.
  • TotalOrphanedVMs = (sum of HAVMs on the R failing hosts) - LargestClusterVM.
  • Compare the values:
    • If TotalOrphanedVMs > TotalAdditionalSpace, the cluster may be overcommitted.
    • If TotalOrphanedVMs = 0, LargestClusterVM > 0, and TotalAdditionalSpace = 0, the cluster may be overcommitted.
    • If neither of these conditions applies to any set of failing hosts, the cluster is not overcommitted.
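A matching Python sketch of proof-full follows, using a hypothetical `Host` record (precalculated AvailableMemory plus the memory of each HA virtual machine, in MB). It stops at the first set of R failing hosts that triggers either "may be overcommitted" rule and treats every other set as passing. Skipping a failing set that holds no HA virtual machines is an assumption.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Host:
    available_memory: int  # AvailableMemory in MB (precalculated)
    ha_vms: list[int] = field(default_factory=list)  # memory of each HA VM, MB


def proof_full_ok(hosts: list[Host], r: int) -> bool:
    """Return True if proof-full proves the cluster is not overcommitted."""
    for failing in combinations(range(len(hosts)), r):
        failing_vms = [vm for i in failing for vm in hosts[i].ha_vms]
        if not failing_vms:
            continue  # assumption: this set orphans no HA VMs
        largest = max(failing_vms)  # LargestClusterVM for this set
        total_additional_space = sum(
            max(host.available_memory - largest, 0)  # AdditionalMemory
            for i, host in enumerate(hosts)
            if i not in failing
        )
        # Sum of HAVMs on the failing hosts, minus LargestClusterVM.
        total_orphaned = sum(failing_vms) - largest
        if total_orphaned > total_additional_space:
            return False  # may be overcommitted: stop immediately
        if total_orphaned == 0 and largest > 0 and total_additional_space == 0:
            return False  # special case: may be overcommitted
    return True


cluster = [Host(8192, [4096, 2048]), Host(4096, [4096]), Host(8192, [1024])]
print(proof_full_ok(cluster, r=1))  # True
```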

Combining the approaches and example


None of the methods can show that the cluster is overcommitted; they can show only that the cluster is not overcommitted. If none of the methods that are used can show that the cluster is not overcommitted, the cluster is flagged as overcommitted. If even a single method shows that the cluster is not overcommitted, the cluster is flagged as OK, and the calculation stops immediately.

However, in the full complexity analysis, if even a single set of R failing hosts shows that the cluster may be overcommitted, that method stops immediately and does not flag the cluster as OK.
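The combination logic described above amounts to a short-circuiting "any" over the approaches. In this Python sketch, each approach is passed as a zero-argument callable that returns True only when it proves the cluster is not overcommitted. The function name `cluster_state` and the order in which approaches are tried are assumptions.

```python
from typing import Callable, Iterable


def cluster_state(approaches: Iterable[Callable[[], bool]]) -> str:
    """Return "OK" as soon as one approach proves the cluster is not
    overcommitted; otherwise return "Overcommitted"."""
    for proves_not_overcommitted in approaches:
        if proves_not_overcommitted():
            return "OK"  # one proof is enough: stop immediately
    return "Overcommitted"  # assumed overcommitted until proven otherwise


print(cluster_state([lambda: False, lambda: True]))  # OK
print(cluster_state([lambda: False] * 4))  # Overcommitted
```

Because Python's built-in `any` also short-circuits, `"OK" if any(a() for a in approaches) else "Overcommitted"` is an equivalent one-line form.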