This hotfix includes several significant improvements to fix virtual machines that enter a resynchronization (resync) or paused state during replication, or that experience time-outs during initial replication or delta replication.
Issues that are fixed
- Issue 1
A virtual machine goes into resynchronization because of high churn on one of its disks. Previously, the virtual machine went into resynchronization if its accumulated logs exceeded 50 percent of the size of a replicating virtual hard disk that's attached to the virtual machine. This was calculated based on the size of the smallest disk.
With this fix, the 50 percent calculation is based on the total size of all the replicating virtual hard disks that are attached to the virtual machine, not on the size of only one of them.
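The difference between the two calculations can be illustrated with a minimal sketch. This is not the Hyper-V implementation: the function names and the example sizes are hypothetical, and the sketch assumes that "exceeds 50 percent" is a strict comparison and that the previous logic used the smallest attached disk.

```python
GIB = 1024 ** 3  # bytes per GiB, used only to keep the example readable


def needs_resync_before_fix(log_bytes, disk_sizes):
    """Previous logic (as described): compare logs to 50% of the smallest disk."""
    return log_bytes * 2 > min(disk_sizes)


def needs_resync_after_fix(log_bytes, disk_sizes):
    """Fixed logic (as described): compare logs to 50% of the total of all disks."""
    return log_bytes * 2 > sum(disk_sizes)


# Hypothetical virtual machine: a small 40 GiB disk and a large 500 GiB disk,
# with 30 GiB of accumulated replication logs.
disks = [40 * GIB, 500 * GIB]
logs = 30 * GIB

print(needs_resync_before_fix(logs, disks))  # 30 GiB is more than half of 40 GiB
print(needs_resync_after_fix(logs, disks))   # 30 GiB is less than half of 540 GiB
```

In this hypothetical case, the same amount of churn triggers resynchronization under the previous calculation but not under the new one.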
- Issue 2
When the system performs resynchronization and there's a tracking error, the state reverts to "resync required." Despite this error, the system previously continued trying to complete the resynchronization, which then failed. This caused cyclic resynchronization.
With this fix, if the tracking for the virtual machine indicates an error, the system aborts the current resynchronization and reverts to "resync required." This saves time and bandwidth.
- Issue 3
During replication, there is a "free storage space" threshold of 300 MB, at which point the virtual machine goes into resynchronization. Because this value is so low, the production virtual machine could be paused by Hyper-V.
With this fix, the threshold value at which the virtual machine goes into resynchronization is increased to 3 GB.
- Issue 4
During resynchronization, the free storage space is not monitored. This may cause the production virtual machine to pause.
With this fix, the threshold value at which the virtual machine will stop resynchronization is set to 3 GB.
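The free-space behavior described in Issues 3 and 4 can be sketched as a simple check. This is an illustration only: the function name and return values are hypothetical, the sketch assumes binary units (1 MB = 2^20 bytes) and a "less than" comparison, and it does not call any Hyper-V API.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

THRESHOLD_BEFORE_FIX = 300 * MIB  # previous free-space threshold
THRESHOLD_AFTER_FIX = 3 * GIB     # threshold after the hotfix


def free_space_action(free_bytes, resync_in_progress, threshold=THRESHOLD_AFTER_FIX):
    """Return a hypothetical action label for the current free storage space."""
    if free_bytes >= threshold:
        return "continue"
    # Below the threshold: a running resynchronization stops (Issue 4);
    # otherwise, normal replication goes into resynchronization (Issue 3).
    return "stop_resync" if resync_in_progress else "enter_resync"


# With 1 GiB free, the previous threshold would not react yet,
# whereas the new threshold does.
print(free_space_action(1 * GIB, False, THRESHOLD_BEFORE_FIX))
print(free_space_action(1 * GIB, False))
print(free_space_action(1 * GIB, True))
```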
- Issue 5
During initial replication, if replication does not finish within five days, it is stopped with a time-out error. The time-out value of five days is too low for deployments in which the initial disk size is quite large, the bandwidth is low, or both.
With this fix, the time-out for initial replication is increased to 30 days. At the end of this period, replication is paused, and the user must resume replication.
- Issue 6
During the state of delta replication that occurs after initial replication, if the delta replication does not finish within six hours for a particular cycle, replication goes into a resynchronization-required state. The value of six hours is too low for deployments in which there's a lot of churn in a particular cycle, the bandwidth is low, or both. This is also true for the delta replication immediately after the initial replication.
With this fix, the time-out for a delta replication cycle is increased to 15 days.
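The time-out changes in Issues 5 and 6 can be summarized in a short sketch. The dictionary keys and the function name are hypothetical, and the values are taken only from the descriptions above.

```python
from datetime import timedelta

# Time-out values before and after the hotfix, as described in Issues 5 and 6.
TIMEOUTS_BEFORE_FIX = {
    "initial_replication": timedelta(days=5),
    "delta_replication": timedelta(hours=6),
}
TIMEOUTS_AFTER_FIX = {
    "initial_replication": timedelta(days=30),
    "delta_replication": timedelta(days=15),
}


def has_timed_out(phase, elapsed, timeouts=TIMEOUTS_AFTER_FIX):
    """Return True if the elapsed time exceeds the time-out for the given phase."""
    return elapsed > timeouts[phase]


# Hypothetical case: an initial replication that has been running for 10 days.
elapsed = timedelta(days=10)
print(has_timed_out("initial_replication", elapsed, TIMEOUTS_BEFORE_FIX))
print(has_timed_out("initial_replication", elapsed))
```

A slow initial replication like this one would have been stopped under the previous five-day limit but keeps running under the 30-day limit.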
Article ID: 3184854 - Last Review: Sep 20, 2016 - Revision: 1