This article describes a hotfix for Azure Site Recovery for Hyper-V deployments.
This hotfix includes several significant improvements that address virtual machines going into a resynchronization (resync) or paused state during replication, and time-outs during initial replication or delta replication.
Issues that are fixed
A virtual machine goes into resynchronization because of high churn on one of the disks. Previously, the virtual machine went into resynchronization if the accumulated logs for the virtual machine exceeded 50 percent of the size of a replicating virtual hard disk attached to the virtual machine. This was calculated based on the size of the smallest disk.
With this fix, the 50 percent threshold is calculated from the total size of all the replicating virtual hard disks attached to the virtual machine, and not from one of its virtual hard disks.
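The difference between the two calculations can be illustrated with a minimal Python sketch. The function name, the example disk sizes, and the use of integer division are assumptions made for illustration. They are not taken from the Hyper-V implementation.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def resync_threshold_bytes(disk_sizes, use_total=True):
    """Return the accumulated-log size beyond which resync would be triggered.

    use_total=False models the previous logic (50 percent of the smallest disk).
    use_total=True models the fixed logic (50 percent of all replicating disks).
    """
    if not disk_sizes:
        raise ValueError("at least one replicating disk is required")
    basis = sum(disk_sizes) if use_total else min(disk_sizes)
    return basis // 2


# Hypothetical virtual machine with one small and one large replicating disk.
disks = [20 * GB, 500 * GB]
old_limit = resync_threshold_bytes(disks, use_total=False)
new_limit = resync_threshold_bytes(disks, use_total=True)
print(old_limit // GB, new_limit // GB)
```

In this hypothetical case, high churn on the large disk could exceed the old limit long before it approaches the new one.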
When the system performs resynchronization and the tracking encounters an error, the state goes back to "resync required." Despite this error, the system previously continued trying to complete the resynchronization, which would then fail. This caused cyclic resynchronization.
With this fix, if the tracking for the virtual machine indicates an error, the system aborts the current resynchronization and goes back into "resync required." This saves both time and bandwidth.
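The effect of aborting early can be sketched with a small simulation. This is a hypothetical model, not the actual replication engine: the notion of "chunks," the function name, and the "replicating" state label are assumptions introduced only to show why an early abort saves bandwidth.

```python
def run_resync(total_chunks, tracking_error_at=None, abort_on_error=True):
    """Simulate one resynchronization pass (hypothetical model).

    Returns a tuple of (final_state, chunks_transferred).
    """
    transferred = 0
    error_seen = False
    for index in range(total_chunks):
        if tracking_error_at is not None and index >= tracking_error_at:
            error_seen = True
            if abort_on_error:
                # Behavior with the fix: abort as soon as tracking reports an error.
                return "resync required", transferred
        # Behavior before the fix: keep transferring even after the error.
        transferred += 1
    if error_seen:
        # The completed pass is discarded and resync has to start again.
        return "resync required", transferred
    return "replicating", transferred


before = run_resync(10, tracking_error_at=2, abort_on_error=False)
after = run_resync(10, tracking_error_at=2, abort_on_error=True)
print(before, after)
```

Both runs end in "resync required," but the run that models the fix transfers far fewer chunks before it gives up.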
During replication, there is a threshold value for free storage space. When free storage space drops to this threshold, which was set at 300 MB, the virtual machine goes into resynchronization. The low value of 300 MB could cause the production virtual machine to be paused by Hyper-V.
With this fix, the threshold value at which the virtual machine goes into resynchronization is increased to 3 GB.
During resynchronization, free storage space was not monitored. This could lead to the production virtual machine being paused.
With this fix, free storage space is monitored during resynchronization, and the threshold value at which the virtual machine stops resynchronization is 3 GB.
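The two free-space checks described above can be summarized in a short sketch. The state names, the returned action strings, and the choice of a strict "less than" comparison are assumptions for illustration. Only the 300 MB and 3 GB values come from this article.

```python
MB = 1024 ** 2  # assumption: binary units
GB = 1024 ** 3

OLD_THRESHOLD = 300 * MB  # value before the fix
NEW_THRESHOLD = 3 * GB    # value with the fix


def action_for_free_space(free_bytes, state, threshold=NEW_THRESHOLD):
    """Decide what to do for a given amount of free storage space.

    state is "replicating" or "resyncing" (hypothetical labels).
    """
    if state not in ("replicating", "resyncing"):
        raise ValueError(f"unknown state: {state}")
    if free_bytes >= threshold:
        return "continue"
    # Below the threshold: replication falls back to resync,
    # and a running resync is stopped.
    return "enter_resync" if state == "replicating" else "stop_resync"


print(action_for_free_space(2 * GB, "replicating"))
print(action_for_free_space(2 * GB, "replicating", threshold=OLD_THRESHOLD))
```

With 2 GB free, the sketch reacts under the new threshold but would have kept replicating under the old one, which leaves less headroom before Hyper-V pauses the virtual machine.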
During initial replication, if the replication does not finish within 5 days, it is stopped with a time-out error. The time-out value of 5 days is too small for deployments in which the initial disk size is large, the bandwidth is low, or both.
With this fix, the time-out for initial replication is increased to 30 days. If initial replication has not finished by the end of that period, the replication is paused and the user has to resume it.
During delta replication, which occurs after initial replication, if a particular cycle does not finish within 6 hours, the replication goes into a "resynchronization required" state. The value of 6 hours is too small for deployments in which the churn in a particular cycle is high, the bandwidth is low, or both. This is also true for the delta replication right after the initial replication.
With this fix, the time-out for a delta replication cycle is increased to 15 days.
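To see why the earlier time-outs were too small for some deployments, the following sketch estimates a transfer time and compares it with the old and new limits. The disk size and bandwidth are hypothetical, and the estimate is a simple size-divided-by-bandwidth calculation that ignores compression, protocol overhead, and ongoing churn.

```python
from datetime import timedelta

# Time-out values from this article, before and with the fix.
TIMEOUTS_BEFORE_FIX = {"initial": timedelta(days=5), "delta": timedelta(hours=6)}
TIMEOUTS_AFTER_FIX = {"initial": timedelta(days=30), "delta": timedelta(days=15)}


def transfer_time(size_bytes, bits_per_second):
    """Rough estimate of how long it takes to send size_bytes over the link."""
    return timedelta(seconds=size_bytes * 8 / bits_per_second)


def exceeds_timeout(phase, elapsed, timeouts):
    """Return True if the elapsed time is beyond the limit for the phase."""
    return elapsed > timeouts[phase]


# Hypothetical deployment: 2 TB of initial data over a 20 Mbps link.
estimate = transfer_time(2 * 10**12, 20 * 10**6)
print(estimate)
print(exceeds_timeout("initial", estimate, TIMEOUTS_BEFORE_FIX))
print(exceeds_timeout("initial", estimate, TIMEOUTS_AFTER_FIX))
```

Under these assumptions, the initial replication needs a little over 9 days, which exceeds the previous 5-day limit but fits within the 30-day limit.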
These issues are fixed in the following update for Windows Server 2012 R2:
3172614 July 2016 update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.