Summary

When you use the drain-roles feature in the Azure Stack HCI, version 21H2 or 22H2 operating system, a node drain might fail on large clusters (such as clusters that have eight or more nodes) because of a time-out that occurs when storage is put into maintenance mode. This issue is most likely to occur when you update or upgrade the Azure Stack HCI operating system.

More information

To resolve the drain failure time-out, follow these steps: 

  1. Before you enable maintenance mode or start any operation that involves maintenance mode, such as a node drain or Cluster-Aware Updating, increase the health service scanning interval for physical disks. To do this, change the health setting by running the following command:

    Get-StorageSubSystem Cluster* | Set-StorageHealthSetting -Name System.Storage.PhysicalDisk.CheckPeriodMs -Value 10800000

    Note: This example increases the value from 15 minutes to three hours (10,800,000 milliseconds). Adjust the value so that it is longer than the expected duration of the workflow that involves maintenance mode.

  2. Wait for any ongoing scans to finish. The exact duration depends on the environment. On a 16-node cluster, it might take 40 to 60 minutes. To verify that all existing scans have finished, check the health service log on the owner node of the "SDDC Group" cluster group, and search for the following pattern:

    'Maintenance Mode Event Interpreter' is interpreting Event Type - Origin 'Storage', EntityType 'SPACES_PhysicalDisk'.

    Note: If the log contains no such entry within the last minute, all scans have finished. To retrieve the health log, run the following command:

    Get-ClusterLog -Destination . -TimeSpan 5 -UseLocalTime -Health

  3. Run the maintenance mode operation or other workflow that involves maintenance mode.

  4. Revert the health setting to its original value. This is important because a long scanning interval could delay certain health service functionality, such as the reporting of physical disk-related errors or disk retirement. To revert the health setting, run the following command:

    Get-StorageSubSystem Cluster* | Remove-StorageHealthSetting -Name System.Storage.PhysicalDisk.CheckPeriodMs
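
The following is a minimal sketch of how steps 1, 2, and 4 might be scripted. It is not part of the official procedure. It assumes an elevated PowerShell session on a cluster node, a three-hour workflow window, and a scratch folder for the collected logs (the `$logDir` path is hypothetical). It does not parse log timestamps. It only prints the most recent matching entries so that you can compare their times with the current time.

```powershell
# Sketch only: adjust the values for your environment before you run it.
$settingName = 'System.Storage.PhysicalDisk.CheckPeriodMs'

# Assumption: the maintenance workflow finishes in less than three hours.
$checkPeriodMs = [int]([TimeSpan]::FromHours(3).TotalMilliseconds)   # 10800000

# Step 1: show any existing override, then lengthen the scanning interval.
Get-StorageSubSystem Cluster* | Get-StorageHealthSetting -Name $settingName
Get-StorageSubSystem Cluster* | Set-StorageHealthSetting -Name $settingName -Value $checkPeriodMs

# Step 2: collect the last five minutes of the health log from the
# owner node of the "SDDC Group" cluster group.
$logDir = Join-Path $env:TEMP 'HealthLog'   # hypothetical scratch folder
New-Item -ItemType Directory -Path $logDir -Force | Out-Null
$ownerNode = (Get-ClusterGroup -Name 'SDDC Group').OwnerNode.Name
Get-ClusterLog -Node $ownerNode -Destination $logDir -TimeSpan 5 -UseLocalTime -Health | Out-Null

# Search the collected log for the scan pattern as a literal string.
$pattern = "'Maintenance Mode Event Interpreter' is interpreting Event Type - Origin 'Storage', EntityType 'SPACES_PhysicalDisk'"
$hits = Get-ChildItem -Path $logDir -File | Select-String -SimpleMatch -Pattern $pattern

if ($hits) {
    # Compare these timestamps with the current time. Scans are finished
    # only if no entry falls within the last minute.
    $hits | Select-Object -Last 5 | ForEach-Object { $_.Line }
} else {
    Write-Output 'No matching entries in the last five minutes.'
}

# Step 4 (run only after the maintenance workflow has finished):
# Get-StorageSubSystem Cluster* | Remove-StorageHealthSetting -Name $settingName
```

The revert command is left commented out so that the sketch cannot remove the override before the maintenance workflow in step 3 has finished.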

References

Failover cluster maintenance procedures

Learn about the standard terminology that is used to describe Microsoft software updates.
