Applicability:
These steps are designed to be run on servers which were previously connected to Azure Arc-Enabled Servers, where connectivity was lost due to incident JLJ3-B88.
If you did not receive a Service Health notification regarding incident JLJ3-B88, these instructions are not relevant. If you have an issue with Azure Arc-Enabled Servers connectivity, please contact support.
To validate locally whether a server is affected, run 'azcmagent show':
❯ azcmagent show
Resource Name                        : test
Resource Group Name                  : test
Resource Namespace                   :
Subscription ID                      : 11111111-1111-1111-1111-111111111111
Tenant ID                            : 11111111-1111-1111-1111-111111111111
VM ID                                : 11111111-1111-1111-1111-111111111111
Correlation ID                       : 11111111-1111-1111-1111-111111111111
VM UUID                              : 11111111-1111-1111-1111-111111111111
Location                             : westeurope
Cloud                                : AzureCloud
Agent Version                        : 1.17.01931.201
Agent Logfile                        : C:\ProgramData\AzureConnectedMachineAgent\Log\himds.log
Agent Status                         : Disconnected
Agent Last Heartbeat                 : 2022-05-12T11:27:15-07:00
Agent Error Code                     :
Agent Error Details                  :
Agent Error Timestamp                :
Using HTTPS Proxy                    :
Proxy Bypass List                    :
Cloud Provider                       : N/A
Cloud Metadata
Manufacturer                         : Dell Inc.
Model                                : XPS 8930
MSSQL Server Detected                : false
Dependent Service Status
  GC Service (gcarcservice)          : running
  Extension Service (extensionservice) : running
  Agent Service (himds)              : running
Not affected:
If the Subscription ID, Tenant ID, Resource Name or Resource Group Name is not set, the machine has never been connected and is not affected by this issue. Onboard it using 'azcmagent connect' if desired.
If the Location is not 'westeurope', the server is not affected by this issue.
If the Agent Status is 'Connected', the server is not affected by this issue.
Do not run these steps on servers that match any of these conditions.
Affected:
If the Agent Error Details include a message similar to "AADSTS700016: Application with identifier 'xxx' was not found in the directory 'yyy'", the server is affected by this issue.
If the server is in West Europe and its status is Disconnected, but this message is not shown in the error details, it may still be affected.
Run these steps on servers in either of these two cases.
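The checks above can be sketched as a small shell function for triaging many servers. This is only an illustration, not part of the official tooling: the function name classify_arc_server and the result labels it prints are hypothetical, and it assumes the 'azcmagent show' output uses the "Key : value" layout shown in the sample.

```shell
# Classify a server from the text of 'azcmagent show', read on stdin.
# Prints one of: not-connected, not-affected, affected, possibly-affected, unknown.
# The function name and these labels are illustrative, not defined by the agent.
classify_arc_server() {
    awk '
    {
        sub(/\r$/, "")                      # tolerate Windows line endings
        i = index($0, ":")                  # split on the FIRST colon only
        if (i == 0) next                    # section headers have no colon
        key = substr($0, 1, i - 1)
        val = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim surrounding whitespace
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        field[key] = val
    }
    END {
        # Never onboarded: identity fields are empty.
        if (field["Subscription ID"] == "" || field["Tenant ID"] == "" ||
            field["Resource Name"] == "" || field["Resource Group Name"] == "") {
            print "not-connected"; exit
        }
        if (field["Location"] != "westeurope")    { print "not-affected"; exit }
        if (field["Agent Status"] == "Connected") { print "not-affected"; exit }
        if (index(field["Agent Error Details"], "AADSTS700016") > 0) {
            print "affected"; exit
        }
        if (field["Agent Status"] == "Disconnected") { print "possibly-affected"; exit }
        print "unknown"
    }'
}
```

Typical use would be: azcmagent show | classify_arc_server. Servers reported as "affected" or "possibly-affected" are the ones to repair; "not-connected" servers should be onboarded with 'azcmagent connect' instead.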
Context: Servers affected by issue JLJ3-B88 are unable to communicate with Azure because the server's Managed Identity has been deleted. These steps are designed to create a new managed identity for the same resource, and update the service to use that identity.
Steps: A tool 'azcmrepair' has been developed to simplify the repair steps. This behaves very similarly to the azcmagent.exe distributed as part of the Azure Connected Machine Agent.
Download for Windows: Download azcmrepair.exe from aka.ms/azcmrepairwindows. azcmrepair must be run in an Administrative shell on the local server. It can be run from any directory and can be deleted after use.
Download for Linux:
Download azcmrepair from https://aka.ms/azcmrepairlinux, for example with:
curl -L https://aka.ms/azcmrepairlinux -o ./azcmrepair
Mark the file as executable, if required:
chmod +x ./azcmrepair
Execute it as root using sudo, for example:
sudo ./azcmrepair run
Usage:
For interactive use, you can just use:
> ./azcmrepair run
You will be prompted for credentials to use while authenticating to Azure. You will need the same permissions on the target Azure subscription and resource group that you would need to onboard servers initially.
For at-scale use, the tool can use a service principal in the same way as 'azcmagent connect'. This can be the same service principal used for onboarding, or another one if preferred. The service principal needs at least the "Azure Connected Machine Onboarding" role.
> ./azcmrepair run --service-principal-id xxx --service-principal-secret yyy
No other arguments are required.
Evaluating Success:
The tool writes diagnostic log information to the console and to azcmagent.log.
If the command is successful, you will see "Successfully Onboarded Resource to Azure" in the output and the tool will exit with code 0. After this, 'azcmagent show' should report the server's state as 'Connected'. The Portal will also show this, but there may be a delay of a few minutes before the status is updated in the cloud.
If you see "Machine is already connected, no repair needed", the tool believes that the server is already connected. If 'azcmagent show' also shows the server as connected, no further action is required.
In case of failure, the tool returns a non-zero exit code and the output indicates the issue. Please share any such issues with Microsoft Support.
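For unattended runs, the success criteria above (exit code, then 'Connected' in 'azcmagent show') can be checked in a wrapper. The following is a minimal sketch under stated assumptions: the function name repair_and_verify and the REPAIR_CMD and AGENT_CMD override variables are hypothetical conveniences, and only the 'run' subcommand and the service principal flags described in Usage are passed through.

```shell
# Run the repair tool, then confirm the agent reports 'Connected'.
# REPAIR_CMD and AGENT_CMD are hypothetical overrides for the two binaries;
# by default the tool is expected in the current directory.
repair_and_verify() {
    local repair_cmd=${REPAIR_CMD:-./azcmrepair}
    local agent_cmd=${AGENT_CMD:-azcmagent}
    local repair_rc=0
    local status

    # Any arguments (for example the service principal flags) are passed through.
    "$repair_cmd" run "$@" || repair_rc=$?
    if [ "$repair_rc" -ne 0 ]; then
        echo "azcmrepair exited with code $repair_rc; share the output with Microsoft Support" >&2
        return "$repair_rc"
    fi

    # Extract the value of the 'Agent Status' line from 'azcmagent show'.
    status=$("$agent_cmd" show | awk -F: '
        /^[ \t]*Agent Status[ \t]*:/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)
            print $2
        }')

    if [ "$status" = "Connected" ]; then
        echo "Agent status: Connected"
        return 0
    fi
    echo "Repair reported success but agent status is '${status:-unknown}'" >&2
    return 1
}
```

On Linux this would be run as root, for example from a root shell: repair_and_verify --service-principal-id xxx --service-principal-secret yyy. The Portal status is not checked here, since it can lag the local agent by a few minutes.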
Some common error conditions:
Error: "Exit Code: AZCM0016: Missing Mandatory Parameter" or "azcmrepair failed because missing 'xyz' in the agentconfig.json"
Resolution: Your local agentconfig.json is missing or invalid. It is possible that the machine had not been connected before. Run 'azcmagent connect' to connect your machine to Azure. You do not need to run the repair tool in this case.
Limitations:
Any role assignments granted to the server's managed identity (for example, allowing it to access an Azure resource such as a Key Vault) are tied to that identity and cannot be restored by the azcmrepair utility. Once the utility has run, a new managed identity will have been created for the resource. Please grant any required role assignments to the new identity in Azure Active Directory.
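Re-granting a role to the new identity could look like the following sketch. It is not part of azcmrepair: it assumes the Azure CLI ('az') is installed and signed in with rights to create role assignments, and the function names are hypothetical. The role name and scope are whatever the old identity previously held; they are not recoverable from the tool.

```shell
# Look up the principal (object) ID of the machine's current managed identity.
# $1 = resource group, $2 = Arc machine resource name
get_arc_principal_id() {
    az resource show \
        --resource-group "$1" \
        --name "$2" \
        --resource-type "Microsoft.HybridCompute/machines" \
        --query identity.principalId \
        --output tsv
}

# Grant one role to that identity at a given scope.
# $1 = resource group, $2 = machine name, $3 = role name, $4 = scope (resource ID)
grant_role_to_arc_identity() {
    local principal_id
    principal_id=$(get_arc_principal_id "$1" "$2") || return 1
    if [ -z "$principal_id" ]; then
        echo "No managed identity found for machine '$2'" >&2
        return 1
    fi
    az role assignment create \
        --assignee-object-id "$principal_id" \
        --assignee-principal-type ServicePrincipal \
        --role "$3" \
        --scope "$4"
}
```

For example, if the server previously read secrets from a Key Vault, a call might be: grant_role_to_arc_identity test test "Key Vault Secrets User" "<resource ID of the vault>", where the resource group and machine name match the 'azcmagent show' output and the scope placeholder is filled in by you.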