Applicability:
These steps are designed for servers that were previously connected to Azure Arc-Enabled Servers and lost connectivity due to incident JLJ3-B88.
If you did not receive a Service Health notification regarding incident JLJ3-B88, these instructions are not relevant. If you have a different issue with Azure Arc-Enabled Servers connectivity, please contact support.
To validate locally whether a server is affected, run 'azcmagent show':
> azcmagent show
Resource Name : test
Resource Group Name : test
Resource Namespace :
Subscription ID : 11111111-1111-1111-1111-111111111111
Tenant ID : 11111111-1111-1111-1111-111111111111
VM ID : 11111111-1111-1111-1111-111111111111
Correlation ID : 11111111-1111-1111-1111-111111111111
VM UUID : 11111111-1111-1111-1111-111111111111
Location : westeurope
Cloud : AzureCloud
Agent Version : 1.17.01931.201
Agent Logfile : C:\ProgramData\AzureConnectedMachineAgent\Log\himds.log
Agent Status : Disconnected
Agent Last Heartbeat : 2022-05-12T11:27:15-07:00
Agent Error Code :
Agent Error Details :
Agent Error Timestamp :
Using HTTPS Proxy :
Proxy Bypass List :
Cloud Provider : N/A
Cloud Metadata
Manufacturer : Dell Inc.
Model : XPS 8930
MSSQL Server Detected : false
Dependent Service Status
GC Service (gcarcservice) : running
Extension Service (extensionservice) : running
Agent Service (himds) : running
Not affected:
If the Subscription ID, Tenant ID, Resource Name or Resource Group Name is not set, the machine has not been connected and is not affected by this issue. Onboard it using 'azcmagent connect' if desired.
If the Location is not 'westeurope', the server is not affected by this issue.
If the Agent Status is 'Connected', the server is not affected by this issue.
Do not run these steps on servers that meet any of these conditions.
Affected:
If the Agent Error Details include a message similar to "AADSTS700016: Application with identifier 'xxx' was not found in the directory 'yyy'" then the server is affected by this issue.
If the server is in West Europe, and status is Disconnected, but this message is not shown in the error details, it may be affected.
Run these steps on servers that match either of these descriptions.
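The checks above can be sketched as a small shell filter that reads the text printed by 'azcmagent show' and reports which case applies. This is an illustration only: arc_field and classify_arc_agent are hypothetical helper names, the output labels are invented for this sketch, and it assumes the "Name : value" layout shown in the sample output.

```shell
# Sketch only: arc_field and classify_arc_agent are hypothetical helpers,
# not part of azcmagent or azcmrepair.

# Print the value of one "Name : value" field from 'azcmagent show' text on stdin.
arc_field() {
  awk -F: -v key="$1" '
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)
    }
    name == key {
      val = $0
      sub(/^[^:]*:[ \t]*/, "", val)   # drop "Name :" but keep colons in the value
      sub(/[ \t\r]+$/, "", val)
      print val
      exit
    }
  '
}

# Read 'azcmagent show' output on stdin and print one label (labels are assumptions).
classify_arc_agent() {
  local out name rg sub tenant location status details
  out=$(cat)
  name=$(printf '%s\n' "$out" | arc_field "Resource Name")
  rg=$(printf '%s\n' "$out" | arc_field "Resource Group Name")
  sub=$(printf '%s\n' "$out" | arc_field "Subscription ID")
  tenant=$(printf '%s\n' "$out" | arc_field "Tenant ID")
  location=$(printf '%s\n' "$out" | arc_field "Location")
  status=$(printf '%s\n' "$out" | arc_field "Agent Status")
  details=$(printf '%s\n' "$out" | arc_field "Agent Error Details")

  # Never connected: onboard with 'azcmagent connect' instead of repairing.
  if [ -z "$name" ] || [ -z "$rg" ] || [ -z "$sub" ] || [ -z "$tenant" ]; then
    echo "not-connected"; return 0
  fi
  # Only servers in westeurope that are not Connected can be affected.
  if [ "$location" != "westeurope" ] || [ "$status" = "Connected" ]; then
    echo "not-affected"; return 0
  fi
  case "$details" in
    *AADSTS700016*) echo "affected"; return 0 ;;
  esac
  if [ "$status" = "Disconnected" ]; then
    echo "possibly-affected"
  else
    echo "unclear"
  fi
}
```

With the functions loaded in a shell, one possible use is 'azcmagent show | classify_arc_agent'. The sample output shown earlier (westeurope, Disconnected, no error details) would be reported as possibly-affected.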
Context:
Servers affected by incident JLJ3-B88 are unable to communicate with Azure because the server's managed identity has been deleted. These steps create a new managed identity for the same resource and update the service to use that identity.
Steps:
A tool 'azcmrepair' has been developed to simplify the repair steps. This behaves very similarly to the azcmagent.exe distributed as part of the Azure Connected Machine Agent.
Download for Windows:
Download azcmrepair.exe from https://aka.ms/azcmrepairwindows
azcmrepair must be run in an elevated (Administrator) shell on the local server. It can be run from any directory and can be deleted after use.
Download for Linux:
Download azcmrepair from https://aka.ms/azcmrepairlinux, for example with:
curl -L https://aka.ms/azcmrepairlinux -o ./azcmrepair
Mark the file as executable, if required:
chmod +x ./azcmrepair
Run it as root, for example with 'sudo ./azcmrepair run'.
Usage:
For interactive use, run:
> ./azcmrepair run
You will be prompted for credentials to use while authenticating to Azure. You will need the same permissions on the target Azure subscription and resource group that you would need to onboard servers initially.
For at-scale use, the tool can use a service principal in the same way as 'azcmagent connect'. This can use the same service principal used for onboarding, or another one if preferred. The service principal needs at least the "Azure Connected Machine Onboarding" role.
> ./azcmrepair run --service-principal-id xxx --service-principal-secret yyy
No other arguments are required.
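For unattended runs, one option is to keep the service principal values out of scripts and read them from the environment. The following is a minimal sketch under that assumption: ARC_SP_ID, ARC_SP_SECRET, AZCMREPAIR and run_repair_unattended are hypothetical names chosen for this example, and only the two --service-principal options shown above are passed to the tool.

```shell
# Sketch only. AZCMREPAIR can be overridden to point at the downloaded binary.
AZCMREPAIR=${AZCMREPAIR:-./azcmrepair}

run_repair_unattended() {
  # ARC_SP_ID and ARC_SP_SECRET are hypothetical variable names for this sketch.
  if [ -z "${ARC_SP_ID:-}" ] || [ -z "${ARC_SP_SECRET:-}" ]; then
    echo "ARC_SP_ID and ARC_SP_SECRET must both be set" >&2
    return 2
  fi
  # Same options as the at-scale example above; the function returns the tool's exit code.
  "$AZCMREPAIR" run \
    --service-principal-id "$ARC_SP_ID" \
    --service-principal-secret "$ARC_SP_SECRET"
}
```

The function would be called as root on each affected server after the two variables have been exported by whatever deployment tooling is already in use.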
Evaluating Success:
The tool writes diagnostic log information to the console and to azcmagent.log. If the command is successful, you will see "Successfully Onboarded Resource to Azure" in the output and the tool will exit with code 0. After this, 'azcmagent show' should report the Agent Status as 'Connected'. The Portal will also show the new status, but there may be a delay of a few minutes before it is updated in the cloud.
If you see "Machine is already connected, no repair needed", the tool believes that the server is connected already. If 'azcmagent show' also shows the server as connected, no further action is required.
In case of failure, the tool will return a non-zero exit code and the output will indicate the issue. Please share any issues with Microsoft Support in this case.
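The success criteria above can be combined into a single check, sketched below: run the tool, stop on a non-zero exit code, then confirm that the local agent reports Connected. The names repair_and_verify, AZCMREPAIR and AZCMAGENT are hypothetical, and the match on the 'Agent Status' line assumes the output layout shown earlier in this document.

```shell
# Sketch only. Both commands can be overridden, for example with full paths.
AZCMREPAIR=${AZCMREPAIR:-./azcmrepair}
AZCMAGENT=${AZCMAGENT:-azcmagent}

repair_and_verify() {
  local rc=0
  # Extra arguments (such as the service principal options) are passed through.
  "$AZCMREPAIR" run "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "azcmrepair failed with exit code $rc; see the console output and azcmagent.log" >&2
    return "$rc"
  fi
  # Confirm the local agent state after a zero exit code.
  if "$AZCMAGENT" show |
    grep -Eq '^[[:space:]]*Agent Status[[:space:]]*:[[:space:]]*Connected[[:space:]]*$'; then
    echo "Agent Status: Connected"
  else
    echo "azcmrepair exited with code 0 but the agent does not report Connected" >&2
    return 1
  fi
}
```

A non-zero return from this function is the point at which the tool output would be shared with Microsoft Support.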
Some common error conditions:
Error: "Exit Code: AZCM0016: Missing Mandatory Parameter" or "azcmrepair failed because missing 'xyz' in the agentconfig.json"
Resolution: Your local agentconfig.json is missing or invalid. It is possible that the machine had not been connected before. Run 'azcmagent connect' to connect your machine to Azure. You do not need to run the repair tool in this case.
Limitations:
Any role assignments granted to the server's managed identity (for example, allowing it to access an Azure resource such as a Key Vault) are tied to that identity and cannot be restored by the azcmrepair utility. When the utility has run, a new managed identity will have been created for the resource. Please grant any required role assignments to the new identity in Azure Active Directory.
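One way to re-create a role assignment for the new identity is with the Azure CLI, sketched below. This assumes the Azure CLI and its 'connectedmachine' extension are installed on an administrative workstation and that the caller is allowed to create role assignments at the target scope. The function name regrant_role is hypothetical, and the machine name, resource group, role and scope are placeholders supplied by the caller.

```shell
# Sketch only. AZ can be overridden to point at a specific Azure CLI executable.
AZ=${AZ:-az}

# Usage: regrant_role MACHINE_NAME RESOURCE_GROUP ROLE SCOPE
regrant_role() {
  local machine=$1 rg=$2 role=$3 scope=$4 principal_id
  # Look up the object ID of the machine's current (new) managed identity.
  principal_id=$("$AZ" connectedmachine show \
    --name "$machine" --resource-group "$rg" \
    --query identity.principalId --output tsv) || return 1
  if [ -z "$principal_id" ]; then
    echo "no managed identity found for $machine" >&2
    return 1
  fi
  # Grant the role to that identity at the given scope.
  "$AZ" role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "$role" \
    --scope "$scope"
}
```

The role and scope to pass are whatever the previous identity had been granted; this sketch does not discover them.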