How to troubleshoot Operations Management Suite onboarding issues

Summary
This article contains a series of steps, procedures, and troubleshooting tips for both integrated Operations Manager attach mode clients and Direct Agent access in Microsoft Operations Management Suite (OMS).

This article covers the following topics:

Operations Manager registration error

Describes two error messages that you might encounter when you register an Operations Manager (OpsMgr) management group.

Proxy registration or configuration steps

Describes how to configure proxy servers (if you have them) to allow traffic to Operational Insights.

Verifying the deployment after registration

Provides troubleshooting steps both for integrated Operations Manager attach mode and for directly-connected agents. Also describes how to check data flow, as well as common errors to look for and how to fix them.

Other Operations Manager known issues and workarounds

Describes other miscellaneous issues related to Operational Insights onboarding from Operations Manager.
More information

Operations Manager registration error

Error 2200

When you try to connect an Operations Manager management group to OMS, you receive the following error message:

Error 2200: Unable to register to the Advisor Service. Please contact the system administrator.

This issue may be caused by one of the following conditions:
  • The OMS workspace was not created before you tried to integrate with Operations Manager. To fix this issue, go to the microsoft.com/OMS site and create a workspace first. Then, try onboarding again by using the same account.
  • After you installed the latest required update rollups, the necessary management packs were not imported into your OpsMgr management group. To fix this issue, open the OpsMgr console, go to Administration view, and then select the option to import management packs. Next, go to %SystemDrive%\Program Files\System Center 2012 SP1\Operations Manager\Server\Management Packs for Update Rollups, and then import the management packs in this folder. As soon as the operation is complete, restart the console, and then try onboarding again.

Error 3000

When you try to connect an OpsMgr management group to OMS, you receive the following error message:

Error 3000: Unable to register to the Advisor Service. Please contact the system administrator.

This issue may be caused by one of the following conditions:
  • The server clock may be out of sync with the current time by more than 5 minutes. You can fix this issue by changing the clock time on the server to match the current time. To do this, open a command prompt as an administrator, run w32tm /tz to check the time zone, and then run w32tm /resync to sync the time.

    Note Even when your clock reports that it is synchronized (that is, synchronized with your company’s time server), it might still be out of sync with the clocks of the virtual machines in Azure. Because the allowable time skew is only 5 minutes, this is frequently an issue. Verify that you are synchronizing with a reliable time server on the Internet. You can further troubleshoot this type of issue by enabling verbose tracing on the management server or on the computer that is running the console. For more information about tracing in Operations Manager, see the following article:
    942864 How to use diagnostic tracing in System Center Operations Manager 2007 and in System Center Essentials
    Essentially, you will need to run the following commands in this order:
    StartTracing.cmd VER, reproduce the issue, StopTracing.cmd, FormatTracing.cmd
    In the output trace files, you should find an exception indicating that the token was rejected because it was not yet valid or had expired.
  • An internal proxy server or firewall may be blocking communication to the Advisor service endpoints. The following section includes detailed information about how to troubleshoot these problems.
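
The 5-minute limit described for Error 3000 can be checked with a short script. The following Python sketch is only an illustration: it compares a local UTC timestamp with a reference timestamp and reports whether the difference exceeds the allowed skew. Taking the reference time from the Date header of an HTTP response is an assumption of this example, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# The article states that the allowable time skew is 5 minutes.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(local_utc: datetime, reference_utc: datetime) -> timedelta:
    """Return the absolute difference between two timezone-aware timestamps."""
    return abs(local_utc - reference_utc)


def skew_from_http_date(date_header: str, local_utc: datetime) -> timedelta:
    """Compute the skew against the Date header of an HTTP response.

    Assumption: the header comes from a server whose clock you trust.
    Its one-second resolution is enough for a 5-minute limit.
    """
    return clock_skew(local_utc, parsedate_to_datetime(date_header))


def is_within_allowed_skew(skew: timedelta, limit: timedelta = MAX_SKEW) -> bool:
    """True if the measured skew does not exceed the limit."""
    return skew <= limit


if __name__ == "__main__":
    # Hypothetical values: the local clock runs 7 minutes ahead.
    local = datetime(2016, 11, 17, 20, 38, 0, tzinfo=timezone.utc)
    skew = skew_from_http_date("Thu, 17 Nov 2016 20:31:00 GMT", local)
    print(skew, is_within_allowed_skew(skew))
```

If the check fails, resynchronize the clock as described above (w32tm /resync) before you rerun the registration.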

Proxy registration or configuration steps

When an internal proxy server or firewall is blocking communication to the Advisor service endpoints, registration may fail. Or, registration may complete but subsequent OpsMgr communication fails. This section describes the types of communication and the endpoints that must be allowed on your management servers, console, and direct agents for Operational Insights to work.

Step 1: Request exception for the service endpoints

The following domains and URLs must be accessible through the firewall or proxy for the management server to access the Azure Operational Insights Web Services.

Some proxy servers may require that HTTPS inspection be bypassed. The URLs that require this are noted in the following tables.

Management server
URL                               Bypass HTTPS inspection   Ports
service.systemcenteradvisor.com                             443
*.service.opinsights.azure.com                              443
*.blob.core.windows.net/*         X                         443
data.systemcenteradvisor.com                                443
ods.systemcenteradvisor.com                                 443
*.ods.opinsights.azure.com        X                         443

Operations Manager console
The following domains and URLs must be accessible through the firewall to view the Advisor web portal and the Operations Manager console (to perform ‘registration’ to Azure Operational Insights).

Resource                          Bypass HTTPS inspection   Ports
*service.systemcenteradvisor.com                            443
*.service.opinsights.azure.com                              443
*.live.com                                                  80, 443
*.microsoft.com                                             80, 443
*.microsoftonline.com                                       80, 443
*.mms.microsoft.com                                         80, 443
login.windows.net                                           80, 443

Also, make sure that the Internet Explorer proxy is set correctly on the computer being used. It can be especially valuable to try connecting to an SSL-enabled website such as https://www.bing.com to verify that HTTPS connections will work. If the HTTPS connection doesn’t work from a browser, it probably also won’t work from the Operations Manager console or in the server modules that talk to the web services in the cloud.
Directly connected agents
Direct Agent does not use your credentials to connect to the workspace; instead, you must enter the workspace ID and key. Those credentials are used for registration, and then after the agent is registered, a certificate is used. Direct Agent needs to connect only to the following destinations:

URL                               Bypass HTTPS inspection   Ports
*.blob.core.windows.net/*         X                         443
*.oms.opinsights.azure.com        X                         443
*.ods.opinsights.azure.com        X                         443
ods.systemcenteradvisor.com                                 443
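
A quick way to see whether a firewall blocks these destinations is to attempt a TCP connection on port 443. The Python sketch below is a minimal illustration with hypothetical function names. It connects directly and does not go through a proxy, so on a network where all traffic must use a proxy a failure only shows that direct access is closed. The "*" entries need a concrete prefix: for the opinsights.azure.com hosts this example assumes that the workspace ID is the prefix (as in the registration URL shown later in this article), and the prefix for the blob storage host is not known to this sketch.

```python
import socket

# Destinations copied from the Direct Agent table above.
DIRECT_AGENT_ENDPOINTS = [
    ("*.blob.core.windows.net", 443),
    ("*.oms.opinsights.azure.com", 443),
    ("*.ods.opinsights.azure.com", 443),
    ("ods.systemcenteradvisor.com", 443),
]


def concrete_host(pattern: str, prefix: str) -> str:
    """Replace a leading "*." wildcard with a concrete prefix."""
    return prefix + pattern[1:] if pattern.startswith("*.") else pattern


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Only the host without a wildcard can be tested without more information.
    host, port = DIRECT_AGENT_ENDPOINTS[-1]
    print(host, port, can_connect(host, port))
```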

After you have completed the registration of your OpsMgr environment to the Advisor Service, follow steps 2, 3, and 5 to allow your management servers to send data to the Advisor Web Service. Note that step 4 is required only if you have not installed the most current updates.

Step 2: Configure the proxy server in the OpsMgr console

  1. Open the OpsMgr console.
  2. Go to the Administration view.
  3. Under the System Center Advisor node, select Advisor Connection.
  4. Click Configure Proxy Server.
  5. Select the check box to use a proxy server to access the Advisor Web Service.
  6. Specify the proxy address in the http://proxyserver:port format.
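
A malformed proxy address is easy to miss in the dialog box. The following Python sketch shows one way to check a value against the http://proxyserver:port format before you enter it. It is an illustration only: the function name is hypothetical, and the rule that the scheme must be http is an assumption taken from the format shown in step 6.

```python
from urllib.parse import urlsplit


def parse_proxy_address(address: str) -> tuple[str, int]:
    """Split "http://proxyserver:port" into (host, port) or raise ValueError."""
    parts = urlsplit(address.strip())
    if parts.scheme != "http":
        raise ValueError("proxy address must start with http://")
    if not parts.hostname:
        raise ValueError("proxy address is missing the server name")
    # parts.port itself raises ValueError for a non-numeric or out-of-range port.
    if parts.port is None:
        raise ValueError("proxy address is missing the port")
    return parts.hostname, parts.port


if __name__ == "__main__":
    # Hypothetical proxy name used only for illustration.
    print(parse_proxy_address("http://proxy.contoso.com:8080"))
```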

Step 3: Specify credentials for OpsMgr if the proxy server requires authentication

If the proxy server requires authentication, you can specify the credentials in an OpsMgr Run As account and associate that account with the System Center Advisor Run As Profile Proxy:
  1. In the OpsMgr console, go to the Administration view.
  2. Under the Run As Configuration node, select Profiles.
  3. Double-click System Center Advisor Run As Profile Proxy to open it.
  4. Click Add to add a Run As account. You can either create an account or use an existing one. This account must have sufficient permissions to pass through the proxy.
  5. Set the account to be targeted at the Operations Manager Management Servers group.
  6. Complete the wizard and save the changes.

Step 4: Configure the proxy server on each OpsMgr management server for managed code

There is an additional setting in Operations Manager that's intended for general error reporting. However, when that setting is enabled, the proxy setting may also affect the Advisor connector's functionality. This occurs because the same modules may be used in multiple workflows. Therefore, Microsoft recommends that you set the same proxy server on every management server. To do this, follow these steps:
  1. In the OpsMgr Console, go to Administration view.
  2. Select Device Management, and then select the Management Servers node.
  3. Right-click each management server (one at a time), select Properties, and then set the proxy on the Proxy Settings tab.

Verifying the deployment after registration

Step 1: Verify that the right management packs are downloaded to your Operations Manager environment

Depending on which Solutions (formerly known as Intelligence Packs) you have enabled in the Operational Insights portal, you will see some or all of the management packs in the following screen shot. Search for the keyword "Advisor" or "Solution" in their names, and make sure that the solutions that you've enabled have corresponding management packs installed.



You can also check for these management packs through PowerShell by using the following commands:
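# List the Advisor-related management packs in the console window.
Get-SCOMManagementPack | Where-Object { $_.DisplayName -match 'Advisor' } | Select-Object Name, DisplayName, Version, KeyToken
# Show the same list in a sortable grid window.
Get-SCOMManagementPack | Where-Object { $_.DisplayName -match 'Advisor' } | Select-Object Name, DisplayName, Version, KeyToken | Out-GridView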
Note If you are troubleshooting the Capacity Solution, check how many management packs you have that have a name that contains "capacity." There are two management packs that have the same display name but different internal IDs that come in the same management pack bundle. If one of the two does not get imported (often because of missing VMM dependencies) the other management pack may not be imported either.

You should see the following three management packs related to "capacity":
  • Microsoft System Center Advisor Capacity Intelligence Pack
  • Microsoft System Center Advisor Capacity Intelligence Pack
  • Microsoft System Center Advisor Capacity Storage Data
If you only see one or two of these, remove them and wait 5 to 10 minutes for Operations Manager to download and import them again. If this fails, check the event logs for errors during this period.
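
Because two of the three packs share a display name, counting by hand is error-prone. The following Python sketch illustrates the check. It assumes that you have already collected the DisplayName values (for example, from the output of the PowerShell commands above) into a list; the function name is hypothetical.

```python
from collections import Counter

# Display names listed in this article. The first one is expected twice
# because two packs with different internal IDs share it.
EXPECTED_CAPACITY_PACKS = Counter({
    "Microsoft System Center Advisor Capacity Intelligence Pack": 2,
    "Microsoft System Center Advisor Capacity Storage Data": 1,
})


def missing_capacity_packs(display_names):
    """Return {display name: how many copies are missing}; empty if complete."""
    found = Counter(
        name.strip() for name in display_names if "capacity" in name.lower()
    )
    # Counter subtraction keeps only positive counts, that is, what is missing.
    return dict(EXPECTED_CAPACITY_PACKS - found)


if __name__ == "__main__":
    installed = [
        "Microsoft System Center Advisor Capacity Intelligence Pack",
        "Microsoft System Center Advisor Capacity Storage Data",
    ]
    print(missing_capacity_packs(installed))
```

A non-empty result corresponds to the "one or two of these" case above, in which the packs should be removed so that Operations Manager can download them again.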

Step 2: Validate that the right solutions are downloaded to your Direct Agent

With Direct Agents, you should see the Solution collection policy being cached under the C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Management Packs path:
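
To review the cache without browsing the folder by hand, you can list its files with a small script. The Python sketch below is an illustration with a hypothetical function name. It uses the default path mentioned above, makes no assumption about the file names or extensions of the cached policies, and simply returns whatever files are present, optionally filtered by a keyword.

```python
from pathlib import Path

# Default cache location taken from this article.
DEFAULT_CACHE = Path(
    r"C:\Program Files\Microsoft Monitoring Agent\Agent"
    r"\Health Service State\Management Packs"
)


def cached_policy_files(cache_dir=DEFAULT_CACHE, keyword=""):
    """Return the sorted names of files in the cache that contain the keyword."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # agent not installed here, or a different install path
    return sorted(
        entry.name
        for entry in cache_dir.iterdir()
        if entry.is_file() and keyword.lower() in entry.name.lower()
    )


if __name__ == "__main__":
    for name in cached_policy_files():
        print(name)
```

An empty list on a computer where the agent is installed suggests that no collection policy has been downloaded yet.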


Step 3: Validate that data is being sent up to the Advisor service (or at least that a send is tried)

  1. Open Performance Monitor.
  2. Select Health Service Management Groups.
  3. Add all the counters that start with HTTP.
  4. If the configuration is correct, you should see activity for these counters as events and other data are uploaded (based on the solutions loaded in the portal and the configured log collection policy). These counters do not necessarily have to be continuously busy. However, if you see little or no activity, it might be that you have not added many solutions, or that you have a very lightweight collection policy.
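
If you prefer to record the counters and review them later, the same check can be scripted. The following Python sketch is an illustration only. It assumes that the samples were saved as CSV with a timestamp in the first column and one column per counter, which is the layout that the Windows typeperf tool writes. The counter names in the usage example are hypothetical placeholders, not the real counter names.

```python
import csv
import io


def counters_with_activity(csv_text: str) -> dict[str, bool]:
    """Map each counter column to True if any of its samples is non-zero."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, samples = rows[0], rows[1:]
    activity = {name: False for name in header[1:]}  # column 0 is the timestamp
    for row in samples:
        for name, value in zip(header[1:], row[1:]):
            try:
                if float(value) != 0.0:
                    activity[name] = True
            except ValueError:
                continue  # blank or non-numeric sample
    return activity


if __name__ == "__main__":
    sample = (
        '"(PDH-CSV 4.0)","HTTP Example A","HTTP Example B"\n'
        '"11/17/2016 20:31:00.000","0.000000","12.000000"\n'
    )
    print(counters_with_activity(sample))
```

Counters that stay at False over a long capture match the "little or no activity" case described in step 4.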

Step 4: Check for errors in the management server or Direct Agent event logs

As a final step, if all of the preceding steps fail, check whether you have any errors in the Event Viewer –> Application and Services –> Operations Manager event log. Filter by Event Sources: Advisor, Health Service Modules, HealthService, and Service Connector (this last one applies to Direct Agent only). You can copy these events and post them in the Feedback forum so that we on the product team can offer you additional help. Most of these events are also found on Direct Agent, and the steps for troubleshooting are similar. The only part that differs between an integrated OpsMgr environment and one using direct agents is the registration process:
  • When connected to Operations Manager, you have a wizard with browser integration that lets you pick your workspace as a user/admin. Then, OpsMgr takes care of exchanging certificates and uses those for management pack download and data transfer to Operational Insights.
  • When you're using Direct Agents, you just copy and paste the workspace ID and key, and those are used to verify that it’s really you who are registering those agents and that you own that workspace. After you're authenticated, certificates are exchanged under the hood by the service in a manner similar to when OpsMgr is integrated.
Most of these events apply to both types of reporting infrastructure. Open Event Viewer –> Application and Services –> Operations Manager and filter by Event Sources: Advisor, Health Service Modules, HealthService and Service Connector (this last one applies to Direct Agent only).



A few of the events that you might see when things aren’t working correctly are listed below:

Event ID: 2138
Source: Health Service Modules
Meaning: Proxy requires authentication
Resolution: Follow step 3 and/or step 1 in the "Proxy registration or configuration steps" section.

Event ID: 2137
Source: Health Service Modules
Meaning: Cannot read the authentication certificate
Resolution: Rerunning the Advisor registration wizard will fix certificates and Run As accounts.

Event ID: 2132
Source: Health Service Modules
Meaning: Not authorized
Resolution: This could be an issue with the certificate and/or the registration to the service. Try rerunning the Advisor registration wizard, because that will fix certificates and Run As accounts. Additionally, verify that the proxy has been set to allow exclusions per step 1 in the "Proxy registration or configuration steps" section, and/or verify authentication per step 3. Also verify that the user has access through the proxy.

Event ID: 2129
Source: Health Service Modules
Meaning: Failed connection / failed SSL negotiation
Resolution: There could be some incorrect TCP settings on the server. Check this post from the community for more information: http://jacobbenson.com/?p=511.

Event ID: 2127
Source: Health Service Modules
Meaning: Failure sending data, received error code
Resolution: Monitor how often this happens. If it happens often (every 10 minutes or so throughout the day), it is a problem. Check your network configuration and proxy settings, and then rerun the registration wizard. If it happens only sporadically (for example, a couple of times per day), it can be treated as a random anomaly and everything should be fine, because data will be queued and retransmitted.
Some of the HTTP error codes have special meanings. For example, the first time that an MMA direct agent or management server tries to send data to our service, it will get a 500 error with an inner 404 error code. 404 means "not found," which indicates that the storage area that will be used for your new workspace isn’t quite ready yet (it is still being provisioned). On the next retry, the storage will be ready and data flow will start working as expected.
A 403 error might indicate a permission or credentials issue. There is more information about the 403 error in the Direct Agent specific section of this article.

Event ID: 2128
Source: Health Service Modules
Meaning: DNS name resolution failed
Resolution: Your server can’t resolve the Internet address that is used when sending data. The cause might be the DNS resolver settings on your computer, incorrect proxy settings, or a temporary issue with DNS at your provider. Like the previous event, depending on whether it happens constantly or only once in a while, it could be an issue or not.

Event ID: 2130
Source: Health Service Modules
Meaning: Time out
Resolution: Like the previous event, depending on whether it happens constantly or only once in a while, it could be an issue or not.

Event ID: 4511
Source: HealthService
Meaning: Cannot load module "System.PublishDataToEndPoint" (file not found)
Resolution: The event description reads as follows: Initialization of a module of type "System.PublishDataToEndPoint" (CLSID "{D407D659-65E4-4476-BF40-924E56841465}") failed with error code The system cannot find the file specified.
This error indicates that you have old DLLs on your machine that don’t contain the required modules. The fix is to update your management servers to the latest Update Rollup package.

Event ID: 4502
Source: HealthService
Meaning: Module crashed
Resolution: If you see this for workflows with names such as CollectInstanceSpace or CollectTypeSpace, it might mean that the server is having issues sending data. Depending on how often it happens, it may be an issue or not. If it happens more than once every hour, it is definitely an issue. However, if it fails only once or twice per day, it will be fine and should be able to recover. Depending on how the module actually fails (the description will have more details), this could be an on-premises issue (for example, collecting to the database) or an issue sending to the cloud. Verify your network and proxy settings, and if it still fails, try restarting the HealthService.

Event ID: 4501
Source: HealthService
Meaning: Module "System.PublishDataToEndPoint" crashed
Resolution: The event description reads as follows: A module of type "System.PublishDataToEndPoint" reported an error 87L which was running as part of rule "Microsoft.SystemCenter.CollectAlertChangeDataToCloud" running for instance "Operations Manager Management Group" with id:"{6B1D1BE8-EBB4-B425-08DC-2385C5930B04}" in management group "SCOMTEST".
You should NOT see this with this exact workflow, module, and error anymore. It used to be a bug, but it is now fixed. It was being tracked here: http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6714689-alert-management-intelligence-pack-not-sending-ale

Event ID: 4002
Source: Service Connector
Meaning: The service returned HTTP status code 403 in response to a query. Please check with the service administrator for the health of the service. The query will be retried later.
Resolution: You can get a 403 during the agent’s initial registration phase, and you’ll see a URL similar to the following:
https://<YourWorkspaceID>.oms.opinsights.azure.com/AgentService.svc/AgentTopologyRequest
Error code 403 means "forbidden." This is typically caused by a mistyped workspace ID or key, or by a clock that is not synced (just like for Error 3000 at the beginning of this article).
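
Several of the resolutions above depend on how often an event occurs. The following Python sketch shows one way to turn that guidance into a quick triage of exported events. It is an illustration under stated assumptions: the input is a list of (event ID, timestamp) pairs that you have already extracted from the Operations Manager event log, the function names are hypothetical, and the thresholds (more than 24 events per day is frequent, 2 or fewer is sporadic) are this example's reading of "more than every hour" and "once or twice per day".

```python
from collections import Counter
from datetime import datetime, timedelta

# Short meanings taken from the event list in this article.
EVENT_MEANINGS = {
    2138: "Proxy requires authentication",
    2137: "Cannot read the authentication certificate",
    2132: "Not authorized",
    2129: "Failed connection / failed SSL negotiation",
    2127: "Failure sending data (error code received)",
    2128: "DNS name resolution failed",
    2130: "Time out",
    4511: "Cannot load module System.PublishDataToEndPoint",
    4502: "Module crashed",
    4501: "Module System.PublishDataToEndPoint crashed",
    4002: "Service returned HTTP status code 403",
}


def classify_frequency(events_in_last_day: int) -> str:
    """Assumed thresholds: >24 per day is frequent, <=2 per day is sporadic."""
    if events_in_last_day > 24:
        return "investigate"
    if events_in_last_day <= 2:
        return "sporadic"
    return "monitor"


def summarize_last_day(events, now: datetime) -> dict:
    """Return {event ID: (meaning, count, classification)} for the last 24 hours."""
    cutoff = now - timedelta(days=1)
    counts = Counter(eid for eid, stamp in events if cutoff <= stamp <= now)
    return {
        eid: (EVENT_MEANINGS.get(eid, "unknown event"), n, classify_frequency(n))
        for eid, n in counts.items()
    }
```

For example, an event 2127 that is logged every 10 minutes is classified as "investigate", whereas a single event 4502 in a day is classified as "sporadic".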

Step 5: Look for your agents to send their data and have it indexed in the portal

Check in the Operational Insights portal to see whether your computers are reporting. From the Overview page navigate to the large blue SETTINGS tile. It will be either the first or last tile depending on your configuration. In SETTINGS, click the CONNECTED SOURCES tab. Each column on this page represents a different data source type attached to OI (servers attached directly, Operations Manager management groups and Azure storage accounts). Click the blue "X servers/mgmt groups/storage accounts connected" and it will bring you to a search with more detail. On this page you'll also see a list of individual management groups connected. Clicking one of these management groups will also bring you to search and show you a list of the servers connected to this management group.

Note If a data source is listed as reporting on this page, it does not necessarily mean that we have collected any data from the source. In this case, drilling into search from this page might show inconsistent results (for example, you will see a data source listed in CONNECTED SOURCES but it won't be in search). Once data collection has started, either from an IP address or from log collection, the results in search will be consistent.


The Advisor engineering team is committed to resolving all your onboarding issues so please contact us if you run into any issues. We are here to help.

Other Operations Manager known issues and workarounds

The search button in the "Add a Computer/Group" dialog box is missing

Some customers have reported that the Search button in the Computer Search dialog box is missing. We are currently investigating this. As a temporary workaround, click in the Filter by (optional) edit box, and then press the Tab key to get to the invisible search button. Then, you can activate the button by pressing the <Spacebar> or <Enter> key.


IIS log collection issues

The following article contains specific information about how to best configure IIS logging for use with Operational Insights, as well as some other known issues:

http://blogs.technet.com/b/momteam/archive/2014/09/19/iis-log-format-requirements-in-system-center-advisor.aspx

Some of the information in this article also applies to Direct Agent, but it primarily targets Operations Manager. It also includes additional information about IIS with Direct Agent.

Connectivity issues may occur when the Baltimore CyberTrust Root certificate is not installed


Connectivity issues may occur on client computers when the following conditions are true:
  • The client computers use Microsoft Intune and have the automatic root certificate mechanism disabled.
  • The Baltimore CyberTrust Root certificate is not installed.

To resolve this problem when the automatic root certificate update mechanism is disabled on a client computer, install the latest root certificates to make sure that the client computer is up to date and secure. You can use the Microsoft Update Catalog to find the latest root certificate updates. You can search for "root update" or the Microsoft Knowledge Base article number for the Windows Root Certificate Program (931125) and then download the latest root update package. Because root update packages are cumulative, you must install only the latest package to receive all root certificates in the program.

For more information about this issue, see the following Microsoft Knowledge Base article:

2831435 Connectivity issues may occur when the Baltimore CyberTrust Root certificate is not installed on client computers that use Microsoft Intune

How to restrict the use of certain cryptographic algorithms and protocols in schannel.dll

There may be situations that require you to restrict the use of certain cryptographic algorithms and protocols in the schannel.dll file. For more information about how to do this, see the following Knowledge Base article:

245030 How to restrict the use of certain cryptographic algorithms and protocols in Schannel.dll

SQL and AD assessment

SQL and AD Assessment require .NET Framework 4 to run on each agent to be assessed. Analysis runs on the SQL Server machines and on the domain controllers (for AD). SQL Assessment supports the Standard, Developer, and Enterprise editions of SQL Server (all currently supported versions).

Malware assessment

Windows 7 and Windows Server 2008 R2 have the issues described and tracked here:

http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519211-windows-server-2008-r2-sp1-servers-are-shown-as-n

You can see what anti-malware products are enabled by following this thread:

http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519202-support-other-antivirus-products-in-malware-assess

Direct agent specific information

Most of the errors listed under Step 4 of the "Verifying the deployment after registration" section also apply to Direct Agent. With Direct Agent, each agent is responsible for talking to Operational Insights on its own. By contrast, when agents are integrated with Operations Manager, the management server sends data on behalf of the agents that report to it and therefore acts as a gateway.

With Direct Agent, the most common issue is error code 403, which means "forbidden." This is typically caused by a mistyped workspace ID or key. You can get more information on this here.
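
Because a mistyped workspace ID or key is the usual cause, it can help to check the values before you enter them in the agent setup. The following Python sketch is an illustration only, with hypothetical function names. It assumes that a workspace ID has the form of a GUID and that the key is a Base64 string. It can catch truncated or padded copy-and-paste values, but it cannot tell whether well-formed values belong to your workspace.

```python
import base64
import binascii
import uuid


def looks_like_workspace_id(value: str) -> bool:
    """True if the value is a GUID in the usual 8-4-4-4-12 form (assumed format)."""
    text = value.strip()
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


def looks_like_workspace_key(value: str) -> bool:
    """True if the value is non-empty, valid Base64 (assumed format)."""
    text = value.strip()
    if not text:
        return False
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def credential_problems(workspace_id: str, key: str) -> list[str]:
    """Return a list of human-readable problems; empty if nothing looks wrong."""
    problems = []
    if workspace_id != workspace_id.strip() or key != key.strip():
        problems.append("leading or trailing whitespace (common after copy and paste)")
    if not looks_like_workspace_id(workspace_id):
        problems.append("workspace ID is not a well-formed GUID")
    if not looks_like_workspace_key(key):
        problems.append("workspace key is not valid Base64")
    return problems
```

If the values pass this check and the agent still receives a 403 error, verify the clock synchronization as described for Error 3000.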

Other things that we are currently tracking for Direct Agent include the following:

- The Capacity Management Intelligence Pack does not work with Direct Agent. It only works with Operations Manager and also needs Operations Manager to be integrated with Virtual Machine Manager. We are tracking ideas to generalize this Solution here:

http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6662146-open-up-the-capacity-management-pack-for-other-sys

- The Alert Management Intelligence Pack does not work with Direct Agent. It requires Operations Manager to synchronize alerts to the cloud.

- Malware Assessment works with the exception of the issue noted above for Windows 7 and Windows Server 2008 R2.

Note that Update Assessment, Change Tracking, and the Log Management Solutions for collecting Windows Events and IIS Logs already work for both Operations Manager and Direct Agent.

If you need further information on how to install the agent, including scripted and unattended methods, check the documentation here:

https://azure.microsoft.com/en-us/documentation/articles/operational-insights-direct-agent/

If needed, Direct Agent supports passing through a proxy, and the documentation above includes a PowerShell script that can be used to configure which proxy server and credentials the agent uses. It’s an application-specific setting, so no process other than MMA’s needs to know how to reach the Internet.

If your VM is in Azure, you can one-click install/enable the agent from the Azure portal:

http://azure.microsoft.com/en-us/updates/easily-enable-operational-insights-for-azure-virtual-machines/

Also, be aware that there is currently only a 64-bit version of the agent. Feedback for a potential 32-bit agent is being tracked here:

http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6744349-support-for-windows-2003-and-2008-servers-32-bit

Windows Azure diagnostics information

Log management through Azure Portal integration also allows you to collect Windows events from Windows Azure Diagnostics (WAD) Storage. This works for Cloud Services roles and IaaS VMs configured to write to WAD.

Collecting IIS Logs from WAD works for Cloud Services and for IaaS VMs but not currently for Azure Web Sites. This is tracked here:

http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519351-collect-iis-logs-from-windows-azure-diagnostics-st

Check out and vote on the other ideas about what to collect for this category in our forum:

http://feedback.azure.com/forums/267889-azure-operational-insights/category/88086-log-management-and-log-collection-policy

Lastly, here is a good paper on how to configure your Microsoft Azure roles and VMs to write to Windows Azure Diagnostics storage:

http://download.microsoft.com/download/B/6/C/B6C0A98B-D34A-417C-826E-3EA28CDFC9DD/AzureSecurityandAuditLogManagement_11132014.pdf
OpsMgr 2012 R2
Properties

Article ID: 3126513 - Last Review: 11/17/2016 20:31:00 - Revision: 5.0

Microsoft System Center 2012 Operations Manager, Microsoft System Center 2012 R2 Operations Manager, Microsoft Operations Management Suite

  • kbexpertiseadvanced kbsurveynew kbtshoot KB3126513