Troubleshooting UNIX/Linux agent discovery in Operations Manager 2012
To monitor UNIX or Linux computers in System Center 2012 Operations Manager (OpsMgr 2012), the computers must first be discovered, and the OpsMgr 2012 agent must be installed. The Computer and Device Management Wizard is used to discover and install agents on UNIX and Linux computers. However, discovery may not always find all eligible clients.
If you experience client discovery issues, this guide is for you.
What does this guide do?
Troubleshoots problems in System Center 2012 Operations Manager where UNIX or Linux computers can’t be discovered.
Who is it for?
Admins of System Center 2012 Operations Manager who help resolve UNIX/Linux agent discovery issues.
How does it work?
We’ll begin by asking the type of issue you are facing. Then we’ll take you through a series of steps that are specific to your situation to resolve your issue.
Estimated time of completion:
30-45 minutes.
Welcome to the guide
Select the type of issue you are experiencing below.
The target address is unreachable
In this situation you will typically receive an error similar to the following:
The WinRM client cannot complete the operation within the time specified. Check if the machine name is valid and is reachable over the network and firewall exception for Windows Remote Management service is enabled.
Most likely causes include the following:
- The host is unreachable due to incorrect name resolution, network outage or host outage.
- A network or host-based firewall is blocking TCP port 1270 connectivity to the target host.
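Both causes above can be checked roughly from a UNIX or Linux host that sits on the same side of the firewall as the management server. The following is a minimal sketch, not a definitive test: the function names `port_open` and `check_target` are hypothetical, and it assumes that `getent` and an `nc` build supporting `-z` and `-w` are installed. It reflects the resolver and network path of the host it runs on, not those of the management server itself.

```shell
#!/bin/sh
# Succeed if a TCP connection to host:port can be opened within 5 seconds.
# Assumes an nc (netcat) build that supports -z (connect only) and -w (timeout).
port_open() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Report name resolution and agent-port reachability for one target host.
# $1 = host name, $2 = TCP port (defaults to 1270, the port named in this guide).
# Returns 1 if the name does not resolve, 2 if the port is not reachable.
check_target() {
    target=$1
    port=${2:-1270}
    if ! getent hosts "$target" >/dev/null 2>&1; then
        echo "$target: name does not resolve"
        return 1
    fi
    if ! port_open "$target" "$port"; then
        echo "$target: TCP port $port is not reachable"
        return 2
    fi
    echo "$target: resolves and TCP port $port is reachable"
}
```

For example, `check_target lx1.contoso.com` checks the agent port, and `check_target lx1.contoso.com 22` checks the port used for SSH discovery.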
Certificate Errors or Certificate Signing Errors
Select the type of certificate issue you are experiencing below.
Signed certificate verification operation was not successful
When certificate verification fails you will typically get an error similar to the following:
Agent verification failed. Error detail: The server certificate on the destination computer (lx1.contoso.com:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.
The SSL certificate contains a common name (CN) that does not match the hostname.
One common cause of this error is that the agent certificate’s CN value does not match the provided or resolved fully qualified domain name (FQDN). To verify this, confirm that the agent host’s hostname and domain name match the FQDN resolved through DNS.
You can view the basic details of the certificate on the UNIX or Linux computer by entering the following command:
openssl x509 -noout -in /etc/opt/microsoft/scx/ssl/scx.pem -subject -issuer -dates
When you do this, you will see output that is similar to the following:
subject= /DC=name/DC=newdomain/CN=newhostname/CN=newhostname.newdomain.name
issuer= /DC=name/DC=newdomain/CN=newhostname/CN=newhostname.newdomain.name
notBefore=Mar 25 05:21:18 2008 GMT
notAfter=Mar 20 05:21:18 2029 GMT
Using this information, verify that the certificate dates are valid and that the hostnames match the name being resolved by the Operations Manager management server. If the hostnames do not match, use one of the following actions to resolve the issue:
- If the UNIX or Linux hostname is correct but the Operations Manager management server is resolving it incorrectly, either modify the DNS entry to match the correct FQDN or add an entry to the hosts file on the Operations Manager server.
- If the UNIX or Linux hostname is incorrect, do one of the following:
  - Change the hostname on the UNIX or Linux host to the correct one and create a new certificate.
  - Create a new certificate with the desired hostname.
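The comparison between the certificate CN and the expected FQDN can be scripted. The sketch below is illustrative only: the function names `subject_cns` and `cn_matches_fqdn` are hypothetical, and it assumes the subject line has either the slash-separated form shown above or the comma-separated form printed by newer OpenSSL releases. CN values that themselves contain commas or slashes are not handled.

```shell
#!/bin/sh
# Print every CN value found in an "openssl x509 -subject" line, one per line.
subject_cns() {
    printf '%s\n' "$1" |
        sed -e 's/^subject= *//' -e 's/ *= */=/g' -e 's|, *|/|g' |
        tr '/' '\n' |
        sed -n 's/^CN=//p'
}

# Succeed if any CN in the subject line ($1) equals the expected FQDN ($2).
# The comparison is case-insensitive.
cn_matches_fqdn() {
    want=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    subject_cns "$1" | tr '[:upper:]' '[:lower:]' | grep -Fqx -- "$want"
}

# Example usage on the UNIX or Linux computer (the FQDN is the one the
# management server resolves for this host):
#   subject=$(openssl x509 -noout -in /etc/opt/microsoft/scx/ssl/scx.pem -subject)
#   cn_matches_fqdn "$subject" "newhostname.newdomain.name" || echo "CN mismatch"
```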
You can also change the hostname and domain name on the certificate by using the -h and -d switches, as in the following example:
/opt/microsoft/scx/bin/tools/scxsslconfig -f -h <hostname> -d <domain.name>
Once complete, restart the agent by running the following command:
/opt/microsoft/scx/bin/tools/scxadmin -restart
Alternatively, if the FQDN is not in reverse DNS, you can add an entry to the hosts file on the management server to provide name resolution. The hosts file is located in the \Windows\System32\Drivers\etc folder. Each entry in the hosts file is a combination of the IP address and the FQDN. For example, to add an entry for the host named “newhostname.newdomain.name” with an IP address of 192.168.1.1, add the following to the end of the hosts file:
192.168.1.1 newhostname.newdomain.name
Signed certificate verification operation was not successful
Another common cause of this error is that the certificate has been signed by an untrusted authority, such as when multiple Management Servers are members of the Resource Pool used for discovery but certificate trust has not been configured between the Management Servers. To verify this, confirm that all Management Servers in the Resource Pool used for discovery trust each other’s certificates.
More information on how to manage resource pools for UNIX and Linux computers can be found in the TechNet documentation.
Congratulations!
Your UNIX/Linux Agent Discovery issue is resolved.
Sorry
It appears that we are unable to resolve your issue by using this guide. For more help resolving this issue please see our TechNet support forum or contact Microsoft Support.
Certificate signing operation was not successful
When the certificate signing operation is not successful, it is usually caused by one of two problems:
- The user account specified for discovery has insufficient privileges to perform the file operations involved in signing.
- Sudo elevation privileges for the user account specified for discovery were not correctly configured.
Resolution:
Verify the user account by inspecting the StdErr output in the error details to identify the cause of the failure.
Also verify the sudo privilege configuration for the account used for certificate signing.
Network Name Resolution Errors
Select the type of network name resolution issue you are experiencing below.
The target address is not resolvable
These issues typically fall into one of two categories:
- Error Description: Failed to resolve IP address 192.168.25.25 to name
This can occur when an IP address for the host was entered for discovery but the IP address is not resolvable to a name in DNS (reverse lookup).
To resolve this issue, correct the name resolution (DNS) configuration for the reverse lookup zone, ensuring that an IP address to name mapping exists for the affected host.
- Error Description: Failed to resolve name server.contoso.com to IP address
This can occur if an FQDN for the host was entered for discovery but the name is not resolvable to an IP address in DNS (forward lookup).
To resolve this issue, correct the name resolution (DNS) configuration for forward lookup, ensuring that a host name to IP address mapping exists for the host.
DNS configuration: Forward DNS resolution does not match reverse DNS resolution
In this situation you will typically receive an error similar to the following:
The provided hostname ServerName resolved to the IP address of 10.137.216.102. The hostname ServerName.contoso.com returned by reverse lookup of the IP address 192.168.x.x did not match the provided hostname. Verify the DNS configuration and try the request again.
The most common cause for this type of error is that the records for the host in the forward and reverse DNS lookup zones do not match.
To resolve this issue, correct the records in the forward and reverse lookup zones in DNS so that the host names and IP addresses match.
SSH Connectivity Errors
Select the error you are receiving below.
Failed during SSH discovery. Exit code: -1073479162
Error Description:
Failed during SSH discovery. Exit code: -1073479162
Standard Output:
Standard Error:
Exception Message: An exception (-1073479162) caused the SSH command to fail - No connection could be made because the target machine actively refused it.
Possible Causes:
- The ssh daemon is not running on the target system.
- A network or host-based firewall is preventing ssh connections on TCP port 22.
Resolution:
- Verify that the ssh daemon is running.
- Verify that no network or host-based firewall is blocking TCP port 22.
Failed during SSH discovery. Exit code: -1073479118
Error Description:
Failed during SSH discovery. Exit code: -1073479118
Standard Output:
Standard Error:
Exception Message: An exception (-1073479118) caused the SSH command to fail - Server sent disconnect message: type 2 (protocol error : Too many authentication failures for root)
Possible Causes:
- The user account specified for discovery is not permitted to login via ssh.
- The user account specified for discovery was entered with an invalid username or password.
Resolution:
- Verify that the user is permitted to login via ssh.
- Verify the input credentials and that the user is defined on the target host.
Failed during SSH discovery. Exit code: 1
Error Description:
Failed during SSH discovery. Exit code: 1
Standard Output: Sudo path: /usr/bin/
Standard Error: sudo: sorry, you must have a tty to run sudo
Exception Message:
Cause:
Sudo elevation was selected in the user credential input, but the requiretty option was not disabled for the user in sudoers.
Resolution:
Disable the requiretty option for the user by adding the following line to the sudoers file:
Defaults:<username> !requiretty
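Whether requiretty has been disabled for a particular user can be checked with a short script. This is a minimal sketch: the function name `requiretty_disabled_for` is hypothetical, it only looks for a per-user `Defaults:<username> !requiretty` line in the one file it is given, and it does not follow included files or recognize a global `Defaults !requiretty` setting. Reading the sudoers file normally requires root privileges.

```shell
#!/bin/sh
# Succeed if the sudoers text in file $2 contains a per-user
# "Defaults:<user> !requiretty" line for user $1.
# Included files and global Defaults entries are not examined.
requiretty_disabled_for() {
    grep -Eq "^[[:space:]]*Defaults:$1[[:space:]]+!requiretty" "$2"
}

# Example usage (run as root; opsuser is the account used for discovery):
#   requiretty_disabled_for opsuser /etc/sudoers || echo "requiretty still applies"
```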
.[?1034hopsuser@lx1:~> su - root -c 'sh /tmp/scx-opsuser/GetOSVersion.sh
Error Description:
.[?1034hopsuser@lx1:~> su - root -c 'sh /tmp/scx-opsuser/GetOSVersion.sh; EC=$?; rm -rf /tmp/scx-opsuser; exit $EC'
Password:
exit
su: incorrect password
opsuser@lx1:~> exit
logout
Possible Cause:
Su elevation was selected in the user credential input, but an invalid root password was provided for su elevation.
Resolution:
Verify the password entered for root in the elevation configuration dialog.
Failed during SSH discovery. Exit code: -2147221248
One common -2147221248 exception error you might see is below.
Failed during SSH discovery. Exit code: -2147221248
Standard Output:
Standard Error: Could not chdir to home directory /home/username: No such file or directory
Cause:
The user account specified for discovery does not have a home directory.
Resolution:
Verify that the user has a home directory at: /home/ and that the user is able to write to this directory.
Failed during SSH discovery. Exit code: -2147221248
Another common -2147221248 exception error you might see is below:
Failed during SSH discovery. Exit code: -2147221248
Standard Output:
Standard Error: root's password:
Exception Message: Operation timed out
Cause:
Sudo elevation was selected in the user credential input, but either the user account specified for discovery is not correctly configured for passwordless sudo elevation, or the required sudo elevation privileges were not granted to that account.
Resolution:
Review sudo elevation configuration documentation and verify user configuration for sudo. Note that passwordless sudo must be configured.
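The SSH discovery exit codes covered in this section can be summarized as a small lookup. The sketch below only restates the causes listed in this guide: the function name `explain_ssh_discovery_failure` and the wording of its messages are illustrative and are not produced by Operations Manager. The second argument is the Standard Error text from the error details, which is needed where one exit code has more than one cause.

```shell
#!/bin/sh
# Map an SSH discovery exit code ($1) and optional Standard Error text ($2)
# to the likely cause described in this guide.
# Returns 1 for exit codes that this guide does not cover.
explain_ssh_discovery_failure() {
    code=$1
    stderr=${2:-}
    case $code in
        -1073479162)
            echo "Connection refused: verify that sshd is running and that TCP port 22 is not blocked." ;;
        -1073479118)
            echo "Authentication failed: verify the credentials and that the user may log in via ssh." ;;
        1)
            case $stderr in
                *"must have a tty"*)
                    echo "sudo requires a tty: disable requiretty for the user in sudoers." ;;
                *)
                    echo "Exit code 1 with unrecognized Standard Error text." ;;
            esac ;;
        -2147221248)
            case $stderr in
                *"Could not chdir to home directory"*)
                    echo "Missing home directory: create a writable home directory for the user." ;;
                *"password:"*)
                    echo "Password prompt during elevation: configure passwordless sudo for the user." ;;
                *)
                    echo "Exit code -2147221248 with unrecognized Standard Error text." ;;
            esac ;;
        *)
            echo "Exit code $code is not covered by this guide."
            return 1 ;;
    esac
}
```

For example, `explain_ssh_discovery_failure -2147221248 "root's password:"` reports the passwordless sudo cause.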
WSMan Connectivity Errors
Select the error you are receiving below.
The agent responded to the request but the WSMan connection failed due to : Access is Denied
Possible causes of this error include:
- The agent is installed, and the agent certificate has been signed, however the user credential provided for agent verification is invalid.
- The user account specified for discovery was configured to authenticate with an SSH key, but the user credential provided for agent verification is invalid.
- There is a permission problem or incorrect PAM configuration on the UNIX side. In this case, the messages log on the UNIX or Linux computer may contain a line similar to the following:
Sep 3 14:49:07 server auth|security:debug /opt/microsoft/scx/bin/omiserver PAM: pam_authenticate: error Authentication failed.
If you see similar lines in the messages log, the PAM configuration file is missing information about OMIServer. The PAM configuration file can be found under /etc/pam.d.
The easiest way to add the information about OMIServer back to the PAM configuration file is to reinstall the SCX agent from scratch on that computer. If that is not easily possible, you can copy the lines pertaining to OMI from a working computer to the non-working computer.
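To look for the log signature shown above, the system logs can be searched for PAM authentication failures reported by omiserver. This is a sketch only: the function name `find_omi_pam_failures` is hypothetical, and the log file locations in the usage comment are assumptions that vary by platform and syslog configuration.

```shell
#!/bin/sh
# Print every line in the given log files that records a PAM authentication
# failure from omiserver. Unreadable files are skipped.
# Returns 0 if at least one matching line was found, 1 otherwise.
find_omi_pam_failures() {
    found=1
    for log in "$@"; do
        [ -r "$log" ] || continue
        # Passing /dev/null as a second file makes grep prefix matches with the file name.
        if grep 'omiserver.*pam_authenticate: error' "$log" /dev/null; then
            found=0
        fi
    done
    return $found
}

# Example usage (log paths are assumptions; adjust for the platform):
#   find_omi_pam_failures /var/log/messages /var/log/secure /var/log/auth.log
```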
WSMan Only Discovery failed for 192.168.x.x
Possible causes of this issue include:
- The Discovery Type option was set to “Only computers with an installed agent and signed certificate” and the target host has the agent installed, however the target host certificate has not been signed. In order to use the WSMan-only discovery option “Only computers with an installed agent and signed certificate”, the agent must be installed and the certificate manually signed.
- The Discovery Type option was set to “Only computers with an installed agent and signed certificate” but the target host does not have the UNIX/Linux agent currently installed.
- The Discovery Type option was set to “Only computers with an installed agent and signed certificate” but the UNIX/Linux agent is not currently running.
- The Discovery Type option was set to “Only computers with an installed agent and signed certificate” but the target host is unreachable, a network or host-based firewall is preventing connectivity or the UNIX/Linux agent is currently down.
Resolution:
- Manually sign the certificate.
- Verify that the UNIX/Linux agent has been installed.
- Change the option to “Discover all computers” to allow the Discovery Wizard to perform the certificate signing.
- Verify that the UNIX/Linux agent is running and that the target host is reachable.
- Verify that no network or host-based firewall is preventing access on TCP port 1270.
Other Errors
Select the error you are receiving below.
The task cannot be executed against the object(s) because the target of the task does not match any of the classes of the object
Error Description:
The task cannot be executed against the object(s) because the target of the task does not match any of the classes of the object.
Cause:
In a System Center 2012 Operations Manager management group, this can occur if the UNIX/Linux management packs imported are Operations Manager 2007 R2 versions.
Resolution:
Import the System Center 2012 versions of the UNIX/Linux operating system management packs.
The agent is installed and the computer is already being monitored by Operations Manager
Error Description:
The agent is installed and the computer is already being monitored by Operations Manager.
Cause:
The target host has already been discovered in this Management Group.
Resolution:
No action is required. Agent upgrade or migration to an alternate resource pool can be performed from the UNIX/Linux Servers view in the Administration pane of the Operations Console.
Unable to enumerate Installable agent types
Error Description:
Unable to enumerate Installable agent types. The associated resource pool may still be initializing. If you had selected a newly created resource pool, please wait a few minutes before using it.
Causes:
- The Resource Pool used in discovery is not healthy (e.g. a majority of member servers are offline).
- The Resource Pool used in discovery was recently created and it has not yet fully initialized.
Resolution:
- If the Resource Pool used in discovery was recently created, retry the discovery after several minutes to allow the pool to initialize.
- Otherwise, check the Operations Manager Event Log on the servers that are members of the Resource Pool used for discovery for indications of the source of the problem.
Failed to find a matching supported agent instance in the imported management packs
Error Description:
Failed to find a matching supported agent instance in the imported management packs. Import the Management Pack(s) for this platform in order to discover this computer.
Possible Causes:
- The target host is running an unsupported operating system.
- The correct management pack for the target host’s operating system has not been imported.
- The correct management pack for the operating system has recently been imported but has not yet fully loaded.
Resolution:
- Confirm that the target host is running a supported operating system.
- Import the management pack for the target host’s operating system and version.
- If the management pack was just imported, it may still be loading. Wait a few minutes and rerun discovery.