Frequent state changes on Operations Manager 2007 agents for computer objects may severely impact management server performance.
In System Center Operations Manager 2007, a condition can occur when there is a large number of computer groups and a high volume of state changes on computer objects. This results in one or both of the following conditions:
1. High CPU and disk utilization on the root management server, coming from the HealthService process.
2. Queues back up on the gateway, management server, and root management server systems. The relevant counter is \Health Service Management Groups(Management Group Name)\Send Queue % Used.
Every time the state changes on one of the top-level monitors for a computer (for example, Availability, Performance, Security, or Configuration), that state change is rolled up to every computer group of which the computer is a member. This can result in a cascading series of state changes. In environments where the volume of state changes is significant and the number of computer groups rolling up the changes is high, the conditions described above may occur.
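To get a feel for the scale of this cascade, the rollup can be approximated with simple arithmetic: each top-level state change on a computer produces one further rolled-up change for every group that contains the computer. The following Python sketch illustrates that estimate only. The function name and the sample figures are hypothetical, and the calculation is a simplification rather than a description of how the Health Service processes rollup internally.

```python
def estimate_rollup_changes(changes_per_computer, groups_per_computer):
    """Estimate rolled-up state changes caused by group membership.

    Both arguments map a computer name to a number: the top-level state
    changes the computer generated, and the computer groups it belongs to.
    """
    total = 0
    for computer, changes in changes_per_computer.items():
        # Each change is rolled up once to every group containing the computer.
        total += changes * groups_per_computer.get(computer, 0)
    return total


# Hypothetical figures: two computers with different group memberships.
changes = {"SRV01": 50, "SRV02": 20}
groups = {"SRV01": 10, "SRV02": 3}
rolled_up = estimate_rollup_changes(changes, groups)
print(rolled_up)
```

Under this approximation, halving the number of groups a busy computer belongs to halves the rolled-up changes it contributes, which is why removing unnecessary groups reduces load.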
To reduce the impact of this issue, consider one of the following actions:
1. Remove any unnecessary computer groups from the management group.
2. Disable the state rollup behavior for the computer groups for which rollup is not necessary to reduce the overall load.
Use overrides to enable and disable the four dependency monitors.
The four dependency monitors are:
Determine how much state change is created each day
To determine how much state change the contributing monitors create each day, create and run the following query:
select
datepart(year, timegenerated), datepart(month, timegenerated), datepart(day, timegenerated),
count(*)
from statechangeevent with(nolock)
inner join state with(nolock)
on statechangeevent.stateid = state.stateid
inner join basemanagedentity with(nolock)
on state.basemanagedentityid = basemanagedentity.basemanagedentityid
inner join managedtype with(nolock)
on basemanagedentity.basemanagedtypeid = managedtype.managedtypeid
and typename = 'Microsoft.Windows.Computer'
inner join monitor with(nolock)
on monitor.monitorid = state.monitorid
and monitorname in
group by datepart(year, timegenerated), datepart(month, timegenerated), datepart(day, timegenerated)
order by datepart(year, timegenerated), datepart(month, timegenerated), datepart(day, timegenerated)
If the number is larger than 200,000 per day, the environment is probably generating a very high volume of state changes.
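Once the daily counts are available, the comparison against the 200,000 threshold can be automated. The sketch below assumes the query result has been exported as (year, month, day, count) rows. That row layout, the function name, and the sample values are assumptions made for illustration.

```python
from datetime import date

DAILY_THRESHOLD = 200_000  # per-day threshold stated in this article


def days_over_threshold(rows, threshold=DAILY_THRESHOLD):
    """Return (date, count) pairs for days whose count exceeds the threshold.

    rows is an iterable of (year, month, day, count) tuples, which is the
    assumed shape of the exported query result.
    """
    flagged = []
    for year, month, day, count in rows:
        if count > threshold:
            flagged.append((date(year, month, day), count))
    return flagged


# Hypothetical sample data, not taken from a real management group.
sample_rows = [
    (2009, 3, 1, 120_000),
    (2009, 3, 2, 250_000),
    (2009, 3, 3, 200_000),  # exactly at the threshold, so not flagged
]
flagged = days_over_threshold(sample_rows)
print(flagged)
```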
Determine if CPU and disk usage are high on the root management server
To determine if CPU and disk usage are high on the root management server, use perfmon to check the following:
· Check the "\Processor(_Total)\% Processor Time" counter. An average value larger than 40% indicates high CPU usage.
· Check the "\LogicalDisk(<OpsMgr installation disk>)\% Idle Time" counter. An average value less than 80% indicates high disk usage.
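If the counter samples are exported from perfmon, for example to a CSV log, the two averages can be evaluated together. The following sketch assumes the samples have already been read into plain lists of percentages. The function name and the sample values are hypothetical.

```python
from statistics import mean

CPU_BUSY_THRESHOLD = 40.0   # average % Processor Time above this is high
DISK_IDLE_THRESHOLD = 80.0  # average % Idle Time below this means a busy disk


def check_rms_load(cpu_samples, disk_idle_samples):
    """Return (cpu_high, disk_high) based on the averages of the samples."""
    cpu_high = mean(cpu_samples) > CPU_BUSY_THRESHOLD
    # Low idle time means the disk is busy, so the comparison is inverted.
    disk_high = mean(disk_idle_samples) < DISK_IDLE_THRESHOLD
    return cpu_high, disk_high


# Hypothetical samples: CPU averages 50%, disk idle time averages 90%.
result = check_rms_load([35.0, 55.0, 60.0], [90.0, 85.0, 95.0])
print(result)
```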
Identify if the queues are backing up
To identify if the queues are backing up, use perfmon to check whether the "\Health Service Management Groups(<Management Group Name>)\Send Queue % Used" counter is consistently larger than 10% on the root management server, management servers, and gateway servers.
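The article does not define "consistently", so any automated check has to pick an interpretation. The sketch below treats the queue as backing up when at least 90% of the collected samples exceed 10%. The 90% share, the function name, and the sample values are assumptions chosen for illustration, not values from Operations Manager.

```python
QUEUE_THRESHOLD = 10.0  # Send Queue % Used threshold stated in this article
MIN_SHARE = 0.9         # assumed meaning of "consistently"; adjust as needed


def queue_backing_up(samples, threshold=QUEUE_THRESHOLD, min_share=MIN_SHARE):
    """Return True if enough samples exceed the threshold."""
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_share


# Hypothetical samples: a sustained backlog versus a single short spike.
sustained = queue_backing_up([12.0, 15.0, 11.0, 30.0, 18.0])
spike = queue_backing_up([2.0, 3.0, 45.0, 1.0, 2.0])
print(sustained, spike)
```

Requiring a share of samples rather than a single reading keeps a brief spike from being reported as a backlog.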
Note: Operations Manager 2007 SP1 is not affected because the rollup behavior was disabled for that release. That change has since been found to be a significant regression, so the rollup behavior was re-enabled for Operations Manager 2007 R2.
Article ID: 967537 - Last Review: 01/14/2015 12:39:01 - Revision: 2.0