Troubleshoot stuck data warehouse jobs in System Center Service Manager
This article helps you troubleshoot issues that certain data warehouse jobs hang in System Center Service Manager.
Original product version: System Center 2016 Service Manager, System Center 2012 R2 Service Manager, System Center 2012 Service Manager
Original KB number: 4040022
Summary
This article helps you troubleshoot when the following data warehouse jobs are hung:
- MPSyncJob
- DWMaintenance
- Extract jobs for the Service Manager management group and the data warehouse management group
- Transform.Common
- Load.Common
- Load.CMDWDataMart
- Load.OMDWDataMart
Symptoms of hung data warehouse jobs include the following ones:
- The job has the status of Running in the Service Manager console or the PowerShell results. However, the job runs for a long time (for example, more than several hours).
- The start time and end time of the job are inconsistent.
- You don't see data or new data in your reports. Which may indicate that Extract, Transform, and Load (ETL) jobs aren't running.
Problems that aren't covered by this article:
- Errors that occur with individual modules
Transform.Common
failures that occur when a specific module fails or times out- Management pack synchronization failures
- Cube processing issues
- Data integrity issues
Understand the staging and configuration database
The DWStagingAndConfig
database is the database to use when you troubleshoot. Here is a list of the tables in the database:
Infra.Process
Stores information about the jobs, such as
ProcessName
andProcessId
.Infra.ProcessModule
Stores information about the
ProcessModules
, such asVertexName
(name of theProcessModule
),ProcessModuleId
, andModuleConfig
.Infra.ProcessCategory
Stores information about the process categories, such as
ProcessCategoryName
andIsEnabled
. The value ofIsEnabled
must be 1 for any process in this category to run.Infra.Batch
Stores information about all batches, such as
BatchId
,BatchStartTime
, andBatchEndTime
.Infra.WorkItem
Stores information about all the
WorkItems
of a batch. You can see any errors for failedWorkItems
of a batch in this table.Infra.Status
Stores information about respective
StatusIds
. Here is the data returned from theInfra.Status
table.StatusId StatusName Description 1 Success Success 2 Failed Failed 3 Not Started Not Started 4 Running Running 5 Stopped Stopped 6 Completed Completed 7 Waiting Waiting The most common reason for a stuck data warehouse job is that the data warehouse management server lost communication with Microsoft SQL Server. It occurs if SQL Server was updated without the Service Manager administrator's knowledge. You can check the Operations Manager event log on the data warehouse management server to determine whether it's the reason.
For other reasons, you can use Windows PowerShell cmdlets and SQL Server statements to troubleshoot. In the following resolution process, all PowerShell cmdlets are executed on the data warehouse management server, and the SQL Server statements are run against the DWStagingAndConfig
database.
Preparation
Start SQL Server Management Studio, and verify that you can connect to the SQL Server that hosts the
DWStagingAndConfig
database.To find the SQL Server instance, check the registry or type the following command:
$sql = Get-ItemProperty -path "HKLM:\SOFTWARE\Microsoft\System Center\2010\Common\Database";$sql.StagingSQLInstance
Use the following command to get the list of jobs and their current status:
$jobs = Get-SCDWJob;$jobs | ft -autosize
If no results are returned or you experience an error, use the following command to load the data warehouse cmdlets module, and then rerun the previous command:
import-module (((Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\System Center\2010\Common\Setup").InstallDirectory)+ "Microsoft.EnterpriseManagement.Warehouse.Cmdlets.psd1")
Disable all job schedules
Use the following command to disable all job schedules and verify that all schedules are disabled (ScheduleEnabled
=False):
foreach($job in $jobs){Disable-SCDWJobSchedule -jobname $job.name};Get-SCDWJobSchedule | ft -autosize
Wait until all running jobs complete
Run the following command repeatedly until all jobs that are running complete:
Get-SCDWJob
If certain jobs seem to run forever, use the next steps to reset the jobs.
Note
To make sure that you don't interfere with any currently running job, let all jobs finish before going to the next step unless the jobs seem to freeze or run forever.
Stop the HealthService service
Use Service Control Manager (Services.msc) to stop the Microsoft Monitoring Agent service (HealthService.exe).
You can also use the
Net Stop HealthService
command or theStop-Service
cmdlet to stop the service.Rename the Health Service State folder.
Use the following command to find the folder location:
$dir = Get-ItemProperty -path "HKLM:\SOFTWARE\Microsoft\System Center\2010\Common\Setup";$dir.installdirectory
Set the status of all modules of the stuck jobs to Completed (StatusId = 6)
Find the
BatchId
of the stuck job in the result of theGet-SCDWJob
cmdlet.Here is example output:
BatchId Name Status CategoryName StartTime EndTime IsEnabled 8806 Load.Common Running Load 12/2/2016 7:44:00 AM True In SQL Server Management Studio, run the following statements:
UPDATE Infra.WorkItem SET StatusId = 6 WHERE BatchId = '<BatchId>' UPDATE Infra.Batch SET StatusId = 6 WHERE BatchId = '<BatchId>'
Rerun
Get-SCDWJob
to verify the status.
Create a new batch for the stuck job
Run the following SQL Server statement:
EXEC Infra.CreateBatch '<JobName>'
Rerun
Get-SCDWJob
, verify that the job has a newBatchId
and the status is Not Started.
Release all locks
Run the following SQL Server statement to get the locks:
SELECT * FROM LockDetails
If the result isn't empty, run the following statement to release the lock:
EXEC dbo.ReleaseLock @ResourceName = '<ResourceName>', @LockRequester = '<LockRequester>'
Repeat steps 1 and 2 until all locks are released.
Verify the job status
Run the following command:
Get-SCDWJob
Make sure that you see output similar to the following screenshot:
Verify the affected job has a new BatchId and all jobs have the value of True for
IsEnabled
. If any job shows False forIsEnabled
, run the following SQL Server statement:UPDATE infra.ProcessCategory SET IsEnabled = 1 WHERE IsEnabled = 0
Test whether the affected job runs correctly
Use the following command to start the affected job:
Start-SCDWJob -jobname \<JobName>
If the job completes successfully, go to the next step.
Enable all job schedules and restart the HealthService service
Use the following command to enable all job schedules and verify that all schedules are enabled (ScheduleEnabled=True):
foreach($job in $jobs){Enable-SCDWJobSchedule -jobname $job.name};Get-SCDWJobSchedule | ft -autosize
Start the Microsoft Monitoring Agent service (HealthService.exe).
Check the job status
Run the following command:
Get-SCDWJob
The output should be similar to the following screenshot:
Note
You may see that some process categories are disabled. This is normal, because
DWMaintenance
andMPSyncJob
may disable processes for synchronization.Monitor the jobs for a while to make sure that they run correctly.
Verify that the issue that you experienced no longer occurs. For report issues, it may take several hours to get the report updated with current results.
Feedback
https://aka.ms/ContentUserFeedback.
Coming soon: Throughout 2024 we will be phasing out GitHub Issues as the feedback mechanism for content and replacing it with a new feedback system. For more information see:Submit and view feedback for