Workflow generating large volume of system jobs and logs

Applies to: Dynamics 365

Symptoms


An asynchronous workflow is invoked many times over a short period, generating a large volume of system jobs. This strains the asynchronous service and causes a spike in its backlog, which degrades performance for all system jobs. The database also grows because of the large number of async operations and their associated workflow logs.

Cause


When an activated asynchronous workflow is triggered, the Dynamics 365 Asynchronous Service queues a new job and processes it based on available resources.

Under certain circumstances, an asynchronous workflow can be invoked many times in rapid succession, creating a large number of system jobs in a short period. This happens either because a bulk operation triggers the workflow repeatedly, or because faulty logic in another workflow or plugin invokes it at high frequency.

The resources available to the asynchronous service are limited. If its backlog is filled by such a runaway workflow, all system jobs commonly suffer degraded performance. The large volume of jobs generated in this scenario also increases the database size.
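The effect of a burst on a fixed-capacity queue can be illustrated with a toy model. This is only a sketch: the function name, the tick-based model, and all numbers are hypothetical and do not describe how the asynchronous service actually schedules work.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return the backlog size after each tick for a fixed-capacity queue."""
    backlog = 0
    history = []
    for arrivals in arrivals_per_tick:
        # New jobs join the queue, then the service processes up to its capacity.
        backlog = max(0, backlog + arrivals - capacity_per_tick)
        history.append(backlog)
    return history


# Hypothetical numbers: the service handles 100 jobs per tick, and a runaway
# workflow adds 1,000 jobs per tick during ticks 2 to 4.
arrivals = [50, 50, 1000, 1000, 1000, 50, 50, 50]
history = simulate_backlog(arrivals, capacity_per_tick=100)
print(history)  # [0, 0, 900, 1800, 2700, 2650, 2600, 2550]
```

In this model the backlog builds quickly during the burst but shrinks by only 50 jobs per tick afterwards, so every other job waits behind the runaway workflow's jobs long after the burst has ended.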

Resolution


In such cases, the asynchronous workflow infrastructure can self-heal to a certain extent once the offending workflow has been identified and deactivated. To deactivate the workflow:

  1. Navigate to Settings -> Processes and select the relevant workflow. Alternatively, find the workflow using Advanced Find.
  2. With the workflow selected, click 'Deactivate'. This disables the workflow and sets it to the 'Draft' state. If the workflow is no longer needed, you can also delete it at this point.
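If deactivation needs to be scripted rather than done through the UI, one possible approach is a PATCH request against the Web API. The sketch below only builds the request and does not send it. It rests on several assumptions that should be verified against your environment: the `workflows` entity set, the `v9.2` API version, and the values `statecode` 0 / `statuscode` 1 for the Draft state. The organization URL, workflow ID, and token are placeholders.

```python
import json
import urllib.request


def build_deactivate_request(org_url, workflow_id, access_token):
    """Build (but do not send) a PATCH request that sets a workflow to Draft."""
    # Assumed endpoint shape: <org>/api/data/v9.2/workflows(<id>)
    url = f"{org_url.rstrip('/')}/api/data/v9.2/workflows({workflow_id})"
    # Assumed values: statecode 0 and statuscode 1 correspond to Draft.
    body = json.dumps({"statecode": 0, "statuscode": 1}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


# Placeholder values for illustration only.
request = build_deactivate_request(
    "https://contoso.crm.dynamics.com",
    "00000000-0000-0000-0000-000000000000",
    "<access-token>",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately makes it easy to inspect the URL and payload first.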

Once the workflow has been deactivated, no new jobs are created for it. Jobs that were already placed in the asynchronous queue before deactivation, but have not yet been picked up, are cancelled as soon as they are picked up for processing. This helps drain the async backlog more quickly.

Jobs that are already 'In Progress' are unaffected and complete as expected. The backlog then drains steadily over time and returns to normal levels.
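The drain behavior described above can be sketched as a small simulation. This is an illustrative model only, not the service's implementation; the function name, job names, and workflow names are hypothetical.

```python
from collections import deque


def drain_after_deactivation(queued_jobs, in_progress_jobs, deactivated_workflow):
    """Return (completed, cancelled) job names once the backlog has drained.

    Each job is a (job_name, workflow_name) tuple.
    """
    completed = []
    cancelled = []

    # Jobs already in progress are unaffected and run to completion.
    for name, _workflow in in_progress_jobs:
        completed.append(name)

    # Queued jobs are only examined when they are picked up for processing.
    queue = deque(queued_jobs)
    while queue:
        name, workflow = queue.popleft()
        if workflow == deactivated_workflow:
            cancelled.append(name)  # cancelled on pickup, no work is done
        else:
            completed.append(name)  # jobs for other workflows run normally
    return completed, cancelled


# Hypothetical state at the moment "RunawayWorkflow" is deactivated.
in_progress = [("job-1", "RunawayWorkflow")]
queued = [
    ("job-2", "RunawayWorkflow"),
    ("job-3", "InvoiceWorkflow"),
    ("job-4", "RunawayWorkflow"),
]
completed, cancelled = drain_after_deactivation(queued, in_progress, "RunawayWorkflow")
print(completed)  # ['job-1', 'job-3']
print(cancelled)  # ['job-2', 'job-4']
```

Because the queued jobs for the deactivated workflow are cancelled without doing any work, the backlog shrinks faster than it would if each of those jobs had to run to completion.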