Sudden loss of critical functionality after updating deployments that host high-volume conference scheduling applications in Skype for Business Server 2015

Applies to: Skype for Business Server 2015

Symptoms


In a Microsoft Skype for Business Server 2015 environment, consider the following scenario:

  • You have installed the Skype for Business May 2017 cumulative update 6.0.9319.281 on a Skype for Business pool that hosts multiple Front End Servers.
  • You use a contact center application, such as Clarity Connect, that's hosted in this pool to create Skype for Business conferences at high volume.

In this scenario, a subset of users who are homed in the same pool that hosts the accounts configured in the application may lose critical functionality, such as the ability to do the following:

  • Schedule a meeting
  • Start an ad hoc meeting by using Meet Now
  • Start a pre-scheduled meeting (Already running meetings should continue to work.)
  • Turn a two-party conversation into a multiparty conversation by inviting a third user
  • Sign in after a long period of inactivity

In addition, you may also experience the following issues:

  • The Fabric routing group in which the Clarity Connect or other contact center application account resides may fail over to another Front End node.
  • Event ID 32190 may be reported in Event Viewer with the "Store procedure to replicate data from primary frontend to secondary frontend failed" description.

Cause


Contact center applications such as Clarity Connect schedule short-lived (single-use) meetings at high frequency and delete them from the configured account when the call is completed. However, those deleted or expired meetings aren't completely removed from the system by Skype for Business Server because of a regression that's introduced in the May 2017 cumulative update.

Because these applications are known to reuse conference IDs that may have been used in the past, conflicts occasionally occur with meetings that weren't completely removed. Skype for Business Server treats such a conflict as an unexpected fatal condition that resembles a data loss situation. It tries to recover automatically by reconfiguring a subset of users, including the application's organizer account, on other servers. The observed symptoms last as long as the reconfiguration takes.
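
The conflict described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class ToyConferenceStore, the function run_scenario, and the ID "conf-1001" are hypothetical names made up for illustration, and nothing here reflects the actual Skype for Business Server data model or APIs. It shows only why a reused conference ID collides when a deleted meeting's record is left behind.

```python
class ToyConferenceStore:
    """Toy stand-in for a conference directory (hypothetical, not the real server)."""

    def __init__(self, purge_on_delete):
        # purge_on_delete=True models the expected behavior;
        # False models the regression, where deleted records linger.
        self.purge_on_delete = purge_on_delete
        self.records = {}  # conference ID -> "active" or "deleted"

    def schedule(self, conference_id):
        if conference_id in self.records:
            # A leftover record already owns this ID, so the reuse collides.
            return "conflict"
        self.records[conference_id] = "active"
        return "ok"

    def delete(self, conference_id):
        if self.purge_on_delete:
            self.records.pop(conference_id, None)
        else:
            # The record is marked as deleted but never removed.
            self.records[conference_id] = "deleted"


def run_scenario(purge_on_delete):
    """Schedule a meeting, delete it, then reuse the same conference ID."""
    store = ToyConferenceStore(purge_on_delete)
    results = [store.schedule("conf-1001")]
    store.delete("conf-1001")
    results.append(store.schedule("conf-1001"))  # the application reuses the ID
    return results


if __name__ == "__main__":
    print("records purged on delete:", run_scenario(True))
    print("records left behind:     ", run_scenario(False))
```

In this toy model the second scheduling attempt succeeds only when the deleted record was fully removed. In the real product, the server's reaction to such a conflict is the recovery behavior described above, not a simple return value.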

Resolution


To fix this issue, install the following updates: