Symptoms
EvoSTS certificates are managed by Azure Active Directory (Azure AD) and regularly updated individually per tenant, which happens more frequently for some users. The certificate rollover or its schedule is not transparent to the user. It turns out that such a rollover is creating service outages for users running Hybrid Modern Authentication (HMA). The problem occurs when a worker process gets started or recycled or when a machine is brought back from maintenance and diverging key material is present in AD. Upon initialization of any worker process, the first request containing bearer authentication data will load the OAuth libraries and initiate the key material by reading the information from the AuthServer object in AD. After this, the worker process can authenticate the request containing bearer authentication data. However, if the key material in Azure AD (EvoSTS) had been rolled over, it can't authenticate those requests due to invalid message security (key material does not match) as the signature diverges. After a random interval (timer max 30 minutes), the worker process will look up and fetch the key material online via the published metadata endpoint.
If new or diverging keys are found, those will be added and loaded into the process (instance) for the lifetime of the worker process and authentication will work from now on. Since the new key data is never written back to AD, the same iteration starts again for any worker process spawning a new instance.
Resolution
To fix this issue, install one of the following updates:
For Exchange Server 2019, install the Cumulative Update 6 for Exchange Server 2019 or a later cumulative update for Exchange Server 2019.
Cumulative Update 17 for Exchange Server 2016 or a later cumulative update for Exchange Server 2016.
For Exchange Server 2016, install theReferences
Learn about the terminology that Microsoft uses to describe software updates.