The number of physical processors is incorrectly reported in Windows Server 2008 x86 when MCM based CPUs are used

Article ID: 2711085

SYMPTOMS

Consider a system that has 4 CPU sockets. When all 4 sockets are populated, the number of Physical Processors (packages) is reported as 4. Now, remove 2 physical processors.
  • On Windows Server 2008 x86, the number of packages is reported as 2 for some CPU models, but 4 for other CPU models (MCM based CPUs).
  • On Windows Server 2008 x64, the number of packages is reported as 2 for all CPU models.
  • On Windows Server 2008 IA64, the number of packages is reported as 2 for all CPU models.

The number of packages can be viewed using a variety of tools. The GetLogicalProcessorInformation function can be used to retrieve this information programmatically.
  • MSInfo32 reports the Physical Packages as 'Processor' entries in the System Summary table.
  • Sysinternals CoreInfo (http://technet.microsoft.com/en-us/sysinternals/cc835722) reports the Physical Packages via the "Logical Processor to Socket Map" section:
    C:\Tools>Coreinfo.exe
    
    Coreinfo v3.05 - Dump information on system CPU and memory topology
    Copyright (C) 2008-2012 Mark Russinovich
    Sysinternals - www.sysinternals.com
    
    Logical to Physical Processor Map:
    *-----------------------  Physical Processor 0
    -*----------------------  Physical Processor 1
    --*---------------------  Physical Processor 2
    ---*--------------------  Physical Processor 3
    ----*-------------------  Physical Processor 4
    -----*------------------  Physical Processor 5
    ------*-----------------  Physical Processor 6
    -------*----------------  Physical Processor 7
    --------*---------------  Physical Processor 8
    ---------*--------------  Physical Processor 9
    ----------*-------------  Physical Processor 10
    -----------*------------  Physical Processor 11
    ------------*-----------  Physical Processor 12
    -------------*----------  Physical Processor 13
    --------------*---------  Physical Processor 14
    ---------------*--------  Physical Processor 15
    ----------------*-------  Physical Processor 16
    -----------------*------  Physical Processor 17
    ------------------*-----  Physical Processor 18
    -------------------*----  Physical Processor 19
    --------------------*---  Physical Processor 20
    ---------------------*--  Physical Processor 21
    ----------------------*-  Physical Processor 22
    -----------------------*  Physical Processor 23
    
    Logical Processor to Socket Map:
    *-*-*-*-*-*-------------  Socket 0
    -*-*-*-*-*-*------------  Socket 1
    ------------*-*-*-*-*-*-  Socket 2
    -------------*-*-*-*-*-*  Socket 3
    
    Logical Processor to NUMA Node Map:
    *-*-*-*-*-*-------------  NUMA Node 0
    ------------*-*-*-*-*-*-  NUMA Node 1
    -*-*-*-*-*-*------------  NUMA Node 2
    -------------*-*-*-*-*-*  NUMA Node 3
    
    Cross NUMA Node Access Costs (relative to same node access):
         00  01  02  03
    00: 3.6 3.9 3.7 3.9
    01: 4.5 4.6 4.6 4.6
    02: 1.5 1.2 1.0 1.0
    03: 3.6 3.4 2.7 2.6
    
    ...
  • Using PowerShell and WMI, each instance of the Win32_Processor class is a Processor Package:
    PS C:\> Get-WmiObject -class Win32_Processor | ft Name,DeviceID,NumberOfCores,NumberOfLogicalProcessors
    Name                          DeviceID     NumberOfCores     NumberOfLogicalProcessors
    ----                          --------     -------------     -------------------------
    AMD Opteron(tm) Processor ... CPU0                     6                             6
    AMD Opteron(tm) Processor ... CPU1                     6                             6
    AMD Opteron(tm) Processor ... CPU2                     6                             6
    AMD Opteron(tm) Processor ... CPU3                     6                             6 

CAUSE

The kernel uses the System Resource Affinity Table (SRAT) to define the topology of the hardware. The SRAT defines the proximity of the logical cores to the NUMA nodes, and the proximity of RAM to the NUMA nodes. Each architecture of Windows Server 2008 has its own implementation of the SRAT interpretation. The Windows Server 2008 x86 implementation assumes that a NUMA node contains one or more whole physical packages. That is, a physical package cannot span multiple NUMA nodes. This design matched the capabilities of the hardware that was available when Windows Server 2008 x86 launched.

Each logical processor within each package has a SRAT 'processor' record (Type = 0x00) that contains the NUMA proximity and APIC ID of the core. 

The SRAT of a system can be viewed using the kernel debugger's !srat command. The following is an abridged example from a system, showing the header and the first 2 processors:

11: kd> !srat
Retrieved srat address from HAL NUMA code
SRAT - HEADER - ffffffffffd0b010
  Signature:               SRAT
  Length:                  0x000002a0
  Revision:                0x02
  Checksum:                0x01
  OEMID:                   AMD   
  OEMTableID:              F10     
  OEMRevision:             0x00000001
  CreatorID:               AMD 
  CreatorRev:              0x00000001
SRAT - BODY - ffffffffffd0b040
       Table Revision: 1
...
ENTRY:
 Type: 0x00
 Length: 0x10
 ProximityId: 0x00
 Processor:
  Enabled: TRUE
  APIC ID: 0x00
ENTRY:
 Type: 0x00
 Length: 0x10
 ProximityId: 0x00
 Processor:
  Enabled: TRUE
  APIC ID: 0x01
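
The processor entries shown above can also be decoded from the raw table bytes. The following Python sketch is illustrative only: it assumes the 16-byte "Processor Local APIC/SAPIC Affinity" layout from the ACPI specification (type, length, proximity bits 7:0, APIC ID, flags, SAPIC EID, proximity bits 31:8, clock domain). The helper name parse_processor_entry and the sample bytes are hypothetical and were not captured from a real system.

```python
import struct
from typing import NamedTuple


class SratProcessorEntry(NamedTuple):
    proximity_id: int
    apic_id: int
    enabled: bool


# Assumed layout (ACPI "Processor Local APIC/SAPIC Affinity" structure):
# type, length, proximity[7:0], APIC ID, flags, SAPIC EID,
# proximity[31:8], clock domain. Total size is 16 bytes.
_ENTRY = struct.Struct("<BBBBIB3sI")


def parse_processor_entry(raw: bytes) -> SratProcessorEntry:
    """Decode one SRAT processor entry (Type 0x00, Length 0x10)."""
    if len(raw) != _ENTRY.size:
        raise ValueError("expected a 16-byte entry")
    etype, length, prox_lo, apic_id, flags, _eid, prox_hi, _clock = _ENTRY.unpack(raw)
    if etype != 0x00 or length != 0x10:
        raise ValueError("not a processor affinity entry")
    # Combine the low byte and the upper three bytes of the proximity domain.
    proximity = prox_lo | (int.from_bytes(prox_hi, "little") << 8)
    # Bit 0 of the flags field is the Enabled bit.
    return SratProcessorEntry(proximity, apic_id, bool(flags & 1))


# Hand-built bytes modelled on the second entry above:
# ProximityId 0x00, Enabled TRUE, APIC ID 0x01.
sample = bytes([0x00, 0x10, 0x00, 0x01, 0x01] + [0x00] * 11)
print(parse_processor_entry(sample))
```

The decoded fields correspond to the ProximityId, Enabled and APIC ID values that !srat prints for each entry.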

When all of the sockets are populated in a server, the SRAT contains a processor record for each logical processor, and all of the logical processors in a package are assigned to the same proximity (NUMA node). Traditionally, when processors are removed from the system, the processor entries of the removed packages are simply omitted from the SRAT. This is currently the case for Intel based CPUs and non-MCM based AMD CPUs.

When a multi-chip module (MCM) based CPU is present in a system with unpopulated sockets, the MCM CPU represents itself in the SRAT as multiple NUMA nodes. (At this time, AMD is the only vendor known to make MCM based CPU packages.) The MCM based CPU spans the NUMA node that it physically populates and a NUMA node that has an unpopulated socket. In this way, the memory controller within the MCM CPU has separate cache affinity to the RAM associated with the two (or more) nodes spanned.

The reason that an MCM based CPU can do this is that it physically contains separate pieces of silicon (dies) that contain cores and a memory controller. In the case of the AMD Opteron 6168 (12 core), there are two dies with 6 cores on each, and a shared piece of silicon for the memory controller. The cores are split over the NUMA nodes in alignment to their die locality. The SRAT reflects the configuration change by changing the ProximityId of the split off cores to the proximity of the vacant socket.

Windows Server 2008 x86 correctly assigns the logical cores to the associated NUMA node, but in doing so it no longer honors the APIC ID package mask, because the implementation assumes whole packages per NUMA node. Instead of honoring the package defined via the APIC ID (described next), the package processor mask is implicitly copied from the NUMA node processor mask, resulting in a package count that is equal to the NUMA node count.

The APIC ID standard defines a limit of 16 cores per package. The package of each core is given by the formula "([APIC ID] & 0xF0) >> 4": the low four bits identify the core within the package, and the next four bits identify the package. For example, if there are two packages with 4 cores in each, the APIC ID numbering would be 0x00, 0x01, 0x02, 0x03, 0x10, 0x11, 0x12 and 0x13. The cores would map to two physical packages (0x0n and 0x1n) by virtue of the use of two package masks. Note that when an MCM based CPU is present in a system with unpopulated sockets, the APIC ID does not change; only the proximity (ProximityId) changes.
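
The package grouping described above can be sketched in Python. This is an illustration only, assuming that the low four bits of the APIC ID select the core (16 cores per package) and the next four bits select the package. The function names package_of and group_by_package are hypothetical.

```python
from collections import defaultdict

CORE_BITS = 4  # 16 cores per package, so the low 4 bits select the core


def package_of(apic_id: int) -> int:
    """Return the package number encoded in an 8-bit APIC ID."""
    return (apic_id & 0xF0) >> CORE_BITS


def group_by_package(apic_ids):
    """Map each package number to the APIC IDs that belong to it."""
    packages = defaultdict(list)
    for apic_id in apic_ids:
        packages[package_of(apic_id)].append(apic_id)
    return dict(packages)


# The two-package, four-cores-per-package example from the text.
example = [0x00, 0x01, 0x02, 0x03, 0x10, 0x11, 0x12, 0x13]
for package, members in sorted(group_by_package(example).items()):
    print(package, [hex(m) for m in members])
```

The parentheses around the mask matter in C-like languages, because the shift operator binds more tightly than the bitwise AND operator.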


RESOLUTION

Hotfix information
A supported hotfix is available from Microsoft. For more information, please refer to the following Knowledge Base article:
The hotfix resolves the performance issue described in the above article, as well as the physical processor package issue identified here.

MORE INFORMATION

Sysinternals CoreInfo
Before you apply this hotfix, Sysinternals CoreInfo reports 4 sockets and 4 NUMA nodes.
C:\Tools>Coreinfo.exe

Coreinfo v3.05 - Dump information on system CPU and memory topology
Copyright (C) 2008-2012 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical to Physical Processor Map:
*-----------------------  Physical Processor 0
-*----------------------  Physical Processor 1
--*---------------------  Physical Processor 2
---*--------------------  Physical Processor 3
----*-------------------  Physical Processor 4
-----*------------------  Physical Processor 5
------*-----------------  Physical Processor 6
-------*----------------  Physical Processor 7
--------*---------------  Physical Processor 8
---------*--------------  Physical Processor 9
----------*-------------  Physical Processor 10
-----------*------------  Physical Processor 11
------------*-----------  Physical Processor 12
-------------*----------  Physical Processor 13
--------------*---------  Physical Processor 14
---------------*--------  Physical Processor 15
----------------*-------  Physical Processor 16
-----------------*------  Physical Processor 17
------------------*-----  Physical Processor 18
-------------------*----  Physical Processor 19
--------------------*---  Physical Processor 20
---------------------*--  Physical Processor 21
----------------------*-  Physical Processor 22
-----------------------*  Physical Processor 23

Logical Processor to Socket Map:
*-*-*-*-*-*-------------  Socket 0
-*-*-*-*-*-*------------  Socket 1
------------*-*-*-*-*-*-  Socket 2
-------------*-*-*-*-*-*  Socket 3

Logical Processor to NUMA Node Map:
*-*-*-*-*-*-------------  NUMA Node 0
------------*-*-*-*-*-*-  NUMA Node 1
-*-*-*-*-*-*------------  NUMA Node 2
-------------*-*-*-*-*-*  NUMA Node 3
...

After you apply this hotfix, Sysinternals CoreInfo reports 2 sockets and 4 NUMA nodes.

C:\Tools>Coreinfo.exe

Coreinfo v3.05 - Dump information on system CPU and memory topology
Copyright (C) 2008-2012 Mark Russinovich
Sysinternals - www.sysinternals.com

Logical to Physical Processor Map:
*-----------------------  Physical Processor 0
-*----------------------  Physical Processor 1
--*---------------------  Physical Processor 2
---*--------------------  Physical Processor 3
----*-------------------  Physical Processor 4
-----*------------------  Physical Processor 5
------*-----------------  Physical Processor 6
-------*----------------  Physical Processor 7
--------*---------------  Physical Processor 8
---------*--------------  Physical Processor 9
----------*-------------  Physical Processor 10
-----------*------------  Physical Processor 11
------------*-----------  Physical Processor 12
-------------*----------  Physical Processor 13
--------------*---------  Physical Processor 14
---------------*--------  Physical Processor 15
----------------*-------  Physical Processor 16
-----------------*------  Physical Processor 17
------------------*-----  Physical Processor 18
-------------------*----  Physical Processor 19
--------------------*---  Physical Processor 20
---------------------*--  Physical Processor 21
----------------------*-  Physical Processor 22
-----------------------*  Physical Processor 23

Logical Processor to Socket Map:
*-*-*-*-*-*-*-*-*-*-*-*-  Socket 0
-*-*-*-*-*-*-*-*-*-*-*-*  Socket 1

Logical Processor to NUMA Node Map:
*-*-*-*-*-*-------------  NUMA Node 0
------------*-*-*-*-*-*-  NUMA Node 1
-*-*-*-*-*-*------------  NUMA Node 2
-------------*-*-*-*-*-*  NUMA Node 3
...
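
To compare the socket and NUMA node counts across several servers, the map sections of saved CoreInfo output can be counted with a short script. The following Python sketch assumes the text layout shown above (a heading line followed directly by one row per socket or node). The function name count_map_rows is hypothetical, and the sample text is copied from the post-hotfix output.

```python
def count_map_rows(coreinfo_text: str, heading: str) -> int:
    """Count the rows listed directly under one CoreInfo map heading."""
    count = 0
    in_section = False
    for line in coreinfo_text.splitlines():
        stripped = line.strip()
        if stripped == heading:
            in_section = True
        elif in_section:
            # Map rows start with '*' or '-'; anything else ends the section.
            if stripped[:1] in ("*", "-"):
                count += 1
            else:
                break
    return count


AFTER_HOTFIX = """\
Logical Processor to Socket Map:
*-*-*-*-*-*-*-*-*-*-*-*-  Socket 0
-*-*-*-*-*-*-*-*-*-*-*-*  Socket 1

Logical Processor to NUMA Node Map:
*-*-*-*-*-*-------------  NUMA Node 0
------------*-*-*-*-*-*-  NUMA Node 1
-*-*-*-*-*-*------------  NUMA Node 2
-------------*-*-*-*-*-*  NUMA Node 3
"""

sockets = count_map_rows(AFTER_HOTFIX, "Logical Processor to Socket Map:")
nodes = count_map_rows(AFTER_HOTFIX, "Logical Processor to NUMA Node Map:")
print("sockets:", sockets, "NUMA nodes:", nodes)
```

On an affected system without the hotfix, the two counts are equal even though fewer sockets are populated.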

WMI - Win32_Processor
Before you apply this hotfix, the Win32_Processor WMI class exhibits the following behavior.
  • The number of Win32_Processor instances that are returned is equal to the number of NUMA nodes that are available on the system. A single physical package may be split over multiple NUMA nodes, decreasing the number of cores in each NUMA node when compared to a system with fully populated sockets.
  • The NumberOfCores property returns the number of cores on the current instance, which may be less than what the package contains.
  • The NumberOfLogicalProcessors property returns the number of logical processors on the current instance, which may be less than what the package contains.
After you apply this hotfix, the Win32_Processor WMI class exhibits the following behavior.
  • The number of Win32_Processor instances that are returned is equal to the number of physical processors (packages) that are available on the system. 
  • The NumberOfCores property returns the number of cores on the current instance.
  • The NumberOfLogicalProcessors property returns the number of logical processors on the current instance.

To determine whether MCM caused physical package splitting is occurring, compare the value of the NumberOfLogicalProcessors property to the total core count of the processor (as indicated by the CPU vendor). Splitting is occurring if the value of the NumberOfLogicalProcessors property does not match the vendor's count (it is usually half the expected value).

Note, the NumberOfLogicalProcessors property includes both physical cores and hyper-threaded cores. Use the NumberOfCores property to count the number of physical cores (only) so the hyper-threaded configuration does not affect the count. To determine whether hyper-threading is enabled for the processor, compare the value of the NumberOfCores property to the value of the NumberOfLogicalProcessors property. Hyper-threading is enabled if the value of the NumberOfCores property is less than the value of the NumberOfLogicalProcessors property.
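
The two comparisons above can be expressed as simple checks. The following Python sketch is illustrative only: the function names are hypothetical, the property values are assumed to have been read from a Win32_Processor instance beforehand, and the vendor's count must be supplied by the caller.

```python
def is_package_split(number_of_logical_processors: int, vendor_count: int) -> bool:
    """True when the instance reports a different count than the vendor states."""
    return number_of_logical_processors != vendor_count


def is_hyperthreading_enabled(number_of_cores: int,
                              number_of_logical_processors: int) -> bool:
    """True when there are more logical processors than physical cores."""
    return number_of_cores < number_of_logical_processors


# Values from the Win32_Processor output shown earlier (6 cores and
# 6 logical processors per instance) for a 12 core AMD Opteron 6168.
print(is_package_split(6, vendor_count=12))
print(is_hyperthreading_enabled(6, 6))
```

For the sample values, the first check reports that splitting is occurring and the second reports that hyper-threading is not enabled.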

Kernel Debugger
The kernel debugger can be used to view the SRAT data or the NUMA configuration via the !srat, !numa and !numa_hal commands. These commands work when live debugging and when using a kernel dump file.

The !numa and !numa_hal commands output the following on a 4 socket system with 2 sockets populated with AMD Opteron 6168 (12 core) CPUs. Each package is split into two dies associated with two NUMA nodes, with 6 cores on each die. The system runs with 4 NUMA nodes. If non-MCM CPUs were used, the NUMA node count would be 2.

11: kd> !numa
NUMA Summary:
------------
    Number of NUMA nodes : 4
    Number of Processors : 24
    MmAvailablePages     : 0x00FABB28
    KeActiveProcessors   : ************************-------- (00ffffff)

    NODE 0 (FFFFFFFF819435C0):
        ProcessorMask    : *-*-*-*-*-*--------------------- (00000555)
        Color            : 0x00000000
        MmShiftedColor   : 0x00000000
        Seed             : 0x00000002
        Zeroed Page Count: 0x00000000003BF4AD
        Free Page Count  : 0x000000000000CDA2

    NODE 1 (FFFFFFFFA8B2A550):
        ProcessorMask    : ------------*-*-*-*-*-*--------- (00555000)
        Color            : 0x00000001
        MmShiftedColor   : 0x00000008
        Seed             : 0x0000000C
        Zeroed Page Count: 0x00000000003EA164
        Free Page Count  : 0x0000000000009840

    NODE 2 (FFFFFFFF805EA550):
        ProcessorMask    : -*-*-*-*-*-*-------------------- (00000aaa)
        Color            : 0x00000002
        MmShiftedColor   : 0x00000010
        Seed             : 0x00000001
        Zeroed Page Count: 0x00000000003F4ABA
        Free Page Count  : 0x0000000000002053

    NODE 3 (FFFFFFFFA8B61550):
        ProcessorMask    : -------------*-*-*-*-*-*-------- (00aaa000)
        Color            : 0x00000003
        MmShiftedColor   : 0x00000018
        Seed             : 0x0000000D
        Zeroed Page Count: 0x00000000003DE670
        Free Page Count  : 0x0000000000004E05

11: kd> !numa_hal
HAL NUMA Summary
----------------
    Node Count      : 4
    Processor Count : 24

    Node   ProximityId
    ------------------
    0x00   0x00000000
    0x01   0x00000001
    0x02   0x00000002
    0x03   0x00000003

    Proc   Domain       APIC Id
    ---------------------------
    0x00   0x00000000   0x00000000
    0x01   0x00000000   0x00000001
    0x02   0x00000000   0x00000002
    0x03   0x00000000   0x00000003
    0x04   0x00000000   0x00000004
    0x05   0x00000000   0x00000005
    0x06   0x00000001   0x00000006
    0x07   0x00000001   0x00000007
    0x08   0x00000001   0x00000008
    0x09   0x00000001   0x00000009
    0x0A   0x00000001   0x0000000A
    0x0B   0x00000001   0x0000000B
    0x0C   0x00000002   0x00000010
    0x0D   0x00000002   0x00000011
    0x0E   0x00000002   0x00000012
    0x0F   0x00000002   0x00000013
    0x10   0x00000002   0x00000014
    0x11   0x00000002   0x00000015
    0x12   0x00000003   0x00000016
    0x13   0x00000003   0x00000017
    0x14   0x00000003   0x00000018
    0x15   0x00000003   0x00000019
    0x16   0x00000003   0x0000001A
    0x17   0x00000003   0x0000001B

    Domain      Range
    -----------------
    0x00000000  0x0000000000000000 -> 0x0000000480000000
    0x00000001  0x0000000480000000 -> 0x0000000880000000
    0x00000002  0x0000000880000000 -> 0x0000000C80000000
    0x00000003  0x0000000C80000000 -> 0xFFFFFFFFFFFFFFFF
After you apply this hotfix, the !srat, !numa and !numa_hal commands output the same content as before. The MCM packages still span two NUMA nodes each. These commands give no indication that the package count is now reported correctly. If non-MCM CPUs were used, the NUMA node count would be 2.
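
Although the debugger commands do not show the package count directly, it can be derived from the Domain and APIC Id columns of the !numa_hal output. The following Python sketch is illustrative only: it assumes that the package number is held in bits 4 to 7 of the APIC ID, and it rebuilds the (Domain, APIC Id) pairs from the table above rather than reading them from a debugger session.

```python
from collections import defaultdict

# (Domain, APIC Id) pairs matching the !numa_hal table above:
# six consecutive APIC IDs per domain, starting at the listed base.
DOMAIN_BASES = ((0, 0x00), (1, 0x06), (2, 0x10), (3, 0x16))
PROCS = [(domain, base + i) for domain, base in DOMAIN_BASES for i in range(6)]

# Group NUMA domains by the package number encoded in the APIC ID.
domains_per_package = defaultdict(set)
for domain, apic_id in PROCS:
    package = (apic_id & 0xF0) >> 4  # assumed: bits 4-7 select the package
    domains_per_package[package].add(domain)

for package, domains in sorted(domains_per_package.items()):
    print(f"package {package}: NUMA domains {sorted(domains)}")

numa_nodes = {domain for domain, _ in PROCS}
print("packages:", len(domains_per_package), "NUMA nodes:", len(numa_nodes))
```

For this system the sketch finds 2 packages spread over 4 NUMA nodes, with each package spanning two nodes.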


Note This is a "FAST PUBLISH" article created directly from within the Microsoft support organization. The information contained herein is provided as-is in response to emerging issues. As a result of the speed in making it available, the materials may include typographical errors and may be revised at any time without notice. See Terms of Use for other considerations.

Properties

Article ID: 2711085 - Last Review: May 10, 2012 - Revision: 1.0
APPLIES TO
  • Windows Server 2008 Service Pack 2
  • Windows Server 2008 Datacenter
  • Windows Server 2008 Datacenter without Hyper-V
  • Windows Server 2008 Enterprise
  • Windows Server 2008 Enterprise without Hyper-V
  • Windows Server 2008 for Itanium-Based Systems
  • Windows Server 2008 Foundation
  • Windows Server 2008 Standard
  • Windows Server 2008 Standard without Hyper-V
  • Windows Storage Server 2008 Basic
  • Windows Storage Server 2008 Basic 32-bit
  • Windows Storage Server 2008 Basic Embedded
  • Windows Storage Server 2008 Basic Embedded 32-bit
  • Windows Storage Server 2008 Enterprise
  • Windows Storage Server 2008 Enterprise Embedded
  • Windows Storage Server 2008 Standard
  • Windows Storage Server 2008 Standard Embedded
  • Windows Storage Server 2008 Workgroup
  • Windows Storage Server 2008 Workgroup Embedded
  • Windows Web Server 2008
Keywords: 
KB2711085
