This article discusses an extensive study into the causes of some NMI
(non-maskable interrupt) memory parity errors in Windows, carried out with
the aid of a high-tech SIMM tester. The results are not conclusive, and
research into this is ongoing.
In some circumstances, both IBM OS/2 2.x and Windows experience problems
that appear to be associated with system memory. It can be frustrating to
have a system that runs DOS, Windows 3.1, or OS/2 1.x without trouble and
then find that it cannot run Windows because of this problem. The first
point to clear up is that not all NMI errors are caused by memory. Other
boards in the system can cause this problem, and components mounted
directly on the system motherboard can also be at fault.
When memory is at fault, it is usually for the following reasons:
- The memory is not functioning at the specified access rate as
required by the system board. If the system specification calls for
an 80 ns access rate, Windows most likely fails if the memory is
running at a slower rate such as 90 ns. Even though the chips may
be marked as 80 ns, some fail to meet this access rate in testing.
Memory chips also quite often run more slowly once they reach
operating temperature, an effect called "speed drift." The typical
symptom is a system that runs Windows when first turned on but
starts having memory errors after 15 minutes or so. A high-quality
SIMM tester can put the chips through various voltage and heat
cycles, so this effect is fairly easy to detect.
- The memory meets the system specifications, but the speeds are
different between individual SIMM modules. The average access rate
may be 70 ns on one SIMM module while the next is running at 60 ns.
We have found SIMMs stamped at the factory with a 70 ns average
access rate that actually run as fast as 50 ns. Although both SIMMs
are well within the system's required access specification, a
difference of 10 ns or more between them often causes problems on
some systems. Interestingly, the same SIMMs may run without any
memory problems on a different system board that uses a different
BIOS and chip set. This is because each BIOS and chip set regulates
the "refresh wait states" used for timing, and this difference often
makes the variance in speed acceptable. If your system's BIOS lets
you adjust the "wait states" for memory refresh, increasing them
often allows the system to run with SIMMs or DRAM memory chips that
run at different access rates. The downside to increasing the
number of wait states is a slower system.
- The individual chips on the SIMM module are running at different
access rates. Detecting this requires a sensitive memory tester that
can gauge the access rate of each individual bit (chip) on the
module. A difference of 10 ns or more between bits has been known to
cause problems. Again, the BIOS and chip set of the system board can
compensate for this to some extent if they allow you to lengthen the
refresh wait states for memory access.
- One of the memory chips is being affected by "cell leakage." This
is a true parity error, also known as a "soft error." It occurs when
a change in the state of an individual cell (a zero or one) leaks
electrically into a neighboring cell and changes its state. When the
memory is read back, the data no longer matches the stored parity
bit, and an NMI is issued to the processor to signal that a parity
error has occurred. The affected SIMM must be replaced. If problems
persist with replacement chips, a voltage or heat anomaly in the
socket or surrounding circuitry is quite possibly damaging the chips.
- Cache memory is another thing to suspect. We have seen instances
where cache memory access rates were too slow and caused enormous
problems. On most Intel-based 486 computers, a cache access rate of
15 ns to 25 ns is normal; you will most likely have problems if it
is slower than 25 ns. The system manufacturer can provide the
specifications and locations of these chips.
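The timing rules in the list above (memory slower than the board's specification, modules or chips differing by 10 ns or more, cache slower than 25 ns) reduce to a few comparisons on tester readings. The following is a minimal Python sketch, not the output format of any real SIMM tester: `SimmReading`, `screen`, and `screen_cache` are hypothetical names, and the default thresholds are only the example figures quoted in this article (an 80 ns system specification, a 10 ns spread, a 25 ns cache limit), not universal limits. Larger nanosecond values mean slower memory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimmReading:
    """Hypothetical per-module result: measured access time of each chip (bit), in ns."""
    name: str
    chip_ns: List[float]  # assumed to hold at least one reading

    @property
    def average_ns(self) -> float:
        return sum(self.chip_ns) / len(self.chip_ns)


def screen(simms: List[SimmReading], spec_ns: float = 80.0,
           max_spread_ns: float = 10.0) -> List[str]:
    """Flag slow or uneven modules using this article's example thresholds."""
    warnings = []
    for s in simms:
        slowest, fastest = max(s.chip_ns), min(s.chip_ns)
        # Memory slower than the access rate the system board requires.
        if slowest > spec_ns:
            warnings.append(f"{s.name}: slowest chip {slowest:.0f} ns exceeds {spec_ns:.0f} ns spec")
        # Chips on the same module differing by 10 ns or more.
        if slowest - fastest >= max_spread_ns:
            warnings.append(f"{s.name}: chips differ by {slowest - fastest:.0f} ns")
    # Average access rates of different modules differing by 10 ns or more.
    if len(simms) >= 2:
        averages = [s.average_ns for s in simms]
        spread = max(averages) - min(averages)
        if spread >= max_spread_ns:
            warnings.append(f"modules differ by {spread:.0f} ns on average")
    return warnings


def screen_cache(cache_ns: List[float], limit_ns: float = 25.0) -> List[str]:
    """Flag cache chips slower than the 25 ns quoted for typical 486 systems."""
    return [f"cache chip {i}: {t:.0f} ns is slower than {limit_ns:.0f} ns"
            for i, t in enumerate(cache_ns) if t > limit_ns]


if __name__ == "__main__":
    # Two modules within an 80 ns spec but 10 ns apart, as in the second case above.
    pair = [SimmReading("SIMM0", [70.0] * 9), SimmReading("SIMM1", [60.0] * 9)]
    for warning in screen(pair) + screen_cache([15.0, 20.0, 35.0]):
        print(warning)
```

In the usage example, both modules pass the 80 ns specification individually, yet the pair is still flagged because their averages differ by 10 ns, which mirrors the mismatched-SIMM case described above.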
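The detection step in the cell-leakage case can be illustrated with one parity bit per byte. This Python sketch assumes even parity and models a stored byte as a (data, parity bit) pair; real PC memory controllers may use odd parity and signal the error with an NMI in hardware rather than a software exception. The names `ParityError`, `parity_bit`, `write_byte`, `read_byte`, and `leak` are hypothetical. The sketch also shows a known limitation of a single parity bit: two flipped bits in the same byte cancel out and go undetected.

```python
from typing import Tuple

Cell = Tuple[int, int]  # (8 data bits, stored parity bit)


class ParityError(Exception):
    """Stands in for the NMI a real memory controller would raise."""


def parity_bit(byte: int) -> int:
    """Even parity: 1 if the byte has an odd number of 1 bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def write_byte(byte: int) -> Cell:
    """Store the data bits together with the parity bit computed at write time."""
    return byte & 0xFF, parity_bit(byte)


def read_byte(cell: Cell) -> int:
    """Recompute parity on read and compare it with the stored parity bit."""
    data, stored_parity = cell
    if parity_bit(data) != stored_parity:
        raise ParityError(f"parity error reading {data:#04x}")
    return data


def leak(cell: Cell, bit: int) -> Cell:
    """Simulate leakage flipping one data bit without updating the parity bit."""
    data, stored_parity = cell
    return data ^ (1 << bit), stored_parity


if __name__ == "__main__":
    cell = write_byte(0b1011_0010)
    print(read_byte(cell))  # intact data reads back normally
    try:
        read_byte(leak(cell, 2))  # a single flipped bit is detected
    except ParityError as err:
        print(err)
    print(read_byte(leak(leak(cell, 0), 1)))  # two flipped bits slip through
```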
In general, you should first carefully clean the system of dust,
including the ventilation areas, so that heat does not build up
abnormally. Clean the contacts of all boards and SIMMs; a pencil
eraser works well for this and helps ensure good contact. Be certain
that all boards are firmly seated in their slots or sockets. It may
be necessary to replace old cabling, which can degrade over time and
under high temperatures. Power supplies can also cause many problems,
so if possible, have the output voltages checked. Monitors can cause
strange behavior on your system as well. It is also highly
recommended that computers be plugged into a surge suppression power
strip, because when power returns after an outage, it often arrives
as a fairly high surge that can permanently damage sensitive
electrical components of your system.
If you add more memory to the system, the BIOS may recognize the full amount of physical RAM installed in the server while Windows recognizes only part of it. If the server has a redundant memory feature or a memory mirroring feature enabled, the full complement of memory may not be visible to Windows. Redundant memory provides the system with a failover memory bank when a memory bank fails. Memory mirroring splits the memory banks into a mirrored set. Both features are enabled or disabled in the BIOS and cannot be accessed through Windows. To modify the settings for these features, refer to the system user manual or the OEM Web site, or contact the hardware vendor.
For example, if you are running a system that has 4 GB of RAM installed and you then add 4 GB of additional RAM, Windows may recognize only 4 GB of physical memory or possibly 6 GB instead of the full 8 GB. The redundant memory feature or the memory mirroring feature may be enabled on the new memory banks without your knowledge. These symptoms are similar to the symptoms that occur when you do not add the /PAE switch to the Boot.ini file.
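One way to reproduce the 4 GB and 6 GB figures in the example above is the arithmetic below. It rests on simplifying assumptions that are not OEM behavior: mirroring enabled on the new banks leaves only half of their capacity usable, and a redundant (failover) configuration holds all of the new banks in reserve. The function `visible_memory_gb` and its `feature` values are hypothetical; actual results depend on the hardware vendor's implementation.

```python
def visible_memory_gb(existing_gb: float, added_gb: float, feature: str = "none") -> float:
    """Memory visible to Windows when `feature` is enabled on the newly added banks only.

    Simplified model (an assumption, not documented OEM behavior):
      - "none":      all installed memory is visible
      - "mirroring": the added banks mirror each other, so half their capacity is usable
      - "redundant": the added banks are kept in reserve as failover banks
    """
    if feature == "none":
        return existing_gb + added_gb
    if feature == "mirroring":
        return existing_gb + added_gb / 2
    if feature == "redundant":
        return existing_gb
    raise ValueError(f"unknown feature: {feature!r}")


if __name__ == "__main__":
    # 4 GB installed, then 4 GB added, as in the example above.
    for feature in ("none", "mirroring", "redundant"):
        print(feature, visible_memory_gb(4, 4, feature))
```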
Article ID: 101272 - Last Review: February 20, 2007 - Revision: 3.2
APPLIES TO
- Microsoft Windows 2000 Server
- Microsoft Windows 2000 Advanced Server
- Microsoft Windows 2000 Professional Edition
- Microsoft Windows 2000 Datacenter Server
- Microsoft Windows NT Advanced Server 3.1
- Microsoft Windows NT Server 3.5
- Microsoft Windows NT Server 3.51
- Microsoft Windows NT Server 4.0 Standard Edition
- Microsoft Windows NT Workstation 3.1
- Microsoft Windows NT Workstation 3.5
- Microsoft Windows NT Workstation 3.51
- Microsoft Windows NT Workstation 4.0 Developer Edition