This article was previously published under Q101272
Warning: The information in this article includes suggestions regarding the examination and cleaning of hardware. If you do not have chip-maintenance experience, Microsoft recommends that you closely examine your hardware warranty information to avoid invalidating any warranty you may have, and that you seek help from a trained hardware technician to avoid any damage to the hardware. ANY USE BY YOU OF THE INFORMATION PROVIDED IN THIS ARTICLE IS AT YOUR OWN RISK. Microsoft provides this information "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.
This article discusses an extensive study, carried out with the aid of a high-tech SIMM tester, to determine the causes of some NMI memory parity errors in Windows. The results are not conclusive, and the research is ongoing.
Both IBM OS/2 2.x and Windows seem to experience problems that appear to be associated with system memory in some circumstances. It can be frustrating to have a system that is able to run DOS, Windows 3.1, or OS/2 1.x and suddenly find it cannot run Windows due to this problem. The first issue to clear up is that not all NMI errors are due to memory. Other boards in the system can cause this problem, and components directly on the system motherboard can be at fault.
When memory is at fault, it is usually for the following reasons:
The memory is not functioning at the specified access rate as required by the system board. If the system specification calls for 80 ns access rate, Windows most likely fails if memory is accessing at a slower rate such as 90 ns. Even though the chips may be marked as 80 ns, in testing, some fail to meet this access rate. Quite often memory chips run at a slower speed when they reach operating temperature. This produces an effect called "speed drift." The symptoms are a system which runs Windows when first turned on; however, after 15 minutes or so, the system starts having memory errors. A high quality SIMM tester can cycle the chips through various voltage and heat cycles, so this is fairly easy to see.
The memory meets the system specifications, but the speeds differ between individual SIMM modules. The average access rate may be 70 ns on one SIMM module while the next is running at 60 ns. We have found SIMMs stamped at the factory as rated for a 70 ns average access rate that actually run as fast as 50 ns. Although the SIMMs are obviously well under the system's required access specification, a difference of 10 ns or more between them can often cause problems on some systems. An interesting note here is that you can move these SIMMs to a different system board that uses a different BIOS and chip set, and it may not have any memory problems. This is because each BIOS and chip set regulates the "refresh wait states" used for timing, and this difference often makes the variance in speed acceptable. If your system's BIOS allows you to adjust the "wait states" for memory refresh, this often allows the system to run with SIMMs or DRAM memory chips that are running at different access rates. The downside to increasing the number of wait states is a slower system.
The individual chips on the SIMM module are running at different access rates. This requires a sensitive memory testing device to determine. It must be able to gauge the access rate of each individual bit (chip) on the module. A difference of 10 ns or more between bits has been known to cause problems. This once again can be regulated somewhat by the BIOS and chip set of the system board if it allows you to lengthen the refresh wait states for memory access.
One of the memory chips is being affected by "cell leakage." This ends up being a true parity error and is also known as a "soft error." This occurs when the change in the state of an individual cell (a zero or one) electrically leaks into a neighboring cell, changing its state. When the memory is read back, it no longer matches the parity bit's checksum value, and an NMI is issued to the processor signaling that a parity error has occurred. This memory SIMM must be replaced. If problems persist with replacement chips, there is quite possibly a voltage or heat anomaly occurring with the socket or circuitry that is damaging the chips.
Cache memory is another thing to suspect. We have seen instances where the cache memory access rates were too slow and caused enormous problems. On most Intel-based 486 computers, an access rate of 15 ns to 25 ns is normal. You will most likely have problems if it is slower than 25 ns. The system manufacturer can provide the specifications and locations of these chips.
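The parity check described in the "cell leakage" item above can be illustrated with a minimal sketch. It is not how any particular memory controller is implemented: the function names are hypothetical, and even parity over one 8-bit byte is assumed for simplicity (actual hardware may use odd parity). It shows why a single leaked bit is detected on read-back, and why two flipped bits in the same byte would go unnoticed by a single parity bit.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total count of 1 bits even (assumed scheme)."""
    return bin(byte & 0xFF).count("1") % 2


def store_byte(byte: int) -> tuple[int, int]:
    """Model a write: keep the data byte together with its parity bit."""
    data = byte & 0xFF
    return data, even_parity_bit(data)


def read_is_consistent(data: int, parity: int) -> bool:
    """Model a read: recompute parity and compare it with the stored bit.

    A mismatch is the condition that would raise a parity NMI.
    """
    return even_parity_bit(data) == parity


data, parity = store_byte(0b10110010)

# Unchanged data passes the check.
print(read_is_consistent(data, parity))               # True

# One cell leaks into its neighbor and flips a single bit: detected.
print(read_is_consistent(data ^ 0b00000100, parity))  # False

# Two flipped bits cancel out in a single parity bit: not detected.
print(read_is_consistent(data ^ 0b00000110, parity))  # True
```

Because one parity bit only records whether the number of 1 bits is odd or even, it can report that a byte changed but not which bit changed, which is consistent with the advice above to replace the affected SIMM rather than attempt a repair.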
In general, you should first carefully clean the system of dust. This includes the areas allowing ventilation so that heat does not build up abnormally. The contacts of all boards and SIMMs should be cleaned. You can use the eraser of a pencil to do this, thus ensuring good contacts. Be certain that all boards are firmly seated in their slots or sockets. It may be necessary to replace old cabling, which may degrade over time and under high temperatures. Power supplies can also cause many problems, so, if possible, have the output voltages checked. Monitors can cause strange behaviors on your system as well. It is also highly recommended that computers be placed on some type of surge-suppression power strip, because the return of power after an outage usually produces a fairly high surge that can permanently damage sensitive electrical components of your system.
If you add more memory to the system, it is possible that the BIOS will recognize the full amount of physical RAM that is installed in the server but that Windows will recognize only a part of the RAM. If the server has a redundant memory feature or a memory mirroring feature that is enabled, the full complement of memory may not be visible to Windows. Redundant memory provides the system with a failover memory bank when a memory bank fails. Memory mirroring splits the memory banks into a mirrored set. Both features are enabled or disabled in the BIOS and cannot be accessed through Windows. To modify the settings for these features, you may have to refer to the system user manual or the OEM Web site. Alternatively, you may have to contact the hardware vendor.
For example, if you are running a system that has 4 GB of RAM installed and you then add 4 GB of additional RAM, Windows may recognize only 4 GB of physical memory or possibly 6 GB instead of the full 8 GB. The redundant memory feature or the memory mirroring feature may be enabled on the new memory banks without your knowledge. These symptoms are similar to the symptoms that occur when you do not add the /PAE switch to the Boot.ini file.
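The arithmetic behind the 4 GB and 6 GB figures in this example can be sketched as follows. This is only one plausible reading, under simplifying assumptions that are not stated in the article: a mirrored bank is assumed to expose half of its capacity to the operating system, and a bank held in reserve for failover is assumed to expose none. The function name and the mode labels are hypothetical, and actual behavior depends on the server's BIOS and hardware.

```python
def visible_memory_gb(banks):
    """Estimate how much RAM the operating system would see.

    banks: list of (size_gb, mode) pairs, where mode is one of
      "normal"   - fully visible
      "mirrored" - assumed to expose half (the other half holds the copy)
      "spare"    - assumed to expose nothing (reserved for failover)
    """
    visible = 0.0
    for size_gb, mode in banks:
        if mode == "normal":
            visible += size_gb
        elif mode == "mirrored":
            visible += size_gb / 2
        elif mode == "spare":
            visible += 0
        else:
            raise ValueError(f"unknown bank mode: {mode!r}")
    return visible


# Original 4 GB plus 4 GB of new memory with mirroring enabled on the new banks.
print(visible_memory_gb([(4, "normal"), (4, "mirrored")]))  # 6.0

# Original 4 GB plus 4 GB of new memory held in reserve as redundant memory.
print(visible_memory_gb([(4, "normal"), (4, "spare")]))     # 4.0

# Neither feature enabled: the full 8 GB is visible.
print(visible_memory_gb([(4, "normal"), (4, "normal")]))    # 8.0
```

If the amount that Windows reports matches one of these reduced totals, the BIOS settings for redundant memory and memory mirroring are a reasonable first place to look.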