In Windows NT, the operating system loads a DLL separately for each process because each process has its own address space; in 16-bit Windows and in OS/2, by contrast, the address space is shared. Because the operating system must map pages into the address space of each process, the DLL may be loaded at different addresses in different processes. The memory manager optimizes DLL loading so that when two processes map the same pages from the same DLL image, those pages are backed by the same physical memory.
Each DLL has a preferred base address, specified at link time. If the address space range from the preferred base address to the base address plus the image size is not available, then the operating system loads the DLL somewhere else in memory and applies fixups to its addresses. There is no method to specify the load address at load time.
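The availability test described above can be sketched as a simple overlap check. This is a minimal model, not the actual loader logic: the Region type and the range_is_free() function are hypothetical names introduced only for illustration, and 32-bit addresses are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an address range already in use in a process. */
typedef struct {
    uint32_t start;  /* first address of the range */
    uint32_t size;   /* length of the range in bytes */
} Region;

/* Returns 1 if [base, base + size) overlaps none of the used ranges,
 * that is, if the image could be mapped at its preferred base address.
 * Returns 0 if the image would have to be loaded elsewhere and fixed up. */
int range_is_free(uint32_t base, uint32_t size,
                  const Region *used, size_t count)
{
    assert(size > 0);
    uint64_t end = (uint64_t)base + size;  /* 64-bit to avoid wraparound */
    for (size_t i = 0; i < count; i++) {
        uint64_t used_end = (uint64_t)used[i].start + used[i].size;
        if (base < used_end && used[i].start < end)
            return 0;  /* the two ranges intersect */
    }
    return 1;
}
```

In this model, a DLL whose preferred range collides with any existing range is the case in which the system picks another address and applies fixups.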
To summarize, the system performs the following steps at load time:
- Examines the image and determines its preferred base address and required size.
- Finds the address space required and maps the image, copy-on-write, from the file.
- Applies internal fixups if the image is not at its preferred base address.
- Fixes up all dynamic link imports by placing the correct address for each imported function into the appropriate entry of the Import Address Table. This table stores 32-bit addresses contiguously, so fixing up as many as 1024 imported functions dirties only one page of memory (1024 entries of 4 bytes each occupy 4 KB).
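The page arithmetic behind the last step can be checked with a short calculation. The sketch below assumes a 4 KB page size, 4-byte table entries, and a table that starts on a page boundary; the constant and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 4 KB pages and 32-bit (4-byte) table entries. */
enum { PAGE_SIZE_BYTES = 4096, IAT_ENTRY_BYTES = 4 };

/* Number of pages written when import_count contiguous entries are
 * filled in, assuming the table starts on a page boundary. */
size_t iat_pages_dirtied(size_t import_count)
{
    size_t bytes = import_count * IAT_ENTRY_BYTES;
    assert(bytes / IAT_ENTRY_BYTES == import_count);  /* overflow guard */
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;  /* round up */
}
```

Under these assumptions, 1024 imports fit in exactly one page, and the 1025th import spills into a second page.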
Two types of fixups are available. The first is used for the address of an imported function. According to the Portable Executable specification, this type of fixup is stored in the Import Address Table (IAT), an array of 32-bit function pointers, one for each imported function. The IAT has its own page or pages because it is always modified. A call to an imported function is actually an indirect call through the appropriate entry in the IAT. When an image is loaded at its preferred base address, imported function fixups are the only fixups required.
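The indirect call through the IAT can be modeled with an ordinary array of function pointers. This is only an illustration of the idea: toy_iat, toy_bind_imports(), call_import(), and the two stand-in "DLL" functions are hypothetical names, and a real IAT is laid out by the linker and filled in by the loader rather than by application code.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ImportedFn)(int);

/* Stand-ins for functions that a DLL would export. */
int dll_double(int x) { return 2 * x; }
int dll_negate(int x) { return -x; }

/* One table slot per imported function, as in the IAT. */
enum { IAT_DOUBLE, IAT_NEGATE, IAT_COUNT };
ImportedFn toy_iat[IAT_COUNT];

/* Models the load-time fixup: store each function's address in its slot. */
void toy_bind_imports(void)
{
    toy_iat[IAT_DOUBLE] = dll_double;
    toy_iat[IAT_NEGATE] = dll_negate;
}

/* Models the call site: an indirect call through the table entry. */
int call_import(size_t slot, int arg)
{
    assert(slot < (size_t)IAT_COUNT && toy_iat[slot] != NULL);
    return toy_iat[slot](arg);
}
```

Because every call site goes through the table, only the table itself is written at load time; the code pages that contain the calls stay unmodified.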
Note that an optimization is available whereby each import library exports a 32-bit number that corresponds with each function in addition to any name or ordinal number. This serves as a "hint" to speed the fixups performed at load time. If the hints in the application and in the loaded DLL do not match, the loader performs a binary search based on the function name.
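The hint optimization can be sketched as a fast-path check followed by a binary search. The sketch assumes the DLL's exported names are kept in sorted order and treats the hint as an index into that list; the ToyExport type and find_export() function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical export entry: a name and the address it resolves to. */
typedef struct {
    const char *name;
    unsigned    address;
} ToyExport;

/* Returns the index of name in the sorted exports array, or -1 if the
 * name is not exported. The hint is tried first; if it does not match,
 * the function falls back to a binary search by name. */
int find_export(const ToyExport *exports, size_t count,
                size_t hint, const char *name)
{
    assert(exports != NULL && name != NULL);

    /* Fast path: the hint still points at the right entry. */
    if (hint < count && strcmp(exports[hint].name, name) == 0)
        return (int)hint;

    /* Slow path: binary search over the sorted names. */
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, exports[mid].name);
        if (cmp == 0)
            return (int)mid;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

A stale hint costs only one extra string comparison before the search begins, which is why a mismatch is harmless.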
The other fixup type is required for references to code or data in the image when the image is loaded somewhere other than at its preferred base address. Applying these fixups modifies the pages that contain the references. When the memory manager removes a page from memory, it checks to see if the page has been modified. If not, the page retains its copy-on-write mapping and can be discarded from memory. Otherwise, it must be written to the page file so that the modified page can be recovered from the page file rather than from the executable image file.
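A base fixup amounts to adding the difference between the actual and preferred base addresses to each stored address. The following sketch models an image as an array of 32-bit words and a fixup list as the indexes of the words that hold addresses; this layout and the function name apply_base_fixups() are assumptions made for illustration and do not reflect the on-disk relocation format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds (actual_base - preferred_base) to every word listed in
 * fixup_slots. Returns the number of words modified, which is zero
 * when the image is loaded at its preferred base address. */
size_t apply_base_fixups(uint32_t *image_words, size_t word_count,
                         const size_t *fixup_slots, size_t fixup_count,
                         uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;  /* wraps modulo 2^32 */
    if (delta == 0)
        return 0;  /* nothing to patch, so no pages are dirtied */

    for (size_t i = 0; i < fixup_count; i++) {
        assert(fixup_slots[i] < word_count);
        image_words[fixup_slots[i]] += delta;
    }
    return fixup_count;
}
```

Every word patched this way lands on a page that can no longer be discarded and reread from the image file, which is the cost of loading away from the preferred base.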
Even if an application calls LoadLibrary() for a DLL more than once, the DLL entry point, DllMain(), is called with DLL_PROCESS_ATTACH only once. Similarly, if the application calls FreeLibrary() more than once, DLL_PROCESS_DETACH occurs only for the call in which the DLL reference count returns to zero.
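The reference-counting behavior can be modeled in a few lines. This is a toy model rather than Win32 code: ToyModule, toy_load_library(), and toy_free_library() are hypothetical names, and the counters simply record how many attach and detach notifications the model would deliver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process bookkeeping for one DLL. */
typedef struct {
    int ref_count;     /* outstanding load requests */
    int attach_calls;  /* process-attach notifications delivered */
    int detach_calls;  /* process-detach notifications delivered */
} ToyModule;

/* Models a load request: only the first one triggers process attach. */
void toy_load_library(ToyModule *m)
{
    assert(m != NULL && m->ref_count >= 0);
    if (m->ref_count++ == 0)
        m->attach_calls++;
}

/* Models a free request: only the call that drops the count to zero
 * triggers process detach. Returns 0 if the module is not loaded. */
int toy_free_library(ToyModule *m)
{
    assert(m != NULL);
    if (m->ref_count == 0)
        return 0;
    if (--m->ref_count == 0)
        m->detach_calls++;
    return 1;
}
```

In this model, two loads followed by two frees produce exactly one attach and one detach, matching the behavior described above.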
Global instance data for the DLL is stored on a per-process basis (only one data set per process). If it is necessary to store global instance data for each LoadLibrary() call performed in one process, consider using thread local storage (TLS) as an alternative. If you use multiple threads of execution, TLS provides unique data storage for each ThreadID value. This approach requires very little overhead for the DLL; it need only create a global TLS index at process initialization. At thread initialization, use GlobalAlloc(), HeapAlloc(), LocalAlloc(), functions in the C run-time library, or another method to allocate a block of memory, and then call the TlsSetValue() function to store a pointer to the memory using the global TLS index value. Win32 internally stores each thread's pointer indexed by TLS index and ThreadID to provide thread-specific storage.
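The idea of a pointer indexed by both TLS index and thread can be illustrated with a small two-dimensional table. This is a single-threaded toy model, not the Win32 implementation: all names are hypothetical, the thread is passed explicitly as a small integer, and the table sizes are arbitrary. In real code, the TLS functions operate on the calling thread implicitly, and the index is obtained from TlsAlloc().

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary limits chosen for the model. */
enum { TOY_MAX_THREADS = 4, TOY_MAX_INDEXES = 8, TOY_OUT_OF_INDEXES = -1 };

/* One pointer per (thread, index) pair. */
void *toy_values[TOY_MAX_THREADS][TOY_MAX_INDEXES];
int   toy_index_in_use[TOY_MAX_INDEXES];

/* Models process initialization: reserve one index shared by all threads. */
int toy_tls_alloc(void)
{
    for (int i = 0; i < TOY_MAX_INDEXES; i++) {
        if (!toy_index_in_use[i]) {
            toy_index_in_use[i] = 1;
            for (int t = 0; t < TOY_MAX_THREADS; t++)
                toy_values[t][i] = NULL;  /* every thread starts empty */
            return i;
        }
    }
    return TOY_OUT_OF_INDEXES;
}

/* Models thread initialization: store this thread's pointer. */
void toy_tls_set(int thread_id, int index, void *value)
{
    assert(thread_id >= 0 && thread_id < TOY_MAX_THREADS);
    assert(index >= 0 && index < TOY_MAX_INDEXES && toy_index_in_use[index]);
    toy_values[thread_id][index] = value;
}

/* Retrieves the pointer previously stored for this thread and index. */
void *toy_tls_get(int thread_id, int index)
{
    assert(thread_id >= 0 && thread_id < TOY_MAX_THREADS);
    assert(index >= 0 && index < TOY_MAX_INDEXES && toy_index_in_use[index]);
    return toy_values[thread_id][index];
}
```

The same index yields a different pointer for each thread, so the DLL needs only one global variable (the index) to give every thread its own block of data.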