Debugging flat thunks generated by the thunk compiler can be difficult because the thunk mechanism is complex and debugging tools capable of tracing through thunks are difficult to use. This article presents an overall strategy for debugging flat thunks, several specific debugging techniques, and a troubleshooting guide that explains how to fix many common thunking problems.
Limitations on What a Target DLL Can Do
Before you get started debugging thunks, keep in mind that there are some limitations on what a target DLL can do inside a thunk. This is because a Win16-based application calling a Win32-based DLL is not a Win32-based process; likewise, a Win32-based application calling a Win16-based DLL is not a Win16-based process. Common specific limitations include:
- You cannot create threads inside a thunk from a Win16-based application to a Win32-based DLL.
- The code inside Win32-based DLLs called by thunks should require little stack space because the calling Win16-based processes have much smaller stacks than do Win32-based applications.
- Win16-based DLLs that contain interrupt service routines (ISRs) must not thunk to Win32-based DLLs while handling interrupts.
- Win32-based applications must not pass pointers to data located on the stack as parameters of thunks or call Win16-based DLLs that switch stacks.
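The stack-space limitation is easy to violate by accident, for example with a large local array in the Win32-based target. The following is a minimal sketch, assuming a hypothetical target function named median_of in the Win32-based DLL: it needs a scratch copy of the caller's data, and it takes that copy from the heap rather than declaring it as a local array, so its stack frame stays small. The Win32-specific parts (the __stdcall declaration and the export) are omitted so the sketch compiles as ordinary C.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Orders two longs for qsort. */
static int compare_longs(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical thunk target: stores the median of values[0..count-1]
   (the lower middle element when count is even) in *median_out without
   modifying the caller's array. The scratch copy is allocated on the
   heap instead of the stack, because a Win16-based caller's stack is
   much smaller than a Win32-based application's stack.
   Returns 0 on success, -1 on bad input or allocation failure. */
int median_of(const long *values, size_t count, long *median_out)
{
    long *scratch;

    if (values == NULL || median_out == NULL || count == 0)
        return -1;
    if (count > SIZE_MAX / sizeof *scratch)
        return -1;                           /* size would overflow */

    scratch = malloc(count * sizeof *scratch);  /* heap, not stack */
    if (scratch == NULL)
        return -1;

    memcpy(scratch, values, count * sizeof *scratch);
    qsort(scratch, count, sizeof *scratch, compare_longs);
    *median_out = scratch[(count - 1) / 2];

    free(scratch);
    return 0;
}
```

For example, median_of on the array {5, 1, 4, 2, 3} stores 3 and leaves the array unchanged.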
Why Debugging Flat Thunks Can Be Difficult
Debugging flat thunks is difficult partly because the flat thunk mechanism is a complex part of the Windows kernel. Its complexity stems from the fact that it must transform function calls in 32-bit compiled code into calls compatible with 16-bit code and vice versa. Because 32-bit code uses different data types and CPU register sets from 16-bit code, the flat thunk mechanism must translate function parameters, switch stacks, and translate return values. It is optimized for speed, yet must allow preemptive Win32 code to call non-preemptive Win16 code. The thunk compiler makes creating flat thunks much easier than manually creating them, but it isn't foolproof.
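Pointer parameters are a good example of what "translating function parameters" involves. A Win32 flat pointer is a single 32-bit offset, while a Win16 far pointer is a 16:16 value: a selector in the high word and an offset in the low word, the layout the Win16 MAKELP, SELECTOROF, and OFFSETOF macros work with. The sketch below only models the two formats; it does not allocate selectors, which the kernel does on the thunk's behalf. The type and function names are hypothetical, and the selector's base address (which Win16 code could obtain with GetSelectorBase) is simply passed in.

```c
#include <stdint.h>

/* Hypothetical name for a 16:16 far pointer value:
   high word = selector, low word = offset. */
typedef uint32_t FAR16PTR;

/* Builds a 16:16 pointer from a selector and an offset. */
FAR16PTR make_far16(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 16) | offset;
}

/* Extracts the selector (high word). */
uint16_t selector_of(FAR16PTR p)
{
    return (uint16_t)(p >> 16);
}

/* Extracts the offset (low word). */
uint16_t offset_of(FAR16PTR p)
{
    return (uint16_t)(p & 0xFFFFu);
}

/* Returns the linear address a 16:16 pointer refers to, given the base
   address of the selector's descriptor. Because the offset is only 16
   bits, one selector can reach at most 64K beyond its base. */
uint32_t linear_address(FAR16PTR p, uint32_t selector_base)
{
    return selector_base + offset_of(p);
}
```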
Debugging flat thunks is difficult not only because the mechanism itself is complex, but also because the necessary debugging tools are more difficult to master. Application-level debuggers such as the Microsoft Visual C++ debugger and WinDBG cannot trace through thunks, because thunks consist of both 32-bit and 16-bit code and cause the system to claim or release the Win16Mutex. To trace through a thunk, you need a system-level debugger such as WDEB386.EXE. The major drawbacks to using WDEB386.EXE are that you need to know Intel x86 assembly language, know how Intel x86 microprocessors work, and remember many debugger commands.
The Best Strategy to Use
The best strategy for debugging thunks is to divide and conquer, because it is relatively easy and eliminates most of the problems before you need to trace through assembly language code in a system-level debugger. Flat thunks are composed of a Win32-based DLL and a Win16-based DLL, so it is possible to test each of these in isolation before testing them together. Create a Win16-based application to test the Win16-based DLL, and create a Win32-based application to test the Win32-based DLL. Doing so allows you to use a wide variety of debugging tools to verify that each side works properly.
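A minimal sketch of the Win32 half of this approach, assuming a hypothetical exported function named sum_range: the test application calls the target directly, with no thunk in between, and checks ordinary and boundary inputs, including results that would not fit in a 16-bit int. In the real DLL the function would be declared __stdcall and exported; here it is an ordinary function so the sketch compiles anywhere.

```c
#include <stdio.h>

/* Hypothetical function exported by the Win32-based DLL. */
long sum_range(long first, long last)
{
    long total = 0;
    long i;

    for (i = first; i <= last; i++)
        total += i;
    return total;
}

/* Test driver for the Win32-based test application. Prints each failed
   check and returns the number of failures (0 means all checks passed). */
int run_target_tests(void)
{
    struct { long first, last, expected; } cases[] = {
        { 1, 10, 55 },
        { 5, 4, 0 },          /* empty range */
        { -3, 3, 0 },         /* negative values */
        { 1, 1000, 500500 },  /* result larger than a 16-bit int can hold */
    };
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        long got = sum_range(cases[i].first, cases[i].last);
        if (got != cases[i].expected) {
            printf("sum_range(%ld, %ld) = %ld, expected %ld\n",
                   cases[i].first, cases[i].last, got, cases[i].expected);
            failures++;
        }
    }
    return failures;
}
```

A matching Win16-based test application would exercise the Win16-based DLL's exports the same way; only when both sides pass on their own is it worth connecting them through the thunk.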
Preliminary Checklist - Before Compiling with the Thunk Compiler
Once you've verified that each side works correctly, it's time to put the two together to test the thunk itself. Before you compile the thunk with the thunk compiler, make a preliminary check of the following items:
- In your thunk script, make sure that each function has the correct number and types of parameters. Also make sure that the parameter types are supported by the thunk compiler. If they aren't, you will have to change the parameter somehow to pass the data with a supported type.
- If you pass any structures as parameters, make sure you use the same structure packing in your Win32-based DLL, Win16-based DLL, and thunk script. You set structure packing in your C/C++ compiler's command line, and in the thunk compiler command line. Note that the thunk compiler's packing switch is lowercase for the 16-bit side, and uppercase for the 32-bit side.
- Make sure that the functions you're thunking to are exported correctly and use the PASCAL calling convention if they're 16-bit, or __stdcall if they're 32-bit. The thunk compiler does not support the __cdecl and __fastcall calling conventions.
- Make sure that your Win32-based DLL calls ThunkConnect32() each time its DllMain() function is called. Likewise, make sure the Win16-based DLL has an exported DllEntryPoint() function, separate from its LibMain(), that calls ThunkConnect16() and returns TRUE if ThunkConnect16() succeeds.
NOTE: You actually call XXX_ThunkConnect16() and XXX_ThunkConnect32(), where XXX is the symbol you define with the thunk compiler's -t switch. The code generated by the thunk compiler uses these symbols to generate tables that call ThunkConnect16() and ThunkConnect32().
- Make sure that the value specified with the thunk compiler's -t command-line switch is the same for both the Win32 and Win16 thunk DLLs. The value must also correspond to the prefix of the ThunkConnect calls in your Win16-based and Win32-based DLLs (see the NOTE above).
- Verify that the Win16-based DLL exports DllEntryPoint with the RESIDENTNAME keyword in its module definition (.DEF) file. Without the RESIDENTNAME keyword, the ThunkConnect32/ThunkConnect16 call will fail and the DLLs will not load.
- Verify that the 16-bit DLL has XXX_ThunkData16 exported with the RESIDENTNAME keyword in its module definition (.DEF) file.
- Verify in your Win16-based DLL's makefile that the resource compiler marks the DLL as version 4.0. If it is marked with a version lower than 4.0, it won't load and the thunk will fail.
- If your 32-bit to 16-bit thunk function returns a pointer, make sure that the base type is the same size on both the 16-bit and 32-bit sides of the thunk. If the size of the base type is different, then the thunk compiler issues an error message stating, "Cannot return pointers to non-identical types." One way to work around this problem is to return a pointer to a different, but compatible, data type. For example, a thunk cannot return a pointer to an int because an int is two bytes on the 16-bit side, but four bytes on the 32-bit side. Change the thunk's return type from a pointer to an int to a pointer to a long in the thunk script and the source code of the Win16-based and Win32-based DLLs.
If you write a 16-bit to 32-bit thunk that returns a pointer, the thunk compiler issues an error message stating, "Pointer types may not be returned." The thunk compiler does not allow 16-bit to 32-bit thunks to return pointer types because once the thunk has returned from the 32-bit function, the pointer will not point to data in the correct Win32-based process address space. This is because the address spaces of all Win32-based processes use the same range of addresses and are preemptively context-switched.
- If the linker reports an "unresolved external" error and the symbol is a function name that is spelled consistently throughout all source code, module definition files, and the thunk script, make sure that all occurrences of its prototype are consistent. On the Win32 side, the thunk function must be declared with the __stdcall type; on the Win16 side, the function must be declared with the PASCAL type. In C++ projects, be sure to declare and define both sides of the thunk function with the extern "C" linkage specifier in addition to the __stdcall or the PASCAL type.
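To see why the packing must match on both sides, the sketch below declares the same hypothetical parameter structure twice, once with 1-byte packing and once with 4-byte packing. The count field moves from offset 1 to offset 4, so a DLL built with one setting reads the wrong bytes from a structure built with the other. The count field uses a 32-bit type to stand in for long, which, unlike int, is the same size on both sides of the thunk. The #pragma pack(push, n) form is accepted by current Microsoft and GCC compilers; in the thunk DLLs themselves you would more likely set packing with command-line switches, as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure as built with 1-byte packing. */
#pragma pack(push, 1)
typedef struct {
    uint8_t flags;
    int32_t count;   /* a long in the Win16 and Win32 sources */
} PARAM_PACK1;
#pragma pack(pop)

/* The same structure as built with 4-byte packing. */
#pragma pack(push, 4)
typedef struct {
    uint8_t flags;
    int32_t count;
} PARAM_PACK4;
#pragma pack(pop)

/* Offsets of count under each packing setting. */
size_t count_offset_pack1(void)
{
    return offsetof(PARAM_PACK1, count);   /* 1: no padding after flags */
}

size_t count_offset_pack4(void)
{
    return offsetof(PARAM_PACK4, count);   /* 4: three padding bytes */
}
```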
Trouble-Shooting Guide - After Compiling with the Thunk Compiler
After you check the preliminaries, build your thunk DLLs and try to run them. If they run, continue with further testing to make sure they're rock solid. If they don't run, use the following troubleshooting guide to determine and fix the cause of the problem.
ThunkConnect16() on the Win16 side or ThunkConnect32() on the Win32 side fails:
- Run the debugging versions of the system DLLs. The debugging versions of KERNEL32.DLL and KRNL386.EXE contain many diagnostic messages to tell you why the thunk did not initialize. To run the debugging versions of the system DLLs, use the "Switch to Debug DLLs" icon in the Start Menu under Win32 SDK Tools. Use the "Switch to Non-debugging DLLs" icon to switch back to the retail versions.
- Verify that the Win16-based DLL has a call to ThunkConnect16() and the Win32-based DLL has a corresponding call to ThunkConnect32(). If one of these is missing, then the other will fail, and the thunk DLLs will fail to load.
- Put breakpoints in your Win32 DLL's DllMain(), and in your Win16 DLL's DllEntryPoint() and LibMain() functions to see which DLLs are not loading.
If your ThunkConnect16() and ThunkConnect32() calls are working properly but the thunk still isn't, it is time to simplify your thunk. You can approach this in two ways. Either remove parameters from the thunk one at a time, recompiling after each removal, or create a simple thunk that works and build it up until it fails by following these steps:
- Create a simple thunk and execute it just to make sure you have the thunk mechanism set up correctly. A good choice for a simple thunk is a function with no return value and no parameters. If even the simple thunk doesn't work, run through the preliminary checklist above to make sure you have things set up correctly. Then proceed with the next step.
- Check to make sure the target DLL and any DLLs it relies on can be found and loaded. If one is missing, or the loader can't find it, the thunk won't work.
- Make sure your target DLL isn't doing something that it can't do in the context of a thunk (see "Limitations on What a Target DLL Can Do" above).
Once you have a simplified thunk that works, but your real thunk still doesn't work, follow these steps:
- Add parameters to the simple thunk one at a time to determine if a parameter is causing the failure. If one is, make sure that the parameter is the right type, that the function is declared and defined with the same number and types of parameters in both DLLs and in the thunk script, and that the function is declared as PASCAL or __stdcall.
- If your target DLL is a Win16-based DLL and it can't access its global or static data, make sure you've exported the function correctly. If you use the /GD switch with Visual C++, you must declare and define the function with the __export keyword in the Win16-based DLL's source code. Just listing the function's name in the DLL's module definition (.DEF) file is not enough because the compiler does not process the .DEF file, so it won't generate the prolog and epilog code that exported functions require.
- If calls to LocalAlloc() in your target Win16-based DLL cause general protection (GP) faults, make sure your function is exported as described in the previous item.
- If you get a GP fault in KERNEL32 just after your target Win16-based function returns, make sure the target function is declared and defined as PASCAL. The C calling convention cannot be used. Also make sure that the target function didn't modify the DS, SS, BP, SI, or DI registers; this is uncommon in C or C++ code but more likely in assembly language.
- If you get a GP fault in your 32-bit thunk DLL or KERNEL32 immediately after your Win32-based target function returns, make sure that the target function is declared as __stdcall and that it didn't modify the DS, ES, FS, SS, EBP, EBX, ESI, or EDI registers. C or C++ code should not cause these registers to be modified, but assembly-language code should be checked carefully.
- If your Win16-based target function returns to an invalid location, make sure it is declared and defined as FAR. This is especially important for small model DLLs; functions in medium and large model DLLs are FAR by default.
- If you experience a GP fault in a Win16-based function when you access more than 64K of data from a pointer passed in as a parameter (that is, a thunked pointer), you need to allocate an array of tiled selectors, as described in the following article in the Microsoft Knowledge Base:
132005 DOCERR: AllocSelector & FreeSelector Documentation Incomplete
On the Win16 side, thunked pointers always consist of a single selector with a limit of 64K, which means you cannot use them as huge pointers. The entire original range of data that the pointer addresses is accessible to the Win16-based target DLL, but only if it creates an array of tiled selectors to reference it and uses huge pointer variables to access the data.
- Make sure you only use a thunked pointer in the context of the thunk. Selectors allocated by the thunk compiler for use by Win16-based targets are freed as soon as the thunk returns.
- Put breakpoints at the beginning of your target functions to make sure you're getting into them. If you are, and you've debugged the target side independently of the thunk, and the error is caused inside the target, then chances are good that the target is doing something that can't be done in a thunk, or is referencing memory that doesn't exist. Review the items above about accessing more than 64K of data through a thunked pointer and using thunked pointers only in the context of the thunk.
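The tiled-selector requirement comes down to address arithmetic. The sketch below shows what a huge pointer does on the Win16 side: it splits a byte index into a choice of selector (one per 64K tile) and a 16-bit offset within that tile. It assumes the tiled array's first selector has its base at the start of the data, and it takes the spacing between consecutive selectors as a parameter; in Win16 protected mode that spacing is the value exposed as __AHINCR, believed to be 8, but treat that as an assumption. The function names are hypothetical, and the sketch does not allocate selectors.

```c
#include <stdint.h>

/* Bytes covered by each selector in a tiled array. */
#define TILE_SIZE 0x10000UL

/* Number of 64K selectors needed to cover size bytes
   (written to avoid overflow for very large sizes). */
unsigned long tiled_selector_count(unsigned long size)
{
    return size / TILE_SIZE + (size % TILE_SIZE != 0);
}

/* Computes the 16:16 address of byte `index` in a block covered by a
   tiled selector array starting at first_selector. ASSUMPTION: the first
   selector's base is the start of the data, and consecutive selectors
   differ by selector_increment (__AHINCR on Win16). */
void tiled_address(uint16_t first_selector, uint16_t selector_increment,
                   unsigned long index,
                   uint16_t *selector, uint16_t *offset)
{
    *selector = (uint16_t)(first_selector +
                           (index / TILE_SIZE) * selector_increment);
    *offset = (uint16_t)(index % TILE_SIZE);
}
```

For example, a 200,000-byte block needs four selectors, and byte 70,000 lies in the second tile at offset 4,464. A single thunked selector can only ever produce the first tile, which is why accesses beyond 64K fault.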