Debugging flat thunks generated by the thunk compiler can be difficult
because the thunk mechanism is complex and debugging tools capable of
tracing through thunks are difficult to use. This article presents an
overall strategy for debugging flat thunks, several specific debugging
techniques, and a troubleshooting guide that explains how to fix many
common thunking problems.
Before you get started debugging thunks, keep in mind that there are some
limitations on what a target DLL can do inside a thunk. This is because a
Win16-based application calling a Win32-based DLL is not a Win32-based
process; likewise, a Win32-based application calling a Win16-based DLL is
not a Win16-based process. Common specific limitations include:
You cannot create threads inside a thunk from a Win16-based application to a Win32-based DLL.
The code inside Win32-based DLLs called by thunks should require little stack space because the calling Win16-based processes have much smaller stacks than do Win32-based applications.
Win16-based DLLs that contain interrupt service routines (ISRs) must not thunk to Win32-based DLLs while handling interrupts.
Win32-based applications must not pass pointers to data located on the stack as parameters of thunks or call Win16-based DLLs that switch stacks.
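As an illustration of the last limitation, the following C sketch copies caller data into heap memory before handing it to a thunked function, so that no pointer into the stack crosses the thunk. Thunked_CountBytes() and CallThunkWithHeapCopy() are hypothetical names; the stand-in is written in plain C only so that the sketch is self-contained, and malloc() is an assumption standing in for whichever allocator your DLL actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a thunked Win16 function. In a real project
   this would be the function generated from your thunk script. */
static int Thunked_CountBytes(const char *text)
{
    return (int)strlen(text);
}

/* Copies 'text' to the heap and passes the heap copy through the thunk.
   Returns the thunk's result, or -1 if the copy cannot be allocated. */
int CallThunkWithHeapCopy(const char *text)
{
    size_t size;
    char *copy;
    int result;

    assert(text != NULL);
    size = strlen(text) + 1;            /* include the terminating NUL */
    copy = (char *)malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, size);

    result = Thunked_CountBytes(copy);  /* pointer refers to heap, not stack */
    free(copy);
    return result;
}
```

The same pattern applies to structures: fill in a heap-allocated copy, pass its address, and copy any results back after the thunk returns.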
Why Debugging Flat Thunks Can Be Difficult
Debugging flat thunks is difficult partly because the flat thunk mechanism
is a complex part of the Windows kernel. Its complexity stems from the fact
that it must transform function calls in 32-bit compiled code into calls
compatible with 16-bit code and vice versa. Because 32-bit code uses
different data types and CPU register sets from 16-bit code, the flat thunk
mechanism must translate function parameters, switch stacks, and translate
return values. It is optimized for speed, yet must allow preemptive Win32
code to call non-preemptive Win16 code. The thunk compiler makes creating
flat thunks much easier than manually creating them, but it isn't foolproof.
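To make the idea of parameter translation concrete, here is a simplified C model of what happens to values that cross from 16-bit code to 32-bit code. It is a sketch for illustration, not the kernel's implementation: the names far16_t, model_selector_base, widen_int16(), widen_uint16(), and far16_to_flat() are all hypothetical, and the selector is treated as a plain table index, whereas real selectors also carry table and privilege bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16:16 far pointer: selector in the high word,
   offset in the low word. */
typedef uint32_t far16_t;

/* Hypothetical stand-in for the descriptor table. It maps a selector
   (used here as a simple index) to the linear base of its segment. */
#define MODEL_SELECTOR_COUNT 4
static const uint32_t model_selector_base[MODEL_SELECTOR_COUNT] = {
    0x00000000u, 0x00410000u, 0x00420000u, 0x00430000u
};

/* A signed 16-bit parameter is sign-extended to 32 bits. */
int32_t widen_int16(int16_t value)
{
    return (int32_t)value;
}

/* An unsigned 16-bit parameter is zero-extended to 32 bits. */
uint32_t widen_uint16(uint16_t value)
{
    return (uint32_t)value;
}

/* A 16:16 pointer becomes a flat address: base(selector) + offset. */
uint32_t far16_to_flat(far16_t pointer)
{
    uint16_t selector = (uint16_t)(pointer >> 16);
    uint16_t offset = (uint16_t)(pointer & 0xFFFFu);

    assert(selector < MODEL_SELECTOR_COUNT);
    return model_selector_base[selector] + offset;
}
```

The model shows why the thunk script must describe every parameter type exactly: the translation applied to a signed integer, an unsigned integer, and a pointer is different in each case.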
Debugging flat thunks is difficult not only because the mechanism itself is
complex, but also because the necessary debugging tools are more difficult
to master. Application-level debuggers such as the Microsoft Visual C++
debugger and WinDBG cannot trace through thunks because thunks consist of
both 32-bit and 16-bit code and cause the system to claim or release the
Win16Mutex. To trace through a thunk, you need to use a system-level
debugger such as WDEB386.EXE. The major drawbacks to using WDEB386.EXE are
that you need to know Intel x86 assembly language, know how Intel x86
microprocessors work, and remember many debugger commands.
The Best Strategy to Use
The best strategy for debugging thunks is to divide and conquer because it is
relatively easy and eliminates most of the problems before you need to
trace through assembly language code in a system-level debugger. Flat
thunks are composed of a Win32-based DLL and a Win16-based DLL, so it is
possible to test each of these in isolation before testing them together.
Create a Win16-based application to test the Win16-based DLL, and create a
Win32-based application to test the Win32-based DLL. Doing so allows you to
use a wide variety of debugging tools to verify that each side works
correctly.
Preliminary Checklist - Before Compiling with the Thunk Compiler
Once you've verified that each side works correctly, it's time to put the
two together to test the thunk itself. Before you compile the thunk with
the thunk compiler, make a preliminary check of the following items:
1. In your thunk script, make sure that each function has the correct number and types of parameters. Also make sure that the parameter types are supported by the thunk compiler. If they aren't, change the parameter so that the data is passed with a supported type.
2. If you pass any structures as parameters, make sure you use the same structure packing in your Win32-based DLL, Win16-based DLL, and thunk script. You set structure packing on your C/C++ compiler's command line and on the thunk compiler's command line. Note that the thunk compiler's packing switch is lowercase for the 16-bit side, and uppercase for the 32-bit side.
3. Make sure that the functions you're thunking to are exported correctly and use the PASCAL calling convention if they're 16-bit, or __stdcall if they're 32-bit. The thunk compiler does not support the __cdecl and __fastcall calling conventions.
4. Make sure that your Win32-based DLL calls ThunkConnect32() each time its DllMain() function is called. Likewise, make sure the Win16-based DLL has an exported DllEntryPoint() function, separate from its LibMain(), that calls ThunkConnect16() and returns TRUE if ThunkConnect16() succeeds.
   NOTE: You actually call XXX_ThunkConnect16() and XXX_ThunkConnect32() where XXX is the symbol you define with the thunk compiler's -t switch. The code generated by the thunk compiler uses these symbols to generate tables that call ThunkConnect16() and ThunkConnect32().
5. Make sure that the value specified in the thunk compiler's command line -t switch is the same for both the Win32 and Win16 thunk DLLs. The value must also correspond to the prefix of the ThunkConnect calls in your Win16-based and Win32-based DLLs (see the note in step 4).
6. Verify that the Win16-based DLL has DllEntryPoint exported with the RESIDENTNAME keyword in its module definition (.DEF) file. Without the RESIDENTNAME keyword, the ThunkConnect32() or ThunkConnect16() call will fail and the DLLs will not load.
7. Verify that the 16-bit DLL has XXX_ThunkData16 exported with the RESIDENTNAME keyword in its module definition (.DEF) file.
8. Verify in your Win16-based DLL's makefile that the resource compiler is marking the DLL as 4.0. If it is marked less than 4.0, it won't load and the thunk will fail.
9. If your 32-bit to 16-bit thunk function returns a pointer, make sure that the base type is the same size on both the 16-bit and 32-bit sides of the thunk. If the size of the base type is different, then the thunk compiler issues an error message stating, "Cannot return pointers to non-identical types." One way to work around this problem is to return a pointer to a different, but compatible, data type. For example, a thunk cannot return a pointer to an int because an int is two bytes on the 16-bit side, but four bytes on the 32-bit side. Change the thunk's return type from a pointer to an int to a pointer to a long in the thunk script and the source code of the Win16-based and Win32-based DLLs.
10. If you write a 16-bit to 32-bit thunk that returns a pointer, the thunk compiler issues an error message stating, "Pointer types may not be returned." The thunk compiler does not allow 16-bit to 32-bit thunks to return pointer types because once the thunk has returned from the 32-bit function, the pointer will not point to data in the correct Win32-based process address space. This is because the address spaces of all Win32-based processes use the same range of addresses and are preemptively context-switched.
11. If the linker reports an "unresolved external" error and the symbol is a function name that is spelled consistently throughout all source code, module definition files, and the thunk script, make sure that all occurrences of its prototype are consistent. On the Win32 side, the thunk function must be declared with the __stdcall type; on the Win16 side, the function must be declared with the PASCAL type. In C++ projects, be sure to declare and define both sides of the thunk function with the extern "C" linkage specifier in addition to the __stdcall or the PASCAL type.
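Two of the checklist items, structure packing and the size of a pointer's base type, come down to byte layout, and a small C sketch can show the effect. This is an illustration under stated assumptions rather than code from a thunk project: the RecordPack1 and RecordPack4 structures and all function names are hypothetical, the same structure is compiled twice with #pragma pack to imitate mismatched compiler switches, and the fixed-width types int16_t and int32_t stand in for int as seen by a 16-bit and a 32-bit compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure passed through a thunk, declared twice with
   different packing to imitate mismatched packing switches. */
#pragma pack(push, 1)
typedef struct { uint8_t tag; int32_t value; } RecordPack1;
#pragma pack(pop)

#pragma pack(push, 4)
typedef struct { uint8_t tag; int32_t value; } RecordPack4;
#pragma pack(pop)

/* Byte offset of 'value' under each packing setting. */
size_t value_offset_pack1(void) { return offsetof(RecordPack1, value); }
size_t value_offset_pack4(void) { return offsetof(RecordPack4, value); }

/* Stand-ins for "int" on each side: 2 bytes in Win16, 4 bytes in Win32. */
typedef int16_t int_on_win16;
typedef int32_t int_on_win32;

/* Stand-ins for "long", which is 4 bytes on both sides. */
typedef int32_t long_on_win16;
typedef int32_t long_on_win32;

/* A returned pointer is usable on both sides only when its base type
   has the same size on both sides. Returns nonzero if the sizes agree. */
int base_type_sizes_match(size_t size16, size_t size32)
{
    assert(size16 > 0 && size32 > 0);
    return size16 == size32;
}
```

With one-byte packing the value member sits immediately after tag, while with four-byte packing it is pushed to the next four-byte boundary, so a DLL built with one setting reads the wrong bytes from a structure filled in by a DLL built with the other. The size check mirrors the int versus long example in step 9.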
Troubleshooting Guide - After Compiling with the Thunk Compiler
After you check the preliminaries, build your thunk DLLs and try to run
them. If they run, continue with further testing to make sure they're rock
solid. If they don't run, use the following troubleshooting guide to
determine and fix the cause of the problem.
ThunkConnect16() on the Win16 side or ThunkConnect32() on the Win32 side fails:
1. Run the debugging versions of the system DLLs. The debugging versions of KERNEL32.DLL and KRNL386.EXE contain many diagnostic messages to tell you why the thunk did not initialize. To run the debugging versions of the system DLLs, use the "Switch to Debug DLLs" icon in the Start Menu under Win32 SDK Tools. Use the "Switch to Non-debugging DLLs" icon to switch back to the retail version.
2. Verify that the Win16-based DLL has a call to ThunkConnect16() and the Win32-based DLL has a corresponding call to ThunkConnect32(). If one of these is missing, then the other will fail, and the thunk DLLs will fail to load.
3. Put breakpoints in your Win32 DLL's DllMain(), and in your Win16 DLL's DllEntryPoint() and LibMain() functions to see which DLLs are not loading.
If your ThunkConnect16() and ThunkConnect32() calls are working properly,
but the thunk still isn't, it is time to simplify your thunk. You can
approach this in two ways. First, you can remove parameters from the thunk
one by one, recompiling after each change. Second, you can create a simple
thunk that works and build it up until it fails by following these steps:
1. Create a simple thunk and execute it just to make sure you have the thunk mechanism set up correctly. A good choice for a simple thunk is a function with no return value and no parameters. If even the simple thunk doesn't work, run through the preliminary checklist above to make sure you have things set up correctly. Then proceed with step 2.
2. Check to make sure the target DLL and any DLLs it relies on can be found and loaded. If one is missing, or the loader can't find it, the thunk won't work.
3. Make sure your target DLL isn't doing something that it can't do in the context of a thunk.
Once you have a simplified thunk that works, but your real thunk still
doesn't work, follow these steps:
1. Add parameters to the simple thunk one at a time to determine if a parameter is causing the failure. If one is, make sure that the parameter is the right type, that the function is declared and defined with the same number and types of parameters in both DLLs and in the thunk script, and that the function is declared as PASCAL or __stdcall.
2. If your target DLL is a Win16-based DLL and it can't access its global or static data, make sure you've exported the function correctly. If you use the /GD switch with Visual C++, you must declare and define the function with the __export keyword in the Win16-based DLL's source code. Just listing the function's name in the DLL's module definition (.DEF) file is not enough because the compiler does not process the .DEF file, so it won't generate the prolog and epilog code that exported functions require.
3. If calls to LocalAlloc() in your target Win16-based DLL cause general protection (GP) faults, make sure your function is exported as described in step 2.
4. If you get a GP fault in KERNEL32 just after your target Win16-based function returns, make sure the target function is declared and defined as PASCAL. The C calling convention cannot be used. Also make sure that the target function didn't modify the DS, SS, BP, SI, or DI registers. This is uncommon in C or C++ code but more likely in assembly language.
5. If you get a GP fault in your 32-bit thunk DLL or KERNEL32 immediately after your Win32-based target function returns, make sure that the target function is declared as __stdcall and that it didn't modify the DS, ES, FS, SS, EBP, EBX, ESI, or EDI registers. C or C++ code should not cause the registers to be modified, but assembly-language code should be checked carefully.
6. If your Win16-based target function returns to an invalid location, make sure it is declared and defined as FAR. This is especially important for small model DLLs; functions in medium and large model DLLs are FAR by default.
7. If you experience a GP fault in a Win16-based function when you access more than 64K of data from a pointer passed in as a parameter (that is, a thunked pointer), you need to allocate an array of tiled selectors, as described in the following article in the Microsoft Knowledge Base:
   On the Win16 side, thunked pointers always consist of a single selector with a limit of 64K, which means you cannot use them as huge pointers. The entire original range of data that the pointer addresses is accessible to the Win16-based target DLL, but only if it creates an array of tiled selectors to reference it, and if it uses huge pointer variables to access the data.
8. Make sure you only use a thunked pointer in the context of the thunk. Selectors allocated by the thunk compiler for use by Win16-based targets are freed as soon as the thunk returns.
9. Put breakpoints at the beginning of your target functions to make sure you're getting into them. If you are, and you've debugged the target side independently of the thunk, and the error is caused inside the target, then chances are good that the target is doing something that can't be done in a thunk, or referencing memory that doesn't exist. Please see steps 7 and 8.
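The arithmetic behind tiled selectors in step 7 can be sketched in portable C. The code below only models the bookkeeping and allocates no selectors: tiled_selector_count(), tiled_address(), and the TiledAddress type are hypothetical names, and the model assumes that selector i in the array covers the 64K block starting at byte offset i * 64K of the original data.

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_SIZE 0x10000UL   /* 64K: the limit of a single selector */

/* Number of tiled selectors needed to cover 'size' bytes of data. */
uint32_t tiled_selector_count(uint32_t size)
{
    assert(size > 0);
    return (uint32_t)((size - 1) / SEGMENT_SIZE) + 1;
}

/* Hypothetical result type: which selector in the tiled array to use,
   and the 16-bit offset within that selector's 64K block. */
typedef struct {
    uint32_t index;
    uint16_t offset;
} TiledAddress;

/* Splits a byte offset into the data into a selector index and offset. */
TiledAddress tiled_address(uint32_t byte_offset)
{
    TiledAddress address;

    address.index = byte_offset >> 16;
    address.offset = (uint16_t)(byte_offset & 0xFFFFu);
    return address;
}
```

For example, a 200,000-byte buffer needs four selectors under this model, and the byte at offset 0x2ABCD is reached through the third selector (index 2) at offset 0xABCD. A single thunked selector reaches only the first 64K, which is why access beyond that point faults.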