Bradley Grainger home

ATL proxy/stub DllCanUnloadNow bug

All of our COM server DLLs have proxy/stub code, which is used when data needs to be marshalled across process boundaries. For example, LibSys.exe loads the actual code for the LbxMetadataCache object, but LDLS.exe calls this code. Proxy code is loaded into the LDLS.exe process and forwards each call to the stub code in LibSys.exe, which then calls the actual LbxMetadataCache code.

By and large, this proxy/stub code is created automatically for you. With ATL, you have the choice of building the proxy/stub as a separate DLL, or merging the proxy/stub code with your main COM DLL. We chose the latter route, since it halves the number of DLLs we need to ship. Whichever route you choose, the ATL COM server project wizard generates the appropriate code for you.

Part of the generated code is the DllCanUnloadNow function. This function returns S_OK (aka true) if it's safe for Windows to unload the DLL, or S_FALSE (aka false) if it's not. Your implementation should keep track of whether any clients are using the objects in your DLL, and only return S_OK if you're not being used any more. If the proxy/stub code is merged into your DLL, your DllCanUnloadNow function has two things to keep track of:

  1. Are any clients using the COM objects defined in this DLL?
  2. Are any clients using proxy code to communicate with COM objects loaded into another process?

You can only unload if the answer to both of these questions is "no".
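
As a rough illustration of that rule, here is a minimal sketch in portable C++. The MergedDllState struct, its field names, and the canUnloadNow function are hypothetical names invented for this example, and the HResult constants are local stand-ins with the same numeric values as S_OK and S_FALSE so that the sketch compiles without the Windows SDK. It is not the code ATL generates.

```cpp
#include <cassert>

// Local stand-ins for the Windows HRESULT values (S_OK is 0, S_FALSE is 1).
typedef long HResult;
const HResult kSOk = 0;
const HResult kSFalse = 1;

// Hypothetical model of what a merged DLL has to track.
struct MergedDllState {
    long objectLocks;  // question 1: clients using COM objects defined in this DLL
    long proxyRefs;    // question 2: clients using this DLL's proxy code
};

// Unloading is safe only when the answer to both questions is "no".
inline HResult canUnloadNow(const MergedDllState& state) {
    const bool objectsInUse = state.objectLocks != 0;
    const bool proxyInUse = state.proxyRefs != 0;
    return (!objectsInUse && !proxyInUse) ? kSOk : kSFalse;
}

// Self-check covering all four combinations of the two questions.
inline void checkUnloadRule() {
    assert(canUnloadNow(MergedDllState{0, 0}) == kSOk);
    assert(canUnloadNow(MergedDllState{1, 0}) == kSFalse);
    assert(canUnloadNow(MergedDllState{0, 1}) == kSFalse);
    assert(canUnloadNow(MergedDllState{1, 1}) == kSFalse);
}
```

Only the first combination, with nothing in use on either side, yields the S_OK stand-in.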

ATL 7.0 changes the auto-generated code for this function significantly. Here is the code from ATL 6.0:

STDAPI DllCanUnloadNow()
{
#ifdef _MERGE_PROXYSTUB
    // proxy/stub code
    if (PrxDllCanUnloadNow() != S_OK)
        return S_FALSE;
#endif

    // check module lock count
    return (_Module.GetLockCount() == 0) ? S_OK : S_FALSE;
}

Here is the code generated by ATL 7.0:

STDAPI DllCanUnloadNow()
{
#ifdef _MERGE_PROXYSTUB
    // proxy/stub code
    HRESULT hr = PrxDllCanUnloadNow();
    if (FAILED(hr))
        return hr;
#endif

    // check module lock count
    return _AtlModule.DllCanUnloadNow();
}

The problem is that S_FALSE is still a success value, so FAILED(hr) is false when PrxDllCanUnloadNow returns it, and execution falls through to the module check. So even if the proxy says "Don't unload me; clients are still using me as a proxy", the DLL will be unloaded if no clients are using its COM objects directly.
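
To make the fall-through concrete, here is a small model of the ATL 7.0 control flow in portable C++. The HResult constants and the failed function are local stand-ins that mirror the Windows definitions (S_OK is 0, S_FALSE is 1, E_FAIL is 0x80004005, and FAILED tests for a negative value). The generatedCanUnloadNow function and its two parameters are hypothetical names standing in for the results of PrxDllCanUnloadNow and _AtlModule.DllCanUnloadNow.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the Windows HRESULT type and values.
typedef std::int32_t HResult;
const HResult kSOk = 0;                                     // S_OK
const HResult kSFalse = 1;                                  // S_FALSE
const HResult kEFail = static_cast<HResult>(0x80004005u);   // E_FAIL (negative)

// Mirrors the FAILED macro: only negative HRESULTs count as failures.
inline bool failed(HResult hr) { return hr < 0; }

// Hypothetical model of the ATL 7.0 generated logic with merged proxy/stub.
inline HResult generatedCanUnloadNow(HResult proxyResult, HResult moduleResult) {
    if (failed(proxyResult))
        return proxyResult;   // taken for real errors only, never for S_FALSE
    return moduleResult;      // the proxy's "still in use" answer is lost here
}

// Self-check showing that S_FALSE slips past the failure test.
inline void checkGeneratedFlow() {
    assert(!failed(kSFalse));
    assert(failed(kEFail));
    // Proxy in use, no direct clients: the model still reports "safe to unload".
    assert(generatedCanUnloadNow(kSFalse, kSOk) == kSOk);
}
```

The last assertion is the problematic case: the proxy answers with the S_FALSE stand-in, yet the combined result is the S_OK stand-in.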

So, how did we find this out? If Dragon NaturallySpeaking is running in the background, the system becomes a lot more aggressive about checking for DLLs that are no longer being used. It will very frequently query metcache.dll to find out whether it can be unloaded. The proxy says, "I'm still in use, don't unload me". DllCanUnloadNow ignores that result and asks whether any clients are using the COM object directly. None are, so the DLL tells Windows it can be unloaded, and Windows unloads it. Fractions of a second later, LDLS calls the LbxMetadataCache proxy code, which no longer exists in memory! This, as you might expect, is no good. An exception occurs, and error tips start popping up on the user's screen.

I fixed the problem by changing:

    HRESULT hr = PrxDllCanUnloadNow();
    if (FAILED(hr))
        return hr;

to:

    HRESULT hr = PrxDllCanUnloadNow();
    if (hr != S_OK)
        return hr;

This causes the DLL to stay in memory if any clients are using it as a proxy. Only when no one is using it as a proxy and no one is using the COM objects directly will the DLL be unloaded.
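
As a quick sanity check of this change, the following sketch models the corrected control flow in portable C++ and walks through the four combinations of proxy and module answers. The HResult constants are local stand-ins with the same numeric values as S_OK and S_FALSE, and fixedCanUnloadNow and its parameters are hypothetical names standing in for the results of PrxDllCanUnloadNow and _AtlModule.DllCanUnloadNow.

```cpp
#include <cassert>

// Local stand-ins for the Windows HRESULT values (S_OK is 0, S_FALSE is 1).
typedef long HResult;
const HResult kSOk = 0;
const HResult kSFalse = 1;

// Hypothetical model of the corrected logic: anything other than S_OK from
// the proxy/stub check (S_FALSE or an error code) is returned immediately.
inline HResult fixedCanUnloadNow(HResult proxyResult, HResult moduleResult) {
    if (proxyResult != kSOk)
        return proxyResult;
    return moduleResult;
}

// Self-check over the four combinations of proxy and module answers.
inline void checkFixedFlow() {
    assert(fixedCanUnloadNow(kSOk, kSOk) == kSOk);          // nothing in use
    assert(fixedCanUnloadNow(kSFalse, kSOk) == kSFalse);    // proxy in use
    assert(fixedCanUnloadNow(kSOk, kSFalse) == kSFalse);    // objects in use
    assert(fixedCanUnloadNow(kSFalse, kSFalse) == kSFalse); // both in use
}
```

With the comparison against S_OK, the proxy-in-use case now keeps the DLL loaded, and the only combination that permits unloading is the one where both checks return S_OK.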

See also First VS2005 bug report.