So far, we have mostly ignored the implementations of the getNumber
and setNumber
operations we defined in our discussion of object creation. Obviously, getNumber
and setNumber
must be interlocked against concurrent access — without this interlock, concurrent requests from clients could result in one thread writing to the _phNum
member while another thread is reading it, with unpredictable results. (Conversely, the name
operation need not have an interlock because the name of a phone entry is immutable.) To interlock getNumber
and setNumber
, we can add a mutex _m
to PhoneEntryI
:
class PhoneEntryI : public PhoneEntry { public: // ... static PhoneEntryFactoryIPtr _factory; private: const string _name; string _phNum; Mutex _m; };
The getNumber
and setNumber
implementations then lock _m
to protect _phNum
from concurrent access:
string PhoneEntryI::name(const Current&) { return _name; } string PhoneEntryI::getNumber(const Current&) { // Incorrect implementation! Mutex::Lock lock(_m); return _phNum; } void PhoneEntryI::setNumber(const string& phNum, const Current&) { // Incorrect implementation! Mutex::Lock lock(_m); _phNum = phNum; }
This looks good but, as it turns out, destroy
throws a spanner in the works: as shown, this code suffers from a rare, but real, race condition. Consider the situation where a client calls destroy
at the same time as another client calls setNumber
. In a server with a thread pool with more than one thread, the calls can be dispatched in separate threads and can therefore execute concurrently.
The following sequence of events can occur:
- The thread dispatching the
setNumber
call locates the servant, enters the operation implementation, and is suspended by the scheduler immediately on entry to the operation, before it can lock_m
. - The thread dispatching the
destroy
call locates the servant, entersdestroy
, successfully removes the servant from the Active Servant Map (ASM) and the_names
set, and returns. - The thread that was suspended in
setNumber
is scheduled again, locks_m
, and now operates on a conceptually already-destroyed Ice object.
Note that, even if we lock _m
in destroy
, the problem persists:
void PhoneEntryI::destroy(const Current& c) { // Note: still not correct. IceUtil::Mutex::Lock lock(_m); c.adapter?>remove(_name); _factory?>remove(_name, c.adapter); }
Even though destroy
now locks _m
and so cannot run concurrently with getNumber
and setNumber
, the preceding scenario can still arise. The problem here is that a thread can enter the servant and be suspended before it gets a chance to acquire a lock. With the code as it stands, this is not a problem: setNumber
will simply update the _phNum
member variable in a servant that no longer has an ASM entry. In other words, the Ice object is already destroyed — it just so happens that the servant for that Ice object is still hanging around because there is still an operation executing inside it. Any updates to the servant will succeed (even though they are useless because the servant's destructor will run as soon as the last invocation leaves the servant.)
Note that this scenario is not unique to C++ and can arise even with Java synchronized operations: in that case, a thread can be suspended just after the Ice run time has identified the target servant, but before it actually calls the operation on the target servant. While the thread is suspended, another thread can execute destroy
.
While this race condition does not affect our implementation, it does affect more complex applications, particularly if the servant modifies external state, such as a file system or database. For example, setNumber
could modify a file in the file system; in that case, destroy
would delete that file and probably close a file descriptor or stream. If we were to allow setNumber
to continue executing after destroy
has already done its job, we would likely encounter problems: setNumber
might not find the file where it expects it to be or try to use the closed file descriptor and return an error; or worse, setNumber
might end up re-creating the file in the process of updating the already-destroyed entry's phone number. (What exactly happens depends on how we write the code for each operation.)
Of course, we can try to anticipate these scenarios and handle the error conditions appropriately, but doing this for complex systems with complex servants rapidly gets out of hand: in each operation, we would have to ask ourselves what might happen if the servant is destroyed concurrently and, if so, take appropriate recovery action.
It is preferable to instead deal with interleaved invocations of destroy
and other operations in a systematic fashion. We can do this by adding a _destroyed
member to the PhoneEntryI
servant. This member is initialized to false by the constructor and set to true by destroy
. On entry to every operation (including destroy
), we lock the mutex, test the _destroyed
flag, and throw ObjectNotExistException
if the flag is set:
class PhoneEntryI : public PhoneEntry { public: // ... static PhoneEntryFactoryIPtr _factory; private: const string _name; string _phNum; bool _destroyed; Mutex _m; }; PhoneEntryI::PhoneEntryI(const string& name, const string& phNum) : _name(name), _phNum(phNum), _destroyed(false) { } string PhoneEntryI::name(const Current&) { Mutex::Lock lock(_m); if (_destroyed) throw ObjectNotExistException(__FILE__, __LINE__); return _name; } string PhoneEntryI::getNumber(const Current&) { Mutex::Lock lock(_m); if (_destroyed) throw ObjectNotExistException(__FILE__, __LINE__); return _phNum; } void PhoneEntryI::setNumber(const string& phNum, const Current&) { Mutex::Lock lock(_m); if (_destroyed) throw ObjectNotExistException(__FILE__, __LINE__); _phNum = phNum; } void PhoneEntryI::destroy(const Current& c) { Mutex::Lock lock(_m); if (_destroyed) throw ObjectNotExistException(__FILE__, __LINE__); _destroyed = true; c.adapter?>remove(_name); _factory?>remove(_name, c.adapter); // Dubious! }
If you are concerned about the repeated code on entry to every operation, you can put that code into a member function or base class to make it reusable (although the benefits of doing so are probably too minor to make this worthwhile).
Using the _destroyed
flag, if an operation is dispatched and suspended before it can lock the mutex and, meanwhile, destroy
runs to completion in another thread, it becomes impossible for an operation to operate on the state of such a "zombie" servant: the test on entry to each operation ensures that any operation that runs after destroy
immediately raises ObjectNotExistException
.
Also note the "dubious" comment in destroy
: the operation first locks _m
and, while holding that lock, calls remove
on the factory, which in turn locks its own _namesMutex
. This is not wrong as such, but as we will see shortly, it can easily lead to deadlocks if we modify the application later.