C++: Modern C++: RAII

Modern C++ : Going Beyond "C with Classes"

Introduction

One common complaint about C and C++ is that you need to manage your own memory. A huge number of C programs end up leaking memory. Admittedly, if you're coding C-style in C++, it is quite difficult to always match your news and new[]s to deletes and delete[]s.

The Concept

RAII is an idiom that takes advantage of templates, destructors, and C++'s absence of GC (Garbage Collection) to provide an elegant, consistent method for handling all resources. GC may be convenient for memory, but I've yet to see one that manages file handles, mutex locks, or sockets, for example.

RAII is really quite a simple idea. Basically, all resources should be owned by an instance of a class, and that class should release them in its destructor. If that instance is a local variable in a function, the resource will be released when the function returns. If that instance is a member of another class, the resource will be released when the enclosing object is destroyed, even if the enclosing class has no custom destructor.
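Here is a minimal sketch of both cases. The "resource" is just a counter so that the effect is easy to observe; the names handle, session, open_handles and handles_inside_scope are made up for this illustration and don't come from any library.

```cpp
// Hypothetical resource: we only count how many handles are currently open.
static int open_handles = 0;

// RAII wrapper: acquires in the constructor, releases in the destructor.
class handle {
  public:
    handle() { ++open_handles; }
    ~handle() { --open_handles; }
  private:
    // Copying is disabled to keep the sketch simple (declared, never defined).
    handle(handle const &);
    handle &operator=(handle const &);
};

// A class that owns a handle as a member and has no destructor of its own.
struct session {
    handle h;
};

// Returns the number of handles that are open while the locals are alive.
int handles_inside_scope() {
    handle local;   // acquired here
    session s;      // acquires its member handle
    return open_handles;
}   // s.h and then local are released here, in reverse order of construction
```

Calling handles_inside_scope() reports two open handles, and open_handles is back to zero once the call returns, without any explicit cleanup code.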

RAII is an acronym that stands for Resource Acquisition Is Initialisation. You'll probably never actually hear someone use that name though, as it's somewhat misleading. The most important part of RAII is that destructors release the resources, not that they're acquired in constructors.

Examples

fstream

std::fstream does have a close() member function, but it's rarely needed since, being a RAII class, std::fstream closes the file when it goes out of scope.

It also has a constructor that takes the name and modes for the file to open, which acquires the resource during initialisation.
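As a small sketch of this, the function below writes a word to a file and reads it back without ever calling close(). The function name write_then_read is hypothetical, and the example assumes the working directory is writable.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes a word to the named file, reads it back, then deletes the file.
// Neither stream is closed explicitly; each destructor takes care of it.
std::string write_then_read(char const *path) {
    {
        // The file is opened during initialisation.
        std::fstream out(path, std::ios::out | std::ios::trunc);
        out << "hello";
    }   // out goes out of scope: the buffer is flushed and the file closed

    std::string word;
    {
        std::fstream in(path, std::ios::in);
        in >> word;
    }   // in is closed here

    std::remove(path);  // tidy up the demo file
    return word;
}
```

The inner braces are only there to make the lifetimes obvious; in ordinary code the end of the function usually does the same job.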

Containers

std::vector was the very first thing I covered in this series, and it's a RAII class, as are all containers. They manage the memory they use so you don't have to.

Smart Pointers

The C++03 standard includes only one smart pointer, std::auto_ptr. (It was later deprecated in C++11 and removed in C++17 in favour of std::unique_ptr.) It's a conceptually simple class that "owns" a pointer and deletes it when it goes out of scope.

The complication with std::auto_ptr comes from its ownership-transfer semantics, which will be discussed below.

Scoped Locks

The Boost Thread library uses a nice RAII class called scoped_lock.

Instead of letting users of the library call lock() and unlock() functions on mutexes, they instead create instances of scoped_lock that lock the mutex on construction and release it on destruction. This means that mutexes cannot accidentally be left locked, and that they're automatically released in reverse order of locking, thanks to the construction and destruction order guarantees for automatic local variables.
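The reverse-order release can be seen in a small sketch. This is not the Boost implementation: toy_mutex is a hypothetical stand-in that only records lock and unlock calls in a string, and toy_lock is a stripped-down scoped lock written for this example.

```cpp
#include <string>

// Hypothetical stand-in for a real mutex: it only records what happens to it.
struct toy_mutex {
    char name;
    std::string *log;
    toy_mutex(char n, std::string *l) : name(n), log(l) {}
    void lock()   { *log += '+'; *log += name; }
    void unlock() { *log += '-'; *log += name; }
};

// Minimal scoped lock: locks on construction, unlocks on destruction.
class toy_lock {
    toy_mutex &m;
    toy_lock(toy_lock const &);             // non-copyable
    toy_lock &operator=(toy_lock const &);  // non-assignable
  public:
    explicit toy_lock(toy_mutex &mutex) : m(mutex) { m.lock(); }
    ~toy_lock() { m.unlock(); }
};

// Locks a, then b, and returns the recorded sequence of events.
std::string lock_two() {
    std::string log;
    toy_mutex a('a', &log), b('b', &log);
    {
        toy_lock first(a);
        toy_lock second(b);
    }   // second is destroyed before first
    return log;
}
```

lock_two() returns "+a+b-b-a": b is unlocked before a, even though nothing in the function mentions unlocking at all.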

Design Considerations

A basic smart pointer is one of the clearest, most obviously useful situations for RAII, so let's try writing one, starting from the naïve version:

template <typename T>
class naive_ptr {
    T *ptr;
  public:
    naive_ptr() : ptr(0) {}
    explicit naive_ptr(T *p) : ptr(p) {}
    ~naive_ptr() { delete ptr; }
    T *get() const { return ptr; }
    void reset(T *p = 0) { delete ptr; ptr = p; }
    T *release() { T *p = ptr; ptr = 0; return p; }
 
    // And we need to make it act like a pointer too
    T *operator->() const { return ptr; }
    T &operator*() const { return *ptr; }
};
template <typename T>
bool operator==(naive_ptr<T> const &lhs, naive_ptr<T> const &rhs) {
    // Arguments are taken by reference so that no naive_ptr is copied
    return lhs.get()==rhs.get();
}
template <typename T>
bool operator!=(naive_ptr<T> const &lhs, naive_ptr<T> const &rhs) {
    return !( lhs == rhs );
}

The functions included are fairly simple and obvious:

get
To get the value of the contained pointer, if we need it, since &*myptr is ugly.
reset
To safely change the contained pointer
release
To release the pointer from the control of the naive_ptr, in case we want to keep track of it some other way. ( For example, putting it into a different type of smart pointer or into a ptr_* from the Boost Ptr Container library. )
operator* and operator->
So that it can be dereferenced like a normal pointer.

The only thing here that might be surprising is that so many of the functions are const. The thing to remember here is that a T * const is a very different thing from a T const *—even if the pointer is const, the pointee can still be modified.
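The difference is easiest to see with plain pointers. The sketch below uses a made-up function name, and the commented-out lines are the ones a compiler would reject.

```cpp
// Shows the difference between a const pointer and a pointer to const.
int modify_through_const_pointer() {
    int x = 1;
    int y = 2;

    int * const p = &x;   // const pointer: p itself cannot be re-seated
    *p = 10;              // fine: the pointee is not const
    // p = &y;            // error: p is const

    int const *q = &x;    // pointer to const: x is read-only through q
    q = &y;               // fine: q itself is not const
    // *q = 20;           // error: cannot modify the pointee through q

    return x + *q;        // x was changed through p; q now points at y
}
```

A const naive_ptr<T> behaves like the first case: the smart pointer cannot be reset, but operator* still hands out a T & that can be written through.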

Much more interesting are the functions that are not included.

operator[]
This implementation of naive_ptr uses delete (not delete[]) to release the memory associated with the pointer. This means that storing a pointer allocated with new[] in one is undefined behaviour, so if we prevent it from looking like an array it'll be harder for people to make this mistake. Similarly, there is no arithmetic provided.
operator T*
Experience has shown that an implicit cast to the pointer type is not a good idea. It ends up allowing the use of subscripting and arithmetic, which, as above, is undesirable. It also makes it legal syntax to call delete with a naive_ptr as the argument, which is clearly bad. You might not think that it would happen, but for people unfamiliar with the idea, or when changing old code to use smart pointers, it's quite possible.
operator=(T*)
Giving a smart pointer a pointer to manage is something that should be quite explicit. Once a smart pointer owns a pointer, it'll take care of it. With an implicit operator= from plain pointers it's far too easy for a pointer to become owned by multiple smart pointers. myptr = &*myptr; is quite safe (if pointless) with a regular pointer, but would be fatal on a smart pointer, as it would delete the pointee and then keep pointing at it. Plain pointers can also be repointed to and fro many times without releasing memory, but that's not so with our naive_ptr (unless you religiously use release, but that's not a good plan as it rather defeats the purpose of using a smart pointer in the first place). Plain pointers can also point to stack objects that are not to be deleted, which is also very dangerous with naive_ptr.
operator<
Relational operators on pointers are technically only defined when both pointers point into the same array, which should never be happening with naive_ptr. We could use std::less<T*> instead, as it defines a total ordering for pointers, but that ordering is useful mainly for use as keys in associative containers and, as I'll explain later, it's illegal to store naive_ptrs in containers.

Right now we have something that looks fairly useful. In fact, if you test it out, you might find that it seems to work fine:

#include <iostream>
#include "naive_ptr.hpp"
 
int main() {
    naive_ptr<int> p( new int(13) );
    std::cout << "*p = " << *p << std::endl;
    p.reset( new int(42) );
    std::cout << "*p = " << *p << std::endl;
}

The example above gives the results one would expect and doesn't leak any memory.

So what's the problem? Copies.

It's trivial to make an example that fails miserably:

#include <iostream>
#include "naive_ptr.hpp"
 
int main() {
    naive_ptr<int> p1( new int(13) );
    std::cout << "*p1 = " << *p1 << std::endl;
    naive_ptr<int> p2 = p1;
    std::cout << "*p2 = " << *p2 << std::endl;
}

The output will be fine, but the program will (hopefully) crash while it's exiting.

The problem is the classic "Rule of Three" violation. naive_ptr has a pointer member that gets shallow copied when the object is copy constructed or assigned, which in this case results in the same pointer value being deleted twice, resulting in undefined behaviour.

There are 3 basic ways of dealing with this problem:

Don't allow copies
This is the method chosen for streams (such as fstream) in the std::lib. It's certainly easy to implement and is fine in most situations. If you disallow copying of naive_ptrs (by declaring and not implementing a private copy constructor and private assignment operator) and remove the mutating operations (reset and release), you end up with something quite similar to boost::scoped_ptr from the Boost Smart Ptr library.
Do a deep copy.
This is the method used by containers. Copying a container means making a copy of each element. The intuitive deep copy, ptr ? new T(*ptr) : 0, will fail on polymorphic types, however: it copies only the T part of a derived object (slicing), and it won't even compile if T is an abstract base class.
Transfer ownership
This is the method used by std::auto_ptr. As evidenced by the existence of std::auto_ptr_ref (an auxiliary class used so that transfer semantics work properly) and the number of revisions the relevant section of the standard went through, it's not simple to implement, but can be incredibly useful. The original owner releases its pointer and the copy assumes ownership. It's particularly nice as it has no runtime overhead compared to normal pointer copies. Thanks to this, std::auto_ptr is a particularly elegant way of returning pointers to heap data from functions.
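The three options can be sketched roughly as follows. All the class and function names here (nocopy_ptr, deep_ptr, transfer_ptr and the two check functions) are invented for this example, and each class is cut down to the bare minimum. In particular, transfer_ptr only handles copying from a named object; it has nothing like std::auto_ptr_ref, so it can't be returned from a function by value the way std::auto_ptr can.

```cpp
// Option 1: don't allow copies (private copy operations, never defined).
template <typename T>
class nocopy_ptr {
    T *ptr;
    nocopy_ptr(nocopy_ptr const &);
    nocopy_ptr &operator=(nocopy_ptr const &);
  public:
    explicit nocopy_ptr(T *p = 0) : ptr(p) {}
    ~nocopy_ptr() { delete ptr; }
    T &operator*() const { return *ptr; }
};

// Option 2: deep copy. Only correct when T is the exact type of the pointee.
template <typename T>
class deep_ptr {
    T *ptr;
  public:
    explicit deep_ptr(T *p = 0) : ptr(p) {}
    deep_ptr(deep_ptr const &other) : ptr(other.ptr ? new T(*other.ptr) : 0) {}
    deep_ptr &operator=(deep_ptr const &other) {
        if (this != &other) {
            T *copy = other.ptr ? new T(*other.ptr) : 0;  // copy before releasing
            delete ptr;
            ptr = copy;
        }
        return *this;
    }
    ~deep_ptr() { delete ptr; }
    T *get() const { return ptr; }
    T &operator*() const { return *ptr; }
};

// Option 3: transfer ownership. Copying from a named object empties it.
template <typename T>
class transfer_ptr {
    T *ptr;
  public:
    explicit transfer_ptr(T *p = 0) : ptr(p) {}
    transfer_ptr(transfer_ptr &other) : ptr(other.release()) {}
    transfer_ptr &operator=(transfer_ptr &other) {
        if (this != &other) { delete ptr; ptr = other.release(); }
        return *this;
    }
    ~transfer_ptr() { delete ptr; }
    T *get() const { return ptr; }
    T *release() { T *p = ptr; ptr = 0; return p; }
};

// After a deep copy, the two objects own separate ints.
bool deep_copy_is_independent() {
    deep_ptr<int> a(new int(13));
    deep_ptr<int> b = a;
    *b = 42;
    return *a == 13 && *b == 42 && a.get() != b.get();
}

// After a transfer, the source is empty and the copy holds the old pointer.
bool transfer_empties_source() {
    transfer_ptr<int> a(new int(13));
    int *raw = a.get();
    transfer_ptr<int> b = a;
    return a.get() == 0 && b.get() == raw;
}
```

Both check functions return true, and in each case every int is deleted exactly once. With nocopy_ptr, the failing example from earlier simply stops compiling, which is the point.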

In Closing

RAII is a great help in writing elegant, safe code. Thanks to this idea, C++ has no need for—and doesn't have—the finally construction found in many GCed languages, such as Java. It's also safer, as it doesn't require the programmer to explicitly call the cleanup code at the end of each path through a function.
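To see why no finally is needed, here is a small sketch in which a function may or may not throw. The cleanup_guard class and the risky and run functions are hypothetical names for this example; the guard just appends to a log so the order of events is visible.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical guard that records its acquisition and release in a log.
class cleanup_guard {
    std::string &log;
  public:
    explicit cleanup_guard(std::string &l) : log(l) { log += "acquire;"; }
    ~cleanup_guard() { log += "release;"; }
};

// Does some work under the guard, optionally failing part-way through.
void risky(std::string &log, bool fail) {
    cleanup_guard guard(log);
    if (fail) throw std::runtime_error("boom");
    log += "work;";
}   // guard's destructor runs on both the normal and the throwing path

// Runs risky() and returns the sequence of events that took place.
std::string run(bool fail) {
    std::string log;
    try {
        risky(log, fail);
    } catch (std::runtime_error const &) {
        log += "caught;";
    }
    return log;
}
```

run(false) gives "acquire;work;release;" and run(true) gives "acquire;release;caught;". The release happens during stack unwinding, before the handler runs, and risky() contains no cleanup code on either path.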

What's Next

RAII in the form of std::vector does a great job of managing resources when we previously would have needed to new[] and delete[] manually, but there are other ways of storing objects. Luckily, std::vector isn't the only nice data structure that the std::lib provides. There are lots of other Containers as well, for different situations.