C plus plus:Modern C plus plus:Containers

From GPWiki
Jump to: navigation, search

Modern C++ : Going Beyond "C with Classes"

Introduction

The previous article in this series talked about std::vector. Though I used the word "contains" in the output of the demo program, I glossed over the fact that std::vector is a Standard Container. Something is a Standard Container if and only if it satisfies the C++ Standard's requirements for a container. Thanks to templates, these requirements don't involve deriving from any specific base class, unlike, say, Java's CharSequence.

You can find the full list of Container requirements at Roguewave; They're not copied here. One nice consequence of this is that you can write your own containers, but for now, this article covers just the subcategories of Containers and the examples provided by the Standard Library.

Sequence Containers

Vector

std::vector is a Sequence Container, so you can probably make a fairly good guess at what a Sequence Container is. Basically, it just means that the programmer has control over where the elements are. If you insert something in a certain location and the insertion succeeds, it'll be where you asked it to be.

You'll notice that the Container Requirements specify a few optional member functions. A few of them should be familiar from std::vector, such as push_back() and at().

Interestingly, they're only allowed if they can be implemented in (amortized) constant complexity. std::vector has push_back(), as we've seen, but not push_front because to insert at the beginning of a vector requires moving ALL the elements in the vector right by one, which is clearly O(n). push_back on a vector is amortized constant because it doesn't actually allocate only one more, but instead increases the allocated space by some factor, often doubling it. This means that push_backing n elements is O(n), which means that a single push_back is amortized O(1). ( This is the trick behind how std::vector is able to be so much faster than new[] in the example in the Vectors section. The std::vector-using version is O(n) and the new[]-using one is O(n*n). )

Deque

std::deque is a somewhat strange container. It's not used terribly often, but now's a good time to mention it because it includes all of the optional Sequence Container member functions.

std::deque's main claim to fame is that it offers the push_front member function. As such, it's a decent choice if you are using it mostly as a queue ( as it's name, Double-Ended Queue, suggests ) but need more than just the "pure" queue interface offered by the std::queue adapter ( which by default uses std::deque internally anyways ).

Also, since it doesn't need to be stored contiguously in memory reallocations don't have to copy the entire contents, it can be useful if you need a huge container.

List

If you've taken a data structures course, you've undoubtedly heard of linked list. The std::lib's implementation of a doubly-linked list is called std::list.

std::list offers ((push|pop)_)?(back|front), but, unsurprisingly, not operator[] or at, since getting to the nth element requires following n links -- O(n), not O(1).

Because linked lists work by having nodes store pointers to the preceding and following nodes, lists have lots of special properties. Inserting or erasing from any position is O(1). Also, the special splice and merge member functions let you move elements around inside a list in constant time, or even between lists in usually constant time.

Also, the nodal design means that elements stay at the same location in memory until they are erased. Compare the output for std::vector and std::list in the following little program:

#include <iostream>
#include <vector>
#include <list>
 
int main() {
 
    std::list<int> mylist;
    mylist.push_back(1);
    mylist.push_back(2);
    int *list2 = &mylist.back();
    std::cout << "*list2 = " << *list2 << std::endl; 
    mylist.push_back(3);
 
    std::vector<int> myvector;
    myvector.push_back(1);
    myvector.push_back(2);
    myvector.push_back(3);
    int *vector2 = &myvector[1];
    std::cout << "*vector2 = " << *vector2 << std::endl; 
 
    // erase the first element of each
    mylist.erase( mylist.begin() );
    myvector.erase( myvector.begin() );
 
    // what to the pointers point to now?
    std::cout << "*list2 = " << *list2 << std::endl; 
    std::cout << "*vector2 = " << *vector2 << std::endl; 
 
}

Running it you will probably see the following:

$ ./invalidation
*list2 = 2
*vector2 = 2
*list2 = 2
*vector2 = 3

If you think a little about how std::vector ( or dynamic arrays ) work, it should be fairly clear why you end up with a 3 at vector2.

Associative Containers

Unlike Sequence Containers, Associative Containers decide their own ordering. As such, an insert into one does not require that you specify a position. ( It's possible to give a position, but it's treated as a "hint". )

A comprehensive discussion of associative containers ( as with std::list ) requires iterators, so I'll save it to use in examples in the next section.

Set and MultiSet

std::set is usually implemented as a Red-Black tree ( or an AVL tree ). If you don't know what those are, the important part is that it means that the contents are sorted and organized in memory in such a way that insertion, erasure, and lookup can all be done in lg(n) time.

By default, all the elements are sorted into ascending order based on operator<. It's possible to use different orderings, but that discussion will need to wait for a description of functors.

std::multiset is like a std::set except it allows duplicate entries. Note that neither std::set nor std::multiset ever use operator==. Instead, two elements a and b are deemed equivalent if !(a < b) && !(b < a).

Map and MultiMap

std::map<K,D> is essentially just a nice interface to std::set< std::pair<K,D> >, where ordering and equivalence is determined based on the key ( in this case of type K ).

std::map is the most used of the associative containers and also the one for which the name makes the most since -- a map associates a value to a key. Since a std::map cannot have 2 elements with the same key, it has a special syntax ( using operator[] ) to get the value associated with a key. The only thing of which to be wary with this syntax is that the value type needs to be able to be default constructed since operator[] returns a reference to a newly-inserted value if no element with the specified key exists.

Here's a fairly simple example of using a std::map showing the syntax in action:

#include <iostream>
#include <fstream>
#include <ostream>
#include <map>
#include <string>
 
int main() {
 
    std::map< std::string, std::ostream* > output_targets;
 
    std::ofstream outfile( "outfile.txt" );
 
    output_targets["stdout"] = &std::cout;
    output_targets["stderr"] = &std::cerr;
    output_targets["file"] = &outfile;
 
    std::cout << "Please enter some text: ";
    std::string what;
    std::getline( std::cin, what );
 
    std::cout << "Where would you like to write to? ";
    std::string where;
    std::getline( std::cin, where );
 
    std::ostream *os = output_targets[where];
 
    if ( os ) {
        *os << what << std::endl;
    } else {
        std::cerr << "I don't know where '" << where << "' is, sorry.\n";
    }
 
}

This relies on the fact that T() is static_cast<T>(0) when T is a pointer type.

Similarly, std::multimap<K,D> is essentially just a nice interface to std::multiset< std::pair<K,D> >, where ordering and equivalence is determined based on the key ( in this case of type K ). Note that multimaps do not provide operator[] (as maps do) because they can hold any number of pairs with the same key.

Container Adapters

Container adapters are simple wrappers over containers that give a reduced interface to the internal container. The std::lib provides the following:

They're simple enough that just the reference documentation should be enough to understand them. I also haven't found much of an opportunity to use them, but they might be useful if a teach says you have to implement something using only a stack.

std::priority_queue sounds like it could be great for A*, for example, but it doesn't let you go through the contents of the internal container, so unfortunately it's not usable. Luckily, it just uses the Heap Operations included in the standard, so you don't need to re-implement everything.

Pseudo-Containers

If you find yourself writing something that holds elements, it's still a good idea to make it similar to a Container, even if it doesn't quite satisfy all the requirements. The iterator functions are especially useful, as we'll see in the next section.

Here are a few examples:

Choosing a Container

The most common problem that people have after learning about all these Containers is picking one. Luckily, LinuxSoftware.co.nz has a very useful flowchart to help you choose. Most people often also consider Iterator Invalidation, which I'll cover next article.

Even without that, however, it's still possible to give good guidelines about when the different containers are useful.

std::vector vs std::deque

Section 23.1.1 of the ISO C++ Standard reads as follows:

vector is the type of sequence that should be used by default [...] deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.

If you're not sure which container you want, you probably want one of these 2. They're fairly similar, so a number of other people have already compared them. Herb Sutter's GotW #54 and Walter Storm's An In-Depth Study of the STL Deque Container are both interesting, instructive reads.

In Closing

The Standard Library provides many useful, common data structures. Dynamic arrays, linked lists, and balanced binary search trees can cover much of your needs.

Additionally, many other data structures have been implemented as Containers or at least with Container-like interfaces. TR1 includes a hash table which can already be found in a few implementations as std::tr1::unordered_map that is almost certain to make it into the next revision of the C++ Standard. The Boost Multi Array and Boost Multi Index libraries provide respectively a multi-demensional container and one that allows efficient lookup by multiple keys. The Boost Graph library is a remarkable generic library for graphs that also includes many algorithms for working with generic graphs.

What's Next

All these containers are nice, but apart from using templates and classes and overloaded operators to make them more convenient, they're not all that different from things you could write in C. Next time we talk about another brilliant pure abstraction that really lets us start doing cool things: Iterators.