C plus plus:Modern C plus plus:Iterators

From GPWiki
Jump to: navigation, search

Modern C++ : Going Beyond "C with Classes"

Contents

Introduction

The Containers discussion avoids many details because they require iterators. For std::vector it was fine, since random access containers can be used just fine with operator[]. A few slipped in (the erasure from a std::vector, for example), but now it's getting too hard to continue without them.

Iterators are a pure abstraction, so anything that acts like an iterator is an iterator. Because we have templates, iterators don't have to derive from anything, so there's no virtual function calls in them that could make them slow.

Iterator Concept

An iterator is basically just something that lets you move through a set of values.

One kind of iterator that you've probably seen already is a pointer into an array. You can use ++ to advance that pointer to the following element, * it to read or write the element at that location, use - to get the distance between 2 iterators, among other things. In fact, many std::vector implementations use plain pointers as iterators, since std::vector's elements must be contiguous in memory.

Iterator Categories

Pointers are Random Access Iterators. This means that you can do all the things you usually associate with pointer arithmetic, including stepping forward or backward by n in constant time or using operator[]. Random Access Iterators are the most permissive iterators, but there are many times when it's not efficient to provide all those capabilities for an iterator, in which case a different category will be used.

One important thing to remember is that although some categories allow more operations than others or are more restrictive in usage patterns, they're all basically used the same way. You always get or set the value by dereferencing with * and reading or assigning; You can always advance to the next position using ++ (assuming you're not at the end of the sequence).

The simplest iterator category is that of Output Iterators. Output iterators are always valid, can only step forward by one (using operator++), and must be written exactly once at each location (you can't skip places or overwrite the current location). Reading from them is meaningless and usually wont work. Output iterators are often used to "do something" with the value assigned to them. The simplest example is an iterator that writes to output. They could also be used to call a function each time something is written to them.

Analogously, there are Input Iterators. They can only be stepped by one, a must be read exactly once in each position, but are not always valid. Since there is almost always a limited amount that can be read from a source, Input Iterators must be able to be compared with == and !=. Once an input iterator compares equal to its associated end iterator, dereferencing it or continuing to move it is no longer safe. These are just about exactly the complement of Output Iterators: they're just sources instead of sinks. You could use them to read from a data stream or have reading from them return the result of a function.

All Forward Iterators are Input Iterators, but you can read or write to a location as many times as you like. Forward Iterators are not Output Iterators, because they have end iterators. ( It is never safe to write to an iterator that's equal to the end iterator. ) Also, various things can make an iterator invalid, at which point nothing involving the iterator is well-defined. This could happen, for example, if you remove the node in a linked list to which an iterator points. They can, however, be used as Output Iterators if you're careful that there are enough valid locations. The simplest example of a Forward Iterator would be in a singly-linked list where it's easy to iterate forward really slow to iterate backwards.

Bidirectional Iterators are Forward Iterators, with the added bonus of being able to use -- to move backwards in the sequence. Moving an iterator before the beginning of the range is undefined. The simplest example here is an iterator for a doubly-linked list.

Random Access Iterators are Bidirectional Iterators that let you move around by more than one position at once using +=, -=, +, and -. They are also the only ones that can be compared with ordering relational operators such as <, that act like arrays with operator[], and can be subtracted to get the distance between them in constant time.

Roguewave has a good page with the specific iterator requirements, if you're interested in the precise details.

Containers and Iterators

Part of the requirements for a Container is that it provide an iterator interface. This means that, no matter what the container, it can always be looped through in the same way.

There are member iterator and const_iterator types, and begin() and end() functions that return the beginning and end of the iterator range. For a Container c, c.begin()==c.end() if and only if c.empty() is true.

iterator lets you modify the element to which it referrs, while const_iterator ( which should really be called iterator_to_const ) does not. begin() and end() will return const_iterators if the object on which they are called is const, or plain iterators otherwise. You should prefer const_iterators if you don't need to modify the contents of the container through them, but remember that insert, erase, and most other member functions only accept normal iterators.

You can also construct all sequence containers and most associative containers from an iterator range. There are also insert() and erase() member functions that let you insert or erase many elements into or from a container at once, usually at significantly improved efficiency over repeatedly calling the single-element versions.

Here's a short program demonstrating the iterator range constructor and a templated function to demonstrate the consistant access through iterators.

#include <iostream>
#include <ostream>
#include <vector>
#include <deque>
#include <set>
#include <list>
 
template <typename Container>
std::ostream &show_elements( std::ostream &sink, Container const &c ) {
    for ( typename Container::const_iterator i = c.begin(); i != c.end(); ++i ) {
        sink << *i << ' ';   
    }
    return sink;
}
template <typename T, unsigned N>
std::ostream &show_elements( std::ostream &sink, T (&c)[N] ) {
    for ( T const *i = c; i != c+N; ++i ) {
        sink << *i << ' ';   
    }
    return sink;
}
 
int main() {
    using std::cout;
    using std::endl;
    int array[9] = {17,1,7,13,5,2,11,3,19};
    show_elements( cout, array ) << endl;
    std::vector<int> vector( array, array+9 );
    show_elements( cout, vector ) << endl;
    std::deque<int> deque( vector.begin(), vector.end() );
    show_elements( cout, deque ) << endl;
    std::set<int> set( deque.begin(), deque.end() );
    show_elements( cout, set ) << endl;
    std::list<int> list( set.begin(), set.end() );
    show_elements( cout, list ) << endl;
}

Obviously, a std::vector::iterator works very differently internally from a std::list::iterator, but thanks to the magic of templates and operator overloading, show_elements doesn't care. The only reason that the overload was needed is that arrays are not Containers. ( The array line could, using <boost/array.hpp>, be switched to boost::array<int,9> array = {17,1,7,13,5,2,11,3,19}; in which case the show_elements overload wouldn't be needed, but this way doesn't need you to get boost and shows how plain old pointers can act as iterators. ) Note that the construction from an iterator range doesn't care what type of iterator it gets, so long as it's an Input Iterator.

Iterators are how you specify positions for insertions, erasures, and just about anything else with containers. They're also returned by Associative Containers' find member function. If the requested element is not found, the container's end() iterator is returned. If find returned a reference, it would be forced to throw if the requested element was not found, which isn't done because it's fairly common to need to look whether an element is in a container or not and needing a try/catch to do that would be a nuisance and often quite slow.

One thing of which to be wary: Associative Containers set the ordering of their elements, so even their plain iterators wont let you change the keys. For example, when dereferenced, std::set<T>::iterator returns a T const & when dereferenced and std::map<K,V>::iterator returns a std::pair< const K, V >. If you need to modify a key, make a local copy of it, remove the copy in the Container, modify the local copy, and then reinsert it.

Iterator Invalidation

Interestingly, the Standard Containers are not defined in the standard by their internal representation. All the standard mandates are complexity guarantees for the various operations and the conditions for something known as iterator invalidation.

Iterators aren't magical. C++ is a language designed for speed that assumes the programmer knows what's going on, so you can have "dangling iterators" the same way you can have "dangling pointers". Iterator invalidation specifies the conditions under which iterators are no longer guaranteed to be legal, at which point they can no longer be used.

These conditions are usually exactly what you'd expect if you were implementing a dynamic array, linked list, binary tree, or whatever yourself, so a bit of thought means it's not too tough to choose a container offering what you need and then using it safely. However, it's still a good idea to use a checked Standard Library implementation when debugging, which adds in extra code to check that you're not using invalidated iterators, comparing iterators into different containers, specifying an insertion position with an iterator into a different container, or many other possible bugs. If you're using a recent version of g++, you can get a checked implementation of the Standard Library by defining _GLIBCXX_DEBUG. I've included an appendix demonstrating the usefulness of a checked std::lib implementation.

Your C++ book should have a full description of the invalidation conditions for nearly every container mutating function, but here's a summary of the most important ones.

std::vector

Anything that causes a reallocation (increase in the result of capacity()) will invalidate all iterators. Some examples include a push_back when size() == capacity(), resize(n) with n > capacity(), inserting m elements such that size()+m > capacity().

If reallocation does not occur, insertion and erasure still invalidate all iterators after the insertion or erasure position. This is the cause of the difference between list and vector in the invalidation example back in the list section of the Containers article.

Finally, the obvious, universal one: iterators to elements that no longer exist are not valid. This would happen after a clear(), a pop_back(), or for all elements at index >=n after a resize(n)

std::list

List is nice in that the only invalidation condition is the universal one. Even if you splice a range into a different list, iterators to elements in that range are still valid.

Special Iterators

Reverse Iterators

The simplest special iterator is reverse_iterator. It's a template that takes a Bidirectional or Random Access iterator and wraps it into one of the same category for which moving works backwards. Using ++ on a reverse iterator performs a -- on the wrapped iterator, for example.

Most Sequence Containers are also Reversible Containers, which means that they have reverse_iterator and const_reverse_iterator member types, as well as rbegin() and rend() member functions to get them. If show_elements were modified to use const_reverse_iterator and rbegin() and rend(), the elements would be printed in reverse order.

The one catch with reverse iterators is that their base() member function does not return the iterator you'd initially expect. Dereferencing a reverse iterator rit is like dereferencing rit.base()-1 ( although it works without using - so that it works for Bidirectional Iterators too ). It is logical, though: If *rit were like *(rit.base()), then what would be the base iterator for something returned by rend()? In general there's no valid iterator before the first. ( Think of a linked list for example. ) As a happy consequence, this means that a range of reverse iterators [a,b) is the same range as [b.base(),a.base()).

Stream Iterators

Iterators don't only come from containers, however. Iterators are also used for much more abstract sequences.

One of the most common non-container iterators is std::ostream_iterator, which is found in <iterator>. It's an Output Iterator, and writing to it is like using << to write to the ostream. Using it, show_elements could be rewritten as follows:

template <typename Container>
std::ostream &show_elements( std::ostream &sink, Container const &c ) {
    std::ostream_iterator< typename Container::value_type > out_iter( sink, " " );
    for ( typename Container::const_iterator i = c.begin(); i != c.end(); ++i ) {
        *out_iter++ = *i;   
    }
    return sink;
}

There is also an analogous std::istream_iterator, reading from which is like using >> to read from the stream.

Insertion Iterators

As mentioned above, Output Iterators are always valid. That probably seems unfeasable, if you're used to using std::strcpy and such where you need to be very careful about the size of your destination range. The std::lib's solution to this are the Insert Iterators.

Insert Iterators are special iterators than, when "assigned", actually call a function to insert the value into a container. The three provided by the std::lib are the following, all of which are templated on the Container type:

The front and back versions call push_front or push_back respectively, and insert_iterator calls insert, which means that you also need to pass its constructor the position. They also provide (back_|front_)?inserter() functions that deduce the type based on the argument, which are very useful for Algorithms.

In Closing

Iterators are a wonderful abstraction of pointers. Now we can loop though a container without needing to know if the next element is just a memory location away or whether we need to follow a next pointer or where we are in an in-order tree traversal.

What's Next

Did you notice how much the new show_elements looks like an implementation of std::memcpy, just using iterators instead of plain pointers? If so, good work—the real power of iterators will be more evident next time when we talk about Algorithms.

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox