C plus plus:Modern C plus plus:Algorithms

From GPWiki
Jump to: navigation, search

Modern C++ : Going Beyond "C with Classes"

Introduction

You may have noticed that sequence containers don't have member functions for looking though them for a specific value. That's because, thanks to the iterator abstraction, there's no need for one -- a non-member function that works on iterators is just as efficient and means that the code doesn't need to be duplicated in every container.

In the context of the std::lib, an Algorithm is basically just any function that does work on or to data specified by iterator range(s).

Basic Algorithms

The end of the Iterators section mentions that the second version of the show_elements function looks very much like a memcpy. Since it's the simplest algorithm, it's a good place to start. If you were going to write it yourself, it might look something like the following:

template <typename InputIterator, typename OutputIterator>
OutputIterator iterator_copy(InputIterator begin, InputIterator end,
                             OutputIterator sink) {
    while ( begin != end ) *sink++ = *begin++;
    return sink;
}

The one thing you might have done differently is put the OutputIterator as the first parameter, in order to match the ordering on the C memcpy and strcpy functions. The code is in the above order because the Algorithms that are part of the std::lib ( which I might call std::algorithms ) all have the source range as the first two parameters. It could have been done differently, but this is the way that was chosen, so it's best to stick with it. This way it's "copy (input range) to (output range)". It also means that for just about every std::algorithm, the first two parameters define the iterator range.

Since this is such a useful, basic function, it's in <algorithm> with the name std::copy. The standard version will probably not look exactly like the above code, however. It'll probably be written in a form that's easier for the compiler to optimize. Also, thanks to the magic of overloading, it might also be implemented to call std::memcpy for cases in which the InputIterator and OutputIterator are both pointers to POD types.

This means that std::copy is a perfect choice for all your copying needs. If they can, most implementations will use std::memcpy ( or std::memmove ) where possible to be as fast as possible. But if you use it on a range that's not std::memcpyable ( such as a range of std::strings ), then it'll do the right thing and do element-by-element copy.

Non-Mutating Algorithms

A Non-Mutating Algorithm is basically one that never writes to its iterators.

The simplest example of a Non-Mutating Algorithm is std::find. It would also be fairly trivial to write yourself:

template <typename InputIterator, typename Type>
InputIterator find(InputIterator begin, InputIterator end,
                   Type const &value) {
    while ( begin != end ) {
        if ( *begin == value ) return begin;
        ++begin;
    }
    return begin; // same as return end
}

Note how the "end" iterator represents "not found" ( you might use a null pointer for this in C ). Having this unique value for "not found" is one of the advantages of using half-open ranges.

All Algorithms work on so-called half-open ranges. In interval notation, you would say that the valid elements in the range are [begin,end), meaning that begin refers to a valid one and end does not. This has many advantages:

  • A unique "failure" value: end ( as mentioned above )
  • A convenient length: there are end-begin elements in the range [begin,end).
  • A simple way to represent empty ranges: [begin,begin) has no elements in it. ( If the ends were included, something along the lines of [begin,begin-1] would have to be used for empty ranges. )
  • A trivial termination condition. "If the traversal iterator equals the end, we're done" is much easier to code than "if the traversal iterator equaled the end one loop iteration ago, we're done".

There are a number of useful non-mutating algorithms, so here's a little program to demonstrate a few:

#include <iostream>
#include <algorithm>
#include <numeric>
#include <cstdlib>
#include <ctime>
#include <vector>
 
int main() {
 
    std::srand( std::time(0) );
 
    std::vector<int> v;
    std::vector<int>::iterator vi;
    for ( unsigned i = 0; i < 10; ++i ) {
        v.push_back( std::rand()%10 );
    }
 
    std::cout << "The smallest element is "
              << *std::min_element( v.begin(), v.end() )
              << std::endl;
 
    std::cout << "The largest element is "
              << *std::max_element( v.begin(), v.end() )
              << std::endl;
 
    vi = std::find( v.begin(), v.end(), 5 );
    if ( vi != v.end() ) {
        std::cout << "There is a 5 in position "
                  << ( vi - v.begin() )
                  << std::endl;
    } else {
        std::cout << "There is no 5"
                  << std::endl;
    }
 
    std::cout << "The average of the elements is "
              << std::accumulate( v.begin(), v.end(), 0.0 ) / v.size()
              << std::endl;
 
    std::cout << "There are "
              << std::count( v.begin(), v.end(), 0 )
              << " zeros"
              << std::endl;
 
}

The output will be something along the lines of the following:

The smallest element is 0
The largest element is 6
There is a 5 in position 3
The average of the elements is 3.5
There are 1 zeros
The smallest element is 0
The largest element is 9
There is no 5
The average of the elements is 4.2
There are 2 zeros

The one thing there that needs further explanation is the third parameter of <numeric>'s std::accumulate. That parameter is the initial value of the accumulator and also defines the return type of the function. Here, 0.0 is used to accumulate into a double, avoiding int overflow and making it not use integer division when finding the average.

One important thing to remember is that these algorithms use operators, so if you've overloaded some operators, you can use them to do rather surprising things:

#include <iostream>
#include <vector>
#include <string>
#include <numeric>
 
int main() {
 
    std::vector<char const *> v;
    v.push_back( "how " );
    v.push_back( "are " );
    v.push_back( "you?" );
 
    std::cout << std::accumulate( v.begin(), v.end(), std::string("Hello, ") )
              << std::endl;
 
}

Which has the following output:

Hello, how are you?

Mutating Algorithms

There are many more Mutating Algorithms than Non-Mutating ones, since they actually do things to the elements. std::copy is technically a Mutating Algorithm since it writes to the output iterator, but it's not a very interesting one.

One common example of a Mutating Algorithm is std::sort. Its two parameters are the iterator range on which it works, but unlike the other std::algorithms we've seen so far, it requires Random Access Iterators. The actual algorithm used isn't specified, but it's usually some form of quicksort. ( Roguewave's implementation uses introsort, a form of quicksort that switches to heapsort for inputs that are troublesome for a pure quicksort. ) In any case, you can safely treat it as an O( nlgn ) average-case sort.

One very cool thing about std::sort is that it is actually often faster than C's qsort. Thanks to overloaded operators, std::sort just uses <, which means that there is no function pointer needed. It's very hard for a compiler to inline a call to a comparison function through a function pointer, but for < between integers or an overloaded operator< it's usually not all that hard.

One of the most useful mutating operation is std::remove. The catch with it, however, is that std::remove doesn't actually remove anything. This is rather surprising at first, but think about it: Do iterators have any function that tells you what container they're in? Do they have any function that lets you delete them? Nope. All std::remove does is reorder the elements, returning an iterator to the first one "removed". This lead to the Erase-Remove Idiom:

std::vector<int> v;
put_a_bunch_of_stuff_in( v );
v.erase( std::remove( v.begin(), v.end(), 42 ), v.end() );

If there are no 42s to remove, then the remove returns v.end() and the erase does nothing. If there are 42s, then they get moved to the end of the range and the erase removes them.

Since you can't rearrange elements in Associative Containers, they have special erase member functions that remove all entries with a specified key. Also, Associative Containers have better-than-linear lookup time for elements, so these are much faster than could be done with iterators.

Roguewave's Algorithms documentation is nice enough to give examples and descriptions of the various algorithms, so instead of descriptions, here are a few highlights:

Writing Your Own

It's fairly straight forward to write your own algorithms. The only thing you need to be careful of is what type or iterators you require. Let's start by making an iterator-based version of show_elements, which is more useful anyways since it means you can display subranges and not only entire containers. To keep things simple, it'll always show the elements to std::cout with a space after each element:

template <typename InputIterator>
void show_range(InputIterator begin, InputIterator end) {
    for ( ; begin < end; ++begin) std::cout << *begin << ' ';
}

Fairly obvious, but something should jump out at you. Take a moment to look at the code and maybe try it with a few different containers.

Found the problem?

The templated iterator type InputIterator denotes that that's all this function should require from the iterators, but the template parameter name makes no difference. It uses < to compare the iterators, but only Random Access Iterators are Less-Than-Comparable, so there's a rather severe unnecessary restriction. A better version would use !=:

template <typename InputIterator>
void show_range(InputIterator begin, InputIterator end) {
    for ( ; begin != end; ++begin) std::cout << *begin << ' ';
}

Which can be used with just Input Iterators. Try, for example, opening a text file in a std::ifstream called input_file and running show_range( std::istream_iterator<std::string>(input_file), std::istream_iterator<std::string>() );.

That looks pretty good so far, but there's still that explicit loop. Algorithms are a great way to get rid of loops and make your code cleaner. Of course, algorithms will usually have an explicit loop inside them, but in this case all we're doing is a wrapper, so we can use std::copy and std::ostream_iterator to get rid of the explicit loop:

template <typename InputIterator>
void show_range(InputIterator begin, InputIterator end) {
    std::copy( begin, end,
               std::ostream_iterator< typename InputIterator::value_type >( std::cout, " " ) );
}

This version will work for most container iterators and all special iterators. There's one hidden problem, however: Pointers are iterators, but wont work with this function. Obviously, if InputIterator is, for example, an int*, you can't use :: on it. To get around this problem, there's std::iterator_traits which is partially specialised for pointers and pointers-to-const. Using it, we get a fully working version:

template <typename InputIterator>
void show_range(InputIterator begin, InputIterator end) {
    std::copy( begin, end,
               std::ostream_iterator< typename std::iterator_traits<InputIterator>
                                                  ::value_type >( std::cout, " " ) );
}

Now would be a good time to try writing a few of your own. If you're short on ideas, try writing a few of the simple sorts as algorithms using other algorithms. Selection sort can be done particularly elegantly; Algorithm:CustomSorts contains an implementation of it and some others to compare against.

In Closing

Algorithms are a great tool. They make code more readable, eliminate opportunities for little errors when typing up loops, and can even speed up code ( since the implementation will often optimize them ). They work on iterator ranges so they aren't limited to specific situations, greatly reducing the amount of code you need to write.

What's Next

If you looked at the list of Algorithms, you probably noticed a bunch of strange Algorithms whose names end in _if that need "predicates" or strange ones such as std::transform that don't seem to really do anything specific. They're best introduced in conjunction with Functors. Together, they let you far beyond the limits of procedural or object-oriented programming and let us write C++ programs that look almost like functional ones.