Transform each element of a container by passing it through a function and putting the result into another container
So far we’ve only looked at one algorithm – std::for_each. That’s allowed me to lay the groundwork for using lambdas and std::bind, but there are many more algorithms avilable in the standard library. In fact, in this thread , John Potter refers to std::for_each as “the goto of algorithms” (and that is far from the worst thing he says about std::for_each).
You can use std::for_each to simulate pretty much every algorithm, but as usual, just because you can do something doesn’t mean that you should. There are more specific algorithms we can use to express our intent directly, and being able to express our intent directly when we’re writing code is a Good Thing.
In this installment we move on from just calling a function with each element, now we’re going to call a function, get a result back from that function, then store that result.
Our declarations have a new function – f_int_T
– a function that takes an object of type T
and returns an integer.
class T { ... }; std::vector< T > v_T; std::vector< int > v_int; int f_int_T( T const& t );
Here’s the old way of solving our problem:
Solution #0
std::vector< T >::iterator i; std::vector< int >::iterator j; for( i = std::begin( v_T ), j = std::begin( v_int ); i != std::end( v_T ); ++i, ++j ) { *j = f_int_T( *i ); }
I am assuming that our output container v_int
has sufficient size for each element being assigned to it. It is easy to change this code to call push_back
on the output container if necessary.
We now have two iterators – one for the source range and one for the output. We call the function f_int_T
for each element of the source and we write the result of the function to the output. We don’t need to explicitly use the end of the output range – std::end( v_int )
doesn’t appear anywhere in the example.
We can implement the same algorithm using range-based for:
Solution #1
std::vector< int >::iterator j( std::begin( v_int ) ); for( T const& t : v_T ) { *j++ = f_int_T( t ); }
Using range-based for works, but we have to increment the output iterator j
inside the body of the loop. This example would be simpler if push_back
was appropriate.
So let’s introduce std::transform and see what we get:
Solution #2
std::transform( std::begin( v_T ), std::end( v_T ), std::begin( v_int ), []( T const& t ) { return f_int_T( t ); } );
The lambda function is overkill, but I wanted to reinforce the fact that lambdas are functions, which means they can have a return value. In this case because the lambda consists of a single return statement the compiler will automatically deduce the type of the return value (there is a way of explicitly specifying the type of the return value but I am not going into that here).
Here’s something interesting. The blocks of code in solutions #0 and #1 look like this:
{ *j = f_int_T( *i ); }
{ *j++ = f_int_T( t ); }
There’s an assignment in both of them and they both dereference the output iterator. The range-based for version has to increment the output iterator itself. Contrast that with the block of code in the solution #2, the lambda function:
{ return f_int_T( t ); }
The code inside the lambda is doing nothing other than calling f_int_T
and returning the result. All of the iteration, the dereferencing, the assignment is being handled by the std::transform algorithm.
As I said, the lambda is overkill, we can just use the function directly as an argument:
Solution #3
std::transform( std::begin( v_T ), std::end( v_T ), std::begin( v_int ), f_int_T );
This solution makes the intent of the code clear. We have the algorithm completely separated from the action to take each time through the loop. When I look at this code and see std::transform I know how the iteration will work. I know that the number of elements placed into the output will be the same as the number of elements in the source range. I know that no elements in the source range will be skipped. Short of exceptions I know that there will be no early exit from the loop. (Many of these arguments can be applied to std::transform with a lambda function, but I think the intent is even more pronounced here).
The function that does the transformation is entirely separate and can be tested in isolation (this is the advantage of the named function over a lambda). We have separated things that used to be commingled.
We can simplify things even more using adobe::transform:
Solution #4
adobe::transform( v_T, std::begin( v_int ), f_int_T );
Solution #4 strips things down to their basics – we have our source, the place where the output will go and what we’re going to do to each element. There is almost no boilerplate.
Incidentally, the output range can be the same as the source range. If the function changes the value but not the type of its argument (or the return type of the function can be converted to the argument type), we can run std::transform to change every element of a container in place.
I have chosen a simple example to illustrate the use of std::transform. The function we are calling is a non-member function and has a single parameter of the correct type. Refer to part 2 and part 3 of this series for more information about using std::bind to call more complex functions.
Variation #1
My examples all assumed that we had enough space in the output container, but what if we don’t? Perhaps we are trying to build the output container from scratch or append elements to an existing container. We don’t want to have to resize the container first and incur the cost of constructing default constructed objects that are going to get overwritten. We won’t be able to resize it at all if the type we are constructing doesn’t have a default constructor, or if we don’t know how many elements we are going to put into it (e.g. we are using input iterators). What if we want to use push_back
on every new element?
Solutions #0 and #1 are easy. Because they handle the dereferencing and assignment within their loop body it is easy to change them to use push_back
. E.g.:
Solution #0.1
std::vector< T >::iterator i; for( i = std::begin( v_T ); i != std::end( v_T ); ++i ) { v_int.push_back( f_int_T( *i ) ); }
In fact this is simpler than our original solution #0 since we don’t need an iterator for the output at all.
If we want the effect of push_back
when we use std::transform (or any other algorithm that writes to iterators) we use an insert iterator. An insert iterator looks like an iterator (i.e. it has the operations we would expect from an iterator – although many of them have no effect for an insert iterator) but when we assign to it, it will call a function on the container – in our case we want it to call push_back
.
It’s easier to show than talk about (see Josuttis “The C++ Standard Library” 2nd edition for a proper description):
adobe::transform( v_T, std::back_inserter( v_int ), f_int_T );
The iterator created by std::back_inserter will call push_back
every time it is assigned to. The standard library also contains std::front_inserter which will call push_front
, and std::inserter which calls insert
.
One drawback of push_back
is that it might cause the output container’s memory to be reallocated and all of the container’s contents copied over – in fact depending on the reallocation strategy it might have to reallocate several times as elements are appended. Assuming that we know the ultimate size of the output container we can call reserve
which will set aside the appropriate amount of memory but without default constructing any additional elements. This lets us use push_back
and know that there will only be a single memory reallocation.
Variation #2
What if we don’t necessarily want to process every element? What if we want something like this:
std::vector< T >::iterator i; std::vector< int >::iterator j; for( i = std::begin( v_T ), j = std::begin( v_int ); i != std::end( v_T ); ++i, ++j ) { if( somePredicate( *i ) ) { *j = f_int_T( *i ); } }
Some of the algorithms have “if” versions – we pass in a predicate and their action is only taken if that predicate returns true. Among others, std::copy, std::remove and std::count all have “if” variants that take predicates. There isn’t a transform_if
provided in the standard, but it isn’t difficult to write one:
template<
class InputIterator,
class OutputIterator,
class UnaryOperation,
class UnaryPredicate >
OutputIterator transform_if(
InputIterator first,
InputIterator last,
OutputIterator result,
UnaryOperation op,
UnaryPredicate test)
{
while( first != last )
{
if( test( *first ) )
{
*result++ = op( *first );
}
++first;
}
return result;
}
[ Edit: Guvante points out in the comments here that my implementation of transform_if
does not match the for_each
version. ]
To my mind, this is the real beauty of the STL – as well as providing a large set of containers and algorithms, it is extensible. The framework it provides makes it possible to add new algorithms or containers which interact seamlessly with existing parts of the STL. transform_if
is a new algorithm, but because it follows existing practice it doesn’t take a lot of additional work to understand it.
I’m enjoying your blog.
My frustration with std::transform is that it’s only one-to-one. Even your transform_if is one-to-one for a subset of the input.
But many real-life problems are any-to-any. Consider Unicode normalization, or compression, or decompression: You read n items and output m items. I’ve yet to see a good way to express those kinds of transformations without a raw loop, but I’m hopeful you’ll prove me wrong.
One thing I’ve been experimenting with lately is putting more smarts into the iterator classes. I have an application that is doing some parsing “by hand” (as opposed to using a parser generator, and yes, it may well make more sense to use a parser generator in the long run), and needs to compress all runs of whitespace into a single space character. I have a class called
SpaceCompressorIterator
(suggestions for a better name are welcome). When an object of this class is created you give it an iterator to your source sequence.SpaceCompressorIterator
has anoperator ++
which skips over runs of whitespace characters, and anoperator *
which returns either a single space (if there’s a run of whitespace), or the appropriate character from the source sequence. Once call toSpaceCompressorIterator::operator ++
might well lead to several calls ofoperator ++
on the source sequence.This works (at least for the limited cases I am using it for) but there are a couple of oddities. The constructor of
SpaceCompressorIterator
needs to take source iterators to the beginning and end of the source sequence – the code that skips blocks of whitespace has to know when to stop reading from the source sequence.SpaceCompressorIterator
itself is either an input iterator or a forward iterator depending on the type of iterator it is initialized with. If the source iterator is an input iterator thenSpaceCompressorIterator
, if the source iterator is a forward iterator or higher,SpaceCompressorIterator
is a forward iterator. Having said that, if the source iterator is bidirectional I am sure I could makeSpaceCompressorIterator
bidirectional, it would just take a little work to reverse the whitespace-skipping algorithm. I cannot makeSpaceCompressorIterator
random access even if the source iterator is random access – there is no way to implementoperator -
without iterating over the entire sequence.Enough talk, here’s some code. This is taken from my test suite so it is using std::string as the source, but I can also use iterators from a file:
Yes, I’ve done something almost identical for iterating over UTF-8 encoded text. Dereferencing the iterator gives a current code-point value, and incrementing advances 1 to 4 bytes as appropriate. It suffers from the same problems: To instantiate it, you need the range rather than just the beginning of the sequence, reverse iteration requires more work, and random access is impossible. In my mind, it’s more of an adapter than an iterator.
I can certainly do useful and interesting things with it. For example, I can use it with std::transform to convert UTF-8 text to UTF-32 text. It solves the any-to-one problem, but I’ve found it difficult to go the other way (one-to-any) at least in a way that can actually drop into the standard algorithms without surprising constraints.
Thanks for sharing your thoughts.
Hi Bob – Enjoying your writings on C++ – came here from a reference on Eric Lipperts blog. The thread in which John Potter argues about for_each was a blast from the past, for sure! I know the topic sounded familiar – I’d forgotten that I’d participated in that thread over 10 years ago (participated in an admittedly small and inconsequential way).
The transform_if example is not consistent, the template version only increments result when the predicate succeeds, while the for version always increments.
Thank you for pointing that out – I will add a note to the post making it clear that they do not have identical functionality.
Hello,
Nice article, I am wondering about how to catch invalid_argument exception in using std::transform that can be thrown by the function like following loop does;
for (const auto & element: container) {
try {
newContainer.push_back(some_func(element));
} catch (std::invalid_argument& e) {
log the exception info
}
}
Hi Akhtar,
I don’t think it is possible to do exactly what you want with std::transform. The whole point of std::transform is that it will write one element to the output iterator every time around the loop (as I noted above, it might consume more than one element for every iterator but it will always write exactly one element).
That leaves us with
transform_if
which doesn’t force us to write an element to the output for each iteration. Where I think we run into trouble is that in your example the “if” predicate depends on whether we’ve actually been able to compute the transformed value – i.e. the “transform” function and the “if” function are connected. I see several possibilities:true
and throws away the computed value. If the predicate fails it returnsfalse
. There are obvious performance implications here – in every case that succeeds we calculate the transformed value twice. Depending on the cost of the transformation that might or might not be a problem.Given the information you’ve presented, I like #3 – just write out the loop without using standard algorithms. I’m a big fan of the standard algorithms, but sometimes they just aren’t the most appropriate thing to use.