C++ Lookup Mysteries

Sven Rosvall

One day, my friend Tommy asked me why his C++ code failed. He wanted to print out a number of objects (of his own class) to a stream. It worked well with a plain for-loop and an output operator (<<), so he knew that his output operator for the class worked as intended. But when he used std::copy() and std::ostream_iterator it failed. He wanted to "go STL" because everyone, myself included, was telling him how great the STL is.

It took us a while to figure out what was wrong, and it led us into the darker corners of the inner workings of C++. It was an interesting experience and one that I would like to share.

This article investigates function lookup in C++ and also suggests what to do when you want several different output formats while still using output operators and the STL.

The Code

Tommy used a class developed for a toolbox. This toolbox was declared inside a namespace, following project guidelines to avoid name collisions. Namespaces were considered good and were used a lot throughout the project.

Tommy followed the guidelines and put the toolbox client code in a different namespace. There he needed to print out objects of this new class stored in a container. For this purpose he wrote an output operator and a function that iterates over the container. His code was something like this:

namespace Client
{
  std::ostream &
  operator<<(std::ostream &os,
             Tools::Spanner const & s)
  {
    os << "Spanner{ID=" << s.getID()
       << ", gapSize=" << s.getGapSize()
       << "}";
    return os;
  }

  void printSpanners(std::ostream & os,
                     Tools::Toolbox const & tb)
  {
    for (Tools::SpannerCollection::const_iterator sit = tb.getSpanners().begin();
         sit != tb.getSpanners().end();
         ++sit)
    {
       os << *sit << "\n";
    }
  }
}

This code worked nicely. He then introduced some STL-isms and rewrote the printing function to use std::copy() and std::ostream_iterator. The two are often shown together in C++ books to demonstrate the power and flexibility of the STL. A std::ostream_iterator is an output iterator and is used with algorithms in the same way as any other output iterator. It is instantiated with the type of the objects it is to print and constructed with the output stream to write to. The constructor also takes an optional second argument, a delimiter string that is written after each printed object. Every time an object is assigned through a dereferenced std::ostream_iterator, that object is written to the stream using the output operator defined for its type.
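As a minimal sketch of how the pair behaves with a built-in type, the following writes a vector of int to a string stream. The helper name joinWithIterator is made up for this illustration and is not part of Tommy's code.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes every element of `values` to a string,
// each one followed by the delimiter ", ".
std::string joinWithIterator(std::vector<int> const & values)
{
  std::ostringstream os;
  // Each assignment through the iterator writes the value to os with the
  // output operator for int, followed by the delimiter.
  std::copy(values.begin(), values.end(),
            std::ostream_iterator<int>(os, ", "));
  return os.str();
}
```

Note that the delimiter also follows the last element, so it behaves as a terminator rather than a separator.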

The rewritten printing function looked something like this:

void printSpanners(std::ostream & os,
                  Tools::Toolbox const & tb)
{
   std::copy(tb.getSpanners().begin(),
             tb.getSpanners().end(),
             std::ostream_iterator<Tools::Spanner>(os, "\n"));
}

Nice simple code, except that it didn’t compile. The compiler could not find an appropriate output operator, and its error message was not very helpful. It said that it could not find the output operator, but gave few clues as to what it was looking for or why it could not find the output operator shown above.

Tommy was very puzzled. He knew that an output operator existed, as he had used it successfully just a minute ago. He tried to move the output operator to the global namespace to make sure that it would be visible, but this did not work either.

When neither Tommy nor his colleagues could figure this out, he lost his enthusiasm for the STL. When we met again, he was very quick to vent his frustration with the STL in front of everyone around. I was puzzled too when I heard this story, and of course I tried to defend C++ and the STL. But was the problem with the compiler, with the C++ standard, or with something in his code?

I asked him to come up with a small example, but he said there was too much code involved and too little time to reduce the code bit by bit while preserving the symptoms. Instead we had a discussion on how the code looked and we came up with the example above. We ran it through a couple of compilers and got similar error messages from all of them. So we could probably not blame the compiler. But what was wrong?

The C++ Lookup Rules

Now that I had a code example, I could play with it a bit more and read the standard thoroughly.

During lookup, operators are treated like any other function; they just have a special name. The rules for finding unqualified functions and operators have two main parts. Firstly, the nearest enclosing namespace is searched for 'entities' with the same name, then the next enclosing namespace, and so on outwards. Note that the search stops as soon as a name is found. A function in an enclosing namespace is ignored even if the name that was found cannot be called with the given arguments, or is not even a function. Thus:

namespace A
{
  void f(int);
  void g(int);
 
  namespace B
  {
    void f(double);      // hides A::f(int)
    void g(const char*); // hides A::g(int)

    void caller()
    {
      f(1); // calls A::B::f(double)
      g(1); // error: cannot convert '1' to a 'const char*'
    }
  }
}

In this example we see that A::B::f(double) hides A::f(int) and is thus the only function considered in the first call. The int argument can be converted to double so this call is legal. In the same way, A::B::g(const char*) hides A::g(int). But the int argument in the second call cannot be converted to a pointer and the call is illegal. Note that A::g(int) is not considered at all, even though A::B::g(const char*) cannot be used in the call.
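The hiding can be observed in a small runnable sketch in which each function returns its own name. Namespace C and the using-declaration in it are additions for contrast and are not part of the example above: a using-declaration is one way to bring the outer function back into consideration.

```cpp
#include <string>

namespace A
{
  std::string f(int) { return "A::f(int)"; }

  namespace B
  {
    std::string f(double) { return "A::B::f(double)"; } // hides A::f(int)

    // Lookup stops in B, so the int argument is converted to double.
    std::string callHidden() { return f(1); }
  }

  namespace C // hypothetical namespace, added for contrast
  {
    using A::f; // makes A::f(int) visible alongside C's own f
    std::string f(double) { return "A::C::f(double)"; }

    // Both overloads are found in C, and f(int) is the exact match.
    std::string callWithUsing() { return f(1); }
  }
}
```

Calling A::B::callHidden() yields "A::B::f(double)", while A::C::callWithUsing() yields "A::f(int)".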

In addition to the search of the current and enclosing namespaces, functions with the same name are looked up in the namespaces associated with the types of the arguments to the function. This second part is called argument-dependent lookup (a.k.a. ADL or Koenig lookup). Consider:

class X {};
void f(const X &);

namespace A
{
  class Y : public X {};
  void f(const Y &);
}

void caller()
{
  A::Y y;
  f(y);    // calling A::f(const Y &);
}

Both functions f(const X &) and A::f(const Y &) are found by the lookup rules and considered for overload resolution. f(const X &) is found by looking at the nearest namespace and A::f(const Y &) is found using argument-dependent lookup. The argument y has a type defined in namespace A, where the function A::f(const Y &) is found. Overload resolution looks at both functions and chooses A::f(const Y &) as the better match.
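The same example can be made runnable by letting each function return its own name. In this sketch the namespace Other and the two caller functions are illustrative additions. The second caller puts the function name in parentheses, which suppresses argument-dependent lookup, to show that A::f is found only through ADL.

```cpp
#include <string>

class X {};
std::string f(X const &) { return "::f(const X &)"; }

namespace A
{
  class Y : public X {};
  std::string f(Y const &) { return "A::f(const Y &)"; }
}

namespace Other // hypothetical namespace with no f of its own
{
  // Ordinary lookup reaches the global f. ADL adds A::f because the
  // argument type A::Y is declared in namespace A.
  std::string caller()
  {
    A::Y y;
    return f(y); // A::f(const Y &) is the exact match
  }

  // Parentheses around the name switch ADL off, leaving only ::f.
  std::string callerWithoutADL()
  {
    A::Y y;
    return (f)(y);
  }
}
```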

So, in the function printSpanners(), using the for loop, we find the output operator in the same namespace (Client). If the output operator were declared in the global namespace instead, we would find it there, unless there were other output operators in namespace Client. Namespace Tools would also be searched, as the argument type Spanner is declared there, but there are no output operators there.

The problem for Tommy is that when std::copy() is used, the first stage of the search starts in namespace std, and not in namespace Client. This is because the call to the output operator is no longer made from Tommy’s own code. std::copy() assigns each element through the std::ostream_iterator, and it is the iterator’s assignment operator, inside namespace std, that applies the output operator. Namespace std has a number of output operators, defined by the C++ standard to provide formatted output of the built-in types and of some types in the C++ library. It doesn’t matter that none of these overloaded output operators can be used with Spanner. The lookup rule says that we find the name in the nearest enclosing namespace and stop. The output operator defined in namespace Client is not considered at all, as Client is not an enclosing namespace of std. The compiler won’t find the output operator even if it is defined in the global namespace, as it has already found some output operators in namespace std, the nearest namespace.

Had Tommy declared the output operator in the same namespace as the class (namespace Tools), he would have avoided this problem, as the second rule (ADL) would have found it. An output operator can be seen as part of the interface of its class and should be declared close to the class itself, preferably in the same header file. This is fine if you have control over the header file, but it does not work if the header is part of a third-party library. As a workaround, the operator can be put in any header file by re-opening the namespace like this:

namespace Tools
{
  std::ostream &
    operator<<(std::ostream & os,
               Spanner const & s)
  {
    ...
  }
}
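A minimal, self-contained sketch of this fix follows. The Spanner class here is a hypothetical stand-in with only the two accessors used in the article, and printSpanners() takes a plain std::vector and returns a string instead of using the real Toolbox interface.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace Tools
{
  // Hypothetical stand-in for the real Spanner class.
  class Spanner
  {
  public:
    Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
    int getID() const { return m_id; }
    int getGapSize() const { return m_gapSize; }
  private:
    int m_id;
    int m_gapSize;
  };

  // Declared in the same namespace as Spanner, so ADL finds it even
  // when the call is made from inside namespace std.
  std::ostream & operator<<(std::ostream & os, Spanner const & s)
  {
    os << "Spanner{ID=" << s.getID()
       << ", gapSize=" << s.getGapSize()
       << "}";
    return os;
  }
}

namespace Client
{
  // Assumption: the spanners arrive as a vector, not via Toolbox.
  std::string printSpanners(std::vector<Tools::Spanner> const & spanners)
  {
    std::ostringstream os;
    std::copy(spanners.begin(), spanners.end(),
              std::ostream_iterator<Tools::Spanner>(os, "\n"));
    return os.str();
  }
}
```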

The Real Problem

So what was Tommy trying to do? Why was the output operator declared in the Client namespace and not in the Tools namespace where it belongs? Tommy said that he could have added the output operator in the Tools namespace, but he wanted different output formats for different client applications. He couldn’t place the output operators beside the Spanner class definition, as you can only have one output operator for Spanner in the same namespace. An output operator takes exactly two operands, so there is no further parameter on which to overload. For his project it was easy to use namespaces to separate the output operators, as no client in the same namespace would use more than one format.

A Solution to the Real Problem

So how can we make a design where we can have different output formats? How can we use these formats using output operators? And how can we make a design that will work when we use std::copy() and std::ostream_iterator?

To start this off, we want some way to select different formats when a Spanner object is printed to a stream. You could possibly derive from Spanner and then overload the output operator on these derived classes. This is not a very nice design, and it won’t work anyway, as you cannot downcast a Spanner object to the derived class.

A simpler approach is to use different named functions that do the formatting. We want a simple syntax such as:

std::cout << spanner.printNameAndGap()
          << std::endl;

This can be implemented by letting the member function printNameAndGap() return a string in the format we want. Nice and simple. Except that it is not always possible to add things to the class we are using, for example when it comes from a third-party library. Here, the formatting belongs to the user, not to the class itself. The class designer cannot know every format that clients may want to use. This approach is also inefficient, as a temporary string has to be created.

Instead we want to use a non-member function and we want writes made directly to the output stream. This function can return an object of a class that can be used with an overloaded output operator. To make it easy, we use the constructor of this formatting class instead of a separate function.

class PrintSpannerNameAndGap
{
public:
   PrintSpannerNameAndGap(Spanner const & s)
     : m_s(s) {}
   void print(std::ostream & os) const
   {
      // Write directly to the stream through the stored reference.
      os << "Spanner{ID=" << m_s.getID()
         << ", gapSize=" << m_s.getGapSize()
         << "}";
   }
private:
   Spanner const & m_s;
};

std::ostream & operator<<(std::ostream & os,
     PrintSpannerNameAndGap const & spanner)
{
   spanner.print(os);
   return os;
}

We can now use this class like this:

std::cout << PrintSpannerNameAndGap(spanner)
          << std::endl;

This does not look too bad. Just watch out for that member reference to the original object. The PrintSpannerNameAndGap object must not exist longer than the referenced Spanner object. This is not a problem when it is used as shown above as it only exists as a temporary object and disappears at the end of the statement.
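To show how this scales to more than one format, here is a self-contained sketch with two formatting classes used in the same statement. The Spanner stub, the second class PrintSpannerIdOnly, its "#" format and the helper describe() are all assumptions made for this illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the real Spanner class.
class Spanner
{
public:
  Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
  int getID() const { return m_id; }
  int getGapSize() const { return m_gapSize; }
private:
  int m_id;
  int m_gapSize;
};

class PrintSpannerNameAndGap
{
public:
  PrintSpannerNameAndGap(Spanner const & s) : m_s(s) {}
  void print(std::ostream & os) const
  {
    os << "Spanner{ID=" << m_s.getID()
       << ", gapSize=" << m_s.getGapSize() << "}";
  }
private:
  Spanner const & m_s;
};

// Hypothetical second format, to show two formats side by side.
class PrintSpannerIdOnly
{
public:
  PrintSpannerIdOnly(Spanner const & s) : m_s(s) {}
  void print(std::ostream & os) const { os << "#" << m_s.getID(); }
private:
  Spanner const & m_s;
};

std::ostream & operator<<(std::ostream & os, PrintSpannerNameAndGap const & p)
{
  p.print(os);
  return os;
}

std::ostream & operator<<(std::ostream & os, PrintSpannerIdOnly const & p)
{
  p.print(os);
  return os;
}

// The format is chosen at the point of use by naming the formatting class.
std::string describe(Spanner const & spanner)
{
  std::ostringstream os;
  os << PrintSpannerIdOnly(spanner) << " is "
     << PrintSpannerNameAndGap(spanner);
  return os.str();
}
```

Both temporaries live until the end of the statement that creates them, so the references they hold stay valid.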

Using std::copy() and std::ostream_iterator

std::copy() is nice but it is not possible to insert a formatting object in the way shown above. We have to look at other ways to indicate that we want different output.

If we look at the line using std::copy() and std::ostream_iterator there aren’t many opportunities for modification. We could adapt the source iterators (the begin/end pair) to return a different object when dereferenced and define an output operator for each different object type. The mechanism for choosing the correct overloaded output operator would be similar to the approach above.

But there is no need to create the iterator adaptor. We only have to specify to the std::ostream_iterator that it shall work with PrintSpannerNameAndGap objects. This makes the code much simpler:

std::copy(tb.getSpanners().begin(),
          tb.getSpanners().end(),
          std::ostream_iterator<PrintSpannerNameAndGap>(os, "\n"));

PrintSpannerNameAndGap is the same class as above. As the std::ostream_iterator requires a PrintSpannerNameAndGap, the Spanner objects returned by the source iterators are implicitly converted to PrintSpannerNameAndGap objects. This works because we did not make the constructor explicit. The PrintSpannerNameAndGap object is a temporary object and is destroyed once the assignment statement in std::copy() has completed. It holds a reference to the Spanner object coming from the iterators to avoid unnecessary copying. This reference is safe, as the PrintSpannerNameAndGap object has a shorter lifetime than the Spanner object.
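Put together as a self-contained sketch, this might look as follows. The Spanner class is again a hypothetical stub, and printSpanners() is simplified to take a vector and return a string. The formatting class and its output operator sit together in namespace Client, so the operator is found by ADL from inside std::ostream_iterator.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace Tools
{
  // Hypothetical stand-in for the real Spanner class.
  class Spanner
  {
  public:
    Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
    int getID() const { return m_id; }
    int getGapSize() const { return m_gapSize; }
  private:
    int m_id;
    int m_gapSize;
  };
}

namespace Client
{
  class PrintSpannerNameAndGap
  {
  public:
    // Deliberately not explicit: std::ostream_iterator relies on the
    // implicit conversion from Spanner.
    PrintSpannerNameAndGap(Tools::Spanner const & s) : m_s(s) {}
    void print(std::ostream & os) const
    {
      os << "Spanner{ID=" << m_s.getID()
         << ", gapSize=" << m_s.getGapSize() << "}";
    }
  private:
    Tools::Spanner const & m_s;
  };

  // Same namespace as the formatting class, so ADL can find it.
  std::ostream & operator<<(std::ostream & os,
                            PrintSpannerNameAndGap const & p)
  {
    p.print(os);
    return os;
  }

  // Assumption: the spanners arrive as a vector, not via Toolbox.
  std::string printSpanners(std::vector<Tools::Spanner> const & spanners)
  {
    std::ostringstream os;
    std::copy(spanners.begin(), spanners.end(),
              std::ostream_iterator<PrintSpannerNameAndGap>(os, "\n"));
    return os.str();
  }
}
```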

Possible Improvements

We could use templates to reduce the amount of boilerplate code. But introducing templates would not remove enough code to justify the extra complexity.

Another approach would be to let the formatting class PrintSpannerNameAndGap inherit from a base class that is used by all classes supporting different output formats. This base class would keep the reference to Spanner and declare the function print() pure virtual. A single output operator definition for this base class replaces all specific output operators. This only pays off when there are many different output formats for the same object type.
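A possible shape for this design is sketched below. The names SpannerFormatter and PrintSpannerIdOnly, the Spanner stub and the function template printSpanners() are assumptions for the illustration, not code from Tommy's project.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real Spanner class.
class Spanner
{
public:
  Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
  int getID() const { return m_id; }
  int getGapSize() const { return m_gapSize; }
private:
  int m_id;
  int m_gapSize;
};

// Base class: keeps the reference and declares print() pure virtual.
class SpannerFormatter
{
public:
  virtual ~SpannerFormatter() {}
  virtual void print(std::ostream & os) const = 0;
protected:
  explicit SpannerFormatter(Spanner const & s) : m_s(s) {}
  Spanner const & spanner() const { return m_s; }
private:
  Spanner const & m_s;
};

// The single output operator shared by all formats.
std::ostream & operator<<(std::ostream & os, SpannerFormatter const & f)
{
  f.print(os);
  return os;
}

class PrintSpannerNameAndGap : public SpannerFormatter
{
public:
  PrintSpannerNameAndGap(Spanner const & s) : SpannerFormatter(s) {}
  virtual void print(std::ostream & os) const
  {
    os << "Spanner{ID=" << spanner().getID()
       << ", gapSize=" << spanner().getGapSize() << "}";
  }
};

class PrintSpannerIdOnly : public SpannerFormatter
{
public:
  PrintSpannerIdOnly(Spanner const & s) : SpannerFormatter(s) {}
  virtual void print(std::ostream & os) const
  {
    os << "#" << spanner().getID();
  }
};

// The format is chosen through the template argument.
template <typename Format>
std::string printSpanners(std::vector<Spanner> const & spanners)
{
  std::ostringstream os;
  std::copy(spanners.begin(), spanners.end(),
            std::ostream_iterator<Format>(os, "\n"));
  return os.str();
}
```

A call such as printSpanners<PrintSpannerIdOnly>(spanners) then selects the format, and each new format only needs a constructor and a print() function.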

A specific functor object can be used with the C++ library algorithm std::for_each() to print out each element. Initialise the functor with the output stream and define an operator()(Spanner const &) that prints each object to the output stream in the required format.
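The functor approach could be sketched like this. The class name SpannerPrinter and the Spanner stub are hypothetical, and printSpanners() is again simplified to take a vector and return a string.

```cpp
#include <algorithm>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real Spanner class.
class Spanner
{
public:
  Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
  int getID() const { return m_id; }
  int getGapSize() const { return m_gapSize; }
private:
  int m_id;
  int m_gapSize;
};

// Functor initialised with the stream that it prints each object to.
class SpannerPrinter
{
public:
  explicit SpannerPrinter(std::ostream & os) : m_os(os) {}
  void operator()(Spanner const & s) const
  {
    m_os << "Spanner{ID=" << s.getID()
         << ", gapSize=" << s.getGapSize() << "}\n";
  }
private:
  std::ostream & m_os;
};

std::string printSpanners(std::vector<Spanner> const & spanners)
{
  std::ostringstream os;
  // No output operator for Spanner is needed, so no lookup problem arises.
  std::for_each(spanners.begin(), spanners.end(), SpannerPrinter(os));
  return os.str();
}
```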

Conclusion

It is not always easy to understand what happens under the hood in C++. But there are solutions to every problem, even if a good understanding of C++ may be required. Don’t be afraid of asking friends or other ACCU members for advice.

Acknowledgements

Thanks to Tommy Persson who had the problem originally and spent time describing the problem to me, to Richard Corden for clarifying the C++ standard and to Thaddaeus Frogley for reviewing.


Copyright 2003-2012