C++ Lookup Mysteries

Sven Rosvall

One day, my friend Tommy asked me why his C++ code failed. He wanted to print out a number of objects (of his own class) to a stream. It worked well with a plain for-loop and an output operator (<<), so he knew that his output operator for the class worked as intended. But when he used std::copy() and std::ostream_iterator it failed. He wanted to "go STL" because everyone, myself included, was telling him how great the STL is.

It took us a while to figure out what was wrong, and it took us into the darker corners of the inner workings of C++. It was an interesting experience and one that I would like to share. This article investigates function lookup in C++ and also suggests what to do when you want several different output formats while still using output operators and the STL.

The Code

Tommy used a class developed for a toolbox. This toolbox was declared inside a namespace, following project guidelines to avoid name collisions. Namespaces were considered good and were used a lot throughout the project.

Tommy followed the guidelines and put the toolbox client code in a different namespace. There he needed to print out objects of this class stored in a container. For this purpose he wrote an output operator and a function that iterates over the container. His code was something like this:

namespace Client
{
    std::ostream & operator<<(std::ostream & os, Tools::Spanner const & s)
    {
        os << "Spanner{ID=" << s.getID()
           << ", gapSize=" << s.getGapSize() << "}";
        return os;
    }

    void printSpanners(std::ostream & os, Tools::Toolbox const & tb)
    {
        for (Tools::SpannerCollection::const_iterator sit = tb.getSpanners().begin();
             sit != tb.getSpanners().end();
             ++sit)
        {
            os << *sit << "\n";
        }
    }
}

This code worked nicely. He then introduced some STL-isms and rewrote the printing function to use std::copy() and std::ostream_iterator. These two are often used together in C++ books to show the power and flexibility of the STL.
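For readers who have not met the idiom, here is a minimal sketch of std::copy() feeding a std::ostream_iterator. It uses int elements so that the output operators supplied by the standard library apply. The function name printNumbers is made up for this illustration and does not come from Tommy's code.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the article): writes every number to a
// string stream through std::copy() and std::ostream_iterator.
std::string printNumbers(std::vector<int> const & numbers)
{
    std::ostringstream os;
    // Each assignment through the iterator performs "os << value" and
    // then writes the delimiter "\n".
    std::copy(numbers.begin(), numbers.end(),
              std::ostream_iterator<int>(os, "\n"));
    return os.str();
}
```

Called with a vector holding 1, 2 and 3, printNumbers() returns "1\n2\n3\n". Note that the delimiter is written after every element, including the last one.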

An std::ostream_iterator is an output iterator and is used with algorithms in the same way as any other output iterator. When an object is assigned to a dereferenced std::ostream_iterator, the object is written to the output stream that the iterator was constructed with, using an output operator defined for that object's type. The std::ostream_iterator template is instantiated with the type of the objects it is to print. Its constructor also takes an optional second parameter, a delimiter string that is written after each printed object. The rewritten printing function looked something like this:

void printSpanners(std::ostream & os, Tools::Toolbox const & tb)
{
    std::copy(tb.getSpanners().begin(), tb.getSpanners().end(),
              std::ostream_iterator<Tools::Spanner>(os, "\n"));
}

Nice simple code, except that it didn’t compile. The compiler could not find an appropriate output operator. The error message from the compiler was not very helpful. It said it could not find the output operator, but gave few clues as to what it was looking for or why it could not find the output operator shown above.

Tommy was very puzzled. He knew that an output operator existed; he had used it successfully just a minute ago. He tried to move the output operator to the global namespace to make sure that it would be visible, but this did not work either. When neither Tommy nor his colleagues could figure this out, he lost his enthusiasm for the STL. When we met again, he was very quick to vent his frustration with the STL in front of everyone around.

I was puzzled too when I heard this story, and of course I tried to defend C++ and the STL. But was the problem with the compiler, with the C++ standard, or with something in his code? I asked him to come up with a small example, but he said there was too much code involved and too little time to reduce the code bit by bit while preserving the symptoms. Instead we discussed how the code looked and came up with the example above. We ran it through a couple of compilers and got similar error messages from all of them. So we could probably not blame the compiler. But what was wrong?

The C++ Lookup Rules

Now that I had a code example, I could play with it a bit more and read the standard thoroughly.

During lookup, operators are treated like any other function; they just have a special name. The rules for finding unqualified functions and operators have two main parts. Firstly, the scope containing the call and then each enclosing scope in turn, innermost first, is searched for 'entities' with the same name. Note that the search stops as soon as a name is found.
A function in an enclosing namespace is ignored even if the name that was found cannot be called with the arguments, or is in fact not even a function, thus:

namespace A
{
    void f(int);
    void g(int);

    namespace B
    {
        void f(double);       // hides A::f(int)
        void g(const char*);  // hides A::g(int)

        void caller()
        {
            f(1);  // calls A::B::f(double)
            g(1);  // error: cannot convert '1' to a 'const char*'
        }
    }
}

In this example we see that A::B::f(double) hides A::f(int) and is thus the only function considered in the first call. The int argument can be converted to double, so this call is legal. In the same way, A::B::g(const char*) hides A::g(int). But the int argument in the second call cannot be converted to a pointer, so the call is illegal. Note that A::g(int) is not considered at all, even though A::B::g(const char*) cannot be used in the call.

After the current and enclosing namespaces have been searched, functions with the same name are also looked for in the namespaces associated with the types of the arguments to the function. This second part is called argument-dependent lookup (a.k.a. ADL or Koenig lookup). Consider:

class X {};
void f(const X &);

namespace A
{
    class Y : public X {};
    void f(const Y &);
}

void caller()
{
    A::Y y;
    f(y);  // calls A::f(const Y &)
}

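A runnable variant of the example above may make the outcome easier to check. In this sketch the two functions return a string naming themselves, and caller() returns whatever the unqualified call selects. The return types and the string contents are additions made for this illustration.

```cpp
#include <string>

class X {};

// Found by ordinary lookup, since caller() is in the global namespace.
std::string f(const X &) { return "::f(const X &)"; }

namespace A
{
    class Y : public X {};

    // Not visible to ordinary lookup from the global namespace; it becomes
    // a candidate only through ADL, because the argument type Y lives in A.
    std::string f(const Y &) { return "A::f(const Y &)"; }
}

// Reports which overload the unqualified call picks.
std::string caller()
{
    A::Y y;
    // Both functions are candidates. A::f wins overload resolution because
    // it takes a Y directly, while ::f needs a derived-to-base conversion.
    return f(y);
}
```

Here caller() returns "A::f(const Y &)". Ordinary lookup alone would only have offered ::f(const X &); ADL adds the function from namespace A to the candidate set.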
So, in the function printSpanners(), using the for loop, we find the output operator in the same namespace (Client). If the output operator were declared in the global namespace instead, we would find it there, unless there were other output operators in the namespace Client. The namespace Tools would also be looked at, as the argument type Spanner is declared there, but there are no output operators there.

The problem for Tommy is that when std::copy() is used, the first stage of the search starts in the namespace std, and not in namespace Client. This is because the call to the output operator is made from inside the library: std::copy() assigns through the std::ostream_iterator, and the assignment operator of std::ostream_iterator calls the output operator. Namespace std has a number of output operators, defined in the C++ standard to provide formatted output of the built-in types and of some types defined in the C++ library. It doesn’t matter that none of these overloaded output operators can be used with Spanner. The lookup rule says that we find the function in the nearest enclosing namespace and stop. The output operator defined in the namespace Client is not considered at all, as this namespace is not an enclosing namespace of namespace std. The compiler would not find the output operator even if it were defined in the global namespace, as it has already found some output operators in namespace std, the nearest enclosing namespace.

Had Tommy declared the output operator in the same namespace as the class (namespace Tools), he would have avoided this problem, as the second rule (ADL) would have found it. The output operator can be seen as part of the interface of the class and should be declared close to the class itself, preferably in the same header file. This is fine if you have control over the header file. It does not work if the header is part of a third-party library. As a workaround, it is possible to put the declaration in any header file by re-opening the namespace like this:

namespace Tools
{
    std::ostream & operator<<(std::ostream & os, Spanner const & s)
    {
        ...
    }
}

The Real Problem

So what was Tommy trying to do? Why was the output operator declared in the Client namespace and not in the Tools namespace where it belongs? Tommy said that he could have added the output operator in the Tools namespace, but he wanted different output formats for different client applications. He couldn’t place the output operators beside the Spanner class definition, as you can only have one of them in the same namespace. An output operator takes exactly two parameters, so there is no extra parameter on which to overload a second one. For his project it was easy to use namespaces to separate the output operators, as no client in the same namespace would use more than one format.

A Solution to the Real Problem

So how can we make a design where we can have different output formats? How can we use these formats with output operators? And how can we make a design that will work when we use std::copy() and std::ostream_iterator?

To start this off, we want some way to select different formats when a Spanner object is printed to a stream. Possibly you could derive from Spanner and then overload the output operator on these derived classes. This is not a very nice design, and it won’t work, as you cannot downcast a Spanner object to the derived class. A simpler approach is to use differently named functions that do the formatting. A member function would give a simple syntax such as:

std::cout << spanner.printNameAndGap() << std::endl;

But we would rather not add a member function to Spanner for every format. Instead we want to use a non-member function, and we want writes made directly to the output stream. This function can return an object of a class that can be used with an overloaded output operator. To make it easy, we use the constructor of this formatting class instead of a separate function.

class PrintSpannerNameAndGap
{
public:
    PrintSpannerNameAndGap(Spanner const & s) : m_s(s) {}

    void print(std::ostream & os) const
    {
        os << "Spanner{ID=" << m_s.getID()
           << ", gapSize=" << m_s.getGapSize() << "}";
    }

private:
    // Holds a reference only; the Spanner must outlive this object.
    Spanner const & m_s;
};

std::ostream & operator<<(std::ostream & os, PrintSpannerNameAndGap const & spanner)
{
    spanner.print(os);
    return os;
}

std::cout << PrintSpannerNameAndGap(spanner) << std::endl;

Using std::copy() and std::ostream_iterator

std::copy() is nice, but it is not possible to insert a formatting object in the way shown above. We have to look at other ways to indicate that we want different output.

If we look at the line using std::copy() and std::ostream_iterator, there aren’t many opportunities for modification. We could adapt the source iterators (the begin/end pair) to return a different object when dereferenced and define an output operator for each different object type. The mechanism for choosing the correct overloaded output operator would be similar to the approach above. But there is no need to create the iterator adaptor. We only have to tell the std::ostream_iterator that it is to work with PrintSpannerNameAndGap objects. This makes the code much simpler:

std::copy(tb.getSpanners().begin(), tb.getSpanners().end(),
          std::ostream_iterator<PrintSpannerNameAndGap>(os, "\n"));

Each Spanner is converted implicitly to a PrintSpannerNameAndGap through the constructor when it is assigned through the iterator. Provided that the output operator is declared in the same namespace as PrintSpannerNameAndGap, ADL finds it from within the library.

Possible Improvements

We could use templates to reduce the amount of boilerplate code. But introducing templates does not remove enough code to justify the extra complexity.

Another approach would be to let the formatting class PrintSpannerNameAndGap inherit from a base class that is used by all classes supporting different output formats. This base class would keep the reference to Spanner and declare the function print() pure virtual. A single output operator definition for this base class replaces all the specific output operators. This only pays off when there are many different output formats for the same object type.
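
The base-class idea could be sketched roughly as follows. This is an illustration under assumptions, not code from Tommy's project: the Spanner class is a small stand-in, and the names SpannerFormat, PrintSpannerGapOnly and printGaps are invented here. To keep the result easy to inspect, the printing functions return a string instead of taking a stream.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace Tools
{
    // Stand-in for the real Spanner class, which the article does not show.
    class Spanner
    {
    public:
        Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
        int getID() const { return m_id; }
        int getGapSize() const { return m_gapSize; }
    private:
        int m_id;
        int m_gapSize;
    };
}

namespace Client
{
    // Hypothetical base class: keeps the reference, declares print().
    class SpannerFormat
    {
    public:
        explicit SpannerFormat(Tools::Spanner const & s) : m_s(s) {}
        virtual ~SpannerFormat() {}
        virtual void print(std::ostream & os) const = 0;
    protected:
        Tools::Spanner const & m_s;
    };

    // The single output operator shared by all formats. ADL finds it for
    // derived classes too, as Client is the namespace of their base class.
    std::ostream & operator<<(std::ostream & os, SpannerFormat const & f)
    {
        f.print(os);
        return os;
    }

    class PrintSpannerNameAndGap : public SpannerFormat
    {
    public:
        // Deliberately not explicit: std::ostream_iterator relies on the
        // implicit conversion from Spanner.
        PrintSpannerNameAndGap(Tools::Spanner const & s) : SpannerFormat(s) {}
        void print(std::ostream & os) const
        {
            os << "Spanner{ID=" << m_s.getID()
               << ", gapSize=" << m_s.getGapSize() << "}";
        }
    };

    // Hypothetical second format, to show that no new operator is needed.
    class PrintSpannerGapOnly : public SpannerFormat
    {
    public:
        PrintSpannerGapOnly(Tools::Spanner const & s) : SpannerFormat(s) {}
        void print(std::ostream & os) const { os << m_s.getGapSize(); }
    };

    std::string printSpanners(std::vector<Tools::Spanner> const & spanners)
    {
        std::ostringstream os;
        std::copy(spanners.begin(), spanners.end(),
                  std::ostream_iterator<PrintSpannerNameAndGap>(os, "\n"));
        return os.str();
    }

    std::string printGaps(std::vector<Tools::Spanner> const & spanners)
    {
        std::ostringstream os;
        std::copy(spanners.begin(), spanners.end(),
                  std::ostream_iterator<PrintSpannerGapOnly>(os, "\n"));
        return os.str();
    }
}
```

Note that the std::ostream_iterator is still instantiated with a concrete formatting class. It cannot be instantiated with the abstract base class, because the iterator has to create a formatting object from each Spanner.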

A specific functor object can be used with the C++ library algorithm std::for_each() to print out each element. Initialise the functor with the output stream and define an operator()(Spanner const &) that prints each object to the output stream in the required format.

Conclusion

It is not always easy to understand what happens under the hood in C++. But there are solutions to every problem, even if a good understanding of C++ may be required. Don’t be afraid of asking friends or other ACCU members for advice.

Acknowledgements

Thanks to Tommy Persson, who had the problem originally and spent time describing it to me, to Richard Corden for clarifying the C++ standard, and to Thaddaeus Frogley for reviewing.

Copyright 2003-2012