BD Software delivers on-site training seminars for programmers in C, C++, Java, Perl and Unix
























BD Software delivers on-site training seminars for programmers in C, C++, Java, Perl and Unix

Thinking in STL: You Know It Don't Come Easy [1]

(Originally published in The C/C++ Users Journal, January 2003)

After several years of reading and thinking about STL, listening to experts talk about it and even writing the scaffolding to test all the STL code fragments from the original manuscript of Scott Meyers' Effective STL [2], I came up with a halfway original STL-related application of my own: a library of templates to assist in the initialization of containers from lists of arbitrary constant values. This project evolved into my Freeware "InitUtil" library [3].

Dan Saks has described the process of learning how to program as akin to reliving the evolutionary stages of computer programming, from machine/assembly language through procedural and then Object-Oriented languages [4]. If some of us take longer to traverse this path than others, then I could be poster child for the slow learners: about half way (2-3 iterations) into the development of a key function template of InitUtil, I realized that this code had begun life as a C-style function (with all the attendant pointer/memory-management complexity modern C++ is so good at eliminating) and that I was now in the midst of "STL-izing" it. I realized I was, in a way, reliving the evolutionary step from pointer-based to STL/iterator-based programming. Boy, did Dan's observation hit home in that moment.

I suspect I may not be the only one experiencing this transition-anxiety. In this article I document the steps I had to go through before arriving at a pure STL-based implementation of a simple little function template named make_s.

Note: A single source file containing the full source code in a form that can be compiled is provided here. The figures referred to in the text below contain line numbers and cannot be directly compiled.

Successive Refinement?

A Computer Science professor I once knew claimed that all software should compile cleanly and run perfectly the very first time. That is, he claimed that debugging would not be necessary if we simply designed and constructed software using some "methodology du jour" he would have been happy to tell me about, if I'd asked. I never did. Perhaps that's because I enjoy the process of successive refinement of software by subjective feel: envision it, code it, debug and run it; envision improvements, code them, debug and run them; when none of the envisioned improvements are worth taking the time or trouble to code, the project is finished.

I'm not smart enough to go from scratch directly to the best solution to a programming problem, skipping the intermediate stages. Plus, I'm seldom careful enough to produce any program over about five lines long that compiles and runs perfectly the first time. Thus, the InitUtil library is the result of the usual iterative development process.

It all began with the function template make_s. make_s parses a comma-delimited list of string values (passed as a single text parameter), and produces an STL-compatible container of std::string objects as its return value. A second Boolean parameter, with the default value of false, tells whether or not to preserve leading spaces in a string. The specific flavor of container desired is specified via explicit template specification during instantiation. Some examples:

// Create a list of strings, 
// skip leading spaces [5]:


list<string> ls = 
  make_s<list<string> >(
  "this, is, a, test, of, the,"
  "emergency, broadcast, system");

// Create a deque of strings, 
// preserve leading spaces:

deque<string> ds = 
  make_s<deque<string> >(
  "here, is another, one , with,"
  "all , leading,  spaces  ,"
  "preserved", true);

(There was also a companion function template named make_i for parsing numeric rather than string values [6]. However, since the subject of this article is the evolution of the parsing algorithm for value_types of std::string, I'm only going to discuss the evolution of make_s. The algorithm I show in the final version of make_s does survive essentially intact throughout the naming/calling convention improvements subsequently made to InitUtil.)

Evolution

Figure 1 shows the first version of make_s that compiled and actually worked. It also clearly shows my historical bias toward C strings (char *s) for representing text, to wit:

  • The intext parameter to make_s is defined as a const char *
  • I parse the input into individual C strings by replicating the parameter string (lines 12-13), then stuffing NUL bytes into the position of each comma (or the original trailing NUL) in line 22. In my one initial concession to STL, I took advantage of the ability to use char *'s as iterators and employed the std::find algorithm in line 21 to search for commas in the text
  • As each string is detected, I construct std::string objects from the C pointer curPos, then insert those strings into the temporary container using the container's own push_back function [7]
  • Before returning, the temporary buffer holding the copy of the input string has to be deallocated

While this pointer-intensive function did indeed work, the memory allocation issues and the poking of NUL bytes into the middle of a string seemed gratuitously primitive for a utility designed to work alongside STL. To save myself embarrassment, I really wanted make_s to employ more STL features within its own implementation.

The first issue I decided to excise was the NUL-stuffing. While C pointers are dependent upon a specific value, NUL, to delimit the end of a string, the workings of past-the-end iterators in STL are such that the specific value, if any, beyond the last element of a sequence is immaterial. I began my next version of make_s by switching from the std::string constructor that accepts a C string to the one that accepts a pair of iterators. I had now eliminated the need to modify the source text via NUL-stuffing.

At the same time, I noticed that the temporary std::string object s was only being used in two adjacent lines of code (Figure 1, lines 28-29). Rather than separately instantiating the string on one line and then using it on the next, I wrote one statement (Figure 2, line 28) combining the instantiation of a temporary string object (using the aforementioned two-iterator constructor) with the insertion of that object into its container. Along with eliminating a named variable, this approach also seems more likely to offer aggressive code optimizers the opportunity to earn their keep.

Having streamlined the parsing algorithm, I next turned my attention to my choice of data types. Even though the main loop's code appears quite clean while employing the "pointer as iterator" technique, the initialization and cleanup at the extremes of the function make it painfully obvious we're still dealing with raw pointers here. Does the use of these pointers actually offer any benefits? Well, the generated code is nice and short... But so what?

Code size isn't everything. That's a lesson I had recently learned well: part of my STL Error Decryptor package [8] is the "Proxy CL" program, a plug-in replacement for MSVC's CL.EXE compiler driver. The original implementation, using C strings exclusively, resulted in a 40K executable. I had resisted switching over to using std::string because I was afraid of the potential code bloat the inclusion of std::string would cause. After lamenting to him about the many internal buffer overflows and related problems I'd been grappling with, Thomas Becker (a major contributor to the Proxy CL program) finally shamed me into dropping the C strings, and the subsequent version of the Proxy CL used std::string exclusively for text manipulation. This STL-ized version weighed in at around 200K, but nobody noticed or cared [9]. Even though 200K is over three times the entire system address space of the machines I'd cut my teeth on as an 8080 programmer, by today's standards a program of that size seems to be considered minuscule. In any case, not a single string-related problem has ever arisen within the Proxy CL program again. I deemed this a worthwhile tradeoff...it was time to bid make_s's C strings farewell.

Fortunately, due to pointer/iterator duality, the logic of make_s's parsing loop remains essentially the same whether the variables are char pointers or iterators into a std::string. Whereas originally I had to compute the pointer to the end of the input text by adding the text length to the beginning pointer (Figure 2, line 15), std::string provides the past-the-end iterator I need directly via its end() member function. I take advantage of end() in the third version of make_s (Figure 3, line 19). At this point, it also finally dawned on me that there was no longer any need to manually replicate the input text. In fact, all the dynamic allocation code could have been stripped out back in version 2, and the only change I'd have had to make was to add the const qualifier to a few char pointer definitions. The fact I hadn't realized this just goes to show: to a C programmer, the association between text processing and dynamic allocation is deeply ingrained, indeed.

The third version of make_s still accepts a char * as a function parameter, but uses it only once to define the automatic std::string object (Figure 3, line 11) used for all subsequent processing. My original reason for leaving the intext parameter as a char * in version 3 (rather than accepting a std::string and letting string's constructor handle actual parameters of type char *) was that a char * is cheaper to pass than a std::string...but that would actually not be the case if the std::string were passed by reference. Even if the caller of make_s passes a char * as a parameter, it would work because the compiler would generate a temporary std::string object to be referenced by the formal parameter. There would even be a potential gain in efficiency, for if the caller maintained a single std::string object for use in multiple calls to make_s, a std::string construction would be avoided for every call. There would no longer any std::string construction within make_s, resulting in a net savings of one string constructor call for each use (beyond the first) of the same string in a call to the function. The penultimate version of make_s (Figure 4) thus accepts the text input parameter by reference-to-const-string, and that's the end of it: make_s has now been fully converted from using raw char pointers to using std::string.

There was just one more thing still troubling me. It was so subtle, that I thought of it once, forgot about it, and later remembered there was something else I could do, but not what it was. It took another week (in fact, until the morning of the day I'm writing this) before I remembered the issue: that pesky inner loop (Figure 4, lines 21-22). We're supposed to prefer the use of STL algorithms to explicit loops, right? So they say. As simple as this loop (to locate the first non-whitespace character) is, it's still a loop, and thus might qualify for re-writing using a straight call to some appropriate STL algorithm. In fact it does, as illustrated in Figure 5 (lines 20-21) via use of the std::find_if algorithm, the std::not1 function adapter, and the std::ptr_fun pointer adapter for ordinary functions (to allow the application of isspace to the algorithm). Despite the perk of no longer needing to explicitly guard against the possibility of dereferencing an end iterator, does this conversion necessarily constitute an improvement? I'm not really sure. I'd have to characterize that as a religious issue (but I'd tend toward the non-STL solution. Call me a Luddite).

Conclusion

Normally I'd have no desire to share early drafts of my software projects. In this case, however, there may be some value in having documented the sequence of successive refinements that propelled this code from pointer-land to STL-land. Perhaps after viewing the entire journey, you may recognize similar patterns looking to emerge in your own projects--and remember the shortcut I had to learn the hard way: Go directly to STL, do not pass pointers, do not collect 200 coding errors.

Notes

[1] Apologies to Bruce Eckel and Ringo Starr.

[2] The source code archive for Effective STL may be downloaded here.

[3] The complete container initialization library may be downloaded here. . I say this package is half-way original because the structure of these function templates was first put forth by Musser et. al. in their book, STL Tutorial and Reference Guide, 2nd ed. InitUtil expands upon the make template presented in that book.

[4] Dan Saks. "Stepping Up to C++: Writing Your First Class", The C Users Journal, March 1991

[5] As Scott Meyers pointed out to me, specifying the value_type of the container shouldn't be necessary since make_s only creates containers of strings. However, that is an issue somewhat tangential to the point of this article; that and other design improvements resulting from pre-publication review (such as Robert Schwartz's suggestion to generalize the ',' delimiter between strings) will have been incorporated into the distribution versions of InitUtil by the time this article sees print. However, as far as the versions of make_s shown here are concerned, what you see is what I wrote...

[6] Ultimately, through the use of partial specialization, InitUtil evolved to the point where a single function template name, make_cont, would serve to initialize containers of both string and non-string value_types. This supersedes both make_i and make_s, even though the two cases are handled quite distinctly. The only compilers I own that can actually compile the latest version of InitUtil (.92), however, are Comeau C++ and Metrowerks CodeWarrior 8. Since many current compilers do not yet support partial specialization and/or other ANSI/ISO C++ features that InitUtil v.92 depends upon, I'm continuing to distribute two alternate versions: v.91 uses a distinct function template name, make_cont_str (successor to make_s) to generate containers of strings, and v.91ms tolerates Microsoft VC++6/7 by dropping set/multiset support (so as not to require partial specialization).

[7] push_back won't work for set and multiset, since those containers don't provide it. In the distribution versions of InitUtil, the insertion of values into a container is performed with the aid of three associated function objects: one using push_back, and twin specializations (for set and multiset) both employing insert. Thus, all standard containers are supported orthogonally for the more Standard-compliant compilers.

[8] Leor Zolman. "An STL Error Message Decryptor for Visual C++", The C/C++ Users Journal, July 2001. Also online as that month's CUJ web feature.

[9] Of course, this isn't an embedded system. As Dan Saks pointed out to me, such a jump in code size might not be as easily tolerated in that genre. (Later, I discovered that when linking with MSVC 7.1 [aka Visual Studio.NET 2003], the executable size drops back down to about 110K. I'm assuming this is a linkage optimization and not evidence of a 90% code efficiency increase in the compiler.)

Addendum

Reader Jerry Feldman emailed me with this:

I was reading your article in the C/C++ users journal, and I fully agree with your premise. However, I have a comment: in the while loop you use an unending loop with a test:

if (nextComma == incopy.end())
break;


to exit the loop. Would it be better to eliminate the test altogether and redefine the loop:

while (curPos < intext.end())


This solves the problem where an empty string is passed to the function. It also makes a code a bit cleaner. It does cause

curpos = ++nextComma;


to be executed one additional time.

Thanks, Jerry, that is right on the money. This would have been a better way to do it.

 

Home | Courses | Tools | Resources
Clients | Successes | Contact | Site Map | Links | About Us

All text and images on this website are Copyright © 2001-2003 BD Software.
All rights reserved.


Courses:

C and C++ Seminars

Java Seminars

Unix-Related Seminars


Clients:

Qualcomm

Sapient, Inc.

Boston Technology

(See full
client list)


BD Software delivers on-site training seminars for programmers in C, C++, Java, Perl and Unix