Thinking in STL: You Know It Don't Come Easy [1]
(Originally published in The C/C++ Users Journal,
January 2003)
After several years of reading and thinking about STL, listening
to experts talk about it and even writing the scaffolding to test
all the STL code fragments from the original manuscript of Scott
Meyers' Effective STL [2], I came up
with a halfway original STL-related application of my own: a library
of templates to assist in the initialization of containers from
lists of arbitrary constant values. This project evolved into my
Freeware "InitUtil" library [3].
Dan Saks has described the process of learning how to program
as akin to reliving the evolutionary stages of computer programming,
from machine/assembly language through procedural and then Object-Oriented
languages [4]. If some of us take longer to
traverse this path than others, then I could be poster child for
the slow learners: about half way (2-3 iterations) into the development
of a key function template of InitUtil, I realized that this code
had begun life as a C-style function (with all the attendant pointer/memory-management
complexity modern C++ is so good at eliminating) and that I was
now in the midst of "STL-izing" it. I realized I was, in a way,
reliving the evolutionary step from pointer-based to STL/iterator-based
programming. Boy, did Dan's observation hit home in that moment.
I suspect I may not be the only one experiencing this transition-anxiety.
In this article I document the steps I had to go through before
arriving at a pure STL-based implementation of a simple little function
template named make_s.
Note: A single source file containing the full source code in a
form that can be compiled is provided here.
The figures referred to in the text below contain line numbers and
cannot be directly compiled.
Successive Refinement?
A Computer Science professor I once knew claimed that all software
should compile cleanly and run perfectly the very first time. That
is, he claimed that debugging would not be necessary if we simply
designed and constructed software using some "methodology du jour"
he would have been happy to tell me about, if I'd asked. I never
did. Perhaps that's because I enjoy the process of successive refinement
of software by subjective feel: envision it, code it, debug and
run it; envision improvements, code them, debug and run them; when
none of the envisioned improvements are worth taking the time or
trouble to code, the project is finished.
I'm not smart enough to go from scratch directly to the best solution
to a programming problem, skipping the intermediate stages. Plus,
I'm seldom careful enough to produce any program over about five
lines long that compiles and runs perfectly the first time. Thus,
the InitUtil library is the result of the usual iterative development
process.
It all began with the function template
make_s. make_s
parses a comma-delimited list of string values (passed as a single
text parameter), and produces an STL-compatible container of std::string
objects as its return value. A second Boolean parameter, with the
default value of false, tells whether or not to preserve leading
spaces in a string. The specific flavor of container desired is
specified via explicit template specification during instantiation.
Some examples:
// Create a list of strings,
// skip leading spaces [5]:
list<string> ls =
make_s<list<string> >(
"this, is, a, test, of, the,"
"emergency, broadcast, system");
// Create a deque of strings,
// preserve leading spaces:
deque<string> ds =
make_s<deque<string> >(
"here, is another, one , with,"
"all , leading, spaces ,"
"preserved", true);
(There was also a companion function template named make_i
for parsing numeric rather than string values [6].
However, since the subject of this article is the evolution of the
parsing algorithm for value_types of std::string,
I'm only going to discuss the evolution of make_s.
The algorithm I show in the final version of make_s
does survive essentially intact throughout the naming/calling convention
improvements subsequently made to InitUtil.)
Evolution
Figure 1 shows the first
version of make_s
that compiled and actually worked. It also clearly shows my historical
bias toward C strings (char
*s) for representing text, to wit:
- The intext parameter to make_s
is defined as a const
char *
- I parse the input into individual C strings by replicating
the parameter string (lines 12-13), then stuffing NUL bytes into
the position of each comma (or the original trailing NUL) in line
22. In my one initial concession to STL, I took advantage of the
ability to use char *'s
as iterators and employed the std::find
algorithm in line 21 to search for commas in the text
- As each string is detected, I construct std::string
objects from the C pointer curPos,
then insert those strings into the temporary container using the
container's own push_back
function [7]
- Before returning, the temporary buffer holding the copy of
the input string has to be deallocated
While this pointer-intensive function did indeed work, the memory
allocation issues and the poking of NUL bytes into the middle of
a string seemed gratuitously primitive for a utility designed to
work alongside STL. To save myself embarrassment, I really wanted
make_s to employ
more STL features within its own implementation.
The first issue I decided to excise was the NUL-stuffing. While
C pointers are dependent upon a specific value, NUL, to delimit
the end of a string, the workings of past-the-end iterators in STL
are such that the specific value, if any, beyond the last element
of a sequence is immaterial. I began my next version of make_s
by switching from the std::string
constructor that accepts a C string to the one that accepts a pair
of iterators. I had now eliminated the need to modify the source
text via NUL-stuffing.
At the same time, I noticed that the temporary std::string
object s was only being used in two adjacent lines of code (Figure
1, lines 28-29). Rather than separately instantiating the string
on one line and then using it on the next, I wrote one statement
(Figure 2, line 28) combining
the instantiation of a temporary string object (using the aforementioned
two-iterator constructor) with the insertion of that object into
its container. Along with eliminating a named variable, this approach
also seems more likely to offer aggressive code optimizers the opportunity
to earn their keep.
Having streamlined the parsing algorithm, I next turned my attention
to my choice of data types. Even though the main loop's code appears
quite clean while employing the "pointer as iterator" technique,
the initialization and cleanup at the extremes of the function make
it painfully obvious we're still dealing with raw pointers here.
Does the use of these pointers actually offer any benefits? Well,
the generated code is nice and short... But so what?
Code size isn't everything. That's a lesson I had recently learned
well: part of my STL Error Decryptor package [8]
is the "Proxy CL" program, a plug-in replacement for MSVC's CL.EXE
compiler driver. The original implementation, using C strings exclusively,
resulted in a 40K executable. I had resisted switching over to using
std::string because
I was afraid of the potential code bloat the inclusion of std::string
would cause. After lamenting to him about the many internal buffer
overflows and related problems I'd been grappling with, Thomas Becker
(a major contributor to the Proxy CL program) finally shamed me
into dropping the C strings, and the subsequent version of the Proxy
CL used std::string
exclusively for text manipulation. This STL-ized version weighed
in at around 200K, but nobody noticed or cared [9].
Even though 200K is over three times the entire system address space
of the machines I'd cut my teeth on as an 8080 programmer, by today's
standards a program of that size seems to be considered minuscule.
In any case, not a single string-related problem has ever arisen
within the Proxy CL program again. I deemed this a worthwhile tradeoff...it
was time to bid make_s's
C strings farewell.
Fortunately, due to pointer/iterator duality, the logic of make_s's
parsing loop remains essentially the same whether the variables
are char pointers or iterators into a std::string.
Whereas originally I had to compute the pointer to the end of the
input text by adding the text length to the beginning pointer (Figure
2, line 15), std::string
provides the past-the-end iterator I need directly via its end()
member function. I take advantage of
end() in the third version of make_s
(Figure 3, line 19). At
this point, it also finally dawned on me that there was no longer
any need to manually replicate the input text. In fact, all the
dynamic allocation code could have been stripped out back in version
2, and the only change I'd have had to make was to add the const
qualifier to a few char pointer definitions. The fact I hadn't realized
this just goes to show: to a C programmer, the association between
text processing and dynamic allocation is deeply ingrained, indeed.
The third version of make_s
still accepts a char *
as a function parameter, but uses it only once to define the automatic
std::string object
(Figure 3, line 11) used
for all subsequent processing. My original reason for leaving the
intext parameter as a char
* in version 3 (rather than accepting a
std::string and letting string's constructor handle actual
parameters of type char
*) was that a char
* is cheaper to pass than a std::string...but
that would actually not be the case if the std::string
were passed by reference. Even if the caller of make_s
passes a char *
as a parameter, it would work because the compiler would generate
a temporary std::string
object to be referenced by the formal parameter. There would even
be a potential gain in efficiency, for if the caller maintained
a single std::string
object for use in multiple calls to make_s,
a std::string construction
would be avoided for every call. There would no longer any std::string
construction within make_s,
resulting in a net savings of one string constructor call for each
use (beyond the first) of the same string in a call to the function.
The penultimate version of make_s
(Figure 4) thus accepts
the text input parameter by reference-to-const-string, and that's
the end of it: make_s
has now been fully converted from using raw char
pointers to using std::string.
There was just one more thing still troubling me. It was so subtle,
that I thought of it once, forgot about it, and later remembered
there was something else I could do, but not what it was. It took
another week (in fact, until the morning of the day I'm writing
this) before I remembered the issue: that pesky inner loop (Figure
4, lines 21-22). We're supposed to prefer the use of STL algorithms
to explicit loops, right? So they say. As simple as this loop (to
locate the first non-whitespace character) is, it's still a loop,
and thus might qualify for re-writing using a straight call to some
appropriate STL algorithm. In fact it does, as illustrated in Figure
5 (lines 20-21) via use of the std::find_if
algorithm, the std::not1
function adapter, and the std::ptr_fun
pointer adapter for ordinary functions (to allow the application
of isspace to the
algorithm). Despite the perk of no longer needing to explicitly
guard against the possibility of dereferencing an end iterator,
does this conversion necessarily constitute an improvement? I'm
not really sure. I'd have to characterize that as a religious issue
(but I'd tend toward the non-STL solution. Call me a Luddite).
Conclusion
Normally I'd have no desire to share early drafts of my software
projects. In this case, however, there may be some value in having
documented the sequence of successive refinements that propelled
this code from pointer-land to STL-land. Perhaps after viewing the
entire journey, you may recognize similar patterns looking to emerge
in your own projects--and remember the shortcut I had to learn the
hard way: Go directly to STL, do not pass pointers, do not collect
200 coding errors.
Notes
[1] Apologies to Bruce Eckel and Ringo Starr.
[2] The source code archive for Effective
STL may be downloaded here.
[3] The complete container initialization
library may be downloaded here.
. I say this package is half-way original because the structure
of these function templates was first put forth by Musser et. al.
in their book, STL
Tutorial and Reference Guide, 2nd ed. InitUtil expands upon
the make template
presented in that book.
[4] Dan Saks. "Stepping Up to C++: Writing
Your First Class", The C Users Journal, March 1991
[5] As Scott Meyers pointed out to me, specifying
the value_type of
the container shouldn't be necessary since make_s
only creates containers of strings. However, that is an issue somewhat
tangential to the point of this article; that and other design improvements
resulting from pre-publication review (such as Robert Schwartz's
suggestion to generalize the ','
delimiter between strings) will have been incorporated into the
distribution versions of InitUtil by the time this article sees
print. However, as far as the versions of make_s
shown here are concerned, what you see is what I wrote...
[6] Ultimately, through the use of partial
specialization, InitUtil evolved to the point where a single function
template name, make_cont,
would serve to initialize containers of both string and non-string
value_types. This supersedes both make_i
and make_s, even
though the two cases are handled quite distinctly. The only compilers
I own that can actually compile the latest version of InitUtil (.92),
however, are Comeau C++ and Metrowerks CodeWarrior 8. Since many
current compilers do not yet support partial specialization and/or
other ANSI/ISO C++ features that InitUtil v.92 depends upon, I'm
continuing to distribute two alternate versions: v.91 uses a distinct
function template name, make_cont_str
(successor to make_s)
to generate containers of strings, and v.91ms tolerates Microsoft
VC++6/7 by dropping set/multiset support (so as not to require partial
specialization).
[7] push_back
won't work for set and multiset, since those containers don't provide
it. In the distribution versions of InitUtil, the insertion of values
into a container is performed with the aid of three associated function
objects: one using push_back,
and twin specializations (for set and multiset) both employing insert.
Thus, all standard containers are supported orthogonally for the
more Standard-compliant compilers.
[8] Leor Zolman. "An STL Error Message
Decryptor for Visual C++", The C/C++ Users Journal,
July 2001. Also online
as that month's CUJ web feature.
[9] Of course, this isn't an embedded system.
As Dan Saks pointed out to me, such a jump in code size might not
be as easily tolerated in that genre. (Later, I discovered
that when linking with MSVC 7.1 [aka Visual Studio.NET 2003], the
executable size drops back down to about 110K. I'm assuming this
is a linkage optimization and not evidence of a 90% code efficiency
increase in the compiler.)
Addendum
Reader Jerry Feldman emailed me with this:
I was reading your article in the C/C++ users journal, and I
fully agree with your premise. However, I have a comment: in the
while loop you use an unending loop with a test:
if (nextComma == incopy.end())
break;
to exit the loop. Would it be better to eliminate the test altogether
and redefine the loop:
while (curPos <
intext.end())
This solves the problem where an empty string is passed to the
function. It also makes a code a bit cleaner. It does cause
curpos = ++nextComma;
to be executed one additional time.
Thanks, Jerry, that is right on the money. This would have been
a better way to do it.
Home |
Courses |
Tools |
Resources
Clients |
Successes |
Contact |
Site Map |
Links |
About Us
All text and images on this website are Copyright © 2001-2003 BD Software. All rights reserved.
|