Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken. Please state the document version you're referring to, as found in the title (in this document: 7.2.0), and please state the chapter and paragraph name or number you're referring to.
All received mail is processed conscientiously, and received suggestions for improvement will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case, I will normally not acknowledge the receipt of suggestions for improvement. Please don't interpret this as a lack of appreciation for your efforts.
In this chapter several concrete examples of C++ programs, classes, and templates are presented. Topics covered by this document, such as virtual functions, static members, etc., are illustrated in this chapter. The examples roughly follow the organization of earlier chapters.
First, examples using stream classes are presented, including some
detailed examples illustrating polymorphism. With the advent of the
ANSI/ISO standard, classes supporting streams based on
file descriptors are no longer available, including the
Gnu procbuf
extension. These classes were frequently used in older C++ programs. This
section of the C++ Annotations develops an alternative: classes extending
streambuf, allowing the use of file descriptors, and classes around the
fork()
system call.
Next, several templates will be developed, both function templates and
full class templates. In addition to these templates Jesse van den Kieboom,
while attending the 2006-2007 edition of my C++ course, wrote a neat
auto_ptr class using reference counting. His implementation
(refcountautoptr.h) is provided here.
Finally, we'll touch the subjects of scanner and parser generators, and show how these tools may be used in C++ programs. These final examples assume a certain familiarity with the concepts underlying these tools, like grammars, parse-trees and parse-tree decoration. Once the input for a program exceeds a certain level of complexity, it's advantageous to use scanner- and parser-generators to produce code doing the actual input recognition. One of the examples in this chapter describes the usage of these tools in a C++ environment.
This section uses the class streambuf as the starting point for constructing classes interfacing file descriptors.
In this section we will construct classes which may be used to write to a device identified by a file descriptor: it may be a file, but it could also be a pipe or socket. Section 21.2.2 discusses reading from devices given their file descriptors, while section 21.4.1 reconsiders redirection, discussed earlier in section 5.8.3.
Basically, deriving a class for output operations is simple. The only member function that must be overridden is the virtual member int overflow(int c). This member is responsible for writing characters to the device once the class's buffer is full. If d_fd is a file descriptor to which information may be written, and if we decide against using a buffer, then the member overflow() can simply be:
class UnbufferedFD: public std::streambuf
{
    int d_fd;               // file descriptor to write to

    public:
        int overflow(int c);
        ...
};

int UnbufferedFD::overflow(int c)
{
    if (c != EOF)
    {
        if (write(d_fd, &c, 1) != 1)
            return EOF;
    }
    return c;
}
The argument received by overflow() is either written as a value of
type char to the file descriptor, or EOF is returned.
This simple function does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), the construction of a class using an output buffer will be discussed next in somewhat greater detail.
When an output buffer is used, the overflow() member will be a bit
more complex, as it is now only called when the buffer is full. Once the
buffer is full, we first have to flush the buffer, for which the (virtual)
function
streambuf::sync() is available. Since sync() is a virtual
function, classes derived from std::streambuf may redefine sync() to
flush a buffer std::streambuf itself doesn't know about.
Overriding sync() and using it in overflow() is not all that has to be done: eventually we may have written less information than fits into the buffer. So, at the end of the lifetime of our special streambuf object, its buffer may be only partially full. Therefore, we must make sure that the buffer is flushed once our object goes out of scope. This is of course very simple: sync() should be called by the destructor as well.
Now that we've considered the consequences of using an output buffer, we're almost ready to construct our derived class. We will add a couple of additional features, though.
The class ofdnstreambuf has the following characteristics:
It is derived from std::streambuf.
It defines three data members, keeping track of the size of the buffer, the file descriptor, and the buffer itself. Here is the full class interface:
class ofdnstreambuf: public std::streambuf
{
    size_t d_bufsize;
    int d_fd;
    char *d_buffer;

    public:
        ofdnstreambuf();
        ofdnstreambuf(int fd, size_t bufsize = 1);
        ~ofdnstreambuf();
        void open(int fd, size_t bufsize = 1);
        int sync();
        int overflow(int c);
};
The default constructor constructs an empty object; the second constructor passes its arguments on to the open() member (see below). Here are the constructors:
inline ofdnstreambuf::ofdnstreambuf()
:
    d_bufsize(0),
    d_buffer(0)
{}

inline ofdnstreambuf::ofdnstreambuf(int fd, size_t bufsize)
{
    open(fd, bufsize);
}
The destructor calls sync(), writing any characters still stored in the output buffer to the device. If there's no buffer, the destructor needs to perform no actions:
inline ofdnstreambuf::~ofdnstreambuf()
{
    if (d_buffer)
    {
        sync();
        delete[] d_buffer;
    }
}
Even though the device is not closed in the above implementation, this may not always be what one wants. It is left as an exercise to the reader to change this class in such a way that the device may optionally be closed. This approach was followed in, e.g., the Bobcat library. See also section 21.2.2.2.
The open() member initializes the buffer. Using setp(), the begin and end points of the buffer are set. This is used by the streambuf base class to initialize pbase(), pptr(), and epptr():
inline void ofdnstreambuf::open(int fd, size_t bufsize)
{
    d_fd = fd;
    d_bufsize = bufsize == 0 ? 1 : bufsize;

    d_buffer = new char[d_bufsize];
    setp(d_buffer, d_buffer + d_bufsize);
}
The member sync() flushes the as yet unflushed contents of the buffer to the device. Next, the buffer is reinitialized using setp(). Note that sync() returns 0 after a successful flush operation:
inline int ofdnstreambuf::sync()
{
    if (pptr() > pbase())
    {
        write(d_fd, d_buffer, pptr() - pbase());
        setp(d_buffer, d_buffer + d_bufsize);
    }
    return 0;
}
Finally, the member overflow() is overridden. Since this member is called by the streambuf base class when the buffer is full, sync() is called first to flush the filled buffer to the device. As this recreates an empty buffer, the character c, which could not be written to the buffer by the streambuf base class, is now entered into the buffer using the member functions pptr() and pbump(). Notice that entering a character into the buffer is implemented using available streambuf member functions, rather than doing it `by hand', which might invalidate streambuf's internal bookkeeping:
inline int ofdnstreambuf::overflow(int c)
{
    sync();
    if (c != EOF)
    {
        *pptr() = c;
        pbump(1);
    }
    return c;
}
Since the member functions use the write() system call, which is not declared by streambuf, the header file unistd.h must have been read by the compiler before the implementations of the member functions can be compiled.
The next program uses the ofdnstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
#include <string>
#include <iostream>
#include <istream>
#include "fdout.h"
using namespace std;

int main(int argc)
{
    ofdnstreambuf fds(STDOUT_FILENO, 500);
    ostream os(&fds);

    switch (argc)
    {
        case 1:
            os << "COPYING cin LINE BY LINE\n";
            for (string s; getline(cin, s); )
                os << s << endl;
        break;

        case 2:
            os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";
            cin >> os.rdbuf();      // Alternatively, use: cin >> &fds;
        break;

        case 3:
            os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
            os << cin.rdbuf();
        break;
    }
}
When classes are derived from std::streambuf, they should be provided with an input buffer of at least one character. The one-character input buffer allows for the use of the member functions istream::putback() or istream::unget(). Stream classes (like istream) normally allow us to unget at least one character using their member functions putback() or unget(). This is important, as these stream classes usually interface to streambuf objects. Although it is, strictly speaking, not necessary to implement a buffer in classes derived from streambuf, using buffers in these cases is strongly advised: the implementation is simple and straightforward, and the applicability of such classes is greatly improved. Therefore, in all our classes derived from the class streambuf, at least a buffer of one character will be defined.
When deriving a class (here called ifdstreambuf) from streambuf using a buffer of one character, at least its member streambuf::underflow() should be overridden, as this is the member to which all requests for input are eventually directed. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the size and location of the input buffer, so that it is able to set up its input buffer pointers correctly. This ensures that eback(), gptr(), and egptr() return correct values.
The required class has the following characteristics:
It is derived from std::streambuf as well.
The class defines two data members, one of them a fixed-sized one-character buffer. The data members are defined as protected data members so that derived classes (e.g., see section 21.2.2.3) can access them. Here is the full class interface:
class ifdstreambuf: public std::streambuf
{
    protected:
        int d_fd;
        char d_buffer[1];

    public:
        ifdstreambuf(int fd);
        int underflow();
};
The constructor initializes the buffer such that gptr() will be equal to egptr(). Since this implies that the buffer is empty, underflow() will immediately be called to fill the buffer:
inline ifdstreambuf::ifdstreambuf(int fd)
:
    d_fd(fd)
{
    setg(d_buffer, d_buffer + 1, d_buffer + 1);
}
The member underflow() is overridden. It first ensures that the buffer is really empty. If not, the next character in the buffer is returned. If the buffer is really empty, it is refilled by reading from the file descriptor. If this fails (for whatever reason), EOF is returned. More sophisticated implementations could react more intelligently here, of course. If the buffer could be refilled, setg() is called to set up streambuf's buffer pointers correctly.
Again, since the member functions use read(), declared in unistd.h rather than by streambuf, that header file must have been read by the compiler before the implementations of the member functions can be compiled.
This completes the construction of the ifdstreambuf class. It is used in the following program:
#include <iostream>
#include <istream>
#include <unistd.h>
#include "ifdbuf.h"
using namespace std;

int main(int argc)
{
    ifdstreambuf fds(STDIN_FILENO);
    istream is(&fds);

    cout << is.rdbuf();
}
The class developed in this section extends the ifdstreambuf developed in the previous section. To make things a bit more interesting, in the class ifdnstreambuf developed here the member streambuf::xsgetn() is also overridden, to optimize the reading of series of characters. Furthermore, a default constructor is provided, which can be used in combination with the open() member to construct an istream object before the file descriptor becomes available. Then, once the descriptor becomes available, the open() member can be used to initialize the object's buffer. Later, in section 21.4, we'll encounter such a situation.
To save some space, the success of various calls is not checked. In `real life' implementations, these checks should, of course, not be omitted. The class ifdnstreambuf has the following characteristics:
It is derived from std::streambuf.
Like the class ifdstreambuf (section 21.2.2.1), its data members are protected. Since the buffer's size is configurable, this size is kept in a dedicated data member, d_bufsize:
class ifdnstreambuf: public std::streambuf
{
    protected:
        int d_fd;
        size_t d_bufsize;
        char *d_buffer;

    public:
        ifdnstreambuf();
        ifdnstreambuf(int fd, size_t bufsize = 1);
        ~ifdnstreambuf();
        void open(int fd, size_t bufsize = 1);
        int underflow();
        std::streamsize xsgetn(char *dest, std::streamsize n);
};
The default constructor does not allocate a buffer. In that case the object must later be initialized by open(), which will then initialize the object so that it can actually be used:
inline ifdnstreambuf::ifdnstreambuf()
:
    d_bufsize(0),
    d_buffer(0)
{}

inline ifdnstreambuf::ifdnstreambuf(int fd, size_t bufsize)
{
    open(fd, bufsize);
}
If the object has been initialized by open(), its destructor will both delete the object's buffer and use the file descriptor to close the device:
ifdnstreambuf::~ifdnstreambuf()
{
    if (d_bufsize)
    {
        close(d_fd);
        delete[] d_buffer;
    }
}
Even though the device is closed in the above implementation, this may not always be what one wants. In cases where the open file descriptor is already available, the intention may be to use that descriptor repeatedly, each time using a newly constructed ifdnstreambuf object. It is left as an exercise to the reader to change this class in such a way that the device may optionally remain open. This approach was followed in, e.g., the Bobcat library.
The open() member simply allocates the object's buffer. It is assumed that the calling program has already opened the device. Once the buffer has been allocated, the base class member setg() is used to ensure that eback(), gptr(), and egptr() return correct values:
void ifdnstreambuf::open(int fd, size_t bufsize)
{
    d_fd = fd;
    d_bufsize = bufsize;

    d_buffer = new char[d_bufsize];
    setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);
}
The member underflow() is implemented almost identically to ifdstreambuf's (section 21.2.2.1) member. The only difference is that the current class supports buffers of larger sizes. Therefore, more characters (up to d_bufsize) may be read from the device at once:
int ifdnstreambuf::underflow()
{
    if (gptr() < egptr())
        return *gptr();

    int nread = read(d_fd, d_buffer, d_bufsize);

    if (nread <= 0)
        return EOF;

    setg(d_buffer, d_buffer, d_buffer + nread);
    return *gptr();
}
Finally, xsgetn() is overridden. In a loop, n is reduced until 0, at which point the function terminates. Alternatively, the member returns if underflow() fails to obtain more characters. This member optimizes the reading of series of characters: instead of calling streambuf::sbumpc() n times, a block of avail characters is copied to the destination, using streambuf::gbump() to consume avail characters from the buffer in one function call:
std::streamsize ifdnstreambuf::xsgetn(char *dest, std::streamsize n)
{
    int nread = 0;

    while (n)
    {
        if (!in_avail())
        {
            if (underflow() == EOF)
                break;
        }

        int avail = in_avail();

        if (avail > n)
            avail = n;

        memcpy(dest + nread, gptr(), avail);
        gbump(avail);

        nread += avail;
        n -= avail;
    }
    return nread;
}
Once again: since the member functions use read(), declared in unistd.h rather than by streambuf, that header file must have been read by the compiler before the implementations of the member functions can be compiled.
The member xsgetn() is called by streambuf::sgetn(), which is a streambuf member. The following example illustrates the use of this member function with an ifdnstreambuf object:
#include <unistd.h>
#include <iostream>
#include <istream>
#include "ifdnbuf.h"
using namespace std;

int main(int argc)
{
                                    // internally: 30 char buffer
    ifdnstreambuf fds(STDIN_FILENO, 30);

    char buf[80];                   // main() reads blocks of 80 chars

    while (true)
    {
        size_t n = fds.sgetn(buf, 80);
        if (n == 0)
            break;
        cout.write(buf, n);
    }
}
Classes derived from streambuf that are to support seeking should override the members streambuf::seekoff() and streambuf::seekpos(). The class ifdseek, developed in this section, can be used to read information from devices supporting such seek operations. The class ifdseek is derived from ifdstreambuf, so it uses a character buffer of just one character. The facilities to perform seek operations, which are added to our new class ifdseek, make sure that the input buffer is reset when a seek operation is requested. The class could also have been derived from the class ifdnstreambuf, in which case the arguments to reset the input buffer must be adapted in such a way that its second and third parameters point beyond the available input buffer. Let's have a look at the characteristics of ifdseek:
The class ifdseek is derived from ifdstreambuf. Like the latter class, ifdseek's member functions use facilities declared in unistd.h. So, the compiler must have seen unistd.h before it can compile the class's member functions. To reduce the amount of typing when specifying types and constants from std::streambuf and std::ios, several typedefs are defined at the class's very top. These typedefs refer to types that are defined in the header file ios, which must therefore be included as well before the compiler reads ifdseek's class definition. Here is the class's interface:
class ifdseek: public ifdstreambuf
{
    typedef std::streambuf::pos_type pos_type;
    typedef std::streambuf::off_type off_type;
    typedef std::ios::seekdir seekdir;
    typedef std::ios::openmode openmode;

    public:
        ifdseek(int fd);
        pos_type seekoff(off_type offset, seekdir dir, openmode);
        pos_type seekpos(pos_type offset, openmode mode);
};

inline ifdseek::ifdseek(int fd)
:
    ifdstreambuf(fd)
{}
The member seekoff() is responsible for performing the actual seek operations. It calls lseek() to seek a new position in a device whose file descriptor is known. If seeking succeeds, setg() is called to define an already empty buffer, so that the base class's underflow() member will refill the buffer at the next input request:
ifdseek::pos_type ifdseek::seekoff(off_type off, seekdir dir, openmode)
{
    pos_type pos =
        lseek
        (
            d_fd, off,
            (dir == std::ios::beg) ? SEEK_SET :
            (dir == std::ios::cur) ? SEEK_CUR :
                                     SEEK_END
        );

    if (pos < 0)
        return -1;

    setg(d_buffer, d_buffer + 1, d_buffer + 1);
    return pos;
}
The companion member seekpos() is overridden as well: it is actually defined as a call to seekoff():
inline ifdseek::pos_type ifdseek::seekpos(pos_type off, openmode mode)
{
    return seekoff(off, std::ios::beg, mode);
}
An example of a program using the class ifdseek is the following. If this program is given its own source file using input redirection, then seeking is supported and, with the exception of the first line, every other line is shown twice:
#include "fdinseek.h"
#include <string>
#include <iostream>
#include <istream>
#include <iomanip>
using namespace std;

int main(int argc)
{
    ifdseek fds(0);
    istream is(&fds);
    string s;

    while (true)
    {
        if (!getline(is, s))
            break;

        streampos pos = is.tellg();
        cout << setw(5) << pos << ": `" << s << "'\n";

        if (!getline(is, s))
            break;

        streampos pos2 = is.tellg();
        cout << setw(5) << pos2 << ": `" << s << "'\n";

        if (!is.seekg(pos))
        {
            cout << "Seek failed\n";
            break;
        }
    }
}
Streambuf classes and classes derived from streambuf should support at least ungetting the last read character. Special care must be taken when series of unget() calls must be supported. In this section the construction of a class supporting a configurable number of istream::unget() or istream::putback() calls is discussed.
Support for multiple (say `n') unget() calls is implemented by reserving an initial section of the input buffer, which is gradually filled to contain the last n characters read. The class was implemented as follows:
The class fdunget is derived from std::streambuf. It defines several data members, allowing the class to perform the bookkeeping required to maintain an unget-buffer of a configurable size:
class fdunget: public std::streambuf
{
    int d_fd;
    size_t d_bufsize;
    size_t d_reserved;
    char *d_buffer;
    char *d_base;

    public:
        fdunget(int fd, size_t bufsz, size_t unget);
        ~fdunget();
        int underflow();
};
The constructor sets up the class's data members. The unget-area consists of the first d_reserved bytes of the class's input buffer. Reading always takes place beyond these first d_reserved bytes. So, a certain number of bytes may be read; then, once at least d_reserved bytes have been read, at least d_reserved bytes can be ungot. The constructor computes d_base, pointing to a location d_reserved bytes beyond the start of d_buffer. This will always be the point where the buffer refills start. Finally, the constructor initializes streambuf's buffer pointers using setg(). As no characters have been read yet, all pointers are set to point to d_base. If unget() is called at this point, no characters are available, so unget() will (correctly) fail:
fdunget::fdunget(int fd, size_t bufsz, size_t unget)
:
    d_fd(fd),
    d_reserved(unget)
{
    size_t allocate =
            bufsz > d_reserved ?
                bufsz
            :
                d_reserved + 1;

    d_buffer = new char[allocate];

    d_base = d_buffer + d_reserved;
    setg(d_base, d_base, d_base);

    d_bufsize = allocate - d_reserved;
}

inline fdunget::~fdunget()
{
    delete[] d_buffer;
}
The member underflow() is overridden as follows:
The number of characters to move into the unget-area is normally d_reserved, but it is equal to the actual number of characters that can be ungot if that value is smaller. The characters to move into the unget-area are the characters most recently read: they are copied to the area immediately before d_base. The buffer is then refilled: reading starts from d_base and not from d_buffer. Finally, streambuf's read buffer pointers are set up:
eback() is set to `move' locations before d_base, thus defining the guaranteed unget-area,
gptr() is set to d_base, since that's the location of the first read character after a refill, and
egptr() is set just beyond the location of the last character read into the buffer.
Here is underflow()'s implementation:
int fdunget::underflow()
{
    if (gptr() < egptr())
        return *gptr();

    size_t ungetsize = gptr() - eback();
    size_t move = std::min(ungetsize, d_reserved);

    memcpy(d_base - move, egptr() - move, move);

    int nread = read(d_fd, d_base, d_bufsize);
    if (nread <= 0)                 // none read -> return EOF
        return EOF;

    setg(d_base - move, d_base, d_base + nread);

    return *gptr();
}
The following program illustrates the class fdunget. It reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined in a buffer holding 3 characters. Just before reading a character, the program tries to unget at most 6 characters. This is, of course, not possible; but the program will nicely unget as many characters as possible, considering the actual number of characters read:
#include "fdunget.h"
#include <string>
#include <iostream>
#include <istream>
using namespace std;

int main(int argc)
{
    fdunget fds(0, 3, 2);
    istream is(&fds);
    char c;

    for (int idx = 0; idx < 10; ++idx)
    {
        cout << "after reading " << idx << " characters:\n";
        for (int ug = 0; ug <= 6; ++ug)
        {
            if (!is.unget())
            {
                cout
                    << "\tunget failed at attempt " << (ug + 1) << "\n"
                    << "\trereading: '";

                is.clear();
                while (ug--)
                {
                    is.get(c);
                    cout << c;
                }
                cout << "'\n";
                break;
            }
        }

        if (!is.get(c))
        {
            cout << " reached\n";
            break;
        }
        cout << "Next character: " << c << endl;
    }
}
/*
    Generated output after 'echo abcde | program':

    after reading 0 characters:
        unget failed at attempt 1
        rereading: ''
    Next character: a
    after reading 1 characters:
        unget failed at attempt 2
        rereading: 'a'
    Next character: b
    after reading 2 characters:
        unget failed at attempt 3
        rereading: 'ab'
    Next character: c
    after reading 3 characters:
        unget failed at attempt 4
        rereading: 'abc'
    Next character: d
    after reading 4 characters:
        unget failed at attempt 4
        rereading: 'bcd'
    Next character: e
    after reading 5 characters:
        unget failed at attempt 4
        rereading: 'cde'
    Next character:

    after reading 6 characters:
        unget failed at attempt 4
        rereading: 'de
    '
     reached
*/
When extracting information from istream objects, operator>>(), the standard extraction operator, is perfectly suited for the task, as in most cases the extracted fields are white-space or otherwise clearly separated from each other. But this does not hold true in all situations. For example, when a web-form is posted to some processing script or program, the receiving program may receive the form fields' values as url-encoded characters: letters and digits are sent unaltered, blanks are sent as + characters, and all other characters start with % followed by the character's ascii-value represented by its two-digit hexadecimal value.
When decoding url-encoded information, a simple hexadecimal extraction
won't work, since that will extract as many hexadecimal characters as
available, instead of just two. Since the letters a-f and 0-9 are
legal hexadecimal characters, a text like My name is `Ed', url-encoded as
My+name+is+%60Ed%27
will result in the extraction of the hexadecimal values 60ed and
27, instead of 60 and 27. The name Ed will disappear from
view, which is clearly not what we want.
In this case, having seen the %, we could extract 2 characters, put
them in an
istringstream object, and extract the hexadecimal value from
the istringstream object. A bit cumbersome, but doable. Other approaches,
however, are possible as well.
The following class fistream (for fixed-sized field istream) defines an istream class supporting both fixed-sized field extractions and blank-delimited extractions (as well as unformatted read() calls). The class may be initialized as a wrapper around an existing istream, or it may be initialized using the name of an existing file. The class is derived from istream, allowing all extractions and operations supported by istreams in general. The class needs the following data members:
d_filebuf: a filebuffer used when fistream reads its information
from a named (existing) file. Since the filebuffer is only needed in
that case, and since it must be allocated dynamically, it is defined
as an auto_ptr<filebuf> object.
d_streambuf: a pointer to fistream's streambuf. It will point
to filebuf when fistream opens a file by name. When an
existing istream is used to construct an fistream, it will
point to the existing istream's streambuf.
d_iss: an istringstream object which is used for the fixed field
extractions.
d_width: a size_t indicating the width of the field to
extract. If 0 no fixed field extractions will be used, but
information will be extracted from the istream base class object
using standard extractions.
Here is the initial section of fistream's class interface:
class fistream: public std::istream
{
    std::auto_ptr<std::filebuf> d_filebuf;
    std::streambuf *d_streambuf;
    std::istringstream d_iss;
    size_t d_width;
As mentioned, fistream objects can be constructed from either a
filename or an existing istream object. Thus, the class interface shows
two constructors:
fistream(std::istream &stream);
fistream(char const *name,
std::ios::openmode mode = std::ios::in);
When an fistream object is constructed using an existing istream
object, the fistream's istream part will simply use the stream's
streambuf object:
fistream::fistream(istream &stream)
:
    istream(stream.rdbuf()),
    d_streambuf(rdbuf()),
    d_width(0)
{}
When an fistream object is constructed using a filename, the istream base initializer is given a new filebuf object to be used as its streambuf. Since the class's data members are not initialized before the class's base class has been constructed, d_filebuf can only be initialized thereafter. By then, the filebuf is only available as rdbuf(), which returns a streambuf. However, as it actually is a filebuf, a reinterpret_cast is used to cast the streambuf pointer returned by rdbuf() to a filebuf *, so d_filebuf can be initialized:
fistream::fistream(char const *name, ios::openmode mode)
:
    istream(new filebuf()),
    d_filebuf(reinterpret_cast<filebuf *>(rdbuf())),
    d_streambuf(d_filebuf.get()),
    d_width(0)
{
    d_filebuf->open(name, mode);
}
There is only one additional public member: setField(field const
&). This member is used to define the size of the next field to extract. Its
parameter is a reference to a field class, a
manipulator class
defining the width of the next field.
Since a field & is mentioned in fistream's interface, field
must be declared before fistream's interface starts. The class field
itself is simple: it declares fistream as its friend, and it has two data
members: d_width specifies the width of the next field, d_newWidth is
set to true if d_width's value should actually be used. If
d_newWidth is false, fistream will return to its standard extraction
mode. The class field furthermore has two constructors: a default
constructor, setting d_newWidth to false and a second constructor
expecting the width of the next field to extract as its value. Here is the
class field:
class field
{
    friend class fistream;

    size_t d_width;
    bool d_newWidth;

    public:
        field(size_t width);
        field();
};

inline field::field(size_t width)
:
    d_width(width),
    d_newWidth(true)
{}

inline field::field()
:
    d_newWidth(false)
{}
Since field declares fistream as its friend, setField may
inspect field's members directly.
Time to return to setField(). This function expects a reference to a field object, initialized in one of three different ways:
field(): when setField()'s argument is a field object constructed by its default constructor, the next extraction uses the same field width as the previous extraction.
field(0): when this field object is used as setField()'s argument, fixed-sized field extraction stops, and the fistream acts like any standard istream object.
field(x): when the field object itself is initialized by a non-zero size_t value x, the next field will be x characters wide. The preparation of such a field is left to setBuffer(), fistream's only private member.
Here is setField()'s implementation:
std::istream &fistream::setField(field const &params)
{
    if (params.d_newWidth)              // new field size requested
        d_width = params.d_width;       // set new width

    if (!d_width)                       // no width?
        rdbuf(d_streambuf);             // return to the old buffer
    else
        setBuffer();                    // define the extraction buffer

    return *this;
}
The private member setBuffer() defines a buffer of d_width + 1 characters, and uses read() to fill the buffer with d_width characters. The buffer is terminated by an ASCII-Z character. This buffer is then used to initialize the d_iss member. Finally, fistream's rdbuf() member is used to extract d_iss's data via the fistream object itself:
void fistream::setBuffer()
{
    char *buffer = new char[d_width + 1];

    rdbuf(d_streambuf);                 // use istream's buffer to
                                        // read d_width chars,
                                        // terminated by ascii-Z
    buffer[read(buffer, d_width).gcount()] = 0;
    d_iss.str(buffer);
    delete[] buffer;

    rdbuf(d_iss.rdbuf());               // switch buffers
}
Although setField() could be used to configure fistream to use or not to use fixed-sized field extraction, using manipulators is probably preferable. To allow field objects to be used as manipulators, an overloaded extraction operator was defined, accepting an istream & and a field const & object. Using this extraction operator, statements like

    fis >> field(2) >> x >> field(0);

are possible (assuming fis is a fistream object). Here is the overloaded operator>>(), as well as its declaration:
istream &std::operator>>(istream &str, field const &params)
{
    return reinterpret_cast<fistream *>(&str)->setField(params);
}

Declaration:

namespace std
{
    istream &operator>>(istream &str, FBB::field const &params);
}
Finally, an example. The following program uses a fistream object to
url-decode url-encoded information appearing at its standard input:
int main()
{
    fistream fis(cin);

    fis >> hex;
    while (true)
    {
        size_t x;
        switch (x = fis.get())
        {
            case '\n':
                cout << endl;
            break;

            case '+':
                cout << ' ';
            break;

            case '%':
                fis >> field(2) >> x >> field(0);
            // FALLING THROUGH
            default:
                cout << static_cast<char>(x);
            break;

            case EOF:
            return 0;
        }
    }
}
/*
    Generated output after:
        echo My+name+is+%60Ed%27 | a.out

    My name is `Ed'
*/
The fork() system call is well known. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().
In this section we will see how C++ can be used to wrap classes around a complex system call like fork(). Much of what follows in this section directly applies to the Unix operating system, and the discussion will therefore focus on that operating system. However, other systems usually provide comparable facilities. The following discussion is heavily based on the notion of design patterns, as published by Gamma et al. (1995).
When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect the return value of fork(): the return value in the original process (called the parent process) differs from the return value in the newly created process (called the child process):
In the parent process, fork() returns the process ID of the child process created by the fork() system call. This is a positive integer value.
In the child process, fork() returns 0.
If fork() fails, -1 is returned.
A basic Fork class should hide all bookkeeping details of a system
call like fork() from its users. The class Fork developed here will do
just that. The class itself only needs to take care of the proper execution of
the fork() system call. Normally, fork() is called to start a child
process, usually boiling down to the execution of a separate process. This
child process may expect input at its standard input stream and/or may
generate output to its standard output and/or standard error streams. Fork
does not know all this, and does not have to know what the child process will
do. However, Fork objects should be able to activate their child
processes.
Unfortunately, Fork's constructor cannot know what actions its child
process should perform. Similarly, it cannot know what actions the parent
process should perform. For this particular situation, the
template method design pattern
was developed. According to Gamma et al., the template method design
pattern
``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. (The) Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''
This design pattern allows us to define an
abstract base class
already providing the essential steps related to the fork() system
call and deferring the implementation of certain normally used parts of the
fork() system call to subclasses.
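The template method pattern itself is easily illustrated outside the
context of fork(). The following sketch uses hypothetical class names
(Task, Greet), not part of the Fork sources: the base class fixes the
algorithm's skeleton in run(); a subclass redefines an individual step
without changing that skeleton.

```cpp
#include <string>

// Hypothetical sketch of the template method pattern: Task::run()
// defines the fixed skeleton; subclasses redefine individual steps.
class Task
{
    public:
        virtual ~Task()
        {}
        std::string run() const             // the template method
        {
            return before() + step();       // fixed order of steps
        }
    protected:
        virtual std::string before() const  // a hook with a default
        {
            return "[";
        }
        virtual std::string step() const = 0;   // must be implemented
};

class Greet: public Task
{
    protected:
        virtual std::string step() const    // the redefined step
        {
            return "hello]";
        }
};
```

A Greet object's run() now returns "[hello]": the base class supplied
the skeleton and the opening bracket, the subclass the rest.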
The Fork abstract base class itself has the following characteristics:
It defines a single data member, d_pid. In the parent process this data
member contains the child's process id; in the child process it
contains the value 0. Its public interface declares but two members:
the member function fork(), performing the actual forking
(i.e., it will create the (new) child process);
the virtual destructor ~Fork(), which may be overridden by derived
classes defining their own destructors. Its implementation is:
inline Fork::~Fork()
{}
Here is Fork's interface:
class Fork
{
int d_pid;
public:
virtual ~Fork();
void fork();
protected:
int pid() const;
virtual void childRedirections();
virtual void parentRedirections();
virtual void childProcess() = 0; // both MUST be implemented
virtual void parentProcess() = 0;
int waitForChild(); // returns the status
};
Two members are declared in the class's protected section and can thus
only be used by derived classes. They are:
pid(), allowing derived classes to
access the system fork()'s return value:
inline int Fork::pid() const
{
return d_pid;
}
int waitForChild(), which can be called by parent
processes to wait for the completion of their child processes (as discussed
below). This member is declared in the class interface. Its implementation is:
#include "fork.ih"
int Fork::waitForChild()
{
int status;
waitpid(d_pid, &status, 0);
return WEXITSTATUS(status);
}
This simple implementation returns the child's
exit status to
the parent. The called system function
waitpid() blocks until the
child terminates.
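Fork::waitForChild()'s use of the status macros can be demonstrated in
isolation. The following sketch (childExitValue() is a hypothetical
helper, not part of the Fork sources) forks a child that exits with a
known value, and decodes the status word waitpid() fills in:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch: fork a child that exits with `value'; the parent blocks in
// waitpid() and decodes the status word with the standard macros.
int childExitValue(int value)
{
    pid_t pid = fork();
    if (pid == 0)               // in the child: terminate immediately
        _exit(value);

    int status;                 // in the parent: wait for the child
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as childExitValue(42) the function returns 42, the value the
child passed to _exit().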
When fork() system calls are used, parent processes and child
processes must always be distinguished. The
main distinction between these processes is that d_pid will be equal to
the child's process-id in the parent process, while d_pid will be equal to
0 in the child process itself. Since these two processes must always be
distinguished (and present), their implementation by classes derived from
Fork is enforced by Fork's interface: the members childProcess(),
defining the child process' actions and parentProcess(), defining the
parent process' actions were defined as pure virtual functions.
Two additional members handle redirection:
childRedirections(): this member should be implemented if any
standard stream (cin, cout, or cerr) must be redirected in the
child process (cf. section 21.4.1);
parentRedirections(): this member should be implemented if any
standard stream (cin, cout, or cerr) must be redirected in the
parent process.
By default, these members perform no actions:
inline void Fork::childRedirections()
{}
inline void Fork::parentRedirections()
{}
The member function fork() calls the system function fork()
(caution: since the system function fork() is called by a member
function having the same name, the :: scope resolution operator must be
used to prevent a recursive call of the member function itself). After
calling
::fork(), depending on its return value, either parentProcess()
or childProcess() is called. Maybe redirection is
necessary. Fork::fork()'s implementation calls childRedirections()
just before calling childProcess(), and parentRedirections() just
before calling parentProcess():
#include "fork.ih"
void Fork::fork()
{
if ((d_pid = ::fork()) < 0)
throw "Fork::fork() failed";
if (d_pid == 0) // child process has pid == 0
{
childRedirections();
childProcess();
exit(1); // we shouldn't come here:
// childProcess() should exit
}
parentRedirections();
parentProcess();
}
In fork.cc the class's internal header file fork.ih is included. This
header file takes care of the inclusion of the necessary system header
files, as well as the inclusion of fork.h itself. Its contents are:
#include "fork.h"
#include <cstdlib>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
Child processes should not return: once they have completed their tasks,
they should terminate. This happens automatically when the child process
performs a call to a member of the
exec...() family, but if the child
itself remains active, then it must make sure that it terminates properly. A
child process normally uses
exit() to terminate itself. Note, however, that exit() prevents the
activation of the destructors of local objects that are still alive at
the point where exit() is called. Destructors of globally defined
objects are activated when exit() is used. When using exit() to
terminate childProcess(), that member should therefore either itself
call a support member function defining all the objects it needs, or it
should define all its objects inside a compound statement, calling
exit() beyond that compound statement.
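This advice can be sketched as follows (childAction() and
childProcess() are hypothetical illustrations, not part of the Fork
sources): all objects the child needs are defined inside a support
function, so their destructors have run by the time exit() is called:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical sketch: the objects the child needs live inside the
// support function; their destructors run when the function returns.
std::string childAction()
{
    std::string buffer("child at work");    // destroyed when the
    return buffer;                          // function returns
}

void childProcess()         // what a childProcess() override might do
{
    childAction();          // local objects are cleaned up on return
    exit(0);                // safe: exit() bypasses no live local
}                           // objects here
```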
Parent processes should normally wait for their children to complete.
Terminating child processes inform their parent that they are about to
terminate by sending out a signal, which should be caught by the
parent. If child processes terminate and their parent does not catch
those signals, the child processes remain visible as so-called zombie
processes.
If parent processes must wait for their children to complete, they may
call the member waitForChild(). This member returns the exit status of a
child process to its parent.
There is a situation where the child process continues to live while
the parent dies. In nature this happens all the time: parents tend to
die before their children do. A program of this kind is called a
daemon program: the parent process dies and the child process
continues to run as a child of the basic init process. Again, when the
child eventually dies a signal is sent to its `step-parent'
init. No zombie is created here, as init catches the termination
signals of all its
(step-) children. The construction of a
daemon process is very simple, given the availability of the class Fork
(cf. section 21.4.2).
Within a C++ program, streams can be redirected using the
ios::rdbuf() member function. By assigning the streambuf of one stream
to another stream, both stream objects access the same streambuf, thus
implementing redirection at the level of the programming language
itself.
Note that this is fine within the context of the C++
program, but if that context is left, the redirection terminates, as the
operating system does not know about streambuf objects. This happens,
e.g., when a program uses a
system() call to start a subprogram. The
program at the end of this section uses C++ redirection to redirect the
information inserted into
cout to a file, and then calls
system("echo hello world")
to echo a well-known line of text. Since echo writes its information
to the standard output, this text would end up in the redirected file
if C++'s redirection were recognized by the operating system.
Actually, this doesn't happen: hello world still appears at the
program's standard output instead of in the redirected file. A
solution to this
problem involves redirection at the operating system level, for which some
operating systems (e.g.,
Unix and friends) provide system calls like
dup() and
dup2(). Examples of these system calls are given in section
21.4.3.
Here is the example of the failing redirection at the system level
following C++ redirection using streambuf redirection:
#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
int main()
{
ofstream of("outfile");
cout.rdbuf(of.rdbuf());
cout << "To the of stream" << endl;
system("echo hello world");
cout << "To the of stream" << endl;
}
/*
Generated output: on the file `outfile'
To the of stream
To the of stream
On standard output:
hello world
*/
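Redirection that system() does respect can be sketched with dup2()
(systemToFile() is a hypothetical helper, not part of the Annotations'
sources): dup2() makes STDOUT_FILENO itself refer to the file, so the
child process started by system() inherits the redirection:

```cpp
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Sketch: redirect the standard output at the operating system level,
// run a command via system(), restore, and return the command's output.
std::string systemToFile(std::string const &cmd, std::string const &path)
{
    int saved = dup(STDOUT_FILENO);     // remember the real stdout
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    dup2(fd, STDOUT_FILENO);            // OS-level redirection, which
    close(fd);                          // system()'s child inherits
    system(cmd.c_str());
    dup2(saved, STDOUT_FILENO);         // restore the real stdout
    close(saved);

    std::string line;                   // read back the first line
    FILE *in = fopen(path.c_str(), "r");
    char buffer[100];
    if (in && fgets(buffer, sizeof(buffer), in))
        line = buffer;
    if (in)
        fclose(in);
    return line;
}
```

With this setup systemToFile("echo hello world", "outfile") returns
"hello world\n": the echo child wrote into the file, not to the
terminal.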
A special use of fork() is to start a child process, where the parent
process terminates immediately after spawning the child. If this
happens, the child process continues to run as a child process of
init, the always running first process on Unix systems. Such a
process is often called a daemon, running as a background process.
Although the following example can easily be constructed as a plain C
program, it was included in the C++ Annotations because it is so closely
related to the current discussion of the Fork class. I thought about
adding a daemon() member to that class, but eventually decided against it
because the construction of a daemon program is very simple and requires no
features other than those currently offered by the class Fork. Here is an
example illustrating the construction of a daemon program:
#include <iostream>
#include <unistd.h>
#include "fork.h"
class Daemon: public Fork
{
public:
virtual void parentProcess() // the parent does nothing.
{}
virtual void childProcess()
{
sleep(3); // actions taken by the child
// just a message...
std::cout << "Hello from the child process\n";
exit (0); // The child process exits.
}
};
int main()
{
Daemon daemon;
daemon.fork(); // program immediately returns
return 0;
}
/*
Generated output:
The next command prompt, then after 3 seconds:
Hello from the child process
*/
Pipes are created using the pipe() system call, which yields a pair of
file descriptors. When two processes want to communicate using such
file descriptors, the following takes place:
The process creates a pipe using the pipe() system call. One of the
file descriptors is used for writing, the other file descriptor is used
for reading.
The process then spawns a child process (the fork() function is
called), duplicating the file descriptors. Now we have four file
descriptors, as both the child process and the parent process have
their own copies of the two file descriptors created by pipe().
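The steps above can be sketched without any class wrapper (pipeDemo()
is a hypothetical illustration): pipe() creates the descriptor pair,
fork() duplicates it, and each process closes the end it doesn't use:

```cpp
#include <cstring>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Bare-bones sketch of the scheme described above: the child writes
// one message into the pipe, the parent reads it back.
std::string pipeDemo()
{
    int fd[2];                  // fd[0]: reading end, fd[1]: writing end
    if (pipe(fd) != 0)
        return "pipe() failed";

    if (fork() == 0)            // in the child process
    {
        close(fd[0]);           // the child only writes
        char const msg[] = "hello via the pipe";
        write(fd[1], msg, std::strlen(msg));
        close(fd[1]);
        _exit(0);
    }
                                // in the parent process
    close(fd[1]);               // the parent only reads
    char buffer[100];
    ssize_t n = read(fd[0], buffer, sizeof(buffer));
    close(fd[0]);
    wait(0);                    // reap the terminated child
    return std::string(buffer, n > 0 ? n : 0);
}
```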
This scheme is implemented in the Pipe class constructed here. Let's
have a look at its characteristics (before the implementations can be
compiled, the compiler must have read the class's header file as well
as the file unistd.h):
The pipe() system call expects a pointer to two int values,
which will represent, respectively, the file descriptors to use for accessing
the reading end and the writing end of the constructed pipe, after
pipe()'s successful completion. To avoid confusion, an enum is defined
associating these ends with symbolic constants. Furthermore, the class stores
the two file descriptors in a data member d_fd. Here is the class header
and its private data:
class Pipe
{
enum RW { READ, WRITE };
int d_fd[2];
The constructor calls pipe() to create a set of associated file
descriptors used for accessing both ends of a pipe:
Pipe::Pipe()
{
if (pipe(d_fd))
throw "Pipe::Pipe(): pipe() failed";
}
The members readOnly() and readFrom() are used to configure
the pipe's reading end. The latter function is used to set up redirection, by
providing an alternate file descriptor which can be used to read from the
pipe. Usually this alternate file descriptor is
STDIN_FILENO, allowing
cin to extract information from the pipe. The former function is merely
used to configure the reading end of the pipe: it closes the matching writing
end, and returns a file descriptor that can be used to read from the pipe:
int Pipe::readOnly()
{
close(d_fd[WRITE]);
return d_fd[READ];
}
void Pipe::readFrom(int fd)
{
readOnly();
redirect(d_fd[READ], fd);
close(d_fd[READ]);
}
The member writeOnly() and two overloaded writtenBy() members are
available to configure the writing end of a pipe. The former function
is merely used to configure the writing end of the
pipe: it closes the matching reading end, and returns a file descriptor that
can be used to write to the pipe:
int Pipe::writeOnly()
{
close(d_fd[READ]);
return d_fd[WRITE];
}
void Pipe::writtenBy(int fd)
{
writtenBy(&fd, 1);
}
void Pipe::writtenBy(int const *fd, size_t n)
{
writeOnly();
for (size_t idx = 0; idx < n; idx++)
redirect(d_fd[WRITE], fd[idx]);
close(d_fd[WRITE]);
}
For the latter member two overloaded versions are available:
writtenBy(int fileDescriptor) is used to configure single
redirection, so that a specific file descriptor (usually
STDOUT_FILENO
or
STDERR_FILENO) may be used to write to the pipe;
writtenBy(int const *fd, size_t n) may be used to configure multiple
redirections, providing an array containing file descriptors.
Information written to any of these file descriptors is actually
written into the pipe.
The class has one private member, redirect(), which is used
to define a redirection using the
dup2() system call. This function
expects two file descriptors. The first file descriptor represents a file
descriptor which can be used to access the device's information, the second
file descriptor is an alternate file descriptor which may also
be used to access the device's information once dup2() has completed
successfully. Here is redirect()'s implementation:
void Pipe::redirect(int fd, int alternateFd)
{
if (dup2(fd, alternateFd) < 0)
throw "Pipe: redirection failed";
}
Having constructed the classes Fork and Pipe, we'll now use them in
several demonstration programs.
The class ParentSlurp, derived from Fork, starts a child process
which execs a program (like /bin/ls). The (standard) output of the
execed program is then read by the parent process. The parent process
will (for demonstration purposes) write the lines it receives to its
standard output stream, prefixing line numbers to the received lines.
It is most convenient here to redirect the parent's standard input
stream, so that the parent can read the output from the child process
from its std::cin input stream. Therefore, the only pipe that's needed
is used as an input pipe at the parent, and an output pipe at the
child.
The class ParentSlurp has the following characteristics:
It is derived from Fork. Before ParentSlurp's class interface can be
processed, the compiler must have read both fork.h and pipe.h.
Furthermore, the class only uses one data member: a Pipe object
d_pipe.
Pipe's constructor automatically constructs a pipe, and since d_pipe
is automatically constructed by ParentSlurp's default constructor,
there is no need to define ParentSlurp's constructor explicitly. As no
constructor needs to be implemented, all ParentSlurp's members can be
declared as protected members. Here is the class's
interface:
class ParentSlurp: public Fork
{
Pipe d_pipe;
protected:
virtual void childRedirections();
virtual void parentRedirections();
virtual void childProcess();
virtual void parentProcess();
};
The childRedirections() member configures the writing end of the
pipe: all information written to the child's standard output stream
will end up in the pipe. The big advantage of this approach is that no
streams around file descriptors are needed to write to a file
descriptor:
inline void ParentSlurp::childRedirections()
{
d_pipe.writtenBy(STDOUT_FILENO);
}
The parentRedirections() member configures its end of the pipe as a
reading pipe. It does so by redirecting the reading end of the pipe to
its standard input file descriptor (STDIN_FILENO), thus allowing
extractions from cin instead of using streams built around file
descriptors:
inline void ParentSlurp::parentRedirections()
{
d_pipe.readFrom(STDIN_FILENO);
}
The childProcess() member only has to concentrate on its own
actions. As it only needs to execute a program (writing information to
its standard output), the member consists of but one statement:
inline void ParentSlurp::childProcess()
{
execl("/bin/ls", "/bin/ls", static_cast<char *>(0));
}
The parentProcess() member simply `slurps' the information
appearing at its standard input. Doing so, it actually reads the child's
output. It copies the received lines to its standard output stream after
having prefixed line numbers to them:
void ParentSlurp::parentProcess()
{
std::string line;
size_t nr = 1;
while (getline(std::cin, line))
std::cout << nr++ << ": " << line << std::endl;
waitForChild();
}
The program simply creates a ParentSlurp object and calls its fork()
member. Its output consists of a numbered list of files in the
directory where the program is started. Note that the program also
needs the fork.o, pipe.o and waitforchild.o object files (see
earlier sources):
int main()
{
ParentSlurp ps;
ps.fork();
return 0;
}
/*
Generated Output (example only, actually obtained output may differ):
1: a.out
2: bitand.h
3: bitfunctional
4: bitnot.h
5: daemon.cc
6: fdinseek.cc
7: fdinseek.h
...
*/
The command start will start a new child process. The parent will
return the ID (a number) to the user. The ID may thereupon be used to
send a message to that particular child process:
<nr> text will send ``text'' to the child process having ID <nr>;
stop <nr> will terminate the child process having ID <nr>;
exit will terminate the parent as well as all of its children.
A problem with programs like our monitor is that these programs allow
asynchronous input from multiple sources: input may appear at the
standard input as well as at the input-sides of pipes. Also, multiple output
channels are used. To handle situations like these, the
select() system
call was developed.
The select() system call was developed to handle asynchronous I/O
multiplexing.
This system call can be used to handle, e.g., input appearing
simultaneously at a set of file descriptors.
The select() system function is rather complex, and its full
discussion is beyond the C++ Annotations' scope.
However, its use may be simplified by providing a class Selector,
hiding its details and offering an easy-to-use public interface. Here its
characteristics are discussed:
Most of Selector's members are very small, allowing us to define them
as inline functions. The class requires quite a few data members, most
of them of types that were specifically constructed for use by
select(). Therefore, before the class interface can be handled by the
compiler, various header files must have been read by it:
#include <limits.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
The type fd_set was designed to be used by select(): variables of this
type contain the set of file descriptors on which select() has sensed
some activity. Furthermore, select() allows us to fire an asynchronous
alarm. To specify alarm times, the class defines a timeval data member.
The remaining members are used by the class for internal bookkeeping
purposes, illustrated below. Here is the class's interface:
class Selector
{
fd_set d_read;
fd_set d_write;
fd_set d_except;
fd_set d_ret_read;
fd_set d_ret_write;
fd_set d_ret_except;
timeval d_alarm;
int d_max;
int d_ret;
int d_readidx;
int d_writeidx;
int d_exceptidx;
public:
Selector();
int wait();
int nReady();
int readFd();
int writeFd();
int exceptFd();
void setAlarm(int sec, int usec = 0);
void noAlarm();
void addReadFd(int fd);
void addWriteFd(int fd);
void addExceptFd(int fd);
void rmReadFd(int fd);
void rmWriteFd(int fd);
void rmExceptFd(int fd);
private:
int checkSet(int *index, fd_set &set);
void addFd(fd_set *set, int fd);
};
Selector(): the (default) constructor. It clears the read, write, and
exception fd_set variables, and switches off the alarm. Except for
d_max, the remaining data members require no initialization. Here is
the implementation of Selector's constructor:
Selector::Selector()
{
FD_ZERO(&d_read);
FD_ZERO(&d_write);
FD_ZERO(&d_except);
noAlarm();
d_max = 0;
}
int wait(): this member function blocks until activity is sensed at
any of the file descriptors monitored by the Selector object, or
until the alarm times out. It throws an exception when the select()
system call itself fails. Here is wait()'s implementation:
int Selector::wait()
{
timeval t = d_alarm;
d_ret_read = d_read;
d_ret_write = d_write;
d_ret_except = d_except;
d_readidx = 0;
d_writeidx = 0;
d_exceptidx = 0;
d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except, &t);
if (d_ret < 0)
throw "Selector::wait()/select() failed";
return d_ret;
}
int nReady(): this member function's return value is defined only
after wait() has returned. It returns 0 for an alarm timeout, -1 if
select() failed, and otherwise the number of file descriptors on which
activity was sensed. It can be implemented inline:
inline int Selector::nReady()
{
return d_ret;
}
int readFd(): this member function's return
value also is defined only after wait() has returned. Its return value is
-1 if no (more) input file descriptors are available. Otherwise the next file
descriptor available for reading is returned. Its inline implementation is:
inline int Selector::readFd()
{
return checkSet(&d_readidx, d_ret_read);
}
int writeFd(): operating analogously to readFd(), it returns the next
file descriptor to which output can be written. Using d_writeidx and
d_ret_write, it is implemented analogously to readFd();
int exceptFd(): operating analogously to readFd(), it returns the
next exception file descriptor on which activity was sensed. Using
d_exceptidx and d_ret_except, it is implemented analogously to
readFd();
void setAlarm(int sec, int usec = 0): this member activates
Selector's alarm facility. At least the number of seconds to wait for
the alarm to go off must be specified. It simply assigns values to
d_alarm's fields. Then, at the next Selector::wait() call, the alarm
will fire (i.e., wait() returns with return value 0) once the
configured alarm interval has passed. Here is its (inline)
implementation:
inline void Selector::setAlarm(int sec, int usec)
{
d_alarm.tv_sec = sec;
d_alarm.tv_usec = usec;
}
void noAlarm(): this member switches off the
alarm, by simply setting the alarm interval to a very long period. Implemented
inline as:
inline void Selector::noAlarm()
{
setAlarm(INT_MAX, 0); // tv_usec values must remain below 1000000
}
void addReadFd(int fd): this member adds a
file descriptor to the set of input file descriptors monitored by the
Selector object. The member function wait() will return once input is
available at the indicated file descriptor. Here is its inline implementation:
inline void Selector::addReadFd(int fd)
{
addFd(&d_read, fd);
}
void addWriteFd(int fd): this member adds a file descriptor to the
set of output file descriptors monitored by the Selector object. The
member function wait() will return once the indicated file descriptor
is ready to accept output. Using d_write, it is implemented
analogously to addReadFd();
void addExceptFd(int fd): this member adds a file descriptor to the
set of exception file descriptors to be monitored by the Selector
object. The member function wait() will return once activity is
sensed at the indicated file descriptor. Using d_except, it is
implemented analogously to addReadFd();
void rmReadFd(int fd): this member removes a
file descriptor from the set of input file descriptors monitored by the
Selector object. Here is its inline implementation:
inline void Selector::rmReadFd(int fd)
{
FD_CLR(fd, &d_read);
}
void rmWriteFd(int fd): this member removes a file descriptor from
the set of output file descriptors monitored by the Selector object.
Using d_write, it is implemented analogously to rmReadFd();
void rmExceptFd(int fd): this member removes a file descriptor from
the set of exception file descriptors to be monitored by the Selector
object. Using d_except, it is implemented analogously to rmReadFd();
Two support members are defined in the class's private section:
addFd() adds a file descriptor to a given fd_set. Here is its
implementation:
void Selector::addFd(fd_set *set, int fd)
{
FD_SET(fd, set);
if (fd >= d_max)
d_max = fd + 1;
}
checkSet() scans a given fd_set, starting at file descriptor *index,
for the next descriptor on which activity was sensed. It returns that
descriptor (advancing *index), or -1 when no (more) active descriptors
remain. Here is its implementation:
int Selector::checkSet(int *index, fd_set &set)
{
int &idx = *index;
while (idx < d_max && !FD_ISSET(idx, &set))
++idx;
return idx == d_max ? -1 : idx++;
}
The monitor program uses a Monitor object to do most of the work. The
class has only one public constructor and one public member, run(),
performing its tasks. Therefore, all other member functions described
below should be declared in the class's private section.
Monitor defines the private enum Commands, symbolically listing the
various commands its input language supports, as well as several data
members, among which a Selector object and a map using file
descriptors as its keys and pointers to Child objects (see section
21.4.5.3) as its values. Furthermore, Monitor has a static array
member s_handler[], storing pointers to member functions handling user
commands.
A destructor should have been implemented too, but its implementation is
left as an exercise to the reader. Before the class interface can be processed
by the compiler, it must have seen select.h and child.h. Here is the
class header, including the interface of the nested function object
class Find:
class Monitor
{
enum Commands
{
UNKNOWN,
START,
EXIT,
STOP,
TEXT,
sizeofCommands
};
class Find
{
int d_nr;
public:
Find(int nr);
bool operator()(std::map<int, Child *>::value_type &vt)
const;
};
Selector d_selector;
int d_nr;
std::map<int, Child *> d_child;
static void (Monitor::*s_handler[])(int, std::string const &);
public:
enum Done
{};
Monitor();
void run();
private:
static void killChild(std::map<int, Child *>::value_type it);
static void initialize();
Commands next(int *value, std::string *line);
void processInput();
void processChild(int fd);
void createNewChild(int, std::string const &);
void exiting(int = 0, std::string const &msg = std::string());
void sendChild(int value, std::string const &line);
void stopChild(int value, std::string const &);
void unknown(int, std::string const &);
};
Since there's only one non-class type data member, the class's constructor
remains very short and could be implemented inline. However, the array
s_handler, storing pointers to functions, needs to be initialized as
well. This can be accomplished in several ways:
Since the Commands enumeration only contains a fairly limited set of
commands, compile-time initialization could be considered:
void (Monitor::*Monitor::s_handler[])(int, string const &) =
{
&Monitor::unknown, // order follows enum Commands'
&Monitor::createNewChild, // elements
&Monitor::exiting,
&Monitor::stopChild,
&Monitor::sendChild,
};
The advantage of this is that it's simple and requires no run-time
effort. The disadvantage is, of course, relatively complex
maintenance: if for some reason Commands is modified, s_handler must
be modified as well. In cases like these, compile-time initialization
is a little bit asking for trouble. There is a simple alternative,
though, which admittedly does take some execution time:
An initialization function may be called when the first Monitor
object is constructed, initializing the elements of the array
explicitly. This has the advantage of robustness against reordering of
enumeration values, which is important: enumerations do receive
modifications during the development cycle of a class. Maintenance is
still required if new values are added to the enumeration, but in that
case maintenance is required anyway.
Having to call an initialization function from main() may become a
burden, or may be considered unacceptable, as it puts an additional
responsibility with the software engineer, rather than with the
software. It's a matter of taste whether that's a consideration to take
seriously or not. If the initialization function is not called, the
program will clearly fail, and the error caused by not calling the
initialization function is easily repaired. If that's considered bad
practice, the initialization function may be called from the class
constructors as well. The following initialization function is used in
the current implementation of the class Monitor:
void (Monitor::*Monitor::s_handler[sizeofCommands])(int, string const &);
void Monitor::initialize()
{
if (s_handler[UNKNOWN] != 0) // already initialized
return;
s_handler[UNKNOWN] = &Monitor::unknown;
s_handler[START] = &Monitor::createNewChild;
s_handler[EXIT] = &Monitor::exiting;
s_handler[STOP] = &Monitor::stopChild;
s_handler[TEXT] = &Monitor::sendChild;
}
Since initialize() returns immediately once the array has been set up,
Monitor's constructor may call it and still defensibly be implemented
inline:
inline Monitor::Monitor()
:
d_nr(0)
{
initialize();
}
The core of Monitor's activities is performed by run(). It performs
the following tasks:
Initially, the Monitor object only listens to its standard input: the
set of input file descriptors to which d_selector will listen is
initialized to STDIN_FILENO.
Then d_selector's wait() function is called. If input on cin is
available, it is processed by processInput(). Otherwise, the input has
arrived from a child process. Information sent by children is processed
by processChild().
In an earlier implementation, Monitor itself caught the termination
signals of its child processes. As noted by Ben Simons (ben at mrxfx
dot com) this is inappropriate: the process spawning child processes
has that responsibility (so, the parent process is responsible for its
child processes; a child process is in turn responsible for its own
child processes). Thanks, Ben.
run()'s implementation:
#include "monitor.ih"
void Monitor::run()
{
d_selector.addReadFd(STDIN_FILENO);
while (true)
{
cout << "? " << flush;
try
{
d_selector.wait();
int fd;
while ((fd = d_selector.readFd()) != -1)
{
if (fd == STDIN_FILENO)
processInput();
else
processChild(fd);
}
cout << "NEXT ...\n";
}
catch (char const *msg)
{
exiting(1, msg);
}
}
}
The member function processInput() reads the commands entered by the
user via the program's standard input stream. The member itself is rather
simple: it calls next() to obtain the next command entered by the user,
and then calls the corresponding function using the matching element of the
s_handler[] array. The members processInput() and
next() were defined as follows:
void Monitor::processInput()
{
string line;
int value;
Commands cmd = next(&value, &line);
(this->*s_handler[cmd])(value, line);
}
Monitor::Commands Monitor::next(int *value, string *line)
{
if (!getline(cin, *line))
exiting(1, "Command::next(): reading cin failed");
if (*line == "start")
return START;
if (*line == "exit" || *line == "quit")
{
*value = 0;
return EXIT;
}
if (line->find("stop") == 0)
{
istringstream istr(line->substr(4));
istr >> *value;
return !istr ? UNKNOWN : STOP;
}
istringstream istr(line->c_str());
istr >> *value;
if (istr)
{
getline(istr, *line);
return TEXT;
}
return UNKNOWN;
}
All other input sensed by d_selector has been created by child
processes. Because d_selector's readFd() member returns the
corresponding input file descriptor, this descriptor can be passed to
processChild(). There, using an ifdstreambuf (see section 21.2.2.1),
its information is read from an input stream. The communication
protocol used here is rather basic: to every line of input sent to a
child, the child sends exactly one line of text in return.
Consequently, processChild() just has to read one line of text:
void Monitor::processChild(int fd)
{
ifdstreambuf ifdbuf(fd);
istream istr(&ifdbuf);
string line;
getline(istr, line);
cout << d_child[fd]->pid() << ": " << line << endl;
}
Please note the construction d_child[fd]->pid() used in the above
source. Monitor defines the data member map<int, Child *> d_child.
This map uses the file descriptor for reading from a child as its key,
and a pointer to the corresponding Child object as its value. A
pointer is used here, rather than a Child object, since we do want to
use the facilities offered by the map, but don't want to copy a Child
object.
The implication of using pointers as map-values is of course that the
responsibility to destruct the Child object once it becomes superfluous
now lies with the programmer, and not any more with the run-time support
system.
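The destructor that this responsibility calls for could, under this
design, be sketched as follows. Child is reduced here to a
hypothetical stand-in that merely counts its instances:

```cpp
#include <map>

// Hypothetical sketch: with pointers as map values, the destructor
// must visit every element and delete the pointer itself.
struct Child                        // stand-in for the real Child class
{
    static int s_alive;             // number of live Child objects
    Child()
    {
        ++s_alive;
    }
    ~Child()
    {
        --s_alive;
    }
};
int Child::s_alive = 0;

struct Monitor
{
    std::map<int, Child *> d_child;
    ~Monitor()                      // visit and delete every pointer
    {
        for (std::map<int, Child *>::iterator it = d_child.begin();
                it != d_child.end(); ++it)
            delete it->second;      // the programmer's responsibility
    }
};
```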
Now that run()'s implementation has been covered, we'll concentrate on
the various commands users might enter:
When a start command is issued, a new child process is started.
A new element is added to d_child by the member createNewChild().
Next, the Child object should start its activities, but the Monitor
object cannot wait here for the child process to complete its activities, as
there is no well-defined endpoint in the near future, and the user will
probably want to enter more commands. Therefore, the Child process
will run as a
daemon: its parent process will terminate immediately, and
its own child process will continue in the background. Consequently,
createNewChild() calls the child's fork() member. Although it is
the child's fork() function that is called, it is still the monitor
program wherein fork() is called. So, the monitor program is
duplicated by fork(). Execution then continues:
Child's parentProcess() in its parent process;
Child's childProcess() in its child process
Since Child's parentProcess() is an empty function, returning
immediately, the Child's parent process effectively continues immediately
below createNewChild()'s cp->fork() statement. As the child process
never returns (see section 21.4.5.3), the code below cp->fork() is never
executed by the Child's child process. This is exactly as it should be.
In the parent process, createNewChild()'s remaining code simply
adds the file descriptor that's available for reading information from the
child to the set of input file descriptors monitored by d_selector, and
uses d_child to establish the association between that
file descriptor and the Child object's address:
void Monitor::createNewChild(int, string const &)
{
Child *cp = new Child(++d_nr);
cp->fork();
int fd = cp->readFd();
d_selector.addReadFd(fd);
d_child[fd] = cp;
cerr << "Child " << d_nr << " started\n";
}
Next, consider the stop <nr> and <nr> text commands. The former command
terminates child process <nr> by calling stopChild(). This function
locates the child process having the given order number using an
anonymous object of the class Find,
nested inside Monitor. The class Find simply compares the
provided nr with the children's order number returned by their nr()
members:
inline Monitor::Find::Find(int nr)
:
d_nr(nr)
{}
inline bool Monitor::Find::operator()(
std::map<int, Child *>::value_type &vt) const
{
return d_nr == vt.second->nr();
}
If the child process having order number nr was found, its file
descriptor is removed from d_selector's set of input file
descriptors. Then the child process itself is terminated by the static member
killChild(). The member killChild() is declared as a static member
function, as it is passed as the function argument to the for_each()
generic algorithm by exiting() (see below). Here is killChild()'s
implementation:
void Monitor::killChild(map<int, Child *>::value_type it)
{
if (kill(it.second->pid(), SIGTERM))
cerr << "Couldn't kill process " << it.second->pid() << endl;
// reap the defunct child process
int status = 0;
while (waitpid(it.second->pid(), &status, WNOHANG) > -1)
{}
}
Having terminated the specified child process, the corresponding Child
object is destroyed and its pointer is removed from d_child:
void Monitor::stopChild(int nr, string const &)
{
map<int, Child *>::iterator it =
find_if(d_child.begin(), d_child.end(), Find(nr));
if (it == d_child.end())
cerr << "No child number " << nr << endl;
else
{
d_selector.rmReadFd(it->second->readFd());
delete it->second;
d_child.erase(it);
}
}
<nr> text> will send text to child process
nr using the member function sendChild(). This function too, will
use a Find object to locate the process having order number nr, and
will then simply insert the text into the writing end of a pipe connected to
the indicated child process:
void Monitor::sendChild(int nr, string const &line)
{
map<int, Child *>::iterator it =
find_if(d_child.begin(), d_child.end(), Find(nr));
if (it == d_child.end())
cerr << "No child number " << nr << endl;
else
{
ofdnstreambuf ofdn(it->second->writeFd());
ostream out(&ofdn);
out << line << endl;
}
}
When the user enters exit (or quit), the member exiting() is called.
It terminates all child processes, by visiting
all elements of d_child using the
for_each() generic
algorithm (see section 17.4.17). The program is subsequently terminated:
void Monitor::exiting(int value, string const &msg)
{
for_each(d_child.begin(), d_child.end(), killChild);
if (msg.length())
cerr << msg << endl;
throw value;
}
The program's main() function is simply:
#include "monitor.h"
int main()
try
{
Monitor monitor;
monitor.run();
}
catch (int exitValue)
{
return exitValue;
}
/*
Example of a session:
# a.out
? start
Child 1 started
? 1 hello world
? 3394: Child 1:1: hello world
? 1 hi there!
? 3394: Child 1:2: hi there!
? start
Child 2 started
? 3394: Child 1: standing by
? 3395: Child 2: standing by
? 3394: Child 1: standing by
? 3395: Child 2: standing by
? stop 1
? 3395: Child 2: standing by
? 2 hello world
? 3395: Child 2:1: hello world
? 1 hello world
No child number 1
? exit3395: Child 2: standing by
?
#
*/
When the Monitor object starts a child process, it has to create an
object of the class Child. The Child class is derived from the class
Fork, allowing its construction as a
daemon, as discussed in the
previous section. Since a Child object is a daemon, its parentProcess()
member can be defined as an empty function; its childProcess() member
must of course still be defined. Here are the characteristics of the class
Child:
The Child class defines two Pipe data members, to allow
communications between its own child- and parent processes. As these pipes are
used by the Child's child process, their names are aimed at the child
process: the child process reads from d_in, and writes to d_out. Here
is the interface of the class Child:
class Child: public Fork
{
Pipe d_in;
Pipe d_out;
int d_parentReadFd;
int d_parentWriteFd;
int d_nr;
public:
Child(int nr);
virtual ~Child();
int readFd() const;
int writeFd() const;
int pid() const;
int nr() const;
virtual void childRedirections();
virtual void parentRedirections();
virtual void childProcess();
virtual void parentProcess();
};
The Child's constructor simply stores its argument, a
child-process order number, in its own d_nr data member:
inline Child::Child(int nr)
:
d_nr(nr)
{}
The Child's child process simply obtains its information from
its standard input stream, and it will write its information to its standard
output stream. Since the communication channels are pipes, redirections must
be configured. The childRedirections()
member is implemented as follows:
void Child::childRedirections()
{
d_in.readFrom(STDIN_FILENO);
d_out.writtenBy(STDOUT_FILENO);
}
At the parent's side, d_in is used for writing, and d_out is used for
reading. Here is the implementation of
parentRedirections():
void Child::parentRedirections()
{
d_parentReadFd = d_out.readOnly();
d_parentWriteFd = d_in.writeOnly();
}
A Child object will exist until it is destroyed by the
Monitor's stopChild() member. By allowing its creator, the Monitor
object, to access the parent-side ends of the pipes, the Monitor object
can communicate with the Child's child process via those pipe-ends. The
members readFd() and writeFd() allow the Monitor object to access
these pipe-ends:
inline int Child::readFd() const
{
return d_parentReadFd;
}
inline int Child::writeFd() const
{
return d_parentWriteFd;
}
The Child object's child process basically has two tasks to
perform:
To perform these tasks, childProcess() defines a local
Selector object, adding STDIN_FILENO to its set of monitored input
file descriptors.
Then, in an endless loop, childProcess() waits for selector.wait()
to return. When the alarm goes off it sends a message to its standard output.
(Hence, into the writing pipe). Otherwise, it will echo the messages appearing
at its standard input to its standard output. Here is the implementation of
the childProcess() member:
void Child::childProcess()
{
Selector selector;
size_t message = 0;
selector.addReadFd(STDIN_FILENO);
selector.setAlarm(5);
while (true)
{
try
{
if (!selector.wait()) // timeout
cout << "Child " << d_nr << ": standing by\n";
else
{
string line;
getline(cin, line);
cout << "Child " << d_nr << ":" << ++message << ": " <<
line << endl;
}
}
catch (...)
{
cout << "Child " << d_nr << ":" << ++message << ": " <<
"select() failed" << endl;
}
}
exit(0);
}
The members pid() and nr(), finally, allow the Monitor object to obtain
the Child's process ID and order number, respectively:
inline int Child::pid() const
{
return Fork::pid();
}
inline int Child::nr() const
{
return d_nr;
}
A child process terminates when the user enters a stop
command. When an existing child process number was entered, the corresponding
Child object is removed from Monitor's d_child map. As a result,
its destructor is called. In its turn, Child's destructor will call
kill to terminate its child, and then waits for the child to
terminate. Once the child has terminated, the destructor has completed its
work as well and returns, completing the erasure from d_child. The
implementation offered here will fail if the child process doesn't react to
the SIGTERM signal. In this demonstration program this does not happen. In
`real life' implementations more elaborate killing-procedures may be required
(e.g., using SIGKILL in addition to SIGTERM). As discussed in section
8.8 it is important to ensure that the destruction
succeeds. Here is the implementation of the Child's destructor:
Child::~Child()
{
if (pid())
{
cout << "Killing process " << pid() << "\n";
kill(pid(), SIGTERM);
int status;
wait(&status);
}
}
Some operators appear to be missing: there are no predefined
function objects corresponding to the bitwise operations. However, their
construction is, given the available predefined function objects, not
difficult. The following examples show a class template implementing a
function object calling the bitwise and (operator&()), and a class
template implementing a function object calling the bitwise not
(operator~()). It is left to the reader to construct similar function
objects for other operators.
Here is the implementation of a function object calling the
bitwise
operator&():
#include <functional>
template <typename _Tp>
struct bit_and: public std::binary_function<_Tp, _Tp, _Tp>
{
_Tp operator()(_Tp const &__x, _Tp const &__y) const
{
return __x & __y;
}
};
Here is the implementation of a function object calling operator~():
#include <functional>
template <typename _Tp>
struct bit_not: public std::unary_function<_Tp, _Tp>
{
_Tp operator()(_Tp const &__x) const
{
return ~__x;
}
};
These and other missing predefined function objects are also implemented
in the file bitfunctional, which is found in the cplusplus.yo.zip
archive. It should be noted that these classes are derived from existing
class templates (e.g., std::binary_function and std::unary_function).
These base classes offer several typedefs which are expected (used) by
various generic algorithms as defined in the STL (cf. chapter 17), thus
following the advice offered in, e.g., the C++ header file
bits/stl_function.h:
* The standard functors are derived from structs named unary_function
* and binary_function. These two classes contain nothing but typedefs,
* to aid in generic (template) programming. If you write your own
* functors, you might consider doing the same.
Here is an example using bit_and() removing all odd numbers from a
vector of int values:
#include <iostream>
#include <algorithm>
#include <vector>
#include "bitand.h"
using namespace std;
int main()
{
vector<int> vi;
for (int idx = 0; idx < 10; ++idx)
vi.push_back(idx);
copy
(
vi.begin(),
remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
ostream_iterator<int>(cout, " ")
);
cout << endl;
}
/*
Generated output:
0 2 4 6 8
*/
An object of this nested iterator class handled the dereferencing of the pointers stored in the vector. This allowed us to sort the strings pointed to by the vector's elements rather than the pointers.
A drawback of the approach taken in section 19.11.1 is that the class implementing the iterator is closely tied to the derived class as the iterator class was implemented as a nested class. What if we would like to provide any class derived from a container class storing pointers with an iterator handling the pointer-dereferencing?
In this section a variant to the earlier (nested class) approach is discussed. The iterator class will be defined as a class template, parameterizing the data type to which the container's elements point as well as the iterator type of the container itself. Once again, we will implement a RandomIterator as it is the most complex iterator type.
Our class is named RandomPtrIterator, indicating that it is a random
iterator operating on pointer values. The class template defines three
template type parameters:
The first parameter is the client class itself (Class). Like the earlier
nested class, RandomPtrIterator's constructor will be private. Therefore
we need friend declarations to allow client classes to construct
RandomPtrIterators. However, a friend class Class cannot be defined:
template parameter types cannot be used in friend class ...
declarations. But this is no big problem: not every member of the client
class needs to construct iterators. In fact, only Class's begin() and
end() members must be able to construct iterators. Using the template's
first parameter, friend declarations can be specified for the client's
begin() and end() members.
The second parameter is the iterator type of the container itself
(BaseIterator).
The third parameter is the data type to which the container's elements
point (Type).
RandomPtrIterator uses one private data
element, a BaseIterator. Here is the class interface, including the
constructor's implementation:
#include <iterator>
template <typename Class, typename BaseIterator, typename Type>
class RandomPtrIterator:
public std::iterator<std::random_access_iterator_tag, Type>
{
friend RandomPtrIterator<Class, BaseIterator, Type> Class::begin();
friend RandomPtrIterator<Class, BaseIterator, Type> Class::end();
BaseIterator d_current;
RandomPtrIterator(BaseIterator const &current);
public:
bool operator!=(RandomPtrIterator const &other) const;
int operator-(RandomPtrIterator const &rhs) const;
RandomPtrIterator const operator+(int step) const;
Type &operator*() const;
bool operator<(RandomPtrIterator const &other) const;
RandomPtrIterator &operator--();
RandomPtrIterator const operator--(int);
RandomPtrIterator &operator++();
RandomPtrIterator const operator++(int);
bool operator==(RandomPtrIterator const &other) const;
RandomPtrIterator const operator-(int step) const;
RandomPtrIterator &operator-=(int step);
RandomPtrIterator &operator+=(int step);
Type *operator->() const;
};
template <typename Class, typename BaseIterator, typename Type>
RandomPtrIterator<Class, BaseIterator, Type>::RandomPtrIterator(
BaseIterator const &current)
:
d_current(current)
{}
Dissecting its friend declarations, we see that the members
begin() and end() of a class Class, returning a
RandomPtrIterator object for the types Class, BaseIterator and
Type are granted access to RandomPtrIterator's private constructor.
That is exactly what we want. Note that begin() and end() are declared
as
bound friends.
All RandomPtrIterator's remaining members are public. Since
RandomPtrIterator is just a generalization of the nested class
iterator developed in section 19.11.1, re-implementing the required
member functions is easy, and only requires us to change iterator into
RandomPtrIterator and to change std::string into Type. For
example, operator<(), defined in the class iterator as
inline bool StringPtr::iterator::operator<(iterator const &other) const
{
return **d_current < **other.d_current;
}
is re-implemented as:
template <typename Class, typename BaseIterator, typename Type>
bool RandomPtrIterator<Class, BaseIterator, Type>::operator<(
RandomPtrIterator const &other) const
{
return **d_current < **other.d_current;
}
As a second example: operator*(), defined in the class
iterator as
inline std::string &StringPtr::iterator::operator*() const
{
return **d_current;
}
is re-implemented as:
template <typename Class, typename BaseIterator, typename Type>
Type &RandomPtrIterator<Class, BaseIterator, Type>::operator*() const
{
return **d_current;
}
The pre- and postfix increment operators are re-implemented as:
template <typename Class, typename BaseIterator, typename Type>
RandomPtrIterator<Class, BaseIterator, Type>
&RandomPtrIterator<Class, BaseIterator, Type>::operator++()
{
++d_current;
return *this;
}
template <typename Class, typename BaseIterator, typename Type>
RandomPtrIterator<Class, BaseIterator, Type> const
RandomPtrIterator<Class, BaseIterator, Type>::operator++(int)
{
return RandomPtrIterator(d_current++);
}
Remaining members can be implemented accordingly, their actual
implementations are left as an exercise to the reader (or can be obtained from
the cplusplus.yo.zip archive, of course).
Reimplementing the class StringPtr developed in section 19.11.1
is not difficult either. Apart from including the header file defining the
class template RandomPtrIterator, it requires only a single modification
as its iterator typedef must now be associated with a
RandomPtrIterator. Here are the full class interface and inline member
definitions:
#ifndef INCLUDED_STRINGPTR_H_
#define INCLUDED_STRINGPTR_H_
#include <vector>
#include <string>
#include "iterator.h"
class StringPtr: public std::vector<std::string *>
{
public:
typedef RandomPtrIterator
<
StringPtr,
std::vector<std::string *>::iterator,
std::string
>
iterator;
typedef std::reverse_iterator<iterator> reverse_iterator;
iterator begin();
iterator end();
reverse_iterator rbegin();
reverse_iterator rend();
};
inline StringPtr::iterator StringPtr::begin()
{
return iterator(this->std::vector<std::string *>::begin() );
}
inline StringPtr::iterator StringPtr::end()
{
return iterator(this->std::vector<std::string *>::end());
}
inline StringPtr::reverse_iterator StringPtr::rbegin()
{
return reverse_iterator(end());
}
inline StringPtr::reverse_iterator StringPtr::rend()
{
return reverse_iterator(begin());
}
#endif
Including StringPtr's modified header file into the program given in
section 19.11.2 will result in a program behaving identically to its
earlier version, albeit that StringPtr::begin() and StringPtr::end()
now return iterator objects constructed from a template definition.
The standard C library offers atoi(), atol(), and comparable
functions, which can be used to convert ASCII-Z strings to numeric
values. In C++, these functions are still available, but a more
type-safe way to convert text to other types is to use objects of the
class std::istringstream.
Using the std::istringstream class instead of the C standard
conversion functions may have the advantage of type-safety, but it also
appears to be a rather cumbersome alternative. After all, we will have to
construct and initialize a std::istringstream object first, before we're
actually able to extract a value of some type from it. This requires us to use
a variable. Then, if the extracted value is actually only needed to
initialize some function-parameter, one might wonder whether the additional
variable and the istringstream construction can somehow be avoided.
In this section we'll develop a class (
A2x) preventing all the
disadvantages of the standard C library functions, without requiring the
cumbersome definitions of std::istringstream objects over and over
again. The class is called A2x for
`
ascii to anything'.
A2x objects can be used to obtain a value for any type extractable
from std::istream objects given its textual representation. Since A2x
represents the object-variant of the C functions, it is not only type-safe
but also extensible. Consequently, its use is greatly preferred over the
standard C functions. Here are its characteristics:
A2x is derived from std::istringstream, so all members
of the class std::istringstream are available. Thus, extractions of values
of variables can always be performed effortlessly. Here's the class's
interface:
class A2x: public std::istringstream
{
public:
A2x();
A2x(char const *txt);
A2x(std::string const &str);
template <typename Type>
operator Type();
template <typename Type>
Type to();
A2x &operator=(char const *txt);
A2x &operator=(std::string const &str);
A2x &operator=(A2x const &other);
};
A2x has a default constructor and constructors expecting, respectively,
a char const * and a std::string argument. The latter constructors may
be used to initialize A2x objects with text to be converted (e.g., a
line of text obtained from reading a configuration file):
inline A2x::A2x()
{}
inline A2x::A2x(char const *txt) // initialize from text
:
std::istringstream(txt)
{}
inline A2x::A2x(std::string const &str)
:
std::istringstream(str.c_str())
{}
A2x's real strength comes from its operator Type() conversion
member template. As it is a member template, it will automatically adapt
itself to the type of the variable that should be given a value, obtained by
converting the text stored inside the A2x object to the variable's
type. When the extraction fails, A2x's inherited good() member will
return false.
To call the conversion operator explicitly an awkward syntax is required
(a2x being an A2x object):
a2x.operator int();
As this syntax isn't attractive, the member template
to() was provided as well, allowing constructions like:
a2x.to<int>();
Here is its implementation:
template <typename Type>
inline Type A2x::to()
{
Type t;
return (*this >> t) ? t : Type();
}
allowing for a trivial implementation of operator Type():
template <typename Type>
inline A2x::operator Type()
{
return to<Type>();
}
Once an A2x object is available, it may be reinitialized using
its operator=() member:
#include "a2x.h"
A2x &A2x::operator=(char const *txt)
{
clear(); // very important!!! If a conversion failed, the object
// remains useless until executing this statement
str(txt);
return *this;
}
Here are some examples showing its use:
int x = A2x("12"); // initialize int x from a string "12"
A2x a2x("12.50"); // explicitly create an A2x object
double d;
d = a2x; // assign a variable using an A2x object
cout << d << endl;
a2x = "err";
d = a2x; // d is 0: the conversion failed,
cout << d << endl; // and a2x.good() == false
a2x = " a"; // reassign a2x to new text
char c = a2x; // c now 'a': internally operator>>() is used
cout << c << endl; // so initial blanks are skipped.
int expectsInt(int x); // initialize a parameter using an
expectsInt(A2x("1200")); // anonymous A2x object
d = A2x("12.45").to<int>(); // d is 12, not 12.45
cout << d << endl;
A complementary class (X2a), converting values to text, can easily be
constructed as well. The construction of X2a is left as an exercise to
the reader.
Generic algorithms frequently call the function operator
(operator()()) of function objects that are passed as arguments to
the generic algorithms.
Usually this approach requires the construction of a dedicated class
implementing the required function object. However, in many cases the class
context in which the iterators exist already offers the required
functionality. Alternatively, the functionality might exist as member function
of the objects to which the iterators refer. For example, finding the first
empty string object in a vector of string objects could profitably use
the string::empty() member.
Another frequently encountered situation is related to a
local context. Once again, consider the situation where the elements
of a string vector are all visited: each object must be inserted in a
stream whose reference is only known to the function in which the string
elements are visited, but some additional information must be passed to the
insertion function as well, making the use of the ostream_inserter less
appropriate.
The frustrating part of using generic algorithms is that the required
dedicated function objects often look very much like each other, while
the standard solution (predefined function objects combined with
specialized iterators) seldom does the required job: their fixed
function interfaces (e.g., equal_to calling the object's operator==())
are often too rigid to be useful and, furthermore, they are unable to
use any additional local context that is active when they are used.
One may wonder whether class templates might be constructed which can be used again and again to create dedicated function objects. Such class template instantiations should offer facilities to call configurable (member) functions using a configurable local context.
In the upcoming sections, several wrapper templates supporting these requirements are developed. To support a local context, a dedicated local context struct is introduced. Furthermore, the wrapper templates will allow us to specify at construction time the member function that should be called. Thus the rigidness of the fixed member function as used in the predefined function objects is avoided.
As an example of a generic algorithm usually requiring a simple function
object, consider for_each(). The operator()() of the function object
passed to this algorithm receives as its argument a reference to the object to
which the iterators refer. Generally, operator()() will do one of two
things:
call a member function of the object it receives (e.g.,
operator()(string &str) may call str.length());
call another function, passing it the object it receives (e.g.,
somefunction(str)).
In the latter case somefunction()'s address could actually have been
passed directly to the generic algorithm, so why use this complex
procedure? The answer is
context: if somefunction() would actually require other arguments,
representing the local context in which somefunction() was called, then
the function object's constructor could have received the local context as its
arguments, passing that local context on to somefunction(), together with
the object received by the function object's operator()() function. There
is no way to pass any local context to the generic algorithm's simple variant,
in which a function's address is passed to the generic function.
At first sight, however, the fact that a local context differs from one
situation to another makes it hard to standardize the local context: a local
context might consist of values, pointers, references, which differ in number
and types from one situation to another. Defining templates for all possible
situations is clearly impractical, and using C-style
variadic functions
is also not very attractive, since the arguments passed to a variadic function
object constructor cannot simply be passed on to the function object's
operator()().
The concept of a local context struct is introduced to standardize the local context. It is based on the following considerations:
A local context may have to be modifiable, or it may be a constant
(const) local context.
Functions differ in the parameters they expect. A function like show()
may expect two arguments: an ostream into which the information is
inserted and an object which will be inserted into the stream. For
example:
void State::show(ostream &out, Item const &item)
{
out << "Here is item " << item.nr() << ":\n" <<
item << endl;
}
Of course, functions differ greatly in their parameter lists: both the
numbers and types of their parameters vary.
A
local context struct is used to standardize the parameter lists of
functions, for the benefit of template construction. In the above example, the
function State::show() uses a local context consisting of an ostream &
and an Item const &. This context never changes, and may be
offered through a struct defined as follows:
struct ShowContext
{
ostream &out;
Item const &item;
};
Note how this struct mimics State::show()'s parameter list. Since
it is directly connected to the function State::show() it is best defined
in the class State. Once we have defined this struct,
State::show()'s implementation is modified. It now expects a
ShowContext &:
void State::show(ShowContext &context)
{
context.out << "Here is item " << context.item.nr() << ":\n" <<
context.item << endl;
}
Using a local context struct any parameter list (except those of
variadic functions) can be standardized to a parameter list
consisting of a single element. Now that we have a single parameter to specify
any local context we're ready for the `templatization' of
function object wrapper classes.
The member operator()() is characteristic of
function objects. It may be defined as a function having various
parameters. In the context of generic algorithms, they usually have one or
two parameters referring to the data to be processed by the algorithm.
The class template constructor should be aware of the fact that
it is not known beforehand whether these parameters are objects,
primitive types, pointers or references. Knowing this, however, is
important when the function object instantiated from the template class is to
be used by generic algorithms, since many generic algorithms require that
various types are defined by the function object. For example, generic
algorithms
like for_each, calling unary
argument function objects, expect that these function objects define the types
argument_type and
result_type, referring to the
plain
types of, respectively, the function operator's argument and return
value. Analogously, generic algorithms like includes (cf. section
17.4.1) expect that function objects define the types
first_argument_type,
second_argument_type and result_type.
To determine the plain type of a template type parameter a
trait class
can be used. In section 20.4 the trait class
TypeTrait was
introduced, allowing templates to determine various characteristics of
template type parameters. This TypeTrait class can profitably be used
here to determine the proper definitions of types required by generic
algorithms.
Let's assume that we would like to create a function object changing all
letters in string objects into capital letters. Clearly we will have to
access the string's individual characters. However, the strings themselves may
be made available through references (e.g., when iterating over the elements
of a vector<string>), but also through pointers (e.g., when iterating over
the elements of a vector<string *>).
The template class providing the function object should not be responsible for the actions to be performed. Rather, executing the required actions should be deferred to a function, which can be specified when the template is instantiated. If that function is a member function it should, for various reasons, be a static member function:
Calling a non-static member function requires an object as well as a
pointer to member (using the operators .* and ->*, respectively), which
would add to the complexity of the final template class.
Generic algorithms also differ in the way they use the function object's return value. This turns out to be no problem: templates allow us to parameterize the return types of functions.
Strings holding a
vector<string> d_vs data member. We would like to change all occurrences
of a specified set of letters found in the strings stored in d_vs into
uppercase characters, and we would like to insert the original and modified
strings into a configurable ostream object. To accomplish this, our class
offers a member uppercase(ostream &out, char const *letters).
We would like to use the for_each() generic algorithm. This algorithm may
be given a function object. The function object will be initialized with a
local context consisting of the ostream object and the set of letters to
be used. The following support class is constructed:
class Support
{
std::ostream &d_out;
std::string d_letters;
public:
Support(std::ostream &out, char const *letters);
void operator()(std::string &str) const;
};
inline Support::Support(std::ostream &out, char const *letters)
:
d_out(out),
d_letters(letters)
{}
inline void Support::operator()(std::string &str) const
{
d_out << str << " ";
for
(
std::string::iterator strIter = str.begin();
strIter != str.end();
++strIter
)
{
if (d_letters.find(*strIter) != std::string::npos)
*strIter = toupper(*strIter);
}
d_out << str << std::endl;
}
Note that the implementation of operator()() contains another for
statement, which should be replaced by another for_each call. This
suggests that either another function object must be constructed or that an
overloaded version of operator()() must be defined. Neither alternative is
very attractive: constructing large numbers of small function object
classes soon becomes a nuisance and there's a limit imposed by the parameter
types to the number of overloaded operator()() members that can be
defined, let alone that the self-documenting value of the purpose of
`operator()()' is very limited.
For now an anonymous Support class object is used in the
implementation of the class Strings. Here is an example of its definition
and use:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include "support.h"

class Strings
{
    std::vector<std::string> d_vs;

    public:
        void uppercase(std::ostream &out, char const *letters);
};

void Strings::uppercase(std::ostream &out, char const *letters)
{
    for_each(d_vs.begin(), d_vs.end(), Support(out, letters));
}

using namespace std;

int main(int argc, char **argv)
{
    Strings s;
    s.uppercase(cout, argv[1]);
}
To `templatize' the Support class using the considerations discussed
previously, we perform the following steps:
The local context is made available through a struct, which is then passed
to the template's constructor, so one of the template
type parameters should be defined as a reference (or pointer, if that's
preferred) to a context struct.
The implementation of operator()() is
standardized: it calls a function, receiving operator()()'s
argument (which also becomes a template parameter) and a reference to the
context as its arguments. The address of the function to call is stored in
a data member of the function object. In the Support class,
operator()() uses a void return type. This type is usually
appropriate, but when defining predicates a bool return type may be
required. Therefore, the return type of the template's operator()() (and
thus the return type of the called function) is made configurable as well,
offering the default type void for convenience. Thus, we get the following
definition of the data member holding the address of the function to call:
ReturnType (*d_fun)(Type, Context);
and the template's operator()(), coining the classname
FnWrap1c for `function object wrapper of a unary (1) function object,
accepting context information', becomes:
template <typename Type, typename Context, typename ReturnType>
inline ReturnType FnWrap1c<Type, Context, ReturnType>::operator()(
                                                        Type param) const
{
    return (*d_fun)(param, d_context);
}

Its constructor stores the function's address and the context:

template <typename Type, typename Context, typename ReturnType>
FnWrap1c<Type, Context, ReturnType>::FnWrap1c(
        ReturnType fun(Type, Context), Context context)
:
    d_fun(fun),
    d_context(context)
{}
Here is the complete definition of FnWrap1c, using
the TypeTrait class to determine the argument_type and
result_type:
#include "typetrait.h"

template <typename Type, typename Context, typename ReturnType = void>
class FnWrap1c
{
    ReturnType (*d_fun)(Type, Context);
    Context d_context;

    public:
        typedef typename TypeTrait<Type>::Plain argument_type;
        typedef typename TypeTrait<ReturnType>::Plain result_type;

        FnWrap1c(ReturnType fun(Type, Context), Context context);
        ReturnType operator()(Type param) const;
};

template <typename Type, typename Context, typename ReturnType>
FnWrap1c<Type, Context, ReturnType>::FnWrap1c(
        ReturnType fun(Type, Context), Context context)
:
    d_fun(fun),
    d_context(context)
{}

template <typename Type, typename Context, typename ReturnType>
inline ReturnType FnWrap1c<Type, Context, ReturnType>::operator()(Type param)
                                                                        const
{
    return (*d_fun)(param, d_context);
}
Now the template can be used. The class Support is abandoned. Instead
of using a separate class, all members required to transform the strings of
Strings objects will be defined as static members inside Strings
itself. This neatly localizes the actions where they belong: inside the class
that wants to transform its own data. So, the original dedicated
implementation of Support::operator()() is now defined as a static
member xform inside the class Strings. The class also defines a
context struct Context, containing a reference to the ostream to use
and a set of characters to capitalize. The public function uppercase now
simply initializes the context struct, and creates an FnWrap1c object,
calling xform, which is passed to the for_each call. Note that
FnWrap1c's template type arguments exactly mirror xform's parameter
types. This is a general characteristic of the function object wrappers that
are introduced in this and the coming section.
It is the purpose of xform to transform the characters of an
individual string. What better procedure than to call for_each again, this
time using the string's begin() and end() members to define another
iterator range. In this case the letter set must be known, and so it is passed
as the local context to FnWrap1c defined inside xform. This latter
FnWrap1c object calls the static function foundToUpper to capitalize
individual characters of the strings if necessary. Here is the
new implementation of the class Strings, now using FnWrap1c:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include "fnwrap1.h"

class Strings
{
    std::vector<std::string> d_vs;

    struct Context
    {
        std::ostream &out;
        std::string letters;
    };

    public:
        void uppercase(std::ostream &out, char const *letters);

    private:
        static void xform(std::string &str, Context &context);
        static void foundToUpper(char &ch, std::string const &letters);
};

void Strings::uppercase(std::ostream &out, char const *letters)
{
    Context context = {out, letters};
    for_each(d_vs.begin(), d_vs.end(),
            FnWrap1c<std::string &, Context &>(xform, context));
}

void Strings::xform(std::string &str, Context &context)
{
    context.out << str << " ";
    for_each(str.begin(), str.end(),
            FnWrap1c<char &, std::string const &>(
                    foundToUpper, context.letters));
    context.out << str << std::endl;
}

void Strings::foundToUpper(char &ch, std::string const &letters)
{
    if (letters.find(ch) != std::string::npos)
        ch = toupper(ch);
}

using namespace std;

int main(int argc, char **argv)
{
    Strings s;
    s.uppercase(cout, argv[1]);
}
To illustrate the use of the ReturnType template parameter, let's
assume that the transformations are only required up to the first empty
string. In this case, the find_if generic algorithm comes in handy, since
it stops once a predicate returns true. In that case the xform()
function should return a bool value, and the uppercase()
implementation specifies an explicit type (bool) for the ReturnType
template parameter. The next implementation of the class Strings merely
shows the required modifications, the remainder is not altered:
class Strings
{
    private:
        static bool xform(std::string &str, Context &context);
};

void Strings::uppercase(std::ostream &out, char const *letters)
{
    Context context = {out, letters};
    find_if(d_vs.begin(), d_vs.end(),
            FnWrap1c<std::string &, Context &, bool>(xform, context));
}

bool Strings::xform(std::string &str, Context &context)
{
    if (str.empty())
        return true;

    // previous implementation (not modified)

    return false;
}
Sometimes a binary function object is needed: operator()() is then called
with two, rather than one argument. Coining the classname FnWrap2c, its
implementation is almost identical to FnWrap1c's implementation, and it
can be found in the Bobcat library.
An example showing its use is:
// accumulating strings from a vector to one big string, using
// `accumulate'.
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
#include <bobcat/fnwrap2c>

using namespace std;
using namespace FBB;

class Strings
{
    vector<string> d_vs;

    public:
        Strings()
        {
            d_vs.push_back("one");
            d_vs.push_back("two");
            d_vs.push_back("three");
        }
        void display(ostream &out) const
        {
            SContext c = {1, out};
            cout << "On Exit: " <<
                accumulate(
                    d_vs.begin(), d_vs.end(),
                    string("HI"),
                    FnWrap2c<string const &, string const &,
                             SContext &, string>(&show, c)
                ) <<
                endl;
        }

    private:
        struct SContext
        {
            size_t nr;
            ostream &out;
        };
        static string show(string const &str1,
                           string const &str2,
                           SContext &c)
        {
            c.out << c.nr++ << " " << str1 << " " << str2 << endl;
            return str1 + " " + str2;
        }
};

int main()
{
    Strings s;
    s.display(cout);
}
As with the unary template wrapper (see section 21.8.3), an
additional class may be defined that does not require a local context.
After compilation and linking, just run the program without any arguments. It
requires Bobcat's fnwrap2c and typetrait header files.
The current example assumes that the reader knows how to use the
scanner generator
flex and the
parser generator
bison. Both
bison and flex are well documented elsewhere. The original
predecessors of bison and flex, called
yacc and
lex are
described in several books, e.g. in
O'Reilly's book `lex & yacc'.
However, scanner- and parser generators are also (and maybe even more
commonly, nowadays) available as free software. Both bison and flex
are usually part of software distributions or they can be obtained from
ftp://prep.ai.mit.edu/pub/non-gnu. Flex creates a C++ class
when
%option c++ is specified.
For parser generators the program
bison is available. Back in the early
90's Alain Coetmeur (coetmeur@icdc.fr)
created a C++ variant (bison++) creating a parser class. Although the
bison++ program produces code that can be used in C++ programs, it also
shows many characteristics that are more appropriate in a C context than
in a C++ context. In January 2005 I rewrote parts of Alain's bison++
program, resulting in the original version of the program
bisonc++. Then,
in May 2005 a complete rewrite of the bisonc++ parser generator was
completed, which is available on the Internet having version numbers 0.98 and
beyond. Bisonc++ can be downloaded from
http://bisoncpp.sourceforge.net/, where it is available as a source
archive and as an (i386) Debian binary package
(including bisonc++'s documentation). Bisonc++ creates a cleaner
parser class setup than bison++. In particular, it derives the parser
class from a base-class, containing the parser's token- and type-definitions
as well as all member functions which should not be (re)defined by the
programmer. Most of these members might also be defined directly in the parser
class. Because of this approach, the resulting parser class is very small,
declaring only members that are actually defined by the programmer (as well as
some other members, generated by bisonc++ itself, implementing the
parser's
parse() member). Actually, parse() is initially the only
public member of bisonc++'s generated parser class. Remaining members are
private. The only member which is not implemented by default is lex(),
producing the next lexical token. When the directive %scanner (see section
21.9.2.1) is used, bisonc++ will generate a standard implementation
for this member; otherwise it must be implemented by the programmer.
In this section of the Annotations we will focus on bisonc++ as our
parser generator.
Using flex and bisonc++, class-based scanners and parsers can be
generated. The advantage of this approach is that the interface to the scanner
and the parser tends to become cleaner than without using the class
interface. Furthermore, classes allow us to get rid of most if not all global
variables, making it easy to use multiple parsers in one program.
Below two examples are elaborated. The first example only uses flex. The
scanner it generates monitors the production of a file from several
parts. This example focuses on the lexical scanner, and on switching files
while churning through the information. The second example uses both flex
and bisonc++ to generate a scanner and a parser transforming standard
arithmetic expressions to their postfix notations, commonly used in
code generated by compilers and in HP-calculators. In the second example
the emphasis is mainly on bisonc++ and on composing a scanner object
inside a generated parser.
The input may contain an #include directive, followed by a text
string specifying the file (path) which should be included at the location of
the #include.
In order to avoid complexities irrelevant to the current example, the format
of the #include statement is restricted to the form
#include <filepath>. The file specified between the pointed
brackets should be available at the location indicated by filepath. If the
file is not available, the program terminates after issuing an error message.
The program is started with one or two filename arguments. If the program is
started with just one filename argument, the output is written to the
standard output stream cout. Otherwise, the output is written to
the stream whose name is given as the program's second argument.
The program defines a maximum nesting depth. Once this maximum is exceeded, the program terminates after issuing an error message. In that case, the filename stack indicating where which file was included is printed.
One additional feature is that (standard C++)
comment-lines are
ignored. So, include directives in comment-lines are ignored too.
The program is created along the following steps:
First, a lexical scanner specification file lexer is constructed,
containing the input-language specifications.
From the specifications in lexer the requirements for the
class Scanner evolve. The Scanner class is a wrapper around the
class yyFlexLexer generated by flex. The requirements result in the
interface specification for the class Scanner.
Finally, the program's main() function is constructed. A Scanner object
is created inspecting the command-line arguments. If successful,
the scanner's member yylex() is called to construct the
output file.
The code associated with the regular expressions is executed by members of
the class yyFlexLexer. However, we would of course want to use the
derived class's members in this code. This causes a little problem: how does a
base-class member know about members of classes derived from it?
Fortunately, inheritance helps us to implement this. In the specification of
the class yyFlexLexer, we notice that the function yylex() is a
virtual function. The header file FlexLexer.h declares the
virtual member int yylex():
class yyFlexLexer: public FlexLexer
{
    public:
        yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );
        virtual ~yyFlexLexer();

        void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
        struct yy_buffer_state* yy_create_buffer( istream* s, int size );
        void yy_delete_buffer( struct yy_buffer_state* b );
        void yyrestart( istream* s );

        virtual int yylex();
        virtual void switch_streams( istream* new_in, ostream* new_out );
};
As this function is a virtual function it can be overridden in a
derived class. In that case the overridden function will be called from
its
base class (i.e., yyFlexLexer) code. Since the derived class's
yylex() is called, it will now have access to the members of the derived
class, and also to the public and protected members of its base class.
By default, the context in which the generated scanner is placed is the
function yyFlexLexer::yylex(). This context changes if we use a derived
class, e.g., Scanner. To derive Scanner from yyFlexLexer,
generated by flex, proceed as follows:
The member int yylex() must be declared in the derived class
Scanner.
The option %option yyclass="Scanner" informs flex about the derived
class's name.
Looking at the regular expressions themselves, notice that we need
rules to recognize comment, #include directives,
and all remaining characters. This is all fairly standard practice. When
an #include directive is detected, the directive is parsed by the
scanner. This too is common practice. Here is what our lexical scanner will
do:
it extracts the filename from the #include directive, throwing a
Scanner::Error value (invalidInclude) if this fails;
the extracted filename is stored in d_nextSource;
once the #include directive has been processed,
pushSource() is called to perform the switch to another file;
when the end of a file (EOF) is reached, the derived class's
member function popSource() is called, popping the previously
pushed file and returning true;
when the file-stack is empty, popSource() returns false,
resulting in calling yyterminate(), terminating the scanner.
The lexical scanner specification file is organized similarly to the
one used for flex in C contexts. However, for C++ contexts,
flex may create a class (yyFlexLexer) from which another class (e.g.,
Scanner) can be derived. The flex specification file itself has three
sections. Its preamble includes scanner.ih, which in turn includes
scanner.h, itself including FlexLexer.h, a file that is part of the
flex distribution. However, due to the complex setup of this latter file,
it should not be read again by the code generated by flex. So, we now
have the following situation:
The generated code must include scanner.ih, since this declares, via
scanner.h, the class Scanner, so that we're able to call Scanner's
members from the code associated with the regular expressions defined in the
lexer specification file.
In scanner.h, defining the class Scanner, the header file
FlexLexer.h, declaring Scanner's base class, must have been read
by the compiler before the class Scanner itself is defined.
The code generated by flex already includes FlexLexer.h,
and as mentioned, FlexLexer.h may not be read again. However, flex will
also insert the specification file's preamble into the code it generates.
Since that preamble includes scanner.ih, and so scanner.h, and so
FlexLexer.h, we would include FlexLexer.h
twice in code generated by flex. This must be prevented.
To prevent the double inclusion of FlexLexer.h the following is
suggested:
While scanner.ih includes scanner.h, scanner.h
itself is modified such that it includes FlexLexer.h, unless the C
preprocessor variable _SKIP_YYFLEXLEXER_ is defined.
In flex' specification file _SKIP_YYFLEXLEXER_ is defined
just prior to including scanner.ih.
The code generated by flex will now not re-include
FlexLexer.h. At the same time, compiling Scanner's members proceeds
independently of the lexer specification file's preamble, so here
FlexLexer.h is properly included too. Here is the specification file's
preamble:
%{
#define _SKIP_YYFLEXLEXER_
#include "scanner.ih"
%}
Following the preamble we encounter flex'
symbol area, used to define symbols, like a mini scanner, or
options. The following options are suggested:
%option 8bit: this allows the generated lexical scanner to
read 8-bit characters (rather than 7-bit, which is the default).
%option c++: this results in flex generating C++
code.
%option debug: this will include debugging
code into the code generated by
flex. Calling the member function
set_debug(true) will activate this
debugging code run-time. When activated, information about which rules are
matched is written to the standard error stream. To suppress the execution of
debug code the member function set_debug(false) may be called.
%option noyywrap: when the scanner reaches the end of file,
it will (by default) call a function yywrap() which may perform the switch
to another file to be processed. Since there exist alternatives which render
this function superfluous (see below), it is suggested to specify this option
as well.
%option outfile="yylex.cc": this defines yylex.cc as
the name of the generated C++ source file.
%option warn: this option is strongly suggested by the
flex documentation, so it's mentioned here as well. See flex'
documentation for details.
%option yyclass="Scanner": this defines Scanner as
the name of the class derived from yyFlexLexer.
%option yylineno: this option causes the
lexical scanner to keep track of the
line numbers of the files it is
scanning. When processing nested files, the variable yylineno is not
automatically reset to the last line number of a file, when returning
to a partially processed file. In those cases, yylineno will explicitly
have to be reset to a former value. If specified, the current line number is
returned by the public member
lineno(), returning an int.
%option yyclass="Scanner" outfile="yylex.cc" c++ 8bit warn noyywrap yylineno
%option debug

%x comment
%x include

eolnComment "//".*
anyChar     .|\n
In the rules section, information not processed otherwise is copied from
the istream *yyin to the ostream *yyout. For this the predefined macro
ECHO can be used. Here is the specification file's rules section:
%%
    /*
        The comment-rules: comment lines are ignored.
    */
{eolnComment}
"/*"                    BEGIN comment;
<comment>{anyChar}
<comment>"*/"           BEGIN INITIAL;

    /*
        File switching: #include <filepath>
    */
#include[ \t]+"<"       BEGIN include;
<include>[^ \t>]+       d_nextSource = yytext;
<include>">"[ \t]*\n    {
                            BEGIN INITIAL;
                            pushSource(YY_CURRENT_BUFFER, YY_BUF_SIZE);
                        }
<include>{anyChar}      throw invalidInclude;

    /*
        The default rule: eating all the rest, echoing it to output
    */
{anyChar}               ECHO;

    /*
        The <<EOF>> rule: pop a pushed file, or terminate the lexer
    */
<<EOF>>                 {
                            if (!popSource(YY_CURRENT_BUFFER))
                                yyterminate();
                        }
%%
Since most members of yyFlexLexer are protected, and
thus accessible to derived classes, most processing can be left to the
derived class's member functions. This results in a very clean setup of the
lexer specification file, requiring no or hardly any code in the preamble.
The class Scanner, derived as usual from the class yyFlexLexer that is
generated by flex(1), has access to data
controlled by the lexical scanner. In particular, it has access to the
following data members:
char *yytext, containing the
text
matched by a
regular expression. Clients may access this information using
the scanner's
YYText() member;
int yyleng, the
length of the
text in yytext. Clients may access this value using the scanner's
YYLeng() member;
int yylineno: the current
line number. This
variable is only maintained if
%option yylineno
is specified. Clients
may access this value using the scanner's
lineno() member.
Additional members are described in FlexLexer.h.
Objects of the class Scanner perform two tasks: they process #include
directives, and they pop a previously pushed file once EOF
is detected in a file.
Here is the initial section of the interface of the class Scanner:
it shows the conditional inclusion of FlexLexer.h, its class opening, and
its private data. Its public section starts off by defining the enum Error,
defining various symbolic constants for errors that may be detected:
#if ! defined(_SKIP_YYFLEXLEXER_)
#include <FlexLexer.h>
#endif

class Scanner: public yyFlexLexer
{
    std::stack<yy_buffer_state *> d_state;
    std::vector<std::string> d_fileName;
    std::string d_nextSource;

    static size_t const s_maxDepth = 10;

    public:
        enum Error
        {
            invalidInclude,
            circularInclusion,
            nestingTooDeep,
            cantRead,
        };
Next we encounter Scanner's constructor. It activates the initial input
(and output) file and pushes the name of the initial input file. Here is its
implementation:
#include "scanner.ih"

Scanner::Scanner(istream *yyin, string const &initialName)
{
    switch_streams(yyin, yyout);
    d_fileName.push_back(initialName);
}
When the scanner has extracted a filename from an #include directive, a
switch to another file is performed by pushSource(). If the filename
could not be extracted, the scanner throws an invalidInclude exception
value. The pushSource() member and the matching function popSource()
handle file switching. Switching to another file proceeds as follows:
First, the current depth of include-nesting is inspected.
If s_maxDepth is reached, the stack is considered full, and the scanner
throws a nestingTooDeep exception.
Next, throwOnCircularInclusion() is called to avoid circular
inclusions when switching to new files. This function throws an exception if a
filename is included twice, using a simple literal name check. Here is its
implementation:
#include "scanner.ih"

void Scanner::throwOnCircularInclusion()
{
    vector<string>::iterator
        it = find(d_fileName.begin(), d_fileName.end(), d_nextSource);

    if (it != d_fileName.end())
        throw circularInclusion;
}
Next, a new ifstream object is created for the filename
in d_nextSource. If this fails, the scanner throws a cantRead exception.
Finally, a new yy_buffer_state is created for the newly
opened stream, and the lexical scanner is instructed to switch to that stream
using yyFlexLexer's member function yy_switch_to_buffer().
Here is pushSource()'s implementation:
#include "scanner.ih"

void Scanner::pushSource(yy_buffer_state *current, size_t size)
{
    if (d_state.size() == s_maxDepth)
        throw nestingTooDeep;

    throwOnCircularInclusion();
    d_fileName.push_back(d_nextSource);

    ifstream *newStream = new ifstream(d_nextSource.c_str());
    if (!*newStream)
        throw cantRead;

    d_state.push(current);
    yy_switch_to_buffer(yy_create_buffer(newStream, size));
}
The class yyFlexLexer provides a series of member functions that
can be used to switch files. The file-switching capability of a
yyFlexLexer object is founded on the struct yy_buffer_state,
containing the state of the
scan-buffer of the currently read file. This
buffer is pushed on the d_state stack when an #include is
encountered. Then yy_buffer_state's contents are replaced by the buffer
created for the file to be processed next. Note that in the flex
specification file the function pushSource() is called as
pushSource(YY_CURRENT_BUFFER, YY_BUF_SIZE);
YY_CURRENT_BUFFER and
YY_BUF_SIZE are macros that are only
available in the rules section of the lexer specification file, so they must
be passed as arguments to pushSource(). Currently it is not possible
to use these macros in the Scanner class's member functions directly.
Note that yylineno is not updated when a file switch is
performed. If line numbers are to be monitored, then the current value of
yylineno should be pushed on a stack, and yylineno should be reset by
pushSource(), whereas popSource() should reinstate a former value of
yylineno by popping a previously pushed value from the
stack. Scanner's current implementation maintains a simple
stack of
yy_buffer_state pointers. Changing that into a stack of
pair<yy_buffer_state *, size_t> elements would allow us to save (and
restore) line numbers as well. This modification is left as an
exercise to
the reader.
At the end of an included file popSource() is called to pop the previously
pushed buffer from the stack, allowing the scanner to continue its scan just
beyond the #include directive it processed last. The member popSource()
first inspects the size of the d_state stack: if empty, false is
returned and the function terminates. If not empty, then the current buffer is
deleted, to be replaced by the state waiting on top of the stack. The file
switch is performed by the yyFlexLexer members
yy_delete_buffer() and
yy_switch_to_buffer(). Note that yy_delete_buffer() takes care of the
closing of the ifstream and of deleting the memory allocated for this
stream in pushSource(). Furthermore, the filename that was last entered in
the d_fileName vector is removed. Having done all this, the function
returns true:
#include "scanner.ih"

bool Scanner::popSource(yy_buffer_state *current)
{
    if (d_state.empty())
        return false;

    yy_delete_buffer(current);
    yy_switch_to_buffer(d_state.top());
    d_state.pop();
    d_fileName.pop_back();

    return true;
}
The member stackTrace() dumps the names of
the currently pushed files to the standard error stream. It may be called by
exception catchers. Here is its implementation:
#include "scanner.ih"

void Scanner::stackTrace()
{
    for (size_t idx = 0; idx < d_fileName.size() - 1; ++idx)
        cerr << idx << ": " << d_fileName[idx] << " included " <<
                d_fileName[idx + 1] << endl;
}
The member lastFile() returns the name of the currently processed file. It
may be implemented inline:
inline std::string const &Scanner::lastFile()
{
    return d_fileName.back();
}
Flex generates the scanner's code as the member function Scanner::yylex().
Therefore, int yylex() must be declared by the class Scanner, as it
overrides FlexLexer's virtual member yylex().
The program using the class Scanner is very simple. It expects a filename
indicating where to start the scanning process. Initially the number of
arguments is checked. If at least one argument was given, then an ifstream
object is created. If this object can be created, then a Scanner object is
constructed, receiving the address of the ifstream object and the name of
the initial input file as its arguments. Then the Scanner object's
yylex() member is called. The scanner object throws Scanner::Error
exceptions if it fails to perform its tasks properly. These exceptions are
caught near main()'s end. Here is the program's source:
#include "lexer.h"

using namespace std;

int main(int argc, char **argv)
{
    if (argc == 1)
    {
        cerr << "Filename argument required\n";
        exit(1);
    }

    ifstream yyin(argv[1]);
    if (!yyin)
    {
        cerr << "Can't read " << argv[1] << endl;
        exit(1);
    }

    Scanner scanner(&yyin, argv[1]);
    try
    {
        return scanner.yylex();
    }
    catch (Scanner::Error err)
    {
        char const *msg[] =
        {
            "Include specification",
            "Circular Include",
            "Nesting",
            "Read",
        };
        cerr << msg[err] << " error in " << scanner.lastFile() <<
                ", line " << scanner.lineno() << endl;
        scanner.stackTrace();
        return 1;
    }
    return 0;
}
The program is constructed as follows, assuming flex and the
Gnu C++ compiler g++ have been installed:
First, the lexical scanner's source is created by flex. For
this the following command can be given:
    flex lexer
When the default yywrap() function is used, the
libfl.a library should be
linked against the final program. Normally, that's not required, and the
program can be constructed as, e.g.:
g++ -o lexer *.cc
If %option debug was specified, debugging code will be included in the
generated scanner. To obtain debugging info, this code must also be
activated. Assuming the scanner object is called scanner, the statement
scanner.set_debug(true);
will produce debugging info to the standard error stream.
Starting point when developing programs that use both parsers and scanners is the grammar. The grammar defines a set of tokens which can be returned by the lexical scanner (commonly called the lexer).
Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and by the lexer are not normally specified literally in the grammatical rules or lexical regular expressions, but should be implemented in member functions, called from within the parser's rules or which are associated with the lexer's regular expressions.
In the previous section we've seen an example of a C++ class generated by
flex. In the current section we concentrate on the parser. The
parser can be generated from a grammar specification, processed by the
program
bisonc++. The grammar specification required for bisonc++ is
similar to the specifications required for
bison (and an existing program
bison++, written in the early nineties by the Frenchman
Alain Coetmeur), but bisonc++ generates a
C++ class which more closely follows present-day standards than bison++,
which still shows many C-like features.
In this section a program is developed converting
infix expressions, in which binary operators are written between
their operands, to
postfix expressions, in which binary operators are
written behind their operands. Furthermore, the unary operator - will be
converted from its prefix notation to a postfix form. The unary +
operator is ignored as it requires no further actions. In essence our little
calculator is a micro compiler, transforming numeric expressions into
assembly-like instructions.
Our calculator will recognize a very basic set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in bison-like grammar specifications. That's all. The purpose of this section is, after all, to illustrate the construction of a C++ program that uses both a parser and a lexical scanner, rather than to construct a full-fledged calculator.
In the coming sections we'll develop the grammar specification for
bisonc++. Then, the regular expressions for the scanner are specified
according to flex' requirements. Finally the program is constructed.
The grammar specification file required by bisonc++ is comparable to
the specification file required by bison. Differences are related to the
class nature of the resulting parser. Our calculator will distinguish real
numbers from integers, and will support a basic set of arithmetic operators.
Bisonc++ should be used as follows:
As usual, a grammar is defined first. With bisonc++ this is no
different, and bisonc++ grammar definitions are for all practical
purposes identical to bison's grammar definitions.
From the grammar specification bisonc++ generates files defining the
parser class and the implementation of the member function parse().
All remaining members (apart from parse()) must be
implemented by the programmer. Of course, they should also be
declared in the parser class's header. At the very least the member
lex() must be implemented. This member is called by parse() to
obtain the next available token. However, bisonc++ offers a
facility providing a standard implementation of the function
lex(). The member function
error(char const *msg)
is given a simple default implementation which may be modified by the
programmer. The member function error() is called when parse()
detects (syntactic) errors.
A program using the parser can then be as simple as:
int main()
{
    Parser parser;
    return parser.parse();
}
The bisonc++ specification file consists of two sections: the declaration
section and the rules section. Apart from the declarations supported by
bison, bisonc++ also supports several new declarations. These new
declarations are important and are discussed below.
The generated parser class offers members comparable to those made
available by bison, albeit that some members that were
available in bison and bison++ are considered obsolete in
bisonc++, while other members can now be used in a wider context.
For example, ACCEPT() and ABORT() can be called from any
member called from the parser's action blocks to terminate the parsing
process.
Readers familiar with bison should note that there is no
header section anymore. Header sections are used by bison to provide
for the necessary declarations allowing the compiler to compile the C
function generated by bison. In C++ declarations are part of or
already used by class definitions. Therefore, a parser generator generating a
C++ class and some of its member functions does not require a header
section anymore.
Only the more important new declarations of
bisonc++ are discussed here. The reader is referred to bisonc++'s
man-page for a full description.
header base.h.
%baseclass-preinclude header: uses header as the pathname to the file pre-included in the
parser's base-class header. This declaration is useful in
situations where the base class header file refers to types which
might not yet be known. E.g., with %union a std::string *
field might be used. Since the class std::string might not yet
be known to the compiler once it processes the base class header
file we need a way to inform the compiler about these classes and
types. The suggested procedure is to use a pre-include header file
declaring the required types. By default header will be
surrounded by double quotes (using, e.g., #include "header").
When the argument is surrounded by angle brackets #include
<header> will be included. In the latter case, quotes might be
required to escape interpretation by the shell (e.g., using -H
'<header>').
%class-header header: defines the name of the file containing the parser
class. It defaults to the parser class name followed by .h.
%class-name parser-class-name: replaces the %name declaration previously used by bison++. It
defines the name of the C++ class that will be
generated. Contrary to bison++'s %name declaration,
%class-name may appear anywhere in the first section of the
grammar specification file. It may be defined only once. If no
%class-name is specified the default class name Parser will
be used.
%debug: provides parse() and its support functions with debugging code,
showing the actual parsing process on the standard output
stream. When included, the debugging output is active by default,
but its activity may be controlled using the setDebug(bool
on-off) member. Note that no #ifdef DEBUG macros are used
anymore. By rerunning bisonc++ without the --debug option
an equivalent parser is generated not containing the debugging
code.
%implementation-header header: defines the name of the implementation
header, defaulting to the parser class name followed by .ih. The implementation header should
contain all directives and declarations only used by the
implementations of the parser's member functions. It is the only
header file that is included by the source file containing
parse()'s implementation. It is suggested that user defined
implementations of other class members use the same convention,
thus concentrating all directives and declarations that are
required for the compilation of other source files belonging to the
parser class in one header file.
%parsefun-source source: defines the name of the source file containing parse(). Defaults to parse.cc.
%scanner header: uses header as the pathname to the file pre-included in the
parser's class header. This file should define a class
Scanner, offering a member int yylex() producing the next
token from the input stream to be analyzed by the parser generated
by bisonc++. When this option is used the parser's member
int lex() will be predefined as (assuming the parser class
name is Parser):
inline int Parser::lex()
{
return d_scanner.yylex();
}
and an object Scanner d_scanner will be composed into the
parser. The d_scanner object will be constructed using its
default constructor. If another constructor is required, the
parser class may be provided with an appropriate (overloaded)
parser constructor after having constructed the default parser
class header file using bisonc++. By default header will
be surrounded by double quotes (using, e.g., #include
"header"). When the argument is surrounded by angle brackets
#include <header> will be included.
%stype typename: typename should be the name of an unstructured type (e.g.,
size_t). By default it is int. See YYSTYPE in
bison. It should not be used if a %union specification is
used. Within the parser class, this type may be used as
STYPE.
%union union-definition: acts identically to the bison declaration. As with bison
this generates a union for the parser's semantic type. The union
type is named STYPE. If no %union is declared, a simple
stack-type may be defined using the %stype declaration. If no
%stype declaration is used, the default stacktype (int) is
used.
An example of a %union declaration is:
%union
{
int i;
double d;
};
A union cannot contain objects as its fields, as constructors cannot be
called when a union is created. This means that a string cannot be a
member of the union. A string *, however, is a possible union member. By
the way: the lexical scanner does not have to know about such a union. The
scanner can simply pass its scanned text to the parser through its
YYText() member function. For example, using a statement like
$$.i = A2x(scanner.YYText());
matched text may be converted to a value of an appropriate type.
Tokens and non-terminals can be associated with union fields. This is
strongly advised, as it prevents type mismatches, since the compiler will be
able to check for type correctness. At the same time, the bison-specific
variables $$, $1, $2, etc. may be used, rather than the full field
specification (like $$.i). A non-terminal or a token may be associated
with a union field using the
<fieldname> specification. E.g.,
%token <i> INT // token association (deprecated, see below)
<d> DOUBLE
%type <i> intExpr // non-terminal association
In the example developed here, note that both the tokens and the
non-terminals can be associated with a field of the union. However, as noted
before, the lexical scanner does not have to know about all this. In our
opinion, it is cleaner to let the scanner do just one thing: scan texts. The
parser, knowing what the input is all about, may then convert strings like
"123" to an integer value. Consequently, the association of a union field
and a token is discouraged.
In the upcoming description of the rules of the grammar this will be
illustrated further.
In the %union discussion the %token and %type specifications should be
noted. They are used to specify the tokens (terminal symbols) that can be
returned by the lexical scanner, and to specify the return types of
non-terminals. Apart from %token the token indicators %left, %right
and %nonassoc may be used to specify the associativity of operators. The
tokens mentioned at these indicators are interpreted as tokens indicating
operators, associating in the indicated direction. The precedence of
operators is given by their order: the first specification has the lowest
priority. To overrule a certain precedence in a certain context, %prec
can be used. As all this is standard bisonc++ practice, it isn't
further elaborated here. The documentation provided with bisonc++'s
distribution should be consulted for further reference.
Here is the specification of the calculator's declaration section:
%filenames parser
%scanner ../scanner/scanner.h
%lines
%union {
int i;
double d;
};
%token INT
DOUBLE
%type <i> intExpr
%type <d> doubleExpr
%left '+'
%left '*'
%right UnaryMinus
In the declaration section %type specifiers are used, associating the
intExpr rule's value (see the next section) to the i-field of the
semantic-value union, and associating doubleExpr's value to the
d-field. At first sight this may look complex, since the expression rules
must be included for each individual return type. On the other hand, had the
union itself been used, we would still have had to specify somewhere in the
returned semantic values which field to use: fewer rules, but more complex
and error-prone code.
The grammar's production rules show several characteristics of bisonc++. In particular, note
that no action block requires more than a single line of code. This keeps the
organization of the grammar relatively simple, and therefore enhances its
readability and understandability. Even the rule defining the parser's proper
termination (the empty line in the line rule) uses a single member
function call done(). The implementation of that function is simple, but
interesting in that it calls Parser::ACCEPT(), showing that the
ACCEPT() member can be called indirectly from a production rule's action
block. Here are the grammar's production rules:
lines:
lines
line
|
line
;
line:
intExpr
'\n'
{
display($1);
}
|
doubleExpr
'\n'
{
display($1);
}
|
'\n'
{
done();
}
|
error
'\n'
{
reset();
}
;
intExpr:
intExpr '*' intExpr
{
$$ = exec('*', $1, $3);
}
|
intExpr '+' intExpr
{
$$ = exec('+', $1, $3);
}
|
'(' intExpr ')'
{
$$ = $2;
}
|
'-' intExpr %prec UnaryMinus
{
$$ = neg($2);
}
|
INT
{
$$ = convert<int>();
}
;
doubleExpr:
doubleExpr '*' doubleExpr
{
$$ = exec('*', $1, $3);
}
|
doubleExpr '*' intExpr
{
$$ = exec('*', $1, d($3));
}
|
intExpr '*' doubleExpr
{
$$ = exec('*', d($1), $3);
}
|
doubleExpr '+' doubleExpr
{
$$ = exec('+', $1, $3);
}
|
doubleExpr '+' intExpr
{
$$ = exec('+', $1, d($3));
}
|
intExpr '+' doubleExpr
{
$$ = exec('+', d($1), $3);
}
|
'(' doubleExpr ')'
{
$$ = $2;
}
|
'-' doubleExpr %prec UnaryMinus
{
$$ = neg($2);
}
|
DOUBLE
{
$$ = convert<double>();
}
;
The above grammar is used to implement a simple
calculator in which
integer and real values can be negated, added, and multiplied, and in which
standard priority rules can be circumvented using parentheses. The grammar
shows the use of typed nonterminal symbols: doubleExpr is linked to real
(double) values, intExpr is linked to integer values. Precedence and type
association is defined in the parser's definition section.
Bisonc++ generates various files, among which the file defining
the parser's class. Functions called from the production rule's action blocks
are usually member functions of the parser, and these member functions must be
declared and defined. Once bisonc++ has generated the header file defining
the parser's class it will not automatically rewrite that file, allowing the
programmer to add new members to the parser class. Here is the parser.h
file as used for our little calculator:
#ifndef Parser_h_included
#define Parser_h_included
#include <iostream>
#include <sstream>
#include <bobcat/a2x>
#include "parserbase.h"
#include "../scanner/scanner.h"
#undef Parser
class Parser: public ParserBase
{
std::ostringstream d_rpn;
// $insert scannerobject
Scanner d_scanner;
public:
int parse();
private:
template <typename Type>
Type exec(char c, Type left, Type right);
template <typename Type>
Type neg(Type op);
template <typename Type>
Type convert();
void display(int x);
void display(double x);
void done() const;
void reset();
void error(char const *msg);
int lex();
void print();
static double d(int i);
// support functions for parse():
void executeAction(int d_production);
void errorRecovery();
int lookup();
void nextToken();
};
inline double Parser::d(int i)
{
return i;
}
template <typename Type>
Type Parser::exec(char c, Type left, Type right)
{
d_rpn << " " << c << " ";
return c == '*' ? left * right : left + right;
}
template <typename Type>
Type Parser::neg(Type op)
{
d_rpn << " n ";
return -op;
}
template <typename Type>
Type Parser::convert()
{
Type ret = FBB::A2x(d_scanner.YYText());
d_rpn << " " << ret << " ";
return ret;
}
inline void Parser::error(char const *msg)
{
std::cerr << msg << std::endl;
}
inline int Parser::lex()
{
return d_scanner.yylex();
}
inline void Parser::print()
{}
#endif
The lexical scanner must return Parser::INT or
Parser::DOUBLE tokens. Here is the complete flex
specification file:
%{
#define _SKIP_YYFLEXLEXER_
#include "scanner.ih"
#include "../parser/parserbase.h"
%}
%option yyclass="Scanner" outfile="yylex.cc" c++ 8bit warn noyywrap
%option debug
%%
[ \t] ;
[0-9]+ return Parser::INT;
"."[0-9]* |
[0-9]+("."[0-9]*)? return Parser::DOUBLE;
.|\n return *yytext;
%%
The program is constructed comparably to programs based on bison and flex. In order to have bisonc++
generate the files parser.cc and parser.h, issue the command:
bisonc++ -V grammar
The option -V will generate the file parser.output showing
information about the internal structure of the provided grammar, among which
its states. It is useful for debugging purposes, and can be left out of the
command if no debugging is required. Bisonc++ may detect conflicts
(shift-reduce conflicts and/or reduce-reduce conflicts) in the
provided grammar. These conflicts may be resolved explicitly using
disambiguation rules or they are `resolved' by default. A
shift-reduce conflict is resolved by shifting, i.e., the next token is
consumed. A reduce-reduce conflict is resolved by using the first of two
competing production rules. Bisonc++ uses conflict resolution
procedures identical to those of bison and bison++.
Once a parser class and parsing member function have been constructed
flex may be used to create a lexical scanner (in, e.g., the file
yylex.cc) using the command
flex -I lexer
On Unix systems a command comparable to:
g++ -o calc -Wall *.cc -s
can be used to compile and link both the source of the main program and
the generated sources.
Finally, here is a source file in which the main() function and the
parser object is defined. The parser features the lexical scanner as one of
its data members:
#include "parser/parser.h"
using namespace std;
int main()
{
Parser parser;
cout << "Enter (nested) expressions containing ints, doubles, *, + and "
"unary -\n"
"operators. Enter an empty line to stop.\n";
return parser.parse();
}
Bisonc++ can be downloaded from
http://bisoncpp.sourceforge.net/. It requires the bobcat
library, which can be downloaded from
http://bobcat.sourceforge.net/.
One may wonder why a union is still used by Bisonc++ as C++ offers
inherently superior approaches to combine multiple types into one type. The
C++ way to combine types into one type is by defining a polymorphic base
class and a series of derived classes implementing the alternative data
types. Bisonc++ supports the union approach for backward compatibility
reasons: both bison(1) and bison++ support the %union directive.
The alternative to using a union is using a polymorphic base class. This
class will be developed below as the class Base. Since it's a polymorphic
base class it should have the following characteristics:
it needs a virtual destructor;
it offers a clone() member implementing a so-called virtual
constructor (cf. the virtual constructor design pattern,
Gamma et al. (1995));
an insert() member and an overloaded operator<<() are defined to
allow derived objects to be inserted into ostream objects.
The class Base has the following interface:
#ifndef INCLUDED_BASE_
#define INCLUDED_BASE_
#include <iosfwd>
class Base
{
public:
virtual ~Base();
virtual Base *clone() const = 0;
virtual std::ostream &insert(std::ostream &os) const = 0;
};
inline Base::~Base()
{}
inline std::ostream &operator<<(std::ostream &out, Base const &obj)
{
return obj.insert(out);
}
#endif
The alternatives as defined by a classical union are now defined by
classes derived from the class Base. For example:
Objects of the class Int contain int values. Here is
its interface (and implementation):
#ifndef INCLUDED_INT_
#define INCLUDED_INT_
#include <ostream>
#include <bobcat/a2x>
#include "../base/base.h"
class Int: public Base
{
int d_value;
public:
Int(char const *text);
Int(int v);
virtual Base *clone() const;
int value() const; // directly access the value
virtual std::ostream &insert(std::ostream &os) const;
};
inline Int::Int(char const *txt)
:
d_value(FBB::A2x(txt))
{}
inline Int::Int(int v)
:
d_value(v)
{}
inline Base *Int::clone() const
{
return new Int(*this);
}
inline int Int::value() const
{
return d_value;
}
inline std::ostream &Int::insert(std::ostream &out) const
{
return out << d_value;
}
#endif
Objects of the class Text contain text. These objects can be
used, e.g., to store the names of identifiers recognized by a lexical scanner.
Here is the interface of the class Text:
#ifndef INCLUDED_TEXT_
#define INCLUDED_TEXT_
#include <string>
#include <ostream>
#include "../base/base.h"
class Text: public Base
{
std::string d_text;
public:
Text(char const *id);
virtual Base *clone() const;
std::string const &id() const; // directly access the name.
virtual std::ostream &insert(std::ostream &os) const;
};
inline Text::Text(char const *id)
:
d_text(id)
{}
inline Base *Text::clone() const
{
return new Text(*this);
}
inline std::string const &Text::id() const
{
return d_text;
}
inline std::ostream &Text::insert(std::ostream &out) const
{
return out << d_text;
}
#endif
The polymorphic Base, however, can't be used as the parser's semantic
value type:
a Base class object cannot contain a derived class's data members,
so plain Base class objects cannot be used to store the parser's semantic
values;
neither can a Base class reference be used as a semantic
value, as containers cannot store references;
what remains is a pointer to a Base
class object. Although a pointer would offer programmers the benefits of the
polymorphic nature of the Base class, it would also require them to keep
track of all memory used by Base objects, thus countering many of the
benefits of using a polymorphic base class.
To solve the above problems, a wrapper class Semantic around a
Base pointer is used. The wrapper class will take care of the proper
destruction of objects accessed via its Base pointer data member and it
will offer facilities to access the proper derived class. The interface of
the class Semantic is:
#ifndef INCLUDED_SEMANTIC_
#define INCLUDED_SEMANTIC_
#include <ostream>
#include "../base/base.h"
class Semantic
{
Base *d_bp;
public:
Semantic(Base *bp = 0); // Semantic will own the bp
Semantic(Semantic const &other);
~Semantic();
Semantic &operator=(Semantic const &other);
Base const &base() const;
template <typename Class>
Class const &downcast();
private:
void copy(Semantic const &other);
};
inline Base const &Semantic::base() const
{
return *d_bp;
}
inline Semantic::Semantic(Base *bp)
:
d_bp(bp)
{}
inline Semantic::~Semantic()
{
delete d_bp;
}
inline Semantic::Semantic(Semantic const &other)
{
copy(other);
}
inline Semantic &Semantic::operator=(Semantic const &other)
{
if (this != &other)
{
delete d_bp;
copy(other);
}
return *this;
}
inline void Semantic::copy(Semantic const &other)
{
d_bp = other.d_bp ? other.d_bp->clone() : 0;
}
template <typename Class>
inline Class const &Semantic::downcast()
{
return dynamic_cast<Class &>(*d_bp);
}
inline std::ostream &operator<<(std::ostream &out, Semantic const &obj)
{
if (&obj.base())                // tests whether d_bp was 0
return out << obj.base();
return out << "<UNDEFINED>";
}
#endif
Semantic is a slightly more `complete' class than Base and its
derivatives, since it contains a pointer which must be handled
appropriately. So it needs a copy constructor, an overloaded assignment
operator and a destructor. Apart from that, it supports members to obtain a
reference to the base class. This reference is then used by the overloaded
operator<<() to allow insertion into streams of objects of classes derived
from Base. It also offers a small member template returning a reference to
a derived class object from the semantic value's Base class pointer. This
member effectively implements (and improves) the type safety that is otherwise
striven for by typed nonterminals and typed tokens (using the %type
directive).
In this example the parser's %stype will of course be
Semantic. A simple grammar is defined for this illustrative example. The
grammar expects input according to the following rule:
rule:
IDENTIFIER '(' IDENTIFIER ')' ';'
|
IDENTIFIER '=' INT ';'
;
The rule's actions simply echo the received identifiers and int values to
cout. Here is an example of a decorated production rule:
IDENTIFIER '=' INT ';'
{
cout << $1 << " " << $3 << endl;
}
Alternative actions could easily be defined, e.g., using the
Semantic::downcast() member:
IDENTIFIER '=' INT ';'
{
int value = $3.downcast<Int>().value();
}
Bisonc++'s parser stores all semantic values on its semantic values stack
(irrespective of the number of tokens that were defined in a particular
production rule). At any time all semantic values associated with
previously recognized tokens are available in an action block. Once the
semantic value stack is reduced, the Semantic class takes care of the
proper destruction of the objects pointed to by the Semantic data
member d_bp.
The scanner must of course be able to access the parser's data member
representing the most recent semantic value. This data member is available as
the parser's data member d_val__, which can be offered to the scanner
object at its construction time. E.g., with a scanner expecting an
STYPE__ & the parser's constructor could simply be defined as:
inline Parser::Parser()
:
d_scanner(d_val__)
{}
The scanner stores new semantic values directly in the parser's
d_val__ data member. Therefore the Scanner class may define a
Semantic &d_semval member, which is initialized to the d_val__
reference received by the Scanner's constructor:
inline Scanner::Scanner(Parser::STYPE__ &semval)
: // or: Semantic &semval
d_semval(semval)
{}
The scanner (generated by flex(1))
recognizes input patterns,
returns Parser tokens (e.g., Parser::INT), and returns a semantic value when
applicable. E.g., when recognizing a Parser::INT the rule is:
[0-9]+      {
                d_semval = new Int(yytext);
                return Parser::INT;
            }
Note that, as the Semantic constructor expects but one argument,
automatic promotion from Base * to Semantic can be used in the
assignments to d_semval.
The IDENTIFIER's semantic value is obtained as follows:
[a-zA-Z_][a-zA-Z0-9_]*  { d_semval = new Text(yytext); return Parser::IDENTIFIER; }