So you want to be a cOOmpiler writer? - part IV

by Sean A. Corfield


In the last article I skimmed very briefly over the preprocessor and said that in this issue I would start to look at the type system. For once, I'm actually going to do what I said I would!

The type system

What does the draft say about types? It very conveniently partitions them into different categories that we will model directly. These partitions include:

An obvious class hierarchy should already be forming in your mind! What about the concept of "type" itself? What questions can we ask of a type?

A first pass gives us something like:

class AbsType
{
public:
  AbsType() { }
  virtual ~AbsType() { }

  virtual size_t         size() const = 0;
  virtual const string&  name() const = 0;
  virtual bool           operator==(const AbsType&)
                                const = 0;
  virtual AbsType*       promoted()
                           { return this; }
};

The size and name pure virtuals should be self-explanatory: every concrete derived class must implement these, even if it is just to say "Error: you cannot take the size of a function." for example.
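
As a minimal sketch of what that obligation looks like, the two derived classes below implement the pure virtuals. IntType and FunctionType are hypothetical names chosen for illustration, and reporting the error by throwing std::logic_error is an assumption: the analyser may well have reported it some other way.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Cut-down copy of the abstract base so the sketch stands alone.
class AbsType
{
public:
  AbsType() { }
  virtual ~AbsType() { }

  virtual std::size_t        size() const = 0;
  virtual const std::string& name() const = 0;
};

// Hypothetical scalar type: both questions have sensible answers.
class IntType : public AbsType
{
public:
  std::size_t size() const override { return sizeof(int); }
  const std::string& name() const override
  {
    static const std::string n("int");
    return n;
  }
};

// Hypothetical function type: size() must still be implemented,
// even though all it can do is report an error (here, by throwing).
class FunctionType : public AbsType
{
public:
  std::size_t size() const override
  {
    throw std::logic_error(
      "Error: you cannot take the size of a function.");
  }
  const std::string& name() const override
  {
    static const std::string n("function");
    return n;
  }
};
```

Because size and name are pure virtual, forgetting either override leaves the derived class abstract, so the omission is caught at compile time rather than at run time.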

operator== needs more thought because a typical derived class version will look like:

bool CharType::operator==(
  const AbsType& rhs
) const
{
  if (const CharType* rhsp =
        dynamic_cast<const CharType*>(&rhs))
  {
    // test they are the same char type
  }
  // rhs is not char
  return false;
}

We must use RTTI to ensure that the dynamic type of both arguments is the same. The lhs type is known (because the virtual operator== despatches through that type) but we must check that the rhs is at least as derived as the lhs (generally the test is that the rhs is the same type). See Uli Breymann's article on this pattern elsewhere in this issue.
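
To make the pattern concrete, here is a small self-contained sketch. The Kind enum used to tell the char types apart, the IntType class and the sameType helper are all assumptions made for the example; the article does not show how CharType is really represented.

```cpp
class AbsType
{
public:
  virtual ~AbsType() { }
  virtual bool operator==(const AbsType&) const = 0;
};

// Hypothetical representation: one class covers the three char
// types, distinguished by an enum.
class CharType : public AbsType
{
public:
  enum Kind { Plain, Signed, Unsigned };
  explicit CharType(Kind k) : kind_(k) { }

  bool operator==(const AbsType& rhs) const override
  {
    // The pointer form of dynamic_cast yields a null pointer
    // when rhs is not (derived from) CharType.
    if (const CharType* rhsp =
          dynamic_cast<const CharType*>(&rhs))
    {
      return kind_ == rhsp->kind_;   // same char type?
    }
    return false;                    // rhs is not char
  }

private:
  Kind kind_;
};

// Hypothetical type with no state: equal to any other IntType.
class IntType : public AbsType
{
public:
  bool operator==(const AbsType& rhs) const override
  {
    return dynamic_cast<const IntType*>(&rhs) != nullptr;
  }
};

// Hypothetical helper: compare two types through the base
// interface, which is how most of a compiler would see them.
bool sameType(const AbsType& lhs, const AbsType& rhs)
{
  return lhs == rhs;
}
```

Comparing through AbsType references, as sameType does, lets the virtual call select the lhs implementation while the dynamic_cast checks the rhs.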

What about promoted? Why isn't it pure virtual? Because very few types actually promote to anything, it makes sense to provide a default action that "does nothing".
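
A sketch of the default and one override follows. It assumes a platform where int can represent every value of char, so that char promotes to int; the shared IntType::instance() object and the PointerType class are hypothetical, introduced only to give the example something to return and something that does not promote.

```cpp
class AbsType
{
public:
  virtual ~AbsType() { }
  // Default action: most types promote to themselves.
  virtual AbsType* promoted() { return this; }
};

class IntType : public AbsType
{
public:
  // Hypothetical single shared object representing "int".
  static IntType* instance()
  {
    static IntType theInt;
    return &theInt;
  }
};

// One of the few types that overrides the default.
class CharType : public AbsType
{
public:
  AbsType* promoted() override { return IntType::instance(); }
};

// Inherits the "does nothing" default unchanged.
class PointerType : public AbsType { };
```

Only CharType needs any code here; every other type picks up the base-class behaviour for free.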

Building blocks

The scalar types form a fairly straightforward hierarchy (figure 1) but some of the other types pose more interesting problems. class, struct and union clearly share some attributes - they all have members, constructors and so on - but they also have differences, especially from the point of view of source code analysis (my original brief for this column). There is another construct in C++ which also has members: namespace. Abstracting appropriate classes from this problem is hard. I went through several iterations, discussing the pros and cons of early ideas with Scott Meyers (thanks Scott!) before settling on a four-level hierarchy below AbsType (see also figure 2):

Figure 1: Scalar type implementation represented in UML
Figure 2: struct type implementation represented in UML
class NamedScope : public AbsType { };
class NamespaceType : public NamedScope {};
class AbsClass : public NamedScope { };
class ClassType : public AbsClass { };
class StructUnion : public AbsClass { };
class StructType : public StructUnion { };
class UnionType : public StructUnion { };

Some words of explanation. First of all, namespace is not strictly speaking a type. However, handling of declarations is greatly simplified if every declared name can have a type associated with it. Furthermore, when dealing with qualified names, e.g., X::m, it is unimportant whether the qualifying name is a class or a namespace.
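
The benefit for qualified names can be sketched as follows. The declare and lookup members, the map-based member table and the resolveQualified function are all hypothetical; the article does not show NamedScope's interface, so treat this as one possible shape rather than the real one.

```cpp
#include <map>
#include <string>

class AbsType
{
public:
  virtual ~AbsType() { }
};

// Hypothetical interface: a scope maps member names to types.
class NamedScope : public AbsType
{
public:
  void declare(const std::string& member, AbsType* type)
  {
    members_[member] = type;
  }

  // Returns the member's type, or a null pointer if the name
  // has not been declared in this scope.
  AbsType* lookup(const std::string& member) const
  {
    auto it = members_.find(member);
    return it == members_.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, AbsType*> members_;
};

class NamespaceType : public NamedScope { };
class AbsClass      : public NamedScope { };
class ClassType     : public AbsClass { };
class IntType       : public AbsType { };  // not a scope

// Resolve X::m given the type associated with X. The code does
// not care whether X names a class or a namespace.
AbsType* resolveQualified(AbsType* qualifier,
                          const std::string& member)
{
  NamedScope* scope = dynamic_cast<NamedScope*>(qualifier);
  return scope ? scope->lookup(member) : nullptr;
}
```

A single dynamic_cast to NamedScope replaces separate class and namespace cases, and it also rejects qualifiers such as int that cannot have members.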

Why have a separate layer between AbsClass and StructType (and UnionType)? I was designing a source code analyser to check coding standards, amongst other things. Common in coding standards are rules that say things like "treat struct and union like C, keep C++ features for class". In terms of analysis, this means that finding member functions or access specifiers inside a struct or union should elicit a warning. The code to check the rules in the standards is embodied within methods in the type hierarchy in such a way that checks common to every derived class appear in base classes and differing checks are performed in overriding functions:

void StructType::checkRules()
{
  // other checks
}

This pattern is repeated throughout the type class hierarchy, and in fact throughout the entire application.
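
One way the layering might look is sketched below. The warnings parameter, the hasMemberFunctions flag and the explicit calls up to the base-class version are assumptions made for the example; the rule itself (member functions in a struct or union elicit a warning) is the one described above.

```cpp
#include <string>
#include <vector>

class AbsClass
{
public:
  explicit AbsClass(bool hasMemberFunctions)
    : hasMemberFunctions_(hasMemberFunctions) { }
  virtual ~AbsClass() { }

  // Checks common to every class-like type would go here.
  virtual void checkRules(std::vector<std::string>&) const { }

protected:
  bool hasMemberFunctions() const { return hasMemberFunctions_; }

private:
  bool hasMemberFunctions_;  // hypothetical summary of the members
};

// class may use C++ features freely: only the common checks apply.
class ClassType : public AbsClass
{
public:
  explicit ClassType(bool f) : AbsClass(f) { }
};

// Checks shared by struct and union live in the layer between.
class StructUnion : public AbsClass
{
public:
  explicit StructUnion(bool f) : AbsClass(f) { }

  void checkRules(std::vector<std::string>& warnings) const override
  {
    AbsClass::checkRules(warnings);   // common checks first
    if (hasMemberFunctions())
      warnings.push_back("member function in struct or union");
  }
};

class StructType : public StructUnion
{
public:
  explicit StructType(bool f) : StructUnion(f) { }

  void checkRules(std::vector<std::string>& warnings) const override
  {
    StructUnion::checkRules(warnings);  // struct/union checks
    // other, struct-specific checks
  }
};
```

Each level adds only the checks that differ from its base, which is what the extra StructUnion layer buys.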

Mixing in templates

In the original design, template information was held with the declaration and the type system representation stayed "pure". This caused several problems - not the least of which was the fact that A<int> and A<void*> were both treated as plain old A. If this seems a strange decision, and with hindsight it certainly was, some words about the origins of the project are in order. In order to provide an accelerated path to market, the beta release of the product relied on the preprocessor provided on the target platform and templates were not supported. Lack of template support became an issue after a couple of early releases and then had to be grafted on fairly quickly. As compiler support for templates has improved, and especially with the advent of STL, the template support in the analyser needed revising.

Most aspects of an instantiated template class are identical to a non-template class. The template-specific attributes of template classes, template structs and template unions have something in common so it seems natural to abstract these into a class, TemplateType. Clearly a template class must have both AbsClass and TemplateType as bases. Because of the demands of source code analysis (rather than compilation), it is reasonable to enquire of a type whether or not it is an instantiated template. This leads to the observation that TemplateType should be derived from AbsType and so we have a mixin diamond - see figure 3. A secondary observation is that this approach makes it easy to support template namespaces and enums should either of those become common vendor extensions.
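
The diamond can be sketched in a few lines. Making AbsType a virtual base is an assumption about how the mixin was realised, and TemplateClassType, arguments() and isInstantiatedTemplate are hypothetical names; template arguments are held as plain strings purely to keep the example short.

```cpp
#include <string>
#include <vector>

class AbsType
{
public:
  virtual ~AbsType() { }
};

// AbsType is a virtual base so that the diamond below contains
// exactly one AbsType subobject.
class NamedScope : public virtual AbsType { };
class AbsClass   : public NamedScope { };
class ClassType  : public AbsClass { };

// The mixin: template-specific attributes only.
class TemplateType : public virtual AbsType
{
public:
  explicit TemplateType(const std::vector<std::string>& args)
    : args_(args) { }
  const std::vector<std::string>& arguments() const { return args_; }

private:
  std::vector<std::string> args_;
};

// An instantiated template class is both a class and a template.
class TemplateClassType : public ClassType, public TemplateType
{
public:
  explicit TemplateClassType(const std::vector<std::string>& args)
    : TemplateType(args) { }
};

// Ask any type whether it is an instantiated template.
bool isInstantiatedTemplate(const AbsType& t)
{
  return dynamic_cast<const TemplateType*>(&t) != nullptr;
}
```

With the arguments stored in the type, A<int> and A<void*> are no longer indistinguishable, and the query works on any AbsType without the caller knowing about classes at all.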

Figure 3: A template mixin implementation represented in UML

Next time

I'll leave you to ponder the impact of changing the original hierarchy in this way and next time I'll discuss some of those implications and the difficulties I encountered.

Sean A. Corfield
Object Consultancy Services