cc_type: Type Representation

This page documents cc_type.h and cc_type.cc, the files which make up the type representation module in Elsa. You should look at cc_type.h as you read this file.

1. Introduction

The broad intent of cc_type is to represent types in a way that's easy to understand and manipulate. The abstract syntax of C/C++ type declarations is very much not amenable to either, because it shares type specifiers among declarators in a declaration, and because declarators are inside-out (see the description of Declarator in cc.ast.html). The type checker has the responsibility of interpreting the C/C++ syntax, and constructing corresponding Type objects so that subsequent analyses don't have to deal with the actual syntax.
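To make the two syntactic problems above concrete, here is a small standard C++ sketch (it does not use Elsa itself; the variable names and the checkSizes helper are made up for illustration). One declaration shares the specifier "int" among three declarators, and the last declaration shows the inside-out nesting.

```cpp
#include <cassert>
#include <type_traits>

// One declaration, one shared type specifier ("int"), three declarators.
// Each declarator wraps the shared specifier in different type constructors.
int *ptr, arr[3], (*fn)(int);

// The types a checker has to compute for each declared name.
static_assert(std::is_same<decltype(ptr), int *>::value, "pointer to int");
static_assert(std::is_same<decltype(arr), int[3]>::value, "array of 3 ints");
static_assert(std::is_same<decltype(fn), int (*)(int)>::value,
              "pointer to function taking int, returning int");

// Inside-out: in the declarator "*ptrs[2]" the outermost syntax node is "*",
// yet the outermost type constructor is the array ("array of 2 pointers to int").
int *ptrs[2];
static_assert(std::is_array<decltype(ptrs)>::value, "an array, not a pointer");
static_assert(std::is_same<std::remove_extent<decltype(ptrs)>::type, int *>::value,
              "whose elements are pointers to int");

// Runtime view of the same facts: object sizes follow from the computed types.
void checkSizes() {
  assert(sizeof(arr) == 3 * sizeof(int));
  assert(sizeof(ptrs) == 2 * sizeof(int *));
}
```

A type representation that stores "array of 2 of (pointer to int)" directly spares later analyses from redoing this unwinding.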

Another goal of the type representation is general independence from syntax. Ideally, cc_type would not refer to cc.ast at all, but I haven't quite been able to achieve that. Nevertheless, it remains a goal to minimize the knowledge cc_type has about cc.ast. Types are concepts that exist independently from any particular syntax used to describe them, and I want the module dependencies to reflect that basic fact to the extent possible.

One major decision I made for this module was to translate typedefs away entirely. This means I don't have a node for some kind of arbitrary "named type", which would have to be "unwrapped" every place a type is inspected. The potential disadvantage is that I won't use the programmer's names when referring to types in error messages, but there's an easy solution to that too: every constructed type could have a field saying what name, if any, the programmer had been using as an alias for this type. That's not implemented, but I'm confident it would eliminate any advantage to retaining typedefs, so I do not.

Types are divided into two major classes: atomic types and constructed types. Atomic types (my terminology) are types atop which type constructors (like "*" or "[]") might be applied, but which themselves do not have any constructors, so cannot be further "deconstructed". They consist of built-in types like "int", enums, structs, classes and unions; I regard a struct, class or union as an atomic wrapper around its members. Each kind of atomic type is explained in Section 2.

Constructed types, by contrast, are whatever is built on top of atomic types by applying type constructors such as "*". Each type constructor is explained in Section 3.
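The atomic/constructed split can be pictured with a toy model. Everything below (MiniAtomic, MiniType, MiniLeaf, MiniPointer, MiniArray and the helper functions) is hypothetical and far simpler than the real classes in cc_type.h; it only illustrates how constructors stack on top of an atomic leaf.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the atomic/constructed split; not Elsa's classes.
struct MiniAtomic {              // e.g. "int", or a struct seen as an opaque unit
  std::string name;
};

struct MiniType {                // root of the toy constructed hierarchy
  virtual ~MiniType() {}
  // Prints in a toy left-to-right notation, not in C declarator syntax.
  virtual std::string toString() const = 0;
};

struct MiniLeaf : MiniType {     // leaf: just wraps an atomic type
  const MiniAtomic *atomic;
  explicit MiniLeaf(const MiniAtomic *a) : atomic(a) {}
  std::string toString() const override { return atomic->name; }
};

struct MiniPointer : MiniType {  // type constructor "*"
  const MiniType *pointee;
  explicit MiniPointer(const MiniType *t) : pointee(t) {}
  std::string toString() const override { return pointee->toString() + " *"; }
};

struct MiniArray : MiniType {    // type constructor "[]"
  const MiniType *element;
  int size;
  MiniArray(const MiniType *t, int n) : element(t), size(n) {}
  std::string toString() const override {
    return element->toString() + " [" + std::to_string(size) + "]";
  }
};

// Builds "array of 3 pointers to int" by applying constructors to a leaf.
std::string describeArrayOfPointers() {
  MiniAtomic intAtomic{"int"};
  MiniLeaf leaf(&intAtomic);
  MiniPointer pointer(&leaf);
  MiniArray array(&pointer, 3);
  return array.toString();
}

void checkToyModel() {
  assert(describeArrayOfPointers() == "int * [3]");
}
```

Deconstruction stops at the leaf: an analysis can peel off the array and then the pointer, but the atomic type underneath offers no further constructor to remove.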

2. Atomic Types

AtomicType is the root of the atomic type hierarchy. It just contains methods to figure out what kind of type it actually is.

SimpleType represents the simple types built in to C/C++, like "int".

CompoundType is certainly the most complex of all the type classes. It represents a class, struct, or union. Storage of class members is done by inheriting a Scope (cc_scope.h). Thus, CompoundType itself is mostly concerned with representing the inheritance hierarchy.

To get the name lookup semantics right, we must be able to tell when names are ambiguous and also know the inheritance path used to arrive at any base. This is made complicated by the presence of virtual inheritance. Each class contains its own, independent representation of its particular inheritance DAG; this representation is a graph of BaseClassSubobj objects. A base class subobject corresponds one-to-one to some particular (contiguous) range of byte offsets in the final object's layout.
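The effect of virtual inheritance on base class subobjects can be observed in plain C++. The sketch below does not use Elsa; the class names and helper functions are invented for illustration. It contrasts a non-virtual diamond, where the final object holds two Base subobjects and a lookup of "x" is ambiguous, with a virtual diamond, where both inheritance paths reach the same subobject.

```cpp
#include <cassert>

struct Base { int x; };

// Non-virtual diamond: Join1 contains two distinct Base subobjects.
struct Left1 : Base {};
struct Right1 : Base {};
struct Join1 : Left1, Right1 {};

// Virtual diamond: Join2 contains a single, shared Base subobject.
struct Left2 : virtual Base {};
struct Right2 : virtual Base {};
struct Join2 : Left2, Right2 {};

bool nonVirtualHasTwoBases() {
  Join1 j;
  // "j.x" would be rejected as ambiguous, so the path must be spelled out.
  Base *viaLeft = static_cast<Left1 *>(&j);
  Base *viaRight = static_cast<Right1 *>(&j);
  return viaLeft != viaRight;   // two subobjects, two addresses
}

bool virtualHasOneBase() {
  Join2 j;
  j.x = 7;                      // unambiguous: only one Base::x exists
  Base *viaLeft = static_cast<Left2 *>(&j);
  Base *viaRight = static_cast<Right2 *>(&j);
  return viaLeft == viaRight && viaRight->x == 7;
}

void checkSubobjects() {
  assert(nonVirtualHasTwoBases());
  assert(virtualHasOneBase());
}
```

In graph terms, the first hierarchy has two separate Base nodes and the second has one node reached by two edges, which is the kind of distinction a per-class subobject graph has to capture.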

CompoundType knows how to print out its DAG in Dot format. For example, in/std/3.4.5.cc yields 3.4.5.png.

EnumType represents an enum. It has a list of enumerator values, which are also added to the Env (cc_env.h) during type checking.

3. Constructed Types

Type is the root of the constructed type hierarchy. Like AtomicType, it has methods to find out which kind of type a particular object is (roll-my-own RTTI). It also has a variety of query methods (like isVoid()), and entry points for printing types.

CVAtomicType is always the leaf of a constructed type. It just wraps an AtomicType, but possibly with const or volatile qualifiers.

PointerType represents a pointer or reference type, possibly qualified with const or volatile.

FunctionType represents a function type. The parameters are represented as a list of Variable (variable.h) objects. Nonstatic member functions have the isMember flag set, in which case their first parameter is called "this" and has type "C cv * const", where "C" is the name of the class of which the function is a member, and "cv" is optional const/volatile flags. FunctionType also has a flag to indicate if it accepts a variable number of arguments, and a way (ExnSpec) to represent exception specifications.

Finally, function templates have a list of template parameters. As I think about it, it's kind of strange to imagine a function type being templatized, so maybe I should have put that list someplace else (Variable?).
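The "cv" part of the "this" parameter described above follows the qualifiers written after the member function's parameter list. This can be checked in standard C++ with decltype; the class C and its member names below are illustrative only. Note that decltype(this) reports only the pointee qualifiers, so the trailing "const" of the modeled parameter type does not show up here.

```cpp
#include <cassert>
#include <type_traits>

struct C {
  // No qualifier after the parameter list: "cv" is empty.
  bool plain() { return std::is_same<decltype(this), C *>::value; }

  // "const" after the parameter list: "cv" is const.
  bool readOnly() const {
    return std::is_same<decltype(this), const C *>::value;
  }

  // "volatile" after the parameter list: "cv" is volatile.
  bool hardware() volatile {
    return std::is_same<decltype(this), volatile C *>::value;
  }
};

void checkThisTypes() {
  C c;
  assert(c.plain());
  assert(c.readOnly());
  assert(c.hardware());
}
```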

ArrayType represents an array type. An array type with no size specified has size NO_SIZE.

PointerToMemberType represents a C++ pointer-to-member. It records the class whose member it points at, the type of the referred-to thing, and some optional const/volatile qualifiers. It is used for pointers to both member functions and member data.

Since member functions already can be distinguished by the isMember flag, it would have been possible to use PointerToMemberType only for data members, and I might even switch to doing so at some point.
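For reference, the two source-level forms that a pointer-to-member type has to cover look like this in standard C++. The struct Point and the variable names are invented for the example; each declaration names a class and a referred-to type, matching the components listed above.

```cpp
#include <cassert>

struct Point {
  int x;
  int y;
  int sum() const { return x + y; }
};

// Pointer to data member: class "Point", referred-to type "int".
int Point::*field = &Point::y;

// Pointer to member function: class "Point", referred-to type "int () const".
int (Point::*method)() const = &Point::sum;

// Both kinds need an object of the class before they can be dereferenced.
int readThroughPointers() {
  Point p = {3, 4};
  return p.*field + (p.*method)();   // p.y plus p.sum()
}

void checkPointersToMembers() {
  assert(readThroughPointers() == 11);
}
```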

4. Templates

The template design is still somewhat incomplete. I'd like to have a pass that can fully instantiate templates, and so some of this is looking forward to the existence of such a pass.

TypeVariable is used for template functions and classes to stand for a type which is unknown because it's a parameter to the template. It should point at its corresponding Variable, rather than just having a StringRef name...

TemplateParams is a list of template parameters. I pulled this out into its own class for reasons I now don't remember...

ClassTemplateInfo is intended to contain information about template instantiations. It's not used right now.

5. TypeFactory

When the type checker (cc_tcheck.cc) constructs types, it actually does so via the TypeFactory interface. This is to make it possible for someone to build annotations on top of my Types, instead of going in and mucking about in cc_type.h. It has several core functions that must be defined by a derived class, and a variety of other functions with default implementations when such an implementation is "obvious" in terms of the core functions.

The present form of TypeFactory is driven by the existence of one project that has such an annotation system. As new analyses arise that may need to customize the way types are built, I'll add new entry points to TypeFactory and modify cc_tcheck.cc to use them.

BasicTypeFactory is an implementation of TypeFactory that just builds the types defined in cc_type.h in the obvious way.
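The factory arrangement can be sketched as follows. This is a hypothetical miniature: SketchType, SketchFactory, PlainFactory, AnnotatedType, AnnotatingFactory and all of their member functions are invented for illustration, and none of the names or signatures are taken from cc_type.h. It shows a few core functions that a derived class must define, one default built from them, a plain factory in the spirit of BasicTypeFactory, and a second factory that attaches an annotation without the client code changing.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct SketchType {
  std::string text;
  explicit SketchType(const std::string &t) : text(t) {}
  virtual ~SketchType() {}
};

class SketchFactory {
public:
  virtual ~SketchFactory() {}
  // Core functions: every derived factory must define these.
  virtual SketchType *makeLeaf(const std::string &atomicName) = 0;
  virtual SketchType *makePointer(SketchType *pointee) = 0;
  // Default implementation, "obvious" in terms of the core functions.
  virtual SketchType *makePointerToPointer(SketchType *pointee) {
    return makePointer(makePointer(pointee));
  }
protected:
  // The factory owns every type it builds.
  SketchType *keep(SketchType *t) {
    owned.push_back(std::unique_ptr<SketchType>(t));
    return t;
  }
private:
  std::vector<std::unique_ptr<SketchType>> owned;
};

// Builds plain types in the obvious way.
class PlainFactory : public SketchFactory {
public:
  SketchType *makeLeaf(const std::string &atomicName) override {
    return keep(new SketchType(atomicName));
  }
  SketchType *makePointer(SketchType *pointee) override {
    return keep(new SketchType(pointee->text + " *"));
  }
};

// An analysis adds its own annotation by deriving from the type...
struct AnnotatedType : SketchType {
  int pointerDepth;
  AnnotatedType(const std::string &t, int d) : SketchType(t), pointerDepth(d) {}
};

// ...and overriding only the core functions.
class AnnotatingFactory : public SketchFactory {
public:
  SketchType *makeLeaf(const std::string &atomicName) override {
    return keep(new AnnotatedType(atomicName, 0));
  }
  SketchType *makePointer(SketchType *pointee) override {
    // Assumes pointee was built by this same factory.
    int depth = static_cast<AnnotatedType *>(pointee)->pointerDepth + 1;
    return keep(new AnnotatedType(pointee->text + " *", depth));
  }
};

// Client code, standing in for the type checker, sees only the interface.
SketchType *buildIntPtrPtr(SketchFactory &factory) {
  return factory.makePointerToPointer(factory.makeLeaf("int"));
}

void checkFactories() {
  PlainFactory plain;
  assert(buildIntPtrPtr(plain)->text == "int * *");
  AnnotatingFactory annotating;
  SketchType *t = buildIntPtrPtr(annotating);
  assert(static_cast<AnnotatedType *>(t)->pointerDepth == 2);
}
```

Because buildIntPtrPtr goes through the interface, swapping the factory changes what gets built without touching the code that requests the types.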
