Elsa Tutorial

1. Introduction

This tutorial is intended to help get a new Elsa user learn about the Elsa and develop a basic familiarity with the core components. After doing this tutorial, you will know how to do the following tasks:

Build Elsa
Parse and pretty-print a C++ file
Write a simple tree traversal analysis with ASTVisitor.
Write a simple language extension, including extensions to the lexer, parser, AST, type checker, and pretty printer.

If you run into difficulties you cannot resolve, you can email me at

  smcpeak cs berkeley edu
         @  .        .

though I might not respond right away.

At the end of this tutorial I will give pointers for more exploration.

2. Building Elsa

If you have not already, download and unpack the Elsa distribution tarball, available from the Elkhound page. Inside the Elsa distribution you will find a directory structure like this:

  elsa-XXXX.XX.XX
    |
    +- smbase               utility library
    |
    +- ast                  AST generator
    |
    +- elkhound             parser generator
    |
    +- elsa                 C++ parser/frontend/etc.
       |
       +- in                  C/C++ input files for regression testing
       |
       +- include             Elsa's compiler-specific headers

2.1 Configure

At the top elsa-XXXX.XX.XX directory, just run configure:

  elsa-XXXX.XX.XX$ ./configure

The above command just runs configure scripts in each of the four subdirectories. If you want to pass specific options to a configure script inside a specific directory, just go into the directory and run its particular script.

In particular, if you want to be able to use the tracing flags, which I recommend, then pass the -debug option to elsa/configure:

  elsa-XXXX.XX.XX$ cd elsa
  elsa$ ./configure -debug
  elsa$ cd ..

2.2 Make

At the top, run make:

  elsa-XXXX.XX.XX$ make

This just runs make in each of the four directories.

Hopefully, everything will build without problems. If not, it's possible you will have to change one or more of the source files to get them to compile. In that case, please send me the changes that you found necessary.

2.3 Make check

At the top, run make check:

  elsa-XXXX.XX.XX$ make check

Again, this just runs make check in each of the four directories. While this step is in principle optional, if something fails during the make check then it's likely there is a serious problem.

3. Run ccparse

Among the executables produced by make is elsa/ccparse, a program that runs the C++ parser and optionally does a number of post-processing activities. While developing Elsa, ccparse is the program that I run most often.

3.1 Parsing alone

ccparse takes the name of file to parse on the command line:

  elsa-XXXX.XX.XX$ cd elsa
  elsa$ ./ccparse in/t0001.cc
  %%% progress: 0ms: done parsing (1 ms, 0_234564 cycles)
  ambiguous nodes: 0
  %%% progress: 5ms: done type checking (4 ms)
  typechecking results:
    errors:   0
    warnings: 0
  %%% progress: 6ms: done elaborating (0 ms)

The output includes "%%% progress" lines that report on stages of processing and timing information, the number of ambiguous nodes in the AST, and the number of warnings and errors emitted by the type checker. Since everything worked, the output isn't all that interesting.

3.2 Pretty printing

We can ask ccparse to print out the source code it has just parsed in, by using the -tr prettyPrint command-line argument:

  elsa$ ./ccparse -tr prettyPrint in/t0001.cc
  %%% progress: 0ms: done parsing (0 ms, 0_235561 cycles)
  ambiguous nodes: 0
  %%% progress: 5ms: done type checking (3 ms)
  typechecking results:
    errors:   0
    warnings: 0
  %%% progress: 7ms: done elaborating (0 ms)
  %%% progress: 7ms: dsw pretty print...
  ---- START ----
  // -*-c++-*-
  int x;
  ---- STOP ----
  %%% progress: 9ms: dsw pretty print... done

Buried in amongst all that noise is the pretty-printed input file:

  // -*-c++-*-
  int x;

Obviously, in/t0001.cc is a fairly trivial input file. You're free to experiment with other files in the in/ directory, or even with your own input files. Note: You must preprocess with the input files with cpp first! Elsa does not include a preprocessor.

3.3 Printing the Abstract Syntax Tree

3.3.1 `in/t0001.cc`

ccparse can print the Abstract Syntax Tree (AST) produced by the parser, by passing -tr printAST:

  elsa$ ./ccparse -tr printAST in/t0001.cc
  %%% progress: 0ms: done parsing (1 ms, 0_235436 cycles)
  tree = TranslationUnit:
    topForms:
      topForms[0] = TF_decl:
        loc = in/t0001.cc:3:1
        decl = Declaration:
          dflags =
          spec = TS_simple:
            cv =
            loc = in/t0001.cc:3:1
            id = int
          decllist:
            decllist[0] = Declarator:
              context = DC_UNKNOWN
              decl = D_name:
                loc = in/t0001.cc:3:5
                name = PQ_name:
                  loc = in/t0001.cc:3:5
                  name = x
              init is null
              ctorStatement is null
              dtorStatement is null
  ambiguous nodes: 0
  %%% progress: 9ms: done type checking (3 ms)
  typechecking results:
    errors:   0
    warnings: 0
  %%% progress: 11ms: done elaborating (0 ms)

From this printout we can see that the entire input is a TranslationUnit, whose first topForm is a Declaration. The declaration has a type specifier, TS_simple, denoting int. It also has a single Declarator, with a D_name containing x.

Besides the structural information, the printout includes source locations, such as "in/t0001.cc:3:1". This means line 3, column 1 of the file in/t0001.cc.

3.3.1 `in/t0008.cc` (ambiguous)

The AST above is unambiguous; the parser was able to completely determine the syntactic interpretation of the input file, without any help from the type checker. This isn't always possible with C++, and since the Elsa type checker runs only after the parser, the parser sometimes produces an ambiguous AST. A simple example is in in/t0008.cc:

  elsa$ ./ccparse -tr printAST in/t0008.cc
  %%% progress: 0ms: done parsing (3 ms, 3_369729 cycles)
  tree = TranslationUnit:
    topForms:
      topForms[0] = TF_func:
        loc = in/t0008.cc:4:1
        f = Function:
          receiver: NULL
          retVal: NULL
          dflags = 
          retspec = TS_simple:
            cv = 
            loc = in/t0008.cc:4:1
            id = int
          nameAndParams = Declarator:
            context = DC_UNKNOWN
            decl = D_func:
              loc = in/t0008.cc:4:5
              base = D_name:
                loc = in/t0008.cc:4:5
                name = PQ_name:
                  loc = in/t0008.cc:4:5
                  name = main
              params:
              cv = 
              exnSpec is null
            init is null
            ctorStatement is null
            dtorStatement is null
          inits:
          body = S_compound:
            succ={ }
            loc = in/t0008.cc:5:1
            stmts:
              stmts[0] = S_decl:
                succ={ }
                loc = in/t0008.cc:6:3
                decl = Declaration:
                  dflags = 
                  spec = TS_simple:
                    cv = 
                    loc = in/t0008.cc:6:3
                    id = int
                  decllist:
                    decllist[0] = Declarator:
                      context = DC_UNKNOWN
                      decl = D_name:
                        loc = in/t0008.cc:6:7
                        name = PQ_name:
                          loc = in/t0008.cc:6:7
                          name = a
                      init is null
                      ctorStatement is null
                      dtorStatement is null
              stmts[1] = S_decl:
                succ={ }
                loc = in/t0008.cc:7:3
                decl = Declaration:
                  dflags = 
                  spec = TS_simple:
                    cv = 
                    loc = in/t0008.cc:7:3
                    id = int
                  decllist:
                    decllist[0] = Declarator:
                      context = DC_UNKNOWN
                      decl = D_name:
                        loc = in/t0008.cc:7:7
                        name = PQ_name:
                          loc = in/t0008.cc:7:7
                          name = b
                      init is null
                      ctorStatement is null
                      dtorStatement is null
              stmts[2] = S_expr:
                succ={ }
                loc = in/t0008.cc:9:3
                expr = FullExpression:
                  --------- ambiguous Expression: 2 alternatives ---------
                    tree = E_binary:
                      e1 = E_grouping:
                        expr = E_variable:
                          var: NULL
                          name = PQ_name:
                            loc = in/t0008.cc:9:4
                            name = a
                      op = &
                      e2 = E_grouping:
                        expr = E_variable:
                          var: NULL
                          name = PQ_name:
                            loc = in/t0008.cc:9:10
                            name = b
                  ---- or ----
                    tree = E_cast:
                      ctype = ASTTypeId:
                        spec = TS_name:
                          cv = 
                          loc = in/t0008.cc:9:4
                          name = PQ_name:
                            loc = in/t0008.cc:9:4
                            name = a
                          typenameUsed = false
                        decl = Declarator:
                          context = DC_UNKNOWN
                          decl = D_name:
                            loc = in/t0008.cc:9:5
                            name is null
                          init is null
                          ctorStatement is null
                          dtorStatement is null
                      expr = E_addrOf:
                        expr = E_grouping:
                          expr = E_variable:
                            var: NULL
                            name = PQ_name:
                              loc = in/t0008.cc:9:10
                              name = b
                  --------- end of ambiguous Expression ---------
          handlers:
          dtorStatement is null
          implicitlyDefined = false
  ambiguous nodes: 1
  %%% progress: 5ms: done type checking (4 ms)
  typechecking results:
    errors:   0
    warnings: 0
  %%% progress: 6ms: done elaborating (0 ms)

3.3.1 `in/t0008.cc` (unambiguous)

The ambiguity can be resolved (in favor of E_binary) by considering that the name x refers to a variable, not a type. The type checker does just this, and produces an unambiguous AST. The -tr printTypedAST command line option will print the AST as it exists after the type checker runs:

  elsa$ ./ccparse -tr printTypedAST in/t0008.cc
  %%% progress: 0ms: done parsing (0 ms, 0_541155 cycles)
  ambiguous nodes: 1
  %%% progress: 5ms: done type checking (5 ms)
  tree = TranslationUnit:
    topForms:
      topForms[0] = TF_func:
        loc = in/t0008.cc:4:1
        f = Function:
          funcType: int ()()
          receiver: NULL
          retVal: NULL
          dflags = 
          retspec = TS_simple:
            cv = 
            loc = in/t0008.cc:4:1
            id = int
          nameAndParams = Declarator:
            var:   int main()
            context = DC_FUNCTION
            decl = D_func:
              loc = in/t0008.cc:4:5
              base = D_name:
                loc = in/t0008.cc:4:5
                name = PQ_name:
                  loc = in/t0008.cc:4:5
                  name = main
              params:
              cv = 
              exnSpec is null
            init is null
            ctorStatement is null
            dtorStatement is null
          inits:
          body = S_compound:
            succ={ 6:3 }
            loc = in/t0008.cc:5:1
            stmts:
              stmts[0] = S_decl:
                succ={ 7:3 }
                loc = in/t0008.cc:6:3
                decl = Declaration:
                  dflags = 
                  spec = TS_simple:
                    cv = 
                    loc = in/t0008.cc:6:3
                    id = int
                  decllist:
                    decllist[0] = Declarator:
                      var:  int a
                      context = DC_S_DECL
                      decl = D_name:
                        loc = in/t0008.cc:6:7
                        name = PQ_name:
                          loc = in/t0008.cc:6:7
                          name = a
                      init is null
                      ctorStatement is null
                      dtorStatement is null
              stmts[1] = S_decl:
                succ={ 9:3 }
                loc = in/t0008.cc:7:3
                decl = Declaration:
                  dflags = 
                  spec = TS_simple:
                    cv = 
                    loc = in/t0008.cc:7:3
                    id = int
                  decllist:
                    decllist[0] = Declarator:
                      var:  int b
                      context = DC_S_DECL
                      decl = D_name:
                        loc = in/t0008.cc:7:7
                        name = PQ_name:
                          loc = in/t0008.cc:7:7
                          name = b
                      init is null
                      ctorStatement is null
                      dtorStatement is null
              stmts[2] = S_expr:
                succ={ }
                loc = in/t0008.cc:9:3
                expr = FullExpression:
                  expr = E_binary:
                    type: int
                    e1 = E_grouping:
                      type: int &
                      expr = E_variable:
                        type: int &
                        var: int a, at in/t0008.cc:6:7 (0x08271940)
                        name = PQ_name:
                          loc = in/t0008.cc:9:4
                          name = a
                    op = &
                    e2 = E_grouping:
                      type: int &
                      expr = E_variable:
                        type: int &
                        var: int b, at in/t0008.cc:7:7 (0x082719F8)
                        name = PQ_name:
                          loc = in/t0008.cc:9:10
                          name = b
          handlers:
          dtorStatement is null
          implicitlyDefined = false
  typechecking results:
    errors:   0
    warnings: 0
  %%% progress: 6ms: done elaborating (0 ms)

If you look carefully at the output, you will see that in addition to being unambiguous, the post-tcheck AST also has been annotated with information about the types of expressions, and the variables to which names refer. This information is very useful to analyses that come after the type checker.

If you want, take some time to experiment with ccparse and the trees it can print. Find some nasty C++ and see what Elsa thinks of it! Try to crash Elsa! When you've had your fill of segfaults (send me bug reports please?), you're ready to go on to the next part of the tutorial.

Go to Part 2: Modifying Elsa