Elsa Tutorial
This tutorial is intended to help get a new Elsa user learn about the Elsa and develop a basic familiarity with the core components. After doing this tutorial, you will know how to do the following tasks:
If you run into difficulties you cannot resolve, you can email me at
smcpeak cs berkeley edu @ . .though I might not respond right away.
At the end of this tutorial I will give pointers for more exploration.
If you have not already, download and unpack the Elsa distribution tarball, available from the Elkhound page. Inside the Elsa distribution you will find a directory structure like this:
elsa-XXXX.XX.XX | +- smbase utility library | +- ast AST generator | +- elkhound parser generator | +- elsa C++ parser/frontend/etc. | +- in C/C++ input files for regression testing | +- include Elsa's compiler-specific headers
At the top elsa-XXXX.XX.XX directory, just run configure:
elsa-XXXX.XX.XX$ ./configureThe above command just runs configure scripts in each of the four subdirectories. If you want to pass specific options to a configure script inside a specific directory, just go into the directory and run its particular script.
In particular, if you want to be able to use the tracing flags, which I recommend, then pass the -debug option to elsa/configure:
elsa-XXXX.XX.XX$ cd elsa elsa$ ./configure -debug elsa$ cd ..
At the top, run make:
elsa-XXXX.XX.XX$ makeThis just runs make in each of the four directories.
Hopefully, everything will build without problems. If not, it's possible you will have to change one or more of the source files to get them to compile. In that case, please send me the changes that you found necessary.
At the top, run make check:
elsa-XXXX.XX.XX$ make checkAgain, this just runs make check in each of the four directories. While this step is in principle optional, if something fails during the make check then it's likely there is a serious problem.
Among the executables produced by make is elsa/ccparse, a program that runs the C++ parser and optionally does a number of post-processing activities. While developing Elsa, ccparse is the program that I run most often.
ccparse takes the name of file to parse on the command line:
elsa-XXXX.XX.XX$ cd elsa elsa$ ./ccparse in/t0001.cc %%% progress: 0ms: done parsing (1 ms, 0_234564 cycles) ambiguous nodes: 0 %%% progress: 5ms: done type checking (4 ms) typechecking results: errors: 0 warnings: 0 %%% progress: 6ms: done elaborating (0 ms)The output includes "%%% progress" lines that report on stages of processing and timing information, the number of ambiguous nodes in the AST, and the number of warnings and errors emitted by the type checker. Since everything worked, the output isn't all that interesting.
We can ask ccparse to print out the source code it has just parsed in, by using the -tr prettyPrint command-line argument:
elsa$ ./ccparse -tr prettyPrint in/t0001.cc %%% progress: 0ms: done parsing (0 ms, 0_235561 cycles) ambiguous nodes: 0 %%% progress: 5ms: done type checking (3 ms) typechecking results: errors: 0 warnings: 0 %%% progress: 7ms: done elaborating (0 ms) %%% progress: 7ms: dsw pretty print... ---- START ---- // -*-c++-*- int x; ---- STOP ---- %%% progress: 9ms: dsw pretty print... doneBuried in amongst all that noise is the pretty-printed input file:
// -*-c++-*- int x;
Obviously, in/t0001.cc is a fairly trivial input file. You're free to experiment with other files in the in/ directory, or even with your own input files. Note: You must preprocess with the input files with cpp first! Elsa does not include a preprocessor.
ccparse can print the Abstract Syntax Tree (AST) produced by the parser, by passing -tr printAST:
elsa$ ./ccparse -tr printAST in/t0001.cc %%% progress: 0ms: done parsing (1 ms, 0_235436 cycles) tree = TranslationUnit: topForms: topForms[0] = TF_decl: loc = in/t0001.cc:3:1 decl = Declaration: dflags = spec = TS_simple: cv = loc = in/t0001.cc:3:1 id = int decllist: decllist[0] = Declarator: context = DC_UNKNOWN decl = D_name: loc = in/t0001.cc:3:5 name = PQ_name: loc = in/t0001.cc:3:5 name = x init is null ctorStatement is null dtorStatement is null ambiguous nodes: 0 %%% progress: 9ms: done type checking (3 ms) typechecking results: errors: 0 warnings: 0 %%% progress: 11ms: done elaborating (0 ms)From this printout we can see that the entire input is a TranslationUnit, whose first topForm is a Declaration. The declaration has a type specifier, TS_simple, denoting int. It also has a single Declarator, with a D_name containing x.
Besides the structural information, the printout includes source locations, such as "in/t0001.cc:3:1". This means line 3, column 1 of the file in/t0001.cc.
The AST above is unambiguous; the parser was able to completely determine the syntactic interpretation of the input file, without any help from the type checker. This isn't always possible with C++, and since the Elsa type checker runs only after the parser, the parser sometimes produces an ambiguous AST. A simple example is in in/t0008.cc:
elsa$ ./ccparse -tr printAST in/t0008.cc %%% progress: 0ms: done parsing (3 ms, 3_369729 cycles) tree = TranslationUnit: topForms: topForms[0] = TF_func: loc = in/t0008.cc:4:1 f = Function: receiver: NULL retVal: NULL dflags = retspec = TS_simple: cv = loc = in/t0008.cc:4:1 id = int nameAndParams = Declarator: context = DC_UNKNOWN decl = D_func: loc = in/t0008.cc:4:5 base = D_name: loc = in/t0008.cc:4:5 name = PQ_name: loc = in/t0008.cc:4:5 name = main params: cv = exnSpec is null init is null ctorStatement is null dtorStatement is null inits: body = S_compound: succ={ } loc = in/t0008.cc:5:1 stmts: stmts[0] = S_decl: succ={ } loc = in/t0008.cc:6:3 decl = Declaration: dflags = spec = TS_simple: cv = loc = in/t0008.cc:6:3 id = int decllist: decllist[0] = Declarator: context = DC_UNKNOWN decl = D_name: loc = in/t0008.cc:6:7 name = PQ_name: loc = in/t0008.cc:6:7 name = a init is null ctorStatement is null dtorStatement is null stmts[1] = S_decl: succ={ } loc = in/t0008.cc:7:3 decl = Declaration: dflags = spec = TS_simple: cv = loc = in/t0008.cc:7:3 id = int decllist: decllist[0] = Declarator: context = DC_UNKNOWN decl = D_name: loc = in/t0008.cc:7:7 name = PQ_name: loc = in/t0008.cc:7:7 name = b init is null ctorStatement is null dtorStatement is null stmts[2] = S_expr: succ={ } loc = in/t0008.cc:9:3 expr = FullExpression: --------- ambiguous Expression: 2 alternatives --------- tree = E_binary: e1 = E_grouping: expr = E_variable: var: NULL name = PQ_name: loc = in/t0008.cc:9:4 name = a op = & e2 = E_grouping: expr = E_variable: var: NULL name = PQ_name: loc = in/t0008.cc:9:10 name = b ---- or ---- tree = E_cast: ctype = ASTTypeId: spec = TS_name: cv = loc = in/t0008.cc:9:4 name = PQ_name: loc = in/t0008.cc:9:4 name = a typenameUsed = false decl = Declarator: context = DC_UNKNOWN decl = D_name: loc = in/t0008.cc:9:5 name is null init is null ctorStatement is null dtorStatement is null expr = E_addrOf: expr = E_grouping: expr = E_variable: var: NULL name = PQ_name: loc = in/t0008.cc:9:10 name = b --------- end of ambiguous Expression --------- handlers: dtorStatement is null implicitlyDefined = false ambiguous nodes: 1 %%% progress: 5ms: done type checking (4 ms) typechecking results: errors: 0 warnings: 0 %%% progress: 6ms: done elaborating (0 ms)
The ambiguity can be resolved (in favor of E_binary) by considering that the name x refers to a variable, not a type. The type checker does just this, and produces an unambiguous AST. The -tr printTypedAST command line option will print the AST as it exists after the type checker runs:
elsa$ ./ccparse -tr printTypedAST in/t0008.cc %%% progress: 0ms: done parsing (0 ms, 0_541155 cycles) ambiguous nodes: 1 %%% progress: 5ms: done type checking (5 ms) tree = TranslationUnit: topForms: topForms[0] = TF_func: loc = in/t0008.cc:4:1 f = Function: funcType: int ()() receiver: NULL retVal: NULL dflags = retspec = TS_simple: cv = loc = in/t0008.cc:4:1 id = int nameAndParams = Declarator: var:int main() context = DC_FUNCTION decl = D_func: loc = in/t0008.cc:4:5 base = D_name: loc = in/t0008.cc:4:5 name = PQ_name: loc = in/t0008.cc:4:5 name = main params: cv = exnSpec is null init is null ctorStatement is null dtorStatement is null inits: body = S_compound: succ={ 6:3 } loc = in/t0008.cc:5:1 stmts: stmts[0] = S_decl: succ={ 7:3 } loc = in/t0008.cc:6:3 decl = Declaration: dflags = spec = TS_simple: cv = loc = in/t0008.cc:6:3 id = int decllist: decllist[0] = Declarator: var: int a context = DC_S_DECL decl = D_name: loc = in/t0008.cc:6:7 name = PQ_name: loc = in/t0008.cc:6:7 name = a init is null ctorStatement is null dtorStatement is null stmts[1] = S_decl: succ={ 9:3 } loc = in/t0008.cc:7:3 decl = Declaration: dflags = spec = TS_simple: cv = loc = in/t0008.cc:7:3 id = int decllist: decllist[0] = Declarator: var: int b context = DC_S_DECL decl = D_name: loc = in/t0008.cc:7:7 name = PQ_name: loc = in/t0008.cc:7:7 name = b init is null ctorStatement is null dtorStatement is null stmts[2] = S_expr: succ={ } loc = in/t0008.cc:9:3 expr = FullExpression: expr = E_binary: type: int e1 = E_grouping: type: int & expr = E_variable: type: int & var: int a, at in/t0008.cc:6:7 (0x08271940) name = PQ_name: loc = in/t0008.cc:9:4 name = a op = & e2 = E_grouping: type: int & expr = E_variable: type: int & var: int b, at in/t0008.cc:7:7 (0x082719F8) name = PQ_name: loc = in/t0008.cc:9:10 name = b handlers: dtorStatement is null implicitlyDefined = false typechecking results: errors: 0 warnings: 0 %%% progress: 6ms: done elaborating (0 ms)
If you look carefully at the output, you will see that in addition to being unambiguous, the post-tcheck AST also has been annotated with information about the types of expressions, and the variables to which names refer. This information is very useful to analyses that come after the type checker.
If you want, take some time to experiment with ccparse and the trees it can print. Find some nasty C++ and see what Elsa thinks of it! Try to crash Elsa! When you've had your fill of segfaults (send me bug reports please?), you're ready to go on to the next part of the tutorial.