lookup.txt This is my attempt to document what the lookup rules of C++ are (in a more digested, concise form than the standard itself, but with liberal references to the standard), and how Elsa implements them. Initially the document just covers a few special cases that I have investigated carefully, but ideally it will grow to cover all the cases of name lookup in C++. The main difference between the account here and that in the standard is that this document will attempt to provide a complete narrative from start to finish (i.e., temporally coherent) of how lookup happens for a given syntax/context, whereas the standard is organized by language feature and so often scatters relevant details in many places. One reference worth mentioning: Brian Malloy has done some investigation of how to implement lookup in C++, and his papers are at http://www.cs.clemson.edu/~malloy/projects/keystone/keystone.html. My analysis is independent, however. ---------------------------------------------------- Notions of syntactic context There are two notions of syntactic context for a lookup. Global syntactic context: The first, which I call "global syntactic context", specifies the sequence of scopes to look in for a nominal unqualified lookup. It is affected by the nesting of scope-containing constructs such as functions and namespaces. It is also affected by issues such as definitions of class and namespace members outside the syntax of their containing entity, and visibility of various scopes during template instantiation. Local syntactic context: The second kind of context, what I call "local syntactic context", is the immediately surrounding neighborhood of tokens, giving contexts like "function name in a function call", "type specifier in a declaration", etc. The nominal lookup rules are in many cases slightly adjusted depending on this local context. This rest of this file has two parts. The first part discusses global syntactic context, and the second part discusses local syntactic context. Then, in principle, to find out the procedure for lookup of a given occurrence of a name, one first consults the global syntactic context rules to determine a nominal scope search order, which is then modified by the rules for the local syntactic context. ==================================================== ============= Global syntactic context ============= ==================================================== ---------------------------------------------------- Scope chaining As with most languages, C++ lookup often provides for searching in a sequence of locations (scopes), with priority given to the earliest scope that contains the name being looked up (e.g., 3.4.1p1). I call this phenomenon "scope chaining". There are four kinds of scope chaining mechanisms: 1. syntactic nesting for the current scope (Env::scopes) 2. syntactic nesting for non-current scopes (Scope::parentScope) 3. namespace 'using' directives (Scope::usingEdges) 4. base classes (CompoundType::bases) Env is responsible for internally following chains 1 and 2; they are (only) used for unqualified lookup. Scope is responsible for following chains 3 and 4; Env delegates to Scope as necessary. Consequently, using Scope::lookup (as opposed to Env::lookup) disables use of chains 1 and 2. LF_IGNORE_USING disables chain 3. LF_INNER_ONLY disables all four chains. Currently, nothing disables chain 4 by itself. ------------------------------------------------------- Scope::lookup: Lookup within a particular Scope Lookup within a scope has input: - StringRef name : name of symbol to look up - Env env : for error reporting (if not LF_SUPPRESS_ERROR) - LookupFlags flags : lookup modifiers And has output: - LookupSet &set : set of entities referred to by 'name' First, Scope::variables or Scope::typeTags is queried, depending on LF_QUERY_TAGS. If this yields a hit, the resulting variable's overload set (if any) is expanded and placed in a temporary 'set'. If LF_INNER_ONLY, return what we have. Next, if this Scope is a namespace, and LF_IGNORE_USING is not specified, the 'using' directive edges are considered, as per 7.3.4 and 3.4.3.2, adding to 'set' (scope chain 3 above): (TODO: explain this process) If 'set' is not empty, return. Finally, if this Scope is a class, its base classes are searched according to 10.2 (scope chain 4 above): - For each of the base class subobject, we search for 'name' in the scope corresponding to the subobject's class (Scope::lookup with LF_INNER_ONLY). - If we find entities v1 and v2 with the given name: - If v1 hides v2 (v2's subobject is an ancestor of v1's subobject), or vice versa, ignore the hidden entity (10.2p6). - If v1 and v2 are the same entity (just in different subobjects), we can regard them as the same if the entity is static, a type, or an enumerator (10.2p5). ------------------------------------------------- Env::lookupPQ: Lookup of possibly-qualified names Input: - Possibly empty sequence of qualifiers; each qualifier is - StringRef name - Possibly empty sequence of template arguments - StringRef name (final name) - Possibly empty sequence of template arguments (final targs) - Lookup flags Output: - LookupSet &set If the name is unqualified, then the lookup is delegated to Env::unqualifiedLookup (with scope=NULL). See below. For qualified lookup (3.4.3), the first qualifier in the chain is looked up using unqualified lookup (in Env::lookupScope), and any template arguments applied, to obtain a Scope object. If the qualifier is "::", the global Scope is used. Subsequent lookups are done in the Scope computed from the previous qualifiers, forming a chain of lookups, until the final name is looked up in the final qualifier's Scope. Finally, unless LF_TEMPL_PRIMARY is set in 'flags', the template arguments are applied to the final name. For this to be possible, the lookup must have been unambiguous, so any caller of lookup that *can* accept a lookup set *must* specify LF_TEMPL_PRIMARY. If LF_TEMPL_PRIMARY *is* set, then the final template arguments (if any) are ignored. If the lookup is being done in the context of an uninstantiated template, and any of the qualifiers are dependent types (14.6.2.1), then either Env::dependentTypeVar or Env::dependentVar will be the final result of lookup (depending on whether we are expecting to find a type or a variable). It is the responsibility of the caller to decide what to do about lookups that turn out to be dependent. (Sometimes the dependent(Type)Var can be used directly, and sometimes it cannot.) ------------------------------------------------- Env::unqualifiedLookup: Lookup of unqualified names Input: - StringRef name - LookupFlags flags Output: - LookupSet &set (It turns out to be convenient to allow callers to pass a Scope, in which case this function simply delegates to Scope::lookup. The description here ignores this possibility.) This function is responsible for doing lookup in the "current" scope, i.e., the set of scopes visible where the name appears. The sequence of scopes to be searched is maintained in Env::scopes, and is updated as the recursive tcheck traversal enters/exits syntactic constructs that contain scopes (functions, blocks, namespaces, etc.). This function is responsible for interpreting the contents of Env::scope. At least at the moment, it contains scopes arising from both chains 1 and 2 above; where necessary, Scope::parentScope is traversed in the process of updating Env::scope. If LF_INNER_ONLY is specified, only the topmost element of Env::scope is used. This corresponds to the innermost enclosing scope. This is useful when a name is about to be added to the innermost scope, and we need to check if the name has already been declared. Otherwise, the scopes in Env::scope are searched in turn, stopping when the name is found. As a wrinkle, the current design allows for certain Scopes to be searched in an order different from their physical ordering in Env::scope, via the Scope "delegation" mechanism. This is a design wart that should probably be fixed. ------------------------------------------- How is class member use-before-declaration implemented? One well-known difficulty with C++ name lookup is 3.3.6p1b1, which says that the scope of a class member includes (among other things) function bodies that occur textually *before* the member declaration. This is problematic because it means that it is not possible to classify each identifier as type or variable name based only on the text that precedes it. The (somewhat novel) solution adopted in Elsa is to allow each identifier to be interpreted as *both* a type and a variable. The parser yields *all* possible resulting parses (represented compactly with nearly maximal sharing of identical subparses). Later, this representation is disambiguated by the type checker. The reason this makes things easier is that, having created an (ambiguous) AST, it is easy for the type checker to skip over method bodies during a first pass over the class member declarations, and then make a second pass once the class' scope is completed. The mechanism is implemented in TS_classSpec::tcheckIntoCompound. The alternative, using the traditional feedback loop from the symbol table to the lexer, would be to have the lexer save the method bodies (perhaps as a textual string) somewhere, and then re-lex them once the class symbol table is built. It is my engineering opinion that it is better to avoid complicating lexing and parsing, and instead do disambiguation during type checking, where the full lookup mechanism is already present. ==================================================== ============= Local syntactic context ============== ==================================================== ----------------------------------------------------------- E_funCall: Function name of a function call Note that context-specific name lookup only occurs if the 'func' is an E_variable or an E_fieldAcc. Both contain a PQName, here called 'name'. At an E_funCall, we have the following information: - func - receiver object expression (if is E_fieldAcc) - qualifiers (optional) - StringRef name (name->getName()) - template arguments (optional), 'targs' - function (expression) arguments, 'eargs' Lookup is split across E_funCall::inner1_tcheck and E_funCall::inner2_tcheck. This split is for performance reasons, and forces a certain partition of the work (see comments in Expression::tcheck for more details). inner1's main job is to map the 'name' to an initial set of Variable object pointers. The set includes all overloaded instances from all relevant namespaces (i.e., those found via 'using' directives). * Do an ordinary (as opposed to argument dependent) lookup of the name, using the receiver object type, the qualifiers, and the StringRef name. Lookup is as specified in 10.2 if E_fieldAcc, or 3.4 if E_variable. * If template arguments are present but the lookup did not find any template functions, issue an error. The result of the inner1 lookup is carried in the 'candidates' list. inner2's job is to pick up where inner1 left off, and use the remaining information in the E_funCall to pick exactly the right Variable for the function call site, doing template instantiation and overload resolution as necessary. It does these steps: * It regards the set computed by inner1 (and stored in 'candidates') as denoting an initial candidate set (possibly empty). * If no receiver object and no qualifiers are present, augment the candidate set by using argument dependent lookup (3.4.2). * Refine/filter the candidate set; for each candidate: - If template arguments are explicitly present, but the candidate is not a template, remove it. - If the candidate is a template, bind the 'targs' (if present) to template parameters (14.8.1), and then do template argument deduction (using the 'eargs') to determine the remaining template parameters (14.8.2, 14.8.2.1). If any incompatibility is found, or not all template arguments have been determined, remove the candidate. Otherwise, replace the candidate with its (fully concrete) instantiation Variable. * Use 'eargs' to do overload resolution (13.3) to select the best candidate from the refined candidate set. ----------------------------------------------------------- E_fieldAcc: Lookup of a field name after '.' or '->' Sections discussed here: - 3.4.5: Lookup rules for field names. - 5.2.4, 5.2.5: Meaning of expressions with '.' and '->'. Other relevant sections: - 13.3: Overloaded function name used in a function call. - 13.4: Overloaded function name used in other contexts. - 13.5.6: If '->' is overloaded. - 14.6.2.2: Issues relating to dependent lookup. I am going to ignore the possibility of '->' and concentrate only on the '.' option. (Perhaps I will ammend this later to discuss '->'.) The structure of an arbitrary E_fieldAcc can be dissected as follows: LHS ICC QUAL FIELD ----------- --- ------------- ------------------------------- object-expr . :: qualifiers :: identifier (FIELD case 1) ~ identifier (2) operator (op) (3) operator (type-id) (4) identifier < targs > (5) 'template' identifier < targs > (6) with components: LHS, "left-hand side": the object being accessed. ICC, "initial colon-colon": A leading "::". Optional. QUAL, "qualifiers": Qualifiers before FIELD. Optional. FIELD: The field being accessed. There are six possibilities. Note that the presence of ICC implies that either QUAL is also present, or that FIELD is case 1, 3 or 5. I will divide this space along four dimensions: - Whether the LHS is a class type or a scalar (non-class) type. - Whether ICC is present. - Whether QUAL is present. - Which of the six options for FIELD is used. The notation "X" means "don't care". For each case, this table says which paragraphs of section 3.4.5 of the standard are applicable: col: 1 2 3 4 5 LHS: X class class scalar scalar ICC: yes no no no no QUAL: X yes no yes no --- ----- ----- ------ ------ 5 4 1,2 E E identifier (FIELD case 1) 5 4 2,3 ** 2,3 ~ identifier (2) 5 4 2 E E operator (op) (3) 5 4 2,7 E E operator (type-id) (4) 5 4 1,2 E E identifier < targs > (5) 5 4 2* E E 'template' identifier < targs > (6) 5 4 2 E E ~ identifier < targs > (7) The code 'E' means it is a syntax error. The code '**' is a case that is not covered by the standard. Below, I give my best guess as to the intention. Note: The cell marked '2*' *is* covered by paragraph 2, because the leading 'template' is not considered part of the id-expression, so it does not prevent it from being an unqualified-id. Note: The rules of paragraph 1 are used to decide between cases 1 and 5 of FIELD, so I show paragraph 1 in both cells without qualifiers. The standard does not explicitly state how to distinguish between cases 1 and 5 if qualifiers are present; I assume the intent is to generalize paragraph 1 in the obvious way, respecting which paragraph specifies how to lookup the qualified name. Note: Paragraph 6 (type checking template arguments in the scope where the E_fieldAcc occurs) potentially applies to columns 1, 2 and 4, and rows 5, 6 and 7. Section 3.4.3 (qualified lookup) is also relevant. 3.4.3p4 applies to column 1, and 3.4.3p5 applies to columns 1 and 2. So, given all of the above, the procedure for resolving the RHS of an E_fieldAcc is: 1. Type-check the LHS. 2. If the LHS type is not a class, check that the FIELD is case 2, then: 2a. If QUAL is present (code '**'), divide the RHS as follows: true-qualifiers :: identifier1 :: ~ identifier2 Do qualified lookups of true-qualifiers::identifier1 and true-qualifiers::identifier2. Both lookups should yield the same type as the LHS (5.2.4). 2b. If QUAL is not present, lookup the identifier in the current scope, and check that it is the same type as the LHS (3.4.5p2,3). Stop. The remaining procedure is for class-typed LHSs. Call the class C. 3. If ICC is present (3.4.5p5), let N be the global scope. Go to step 5. 4. If ICC is not present but QUAL is present (3.4.5p4), call the first qualifier in the sequence Q. (Does this apply if Q is a template-id? I am going to assume so, but a strict reading would say no.) 4a. Perform unqualified lookup of Q. If found, this must be a class or namespace. 4b. Lookup Q in C. If found, this must be a class. 4c. If lookups 4a and 4b both succeed, they must name the same class N. If only one succeeds, call its result N. If neither succeeds, report error. 5. Given the computed initial scope N above, divide the RHS: N :: leading-qualifiers :: final-qualifier :: FIELD (3.4.3p4) Do qualified lookup of leading-qualifiers, starting in scope N. Call the result LQ. Lookup final-qualifier in LQ; call the result FQ. 5a. If FQ is a namespace, lookup FIELD in FQ. 5b. If FQ is a class, lookup FIELD in LQ. 6. If neither ICC nor QUAL are present (3.4.5p1,2,3): 6a. If FIELD is case 1, 3, 5 or 6 (3.4.5p2), lookup FIELD in C. 6b. If FIELD is case 2 (3.4.5p3): - Perform unqualified lookup of the identifier. - Lookup the identifier in C. The lookups must yield a class, the same class if they both succeed. 6c. If FIELD is case 4 (3.4.5p7): - Perform unqualified lookup of the type. - Lookup the type in C. *Both* must succeed, and name the same type. Actually, this applies even to qualified names (3.4.3.1p1). 7. Having looked up the FIELD, it must name a nonempty set of members of C, or a nonempty set of members of one of C's base classes. If the set contains more than one element, overload resolution will happen later, depending on the local syntactic context of this E_fieldAcc. EOF