The Lexical Analyzer Class and its Memento

The lexical analyzer class and some of its member functions as well as its memento can be adapted by a set of dedicated top-level sections. Some of these sections receive implicit arguments such as a reference to the lexer class. Implicit arguments are listed directly underneath the section name.

The Lexical Analyzer Class

body { (user code) }

This section pastes extra content into the body of the lexer’s class definition. It can be used to add new members or to declare friendship relationships with other classes.

For example:

body {
private:
   int         my_counter;
   FILE*       my_database;
   friend void some_function(MyLexer&);
}

defines the additional variables my_counter and my_database and a friend function inside the lexer class’ body. The content of this section is by default public. Private members are accessible to the pattern actions but not to mode transition events.

constructor { (user code) }
``LEXER& self``
returns ``bool``

Extensions to the lexer’s constructor, which are executed after all default construction is done. This is the place to initialize the additional members mentioned in the body section. The following example adds the initialization of my_counter and my_database.

constructor {
    self.my_counter = 4711;
    self.my_database = fopen("lexer.db", "r");
    return true;
}

The constructor must return a bool value indicating the success (true) or failure (false) of the construction.
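
The example above reports success even if the file could not be opened. A minimal sketch of a variant that forwards the result of fopen as the return value, assuming the members my_counter and my_database from the body example:

```
constructor {
    self.my_counter  = 4711;
    self.my_database = fopen("lexer.db", "r");
    // Report failure if the database could not be opened.
    return self.my_database != NULL;
}
```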

destructor { (user code) }
``LEXER& self``

Extensions to the lexer’s destructor, which are executed after all default destruction is done. This is the place to free or de-initialize additional resources. It is also good practice to mark the absence of resources, which safeguards against accidental double-destruction. The following example frees the resource my_database if it has been successfully allocated before.

destructor {
    if( NULL != self.my_database ) {  // Only close, if the handle != NULL
        fclose(self.my_database);
        self.my_database = NULL;      // Mark the handle as closed.
    }
}

Unlike the constructor extension, this section does not return a value.

reset { (user code) }
``LEXER& self``
returns ``bool``

Extensions to the lexer’s reset. The code is pasted after every default member, except for the buffer and its input chain, has been destructed and before all default members are re-constructed. The customized constructor and destructor code is not executed in this process. The reset procedure is shown in Fig. 10. Resetting the byte loaders and lexatom loaders may fail; the initialization of the buffer and the default destruction may not. After that, the content of the reset section is executed. If this or the subsequent default construction fails, the reset function as a whole returns failure.

../_images/reset-flow-chart.svg

Fig. 10 Flow chart of the reset operation.

If the reset fails, this section must return false. In this case, an internal error is set and the reset operation aborts. Returning false can also serve as an assertion when the lexer has been extended but a reset is not supposed to happen.

Setting the internal error ensures that the client application can abort upon the next check of the lexer’s error_code.
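
Since the customized constructor and destructor code is not executed during a reset, user-defined members have to be brought back to their initial state here. A minimal sketch, assuming the members my_counter and my_database from the body example and assuming that the database file is to be re-read from its beginning:

```
reset {
    self.my_counter = 4711;
    if( NULL == self.my_database ) {
        return false;             // Nothing to rewind: report failure.
    }
    rewind(self.my_database);     // Start reading the database anew.
    return true;
}
```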

print { (user code) }
``LEXER& self``

Content of this section is added to the automatically generated print function for the lexer. Output goes to QUEX_SETTING_DEBUG_OUTPUT_CHANNEL, which defaults to the standard output. This function is only relevant for purposes where insight into the lexer is required. The lexer is referred to by self.
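
A minimal sketch of a print extension for the members from the body example. It rests on the assumption that QUEX_SETTING_DEBUG_OUTPUT_CHANNEL expands to something usable as a FILE* (such as stdout), which the text above does not confirm:

```
print {
    // Assumption: the output channel macro can be passed to fprintf().
    fprintf(QUEX_SETTING_DEBUG_OUTPUT_CHANNEL,
            "my_counter:  %i\n", self.my_counter);
    fprintf(QUEX_SETTING_DEBUG_OUTPUT_CHANNEL,
            "my_database: %s\n", self.my_database != NULL ? "open" : "closed");
}
```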

The Memento

Some tokens in the input stream might trigger an inclusion of another stream before the lexical analysis continues at the current position. For example, the C-Preprocessor command #include redirects the parser to consider an included file before parsing what follows it. Inclusions may be nested, i.e. an included file may also include further files. This process is controlled internally by a lexer’s include stack. The include stack receives a memento of the lexer’s state upon inclusion and revives it once the analysis of the included file has terminated. The storing and restoring of a lexer’s state follows the ‘memento pattern’. That is, the important part of a lexer’s state is condensed in a memento when it is stored on the stack. Customizing the memento handling is only necessary if a lexer contains user extensions.

memento { (user code) }

Pastes code into the body of the memento class definition. It needs to contain a member for every member added in the body section that must be stored and restored upon inclusion. By default, the content is pasted into the public section.
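
A minimal sketch for the body example from above, assuming that only my_counter is specific to each included input, while my_database is shared by all inclusion levels and therefore needs no place in the memento:

```
memento {
    int  my_counter;   // Counter value of the including input.
}
```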

memento_pack { (user code) }
``LEXER& self``:      Reference to the lexical analyzer object.
``MEMENTO memento``:   Pointer to the memento object.
``STRING InputName``: Name of the new data source to be included.
returns ``bool``

This section contains code to be executed when the state of a lexical analyzer is stored in a memento upon inclusion. The code is executed after the default inclusion handling is performed, right before the memento is pushed on the stack.

The InputName may be a file name or any artificial identifier passed to one of the include-push functions (sec:include-stack).

The section returns true if the constructed memento is functional and false if not. Returning false causes the memento to be deleted immediately. Then, nothing is pushed on the stack and the inclusion is aborted.

../_images/include-push-flow-chart.svg

Fig. 11 Flow chart of the include-push operation.

The include-push operation is shown in Fig. 11. After the allocation of a new byte loader and a new lexatom loader, a new memento is created. It is then filled with the lexer’s inclusion-relevant attributes. Then, the memento_pack section is executed. Finally, the buffer is set up with all the new objects so that it can start the analysis of the included file. In contrast to the reset operation, here every step may fail and abort the inclusion.
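
A minimal sketch of a memento_pack section, assuming the lexer member my_counter from the body example and a memento member of the same name. Restarting the counter at zero for the included input is an illustrative choice, not something the generator requires:

```
memento_pack {
    // Save the counter of the including input ...
    memento->my_counter = self.my_counter;
    // ... and let the included input start from scratch.
    self.my_counter = 0;
    return true;
}
```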

memento_unpack { (user code) }
``memento``: Pointer to the memento object.
``self``: Reference to the lexical analyzer object.

Code from this section is executed when the state of a lexical analyzer is restored from a memento. The code is executed after the default return from inclusion handling is performed, right before the deletion of the memento.
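
A minimal sketch of the counterpart to a memento_pack section, again assuming a member my_counter in both the lexer and the memento:

```
memento_unpack {
    // Restore the counter of the including input.
    self.my_counter = memento->my_counter;
}
```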

../_images/include-pop-flow-chart.svg

Fig. 12 Flow chart of the include-pop operation.

Fig. 12 shows the include-pop operation. It reverts what happens during the push. The only step that may fail is the check whether there is actually a memento on the stack. An empty memento stack indicates that the highest level of inclusion is active. Since include-pop is called upon the detection of an end-of-stream, a failing include-pop indicates the termination of lexical analysis.
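
The interplay of push, pop, and the two user hooks can be illustrated with a small stand-alone model. This is a sketch only: ToyLexer, ToyMemento, and all their members are hypothetical names and do not reflect the generated lexer’s actual interface. The model saves the state in a memento on push, lets a pack hook veto the inclusion, and reports failure on pop when the stack is empty.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the memento class.
struct ToyMemento {
    std::string input_name;  // saved by the 'default' handling
    std::size_t position;    // saved by the 'default' handling
    int         my_counter;  // user extension, saved by the pack hook
};

// Hypothetical stand-in for the lexer with its include stack.
class ToyLexer {
public:
    std::string input_name = "main.txt";
    std::size_t position   = 0;
    int         my_counter = 0;

    // Stores the current state and switches to the included input.
    // If the pack hook fails, the memento is dropped and nothing is pushed.
    bool include_push(const std::string& new_input) {
        std::unique_ptr<ToyMemento> memento(
            new ToyMemento{input_name, position, 0});
        if (!memento_pack(*memento, new_input)) return false;
        stack_.push_back(std::move(memento));
        input_name = new_input;
        position   = 0;
        return true;
    }

    // Restores the most recently stored state.
    // Fails only if there is no memento on the stack.
    bool include_pop() {
        if (stack_.empty()) return false;
        std::unique_ptr<ToyMemento> memento = std::move(stack_.back());
        stack_.pop_back();
        input_name = memento->input_name;
        position   = memento->position;
        memento_unpack(*memento);
        return true;  // memento is deleted when it goes out of scope
    }

    std::size_t include_depth() const { return stack_.size(); }

private:
    // Plays the role of the memento_pack section.
    bool memento_pack(ToyMemento& memento, const std::string& new_input) {
        if (new_input.empty()) return false;  // refuse a nameless input
        memento.my_counter = my_counter;
        my_counter = 0;
        return true;
    }

    // Plays the role of the memento_unpack section.
    void memento_unpack(const ToyMemento& memento) {
        my_counter = memento.my_counter;
    }

    std::vector<std::unique_ptr<ToyMemento>> stack_;
};
```

In this model, a caller would invoke include_pop() whenever an end-of-stream is detected and treat a false return value as the end of the analysis.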

The Main Header File

To ensure that all variable, class, and function names used in the lexer’s customizations are declared before they are used, code can be pasted at the top of the generated lexer file. Code can also be pasted at the bottom of the generated lexer file. This makes it possible to include definitions in the header which depend on the definition of the lexer and/or memento class. Pasting such additional code is accomplished by the following two sections.

header { (user code) }

Pastes code at the top of the generated main header file. This is the place where additional includes or #define statements are placed.

footer { (user code) }

Pastes code at the bottom of the generated main header file.
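
A minimal sketch of how the two sections might complement the body example from above. It assumes that the generated lexer class is named MyLexer and that the footer content ends up in a scope where that class and its friend declaration are visible; MY_COUNTER_START is a hypothetical macro introduced for illustration:

```
header {
    #include <cstdio>              // FILE, fopen(), fclose()
    #define MY_COUNTER_START 4711  // Hypothetical start value.
}

footer {
    // The lexer class is completely defined at this point, so the
    // friend function declared in the body section can be defined here.
    inline void some_function(MyLexer& lexer) {
        lexer.my_counter = MY_COUNTER_START;
    }
}
```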