Sections

The definition of a lexical analyzer to be generated by quex is spread over several sections. The only mandatory section is the mode section: no pattern-action pairs can be defined outside a mode, and no lexical analyzer can work without pattern-action pairs. The following list provides an overview of the sections that can be defined in a source file to be treated by quex:

mode

A mode section starts with the keyword mode and has the following syntax

mode mode-name :
     base-mode-1 base-mode-2 ...
     <option-1> <option-2> ...
{
     pattern-1  action-1
     incidence-1    incidence-handler-1
     incidence-2    incidence-handler-2
     pattern-2  action-2
     pattern-3  action-3
}

After the mode keyword an identifier must name the mode to be specified. A ‘:’ is followed by a whitespace-separated list of names of base modes (see section <<>>). Then options can be specified as a list of HTML-like tags, i.e. enclosed in angle brackets < ... > (see section <<>>). An opening curly bracket { opens the region for the definition of pattern-action pairs and incidence handlers (see section <<>>). Finally, a closing } terminates the definition of the mode.
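For illustration, a small sketch of a complete mode definition; the mode names NUMBERS and BASE and the token ids are hypothetical, and QUEX_TKN_ is assumed as the token prefix:

mode NUMBERS : BASE
     <skip: [ \t\n]>
{
    on_entry     { /* incidence handler: executed whenever the mode is entered */ }
    [0-9]+       => QUEX_TKN_NUMBER(Lexeme);
    "."          => QUEX_TKN_DOT;
}

Here [0-9]+ and "." are patterns with brief token-sending actions, and on_entry is an incidence followed by its handler.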

define

This section has the syntax

define {
    ...
    PATTERN_NAME    pattern-definition
    ...
}

The define keyword is followed by an opening curly bracket {. Then patterns can be defined as pairs of a pattern name and a pattern definition. Each pattern name is an identifier. Note that the pattern names do not enter any namespace of the generated source code; they are only known inside the mode definitions. The pattern-definition can be any formal description of a lexeme structure using quex’s formal language [1] (see section <<>>).
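As a sketch, named patterns may be defined once and then referenced inside mode definitions via curly brackets; the names DIGIT, ID_START, ID_BODY and the token ids below are hypothetical:

define {
    DIGIT      [0-9]
    ID_START   [a-zA-Z_]
    ID_BODY    [a-zA-Z0-9_]
}

mode EXAMPLE {
    {ID_START}{ID_BODY}*   => QUEX_TKN_IDENTIFIER(Lexeme);
    {DIGIT}+               => QUEX_TKN_NUMBER(Lexeme);
}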

token

In this section token identifiers can be specified. The definition of token identifiers is optional. The fact that quex warns about undefined token-ids helps to avoid the dubious effects of typos, where the analyzer sends token ids that no one catches.

The syntax of this section is

token {
    ...
    TOKEN_NAME;
    ...
}

The token identifiers need to be separated by semi-colons.

Note

The token identifiers in this section are prefix-less. The token prefix, e.g. defined by the command line option --token-id-prefix, is automatically pasted in front of each identifier.
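A sketch of such a section, with hypothetical token names:

token {
    IDENTIFIER;
    NUMBER;
    DOT;
}

Assuming the token prefix QUEX_TKN_, the analyzer would then send, for example, QUEX_TKN_IDENTIFIER.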

repeated_token

repeated_token {
    ...
    TOKEN_NAME;
    ...
}

Inside this section the token names are listed that may be sent via implicit repetition using self_send_n(...). That is, a repetition number is stored inside the token, and the receive() function keeps returning the same token identifier until the repetition number reaches zero. Only tokens that appear inside the repeated_token section may be subject to this mechanism.
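A sketch, assuming a hypothetical token name DEDENT and the token prefix QUEX_TKN_:

repeated_token {
    DEDENT;
}

An action inside a mode might then send the token N times in one shot via

    self_send_n(N, QUEX_TKN_DEDENT);

so that receive() delivers QUEX_TKN_DEDENT N times before advancing.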

In addition to the sections defining the behavior of the lexical analyzer, there are sections which allow one to paste code directly into the definition of the engine to be generated. They all follow the pattern:

section-name {
    ...
    section content
    ...
}

Whatever is contained between the two curly brackets is pasted into the corresponding location for the given section-name. The available sections are the following:

header

The content of this section is pasted into the header of the generated files. Here, additional include files or constants may be specified.
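A sketch of a header section; the include and the constant are hypothetical:

header {
#include <cstdlib>

    /* constant visible to all code fragments of the analyzer */
    static const int  MY_MAX_DEPTH = 64;
}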

body

Extensions to the lexical analyzer class definition. This is useful for adding new class members to the analyzer or declaring friendship relationships to other classes. For example:

body {
        int         my_counter;
        friend void some_function(MyLexer&);
}

defines an additional variable my_counter and a friend function inside the lexer class’ body.

init

Extensions to the lexical analyzer constructor. This is the place to initialize the additional members mentioned in the body section. Note that, as in every code fragment, the analyzer itself is referred to via the self variable. For example

init {
        self.my_counter = 4711;
}

initializes the self-declared member my_counter of the analyzer to 4711.

reset

Section that defines customized behavior upon reset. This fragment is executed after the remaining parts of the lexical analyzer have been reset. The analyzer is referred to by self.
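For example, a member added in the body section might be restored to its initial value upon reset (my_counter as in the examples above):

reset {
        self.my_counter = 4711;
}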

token_type

In this section a customized token class can be defined; see Customized Token Classes.

Quex supports the inclusion of other files or streams during analysis. This is done by means of an include stack handler (see Include Stack). It writes the relevant state information into a so-called memento [2] when it dives into a file, and restores its state from the memento when it comes back. The following sections allow one to make additions to the memento scheme of the include handler:

memento

Extensions to the memento class that saves file local data before a sub-file (included file) is handled.

memento_pack

Code to be executed when the state of a lexical analyzer is stored in a memento.

Implicit variables:

memento: Pointer to the memento object.

self: Reference to the lexical analyzer object.

InputName: Name of the new data source to be included. This may be a file name or any artificial identifier passed to one of the include-push functions (Inclusion).

memento_unpack

Code to be executed when the state of a lexical analyzer is restored from a memento.

Implicit variables:

memento: Pointer to the memento object.

self: Reference to the lexical analyzer object.
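A sketch tying the three memento sections together; the member stored_counter is hypothetical, and my_counter is the member from the body example above:

memento {
        int   stored_counter;
}

memento_pack {
        memento->stored_counter = self.my_counter;
}

memento_unpack {
        self.my_counter = memento->stored_counter;
}

When a sub-file is included, the current value of my_counter is saved in the memento; when the included file is done, it is restored from it.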

An initial mode START_MODE in which the lexical analyzer starts its analysis can be specified via

start = START_MODE;

Footnotes

[1] Quex’s formal language for pattern descriptions consists mostly of POSIX regular expressions. However, some additions were made to facilitate the treatment of Unicode properties (<<>>).
[2] See ‘Design Patterns’ (<<>>).