brief and keyword_list

The mode subsections brief and keyword_list enable a concise specification of pattern-action pairs. Pattern-action pairs written with the => operator tend to become long, homogeneous lists, as in the following example.

mode EXAMPLE {
    ...
    "="  => QUEX_TKN_OP_ASSIGNMENT;
    "+"  => QUEX_TKN_OP_PLUS;
    "-"  => QUEX_TKN_OP_MINUS;
    "*"  => QUEX_TKN_OP_MULT;
    "/"  => QUEX_TKN_OP_DIV;
    ...
}

This fragment can be rewritten elegantly with the brief subsection as shown below.

brief {
    "=" OP_ASSIGNMENT; "+" OP_PLUS; "-" OP_MINUS; "*" OP_MULT; "/" OP_DIV;
}

Or, more concisely, by specifying a brief prefix that is prepended to every token-id name, as

brief OP_ {
    "=" ASSIGNMENT; "+" PLUS; "-" MINUS; "*" MULT; "/" DIV;
}

Or, when the LexemeNull needs to be passed to the token, as

brief (N) OP_ {
    "=" ASSIGNMENT; "+" PLUS; "-" MINUS; "*" MULT; "/" DIV;
}

A precise definition of the brief subsection is given below.

brief ['(' flags ')'] [brief-prefix] '{' list of(re identifier ';') '}'

Defines a list of pattern-action pairs by a list of pairs of regular expressions re and brief token identifiers. The final token id name is composed of the global token id prefix, the brief prefix, and the brief token identifier.

That is, with a given token id prefix QUEX_TKN_, the following brief subsection

brief MINE_ { "something"  SOME; }

is equivalent to the line

"something" => QUEX_TKN_MINE_SOME;
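
The naming rule can be illustrated with a minimal Python sketch. The function compose_token_id and its parameter names are hypothetical and not part of Quex; the sketch only concatenates the three parts named above, assuming the global token id prefix is QUEX_TKN_.

```python
def compose_token_id(identifier, brief_prefix="", global_prefix="QUEX_TKN_"):
    """Join global token id prefix, brief prefix, and brief token identifier.

    Hypothetical helper for illustration only; it is not a Quex function.
    """
    return global_prefix + brief_prefix + identifier


# The example from the text: brief MINE_ { "something"  SOME; }
print(compose_token_id("SOME", brief_prefix="MINE_"))
```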

The flags in optional brackets, namely L, N, or i, are explained in the table Flags passed to brief.

Table 10 Flags passed to brief.

Flag    Meaning
L       Pass the Lexeme pointer to the token.
N       Pass LexemeNull to the token.
i       Define identifiers implicitly.

L lets the Lexeme be passed to each token sender. That is, a pattern-action pair [a-z]+ => QUEX_TKN_WORD(Lexeme); can be expressed as

brief (L) { [a-z]+   WORD; }

The LexemeNull is passed to the token when N is specified. The expression "hello"|"bonjour" => QUEX_TKN_GREETING(LexemeNull); can be expressed as

brief (N) { "hello"|"bonjour"  GREETING; }

When i is specified, token-id names need not be mentioned in the token section. That is,

mode EXAMPLE {
    brief (i) TYPE_ { int INT; float FLOAT; }
}

is equivalent to

token {
    TYPE_INT; TYPE_FLOAT;
}

mode EXAMPLE {
    brief TYPE_ { int INT; float FLOAT; }
}
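
The effect of the flags can be summarized in a minimal Python sketch that expands a brief list into the equivalent => lines. The function expand_brief, its parameters, and its return shape are hypothetical and only mirror the behavior described above; it is not how Quex is implemented. The sketch further assumes that L and N are not given together, which the text does not state.

```python
def expand_brief(pairs, flags="", brief_prefix="", global_prefix="QUEX_TKN_"):
    """Expand (pattern, identifier) pairs into '=>' pattern-action lines.

    Returns the generated lines and the token ids that flag 'i' would
    define implicitly. Hypothetical helper, for illustration only.
    """
    if "L" in flags and "N" in flags:
        # Assumption of this sketch: only one token argument is chosen.
        raise ValueError("sketch supports only one of L and N")
    if "L" in flags:
        argument = "(Lexeme)"
    elif "N" in flags:
        argument = "(LexemeNull)"
    else:
        argument = ""
    lines = [
        f"{pattern} => {global_prefix}{brief_prefix}{name}{argument};"
        for pattern, name in pairs
    ]
    # With 'i', the names would otherwise have to appear in the token section.
    implicit_ids = [brief_prefix + name for _, name in pairs] if "i" in flags else []
    return lines, implicit_ids


lines, implicit_ids = expand_brief(
    [("int", "INT"), ("float", "FLOAT")], flags="i", brief_prefix="TYPE_"
)
for line in lines:
    print(line)
print(implicit_ids)
```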

For pattern-action pairs where the pattern is a plain string, the even more convenient keyword_list section may be used. For example, the list of pattern-action pairs shown below

"for"      => QUEX_TKN_FOR(LexemeNull);
"while"    => QUEX_TKN_WHILE(LexemeNull);
"until"    => QUEX_TKN_UNTIL(LexemeNull);
"break"    => QUEX_TKN_BREAK(LexemeNull);
"continue" => QUEX_TKN_CONTINUE(LexemeNull);

is equivalent to

keyword_list {
   "for"; "while"; "until"; "break"; "continue";
}

The precise definition of the keyword_list section follows.

keyword_list ['(' flags ')'] [brief-prefix] '{' list of DFAs '}'

Defines a list of pattern-action pairs by a list of semicolon-separated keywords specified as regular expressions (DFAs). The lexeme matched by a keyword's regular expression is taken as the last element of the token identifier name.

Each DFA in the keyword list must match exactly one lexeme. The first character of that lexeme must either have the Unicode property ID_Start or be the underscore _, which is not included in ID_Start at the time of this writing. All following characters must have the Unicode property ID_Continue.
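
As a rough illustration of the character constraint, the following Python sketch tests whether a lexeme has the required shape. The function looks_like_keyword is hypothetical. It relies on Python's str.isidentifier(), which is based on the closely related properties XID_Start (plus the underscore) and XID_Continue, so it is an approximation and not the check Quex performs.

```python
def looks_like_keyword(lexeme):
    """Approximate the keyword shape: a start character or '_', then
    continue characters only.

    str.isidentifier() uses XID_Start/XID_Continue, which differ slightly
    from ID_Start/ID_Continue; treat the result as an approximation.
    """
    return lexeme.isidentifier()


for candidate in ["while", "_tmp", "9lives", "a-b"]:
    print(candidate, looks_like_keyword(candidate))
```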

With a given brief prefix, the token identifier name is composed as described for the brief section. For example,

keyword_list MINE_ { "Word"; }

is equivalent to

"Word" => QUEX_TKN_MINE_WORD;

The flags L, N, and i work the same way as for the brief section. The complete list of flags passed to keyword_list is given in the table Flags passed to keyword_list.

Table 11 Flags passed to keyword_list.

Flag           Meaning
u (default)    Uppercase token identifiers.
l              Lowercase token identifiers.
L              Pass the Lexeme pointer to the token.
N              Pass LexemeNull to the token.
i              Define identifiers implicitly.

With the u flag set, token id names are generated from keywords in uppercase. That is, a keyword Word is reported as QUEX_TKN_WORD. With the l flag set, token id names are generated from keywords in lowercase. That is, a keyword Word is reported as QUEX_TKN_word.
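
The case rule can be sketched in a few lines of Python. The function keyword_token_id and its parameters are hypothetical and not part of Quex. The sketch assumes that only the keyword itself is case-converted while prefixes are left untouched, which the text does not specify.

```python
def keyword_token_id(keyword, flags="", brief_prefix="", global_prefix="QUEX_TKN_"):
    """Derive a token id name from a keyword according to flags u and l.

    Uppercase is used unless 'l' is given, following the table where 'u'
    is marked as the default. Hypothetical helper, for illustration only.
    """
    name = keyword.lower() if "l" in flags else keyword.upper()
    # Assumption of this sketch: prefixes are not case-converted.
    return global_prefix + brief_prefix + name


print(keyword_token_id("Word"))             # flag u is the default
print(keyword_token_id("Word", flags="l"))
```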

Any syntactical means, mentioned in the previous chapters, may be used to describe a DFA in the keyword list, as long as it is a sub-pattern of:

(\P{ID_Start}|_)\P{ID_Continue}*

and matches exactly one lexeme. For example,

define       { TYPE_PREFIX Type_ }
mode EXAMPLE { keyword_list (u) { {TYPE_PREFIX}integer; } }

uses macro expansion and is equivalent to

mode EXAMPLE { "Type_integer" => QUEX_TKN_TYPE_INTEGER; }

However, the following expression uses the optional operator ?, so it matches two lexemes.

mode EXAMPLE { keyword_list (u) { colou?r; } } // error!

The expression colou?r actually matches both color and colour.