Token Repetition

There are cases where a single step of the lexical analysis produces multiple lexical tokens with the same token identifier. A classical example is the ‘block-close’ token that appears in indentation-based languages, such as Python. Consider, for example, the following Python code fragment:

for i in range(10):
    if i in my_list:
        print("found")
print("<end>")

When the lexical analyzer passes "found" and transitions to the final print statement, it needs to close two indentation levels. Thus, two ‘block-close’ tokens must be sent as the result of one single analysis step.
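To make the counting concrete, consider the minimal sketch below. It uses a hypothetical helper close_count() operating on an explicit indentation stack; this is not Quex's internal API, merely an illustration of why one analysis step can demand several ‘block-close’ tokens.

#include <vector>
#include <cstddef>

// Hypothetical helper, for illustration only: count how many open
// indentation levels must be closed when a new line starts at column
// 'new_indent'. For the fragment above, the stack holds {0, 4, 8}
// while scanning 'print("found")'; the final print at column 0 pops
// the levels 8 and 4, so two 'block-close' tokens are due.
static std::size_t close_count(std::vector<std::size_t>& indent_stack,
                               std::size_t               new_indent)
{
    std::size_t n = 0;
    while( ! indent_stack.empty() && indent_stack.back() > new_indent ) {
        indent_stack.pop_back();
        ++n;
    }
    return n;
}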

The obvious solution is a field inside the token that tells how often the token is to be repeated. Indeed, this is what the token send macro self_send_n() relies on. For it to function, the policy for setting and getting the repetition number must be defined inside the token class (see repetition_set and repetition_get in sec-token-class).
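As an illustration of that policy, the sketch below shows a stripped-down token class with a repetition counter and the two accessors. The member name repetition_n is an assumption; the actual hooks Quex expects are those documented in sec-token-class.

// Stripped-down sketch, not Quex's real token class; names other
// than repetition_set/repetition_get are hypothetical.
struct Token {
    int id;            // token type identifier, e.g. QUEX_TKN_CLOSE
    int repetition_n;  // how often this token is to be delivered

    void repetition_set(int N)  { repetition_n = N; }
    int  repetition_get() const { return repetition_n; }
};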

Tokens that may be sent repeatedly must be listed in a repeated_token section inside the quex input sources, e.g.

repeated_token {
    ABC;
    XYZ;
    CLOSE;
}

where ABC, XYZ, and CLOSE name the token identifiers QUEX_TKN_ABC, QUEX_TKN_XYZ, and QUEX_TKN_CLOSE that can be repeated. The generated engine now supports token repetition for these three token identifiers. This means that self_send_n() can be used for them and that the token receive functions take their possible repetition into account. If, for example,

self_send_n(5, QUEX_TKN_XYZ);

is called from inside the analyzer, then

token_id = my_lexer.receive();

will return the token identifier QUEX_TKN_XYZ five times before the next analysis step is performed. Note that implicit token repetition may have a minor impact on performance, since an extra comparison is necessary for each analysis step. In practice, though, the reduction in function calls achieved by repetition largely outweighs this cost.
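For illustration, a receive loop in the spirit of the call above might look as follows; the lexer object my_lexer and the return convention of receive() are assumptions taken from the example.

// Sketch of the receiving side: after self_send_n(5, QUEX_TKN_XYZ)
// the loop below observes QUEX_TKN_XYZ five consecutive times,
// although the analyzer performed only a single analysis step.
int token_id;
int xyz_n = 0;
do {
    token_id = my_lexer.receive();
    if( token_id == QUEX_TKN_XYZ ) ++xyz_n;
} while( token_id == QUEX_TKN_XYZ );
// At this point xyz_n == 5, and 'token_id' holds the first token
// produced by the subsequent analysis step.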