A lexical analyzer mode refers to a behavior of the analyzer engine. More precisely, it defines a set of pattern-action pairs and event handlers that are exclusively active when the analyzer is in that particular mode. One reason for having several modes is syntactical. Imagine a mini-language nested in a ‘mother’ language, where the patterns of the mini-language interfere with the patterns of the ‘mother’ language. For example, the mother language may contain floating point numbers defined as:
[0-9]+"."[0-9]* => FLOAT_NUMBER(Lexeme);
In the mini-language there might only be integers, and the ‘dot’ is considered the period that ends a sentence, as in:
[0-9]+ => NUMBER(Lexeme);
"." => TKN_PERIOD;
If both sets of patterns were described in a single mode, then they would interfere with each other. If a number occurred at the end of a sentence, such as in:
The number of monkeys did not exceed 21. However, there was reason to ...
it would be eaten, together with the trailing dot, by the floating point number pattern (i.e. interpreted as 21.0), since the engine follows the longest match. The period at the end of the sentence would not be detected. This is an example where multiple modes are required from a syntax point of view. Another reason for having more than one mode is computational performance. The C preprocessor statements in the #-regions (e.g. #ifdef, #define, #include) rely on a reduced syntax. Since not all features of the C language need to be present in those regions, it might make sense to have them parsed in something like a C_PREPROCESSOR mode.
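The longest-match interference described above can be reproduced outside the analyzer generator. The following is a minimal Python sketch, not the engine's actual implementation: the pattern tables `SINGLE_MODE` and `MINI_LANGUAGE_MODE` and the helper `tokenize` are hypothetical names introduced for illustration, and the sketch assumes that characters matching no pattern are simply skipped.

```python
import re

# Hypothetical pattern tables; the token names mirror those used in the text.
# SINGLE_MODE mixes the 'mother' language float pattern with the mini-language.
SINGLE_MODE = [
    ("FLOAT_NUMBER", r"[0-9]+\.[0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("TKN_PERIOD", r"\."),
]
# MINI_LANGUAGE_MODE contains only the patterns of the mini-language.
MINI_LANGUAGE_MODE = [
    ("NUMBER", r"[0-9]+"),
    ("TKN_PERIOD", r"\."),
]


def tokenize(text, patterns):
    """Scan text with the longest-match rule.

    Assumption of this sketch: characters that match no pattern are skipped.
    """
    compiled = [(name, re.compile(rx)) for name, rx in patterns]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_match = None, None
        for name, rx in compiled:
            m = rx.match(text, pos)
            # Keep the candidate only if it is strictly longer than the best so far.
            if m and (best_match is None or m.end() > best_match.end()):
                best_name, best_match = name, m
        if best_match is None:
            pos += 1  # no pattern matches here; skip one character
            continue
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens


sentence = "The number of monkeys did not exceed 21. However, there was reason to"
print(tokenize(sentence, SINGLE_MODE))         # [('FLOAT_NUMBER', '21.')]
print(tokenize(sentence, MINI_LANGUAGE_MODE))  # [('NUMBER', '21'), ('TKN_PERIOD', '.')]
```

With all patterns in one table, the float pattern wins at "21." because its match is one character longer than the integer match, so no period token is produced. With the mini-language's own table, the number and the period are reported separately, which is the effect a dedicated mode is meant to achieve.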
The following sections elaborate on the characteristics of modes, event handlers in modes, mode inheritance, and the features and options available for defining modes.