Pattern Definition

Section <<sec-practical-patterns>> already discussed the format of the pattern file. In this section, it is described how to specify patterns by means of regular expressions. The regular expression syntax follows the scheme of what is usually called traditional UNIX syntax cite{}, includes elements of extended POSIX regular expressions cite{IEEE POSIX Standard 1003.2} and POSIX bracket expressions cite{}. This facilitates the migration from and to other lexical analyzer generators and test environments. Additionally, quex provides support for Unicode Properties. A compliance to Unicode Regular Expressions cite{Unicode 5.0 Technical Report Standard #18} is currently not targeted, though, because this expressive power is usually not required for compiler generation.

Nevertheless, quex provides features that, for example, flex does not. If it is intended to maintain compatibility of regular expressions with flex, then please refer to the flex manual cite{}, section ‘Patterns’ and do not use quex-specific constructs. This section discusses pure quex syntax. The explanation is divided into the consideration of context-free expressions and context-dependent expressions.

Quex uses regular expressions to describe patterns and provides its own syntax for filtering and combining character sets. The development of applications running unicode might impose the construction of larger descriptions for patterns. In order to keep mode descriptions clean quex provides a define section where patterns can be defined and later on referred to by their identifiers in curly brackets. See the following example:

define {
   /* Eating white space */
   WHITESPACE    [ \t\n]+
   // An identifier can never start with a number
   IDENTIFIER    [_a-zA-Z][_a-zA-Z0-9]*
}

mode MINE : {
    {WHITESPACE}  { /* do nothing */ }
    {IDENTIFIER}  => TKN_IDENTIFIER(Lexeme);
}

Patterns are used to identify atomic chunks of information such as ‘numbers’, ‘variable names’, ‘string constants’, and ‘keywords’. A concrete chain of characters that matches a particular pattern is called a lexeme. So ‘0.815’ would be a lexeme that matches a number pattern and ‘print’ might be a lexeme that matches a keyword pattern. The description of patterns by means of a formal language is the subject of the following subsections.