Manual Token Class Definition

When the command line option --token-class-file is specified, followed by a header file name, no token class is generated. This section discusses the requirements that a user-defined token class must meet in order to interact properly with the lexer. The command line arguments that communicate the token class configuration are listed at the end of this section. In the following, TOKEN stands for the chosen name of the token class.

First, a token class or struct must provide the means to construct, destruct, and copy a token. For this purpose, the following three functions must be provided.

void TOKEN_construct(TOKEN* me);
void TOKEN_destruct(TOKEN* me);
void TOKEN_copy(TOKEN* me, const TOKEN* other);

The me pointer takes the role of the this pointer in C++. In C++, construction and destruction are best implemented in constructors and destructors of the token class to ensure that all relevant constructors and destructors of members are called. Then, the TOKEN_construct() function should call the constructor via placement new. TOKEN_destruct() is best implemented as an explicit call to the destructor.
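The approach described above can be sketched as follows. This is a minimal illustration, not the code Quex generates: the class name MyToken, the member text, and the member types are assumptions made for the example. Only the function roles and the member names id, line_n, and column_n come from this section.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>      // placement new
#include <string>

// Hypothetical user-defined token class (the name 'MyToken' is an assumption).
struct MyToken {
    std::uint32_t id;
    std::size_t   line_n;
    std::size_t   column_n;
    std::string   text;    // illustrative member with a non-trivial constructor

    MyToken() : id(0), line_n(0), column_n(0) {}
};

// Runs the constructor on memory that the caller has already allocated.
inline void MyToken_construct(MyToken* me) { new (me) MyToken(); }

// Runs the destructor explicitly; the memory itself is not freed here.
inline void MyToken_destruct(MyToken* me) { me->~MyToken(); }

// Copies the content of 'other' into an already constructed token.
inline void MyToken_copy(MyToken* me, const MyToken* other) { *me = *other; }
```

Because construction goes through the class constructor, members such as text are initialized by their own constructors, and the explicit destructor call releases them again.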

For the implementation of brief token senders (section sec) the following function must be provided.

bool TOKEN_take_text(TOKEN*                 __this,
                     const LEXER_lexatom_t* Begin,
                     const LEXER_lexatom_t* End);

This function tells the token to store information about the current lexeme. Begin points to the first lexatom of the text to be carried by the token. End points to the first lexatom after the last lexatom of concern.
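A possible implementation that copies the half-open range [Begin, End) into the token is sketched below. Several points are assumptions: the 8-bit definition of LEXER_lexatom_t (normally provided by the generated lexer), the member name text, and the returned value. This section does not explain the return value; the sketch returns false on the assumption that it reports whether the token took ownership of the lexeme memory, which should be verified against the Quex documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 8-bit lexatoms. In a real setup the lexer defines this type.
typedef std::uint8_t LEXER_lexatom_t;

// Hypothetical token class; 'text' is an illustrative member name.
struct TOKEN {
    std::uint32_t                id;
    std::size_t                  line_n;
    std::size_t                  column_n;
    std::vector<LEXER_lexatom_t> text;
};

bool TOKEN_take_text(TOKEN*                 me,
                     const LEXER_lexatom_t* Begin,
                     const LEXER_lexatom_t* End)
{
    // Copy every lexatom from Begin up to, but not including, End.
    me->text.assign(Begin, End);
    // Assumed meaning: the text was copied, so no ownership is claimed.
    return false;
}
```

Since End points one past the last lexatom of concern, the number of stored lexatoms is End - Begin.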

As a last requirement, the standard members id, line_n, and column_n must be provided under exactly these names.

To generate proper interactions with the lexer, Quex requires knowledge about the token class’ configuration. This information is passed either on the command line or, as shown at the end of the previous section, in the token class’ header file, enclosed in <<<QUEX-OPTIONS>>> tags. The command line arguments relevant for an external token class definition are explained in the list below.

--token-class-file file-name

Disables the generation of the default token class and takes file-name as the name of the user-defined header file.

--token-class name0::name1 ... ::class-name

Specifies the name and namespace of the token class. Nested namespaces are listed from left to right and separated by ::. The right-most name is the name of the token class itself. If the token class is located in the root namespace, only the class name is specified, without any ::.

The following three options define the types of the class members which carry the token identifier id, the line number line_n, and the column number column_n.

--token-id-type       type-name
--token-line-n-type   type-name
--token-column-n-type type-name

If the lexer shall support token repetition (token identifiers with the \repeatable tag), then one member of the token class must be designated to carry the repetition number. The name of this member is communicated via the following option.

--token-repetition-n-member-name number
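To illustrate how these options relate to the class layout, the sketch below shows a token class in a nested namespace. The namespace names, the class name, the member types, and the member name repetition_n are all hypothetical. Only id, line_n, and column_n are names required by this section.

```cpp
#include <cstddef>
#include <cstdint>

// Would correspond to: --token-class example::lexing::MyToken
namespace example { namespace lexing {

struct MyToken {
    std::uint32_t id;            // type given by --token-id-type
    std::size_t   line_n;        // type given by --token-line-n-type
    std::size_t   column_n;      // type given by --token-column-n-type
    std::size_t   repetition_n;  // hypothetical name, to be passed to
                                 // --token-repetition-n-member-name
};

} } // namespace example::lexing
```

The types passed on the command line have to agree with the declarations in the header, since the generated lexer accesses these members directly by name.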

When brief token senders shall accept lexemes as input, the token class must provide a TOKEN_take_text function [#f1]_. This is communicated with the following command line option.

--token-class-support-take-text

In that case, the lexatom type must also be specified with the following option.

--lexatom-type  type-name

As a starting point, a generated token class might facilitate development. This also has the advantage that all command line arguments are pre-specified within the <<<QUEX-OPTIONS>>> tags, ready to be customized.