Event Handlers

This section discusses events beyond plain pattern matches. Event handlers are defined inside mode definitions and are named according to the scheme on_ followed by the event name. For example, the event handler for a match failure inside a mode EXAMPLE is specified as shown below.

mode EXAMPLE {
    on_failure /* Arguments are implicit */ {
        /* user code */
    }
}

Arguments to an event handler are passed implicitly, in the same way as Lexeme, LexemeNull, LexemeBegin, etc. are passed to pattern match actions. This simple syntax, however, collides with regular expressions that match the handler names. To specify a regular expression that matches the name of an event handler, quotes must be used, for example:

mode EXAMPLE {
    "on_failure" {
        // matches the sequence 'o', 'n', '_', 'f', 'a', 'i', 'l', 'u', 'r', 'e';
        // does not trigger on the event of failure.
    }
}

This section discusses all events that can be handled by event handlers. The description of an event handler includes its name (and thus its syntax), the list of implicitly passed arguments, the prologue code that is always executed before the event handler, and its default behavior, as shown below.

on_event

Explanation of the event that triggers the call.

Arguments

Names of implicitly passed arguments.

Prologue

Code executed before event handler, even if event handler is customized.

Default

Description of default behavior.

The possible implicit arguments, together with their types, are listed in tab:more_implicit_variables. Lexeme-related variables are typed as described in Table 9.

An essential operation inside an event handler is to set or clear error codes. The following paragraphs discuss the two functions that set and clear error codes in conformity with the ‘first error remains’ paradigm.

error_code_set_if_first(ErrorCode)

Sets the error code only if no error code has been set before. A sequence of calls to this function leaves the error code of the first call in place and does not overwrite it with the error codes of subsequent, consequential errors.

For example, consider the handler on_failure. If a previous error E_Error_OnBadLexatom is stored as error code, error_code_set_if_first() cannot change it to E_Error_OnFailure upon match failure. In other words, .error_code carries the code of the first error event. The assumption is that the first error handler most likely treats the root cause of the trouble.

error_code_clear_this(ErrorCode)

Clears the current error code if it is equal to ErrorCode.

With this function, the ‘first come, first served’ rule can be overridden. However, the caller must name the specific error code (ErrorCode) to be cleared. This requirement forces the user to be aware of the scenario in which the handler acts.
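The interplay of the two functions can be modeled in plain C. The following sketch is not Quex's implementation; the Lexer struct, the E_Error enum and the free-function form of the two functions are hypothetical stand-ins for the lexer's member functions.

```c
/* Hypothetical model of the 'first error remains' rule; not Quex code. */
typedef enum {
    E_Error_None = 0,
    E_Error_OnBadLexatom,
    E_Error_OnFailure
} E_Error;

typedef struct {
    E_Error error_code;
} Lexer;

/* Store ErrorCode only if no error has been recorded yet. */
static void error_code_set_if_first(Lexer* me, E_Error ErrorCode)
{
    if (me->error_code == E_Error_None) {
        me->error_code = ErrorCode;
    }
}

/* Clear the error code only if it equals the explicitly named ErrorCode. */
static void error_code_clear_this(Lexer* me, E_Error ErrorCode)
{
    if (me->error_code == ErrorCode) {
        me->error_code = E_Error_None;
    }
}
```

With a bad lexatom followed by a match failure, the code stays at E_Error_OnBadLexatom, and clearing E_Error_OnFailure leaves it untouched, because error_code_clear_this() only clears the code it names.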

Failures and End of Stream

Whenever it is impossible to match at a given position, whenever a lexatom appears that is unknown to the current encoding, or whenever the input stream terminates, the lexer is in a state where it cannot proceed. These cases are handled by the following event handlers.

on_end_of_stream

Input stream exhaustion.

Arguments

None

Prologue

None

Default

Send TERMINATION; end analysis step.

This handler can be used to handle the termination of lexical analysis, or the return to an including file.

on_failure

No pattern of the mode matches at the current input stream position.

Arguments

None

Prologue

Set E_Error_OnFailure.

Default

Send TERMINATION; end analysis step.

on_failure catches unexpected lexemes for which no pattern has been provided. This may occur due to a syntax error in the data stream, or due to an incomplete mode definition. In the first case, a failure handler can help the user reflect on what is fed into the interpreter. In the second case, it can help the developer of the interpreter debug the specification.

Note

To provide compatibility with lex/flex, the marker <<FAIL>> may be used as an alternative to on_failure, and <<EOF>> as an alternative to on_end_of_stream.

The on_match and on_after_match handlers are not executed before or after the on_failure handler, because on_failure signals that nothing matched, which contradicts the condition under which these two handlers operate. The prologue of on_failure always sets the E_Error_OnFailure error code. If this is not desired, the code must be cleared explicitly inside the handler, as shown below.

on_failure {
    self.error_code_clear_this(E_Error_OnFailure);
    self.send_string(QUEX_TKN_ERROR, Lexeme);
}

in C++, or correspondingly in C:

on_failure {
    self.error_code_clear_this(&self, E_Error_OnFailure);
    self.send_string(&self, QUEX_TKN_ERROR, Lexeme);
}

It is not admissible to specify patterns that accept nothing. However, actions such as a mode change upon the event that nothing matched can be implemented with on_failure and undo(), as in

on_failure {
    self.error_code_clear_this(E_Error_OnFailure);
    self.undo();
    self.enter_mode(NEW_MODE);
}

With undo(), the lexeme left unmatched by the current mode’s patterns is put back into the input stream. It then gets a second chance to produce a match in NEW_MODE. Care must be taken, though, that the input position eventually advances, so that the lexer does not cycle endlessly through on_failure, undo(), and mode transitions.
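The risk of endless cycles can be illustrated with a toy lexer in plain C. It is a hypothetical model, not Quex-generated code: mode MODE_DIGITS matches digit runs, mode MODE_LETTERS matches lowercase runs, and a failure in one mode "undoes" by leaving the position unchanged and switching to the other mode. The guard records where the last failure-driven switch happened; if the other mode fails at the same position too, one character is skipped so that the input position advances.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { MODE_DIGITS, MODE_LETTERS } Mode;

/* Length of the longest match of the given mode at p (0 means failure). */
static size_t match_len(Mode mode, const char* p)
{
    size_t n = 0;
    while (p[n] != '\0'
           && (mode == MODE_DIGITS ? isdigit((unsigned char)p[n])
                                   : islower((unsigned char)p[n]))) {
        ++n;
    }
    return n;
}

/* Counts matched lexemes in *lexeme_n; returns the number of characters
 * the guard had to skip because no mode matched them. */
static size_t lex(const char* input, size_t* lexeme_n)
{
    Mode        mode        = MODE_DIGITS;
    const char* p           = input;
    const char* last_bounce = NULL;  /* position of last failure-driven switch */
    size_t      skipped     = 0;

    *lexeme_n = 0;
    while (*p != '\0') {
        size_t n = match_len(mode, p);
        if (n > 0) {                      /* regular match: consume lexeme */
            p += n;
            ++*lexeme_n;
            last_bounce = NULL;
        } else if (last_bounce != p) {    /* 'on_failure': undo + switch mode */
            last_bounce = p;
            mode = (mode == MODE_DIGITS) ? MODE_LETTERS : MODE_DIGITS;
        } else {                          /* both modes failed here: advance */
            ++p;
            ++skipped;
            last_bounce = NULL;
        }
    }
    return skipped;
}
```

Without the third branch, an input such as "#" would make the loop switch modes forever without moving p.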

The on_failure handler may also be used to react to lexemes that are still unknown when the lexer is written. This may be used, for example, to count strands of DNA that are not identified as a sequence motif []. In this case, too, the error flag must be cleared, since otherwise the lexer may abort the analysis.

on_failure {
    self.error_code_clear_this(E_Error_OnFailure);
    self.send_string(QUEX_TKN_NO_MOTIF, Lexeme);
    self.undo_n(LexemeL-1); // continue analysis at next nucleotide
}

Note

Signaling an actual error by sending a FAILURE token instead of setting an error flag has a significant disadvantage. An error flag immediately stops the reception loop, whereas a FAILURE token is stacked in the token queue. By the time the token is popped from the queue, so much may have happened that the circumstances of the failure are no longer apparent.

on_load_failure

Buffer loading failed for some unspecified reason.

Arguments

None

Prologue

Set E_Error_OnLoadFailure.

Default

Send TERMINATION; end analysis step.

Under normal conditions, this error should never occur. However, it may occur, for example, if a file has changed in the background, if someone inadvertently tampered with the analyzer’s data structures, or if a defective low-level file system driver is used.

The on_bad_lexatom handler is called upon the occurrence of a character that is not present in the lexer’s character encoding. In that case, BadLexatom carries the code unit that has no correspondence in the current encoding. If a converter is used, the handler signals a conversion error. In that case, BadLexatom has no valid meaning, because the offending code unit remains hidden inside the converter.

Bad lexatom detection can be disabled by the command line options --no-bad-lexatom-detection or --nbld. The handler on_bad_lexatom always takes precedence over on_failure. That is, if --encoding ASCII is specified as the engine encoding and a value greater than 0x7F appears, an encoding error is issued even if, at the same time, no pattern matches. on_bad_lexatom may also detect buffer loads that contain the border sentinel.
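The precedence rule can be pictured as an ordering of checks. In this minimal C sketch (the Event enum and the one-character "pattern set" are hypothetical), the encoding check for ASCII runs before any pattern is tried, so a byte above 0x7F yields the bad-lexatom event even though it would also fail to match.

```c
typedef enum { EV_MATCH, EV_BAD_LEXATOM, EV_FAILURE } Event;

/* Hypothetical pattern set: only lowercase ASCII letters form a lexeme. */
static int pattern_matches(unsigned char c)
{
    return c >= 'a' && c <= 'z';
}

/* The encoding check comes first, so it wins over a simultaneous mismatch. */
static Event classify(unsigned char c)
{
    if (c > 0x7F)            return EV_BAD_LEXATOM;  /* -> on_bad_lexatom */
    if (!pattern_matches(c)) return EV_FAILURE;      /* -> on_failure */
    return EV_MATCH;
}
```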

Pattern Matching Adornments

Whenever a pattern matches on a lexeme in the input stream, the following sequence of actions is performed.

  1. (If line/column counting active) Count line and column number.

  2. (If off-side rule enabled) Call indentation handler.

  3. (If there is a begin-of-line pre-context) Store the last character.

  4. (If Lexeme appears in action) Store terminating zero at end of lexeme.

  5. Evaluate on_match event handler.

  6. Evaluate user action.

  7. Evaluate on_after_match event handler.

With the handlers on_match and on_after_match, it is possible to define actions that are performed on every match without writing them into every pattern action.

on_match

Upon a pattern match, before the user’s action is executed.

Arguments

Lexeme, LexemeBegin, LexemeEnd, LexemeL.

Prologue

None

Default

None

on_after_match

Upon a pattern match, after the user’s action is executed.

Arguments

Lexeme, LexemeBegin, LexemeEnd, LexemeL.

Prologue

None

Default

None

To make sure that the on_after_match handler is always executed, it is essential that the return statement is never used in any pattern action directly. Immediate return can be triggered with the FLUSH command. For an immediate continuation of analysis, CONTINUE must be used.
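The effect of this rule can be modeled in plain C. In this sketch, which is hypothetical and does not show how Quex actually implements FLUSH, the user action does not return from the enclosing function; it reports its wish to return through a result value, so that on_after_match runs in either case. The trace string records steps 5 to 7 of the match sequence.

```c
#include <string.h>

/* Records which steps ran, in order: M = on_match, U = user action,
 * A = on_after_match. */
static char trace[64];

static void on_match(void)       { strcat(trace, "M"); }
static void on_after_match(void) { strcat(trace, "A"); }

typedef enum { ACTION_DONE, ACTION_FLUSH } ActionResult;

/* User action: signals 'return to caller' instead of executing 'return'. */
static ActionResult user_action(int want_return)
{
    strcat(trace, "U");
    return want_return ? ACTION_FLUSH : ACTION_DONE;
}

/* Returns 1 if analysis should return to the caller, 0 to continue. */
static int on_pattern_match(int want_return)
{
    ActionResult r;
    on_match();
    r = user_action(want_return);
    on_after_match();              /* executed whatever the action decided */
    return r == ACTION_FLUSH;
}
```

Had user_action() contained a plain return out of the analysis function, the "A" step would never be recorded.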

Note

The on_failure handler, or the <<FAIL>> pattern, actually handles ‘mismatches’. Consequently, on_match and on_after_match are not executed in that case.

Mode Transitions

Upon a mode transition, the event handlers on_entry and on_exit are executed.

on_entry

When the present mode is entered.

Arguments

self, FromModeP

Prologue

Assert that the transition is admissible (in DEBUG builds).

Default

None

on_exit

Before exiting the present mode.

Arguments

self, ToModeP

Prologue

Assert that the transition is admissible (in DEBUG builds).

Default

None

on_exit is called before the mode transition is accomplished; on_entry is called when the new mode has been set. Tokens may be sent from inside the entry/exit handlers. However, the lexical analyzer cannot FLUSH or CONTINUE immediately from there, as it can upon a pattern match. Tokens sent from inside these handlers are stacked in the token queue.
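The order of events during a transition can be modeled as follows. The Lexer and Mode structs, the fixed-size token queue and the admissibility rule are hypothetical; each handler merely sends the name of the mode that is current when it runs, which shows that on_exit still sees the old mode and on_entry already sees the new one.

```c
#include <assert.h>

typedef struct {
    const char* name;
} Mode;

typedef struct {
    const Mode* mode;
    const char* queue[8];   /* token queue: handlers can only enqueue */
    int         queue_n;
} Lexer;

static void send(Lexer* me, const char* token)
{
    if (me->queue_n < 8) me->queue[me->queue_n++] = token;
}

static void on_exit(Lexer* me, const Mode* ToModeP)
{
    (void)ToModeP;
    send(me, me->mode->name);   /* still the old mode */
}

static void on_entry(Lexer* me, const Mode* FromModeP)
{
    (void)FromModeP;
    send(me, me->mode->name);   /* already the new mode */
}

/* Hypothetical admissibility rule, checked only in DEBUG builds. */
static int transition_admissible(const Mode* from, const Mode* to)
{
    return from != to;
}

static void enter_mode(Lexer* me, const Mode* to)
{
    const Mode* from = me->mode;
    assert(transition_admissible(from, to));  /* disappears with NDEBUG */
    on_exit(me, to);
    me->mode = to;                            /* transition accomplished */
    on_entry(me, from);
}
```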

Buffer Handling

on_buffer_overflow

The current lexeme spans the complete buffer, so no space can be freed for new content.

Arguments

self, LexemeBegin, LexemeEnd, BufferSize, BufferBegin, BufferEnd

Prologue

None

Default

  • c: current buffer size.

  • s: target size of allocation.

  • B: number of buffer border lexatoms (usually 1).

  • F: size of the fallback region.

  1. Set s = 2 * c.

  2. If allocation of s is successful, then terminate.

  3. Set s = (s + c) / 2.

  4. If s == 2 * B + F, then set error E_Error_Buffer_Overflow_LexemeTooLong.
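The default growth strategy can be sketched in C under two stated assumptions: after step 3 the procedure retries step 2, and, since B and F are not modeled here, it gives up once s no longer exceeds c, which differs from the test in step 4 as written. The function try_allocate is a hypothetical stand-in for the real allocator that succeeds only up to a given limit.

```c
#include <stddef.h>

/* Hypothetical allocator stand-in: succeeds only up to 'limit' lexatoms. */
static int try_allocate(size_t s, size_t limit)
{
    return s <= limit;
}

/* Returns the new buffer size, or 0 to stand for
 * E_Error_Buffer_Overflow_LexemeTooLong. */
static size_t grow_buffer(size_t c, size_t limit)
{
    size_t s = 2 * c;                    /* step 1: try to double */
    while (s > c) {                      /* assumed stop: no growth left */
        if (try_allocate(s, limit)) {    /* step 2 */
            return s;
        }
        s = (s + c) / 2;                 /* step 3: halve the increment */
    }
    return 0;                            /* step 4: overflow error */
}
```

With c = 1000 and an allocator limit of 1300, the attempts are 2000, 1500 and finally 1250, which succeeds.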

When new content is about to be loaded into the lexer’s buffer, there must be enough free space. If there is not enough, the lexer tries to move the content towards the beginning of the buffer. If this fails because the current lexeme spans the complete buffer, the event on_buffer_overflow is triggered. The default handler then tries to extend the current buffer, or to copy it to a different location.

A customized on_buffer_overflow handler becomes mandatory when the user applies their own buffer memory management. In embedded systems where dynamic allocation is forbidden, a user-defined handler may be required as well. There, the error code alone may be sufficient, and an empty, but existing, handler suffices.

If, for any reason, user code refers with pointers directly into the content of the lexical analyzer’s buffer, then it is essential to react to the event that the buffer content changes.

Note

The adaptation of the lexical analyzer engine’s own pointers, which the engine requires, is never overwritten. Quex prevents a user-defined on_buffer_before_change handler from overwriting such actions, which are essential for the lexer’s functioning.

After a change in buffer content, it can no longer be assumed that a pointer into the buffer points to the same content. If this is important, the handler on_buffer_before_change offers the possibility to copy buffer content to a safe location and to adjust the referring pointers accordingly.
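A minimal C sketch of such a rescue, assuming user code keeps a hypothetical BufferRef that points into the buffer: before the content changes, the referenced bytes are copied to heap memory and the pointer is redirected to the copy. How the reference is reached from inside on_buffer_before_change is left open here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-side reference into the lexer's buffer. */
typedef struct {
    const char* begin;   /* points into the buffer until rescued */
    size_t      length;
    char*       owned;   /* heap copy, once rescued; caller frees it */
} BufferRef;

/* Body one might run from on_buffer_before_change: copy the referenced
 * content out of the buffer and redirect the pointer to the copy.
 * Returns 0 if memory for the copy could not be allocated. */
static int rescue_reference(BufferRef* ref)
{
    char* copy;
    if (ref->owned != NULL) return 1;          /* already safe */
    copy = malloc(ref->length + 1);
    if (copy == NULL) return 0;
    memcpy(copy, ref->begin, ref->length);
    copy[ref->length] = '\0';
    ref->owned = copy;
    ref->begin = copy;                         /* pointer is now stable */
    return 1;
}
```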

To handle a buffer overflow, the following two functions assign new memory to the buffer or extend it. Actually, these two functions are used internally to allocate buffer memory for included streams (that is why ‘nested’ appears in their names). However, since operations on the buffer’s setup are very sensitive, they are the means of choice to assign new memory to the buffer [1].

bool LEXER_Buffer_nested_extend(Buffer* me, ptrdiff_t SizeAdd)

Attempts to allocate new memory for the buffer and migrates the current buffer content to the new memory. Returns false if and only if that attempt fails.

calls handler: on_buffer_before_change.

bool LEXER_Buffer_nested_migrate(Buffer* me, LEXER* memory, const size_t MemoryLexatomN, E_Ownership Ownership)

Migrates the current buffer’s content to the specified memory chunk. Returns false if and only if that attempt fails.

calls handler: on_buffer_before_change.

Since both functions may call the event handler on_buffer_before_change, they must not be called from inside it; otherwise infinite recursion may result.

Note

Inside event handlers, the current lexer buffer is available via self.buffer.

Finally, a handler may be specified for unforeseen errors related to buffer loading.

on_load_failure

Buffer loading failed for some unforeseen reason.

Arguments

self

Prologue

Set E_Error_OnLoadFailure.

Default

Send TERMINATION; end analysis step.

The related error occurs, for example, if a file has changed in the background, if someone inadvertently tampered with the analyzer’s data structures, or if a defective low-level file system driver is used.

Skippers

Range skippers exclude from consideration stream ranges that lie between a defined opening and closing delimiter. In C, for example, anything from /* to */ is considered a ‘comment’, that is, it does not result in any syntactical token. The event handler on_skip_range_open signals that a range is not closed.

The on_skip_range_open handler is relevant for skip_range and skip_nested_range. For a nested range skipper, the Counter argument additionally reports the nesting level, i.e. the number of missing closing delimiters.
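The meaning of the Counter argument can be illustrated with a small nested range skipper in C. The function skip_nested_range is hypothetical, not the skipper Quex generates; when the input ends inside the range, it reports the number of unclosed levels, which is the value the text ascribes to Counter.

```c
#include <stddef.h>
#include <string.h>

/* Skips a nested range that starts right after an already consumed
 * opening delimiter.  Returns the position after the matching closer,
 * or NULL if the input ends first; *missing then receives the number
 * of missing closing delimiters. */
static const char* skip_nested_range(const char* p, const char* open,
                                     const char* close, size_t* missing)
{
    size_t depth   = 1;                 /* one opener already consumed */
    size_t open_l  = strlen(open);
    size_t close_l = strlen(close);

    while (*p != '\0') {
        if (strncmp(p, close, close_l) == 0) {
            p += close_l;
            if (--depth == 0) { *missing = 0; return p; }
        } else if (strncmp(p, open, open_l) == 0) {
            p += open_l;
            ++depth;
        } else {
            ++p;
        }
    }
    *missing = depth;                   /* end of stream inside the range */
    return NULL;
}
```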

Indentation Based Scopes

The default indentation handler already sends INDENT, DEDENT and NODENT tokens as soon as it is activated by the mode tag <indentation:>. If the behavior needs to be controlled in more detail, the following event handlers may be used.

on_indent

A line starts with a higher indentation than in the previous line.

Arguments

Indentation: column count of the leading whitespace.

Prologue

None

Default

Send INDENT.

The on_indent event handler is called whenever an indentation occurs, right before the first non-whitespace character is treated.

on_dedent

A line starts with a lower indentation than in the previous line.

Arguments

  • N: Number of scopes that closed.

  • Indentation: column count of the leading whitespace.

Prologue

None

Default

Send DEDENT N times.

The on_dedent handler should send N DEDENT tokens. If repeated tokens are enabled, a single repeated token communicates the N sendings. Manually, repeated token sending is accomplished by

on_nodent

A line starts with the same indentation as the previous line.

Arguments

None

Prologue

None

Default

Send NODENT.

on_indentation_misfit

A line starts with a lower indentation not fitting any enclosing scope.

Arguments

IndentationStackSize, IndentationStack[], IndentationUpper, IndentationLower, N

Prologue

Set E_Error_OnIndentationMisfit.

Default

None

on_indentation_misfit handles the event that an indentation block was closed, but the new indentation does not fit any open indentation scope. IndentationStackSize gives the total size of the indentation stack. IndentationStack[I] delivers the indentation on level I, IndentationUpper the highest indentation, and IndentationLower the lowest.
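The decisions behind on_indent, on_nodent, on_dedent and on_indentation_misfit can be sketched with an explicit indentation stack. This C model is hypothetical and simplified (fixed-capacity stack, no bad-character handling): a deeper line pushes a level, an equal line is a NODENT, a shallower line pops back to a matching enclosing level and reports N, and a shallower line that falls between two levels is a misfit.

```c
#include <stddef.h>

typedef enum { EV_INDENT, EV_NODENT, EV_DEDENT, EV_MISFIT } IndentEvent;

typedef struct {
    size_t level[32];   /* indentation per open scope; level[0] == 0 */
    size_t size;        /* plays the role of IndentationStackSize (>= 1) */
} IndentStack;

/* Compares a new line's indentation with the stack.  For EV_DEDENT,
 * *closed_n receives N, the number of scopes that closed. */
static IndentEvent on_new_line(IndentStack* st, size_t indentation,
                               size_t* closed_n)
{
    size_t top = st->level[st->size - 1];
    size_t i;

    *closed_n = 0;
    if (indentation > top) {                        /* on_indent */
        if (st->size < 32) st->level[st->size++] = indentation;
        return EV_INDENT;
    }
    if (indentation == top) return EV_NODENT;       /* on_nodent */

    for (i = st->size; i > 0; --i) {                /* find enclosing scope */
        if (st->level[i - 1] == indentation) {
            *closed_n = st->size - i;
            st->size  = i;
            return EV_DEDENT;                       /* on_dedent with N */
        }
        if (st->level[i - 1] < indentation) break;  /* between two levels */
    }
    return EV_MISFIT;                               /* on_indentation_misfit */
}
```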

The on_indentation_bad handler is called as soon as a character occurs that is specified as bad in the indentation handler definition. For example, there is a long-standing debate about spaces versus tabs in indentation-based scopes (the off-side rule). A language designer may want to forbid one or the other by defining it as bad.

Summary

The previous subsections elaborated on event handlers related to lexical analysis. Events such as match failure or end of stream should be handled by any lexer. Other handlers make it possible to react to buffer content changes or to customize the indentation handling strategy.

Many event handlers are linked to error codes that are set before the handler is activated. The error code assigned to the lexer’s .error_code member for each such handler is given in Table 12.

Table 12 Event handlers and implicitly set error codes.

event handler           error code
on_bad_lexatom          E_Error_OnBadLexatom
on_failure              E_Error_OnFailure
on_indentation_bad      E_Error_OnIndentationBad
on_indentation_misfit   E_Error_OnIndentationMisfit
on_load_failure         E_Error_OnLoadFailure
on_skip_range_open      E_Error_OnSkipRangeOpen

In the introductory examples (e.g. Make It!), the .error_code is always checked at the beginning of the token reception loop. This ensures that the analysis process stops immediately when an error occurs.
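The benefit of checking .error_code first can be seen in a C model of the reception loop. The Lexer struct and receive() are hypothetical stand-ins that replay a scripted token sequence; at the chosen step, receive() sets E_Error_OnFailure and returns TERMINATION, as the default on_failure handler would.

```c
typedef enum { E_Error_None = 0, E_Error_OnFailure } E_Error;
typedef enum { TKN_WORD, TKN_TERMINATION } TokenId;

/* Hypothetical stand-in lexer replaying a scripted token sequence. */
typedef struct {
    E_Error error_code;
    int     step;
    int     fail_at;    /* step at which a match failure occurs (-1: never) */
    int     token_n;    /* number of WORD tokens before TERMINATION */
} Lexer;

static TokenId receive(Lexer* me)
{
    int s = me->step++;
    if (s == me->fail_at) {
        me->error_code = E_Error_OnFailure;  /* on_failure prologue */
        return TKN_TERMINATION;              /* default on_failure */
    }
    return s < me->token_n ? TKN_WORD : TKN_TERMINATION;
}

/* Reception loop: the error code is inspected before the token is
 * processed, so analysis stops at the failing step.  Returns the
 * number of tokens consumed. */
static int run(Lexer* me)
{
    int consumed = 0;
    for (;;) {
        TokenId id = receive(me);
        if (me->error_code != E_Error_None) break;
        if (id == TKN_TERMINATION) break;
        ++consumed;
    }
    return consumed;
}
```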

Footnotes