Event Handlers
This section discusses events beyond plain pattern matches. Event handlers are
defined inside mode definitions and follow the scheme on_ + event name.
For example, the event handler for the event of match failure inside a mode
EXAMPLE is specified as shown below.
mode EXAMPLE {
    on_failure /* Arguments are implicit */ {
        /* user code */
    }
}
Arguments to event handlers are passed implicitly, in the same way as
Lexeme, LexemeNull, LexemeBegin etc. are passed to pattern match
actions. Because event handlers are written as bare names, this syntax
interferes with regular expressions that match the same character sequence.
To specify a regular expression which coincides with the name of an event
handler, quotes have to be used, for example:
mode EXAMPLE {
    "on_failure" {
        // matches sequence: 'o', 'n', '_', 'f', 'a', 'i', 'l', 'u', 'r', 'e',
        // does not trigger on the event of failure.
    }
}
The following subsections discuss all events which can be handled by event handlers. The description of an event handler includes its syntax/name, the list of implicitly passed arguments, the prologue code that is always executed before the event handler, and its default behavior, as shown below.
|
|
Explanation of the event that triggers the call. |
|
Arguments |
Names of implicitly passed arguments. |
Prologue |
Code executed before event handler, even if event handler is customized. |
Default |
Description of default behavior. |
The possible implicit arguments together with their types are shown
in tab:more_implicit_variables. Lexeme-related variables are typed as
mentioned in Table 9.
An essential operation inside an event handler is to set or unset error codes. The following paragraphs discuss the two functions to perform setting and clearing in conformity with the ‘first error remains’ paradigm.
- error_code_set_if_first(ErrorCode)
Sets the error code, if it is the first error code that appeared. A sequence of calls to this function leaves the error code at the value of the first call and does not overwrite it with the error codes of aftereffects.
For example, consider the handler on_failure. If a previous error
E_Error_OnBadLexatom is stored as error code, error_code_set_if_first()
cannot change it to E_Error_OnFailure upon match failure. In
other words, .error_code carries the code of the first error event. It is
assumed that the first error handler, most probably, treats the root cause of
trouble.
- error_code_clear_this(ErrorCode)
Clears the current error code if it is equal to ErrorCode.
With this function, the functionality of ‘first come, first served’ can be
overridden. However, the function requires the specific error code
(ErrorCode) that is to be cancelled. This requirement forces the user to be
aware of the scenario in which the handler is acting.
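The interplay of the two functions can be illustrated by a small stand-alone model. This is only a sketch of the ‘first error remains’ paradigm as described above, not the code that Quex generates. The enum values E_Error_OnBadLexatom and E_Error_OnFailure are taken from the text; E_Error_None and the struct ErrorState are hypothetical names introduced for illustration.

```cpp
// Stand-alone model of the 'first error remains' paradigm.
// E_Error_None and ErrorState are hypothetical; they are not Quex names.
enum E_Error { E_Error_None, E_Error_OnBadLexatom, E_Error_OnFailure };

struct ErrorState {
    E_Error error_code = E_Error_None;

    // Store 'code' only if no earlier error is pending.
    void error_code_set_if_first(E_Error code) {
        if (error_code == E_Error_None) error_code = code;
    }

    // Clear the pending error only if it is exactly 'code'.
    void error_code_clear_this(E_Error code) {
        if (error_code == code) error_code = E_Error_None;
    }
};
```

In this model, setting E_Error_OnBadLexatom and then E_Error_OnFailure leaves E_Error_OnBadLexatom stored, and a later error_code_clear_this(E_Error_OnFailure) has no effect on it.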
Failures and End of Stream
Whenever it is impossible to match at a given position, whenever a lexatom appears that is unknown to the current encoding, and whenever the input stream terminates, then the lexer is in a state where it cannot proceed. These cases are handled by the following event handlers.
|
|
Input stream exhaustion, |
|
Arguments |
None |
Prologue |
None |
Default |
Send |
By means of this handler the termination of lexical analysis, or the return to an including file can be handled.
|
|
No pattern of the mode matches at the current input stream position. |
|
Arguments |
None |
Prologue |
Set |
Default |
Send |
on_failure catches unexpected lexemes for which no pattern has been foreseen.
This may occur due to a syntax error in the data stream, or due to an incomplete
mode definition. In the first case, a failure handler can help users to reflect
on what they feed into the interpreter. In the second case, it can help the
developer of the interpreter to debug its specification.
Note
To provide compatibility with lex/flex, the marker <<FAIL>> may be used
as an alternative to on_failure and <<EOF>> as an alternative to
on_end_of_stream.
The on_match and on_after_match handlers are not executed before and
after the on_failure handler. on_failure signals that nothing matched,
which contradicts the condition under which the two handlers operate. The
prologue of on_failure always sets the E_Error_OnFailure flag. If
this flag is not desired, it must be cleared explicitly inside the handler as shown
below.
on_failure {
    self.error_code_clear_this(E_Error_OnFailure);
    self.send_string(QUEX_TKN_ERROR, Lexeme);
}
in C++, or accordingly in C
on_failure {
    self.error_code_clear_this(&self, E_Error_OnFailure);
    self.send_string(&self, QUEX_TKN_ERROR, Lexeme);
}
It is not admissible to specify patterns which accept nothing. However,
actions such as a mode change on the event that nothing matched can be
implemented by on_failure and undo() as
on_failure {
    self.error_code_clear_this(E_Error_OnFailure);
    self.undo();
    self.enter_mode(NEW_MODE);
}
With undo() the lexeme unmatched by the current mode’s patterns is put
back to the input stream. It then gets a second chance to produce a match in
NEW_MODE. Care has to be taken, though, that at some point the
input position advances and the lexer does not iterate in infinite
cycles of on_failure, undo(), and mode transition.
The on_failure handler may be used to react to lexemes which are still
unknown when the lexer is written. This may be used to count strands of DNA,
for example, which are not identified as a sequence motif.
Also, the error flag must be cancelled, since otherwise the lexer may
break off the analysis.
on_failure {
    // C++ calling convention; in C, pass &self as first argument.
    self.error_code_clear_this(E_Error_OnFailure);
    self.send_string(QUEX_TKN_NO_MOTIF, Lexeme);
    self.undo_n(LexemeL-1); // continue analysis at next nucleotide
}
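The effect of such a handler can be modelled outside of Quex. The following sketch scans a nucleotide string for a set of motifs; where nothing matches, it records a NO_MOTIF entry and continues at the next nucleotide, which is what clearing the error flag and stepping back with undo_n() is meant to achieve. The function scan_motifs and the plain string tokens are hypothetical; they only imitate the behavior described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: returns one entry per matched motif and one "NO_MOTIF"
// entry per position where no motif matches.
std::vector<std::string> scan_motifs(const std::string& dna,
                                     const std::vector<std::string>& motifs) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < dna.size()) {
        std::size_t best = 0;  // length of the longest motif matching at 'pos'
        for (const std::string& m : motifs) {
            if (m.size() > best && dna.compare(pos, m.size(), m) == 0) {
                best = m.size();
            }
        }
        if (best > 0) {
            tokens.push_back(dna.substr(pos, best));
            pos += best;
        } else {
            // 'Failure': report it, then continue at the next nucleotide.
            tokens.push_back("NO_MOTIF");
            pos += 1;
        }
    }
    return tokens;
}
```

Because the position always advances by at least one nucleotide, the model cannot loop forever on an unmatched position.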
Note
Signaling an actual error by sending a FAILURE token instead of an
error flag has a significant disadvantage. An error flag immediately stops
the reception loop. A FAILURE token is stacked in the token queue. The
moment when the token is popped from the queue might be delayed so much
that the circumstances of the failure are no longer apparent.
|
|
Buffer loading failed for some unspecific reason. |
|
Arguments |
None |
Prologue |
Set |
Default |
Send |
Under normal conditions, this error must never occur. However, it occurs, for example, if a file has changed in the background, if someone inadvertently tampered with the analyzer’s data structures, or if a defective low-level file system driver is used.
The on_bad_lexatom handler is called upon the occurrence of a character which
is not present in the lexer’s character encoding. In that case, BadLexatom
carries the code unit which has no correspondence in the current encoding. If
a converter is used, the handler signals the occurrence of a conversion error.
In the latter case, BadLexatom has no valid meaning. The offending code unit
remains hidden inside the converter.
The bad lexatom detection can be disabled by the command line options
--no-bad-lexatom-detection or --nbld. The handler on_bad_lexatom
always has precedence over on_failure. That is, if --encoding ASCII is
specified as engine encoding and a value greater than 0x7F appears, an encoding
error is issued even if at the same time no pattern matches. on_bad_lexatom
may also detect buffer loads that contain the border sentinel.
Pattern Matching Adornments
Whenever a pattern matches on a lexeme in the input stream, the following sequence of actions is performed.
1. (If line/column counting active) Count line and column number.
2. (If off-side rule enabled) Call indentation handler.
3. (If there is a begin-of-line pre-context) Store the last character.
4. (If Lexeme appears in action) Store terminating zero at end of lexeme.
5. Evaluate on_match event handler.
6. Evaluate user action.
7. Evaluate on_after_match event handler.
With the handlers on_match and on_after_match it is possible to
define actions which are to be performed at any match without writing them
into every pattern action.
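The order in which the two handlers surround the pattern action can be sketched as follows. This is a hypothetical stand-alone model, not Quex's generated code; ModeModel and dispatch_match are names invented for illustration, and the counting and lexeme termination steps that precede the handlers are left out.

```cpp
#include <functional>
#include <string>

// Hypothetical model of a mode that may define the two match handlers.
struct ModeModel {
    std::function<void(const std::string&)> on_match;        // may be empty
    std::function<void(const std::string&)> on_after_match;  // may be empty
};

// Runs on_match, then the pattern's own action, then on_after_match.
void dispatch_match(const ModeModel& mode, const std::string& lexeme,
                    const std::function<void(const std::string&)>& action) {
    if (mode.on_match) mode.on_match(lexeme);
    action(lexeme);
    if (mode.on_after_match) mode.on_after_match(lexeme);
}
```

A typical use would be a logging or statistics action placed once in on_match instead of being repeated in every pattern action.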
|
|
Upon pattern match before the user’s action is executed. |
|
Arguments |
|
Prologue |
None |
Default |
None |
|
|
Upon pattern match after the user’s action is executed. |
|
Arguments |
|
Prologue |
None |
Default |
None |
To make sure that the on_after_match
handler is always executed, it is
essential that the return
statement is never used in any pattern action
directly. Immediate return can be triggered with the FLUSH
command. For an
immediate continuation of analysis, CONTINUE
must be used.
Note
The on_failure handler, or the <<FAIL>> pattern, actually handles
‘mismatches’. Consequently, the on_match and on_after_match handlers are not
executed in that case.
Mode Transitions
Upon a mode transition, the event handlers on_entry
and on_exit
are
executed.
|
|
Before entering the present mode. |
|
Arguments |
|
Prologue |
Assert transmission is admissible (in |
Default |
None |
|
|
Before exiting the present mode. |
|
Arguments |
|
Prologue |
Assert transmission is admissible (in |
Default |
None |
on_exit
is called before the mode transition is accomplished. on_entry
is called when the new mode has been set. Tokens may be sent from inside the
entry/exit handlers. However, the lexical analyzer cannot FLUSH
or
CONTINUE
immediately as it can upon pattern match. Tokens which are sent
from inside these handlers are stacked in the token queue.
Buffer Handling
|
|
Before exiting the present mode. |
|
Arguments |
|
Prologue |
None |
Default |
|
When new content is about to be loaded into the lexer’s buffer, there must
be enough free space. If there is not enough, the lexer tries to move content
towards the beginning. If this fails, because the current lexeme spans the
complete buffer, the event on_buffer_overflow is triggered. The
corresponding default handler tries to extend the current buffer, or to copy it
to a different location.
A customized on_buffer_overflow handler becomes mandatory when the user
applies their own buffer memory management. In embedded systems where dynamic
allocation is forbidden, a user-defined handler might be required. In that
case, the error code might be sufficient, and an empty but existing handler
suffices.
If, for any reason, one refers directly with pointers into the content of the lexical analyzer’s buffer, then it is essential to react to the event that the buffer content changes.
Note
The required adaptation of the pointers of the lexical analyzer’s engine
is never overwritten. Quex prevents a user-defined
on_buffer_before_change handler from overwriting actions which
are essential for the lexer’s functioning.
After a change in buffer content, it can no longer be assumed that a pointer
into the buffer points to the same content. If this is important, the handler
on_buffer_before_change offers the possibility to copy buffer content to a
safe location and change the referring pointers accordingly.
To handle buffer overflow, the following two functions assign new memory to the buffer or extend it. Actually, the two functions below are used internally to allocate buffer memory for included streams (that’s why ‘nested’ appears in their names). However, since operations on the buffer’s setup are very sensitive, they are the means of choice to assign new memory to the buffer [1].
- bool LEXER_Buffer_nested_extend(Buffer* me, ptrdiff_t SizeAdd)
Attempts to allocate new memory for the buffer and migrates the current buffer content to the new memory. Returns false if and only if that attempt fails.
Calls handler: on_buffer_before_change.
- bool LEXER_Buffer_nested_migrate(Buffer* me, LEXER* memory, const size_t MemoryLexatomN, E_Ownership Ownership)
Migrates the current buffer’s content to the specified memory chunk. Returns false if and only if that attempt fails.
Calls handler: on_buffer_before_change.
Since both functions potentially call the event handler on_buffer_before_change,
they cannot be called from inside it without risking infinite recursion.
Note
Inside event handlers, the current lexer buffer is available via
self.buffer
.
Finally, a handler may be specified for unforeseen errors related to buffer loading.
|
|
Buffer loading failed for some unforeseen reason. |
|
Arguments |
|
Prologue |
Set |
Default |
Send |
The related error occurs, for example, if a file has changed in the background, if someone inadvertently tampered with the analyzer’s data structures, or if a defective low-level file system driver is used.
Skippers
Range skippers exempt from consideration those stream ranges which lie
between a defined pair of opening and closing delimiters. In C, for example,
anything from /* to */ is considered a ‘comment’, that is, it does not
result in any syntactical token. The following event handler signals
when a range is not closed.
The on_skip_range_open handler is relevant for skip_range and
skip_nested_range. For a nested range skipper the Counter argument
additionally reports the nesting level, i.e. the number of missing closing
delimiters.
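How a nesting level comes about when a nested range stays open can be shown with a stand-alone model. The sketch below is not Quex's skipper; SkipResult and the free function skip_nested_range are hypothetical names, and the field open_n plays the role that the text attributes to the Counter argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct SkipResult {
    std::size_t end;     // index right after the skipped range, or input.size()
    int         open_n;  // closing delimiters still missing (0 = range closed)
};

// 'start' is the index right after the first opening delimiter.
SkipResult skip_nested_range(const std::string& input, std::size_t start,
                             const std::string& opener, const std::string& closer) {
    assert(!opener.empty() && !closer.empty());
    int depth = 1;
    std::size_t i = start;
    while (i < input.size()) {
        if (input.compare(i, closer.size(), closer) == 0) {
            i += closer.size();
            if (--depth == 0) return {i, 0};  // outermost range closed
        } else if (input.compare(i, opener.size(), opener) == 0) {
            i += opener.size();
            ++depth;                          // one more closer required
        } else {
            ++i;
        }
    }
    // End of input reached while the range is open: this is the situation
    // in which an 'open range' handler would be triggered.
    return {input.size(), depth};
}
```

For a non-nested range skipper the same loop applies without the opener branch, so the count of missing closing delimiters can only be one.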
Indentation Based Scopes
The default indentation handler already sends INDENT
, DEDENT
and
NODENT
tokens as soon as it is activated by the mode tag
<indentation:>
. If the behavior needs to be controlled in more detail, the
following event handlers may be used.
|
|
A line starts with a higher indentation than in the previous line. |
|
Arguments |
|
Prologue |
None |
Default |
Send |
The on_indent event handler is called whenever an indentation occurs,
right before the first non-whitespace character is treated.
|
|
A line starts with a lower indentation than in the previous line. |
|
Arguments |
|
Prologue |
None |
Default |
Send |
The on_dedent handler should send N DEDENT tokens. If repeated
tokens are enabled, a single repeated token communicates the N sendings.
Manually, the repeated token sending is accomplished by
|
|
A line starts with the same indentation as in the previous line. |
|
Arguments |
None |
Prologue |
None |
Default |
Send |
|
|
A line starts with a lower indentation not fitting any enclosing scope. |
|
Arguments |
|
Prologue |
Set |
Default |
None |
on_indentation_misfit handles the event that an indentation block was
closed, but the new indentation did not fit any open indentation domain.
IndentationStackSize gives the total size of the indentation stack.
IndentationStack[I] delivers the indentation on level I, IndentationUpper
delivers the highest indentation and IndentationLower the lowest.
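The four indentation events can be told apart with a small stack model. The following sketch illustrates the off-side rule as described above under simplifying assumptions; it is not Quex's indentation handler, and IndentEvent, IndentResult and on_line_begin are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class IndentEvent { Indent, Nodent, Dedent, Misfit };

struct IndentResult {
    IndentEvent event;
    std::size_t closed_n;  // number of closed scopes (non-zero only for Dedent)
};

// 'stack' holds the indentation of every open scope, lowest first. It must
// contain at least the base indentation (typically 0).
IndentResult on_line_begin(std::vector<int>& stack, int indentation) {
    assert(!stack.empty());
    if (indentation > stack.back()) {
        stack.push_back(indentation);         // a new scope opens
        return {IndentEvent::Indent, 0};
    }
    if (indentation == stack.back()) return {IndentEvent::Nodent, 0};

    // Lower indentation: look for the enclosing scope that it fits.
    std::size_t top = stack.size();
    std::size_t closed_n = 0;
    while (top > 1 && stack[top - 1] > indentation) { --top; ++closed_n; }
    if (stack[top - 1] != indentation) {
        return {IndentEvent::Misfit, 0};      // stack is left unchanged
    }
    stack.resize(top);                        // close 'closed_n' scopes
    return {IndentEvent::Dedent, closed_n};
}
```

With open scopes at indentations 0, 4 and 8, a line indented by 4 closes one scope, a line indented by 0 closes two, and a line indented by 2 or 6 fits no open scope and is a misfit.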
The on_indentation_bad handler is called as soon as a character occurs
which is specified as bad in the indentation handler definition. For
example, there is a general argument about spaces vs. tabulators
when it comes to indentation based scopes (the off-side rule). It
may be in the interest of a language designer to forbid one or the other and
define it as bad.
Summary
The previous subsections elaborated on event handlers related to lexical analysis. Events like match failure or end of stream should be handled by any lexer. Others enable one to react to buffer content changes, or to customize an indentation handling strategy.
Many event handlers are linked to error codes which are preset before their
activation. The error code assigned to the lexer’s .error_code member for each
of these handlers is given in the table ‘Event handlers and implicitly set
error codes’ below.
event handler |
error code |
---|---|
on_bad_lexatom |
|
on_failure |
|
on_indentation_bad |
|
on_indentation_misfit |
|
on_load_failure |
|
on_skip_range_open |
|
In the introductory examples (e.g. Make It!) the .error_code
has always been checked at the beginning of the token reception loop. This
ensures that the analysis process stops immediately upon the event of an error.
Footnotes