Unary Graph Operations

Elementary compositions only operated by plugging ε-transitions in the original DFAs and did not process their graphs. This section is dedicated to operations on a single DFAs graph.

Reversible Operations

A modification is reversible, if for all possible operands A it is possible to find a procedure that re-generates the original A from its modification. There are two reversible graph operations, namely reversion and complement.

\\R{ RE }

Reversion: Matches any lexeme that can be described as a reversed lexeme matched by RE.

For example, \R{World} is equivalent to dlroW. This feature is useful for definitions of patterns of right-to-left writing systems such as Arabic, Binti and Hebrew. While in left-to-right mode, the words can be written in their original sequence. The reversion operator reorders the characters so that they match in reverse direction. Twofold reversion is equivalent to identity, for any DFA P it holds:

\R{\R{P}} == P
\\Not{ RE }

Complement: Matches any lexeme not matched by RE. The complement of the complement is equivalent to identity, namely for any DFA P, it holds:

\Not{\Not{P}} == P
../_images/state-machine-ieee-float.svg

Fig. 19 DFA matching IEEE floating point numbers.

In both cases, the modifications to the DFAs are non-trivial. fig:re-modifications-ieee-float shows a DFA that matches IEEE floating point numbers. The reversion operation consists:

#1 Swapping the direction of each transition. #2 Connecting all acceptance states via ε-transitions. #3 Transforming the ε-closure of acceptance states into an single initial state. #4 Transforming the initial state into an acceptance state. #5 Power set construction to obtain a DFA.

../_images/state-machine-ieee-float-reverse.svg

Fig. 20 Reversion of the DFA matching IEEE floating point numbers.

The result of the reverted IEEE floating point number DFA is displayed in fig:re-modifications-ieee-float-reverse. The complement operation is, actually, implemented as difference between \Universe and the given DFA. For that, the intersection operation is required, which is defined in a later section TODO. The resulting DFA as seen in fig:re-modifications-ieee-float-complement, also, is clearly not trivially derived from its source.

../_images/state-machine-ieee-float-complement.svg

Fig. 21 Complement of the DFA matching IEEE floating point numbers.

Irreversible Operations

\\AccFlush{R}

Makes all states in R acceptance states. This operation is irreversible, because once all states are acceptance states, all information is lost about what they were before.

\\AccInv{R}

Makes all non-acceptance states in R acceptance states and all acceptance states non-acceptance states. This operation is irreversible because an acceptance state transformed into a non-acceptance state may leave a branch is eliminated, because there is no acceptance at the end. Once a branch is eliminated, all information is lost about what state there was before.

\\Stem{R}

Generates a DFA with a graph consisting of paths which stop at the first reached acceptance state of R. This operation cuts any state beyond the first acceptance, and therefore, is irreversible.

\\Crown{R}

Generates a DFA with a graph consisting of paths starting from first acceptance states of R. This operation cuts any state before the first acceptance, and therefore, is irreversible.

\\InfDev{R}

Matches the infimum deviation of R. This expression accepts when the first character appears that wards off R from matching. An infimum deviation to not match lexemes which start with lexemes matched by R. As shown in figure Fig. 22 the infimum deviation of an IEEE floating point number only contains paths right before the first acceptance state. Any sequence beyond would match an IEEE floating point number, even if the tail does not fit completely. As can be seen in Fig. 19, a . or +/- followed by a . does not constitute a floating point number. However, with the appearance of the first digit [0-9], a sequence matches a floating point number. Then, the sequence cannot be part of the infimum deviation.

../_images/state-machine-ieee-float-infimum-deviation.svg

Fig. 22 Infimum deviation of the DFA matching IEEE floating point numbers.

\\First{R}

Generates a DFA that consists solely of the first transitions of R. For example, \First{123|abc} is equivalent to [1a].

\\NotFirst{R}

Generates a DFA that consists solely of the complement of the first transitions of R. For example, \First{123|abc} is equivalent to [^1a].

Sanity

Regular expressions may, actually, describe dysfunctional patterns. A dysfunctional pattern has at least one of the following properties.

  • It matches the zero-length lexeme. This would make the lexer accept without consuming any further lexatom from the input stream, if no other pattern matches. The lexer would stall in the loop of accepting nothing and not progressing.

  • There exists a state from where it matches on arbitrary repetitions of any lexatom. If this state is reached the complete input stream would be consumed. Since the matching pattern is longer than any other, all other patterns were outrun and dominated.

The singular DFAs \Nothing and \Universe from sec:re-singular are the prime example for dysfunctional patterns. \Nothing directly implements the first condition. \Universe implements the second condition. Notably, \Empty is not dysfunctional. It does not accept the empty lexeme and does not consume an arbitrary number of arbitrary lexatoms. It does what it is expected to do: it never ever matches. Dysfunctionality can be healed by the \Sanitize command.

\\Sanitize{P}

Sanitizes a pattern with regards to two issues. It removes a possible acceptance of the zero-length lexeme. It removes any possible acceptance of of arbitrary repetitions of arbirtrary lexatoms.

The complement of any functional DFA is by definition dysfunctional. To acquire a complement for practical use, it must be sanitized. For this, there exists the following shorthand:

\\A{ R }

Matches anything not matching R but not the zero-length lexeme and not an arbitrary number of repetitions at the end.

../_images/state-machine-ieee-float-sanitized-complement.svg

Fig. 23 Sanitized complement the DFA matching IEEE floating point numbers.

Fig. 23 shows the sanitized complement of an IEEE floating point number. It differs from the complement, as shown in Fig. 21 only, in that the initial state does not accept and state 8 does not iterate on \Any.

Note

When sanitization is applied, instead of a propper definition, it is advisable to review the resulting DFA. This can be done within the define section with the \plot command. For example,:

define {
    ...
    RAW_IDENTIFIER  {LETTER}{MORE}*
    IDENTIFIER      \Sanitize{RAW_IDENTIFIER}
    \plot (fh)      raw.dot       {RAW_IDENTIFIER}
    \plot (fh)      sanitized.dot {IDENTIFIER}
}

creates state machine graphs for the raw and the sanitized implementation of IDENTIFIER in the files raw.dot and sanitized.dot. With the graphviz package, the commands:

> dot -Tpdf raw.dot       -o raw.pdf
> dot -Tpdf sanitized.dot -o sanitized.pdf