Abstract

The Program

Quex is a tool to generate lexical analyzers. A lexical analyzer is a program that transforms a stream of characters into a stream of 'atomic chunks of meaning', as shown in the figure below:

The lexical analysis process produces a list of these 'atomic chunks of meaning', so-called 'tokens', which prepares the input for interpretation at a higher level. Each token consists of at least a type identifier. Additionally, some parameters of the matching 'lexeme' may be stored in the token.
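
To make the idea of a token concrete, here is a minimal sketch of a hand-written tokenizer in Python. It is not Quex input syntax and not the code Quex generates: it uses a regular-expression table, whereas Quex emits directly coded analyzers. The token names (NUMBER, IDENTIFIER, OPERATOR, WHITESPACE) and the `Token` and `tokenize` names are assumptions chosen for illustration only.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type_id: str  # the token's type identifier
    lexeme: str   # the matched text, stored as an extra parameter


# Hypothetical token set for illustration; this is not Quex pattern syntax.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[^\W\d]\w*"),   # a letter or underscore, then word characters
    ("OPERATOR", r"[+\-*/=≠¬]"),
    ("WHITESPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Turn a stream of characters into a stream of tokens."""
    position = 0
    while position < len(text):
        match = MASTER.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character {text[position]!r} at index {position}")
        position = match.end()
        if match.lastgroup != "WHITESPACE":  # whitespace separates tokens but is not one
            yield Token(match.lastgroup, match.group())
```

Calling `list(tokenize("x ≠ 42"))` yields three tokens: an IDENTIFIER with lexeme `x`, an OPERATOR with lexeme `≠`, and a NUMBER with lexeme `42`.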

Quex provides a convenient means to describe a process of lexical analysis. It generates code in C/C++, which implements the user's lexical analyzer.

Features

  • Produces directly coded lexical analyzers rather than table-based engines.
  • Sophisticated lexical analyzer modes which allow mode inheritance and mode transitions.
  • Sophisticated buffer management which includes free tell/seek based on character indices, even with encodings in which characters have variable size (e.g. UTF-8, UTF-16).
  • Support for a large variety of international character encodings relying on established conversion libraries (IBM's ICU or GNU's IConv).
  • Support for include stacks.
  • Inherent token handling (queue or single token). Support for customized token types.
  • Event handlers allow actions to be triggered on mode transitions, indentation events and other analysis-related events.
  • Many examples that demonstrate its usage are provided along with the software.
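
The tell/seek feature listed above matters because, in a variable-size encoding, the n-th character does not sit at the n-th byte. The following Python sketch illustrates the problem only; it is not Quex's buffer implementation, and the function names `char_start_offsets` and `read_chars` are hypothetical.

```python
from typing import List


def char_start_offsets(data: bytes) -> List[int]:
    """Return the byte offset at which each UTF-8 character starts."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx;
    # every other byte begins a new character.
    return [i for i, byte in enumerate(data) if (byte & 0xC0) != 0x80]


def read_chars(data: bytes, char_index: int, count: int) -> str:
    """Seek to a character index (not a byte offset) and read `count` characters."""
    offsets = char_start_offsets(data) + [len(data)]  # sentinel marks the end of data
    start = offsets[char_index]
    end = offsets[min(char_index + count, len(offsets) - 1)]
    return data[start:end].decode("utf-8")
```

For the UTF-8 bytes of `a≠b¬c`, the characters start at byte offsets 0, 1, 4, 5 and 7, so seeking to character index 3 means jumping to byte 5. A real buffer would avoid rescanning the data on every seek; this sketch recomputes the offsets each time for brevity.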

Mission

The vision of the author of quex was to open the door to high-speed interpreted languages beyond the bounds of traditional ASCII character sets. It should be possible to write language constructs that include classical math symbols such as '≠' and '¬' as well as identifiers made up of characters from all scripts of the world.

The sophisticated analyzer modes shall further facilitate the implementation of reduced-redundancy languages. That is, some tokens may be derived from the current state of the analyzer or its state transitions rather than being triggered by source code elements. This helps to reduce the visual noise of a programming language.
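
One familiar instance of tokens that come from analyzer state rather than from explicit source characters is indentation-based block structure. The Python sketch below is an assumption-laden illustration, not Quex's mechanism: the token names INDENT, DEDENT and LINE and the function `indentation_tokens` are hypothetical, only spaces count as indentation, and a dedent to a width that was never opened is not reported as an error.

```python
from typing import List, Tuple


def indentation_tokens(lines: List[str]) -> List[Tuple[str, str]]:
    """Emit INDENT/DEDENT tokens from changes in indentation width."""
    tokens: List[Tuple[str, str]] = []
    stack = [0]  # analyzer state: the currently open indentation widths
    for line in lines:
        if not line.strip():
            continue  # blank lines do not change the block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", ""))  # no source character produced this token
        while width < stack[-1]:
            stack.pop()
            tokens.append(("DEDENT", ""))
        tokens.append(("LINE", line.strip()))
    while len(stack) > 1:  # close the blocks still open at end of input
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens
```

For the three lines `if x`, `    y` and `z`, the result is LINE, INDENT, LINE, DEDENT, LINE. The block delimiters appear in the token stream even though the source contains no braces or `end` keywords.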

In short, the author wishes to contribute to the implementation of clear and beautiful programming languages while, at the same time, making that task easier for the person who implements them.