Input

The lexical analysis process runs on lexatoms, which are represented as integers. For the sake of efficiency, they are stored in adjacent, equal-sized memory cells close to the CPU, called the buffer [1]. This section elaborates on the process of filling the buffer with lexatoms.

If the input source is a file, it can be specified by passing its name to a lexer’s constructor. For this to work, the file must be accessible via the standard library’s file input/output API. This does not cover cases where input is received over sockets, via an interactive command line, or directly from memory. To access raw input data via an arbitrary low-level API, Quex provides an abstraction layer: the byte loaders. That is, an instance of the struct (C) or class (C++) ByteLoader must provide a set of pointers to functions that implement an expected behavior.
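The following is a minimal sketch of the idea, not Quex’s actual ByteLoader definition: a C struct of function pointers with tell/seek/load semantics, plus one implementation that reads directly from memory. The names ByteLoaderSketch, MemoryByteLoader, and memory_byte_loader_init are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte loader interface (NOT Quex's real ByteLoader members):
 * a struct of function pointers with tell()/seek()/read()-like behavior. */
typedef struct ByteLoaderSketch ByteLoaderSketch;
struct ByteLoaderSketch {
    size_t (*tell)(ByteLoaderSketch *self);
    void   (*seek)(ByteLoaderSketch *self, size_t position);
    /* Copies up to 'n' bytes into 'dest'; returns the number copied. */
    size_t (*load)(ByteLoaderSketch *self, unsigned char *dest, size_t n);
};

/* A byte loader reading from memory, a case that passing a file name
 * to the constructor cannot handle. */
typedef struct {
    ByteLoaderSketch     base;     /* first member: allows casting back */
    const unsigned char *data;
    size_t               size;
    size_t               position;
} MemoryByteLoader;

static size_t memory_tell(ByteLoaderSketch *self)
{
    return ((MemoryByteLoader *)self)->position;
}

static void memory_seek(ByteLoaderSketch *self, size_t position)
{
    MemoryByteLoader *me = (MemoryByteLoader *)self;
    /* Clamp to the end of the data instead of seeking past it. */
    me->position = position < me->size ? position : me->size;
}

static size_t memory_load(ByteLoaderSketch *self, unsigned char *dest, size_t n)
{
    MemoryByteLoader *me        = (MemoryByteLoader *)self;
    size_t            available = me->size - me->position;
    size_t            count     = n < available ? n : available;
    if (count > 0) {
        assert(dest != NULL);
        memcpy(dest, me->data + me->position, count);
    }
    me->position += count;
    return count;
}

static void memory_byte_loader_init(MemoryByteLoader *me,
                                    const unsigned char *data, size_t size)
{
    me->base.tell = memory_tell;
    me->base.seek = memory_seek;
    me->base.load = memory_load;
    me->data      = data;
    me->size      = size;
    me->position  = 0;
}
```

Because all access goes through the function pointers of the base struct, code consuming the bytes does not need to know whether they come from memory, a socket, or a file. Embedding the base struct as the first member lets each implementation recover its own state with a cast.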

If the integer values of incoming bytes need no further interpretation, as when they are encoded in ASCII, they may be filled directly into the buffer. However, when some type of encoding or encryption is applied, the raw bytes need to be prepared. To handle such situations, the raw bytes are passed through customizable converters, managed by lexatom loaders. Lexatom loaders fill the buffer of a lexer [2]. Fig. 29 depicts the complete chain: raw data is loaded through a byte loader, converted via a lexatom loader, and filled into the buffer, on whose data lexical analysis then runs. By default, the complete lexatom loading management happens automatically in the background.
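A minimal sketch of such a chain, under simplifying assumptions: a hypothetical ByteSource stands in for the byte loader, the example converter turns UTF-16LE byte pairs into 16-bit lexatoms (code units, surrogates not combined), and a hypothetical LexatomLoaderSketch tops up a small raw-byte buffer and fills the lexer’s buffer. None of these names belong to Quex’s API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical raw byte source standing in for a byte loader. */
typedef struct {
    const unsigned char *data;
    size_t size, position;
} ByteSource;

static size_t byte_source_load(ByteSource *src, unsigned char *dest, size_t n)
{
    size_t available = src->size - src->position;
    size_t count     = n < available ? n : available;
    if (count > 0) memcpy(dest, src->data + src->position, count);
    src->position += count;
    return count;
}

/* Converter: raw bytes -> lexatoms. Returns lexatoms produced and
 * stores the number of bytes used in *consumed. */
typedef size_t (*Converter)(const unsigned char *bytes, size_t n,
                            uint16_t *lexatoms, size_t capacity,
                            size_t *consumed);

/* Example converter: each UTF-16LE byte pair becomes one 16-bit lexatom.
 * A trailing odd byte stays unconsumed until more input arrives. */
static size_t convert_utf16le(const unsigned char *bytes, size_t n,
                              uint16_t *lexatoms, size_t capacity,
                              size_t *consumed)
{
    size_t count = 0;
    while (count < capacity && 2 * count + 1 < n) {
        lexatoms[count] = (uint16_t)(bytes[2 * count] | (bytes[2 * count + 1] << 8));
        ++count;
    }
    *consumed = 2 * count;
    return count;
}

typedef struct {
    ByteSource   *source;
    Converter     convert;
    unsigned char raw[8];   /* small translation buffer for raw bytes */
    size_t        raw_fill; /* bytes currently held in 'raw' */
} LexatomLoaderSketch;

/* Fills 'buffer' with up to 'capacity' lexatoms; returns how many were written. */
static size_t lexatom_loader_fill(LexatomLoaderSketch *me,
                                  uint16_t *buffer, size_t capacity)
{
    size_t written = 0;
    while (written < capacity) {
        /* Top up the raw buffer from the byte source. */
        size_t loaded = byte_source_load(me->source, me->raw + me->raw_fill,
                                         sizeof(me->raw) - me->raw_fill);
        me->raw_fill += loaded;

        size_t consumed = 0;
        size_t produced = me->convert(me->raw, me->raw_fill, buffer + written,
                                      capacity - written, &consumed);
        written += produced;

        /* Keep unconverted bytes (e.g. half a code unit) for the next round. */
        memmove(me->raw, me->raw + consumed, me->raw_fill - consumed);
        me->raw_fill -= consumed;

        if (produced == 0 && loaded == 0) break; /* input exhausted */
    }
    return written;
}
```

Bytes that cannot be converted yet, such as half of a code unit at the end of a chunk, remain in the raw buffer until the next fill. This is one reason a converting loader needs intermediate storage between the byte level and the lexatom level.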


Fig. 29 Input provision from API abstraction to lexatoms.

The lexer, the buffer, the lexatom loader, and the byte loader interact closely. This makes it possible to navigate the input stream at the lexatom level. The lexer interacts with the buffer via commands that set the lexatom-based input position. The buffer interacts with the lexatom loader via a traditional API similar to ‘tell(), seek(), and read()’. Internally, it manages translation buffers so that it can fulfill the positioning requests of the lexer. The byte loader also provides an API similar to ‘tell(), seek(), and read()’, but at the byte level. How a particular byte loader interacts with the low-level API it abstracts is, of course, API specific.
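To illustrate why lexatom-based positioning needs translation, the following sketch assumes UTF-8 input and a hypothetical memory-backed ByteStream offering byte-level tell/seek/read. Since a UTF-8 lexatom occupies one to four bytes, the byte offset of lexatom n cannot be obtained by multiplication; here it is found by rewinding and counting lead bytes. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-level stream with tell/seek/read semantics. */
typedef struct {
    const unsigned char *data;
    size_t size, position;
} ByteStream;

static size_t byte_tell(const ByteStream *s) { return s->position; }

static void byte_seek(ByteStream *s, size_t p)
{
    s->position = p < s->size ? p : s->size; /* clamp to end */
}

/* Returns the next byte, or -1 at the end of the stream. */
static int byte_read(ByteStream *s)
{
    return s->position < s->size ? s->data[s->position++] : -1;
}

/* Lexatom-level seek for UTF-8: rewind, then step over 'n' lexatoms.
 * A byte whose top bits are not 10xxxxxx starts a new lexatom.
 * Returns the resulting byte offset (end of stream if n is too large). */
static size_t lexatom_seek_utf8(ByteStream *s, size_t n)
{
    size_t seen = 0;
    int    b;
    byte_seek(s, 0);
    while ((b = byte_read(s)) != -1) {
        if ((b & 0xC0) != 0x80) {            /* lead byte found */
            if (seen == n) {                 /* target reached: step back onto it */
                byte_seek(s, byte_tell(s) - 1);
                return byte_tell(s);
            }
            ++seen;
        }
    }
    return byte_tell(s);
}

/* Lexatom-level tell: count lexatom starts before the current byte offset. */
static size_t lexatom_tell_utf8(ByteStream *s)
{
    size_t end = byte_tell(s), count = 0;
    byte_seek(s, 0);
    while (byte_tell(s) < end) {
        int b = byte_read(s);
        if ((b & 0xC0) != 0x80) ++count;
    }
    return count;                            /* position is back at 'end' */
}
```

Rewinding on every request costs time linear in the stream length, so this only demonstrates the mapping between the two levels. A practical implementation would likely remember previously visited positions instead of recounting from the start.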

The description of a minimalist application of byte loaders and lexatom loaders opens this chapter. Then, after the concept of a lexer’s buffer is explained, byte loaders and lexatom loaders are discussed in detail. Next, it is shown how manual buffer loading is implemented as an alternative to automatic loading. The input stream of a lexer is specified on only four occasions, namely construction, reset, inclusion, and return from inclusion. A dedicated section explains these procedures. The final subject is stream navigation.

Footnotes