This chapter is dedicated to the subject of how to improve a lexical analyzer in terms of computational speed and memory footprint. It is not intended as a ‘silver bullet’ that guides ultimately to the optimal configuration. However, it introduces some basic thoughts which are crucial for optimization. This chapter starts by the introduction of two mechanisms that reduce the code size of the lexical analyzer: template and path compression. Then the influence of the token queue is discussed. Finally, the involvement of converters via engine based codecs is investigated. Were possible the findings are supported by data accessed by some sample applications which are available in the quex distribution package directory ‘demo/tuning/’.
Before any measurements can be made the following points need to be assumed:
A tuning setup is mostly the contrary of a debug scenario. For debugging all asserts should be active, print-outs prosper, and the buffer size is as small as possible to check out reload scenarios. Using a a debug setup for tuning, though, safely drives the development into nonsensical directions.
Quex introduces a powerful feature to reduce the code size of a lexical analyzer of a lexical analyzer: Template Compression. Template compression combines multiple states into a single state that changes slightly dependent on the particular state that is represents. It can be activated with the command line option:
> quex ... --template-compression
The aggressivity of the compression can be controlled by the template compresison ‘min-gain’ to be set by the command line argument:
> quex ... --template-compression-min-gain [number]
It represents the minimum of estimated bytes that could be spared before two states are considered for combination into a template state. If only states with the same entry and drop out section should be considered then the following option can be specified:
> quex ... --template-compression-uniform
Especially unicode lexical analyzers large transition maps can profit tremendously from template compression. If your compiler supports computed gotos try to set QUEX_OPTION_COMPUTED_GOTOS on the compilers’ command line, e.g. with GNU C++:
> g++ -DQUEX_OPTION_COMPUTED_GOTOS ... MyLexer.cpp -c -o MyLexer.o
This is displayed in the figures Three states to be compressed by Template compression. and Template state representing states 0, 1, and 2 from figure fig-template-compression-before.. The first figure displays three states which have a somehow similar transition map, that is they trigger on similar states to similar target states. A so called ‘template state’ must be able to detect all character ranges involved. An additional matrix helps to find the particular target state for the represented state.
Figure Template state representing states 0, 1, and 2 from figure fig-template-compression-before. displays a template state that shall represent state 0, 1, and 2 from figure Three states to be compressed by Template compression.. Every state is replaced by a small stub that defines a ‘state key’. The role of the state key is to define the behavior of the template state. In particular, it defines the target states that belong to certain trigger sets. State 0, 1, and 2 differ with respect to the character ranges [f-m] and [o-s]. For these ranges the transitions differ. The target states for these ranges are stored in the arrays X0 and X1. X0[i] represents the target state for [f-m] if the template needs to represent the state i. Analogously, X1[i] helps to mimik the transitions for [o-s].
Template compression can combine any set of states but its efficiency depends on the similarity. For this reason the template compression coefficient (see option --template-compression-coefficient) has been introduced that defines how aggressively states shall be combined. That means how much lack of similarity can is tolerated and states can still be combined into sets.
Quex introduces another way of compression that profits from the sequential order of similar states. It identifies paths of single character sequences in the state machine. The command line argument:
activates the analyzis and compression. With this compression all states are considered to be combined into a path. As a result some special handling is implemented to distinguish the particularities of each state. If only uniform states shall be considered, the command line flag:
may be provided. Then the overhead of particularities is avoided, but less states may be combined. Path compression requires a path-end character. By default it is set to the character code 0x16 (38 decimal, ‘SYNCHRONOUS IDLE’) assuming that this never occurs in the data stream. If this character occurs in the stream to be analyzed, then the path termination character must be defined explicitly by:
Where the specified number must be different from the buffer limit code.
It applies if a sequence of single characters can be identified that guide along a path of states with matching transition maps. This requirement seems very rigid and thus the chance of hitting a state machine that contains such states may appear low. However, in practical applications this exactly the case where keywords, such as for, while, etc. intersect with the pattern of identifiers, e.g. [a-z]+. In other words, languages with many keywords may profit from this approach.
Instead of implementing for each state of the path a full state, only one state is implemented a so called ‘pathwalker’. A pathwalker consists of the transition map which is common to all states of the path and the path itself.
As with template compression using the computed feature of your compiler might improve performance and code size.
It is possible to combine multiple compression types simply by defining multiple of them on the command line. The result of applying path compression before or after template compression may be significantly different. The sequence of analysis corresponds to the sequence that the activating command line options appear, i.e.:
> quex ... --template-compression ... --path-compression ...
determines that template compression is performed before path compression. Uniform and non-uniform compression can be treated as separate procedures. Thus, it is possible to say for example:
> quex ... --template-compression-uniform \ --path-compression \ --template-compression
which first does a template compression of very similar states, then a general path compression of the remainder. Then whatever remains of states is tried to be combined by aggressive template compression.
It follows a list of general hints for performance tuning:
The three most important things to consider when improving performance are the CPU’s cache, the CPU’s cache, and the CPU’s cache. At the time of this writing (2010 C.E.) a cache-miss is by factors slower then a normal cache read. A fast program can be slowed down to ‘snail speed’ simply by excessive cache miss scenarios.
Practically, this means that the data that is access frequently is best kept close together, so that cache misses are less probable.
The forth important thing about improving performance is to avoid frequent system calls. For example, allocate memory in a chunk and then cut from it when needed, instead of calling new, delete, malloc or free all the time. You might also consider to implement containers yourself instead of relying in STL or similar libraries, if this allows you to control memory placement.
The fifth important thing is to use memcpy and memmove for copying of content. Especially for larger datasets. Few people can compete with the inside that is put into this functions. Simply compare a memcpy operation with a for loop doing the same thing. It is not seldom a factor of 40 between the two. Use memmove when source and destination may overlap.
[to be continued]