Command Line Options

This chapter sums up all command line options that can be passed to quex (version 0.71.2) together with their meaning. Most of the options are already explained in separate sections; the present enumeration serves as a quick reference. There are command line options for code generation, but also for queries. Each family of options is described in a separate section.

Code Generation

This section lists the command line options to control code generation.

-i [file name]+

The names following -i designate the files containing quex source code to be used as input.
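
For example, a hypothetical invocation with two input files (the file names are mere placeholders) might look like

> quex -i definitions.qx lexer.qx -o Lexer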

Default: empty list

-o, --analyzer-class [name ::]* name

This option defines the name (and possibly name space) of the lexical analyser class that is to be created. The name space can be specified by means of a sequence where names are separated by ::. At the same time, this name also determines the file stem of the output files generated by quex. For example, the invocation

> quex ... -o MySpace::MySubSpace::MySubSubSpace::Lexer

specifies that the lexical analyzer class is Lexer and that it is located in the name space MySubSubSpace, which in turn is located in MySubSpace, which in turn is located in MySpace.

If no name space is specified, the analyzer is placed in name space quex for C++ and in the root name space for C. If the analyzer shall be placed in the root name space, a :: must precede the class name. For example, the invocation

> quex ... -o ::Lexer

sets up the lexical analyzer in the root name space and

> quex ... -o Lexer

generates a lexical analyzer class Lexer in the default name space quex.

--insight

Prints insights into the construction process together with time stamps. This option is useful for large, complex, and time-consuming lexical analyzer specifications.

Default: false (disabled)

--output-directory, --odir directory

directory = name of the output directory where generated files are to be written. This does more than merely copy the sources to another place in the file system; it also changes the include file references inside the code to refer to the specified directory as a base.
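
For example, assuming a hypothetical project layout, the generated files can be directed to the directory src/generated as follows.

> quex -i lexer.qx -o Lexer --odir src/generated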

--file-extension-scheme, --fes scheme

Specifies the file stem and extensions of the output files. The provided argument identifies the naming scheme. The possible values for scheme and their results are listed below.

C++
  • No extension for header files that contain only declarations.

  • .i for header files containing inline function implementation.

  • .cpp for source files.

C
  • .h for header files.

  • .c for source files.

++
  • .h++ for header files.

  • .c++ for source files.

pp
  • .hpp for header files.

  • .cpp for source files.

cc
  • .hh for header files.

  • .cc for source files.

xx
  • .hxx for header files.

  • .cxx for source files.

If the option is not provided, the naming scheme depends on the --language command line option. For C, no alternative naming scheme is currently supported.
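
For example, assuming a hypothetical input file lexer.qx, the invocation

> quex -i lexer.qx -o Lexer --fes cc

produces header files ending in .hh and source files ending in .cc.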

--language, -l name

Defines the programming language of the output. name can be

  • C for plain C code.

  • C++ for C++ code.

  • dot for plotting information in graphviz format.

Default: C++

--computed-gotos, --cg

Generates code using GCC's computed goto feature.

Default: false (disabled)

--character-display hex|utf8

Specifies how the characters of the state transitions are to be displayed when --language dot is used.

  • hex displays the Unicode code point in hexadecimal notation.

  • utf8 displays the character ‘as is’ in UTF8 notation.

Default: utf8

--normalize

If this option is set, the output of --language dot will be a normalized state machine. That is, the state numbers start counting from zero. If this flag is not set, the state indices are the same as in the generated code.
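
As an illustration, the following hypothetical invocation plots the analyzer in graphviz format with hexadecimal character display and normalized state numbers.

> quex -i lexer.qx -o Lexer --language dot \
       --character-display hex --normalize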

Default: false (disabled)

--cbm, --config-by-macros

When this flag is set, the configuration file is set up so that the configuration can be overwritten by external macro definitions of the form -D…=….
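
A sketch of the intended workflow, assuming that the configuration macro QUEX_SETTING_BUFFER_SIZE (mentioned further below) is to be overwritten at compile time:

> quex -i lexer.qx -o Lexer --cbm
> g++ -DQUEX_SETTING_BUFFER_SIZE=4096 -c Lexer.cpp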

Default: false (disabled)

--cbcm, --config-by-cmake

When this flag is set, the configuration file is set up so that the configuration relies on CMake's configuration file feature.

Default: false (disabled)

--version-id string

string = arbitrary name of the version that was generated. This string is reported by the version() member function of the lexical analyzer.

Default: 0.0.0-pre-release

--no-mode-transition-check

Turns off the mode transition check and makes the engine a little faster. During development this option should not be used; the final lexical analyzer, however, should be created with this option set.

Default: true (not disabled)

--mode-stack-size, --mss number

Size of the mode stack required for GOSUB/RETURN mode transitions.

Default: 64

--indentation-stack-size, --indss number

Size of the indentation stack required for the off-side rule, i.e. indentation based block delimiters.

Default: 1024

--no-count-lines, --ncl

Lets quex generate an analyzer without internal line counting.

Default: true (not disabled)

--no-count-columns, --ncc

Lets quex generate an analyzer without internal column counting.

Default: true (not disabled)

--not-eol-is-eos, --neie

Disables the implicit ‘end-of-stream’ condition when ‘end-of-line’ is specified.

Default: true (not disabled)

--not-bol-is-bos, --nbib

Disables the implicit ‘begin-of-stream’ condition when ‘begin-of-line’ is specified.

Default: true (not disabled)

To support derivation from the generated lexical analyzer class, the following command line options can be used.

--derived-class, --dc name

name = If specified, the name of the derived class that the user intends to provide (see section <<sec-formal-derivation>>). Note that specifying this option signals that the user wants to derive from the generated class. If this is not desired, this option and the following one have to be left out. The name space of the derived analyzer class is specified analogously to the specification for --analyzer-class, as mentioned above.

--derived-class-file file name

file-name = If specified, the name of the file where the derived class is defined. This option only makes sense in the context of option --derived-class.
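
For example, using hypothetical names, a derived class MyLexer defined in the file MyLexer.hpp is announced as follows.

> quex -i lexer.qx -o Lexer --derived-class MyLexer \
                            --derived-class-file MyLexer.hpp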

--token-id-prefix prefix

prefix = Name prefix to prepend to the names given in the token-id files. For example, if a token section contains the name COMPLEX and the token-id prefix is TOKEN_PRE_, then the token-id inside the code will be TOKEN_PRE_COMPLEX.

The token prefix can contain name space delimiters, i.e. ::. In the brief token senders the name space specifier can be left out.
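
For example, the following hypothetical invocation lets the token id COMPLEX from the example above appear in the generated code as TOKEN_PRE_COMPLEX.

> quex -i lexer.qx -o Lexer --token-id-prefix TOKEN_PRE_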

Default: QUEX_TKN_

--token-queue-size number

In conjunction with the token passing policy ‘queue’, number specifies the number of tokens in the token queue. This determines the maximum number of tokens that can be sent without returning from the analyzer function.

Default: 64

--token-id-offset number

number = Number at which the numeric values of the token ids start counting. Note that this does not include the standard token ids for termination, uninitialized, and indentation error.

Default: 10000

Certain token ids are standard, in the sense that they are required for a functioning lexical analyzer, namely TERMINATION and UNINITIALIZED. Their default values do not follow the token id offset, but are 0 and 1. If they need to be different, they must be defined in the token { … } section, e.g.

token {
    TERMINATION   = 10001;
    UNINITIALIZED = 10002;
    ...
}

A file with token ids can be provided by the option

--foreign-token-id-file file name [[begin-str] end-str]

file-name = Name of the file that contains an alternative definition of the numerical values for the token-ids.

Note that quex does not reflect on actual program code; it extracts the token ids by means of a heuristic. The optional second and third arguments allow restricting the region of the file that is searched for token ids. The search starts at a line that contains begin-str and stops at the first line containing end-str. For example

> quex ... --foreign-token-id-file my_token_ids.hpp   \
                                   yytokentype   '};' \
           --token-id-prefix       Bisonic::token::

reads only the token ids from the enum yytokentype in that file.

Default: empty list

--foreign-token-id-file-show

If this option is specified, then Quex prints out the token ids which have been found in a foreign token id file.

Default: false (disabled)

The following options support the definition of an independently customized token class:

--token-class-file file name

file name = Name of the file that contains the definition of the token class. The setting provided here is possibly overwritten if the token_type section defines a file name explicitly.

--token-class, --tc [name ::]+ name

name is the name of the token class. Using ‘::’ separators it is possible to define the exact name space, as mentioned for the --analyzer-class command line option.

--token-class-support-take-text, --tcstt

When an external token class is specified that provides the ‘take_text’ member function, this option needs to be given.

Default: false (disabled)

--token-id-type type name

type-name defines the type of the token id.

--token-line-n-type type name

type-name defines the type of the token line number member variable of the token class.

--token-column-n-type type name

type-name defines the type of the token column number member variable of the token class.

--token-repetition-n-member-name, --trnmn string

string defines the token class’ member name that is supposed to carry the repetition number of a token.

--no-token-stamp-line-count, --ntslc

When supplied, tokens are not stamped with the line number of their occurrence.

Default: true (not disabled)

--no-token-stamp-column-count, --ntscc

When supplied, tokens are not stamped with the column number of their occurrence.

Default: true (not disabled)

--no-token-stamp, --nts

When supplied, tokens are stamped with neither the line number nor the column number of their occurrence.

Default: false (disabled)

--token-class-only, --tco

When specified, quex only creates a token class. This token class differs from the default token classes in that it may be shared between multiple lexical analyzers.

Note

When this option is specified, the LexemeNull is implemented along with the token class. In this case all analyzers that use the token class shall define --lexeme-null-object according to the token name space.

Default: false (disabled)

There may be cases where the character used to indicate the buffer limit needs to be redefined, because its default value appears in a pattern. For most encodings, such as ASCII and Unicode, the buffer limit code does not intersect with the code points of valid characters. Theoretically, however, the user may define buffer encodings that require a different definition of the limiting code. The following option allows modification of the buffer limit code:

--buffer-limit number

Defines the value used to mark buffer borders. Since version 0.70.0 the character is no longer excluded from occurring in the input stream; the lexer double-checks on content borders.

Default: 0

On several occasions quex produces code related to ‘newline’. The coding of newline has two traditions: the Unix tradition, which codes it plainly as 0x0A, and the DOS tradition, which codes it as 0x0D followed by 0x0A. To be on the safe side, quex by default codes newline as an alternative of both. In case the DOS tradition is not relevant, some performance improvement might be achieved by disabling the ‘0x0D, 0x0A’ alternative. This can be done with the following flag.

--no-DOS

If specified, the DOS newline (0x0D, 0x0A) is not considered whenever newline is required.

Default: true (not disabled)

Input encodings other than ASCII or UTF32 (which map 1:1 to Unicode code points) can be used in two ways: either one uses a converter that translates the file content into Unicode while the engine still runs on Unicode, or the engine itself is adapted to the required encoding.

Currently, quex-generated lexers can interact with GNU IConv and IBM's ICU library as input converters. Using one of those requires, of course, that the corresponding library is installed and available. On Unix systems, the iconv library is usually present. ICU likely needs to be installed, but it is also freely available. Using input converters such as IConv or ICU is a flexible solution: the converter can be exchanged dynamically while the internal engine keeps running on Unicode. Alternatively, the engine can run directly on a specific encoding, i.e. without a conversion to Unicode. This approach is less flexible, but may be faster.

--encoding encoding name

Specifies an encoding for the generated engine. The encoding name specifies the encoding of the internal analyzer engine. An engine generated for a specific encoding can only analyze input of this particular encoding.

Note

When --encoding is specified, the command line flag -b or --buffer-element-size does not represent the number of bytes per character, but the number of bytes per code element. The encoding UTF8, for example, is of dynamic length and its code elements are bytes, so only -b 1 makes sense. UTF16 triggers on elements of two bytes, while the number of elements per character varies; for UTF16, only -b 2 makes sense.
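
For example, an engine running directly on UTF8 might be generated as sketched below (input file name hypothetical); since UTF8 code elements are bytes, -b 1 is the matching buffer element size.

> quex -i lexer.qx -o Lexer --encoding utf8 -b 1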

Default: unicode

--encoding-file file name

By means of this option a freely customized encoding can be defined. The file name determines both the file where the encoding mapping is described and the encoding's name: the encoding's name is the directory-stripped and extension-less part of the given file name. Each line of such a file must consist of three numbers that specify ‘source interval begin’, ‘source interval length’, and ‘target interval begin’. Such a line specifies how a cohesive Unicode character range is mapped to the number range of the customized encoding. For example, the mapping for encoding iso8859-6 looks like the following.

0x000 0xA1 0x00
0x0A4 0x1  0xA4
0x0AD 0x1  0xAD
0x60C 0x1  0xAC
0x61B 0x1  0xBB
0x61F 0x1  0xBF
0x621 0x1A 0xC1
0x640 0x13 0xE0

Here, the Unicode range from 0 to 0xA1 is mapped one to one from Unicode to the encoding. 0xA4 and 0xAD are also the same as in Unicode. The remaining lines describe how Unicode characters from the 0x0600 page are mapped into the range between 0xAC and 0xFF.
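
Assuming the mapping above is stored in a file iso8859-6.dat (a hypothetical name), it can be used as sketched below; the encoding's name is then iso8859-6.

> quex -i lexer.qx -o Lexer --encoding-file iso8859-6.dat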

Note

This option is only to be used if quex does not support the encoding directly. The options --encoding-info and --encoding-for-language help to find out whether quex directly supports a specific encoding. If an --encoding-file is required, it is advisable to use --encoding-info-file file-name.dat to check whether the mapping is in fact as desired.

--no-bad-lexatom-detection, --nbld

If present, the encoding error detection is turned off. That also means that the ‘on_bad_lexatom’ handler can never be called.

Default: true (not disabled)

The buffer on which a generated analyzer runs is characterized by its size (macro QUEX_SETTING_BUFFER_SIZE), by the size of its elements, and by their type. The latter two can be specified on the command line.

In general, a buffer element contains what causes a state transition in the analyzer. In ASCII code, a state transition happens on one byte which contains a character. If converters are used, the internal buffer runs on plain Unicode; here, too, a character occupies a fixed number of bytes. The check mark sign in 4-byte Unicode is coded as 0x00002713. It is treated as one chunk and causes a single state transition.

If the internal engine runs on a specific encoding (--encoding) which is dynamic, e.g. UTF8, then state transitions happen on parts of a character. The check mark sign is coded in the three bytes 0xE2, 0x9C, and 0x93. Each byte is read separately and causes a separate state transition.

--buffer-element-size, -b, --bes 1, 2, 4, ...

With this option the number of bytes is specified that a buffer element occupies.

The size of a buffer element should be large enough that it can carry the Unicode value of any character of the desired input coding space. When using Unicode, ‘-b 4’ should be used to be safe, unless it is inconceivable that any code point beyond 0xFFFF ever appears; in that case ‘-b 2’ is enough.

When using dynamically sized encodings, this option is best left unused; the encodings define their chunks themselves. For example, UTF8 is built upon one-byte chunks and UTF16 upon chunks of two bytes.

Note

If a character size different from one byte is used, the .get_text() member of the token class contains an array of that particular type. This means that .text().c_str() does not result in a nicely printable UTF8 string. Use the member .utf8_text() instead.

Default: -1

--lexatom-type, --buffer-element-type, --bet type name

A flexible approach to specifying the buffer element size and type is to specify the name of the buffer element's type, which is the purpose of this option. Note that for some ‘well-known’ types such as uint*_t (C99 standard), u* (Linux kernel), and unsigned* (OSAL), where the * stands for 8, 16, or 32, quex can derive the size automatically.

Quex tries to determine the size of the buffer element type. This size is important for determining the target encoding when converters are used: if the size is 4 bytes, a different Unicode encoding is used than if it were 2 bytes. If quex fails to determine the size of a buffer element from the given name of the buffer element type, then the Unicode encoding must be specified explicitly by --converter-ucs-coding-name.

By default, the buffer element type is determined by the buffer element size.
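
For example, the following hypothetical invocation sets the lexatom type to uint16_t, from which quex derives a buffer element size of two bytes.

> quex -i lexer.qx -o Lexer --lexatom-type uint16_t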

Upon forward reload it may make sense to leave some of the tail of the current content in the buffer, right in front of the newly loaded content. This content is called the ‘fallback region’. By default, the following holds:

If and only if the maximum length of pre-context patterns can be determined, then this distance is imposed as the length of the fallback region. Otherwise, no fallback region is imposed.

A fallback region implies that the buffer must hold not only the current lexeme but also the backward region. If this cannot be maintained upon reload, an overflow is reported. The behavior can be modified by the following options.

--fallback-mandatory, --fbm

Enforces the fallback region for buffers. Quex signals an error if a pre-context pattern of arbitrary length occurs. This option must be set in the context of ByteLoader-s that cannot do backward loading, or with manual buffer filling using ‘gavagers’ or ‘feeders’. It is also advisable when backward loading is time-inefficient.

Default: false (disabled)

--fallback-optional, --fbo

This option states that a fallback region is not mandatory. It can be used for cases where all pre-contexts are of deterministic maximum size, but a fallback region shall nevertheless not be imposed. If there are any pre-contexts, the related ByteLoader must then be able to perform backward loading.

Default: false (disabled)

--no-stdlib, --nostdlib, --nsl

This option disables the usage of the C/C++ standard library. It may be used for lexical analyzers with minimal dependencies. This option implies that memory management has to be provided externally (--extern-memory-management).

Default: true (not disabled)

--no-lib-lexeme, --nll

By means of this option, the implementation of ‘lib lexeme’ is controlled. In the context of multiple lexical analyzers running on the same lexatom type, it may make sense to produce only one ‘lib lexeme’. The library is created in lib/lexeme of the current lexer.

Default: true (not disabled)

--no-lib-quex, --nlq

‘Lib quex’ is the part of a lexer that is the same for all quex-generated analyzers. When multiple lexers are linked into one application, ‘lib quex’ may only be implemented once. The library is created in lib/quex of the current lexer.

Default: true (not disabled)

--extern-memory-management, --emm

If set, the functions of ‘MemoryManager’ are not implemented by quex; instead, the user must (and can) implement them. This makes sense in environments where memory management cannot be accomplished by ‘malloc/free’ or ‘new/delete’.

Default: false (disabled)

--no-lexeme-null, --nln

This option controls whether a LexemeNull object is implemented for the current lexer.

Default: true (not disabled)

The implementation of customized converters is supported by the following options.

--converter-only, --co

Only generates lexeme converter code for converters towards UTF8, UTF16, and UTF32. Additionally, converters towards ‘char’ and ‘wchar_t’ are provided. A converter to ‘pretty_char’ translates signal characters into ASCII strings.

This option requires --buffer-element-type and --encoding.

Default: false (disabled)

--converter-source-name, --csn string

By default, converter generation uses the name of the source encoding as the prefix in function names. With this option the function name prefix can be given explicitly.
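
A sketch of a converter-only invocation (output name hypothetical; whether the source encoding is supported can be checked with --encoding-list):

> quex --converter-only --encoding iso8859-6 \
       --buffer-element-type uint8_t -o MyConverter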

Template and path compression are methods to combine multiple states into one ‘mega state’. The mega state combines in itself the common actions of the states that it represents. The result is a massive reduction in code size. The compression can be controlled with the following command line options:

--template-compression

If this option is set, then template compression is activated.

Default: false (disabled)

--template-compression-uniform

This flag enables template compression. In contrast to the previous flag, it compresses only uniform states into a template state. Uniform means that the states do not differ with respect to the actions performed at their entry. In some cases this might result in smaller code size and faster execution speed.

Default: false (disabled)

--template-compression-min-gain number

The number following this option specifies the template compression coefficient. It indicates the relative cost of routing to a target state compared to a simple ‘goto’ statement. The optimal value with respect to code size and speed may vary from processor platform to processor platform and from compiler to compiler.
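
For illustration, the following hypothetical invocation activates template compression with a minimum gain of 20.

> quex -i lexer.qx -o Lexer --template-compression \
       --template-compression-min-gain 20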

Default: 0

--path-compression

This flag activates path compression. By default, it compresses any sequence of states that can be lined up as a ‘path’.

Default: false (disabled)

--path-compression-uniform

Same as uniform template compression, only for path compression.

Default: false (disabled)

--path-termination number

Path compression requires a ‘pathwalker’ to determine quickly the end of a path. For this, each path internally ends with a signal character, the ‘path termination code’. It must be different from the buffer limit code in order to avoid ambiguities.

Modification of the ‘path termination code’ makes sense only if the input stream to be analyzed contains the default value.

Default: 1

The following options control the comments which are added to the generated code:

--comment-state-machine

With this option set, a comment is generated at the beginning of the analyzer function that shows all state transitions of the analyzer. The format follows the scheme presented in the following example:

/* BEGIN: STATE MACHINE
 ...
 * 02353(A, S) <- (117, 398, A, S)
 *       <no epsilon>
 * 02369(A, S) <- (394, 1354, A, S), (384, 1329)
 *       == '=' ==> 02400
 *       <no epsilon>
 ...
 * END: STATE MACHINE
 */

It means that state 2369 is an acceptance state (flag ‘A’) and that it should store the input position (‘S’) if no backtrack elimination is applied. It originates from state ‘394’, which is also an acceptance state, and from state ‘384’. It transits to state 2400 upon the incidence of a ‘=’ character.

Default: false (disabled)

--comment-transitions

Adds to each transition in a transition map information about the characters which trigger the transition, e.g. in a transition segment implemented as a C switch-case construct:

...
case 0x67:
case 0x68: goto _2292;/* ['g', 'h'] */
case 0x69: goto _2295;/* 'i' */
case 0x6A:
case 0x6B: goto _2292;/* ['j', 'k'] */
case 0x6C: goto _2302;/* 'l' */
case 0x6D:
...

The characters are output in UTF8 format.

Default: false (disabled)

--comment-mode-patterns

If this option is set, a comment is printed that shows which patterns are present in a mode and from which mode they are inherited. The comment follows this scheme:

       /* BEGIN: MODE PATTERNS
        ...
        * MODE: PROGRAM
        *
        *     PATTERN-ACTION PAIRS:
        *       (117) ALL:     [\n]
        *       (119) CALC_OP: "+"|"-"|"*"|"/"
        *       (121) PROGRAM: "//"
        ...
        * END: MODE PATTERNS
        */

This means that there is a mode PROGRAM. The first three patterns relate to the terminal states ‘117’, ‘119’, and ‘121’. The white space pattern of 117 was inherited from mode ALL, the math operator pattern was inherited from mode CALC_OP, and the comment start pattern “//” was implemented in PROGRAM itself.

Default: false (disabled)

The comment output is framed by BEGIN: and END: markers. These markers facilitate the extraction of the comment information for further processing. For example, the Unix command ‘awk’ can be used to extract what appears between BEGIN: and END: in the following way:

awk 'BEGIN {w=0} /BEGIN:/ {w=1;} // {if(w) print;} /END:/ {w=0;}' MyLexer.c

When using multiple lexical analyzers it can be helpful to get precise information about all related name spaces. Such short reports on the standard output are triggered by the following option.

--show-name-spaces, --sns

If specified, short information about the name spaces of the analyzer and the token is printed on the console.

Default: false (disabled)

Errors and Warnings

When the analyzer behaves unexpectedly, it may make sense to check whether low-priority patterns outrun high-priority patterns. The following flag supports these considerations.

--warning-on-outrun, --woo

When specified, each mode is investigated for patterns of lower priority that potentially outrun patterns of higher priority. This may happen when the lower-priority pattern matches a longer lexeme.

Default: false (disabled)

Some warnings, notes, or error messages might not be interesting or even be disturbing. For such cases, quex provides an interface to prevent messages on the standard output.

--suppress, -s [integer]+

By this option, errors, warnings, and notes may be suppressed. The option is followed by a list of integers; each integer represents a suppressed message.
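
For example, assuming the integers are given as a space-separated list, messages 7 and 8 from the list below are suppressed by

> quex -i lexer.qx -o Lexer --suppress 7 8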

Default: empty list

The following enumerates suppress codes together with their associated messages.

0

Warning if quex cannot find an included file while diving into a ‘foreign token id file’.

1

A token class file (--token-class-file) may contain a section with extra command line arguments, which are reported in a note.

2

Error check on dominated patterns, i.e. patterns that may never match due to higher precedence patterns which cover a super set of lexemes.

3

Error check on special patterns (skipper, indentation, etc.) whether they are the same.

4

Warning or error on ‘outrun’ of special patterns due to lexeme length. Attention: allowing this opens the door to very confusing situations. For example, a comment skipper on “/” may not trigger because a lower-precedence pattern matches on “/*”, which is longer and therefore wins.

5

Detect whether higher-precedence patterns match on a subset of the lexemes that a special pattern (skipper, indentation, etc.) matches. Attention: allowing such behavior may cause confusing situations. If it is allowed, a pattern may win against a skipper, for example. The expectation, though, is that a skipper shall skip, which it cannot if such scenarios are allowed.

6

Warning if no token queue is used although some functionality might not work properly without it.

7

Warning if token ids are used without being explicitly defined.

8

Warning if a token id is mentioned as a ‘repeated token’ but has not been defined.

9

Warning if a prefix-less token name starts with the token prefix.

10

Warning if there is no ‘on_bad_lexatom’ handler while an encoding different from Unicode is used.

11

Warning if a counter setup is defined without specifying a newline behavior.

12

Warning if a counter setup is defined without an \else section.

13

Warning if a default newline is used upon missing newline definition in a counter definition section.

14

Same as 13, except with hexadecimal ‘0D’.

15

Warning if a token type has no ‘take_text’ member function. This means that the token type has no interface to automatically accept a lexeme or an accumulated string.

16

Warning if there is a string accumulator while ‘--suppress 15’ has been used.

Queries

The preceding command line options influence the process of code generation. The options that solely query quex are listed in this section. First of all, the two traditional options for help and version information are:

--help, -h

Reports some help about the usage of quex on the console.

Default: false (disabled)

--version, -v

Prints information on the version of quex.

Default: false (disabled)

The following options allow querying character sets and the results of regular expressions.

--encoding-info, --ei name

Displays the characters that are covered by the encoding with the given name. If the name is omitted, a list of all supported encodings is printed.

--encoding-list, --el

Displays all character encodings that can be implemented directly in the analyzer state machine without using a converter. Additionally, the encodings ‘utf8’ and ‘utf16’ are always supported.

Default: false (disabled)

--encoding-info-file, --eif file name

Displays the characters that are covered by the encoding provided in the given file. This makes sense in conjunction with --encoding-file where customized encodings can be defined.

--encoding-for-language, --eil language

Displays the encodings that quex supports for the given human language. If the language argument is omitted, all available languages are listed.

--property, --pr property

Displays information about the specified Unicode property. The property can also be a property alias. If property is not specified, then brief information about all available Unicode properties is displayed.

Default: empty string

--set-by-property, --sbpr setting

Displays the set of characters for the specified Unicode property setting. For queries on binary properties only the name is required. All other properties require a term of the form name=value.

--property-match, --prm wildcard-expression

Displays property settings that match the given wildcard expression. This helps to find correct identifiers in the large list of Unicode settings. For example, the wildcard-expression Name=*LATIN* gives all settings of property Name that contain the string LATIN.

--set-by-expression, --sbe regular expression

Displays the resulting character set for the given regular expression. Larger character set expressions can be specified in [: ... :] brackets.

--numeric, --num

If this option is specified, the numeric character codes are displayed rather than the characters themselves.

Default: false (disabled)

--intervals, --itv

If this option is set, adjacent characters are displayed as intervals, i.e. in terms of the begin and end of ranges of adjacent character codes. This provides a more concise display.

Default: false (disabled)

--names

If this option is given, resulting characters are displayed by their (lengthy) Unicode name.

Default: false (disabled)
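
For example, the following query combines the options above; it displays the character set of the Unicode property setting Script=Greek as numeric intervals.

> quex --set-by-property Script=Greek --intervals --numeric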

To support the development and debugging of quex itself, the following options are provided.

--unit-test

Implements some features for Unit Testing. This includes things such as statistics on memory management, or the implementation of a ‘strange input stream’.

Default: true (not disabled)

--debug-exception

If set, exceptions are no longer caught and treated internally. This option permits tracing the locations where exceptions occur.

Default: false (disabled)

--debug-limit-recursion number

This option limits the number of possible recursions. It may extend the default, which is set by the Python application. Or it may be set purposely small, so that larger recursions can be detected by triggering an exception.

Default: 0

--debug-original-paths

When code is generated, the reference to the original templates is maintained. Thus, a debugger or compiler might point directly to the place in the quex template base while actually observing generated code.

Default: false (disabled)

--debug-QUEX_TYPE_LEXATOM_EXT

This option leaves the lexatom type to be defined from outside the lexical analyzer at compile time. For example, ‘-DQUEX_TYPE_LEXATOM_EXT=uint64_t’ then defines the lexatom type to be uint64_t. This is useful for unit tests, where one and the same lexer is tried with different buffer setups.

Default: false (disabled)

--ql, --quex-lib

Defines name prefix and namespace of the common library for all quex lexers. The specification format is the same as for the analyzer class.

In case no standard library is present, the usage of standard headers may be omitted. Further, quex provides a minimalist implementation of standard library functions, so that the lexer may be free-standing. These features are controlled by the following flags.

--no-stdlib, --nostdlib, --nsl

If set, no standard library headers are included. The standard functions, though, still need to be implemented: no standard library function is declared, but the generated code still references them. Their declarations should be provided in a header section, or quex's tiny standard library must be used.

Default: true (not disabled)

--tiny-stdlib, --tsl

This option implies the usage of quex's simplistic implementation of standard library functions. It is considered useful in embedded systems where the operating system's functionality is minimal. A ‘printf’ function is not provided. The tiny standard library lives in the name space of ‘libQuex’; that is, by default all functions in C have the prefix ‘quex_’ and in C++ they are located in ‘quex::’.

Default: false (disabled)