Get Lexical Analyzer Generator Quex at SourceForge.net. Fast, secure and Free Open Source software downloads

Previous topic

Analyzer Engine Codec

Next topic

Customized Converters

ConversionΒΆ

Quex provides support for character encodings by means of converters which are plugged into buffer fillers. The converters currently provided directly by quex are IBM’s ICU library and the GNU IConv library. The latter is present by default on most Unix Machines, the former on Windows Systems and the like. The lexical analyzer engine is not aware of the conversion. It commands the buffer filler to filler to fill the buffer memory and then iterates over the content. It does not know what processes happen in the background. The buffer filler, however, is internally adapted to the library on which it relies. If you want to use character set conversion, for example to parse a file encoded in ISO-8859-3, then you need to have one of the supported libraries installed.

Quex can setup the buffer filler for the converter of your choice by command line arguments. Those are

--icu

If you want to use IBM’s ICU library for conversion.

--iconv

If you want to use GNU IConv for conversion.

The decision which one to choose needs to be made should not have to do anything with the judgement about the clarity of the API of those libraries. The user is completely isolated from the details of that. Questions of concern might be

  1. Make sure that the converter supports the encoding that you want to parse. Especially compressed formats might not be supported by every converter.
  2. Computational speed: Compare both converters on some larger file and note down the differences in timing.
  3. Memory Consumption: Compare the memory footprint of the binary for both converters.

In order to use the converter you need to pass the name of the encoding to the constructor of the lexical analyzer as shown below. For details consider the class definition of the generated lexical analyzer class.

quex::tiny_lexer  qlex("example.txt", "UTF-8");