Make It!

The most direct way to set up an input configuration is to pass a file name to the constructor of a lexer. In C, the constructor is called explicitly as a function, as seen below.

...
myLexer  lexer;
myLexer_from_file_name(&lexer, "example.txt", /* Converter */NULL);
...

In C++, the same thing is achieved by passing the arguments to the constructor at the time of initialization.

...
myLexer  lexer("example.txt", /* Converter */nullptr);
...

This sets up the lexer to read its input from the file example.txt in the current working directory. Here, the lexer uses a byte loader based on the Standard C or C++ I/O library functions. All related file handles and stream objects are owned by the lexer, and their allocation and deallocation happen implicitly. If the lexer shall run on converted input, a converter may be provided. In C, using the ICU library, this looks as follows:

#include <myLexer/lib/quex/converter/icu/Converter_ICU>
#include <myLexer/lib/quex/converter/icu/Converter_ICU.i>
...

const size_t   LexatomSize_bit = sizeof(myLexer_lexatom_t)<<3;
myLexer        lexer;
myLexer_from_file_name(&lexer, "example.txt",
                       quex_Converter_ICU_new(LexatomSize_bit, "UTF8", NULL));

The #include-s of Converter_ICU and Converter_ICU.i paste the declaration and implementation of the ICU converter API into the current file. The setup above reads content from the file example.txt, decodes it from UTF-8 to Unicode, and stores the Unicode characters in the lexer’s buffer. In C++, only the constructor call differs.

const size_t   LexatomSize_bit = sizeof(myLexer_lexatom_t)<<3;
myLexer        lexer("example.txt",
                     quex::Converter_ICU_new(LexatomSize_bit, "UTF8", nullptr));
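
The expression sizeof(...)<<3 used above converts a size in bytes into a size in bits, since shifting left by three multiplies by eight. The following is a minimal sketch of that arithmetic, not code from Quex. The type example_lexatom_t is a hypothetical stand-in for the generated myLexer_lexatom_t, and the shift form assumes 8-bit bytes (CHAR_BIT == 8).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the generated type 'myLexer_lexatom_t'.
typedef std::uint32_t example_lexatom_t;

// Size in bits as written in the text: a left shift by 3 multiplies the
// byte count by 8. This assumes that a byte has 8 bits.
constexpr std::size_t example_lexatom_size_bit()
{
    return sizeof(example_lexatom_t) << 3;
}

// Equivalent spelling that does not rely on 8-bit bytes.
constexpr std::size_t example_lexatom_size_bit_portable()
{
    return sizeof(example_lexatom_t) * CHAR_BIT;
}

// On platforms with 8-bit bytes, both spellings agree.
static_assert(CHAR_BIT != 8
              || example_lexatom_size_bit() == example_lexatom_size_bit_portable(),
              "shift by 3 equals multiplication by CHAR_BIT for 8-bit bytes");
```

With a four-byte lexatom type, as assumed here, the value passed to the converter would be 32.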

The functions with names ending in _new allocate and initialize a converter. Further, they set a flag in the converter which allows the lexer to delete it when it is no longer needed.

When the standard I/O library is not suitable, an appropriate byte loader may be provided. The following C++ code sets up a byte loader for socket communication via POSIX file descriptors.

int               connected_fd = accept(listen_fd, (struct sockaddr*)NULL, NULL);
quex::ByteLoader* byte_loader  = quex::ByteLoader_POSIX_new(connected_fd);
myLexer           lexer(byte_loader, /* Converter */nullptr);
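
The snippet above takes listen_fd as given. As a rough sketch of where such a descriptor might come from, the following helper opens a listening TCP socket with plain POSIX calls. The function example_open_listen_fd is hypothetical and not part of Quex. Binding to all interfaces and a backlog of one are assumptions made only for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Quex): opens a TCP socket that listens
// on the given port. Passing port 0 lets the operating system pick a free
// port. Returns the file descriptor, or -1 on failure.
int example_open_listen_fd(std::uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Allow quick restarts of the server on the same port.
    int reuse = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

    sockaddr_in address;
    std::memset(&address, 0, sizeof(address));
    address.sin_family      = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_ANY);  // assumption: any interface
    address.sin_port        = htons(port);

    if (   bind(fd, reinterpret_cast<sockaddr*>(&address), sizeof(address)) != 0
        || listen(fd, /* backlog */ 1) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor would take the place of listen_fd in the call to accept(). The descriptor returned by accept() is then the one handed to the byte loader.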

As with the converters, the _new functions allocate and initialize a byte loader and let the lexer take over ownership. Further, a byte loader is greedy in the sense that it takes ownership of any resource passed to its constructor, such as a file descriptor or an input stream. Passing a pointer to a local variable would, therefore, be a particularly bad idea, since the byte loader would later try to release an object that it must not release. The lexer’s constructor that accepts a byte loader also accepts a converter.
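
The ownership chain described above can be pictured with a small model. This is only an illustrative sketch under stated assumptions: ExampleResource, ExampleByteLoader, ExampleByteLoader_new, and ExampleLexer are hypothetical names that mimic the described behavior and are not the Quex classes or their implementation.

```cpp
#include <cassert>

// Hypothetical resource, standing in for a file descriptor or a stream.
// A counter records how many resources are currently alive.
struct ExampleResource {
    static int live_count;
    ExampleResource()  { ++live_count; }
    ~ExampleResource() { --live_count; }
};
int ExampleResource::live_count = 0;

// Hypothetical byte loader: 'greedy', it releases whatever it was given.
class ExampleByteLoader {
public:
    explicit ExampleByteLoader(ExampleResource* resource) : resource_(resource) {}
    ~ExampleByteLoader() { delete resource_; }  // undefined for a local variable!
    ExampleByteLoader(const ExampleByteLoader&) = delete;
    ExampleByteLoader& operator=(const ExampleByteLoader&) = delete;
private:
    ExampleResource* resource_;
};

// Mirrors the '_new' naming: allocates a loader that the lexer will own.
ExampleByteLoader* ExampleByteLoader_new(ExampleResource* resource)
{
    return new ExampleByteLoader(resource);
}

// Hypothetical lexer: deletes the byte loader it was constructed with.
class ExampleLexer {
public:
    explicit ExampleLexer(ExampleByteLoader* loader) : loader_(loader) {}
    ~ExampleLexer() { delete loader_; }
    ExampleLexer(const ExampleLexer&) = delete;
    ExampleLexer& operator=(const ExampleLexer&) = delete;
private:
    ExampleByteLoader* loader_;
};

// Returns true if the resource lives as long as the lexer and is released
// together with it, without any explicit cleanup by the caller.
bool example_ownership_follows_lexer()
{
    bool alive_inside = false;
    {
        ExampleLexer lexer(ExampleByteLoader_new(new ExampleResource));
        alive_inside = (ExampleResource::live_count == 1);
    }
    return alive_inside && ExampleResource::live_count == 0;
}
```

In this model the caller never deletes anything: destroying the lexer releases the byte loader, which in turn releases the resource. Handing in the address of a local ExampleResource would make the byte loader call delete on an object that was not allocated with new, which is the kind of error the text warns about.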

Enough information is now provided to set up any configuration that relies on the provided byte loaders (Standard I/O, POSIX, OSAL, …) and converters (ICU, IConv). Quex’s design, though, also facilitates the implementation of customized byte loaders and converters. The two subsequent sections provide the details for readers who would be more comfortable knowing what happens behind the curtains or who want to provide customized solutions.