There are several kinds of tokens: keywords, identifiers, constants, strings, expression operators, and other separators. White space (blanks, tabs, new-lines) is ignored except that it serves to separate tokens; sometimes it is required to separate tokens. If the input has been parsed into tokens up to a particular character, the next token is taken to include the longest string of characters that could constitute a token.
The native character set of Limbo is Unicode, which is identical with the first 16-bit plane of the ISO 10646 standard. Any Unicode character can be used in comments or in strings and character constants. The implementation assumes that source files use the UTF-8 representation, in which 16-bit Unicode characters are represented as sequences of one, two, or three bytes.