Characters and strings

Limbo has a built-in type called string that represents text. A string constant is enclosed in double quotes, so

	s := "hello, world";

declares s to be a string holding the familiar greeting. Character constants are enclosed in single quotes, so the first character of s is 'h'. String values can be passed around like any other value; for example:

	printstring(str: string)
	{
		sys->print("The string is: %s\n", str);
	}

The %s format is the sys->print specification for a string.

We can examine the contents of a string by indexing using square brackets. Indexing begins at element 0, so

	c := s[0];

sets c to the initial character of the string, the integer value 'h'. The type of c is int, not byte or a special character type. Again for emphasis: characters have type int. As a simple consequence, the type of a character constant such as 'h' is int.

Why is a character not a byte, as it is in C? The reason is that Limbo does not use ASCII as its character set, because ASCII is an American character standard that precludes languages other than English. Instead, Limbo uses the 16-bit Unicode standard: characters have values 0 through 65535. This gives sufficient space to represent all the major languages of the world: characters and strings in Limbo can contain Greek (e.g., "Eλλενικον"), Cyrillic ("русский"), Japanese ("日本語"), as well as Latin variants ("Français"). There is much more information about character sets and Unicode than we have space for here; the next chapter covers the topic in detail. For now, just keep in mind that characters are not bytes!
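
To make the point concrete, here is a minimal sketch of a complete program that prints a character both as a character and as a number, and then measures a Cyrillic string. The module name Chars is our own invention for illustration, and the sketch assumes the %c and %d formats of sys->print, which print an int as a character and as a decimal number respectively.

	implement Chars;

	include "sys.m";
		sys: Sys;
	include "draw.m";

	Chars: module
	{
		init: fn(nil: ref Draw->Context, nil: list of string);
	};

	init(nil: ref Draw->Context, nil: list of string)
	{
		sys = load Sys Sys->PATH;

		s := "hello, world";
		c := s[0];	# c has type int, not byte
		sys->print("%c has value %d\n", c, c);

		r := "русский";
		# len counts characters, not bytes
		sys->print("%s has %d characters\n", r, len r);
	}

If all goes well, the first line of output should be "h has value 104" and the second should report 7 characters, even though the Cyrillic text would occupy more than 7 bytes in a file.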

As mentioned earlier, the characters within a string may be indexed by square brackets []. This allows us to modify a character within a string as well as examine it. If s still holds the string "hello, world", then after

	s[5] = '!';

s will hold "hello! world". We could be more exotic by mixing in some Greek:

	s[7] = 'ω';

produces "hello! ωorld".

Limbo uses + to concatenate strings; after

	r := "rasp";
	b := "berry";
	rb := r + b;

the string rb has value "raspberry".

The built-in operator len reports the length of a string, in characters:

	len rb

yields the value 9, since there are 9 characters in "raspberry".

Finally, adding a character to the end of a string is so common an operation that Limbo provides a simple mechanism for it: assign to the index one past the last character. Combined with the len operator, the result is the idiom

	s[len s] = '?';

which in this case sets s to its final value, "hello! ωorld?". The following two lines therefore produce the same result:

	t := "hi!";
	t := ""; t[len t] = 'h'; t[len t] = 'i'; t[len t] = '!';

There are strict limits on this extension: it adds only one character at a time and applies only to the end of the string. For more sophisticated operations, use the concatenation operator +.
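
As an illustration of the append idiom at work, the following sketch builds a reversed copy of a string one character at a time. The function reverse and the module name Reverse are hypothetical names chosen for this example; nothing here is a library routine.

	implement Reverse;

	include "sys.m";
		sys: Sys;
	include "draw.m";

	Reverse: module
	{
		init: fn(nil: ref Draw->Context, nil: list of string);
	};

	# reverse (a hypothetical helper) returns a new string holding
	# the characters of s in the opposite order.
	reverse(s: string): string
	{
		r := "";
		# walk s from its last character to its first,
		# appending each character to the end of r
		for(i := len s - 1; i >= 0; i--)
			r[len r] = s[i];
		return r;
	}

	init(nil: ref Draw->Context, nil: list of string)
	{
		sys = load Sys Sys->PATH;
		sys->print("%s\n", reverse("raspberry"));
	}

This should print "yrrebpsar". Each pass through the loop adds exactly one character at position len r, which is the only index beyond the current contents that may be assigned.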

Strings have one other convenience: casts to and from numeric types convert between textual and binary representations of numbers. That is, to store the formatted string representation of an integer i, it suffices to say

	str := string i;

If i originally had value 17, str will now have the string value "17". To convert back,

	j := int str;

After this second conversion, j will also have integer value 17.

For more sophisticated conversion of values into strings, the system function sprint generates a formatted string and returns it as a value:

	message := sys->sprint("String %s has %d characters", s, len s);
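
The conversions can be tried together in one small program. This is only a sketch: the module name Conv and the variable names are ours, chosen for illustration.

	implement Conv;

	include "sys.m";
		sys: Sys;
	include "draw.m";

	Conv: module
	{
		init: fn(nil: ref Draw->Context, nil: list of string);
	};

	init(nil: ref Draw->Context, nil: list of string)
	{
		sys = load Sys Sys->PATH;

		i := 17;
		str := string i;	# the text "17"
		j := int str;	# back to the integer 17

		# str is a string, so + concatenates;
		# j is an int, so + adds
		sys->print("%s %d\n", str + "!", j + 1);

		# sprint formats like print but returns the result as a string
		msg := sys->sprint("%d squared is %d", j, j*j);
		sys->print("%s\n", msg);
	}

The output should be "17! 18" followed by "17 squared is 289".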

Exercise: What is the string contained in str after the assignment

	str = string 'A';

Why? How can you construct the string "A" given the character value 'A'? (Hint: you need indexing.)



© Rob Pike and Howard Trickey 1997. All rights reserved.