UTF-16 string terminator

What is the string terminator sequence for a UTF-16 string?
Let me rephrase the question in an attempt to clarify. How does the call to wcslen() work?


Answer 1:

Unicode does not define string terminators. Your environment or language does. For instance, C strings use 0x0 as a string terminator, whereas .NET strings store their length in a separate field of the String class.
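
To illustrate the difference between the two conventions, here is a minimal sketch of a counted string in C. The `counted_wstr` struct and `count_embedded_nuls` function are hypothetical names made up for this example. They are not part of any standard library, and this is not how .NET actually lays out its String class.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical counted string: the length is stored next to the data,
   so no code unit has to be reserved as a terminator. */
struct counted_wstr {
    size_t len;
    const wchar_t *data;
};

/* Count how many zero-valued wide characters appear inside the string.
   This only makes sense because the length is known independently. */
size_t count_embedded_nuls(struct counted_wstr s)
{
    assert(s.data != NULL || s.len == 0);
    size_t n = 0;
    for (size_t i = 0; i < s.len; ++i) {
        if (s.data[i] == L'\0') {
            ++n;
        }
    }
    return n;
}
```

With a stored length, a zero-valued character is just another character. A terminator-based function such as wcslen would stop at the first one instead.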

To answer your second question, wcslen looks for a terminating L'\0' character. As I read it, that is a single wchar_t with the value zero, so its width in bytes depends on the compiler. If you’re using UTF-16 with a two-byte wchar_t, it will be the two-byte sequence 0x00 0x00 (encoding U+0000, ‘NUL’).
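
As a rough sketch of what that search looks like, here is a naive version in C. The name `sketch_wcslen` is made up for this example. Real library implementations are usually more optimized, so treat this as an illustration of the behaviour rather than the actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Illustrative re-implementation: walk the wide characters one at a
   time until the null wide character L'\0' is found. */
size_t sketch_wcslen(const wchar_t *s)
{
    assert(s != NULL);
    const wchar_t *p = s;
    while (*p != L'\0') {
        ++p;
    }
    /* Pointer difference counts wchar_t elements, not bytes. */
    return (size_t)(p - s);
}
```

Note that the comparison is made per wchar_t, not per byte. A single 0x00 byte inside a character such as 'a' (61 00 in little-endian UTF-16) does not end the string.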

Answer 2:

The wcslen function (from the Standard):

   [#3]   The  wcslen  function  returns  the  number  of  wide
   characters that precede the terminating null wide character.

And the null wide character is L'\0', a wchar_t with the value zero.

Answer 3:

There isn’t any. String terminators are not part of an encoding.

For example, the string ab is encoded in UTF-16 (little-endian) as the byte sequence 61 00 62 00, and 大家 as 27 59 B6 5B. As you can see, there is no predetermined terminator sequence.
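
The byte sequences above can be reproduced with a small sketch in C. The helper name `utf16le_bytes` is hypothetical, and the code assumes the input is already a sequence of UTF-16 code units with an explicitly supplied count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize `count` UTF-16 code units as
   little-endian bytes. `out` must have room for 2 * count bytes.
   Returns the number of bytes written. */
size_t utf16le_bytes(const uint16_t *units, size_t count, unsigned char *out)
{
    assert(units != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        out[2 * i]     = (unsigned char)(units[i] & 0xFF); /* low byte first */
        out[2 * i + 1] = (unsigned char)(units[i] >> 8);   /* then high byte */
    }
    return 2 * count;
}
```

Passing the code units 0x0061 0x0062 (ab) yields 61 00 62 00, and 0x5927 0x5BB6 (大家) yields 27 59 B6 5B. The encoder emits exactly two bytes per code unit and nothing more. Any terminator has to be added by the caller as an extra U+0000 code unit.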