ABSTRACT:
The Windows version of the SL Viewer fails to receive Unicode supplementary characters (characters encoded as surrogate pairs in UTF-16) from the keyboard. As a result, each such character is shown as two dots when typed, although cut-and-paste works fine. I found this bug in a Japanese environment, but the problem affects any language environment that requires supplementary characters. A source review showed that the Linux version has the same bug. A set of two patches is provided: one fixes the Windows version, the other fixes the Linux version.
BACKGROUND AND PROBLEM STATEMENT:
Recent input methods for Windows allow users to type some Unicode
supplementary characters. In particular, the Japanese MS-IME 2007
bundled with Windows Vista and the Japanese Office-IME 2007 bundled
with Microsoft Office System 2007 products allow them. This feature
may have been added partly because the Japanese government has recently
been promoting the use of some (approx. 300) Unicode supplementary
characters in IT systems. The feature is generally known as JIS2004
support.
The SL viewer does support Unicode supplementary characters already.
I reviewed the source code and found that the current viewer carefully
handles Unicode supplementary characters where required.
Good job, Lindens!
Unfortunately, the Windows-specific keyboard event handling has a bug,
and the Unicode supplementary characters the user types are turned
into garbage. (Copy-and-paste through the clipboard, or simply
displaying chat/IM received from another viewer, works fine.)
REPRODUCING STEPS:
- Although SL doesn't support Windows Vista yet, I used a Windows Vista
environment, only because it's easier to reproduce the issue there.
Windows Vista comes with JIS2004 support built in.
- You can observe the same problem on Windows XP, but you need to
install some updates to Windows and buy Microsoft Office System
2007 to use JIS2004 on Windows XP. (Some updates to a Windows
component that is essential for JIS2004 support are sold as part of
the Microsoft Office System product, probably due to a marketing
decision...)
Set up Windows Vista. You need either a Japanese version (any edition)
or an Ultimate Edition (any language version).
Install the Japanese Language Pack for Windows Vista if you use a
language version other than Japanese. Set your Input Language to
Japanese and Keyboard Layout to Microsoft IME 2007.
Get the latest graphics driver that enables SL viewer to run on Vista.
Start SL viewer, log in, and open the chat bar. Make sure the chat bar
has the input focus.
Turn IME on. Make sure the input mode is hiragana.
Type "shikaru". As you type these letters, the corresponding three
hiragana letters appear in a small composition window on the screen.
Hit the SPACE bar several times until the glyph for U+20B9F appears and
is selected in the candidate list. (Don't confuse it with U+53F1. See
Fig1)
Hit ENTER to send it to the chat bar.
OBSERVED BEHAVIOUR:
The chat bar shows two dots followed by a hiragana "ru". (See
Fig2)
EXPECTED BEHAVIOUR:
The chat bar shows the character U+20B9F properly, followed by a
hiragana "ru". (See Fig3. This shot was taken after
applying the attached patch.)
THE CAUSE AND A FIX:
The current version of SL viewer has the Windows-specific input event
handling code in linden/indra/llwindow/llwindowwin32.cpp. It receives
keyboard input through the WM_CHAR message. The wParam of WM_CHAR
carries a UTF-16 code unit, and a Unicode supplementary character
(i.e., a character that is represented as a surrogate pair) is sent as
two consecutive WM_CHAR messages. The current WM_CHAR handler in SL
viewer simply zero-extends each UTF-16 code unit and treats the
resulting 32-bit value as a UTF-32 code. For a supplementary
character, that process creates two values in the surrogate range,
neither of which is a valid UTF-32 code, and passes them to the
corresponding UI elements through handleUnicodeChar.
(Hence, two dots appear on the chat bar.)
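
To make the failure concrete, here is a minimal sketch of what
zero-extension does to U+20B9F, whose UTF-16 encoding is the pair
0xD842 0xDF9F. This is not the viewer's code; zeroExtend,
isSurrogateValue, kHigh and kLow are hypothetical names used only for
illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the buggy behaviour: each UTF-16 code unit
// is zero-extended and treated as if it were a complete UTF-32 code.
constexpr std::uint32_t zeroExtend(std::uint16_t unit)
{
    return static_cast<std::uint32_t>(unit);
}

// True if the value lies in the surrogate range U+D800..U+DFFF, which
// is never a valid UTF-32 code on its own.
constexpr bool isSurrogateValue(std::uint32_t c)
{
    return c >= 0xD800u && c <= 0xDFFFu;
}

// The two WM_CHAR payloads that arrive for U+20B9F.
constexpr std::uint16_t kHigh = 0xD842;
constexpr std::uint16_t kLow  = 0xDF9F;

// Both zero-extended values fall in the surrogate range, so neither one
// can be rendered as a character.
static_assert(isSurrogateValue(zeroExtend(kHigh)), "high half is a surrogate");
static_assert(isSurrogateValue(zeroExtend(kLow)),  "low half is a surrogate");
```

The UI therefore receives 0x0000D842 and 0x0000DF9F instead of the
single value 0x00020B9F.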
The source code near the WM_CHAR message handler includes a comment
discussing a possible use of the WM_UNICHAR message. Although the
intention of the author is unclear, I guess he/she was thinking of
Unicode supplementary characters. If so, I don't support the idea. I
tested the WM_UNICHAR message in the early days of Windows XP, and the
observed behaviour was mysteriously different from the specification.
On the other hand, the semantics and the behaviour of WM_CHAR are rock
solid, and we can easily convert the series of UTF-16 code units sent
via WM_CHAR into UTF-32 by ourselves. I believe it's better to
stick with the good old WM_CHAR for stability.
My patch does exactly that. See the attachment (suppchar-1.patch) for
details.
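
For readers who don't want to open the patch, the general technique
can be sketched as follows. This is an illustrative sketch only, not
the contents of suppchar-1.patch: the class name Utf16Combiner and its
feed method are hypothetical, and replacing an unpaired surrogate with
U+FFFD is my assumption here; the patch may handle that case
differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: accepts UTF-16 code units one at a time (as they
// would arrive in successive WM_CHAR messages) and emits UTF-32 codes.
class Utf16Combiner
{
public:
    // Feed one code unit. Appends zero or more UTF-32 codes to `out`.
    void feed(std::uint16_t unit, std::vector<std::uint32_t>& out)
    {
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool isLow  = unit >= 0xDC00 && unit <= 0xDFFF;

        if (isLow && mPendingHigh != 0)
        {
            // Complete pair: combine the two 10-bit halves.
            const std::uint32_t hi = static_cast<std::uint32_t>(mPendingHigh) - 0xD800u;
            const std::uint32_t lo = static_cast<std::uint32_t>(unit) - 0xDC00u;
            out.push_back(0x10000u + (hi << 10) + lo);
            mPendingHigh = 0;
            return;
        }

        if (mPendingHigh != 0)
        {
            // A high surrogate was not followed by a low one.
            // Assumption: substitute U+FFFD for the orphan.
            out.push_back(0xFFFDu);
            mPendingHigh = 0;
        }

        if (isHigh)
        {
            // Hold it until the next code unit arrives.
            mPendingHigh = unit;
        }
        else if (isLow)
        {
            // A low surrogate with no preceding high one (assumption: U+FFFD).
            out.push_back(0xFFFDu);
        }
        else
        {
            // Ordinary BMP character: the code unit is the UTF-32 code.
            out.push_back(unit);
        }
    }

private:
    std::uint16_t mPendingHigh = 0; // 0 means "no high surrogate pending"
};
```

Feeding 0xD842, 0xDF9F and then 0x308B (hiragana "ru") to one
Utf16Combiner yields the two UTF-32 codes 0x20B9F and 0x308B, which
matches the expected behaviour described above.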
LINUX:
At this moment, the Linux version of SL viewer depends on SDL for input
handling, and even the latest version of SDL (1.2.11) has several bugs
that prevent entering Unicode characters through the keyboard. However,
SDL provides a platform-independent interface, and the specification
is clear. (Also, fixing SDL is not so hard; see VWR-240 and its
attachment.)
The case is the same as on Windows. SDL passes one UTF-16 code unit per
Key Down event, and the current SDL-specific input handling code simply
treats the UTF-16 code unit as a UTF-32 value. This causes the same
bug as on Windows, and the same fix should work. See the attached
patch (suppchar-2.patch).
ABOUT THE ATTACHMENTS:
Figures.png contains three figures that clarify the repro steps; it is
not part of the patch set.
I'm attaching two separate patches.
The first one, suppchar-1.patch, is the patch to fix the Windows bug.
It updates Windows-specific code as well as some common code.
The second one, suppchar-2.patch, fixes the potential bug on Linux.
The first patch works without the second.
The second patch requires the first.