
Key: VWR-4010
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: James Linden
Reporter: Alissa Sabre
Votes: 2
Watchers: 1
1. Second Life Viewer - VWR

New search does not accept non ASCII characters

Created: 23/Dec/07 05:34 PM   Updated: 09/Feb/08 06:02 AM
Component/s: Internationalization, Search
Affects Version/s: 1.19.0 Release Candidate
Fix Version/s: 1.19.0 Release Candidate

File Attachments: 1. 20071224.patch (0.8 kB)

Image Attachments: 1. Francais-NG.png (16 kB)   2. Francais-OK.png (32 kB)
Environment: Windows XP SP2 and MacOS X 10.4.11

Source Version: 1.18.6.2
Linden Lab Issue ID: DEV-7959
Patch attached: Yes
Linden Lab Internal Branch: Branch_1-18-6-Viewer


Description
If you enter words with non-ASCII characters in the new search, those characters are encoded incorrectly and the search doesn't work as expected. This makes the new search system ineffective for non-English-speaking residents.

I tested this issue in detail using the 1.18.6(2) RC viewer. I believe it is also present in the 1.18.5 viewers, although I have not checked.

I also believe that this issue is the same as SVC-1020. I'm filing it separately because I think filing it under the VWR project will attract more appropriate developers in LL.

REPRODUCTION

Start the 1.18.6.2 viewer and login.

Type a word that contains non-ASCII characters, e.g., "Français", into the new search.

OBSERVED BEHAVIOUR

Each byte of the non-ASCII character ("ç" in this case) is written with six extra hexadecimal digits, and the search returns no hits. (See figure "Francais-NG.png".)

EXPECTED BEHAVIOUR

The non-ASCII character is handled as-is and some hits are returned. (See figure "Francais-OK.png".)

Note that the screenshot "Francais-OK.png" was taken on a viewer with the attached patch applied.

See SVC-1020 for other test cases and discussion.

TECHNICAL DETAILS

The cause of this bug is the way the SL viewer creates URLs containing non-ASCII characters. Including non-ASCII characters in URL parameters requires that each UTF-8 byte be written as two hexadecimal digits preceded by a '%', and the SL viewer fails to do so. It passes each UTF-8 byte, held in a C++ signed char, to a general-purpose formatting function, so sign extension occurs when the value is widened. As a result, six extra F's are put after the '%'. For example, "ç" in the example above is C3 A7 in UTF-8, and it should be written as %C3%A7 in a URL. However, the current SL viewer writes it as %FFFFFFC3%FFFFFFA7. Simply casting each byte to unsigned char (or U8 in LL dialect) solves the issue.

The attached patch is essentially the same as the ones attached to SVC-1020 (written by march Korda); I just stripped it down to the minimum lines. (The change to lluri.cpp is not strictly necessary to fix this search issue, though.)
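The sign-extension effect described above can be reproduced outside the viewer. The following is a minimal sketch, not the viewer's code: the function names are hypothetical, the signed char is modelled explicitly so the result does not depend on the platform's default char signedness, and a 32-bit int is assumed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from the viewer. Bytes with the high bit
// set are percent-encoded; all other bytes are copied through unchanged.
// The byte is widened while still signed, so it is sign-extended.
std::string escape_sign_extended(const std::string& utf8) {
    std::string out;
    for (char ch : utf8) {
        signed char c = static_cast<signed char>(ch);  // model a signed char
        if (c >= 0) { out += ch; continue; }
        char buf[16];
        // c is negative here; widening it fills the upper bits with ones.
        std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned int>(c));
        out += buf;
    }
    return out;
}

// Same helper, but the byte is converted to unsigned char before widening,
// so only its own eight bits reach the formatting call.
std::string escape_byte_wise(const std::string& utf8) {
    std::string out;
    for (char ch : utf8) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) { out += ch; continue; }
        char buf[16];
        std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned int>(b));
        out += buf;
    }
    return out;
}

// Self-check using the word from the report ("ç" is C3 A7 in UTF-8).
void check_francais() {
    assert(escape_sign_extended("Fran\xC3\xA7" "ais") == "Fran%FFFFFFC3%FFFFFFA7ais");
    assert(escape_byte_wise("Fran\xC3\xA7" "ais") == "Fran%C3%A7ais");
}
```

Calling check_francais() exercises both variants on "Français". The string literal is split after \xA7 because the following "a" would otherwise be read as part of the hexadecimal escape.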



Comments
James Linden added a comment - 27/Dec/07 02:46 PM
Yes, this is a sign extension issue. Thank you so much for finding this and for narrowing the patch down.

James Linden added a comment - 27/Dec/07 04:42 PM
Thank you for your help reproducing the problem and narrowing the patch. The LLURI portion of the patch is not quite right: it needs to be static_cast<S32>(static_cast<U8>(c)), otherwise some normal escaping does not work correctly.

I have changed llpaneldirfind.cpp to use LLURI::escape() and added a unit test for escaping the cedilla in Français.
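One plausible reading of the double cast, assuming the escaping code writes into a std::ostream: inserting an unsigned char into a stream prints it as a character rather than as a number, so the byte has to be narrowed to U8 first (to avoid sign extension) and then widened to S32 (so that std::hex applies). The sketch below illustrates that idea only. The function name is hypothetical, U8 and S32 are local stand-ins for the viewer's typedefs, and this is neither LLURI::escape() nor the unit test mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Local stand-ins for the viewer's typedefs (assumed to be 8-bit unsigned
// and 32-bit signed integers).
typedef std::uint8_t U8;
typedef std::int32_t S32;

// Hypothetical stream-based escaper: bytes with the high bit set are
// percent-encoded, everything else is copied through unchanged.
std::string escape_with_stream(const std::string& utf8) {
    std::ostringstream ostr;
    ostr << std::hex << std::uppercase << std::setfill('0');
    for (char c : utf8) {
        if (static_cast<U8>(c) < 0x80) {
            ostr << c;  // plain ASCII byte, written as a character
            continue;
        }
        // U8 strips the sign; S32 makes the stream format a number.
        // Writing static_cast<U8>(c) alone would emit the raw byte.
        ostr << '%' << std::setw(2) << static_cast<S32>(static_cast<U8>(c));
    }
    return ostr.str();
}

// Self-check for the cedilla in "Français" (C3 A7 in UTF-8).
void check_cedilla() {
    assert(escape_with_stream("Fran\xC3\xA7" "ais") == "Fran%C3%A7ais");
}
```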


James Linden added a comment - 27/Dec/07 04:43 PM
The fix will appear in the next 1.18.6 release candidate.

Alissa Sabre added a comment - 09/Feb/08 06:02 AM
I tested this issue on 1.19.0 RC1 viewer and found that it is fixed.

Thank you for the fix.