
Key: SVC-1020
Type: Bug
Status: Closed
Resolution: Duplicate
Priority: Normal
Assignee: James Linden
Reporter: SakuraNoel Fayray
Votes: 10
Watchers: 6
2. Second Life Service - SVC

New search does not accept Japanese inputs

Created: 01/Dec/07 11:54 PM   Updated: 16/Jan/08 04:31 AM
Component/s: Internationalization, Search
Affects Version/s: None
Fix Version/s: None

File Attachments: 1. Text File SVC-1020-type2.patch (4 kB)
2. Text File SVC-1020.patch (3 kB)

Image Attachments:

1. 01.jpg
(69 kB)

2. 02.jpg
(47 kB)
Environment:
Mac OS X 10.4.11, WindowsXP
Second Life 1.18.5 (2), Second Life 1.18.5 (3), Second Life 1.18.6
Issue Links:
Duplicate
 
Relates
 

Patch attached: Patch attached
Linden Lab Internal Branch: Branch_1-18-6-Viewer


Description
With the new search engine, searching in Japanese does not work.

1. Open the new search window.
2. Enter Japanese text in the input field.
3. Click the "Search" button.

Search worked normally until Second Life 1.18.5 (3) was released.
The same problem currently occurs in the Release Candidate.

The problem is on the search engine side.



Comments
Lex Neva added a comment - 02/Dec/07 09:43 AM
Just to be sure the problem is in the search engine, and not in how the client sends data to the search engine, could you test your search here?

http://www.lexneva.name/sl/search.html

That form submits searches directly to the new search system.


SakuraNoel Fayray added a comment - 03/Dec/07 12:02 AM
Hi, Lex.

1. Entered Japanese (日本語) in the input field.
2. Clicked the Search button.

[Result]
-------------------------------------------------------------------------------
Your search - { - did not match anything we could find within Second Life. Nothing was found in Second Life containing "{". Suggestions:
Make sure all words are spelled correctly.
Try different keywords.
Try more general keywords.
-------------------------------------------------------------------------------
The search could not find anything.


march Korda added a comment - 15/Dec/07 05:02 AM
Another way to reproduce:

1. Look up the Japanese character "あ" (HIRAGANA LETTER A) at unicode.org:
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=3042
>Unihan data for U+3042
>Glyphs
>Your Browser
>あ
>Encoding Forms
>UTF-8
>E3 81 82
2. Copy the character from the "Your Browser" row to the clipboard.
3. Open the Second Life viewer,
open the "Search Second Life" window with Ctrl+F,
and select the "All" tab.
4. Paste the character from the clipboard
and press the Search button.
5. The following message appears
(note that the UTF-8 encoding of "あ" is E3 81 82):

Your search - FFFFE3 FFFF81 FFFF82 - did not match anything we could find within Second Life.
Nothing was found in Second Life containing "FFFFE3 FFFF81 FFFF82".

Suggestions:

  • Make sure all words are spelled correctly.
  • Try different keywords.
  • Try more general keywords.

The characters "日", "本", and "語" can also be looked up at unicode.org:
"日" : UTF-8 : E6 97 A5
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=65E5
"本" : UTF-8 : E6 9C AC
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=672C
"語" : UTF-8 : E8 AA 9E
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=8A9E

The search result in the Second Life viewer is:

Your search - FFFFE6 FFFF97 FFFFA5 FFFFE6 FFFF9C FFFFAC FFFFE8 FFFFAA FFFF9E - did not match anything we could find within Second Life.
Nothing was found in Second Life containing "FFFFE6 FFFF97 FFFFA5 FFFFE6 FFFF9C FFFFAC FFFFE8 FFFFAA FFFF9E".

Version:
Second Life 1.18.6 (0) Dec 4 2007 18:06:16 (Second Life Release)

You are at 219535.0, 303862.0, 301.4 in karuizawa located at sim4837.agni.lindenlab.com (63.210.159.233:13006)
Second Life Server 1.18.6.75511


march Korda added a comment - 15/Dec/07 05:17 AM
To Lex:
You should add the following line to the page header to set the character encoding to UTF-8.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The search term is "日本語".
It should be encoded as %E6%97%A5%E6%9C%AC%E8%AA%9E,
but if the page encoding is set to Shift-JIS, it is encoded as %93%FA%96%7B%8C%EA.

http://secondlife.com/app/search/search_proxy.php?q=%E6%97%A5%E6%9C%AC%E8%AA%9E&s=All (UTF-8)
http://secondlife.com/app/search/search_proxy.php?q=%93%FA%96%7B%8C%EA&s=All (Shift-JIS)
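To see which bytes each of the two query strings above actually carries, the escapes can be decoded back into raw bytes. The following is a minimal sketch; `hexValue` and `percentDecode` are hypothetical helpers written for this illustration and are not part of the viewer or the search proxy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
static int hexValue(char h) {
    if (h >= '0' && h <= '9') return h - '0';
    if (h >= 'A' && h <= 'F') return h - 'A' + 10;
    if (h >= 'a' && h <= 'f') return h - 'a' + 10;
    return -1;
}

// Hypothetical helper: turn every well-formed "%XX" escape into the raw byte
// 0xXX and copy all other characters through unchanged.
std::string percentDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Decoding the UTF-8 query yields nine bytes (three per character), while the Shift-JIS query yields six. The fourth Shift-JIS byte is 0x7B, which is the ASCII character "{"; this may be why the search from the Shift-JIS form reported "{" as the search term.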


Nock Forager added a comment - 17/Dec/07 10:06 PM
It's not the search engine's problem (there is no reason the Google engine wouldn't support multiple languages). With Firefox, march's first URL (the UTF-8 one) returns fine results. It seems the SL client can't handle UTF-8 text well (as usual...)

Celierra Darling added a comment - 17/Dec/07 10:43 PM
Seems to repro for me.

By the way, there seem to be a lot of people watching, but nobody voting for this... Voting helps people know to prioritize this higher, so don't forget to do both.


Celierra Darling added a comment - 18/Dec/07 02:05 AM
To clarify, right now, if I copy and paste "日本語" into Lex's form, it seems to work, but when I do it in the viewer, I get the same problem. Does this mean this should be moved to VWR? (I can move, if that should be done.)

Celierra Darling added a comment - 18/Dec/07 02:06 AM
Better description, adding "Search" to components affected.

march Korda added a comment - 19/Dec/07 03:49 AM
Patch attached.

search_text needs to be URL-encoded correctly.
But bytes in the range 0x80 to 0xFF were not handled properly
when each byte was processed during URL encoding.
Sign extension is the cause:
the result is wrong when the MSB (Most Significant Bit) is 1.
See:
http://en.wikipedia.org/wiki/Sign_extension
http://ja.wikipedia.org/wiki/%E7%AC%A6%E5%8F%B7%E6%8B%A1%E5%BC%B5

Example:
In the case of "char c = 0x81",
widening the value to 32 bits with "static_cast<U32>(c)"
yields 0xFFFFFF81, and the URL-encoded result is not "%81" but "%FFFFFF81".

In the current source, the URL encoding is embedded in the search code.
But URL encoding is a general-purpose operation,
so I split the function into two.
(I wonder why "std::setw(2)" in "LLURI::escape" doesn't seem to work?)

I think sign extension behavior differs depending on the CPU/OS architecture and compiler.
My environment is Core2Duo E6600 + WindowsXP Home SP2 (32bit) + VC++2005 ExpressEdition.
We need more test cases.
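The sign extension described in this comment can be reproduced in isolation. The sketch below uses `signed char` explicitly, because whether plain `char` is signed is implementation-defined; `widenSigned` and `toEscape` are hypothetical names for this illustration and do not come from the viewer source or the attached patch.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Widen a byte held in a signed char straight to 32 bits, as in the
// static_cast<U32>(c) example. When the top bit of the byte is set, the
// sign bit is copied into the upper 24 bits of the result.
std::uint32_t widenSigned(signed char c) {
    return static_cast<std::uint32_t>(c);
}

// Format a value as a "%XX"-style escape: uppercase hex, zero-padded to at
// least two digits. std::setw only sets a minimum field width, so a value
// that needs more than two digits is printed in full, not truncated.
std::string toEscape(std::uint32_t value) {
    std::ostringstream out;
    out << '%' << std::uppercase << std::hex << std::setfill('0')
        << std::setw(2) << value;
    return out.str();
}
```

With these helpers, `toEscape(widenSigned(static_cast<signed char>(0x81)))` produces "%FFFFFF81" rather than "%81". This also suggests an answer to the `std::setw(2)` question: the manipulator pads short output but never shortens long output, so it cannot hide the extra "FFFFFF".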


march Korda added a comment - 19/Dec/07 03:52 AM
To Nock, Celierra:
Yes. This is a viewer issue, not a server issue,
and it should be moved from SVC to VWR.

march Korda added a comment - 20/Dec/07 07:46 PM
Replaced spaces with "+" for use by the Google search appliance (I just copied this from the original source),
and added some comments.

march Korda added a comment - 21/Dec/07 03:44 PM
repro 1.18.6 (2)

Alissa Sabre added a comment - 23/Dec/07 05:48 PM
I created a VWR version of this issue.

James Linden added a comment - 27/Dec/07 04:47 PM
Thank you for your help reproducing the problem and providing patches. I have fixed this for the next 1.18.6 release candidate.

I changed your patch slightly. The LLURI portion of the patch needs to be static_cast<S32>(static_cast<U8>(c)), otherwise some normal escaping does not work correctly.

I have changed llpaneldirfind.cpp to use LLURI::escape() and added a unit test for escaping the cedilla in Français.

Japanese searches like 日本語 appear to function as well.
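The double cast mentioned above can be sketched as a standalone escaping routine. This is an illustration only, not the actual `LLURI::escape()` implementation: the function name `escapeSketch`, the set of characters passed through unescaped, and the `U8`/`S32` typedefs (stand-ins for the viewer's integer types) are all assumptions.

```cpp
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

// Stand-ins for the viewer's integer typedefs (assumed for this sketch).
typedef unsigned char U8;
typedef int S32;

// Hypothetical escaping routine: ASCII letters, digits and "-_.~" pass
// through; every other byte is written as "%XX".
std::string escapeSketch(const std::string& text) {
    std::ostringstream out;
    out << std::uppercase << std::hex << std::setfill('0');
    for (char c : text) {
        // Go through U8 first so bytes 0x80..0xFF become 128..255 instead
        // of negative values that would be sign-extended when widened.
        const U8 byte = static_cast<U8>(c);
        const bool unreserved = byte < 0x80 &&
            (std::isalnum(byte) || c == '-' || c == '_' || c == '.' || c == '~');
        if (unreserved) {
            out << c;
        } else {
            out << '%' << std::setw(2) << static_cast<S32>(byte);
        }
    }
    return out.str();
}
```

Given the UTF-8 bytes of "Français", this sketch produces "Fran%C3%A7ais", and the UTF-8 bytes of "日本語" become "%E6%97%A5%E6%9C%AC%E8%AA%9E", matching the UTF-8 query string quoted earlier in the thread.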


Alexa Linden added a comment - 10/Jan/08 09:51 AM
Duplicate of VWR-4010

march Korda added a comment - 13/Jan/08 03:29 PM
repro 1.18.6 (3)

march Korda added a comment - 16/Jan/08 04:31 AM
no repro 1.18.6 (4)
It looks like Japanese works now.