Link to an issue SVC-3717 I've posted.
You don't need to post when you're linking an issue; it shows up in the "Change History" and "All" views.
This was brought to my attention tonight after a frustrating time trying to get non-English data from a database. My test, so far, is small so I can validate the techniques. Elements:
1) SQL Server database with ntext fields containing non-English text. Important notes: first, SQL Server stores data as UCS-2 (ick). However, with the addition of codepage information, you can have SQL Server cough up UTF-8 (<% Session.Codepage=65001 %>). This works.
So, in my database is "貢獻 L$@0 到彩池", which the ASP page serves to a web browser as plain text (no added HTML). The following is shown in the browser: "è²¢ç» L$@0 åˆ°å½©æ± ã€‚" However, if I tell the browser that it's UTF-8, it's interpreted correctly and looks fine.
That said, according to the wiki, the BODY coming in from an llHTTPRequest is supposed to be treated as UTF-8 by default. However, no matter how I've prodded the system, I just get the undecoded UTF-8 ("è²¢ç» L$@0 åˆ°å½©æ± ã€‚"). Basically, I'd say this is broken, but I really can't believe I'm the first to find this.
You need to set charset=utf-8 in the HTTP header, as described in http://www.w3.org/International/O-HTTP-charset
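For anyone following along, here is a small Python sketch (not the ASP or LSL from this issue) of how that kind of garbling arises: the UTF-8 bytes are fine, but they get decoded with a single-byte charset. Latin-1 is an assumption chosen for the illustration; the browser output above looks more like Windows-1252, but Python's cp1252 codec rejects one of the bytes involved (0x8D), so Latin-1 is used here instead.

```python
# Illustration only: decode correct UTF-8 bytes with the wrong charset.
original = "貢獻 L$@0 到彩池"

# What the server actually sends once it emits UTF-8.
utf8_bytes = original.encode("utf-8")

# Wrong: a single-byte charset turns every byte into its own character.
# (Latin-1 is an assumed stand-in for whatever the receiver really used.)
garbled = utf8_bytes.decode("latin-1")

# Right: the same bytes decoded as UTF-8 give back the original text.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(restored)
```

The bytes never change between the two cases; only the charset the receiver assumes does, which is why declaring it in the header is enough.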
Confirmed. Using ASP, this change made all the difference:
response.ContentType="text/plain;charset=utf-8"
I've learned a lot about this whole process. I hope to put some info in the wiki about it.
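A rough Python sketch of why the charset parameter matters on the receiving side: a client that picks its decoder from the Content-Type header. The helper names `charset_from_content_type` and `decode_body` are hypothetical, and the Latin-1 fallback is an assumption made for the illustration, not a statement about what llHTTPRequest actually does internally.

```python
from email.message import Message


def charset_from_content_type(content_type, default="latin-1"):
    """Return the charset parameter of a Content-Type value, or a default.

    The default is an assumed fallback for this sketch only.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode response bytes using the charset named in the header."""
    return body.decode(charset_from_content_type(content_type))


body = "貢獻 L$@0 到彩池".encode("utf-8")

# Header as set by the ASP change above: the body decodes cleanly.
with_charset = decode_body(body, "text/plain;charset=utf-8")

# Header without a charset: the fallback is used and the text is garbled.
without_charset = decode_body(body, "text/plain")

print(with_charset)
print(repr(without_charset))
```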
Detailed test done by Alissa may help?