Philip Linden added a comment - 30/Jun/09 11:09 AM
If you are interested in doing this task as a contract for hire, reply with interest on this thread or on SL-DEV. Linden would be willing to pay for contract development for this feature.
The big problem I see with this is that it is going to introduce both a huge load on Google (I'm sure they can handle it, but it seems a bit unfair to dump hundreds of thousands of language translation queries an hour on their site) and a large latency between when chat shows up and when its translation arrives.
While I'm sure it is not feasible to implement client-side translation without writing your own translation engine (are there any open source ones? I have not checked yet), it seems a far better solution than issuing a web request for every line of chat that reaches the user. We also run into the problem of mixed native-language input in chat, which requires an engine that can guess or pick the language being used from words and context, something the Google API is not capable of at the moment. I've regularly seen conversations in no fewer than THREE languages happening at once within the same 20m chat range.
We could allow the user to pick their language and transfer that with the basic avatar data that is represented in world, possibly as an NVPair, but we would need a default for clients that do not yet support the feature. This would eliminate the need to identify the language from the text itself, but it does not solve the problem of text coming from objects. It seems that any text should have a right-click menu with "Translate this from: ..." and a list of the languages supported as a source. The upside of having this feature in all clients is that there is no need for outgoing translation, as most current solutions have, because both sides translate what they receive.
To summarize:
• Solution to handle the pure volume of text that would regularly be translated.
Undesired feature: sending every word I type to a spy service.
I and many others would simply disable this if it leaked information to a rathole like Google, etc.
@Allen Kerensky While of course we wouldn't want our text chat being recorded, this feature will benefit many people. As for your chat being recorded, it already is: Linden Lab has logs of all chat, IM, local chat, channels etc. As far as I am aware, they will have no more information than they did before.
Philip, I want on the list of interested companies. Working on this would be huge fun.
This JIRA appears to be intended for contractors wanting to bid on the development. Yet http://jira.secondlife.com/browse/SVC-2299
IMHO, this is sorely needed because an in-viewer translator can do things no scripted translator can do:
• Privacy is improved by using an anonymous 3rd party service. There would be fewer reasons for a prim based HTTP system to store your chat, even for short duration.
• It could translate note cards, which is impossible with scripted translators.
• It could translate web page links, which is impossible with scripted translators.
• It could translate IMs, which is impossible with scripted translators.
• It could be a huge help to text-to-speech screen readers.
• It can easily eliminate the 'foreign' chat and only show chat in the user's chosen language. This would be a huge help for the sight-challenged and for eyes-busy, hands-busy people like me who are washing dishes while trying not to splash the laptop keyboard, again.
As an addendum to the design, we have a scripting call allowing access to the language of chat spoken by an avatar...
http://wiki.secondlife.com/wiki/LlGetAgentLanguage This call should be used to then pass the appropriate language to the translation service.
If I could vote for which company gets a bid, I would vote for Ferd Frederix. I have used his inworld translator, which is based on Google's and legally bears Google's name. It works great and I recommend it often.
I am thrilled that Helen Keller Day brought this to the attention of LL. All who participated and worked so hard on HKD should be highly commended. I also would like to see Ferd included on this project.
Actually, I believe it was Ferd's translator that inspired this project.
Though, I doubt he can dedicate the time to a full implementation.
Blondin Linden has added Ferd's Free Translator (FFT) to Help Island Staging Island for the next Help Island update. I have already offered to open source and release the FFT, for library use, after this HI rollout. Any product, and especially one with over 21,000 lines of code, should follow careful development and testing cycles. Scaling needs to be checked carefully, in stages. The multiple copies of my update and 'give' servers do not appear to be an issue at this time. I am confident Google can handle any load.
It would only be a partial fix, though. No llscript translator can solve the many issues that a built-in translator would solve. Just to clear one thing up: the FFT has written permission from the Google Brand Permissions Department to say "xx's Google Translation: blah blah" in chat, and to use their logo in specific situations. It is not affiliated with or associated with Google.
Whoops! Sorry if I misspoke, Ferd.
Machine translations are very bad sometimes. I have seen monolanguage people (people who only know one language) trust a translator blindly with very bad results.
I think it's critical that the user is presented with a text box describing the problems with online translations, and how they can lead to misunderstandings (or at least amplify them). If the user refuses to accept the premises, they should be unable to turn on the translation. They should also be informed that turning it on means that Google receives a log of everything the user does, and nobody knows what they will do with that data.
I think before making blind assumptions about all translators, you should try out the one we are talking about.
It is quite a bit better than any other free translator currently available.
This sounds great Philip! I do know there need to be more details for those that will create it (or help do that).
I think it's wise to use the translation teams to test this; they have more experience testing languages than anyone else on SL. EDIT: oh yeah, perhaps a good idea is to add text-to-voice, so that people with poor eyesight or none at all can use SL better than before!
This is something that, like it or not, is going to be the future: one of those 'Must Have' features, and something I personally have been shouting from the rooftops for years. It should become part of the system, not an add-on by some talented kind soul.
If the system was driven by an installed programme on each person's system, updated remotely, then the burden of doing the translations would be much lower; the same goes for Text to Speech. This would also relieve some of the impact on the SL servers. The fewer scripts users have to deploy, the better all round for everyone. ATM most people's concerns are over privacy. If that is the case, get yourself an XMPP chat server and use that instead of the SL client for those times when you need better control over what you say.
I agree, it's a really good idea for everyone, especially if you include IM/private chat too. I'm very impressed by Snowglobe, and I can help you with French at any moment.
I totally agree Philip, this is a much needed add-on and something that will help all aspects of the community.
I would like to emphasise some points made earlier:
Thank you for the JIRA, I'm voting for this and encouraging others to do so.
Everyone worries about load on Google's resources. In truth, they get more hits from dead celebrities and breaking news events than they would from SL. When Michael Jackson died, their (Google's) servers did not crash or slow down. And yet we were still able to find that recipe for sugar cookies, or that article on Alzheimer's, or that picture of a '70 'Cuda.
I think the concerns, while valid, are being overblown. And as far as privacy goes, since 9/11, who really believes your chats or e-mails are private anymore? So, let's add this, and make SL work a little better. That's my opinion.
Discussed in open source meeting today. Probably not going to get in in time for 1.1, but hopefully soon. Stay tuned....
It would be good if the "suggest a better translation" function that Google offers when you translate a URL at the Google Languages site could be implemented.
If load is a problem, perhaps a latchable 'translate this' button or something like that could be implemented so that translation can be shut off when there is no need for it. And the idea of translatable IMs sounds wonderful.
Attached is a patch which implements automatic translation, which we obtained via an engagement with rentacoder.com. This code needs some work before it's ready to be committed to Snowglobe, but it's an excellent (and fully functional) start.
The code includes portions of jsoncpp, an external library available here:
More in a bit about what needs to change to get this ready for incorporation into Snowglobe....
More information about the attachments:
Explanation of the changes made:
Patch against Snowglobe 1.1.2.2584:
Patch against jsoncpp:
Let's shoot for getting this into Snowglobe 1.2
Things that need to happen before this is ready to be committed (summarized from our initial internal evaluation of the patch):
1. jsoncpp needs to be turned into a separate library, preferably without modification. We should consider wrapping it inside an LLJson class.
Not gating, but really good to have:
One other thought: we should make sure that jsoncpp is indeed the JSON library that we want to include in the viewer. I'm kinda surprised that we didn't previously have a JSON parser in the viewer (aside from the browser component), but my initial grep through the code suggests that there isn't one.
Hello all, I've put together a v2 of this patch that sorts out some of the issues:
> 1. jsoncpp needs to be turned into a separate library, preferably without modification. We should
Done. This patch includes the JsonCpp files (still with one minor fix).
> 2. Error condition handling. It's not clear what will going to happen if the translate service fails
Done, there's a timeout of 10 seconds now, but it still needs a unit test. I'm having trouble updating It works in the viewer though, and displays '( ? )' if no translation is received (see ChatTranslationReceiver). Should this be something like 'translation timed out', which would need translating into each language?
> 3. panel_preferences_web.xml: except for en-us, Merov got errors applying the patch for all other
I think you mean panel_preferences_chat.xml here (panel_preferences_web.xml was only changed for en-us). I haven't been able to get the patch v1 to fail so far.
> 4. Bug? In llui.cpp: LLUI::getTranslateLanguage() converts "en-us" into "en". I think that there
Ah! Yes, probably a bug. I was only considering the languages mentioned in the XML UI files. Would 'fr-ca' come from the system language? Is there a list of all possible language settings? This v2 of the patch just does a substr(0,2).
> 5. Manual replacement of a handful of HTML entities (&quot; etc). We should probably do this with APR
Is this the Apache Portable Runtime? I'm not sure what this means. Could you point me to a header to use? lltranslate.h does replace a few things, e.g. " replaces &quot;.
> 6. Hardcoded values like the Google API URL need to be broken out into constants and put in an
Done.
So maybe only issue 5 is still outstanding. Also, maybe the default setting for the translation language combobox should be 'UI Language' instead of 'System Default' (UI::getTranslateLanguage defaults to UI::getLanguage())?
I chose JsonCpp because it's in the Public Domain, is C++, and seemed to work straight away. However it's only version 0.1 and I'm concerned that I had to add a (very minor) fix to convert unicode->utf8.
I'm still looking at the unit test but thought I'd better post this in the meantime. I've added a zip file containing the patched files too, as this is my first patch and it might be broken. Cheers!
Hi Resu, Nice work! I'm in the process of building the code now. Thanks for diving in and taking care of most of the suggestions. On the language codes, I think we really meant "IETF language tags", which you can find more helpfully described on Wikipedia: http://en.wikipedia.org/wiki/IETF_language_tag
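To make the language-code question above concrete, here is a minimal sketch, not the code from the patch, of reducing an IETF-style tag such as "en-us" or "fr-ca" to its primary language subtag by splitting on the separator instead of taking a blind substr(0,2). The function name primary_language_subtag is hypothetical, and accepting '_' as a separator is an assumption made to cope with POSIX-style locale names.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the patch): returns the primary language
// subtag of an IETF-style tag, lowercased.
// It splits on the first '-' or '_' rather than assuming a two-letter prefix,
// so a three-letter primary subtag is kept whole.
std::string primary_language_subtag(const std::string& tag)
{
    const std::string::size_type end = tag.find_first_of("-_");
    // When no separator is found, end is npos and substr returns the whole tag.
    std::string primary = tag.substr(0, end);
    for (std::string::size_type i = 0; i < primary.size(); ++i)
    {
        primary[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(primary[i])));
    }
    return primary;
}
```

With this, "en-us" and "fr-ca" reduce to "en" and "fr" just as substr(0,2) would, while a tag with a three-letter primary subtag is not cut short. Whether the translation service accepts every resulting code is a separate question that this sketch does not address.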
Hi Resu, I was able to compile the Windows version of the v2 patch just fine. Very cool to see it in action!
However, I ran into problems on Mac and Linux. On Linux (Ubuntu 8.10, running gcc 4.3.2), I ran into the error in the attached lljson-error.txt ("error: extra qualification 'LLJSON::'"), which I fixed by knocking off the LLJSON:: prefixes (see lljson.h.patch). On both Mac and Linux, I ran into a problem with the unit test:
error: no matching function for call to 'LLTranslate::translate_message(boost::intrusive_ptr<LLCurl::Responder>, std::string&, std::string&, std::string&)'
...which is fully spelled out in lltranslate-error-2009-09-01.txt. I haven't yet figured out how to fix that problem, other than to disable the unit test.
On closer inspection, LLJson is probably not how we're going to want it. What we'll want for LLJson is a simplified/subsetted interface to the JSON library that isn't specific to our particular use case. Currently, LLJson has one method, "parseGoogleTranslate", whereas it should probably have a more generic wrapper or set of wrappers around Json::Value.get and Json::Reader.parse().
Fix for another GCC pickiness problem attached as
This patch fails due to DOS formatting on the indra/newview/skins/default/xui/*/panel_preferences_chat.xml files in SVN.
Techwolf, I noticed that too, and powered through it. I filed
Here you go. It took me a little while because file does not report CRLF on .xml files.
techwolf@laptop /usr/portage/distfiles/svn-src/snowglobe-trunk/trunk $ find . -type f ! -path ".svn" -exec grep -Pl '\r\n' {} \;
(file list deleted by Rob Linden - 12:51pm - please repost corrected list to
OK here are links to latest builds. IMPORTANT: In this build the feature is off by default - you need to set preferences->text chat->translate text.
For the actual production release, my thinking is that we need to have this feature on by default if it is to have real impact, since no one would otherwise
Here are the public URLs for the autotranslate builds: Windows Linux
Having tested the new build, I think having the new feature on by default would be a good thing.
While on, it did not have any adverse effects on the client while I ran it; it was pretty much transparent.
I applied the two patches to snowglobe-trunk r2749 and get the following error:
In file included from /var/tmp/portage/games-simulation/snowglobe-trunk-6/work/linden/indra/lljson/lljson.cpp:36:
/var/tmp/portage/games-simulation/snowglobe-trunk-6/work/linden/indra/lljson/lljson.h:49:8: warning: extra tokens at end of #endif directive
In file included from /var/tmp/portage/games-simulation/snowglobe-trunk-6/work/linden/indra/lljson/lljson.cpp:37:
/var/tmp/portage/games-simulation/snowglobe-trunk-6/work/linden/indra/lljson/lljson.h:44: error: extra qualification 'LLJSON::' on member 'm_GoogleData'
/var/tmp/portage/games-simulation/snowglobe-trunk-6/work/linden/indra/lljson/lljson.h:45: error: extra qualification 'LLJSON::' on member 'm_GoogleTranslation'
/var/tmp/portage/games-simulation/snowglobe-trunk-6/work/linden/indra/lljson/lljson.h:46: error: extra qualification 'LLJSON::' on member 'm_GoogleLanguage'
make[2]: *** [lljson/CMakeFiles/lljson.dir/lljson.cpp.o] Error 1
make[1]: *** [lljson/CMakeFiles/lljson.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
This is on a standalone build on a gentoo amd64 box. Edit: Turns out there are THREE patches; one needs to include #3 from rob also.
I downloaded the build Philip provided, and it does not work with chat bubbles.
-------------
Snowglobe 1.2.0 (2708) Sep 1 2009 11:36:31 (Snowglobe Test Build)
Built with GCC version 40001
You are at 170565.3, 247432.6, 25.7 in Lugh located at sim8457.agni.lindenlab.com (216.82.39.16:12035)
CPU: Dual i386 (Unknown) (2600 MHz)
libcurl Version: libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3
Hi folks, the
Any volunteers to make this work?
Will this feature be expanded to include Instant messages and group chat?
How about opening the browser URLs in my viewer's language?
The syntax is http://translate.google.com/translate?langpair=en where DEST is the user's selected language ISO code. For example, if I put this link in a llLoadURL(), I would get this page in English: http://jira.secondlife.com/browse/SNOW-93 If I add a little bit to it, I get it in Spanish: http://translate.google.com/translate?langpair=en It's a simple string manipulation.
Revision 2756
Bundling jsoncpp libraries per
Files affected:
@rob Please remember to update the patch cmake files to look for system or LL supplied libraries for STANDALONE and NOT STANDALONE.
Hi folks, I'm playing around with this one some more today. I came to the realization that rather than making the API fit the name, it's probably smarter to make the name fit the API. So, here are the changes I plan to do some quick twiddling with:
1. Change "lljson" to "llstringtrans" ("lltranslate" is already taken.)
2. Change "LLJSON::parseGoogleTranslate" to "LLStringTrans::remoteStringTranslate"
No further abstraction of jsoncpp should really be needed. Later on, we may want to abstract away our JSON implementation, but it seems premature to do that right now.
@Techwolf: would you (or someone else here) be willing to write a "FindJsonCpp.cmake" file? If so, I might be able to keep standalone builds working. If not, I can't make any guarantees.
One thing I've noticed as I've been shuffling things around. In lltranslate.cpp:
string_replace_all( translation, "&apos;","\\");
That doesn't look right. &apos; is "apostrophe", but it looks like it's replacing it with a backslash. Why is that?
A new version of this patch. I nuked the lljson abstraction since it really wasn't buying us much other than complexity, and I'm now using the prebuilt libraries rather than compiling jsoncpp inline. The unit test is still commented out because it fails on Linux.
Win32 libraries updated with the right link flags now, and now confirmed to build on all three platforms
A few comments/questions about the patch:
And about the UI:
\u0026 is a way of writing a Unicode character as a 4-digit hex number, similar to HTML &H39. Thickbrick Sleaford is correct in assuming that one way to fix this character is to string-replace \u0026 with an &. That only works for this one character, though. Other chars in foreign keyboard sets will be mapped to this space; for example, \u003c is the less-than character '<'. This character can also be sent back as \u0026lt;. They are equivalent.
For example, you can get back '\u0026lt;' or '\u003c', and they have the same meaning. A better method is to sscanf for the pattern /\u%4x/, grabbing the 4 hex digits and converting them into a character, so you get the & and many, many more possibilities. In Ferd's Free Translator, I do this by converting the 4 chars to an int, then the int into a UTF-8 character. This yields the &. C++ has equivalent functions, such as sscanf().
After the first pass looking for the \uxxxx pattern, a similar loop is used for pass 2. I make a second pass to look for possible /&H%\d+;/i patterns that will now appear, since the \u0026 may have been replaced by the actual character '&'. An example would be '&h38;', which is Yet Another Method of printing a '&'. This second pass over the string, decoding the digits, converting them to an int, and then to UTF-8, works on all possible numeric HTML entities.
But wait, there is more! Now we have a string full of '&amp;' and so on. There are a lot of HTML entities. The ones I have spotted by eyeball when using the FFT are &lt; (<), &gt; (>), &amp; (&), and &quot; ("). Converting all possible entities requires something like an llUnEscape() function.
It may be possible to get back 4 digit codes, such as Ṡ, which is Ṡ, aka Ṡ a Capital S-dot. The proper method is to scan for '&H(\d+);', using a regexp. I have not seen any 4-digit entities come back from Google, but I only have one pair of eyeballs. They do use the two-digit method described above. For more HTML entities, please refer to http://webdesign.about.com/library/bl_htmlcodes.htm
You should be able to repro this by just using gestures that are full of quotes and <, > and other keystrokes. If they look borked, it is because HTML entities are not being supported correctly.
Ferd Frederix
Has anyone tried the R-L languages yet?
Arabic, Hebrew, Farsi and Chinese are Right-to-Left languages (Chinese is optionally R-L). Hebrew is a special case: the llGetAgentLanguage call returns code 'iw' (correct), but Google uses the incorrect code 'he'. When these codes are used, the viewer must take the Google result, reverse the string and print it; at least it must do so in LSL. To do this 100% correctly, so numbers don't swap, requires a much larger function. As a simple example, numbers should not be reversed. They always read left to right. The Yen, Dollar and other currency signs, when used with numbers, should not be reversed. A number with decimal points, such as $5.00, should not be reversed. In Unicode encoding, all non-punctuation characters are stored in writing order. This means that the writing direction of characters is stored within the characters. A three-pass algorithm is used where each character is judged to be neutral, 'strong', or 'weak'. For further details, see this link: http://en.wikipedia.org/wiki/Bi-directional_text Inclusion of the full bi-directional Unicode method in Second Life would make further internationalization much easier.
My post as of 7:47 am today got the entities stripped into HTML. It should have said that Ṡ, which is & #7776; aka & #x1E60; is a Capital S-dot. You can view the src of the page if you need the exact item.
About the \uNNNN problem, I think this should be handled in jsoncpp, but for some reason it isn't.
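For illustration only, here is a rough C++ sketch of the two-pass decoding described above: first the JSON-style backslash-u escapes, then HTML entities (numeric ones in decimal or hex, plus the few named ones mentioned in this thread). All function names are hypothetical and none of this is taken from the patch or from jsoncpp. It assumes the input is otherwise valid UTF-8, and it does not combine surrogate pairs, so characters above U+FFFF that arrive as two escapes are not handled.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
static void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Pass 1: replace each backslash, 'u', four-hex-digit escape with UTF-8.
// Limitation of this sketch: surrogate pairs are not combined.
std::string decode_u_escapes(const std::string& in)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        bool is_escape = in[i] == '\\' && i + 6 <= in.size() && in[i + 1] == 'u';
        for (int k = 2; is_escape && k < 6; ++k)
            is_escape = std::isxdigit(static_cast<unsigned char>(in[i + k])) != 0;
        if (is_escape) {
            append_utf8(out, std::strtoul(in.substr(i + 2, 4).c_str(), 0, 16));
            i += 6;
        } else {
            out += in[i++];
        }
    }
    return out;
}

// Pass 2: decode numeric entities (decimal or hex) and five named ones in a
// single scan, so decoded output is never rescanned ("&amp;lt;" gives "&lt;").
std::string decode_entities(const std::string& in)
{
    static const char* const names[] = { "lt", "gt", "amp", "quot", "apos" };
    static const char chars[] = { '<', '>', '&', '"', '\'' };
    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        const std::string::size_type semi =
            (in[i] == '&') ? in.find(';', i) : std::string::npos;
        bool decoded = false;
        if (semi != std::string::npos && semi - i >= 2 && semi - i <= 9) {
            const std::string body = in.substr(i + 1, semi - i - 1);
            if (body[0] == '#') {
                const bool hex = body.size() > 1 && (body[1] == 'x' || body[1] == 'X');
                const std::string digits = body.substr(hex ? 2 : 1);
                bool valid = !digits.empty();
                for (std::string::size_type d = 0; valid && d < digits.size(); ++d) {
                    const unsigned char c = static_cast<unsigned char>(digits[d]);
                    valid = (hex ? std::isxdigit(c) : std::isdigit(c)) != 0;
                }
                const unsigned long cp =
                    valid ? std::strtoul(digits.c_str(), 0, hex ? 16 : 10) : 0;
                if (valid && cp <= 0x10FFFF) {
                    append_utf8(out, cp);
                    decoded = true;
                }
            } else {
                for (int n = 0; n < 5 && !decoded; ++n) {
                    if (body == names[n]) { out += chars[n]; decoded = true; }
                }
            }
        }
        if (decoded) i = semi + 1; else out += in[i++];
    }
    return out;
}

// Both passes in the order described above: escapes first, then entities.
std::string decode_translation(const std::string& in)
{
    return decode_entities(decode_u_escapes(in));
}
```

Running the escape pass first means that an escaped ampersand followed by "lt;" ends up as '<', which matches the equivalence described above. Anything that does not parse as an escape or a known entity is copied through unchanged. A real fix might belong inside the JSON library instead, as suggested above.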
About language codes: as I understand it the system locale is <2 letter language>-<2 letter territory> on all platforms, and since Google translate seems to only care about the language, I think we can too (i.e. just use the first two chars, as is being done now). The iw/he confusion for Hebrew is a special case. "he" is the correct form AFAICT, but Google accepts both (both as a "to" and a "from" language).
Ok...after getting a code review from Thickbrick and noticing a couple things myself, I've made yet another rev:
1. moved getTranslateLanguage out of llui
I've not incorporated all of Thickbrick's suggestions, but I think I've got enough done to check this in and iterate on it. Thoughts?
One more change. Now pulling the user agent from llviewerversion.h, and also removing poorly formatted headers.
Revision 2778
Files affected:
@Rob: you mention above in a comment that "&apos" should be replaced with "'", not "\".
That's still present in the version in svn, at line 94 of newview/lltranslate.h
I added a new subtask
Proposed preference tweak (with screenshot) filed as
Awesome that the SL client viewer supports dynamic translation! Kudos to you all!
I guess this makes my Universal Translator obsolete...but so it goes.