[BUG-226911] Region crossing failures after 2019-04-26 #5353
Comments
animats commented at 2020-04-15T21:02:44Z A year later, this bug is still present. There's a milder form of "vehicle crossed but avatar did not" which may be easier to study. Entering a busy region for the first time in 10-20 minutes, the vehicle gets in, and the avatar entry is delayed. Some vehicles stop (they're not getting keyboard events) and others keep going with no user control until crashing into something.
My own bikes detect this situation by checking llGetObjectDetails(avatar,[OBJECT_POS, OBJECT_ROOT]) for each sitter. If the OBJECT_ROOT of the avatar isn't the vehicle, we're in the "half unsit" situation. This happens briefly at almost every region crossing, but if it persists, something will break. My bikes display the CROSSSLOW message shown above when this happens. If it lasts longer than 30 seconds, the bike shuts off and an error is reported.
This situation comes up regularly with certain overloaded regions. If it resolves quickly, the user can go on driving. If it takes too long, the region crossing will probably fail with the usual "half-unsit" bug.
Interestingly, editing the vehicle back across the region boundary to the stuck avatar's region will often recover the situation. The vehicle and avatar then reconnect. So a possible response to such situations is to move the vehicle back to before the region crossing. The SL system is trying to do an atomic transaction in a distributed system, getting the vehicle and all avatars from one sim to another. Sometimes it doesn't work. The traditional solution to such problems is to back out the transaction and try again. Open Simulator has code to do that, although I haven't tested it. It's an approach worth considering.
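For readers who want to try the check described above, here is a minimal LSL sketch of it. It is not the bike code from this thread. It assumes the script runs in the vehicle's root prim, that every seat has a sit target, and that llOwnerSay is an acceptable stand-in for real logging. The 3 and 30 second thresholds come from this thread; the polling interval and the use of STATUS_PHYSICS to "shut off" are illustrative guesses.

```lsl
// Sketch only. Assumes the script sits in the vehicle's root prim and that
// every seat prim has a sit target. POLL_SECS is a hypothetical value.
integer SLOW_SECS = 3;    // delay reported as CROSSSLOW in this thread
integer FAIL_SECS = 30;   // delay treated as a failed crossing in this thread
float   POLL_SECS = 1.0;  // hypothetical polling interval

integer gDetachedSince = 0;   // Unix time the half-unsit began; 0 = none

// TRUE if the sitter does not currently report this vehicle as its root.
integer sitterDetached(key sitter)
{
    list details = llGetObjectDetails(sitter, [OBJECT_ROOT]);
    // Empty result: the avatar is not known to this region (yet).
    if (llGetListLength(details) == 0) return TRUE;
    return llList2Key(details, 0) != llGetKey();
}

// TRUE if any seated avatar is in the half-unsit state.
integer anySitterDetached()
{
    integer prims = llGetObjectPrimCount(llGetKey());
    integer link;
    for (link = 1; link <= prims; ++link)
    {
        key sitter = llAvatarOnLinkSitTarget(link);
        if (sitter != NULL_KEY)
        {
            if (sitterDetached(sitter)) return TRUE;
        }
    }
    return FALSE;
}

default
{
    state_entry()
    {
        llSetTimerEvent(POLL_SECS);
    }

    timer()
    {
        if (!anySitterDetached())
        {
            gDetachedSince = 0;   // everyone has caught up
            return;
        }
        integer now = llGetUnixTime();
        if (gDetachedSince == 0) gDetachedSince = now;
        integer waited = now - gDetachedSince;
        if (waited >= FAIL_SECS)
        {
            llOwnerSay("CROSSFAIL after " + (string)waited + " s");
            llSetStatus(STATUS_PHYSICS, FALSE);  // one hypothetical way to "shut off"
            llSetTimerEvent(0.0);
        }
        else if (waited >= SLOW_SECS)
        {
            llOwnerSay("CROSSSLOW " + (string)waited + " s");
        }
    }
}
```

The sketch compares OBJECT_ROOT against the vehicle's own key instead of testing for a particular "missing" value, so it does not depend on exactly what the simulator returns while the avatar is in transit. A real vehicle would also cut motor power as soon as the detached state is seen, not only at the 30 second mark.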
|
animats commented at 2020-06-16T20:44:06Z Attn: Mazidox Linden. Per discussion at Server User Group today, attached is a list of the logged region crossing delays for a test vehicle I use. These show the slow and failed region crossings of the last 60 days. Here's an excerpt; the full data is in an attachment. A CROSSSLOW event means there was a delay of at least 3 seconds at a region crossing. Multiple CROSSSLOW events indicate a longer delay. This excerpt shows a 21 second delay which was recovered. A CROSSFAIL event indicates that 30 seconds went by without the avatar catching up to the vehicle. More than 30 seconds of delay rarely ends with a successful recovery. Delay is measured from when the vehicle arrived to when the avatar arrived, as seen by the avatar showing the vehicle as its parent. (During region crossings, the avatar is a child prim of the vehicle, but the vehicle is not the parent of the avatar, and scripts can check this.) My own avatar for this is a classic avatar with some mesh clothing and few scripts. These are interesting because they're a well defined situation that is usually recovered. This looks like some retry mechanism in action. Packet retransmission? While most of these were recoverable, 10 full region crossing failures were logged in this data.
|
Chorazin Allen commented at 2020-10-07T22:56:27Z I've been looking into basically the same thing with a yacht motor script. Another thing that probably shouldn't be happening is that a CHANGED_LINK event may be raised right after a region crossing. The cause seems to be the avatar detaching from the linkset temporarily whilst vehicle and avatar enter the new region. Sometimes the script won't find any delta in the expected linkset (the avatar has arrived before the script gets to react to the event and check the linkset), and sometimes it will find that the avatar is no longer reported as being on the linkset. Could CHANGED_LINK be suppressed during an object's region crossing until all its avatars have also arrived (or definitely failed to)? |
animats commented at 2020-10-08T01:03:06Z, updated at 2020-10-08T01:03:55Z You're right. There are bogus CHANGED_LINK events at troubled region crossings. I just checked my log files. In the logs (see bikelog01.txt), the RIDERCOUNT entry shows when there's a CHANGED_LINK event. Usually that happens when an avatar gets on or off, and is followed by a SITTER log entry showing the avatar involved. But sometimes, and usually when a region crossing is slow, there's an unexpected CHANGED_LINK event. The code for the vehicle then checks the prims for sitters and lists the number of sitting avatars. The sitter info and count, based on llAvatarOnLinkSitTarget, does not change. This is mostly harmless. There's a separate situation where a region crossing fails, the avatar is logged out, and, upon relogging, tries to reuse the bike still sitting there. Sometimes the avatar is still listed as a "sitter", because a proper unsit never took place. It's possible for a vehicle to become un-sittable if all seats are full of long-gone sitters. A region restart seems to clear that. This is only a problem with the aftermath of a failed region crossing. |
Chorazin Allen commented at 2020-10-08T09:08:42Z I've got similar code that keeps track of the linkset prim only count and overall prim+sitter count and compares it on each CHANGED_LINK. On slightly delayed crossings I get CHANGED_LINK and nothing is actually different by the time I compare the prim only and prim+sitter totals to what they were before. On more delayed crossings the prim+sitter count has reduced and is now equal to the prim only count (assuming there's just the one avatar on the vehicle). In these instances, it usually seems to be the case that the vehicle has arrived before the rider (llGetAgentSize() reports ZERO_VECTOR for the agent). On some occasions, the effect is pronounced enough that AVSitter (which manages the sitting+poses; my scripts cover the engine and everything else so I need to keep track of where/if the driver is sitting) thinks the same avatar has sat down again and gives them a new pose menu right after a region crossing. So, I would rate it more significant than mostly harmless - there's a window that varies with the delay of the avatar arriving in the region after their vehicle around which the avatar is reported to have detached from the linkset and then rejoined it. |
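The count comparison described in this comment might look roughly like the following LSL sketch. It is not the yacht script itself. It assumes a single driver seat with a sit target on the prim holding the script, and the rule "sitter count dropped while llGetAgentSize returns ZERO_VECTOR means the rider is still in transit" is a heuristic drawn from the observation above, not documented simulator behaviour.

```lsl
// Sketch only. Assumes a single driver seat with a sit target on the prim
// holding this script; the classification rule is a heuristic, not SL-defined.
integer gSitters = 0;       // sitter count seen at the last accepted change
key     gDriver = NULL_KEY; // last avatar seen on the sit target

// llGetNumberOfPrims() counts seated avatars; llGetObjectPrimCount() does not.
integer countSitters()
{
    return llGetNumberOfPrims() - llGetObjectPrimCount(llGetKey());
}

default
{
    state_entry()
    {
        gSitters = countSitters();
        gDriver = llAvatarOnSitTarget();
    }

    changed(integer change)
    {
        if (!(change & CHANGED_LINK)) return;

        integer now = countSitters();
        if (now == gSitters)
        {
            // Nothing differs by the time we look: treat as a bogus event.
            return;
        }
        if (now < gSitters && gDriver != NULL_KEY
            && llGetAgentSize(gDriver) == ZERO_VECTOR)
        {
            // Sitter count dropped and the avatar is not in this region:
            // likely still in transit, so keep the old state and wait.
            return;
        }
        // Otherwise treat it as a genuine sit or stand.
        gSitters = now;
        gDriver = llAvatarOnSitTarget();
        llOwnerSay("Sitters now: " + (string)gSitters);
    }
}
```

Because the old state is kept while the rider is in transit, the CHANGED_LINK raised when the rider rejoins finds the counts equal and is ignored. If the rider never arrives, the stored state goes stale, so a real script would pair this with a timeout.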
Chorazin Allen commented at 2020-10-08T09:12:16Z Quick note on my testing environment: entering regions Sondraman or Osbourne Beach. The first is usually the most heavily loaded. I'm also using a script-heavy avatar which will be adding to the time it takes the avatar to be transferred. |
animats commented at 2020-10-08T17:46:45Z Right. It's not that the whole thing is harmless, it's that the extra CHANGED_LINK message may be. "There's a window that varies with the delay of the avatar arriving in the region after their vehicle around which the avatar is reported to have detached from the linkset and then rejoined it." Exactly. That's what's happening, and we see it in scripts. Region crossings take time. During that time, the SL world keeps running. This creates problems. Out of control vehicles are the biggest one. The user can't control the vehicle while the avatar is temporarily detached. So the script has to stop it to prevent a runaway, during which the vehicle can hit something or cross a second region boundary. If a second region crossing starts while the first one has not yet completed, the system can't handle that and the user usually has to log out. Unless region crossings can be made fast enough and reliable enough that scripts can ignore region crossing delays, scripts have to deal with what happens during region crossings while the avatars catch up. |
What just happened?
Vehicle with one rider crossed a region boundary. Vehicle crossed, but avatar did not. Avatar was stuck, with controls still taken. Moving the vehicle with viewer Edit back across the region boundary allowed sitting the avatar on the vehicle and continuing.
Seen 3 times, twice with a motorcycle (my code) and once with a helicopter (coded by others). These are recent events, after the fixes to sim code for teleport and region crossing problems. This problem is infrequent, perhaps one in 200 region crossings.
Failed trips, from bike logging:
2019-04-26 23:18:46 ending in region Teal.
2019-04-26 23:40:07 ending in region Jaffee.
Failed trip with helicopter:
2019-04-29 21:32 ending in region Haploa.
What were you doing when it happened?
Recently, region crossing failure rates seem to have improved, so I decided to try a long helicopter trip, from Bellisseria to northern Heterocera and back. 286 region crossings, two hours, one failure. Things are looking up.
What were you expecting to happen instead?
Normal region crossing.
Other information
Being able to make a trip that long is real progress. I'm also seeing success with two avatars on a vehicle, which used to fail after only a few region crossings. Christi Charon reports hosting a 30-person driving event and having no sim crossing failures. Thanks, dev team.
I'm speculating, but I think you may have fixed one type of region crossing failure, the kind where the avatar ends up way out of position and the situation is not recoverable at all. That's great.
This other type of failure seems to be relatively well defined. It's easy to detect from the vehicle script - if the vehicle has a sitter, use llGetObjectDetails on the sitter's key and ask for OBJECT_ROOT, which should be the vehicle. During a sim crossing, that request normally returns NULL_KEY briefly. If it continues to return NULL_KEY for a length of time (I use a 30 second timeout) my bikes log a region crossing failure and shut down. This is worth trapping sim side, if you're not doing that already.
My bikes stop in this situation. Many vehicles will run away, out of control. The helicopter eventually went off-world (from well inland on Heterocera) and a popup for that appeared.
Again, there seems to have been some real progress on this problem. Please keep at it until it Just Works and we don't have to worry about region crossings any more. Again, thanks.