This repository has been archived by the owner on Mar 28, 2024. It is now read-only.

[BUG-226911] Region crossing failures after 2019-04-26 #5353

Open
sl-service-account opened this issue May 1, 2019 · 7 comments

sl-service-account commented May 1, 2019

What just happened?

Vehicle with one rider crossed a region boundary. Vehicle crossed, but avatar did not. Avatar was stuck, with controls still taken. Moving the vehicle with viewer Edit back across the region boundary allowed sitting the avatar on the vehicle and continuing.

Seen 3 times, twice with a motorcycle (my code) and once with a helicopter (coded by others.) These are recent events, after the fixes to sim code for teleport and region crossing problems. This problem is infrequent, perhaps one per 200 region crossings.

Failed trips, from bike logging:

2019-04-26 23:18:46 ending in region Teal.
2019-04-26 23:40:07 ending in region Jaffee.

Failed trip with helicopter:

2019-04-29 21:32 ending in region Haploa.

What were you doing when it happened?

Recently, region crossing failure rates seem to have improved, so I decided to try a long helicopter trip, from Bellisseria to northern Heterocera and back. 286 region crossings, two hours, one failure. Things are looking up.

What were you expecting to happen instead?

Normal region crossing.

Other information

Being able to make a trip that long is real progress. I'm also seeing success with two avatars on a vehicle, which used to fail after only a few region crossings. Christi Charon reports hosting a 30-person driving event and having no sim crossing failures. Thanks, dev team.

I'm speculating, but I think you may have fixed one type of region crossing failure, the kind where the avatar ends up way out of position and the situation is not recoverable at all. That's great.

This other type of failure seems to be relatively well defined. It's easy to detect from the vehicle script - if the vehicle has a sitter, use llGetObjectDetails on the sitter's key and ask for OBJECT_ROOT, which should be the vehicle. During a sim crossing, that request normally returns NULL_KEY briefly. If it continues to return NULL_KEY for a length of time (I use a 30 second timeout) my bikes log a region crossing failure and shut down. This is worth trapping sim side, if you're not doing that already.

My bikes stop in this situation. Many vehicles will run away, out of control. The helicopter eventually went off-world (from well inland on Heterocera) and a popup for that appeared.

Again, there seems to have been some real progress on this problem. Please keep at it until it Just Works and we don't have to worry about region crossings any more. Again, thanks.

Attachments

Original Jira Fields
Field Value
Issue BUG-226911
Summary Region crossing failures after 2019-04-26
Type Bug
Priority Unset
Status Accepted
Resolution Triaged
Reporter animats (animats)
Created at 2019-05-01T02:51:27Z
Updated at 2020-10-08T17:46:44Z
{
  'Build Id': 'unset',
  'Business Unit': ['Platform'],
  'Date of First Response': '2020-10-07T17:56:26.978-0500',
  'ReOpened Count': 0.0,
  'Severity': 'Unset',
  'System': 'SL Simulator',
  'Target Viewer Version': 'viewer-development',
}

animats commented at 2020-04-15T21:02:44Z

A year later, this bug is still present. There's a milder form of "vehicle crossed but avatar did not" which may be easier to study.

Entering a busy region for the first time in 10-20 minutes, the vehicle gets in, and the avatar entry is delayed. Some vehicles stop (they're not getting keyboard events) and others keep going with no user control until crashing into something.

My own bikes detect this situation by checking

llGetObjectDetails(avatar,[OBJECT_POS, OBJECT_ROOT]);

 

for each sitter. If the OBJECT_ROOT of the avatar isn't the vehicle, we're in the "half unsit" situation. This happens briefly at almost every region crossing, but if it persists, something will break. My bikes display the CROSSSLOW message shown above when this happens. If it lasts longer than 30 seconds, the bike shuts off and an error is reported.
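The check described above might be sketched roughly as follows. This is a minimal illustration, not the actual bike code: it assumes the script sits in the vehicle's root prim, that a sit target was set elsewhere, and that there is a single rider. The names POLL_SECONDS, FAIL_SECONDS, gDetachedSince and riderAttached are hypothetical, and the 3-second polling interval is an assumption.

```lsl
// Sketch only: assumes this script is in the vehicle's root prim and that a
// sit target has been set elsewhere, so llAvatarOnSitTarget() reports the rider.
float   POLL_SECONDS = 3.0;   // assumed polling interval; one CROSSSLOW report per poll
integer FAIL_SECONDS = 30;    // give up after this long, as described above

integer gDetachedSince = 0;   // Unix time the rider was first seen detached; 0 if attached

// TRUE when the avatar's OBJECT_ROOT is this vehicle's root prim.
integer riderAttached(key avatar)
{
    list details = llGetObjectDetails(avatar, [OBJECT_ROOT]);
    if (llGetListLength(details) == 0) return FALSE;   // avatar not known to this region yet
    return llList2Key(details, 0) == llGetKey();
}

default
{
    state_entry()
    {
        llSetTimerEvent(POLL_SECONDS);
    }

    timer()
    {
        key rider = llAvatarOnSitTarget();
        if (rider == NULL_KEY || riderAttached(rider))
        {
            gDetachedSince = 0;   // no rider, or the rider has caught up
            return;
        }

        integer now = llGetUnixTime();
        if (gDetachedSince == 0)
        {
            gDetachedSince = now; // a brief detach is normal at a crossing
            return;
        }

        if (now - gDetachedSince >= FAIL_SECONDS)
        {
            llOwnerSay("CROSSFAIL in " + llGetRegionName());
            llSetTimerEvent(0.0); // stop polling; a real vehicle would also shut down here
        }
        else
        {
            llOwnerSay("CROSSSLOW in " + llGetRegionName());
        }
    }
}
```

Wall-clock time from llGetUnixTime is used here so that the timeout does not depend on how the script was scheduled during the crossing.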

This situation comes up regularly with certain overloaded regions. If it resolves quickly, the user can go on driving. If it takes too long, the region crossing will probably fail with the usual "half-unsit" bug.

Interestingly, editing the vehicle back across the region boundary to the stuck avatar's region will often recover the situation. The vehicle and avatar then reconnect. So a possible response to such situations is to move the vehicle back to before the region crossing.

The SL system is trying to do an atomic transaction in a distributed system, getting the vehicle and all avatars from one sim to another. Sometimes it doesn't work. The traditional solution to such problems is to back out the transaction and try again. Open Simulator has code to do that, although I haven't tested it. It's an approach worth considering.

 


animats commented at 2020-06-16T20:44:06Z

Attn: Mazidox Linden.

Per discussion at Server User Group today, attached is a list of the logged region crossing delays for a test vehicle I use. These show the slow and failed region crossings of the last 60 days. Here's an excerpt; the full data is in an attachment.

A CROSSSLOW event means there was a delay of at least 3 seconds at a region crossing. Multiple CROSSSLOW events indicate a longer delay. This excerpt shows a 21 second delay which was recovered. A CROSSFAIL event indicates that 30 seconds went by without the avatar catching up to the vehicle. More than 30 seconds of delay rarely ends with a successful recovery.

Delay is measured from when the vehicle arrived to when the avatar arrived, as seen by the avatar showing the vehicle as its parent. (During region crossings, the avatar is a child prim of the vehicle, but the vehicle is not the parent of the avatar, and scripts can check this.) My own avatar for this is a classic avatar with some mesh clothing and few scripts.

These are interesting because they're a well defined situation that is usually recovered. This looks like some retry mechanism in action. Packet retransmission?

While most of these were recoverable, 10 full region crossing failures were logged in this data.

 

+---------------------+------------------+------------------------------------+-----------+---------+---------+---------+
| event_time          | owner_name       | object_name                        | eventtype | Region  | X       | Y       |
+---------------------+------------------+------------------------------------+-----------+---------+---------+---------+
| 2020-04-15 23:04:06 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
| 2020-04-15 23:04:09 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
| 2020-04-15 23:04:12 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
| 2020-04-15 23:04:15 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
| 2020-04-15 23:04:18 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
| 2020-04-15 23:04:21 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
| 2020-04-15 23:04:24 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
| 2020-04-15 23:04:27 | animats Resident | Smooth Scrambler - V2.7.1 (REVIEW) | CROSSSLOW | Horisme | 3.50849 | 176.052 |
+---------------------+------------------+------------------------------------+-----------+---------+---------+---------+


Chorazin Allen commented at 2020-10-07T22:56:27Z

I've been looking into basically the same thing with a yacht motor script. Another thing that probably shouldn't be happening is that a CHANGED_LINK event may get sent around right after a region crossing.

The cause seems to be the avatar detaching from the linkset temporarily whilst vehicle and avatar enter the new region. Sometimes the script won't find any delta in the expected linkset (the avatar has arrived before the script gets to react to the event and check the linkset) or will find that the avatar is no longer reported as being on the linkset.

Could CHANGED_LINK be suppressed when an object crosses a region boundary, until all its avatars have also arrived (or definitely failed to)?


animats commented at 2020-10-08T01:03:06Z, updated at 2020-10-08T01:03:55Z

You're right. There are bogus CHANGED_LINK events at troubled region crossings. I just checked my log files. In the logs (see bikelog01.txt), the RIDERCOUNT entry shows when there's a CHANGED_LINK event. Usually that happens when an avatar gets on or off, and is followed by a SITTER log entry showing the avatar involved. But sometimes, and usually when a region crossing is slow, there's an unexpected CHANGED_LINK event. The code for the vehicle then checks the prims for sitters and lists the number of sitting avatars. The sitter info and count, based on llAvatarOnLinkSitTarget, does not change.

This is mostly harmless.

There's a separate situation where a region crossing fails, the avatar is logged out, and, upon relogging, tries to reuse the bike still sitting there. Sometimes the avatar is still listed as a "sitter", because a proper unsit never took place. It's possible for a vehicle to become un-sittable if all seats are full of long-gone sitters. A region restart seems to clear that. This is only a problem with the aftermath of a failed region crossing.


Chorazin Allen commented at 2020-10-08T09:08:42Z

I've got similar code that keeps track of the linkset's prim-only count and the overall prim+sitter count, and compares them on each CHANGED_LINK. On slightly delayed crossings I get CHANGED_LINK, and nothing is actually different by the time I compare the prim-only and prim+sitter totals to what they were before. On more delayed crossings the prim+sitter count has dropped and now equals the prim-only count (assuming there's just the one avatar on the vehicle).

In these instances, it usually seems to be the case that the vehicle has arrived before the rider (llGetAgentSize() reports ZERO_VECTOR for the agent).
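A rough sketch of that comparison, for illustration only and not the yacht script itself. It assumes the script is in the root prim, a sit target was set elsewhere, and only one rider is tracked. The names gPrimsOnly, gPrimsAndSitters and snapshot are hypothetical.

```lsl
// Sketch only: assumes the script is in the root prim and one tracked rider.
integer gPrimsOnly;        // prims in the linkset, excluding seated avatars
integer gPrimsAndSitters;  // prims plus seated avatars

// Record the current counts for comparison at the next CHANGED_LINK.
snapshot()
{
    gPrimsOnly = llGetObjectPrimCount(llGetKey());
    gPrimsAndSitters = llGetNumberOfPrims();
}

default
{
    state_entry()
    {
        snapshot();
    }

    changed(integer change)
    {
        if (!(change & CHANGED_LINK)) return;

        integer oldPrims = gPrimsOnly;
        integer oldTotal = gPrimsAndSitters;
        snapshot();

        key rider = llAvatarOnSitTarget();
        if (gPrimsOnly == oldPrims && gPrimsAndSitters == oldTotal)
        {
            // The avatar rejoined before the script got to look.
            llOwnerSay("CHANGED_LINK with no visible change (likely a slow crossing)");
        }
        else if (rider != NULL_KEY && llGetAgentSize(rider) == ZERO_VECTOR)
        {
            // Still listed on the sit target, but the agent is not in this region yet.
            llOwnerSay("Rider in transit; not treating this as a stand");
        }
        else
        {
            llOwnerSay("Sit, unsit, or relink");
        }
    }
}
```

Checking llGetAgentSize before acting on a reduced count is one way to avoid treating a late-arriving rider as someone who stood up and sat down again.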

On some occasions, the effect is pronounced enough that AVSitter (which manages the sitting+poses; my scripts cover the engine and everything else so I need to keep track of where/if the driver is sitting) thinks the same avatar has sat down again and gives them a new pose menu right after a region crossing.

So, I would rate it more significant than mostly harmless - there's a window that varies with the delay of the avatar arriving in the region after their vehicle around which the avatar is reported to have detached from the linkset and then rejoined it. 


Chorazin Allen commented at 2020-10-08T09:12:16Z

Quick note on my testing environment: entering regions Sondraman or Osbourne Beach. The first is usually the most heavily loaded. I'm also using a script-heavy avatar which will be adding to the time it takes the avatar to be transferred.


animats commented at 2020-10-08T17:46:45Z

Right. It's not that the whole thing is harmless, it's that the extra CHANGED_LINK message may be.

"There's a window that varies with the delay of the avatar arriving in the region after their vehicle around which the avatar is reported to have detached from the linkset and then rejoined it."

Exactly. That's what's happening, and we see it in scripts. Region crossings take time. During that time, the SL world keeps running. This creates problems. Out of control vehicles are the biggest one. The user can't control the vehicle while the avatar is temporarily detached. So the script has to stop it to prevent a runaway, during which the vehicle can hit something or cross a second region boundary. If a second region crossing starts while the first one has not yet completed, the system can't handle that and the user usually has to log out.
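One way a script might hold the vehicle while the rider catches up is sketched below. This is an assumption-laden illustration, not code from any vehicle in this thread: it assumes a physical vehicle already configured with llSetVehicleType, a script in the root prim, a sit target set elsewhere, and a single rider. The names gHolding and holdVehicle are hypothetical, and the 0.5-second poll is arbitrary.

```lsl
// Sketch only: assumes a physical vehicle already set up with llSetVehicleType,
// this script in the root prim, and a sit target set elsewhere.
integer gHolding = FALSE;   // TRUE while the vehicle is held waiting for the rider

holdVehicle()
{
    // Zero both motors and any residual velocity so the vehicle cannot run away.
    llSetVehicleVectorParam(VEHICLE_LINEAR_MOTOR_DIRECTION, ZERO_VECTOR);
    llSetVehicleVectorParam(VEHICLE_ANGULAR_MOTOR_DIRECTION, ZERO_VECTOR);
    llSetVelocity(ZERO_VECTOR, FALSE);
    gHolding = TRUE;
}

default
{
    state_entry()
    {
        llSetTimerEvent(0.5);   // assumed poll rate
    }

    changed(integer change)
    {
        if (change & CHANGED_REGION) holdVehicle();   // hold as soon as the vehicle arrives
    }

    timer()
    {
        if (!gHolding) return;

        key rider = llAvatarOnSitTarget();
        if (rider == NULL_KEY) return;   // nobody to hand control back to; keep holding

        list details = llGetObjectDetails(rider, [OBJECT_ROOT]);
        if (llGetListLength(details) > 0 && llList2Key(details, 0) == llGetKey())
        {
            gHolding = FALSE;   // rider has reattached; normal control handling may resume
        }
        else
        {
            holdVehicle();      // still detached; keep the vehicle stopped
        }
    }
}
```

Holding on every crossing would add a visible hitch even to fast crossings, so a real script would probably allow a short grace period first. An aircraft would also need something other than a dead stop, since gravity still applies.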

Unless region crossings can be made fast enough and reliable enough that scripts can ignore region crossing delays, scripts have to deal with what happens during region crossings while the avatars catch up.
