Key: SVC-2005
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Critical
Assignee: kelly linden
Reporter: Sean Heying
Votes: 29
Watchers: 9
2. Second Life Service - SVC

Havok allocates too much memory and causes severe cyclical lag, especially for open space regions

Created: 03/Apr/08 02:20 AM   Updated: 27/Apr/08 08:14 PM
Component/s: Physics
Affects Version/s: 1.20.0 Server
Fix Version/s: None

File Attachments: 1. dilation.tiff (168 kB)

Image Attachments:
1. Beta Grid - Bismarck Sea Stress Test.jpg (228 kB)
2. dilation.jpg (69 kB)
3. lunata.jpg (162 kB)
Environment:
Second Life 1.19.1 (2) Mar 21 2008 17:20:25 (Second Life Release)

You are at 201195.7, 237031.8, 720.8 in (Redacted) located at sim5585.agni.lindenlab.com (8.2.34.141:13006)
Second Life Server 1.20.0.83892

CPU: Dual i386 (Unknown) (2160 MHz)
Memory: 2048 MB
OS Version: Darwin 9.2.0 Darwin Kernel Version 9.2.0: Tue Feb 5 16:13:22 PST 2008; root:xnu-1228.3.13~1/RELEASE_I386 i386
Graphics Card Vendor: ATI Technologies Inc.
Graphics Card: ATI Radeon X1600 OpenGL Engine
OpenGL Version: 2.0 ATI-1.5.24
LLMozLib Version: [LLMediaImplLLMozLib] - 2.01.14147 (Mozilla GRE version 1.8.1.11_0000000000)
Packets Lost: 323/46177 (0.7%)
Issue Links:
Duplicate
 
Relates

Linden Lab Internal Branch: havok4-5


Description
Since 1.20.0 server was rolled out my early adopter region has been experiencing a new lag pattern. If you stand in one place and just wait after about 5 minutes dilation will start to become unsteady, dropping down to 0.21 with physics fps dropping to 8.

After about 1 minute the region stabilises and returns to flatline 1.0 dilation and 44.9 fps physics.

This doesn't matter how many avatars are present in the region. There are also no rezzers at all in the region.

This lag pattern was not evident during early adopter tests and nothing has been changed in the region for many weeks.

This also occurs in three other regions I have visited on different servers, if you wait you see the same pattern.

Matthew Linden was called in last night and witnessed the lag, finding nothing wrong in the debugging he did and blaming it on regions starting up due to the deploy of Havok4 to the grid. It has not cleared up despite the grid returning to normal operation.



Sean Heying added a comment - 03/Apr/08 02:21 AM
Dilation.tiff in a lower resolution version showing the dilation pattern over a period of time.

Sean Heying made changes - 03/Apr/08 02:21 AM
Field Original Value New Value
Attachment dilation.jpg [ 15741 ]
sacha magne added a comment - 03/Apr/08 02:29 AM
Observation confirmed at Sean's sim, and at Erin and Vaea too. Both are low-prim sims.

Sean Heying made changes - 03/Apr/08 03:21 AM
Description Only the second paragraph changed. Original value: "After about 1 minute the region stabilises and returns to flatline 1.0 dilation and 22.9 fps physics." New value: "After about 1 minute the region stabilises and returns to flatline 1.0 dilation and 44.9 fps physics." The rest of the description is identical in both versions.

Dakotah Dallin added a comment - 03/Apr/08 03:59 AM
Same at region Kikai Enkai (Open Space), Islay (Open Space), Purple Coral (this is a full prim region with drops not that bad, but fps still going down to approx 20-25), see support ticket 4051-4659731.

Atashi Toshihiko added a comment - 03/Apr/08 04:19 AM
This sounds like the performance dropout 'spikes' that I was seeing during the RC tests. My Openspace sim Tangiwai would have moments of poor performance where Time Dilation dropped into the range of 0.1 to 0.5 and the physics FPS dropped to the 10-20 range. This would sometimes last only a few seconds, sometimes it was as long as 20-30 seconds, and once in a while it would take about a minute, before performance was restored.

I can confirm that since Havok4 went mainstream, these performance dropouts continue to occur, although I'm not sure if they are happening as frequently.


Bato Brendel made changes - 03/Apr/08 05:48 AM
Link This issue is duplicated by SVC-1992 [ SVC-1992 ]
Bato Brendel made changes - 03/Apr/08 05:55 AM
Link This issue is duplicated by SVC-1974 [ SVC-1974 ]
Dytska Vieria added a comment - 03/Apr/08 07:13 AM - edited
This has been happening on the Mainland sim of Lunata since the Havok4 upgrade: Time Dilation fluctuates between 0.45 and 1.0 every 10 SECONDS (NOT MINUTES) and Sim Time (Physics) jumps to over 40 ms. This is CYCLIC and constantly occurring; there is no waiting for it to happen in Lunata. The sim is unusable with this kind of performance; before the upgrade it had no problems.

Darien Caldwell made changes - 03/Apr/08 05:38 PM
Link This issue is duplicated by SVC-2034 [ SVC-2034 ]
Bato Brendel added a comment - 03/Apr/08 07:17 PM
Can everyone who has posted comment on whether they are talking about a full 15k-prim region (full sim) or an openspace/void sim? So far I've only been experiencing this in openspace/void sims (i.e. 1/4-prim, shared-server sims).

Sean Heying added a comment - 03/Apr/08 07:42 PM
I have seen this issue in Av Puli (Full Prim) and in Akeyo (Full Prim). The drop is not as noticeable there, only falling to 0.6 instead of 0.21, but the pattern is identical. 0.6 dilation can be walked through without really noticing it; 0.21 dilation is very noticeable when you try to walk.

My region, as reported, is a void/openspace too.

Lunata does seem to be stuck in this pattern, although it might be an unrelated problem that the Havok team might need to look at.

I have spoken to Sidewinder and they are now aware of this Jira, I am told they are looking into it.


Sean Heying added a comment - 03/Apr/08 07:47 PM
Attached: dilation and physics FPS in Lunata (Mainland). Not an identical pattern, but noticeably affected.

Sean Heying made changes - 03/Apr/08 07:47 PM
Attachment lunata.jpg [ 15788 ]
Daniel Regenbogen added a comment - 04/Apr/08 04:06 AM - edited
Two newly delivered Open Space sims (on the same server), HimAndIandUS and Innocent Boys, are affected as well. The regions are empty: no scripts, no objects.

kelly linden made changes - 06/Apr/08 06:46 PM
Assignee kelly linden [ kelly linden ]
kelly linden added a comment - 06/Apr/08 07:03 PM
Short version: We were preallocating too much memory for both regular and void regions. This has been addressed in the latest beta build. However, this may not address all issues - cyclic lag spikes can be a symptom of many potential problems and this may only be one.

Havok4 allows for pre-allocating memory specifically for the physics system. This doesn't actually limit the amount of memory physics may use (we do that separately); however, there is a slight performance hit for allocating more memory (and releasing it later). We were a little overgenerous in our initial memory allocation AND were allocating the same amount for Open Space regions as for normal regions. For the version currently on the beta grid we have dropped the initial allocation on regular regions from 128 MB to 64 MB, and on void regions down to 16 MB. It took me quite a bit of work (read: griefing my own sim) to bump past 64 MB, and even 16 MB is more than enough for quite a bit.

The problem with over-allocating is that it starts to bump the processes into swap (where some memory is stored on disk), which is a huge performance hit. This hit the void regions the most, as they are sharing with 4x as many other regions.
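
Editorial sketch of the arithmetic behind the comment above. The per-region figures (128 MB before, 64 MB and 16 MB after) come from the comment; the number of regions per host is NOT stated anywhere in this thread, so `REGULAR_REGIONS_PER_HOST = 4` is a purely hypothetical value chosen for illustration. Only the 4x ratio for void regions is taken from the source.

```python
# Hypothetical host layout: the thread only says void regions share a host
# with 4x as many regions as regular ones. The base count is an assumption.
REGULAR_REGIONS_PER_HOST = 4  # assumption, not from the thread
VOID_REGIONS_PER_HOST = 4 * REGULAR_REGIONS_PER_HOST

REGIONS_PER_HOST = {
    "regular": REGULAR_REGIONS_PER_HOST,
    "void": VOID_REGIONS_PER_HOST,
}

# Initial physics preallocation per region in MB, as stated in the comment.
PREALLOC_MB = {
    "1.20.0": {"regular": 128, "void": 128},
    "beta fix": {"regular": 64, "void": 16},
}


def host_prealloc_mb(build: str, kind: str) -> int:
    """Total physics preallocation on one host running only `kind` regions."""
    return PREALLOC_MB[build][kind] * REGIONS_PER_HOST[kind]


for build in PREALLOC_MB:
    for kind in ("regular", "void"):
        total = host_prealloc_mb(build, kind)
        print(f"{build:8s} {kind:7s} {total:5d} MB preallocated per host")
```

Under these assumed counts, a host full of void regions preallocates four times as much as a host of regular regions when both use the same per-region figure, which is consistent with the comment's point that void regions were pushed into swap the hardest.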


Bato Brendel added a comment - 07/Apr/08 07:29 AM - edited
Hi Kelly and Sidewinder. Hopefully this was the primary cause. If it is only that, I would like to report that on the beta grid it appears to have done the trick (from what I can see). I just filled a void sim (I'm hoping these are voids here). Locations: Bismarck Sea, Bougainville Strait, New Georgia Sound and the few other open-water sims you have for us out here. Basically, I filled Bismarck Sea to 150+% of capacity with physical round prims. Now that's a stress test. The only concern that showed up after I did this was that I was able to rez more prims than the server is 'reporting it can allow'.

Bato Brendel made changes - 07/Apr/08 07:29 AM
Dytska Vieria added a comment - 07/Apr/08 12:02 PM - edited
This problem was fixed on Lunata: 3 fish rezzers were identified that, once removed, returned Time Dilation to a nearly constant 1.00 and Sim Time (Physics) to about 0.2 ms. SVC-1992 has been closed.

Sean Heying made changes - 11/Apr/08 02:14 AM
Link This issue is duplicated by SVC-2028 [ SVC-2028 ]
ice stawberry made changes - 11/Apr/08 03:18 AM
Link This issue Relates to SVC-2093 [ SVC-2093 ]
ice stawberry added a comment - 11/Apr/08 03:27 AM
Linked Wien (class 5, full sim) as the problem sounds similar

Ralf Haifisch added a comment - 12/Apr/08 04:28 PM
The problem is hitting openspace/void sims very badly. Regular sims are affected as well, but not nearly as badly as openspace.

Lasagna Garfield added a comment - 12/Apr/08 04:33 PM
Changed to critical: openspaces are really only open spaces now; with any avatars on them they are unusable...

Lasagna Garfield made changes - 12/Apr/08 04:33 PM
Priority Major [ 3 ] Critical [ 2 ]
kelly linden added a comment - 12/Apr/08 04:49 PM
I forgot to mark this 'fix pending' when I last commented.

kelly linden made changes - 12/Apr/08 04:49 PM
Status Open [ 1 ] Fix Pending [ 10001 ]
Linden Lab Internal Branch havok4-5
Cliff Commons made changes - 14/Apr/08 05:12 PM
Link This issue is duplicated by SVC-2012 [ SVC-2012 ]
Sean Heying added a comment - 18/Apr/08 08:19 AM
Confirmed fixed in Second Life Server 1.20.1.85162

There are tiny 1ms lag drops every 2 to 3 minutes, but this is normal for openspaces in my experience.


Sean Heying made changes - 18/Apr/08 08:19 AM
Status Fix Pending [ 10001 ] Closed [ 6 ]
Resolution Fixed [ 1 ]
Cliff Commons added a comment - 18/Apr/08 08:20 AM
NOT FIXED!!!

You are at 201314.0, 287661.4, 23.0 in Achilles Island located at sim2167.agni.lindenlab.com (216.82.16.172:13017)
Second Life Server 1.20.1.85162

[8:07] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.26 (<0.40), Sim FPS = 11 (<20)
[8:10] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.21 (<0.40), Sim FPS = 9 (<20)
[8:11] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.33 (<0.40), Sim FPS = 17 (<20)
[8:12] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.19 (<0.40), Sim FPS = 9 (<20)
[8:13] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.21 (<0.40), Sim FPS = 9 (<20)
[8:14] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.27 (<0.40), Sim FPS = 11 (<20)

The stats window shows this also; it's just that MystiTool is easier to cut and paste. Note that this is not even set to go off until FPS drops under 20, and even then it only checks once in a while, so it's not catching all the time that the sim is out of whack.

Also, the main parcel is now showing over 485 selected / sat upon, while none actually are. I find this interesting, as it is more than were showing like that before the update and sim restart!!! To be honest, I was sort of expecting that update to clear the selected / sat upon and at least reset it back to 0, even if that one wasn't fixed yet.
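
Editorial sketch: the MystiTool warnings quoted in this comment can be summarised with a few lines of Python. The log lines are copied verbatim from the comment; the regular expression and the helper name `parse_warnings` are illustrative assumptions, not part of MystiTool or of any Second Life tooling.

```python
import re

# Warning lines copied verbatim from the comment.
LOG = """\
[8:07] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.26 (<0.40), Sim FPS = 11 (<20)
[8:10] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.21 (<0.40), Sim FPS = 9 (<20)
[8:11] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.33 (<0.40), Sim FPS = 17 (<20)
[8:12] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.19 (<0.40), Sim FPS = 9 (<20)
[8:13] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.21 (<0.40), Sim FPS = 9 (<20)
[8:14] MystiTool HUD: Achilles Island Stats Warning: Time Dilation = 0.27 (<0.40), Sim FPS = 11 (<20)
"""

# Hypothetical pattern written for the line format shown above.
PATTERN = re.compile(
    r"\[(?P<time>\d+:\d+)\].*?Time Dilation = (?P<dilation>[\d.]+) "
    r".*?Sim FPS = (?P<fps>\d+)"
)


def parse_warnings(text):
    """Return a (time, dilation, fps) tuple for each warning line found."""
    return [
        (m["time"], float(m["dilation"]), int(m["fps"]))
        for m in PATTERN.finditer(text)
    ]


samples = parse_warnings(LOG)
worst = min(samples, key=lambda s: s[1])  # lowest time dilation
print(f"{len(samples)} warnings; worst dilation {worst[1]} at {worst[0]} (Sim FPS {worst[2]})")
```

Because the HUD only fires below its thresholds and only polls occasionally, a summary like this gives a lower bound on how often the sim was degraded, not a full picture.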


Cliff Commons added a comment - 18/Apr/08 08:21 AM
See my comment above this. While I was typing it the issue closed.

Cliff Commons made changes - 18/Apr/08 08:21 AM
Resolution Fixed [ 1 ]
Status Closed [ 6 ] Reopened [ 4 ]
Cliff Commons added a comment - 18/Apr/08 08:29 AM - edited
Possible reason discovered. There are 4 openspace sims in a line where I am. I have always assumed that all four were the ones on the same processor. At least 1 of those is still not upgraded. I would guess that if any one has that kind of malfunction the other 3 will also be affected, so I will withhold judgement until the entire rollout has been completed.

Never mind. Just because they are in a line does not mean the 4 are on the same processor. All I needed to do was look at the server name to see they were different. Just clutching at straws, I guess, unable to believe that at least my sim's problem was not solved in the update. My first "NOT FIXED" comment is still valid.


kelly linden added a comment - 18/Apr/08 08:48 AM - edited
I am going to be brutal on keeping this issue closed.

There was a big bug in the memory allocation that more than doubled the memory usage of open space regions and caused almost all cases of this. I did hedge in my previous comments: because this jira item describes a vague symptom that could be the symptom of many, many issues, the fix may not resolve it for 100% of reporters. However! It fixed the majority of the issues, and the memory usage on void regions is back within the realm of normal.

I can not, and will not, maintain a single bug for every bug that has a symptom of cyclical lag. Sorry. This bug is now specifically for the memory issue and is fixed.

Please feel free to create a new bug if you have the updated version and are still experiencing cyclical lag issues, so that issue can be investigated separately. If you really want to, please feel free to link it or add it in a comment so others who come here can find the new issue.

I am going to change the title of this bug to "Havok allocates too much memory and causes severe cyclical lag, especially for open space regions"

Also, all regions on the same host will be upgraded at the same time, which means that the memory bloat won't affect updated regions. If you are still seeing severe cyclical lag, the problem is somewhere else.

There are some other performance-issue jiras, for example SVC-2172.

Thanks.


kelly linden made changes - 18/Apr/08 08:48 AM
Status Reopened [ 4 ] Resolved [ 5 ]
Resolution Fixed [ 1 ]
kelly linden made changes - 18/Apr/08 08:49 AM
Summary "Cyclical severe lag with dilation dropping to 0.21 and Physics FPS to 8" "Havok allocates too much memory and causes severe cyclical lag, especially for open space regions"
Cliff Commons added a comment - 18/Apr/08 09:21 AM
So blindingly upset I am unable to think straight!!

First you fold my original ticket SVC-2012 into this, then tell me that I am posting in the wrong place? You people are effing insane. I've been talking up with all my friends about what a great job you have been doing trying to roll out the fix for our problems. You may have fixed something..but not what I've reported, and you were the ones who decided it was the same issue, not me. So I'm the one that must be crazy, to continue to even deal with you blanking blanking blanks. Don't tell me everything is better, you've fixed the problem. It isn't, and you haven't, and your condescending blather to the contrary does not make it so!!!


kelly linden added a comment - 18/Apr/08 09:42 AM
I see that you commented it was a duplicate and changed its resolution accordingly. I could believe that was after talking with us. I would even agree with that action - at the time we knew we had a big problem that was causing that symptom and the chances were very high the issue was the same. We now know it is not, because you are still experiencing the issue on the updated version.

The original bug should probably be unlinked and re-opened in this case, with comments specifically to the effect that the issue still exists in the latest simulator version.

Perhaps this is a point of contention but jira is not a tool for user benefit. It is a tool for developer benefit - so we can classify, document and track bugs through to resolution. Having a single bug for a vague symptom is easy for users (hey, the bug you want is already here) but impossible for devs and QA who need a way to track specific issues and specific fixes.

I don't mean to be insulting or condescending. What I am trying to do is make sure that your issue does get addressed - which it will not if it stays connected to this one.

This bug is fixed, I am sorry that it isn't your bug.

Aside from all the reasoning above - everyone involved here is human and mistakes are possible, and judgment calls are always being made. As I said before it was a judgment call to link your issue to this one because the symptoms matched. Now that we know they aren't the same issue we can move forward.


kelly linden made changes - 18/Apr/08 10:21 AM
Link This issue is duplicated by SVC-2012 [ SVC-2012 ]
kelly linden made changes - 18/Apr/08 10:21 AM
Link This issue is related to by SVC-2012 [ SVC-2012 ]
Cliff Commons added a comment - 18/Apr/08 11:54 AM
From Kelly Linden:

"Perhaps this is a point of contention but jira is not a tool for user benefit."

Well, since the JIRA is not a tool for user benefit, might I suggest to all the users following this little match that they, like me, no longer bother to use it.

If they are going to admit that what we think is important is of no relevance to them I see no reason to help them improve their product, and they can sit on their hands watching people flow away as soon as a product that actually competes in their niche arrives in the marketplace.

Develop that.


kelly linden added a comment - 18/Apr/08 12:16 PM
Jira is a tool to help developers find and fix the bugs that affect residents; it is a way for residents to help developers. Fixing those bugs helps residents, but tracking them in jira really doesn't, except insofar as it helps developers fix them faster. I don't mean to imply at all that what residents think is important is of no relevance to us; quite the contrary.

Again, I am sorry that the fix we thought would fix your issue turns out to not fix your issue. I have re-opened your original issue, changed the link to 'relates to' and assigned it to myself.

As far as jira is concerned I don't know any other way to help you.


Sean Heying added a comment - 18/Apr/08 05:02 PM
Respectfully, Cliff, as the submitter I am the one who has researched and watched the bug I reported, and as such I am also in a position to declare victory for the bug I submitted. Not declaring victory would create a situation like SVC-85, where my alt's report was taken, expanded and became a catch-all for every similar bug.

That is not good for Jira, the developers or the other submitters.

It is a pity that your original bug got closed as duplicate, however, reopening that once it was clear they were two divergent issues would have been smarter. Certainly the tirade of abuse you launched was not warranted.


ZATZAi Asturias made changes - 27/Apr/08 08:09 PM
Link This issue is duplicated by SVC-2219 [ SVC-2219 ]
ZATZAi Asturias made changes - 27/Apr/08 08:14 PM
Link This issue is duplicated by SVC-2219 [ SVC-2219 ]
ZATZAi Asturias made changes - 27/Apr/08 08:14 PM
Link This issue is related to by SVC-2219 [ SVC-2219 ]
Sue Linden made changes - 13/Nov/08 12:10 PM
Workflow jira-2007-12-22a [ 54037 ] jira-2008-11-14 [ 82359 ]
Sue Linden made changes - 13/Nov/08 04:43 PM
Workflow jira-2008-11-14 [ 82359 ] jira-2008-11-14a [ 91085 ]