
Key: VWR-3878
Type: Bug
Status: Reopened
Priority: Normal
Assignee: Unassigned
Reporter: Nicholaz Beresford
Votes: 29
Watchers: 18
1. Second Life Viewer - VWR

Purging cache textures causes viewer to pause for many seconds, with heavy disk activity

Created: 17/Dec/07 07:27 AM   Updated: 22/Oct/09 10:50 AM
Component/s: Performance
Affects Version/s: 1.20 Release Candidate, 1.18.4.3, 1.18.5.3, 1.21 Release Candidate
Fix Version/s: None

File Attachments: 1. Text File 0001_texture_cache_hiccups.v2.patch (4 kB)

Issue Links:
Duplicate
 
Relates

Last Triaged: 22/Oct/09 06:39 PM
Linden Lab Issue ID: DEV-8800
Patch attached: Patch attached


Description
From the (original Linden) source code (lltexturecache.cpp):

// NOTE: This may cause an occasional hiccup,
// but it really needs to be done on the control thread
// (i.e. here)
purgeTextures(false);
mDoPurge = FALSE;

These hiccups on large caches or fragmented hard drives can take anywhere from 10 seconds to as long as two minutes. Depending on how much the avatar travels, they can happen multiple times a day. They completely lock up the viewer, and when they take long enough they cause a viewer disconnect. They are one of the causes of VWR-2051 (a couple of users who reported the hangs there say that these no longer occur in my builds).

The patch, instead of immediately physically deleting the files, pushes the file names into a list of files to delete and then calls a function to process the list in a time sliced way.
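
The idea described above can be sketched roughly as follows. This is not the attached patch: the class name DeferredFileDeleter, its methods, and the per-call time budget are all hypothetical, chosen only to illustrate queuing file names and deleting them in small time slices.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical helper (not the code from the attached patch): file names are
// queued instead of being deleted immediately, and the queue is drained a
// little at a time within a per-call time budget.
class DeferredFileDeleter {
public:
    // Record a file for later deletion.
    void enqueue(const std::string& path) { mPending.push_back(path); }

    // Delete queued files until the budget is used up or the queue is empty.
    // At least one file is processed per call, so the queue always shrinks.
    // Returns the number of files processed in this slice.
    std::size_t processSlice(std::chrono::microseconds budget) {
        const auto deadline = std::chrono::steady_clock::now() + budget;
        std::size_t processed = 0;
        while (!mPending.empty()) {
            // Failures (for example, a file that is already gone) are ignored.
            std::remove(mPending.front().c_str());
            mPending.pop_front();
            ++processed;
            if (std::chrono::steady_clock::now() >= deadline) {
                break;
            }
        }
        return processed;
    }

    std::size_t pending() const { return mPending.size(); }

private:
    std::deque<std::string> mPending;
};
```

Calling processSlice once per frame with a budget of a millisecond or two would spread the disk work over many frames instead of doing it in one long stall; the right budget is an assumption that would need measuring.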



Comments
Cron Stardust added a comment - 23/Dec/07 10:33 PM
I haven't been able to compile the client on my machine, but what about taking the source that does the lockups and adding a log line just before, and just after, that code executes.

IE:

// NOTE: This may cause an occasional hiccup,
// but it really needs to be done on the control thread
// (i.e. here)
llinfos << "PURGING TEXTURES!!" << llendl;
purgeTextures(false);
mDoPurge = FALSE;
llinfos << "PURGING TEXTURES DONE." << llendl;

In this way we can verify whether it is this culling process that causes the issues. As soon as I can figure out how to compile successfully, I will try and give this problem a good whipping. (If it's not already resolved by then!)

Cron


Honey Fairweather added a comment - 10/Jan/08 07:54 PM
I'm voting for this because I have a very big and frequent problem with the viewer freezing, I see dramatic disk activity when it's happening, I see the purging entry in the log that Nicholaz mentions, and I hardly ever see such freezes with his client. Lindens, please jump on this fix ASAP.

Gigs Taggart added a comment - 06/Feb/08 10:09 PM
This is definitely the cause of recent viewer freezes I've been having.

Nicholaz Beresford added a comment - 15/Apr/08 03:40 AM

I'm closing my last issues with patches here. This one is still valid, but in light of the fact that it has been unassigned for four months, and in regard to what has been said about my patches on Pastrami's resident session, I guess it's a won't finish.

Sorry to pull the rug out from under 13 voters, but I don't want to leave orphans behind. If anyone feels inclined to carry this further, feel free to reopen (and the Lindens have it on the internal tracker anyway).


Coyote Pace added a comment - 15/Apr/08 05:03 AM
With all respect, Nicholaz, and well understanding your view: I am re-opening this issue nevertheless. This is still a terrible problem on my machine, and on others I have talked to. And it's NOT (usually) a 'crash' that fits into another jira issue; these occasional flurries of disk accesses just bring the SL session to a complete halt (mouse and all), but generally allow it to resume in 30 seconds or so, when the disk access finally abates. I should note that the disk access bursts tend to become more and more frequent once they begin, and seem closely associated with cam'ing around busy scenes.

The problem is still there, it still needs to be fixed, and even if the Lindens don't care to roll in your patch, despite it apparently working fine – well, they need to find some other solution. Bringing SL to a dead stop like that for internal housekeeping makes a bad joke of the software design, and I'm sure they can do better than the status quo.

For the record:

CPU: Intel Core 2 Series Processor (1994 MHz)
Memory: 2046 MB
OS Version: Microsoft Windows Vista (Build 6000) (note: this is Vista Home Premium)
Graphics Card Vendor: NVIDIA Corporation
Graphics Card: GeForce 8600M GS/PCI/SSE2
OpenGL Version: 2.1.1


Coyote Pace added a comment - 15/Apr/08 05:06 AM
While I'm at it, I'll update the "affects version" to the current release version, as I was suffering through this again just last evening with 1.19.1.4. Updated the title too – it's more than a 'hiccup'.

Nicholaz Beresford added a comment - 15/Apr/08 08:29 AM

Coyote ... you are of course welcome to reopen ... it's your thread now

Nick


Which Linden added a comment - 24/Apr/08 07:41 PM
One problem with the patch as-is is that it never calls purgeTextureFilesTimeSliced except in the destructor, which I am reasonably sure runs at application exit. The overall effect is that anyone who tested this patch had no cache purging whatsoever!

Obviously all you have to do is to call it at line 1452 after purgeTextures(), but, just FYI.


Nicholaz Beresford added a comment - 06/May/08 03:16 AM

Yup, oversight when making the patch.

However, JFI: it's intended to be called outside of if (mDoPurge) { }
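
To illustrate the call placement being discussed, here is a rough sketch, not the actual lltexturecache.cpp code. Only mDoPurge, purgeTextures and purgeTextureFilesTimeSliced are names taken from this thread; the class, the other members, and the count-based slice limit are assumptions.

```cpp
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical stand-in for the texture cache. Everything except mDoPurge,
// purgeTextures and purgeTextureFilesTimeSliced is assumed for illustration.
class TextureCacheSketch {
public:
    bool mDoPurge = false;

    // Assumed helper: mark a cache file as a candidate for eviction.
    void markForEviction(const std::string& path) { mCandidates.push_back(path); }

    // Called regularly from the control thread.
    void update() {
        if (mDoPurge) {
            purgeTextures();  // only queues files, does not delete them
            mDoPurge = false;
        }
        // Called on every update, outside the if, so the queue keeps
        // draining after mDoPurge has been cleared.
        purgeTextureFilesTimeSliced();
    }

    std::size_t filesPending() const { return mFilesToDelete.size(); }

private:
    // Stand-in for the real purge decision: move all candidates to the
    // list of files to delete.
    void purgeTextures() {
        while (!mCandidates.empty()) {
            mFilesToDelete.push_back(mCandidates.front());
            mCandidates.pop_front();
        }
    }

    // Count-based slice for simplicity: at most kMaxFilesPerUpdate
    // deletions per call (an assumed limit).
    void purgeTextureFilesTimeSliced() {
        for (std::size_t i = 0; i < kMaxFilesPerUpdate && !mFilesToDelete.empty(); ++i) {
            std::remove(mFilesToDelete.front().c_str());  // failures ignored
            mFilesToDelete.pop_front();
        }
    }

    static constexpr std::size_t kMaxFilesPerUpdate = 2;
    std::deque<std::string> mCandidates;
    std::deque<std::string> mFilesToDelete;
};
```

If the time-sliced call sat inside the if block instead, it would run only once per purge and most of the queue would be left untouched until the next purge.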


Which Linden added a comment - 07/May/08 10:48 AM
I shopped this around internally, and learned that there are upcoming plans to rearchitect/refactor the cache system, which would include timeslicing purging over multiple frames. In light of that, I think it's a bad idea to import this patch as-is because a) it will be duplicated work and b) having more code in the cache system would make it more difficult to complete said project. If the timeframe for that project falls by the wayside or it fails for some reason, I'll be sure to revisit applying the patch.

I'll keep this JIRA open because the description of the problem in it is good, so it'll be easy for us to find and resolve once we solve it one way or another.


Nicholaz Beresford added a comment - 07/May/08 01:00 PM

No problem for me (I do want to mention though, that I appreciate the elaborate feedback).

Toysoldier Thor added a comment - 07/May/08 01:02 PM
I am glad someone mentioned this issue existed, as I thought it was related to the major Nvidia problems they have been having with the RCs. Lately (I don't know exactly when this got significantly worse, maybe in the past week) I have noticed on both the 1.19.4 and the latest RC 1.20.5 viewers that my viewer hangs (it does not freeze completely; rather, I cannot move, and the avatars around me and my own continue being animated, i.e. dancing, but there are no more IMs, no more chatting and no moving). This lasts for about 1 to 2 minutes and then just releases. It seems to begin happening after a long session time (i.e. it does seem to appear within my first couple hours being online). I notice at the time that my disk drive is solidly active.

I have suspected that it seemed to be a disk clearing activity that suspends the viewer but then I was wrapped up in the NVidia driver crash issues so I have just been suffering with this REALLY ANNOYING RECENT PROBLEM.

I hate this long freezing as ppl around me think that I am not talking to them and I cannot send any indicator to them that i am just hung up.

Is the workaround for this problem that I should just re-login? Or is this a bug in how the viewers are handling the clearing of the cache in an improper manner? I can't understand why my version 1.19.4 has just recently begun to show this. Is it because of possible server upgrades?

Toysoldier


Nicholaz Beresford added a comment - 07/May/08 05:30 PM
@tt: You can try if my viewer fixed that. You'll find the download here: http://nicholaz-beresford.blogspot.com/

Hypatia Callisto added a comment - 07/May/08 07:22 PM
I came here from VWR-6343. I decided to put the gist of my comment on VWR-6343 over here as it seems these two topics are related.

I think this was happening to my previous NVIDIA card (a 9600GT), but followed by a driver crash. And may explain why it never happened with my other computer, which is running a 7600GS with only 256 MB over an AGP bus, a far slower card with less memory, as well as slower graphics bus... could it be that the freezing bug is possibly worse with faster graphics cards? as I am not having it anywhere near as bad with the 8600GTS as it was with the 9600GT, and it has a 128 bit bus rather than the 256 bit PCI-E 2.0 bus that the 8800 and higher cards have.

In fact I've really not noticed it on the new card, but I did reinstall Vista on my system and cleared out the disk quite a bit, might have had a huge impact now that I read this JIRA.

could it be the faster the card, the worse this bug is? dunno.


Lyn Mimistrobell added a comment - 30/Oct/08 05:07 AM
I've also had the same issue, and although I didn't check the logs, I found it happened more often when I increased my cache size to 1GB. I've since reduced the cache size to 500MB, but I might bring it down even more. I'm glad it's not just my system.

Soft Linden added a comment - 13/Nov/08 09:01 AM
This has sat without attention for quite some time. This would be a great candidate for discussion on sldev. Can anyone demonstrate that pausing is actually happening here? Some kind of a metric would be really useful for bumping the priority of this task. Anecdotal tests internally haven't shown that this makes a difference.

latransa pera added a comment - 13/Nov/08 01:00 PM
Soft, can you (or someone) suggest a way of instrumenting this behavior for demonstration purposes? I suppose I could borrow a video camera and shoot a movie :-P ... the machine itself is pretty much unresponsive during these interludes.

Soft Linden added a comment - 13/Nov/08 01:07 PM
@latransa - we know that pauses exist, but it would be helpful if you could show that they're attributable to this code. Setting up a timer that measures how much time is spent in this function in a frame, then perhaps displaying information in the viewer showing the worst 3 times in the last 100 frames would show that there was definitely a spike attributable to this function.
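
The kind of instrumentation suggested above could look roughly like this. It is only a sketch: the class name WorstFrameTracker and its interface are hypothetical, and the caller is assumed to measure the time spent in the purge function each frame (for example with std::chrono::steady_clock) and pass it in as seconds.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical tracker: keeps the time spent in one function for each of the
// last `window` frames and reports the largest values in that window.
class WorstFrameTracker {
public:
    explicit WorstFrameTracker(std::size_t window = 100) : mWindow(window) {}

    // Record the time (in seconds) spent in the function during one frame.
    void addFrame(double seconds) {
        mSamples.push_back(seconds);
        if (mSamples.size() > mWindow) {
            mSamples.pop_front();  // drop the oldest frame
        }
    }

    // Return up to n of the largest samples in the window, largest first.
    std::vector<double> worst(std::size_t n = 3) const {
        std::vector<double> out(std::min(n, mSamples.size()));
        std::partial_sort_copy(mSamples.begin(), mSamples.end(),
                               out.begin(), out.end(),
                               std::greater<double>());
        return out;
    }

private:
    std::size_t mWindow;
    std::deque<double> mSamples;
};
```

A window of 100 frames only covers a few seconds of viewer time, so for a pause that happens a few times a day it may be more practical to also log any single sample above some threshold.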

Nicholaz Beresford added a comment - 16/Nov/08 05:33 AM

Soft, did you actually read the description and comments here?

As the initial description says, it's a matter of "multiple times a day", not something within a few hundred frames. Also, logging as an indicator that this function is called, and of how much time is spent there, is mentioned in the first two comments.

Repro:
1) set SL tGB cache.
2) travel a lot until cache overflows (set breakpoint or add a beep to purgeTextures)
3) observe complete hang of viewer (some seconds to almost a minute)
4) look at log to see reference to purgeTextures and time taken for housekeeping.


Rob Linden added a comment - 19/Feb/09 03:41 PM
Soft is planning on adding a little more logging code to help isolate the contribution that this particular code might have.