Key: SVC-3895
Type: Bug
Status: Open
Priority: Showstopper
Assignee: Babbage Linden
Reporter: Eata Kitty
Votes: 709
Watchers: 158
2. Second Life Service - SVC

Rezzing Mono scripted object cripples sim FPS

Created: 24/Feb/09 10:49 AM   Updated: 05/Nov/09 04:10 AM
Component/s: Scripts
Affects Version/s: 1.25 Server, 1.26 Server, 1.27 Server, 1.30 Server
Fix Version/s: None

File Attachments: None
Image Attachments:

1. Snapshot_046.png
(2.36 MB)

2. Snapshot_047.png
(2.34 MB)

3. Snapshot_049.png
(2.32 MB)

Last Triaged: 24/Feb/09 12:59 PM
Linden Lab Issue ID: DEV-27936


Description
Tested on Second Life Server 1.25.5.109327

----------------

Rezzing Mono scripted objects results in a heavy performance penalty. Identical objects compiled as LSL do not cause a similar performance penalty.

This also leads me to believe that attaching objects with many Mono scripts costs the simulator more performance than attaching the same objects with LSL scripts.

The following example can bring simulators to their knees.

----------------

Reproduction:

Rezzer

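integer switch; // TRUE while the rezzer is running
integer count;  // objects rezzed since the rezzer was switched on
default
{
    touch_start(integer total_number)
    {
        // Assignment, not comparison: toggles switch and tests the new value.
        if(switch=!switch)
        {
            llSetTimerEvent(0.1); // attempt a rez every 0.1 seconds
        }
        else
        {
            llSetTimerEvent(0); // stop rezzing
            llSetText("",<1,1,1>,1);
            count=0;
        }
    }
    timer()
    {
        // Rez one copy of "Object" 1m above the rezzer, start parameter 1.
        llRezObject("Object",llGetPos() + <0,0,1>,ZERO_VECTOR,ZERO_ROTATION,1);
        llSetText("Rezzed Objects : "+(string)(++count),<1,1,1>,1);
    }
}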

Child Object

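default
{
    on_rez(integer foo)
    {
        // foo is the start parameter: 1 when rezzed by the rezzer above,
        // 0 when rezzed by hand from inventory (which then does not self-delete).
        if(foo)
            llSetTimerEvent(0.5);
    }

    timer()
    {
        llDie(); // delete this object 0.5 seconds after it was rezzed
    }
}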

Recompile the script in the rezzed object as Mono or LSL to compare the effect.

Monitor the results using the sim FPS shown in the statistics bar (Ctrl+Shift+1). Results will vary depending on the simulator; lightly loaded sims will show less difference.
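
For anyone who prefers numbers over watching the statistics bar, a minimal monitor sketch follows. It is not part of the original report: the sampling interval and the function name reset_stats are my own assumptions, and it only uses llGetRegionTimeDilation and llGetRegionFPS. Rez it before starting the rezzer, and touch it to print the averages and reset them.

```lsl
// Hypothetical monitor script (not from the original report).
// Samples region time dilation and FPS on a timer; touch to report and reset.
float   SAMPLE_INTERVAL = 0.5; // seconds between samples (assumed value)
integer samples;               // number of samples taken so far
float   td_sum;                // running sum of time dilation samples
float   fps_sum;               // running sum of region FPS samples
float   td_min;                // lowest time dilation seen

reset_stats()
{
    samples = 0;
    td_sum  = 0.0;
    fps_sum = 0.0;
    td_min  = 1.0;
}

default
{
    state_entry()
    {
        reset_stats();
        llSetTimerEvent(SAMPLE_INTERVAL);
    }

    timer()
    {
        float td = llGetRegionTimeDilation();
        td_sum  += td;
        fps_sum += llGetRegionFPS();
        if (td < td_min) td_min = td;
        ++samples;
    }

    touch_start(integer total_number)
    {
        if (samples > 0)
        {
            llOwnerSay("Samples: " + (string)samples
                + ", average TD: " + (string)(td_sum / samples)
                + ", lowest TD: " + (string)td_min
                + ", average FPS: " + (string)(fps_sum / samples));
        }
        reset_stats();
    }
}
```

Timer events themselves slow down when the region is struggling, so the averages probably understate the worst moments; the lowest TD value is likely the more telling number.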



Comments
Chalice Yao added a comment - 24/Feb/09 12:15 PM
The script above was tested in the Duvillier region, and easily cut time dilation (TD) from 0.99 down to 0.88, 0.8 or lower.

The very same script in LSL doesn't change TD at all. This is probably the main reason why so many sims have had such bad performance lately. Temp-on-rez vendors, temp physicals, bullets, anything scripted in Mono that gets rezzed takes a hit on time dilation now.

The same goes for avatars entering the region. The hit on the sim's time dilation can now go as low as 0.2 TD on login or TP into a region.


Knife Carver added a comment - 24/Feb/09 12:27 PM - edited
Changing this to Critical since it's only lag.

Also, this explains a lot.


Ariu Arai added a comment - 24/Feb/09 12:28 PM
Voted.

I've just recently started having more issues with this working on my new sensor probes. Previously, they just rezzed gradually (taking about 3 seconds to finish). So I decided to make them all rez at once, which ironically does the same, only the sim comes to a halt for 3 seconds. (It actually varies, depending on what sim you're in. I found that Duvillier usually halts for about 3-5 seconds, my sim and WT usually halt for about 1-2 seconds).


Chalice Yao added a comment - 24/Feb/09 12:30 PM
The problem is increasing with every passing day, as more and more scripts that were previously written in LSL get recompiled to Mono. This goes for HUD attachments, probes, bullets, the works.

This is also probably the reason why this has been only recently surfacing in the last two months, and not since back at the very introduction of Mono in SL. More and more things are converted, and the issue increases in effect.


jessica lyon added a comment - 24/Feb/09 12:31 PM
This is HUGE... and needs to be showstopper.. this has to be set as high a priority as possible.

Chalice Yao added a comment - 24/Feb/09 12:35 PM - edited
I'm setting this back to showstopper.

The drop in time dilation above was caused by one object and one script. One.
The contents of the child script don't even matter at all.

I think that's all the explanation this max priority needs. Sim performance on scripted rezzes, TPs and logins is going to get worse all over the grid with every passing day as people convert their products from LSL to Mono.


Gumby Roffo added a comment - 24/Feb/09 12:36 PM
I have also noticed the same effects. The pets have had a very bad experience in Mono. That is, the cats, as they were only just scripted and first compiled in Mono, either failed to function or bogged the sim down. They have since been converted back to LSL and I can now rez 30 of them at the same time. The dogs were scripted last July and work OK in LSL and still do.

imnotgoing sideways added a comment - 24/Feb/09 12:37 PM
Me: Someone just TPed in. (_)
Other: How you know?
Me: Did you just find yourself walking through the wall for about 5 seconds and rubberbanding back? (_)
Other: Yes.
Me: That's how. =-=

Cristalle Karami added a comment - 24/Feb/09 12:48 PM
This is a showstopper. Literally! I've been wondering why the Mystitool simulator warning system keeps going haywire every few seconds in just about every sim I've gone to. That might explain why.

wildefire walcott added a comment - 24/Feb/09 01:12 PM
This does explain a lot of recent behavior I've seen. Avatars TPing into a sim have always had some impact, but their impact in the past 6 weeks or so does seem worse than it was before.

Major grief potential...


Jor3l Boa added a comment - 24/Feb/09 01:45 PM - edited
Hope Andrew will fix this bug ... this breaks my probes and other items that rez stuff.

NOTE> 500++ objects rezzed in 2 mins, what about rez Spam bug.?


Siobhan McCallen added a comment - 24/Feb/09 02:09 PM
We probably would have caught this sooner, if MONO TIME actually worked.

Grey Blankes added a comment - 24/Feb/09 03:12 PM
I reproduced the bug as well as additional issues to include teleporting and attaching with mono compiled scripted objects.

My findings are that indeed mono does impede sim performance, however it is directly related to the efficiency of the script, LSL or otherwise. This is something that needs addressing but it is NOT a showstopper. I'm not going to change the status of this issue, but rezzing even hundreds of mono compiled scripted objects at the same time did not impede performance so much as the timer event and script structure as shown above.

Tests I tried included teleporting while wearing scripted objects compiled in Mono, measuring the sim performance with an alternate av while doing so, repeating with the same scripts in LSL, repeating without scripts, and again all of the above for attach and detach.

It seems Mono does indeed have a "start up" time that ranges from 0.5 to 3 seconds for most scripts, even incredibly complicated ones. It affects rezzers more, and it especially has an effect on timer events. Other than that, there are ways around all this that will not reproduce this bug. It should also be noted that reducing the quantity of in-flight (rezzed) bullets is going to greatly aid performance of a given sim: reducing objects rezzed per second from the 14 per second I noted with most of this particular creator's affected objects down to 4-6 objects per second would improve performance more than 100 percent by itself. Food for thought.

Mono is not as forgiving as LSL, and that is something to consider all on its own. I'll monitor this JIRA, as even though it doesn't affect some stuff, it does affect everyone, because we all have to deal with the people whose creations are affected by this bug. "Please fix this, or at least someone come up with a workaround in the meantime."


WarKirby Magojiro added a comment - 24/Feb/09 04:18 PM
I've been using mono scripts extensively since launch, and I have just finished a product that rezzes objects very rapidly for a short period (60 objects, at a rate of 6 per second)

It does seem to cause some sim performance problems.


Rika Watanabe added a comment - 24/Feb/09 04:37 PM
An interesting observation that may help tracking down the cause:

Loading a Mono script into an object inworld with llRemoteLoadScriptPin, as opposed to rezzing an object with a running script in it, does not seem to cause any visible sim FPS drop.
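
A rough sketch of that comparison setup, for anyone who wants to repeat it. This is an illustration under assumptions, not the script actually used: the PIN, the chat channel and the script name "Child Script" are hypothetical. The target object has to be rezzed in the same region, and the loader's owner needs modify rights on it. First, a script in the already-rezzed target object opens it up for remote loading:

```lsl
// Target object: allow remote script loading and announce our key.
integer PIN = 4711; // hypothetical PIN; must match the loader

default
{
    state_entry()
    {
        llSetRemoteScriptAccessPin(PIN);
        llOwnerSay("Target key: " + (string)llGetKey());
    }
}
```

Then a loader object, which holds the Mono-compiled "Child Script" in its inventory, pushes a running copy into the target when the owner says the target's key on the chosen channel:

```lsl
// Loader object: push a running script into the target by key.
integer PIN         = 4711;           // hypothetical PIN; must match the target
integer CHANNEL     = 7;              // hypothetical chat channel
string  SCRIPT_NAME = "Child Script"; // hypothetical script in this object's inventory

default
{
    state_entry()
    {
        // Only listen to the owner.
        llListen(CHANNEL, "", llGetOwner(), "");
    }

    listen(integer channel, string name, key id, string message)
    {
        key target = (key)message;
        if (target) // TRUE only for a valid, non-null key
        {
            // running = TRUE, start parameter = 1
            llRemoteLoadScriptPin(target, SCRIPT_NAME, PIN, TRUE, 1);
        }
    }
}
```

llRemoteLoadScriptPin forces a delay of a few seconds on the calling script, so this cannot match the rez rate of the timer repro; it only helps separate "script starts running" from "object with a script gets rezzed".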


Chalice Yao added a comment - 24/Feb/09 05:05 PM
As stated in the description of the issue, the effect of this seems to vary from sim to sim, without any real explanation as to why, aside from perhaps light background load.

In some sims it runs without too much of an issue, in other sims it slams time dilation quickly from 1 or 0.99 down to 0.8 or lower while it runs, which means those sims have a very hard time rezzing/loading any kind of mono script. The same code, when run with LSL, didn't nudge time dilation in the problem sims.


Jahar Aabye added a comment - 24/Feb/09 07:37 PM
There are likely two things going on here:

1. There's a brief blip when an item is first attached while the Mono VM loads the code. This is normal.

2. There appears to be a more serious problem that occurs when you call llRezObject() repeatedly and rapidly in a timer() event, in a Mono script, that does not appear to occur when you call llRezObject() from other events.

It is important to recognize these as different problems. Otherwise you run the risk of trying to fix the wrong thing, or chasing down bugs in the wrong places. There is likely some sort of bug lurking in how timer() events are handled in the Mono VM. It is possible that the automatic 0.1 second sleep that llRezObject() causes is somehow screwing with the timer() event in some unknown way. It is also possible that it has nothing to do with the automatic sleep, but rather is specific to this specific function. Further testing is clearly needed, but it is important to try to narrow this down further, because it appears to be specific to this one event, which implies a very specific, and hopefully easily fixable bug.


Ariu Arai added a comment - 24/Feb/09 07:50 PM
@ Jahar

It's not the timer event, it's the Mono scripts themselves being rezzed.

My Sim Scanner probes do not rez from a timer or loop event, and they cause a spike in sim FPS and TD. People teleporting in cause lag when they have mono compiled scripted attachments. Rezzing mono compiled scripted objects from your inventory causes lag (Depending on how many scripts there are).

There is definitely an issue in how the scripts are received by the sim, and/or how they're booted up. I had an idea that it might be with the way the Mono Script starts up, taking a portion of sim memory, etc.


Jahar Aabye added a comment - 24/Feb/09 07:57 PM
Yes Ariu, that's why I said that there are two separate issues involved here. Experimenting seems to show a definite blip when you rez a Mono-scripted object, as the Mono VM loads the script. That would be issue number 1. Not much can be done, but once a script is loaded once, it should be fine when rezzing subsequent copies.

However, we are finding differences in rezzing objects from a timer() event as in the repro included here, compared to rezzing them in different events. So it appears that there may be an issue with timers here, or else there is some other explanation as to why we're not seeing any problems when rezzing objects fairly quickly from other events.

There may also be some issues with how the Mono VM handles memory, and we'll see what the Mono guys have to say about that. More testing is definitely needed, but it's important to get to the bottom of what specifically seems to be causing these issues, and again, let me reiterate that there does seem to be something involved in the timer() event.


Eata Kitty added a comment - 25/Feb/09 04:24 AM - edited
I put twenty sequential rez events in state_entry, same performance drop as a timer. No movement in LSL.
default
{
    state_entry()
    {
        llRezObject("Object",llGetPos() + <0,0,1>,ZERO_VECTOR,ZERO_ROTATION,1);
        ...
        llRezObject("Object",llGetPos() + <0,0,1>,ZERO_VECTOR,ZERO_ROTATION,1);
    }
}

Grey Blankes added a comment - 25/Feb/09 07:00 AM
I see that... be damned if the bug specifics can't be nailed down. Absolutely incredible this has slipped through. I'm gonna make it a point to bug Soft Linden on the issue; for anyone who wants to do the same, today is "bug triage day". This really does affect absolutely everyone: we've seen varying effects from nearly every rez or timer function. What makes this so bothersome is that we can test in one sim and be fine, test in another and it cripples it, and both are supposedly the same class and same server version, so what's going on? Someone had a bit too much coffee the morning they were cranking away at the Mono VM. Let's beat the doors down to get this fixed YESTERDAY.

Zwagoth Klaar added a comment - 25/Feb/09 07:37 AM
I'm willing to bet this is from the JIT compilation of scripts upon being rezzed. I'm not sure if JIT is required for duplicate scripts, but I bet it happens anyway. JIT is supposed to be transparent, but in a tight-looped simulation environment, it's going to be noticed regardless of how small you make it. If you background the task of JIT, you have rez delay. If you make it a priority and have it up front, it's a sim hiccup. I'm thinking this is more a problem with the nature of Mono/.NET than with the implementation used.

While mono does bytecode sharing, I'm not sure at what point it goes ahead and figures out if the module can be shared as an instance. It is also somewhat unrealistic to expect perfect 0ms VM setup with no delays, regardless of what VM engine is used. You can get away with allocating 16/64KB of ram quite easily, and because the VM for LSL2 was just another C++ object, its almost free to set one up. In mono, the VM is outsourced, has its own memory allocation engine, and garbage collection(I bet you mono does GC when you try to load a new assembly.) Managed code in general has a high overhead cost associated with it, and the whole framework is not really set up for fast no delay instancing of hundreds and thousands of assemblies.

I'm still one to agree that large sim hiccups are a bad sign. This should be investigated. On another note, this is NOT a showstopper, as it does not prevent people from accessing the grid/region or directly make using the platform almost unbearable.


Fluf Fredriksson added a comment - 25/Feb/09 08:02 AM
Well, it is borderline making SL unusable.
I've pretty much given up exploring SL in the last few weeks simply because of the degree of lag.
If this is a significant factor in the recent increase in lag across SL, then it's a major problem. I suspect that the use of sim-scanning radar HUDs that rez Mono probes is going to increase and make things even worse, not better. Not to mention the updated standard HUDs and scripted gadgets now being released compiled in Mono.

Basically Mono isn't the holy grail it was hailed to be and needs to be restricted to objects that don't travel with avatars or rez other items containing mono scripts.
It would help if the lag causing conditions could be nailed down for script writers.

Ah well.

Time to go back and recompile things back to LSL then. "Now recompiled in lag free LSL!"...


Ariu Arai added a comment - 25/Feb/09 08:10 AM - edited
It may seem like it'd be a show stopper, since it has the potential to cause grief and crash sims, though according to the priority key this issue fits right in with 'Critical'.

"Critical
Generally, most crashes (particularly if they're easy to reproduce and affect many), content loss, significant memory leaks, greatly reduced performance, etc."

Like Zwagoth said, it could be the compile on script rez. It makes sense; I don't remember sims ever lagging as much as they do when you compile a script. This was especially annoying when Mono was first released: you'd go to a sandbox sim like Duvillier, and it was constant lag spikes all day from people compiling their scripts to Mono.

The bytecode sharing probably isn't activated at rez, but instead later on, after the scripts are loaded. Which is rather annoying. o.0 - But this issue MUST be addressed, considering it has strong griefing potential. I actually already crashed my sim doing this (not on purpose; I didn't think rapid rezzing of a single-script object would cause so much lag that I couldn't delete the rezzer), and it was easy. Basically the same script as in the issue description, only the child objects didn't die, and the timer was replaced by an infinite loop.


Eata Kitty added a comment - 25/Feb/09 08:34 AM
[8:14] Babbage Linden: it may be caused by verification of the scripts digital signature on load
[8:14] Babbage Linden: although that should only happen the first time a script is rezzed in a region

Jahar Aabye added a comment - 25/Feb/09 10:02 AM
Yeah, verification of the script's signature should only happen on load the first time the script is rezzed. Also, I would think that the JIT compilation would only be an issue the first time the script is rezzed, although that might be different.

Regardless, both the script signature and JIT issues should have shown up when Mono was first introduced. The thing is, when Mono was first introduced, we did plenty of testing to check its impact on our guns, and guns are the definition of an object that rezzes a large number of prims fairly quickly. So if this had shown up when Mono first came out, we would have caught it then, or Eata would have, or Timmahy or Jenny or anyone else in SL who makes guns.

On the other hand, if this is something that has just now shown up with this last server rollout, it would seem less likely that it's something as basic as that, and more likely to be with how the Mono VM is implemented. It's also possible that the issue has to do with the database communication with the sim or with the asset cluster, or some combination thereof.

Unless something changed with either script loading or JIT compilation in the VM in just the last rollout, of course. That seems unlikely, but I suppose it shouldn't be discounted. I still think that there may be something specific to how the objects are rezzed, though, or else something specific with activity within the sim when this is going on. This doesn't seem to repro all the time, and some objects that rez many prims fairly quickly seem unaffected. I strongly suspect that this is going to be one of those bugs deep in the code, not necessarily something blatantly obvious.


Maggie Darwin added a comment - 25/Feb/09 02:32 PM
Some clarity might be provided by being a little more rigorous about what exactly we mean by "the first time a script is rezzed".

If a script with the same source code is in five different objects, and I rez all five of them, how many "first times" happen?

What if I have a copyable object and rez five copies?

And what about scripted attachments? If I log out and back in does that make a new "first time" for the scripted objects I'm wearing?

How about if I TP out and back in again?

How about if I detach the object and then wear it again?


Zod Colville added a comment - 25/Feb/09 08:03 PM
I am a scripter and always do the best I can to make my scripts as efficient and lag free as possible, so I want to make sure I am understanding this issue correctly.

If I am reading this right, the hit to performance occurs when the object is rezzed, but will calm down after that. If this is true, I am assuming that objects which are not frequently re-rezzed, such as tip signs, vendors, dance machines, money drops, etc., are still better off with Mono, while objects such as attachments are better with LSL.

Is this correct?


Moon Metty added a comment - 25/Feb/09 10:20 PM
After the rolling restart, the Mono rezzing/removing lag is no longer present on Neptune.

Eata Kitty added a comment - 26/Feb/09 05:42 AM
Zod you are correct. This problem does not really affect anything that stays "stationary" in world. It will affect things like vehicles on sim crossings, avatar attachments on rez/sim crossing/teleport and rezzers.

Ariu Arai added a comment - 26/Feb/09 07:17 AM
It does affect stationary things, especially servers that use email. Email messages often get dropped during lag/lag spikes, which now occurs quite often.

If you have slave scripts (like in NPVs, servers, vehicles, etc.) or other unimportant scripts, you don't need Mono. Mono is only really needed in scripts that need performance or memory. So for now, it's best to keep only the important scripts in Mono.


Eata Kitty added a comment - 26/Feb/09 12:33 PM
I don't follow. A server or vendor wouldn't create any new instances of scripts after the initial rez, so the fact that it's running Mono would not cause any further problems?

Jahar Aabye added a comment - 26/Feb/09 12:48 PM
I think he's talking about the well-known problem of emails getting lost during periods of heavy database use. That's a completely separate bug, unrelated to this one.

There are a lot of situations in which scripts can cause lag, or in which scripts fail to behave correctly. It's important to keep things focused on this specific problem, which seems to be directly related to TD spikes while rezzing objects scripted with Mono. Either it's an issue of the scripts loading on the Mono VM, problems with the Mono VM rezzing objects, problems with the Mono VM communicating to the database, or perhaps some other unknown issue.

But the email thing is completely unrelated to this, and long predated the introduction of Mono.


Ariu Arai added a comment - 26/Feb/09 01:25 PM - edited
Actually I meant servers/vendors being affected by people teleporting in/rezzing things. That causes a lag spike, and that causes some emails to be dropped. - So this isn't just causing lag, it's causing SL communications to become less reliable. I'm not talking about the servers/vendors themselves causing lag.

Oh, and Jahar, I'm a 'she', btw. :>

Note: It appears this issue is being resolved, though there's no word from the Lindens yet. - I am now able to rez a large group of the same scripts (in Mono) without any severe lag spikes. I'm pretty sure the recent rolling restarts include the fix for this issue as we speak: http://status.secondlifegrid.net/

If so: Good job Lindens! A speedy fix for a big bug.


Maggie Darwin added a comment - 27/Feb/09 03:53 AM
I wouldn't be in a big hurry to hand out kudos unless we hear officially that this is confirmed as a bug and that there's a specific fix in what's being rolled out.

Unless this is another one of those "this is related to a sim-crasher SEC- fix so we can't tell you until it's after it's fixed, when we won't have any time to devote to already-fixed bugs" deals.


Eata Kitty added a comment - 28/Feb/09 04:39 AM
I don't think anything has changed.

BETLOG Hax added a comment - 01/Mar/09 03:03 PM
I'm not seeing it at all.
Admittedly I just created the objects as per your scripts at top, and thus the sim's VM would be fully aware of them before I tried to rapid rez anything... but I got no adverse effect.

I tried a pistol test I made, all Mono everything, consistently firing 15+ rounds/sec... at 196m/s muzzle velocity... travelling ~2000m straight up (before they hit 4096)
...no problems... no significant ctrl-shift-1 stats blips.

I tried my sim scanner, which rezzes a coalesced chunk of 16 very tiny prims... they (modified warppos) travel straight up to 410m... spread out to a 64m grid... move to ground level... and then travel up to 4096 scanning and reporting at 64m intervals relative to a 0m elevation benchmark.
...no problems.

This is good, and mostly as expected... except for the fact that this is the sim you (Hen) were 'killing' recently with (I assume) the same issue...as we were discussing at the time if you recall... so I'd expect it to manifest again. Particularly if I rez a volumedetect scanner; which previously caused a 20-30ms physics spike. I had assumed this was because of the 20 or so moving/physics butterflies a resident has permanently on their property... but today there are none of the previous adverse effects. The sim is otherwise exactly the same. EXCEPT that an 'elevator' is no longer always set to physical, as it previously had been when i was seeing those ridiculous volumedetect<>physics spikes. Which is very interesting, and possibly directly related to the weird load.

I have no idea if this relates to this Jira issue... but thought it worth mentioning... maybe come check it out Hen?

Second Life 1.21.6 (99587) Oct 14 2008 17:42:25 (Second Life Release)
Release Notes

You are at 275996.5, 250952.2, 2016.4 in Ravenglass Realm located at sim3810.agni.lindenlab.com (216.82.23.37:13002)
Second Life Server 1.25.5.109327
Release Notes

CPU: Intel Core 2 Series Processor (2405 MHz)
Memory: 6143 MB
OS Version: Microsoft Windows Vista Service Pack 1 (Build 6001)
Graphics Card Vendor: NVIDIA Corporation
Graphics Card: GeForce 8800 GT/PCI/SSE2
OpenGL Version: 2.1.2

libcurl Version: libcurl/7.16.4 OpenSSL/0.9.7c zlib/1.2.3
J2C Decoder Version: KDU
LLMozLib Version: [LLMediaImplLLMozLib] - 2.01.22128 (Mozilla GRE version 1.8.1.13_0000000000)
Packets Lost: 200/51820 (0.4%)


Fluf Fredriksson added a comment - 03/Mar/09 12:53 PM
I've just tested with the suggested scripts as well, and no real hitches, maybe the mono script being rezzed might be so simple in nature / size that it barely blips?

I can however reproduce the mono script causing lag bug by attaching a radar I have compiled in mono "ASLocator1.8", and watching the frame rates / sim performance dip while it starts up. It also seems to take a chunk out of sim resources when I detach it as well.

Needs a better reproduction example IMHO.


Ariu Arai added a comment - 03/Mar/09 01:17 PM - edited
I believe the Lindens are working on this issue right now. I have noticed many changes in the script rezzing performance. Currently both LSL and Mono scripts when being rezzed at bulk cause a spike in sim performance, then the same scripted object being rezzed again does not cause a lag spike.

Though at the time this JIRA issue was created, only mono compiled scripts being rezzed in a loop/bulk caused huge lag spikes.


Maggie Darwin added a comment - 03/Mar/09 01:27 PM - edited
One has to wonder, if you rapidly rez objects containing a script the server hasn't seen before, if the server fails to recognize that it has seen them before and goes through the same extra initialization (compilation, signature verification, whatever it is; since the server source is secret we don't know) for each one?

Could get nasty.

And if the "Lindens are working on this issue", shouldn't it be assigned? The price for having free talent working on PJIRA should include a responsibility to flow information back to the PJIRA users.


Jahar Aabye added a comment - 03/Mar/09 01:40 PM
It's been triaged and imported, but yes, you would expect that it would be assigned if it was being worked on. It certainly might be that they are working on it as we speak, but I also strongly suspect that what people are seeing is that the problem appears to be sporadic, and will not consistently repro.

That makes it a lot harder to nail down exactly what's going on here.


Maggie Darwin added a comment - 03/Mar/09 02:01 PM
As I understand it, if it's assigned to an actual person, it shouldn't be unassigned. And if it's in that magical flow state where no one person is actually working on it, it should (paradoxically) be assigned to "Workingonit Linden". How it is that it is triaged and has a DEV number, but is unassigned I don't understand.

Ariu Arai added a comment - 03/Mar/09 02:23 PM
I don't know why it isn't assigned, but I am very certain that they are doing something about this issue. I've ALWAYS had this issue with Mono scripts causing lag at rez. I always thought that was the way things were, since I never created anything with a lot of scripts until after Mono was implemented. In my combat system, the older orbits used a lot of scripts (compiled in Mono). When rezzed, the sim would freeze for a second or two. Every time (and everywhere), not every once in a while. Now when I rez similar objects, the sim may or may not freeze for a few seconds. So something is being done. This is especially evident since there have been recent rolling restarts.

A good example:

I did this test a few hours after this JIRA was created. It created the same result every time (the Mono probes taking 3 seconds to rez) and every where I tested them. Though in some sims the mono probes took 1-2 seconds, others took 3 seconds. Either way, they still created a massive lag spike.

[13:06] Probe Tester 3.18 (LSL) whispers: Took: 3.437628 seconds to grab 12 keys @ 44fps. Rezzed nearly instantly
[13:07] Probe Tester 3.18 (Mono) whispers: Took: 6.022919 seconds to grab 12 keys @ 44fps. Took 3 seconds to rez


Eata Kitty added a comment - 04/Mar/09 06:39 AM
Potentially restarting a sim could temporarily fix the problem if it's something that builds up over time or it could have even been a blip for a few days due to a load issue.

I wouldn't assume it was fixed yet, it might keep reappearing and it took a heavy toll on some sims when it did.


Cynebald Ceawlin added a comment - 11/Mar/09 08:02 PM
Not much useful info to add, other than that I strongly suspect it's still around – my partner's and my sim (Nimue Isle) has always run like a dream; we keep it relatively clean, most of the scripted objects on it have my scripts in them and I take reasonable care to keep them low-impact. Recently however we've started noticing a lot of rubber-banding as we work on our work platforms up in the sky (above 3000 m); watching the sim stats has confirmed we're seeing short-term but significant dips in sim FPS and time-dilation (always before both pegged at 45/1.0) Tonight as people have TP'd in and out, I've been watching and seeing it drop to time dilations in the 0.5/0.6 range, and FPS down in the 30's (with accompanying rubber-band effects as we tried to walk around). It comes back to normal (or nearly so, although I'm seeing time dilation sitting at 0.98/0.99 rather than 1.0 as it always used to be) very quickly – within a second or two – but we've NEVER noticed this happening before... Anybody else w/ recent observations on this?

Andromeda Recreant added a comment - 30/May/09 08:06 PM
This thing is nasty.

Akio Kamachi added a comment - 02/Jun/09 12:59 PM
It is possible to easily cripple any region dropping dilation to 0.07 exploiting this bug.

SimonT Quinnell added a comment - 02/Jun/09 04:10 PM
I've observed that it's not only the rezzing of Mono scripted objects that is an issue, but also when they are derezzed! So if avatars have scripted attachments, they hit the sim both when they arrive AND when they leave. Since LL love having some numbers, I might have a go at doing some benchmarks later today.

This is a major error in LL's Mono implementation and is a gridwide issue. The only reason why this hasn't really hit yet is that a lot of avatars are still wearing their old LSL-compiled AOs. It's going to get much worse!


BlckCobra Shikami added a comment - 03/Jun/09 12:16 AM
As discussed during beta office hours with Prospero (gone) / Vector, these 2 issues seem to be related, if not to share the same cause.

Akio Kamachi added a comment - 06/Jun/09 05:10 AM
Using llSetScriptState seems to trigger the same problem. I used an object containing 17 scripts with a main script that disabled/enabled them.

Compiled in LSL:
[4:58] Object: on
[4:58] Object: Average Time Dilation: 0.982803
[4:58] Object: Time required: 0.245266

[4:58] Object: off
[4:58] Object: Average Time Dilation: 0.989249
[4:58] Object: Time required: 0.248039

Compiled in MONO:
[5:00] Object: on
[5:00] Object: Average Time Dilation: 0.405780
[5:00] Object: Time required: 0.576675

[5:00] Object: off
[5:00] Object: Average Time Dilation: 0.979096
[5:00] Object: Time required: 0.042656

The "lag spike" is generated only when MONO scripts switch to "on" in this experiment.
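
A harness along the following lines could produce output of the shape shown above. It is only a sketch and not the script actually used in the experiment: the function name `toggleAll`, the `SAMPLES` and `INTERVAL` values, and the way time dilation is averaged (a few samples taken right after the toggle) are all assumptions made for illustration.

```lsl
// Hypothetical benchmark harness, not the original test script.
integer SAMPLES  = 10;    // assumed: number of dilation samples per run
float   INTERVAL = 0.05;  // assumed: seconds between samples

integer gOn = TRUE;

// Switch every other script in this prim on or off, then report the
// time the switching took and the dilation averaged over a short window.
toggleAll(integer running)
{
    string me = llGetScriptName();
    integer n = llGetInventoryNumber(INVENTORY_SCRIPT);
    integer i;

    llResetTime();
    for (i = 0; i < n; ++i)
    {
        string name = llGetInventoryName(INVENTORY_SCRIPT, i);
        if (name != me)
        {
            llSetScriptState(name, running);
        }
    }
    float elapsed = llGetTime();

    // Sample dilation over a short window after the toggle.
    float sum = 0.0;
    for (i = 0; i < SAMPLES; ++i)
    {
        sum += llGetRegionTimeDilation();
        llSleep(INTERVAL);
    }

    llOwnerSay("Average Time Dilation: " + (string)(sum / SAMPLES));
    llOwnerSay("Time required: " + (string)elapsed);
}

default
{
    touch_start(integer total_number)
    {
        gOn = !gOn;
        if (gOn) llOwnerSay("on");
        else     llOwnerSay("off");
        toggleAll(gOn);
    }
}
```

To compare the two back ends, put the same set of scripts into two objects, compile one set as LSL and the other as Mono, and touch each object in turn while watching the statistics window.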


Garvin Twine added a comment - 08/Jun/09 05:17 PM
I cannot believe how the Lindens seem to simply ignore this issue. It is a serious performance issue on EVERY sim, and it will get worse and worse as more and more scripts attached to avatars are mono compiled...
The only comment from one of them was:
[8:14] Babbage Linden: it may be caused by verification of the scripts digital signature on load
[8:14] Babbage Linden: although that should only happen the first time a script is rezzed in a region

Well, so people should simply not TP or change regions, and then this is no issue for attachments?
I don't know how many avatars enter a new region every minute... but with 70000 online I imagine there is at least one TP every second, which means two sims (the one left and the one entered) go down in performance...

The issue was reported in February... now it is June... it was reported with server version 1.25, now we are on 1.26.4... and this issue was not even commented on by a Linden here!!!!


Erica Kessel added a comment - 08/Jun/09 05:26 PM
1 teleport every second with 70,000 online works out to each resident teleporting once every 19.4 hours or so on average (70,000 seconds ÷ 3,600 seconds per hour), so I think the real rate of teleports is much higher than that.

Chalice Yao added a comment - 09/Jun/09 11:55 PM
By now I presume that this only happens in regions where the region/simulator memory is quite filled, and swapping occurs on rezzing of scripts and/or people teleporting in and out of the region.

This will presumably be fixed when the new script memory pools get implemented. Til then it's best to educate people about the 300+ resize/retexture mono scripts they are wearing in their clothes and on their 4+ HUDs.


SimonT Quinnell added a comment - 10/Jun/09 12:12 AM
@ Chalice

You presume wrong. I have tested this on a Class 5 server on mainland, in a nice quiet sim. There were 2 avatars in the sim while I did my testing.

Just a simple AO alone drops the time dilation to 0.7 and FPS below 32 (LSL compiled has no impact). Combine this with a MONO compiled collar (OpenCollar in my case) and we get a time dilation below 0.2 and FPS between 2 and 7. And remember, this occurs in the destination AND the departure sim. AND this is just 1 avatar teleporting. Try walking around while people wearing a scripted collar are teleporting in and out of a sim, and you sure know about it.

As for any fix, the only thing we can presume is that it won't get fixed, as no Linden has bothered to comment.


Jahar Aabye added a comment - 10/Jun/09 01:05 AM
It is quite possible for memory swapping problems to affect your region even if your region has a low script load, since all four regions on a simulator are sharing the 4GB memory pool (approx 800MB per region if I remember correctly).

Now, if this specific issue is separate from the larger memory swapping issues that Mono has....unmasked....then it will still be easier to diagnose this problem once the Script Memory issues are addressed. Either this problem will go away when script memory limits are implemented, or else it will be more clear what the problem is at that point.

Regardless, there really is no idiot-proof fix for the database and server memory issues caused by walking around and/or teleporting while wearing 300+ scripts. Mono undoubtedly makes this problem worse, due to the greater amount of memory allocated. Mono may be a bit more efficient at runtime, but at the cost of memory, and that cost really adds up with that number of scripts.

Obviously if a scripted object is causing problems under Mono, recompiling back to LSL is a good stopgap solution. Fortunately, there is already code in the server software to monitor memory usage, so hopefully good memory limits or changes in memory allocation can be implemented soon.


Chalice Yao added a comment - 10/Jun/09 01:26 AM - edited
Each region only has 200 MB of actual script memory, even.

It should also be noted that Mono scripts, to my knowledge, require internal recompilation from the assembly code every time the script gets moved to a new sim, unlike LSL which I think simply gets serialized and run. Also, LSL gets serialized in one chunk, while I believe Mono scripts, due to the possible size, require several.

So, yes, Mono scripts will always have a bigger delay on TP to a new sim, and with a person carrying over 100 scripts (which is sadly, sadly not uncommon with many scripted gadgets and nomod clothes out there), this will always be more noticeable than with LSL (failed teleports, lag in the sims involved when TPing in/out). It is in the nature of how the LSL backend works, and how the Mono backend works.

It's the reason why I hope for MISC-268 to get attention sooner or later, when the main internal work planned for Mono is done. With script limits in place, people will be more careful about the overhead that a high number of scripts in a creation causes in memory and script time, and with the functions suggested in MISC-268, it will be much easier to create things with a low script count.

Something is causing the sims trouble. Whether it is the high memory demand of scripts, memory leaks during serialization over time, or general memory management troubles in the sim code, time will tell. But things are being worked on, and the script memory limits will be a good way to draw first conclusions and do tests.


Maggie Darwin added a comment - 10/Jun/09 05:53 AM
The instrumentation made necessary by script memory limits will be a good way to draw first conclusions.

The limits themselves seem a good way to deflect the blame and consequences of a bloated and inefficient Mono LSL interpreter away from Linden Research and shuffle the blame off onto their customers.

I'm having trouble remembering what the benefits of Mono LSL are. I could have sworn it was supposed to include more script memory, but apparently we can't afford to use it.


SimonT Quinnell added a comment - 10/Jun/09 06:22 AM
What has surprised me with the testing I have done is the consistency of the effect that rezzing and deleting a MONO script has on a sim. I've tried it out in a couple of sims: my land in the mainland full sim, for which I've quoted some numbers, as well as a Homestead sim, which interestingly gave very similar numbers to the full sim. I was expecting the Homestead to be impacted much more.

This suggests to me (although I do admit on somewhat limited testing) that something is fundamentally wrong when a mono script is initialised and loaded into a sim, and when it is deleted from a sim, and that it is independent of the current sim load.


Maldoror Bowman added a comment - 11/Jun/09 05:52 AM
It would really be interesting to know a little about the scripts involved in some of these tests. For example, do they all have on_rez handlers that might be JIT compiled when the object is rezzed? In the case of guns, if the script in each bullet fired causes the JIT compiler to be invoked to compile its on_rez handler, might that be enough of an overhead to cause the lag that Eata and others have reported? I cannot find any account of how the JITted code for IL is cached, and, in particular, whether that cache is reset when the object is rezzed. If it isn't, that might lead to some interesting tricks to force the script to JIT critical portions in advance then take the object into inventory with the code intact. Is there any documentation about this?

Pae Sinister added a comment - 12/Jun/09 08:35 AM
This is a showstopper for Mono, and since everybody is eagerly embracing Mono because we've been told at every turn how GOOD it is for sim health, it's a showstopper for the grid as a whole. It's a shame that TWO MONTHS have gone by since it was reported and it hasn't gotten assigned or commented on by a Linden. And fireworks season is right around the corner ...

This issue causes one round of a firework shell that is practically unnoticeable in LSL to drag a sim down to 23 FPS from a full 45. One single fireworks shell. One rezzing script, ten prims, all mono-compiled timered particle emitters that do their job then politely llDie(). I can knock it down to single digits if I put on a moderate display like this, whereas I can do a complete show with LSL shells and you can't tell when the display begins and ends by looking at the sim's health.

What do you think somebody who is actually TRYING to hurt a sim can do with this? What's it going to take to get this fixed, a bunch of griefers bouncing around HIP with recompiled pop guns from the library dragging the sim to a halt? Recompiled pink blocks recursively rezzing themselves when the /b/ tards decide to have some lulz? This really needs to be fixed before it's exploited, and I'm frankly surprised it hasn't been exploited already.

Similar to this issue, watch what happens when you drop a linkset item with dozens of prims and mono scripts in each one and then de-link it. I had a 244-prim linkset like this that was going to be used for a special fireworks display but needed to be re-ordered, but the sim conveniently disappeared out from under me when I hit Ctrl-Shift-L and I've since abandoned the project (though I might revisit it using LSL2 now that I know better).


Ardy Lay added a comment - 12/Jun/09 09:28 PM
Second Life Server 1.26.4.120562 gets clobbered too.

Miles05 Reitveld added a comment - 13/Jun/09 12:46 PM
This has deeply affected my region, Twilight Town. A lot of us recompiled to mono, and objects in region have been recompiled to mono for the supposed less lag, but when we have a bunch of people, 20-30 agents entering and exiting the region, performance just starts going down like mad... I'll be tracking this for sure.

Foxxy Henhouse added a comment - 16/Jun/09 11:28 AM
This is a HUGE issue and should be a show-stopper

Akio Kamachi added a comment - 19/Jun/09 06:45 PM
I would really appreciate a Linden commenting on this issue, so that I can decide whether to revert frequently rezzed scripts to LSL or keep them in MONO, depending on whether this issue will ever get resolved.

Huns Valen added a comment - 21/Jun/09 12:53 PM
Chipping in my support to get this fixed!

Elbereth Witte added a comment - 22/Jun/09 11:39 AM
I accept that this issue having a development issue id is equivalent to a Linden commenting "achoo" here. Some sniffles and a formal acknowledgement would be nice, bonus points for "we don't get it" or "the fix is a week away" or "if you don't like it, play Red Pluto instead!" or such.

Ariu Arai added a comment - 23/Jun/09 10:15 AM - edited
I agree with Elbereth.

Seriously, what's going on, Lindens? No Linden has left a comment explaining what might be going on, and this issue has been up for almost 4 months now! We need bug fixes, not new shiny features in SL. At the very least, a response to this issue.

It's not just the 'frozen' effect caused by lag that's the problem here. (Frozen, as in you cannot move or interact with the world)

> If someone teleports in when you're saving a script, it'll most likely fail.
> If someone attaches a HUD (most especially HUDs, but it can also be other high script attachments), it'll cause lag, even when they detach it. It'll cause lag!
> Saving a mono script usually causes a small lag spike, which can be very frustrating when converting scripted LSL objects to mono.
> Once laggy mainland sims are now the performance equivalent to Weapons testing sandbox, constant lag spikes and huge lag fests which can fool you into thinking the sim has crashed.. Busy sims and mainland sims practically rain lag.

Ever since mono, any sim that receives high traffic is now rocking back and forth in a fetal position crying "Make it stop!". However, mono scripts do run faster, at the expense of simulators running slower. So technically, there's no improvement here, if anything, it's a performance decrease.

If this goes on, we'll have no choice but to organize "PETS", or "People for the Ethical Treatment of Simulators/Servers", and boycott your virtual world until the simulators are treated ethically. (Sarcasm; but honestly, this issue is getting nearly that severe)


Nulflux Negulesco added a comment - 03/Jul/09 04:48 PM
I sell a product called the Master Particle Studio HUD. It is comprised of 255 linked prims and a couple hundred MONO compiled scripts that communicate mostly through linked messages. The linked messages only go to the child / parent prim that requires the data - I'm not using LINK_ALL / etc unless the commands need multiple recipients... The scripts themselves are quite complex and without a way to count the lines for all of them cumulatively I would guess there's something like 10,000 lines of code total for the entire system. My estimate could be wildly inaccurate but you get the picture, it's not small.

Recently I began to notice when anyone in the same simulator attached the MPS HUD - the sim literally froze for 1-2 seconds before it recovered. Detaching the HUD produced the same 1-2 second simulator-wide lag. Tests showed that it did not matter who attached the HUD - the sim froze for everyone simultaneously for the same length of time.

I'm not 100% sure but I think I recall that this problem did not manifest on every simulator and some sims seemed immune while others (such as Levkoy) had the bug.

If a Linden is interested in reproducing this issue, they are welcome to contact me for a copy of my MPS HUD (free of course, for debugging purposes). If you decide to get it from my avatar inventory, make sure you get the latest version (Master Particle Studio HUD v11.7). If the particle preview image doesn't load, don't worry: it's disabled. I'll soon be incorporating HUD rendered particles for the preview.


Squall Ichtama added a comment - 06/Jul/09 12:44 PM - edited
So I thought I'd test this out.
I went to an empty sim. I had the statistics window open. I opened inventory and selected wear on a pair of shoes I knew were scripted and were compiled to MONO. The sim's time dilation dropped to 0.95 and it didn't pick up to 1.00 for about 10 seconds.
That much of a loss in an empty sim just by wearing shoes?
Something really isn't right there.

Voted and telling everyone I know about this issue. It goes a long way in explaining the weird lag in recent months.

Editing to add further observations.
I TP'd to another sim which I knew had at least 20 people in it. I had my stats window open. As I arrived, the time dilation nosedived to 0.45, then after 20 seconds recovered. I then watched more people TP in, and each time the time dilation fell to around that level.
This is clearly a showstopper. If more than 1 person tries to change into something with mono compiled scripting, or TPs to another place while already wearing something with mono compiled scripting, then the sim gets killed.


alice klinger added a comment - 06/Jul/09 01:26 PM
So that is why we have such a massive chatlag in open chat now?
Might be why group chat does also not go through most of the times as well, could that be?
I mean, if the sim is overloaded with rezzing mono scripts, then it explains why nothing goes through.
Very deadly for a shopping sim that lives from massive visitor flow.
The more lag though, the faster people are frustrated and give up and not go shopping.
I had to give up on the hair faire this year just because the lag was too frustrating.
Of course this has my vote!

BETLOG Hax added a comment - 06/Jul/09 03:05 PM
Although I still don't think I have seen it as bad as some of the descriptions above, this is still a valid issue, even if only because of the massive number of mono scripts in resizers (which probably don't really need to be compiled in mono, and are sometimes horribly written).

I am not really up to date on the bytecode sharing status of mono scripts, but last I checked every script had to be compiled inworld, taken to inventory and manually dragged from inventory to prim in order to make use of the bytecode sharing feature, and I know none of the resizer scripts would demand this of their users. Perhaps herein lies the root of this problem?
Therefore perhaps pushing any jira relating to expediting the bytecode sharing implementation in llGiveInventory() and llRemoteLoadScriptPin() would be useful.
...if this isn't already in effect.

However unless we are in the habit of regularly monitoring dilation vs script numbers the results will tend to always look really bad. Particularly in homesteads.
Just like any statistic, I'd strongly suggest we need to monitor them all the time for a better understanding of background levels and standard deviations.
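
As a rough illustration of that kind of continuous monitoring, the sketch below samples time dilation and region FPS on a timer and keeps a running mean and standard deviation using Welford's online update. It is not the commercial product mentioned in this comment; the `INTERVAL` and `REPORT_EVERY` values and all variable names are assumptions. Note also that a script's own timer can be delayed by the very lag it is trying to measure.

```lsl
// Hypothetical background monitor for region time dilation and FPS.
float   INTERVAL     = 10.0;  // assumed: seconds between samples
integer REPORT_EVERY = 30;    // assumed: report after this many samples

integer gCount;
float   gMeanTD;
float   gM2TD;    // running sum of squared deviations for dilation
float   gMeanFPS;
float   gM2FPS;   // running sum of squared deviations for FPS

default
{
    state_entry()
    {
        llSetTimerEvent(INTERVAL);
    }

    timer()
    {
        float td  = llGetRegionTimeDilation();
        float fps = llGetRegionFPS();
        ++gCount;

        // Welford's online update: no sample history has to be stored.
        float d = td - gMeanTD;
        gMeanTD += d / gCount;
        gM2TD   += d * (td - gMeanTD);

        d = fps - gMeanFPS;
        gMeanFPS += d / gCount;
        gM2FPS   += d * (fps - gMeanFPS);

        if (gCount % REPORT_EVERY == 0)
        {
            float n = (float)(gCount - 1);  // sample standard deviation
            llOwnerSay("TD mean " + (string)gMeanTD
                + " sd " + (string)llSqrt(gM2TD / n)
                + " | FPS mean " + (string)gMeanFPS
                + " sd " + (string)llSqrt(gM2FPS / n)
                + " over " + (string)gCount + " samples");
        }
    }
}
```

With a baseline like this, a dip seen during a test can be compared against the region's normal spread instead of against an assumed constant 1.0 / 45.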

I make a product that generates a graphical history of dilation/FPS, but obviously I can't really pimp it here.
For diagnostic purposes I'd be happy to give one to anyone who has already commented in this thread. IM me with a link to your comment above if you want one.


meade paravane added a comment - 08/Jul/09 11:41 AM
I don't think this is the sole cause of the nasty open-chat lag/loss lately, alice.

Getting a sim to totally freeze up is pretty easy to do, even on an otherwise empty/idle region. Just get somebody with a bunch of prims attached, even scriptless or non-mono scripts, to TP in.


Sparket Schmooz added a comment - 20/Jul/09 05:34 AM
Just noticed this Jira today, but I think the problem is getting worse since the last rollout. Every day I run Get Top Scripts to keep the sim as lag free as possible; I like the Disable Scripts button they fixed. I use the Stat numbers frequently and noticed lately that someone TPing to the sim does affect it. I don't know much about Mono vs LSL, but last Friday my sim was like walking through paste after a dance event, for the first time. I don't see the numbers improving or remaining constant at all in Stats.

Ardy Lay added a comment - 25/Jul/09 08:39 AM
I added Second Life Server version 1.27 to this report as also affected as my repro and the above listed repro are still effective demonstrations.

Aulderbane Toonie added a comment - 25/Jul/09 09:18 AM
It's been half a year since this got posted, and through that time I've watched the lag spikes, frame-rate plunges, and horrid thousand-plus pings cripple the sim I admin, and others I've visited. It happens every time someone teleports in who has a Mono script. It's even worse since the sim is Class 4, but even Class 5s get the hits. From what I understand it's more than just the scripts; other infrastructure is making it worse, but no one has posted anything on that. Maybe that's why a Linden has yet to be assigned to this. Notwithstanding that, let's peek in at this, hmm? Almost everyone uses scripts for something. This is why it's at Showstopper status. Daily crashes from a script/network lock-up are NOT fun.

Kira Wolkenberg added a comment - 27/Jul/09 06:57 PM
This isn't really an issue; see, there are multiple reasons that sims can go down to that rate. If you have a rezzer that is rezzing a lot of physical prims, or you have a lot of scripts poorly coded so that they are all running in a tight space constantly (e.g. a stargate with chevrons, the DHD, the warning lights, the ring rotation and more all shouting commands at each other, and a huge nest of scripts with listeners that are listening to every little thing), then you should expect lag to show up. However, take my home sim: I am a scripter, and by definition alone practically all items on my parcel have a script in them. Heck, I even have a mono scripted RolePlay HUD and mono scripted AO, and the sim time dilation and FPS are still FPS: 44.90928, Dilation: 1.00000. I have noticed that the Mono AO actually makes it a lot easier to do the things I want to, like combat sims, as it lags less than the LSL scripted ones. When you combine that with MONO scripted weapons and such in a large, completely avatar filled sim (70 avatars) all fighting, you will lag, but you won't lag more than others; in fact you will place a much lower strain on the server than anyone else will, unless they are also using a lot of mono scripted items. Hence the reason DCS2, XRPS, CCS, and other combat meters/HUDs are in MONO now, and why my items are in mono too.

It should also be noted that unless you code things as if you're coding for an embedded device, you are more likely to add lag to the items. However, the real lag comes when you have LOADS of avatars in any sim, or you have loads of LSL scripted items in a sim, and/or when you have a lot of physics (yes, bullets, cars, planes, elevators, etc.) going on all at the same time. All this adds up to what Sparket Schmooz was describing, a sim lagging down, but that wasn't Mono's fault. If the sim was full, I doubt very highly that the people who went there were not using high prim / high rezz hair, boots, dresses, and other items. All those add up to bring down sim performance too.

And speaking in reference to Aulderbane, I can sympathize with you there, but I can't say that Mono scripted items are the cause of it. I and many others I know can attest to the reasoning I gave here and in the first paragraph of my response. However, I can say that overall it could be your system's performance that crashes you. A while back I had an Athlon 64 single core barely able to play SL, and anytime I had more than 16 people near me, I would crash in a heartbeat. Well, I got a dual core laptop that officially isn't supposed to play SL as its graphics card isn't approved for it, yet I can play SL with it and have almost no problems till I turn up the draw rate; then it locks up and crashes too. So, I recently purchased a Core i7 920 system that has 12 GB of RAM and the most powerful graphics card around, and it doesn't crash me or my partner. My partner's Athlon X2 3800+ crashes only when she is having network problems. And we live in nothing but MONO scripted items that either I or some scripter I know made. So I'm not saying your system is old, and not saying that you are completely misleading by blaming it on MONO scripted items, but there could be other reasons for you crashing, as Mono scripted items take up less memory from the server, allowing you to use more scripts in your items than you could previously and have them all run in virtually real time. I would just look into analyzing your equipment and setup first, as I think that is the problem more than anything. In reference to that, I have a friend with an Intel P4 HT that also crashes a lot, and I know it's because their system is maxed out with what it can handle.

All in all, I am glad they made Mono compiled scripts possible, as it means more potential for users like me who are scripters to make more complex scripted items and not have them slow down the sim any. Thanks, Linden Lab.


Garvin Twine added a comment - 27/Jul/09 07:14 PM
@Kira:
I think you miss the whole point here:
Nobody is saying that mono scripts are bad...
The point is that when something... anything with mono scripted items enters a sim... the sim goes down for a short time, and the only explanation we have been given is:
[quote]
[8:14] Babbage Linden: it may be caused by verification of the scripts digital signature on load
[8:14] Babbage Linden: although that should only happen the first time a script is rezzed in a region
[/quote]
This is the problem: not that mono scripts as such do a worse job than LSL compiled ones, but that they cause lots of load on a simulator when they enter the region. Every teleport is entering a sim, and every new rez (for example bullets from guns, arrows from bows, etc.) makes the sim work way more if these items have mono scripts.

Damen Hax added a comment - 27/Jul/09 09:16 PM
Not assigned?.. tick tock tick tock

I started informing users many, many moons ago that they can tell when someone TPs into a sim by noticing the sim's FPS drop considerably. Not long after that I began the tedious task of reverting MONO compiled code back to its original state... Now, many months later...

There is only one reason something is not fixed quickly: it's a biggie. So I can understand the time it takes... but to not even have this assigned, or at least have us think you're looking into it... tick tock tick tock

We need to know what the server is actually doing when a mono compiled script is introduced before anyone can hope to figure out a fix.


Imaze Rhiano added a comment - 27/Jul/09 10:32 PM
It was in bug triage last night and imported into the developers' internal Jira.

civlet moody added a comment - 29/Jul/09 03:50 AM
I can almost confirm the behavior. Our offending objects were "Botanical - Realistic Fireflies (mt)." The behavior is not 100% consistent, though. Sometimes they run well with no impact, but occasionally, they just go brain dead and drag down sim fps, time dilation, and sim time (ms).

BETLOG Hax added a comment - 30/Jul/09 12:33 AM
civlet: This may not be related, but physical objects that move while interpenetrating a prim that uses llVolumeDetect and the collision event to operate as a scanner (typically these are extremely large megaprims, auto-deployed by HUDs) will convert the relatively large script time such scanners normally have, and which everybody tolerates (as seen in the ctrl-shift-1 statistics window), into an even larger physics time (which is then usually very noticeable as time dilation).
Compiling such volumedetect scanners as LSO (not mono) seems to mitigate the problem (and the MONO bug) reasonably well in sims with fewer prims and fewer moving objects.
...er, I just realised I've already mentioned that above. Doh

Simon Linden added a comment - 03/Aug/09 04:49 PM - edited
At risk of getting bombarded with IMs, I did a test using 5 of the scripts above and let it run for 5 hours. I did not see any slowdown. Each rezzer made about 100,000 objects over this period. This is on the 1.27.1 server. I also tried one on Duvillier and didn't see any slow down.

I'm not doubting that there is a problem, but the sample script and rezzer is not the direct cause of the bug. Something else is happening that causes the simulator to get really laggy.

Does anyone have ideas about how the simulator gets to the state where a new mono script slows it down?

On the sims where this is still happening, is there any type of usage that's common? i.e, lots of outfit changes, TP arrivals or region crossings, etc?


Maggie Darwin added a comment - 03/Aug/09 05:18 PM
There is certainly a pronounced TD spike when avatars arrive in our full-sim at Harrington.

The test object causes consistent lag, visible in the TD data down to .98 or .97, but certainly not what anyone could call crippling with few (3) avatars in the sim.


Ann Otoole added a comment - 03/Aug/09 05:42 PM
What good does running the same script a zillion times do for a test, when it is supposedly the validator that has to run for new scripts? It only runs once for a new script, right? Then you can run that script a zillion times? Perhaps the region remembers that a script has been validated? I have no idea.

Perhaps it is something more complicated.

No test has much validation unless the test is run under real load conditions. So no sense running any test unless in a region with at least 20 fashionistas in it shopping or haunting lucky chairs or MM boards. I bet you will see some td fluctuations as the avatars come and go at that population level. With 2 or 3 in a sim everything runs smooth.

Too bad we can't have a social network with SL like the old days when 70 in a sim was when it started to lag. Before windlight. Those were the days that put SL on the map. Before Windlight.

Good luck with the testing.


Ariu Arai added a comment - 03/Aug/09 06:30 PM - edited
Simon, what I think of this issue now, since I've had a long time to explore it, is that it HAS to be something with the memory allocation for the loading scripts.

Here's a little test I did.

My sim: 2 Avatars, only 0.4 ms of script time being used, 250 prims used.

Rezzed 200 LSL scripts (3.2 MB footprint) - Small dip in sim stats for a split second (both TD and FPS drop at the same time)

Rezzed 200 Mono scripts (12.8 MB footprint) - Large dip in sim stats for one second (The sim didn't freeze) (TD drops dramatically, FPS drops slightly after the TD restores)

Quiet sandbox, 5 Avatars, (est.) 3 ms of script time being used, 2800 prims used.

Rezzed 200 LSL scripts (3.2 MB footprint) - Medium dip in sim stats for a second. (both TD and FPS drop at the same time)

Rezzed 200 Mono scripts (12.8 MB footprint) - Large dip in sim stats for 4 seconds (The sim is frozen during the 4 seconds) (TD drops dramatically, FPS drops slightly after the TD restores)

(The script chunks were rezzed multiple times without change in results)

At first, I never thought a server would actually choke on that small an amount of data. Computers and servers process megabytes of data thousands of times a minute without lagging.

I don't think it's an issue with just Mono, but an issue with how script memory is allocated when the scripts are rezzed. That, or it's just due to so many scripts running their opening events (state_entry, on_rez, changed, etc).

Edit:

I forgot to mention, that deleting the script chunks also causes an impact on the sim, so I wouldn't think it has anything to do with script syntax checks, script validations or anything like that. Not unless the sim for some reason has to re-validate a script before it deletes it.


Rygel Ryba added a comment - 04/Aug/09 12:09 AM
@Simon...

I do think you are onto something: it's a situation that is not "just" the act of mono scripts being rezzed. I have found that once our sim gets to a point where I can "feel" people entering and leaving the sim, the effect continues to get more and more pronounced. When it reaches the point where I can no longer walk 20 meters without having to let up on the keys to come out of a freeze, I reboot the sim. Things seem to be fine again for 3-4 days, and then the degradation process begins again. By the time day 10 rolls around, it's usually back to the point where I'm looking for a quiet moment with no one around to reboot the thing again. I'm not sure how all this ties into solving the mystery, but thought I would mention the fact that a sim restart does, in fact, seem to help, at least for a few days.


Akio Kamachi added a comment - 07/Aug/09 12:34 PM
Steps in testing this issue:

1. Write a script containing only an empty touch_start event (or any) in your inventory.

2. Slide this script in an object 20 times and make sure they are compiled to mono.

3. Create a loop rezzer using while(1) or any other way to rez the object containing 20 mono scripts and slide it in.
4. Activate your rezzer.

You shouldn't need any dilation or FPS monitoring to "feel" the effect, as walking will be nearly impossible for as long as the rezzer is not deactivated. Waiting too long will cause the sim to freeze for a very long time while it deletes your mono scripted objects. Testing the same thing compiled in lsl will cause almost no dilation/FPS change.

I do not believe it is due to poor performance in script memory allocation, as the mono scripts are a lot smaller in this case than their lsl copies. It is not related to their starting events either, as touch_start is not called upon rezzing. Nor does it have anything to do with rezzing a script for the first time, as it keeps rezzing the same script. The performance issue is also triggered when switching on mono scripts, as shown in the experiment posted on this page on 06/Jun/09 05:10 AM.

The only cause I can suspect would be a programming error, probably one that makes the "verification of the scripts digital signature on load" run every time, but that would not explain the performance drain upon de-rezzing. I suspect a simple programming error or an extremely inefficient implementation of the mono engine. It really must be an error, considering all the resources required to start a simple script.


Lionheart Milena added a comment - 08/Aug/09 03:44 PM
I could reproduce the testing routine mentioned by Akio Kamachi. On a good performing sim of mine (FPS 45, Time Dilation 1.00), the lsl rezzer slows down the fps to 43-42 FPS, while the mono rezzer drops the FPS down to 18-19.

I've included the script code for testing purpose:

Create two objects "RezSlave" containing a simple basic script as mentioned above (one object for our lsl testing, the other for mono compiled testing):

default
{
    state_entry()
    {
        llSay(0, "Hello, Avatar!");
    }

    touch_start(integer total_number)
    {
    }
}

Script compiled in one object to lsl, in a second object to mono. Then drag the script into your inventory and drop it back 20 times into the RezSlave objects (lsl to the lsl object, mono scripts into the mono object)

Now Create another two objects serving us as rezzer, one for the mono slaves, the other for the lsl slaves, containing the following script:

integer a;
integer b = 100; // number of objects to rez per touch
default
{
    state_entry()
    {
        llSay(0, "Hello, Avatar!");
    }

    touch_start(integer total_number)
    {
        a = 0; // reset the counter so every touch rezzes a fresh batch
        while(a++ < b)
        {
            llSetText("MONO Rezzer\n Rezzed "+(string)a+" Objects",<1.0,0.0,0.0>,1.0);
            llRezObject("RezSlave",llGetPos() + <0,0,1>,<0.0,0.0,0.0>, <0.0,0.0,0.0,1.0>, 0);
        }
    }
}

(Replace MONO with LSL for the lsl rezzer)

Now drop the RezSlave objects into the corresponding rezzer prims, one for the LSL test, the other one for the Mono test.

Open the Statistic Window (CTRL-SHIFT-1)

Click one rezzer and observe the FPS, then do the same with the other rezzer and compare the FPS values. The Mono rezzer drops sim performance dramatically.


Thomas Shikami added a comment - 10/Aug/09 04:51 PM
The problem isn't with rezzing the same script over and over, the problem is with rezzing a bunch of unrelated scripts that are all compiled with different asset ids. Loading unseen scripts is an expensive task on servers. If you want to mitigate the problem, you might want to limit loading scripts to one new assembly per frame, that'd keep the sim running while slowing down loading the scripts.

SimonT Quinnell added a comment - 11/Aug/09 12:52 AM - edited
"The problem isn't with rezzing the same script over and over"

I would disagree with that. Rezzing multiple copies of the same Mono compiled script gives a performance hit, which suggests that something is majorly wrong. Here's what I did.

1. Create a prim and within that prim create multiple copies of the default script using the "New Script" button. All the scripts will be identical (but with names like New Script 11 .. etc)

2. Take prim into inventory.

3. Rez single prim and observe the change in fps (I use the following script to do that). Repeat 5 times and record average.

Here are the results I observed, with the spike measured on rezzing the object.

Table: Spike in sim performance observed by rezzing a single prim.

Scripts in prim   Compiled   Time Dilation   FPS
30                LSL        0.96            43.5
30                Mono       0.51            31.1
50                Mono       0.37            19.03
100               Mono       0.22            7.7

While I realize that this is a trivial exercise and that no one in their right mind will have 100 scripts in 1 prim, the important thing to note is that when I rez the same object with the same set of scripts I still get the spike for Mono compiled scripts. There is no memory of the script asset being loaded before.

 
// The beginnings of a region-info script.
string region;
string sim;
float dilation;
float fps;
float min_dilation;
float min_fps;

 
default
{
    state_entry()
    {
        llSetTimerEvent(0.2);
        min_dilation = 1.0;
        min_fps = 45.0;
    }
    
    touch_start(integer num)
    {
        min_dilation = 1.0;
        min_fps = 45.0;
    }
        
    timer()
    {
        string here = llGetRegionName();
        if(region != here)
        {
            sim = llGetSimulatorHostname();
            region = here;
        }
        
        dilation = llGetRegionTimeDilation();
        fps = llGetRegionFPS();
        
        if (min_dilation > dilation) min_dilation = dilation;
        if (min_fps > fps) min_fps = fps;
        
        llSetText(
                "   REGION NAME : " + region + 
              "\n  SIM HOSTNAME : " + sim + 
              "\n TIME DILATION : " + (string)dilation +
              "["+(string)min_dilation+"]" +
              "\n    REGION FPS : " + (string)fps +
              "["+(string)min_fps+"]",
            <0,1,0>, 1.0);
    }
}

Tuft Meili added a comment - 11/Aug/09 02:02 AM
Thomas Shikami said "If you want to mitigate the problem, you might want to limit loading scripts to one new assembly per frame, that'd keep the sim running while slowing down loading the scripts."

One of the worst aspects of this problem is when you are changing sims. Judging from the apparent behaviour, you don't regain control over your avatar until its assets have loaded on the new sim. That seems to include all running scripts in all attachments. When teleporting, this is hidden by the black screen, but when walking across the border, you get the dreaded sim crossing lag, which has a severe impact on all forms of vehicles.

Slowing down script loading even further than the delay caused by this bug may help overall sim performance, but the sim crossing lag will be almost unbearable.

It is already bad enough if you use e.g. prim clothing (or other attachments) that use the one-script-per-prim trick for simultaneous color, transparency or size change (to get around the delay in llSetPrimitiveParams() and its derivatives).


Matti Deigan added a comment - 11/Aug/09 03:03 PM - edited
This issue is not limited to objects rezzed by script/hand.

This ALSO includes scripts within ATTACHMENTS.

This issue is making Mono useless in anything that is not fully static and dedicated (like a vendor or a server), even though it is indeed slightly faster than LSL. This has been fully tested at my sim "Center of Gravity", where we see performance loss/lag spikes (up to a TD loss of 0.4 and a sim FPS loss of 30, from the usual 1.0/45) when occasional avatars teleport in and out with Mono attachments.

The scripts above me have reproduced the issue.

"You are at 201598.7, 294566.3, 3.3 in Center of Gravity located at sim7515.agni.lindenlab.com (216.82.35.8:13002)
Second Life Server 1.27.2.129782"


Winter Ventura added a comment - 21/Aug/09 08:21 PM
Been noticing this since it started, but I didn't have the control to tell what was the root cause.

the "Teleport lag".. that forces the sim to PAUSE for up to 15 seconds whenever an avatar tp'd in, or out, or crossed the border.. Everyone floats off in whatever direction, and we all wonder why. I noticed the correlation with dots appearing/leaving in my shop.. but didn't have a clue it was related to mono scripts (presumably in attachments/huds/etc)

This REALLY needs fixing.. and I can't click on "VOTE" any harder.


Cinco Pizzicato added a comment - 24/Aug/09 12:35 PM
Why is this unassigned?

Escort DeFarge added a comment - 24/Aug/09 02:36 PM - edited
This observation of unexpected performance degradation was made quite some time ago: SVC-2967

Maggie Darwin added a comment - 24/Aug/09 02:51 PM
It seems pretty clear to me that the strategy of relying on LL customers to experiment around and figure out why there are unworkable and unacceptable simulator freezes when avatars enter a region on the main grid is failing. There are too many variables, and we have no access to the simulator source or any tool other than LSL scripts to determine why this is happening.

What is clear is that lag spikes on avatar arrival are excessive, to the point where vehicles (especially those with multiple passengers) crossing a region boundary can be nearly impossible.

If Linden Research no longer intends to support multi-region vehicle use, and the only viable way to move from region to region is to teleport, it would be nice if that were stated explicitly, so we can simply discard the millions of L$ worth of content that is effectively useless at this time and stop wasting customer time and energy.


Argent Stonecutter added a comment - 29/Aug/09 07:04 AM
Should this really be "unexpected"? Well, the degree of the effect is higher than I expected, but Seal observed similar effects back when Mono was brand new, and I have been concerned about Mono and sim crossing since the first Mono video was shown.

Workaround: use LSL for attachments and vehicles.


BlckCobra Shikami added a comment - 29/Aug/09 09:04 AM
After certain comments and technical discussions it was not really unexpected that we would have to deal with some sim/sim transfer issues on Mono scripts, even if everyone was so optimistic about the Mono advantages.
But there seem to be several side/other effects which apparently make the impact a lot bigger than it has to be. After a few days (for some after 1-2 days) a region becomes quite unusable, with lag spikes of 7-20 seconds which cause the entire sim to stall and make movement and other things impossible.
Moon Metty, me and others have collected data and assembled them in charts/diagrams (see SVC-4196) which show what happens to a sim the longer it runs and how entering avatars and/or rezzing heavily scripted objects affects it.

The silence is frightening: either no one really knows what is causing this horrible problem or solving it is not opportune at the moment for development reasons. (Rest of comment is self censored)


Gregory Maurer added a comment - 29/Aug/09 11:15 AM
If a region is getting slower over time, with a restart resetting the lag, then isn't it a memory leak?

Even if mono causes lag, you would expect it to be fairly consistent.


Mako Mabellon added a comment - 04/Sep/09 02:00 PM
Gregory Maurer: oh, there are far more interesting possible causes for that sort of behaviour than memory leaks. For example, there's something called memory fragmentation, where heavy memory allocation causes the free memory to be split up into smaller and smaller, increasingly useless chunks. Since Mono currently uses a non-compacting, non-generational garbage collector, it's vulnerable to this.

Unless the memory allocator is well-designed, the fragmentation can severely slow down memory allocation over time. I think that, while Second Life uses a fairly decent memory allocator itself, Mono has its own code for this that it uses instead.


Maggie Darwin added a comment - 04/Sep/09 02:43 PM
Is it known outside LL which version of Mono is used in the Second Life Simulator?

Balpien Hammerer added a comment - 11/Sep/09 05:10 PM
I just de-mono-ized all my boats and flyers after trying them out on region crossings. The LSL compiled ones seem to suffer far fewer long hand-off or broken hand-off problems. Many of my vehicles have multisit capabilities with a score of scripts that control each avie who sits in it. Though the system I use turns those scripts off (script-state set to FALSE) except the ones actually needed, I believe just having all those scripts there for the hand-off seems to have a large effect when they are Mono compiled. It is difficult at this time to give any decent quantifiable numbers because the myriad other performance damaging problems induce too much noise in the results. If/when those problems subside, I'll try to run a set of controlled experiments.

Still, the descriptions above show there is a serious design flaw with Mono compiled script initialization. This promising change has been sullied by so many fit and polish issues - it is sad to see it fail. Worse, this bug has been assigned to the fictitious WorkingOnIt Linden, which for all PJIRA bug reports sporting that name is the kiss of death.

At least we have a workaround: recompile everything that does not need a lot of working memory back to LSL. Mono will work for security orbs and large list management scripts. Those are stationary objects, so they will not be affected by the initialization problems.


Escort DeFarge added a comment - 11/Sep/09 06:32 PM
Of course, we should remember that LSL compile is not the same as it was either...

Balpien Hammerer added a comment - 12/Sep/09 09:31 AM
I wrote a test in which a script toggles the run state of scripts in the same prim. It does this in a timer loop with a 0.2 second period. Running with the stats window open, I did not see any discernible difference between Mono and LSL compiled scripts, and the absolute difference was minimal.

I did notice something odd though. The controller script shows the free memory amount as hover text. I used this to tell which object is mono or LSL. LSL scripts now have 15K free memory and mono scripts have 60K free memory. Did I miss a major bump in free memory introduced to the simulators? It used to be 4K and 16K.
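
The run-state test described above could be sketched roughly as follows. This is an illustrative reconstruction, not Balpien's actual script: it assumes the controller sits in the same prim as the scripts under test, and it flips every other script on and off from a 0.2 second timer while showing the controller's own free memory as hover text.

```lsl
// Hypothetical controller: toggles the run state of all other scripts
// in this prim every 0.2 seconds and shows free memory as hover text.
integer running = TRUE;

default
{
    state_entry()
    {
        llSetTimerEvent(0.2);
    }

    timer()
    {
        running = !running;
        string self = llGetScriptName();
        integer n = llGetInventoryNumber(INVENTORY_SCRIPT);
        integer i;
        for (i = 0; i < n; ++i)
        {
            string name = llGetInventoryName(INVENTORY_SCRIPT, i);
            // Never stop the controller itself.
            if (name != self)
            {
                llSetScriptState(name, running);
            }
        }
        // Free memory of this script only; a quick way to tell a Mono
        // build (larger figure) from an LSL build (smaller figure).
        llSetText("Free memory: " + (string)llGetFreeMemory(), <1,1,1>, 1.0);
    }
}
```

Compile the controller and the test scripts both ways (LSL and Mono) and compare the stats window readings for each object.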


Kayla Stonecutter added a comment - 12/Sep/09 04:30 PM
LSL scripts have always had 16K free, and Mono 64K. The 1K missing from LSL and 4k from Mono in your free memory report is from the bytecode and working memory from the script. Mono's bytecode can be up to 4x larger than LSL which is the reason it was given 4x the available memory.
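
The arithmetic can be checked from inside a script. A minimal sketch, assuming the 16K/64K limits quoted above (the LIMIT constant is a placeholder that has to be set by hand to match how the script was compiled):

```lsl
// Assumed limits from the comment above: 16384 bytes for LSL, 65536 for Mono.
integer LIMIT = 65536; // change to 16384 when compiling to LSL

default
{
    state_entry()
    {
        integer free = llGetFreeMemory();
        // Whatever is not free is taken by bytecode and working memory.
        llOwnerSay("Free: " + (string)free + " bytes, used: "
            + (string)(LIMIT - free) + " bytes");
    }
}
```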

Argent Stonecutter added a comment - 13/Sep/09 10:06 AM
@Balpien: The result of llGetFreeMemory was tweaked for a while under the assumption that scripts would break if they suddenly saw more than 16k free. You may be remembering seeing the results of that short-lived tweak.

Maggie Darwin added a comment - 18/Sep/09 04:42 AM - edited
In office hours:

[2009/09/17 17:48] Andrew Linden: We've got someone who will be looking into lag (and SVC-3895) for server-1.32
[2009/09/17 17:48] Andrew Linden: I would expect some progress there. The various LL brass have got the "we need to do something about lag" bee in their bonnet.

So, I would say this means a fix is at least three server versions away.


Catten Carter added a comment - 18/Sep/09 04:48 AM
LL is now doing even number grid releases, so 1.32 will be the next server rolled out once 1.30 is deployed to the whole grid.

Luna Bliss added a comment - 18/Sep/09 06:49 AM
Well that is just great....now that I have deleted Bliss Gardens, a beloved 5-sim nature area below my 5-sim store, so that people could actually cross sim borders in the store without flying off into the void or falling through the floor (usually occurring when script usage had escalated to an absurdly high number and sim FPS and time dilation were going up and down repeatedly, and occurring so frequently that I'd have to run around my sims restarting them hundreds of times a day to fix it).
I really can't believe this has taken so long to be considered seriously...even the Linden who finally came out to check on the issue after my many support tickets knew nothing about it and asked me to record incidents. I don't need to record incidents - I have hundreds of reports from people who said they could not shop in my store anymore.
Does anyone know the particular kind of scripts that are the worst offenders? I will try and remove those from the store as well. I am still experiencing this though not as severely since I deleted half the prims and scripts in my sims.

Ariu Arai added a comment - 18/Sep/09 09:44 AM
@Luna.

Sorry to hear that you had to give up your sim work, :o . The type of script that causes this excessive lag is any Mono compiled script, in a large cluster. Usually 60+ Mono compiled scripts in one link set will cause 0.8 - 1.2 seconds of lag in a sim under normal stress (4+ agents, around 6ms script time). What you can do to try and protect your sim from these lag spikes is to keep your sim's script time as low as possible. The less stress on the sim, the less time the Mono script clusters will take to load. Though there really is no way to prevent the lag (it's going to happen even if your sim is nearly empty), you can prevent the serious lag.


Good to hear a Linden is finally looking into this issue. I'm disappointed in the response time though, almost 7 months?


DBDigital Epsilon added a comment - 18/Sep/09 10:58 AM
I have to agree: while it is great that the Lindens are looking into this, the time it takes is rather excessive. Such large issues should be dealt with quickly, not many months later. Though I must admit that in our case this has only become excessive in the past two updates (not counting today's).

As Andrew said: "Andrew Linden: I would expect some progress there. The various LL brass have got the "we need to do something about lag" bee in their bonnet." Sadly they should have this bee when it first appears not many months later. In actuality efficiency is something that should be strived for continuously. For if we have unhappy customers, they don't pay, and we can't afford our tier, which in the end means that LL loses out as businesses are forced to close also impacting on the over all SL economy and less overall content as well.

Also think of all the new potential customers who have logged into SL, thought it was great, then saw all the lag, left and never returned. Since SL has a steep learning curve to begin with, one should make it as easy as possible. Excessive lag is not conducive to a pleasant learning experience. It simply frustrates people and they leave.


Maggie Darwin added a comment - 18/Sep/09 11:00 AM
Catten's point is well taken; I'd forgotten about the new release numbering scheme. Although it will be interesting to watch the rate at which new versions are debugged and deployed under the new scheme. Those odd-numbered versions aren't just decorations.

I'm sorry for Luna's loss...proving we need better content mass-backup tools.

Just goes to show you though...had to get to the point where "the various LL brass have got a bee in their bonnet" before resources are applied to difficult problems. Otherwise it's the low-hanging fruit for a quick Love Machine score.



Ann Otoole added a comment - 29/Sep/09 05:07 AM
Well if it is a GC issue then no wonder frequent region restarts are becoming a part of Second Life.

Maggie Darwin added a comment - 29/Sep/09 09:01 AM
Chances that this kind of junk is going to be properly addressed by a dev team full of Mono/C#/.Net/VS fanbois strike me as pretty small.

Now that LL and the residents have "invested" huge amounts of time and money in this totally busted project, there will be no way to resolve the cognitive dissonance but to blame the residents (you know, the CUSTOMERS) for the now-broken infrastructure and impose more resource restrictions on them to try to restore grid performance to acceptable levels.


Balpien Hammerer added a comment - 29/Sep/09 10:51 AM
Other than a few privacy/security products and a visitor tracker, all requiring a lot of scratch memory for their lists, I have de-mono'd everything I make and notified my customers.

The Mono project, laudable as it might have been, is completely broken. About the only way to cause a multiprogramming system to lock up for seconds at a time is to have written something that requires serialization but have included macro-operations inside the critical region. I'd not be surprised to see a serialization lock around the entire JIT process, which probably includes file I/O or worse, network communications happening inside the critical region.

Moving on to diagnosis, I found that rezzing the same object repeatedly did not seem to affect TD or FPS (the rezzed object has but one script in it). But I could reproduce SimonT Quinnell's experiment. I did it using a widely used object, the solop dance server and myriad danceanim scripts. Rezzing a Mono compiled version of that (with 64 danceanim scripts) lowers TD to the floor briefly. Since just about everyone attaches those dance HUDs to their avatars, I am not at all surprised to find grid-wide lag anymore.

I created a solid reproduction of this problem using a demo prim pump that rezzes a bunch of spheres each containing 64 instances of the default "Hello Avatar" script. This simple device tanks FPS and TD to near zero. I can barely move around when I activate this device in an empty region. TD 0.15, FPS 8.6. Starting up an LSL version of the prim pump does affect SIM performance but not to the massive degree that mono scripts will, TD 0.82, FPS 33 (see photos). I'll send a copy to Simon Linden.

At this point I'd say we have more than a performance showstopper problem. Any script kiddie can effectively TP into a region and paralyze it if build is enabled.

Next test, a retry of the llSetScriptState on 128 mono compiled scripts.


Balpien Hammerer added a comment - 29/Sep/09 11:00 AM
Solid reproduction of mono scripts causing extreme SIM lag.

--environment--
Second Life 1.23.4 (123908) Jun 11 2009 15:16:56 (Second Life Release)
Release Notes
Built with MSVC version 1400
You are at 142568.5, 270751.2, 25.3 in Farhaven located at sim3421.agni.lindenlab.com (216.82.21.156:13000)
Second Life Server 1.30.0.133784

--photos--
46 - The two pyramids and cylinders are the rezzers (inactive). A visual lag meter is in the background. TD-.98 FPS 44.5

47 - mono rezzer activated. TD-.15 FPS 8.6

49 - LSL rezzer activated. TD-.82 FPS 33.5


baron nowhere added a comment - 29/Sep/09 11:31 AM
Thank you everyone for your comments and attention on this topic. I'm glad to see it stay constructive even with how frustrating and critical it is. I think you have demonstrated a lot of interesting clues, and I hope that all of the right people in LL are looking closely at this Jira.

I wanted to take a moment to also speak up for the designs that have counted on Mono, for which it would be a disaster if LL suddenly decided LSO was a better mousetrap:

My specific dilemma is that I spent over half a year carefully looking at how the Lindens described the differences for Mono, and built a solution around that. My object is 4 scripts that I would have constructed as 20 scripts using LSO. I'm using around 55kb of byte code in each of my scripts now just for the program "stack" (with the additional memory reserved for heap). I'd be looking at a solution where I'd probably need months to redesign and add lots of new linked messages and additional traffic inside my prim, for functionality I am currently able to encapsulate inside a single script via a few carefully designed efficient functions.

Switching to LSO for me would likely introduce countless new issues and set me back at least 4 months.

I'm trapped by Mono, and hoping that Linden Labs addresses the issue before I have to undergo such a dramatic process.

The other scary thing for me, since I was participating in Mono during the beta, was the announcement when Mono was moved from an active project into "maintenance". My impression was that the team was disbanded and moved onto other projects in LL (obviously with the goal of continuing to enhance our SL experience). All of us who were participating in Mono were a bit surprised at how quickly it moved into maintenance, and were disappointed that there wasn't a more sustained effort to monitor and polish up Mono and scripting initiatives.


Argent Stonecutter added a comment - 30/Sep/09 05:10 AM
@Balpien: they have explained why the whole sim locks up for the entire rezzing process in the "1000 prim rezzing limit" JIRA, it's because they don't want to perform a physics step with a partially rezzed build in place.

Haravikk Mistral added a comment - 30/Sep/09 07:02 AM
What does that mean a physics step? Couldn't the object just be treated as a phantom, non-physical object (i.e - have no physical presence) until it's fully rezzed?

Moon Metty added a comment - 30/Sep/09 01:27 PM - edited
Background of the freeze-detector.

When Mono was still in beta, we tried to come up with all kinds of tests to measure sim-performance. One of the things that came out of that was my dilation graph. It uses particles, because they are not affected by sim-performance.

Then later when Mono was released, I continued experimenting. I realized llGetRegionTimeDilation() has a problem: the closer dilation gets to zero, the worse the function performs. The extreme case is dilation 0.00, when in fact everything on the simulator stops working.

I changed the graph to compensate for that flaw, by referencing to a real-time clock. Now the average would show the after-effects of severe lagspikes. What we now call "freezes" would show as a flat line, because the simulator doesn't send updates, followed by a large dip in the average.

After staring at the graph for a long time, I made the first freeze-detector. This was in January, and it used the timestamp as reference. Much later I discovered that llGetAndResetTime() gives nearly identical results, and is much cheaper.
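
A minimal sketch of how a freeze-detector along these lines might look. This is not Moon Metty's actual script: the interval, the threshold and the message format are assumptions, and it relies on llGetAndResetTime() tracking elapsed time independently of time dilation.

```lsl
// Hypothetical freeze-detector: compares the time that actually passed
// between timer events with the interval that was requested.
float INTERVAL = 0.5;   // requested timer period in seconds (assumed)
float THRESHOLD = 1.0;  // report overshoots longer than this (assumed)

default
{
    state_entry()
    {
        llResetTime();
        llSetTimerEvent(INTERVAL);
    }

    timer()
    {
        // Seconds since the previous timer event (or since reset).
        float elapsed = llGetAndResetTime();
        float overshoot = elapsed - INTERVAL;
        if (overshoot > THRESHOLD)
        {
            llOwnerSay("Freeze of about " + (string)overshoot
                + " s in " + llGetRegionName());
        }
    }
}
```

Worn as a HUD, a script like this only reports that the simulator stalled, not what caused the stall.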

=======

The freeze-detector has left a lot of messages in my chatlog since January. It's a very raw log, because:

-It's a HUD, it follows me around from sim to sim.
-It's impossible to tell what's happening on the region to cause the freezes.
-It doesn't say how long a region has been up.
-I do various bug-tests, causing freezes.
-I'm not always wearing the HUD, because of the spammy nature of the detector.

Still there are some things that may be helpful ...

=======

In January and early February, most freezes are shorter than 1 second, long freezes are 2 seconds. Even on extremely unhealthy sims, the freezes don't exceed 5 seconds.

Then, halfway through February, etch (the 64-bit operating system) is introduced. At first this seemed to remove most freezes, but after a few days they were back. Now this is very hard to tell, but I have a feeling that a region on etch performs better initially, and then deteriorates faster, to a slightly worse level. Obviously there is no way to confirm that on my end.

The real trouble starts in early April, when 1.26 is deployed. Freezes longer than 10 seconds start to show up. There is no sharp transition, probably because there were a number of restarts, so the regions weren't up very long.
Well, in May and June we saw those terrible freezes, lasting over 30 seconds.

=======

We joked: "The freezes last longer than a region-restart".
Not far from the truth, hehe.
I hope my observations help to solve this problem.

=======

Edit: Oops, this was meant for SVC-4196


Maggie Darwin added a comment - 30/Sep/09 01:38 PM
I have to admit that in Harrington we haven't seen many freezes that exceed 10 seconds. But when an avatar arrives there is always a distinct, noticeable dip in TD that runs around 3-7 secs as observed on Betlog Hax's histogram. Not at all unusual for some of those bars (at one bar/sec) to show below .2 TD.

Argent Stonecutter added a comment - 01/Oct/09 08:25 AM
Haravikk: a physics step is the 45-times-a-second updates that the physics engine (and scripting virtual machines) perform. Both subsystems are stopped during rezzing to make sure that the state of the simulation is consistent before and after the new objects are added. I suppose they could do things differently, but freezing the simulation while rezzing new objects and scripts is the scheme they have chosen.

Ariu Arai added a comment - 01/Oct/09 09:10 AM
Now that makes a lot more sense, a 'loading' freeze for stability. Though with somewhat new Mono scripting engine, I believe it's time to change it from the 'simulation freezing', to something less disruptive. Perhaps there could be a way to 'pre-stage' the new objects/scripts in reserve memory, then inject them into the simulations memory and send all the update packets/etc. Alternatively, the lazy way of addressing this issue would be to have the simulator send a "Paused" packet to the client, which will show a little "Loading" icon somewhere on the viewer, and have it cease all physics in the viewer (That way you don't keep walking/objects don't start flying all over the place).

Argent Stonecutter added a comment - 02/Oct/09 04:39 AM
There's no physics in the viewer, just simple interpolated movement.

If you're just concerned about the rubber banding, it would be better to simply stop interpolated movement in the viewer if there's been no packets received from the server in the last couple of seconds. That way it takes care of network problems and the like as well.

The real issue here is that the startup time for Mono is so much higher than for LSL2. That's what needs to be addressed, even if only by officially de-emphasizing mono for attachments.


Vex Streeter added a comment - 02/Oct/09 03:46 PM
@Ariu the problem is that every avatar on the sim is affected, not just the avatar entering the sim.
@Argent I think the issue isn't that Mono takes longer to start up, but rather that while Mono is starting up, the sim doesn't do anything else. I still find it incredible that even the most highly scripted avatar can be causing these sorts of delays - seconds are an eternity for these machines' CPUs - it smacks of extremely serious resource contention or blocking on the network (either directly as in "stop everything while I retrieve the mono bytecodes from this other machine" or indirectly as in "stop everything while I log the fact that I just started a mono script... but something else has locked the log queue that is blocked on the network"). So many ways to fail...

Balpien Hammerer added a comment - 02/Oct/09 05:20 PM
Argent, yes, I read that report. One interesting observation is that if I rez non-physical phantom objects (and they are programmed to die almost immediately), I can tank a SIM far worse than with my previous physical Mono object rezzer. Eliminating physics interactions brings the problem down to a pathological JITing problem. I really see no reason why performing a JIT has to stop the entire simulator. Again, I can rez the same objects under the same conditions, except that the scripts in the rezzed objects are LSL. Then the SIM is mostly unaffected. This suggests to me that object creation is not the problem, and we already know that: there are thousands of combat SIMs that in the past have run just fine while people rez zillions of bullets at each other. Seconds of stall is highly indicative of performing long latency activities while holding a system-wide critical section lock. That's just a horrid design.

Haravikk Mistral added a comment - 04/Oct/09 05:21 AM
Okay, but why should Mono delay the physics step? Or more generally; why should rezzing delay it?

Shouldn't a rezzing object just appear visually only, and otherwise exist "outside" normal operation until it's ready? i.e. the sim would have a dedicated thread for loading an object, and all that's associated with it. Once an object is loaded, with all its script state created/rebuilt and so on, then and only then is it added to the physics and scripting engines. Until this time the object is basically a non-physical, phantom object with paused scripts. The visual appearance of the object would appear gradually as normal.


Ardy Lay added a comment - 07/Oct/09 04:07 AM
@Babbage Linden –
The two coalesced objects I provided have a simple script in each prim. One set is compiled LSL the other MONO. They listen on channel 33 for two commands "physical", not needed in this reproduction, and "debrick", which calls llDie(). When rezzing them, note the interframe processing time difference between the two. Same when saying /33 debrick
It is quite a remarkable difference in many conditions.
Item names are:
(al) BrickStack 990 LSL2 SVC-3895
(al) BrickStack 990 MONO SVC-3895
Very simple script in each brick:
-script-
default
{
    state_entry()
    {
        llListen(33, "", "", "");
    }

    listen(integer channel, string name, key id, string message)
    {
        if(message=="debrick")
        {
            llDie();
        }
        else if (message=="physical")
        {
            llSetPrimitiveParams([PRIM_PHYSICS, TRUE]);
        }
    }
}
-script-


taff nouvelle added a comment - 30/Oct/09 07:48 AM
Just a thought: is this in any way caused by the proposed script limits?
Is a chunk of memory being reserved for the scripts and not being released?
I note that someone mentioned that rebooting a SIM clears the problem for a time.

Ariu Arai added a comment - 30/Oct/09 08:31 AM
@Taff
I don't think the script limits were ever implemented, atleast as far as I can tell. I tested this Jira a couple times, sometimes using massive amounts of scripts. (500+) Though I was the land owner, and I was the only person in the sim.

I would imagine the server memory is allocated per sim (4 sims per server for full regions), and for each sim, I'm sure there's reserved memory for scripts and active object data (agents, objects, etc.). The problem isn't with memory or script limits, though; it's caused by the region ceasing simulation until the new objects/scripts are loaded. As I mentioned above, the way to fix this would probably be to have the sim "pre-stage" the objects (process them) and then inject them into the simulation, rather than injecting them into the simulation while they are still being processed, which is just silly. This way, the simulation would continue, and you'd only get a slight bump in sim performance when the simulator injects the new object/agent (and probably a slight bit of lag as the object/agent is pre-staged before injection).
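To make the "pre-stage, then inject" idea concrete, here is a minimal sketch in Python. It is not how the simulator is implemented; `load_object`, `loader` and `run_frame` are hypothetical names made up for illustration. A worker thread does the expensive loading, and the simulation frame only picks up objects that are already fully built.

```python
import queue
import threading

def load_object(name, script_count):
    # Hypothetical stand-in for the expensive part: fetching assets
    # and building script state for every script in the object.
    return {"name": name, "scripts": ["script-%d" % i for i in range(script_count)]}

def loader(requests, ready):
    # Runs off the simulation thread; finished objects are handed over via `ready`.
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more work
            break
        name, script_count = item
        ready.put(load_object(name, script_count))

def run_frame(world, ready):
    # A simulation frame never waits on loading; it only injects
    # objects that have already been fully staged.
    while True:
        try:
            world.append(ready.get_nowait())
        except queue.Empty:
            break

requests = queue.Queue()
ready = queue.Queue()
worker = threading.Thread(target=loader, args=(requests, ready))
worker.start()

requests.put(("BrickStack", 64))
requests.put(None)
worker.join()  # a real frame loop would keep running instead of waiting here

world = []
run_frame(world, ready)
print(len(world), len(world[0]["scripts"]))
```

The only cost paid inside the frame is appending the finished object, which is the "slight bump" described above.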

Rebooting the sim would clear the cache, and usually iron out a few kinks, but it wouldn't solve this Jira. If anything, it'd get worse, since the simulator would have to download the assets on the recent agents again if they teleport back, instead of just pulling them out of the cache file.

Anywho, as mentioned above, there should be a fix, or some progress in Server 1.32 for this Jira, the sim that is being deployed now. I haven't had time to test it though.


Imaze Rhiano added a comment - 31/Oct/09 04:35 AM
Script limits are not yet deployed.

What has been done so far is script diagnostics that allow script memory usage and other statistics to be monitored on the server side. Once this information is made available to the client, we can start monitoring our own attachment memory usage and parcel memory usage (YAY!). After this is made available to all residents in SL, there will be a grace period of at least 6 months during which memory limits are not enforced. When script limits are finally enforced, residents will no longer be able to rez or run scripts in attachments/objects that are beyond their memory budget. How it will be decided which objects or scripts are not allowed to run or be rezzed is still open, and the memory limits themselves have not been decided yet.

According to Babbage, 99% of avatars use 10MB of scripts for attachments, while 99% of entire regions use only 65MB for parcel and vehicle scripts. If they want to support 40 avatars on a full region, they need to budget 465MB (40 x 10MB + 65MB) of the simulator's 800MB of memory for scripts. There is at least one avatar that has 7612 attached scripts using 98MB of memory. The median SL avatar has 100 scripts in their attachments and uses 780KB of script memory in those attachments. (I am not sure how they came to the 780KB number: 100 classic LSL scripts should use 100 x 16KB = 1.6MB plus change. With Mono scripts it might be possible to get that kind of memory use, if the scripts are very simple.)
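The budget arithmetic above can be checked with a few lines of Python. The figures are the ones quoted in this comment; the variable names are just for illustration, and decimal units (1MB = 1000KB) are assumed to match the "1.6MB" figure.

```python
# Figures quoted above; names are illustrative only.
avatars = 40
attachment_mb_per_avatar = 10   # covers 99% of avatars
region_scripts_mb = 65          # covers 99% of regions (parcel + vehicle scripts)
simulator_memory_mb = 800

script_budget_mb = avatars * attachment_mb_per_avatar + region_scripts_mb
remaining_mb = simulator_memory_mb - script_budget_mb

# Classic LSL scripts each take a fixed 16KB.
median_script_count = 100
lsl_script_kb = 16
median_lsl_kb = median_script_count * lsl_script_kb

print(script_budget_mb, remaining_mb, median_lsl_kb)
```

This gives a 465MB script budget, leaving 335MB for everything else, and 1600KB for 100 classic LSL scripts, roughly double the quoted 780KB median.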

Another development branch, beyond memory limits, is to improve LSL in such a way that you no longer need hundreds of scripts in your attachments for your functionality. Examples include getting linked primitive parameters, improved scaling functionality, and allowing Mono scripts to allocate more memory (so that you no longer need to split functionality across multiple scripts to avoid the memory barrier). Viewer code is also being improved, especially among 3rd party viewers; for example, Emerald's radar and viewer-side animation override. The possibility of viewer-side scripting is also being researched.

When an object or avatar is rezzed into a simulator, the simulator loads the object's or avatar's scripts into memory in a non-threaded operation within a single frame. For Mono scripts this loading is more CPU intensive, because Mono verifies that the loaded assembly is valid by checking the CIL code in the assembly. In the worst case scenario an avatar has hundreds of scripts and all of them have different assemblies; for example, someone has converted their classic LSL hair resizing scripts to Mono by just recompiling all the scripts in the hair to Mono. According to Babbage, changing this loading behavior to be multi-threaded and multi-frame is a huge task and might take a long time.

Check also Babbage's OH logs http://wiki.secondlife.com/wiki/User:Babbage_Linden.


Babbage Linden added a comment - 02/Nov/09 07:23 AM
I'm looking at this issue this week. If you could send me sample Mono scripted objects that either cause freezes when rezzed, derezzed or running along with instructions I'd appreciate it. The simpler the scripts the better from the perspective of diagnosing this problem.

Haravikk Mistral added a comment - 02/Nov/09 07:42 AM
The scripts are irrelevant really, you just need a lot of them. Fill an object with a load of identical scripts compiled in Mono, and another object with scripts compiled in LSO-LSL, and you should notice a big difference when rezzing the first object when compared to the second. For a particularly noticeable difference use over 50 or so, maybe it'll make it easier to diagnose.

mosseveno tenk added a comment - 02/Nov/09 07:46 AM
Rezzer type objects are particularly bad. I've had everyone recompile rezzing style vendors in LSL, and it has helped.

Vex Streeter added a comment - 02/Nov/09 07:57 AM
@Imaze et al. Why does Mono CIL need to be verified at all at the destination sim? Shouldn't it be verified once at asset creation (e.g. compile) time? I can see an argument for local verification of CIL in the long term IFF the code is crossing grid boundaries, but otherwise, this sounds like a vast waste of resources.

Colleen Marjeta added a comment - 02/Nov/09 07:58 AM
A good example we've seen is with bullets which are compiled with Mono. LSL scripted bullets rez with no problem, but shooting a lot of Mono scripted bullets can bring an empty sim to its knees pretty quickly. I'll see if I can get a sample to you this evening when I get online.

Moon Metty added a comment - 02/Nov/09 09:47 AM
For testing I always use:
default
{
    state_entry()
    {
        
    }
}

On a freshly restarted region, the freezes last around 3 ms per script, so you need quite a few.
But if the region has been up for a few days, the freeze time rises.
After a week of uptime, you often see 20 ms per script, so 100 scripts freeze the region for 2 seconds.

I think the freeze time grows from the moment the region is restarted, and keeps growing until the next restart.
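As a rough back-of-the-envelope check of the numbers above, assuming the freeze scales linearly with script count (which is what the per-script figures suggest, not something measured here):

```python
def freeze_seconds(script_count, ms_per_script):
    # Total freeze if every script adds the same fixed delay.
    return script_count * ms_per_script / 1000.0

fresh = freeze_seconds(100, 3)      # freshly restarted region
week_old = freeze_seconds(100, 20)  # region after about a week of uptime

print(fresh, week_old)
```

So the same 100-script object goes from a 0.3 second hitch to a 2 second freeze over a week of uptime.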


Balpien Hammerer added a comment - 02/Nov/09 11:32 AM
@Babbage

I sent you a simple rezzer, full perms. The rezzed objects, non-physical & phantom, have 64 default hello world scripts in them. When active, it will reduce TD and FPS to near zero. And yes, there seems to be a correlation between how bad the effect is and how long the region has been running, though I recently tried this on a freshly restarted region and still saw TD drop from 1.0 to 0.5.


Moon Metty added a comment - 02/Nov/09 04:54 PM
Balpien, the time-dilation is 0.00000 during a freeze.

You can see this by the network-bandwidth after a long freeze.
It's slightly higher for a very short time, because there is no catching up to do. The region was frozen solid, so the only updates come from clients that are connected and maybe neighbouring regions.


Moon Metty added a comment - 04/Nov/09 09:54 PM
Here's an observation:

Plain unscripted cubes also make a short freeze when rezzed.
On a freshly restarted region, this freeze may be even longer than one caused by the rezzing of a Mono-script.
The difference is that the freeze-time for prims stays more or less the same as the region is up longer, and for Mono-scripts it grows steadily.

=======

The lengthening of the Mono-freezes was always more present on agni than on aditi.


Imaze Rhiano added a comment - 05/Nov/09 02:45 AM
Okay - Babbie has made some progress. Read full office hour transcript here: http://wiki.secondlife.com/wiki/User:Babbage_Linden/Office_Hours/2009_11_04

According to Mr B, there are 3 reasons for this problem:

  1. Problem with the scheduler - they have already fixed this problem in a mono-scheduler branch, but it requires a good deployment plan, because the fix requires that people retest their scripts.
  2. Script verification is taking too much time. The reason they currently need to do digital signature verification is that they used to allow UDP uploads of scripts. The signature is used to check that the script has gone through the verification process at some point. Some possible solutions for this:
    • Upgrade to Mono 2.6 - and then use the CoreCLR security sandbox rather than checking in advance that scripts are safe. This is going to happen someday - but it is likely to take a year or so.
    • Multithread the rezzing task. This needs to happen - but there is no timeline yet.
    • Spread the rezzing task over multiple frames. Multithreading is a better option than this one - and multithreading is going to be needed anyway in the future.
  3. When derezzing objects, the server does all pending deletions in a single frame.
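For the "spread over multiple frames" option and the single-frame deletions in point 3, here is a minimal Python sketch of the general technique: cap how much pending work each frame is allowed to handle. `process_in_frames` and its limit are hypothetical, not anything taken from the simulator code.

```python
from collections import deque

def process_in_frames(pending, per_frame_limit):
    """Drain `pending` in batches and return the batch handled in each frame."""
    if per_frame_limit < 1:
        raise ValueError("per_frame_limit must be at least 1")
    work = deque(pending)
    frames = []
    while work:
        batch = []
        # Stop as soon as this frame has used up its allowance;
        # the rest of the work waits for the next frame.
        while work and len(batch) < per_frame_limit:
            batch.append(work.popleft())
        frames.append(batch)
    return frames

deletions = ["object-%d" % i for i in range(10)]
frames = process_in_frames(deletions, per_frame_limit=4)
print([len(batch) for batch in frames])
```

Ten pending deletions with a limit of four are handled over three frames instead of stalling one. A real version would more likely budget by elapsed time than by item count.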

I would say that changing single-threaded simulator code to multithreaded code is going to take a long time and is full of hazards, but it needs to be done, because in the future multiprocessors (10+ cores) are going to be standard. So I guess fixing this problem will take a year at least, even though Babbie knows where the problem is.

PS: Babbie still requested more scripts and objects that show this problem, so please keep sending them to him.


Argent Stonecutter added a comment - 05/Nov/09 04:10 AM
Until this problem is fixed, they definitely need to hold off penalizing LSL2 scripts by artificially inflating their size when they set script quotas.