I've noticed this too. Often I'm also unable to move when an avatar enters a sim: bandwidth shows as 0 in the statistics bar, and it takes 3-10 sec before I can walk again. It only happens when an av TPs into a sim.
I have attached some diagrams as discussed with Prospero earlier.
The graphs show pretty well what happens when several prims are rezzed or visitors with high-prim attachments arrive. Basics of the diagrams: green is the number of prims on a certain parcel (x 0.1). The graphs have been scaled to properly show the effects (see legend).
Diagram 1 shows in detail what happens when a person rezzes some (about 140-180) heavily scripted prims (about 9 scripts per prim, from a builder tool).
Diagram 2 shows the spikes when visitors arrive or leave. No prims are rezzed in the sim at that time, but noticeable spikes (EPS, dTimer) appear when avatars arrive or leave.
Diagram 3 shows an interesting fact at about second 24131. At that time the 1.26.3 rolling restart hit the sim, and after that the sim operated much more smoothly and was hardly affected by prim rezzes or arriving visitors.
'We are continuing to monitor this because it is suspected that it will become worse again over time due to possible memory leaks in the simulator'
It doesn't even need to be a memory leak per se; however, memory is indeed the cause. To my knowledge, the memory pools for scripts that Babbage is working on, and that are going to be implemented on a parcel and avatar basis, are being developed mostly because simulator performance (on all four regions hosted, that is) becomes abysmal once the machine needs to swap data to hdd and back into memory. This can be caused by a memory leak... or by simply too many scripts accumulating on one of the four regions on the simulator. I myself witness this sim behavior almost daily in one of the public Linden sandbox regions. At a certain time of the day, long after restarts, performance just becomes horrible. Mere rezzing, logging in and out etc. take much longer than usual. Net time, agent time and script time go through the roof. My suggestion is to check the four regions on the sim for total script count and script memory next time this happens. Of course this can only be done by a dev, but it certainly'd be worth it.
We have reduced the scripts on our sim for testing (down to only 2000), but when some avatars TP to the sim, all scripts on the sim still stop completely for 15-30 seconds. Walking is not affected.
I asked in another group, and many, many people have the same problem on their sims too. Zyngos, for example, stop for 30 seconds, and all scripts on the sim stop as well. There are many more people out there with this issue than there are votes. I hope this problem can be solved soon. It affects not just one sim but many sims, whenever many avatars TP in and out. Only sims with no avatars TPing in are not affected. When a sim is freshly restarted it runs OK; after 1 day the issue is back again. I hope it can be fixed soon because it's a serious problem. If there are 19 avatars on the sim and 1 avatar TPs in, the script stop takes 30-45 seconds, but only if the sim has been running longer than 1 day. It looks like a memory leak.
Hey, I am a game room owner, and every time this happens all the players get angry, and I am losing money because they don't want to play at my place when it stops running. To me it's a major problem, and Linden Lab has to fix it if they want us to stay here. We are part of the reason people buy Lindens with their credit cards, so I think they have to pay attention to us too. At my place it started with a freeze (stop) of 7 seconds, but now it's bigger: I have seen the games stopped for 3 minutes and more, and because of that they don't accept payments, among other issues. I ask Linden Lab to please check this issue, because I think it is affecting quite a few people now; they just don't know where to report it.
I have a renter (who is renting a full sim from me for a gameroom) who has this same problem. She says Linden Labs closed her ticket 7 times without a satisfactory answer (she re-opened it after 6 times but gave up after that). Needless to say, she's very upset about this and is now talking about shutting down her gameroom because of this, which means lost business for me as a landowner.
It seems that when avatars teleport to the sim, they can run scripts (in their HUDs and attachments) even if the land settings don't allow it. I don't remember this being the case in the past. What has changed? This problem needs to be fixed ASAP.
We created the above charts on a daily basis to see whether the behavior changes from day to day. And there is definitely a visible trend.
While the sim behaves pretty smoothly for the first 1-3 days after a restart, the delays become larger every day. After about 10 days the sim reaches a state where delays are noticeably big and affect avatars massively (stuck in movement for 10-30 seconds). Script processing is also affected. It's not clear what causes this, but it appears to be related to rezzing/de-rezzing scripted objects. Avatars with lots of prim attachments TPing into the sim cause massive delay spikes which grow with each day. Tests show that the same avatar with the same number of attached prims causes a bigger delay as time progresses. After another restart the simulator runs smoothly like before (see diagram 3).
I found a solution if you own the complete sim or if all other residents on the sim do the same:
Above 80m, all avatar-attached scripts begin to work. You cannot switch them off in the land settings (Scripts on land: off) if you are above 80m or in a skybox; they continue running. What you must do is the following: 1.) Move your place or the complete sim to the ground (<80m)! So one of the problems is that LL reduced the height up to which the land's script setting applies to 80m; on some sims I saw 50m.
It does sound very much like Mono LSL has significantly higher real memory requirements than LSL 2... the existing real memory is overcommitted.
More real memory must be added, or fewer regions should be run per server. Imposing quotas on avatars' memory usage is a lame response to a problem actually caused by a bloated Mono LSL implementation. Yes, this will cost money. Inefficient software is cheap to develop but expensive to run... and the grid is a huge runtime multiplier.
This bug, I suspect, is in the new avatar script monitoring software they added to the top scripts.
Just like when everything pauses for the person running top scripts when they refresh top scripts, everything stops for everyone when avatars arrive in the sim and this software adds them to top scripts, I suspect. The busier the script, the more it affects the sim and the faster it brings it to a grinding halt. We are forced to restart several busy sims once a day to keep this at bay, though sometimes it takes 2 restarts a day. It freezes scripts, avatar movement, etc. To date, all contact with Lindens has resulted in denial that there is a problem, or in blaming it on too many scripts, even though it has been tested on several sims with far below the recommended script traffic.
I've been experiencing major dilation issues lately on my sim, which is on version 1.26.4.120562. After a reboot it runs smoothly for several hours, sometimes almost a full day, before Net Time peaks to insane levels and sim FPS and Dilation drop to 0. Everything described here seems to fit nearly perfectly with the problems I have been experiencing. I really hope this is fixed sooner rather than later.
We've found that in sims where this is a problem, there tend to be significant collision lag problems due to scripted animal products or other physical objects like vehicles, and/or elevators built in tight elevator shafts, and that high physics lag amplifies script lag measurements.
We've worked with Sion Labs to reduce the physics lag of their sionChickens from 150-200 collisions down to 3 for mature chickens and 30 for chicks and growing chickens. We are in discussions with Zooby's to help them improve their products as well. We are offering content developers Estate Manager power in our sandbox sim of Area 31 to test their products for lag to reduce both script and physical lag issues. Estate owners should require all residents to update their sionChickens to "Version 11 Lagless" (not just Version 11) and similarly request that pet makers reduce script and physics lag much more. Scripted pets should not be left out to roam around. Returning these pets helps improve sim conditions significantly beyond their own script lag reduction. Pet owners should be requested to update their pets to the newest version. Elevators should be built with at least 0.75 meters of space around the elevator cars on each side to minimize physical friction with the shaft walls. Physical vehicles should not be permitted to be left out at any time. If someone wants to make a statement with a flashy vehicle, they should strip the scripts from it if they just want a prop. We have a covenant lag limit of 1 ms of script lag per 4,000 square meters, which would limit script lag to roughly 16 ms per sim, just below the limit where things start to get swampy. Many residents are very stubborn and don't care, or don't think it's their problem or that they are contributing to it. Unfortunately, when avatars teleport into a sim, they have to download everything in the sim, all prims, including everything that other avatars are wearing. If you take your typical primmed out blingtard with 2000 prims worth of hair, shoes, jewellery, and clothing, times 50 blingtards per high traffic sim, then you are talking 100,000 prims, plus their scripts and textures, that must be downloaded beyond just the 15,000 built prims in the sim. Of course you are going to have a significant lag spike every time someone TPs into the sim. That's what you get for high traffic and catering to blingtards.
@IntLibber Brautigan This happens on my building sim. Total CPU time of 0.5ms for all scripts, from 282 scripts. I never have more than 3 avis in this sim at any time. This is not the normal lag encountered in high traffic sims, or caused by poorly written scripts, collisions or physical objects. Plain and simple, this is most likely an issue with a memory leak, caused by the avi/object script monitoring code or a new bug introduced into the Mono scripting system.
@IntLibber Brautigan Sorry, this has nothing to do with your chickens; in fact your chickens are banned from all my ssm sims and BRATZ sims, due to the heavy lag they cause. Nor do any of the sims tested have any of the things you mentioned in them. Finally, please note this problem developed after the avatar script monitoring software arrived. Before that, every sim tested did not have this problem. This is definitely a new bug caused by the server upgrades, not sim item related. Finally, sorry, but I find it a little insulting for you to walk in here and dismiss a bug based on your limited view of the problem. I don't recall your chickens being mentioned here, yet you seem to feel this is a good place to advertise their update. This bug has been tested in some 25 sims now, with various levels of scripts, traffic etc. All are affected the same way by avatars TPing in. Regards, samatha Congrejo
This simple script is a good way to detect region-freezes:
    default
    {
        state_entry()
        {
            // Ask for timer events as fast as the simulator allows.
            llSetTimerEvent(0.01);
            llResetTime();
        }

        timer()
        {
            // Time since the previous timer event; a large value means the region stalled.
            float elapsed = llGetAndResetTime();
            if (elapsed > 0.5) llOwnerSay((string)elapsed);
        }
    }

The timer doesn't actually trigger 100 times a second; it just fires as fast as possible, which is about 10 times a second.
=======
You may say 0.5 seconds is a bit strict to count as a freeze, so I made an object that categorizes the freezes by duration. See screenshot "Freeze_distribution_Neptune_01". Above each bar you see the minimum and maximum time for the category, and the number of freezes. You would expect the graph to fall off gradually with longer freezes, but this is not the case. See "Freeze_distribution_Neptune_02".
=======
During this test the region was relatively healthy. Also, the freezes come in patches. Sometimes the short-term average freeze time is well over 20%.
In Harrington sim, I keep a Skidz Partz lag monitor running. In recent days (the last couple of months), it's started reporting sim crashes when none are observed either by residents or by concierge Lindens.
Skidz tells me the monitor detects crashes by recording a timestamp, setting a one minute timer, and then inspecting the current timestamp; if there's more than two minutes difference it reports a crash. So there ya go...
Same thing happens on Munkie Island where I spend most of my time. Whenever an avatar enters or leaves the region, regardless of how many scripted attachments they have, the region will take a major time dilation hit, usually down to 0.38 and stay there for about 5 seconds. It has gotten very annoying to deal with this. This also happens on every other region I have been to.
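The crash check attributed to the Skidz Partz monitor above (one-minute timer, report a crash when more than two minutes have passed) can be sketched offline to show why a long freeze would look like a crash to it. This is only an illustration in Python, not the monitor's actual code: the function name, the constants and the sample tick log are assumptions made up for the example.

```python
from typing import List, Tuple

TIMER_INTERVAL = 60.0    # seconds between timer events, per the description above
CRASH_THRESHOLD = 120.0  # a gap longer than this is reported as a crash


def suspected_crashes(tick_times: List[float],
                      threshold: float = CRASH_THRESHOLD) -> List[Tuple[float, float]]:
    """Return (previous_tick, gap) for every gap between ticks above the threshold."""
    reports = []
    for prev, cur in zip(tick_times, tick_times[1:]):
        gap = cur - prev
        if gap > threshold:
            reports.append((prev, gap))
    return reports


# Hypothetical tick log in seconds: the fourth tick arrives 124.5 s after the third,
# as it would if the region stalled for about a minute beyond the timer interval.
ticks = [0.0, 60.0, 120.5, 245.0, 305.0]
print(suspected_crashes(ticks))  # [(120.5, 124.5)]
```

Under these assumptions the check cannot tell a restart from a timer event that was delayed by accumulated freezes, which would be consistent with crash reports appearing when nobody observed a crash.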
I wonder why LL says this is not an important issue. It's a gridwide problem, and if you take weeks or months to analyse it you see that it's a global problem.
I need a restart of the sim again now, after 2 weeks, but LL has been working on the sim-restart ticket for 5 days now; it's a mainland sim.
One week ago I posted 2 graphs of the freeze distribution on Neptune. This week I have 2 more.
Again, the test shows a period of around 85 hours, so that's convenient. See screenshots "Freeze_distribution_Neptune_03" and "Freeze_distribution_Neptune_04". It looks totally different! A few days before the start of the test, the region had a restart, but the region had a restart before the first run of the test too. It's very hard to tell ...
I am surprised that you have the issue on a freshly restarted sim. At our sim the freezes become heavy after 1.5 weeks. And it's a pain to get a mainland sim restarted via a ticket; ours has been waiting 7 days without anyone working on it. Thank god my landlord has now restarted it.
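For anyone who wants to reproduce the Neptune freeze-distribution graphs from their own detector output, the categorization described earlier in the thread (per category: minimum time, maximum time, number of freezes) might look roughly like the Python sketch below. The category edges and the function name are hypothetical; the thread does not say which boundaries the in-world object uses.

```python
from typing import Dict, List, Tuple

# Hypothetical category lower edges in seconds (not taken from the thread).
EDGES = [0.5, 1.0, 2.0, 5.0, 10.0, 30.0]


def categorize(durations: List[float],
               edges: List[float] = EDGES) -> Dict[float, Tuple[float, float, int]]:
    """Map each category's lower edge to (min, max, count) of the freezes in it."""
    stats: Dict[float, Tuple[float, float, int]] = {}
    for d in durations:
        if d < edges[0]:
            continue  # shorter than the smallest category: not counted as a freeze
        lower = max(e for e in edges if e <= d)
        lo, hi, n = stats.get(lower, (d, d, 0))
        stats[lower] = (min(lo, d), max(hi, d), n + 1)
    return stats


# Made-up sample of elapsed values as reported by a freeze detector.
sample = [0.3, 0.6, 0.9, 1.5, 12.0, 45.0]
for lower, (lo, hi, n) in sorted(categorize(sample).items()):
    print(f">= {lower:>4}s: min {lo}, max {hi}, count {n}")
```

Comparing the per-category counts from two runs of similar length, such as the two 85-hour periods, is one way to put numbers on "it looks totally different".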
For large-scale testing you can get the gridwide Money XPloder bag. Set it to 300L$ and 100 avatars come to your sim in 10 minutes. If the sim has been running 1-2 weeks, you can see scripts stop for 10-25 seconds at every avatar arrival. Then the next avatar arrives and scripts stop again, and so on. You will see that you have 5 minutes without working scripts on the sim. It's a real pain and an absolute showstopper if you have popular sims with many avatar arrivals. Nothing changes on our sim; we own it completely (no renters and so on), so we can compare well.
As of today, this problem still exists. It seems to vary in degree depending on the location: some places are far worse than others. The region 6pi seems particularly bad, even with just 6 avatars in it. Whenever someone leaves or arrives, it is total lock-up for up to 15 seconds or more: if you're standing, you can't move; if you're walking, you just keep going.
If this is really somehow dependent on the region, I would hate to have a business or live in one that is badly hit by this. And Linden's silence on this is astounding. We have these short moments where we can't move or do anything else in simulator, everything is totally frozen. This behaviour is new. Maybe some months. I did not know the reason is other avatars entering the sim in the same moment, interesting.
I am not sure I agree that this is completely related to avatar entry..... I have been using the script above to track these sim pauses along with the GreenLife Emerald Viewer's "Sim Entry" notification feature, and while monitoring, for some of the worst delays there was no sim entry or exit.
Yes, sometimes they coincided: Other times however, they did not: I can confirm that for each of these events over .55, I noticed the sim pause. This problem seems to get worse over time, though I don't have data to prove that. This seems to be MUCH worse since version 1.27 was rolled out.... 1.27.1 was a little better (reboots needed every 3 days) but 1.27.2 is much worse (reboots needed daily).
I stated in the original Jira notes that we found that this issue occurred when objects were rezzed also; the more complex scripting-wise, the worse.
I personally believe this is related to the "fixed" memory leaks, as it gets worse with time, and a sim restart will remove it for a short time. We've been restarting our sims daily to prevent this from crippling our busier sims.
Since most of the sim is empty, and that which isn't has VERY limited people allowed to rez, I can tell that no rezzing was going on during times where I am seeing the delay. I am only saying that there is more to the issue than just rezzing and entry.....
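The comparison described above (freeze-detector output on one side, sim entry/exit notifications on the other) can be made less anecdotal by matching timestamps offline. The following Python sketch is only one way to do it; the function name, the 5-second window and the sample data are assumptions for illustration, and getting both logs onto the same clock is left to whoever collects them.

```python
from bisect import bisect_left
from typing import List


def unexplained_freezes(freezes: List[float],
                        events: List[float],
                        window: float = 5.0) -> List[float]:
    """Return freeze times with no arrival/departure within +/- window seconds."""
    events = sorted(events)
    result = []
    for t in freezes:
        # Index of the first event at or after the start of the window.
        i = bisect_left(events, t - window)
        if i == len(events) or events[i] > t + window:
            result.append(t)
    return result


# Made-up timestamps in seconds since the start of monitoring.
freeze_times = [100.0, 250.0, 400.0]
entry_exit_times = [98.0, 399.5, 600.0]
print(unexplained_freezes(freeze_times, entry_exit_times))  # [250.0]
```

A large share of unexplained freezes on a no-rez parcel would support the view that entry and rezzing are not the only triggers; a small share would point the other way.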
I am rebooting daily also... at least. Might be useful to give a perspective of a popular region, so I stuck the 2nd script posted above in a prim, wore it and went to Sensations. Given the location, I have replaced avatar names with "X Y" (clearly, not all the same person).
[20:44] CheckSIMFreeze: 0.836089
Added: this is a no-build area, with the exception of the update room, and I was near that; no one was in it.
Perhaps I wasn't clear.... Most of the sim is empty of people.... however one parcel has traffic of 52,542, so I wouldn't call that a non-popular region
I have to admit I am a little puzzled about LL's silence on this one also. This problem appears to affect every simulator on the grid sooner or later, depending on the amount of traffic coming and going. It is especially painful for mainland based business owners, since they can't just restart the sim whenever the slow downs become too painful to live with.
I, for one, would feel better if a Linden would at least acknowledge this problem exists, even if there has been no progress made towards solving it. I would venture to guess that they know about it but can't comment because they have all been assigned either to "more pressing issues", or some new shiny which makes the business look good to share holders but that just ends up building the castle taller on the sand. Like that new viewer that has been mentioned, for example.
That is how big businesses work, and as a former coder myself, I know that from experience, and also know that the dev team is probably hating that they are not being allowed to fix major bugs, and are all too well aware that building on an unstable system will lead to far greater problems down the road. Management generally is unable to grasp this concept, however. The priority is new shinies. :-| Caveat: I speak from my own experience working as a coder for a multi-national bio-tech corporation. YMMV. Standard Disclaimer: I have noticed something that I believe is related to this issue, but not to mono scripts.
I entered the region Cloudmont. It is a completely protected full prim mainland sim. It runs only 8 scripts, with .1ms Script Time and 19ms Spare Time. It is heavily laden with Linden trees, and it has not been restarted in a while. I was wearing no scripted attachments. Lag was unbelievable. Set your draw distance up high (I had mine set at 192m) and your max bandwidth as high as it will go, then enter the sim. You should see major lag as the trees rez for you, and you'll notice that what is dragging the sim down is huge, huge amounts of Images Time. After the Images Time settles down and lag returns to normal, look in different directions and you will cause more huge lag spikes (over 150ms Images Time!). In a nearby sim, those same images rez with very little Images Time and very little lag created. If someone else can confirm this, I think a new JIRA should be created for it which relates to this one, but focuses on Images Time, rather than Mono scripts, as the culprit. If Cloudmont gets restarted, then this test won't work (in this sim at least) for a while. As of when I posted this, it is running on sim4507. Try it out, let me know what you think.
Triple, I think this is a totally separate issue; it sounds to me more like client side "lag". At least from what I understand of your description.
Images time is a component of the server time slice, Miro.
Yes I know that. But based on what was written, it sounded more client side to me. But I am far from an expert..
No, this is definitely server side. In fact, this sim is experiencing all the symptoms mentioned in this JIRA, except no mono scripts are involved. It is all server-side Image Time. Also, the moment you cross into an adjacent sim, the Image Time is normal and things are fine.
Fair enough - I hadn't realized that last part, re: crossing into another SIM. I stand corrected.
It seems pretty clear to me that the strategy of relying on LL customers to experiment around and figure out why there are unworkable and unacceptable simulator freezes when avatars enter a region on the main grid is failing. There are too many variables, and we have no access to the simulator source or any tool other than LSL scripts to determine why this is happening.
What is clear is that lag spikes on avatar arrival are excessive, to the point where vehicles (especially those with multiple passengers) crossing a region boundary can be nearly impossible. If Linden Research no longer intends to support multi-region vehicle use, and the only viable way to move from region to region is to teleport, it would be nice if that were stated explicitly, so we can simply discard the millions of L$ worth of content that is effectively useless at this time and stop wasting customer time and energy.
I am experiencing these sim freezes too. It's getting worse and worse.
Also, I noticed all teleports are taking longer, if they don't fail completely. A direct sim crossing takes me about 30 seconds. The problem started for me with the current server version. My impression is that the whole simulator seems to work slower, but I don't have measurements.
/me bumps the priority of this and updates the description. If y'all really only see things lock up for a few seconds, I'll unduplicate
/me reads the comments a little more and is tempted to reopen
The problem with the images time eating the whole sim is an old one that went away for a while - a year or so - but that I've recently seen again. Not as bad as it used to be but it's evil when it happens. When I see that, we basically get stuck down at < 5 sim fps until the sim restarts. Sometimes it takes several restarts to go away. This bug here, the one Kira added, seems to be that the sim stops processing scripts. I don't see anything in the description that says if everything else stops, too.
/me quotes a live chat from a few months ago...
Kira, if this is not what you're talking about, and there could well be several different issues torturing us, please say so and I'll go back to ...and I see there's a couple Lindens watching this.
Do you need more info on any of these problems? This problem is something fairly new - early this spring, things were really good in SL. Around late May, this problem turned up. It's not one sim. It's not one sim host. It's not just busy sims. It's not just when people with 10k scripts TP in. It's not one colo. It's not my PC or network. On a busy night, my home sees the sim stall for 10-30 seconds every minute or two. By that, I don't mean that it gets slow for a few seconds every 20 minutes, I mean the whole thing just stops for 10-30 seconds every minute or two. Really. I've shown this to a concierge Linden. I've shown this to a ComSys guy. I think we saw this tonight a couple times at Andrews office hour. I would be more than happy to demonstrate this on any region to any Linden that would like to see it. Tell me where M hangs out and chats to people and I'll TP in and out every few minutes, if you'd like. (sorry for the edit spam, those on the watch list.. darn funky atlassian formatting tags) I agree with Sindy totally on this one.
It started suddenly, about the time the 1.26 server was released, and about the time the servers were converted to Debian Etch. I would really look into Etch as being the possible culprit. The problem acts like the server is having to swap, and the hang lasts for the entire duration of the swap. Once it starts happening in a given sim, it keeps happening until that sim is restarted. Possibly a memory leak in some Etch module that wasn't there with Sarge? Just guessing here, but all the symptoms seem to point to a memory leak, and the more activity the sim sees, the faster the leak occurs. This may be asking for a lot, but is there any way to bring one of the agni servers up on Sarge, and put a few of the "problem regions" on it, as a test? I fear doing that on aditi would not be a good test due to lack of traffic. Added more information/clarification to the description
A note to any Lindens watching this: this is not a normal sim pause as things are rezzed or an avi TPs in.
/me would bet good money that this is not caused by a memory leak, though there may indeed be leaks that make the problem worse and worse as time goes on.. I know I saw this on my home a couple weeks ago within an hour or so of a restart.
If something was leaking quickly enough to make the problem happen that soon, we'd have sims falling over left & right because they'd eaten all their resources up within a day or so. That and the sim otherwise seems quite happy when people aren't coming/going... Either bad swap behavior or memory fragmentation can look like a memory leak after a while. A genuine memory leak would eventually lead to a sim crash.
An Issue with a capital I!
I am not a technician, but I do notice the performance drops, mostly when an avatar enters the sim. I have a gaming sim, and get complaints from residents/players about this several times every day. People can't play the games properly because of these freezes and lock-ups. By the time the simulator has caught up, they have missed several rounds in the game. I'm going to try what xkuschel benelli proposed earlier (see above) and hope this will bring a solution for now. Meanwhile I hope LL will look into this and keep us posted. Regards,
The complication to your position, Sindy, is that the problem may be server/host related, rather than sim related. So whether it is a memory leak, a fragmentation issue, or a swap problem, once the server hits the wall all the regions running on it could be affected. I can easily see how, after a restart, the region comes back up on a host that is just about ready to start swapping, dealing with memory fragmentation, or whatever it is causing this problem.
Memory fragmentation does make a lot of sense here. The pauses can get so long that I automatically assumed swapping was the issue, but it really could be either. If it is a memory leak, it could be the OS leaking rather than the simulators, which could cause horrible performance without crashing the sims. Regardless, since we could be dealing with a server/host issue here rather than a simulator issue, I think it is important to keep in mind that this problem seemed to start occurring about the same time the servers were upgraded to Etch.
The fact that the triggering event on these freezes is an avatar entering the sim suggests that a per-sim critical thread is being blocked for an outrageously long time. Do these stops happen on full sims that do not have avatars entering? If not, it is unlikely to be a host-level problem. Do these stops happen more often on homestead/openspace sims? If not, it is unlikely to be a per-core problem. Frankly, tens of seconds of stall is an outrageously long time for just about anything local to the machine. Unless these machines are running deeeep into swap pretty much all the time, this behavior sounds to me more like a/the critical sim thread is blocking on the network for some reason. If I were the Linden assigned to fix this, I'd start by looking to see if the core running a sim in this state goes idle or full-out during these freezes... Could be part of the initial av hand-off, some part of the sim machinery not properly threaded... heck, could even be something like priority inversion due to remote logging. So much opportunity to get things wrong in asynchronous distributed systems.
I thought Etch happened earlier in the year - like in the February timeframe. If so, and if this problem here really started happening around May, I think it's pretty unlikely that we'd suddenly see swapping or fragmentation problems of this magnitude, months after the upgrade to Etch.
Since I see a pretty solid 1-to-1 relationship between people TPing in/out and the freezes, I'd lean towards this not being a problem on the host as-a-whole, unless I've been lucky enough to have the only really-active sim on the host for the last few months running. LLNet was my biggest suspect for a while. One of the concierge Lindens said that they did see a big spike in network traffic as I TP'ed in (and no, I'm not that heavily scripted or primmed.. far less than a lot of people) but when I 'ping -t' the host and wait for the problem to happen, the response time doesn't even blink. That doesn't mean it's not LLNet but it seems like the problem would be higher up in the stack if ping doesn't show anything strange. i wonder if a good test would be to rez a bunch of 255-prim scripted objects then delete/return them all and watch for freezes. Then do it again but with them set to temp so the sim just nukes them instead of sending them back to inventory.... /me shrugs. Maybe it is a leak or fragmentation. Or some bit of hardware. Or 64-bit weirdness. Or LLNet. Or, my favorite, something due to the adult content changes that Cyn said we were all asking LL for. I had the same "bad feeling/adult content changes" re: group chat performance. There was a lot of eye-rolling when I floated the speculation....but no Linden ever actually denied that was the case. The best I could get was an "as far as I know, no".
I think that's basically a symptom of the new management's style; they'd be perfectly OK with jamming something in and not telling anybody who didn't need-to-know to make it happen. So none of the Lindens who aren't in direct control of something are prepared to make absolute statements that ${silly-thing-X} isn't being done because they truly don't know; it might be happening and be a state secret.
Perhaps this issue is related to the cause of the issue with rezzing high-primcount coalesced objects: for fear of people standing on moving phys objects falling off, as well as nonlinked phys objects breaking apart when entering the sim or being rezzed, they freeze the whole sim when that happens until everything is rezzed into the sim (in the case of the high-primcount coalesced objects, the time the sim gets frozen might be too long, so they don't allow people to rez them at all).
/me updates the description so people don't think this is just about scripts.
I have 4 regions all in a square, and one region (the telehub) is the busiest; the other 3, while they are not idle, have less usage. So since I have 4 regions and the servers handle 4 regions, why can't I have them all share one server? That way, I don't impact anybody's regions but my own and nobody impacts mine; thus, any lag would be my fault for sure. Yes, I know this probably belongs in another jira. Yes, I know it cannot be done because the Lindens don't tie regions to servers, but it does seem to be an equitable way to distribute resources. If I pay for the equivalent of 1 server, why shouldn't I be on one server?
There is a slight twist to this in that it is actually 3 servers owned by 1 av and 1 server owned by another (but they are a group of 4) and we split the costs 50/50. If we could put regions owned by different avs into an estate and then tie the estates so that multiples of 4 regions of the estate shared the same server, then the grouping could be at the estate level. Even better, we should be able (as the estate owners) to say which of the regions in a large estate would share the same server.
This has been very noticeable in the Hippotropolis sim in the last few weeks, during the open source office hour.
Freezes causing Ping Sim to climb to > 10,000 ms are common at the beginning or ending of the meeting (when a lot of teleporting is going on). Rezzing or deleting just a few scripted objects does the same. This may explain the endless stream of "I can't teleport" bugs that are created and resolved daily on jira.
Attached is a version of the freeze detector script with extra spam. This is the pattern I see when the detector is stationary in a sim I TP in and out of: Teleport out:
Wear 100+ Mono-scripted objects, then try to tp into the sim and you'll see that effect (rezzing / derezzing will do the same). TP'ing with so many scripts attached is also close to impossible, so a lot of these "I cannot tp" Jira issues are most likely because the person is wearing that many scripts, thus creating the freeze when they finally succeed.
Read the issue description again, Catten. This is very much not another "I cannot tp" issue.
No, agreed, but some people seem to think their TP issues are due to the sim freezing, whereas it's more likely the sim freezes because they try to tp with a gazillion Mono scripts.
Linden Research encouraged content creators to convert their scripts to Mono.
Now that thousands of items have been converted and distributed, we're going to hear "Oh, you're wearing too many Mono scripts to TP reliably (because our Mono interpreter is kludgy)"? Is this a replay of "You're wearing too many prims to TP reliably (because our XML encode/decode is too slow)"? WTF, people? This is a very broken design. The "improved" script engine is a disaster. And the scripting developers have Mono/C# tunnel vision. And now the problem is going to be "solved" by imposing script memory quotas on parcels and avatars. Not that that will solve this problem, of course. If it takes excessive time to prepare a script for execution in a new location (why?), then that process needs to be backgrounded...or optimized to take a more reasonable resource/time level. It can't simply be allowed to seize a shared resource. I was met with blank stares at office hours when I suggested hooking up a debugger to a region and finding out where this time is being spent. Then there was discussion of "who has the appropriate Visual Studio skills" to do the debugging. Visual Studio. For a Debian-hosted server. I wish I was kidding.
Background of the freeze-detector.
When Mono was still in beta, we tried to come up with all kinds of tests to measure sim performance. One of the things that came out of that was my dilation graph. It uses particles, because they are not affected by sim performance. Later, when Mono was released, I continued experimenting. I realized llGetRegionTimeDilation() has a problem: the closer dilation gets to zero, the worse the function performs. The extreme case is dilation 0.00, when in fact everything on the simulator stops working. I changed the graph to compensate for that flaw by referencing a real-time clock. Now the average would show the after-effects of severe lag spikes. What we now call "freezes" would show as a flat line, because the simulator doesn't send updates, followed by a large dip in the average. After staring at the graph for a long time, I made the first freeze-detector. This was in January, and it used the timestamp as reference. Much later I discovered that llGetAndResetTime() gives nearly identical results, and is much cheaper.
=======
The freeze-detector has left a lot of messages in my chat log since January. It's a very raw log, because:
-It's a HUD, it follows me around from sim to sim.
Still there are some things that may be helpful ...
=======
In January and early February, most freezes are shorter than 1 second; long freezes are 2 seconds. Even on extremely unhealthy sims, the freezes don't exceed 5 seconds. Then, halfway through February, etch is introduced (the 64-bit operating system). At first this seemed to remove most freezes, but after a few days they were back. Now this is very hard to tell, but I have a feeling that a region on etch performs better initially, and then deteriorates faster, to a slightly worse level. Obviously there is no way to confirm that on my end. The real trouble starts in early April, when 1.26 is deployed. Freezes longer than 10 seconds start to show up.
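For readers who want to see the idea behind the freeze-detector in code, here is a minimal sketch in Python rather than LSL. It is not the attached script. It assumes the detector runs a timer at a fixed interval and reads the real elapsed time on every tick (the role llGetAndResetTime() plays in-world), and that any tick arriving much later than the interval counts as a freeze. The names detect_freezes, interval and threshold are made up for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Freeze:
    tick: int        # index of the timer event that arrived late
    duration: float  # estimated stall in seconds (elapsed minus interval)


def detect_freezes(elapsed: Iterable[float],
                   interval: float = 0.5,
                   threshold: float = 1.0) -> List[Freeze]:
    """Flag timer ticks that arrived much later than scheduled.

    elapsed   -- real seconds measured between consecutive timer events
                 (hypothetical stand-in for llGetAndResetTime() readings)
    interval  -- the timer period the script asked for (assumed value)
    threshold -- minimum extra delay, in seconds, to report as a freeze
    """
    freezes = []
    for tick, seconds in enumerate(elapsed):
        stall = seconds - interval  # time the sim was not running us
        if stall >= threshold:
            freezes.append(Freeze(tick, round(stall, 3)))
    return freezes


if __name__ == "__main__":
    # Made-up readings: mostly on time, one long stall, one short one.
    samples = [0.5, 0.52, 0.5, 12.7, 0.5, 2.1]
    for f in detect_freezes(samples):
        print(f"freeze of ~{f.duration:.1f}s before tick {f.tick}")
```

The point of measuring against elapsed real time instead of time dilation is the one made above: during a freeze the simulator reports nothing at all, so the stall only becomes visible as an oversized gap on the first tick after it ends.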
There is no sharp transition, probably because there were a number of restarts, so the regions weren't up very long.
=======
We joked: "The freezes last longer than a region-restart".
Another little note,
We find that we can rez the exact same heavily scripted object over the course of a "SIM uptime" and get consistently and increasingly worse "freezes". The first rez on a freshly restarted sim takes perhaps 1-3 seconds (acceptable). The test object we used is a 104-prim carousel with 1 huge primary script, 4 secondary scripts, and numerous smaller other ones, all MONO. We haven't tried with it compiled as LSL.
There was some comment by Babbage that the Mono issue might happen only the first time that script is rezzed on a region. I don't know if that means the first time for that bootload, or the first time in N minutes, or only when the script isn't already on the sim, or what.
It might be interesting, Kira, to try rezzing multiple copies of that object within a short time of each other. Rez one, then wait, say, 30 seconds, then rez another and see if the pattern stays the same. Of course, if a Linden felt like stepping in and saying "we already know what the problem is" then that test probably wouldn't be useful. Or if they felt like saying they didn't know what the problem is, it might indeed be useful.
/me looks around for ANY comment from ANY Linden on this 5-month old critical bug with almost 250 votes that has been torturing EVERY high-traffic region. /me grumbles. C'mon LL, this is really ugly - throw us a bone here, please.
Thanks Lil, for at least tacitly acknowledging the existence of this JIRA. Still hasn't been resolved or even commented on by a Linden, though.
Took the chance of adding this issue to a support ticket, as no one seems to be assigned to this JIRA, and my kart track is slowly coming to a complete halt.
the support request: (problem description) 12/10/2009 12:24 PM PDT
the answer: Hello Kardargo! Your region has been set to restart after a 5-minute delay. If the issue persists after this restart, please let us know! Regards, Sheldon Linden, Concierge Support
the reply: looking forward to the next restart in a week or two..
and added info on the issue after the restart:
the answer: I have visited the region and there were no issues with TPing in or out. I suspect that the issue only occurs when there are karts on the track. Can you check this? Please use the Statistics window to check the physics time and spare time during these races and let us know what you see. I suggest you remove the mega prims from the track, as they can cause abnormally high physics times. Thanks, Sheldon
my reply: i understand that you have no issues tp'ing in and out, that was not the problem in the first place, it is the freezing up of the region when you tp in or out.. something the person that tp's in or out never notices, right? and no, it is not the physics of the karts, i keep my scripts as lag free as possible, and have seen 10 to 15 karts on the track without a problem
and added info: the region freezes were never a problem before, so please stop looking for the answer on my region, and start looking into what the heck changed in the server code, this issue is ruining my business.
and again: http://jira.secondlife.com/browse/SVC-4196
and the final resolution: If this issue is being caused by a server side bug, we are unable to assist via the support portal. Please make sure all the information you have is added to that Jira so that the developers can assist. Regards, Sheldon
so..... here it is, i have to add the info to the JIRA and sit and wait until my business goes completely bankrupt, or someone at the LL office wakes up and solves the issue.... i would not put my money on the latter one...
thanks all, Kardargo Adamczyk, unfortunately the issue has gotten worse, not better.
Before this last server update we had to reboot our lightly used full island sims once every 3 days to avoid the huge lag spikes and time dilation. Since this last server update the sims are always in time dilation and if we do not reboot them EVERY SINGLE DAY the lag spikes make flying and general use unbearable.
Being on mainland, where LL seems to just let them lag off into oblivion, you have little hope of getting a daily reboot. From the last 2 posts, it seems that things are getting worse not better. I suggest we bump this up to showstopper. Also from what Samantha said, it sounds like a memory leak of some sort.
I don't think bumping this to showstopper will help. I really hate this bug but it probably doesn't fit as a showstopper either - showstopper is more of a "holy crap! shut down the grid, quick!!" sort of thing. More votes might help, though.
Don't worry, LL will get this fixed. I think the last (only) thing we heard was that it was scheduled for Q2 of 2010. For everyone not following the issue: http://jira.secondlife.com/browse/SVC-3895
Babbage Linden added a comment - 02/Nov/09 07:23 AM regards,
This makes all NPVs stop at sim crossings. I tested almost all NPVs and they all stop for several seconds.