Skip to content
This repository has been archived by the owner on Mar 28, 2024. It is now read-only.

[BUG-232971] Faster viewer shutdown time since performance improvements can lead to perceived inventory loss due to cache corruption #10142

Open
1 task
sl-service-account opened this issue Nov 23, 2022 · 2 comments

Comments

@sl-service-account
Copy link

sl-service-account commented Nov 23, 2022

  • When using viewers before the Performance Improvements viewer, for example the MFA Viewer 6.5.5.571282, when you quit the viewer, the logout screen would remain visible and the viewer icon would remain visible on the taskbar for at least 10 seconds (on my system), indicating the viewer was still shutting down.

  • Since the Performance Improvements viewer 6.6.0.571939, after quitting the viewer, the logout screen displays for half a second at most and the viewer icon would disappear after barely a second, indicating the viewer had a much faster shut down time.

  • However, the viewer icon actually disappears from the taskbar before the viewer has fully shut down and before the cache write has completed.

  • This apparent faster viewer shutdown post-performance viewer is causing many reports of missing inventory, due to cache corruption.

  • This can be reproduced in several ways:

    • User logs out, waits for the logout screen/viewer icon to disappear then quickly relogs - harder to reproduce.
    • User logs out, waits for the logout screen/viewer icon to disappear then quickly signs out of their windows account.
    • User logs out, waits for the logout screen/viewer icon to disappear then quickly shuts their laptop lid - easy to reproduce.
    • User logs out, waits for the viewer icon to disappear, then quickly shuts down or restarts their computer - very easy to reproduce missing inventory 100% of the time in the next viewer session
  • We have had more than the usual number of reports of "missing inventory" on Firestorm since we released Firestorm 6.6.3 with the performance viewer updates, but it has only just come to light why this is, thanks mostly to the reports & investigation in this forum blog post: https://community.secondlife.com/forums/topic/491480-missing-inventory-repetitively-in-latest-firestorm/

  • Please also see the following Firestorm bug reports:

    • Example of relogging too quickly: FIRE-32417 - Windows 11 shutdown too soon, and Re-opening Viewer too soon (no reboot), after Exit Firestorm causes inventory cache incomplete X.gz file, results in missing inventory and forcing clear cache/reload of inventory at next login.
    • Example of log out then rebooting computer too quickly: FIRE-32453 - Inventory cache loss when closing Windows
  • This is also reproducible on the current LL default viewer.


Steps To Reproduce - Rebooting computer too quickly after logging out of the viewer

  • This case is easy to reproduce, so doing this one first.

  • Login on Second Life Release 6.6.7.576223

  • If your inventory is not fully fetched, wait for it to finish fetching then relog (but don't relog too quickly! lol). This is just to make sure your inventory is fully cached.

  • If your inventory is already fully fetched at login, no need to do the above step.

  • Take a note of your inventory count at the top of your inventory folder.

  • Quit the viewer in the usual way.

  • As soon as the logout screen has poofed and the viewer icon has disappeared from the taskbar, reboot your computer.

  • Once rebooted, login on the viewer.

  • Observe your inventory count and whether you have any empty inventory folders.

    Observed Behaviour

  • Fig 01 attached shows my full inventory count of 160,305 items, 16,008 folders.

  • I quit the viewer, waited for the viewer icon to disappear from the taskbar, then rebooted my computer.

  • Whirly_01.log attached is from my viewer session that ended with the computer reboot.

  • I have enabled inventory debug logging by adding Inventory to logcontrol.xml

  • Note the last log lines:

  • {code}

  • 2022-11-23T20:03:47Z INFO # newview/llappviewer.cpp(5454) LLAppViewer::disconnectViewer : Disconnecting viewer!
    2022-11-23T20:03:47Z DEBUG #Inventory# newview/llinventorymodel.cpp(1934) LLInventoryModel::cache : Caching e181b494-d12e-464a-aeba-cddddd22562e for 2e255955-e183-48c4-b9f9-1cd1b80db94e
    2022-11-23T20:03:47Z INFO #Inventory# newview/llinventorymodel.cpp(2983) LLInventoryModel::saveToFile : saving inventory to: (C:
    Users
    Whirly
    AppData
    Local
    SecondLife
    2e255955-e183-48c4-b9f9-1cd1b80db94e.inv.llsd)
    {code}

  • It's clear the viewer has not finished all the shut down processes here and the cache write has not completed, even though the viewer icon had disappeared from the taskbar before I started the computer reboot.

  • After rebooting the computer, I logged into the viewer.

  • I received the modal warning message Could not put on outfit. The outfit folder contains no clothing, body parts, or attachments. because my worn outfit was missing all items in my inventory - see Fig 02 attached.

  • Most of my inventory was missing & there were many empty inventory folders - see Fig 03 attached

  • My inventory count was only 32,651 items. It should have been 160,305 items.

  • Whirly_02.log attached is the log from this session with the missing inventory.

  • Note the first lines of this log:

  • 2022-11-23T20:06:39Z INFO #MarkerFile# newview/llappviewer.cpp(3915) LLAppViewer::processMarkerFiles : Checking exec marker file for lock...Exec marker file existed but was not locked; rewritten.
    2022-11-23T20:06:39Z INFO #MarkerFile# newview/llappviewer.cpp(3924) LLAppViewer::processMarkerFiles : Exec marker 'C:\\Users\\Whirly\\AppData\\Roaming\\SecondLife\\logs\\SecondLife.exec_marker' found; last exec crashed
    2022-11-23T20:06:39Z INFO #MarkerFile# newview/llappviewer.cpp(3968) LLAppViewer::processMarkerFiles : Logout crash marker 'C:\\Users\\Whirly\\AppData\\Roaming\\SecondLife\\logs\\SecondLife.logout_marker', changing LastExecEvent to LOGOUT_FROZE
  • The previous session where I rebooted the computer after quitting the viewer has registered as a logout freeze.

  • This is probably bad for the viewer stats too if many people are rebooting or shutting down their computers quickly after quitting the viewer.

  • Inventory remains missing until inventory cache is purged.

    Expected Behaviour

  • The viewer icon should not disappear from the taskbar until the viewer has finished the logout process & fully shut down

  • If attempting to reboot the computer before the viewer has finished shutting down, there should be a message like this for the viewer: https://www.top-password.com/blog/wp-content/uploads/2019/09/this-app-is-preventing-shutdown.png

    Other Information

  • Dan thinks that BUG-232555, which also started with performance improvements, is very likely caused by the same problem as this bug.

  • Proposed Firestorm fix from Beq Janus (has CA): https://vcs.firestormviewer.org/phoenix-firestorm/changeset/69a31fad4caabec61e4a5e4094de63620ab4da66
    [FIRE-32453] [BUG-232971] disconnect sooner to force the cache write
    The call to forceQuit sets the viewer state to Quitting. This triggers all the thread pools to exit, and to do so with no sense of orderly closure.
    As a result the main window can exit which removes any visual indication to the user that work is still happening.
    When the user then restarts, issues are caused because of temporary files that may still be in use.
    It is possible too that the early closure of the mainwindow, allows the OS to terminate the process without prompting the user for a safe shutdown (unproven)

Attachments

Links

Related

Original Jira Fields
Field Value
Issue BUG-232971
Summary Faster viewer shutdown time since performance improvements can lead to perceived inventory loss due to cache corruption
Type Bug
Priority Unset
Status Accepted
Resolution Triaged
Labels performance_improvement
Reporter Whirly Fizzle (whirly.fizzle)
Created at 2022-11-23T19:58:18Z
Updated at 2022-12-01T01:30:34Z
{
  'Build Id': 'unset',
  'Business Unit': ['Platform'],
  'Date of First Response': '2022-11-28T12:20:03.547-0600',
  'ReOpened Count': 0.0,
  'Severity': 'Unset',
  'System': 'SL Viewer',
  'Target Viewer Version': 'viewer-development',
  'What just happened?': 'Filling in...',
  'What were you doing when it happened?': '....',
  'What were you expecting to happen instead?': '....',
  'Where': 'Any region',
}
@sl-service-account
Copy link
Author

Dan Linden commented at 2022-11-28T18:20:04Z

Thank you for the report, Whirly!

@sl-service-account
Copy link
Author

Beq Janus commented at 2022-12-01T01:30:35Z

This has been fixed in FS.

I can provide a PR to the lab if it is wanted; however, I am posting the comment I placed on the FS Jira here. I don't think the solution I have posted is the best option. It deals with the major problem of the inventory loss/corruption and allowing premature shutdown on Windows. In reality, I think that the current shutdown procedure is very fragile and would benefit from a more robust controlled closure that would allow us to hook in our additional TPV shutdown and persistence activity in a far neater manner. 

=============

From the FS Jira

Fix pushed as: https://vcs.firestormviewer.org/phoenix-firestorm/changeset/69a31fad4caabec61e4a5e4094de63620ab4da66

What was happening is as follows (simplified for brevity ):

The viewer receives the instruction to quit. That is processed by the OS first (the Windows 'X' has been clicked)and triggers a quitting state. This causes the viewer to logout of SL, and the enter the cleanup phase. The viewer (and thus all viewers dervied form the same upstream source (the LL default viewer) has recently been overhauled moving numerous tasks to separate threads. The cleanup phase thus now has to deal with closing those in a controlled manner and the way that this is being done right now is that the threads terminate as soon as they have no other input to process, this then leaves the coast clear for the main thread to release the remaining (often shared) resources. Part of this rapid shutdown includes the main window which is managed by a dedicated thread. This has two bad side-effects

  1. It suggests to the user that things are finished and they can close down/relog.

  2. It appears (though I am no windows expert) that the "the following processes are preventing shutdown" message only happens for processes that have an active Window. As a result the coast is clear for the Os to terminate despite the fact that the viewer is still busy.

The "fix" is not ideal, it does not enforce a safe shutdown process as we might like. In an ideal world, we'd add another step in the closing cycle that follows on from the logout and calls the viewer disconnect and could provide a hook for other shutdown actions. At the moment this is not easy because the threads and coroutines continue to run and there is a chance that further inventory updates may occur while we are writing out the caches. In practice this may not be a real issue but in view of getting a fix in place the current solution which is to do the persist to disk in a single thread before signalling quit to the other threads but without any co-ros being polled seems to be simple and effective. 

 

==============

Adding more context:

The quit action (the 'x' on windows) triggers the WM_QUIT command which in turn triggers the logout. Logout is managed inside the doFrame loop as it is asynchronous. There are two typical results, if the logout response arrives in a timely manner (timely being hardcoded to just 6 seconds) or it does not arrive. In both cases, the net result is a call to forceQuit() (just via a slightly different call stack). When quit is forced the quitting state is set and all the threads terminate ASAP. this includes the Window thread pool, which is ultimately what causes the user to lose sight of the mainwindow (I believe) and thus assume that it is now safe.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant