
Key: SVC-633
Type: New Feature
Status: Open
Priority: Normal
Assignee: Unassigned
Reporter: Qie Niangao
Votes: 14
Watchers: 3
2. Second Life Service - SVC

Sort order of Search results by text match quality, not traffic

Created: 08/Sep/07 09:58 AM   Updated: 10/Jun/08 04:34 AM
Component/s: None
Affects Version/s: None
Fix Version/s: None


Description
Traffic as a criterion for ordering Search results no longer has value. Even if the system weren't "gamed" by camping-bots (with hugely negative performance impact on affected sims), popularity is only rarely an important factor in matching the searcher's intent with available listings.

Instead, a metric based on an information-theoretic "quality of match" between search terms and descriptions in listings should be used.

The detailed approach to metric calculation is left as an exercise to developers, but to maximize the value of the metric, it must exhibit the following properties:
1. The total match score between a listing and a search string is directly proportional to a weighted sum of matching tokens,
2. The weight of an individual token match is decreased by the frequency with which that token occurs across all listings,
3. The weight of an individual token match is decreased by the frequency with which that token occurs across all search strings,
4. The total match score is decreased by the number of unmatched tokens in the search string, and
5. The total match score is decreased by the number of unmatched tokens in the listing.
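
The five properties above can be sketched in a few lines of Python. This is only one possible reading, not a definitive implementation: the class name `MatchScorer`, the add-one smoothing, the use of log-based weights, and the two penalty parameters (`query_miss_cost`, `listing_miss_cost`) are all assumptions made for illustration and do not come from the proposal.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def surprisal(count, total):
    """Bits of information for a token seen in `count` of `total` documents.

    Add-one smoothing (an assumption) keeps the value finite and positive.
    """
    return -math.log2((count + 1) / (total + 2))


class MatchScorer:
    """Hypothetical scorer illustrating properties 1-5."""

    def __init__(self, listings, past_queries,
                 query_miss_cost=1.0, listing_miss_cost=0.25):
        self.n_listings = len(listings)
        self.n_queries = len(past_queries)
        # Number of listings / past queries in which each token occurs.
        self.listing_df = Counter(t for text in listings for t in tokenize(text))
        self.query_df = Counter(t for text in past_queries for t in tokenize(text))
        self.query_miss_cost = query_miss_cost      # assumed penalty, property 4
        self.listing_miss_cost = listing_miss_cost  # assumed penalty, property 5

    def weight(self, token):
        # Properties 2 and 3: tokens that are common in listings or in
        # search strings carry less weight.
        return (surprisal(self.listing_df[token], self.n_listings)
                + surprisal(self.query_df[token], self.n_queries))

    def score(self, query, listing):
        q, l = tokenize(query), tokenize(listing)
        total = sum(self.weight(t) for t in q & l)     # property 1
        total -= self.query_miss_cost * len(q - l)     # property 4
        total -= self.listing_miss_cost * len(l - q)   # property 5
        return total


# Tiny made-up corpus, purely for demonstration.
scorer = MatchScorer(
    listings=["pink flowers", "pretty flowers", "pretty shoes", "pretty hats"],
    past_queries=["pretty shoes", "pretty hats", "pink flowers"],
)
```

Subtracting flat penalties is the simplest choice that satisfies properties 4 and 5; dividing by listing length or weighting the unmatched tokens by their own frequency would be equally consistent with the list above.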

(Several existing algorithms and statistics may be helpful in achieving an efficient implementation; "maximum entropy" and "Bayesian filters" are suggested for background consideration.)

Additional match weighting factors may be of value, including token pair (and higher degree) matches and frequencies, and token sequence.



Comments
Qie Niangao added a comment - 08/Sep/07 01:06 PM
[In hopes of clarifying, the below is copied from http://forums.secondlife.com/showpost.php?p=1668655&postcount=20.]
[Quote=Yumi Murakami]
What it seems he's saying is that search should work by matching the keywords you enter, except that:

1) A venue should be marked down for having keywords that many other venues have;
2) A venue should be marked down for having keywords that are often searched for;
3) A venue should be marked down for having keywords that were not among those searched for;
4) A venue should be marked down for not having keywords that were searched for.

Unfortunately, I can see there are a couple of problems with this.

1,2) I can see the idea of these is to stop there being "most popular keywords" (like "sex", "free L$") which everyone can put into their description. Unfortunately, this doesn't change the fact that if this is what people are searching for, they will have to be included in the description in order to match at all, and in that case "being marked down" will not matter as all search results will contain the string.

3,4) These don't really fit with what people want... For example, if you search for "skin", by those rules any description that mentions anything other than "skin" will be marked down, even if it's "female skin" or "quality skin". Also these rules can be gamed by splitting a land area into 16sqm parcels and assigning each a different one-word search name, although this will cost a lot in listing fees!
[/quote]

The idea is to try to make the matches heavily weighted in favor of how specific the match is between the listing and the query. And there are some pretty standard, formal measures of "specificity" in the sense that I mean: the information value of the match. An example might help:

Say I query for "pretty pink flowers."
Oryx Floristry's description is "FLOWERS for the discerning avatar. Special: anything PINK!"
Yumi's Flowers' description is "PRETTY FLOWERS for the discerning avatar. Also taxidermy supplies."
Qie's Sex Emporium describes itself: "PRETTY girls and boys for every fantasy. Large pendulous breasts and disproportionate PINK pudenda! Sex, kink, bondage, fetish, sleaze."

Oryx wins over Yumi if "pretty" is a lot more common in queries and listings than is "pink" (equivalently, the information value added by "pretty" is less than by "pink": it's more redundant, it reduces less uncertainty, it's less specific).

But Qie loses to everybody because his description is full of (we assume) very high-frequency naughty words, even though it contains the very specific "PINK."

If the search is for "sex kink bondage fetish", even though they're all very common, Qie's listing will at least match the words, and fare on a par with all the other listings of the same length matching those same words. But if the search happened to contain the very low-frequency word "pudenda", Qie's match pops way to the top.
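
To make the example concrete, here is a rough Python sketch that scores the three listings above. Every number in it is invented for illustration: the token frequencies in `TOKEN_FREQ`, the fallback `DEFAULT_FREQ`, and the two miss costs are assumptions, and a real system would measure frequencies from the actual listings and queries.

```python
import math
import re

# Hypothetical fraction of listings and queries containing each token.
TOKEN_FREQ = {
    "pretty": 0.20, "pink": 0.02, "flowers": 0.05,
    "sex": 0.30, "kink": 0.10, "bondage": 0.10, "fetish": 0.10,
    "pudenda": 0.0001,
}
DEFAULT_FREQ = 0.05      # assumed frequency for any token not listed above
QUERY_MISS_COST = 1.0    # assumed cost per search word the listing lacks
LISTING_MISS_COST = 0.5  # assumed cost per listing word nobody searched for

LISTINGS = {
    "Oryx": "FLOWERS for the discerning avatar. Special: anything PINK!",
    "Yumi": "PRETTY FLOWERS for the discerning avatar. Also taxidermy supplies.",
    "Qie": "PRETTY girls and boys for every fantasy. Large pendulous breasts "
           "and disproportionate PINK pudenda! Sex, kink, bondage, fetish, sleaze.",
}


def tokens(text):
    """Lowercase a string and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def bits(token):
    """Information value of matching a token: rarer tokens are worth more."""
    return -math.log2(TOKEN_FREQ.get(token, DEFAULT_FREQ))


def score(query, listing):
    q, l = tokens(query), tokens(listing)
    return (sum(bits(t) for t in q & l)
            - QUERY_MISS_COST * len(q - l)
            - LISTING_MISS_COST * len(l - q))


def rank(query):
    """Listing names ordered from best to worst match."""
    return sorted(LISTINGS, key=lambda name: score(query, LISTINGS[name]),
                  reverse=True)


print(rank("pretty pink flowers"))
print(rank("sex kink bondage fetish pudenda"))
```

With these made-up frequencies, Oryx outranks Yumi because "pink" is assumed to be rarer than "pretty", and Qie comes last for the flower query because the many unmatched words in that listing outweigh its two matches. Adding the rare word "pudenda" to the second query is enough to put Qie first.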

One correction to Yumi's explanation of #1&2: It's not so much that venues are penalized for having high-frequency words. Rather, it's that matching on the high-frequency words isn't valued as highly (isn't as informative) as matching on rarer words. (There is some "venue penalty" because those high-frequency words add to the total word count of a listing, so it costs something for them not to match, but that's not the main effect.)

As for #3&4: The idea is still specificity. If the same search words match two venues, I'd want to favor the one for which that match is the higher proportion of the total listing description. Single-word listings wouldn't be successful in gaming the system because they'd only match at most one word, compared to a real description that matched multiple search words and was reasonably on-point.

I have no idea if this explanation is helpful at all... I'm eager for any improvements so at least folks know what they're voting on in the jira.

Because this is really a very specific discussion, I'll copy this to a comment in the jira, and suggest further discussion on this might more productively move to that site. https://jira.secondlife.com/browse/SVC-633


Fluf Fredriksson added a comment - 21/Sep/07 03:41 PM
I'm all for anything that removes camping as a prime aim of sims. It renders the current system totally pointless (unless you happen to be looking for somewhere to camp!).

If you go to the most "popular" place in traffic terms you can just about forget trying to shop. You're in a very laggy sim where you spend more than half your time downloading details about campers' attachments rather than browsing the high-rent shops, where you can't actually get any of the goods to rez!

It's insane!

The traffic system may as well stay seeing as it's coded in already, but search needs to be massively improved based on the text I'm looking for rather than the number of campers.


Haravikk Mistral added a comment - 09/Dec/07 12:38 PM
This is a very well described proposal. I would however suggest that it would need to be migrated to a tag-based searching system (see SVC-172) and leave descriptions to be just that...descriptions.

The problem is that someone who writes a well-worded description may be marked down for words that are not intended as search keywords. For example, in "Sells a variety of professional clothing", the string "Sells a variety of" is superfluous. With specific tags/keywords we can ensure that only relevant keywords are being searched.

I'm going to look and see if there's a meta-issue or such for removing traffic and put this in with it, or create one if there isn't one.


Prokofy Neva added a comment - 09/Jun/08 09:19 PM
No, it has value in SEARCH PLACES, always has, and is responsible for most sales for most people in business (not oldbies and very specialized gadget makers who rely solely on very expensive classifieds and word of mouth – but they are not typical).

Traffic works fine, and most key words do not turn up camped places, and those key words that do show only a few camped places when you use SEARCH PLACES instead of SEARCH ALL.

SEARCH PLACES sorted by TRAFFIC works great, not broken, needs no fix.

I'm also totally opposed to all kinds of ideological and political manipulation of search by weighting against words like "sex" or trying to punish people for patterns of behaviour that seem like search optimization but may not be.

If you don't like camping, make job openings for newbies and help the economy. Oh, and regulate bots.


Qie Niangao added a comment - 10/Jun/08 04:34 AM
For brevity, the proposal omits many arguments against traffic, but I'll add one here that is a simple mathematical fact: Traffic does not scale in the same way the grid does. Right now, with tiny, sub-100K concurrency, only the top few percent of instantaneously most-busy spots are unable to accept agents in the sim. But this had better not be the case for long, or the SL economy will have been a dismal failure. There is always going to be too much demand for the busiest spots; the very last thing to do is to further concentrate demand on those spots. Instead, if the economy actually grows, it will very fast outstrip the scaling of any possible individual sim hardware to handle focused demand, and successful businesses will instead be clamoring for a mechanism by which Search can load-balance among their locations, returning only those with capacity for additional visitors, just like the 2D internet (or phone networks, or power grids, or absolutely anything where concentrated network demand would swamp individual components).

Further, the proposal makes no attempt to punish words that anybody might find politically incorrect. If "sex" were used in every single parcel description, then it would carry zero information value; if it were used in just one parcel description in the entire searched corpus, then matching that token would be extremely informative.
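
One standard way to formalize "information value" in this sense is self-information (surprisal). This is a sketch of the idea rather than the exact metric the proposal would need; the symbols N (number of parcel descriptions in the searched corpus) and n_t (number of those descriptions containing token t) are introduced here only for illustration.

```latex
I(t) = -\log_2 \frac{n_t}{N},
\qquad
n_t = N \;\Rightarrow\; I(t) = 0,
\qquad
n_t = 1 \;\Rightarrow\; I(t) = \log_2 N
```

In a hypothetical corpus of 100,000 descriptions, a token found in exactly one of them would be worth about 16.6 bits, while a token found in all of them would be worth nothing, whatever the word happens to be.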

Haravikk's observation is quite correct, however: item 5 in the design desiderata would tend to reward a "telegraphic" text style, quite like tags, and penalize more verbose descriptions. Something like what's proposed here, applied to tags-only matching, would remove parcel names and descriptions from Search altogether, and it would be a welcome improvement to be able to use those fields normally again.