
Re: Update to popularity score algorithm

Posted: Sat Oct 08, 2011 2:14 am
by Lupo73
And to sort apps you could use 3 links:
- Last update first
- Best rating first
- Most active first

Re: Update to popularity score algorithm

Posted: Mon Oct 10, 2011 3:57 am
by Andrew Lee
Thanks for all your feedback. Looks like this is turning out to be more complicated than I thought. :D

I did some experiments on actual data and did not find too much difference between EMA (Exponential Moving Average) and SMA (Simple Moving Average), which is essentially what the current Sum-30 score is all about (just divide by 30 to get the SMA). EMA emphasizes recent data over older data, so by the time we reach the older data, its effect on the final result is negligible due to the exponential nature of the window.

Just to illustrate, suppose you have 60 days of data starting with 100 (older) and going down to 10 (more recent). You are going to get both an EMA-30 and SMA-30 of about 30 (with EMA-30 slightly higher). It is easy to cook up a spreadsheet and play around with the figures. My conclusion is that past figures outside the window will not have a significant impact on the final result.
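The comparison above can be reproduced without a spreadsheet. The following is a minimal sketch, assuming a linear decline from 100 to 10, an EMA smoothing factor of 2 / (window + 1), and an EMA seeded with the first data point. None of these choices are stated in the thread, and the small gap between the two averages (including which one ends up higher) depends on them.

```python
def sma(values, window):
    """Simple moving average over the last `window` values."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def ema(values, window):
    """Exponential moving average over the whole series.

    Assumptions (not from the thread): smoothing factor 2 / (window + 1),
    seeded with the first value of the series.
    """
    alpha = 2 / (window + 1)
    avg = values[0]
    for value in values[1:]:
        avg = alpha * value + (1 - alpha) * avg
    return avg


# 60 days of data falling linearly from 100 (oldest) to 10 (most recent).
days = 60
data = [100 - 90 * i / (days - 1) for i in range(days)]

sma_30 = sma(data, 30)
ema_30 = ema(data, 30)
print(f"SMA-30 = {sma_30:.2f}, EMA-30 = {ema_30:.2f}")
```

Under these assumptions both averages land in the low 30s and differ by well under one point, which is consistent with the observation that data outside the window barely matters.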

I think the problem we are having is the definition of "popular". The issue is, "activity" and "voting" just do not mix. As I have explained before, if you mix the two, activity will always overwhelm voting. In addition, for activity, it makes more sense to use only recent data since it is so voluminous, but for voting, it makes more sense to use all data in our calculation since it is rarer, more spaced out in time, and hence more valuable.

As Lupo73 suggested, one logical solution may be to split the two (best rating vs most active). But I was thinking we should stick to ratings only (computed from votes), and cut out activity scores altogether. We _can_ still have a most active list, but that will be for reference only, and I don't ever want to implement a "sort by most active" feature!

For this approach, we will have to use the existing registered users' votes as a starting point, then proceed from there. As I mentioned previously, votes by anonymous browsers are aggregated with activity scores, so those will be lost. :(

Re: Update to popularity score algorithm

Posted: Mon Oct 10, 2011 5:55 am
by Lupo73
Good decision! :wink: But why do you prefer not to implement "sort by most active" as well? It could be an important sorting feature, and essentially what you seem to need is to split the current counter into two scores.

As for the voting score, what formula have you decided to use? I can help you with it if you need.

Re: Update to popularity score algorithm

Posted: Mon Oct 10, 2011 8:11 am
by Hydaral
Andrew Lee wrote:I did some experiments on actual data and did not find too much difference between EMA (Exponential Moving Average) and SMA (Simple Moving Average)
EMA is really only useful if you want to call attention to rapid changes in the data, spikes, troughs, etc. This is why I use it for monitoring HDD capacities.

Re: Update to popularity score algorithm

Posted: Tue Oct 11, 2011 2:35 am
by Andrew Lee
Lupo73 wrote:Good decision! :wink: But why do you prefer not to implement "sort by most active" as well? It could be an important sorting feature, and essentially what you seem to need is to split the current counter into two scores.
Only to reduce complexity. Imagine having to display two scores "Popularity" and "Activity" in each item, and having to explain each of them. I bet it will be confusing for your typical user too.
Lupo73 wrote:As for the voting score, what formula have you decided to use? I can help you with it if you need.
To be decided, but probably based on your recommendation, since you have already done so much research on it. I will also give you some numbers from the database to play with before we make that decision.

However, before we dive into that, let's hear what the others think about this. Basing the "Popularity" score on votes alone does mean the popularity list will revert to being rather static, which is what some of you have complained about (though that can be mitigated somewhat by having a separate, more dynamic "Most active" list).

Re: Update to popularity score algorithm

Posted: Tue Oct 11, 2011 4:06 am
by Lupo73
Andrew Lee wrote:Imagine having to display two scores "Popularity" and "Activity" in each item, and having to explain each of them. I bet it will be confusing for your typical user too.
I think it could be a good solution instead :roll: and it wouldn't confuse users, because that is what websites generally do.

You could replace this line:

Code:

1MB (uncompressed) - Popularity score (73/20)
with something like this:

Code:

1MB (uncompressed) - 67% (popularity) - 78% (activity)
You could also link the words "popularity" and "activity" to a page that describes their meanings (and the algorithms used to evaluate them).

Re: Update to popularity score algorithm

Posted: Tue Oct 11, 2011 11:53 am
by SYSTEM
Andrew Lee wrote:However, before we dive into that, let's hear what the others think about this. Basing the "Popularity" score on votes alone does mean the popularity list will revert to being rather static, which is what some of you have complained about (though that can be mitigated somewhat by having a separate, more dynamic "Most active" list).
Exactly.
SYSTEM wrote:IMHO, the "all time favorites" list isn't interesting at all. It's way too constant.
I think that the "Popular titles" box should stay dynamic. Indeed, one option is to have a separate, more dynamic and probably hidden popularity score that could be used to compile the "Most active" list.

Re: Update to popularity score algorithm

Posted: Thu Oct 20, 2011 11:27 am
by Andrew Lee
Currently, because all scores are consolidated at the end of every 24 hours to save space, it is very difficult to apply a different algo retrospectively to the scores. As such, I have started saving the data instead of purging them.

The data kept are as follows:

1. Following the "Website" link
2. Following the "Download" link
3. "This app rocks"
4. "This app sucks"

We should then be able to apply whatever scores or weightings to this data in our algo retrospectively.

I think in the end, we should try something like what Lupo73 suggested:
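To show why keeping the raw events helps, here is a minimal sketch of re-scoring the same log under two weightings. The event names, the (day, kind) record layout, and every weight value are hypothetical; the thread only lists the four kinds of event being kept.

```python
from collections import Counter

# Hypothetical weights per event type; the thread does not specify any values.
WEIGHTS = {"website": 1.0, "download": 2.0, "rocks": 5.0, "sucks": -5.0}


def score(events, weights):
    """Score a raw event log under a given weighting.

    `events` is a list of (day, kind) pairs, an assumed record layout.
    """
    counts = Counter(kind for _day, kind in events)
    return sum(weights[kind] * n for kind, n in counts.items())


# A tiny made-up log for one app.
events = [
    (1, "website"),
    (1, "download"),
    (2, "download"),
    (3, "rocks"),
    (4, "sucks"),
]

current = score(events, WEIGHTS)
# Because the raw events were kept, a new weighting can be applied to old data.
revised = score(events, {**WEIGHTS, "download": 3.0})
print(current, revised)
```

With consolidated daily totals this second pass would not be possible, since the split between event types is gone once they are summed.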

Code:

Popularity = 70 * (ActivityScore / MaxActivityScore) + 30 * (PositiveVotes / TotalVotes)
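As a rough sketch, the 70/30 formula can be written as a small function. The function name and the handling of apps with no activity or no votes (treating the missing part as zero) are assumptions; the thread does not say what should happen in those cases.

```python
def popularity_70_30(activity, max_activity, positive_votes, total_votes):
    """70% relative activity plus 30% share of positive votes, on a 0-100 scale.

    Assumption: a part whose denominator is zero contributes nothing,
    which avoids a division by zero for brand-new entries.
    """
    activity_part = activity / max_activity if max_activity else 0.0
    vote_part = positive_votes / total_votes if total_votes else 0.0
    return 70 * activity_part + 30 * vote_part


# Half of the maximum activity, 3 positive votes out of 4.
example = popularity_70_30(50, 100, 3, 4)
print(example)
```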
However, I will tweak it to be as follows:

Code:

Popularity = 
    w1 * Old Score +
    w2 * Anonymous Voting Score +
    w3 * Registered Voting score +
    w4 * Activity (n-day)
The "Old Score" cannot be broken down further because it was consolidated earlier, which in hindsight wasn't such a good idea. :cry: So I think we can assign just a small weight to it in the final rating.

For voting scores, I think we will use the full history. It can be calculated using one of the formulas that Lupo73 suggested.

For activity, I think we can use only the last 30-day data to ensure freshness.

The popularity rating will then be a decimal value between 0.0 and 10.0. This will be one rating (which I am hoping to achieve, instead of two) which incorporates both voting and activity components.
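One way the four-term formula could produce a 0.0 to 10.0 rating is sketched below. It assumes each component has already been normalised to the range 0 to 1, and the weights w1 to w4 shown are placeholders only (with a small w1 for the old score, as suggested above); the actual weights and normalisation were still to be decided.

```python
def weighted_rating(old, anon_votes, reg_votes, activity,
                    weights=(0.1, 0.2, 0.4, 0.3)):
    """Combine four components, each in [0, 1], into a 0.0-10.0 rating.

    The default weights (w1..w4) are placeholders, not values from the thread.
    Dividing by the sum of the weights keeps the result on the same scale
    even if the weights do not add up to exactly 1.
    """
    components = (old, anon_votes, reg_votes, activity)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each component must be normalised to [0, 1]")
    total = sum(w * c for w, c in zip(weights, components))
    return round(10 * total / sum(weights), 1)


best = weighted_rating(1.0, 1.0, 1.0, 1.0)
worst = weighted_rating(0.0, 0.0, 0.0, 0.0)
middle = weighted_rating(0.5, 0.5, 0.5, 0.5)
print(best, worst, middle)
```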

We will only have enough data to play with after another 30 days or so, so please be patient. Then I will send the data to Lupo73 and we will work on the new rating algo together.

Your comments please.

Re: Update to popularity score algorithm

Posted: Fri Oct 21, 2011 6:24 am
by Lupo73
Good! Let me know :wink:

Re: Update to popularity score algorithm

Posted: Mon Nov 14, 2011 2:45 pm
by Lupo73
Any news on this? In the meantime, I think the old scoring solution could be restored, because the current one is heavily influenced by recent activity. Andrew, I'm ready to help if you need it; you can contact me by PM :wink:

Re: Update to popularity score algorithm

Posted: Mon Nov 14, 2011 6:34 pm
by Andrew Lee
Let's wait for another week or so. Then I will consolidate the data and send it to you for analysis.

Thanks!

Re: Update to popularity score algorithm

Posted: Fri Nov 18, 2011 6:19 am
by patpat
Just my 2 cents: don't you guys think the 30-day period should start on the same day for all apps?
i.e. always the 1st day of the month...

Re: Update to popularity score algorithm

Posted: Sat Nov 19, 2011 12:45 am
by SYSTEM
patpat wrote:Just my 2 cents: don't you guys think the 30-day period should start on the same day for all apps?
i.e. always the 1st day of the month...
How exactly would it work?

If the displayed popularity score followed the amount of traffic since the first day of the same month, all scores would drop to zero when the month changes. That surely would confuse visitors.

If the score displayed the popularity during the previous month, the scores would only be updated monthly rather than daily. IMO, it would be acceptable, but worse than the current situation.
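The difference between the two readings can be seen with a small sketch that compares a month-to-date total with a rolling 30-day total. The function names, the dictionary of daily hits, and the constant 10 hits per day are all made up for illustration.

```python
from datetime import date, timedelta


def month_to_date(daily_hits, today):
    """Total hits from the 1st of the current month up to and including today."""
    return sum(
        n for day, n in daily_hits.items()
        if (day.year, day.month) == (today.year, today.month) and day <= today
    )


def rolling(daily_hits, today, window=30):
    """Total hits over the last `window` days, including today."""
    start = today - timedelta(days=window - 1)
    return sum(n for day, n in daily_hits.items() if start <= day <= today)


# Made-up traffic: 10 hits every day from 1 Oct 2011 to 1 Nov 2011.
daily_hits = {date(2011, 10, 1) + timedelta(days=i): 10 for i in range(32)}

last_of_oct = date(2011, 10, 31)
first_of_nov = date(2011, 11, 1)

mtd_before = month_to_date(daily_hits, last_of_oct)
mtd_after = month_to_date(daily_hits, first_of_nov)
roll_before = rolling(daily_hits, last_of_oct)
roll_after = rolling(daily_hits, first_of_nov)
print(mtd_before, mtd_after, roll_before, roll_after)
```

With steady traffic, the month-to-date figure collapses to a single day's worth of hits when the month changes, while the rolling figure stays the same from one day to the next.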

Did you have something else in mind? :?:

Re: Update to popularity score algorithm

Posted: Sun Jun 24, 2012 6:24 am
by procyon
I'm not sure if this can be applied to the current vote/rating system, but is it possible for logged-in members to see the "rocks"/"sucks" thumbs grayed out according to their previous vote for an entry?

Re: Update to popularity score algorithm

Posted: Sun Jun 24, 2012 6:02 pm
by Andrew Lee
@procyon: This has been implemented, but it only affects the "rocks" button. i.e. if your id is in the "hearts" list for that entry, the "rocks" button will be grayed out.

This is due to the current implementation of the voting system. For logged in users, clicking the thumbs up button adds it to your favorites, while clicking the thumbs down button removes it from your favorites. So there isn't really an "anti-favorites" list.
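The behaviour described above can be modelled roughly as follows. This is only an illustration under assumed names (the `Entry` class and its methods are hypothetical), not the site's actual code.

```python
class Entry:
    """One app entry with its "hearts" (favorites) list, modelled as a set."""

    def __init__(self):
        self.hearts = set()  # ids of logged-in users who favorited the entry

    def thumbs_up(self, user_id):
        """"This app rocks": add the entry to the user's favorites."""
        self.hearts.add(user_id)

    def thumbs_down(self, user_id):
        """"This app sucks": remove the entry from the user's favorites.

        Nothing else is recorded, so there is no "anti-favorites" list.
        """
        self.hearts.discard(user_id)

    def rocks_grayed_out(self, user_id):
        """The "rocks" button is grayed out only if the user is in the hearts list."""
        return user_id in self.hearts


entry = Entry()
entry.thumbs_up("procyon")
after_up = entry.rocks_grayed_out("procyon")
entry.thumbs_down("procyon")
after_down = entry.rocks_grayed_out("procyon")
print(after_up, after_down)
```

Since a thumbs down leaves no trace once the favorite is removed, there is nothing to base a grayed-out "sucks" button on.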