Update to popularity score algorithm

Changes, updates etc. related to this website will be posted here.
Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#46 Post by Lupo73 »

And to sort apps you could use 3 links:
- Last update first
- Best rating first
- Most active first

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#47 Post by Andrew Lee »

Thanks for all your feedback. Looks like this is turning out to be more complicated than I thought. :D

I did some experiments on actual data and did not find much difference between EMA (Exponential Moving Average) and SMA (Simple Moving Average), which is essentially what the current Sum-30 score is (just divide by 30 to get the SMA). EMA emphasizes recent data over older data, so by the time we reach the older data, its effect on the final result is negligible due to the exponential nature of the window.

Just to illustrate, suppose you have 60 days of data starting with 100 (older) and going down to 10 (more recent). You are going to get both an EMA-30 and SMA-30 of about 30 (with EMA-30 slightly higher). It is easy to cook up a spreadsheet and play around with the figures. My conclusion is that past figures outside the window will not have a significant impact on the final result.
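For anyone who prefers a script to a spreadsheet, here is a minimal Python sketch of that comparison. It assumes a linear decline from 100 to 10, the common smoothing factor 2 / (window + 1), and an EMA seeded with the oldest value; none of these details come from the actual site code, and the exact figures (including which average ends up slightly higher) depend on those choices.

```python
def sma(values, window):
    """Simple moving average over the last `window` values."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def ema(values, window):
    """Exponential moving average with smoothing factor 2 / (window + 1)."""
    alpha = 2.0 / (window + 1)
    average = values[0]  # assumption: seed the EMA with the oldest value
    for value in values[1:]:
        average = alpha * value + (1 - alpha) * average
    return average


# 60 days of hypothetical daily scores, falling linearly from 100 (oldest) to 10 (newest).
days = 60
scores = [100 - 90 * i / (days - 1) for i in range(days)]

sma_30 = sma(scores, 30)
ema_30 = ema(scores, 30)
print(f"SMA-30 = {sma_30:.1f}, EMA-30 = {ema_30:.1f}")
```

Changing the shape of `scores` (for example, adding a spike in the last few days) is a quick way to see where the two averages start to diverge.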

I think the problem we are having is the definition of "popular". The issue is, "activity" and "voting" just do not mix. As I have explained before, if you mix the two, activity will always overwhelm voting. In addition, for activity, it makes more sense to use only recent data since it is so voluminous, but for voting, it makes more sense to use all data in our calculation since it is rarer, more spaced out in time, and hence more valuable.

As Lupo73 suggested, one logical solution may be to split the two (best rating vs most active). But I was thinking we should stick to ratings only (computed from votes), and cut out activity scores altogether. We _can_ still have a most active list, but that will be for reference only, and I don't ever want to implement a "sort by most active" feature!

For this approach, we will have to use the existing registered users' votes as a starting point, then proceed from there. As I mentioned previously, votes by anonymous browsers are aggregated with activity scores, so those will be lost. :(

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#48 Post by Lupo73 »

Good decision! :wink: But why do you prefer not to implement "sort by most active" as well? It could be an important sorting feature, and essentially it seems all you need is to split the current counter into two scores.

About the voting score, what formula have you decided to use? I can help you with it if you need.

Hydaral
Posts: 194
Joined: Tue Mar 09, 2010 7:36 pm

Re: Update to popularity score algorithm

#49 Post by Hydaral »

Andrew Lee wrote:I did some experiments on actual data and did not find too much difference between EMA (Exponential Moving Average) and SMA (Simple Moving Average)
EMA is really only useful if you want to call attention to rapid changes in the data, spikes, troughs, etc. This is why I use it for monitoring HDD capacities.

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#50 Post by Andrew Lee »

Lupo73 wrote:Good decision! :wink: But why do you prefer not to implement "sort by most active" as well? It could be an important sorting feature, and essentially it seems all you need is to split the current counter into two scores.
Only to reduce complexity. Imagine having to display two scores "Popularity" and "Activity" in each item, and having to explain each of them. I bet it will be confusing for your typical user too.
Lupo73 wrote:About the voting score, what formula have you decided to use? I can help you with it if you need.
To be decided, but probably based on your recommendation, since you have already done so much research on it. I will also give you some numbers from the database to play with before we make that decision.

However, before we dive into that, let's hear what the others think about this. Basing the "Popularity" score on votes alone does mean the popularity list will revert to being rather static, which is what some of you have complained about (though that can be mitigated somewhat by having a separate, more dynamic "Most active" list).

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#51 Post by Lupo73 »

Andrew Lee wrote:Imagine having to display two scores "Popularity" and "Activity" in each item, and having to explain each of them. I bet it will be confusing for your typical user too.
I think it could be a good solution instead :roll: and it wouldn't confuse users, because that is how websites generally do it.

You could replace this line:

Code: Select all

1MB (uncompressed) - Popularity score (73/20)
with something like this:

Code: Select all

1MB (uncompressed) - 67% (popularity) - 78% (activity)
You could also link the words "popularity" and "activity" to a page that describes their meanings (and the algorithms used to compute them).

SYSTEM
Posts: 2041
Joined: Sat Jul 31, 2010 1:19 am
Location: Helsinki, Finland

Re: Update to popularity score algorithm

#52 Post by SYSTEM »

Andrew Lee wrote:However, before we dive into that, let's hear what the others think about this. Basing the "Popularity" score on votes alone does mean the popularity list will revert to being rather static, which is what some of you have complained about (though that can be mitigated somewhat by having a separate, more dynamic "Most active" list).
Exactly.
SYSTEM wrote:IMHO, the "all time favorites" list isn't interesting at all. It's way too constant.
I think that the "Popular titles" box should stay dynamic. Indeed, one option is to have a separate, more dynamic and probably hidden popularity score that could be used to compile the "Most active" list.
My YouTube channel | Release date of my 13th playlist: August 24, 2020

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#53 Post by Andrew Lee »

Currently, because all scores are consolidated at the end of every 24 hours to save space, it is very difficult to apply a different algo retrospectively to the scores. As such, I have started saving the data instead of purging them.

The data kept are as follows:

1. Following the "Website" link
2. Following the "Download" link
3. "This app rocks"
4. "This app sucks"

We should then be able to apply whatever scores or weightings to this data in our algo retrospectively.

I think in the end, we should try something like what Lupo73 suggested:
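To make the "apply weightings retrospectively" idea concrete, here is a small Python sketch. It assumes each click is stored as an (app id, event kind) pair; the short event names, the sample data and both sets of weights are made up for illustration and are not the site's real schema or values.

```python
from collections import Counter


def score_events(events, weights):
    """Apply per-kind weights to raw (app_id, kind) events and total them per app."""
    totals = Counter()
    for app_id, kind in events:
        totals[app_id] += weights[kind]
    return dict(totals)


# Hypothetical raw log: one tuple per recorded click.
events = [
    ("app-a", "website"),
    ("app-a", "download"),
    ("app-a", "rocks"),
    ("app-b", "download"),
    ("app-b", "sucks"),
]

# Two candidate weightings applied to the same stored data (placeholder values).
weights_v1 = {"website": 1, "download": 2, "rocks": 5, "sucks": -5}
weights_v2 = {"website": 1, "download": 1, "rocks": 10, "sucks": -10}

scores_v1 = score_events(events, weights_v1)
scores_v2 = score_events(events, weights_v2)
print(scores_v1)
print(scores_v2)
```

Because the raw events are kept, switching from one weighting to the other is just a second pass over the same list, which is exactly what the daily consolidation made impossible.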

Code: Select all

Popularity = 70 * (ActivityScore / MaxActivityScore) + 30 * (PositiveVotes / TotalVotes)
However, I will tweak it to be as follows:

Code: Select all

Popularity = 
    w1 * Old Score +
    w2 * Anonymous Voting Score +
    w3 * Registered Voting score +
    w4 * Activity (n-day)
The "Old Score" cannot be broken down further because it has already been consolidated, which in hindsight wasn't such a good idea. :cry: So I think we can assign just a small weight to it in the final rating.

For voting scores, I think we will use the full history. It can be calculated using one of the formulas that Lupo73 suggested.

For activity, I think we can use only the last 30-day data to ensure freshness.

The popularity rating will then be a decimal value between 0.0 and 10.0. This will be one rating (which I am hoping to achieve, instead of two) that incorporates both voting and activity components.
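As a rough sketch of how the four terms could be combined into a single 0.0 to 10.0 rating, here is some Python. It assumes each component has already been normalized to the range 0 to 1 (how that is done is still to be decided), and the weights w1..w4 below are placeholders, not values anyone has agreed on.

```python
def clamp01(x):
    """Keep a component inside the 0..1 range."""
    return max(0.0, min(1.0, x))


def popularity(old_score, anon_vote_score, reg_vote_score, activity_score,
               weights=(0.1, 0.2, 0.4, 0.3)):
    """Weighted mix of four 0..1 components, scaled to a 0.0..10.0 rating.

    `weights` stands for (w1, w2, w3, w4); the defaults are hypothetical.
    """
    components = (old_score, anon_vote_score, reg_vote_score, activity_score)
    total_weight = sum(weights)
    mixed = sum(w * clamp01(c) for w, c in zip(weights, components))
    return round(10.0 * mixed / total_weight, 1)


# Hypothetical app: modest old score, decent votes, low recent activity.
rating = popularity(0.5, 0.6, 0.8, 0.3)
print(rating)
```

Dividing by the sum of the weights means they do not have to add up to exactly 1, so the old score can be given a small weight without rebalancing the other three by hand.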

We will only have enough data to play with after another 30 days or so, so please be patient. Then I will send the data to Lupo73 and we will work on the new rating algo together.

Your comments please.

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#54 Post by Lupo73 »

Good! Let me know :wink:

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#55 Post by Lupo73 »

Any news about it? In the meantime, I think the old scoring solution could be restored, because the current one is heavily influenced by recent activity. Andrew, I'm ready to help if you need; you can contact me by PM :wink:

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#56 Post by Andrew Lee »

Let's wait for another week or so. Then I will consolidate the data and send it to you for analysis.

Thanks!

patpat
Posts: 13
Joined: Thu Aug 25, 2011 7:21 am

Re: Update to popularity score algorithm

#57 Post by patpat »

Just my 2 cents: don't you guys think the 30-day period should start on the same day for all apps?
i.e. always the 1st day of the month...

SYSTEM
Posts: 2041
Joined: Sat Jul 31, 2010 1:19 am
Location: Helsinki, Finland

Re: Update to popularity score algorithm

#58 Post by SYSTEM »

patpat wrote:Just my 2 cents: don't you guys think the 30-day period should start on the same day for all apps?
i.e. always the 1st day of the month...
How exactly would it work?

If the displayed popularity score followed the amount of traffic since the first day of the same month, all scores would drop to zero when the month changes. That surely would confuse visitors.

If the score displayed the popularity during the previous month, the scores would only be updated monthly rather than daily. IMO, it would be acceptable, but worse than the current situation.

Did you have something else in mind? :?:
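To illustrate the difference, here is a small Python sketch comparing a rolling 30-day total with a month-to-date total. The hit counts, dates and function names are invented for the example; it only assumes that activity is stored as one number per app per day.

```python
from datetime import date, timedelta


def rolling_total(daily_hits, today, window_days=30):
    """Sum of hits over the last `window_days` days, ending on `today`."""
    start = today - timedelta(days=window_days - 1)
    return sum(hits for day, hits in daily_hits.items() if start <= day <= today)


def month_to_date_total(daily_hits, today):
    """Sum of hits from the 1st of the current month up to `today`."""
    return sum(
        hits
        for day, hits in daily_hits.items()
        if (day.year, day.month) == (today.year, today.month) and day <= today
    )


# Hypothetical app with a steady 10 hits per day through January and February 2012.
first_day = date(2012, 1, 1)
daily_hits = {first_day + timedelta(days=i): 10 for i in range(60)}

for today in (date(2012, 1, 31), date(2012, 2, 1)):
    print(today, rolling_total(daily_hits, today), month_to_date_total(daily_hits, today))
```

With steady traffic the rolling total stays flat across the month boundary, while the month-to-date total falls back to a single day's worth of hits on the 1st.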

procyon
Posts: 34
Joined: Wed Apr 11, 2012 1:30 pm

Re: Update to popularity score algorithm

#59 Post by procyon »

I'm not sure if this can be applied to the current vote/rating system, but is it possible for logged-in members to see the "rocks"/"sucks" thumbs grayed out according to their previous vote for an entry?

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#60 Post by Andrew Lee »

@procyon: This has been implemented, but it only affects the "rocks" button. i.e. if your id is in the "hearts" list for that entry, the "rocks" button will be grayed out.

This is due to the current implementation of the voting system. For logged in users, clicking the thumbs up button adds it to your favorites, while clicking the thumbs down button removes it from your favorites. So there isn't really an "anti-favorites" list.
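In other words, the behaviour is roughly like the Python sketch below. The class and method names are invented for illustration and are not the site's actual code; the only thing taken from the description above is that a single "hearts" list is kept per entry.

```python
class Entry:
    """Hypothetical model of an app entry with a single favorites ("hearts") list."""

    def __init__(self, name):
        self.name = name
        self.hearts = set()  # ids of logged-in users who have favorited this entry

    def thumbs_up(self, user_id):
        # "This app rocks": add the entry to the user's favorites.
        self.hearts.add(user_id)

    def thumbs_down(self, user_id):
        # "This app sucks": only removes the favorite; nothing negative is recorded.
        self.hearts.discard(user_id)

    def rocks_button_grayed_out(self, user_id):
        # The "rocks" button is grayed out only while the user is in the hearts list.
        return user_id in self.hearts


entry = Entry("example-app")
entry.thumbs_up(42)
print(entry.rocks_button_grayed_out(42))
entry.thumbs_down(42)
print(entry.rocks_button_grayed_out(42))
```

Graying out the "sucks" button would need a second per-entry set of user ids, which is the "anti-favorites" list that does not exist today.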
