Update to popularity score algorithm

Changes, updates etc. related to this website will be posted here.
Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#31 Post by Lupo73 »

You are right. I noticed it after posting, and I'm running more tests to understand why (the code is updated if you want to test it).
I also sent a message to the author of the second article to ask for help with the formula.

Hydaral
Posts: 194
Joined: Tue Mar 09, 2010 7:36 pm

Re: Update to popularity score algorithm

#32 Post by Hydaral »

m^(2) wrote:Why 100+ 100- has lower popularity than 10- 10-?
IMO it should work the other way...
I would think that since more people have voted on the +100 -100 entry, the confidence is greater, and the percentage reflects that.

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#33 Post by Andrew Lee »

Just fixed a serious bug in the code.

For certain entries that are not very popular, the historical aggregate score was also being added to the final score, which made some of them rank above entries that are genuinely more popular based on the past 30-day total.

My apologies for this error. :oops:

The new top 10 is a mixture of old and new compared to the all-time list:

Old list:
FastStone Capture => 196930
Yod'm 3D => 99423
Foxit Reader Portable => 68417
Undelete Plus => 29393
SilentNight Micro CD Burner => 27443
PixaMSN => 20691
EVEREST Home Edition => 19900
Universal Extractor => 19585
CCleaner => 17341
Mozilla Firefox, Portable Edition => 16610

New list:
FastStone Capture => 4775
Undelete Plus => 1290
EVEREST Home Edition => 911
Free PDF Compressor => 789
Disk Digger => 681
PDF-XChange Viewer => 625
DriveImage XML => 608
TrueCrypt => 524
Q-Dir => 522
Softi FreeOCR => 443

For example, the top 3 all appear in the all-time list. Foxit Reader Portable has moved down but is still ranked between 10 and 20 (not shown).

This corrected list seems to look more reasonable now. :D

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#34 Post by Lupo73 »

I'm studying the new formula. Is the idea still valid? I think the current solution has only limited value and does not correctly reflect popularity, but I'd like to know your opinion and others' about it :P

A couple of questions about the database: does it still store all votes (so that old votes can be restored if needed)? And could you give me some examples of software ratings (the + and - votes for each app, for about 20 apps, to run tests)?
If plus votes generally far outnumber minus votes, the current formula could already be good enough.

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#35 Post by Lupo73 »

I updated the test code with a new, better and simpler solution that uses static weights for registered and unregistered votes:
Popularity = 0.8 * (0.8 * Value1 + 0.2 * Value2) + 0.2 * (0.8 * Value3 + 0.2 * Value4)

With:
Value1 = unregistered positive votes / unregistered total votes
Value2 = unregistered positive votes / unregistered maximum positive votes
Value3 = registered positive votes / registered total votes
Value4 = registered positive votes / registered maximum positive votes
The weights could be changed, for example to give more importance to the number of votes if the difference between positive and negative votes is less relevant.
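
A rough Python sketch of the formula above may help when experimenting with the weights. It assumes that "maximum positive votes" means the highest positive vote count of any app in the same voter group, and that an app with no votes scores 0 for that ratio; the names `Votes`, `ratio` and `lupo_popularity` are made up for illustration and are not taken from the test code.

```python
from dataclasses import dataclass


@dataclass
class Votes:
    """Positive and negative vote counts for one voter group (hypothetical type)."""
    positive: int
    negative: int

    @property
    def total(self) -> int:
        return self.positive + self.negative


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0


def lupo_popularity(unreg: Votes, reg: Votes,
                    max_unreg_positive: int, max_reg_positive: int,
                    w_unreg: float = 0.8, w_ratio: float = 0.8) -> float:
    """Popularity = w_unreg * (w_ratio*V1 + (1-w_ratio)*V2)
                  + (1-w_unreg) * (w_ratio*V3 + (1-w_ratio)*V4)."""
    value1 = ratio(unreg.positive, unreg.total)          # unregistered approval ratio
    value2 = ratio(unreg.positive, max_unreg_positive)   # unregistered volume vs. best app
    value3 = ratio(reg.positive, reg.total)              # registered approval ratio
    value4 = ratio(reg.positive, max_reg_positive)       # registered volume vs. best app
    w_reg = 1 - w_unreg
    w_count = 1 - w_ratio
    return (w_unreg * (w_ratio * value1 + w_count * value2)
            + w_reg * (w_ratio * value3 + w_count * value4))
```

With the default weights, an app with 100+ 100- unregistered votes scores higher than one with 10+ 10- when the best app in the group has 100 positive votes, because Value2 rewards the larger vote count.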
Last edited by Lupo73 on Thu Sep 29, 2011 10:28 am, edited 1 time in total.

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#36 Post by Lupo73 »

I updated the code and the screenshot again. There are now 5 solutions available:
1. Wilson formula (analyzes only a single software's rating and gives limited weight to the number of votes)
2. Bayes formula (compares a software's rating with the average software rating, but seems unreliable for ratings with a significant percentage of negative votes)
3. Lupo formula (the simple solution I proposed in my previous post; apparently good, but it needs to be verified)
4. positive / total formula (doesn't consider the number of votes at all)
5. (positive - negative) / total formula (another reference formula, fully independent of the number of votes)

I'd like to have some example ratings to test them with realistic values, but in any case my opinion is that the Wilson formula is the best one. Alternatively, my simple formula could be used, since it allows the weight of each rating parameter to be specified.
As for the Bayes formula, I'm waiting for an answer from the author of a related article. I think I have implemented it correctly, but the results are a little strange.
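
For reference, here is a minimal Python sketch of formulas 1, 4 and 5 from the list above. The Wilson version is the usual lower bound of the Wilson score interval with z = 1.96 (roughly 95% confidence); whether the test code uses the same variant and the same z is an assumption, and the function names are invented for this example.

```python
import math


def wilson_lower_bound(positive: int, negative: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the share of positive votes."""
    n = positive + negative
    if n == 0:
        return 0.0  # no votes yet: treat as the lowest possible score (assumption)
    p = positive / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


def positive_ratio(positive: int, negative: int) -> float:
    """Formula 4: positive / total, ignoring how many votes were cast."""
    total = positive + negative
    return positive / total if total else 0.0


def net_ratio(positive: int, negative: int) -> float:
    """Formula 5: (positive - negative) / total, also independent of vote count."""
    total = positive + negative
    return (positive - negative) / total if total else 0.0
```

Under this variant, 100+ 100- ranks above 10+ 10- even though both have a 50% positive ratio, because the larger sample narrows the confidence interval.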

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#37 Post by Andrew Lee »

@Lupo73: Thanks for your work so far!

Before we continue, let me summarize the current system as it is (after digging into the code).

Every time a browser is interested enough to click on the "Website" or "Download" link, a "+1" score is added to a "points" table. Let's call this the "activity score". For simplicity, I am not going to go into duplicate detection (another table stores the IP addresses for such actions, so if a user clicks "Website", then "Download", it will still count as "+1").

Every time a browser is interested enough to rate an entry, a "+5" or "-5" score is added to the "points" table depending on whether it's a "rocks" or "sucks" vote. Let's call this the "voting score". Separately, an internal score corresponding to the user's rank (if available) is added to an "appscore" table.

On a daily basis, the scores for each entry in the "points" table are consolidated via an SQL sum() and inserted into another "points2" table, then all entries in "points" are cleared for the next day (to save space).

As is obvious now, the popularity score is computed by summing the "points2" table for each entry (range is currently 30 days, previously it was the entire range).
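
The flow described so far can be modelled in a few lines of Python. This is only an in-memory sketch, not the site's actual code: the class and method names are hypothetical, the real "points" table presumably stores one row per event rather than a running total, and resetting the IP duplicate check once per day is an assumption.

```python
from collections import defaultdict
from datetime import date, timedelta


class PopularityTracker:
    """Toy model of the "points" and "points2" tables (hypothetical names)."""

    def __init__(self):
        self.points = defaultdict(int)    # entry -> score accumulated today
        self.points2 = defaultdict(dict)  # entry -> {day: consolidated score}
        self.seen = set()                 # (entry, ip) pairs already credited today

    def record_click(self, entry: str, ip: str) -> None:
        """"Website" or "Download" click: +1, counted once per IP and entry."""
        if (entry, ip) not in self.seen:
            self.seen.add((entry, ip))
            self.points[entry] += 1

    def record_vote(self, entry: str, rocks: bool) -> None:
        """"rocks" vote adds +5, "sucks" vote adds -5."""
        self.points[entry] += 5 if rocks else -5

    def consolidate(self, day: date) -> None:
        """Daily job: move each entry's sum into points2, then clear points."""
        for entry, score in self.points.items():
            self.points2[entry][day] = score
        self.points.clear()
        self.seen.clear()  # assumption: duplicate detection restarts each day

    def popularity(self, entry: str, today: date, window_days: int = 30) -> int:
        """Sum the consolidated scores of the last window_days days."""
        start = today - timedelta(days=window_days - 1)
        history = self.points2.get(entry, {})
        return sum(score for day, score in history.items() if start <= day <= today)
```

Passing a very large `window_days` reproduces the earlier behaviour of summing over the entire range.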

So the only "neg" score available is the "sucks" vote. (The "appscore" table does not record the "sucks" vote. It is merely a record of all the "plus" votes by registered users of each entry).

Also, the "voting score" is overwhelmed by the "activity score". Very few anonymous browsers vote. For example, at this moment, the "activity score" is in the thousands, while the "voting score" is in the tens.

Due to the negligible presence of "neg" score in the current system, I suspect the formula won't make much of a difference. We can increase the weights assigned to registered users, but unless it's some ridiculous number, they will be overwhelmed by the "activity score" of anonymous browsers.

The point of contention now, I think, is whether the list should be based on the score over the entire range, or just the past x days (or some exponentially decreasing window applied to the scores, so older scores have lower weights).

My current thinking is maybe by having two lists i.e. recent popular titles + all time favorites, as some of you have suggested, this debate can be somewhat resolved. The reason is I don't think a single list can cater to both recent scores and perpetual scores simultaneously.

flector
Posts: 51
Joined: Sun Jul 23, 2006 10:45 am

Re: Update to popularity score algorithm

#38 Post by flector »

I'm still not clear on the concept. Should we be re-upvoting for programs we have previously up-voted?

SYSTEM
Posts: 2041
Joined: Sat Jul 31, 2010 1:19 am
Location: Helsinki, Finland

Re: Update to popularity score algorithm

#39 Post by SYSTEM »

Andrew Lee wrote:My current thinking is maybe by having two lists i.e. recent popular titles + all time favorites, as some of you have suggested, this debate can be somewhat resolved. The reason is I don't think a single list can cater to both recent scores and perpetual scores simultaneously.
IMHO, the "all time favorites" list isn't interesting at all. It's way too constant.
flector wrote:I'm still not clear on the concept. Should we be re-upvoting for programs we have previously up-voted?
No.

http://www.portablefreeware.com/forums/ ... 412#p39412

flector
Posts: 51
Joined: Sun Jul 23, 2006 10:45 am

Re: Update to popularity score algorithm

#40 Post by flector »

You have a point with "all time favorites."

But as a freeware author, I've been looking at the popularity ratings of my software for years. Those ratings are now in the toilet.

Hydaral
Posts: 194
Joined: Tue Mar 09, 2010 7:36 pm

Re: Update to popularity score algorithm

#41 Post by Hydaral »

Andrew Lee wrote: The point of contention now, I think, is whether the list should be based on the score over the entire range, or just the past x days (or some exponentially decreasing window applied to the scores, so older scores have lower weights).
Have you considered using exponential moving averages?

http://en.wikipedia.org/wiki/Exponentia ... ng_average

I use this to indicate trends on how fast server hard disks are filling up where I work. After some tweaking it works quite well.
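
For anyone unfamiliar with it, an exponential moving average over daily scores could look roughly like this in Python. The smoothing factor `alpha` and the choice to seed the average with the first day's score are assumptions for this sketch and would need the kind of tweaking mentioned above.

```python
def exponential_moving_average(daily_scores, alpha=0.2):
    """Return the EMA of daily_scores (oldest first); newer days weigh more.

    alpha close to 1 follows recent days closely, alpha close to 0 smooths heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ema = 0.0
    for index, score in enumerate(daily_scores):
        # Seed with the first value, then blend each new day into the average.
        ema = float(score) if index == 0 else alpha * score + (1 - alpha) * ema
    return ema
```

With `alpha=0.5`, a spike of 100 on the latest of three days gives 50.0, while the same spike two days earlier has already decayed to 25.0.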

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#42 Post by Lupo73 »

The current solution is clear now. The idea of using activity in the rating is good.

I think you could use a unified formula like my solution to get a more accurate rating (giving one weight to the activity score, one to the unregistered voting score and one to the registered voting score). It may resolve the "problem" of points being overwhelmed, because each aspect contributes a fixed percentage. For example, you could evaluate popularity this way:

Popularity = 70 * (ActivityScore / MaxActivityScore) + 30 * (PositiveVotes / TotalVotes)
In ActivityScore you could add +1 for a "Website" click and +2 for a "Download" click (whichever solution you end up using, I think a Download click is more important than a Website click).
MaxActivityScore is determined by checking all ActivityScore counters once per day.
This solution also requires separate counters for positive and negative votes for each app. You could, however, add registered and unregistered votes together, giving +1 to unregistered users and +2 * UserLevel to registered users.
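
Put together, the proposal above could be sketched in Python as follows. The function names are invented for the example, and returning 0 for an app with no activity or no votes is an assumption the formula itself does not cover.

```python
def activity_score(website_clicks: int, download_clicks: int) -> int:
    """+1 per "Website" click and +2 per "Download" click."""
    return website_clicks + 2 * download_clicks


def weighted_votes(unregistered_votes: int, registered_levels) -> int:
    """+1 per unregistered vote, +2 * UserLevel per registered vote.

    registered_levels holds the UserLevel of each registered voter.
    """
    return unregistered_votes + sum(2 * level for level in registered_levels)


def popularity(activity: float, max_activity: float,
               positive_votes: float, negative_votes: float) -> float:
    """Popularity = 70 * (ActivityScore / MaxActivityScore) + 30 * (PositiveVotes / TotalVotes)."""
    activity_part = activity / max_activity if max_activity else 0.0
    total_votes = positive_votes + negative_votes
    vote_part = positive_votes / total_votes if total_votes else 0.0
    return 70 * activity_part + 30 * vote_part
```

For example, an app with half the activity of the most active app and 75% weighted positive votes would score 70 * 0.5 + 30 * 0.75 = 57.5 out of 100.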

Another good improvement could be to add a simple time factor, for example giving different weights to scores from different dates (I have something in mind, but I'll avoid writing too long a message now).

Alternatively, if you keep the current rating solution, I think these aspects could be improved:
1. give different weights to Download and Website clicks (as described above)
2. give a bigger weight to the "voting score" (or possibly keep votes permanently, not only those from the last 30 days)
3. offer the two scores you proposed (last 30 days and all-time)

Andrew Lee
Posts: 3048
Joined: Sat Feb 04, 2006 9:19 am

Re: Update to popularity score algorithm

#43 Post by Andrew Lee »

@Hydaral: My point is, I don't think using exponential moving averages will eliminate the need for two separate lists. Right?

@Lupo73: Some of your suggestions can be readily implemented (e.g. different scores for website and download). Others will need considerably more work and changes, which I will keep in view (KIV) until I have sorted the current situation out.

I think we need to discuss whether we can make do with only one list, or whether we need two. If we have two, what should be the popularity score for an app (past x days, or entire range)?

More to the point, would using an exponential moving average eliminate the need for two lists? If so, what is the ideal formula for the window?

Hydaral
Posts: 194
Joined: Tue Mar 09, 2010 7:36 pm

Re: Update to popularity score algorithm

#44 Post by Hydaral »

What about using an EMA for the last x days (30?) and then adding a percentage of the total votes for that app (5%?)? The EMA will favor new apps that have been voted up a lot recently, and the percentage of the total will add weight to all-time popular apps.
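
One possible reading of this suggestion, sketched in Python: take the EMA of the most recent daily scores and add a fixed share of the all-time total. Interpreting "total votes" as the sum of all daily scores is an assumption, as are the default values for `window`, `alpha` and `total_share` and the function name itself.

```python
def hybrid_score(daily_scores, window=30, alpha=0.2, total_share=0.05):
    """EMA over the last `window` days plus `total_share` of the all-time total.

    daily_scores is ordered oldest first and covers the app's whole history.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = daily_scores[-window:]
    ema = 0.0
    for index, score in enumerate(recent):
        # Seed with the first day in the window, then blend in each later day.
        ema = float(score) if index == 0 else alpha * score + (1 - alpha) * ema
    return ema + total_share * sum(daily_scores)
```

Two apps with the same all-time total then differ only by their recent trend: the one whose points came in the last 30 days gets the higher score, while a long-established app still keeps a baseline from its history.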

Lupo73
Posts: 1012
Joined: Mon Mar 19, 2007 8:55 am
Location: Italy

Re: Update to popularity score algorithm

#45 Post by Lupo73 »

Parameters that can be considered:
- Registered user votes
- Unregistered user votes
- Download clicks (main activity)
- Website clicks (secondary activity)

Some considerations:
- a first doubt concerns the unified counter for activity and votes: the risk is that the more frequently a software is updated, the more popular it becomes (I think this is the reason for your limit on updates per month, but it may not be enough)
- a second doubt about the unified counter is that voting for an app loses its importance if it is overwhelmed by the activity score (so a separate parameter could give votes much more relevance and encourage users to vote)
- another consideration is that the current solution of separate counters for registered and unregistered users is not very useful; a unified solution could be studied for them (possibly keeping the ability to see the preferences of other registered users)

After these considerations, my opinion is that a good solution could be two Popularity Scores:
1. Rating Score without a time limit, unified for registered and unregistered users
2. Activity Score with a time limit (e.g. 30 days) or without one (using the EMA formula)

For the first counter you could use the Wilson formula and give different weights to registered and unregistered users (for example +1 for unregistered, +2*level for registered).
For the second counter you need to decide on the formula and give different weights to Download and Website clicks (for example +1 for Website, +3 for Download).
