<Weird unicode characters thread issue>

Posted: Mon Oct 31, 2016 5:13 pm
by Orca
[Moderator note: this user was given a warning on his account and then decided to bail and delete all his previous entries. What follows are the replies to a site issue with odd characters.]

Re: "Бесплатная версия pc tools antivirus":

Posted: Mon Oct 31, 2016 5:30 pm
by Zero3K
It seems like you all are popular with foreigners then. I think that box should support unicode if that's the case.

Re: "Бесплатная версия pc tools antivirus":

Posted: Mon Oct 31, 2016 6:43 pm
by Specular
Already brought up here. It's the lack of Unicode support causing the mangled characters, and they show up now because the box displays the most popular searches for the day, not just for all time (since that was rather static).
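
The thread does not say where in the pipeline the text gets mangled, so the following is only a sketch of two common ways non-ASCII search terms end up garbled. It assumes the browser sends the query as UTF-8 and that some later component either misreads the bytes as a single-byte encoding or can only store ASCII.

```python
# Hypothetical illustration: the site's real failure mode is not known.
query = "Бесплатная версия pc tools antivirus"

# Assumption: the browser submits the query as UTF-8 bytes.
raw = query.encode("utf-8")

# A component that assumes a single-byte encoding misreads those bytes.
mangled = raw.decode("latin-1")

# Decoding the same bytes as UTF-8 recovers the original text.
restored = raw.decode("utf-8")

# A component limited to ASCII can only substitute placeholder characters.
ascii_only = query.encode("ascii", errors="replace").decode("ascii")

print(mangled != query)
print(restored == query)
print(ascii_only)
```

Either path loses the Cyrillic text before it reaches the page, which matches the kind of output seen in the box.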

I do wonder about the potential for abuse though if spammers game the search queries to place random software/ads in the popular search items.

Re: �-zip?

Posted: Mon Dec 12, 2016 8:09 pm
by Andrew Lee
Will look into this to find out what's going on...

Re: <Weird unicode characters thread issue>

Posted: Thu Dec 29, 2016 6:17 am
by joby_toss
Image

Weird unicode characters thread issue

Posted: Thu Dec 29, 2016 2:46 pm
by billon
This topic is not visible on the main page and is not clickable:
x.png
Is that because of the "< >" around the title?

Re: <Weird unicode characters thread issue>

Posted: Thu Dec 29, 2016 5:47 pm
by Andrew Lee
billon wrote: This topic not visible on main page and not clickable:
Fixed. Thanks for pointing this out.
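
The thread does not say what the fix was. If the angle brackets were indeed the cause, the usual remedy is to HTML-escape the title before placing it in the page, so the browser shows it as text instead of treating it as an unknown tag. A minimal sketch, where `render_topic_link` and the `/example-topic` URL are made-up names for illustration:

```python
from html import escape

# Hypothetical helper; the site's actual template code is not shown in the thread.
def render_topic_link(title: str, url: str) -> str:
    # Escape both the attribute value and the visible link text.
    return f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'

title = "<Weird unicode characters thread issue>"

# Unescaped: the browser parses the title as a tag, so the link has no visible text.
unsafe = f'<a href="/example-topic">{title}</a>'

# Escaped: the angle brackets become entities and the title is displayed.
safe = render_topic_link(title, "/example-topic")

print(unsafe)
print(safe)
```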

Re: <Weird unicode characters thread issue>

Posted: Fri Jan 20, 2017 3:30 am
by SYSTEM
Orca wrote:At TPFC, "Бесплатная версия pc tools antivirus" is a popular search.

Really? I wouldn't have thought so.
Sounds believable to me. TPFC is such a small site that a handful of Russian visitors can probably bring such a search near the top.

Re: <Weird unicode characters thread issue>

Posted: Sat Jan 21, 2017 4:12 pm
by HairyPorter
Orca wrote:At TPFC, "[markb /forums/ucp.php?mode=login" is a popular search. Really? I wouldn't have thought so.
Such generic URLs can turn out to be a popular "search term" because other than text queries via the search box, it seems that TPFC's search engine is also capturing user clicks on hyperlinks & buttons.

Note: If interested, just mouseover the below sample generic URLs to view them. Don't click on them to make them even more "popular" !

The URL syntax in question suggests that a TPFC registered user "markb" might have been trying to log in today &/or yesterday from TPFC's homepage or forum index page. (And TPFC does have a user named MarkB who has previously commented on various TPFC software pages.) Perhaps he encountered repeated login failures (e.g. wrong password, or forgot to allow session cookies), & kept clicking the login button upon every page refresh. And TPFC's search engine duly captured all his clicks.

Google Search likewise captured one of MarkB's login clicks on 09 Jan 2017 (this time from pg 5 of TPFC's software pages). Screenshot:
TPFCSearch-CapturesUserClicksGenericURLs-21Jan17.png
Furthermore, Google Search also captured MarkB's clicks on various occasions when he rated different software (Eg 1: 29 Dec 2016 | Eg 2: 11 Jan 2017 | Eg 3: 15 Jan 2017) whilst browsing through TPFC's software index pages.

To be fair, TPFC's & Google's search engines appear to capture all visitors' & registered users' clicks on hyperlinks & buttons. Except that most URLs don't receive numerous clicks within a short span of time, so these URLs (eg. the aforementioned software rating clicks) are deemed "not popular" by the search algorithm, are buried way down in search results, & hence don't usually come to attention.

Re: <Weird unicode characters thread issue>

Posted: Sun Jan 22, 2017 1:17 am
by SYSTEM
HairyPorter wrote: Note: If interested, just mouseover the below sample generic URLs to view them. Don't click on them to make them even more "popular" !
That's not a problem.

https://www.portablefreeware.com/forums ... 614#p83614
Andrew Lee wrote:The "u=0" parameter in those links ensure that they do not count towards the stats. I even double-checked again to make sure there isn't a bug in the code.

Re: <Weird unicode characters thread issue>

Posted: Sun Jan 22, 2017 12:36 pm
by HairyPorter
SYSTEM wrote:https://www.portablefreeware.com/forums ... 614#p83614
Andrew Lee wrote:The "u=0" parameter in those links ensure that they do not count towards the stats. I even double-checked again to make sure there isn't a bug in the code.
@SYSTEM -- Thanks for the info about TPFC's "u=0" parameter. Based on above description, I assume "u=0" is supposed to work the same way as the standard rel="nofollow" HTML attribute.

1) But why is TPFC's search engine apparently ignoring the "u=0" parameter currently hardcoded into TPFC's 'Popular Searches' links, as well as functional links such as those related to Login/ Rate/ Register etc. ? As implied by the recently-indexed "user login" URL example (which does have "u=0" appended to it), TPFC's search engine seems to be following & indexing user clicks on Login/ Rate buttons, to the point that the functional clicks of a persistent TPFC user managed to get ranked highly in TPFC's 'Popular Searches'.

In contrast, it is understandable that Google Search & other search engines are ignoring "u=0", since it is not the HTML standard. Hence the long list of TPFC functional-click URLs stored in their search indices.

2) Based on brief research, other phpBB-powered forums & websites appear to be using rel="nofollow" instead to make their internal &/or external links automatically obey that directive. Egs:-



Related Issue: Use rel="nofollow" for Specific Links (Google Webmasters)
Google Webmasters Search Console Help Center wrote: Crawl Prioritization: Search engine robots can't sign in or register as a member on your forum, so there's no reason to invite Googlebot to follow "register here" or "sign in" links. Using nofollow on these links enables Googlebot to crawl other pages you'd prefer to see in Google's index.
On the other hand, I can't find any phpBB documentation, examples of phpBB-powered sites, or any non-phpBB website using "u=0" for this purpose.

How did this "u=0" parameter come about ? Is it some special code unique to TPFC's backend ? More importantly, is it working as it should ?

Re: <Weird unicode characters thread issue>

Posted: Mon Jan 23, 2017 12:59 am
by SYSTEM
HairyPorter wrote:How did this "u=0" parameter come about ? Is it some special code unique to TPFC's backend ? More importantly, is it working as it should ?
Yes, "u=0" is unique to the TPFC backend written in PHP.

In the post I quoted, Andrew said that he had double-checked that "u=0" works correctly.
HairyPorter wrote:
SYSTEM wrote:https://www.portablefreeware.com/forums ... 614#p83614
Andrew Lee wrote:The "u=0" parameter in those links ensure that they do not count towards the stats. I even double-checked again to make sure there isn't a bug in the code.
@SYSTEM -- Thanks for the info about TPFC's "u=0" parameter. Based on above description, I assume "u=0" is supposed to work the same way as the standard rel="nofollow" HTML attribute.
No, not really. rel="nofollow" advises search engines not to follow the link or pass ranking credit through it. "u=0" tells the TPFC code not to count the search towards the search popularity statistics.
HairyPorter wrote: 1) But why is TPFC's search engine apparently ignoring the "u=0" parameter currently hardcoded into TPFC's 'Popular Searches' links, as well as functional links such as those related to Login/ Rate/ Register etc. ? As implied by the recently-indexed "user login" URL example (which does have "u=0" appended to it), TPFC's search engine seems to be following & indexing user clicks on Login/ Rate buttons, to the point that the functional clicks of a persistent TPFC user managed to get ranked highly in TPFC's 'Popular Searches'.
What is happening is that the Popular Searches box appends the "u=0" parameter. The searches the TPFC code indexes are without "u=0". However, when the Popular Searches box shows the most popular searches, then it adds "u=0" to prevent a feedback loop.
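
The mechanism described above can be sketched as follows. This is not the TPFC code (which is PHP and not shown in the thread); the `/search` path, the `q` parameter name, and both function names are assumptions made for illustration. Only the "u=0" flag comes from the thread.

```python
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical in-memory store standing in for the real statistics table.
search_counts = Counter()

def handle_search(url: str) -> str:
    """Record a search unless the request carries u=0; return the query."""
    params = parse_qs(urlsplit(url).query)
    query = params.get("q", [""])[0]
    # Links generated by the Popular Searches box carry u=0 and are not counted.
    if params.get("u", ["1"])[0] != "0":
        search_counts[query] += 1
    return query

def popular_links(limit: int = 5) -> list:
    """Build Popular Searches links, tagging each one with u=0."""
    return [
        "/search?" + urlencode({"q": query, "u": "0"})
        for query, _ in search_counts.most_common(limit)
    ]

# Two ordinary searches are counted.
handle_search("/search?q=7-zip")
handle_search("/search?q=7-zip")

# Clicking every Popular Searches link does not change the counts.
for link in popular_links():
    handle_search(link)

print(search_counts["7-zip"])
```

Without the flag, every click in the box would raise the count of a term that is already popular, which is the feedback loop the parameter is meant to prevent.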

Following and indexing clicks on login and rate buttons sounds like a believable (although very strange) explanation. :|
HairyPorter wrote: 2) Based on brief research, other phpBB-powered forums & websites appear to be using rel="nofollow" instead to make their internal &/or external links automatically obey that directive. Egs:-



Related Issue: Use rel="nofollow" for Specific Links (Google Webmasters)
Google Webmasters Search Console Help Center wrote: Crawl Prioritization: Search engine robots can't sign in or register as a member on your forum, so there's no reason to invite Googlebot to follow "register here" or "sign in" links. Using nofollow on these links enables Googlebot to crawl other pages you'd prefer to see in Google's index.
On the other hand, I can't find any phpBB documentation, examples of phpBB-powered sites, or any non-phpBB website using "u=0" for this purpose.
TPFC can't use rel="nofollow" here. It's an HTML attribute. TPFC search code (written in PHP) can't know if the visitor triggered the search by clicking a link that has a rel="nofollow" attribute.
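
A small sketch of why that is: the rel attribute lives in the page markup, while a click only requests the href. The markup below is made up for illustration (the `/search` path and `q` parameter are assumptions), and Python's standard library stands in for the browser and the server.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

class AnchorReader(HTMLParser):
    """Collect the attributes of the first <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "a" and not self.attrs:
            self.attrs = dict(attrs)

# Hypothetical link as it might appear in a page.
markup = '<a href="/search?q=7-zip&amp;u=0" rel="nofollow">7-zip</a>'
reader = AnchorReader()
reader.feed(markup)

# Whoever reads the markup (a browser or a crawler) sees the rel attribute.
print(reader.attrs["rel"])

# A click requests only the href, so the server sees the query string alone.
requested = reader.attrs["href"]
server_params = parse_qs(urlsplit(requested).query)
print(server_params)
```

The server-side parameters contain "u" but nothing about rel, so a query parameter is the only signal of the two that the search code can act on.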

Re: <Weird unicode characters thread issue>

Posted: Wed Jan 25, 2017 2:04 pm
by Andrew Lee
I have verified again that "u=0" works as intended, and clicking on the "Popular searches" links does not create a feedback loop.

Maybe the current window of 1 day is too short and creates all kinds of spurious results. Markb's careless romping within an hour is enough to skew the stats.

Should we increase the stats window to 3 or 5 days to smooth things out?

[EDIT] Stats window increased to 3 days.
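
To illustrate the smoothing effect of a longer window, here is a sketch with made-up data: one term searched steadily a few times a day, and one burst of repeated hits within a single hour. The `popular` function, the event list, and the "login-click" label are all hypothetical; the real stats code is not shown in the thread.

```python
from collections import Counter
from datetime import datetime, timedelta

def popular(events, now, window_days, limit=3):
    """Count queries whose timestamp falls within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter(query for ts, query in events if ts >= cutoff)
    return counts.most_common(limit)

now = datetime(2017, 1, 25, 12, 0)
events = []

# Steady interest: 4 searches per day for "7-zip" over the last three days.
for day in range(3):
    for n in range(4):
        events.append((now - timedelta(days=day, hours=n + 1), "7-zip"))

# One burst: 6 repeated hits within a single hour today.
for n in range(6):
    events.append((now - timedelta(minutes=10 * n), "login-click"))

one_day = popular(events, now, window_days=1)
three_days = popular(events, now, window_days=3)
print(one_day)
print(three_days)
```

With these numbers the burst tops the 1-day list, while the steadily searched term leads over 3 days.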

Re: <Weird unicode characters thread issue>

Posted: Wed Jan 25, 2017 10:53 pm
by SYSTEM
Andrew Lee wrote:Should we increase the stats window to 3 or 5 days to smooth things out?
Sounds good to me.

Weird results in the Popular Searches box

Posted: Thu Mar 02, 2017 6:54 am
by __philippe
Improbable current "Popular Searches"... :?:
[markb /forums/ucp.php?mode=login
[markb /forums/forums/ucp.php?mode=login
[markb /forums/?p=2
[markb /forums/?p=5
[markb /forums/?p=4
[markb /forums/?p=3
...