
House of fail (we got scraped)

Posted: Fri May 30, 2014 6:04 pm
by webfork
Just a note that another dupe site has shown up that simply scrapes and re-posts our data: housezofportable.com (sans the letter "z"). Hilariously, all the images are broken, and when you click on an entry and select "read more" it opens up our site in a sub-window (sans URL, of course).

This has happened before and will probably happen again so I'm not particularly surprised or upset. Still, anyone's welcome to take action on this:
  • Post something to WOT about it
  • They're registered with http://www.enom.com, so contact the abuse email listed by the registrar: abuse@enom.com
  • Do what you can to get them on any evil/spam/malware lists out there. The obvious thing is to list them with Google as copyright violators but as you might imagine, they make that exceedingly difficult.

Re: House of fail (we got scraped)

Posted: Fri May 30, 2014 9:10 pm
by joby_toss
Done the WOT thing!

Re: House of fail (we got scraped)

Posted: Sat May 31, 2014 3:59 pm
by Andrew Lee
Wow! My brain just exploded! :D The Internet is a real crazy place is all I can say.

Re: House of fail (we got scraped)

Posted: Sat May 31, 2014 6:44 pm
by I am Baas

Re: House of fail (we got scraped)

Posted: Sun Jun 01, 2014 2:43 am
by Checker
Andrew Lee wrote:Wow! My brain just exploded! :D
Image

Re: House of fail (we got scraped)

Posted: Sat Jan 03, 2015 8:00 am
by bzl333
Nothing can be done? It's already been 6 months.

Re: House of fail (we got scraped)

Posted: Sun Jan 04, 2015 2:00 pm
by webfork
bzl333 wrote:Nothing can be done? It's already been 6 months.
Site appears to be offline but these things generally come and go.

Re: House of fail (we got scraped)

Posted: Fri Jan 09, 2015 3:39 pm
by bzl333
webfork wrote:
bzl333 wrote:Nothing can be done? It's already been 6 months.
Site appears to be offline but these things generally come and go.
Oh, I thought I saw it a couple of days ago when I made the post. You would think that by now internet hosting companies would quickly take care of something like this.

Re: House of fail (we got scraped)

Posted: Fri Feb 12, 2016 8:28 pm
by Andrew Lee
webfork suggested I block any suspicious IP address. But after investigation, I'm convinced that this is not simple scraping. Whoever is doing this has the source code and database.

If you do a search of the database (or a search of the forum) at the offending site, the results are correctly returned. Simple scraping cannot do this. This can only be done if one has the source code _AND_ database.

I am extremely puzzled and troubled by the motive of the person behind this.

But more urgently, I'm going to change the database password and stop the automatic source code/database backup procedure until we can figure out what is going on.

If this person truly has the source code, he will have the database password. Although the VPS is configured not to accept database connections from external IPs, this is still a very dangerous situation and needs to be rectified immediately.

Re: House of fail (we got scraped)

Posted: Fri Feb 12, 2016 9:51 pm
by tactictoe
Voted NEGATIVE to WOT.

BTW, it is not the very first time I've stumbled on a site like this...
Here is another one, but at least it mentions your site: http://www.mbar.com/2010/05/download-gr ... ikasi.html (add 64 after www. if you need to visit this URL)
and it links to you if you click a software title in its list of your site's software updates (pff, confusing, I know). Just a blog, though.

@webfork
Do what you can to get them on any evil/spam/malware lists out there. The obvious thing is to list them with Google as copyright violators but as you might imagine, they make that exceedingly difficult.
Do you have a link for this?

Re: House of fail (we got scraped)

Posted: Fri Feb 12, 2016 10:00 pm
by tactictoe
Just a suggestion: why not put a disclaimer forbidding this type of thing on your website? I cannot find anything about it. It would be easier to complain to the webmaster of the guilty website if that were the case. Just MHO.

Regarding the guilty website webfork talked about:

Download the dump of the database (a valid membership is all it takes), and then it's easy.
For example, with one of my programs (Movie Info Search):
Look up the reference ID number of any software, or of a particular one:
Movie Info Search's ID is #2747
Now how do you refer to it?
URL + ID = targeted software
'http://www.zortablefreeware.com/index.zhp?id=' + '2747' (I deliberately changed a letter for another one)
From there a snapshot can be taken; I'm not going to explain how, nor say more about how to scrape the info, for obvious reasons.
The only way to stop this easy part is to stop offering the database dump, or to make it restricted rather than publicly accessible.
A readable copyright notice should also be added to the database. At the end of the file?
So, did this site use a stolen password or the like? I guess not; it used what is in place and 'exploited' it. It's a simple hack, IMHO. Recording the member ID of whoever accesses the database dump narrows the search for the one who did it. With a little bit of logic it can eventually be very precise. Once the leak is identified, the member can be banned. That will not stop another registration, and then the problem arises again. Maybe a filter at registration could be a solution there.
To avoid this kind of thing, my suggestion could be a solution on top of whatever else you can physically do on your server (stop URL referrals from outside the server with filters?)... I don't really know what more can be done; it's unavoidable when you are popular. And TPFC is popular.

I hope this post will help you guys to solve this type of problem in the future.

Note this site was up at the time I posted this one. Domain is for sale too though. Weird.

Re: House of fail (we got scraped)

Posted: Fri Feb 12, 2016 11:09 pm
by tactictoe
Horrible, :twisted:

edited: Joke removed.

Re: House of fail (we got scraped)

Posted: Sat Feb 13, 2016 12:27 am
by SYSTEM
Andrew Lee wrote: If you do a search of the database (or a search of the forum) at the offending site, the results are correctly returned. Simple scraping cannot do this. This can only be done if one has the source code _AND_ database.
Can't confirm.

I searched for "UniExtract". Universal Extractor and Universal Extractor 2 have "uniextract" as a keyword and therefore should show up when searching for UniExtract - but only if the site has DB access.

No results.

Then I searched for "Skype". What showed up was a list of Skype Portable updates and a SkypeContactsView update.

To me, it looks like HouseOfPortable.com is simply an autoblog that scrapes our RSS feed. Their search feature is simply the built-in search feature of WordPress that searches blog posts for the given keywords. (In other words, they essentially have a private database constructed from our feed.)

I have no idea what you mean by "Simple scraping cannot do this." :?
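
To make the test above concrete, here is a minimal sketch with made-up data. The record layout, the sample summaries, and both helper functions are hypothetical; this is not the real TPFC schema or the actual WordPress search code. It only illustrates why a search over text scraped from a feed can return Skype hits while missing an entry that is reachable only through a database keyword.

```python
# Hypothetical database records. The "keywords" field is assumed to exist
# only in the database and is never published in the feed.
DATABASE = [
    {"title": "Universal Extractor",
     "summary": "Extracts files from archives and installers.",
     "keywords": ["uniextract"]},
    {"title": "Skype Portable",
     "summary": "Portable launcher for Skype.",
     "keywords": []},
]

# What a feed scraper ends up with: title and summary only.
FEED_POSTS = [{"title": e["title"], "summary": e["summary"]} for e in DATABASE]


def search_posts(posts, term):
    """Plain text search over scraped posts (roughly what a blog search does)."""
    term = term.lower()
    return [p["title"] for p in posts
            if term in p["title"].lower() or term in p["summary"].lower()]


def search_database(entries, term):
    """Search that also consults the keyword field, which needs DB access."""
    term = term.lower()
    return [e["title"] for e in entries
            if term in e["title"].lower()
            or term in e["summary"].lower()
            or term in e["keywords"]]


print(search_posts(FEED_POSTS, "UniExtract"))
print(search_posts(FEED_POSTS, "Skype"))
print(search_database(DATABASE, "UniExtract"))
```

With this sample data, the scraped copy finds the Skype entry but nothing for "UniExtract", while the keyword-aware search finds Universal Extractor. That matches the behaviour of a site built from the feed rather than from the database.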

@tactictoe

Please don't link to spam sites. You can disable automatic linking by selecting "Do not automatically parse URLs" below the "Post a reply" textarea.

Regarding your idea of restricting access to the database dump, I think it's unnecessary, at least for now. We can, and hopefully do, log who accesses the dump and when. If there are signs of someone downloading the dump too frequently (e.g. daily), then we can come up with countermeasures, such as restricting database dump access to users with the database editing privilege.
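
As a rough sketch of the "log who accesses the dump and when" idea: the function below takes a list of (member ID, timestamp) pairs and flags members who download more often than a threshold within a sliding time window. The log format, the function name `frequent_downloaders`, and the default thresholds are all assumptions for illustration, not how the site actually records downloads.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def frequent_downloaders(log, window=timedelta(days=7), max_downloads=3):
    """Return sorted member IDs with more than max_downloads in any window.

    log is an iterable of (member_id, datetime) pairs (hypothetical format).
    """
    by_member = defaultdict(list)
    for member_id, when in log:
        by_member[member_id].append(when)

    flagged = set()
    for member_id, times in by_member.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Shrink the window until it spans at most `window` of time.
            while t - times[start] > window:
                start += 1
            if end - start + 1 > max_downloads:
                flagged.add(member_id)
                break
    return sorted(flagged)


# Made-up example: member 101 downloads daily, member 202 only once.
sample_log = [(101, datetime(2016, 2, day)) for day in range(1, 8)]
sample_log.append((202, datetime(2016, 2, 3)))
print(frequent_downloaders(sample_log))
```

Flagged IDs would only be a starting point for a human to look at; a daily downloader could just as well be a legitimate mirror or a curious member.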

Re: House of fail (we got scraped)

Posted: Sat Feb 13, 2016 12:40 am
by joby_toss
tactictoe wrote:Just a suggestion: why not put a disclaimer forbidding this type of incident on your website? I cannot find anything about it.
Isn't the © mark supposed to do just that?
Image

@Andrew: I'm not good at this kind of stuff, so my question is: are our user passwords compromised in any way?

Re: House of fail (we got scraped)

Posted: Sat Feb 13, 2016 1:11 am
by tactictoe
@tactictoe
Please don't link to spam sites. You can disable automatic linking by selecting "Do not automatically parse URLs" below the "Post a reply" textarea.
Thanks. It won't happen again, Post modified.
Isn't the © mark supposed to do just that?
According to a lawyer I know, for websites in general: no and yes (it all depends on where the law is applied, as copyright is defined differently in different countries).

US law (extract):
"...Fair use rights take precedence over the author's interest. Thus the copyright holder cannot use a non-binding disclaimer, or notification, to revoke the right of fair use on works. However, binding agreements such as contracts or licence agreements may take precedence over fair use rights..."

The dump file carries no copyright notice, and this is what I was getting at.
Fair use in Australia alone would permit the use of this dump:
https://www.alrc.gov.au/publications/4- ... t-fair-use
If there are signs of someone downloading the dump too frequently (e.g. daily), then we can come up with countermeasures, such as restricting database dump access to users with the database editing privilege.
That is one way to stop a leak there; even with multiple logins it can be stopped. :D

Law is so complicated, and I did not study it; I'm just interested.
Anyway I was trying to help.