House of fail (we got scraped)

webfork
Posts: 9799
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

#1 Post by webfork » Fri May 30, 2014 6:04 pm

Just a note that another dupe site has shown up that scrapes and re-posts our data: housezofportable.com (sans the letter "z"). Hilariously, all the images are broken, and when you click on an entry and select "read more" it opens our site in a sub-window (sans URL, of course).

This has happened before and will probably happen again so I'm not particularly surprised or upset. Still, anyone's welcome to take action on this:
  • Post something to WOT about it
  • They're registered with http://www.enom.com so contact the abuse email listed on the registrar: abuse@enom.com
  • Do what you can to get them on any evil/spam/malware lists out there. The obvious thing is to list them with Google as copyright violators but as you might imagine, they make that exceedingly difficult.

joby_toss
Posts: 2902
Joined: Sat Feb 09, 2008 9:57 am
Location: Romania

#2 Post by joby_toss » Fri May 30, 2014 9:10 pm

Done the WOT thing!

Andrew Lee
Posts: 2606
Joined: Sat Feb 04, 2006 9:19 am

#3 Post by Andrew Lee » Sat May 31, 2014 3:59 pm

Wow! My brain just exploded! :D The Internet is a real crazy place is all I can say.


Checker
Posts: 1624
Joined: Wed Jun 20, 2007 1:00 pm
Location: Ingolstadt [DE]

#5 Post by Checker » Sun Jun 01, 2014 2:43 am

Andrew Lee wrote: Wow! My brain just exploded! :D
Image

bzl333
Posts: 167
Joined: Wed Jan 12, 2011 3:11 pm

#6 Post by bzl333 » Sat Jan 03, 2015 8:00 am

Can nothing be done? It's already been 6 months.

webfork
Posts: 9799
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

#7 Post by webfork » Sun Jan 04, 2015 2:00 pm

bzl333 wrote: Can nothing be done? It's already been 6 months.
Site appears to be offline but these things generally come and go.

bzl333
Posts: 167
Joined: Wed Jan 12, 2011 3:11 pm

#8 Post by bzl333 » Fri Jan 09, 2015 3:39 pm

webfork wrote:
bzl333 wrote: Can nothing be done? It's already been 6 months.
Site appears to be offline but these things generally come and go.
Oh, I thought I saw it a couple of days ago when I made the post. You would think that by now hosting companies would quickly take care of something like this.

Andrew Lee
Posts: 2606
Joined: Sat Feb 04, 2006 9:19 am

#9 Post by Andrew Lee » Fri Feb 12, 2016 8:28 pm

webfork suggested I block any suspicious IP address. But after investigation, I'm convinced that this is not simple scraping. Whoever is doing this has the source code and database.

If you do a search of the database (or a search of the forum) at the offending site, the results are correctly returned. Simple scraping cannot do this. This can only be done if one has the source code _AND_ database.

I am extremely puzzled and troubled by the motive of the person behind this.

But more urgently, I'm going to change the database password and stop the automatic source code/database backup procedure until we can figure out what is going on.

If this person truly has the source code, he will have the database password. Although the VPS is configured not to accept database connections from external IPs, this is still a very dangerous situation and needs to be rectified immediately.

tactictoe
Posts: 283
Joined: Thu Dec 10, 2015 10:56 am
Location: A galaxy far far downunder

#10 Post by tactictoe » Fri Feb 12, 2016 9:51 pm

Voted NEGATIVE to WOT.

BTW, it is not the first time I have stumbled on a site like this...
Here is another one, though it at least mentions your site: http://www.mbar.com/2010/05/download-gr ... ikasi.html (add 64 after www. if you need to visit this URL)
It links back to you if you click a software title in its list of your site's software updates (pff, confusing, I know). Just a blog, though.

@webfork
Do what you can to get them on any evil/spam/malware lists out there. The obvious thing is to list them with Google as copyright violators but as you might imagine, they make that exceedingly difficult.
Do you have a link for this?
Last edited by tactictoe on Sat Feb 13, 2016 1:12 am, edited 3 times in total.

tactictoe
Posts: 283
Joined: Thu Dec 10, 2015 10:56 am
Location: A galaxy far far downunder

#11 Post by tactictoe » Fri Feb 12, 2016 10:00 pm

Just a suggestion: why not put a disclaimer forbidding this type of incident on your website? I cannot find anything about it. It would be easier to complain to the webmaster of the offending site if there were one. Just MHO.

About the offending website webfork mentioned:

Download the dump of the database (a valid membership is all you need to do that) and the rest is easy.
For example, with one of my programs (Movie Info Search):
Look at the ID number of any entry, or of a particular one:
Movie Info Search is ID #2747
Now, how do you refer to it?
URL + ID = targeted software
'http://www.zortablefreeware.com/index.zhp?id=' + '2747' (I deliberately swapped one letter for another)
From there a snapshot can be taken. I am not going to explain how, or say more about how to scrape the info, for obvious reasons.
The only way to stop this easy part is to stop providing the database dump, or to make it non-public or restricted.
A readable copyright notice should also be added to the database. At the end of the file?
So, did this site use a stolen password or the like? I guess not; it used what is already in place and 'exploited' it. It's a simple hack, IMHO. Recording the member ID of whoever accesses the database dump narrows the search for the culprit. With a little logic it can eventually be very precise. Once the leak is identified, the member can be banned. That will not stop them from registering again, and then the problem comes back. Maybe a filter at registration could be a solution there.
To avoid this kind of thing, my suggestion could be one solution, on top of whatever else you can do on your server (block referrals from outside the server with filters?)... I don't really know what more can be done; it's unavoidable when you are popular. And TPFC is popular.

I hope this post will help you guys to solve this type of problem in the future.

Note that the site was up at the time I posted this. The domain is for sale too, though. Weird.

tactictoe
Posts: 283
Joined: Thu Dec 10, 2015 10:56 am
Location: A galaxy far far downunder

#12 Post by tactictoe » Fri Feb 12, 2016 11:09 pm

Horrible, :twisted:

edited: Joke removed.
Last edited by tactictoe on Sat Feb 13, 2016 7:41 pm, edited 4 times in total.

SYSTEM
Posts: 1969
Joined: Sat Jul 31, 2010 1:19 am
Location: Helsinki, Finland

#13 Post by SYSTEM » Sat Feb 13, 2016 12:27 am

Andrew Lee wrote: If you do a search of the database (or a search of the forum) at the offending site, the results are correctly returned. Simple scraping cannot do this. This can only be done if one has the source code _AND_ database.
Can't confirm.

I searched for "UniExtract". Universal Extractor and Universal Extractor 2 have "uniextract" as a keyword and therefore should show up when searching for UniExtract - but only if the site has DB access.

No results.

Then I searched for "Skype". What showed up was a list of Skype Portable updates and a SkypeContactsView update.

To me, it looks like HouseOfPortable.com is simply an autoblog that scrapes our RSS feed. Their search feature is simply the built-in search feature of WordPress that searches blog posts for the given keywords. (In other words, they essentially have a private database constructed from our feed.)
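The difference can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample feed, the `DB_KEYWORDS` table, and the function names are all made up, and the real feed and blog search may behave differently. It shows why a copy built from the feed finds "Skype" but not "UniExtract", while a search with database access would find both.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; the real feed's fields and entries may differ.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Sample updates feed</title>
  <item><title>Universal Extractor</title>
        <description>Extracts files from archives and installers.</description></item>
  <item><title>Skype Portable</title>
        <description>Portable wrapper for Skype.</description></item>
  <item><title>SkypeContactsView</title>
        <description>Lists Skype contacts.</description></item>
</channel></rss>"""

# Hypothetical keyword table that lives only in the original site's database
# and is never published in the feed.
DB_KEYWORDS = {"Universal Extractor": ["uniextract"]}


def build_copy(feed_xml):
    """Build the scraper's private 'database': only the text the feed exposes."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title", ""), "text": item.findtext("description", "")}
        for item in root.iter("item")
    ]


def feed_search(posts, query):
    """Plain substring search over post text, roughly what a blog search does."""
    q = query.lower()
    return [p["title"] for p in posts if q in (p["title"] + " " + p["text"]).lower()]


def db_search(posts, query):
    """Search that can also see the hidden keywords, as the original site can."""
    q = query.lower()
    hits = set(feed_search(posts, query))
    hits.update(title for title, kws in DB_KEYWORDS.items() if q in kws)
    return sorted(hits)


posts = build_copy(SAMPLE_FEED)
print(feed_search(posts, "UniExtract"))  # feed-only copy: keyword is invisible
print(feed_search(posts, "Skype"))       # feed-only copy: matches visible text
print(db_search(posts, "UniExtract"))    # with DB access: keyword matches
```

A working search box therefore proves nothing about database access; only a query that depends on data absent from the feed can tell the two cases apart.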

I have no idea what you mean by "Simple scraping cannot do this." :?

@tactictoe

Please don't link to spam sites. You can disable automatic linking by selecting "Do not automatically parse URLs" below the "Post a reply" textarea.

Regarding your idea of restricting access to the database dump, I think it's unnecessary, at least for now. We can, and hopefully do, log who accesses the dump and when. If there are signs of someone downloading the dump too frequently (e.g. daily), then we can come up with countermeasures, such as restricting database dump access to users with the database editing privilege.
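As a rough sketch of what "downloading the dump too frequently" could mean in practice, assuming the access log can be read as (member ID, timestamp) pairs. The log format, the thresholds, and the function name are assumptions for illustration, not how the site actually logs anything.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def frequent_downloaders(accesses, max_downloads=3, window=timedelta(days=7)):
    """Return member IDs with more than max_downloads dump downloads
    inside any sliding time window of the given length."""
    by_member = defaultdict(list)
    for member_id, when in accesses:
        by_member[member_id].append(when)

    flagged = set()
    for member_id, times in by_member.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Advance the left edge until the span fits inside the window.
            while t - times[start] > window:
                start += 1
            if end - start + 1 > max_downloads:
                flagged.add(member_id)
                break
    return sorted(flagged)


# Made-up log entries: (member ID, time of dump download).
day = datetime(2016, 2, 1)
log = [(101, day + timedelta(days=d)) for d in range(7)]  # daily for a week
log += [(202, day), (202, day + timedelta(days=30))]      # occasional

print(frequent_downloaders(log))  # only member 101 exceeds the threshold
```

The thresholds are arbitrary; the point is that flagging only needs the member ID and time of each download, which is the logging already suggested above.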

joby_toss
Posts: 2902
Joined: Sat Feb 09, 2008 9:57 am
Location: Romania

#14 Post by joby_toss » Sat Feb 13, 2016 12:40 am

tactictoe wrote: Just a suggestion: why not put a disclaimer forbidding this type of incident on your website? I cannot find anything about it.
Isn't the © mark supposed to do just that?
Image

@Andrew: I'm not good at this kind of stuff, so my question is: are our user passwords compromised in any way?

tactictoe
Posts: 283
Joined: Thu Dec 10, 2015 10:56 am
Location: A galaxy far far downunder

#15 Post by tactictoe » Sat Feb 13, 2016 1:11 am

SYSTEM wrote: @tactictoe
Please don't link to spam sites. You can disable automatic linking by selecting "Do not automatically parse URLs" below the "Post a reply" text area.
Thanks. It won't happen again. Post modified.
joby_toss wrote: Isn't the © mark supposed to do just that?
According to a lawyer I know, for websites in general: yes and no (it all depends on where the law is applied, since copyright is defined differently in different countries).

US law (extract):
"...Fair use rights take precedence over the author's interest. Thus the copyright holder cannot use a non-binding disclaimer, or notification, to revoke the right of fair use on works. However, binding agreements such as contracts or licence agreements may take precedence over fair use rights..."

The dump file has no copyright notice, and that is what I was getting at.
Fair use in Australia alone would permit the use of this dump:
https://www.alrc.gov.au/publications/4- ... t-fair-use
SYSTEM wrote: If there are signs of someone downloading the dump too frequently (e.g. daily), then we can come up with countermeasures, such as restricting database dump access to users with the database editing privilege.
That is one way to stop a leak; even with multiple logins it can be stopped. :D

Law is so complicated, and I have not studied it; I am just interested.
Anyway, I was trying to help.
