Kuroi Web Design and Development | Twitter
(Questions answered in the forum only - so that any forum member can benefit - not by personal message)
The nonexistent URLs come from people linking to generated content. Many scraper sites use a dynamic platform and sessions to produce scraped content in real time, so the user gets URLs like
mysite.com/index.html?jhbdfb or similar. Because these pages are produced in real time, the scraper site's visitors link to pages that can never be reproduced.
A very common example is a site that displays content for folks at work behind a firewall: Work Friendly. It uses this type of platform, and until recently its results pages for each site were allowed to be indexed, in effect duplicating every page of every user-added site in its database.
Redirecting the user agent effectively prevents this, because the visitor is returned to the site of origin. Blocking them, even if you manage it, gains you nothing from the incident. Many of these sites crawl with one IP/user agent and serve results with another, so you are forced to search your logs to find the correct IP/agent to deny. If you instead catch these URLs as they are requested, you can turn this into a positive situation.
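As a rough illustration of catching these URLs as they are requested, here is a minimal Python sketch. It is not how Zen Cart or any particular server does it. It assumes you keep a whitelist of query parameters your site really uses; the `KNOWN_PARAMS` set and the `redirect_target` function are hypothetical names made up for this example. Anything with junk in the query string gets a clean URL back, which you would then send as a 301 redirect.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical whitelist: query parameters the real site understands.
# Adjust to whatever your own store actually uses.
KNOWN_PARAMS = {"main_page", "cPath", "products_id"}


def redirect_target(url, known_params=KNOWN_PARAMS):
    """Return the clean URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    # keep_blank_values=True so a bare token like "?jhbdfb" is seen as a key.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Drop every parameter the site does not recognise.
    kept = [(key, value) for key, value in pairs if key in known_params]
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return None if clean == url else clean


if __name__ == "__main__":
    print(redirect_target("http://mysite.com/index.html?jhbdfb"))
    print(redirect_target("http://mysite.com/index.php?main_page=index&cPath=3"))
```

The first call returns the clean page URL, so the scraper's visitor lands on a real page of yours. The second URL only carries known parameters, so it is left alone.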
~Melanie
PRO-Webs, Inc. since 2003 :: Zen Cart Hosting :: Zen Cart SEO – 12 Steps to Success
**I answer questions in the forum, private messages are not conducive to a helpful community.
Thanks for everybody's helpful responses.
One thing I have also noticed is that we did not have a custom 404 page up, i.e. a Zen Cart one with the sitemap, so requests were going to the server's default 404 error document instead. I'm not sure if that was causing problems with the SQL database.
Anyway, I've blocked the IP range for the time being. I'm a bit dubious about blocking the whole of China, even though 99% are unwelcome visitors. I do not want it to affect Google rankings in other Asian countries.
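For anyone wanting to check log entries against a blocked range before committing to a wider ban, here is a small Python sketch using the standard `ipaddress` module. The range shown is a placeholder from the reserved documentation address block, not the range actually blocked here, and `BLOCKED_RANGES` and `is_blocked` are hypothetical names for this example only.

```python
import ipaddress

# Placeholder range from the reserved documentation block (203.0.113.0/24).
# Replace with the ranges you have actually decided to deny.
BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(ip, ranges=BLOCKED_RANGES):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    for ip in ("203.0.113.57", "198.51.100.7"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Running the addresses from your logs through something like this shows how many legitimate visitors a given range would catch, which helps when deciding whether a narrow range is enough.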