Page 1 of 3
Results 1 to 10 of 28
  1. #1
    Join Date
    Mar 2010
    Location
    United Kingdom
    Posts
    19
    Plugin Contributions
    0

    Default Spiders being banned from my site

    About two years ago I decided to hire a freelancer to make a custom skin and a few mods for me. At the time I was not interested in SEO, so I wouldn't have noticed, but I suspect he blocked bots from indexing my site. I have also hired another freelancer within that time to do things for me, so it could have been him instead.

    Nonetheless, when I tried my hand at some SEO some nine months ago, I noticed that bots were unable to index my site. I eventually got help from a sysadmin friend, who found that one of the files was banning anything listed in spiders.txt from indexing my site. We argued about whether this was a normal function of Zen Cart, and his fix was simply to empty my spiders.txt file; foolishly, I did not note down which file he had found the offending code in. I can't get hold of him to ask him to take a look and find the file again, and I have downloaded all of my site files and searched a few keywords (such as "spider") to try to find the code again, without any luck. What can I do?

    My Who's Online page is bogged down by bots and extremely laggy.

  2. #2
    Join Date
    Nov 2007
    Location
    Sunny Coast, Australia
    Posts
    3,379
    Plugin Contributions
    9

    Default Re: Spiders being banned from my site

    Quote Originally Posted by Zedez512 View Post
    About two years ago I decided to hire a freelancer to make a custom skin and a few mods for me. At the time I was not interested in SEO, so I wouldn't have noticed, but I suspect he blocked bots from indexing my site. I have also hired another freelancer within that time to do things for me, so it could have been him instead.

    Nonetheless, when I tried my hand at some SEO some nine months ago, I noticed that bots were unable to index my site. I eventually got help from a sysadmin friend, who found that one of the files was banning anything listed in spiders.txt from indexing my site. We argued about whether this was a normal function of Zen Cart, and his fix was simply to empty my spiders.txt file; foolishly, I did not note down which file he had found the offending code in. I can't get hold of him to ask him to take a look and find the file again, and I have downloaded all of my site files and searched a few keywords (such as "spider") to try to find the code again, without any luck. What can I do?

    My Who's Online page is bogged down by bots and extremely laggy.
    Geeez, I hate it when someone makes unauthorized changes to a client's site!!! Seems like none of your 'helpers' has the slightest clue about the workings of ZenCart!

    What is your ZC version?
    Which mods are installed?

  3. #3
    Join Date
    Jan 2007
    Location
    Australia
    Posts
    6,167
    Plugin Contributions
    7

    Default Re: Spiders being banned from my site

    Quote Originally Posted by Zedez512 View Post
    I eventually got help from a sysadmin friend who found that one of the files had banned anything from spiders.txt from indexing my site. We argued about whether this was a normal function of zen cart and then his fix was to simply empty my spiders.txt file and I foolishly did not note down what file he had found the offending code in.
    Curious. Although theoretically possible, the mind boggles at what was needed to make the spiders.txt file act as though it were a robots.txt file.

    The spiders.txt file is a Zen Cart thing that prevents known spiders from creating SESSIONS. It doesn't (or isn't supposed to) prevent bots from crawling a site. That is what robots.txt is for.

    Quote Originally Posted by Zedez512 View Post
    I can't get a hold of him to ask him to take a look and find the file again and I have downloaded all of my site files and searched a few keywords (such as spider) to try finding the code again without any luck. What can I do?.
    IF the freelancer did somehow manage to make the spiders.txt file act like a robots.txt file s/he is both very clever and very foolish.
    I couldn't. Why would anyone do that????

    I wouldn't know where to start looking for this - So I won't even try. I'm hoping that the freelancer didn't make any actual *code* changes to pull this feat off.

    What I can do is tell you what you *should* have.

    ... but first.....

    Quote Originally Posted by Zedez512 View Post
    My who's online is bogged down by bots and extremely laggy.
    As a general rule, bots are a good thing. Without them your site is going to be near impossible to find. Be very careful you don't block them all.

    OK, so there are basically two files for you to check and consider.

    /includes/spiders.txt
    This file contains a simple list of known bots, one per line. This file is part of all Zen Cart releases - it isn't version specific - so check its contents; if it is empty (or near empty), just replace it with a copy from the Zen Cart distribution files. It will contain over 500 lines/entries. Not something to be edited by hand.

    This will take care of a lot of the Who's Online bot activity - but it doesn't stop the bots from visiting.

    The other file is robots.txt, which controls crawling of the site. This is located in the root folder of your store. It's quite a small file that would typically read like:

    ------------------------------
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /cache/
    Disallow: /logs/
    --------------------------------

    This is telling all 'good' bots to not crawl the folders that are disallowed.

    To disallow a specific bot from crawling anything on the site, you'll need to add something like:

    -----------------------------
    User-Agent: badbot
    Disallow: /
    -----------------------------

    You can find more examples here: http://www.robotstxt.org/robotstxt.html

    Restoring the spiders.txt file and judiciously editing the robots.txt file should get things back to how they *should* be again. If not, you are going to have to dig really deep into the Zen Cart code to see what the freelancer changed to make the spiders.txt file act like a robots.txt file. It *probably* won't come to that though, 'cos it is giving the freelancer more credit than they deserve. ;-)

    Important: Not all bots will 'honor' the contents of the robots.txt file (Google, Bing, and most others do). Those that don't need to be blocked via other means - generally via the .htaccess file(s), but there are other methods (such as using a firewall).
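    If you do go the .htaccess route, a blocking rule might look like the sketch below. 'BadBot' is a placeholder User-Agent string, and this assumes an Apache server with mod_rewrite enabled:

    -----------------------------
    # Return 403 Forbidden to any client whose User-Agent contains "BadBot"
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
    RewriteRule .* - [F,L]
    -----------------------------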

    Hopefully this helps get you back on track.

    Cheers
    RodG

  4. #4
    Join Date
    Mar 2010
    Location
    United Kingdom
    Posts
    19
    Plugin Contributions
    0

    Default Re: Spiders being banned from my site

    My robots.txt and spiders.txt are fine, I'm 100% sure.

    What happens is that anything listed in spiders.txt (or, I think, anything identified as a spider) is being denied access to my website. So the current "fix" to allow spiders to index my site is to have an empty spiders.txt (I have kept a copy with entries, though).

    Basically, the issue is that if I have entries in spiders.txt, those bots are banned from indexing my site, and when I don't have those entries, Who's Online is completely clogged up with 2,000 bots, so I can't actually see who is online with an active cart.

    I really wish I had noted down the file which my friend showed me. I remember clearly seeing a couple of lines which were specifically banning spiders. I'm not sure what keywords I can search to track down this code within my files, since "spider" doesn't seem to be finding it. I have already searched "deny" and "die" and didn't see anything out of the ordinary there.

    I guess I am asking for words/code which I can search to try to find this, so I don't have to spend hundreds of dollars having someone dig through every file for me.

  5. #5
    Join Date
    Nov 2007
    Location
    Sunny Coast, Australia
    Posts
    3,379
    Plugin Contributions
    9

    Default Re: Spiders being banned from my site

    Quote Originally Posted by Zedez512 View Post
    My robots.txt and spiders.txt are fine, I'm 100% sure.

    What happens is that anything listed in spiders.txt (or, I think, anything identified as a spider) is being denied access to my website. So the current "fix" to allow spiders to index my site is to have an empty spiders.txt (I have kept a copy with entries, though).

    Basically, the issue is that if I have entries in spiders.txt, those bots are banned from indexing my site, and when I don't have those entries, Who's Online is completely clogged up with 2,000 bots, so I can't actually see who is online with an active cart.

    I really wish I had noted down the file which my friend showed me. I remember clearly seeing a couple of lines which were specifically banning spiders. I'm not sure what keywords I can search to track down this code within my files, since "spider" doesn't seem to be finding it. I have already searched "deny" and "die" and didn't see anything out of the ordinary there.

    I guess I am asking for words/code which I can search to try to find this, so I don't have to spend hundreds of dollars having someone dig through every file for me.
    You could do a comparison of your file system with a stock ZC file system of the same version. Use BeyondCompare or WinMerge or similar. Tedious, but it will show the changes your guys made to the cart.
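    On the command line, plain diff can do the same comparison (BeyondCompare and WinMerge are GUI equivalents). A sketch, with two fabricated folders standing in for a stock Zen Cart tree and your live one:

    ```shell
    # Fabricated trees for illustration only - point diff at your real folders.
    mkdir -p stock_zc modified_zc
    echo 'original line' > stock_zc/init_sessions.php
    printf 'original line\nhacked line\n' > modified_zc/init_sessions.php

    # -r recurse, -q just name the files that differ (drop -q for full diffs).
    diff -rq stock_zc modified_zc || true
    ```

    Every file diff names is one your freelancers touched, so the hack has to be in that (usually short) list.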

  6. #6
    Join Date
    Jan 2007
    Location
    Australia
    Posts
    6,167
    Plugin Contributions
    7

    Default Re: Spiders being banned from my site

    Quote Originally Posted by Zedez512 View Post
    My robots.txt and spiders.txt are fine, I'm 100% sure.
    I'd be happier if you were 200% sure :-)

    Quote Originally Posted by Zedez512 View Post
    I really wish that I had noted down the file which my friend showed me. I remember clearly seeing a couple of lines which were specifically banning spiders.
    If *I* were going to do a hack like this (and being a 'lazy' programmer) I'd probably hook into where the spiders.txt is already being read, which is the file

    /includes/init_includes/init_sessions.php

    My suggestion would be to compare this file with a ZenCart original.

    Quote Originally Posted by Zedez512 View Post
    I'm not sure what keywords I can search to track down this code within my files since "spider" doesn't seem to be showing it. I have already searched "deny" and "die" and didn't see anything out of the ordinary there.
    The 'obvious' thing to search for would be 'spiders.txt' - You should find ONE and only one file that contains this (the one I've mentioned above).

    If you didn't even find this one with your search, then you won't find any others either; something is amiss with your search. (I used the Developers Toolkit. What did you use?)
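    A sketch of such a search from the command line, using a throwaway demo_site folder with fabricated contents to stand in for the downloaded site files (run grep against your own folder instead):

    ```shell
    # Stand-in for the downloaded site files (demo only).
    mkdir -p demo_site/includes/init_includes
    cat > demo_site/includes/init_includes/init_sessions.php <<'EOF'
    <?php
    $spiders = file(DIR_WS_INCLUDES . 'spiders.txt');
    EOF

    # -r recurse, -i case-insensitive, -l list matching file names only.
    grep -ril "spiders.txt" demo_site

    # Other strings a bot-blocking hack is likely to use are worth trying too.
    grep -rilE "HTTP_USER_AGENT|is_spider|403|forbidden" demo_site || true
    ```

    The first grep should list every file that mentions spiders.txt; on a stock cart that is init_sessions.php alone.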

    Quote Originally Posted by Zedez512 View Post
    I guess I am asking for words/code which I can search to try finding this so I don't have to spend hundreds of dollars having someone dig through every file for me.
    The way I see it, although the code could have been hacked to behave as described in a number of places, it could only work as described with *some* reference to the spiders.txt file itself - So if the hack hasn't been performed on the init_sessions.php file there *must* be another file that also contains 'spiders.txt'.

    Accept this as an indisputable fact, and take it from there. :-)

    Cheers
    RodG

  7. #7
    Join Date
    Mar 2010
    Location
    United Kingdom
    Posts
    19
    Plugin Contributions
    0

    Default Re: Spiders being banned from my site

    The only reference to spiders.txt is in fact in that file, and it's no different from the original.

    When spiders.txt is empty, my site can be indexed, but we don't want spiders to have sessions, do we?
    When spiders.txt has entries like normal, all bots get an internal server error (500).

    I just did some testing. I changed the line to:
    $spiders = file(DIR_WS_INCLUDES . 'newfile.txt');
    and I also changed $spiders to something else, and bots still get error 500 in both cases.

  8. #8
    Join Date
    Nov 2007
    Location
    Sunny Coast, Australia
    Posts
    3,379
    Plugin Contributions
    9

    Default Re: Spiders being banned from my site

    Did you look at my suggestion in post #5 of this thread?

  9. #9
    Join Date
    Mar 2010
    Location
    United Kingdom
    Posts
    19
    Plugin Contributions
    0

    Default Re: Spiders being banned from my site

    Quote Originally Posted by frank18 View Post
    Did you look at my suggestion in post #5 of this thread?
    I will have a go at this but what if it is inside the new template files? The problem has persisted across two servers and the site was indexed just fine (under a different domain, I went for a new look and a new name) before I had any work done.

    Now that I think about it, I did have him make it so that customers were always logged in when returning to the site (or never logged out, unless they manually log out) so that they wouldn't lose their cart session, as most were unaware they would lose it when inactive and not logged in. Is it possible that he has done something to sessions which is causing a server error when bots are not given a customer session id?
    Last edited by Zedez512; 29 Nov 2015 at 01:03 PM.

  10. #10
    Join Date
    Nov 2007
    Location
    Sunny Coast, Australia
    Posts
    3,379
    Plugin Contributions
    9

    Default Re: Spiders being banned from my site

    Quote Originally Posted by Zedez512 View Post
    I will have a go at this but what if it is inside the new template files? The problem has persisted across two servers and the site was indexed just fine (under a different domain, I went for a new look and a new name) before I had any work done.

    Now that I think about it, I did have him make it so that customers were always logged in when returning to the site (or never logged out, unless they manually log out) so that they wouldn't lose their cart session, as most were unaware they would lose it when inactive and not logged in. Is it possible that he has done something to sessions which is causing a server error when bots are not given a customer session id?
    Hard to say without knowing what he has actually done.

    What do you see in the error logs ( /logs folder )?
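    As a sketch of that check (myDEBUG-*.log is the usual Zen Cart debug-log naming; the folder and log line below are fabricated for illustration - run this against your store's real /logs folder):

    ```shell
    # Stand-in /logs folder with one fabricated debug log (demo only).
    mkdir -p logs
    echo '[29-Nov-2015 13:01:00] PHP Fatal error: example only' > logs/myDEBUG-20151129.log

    # Newest logs first; the top entries usually show what triggers the 500.
    ls -t logs/ | head -n 5
    tail -n 20 "logs/$(ls -t logs/ | head -n 1)"
    ```

    The timestamp in the newest log's name should line up with a bot hit, and the fatal error it records points at the file causing the 500.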

 

 