  1. #1
    Join Date
    May 2008
    Posts
    8
    Plugin Contributions
    0

    please explain spider.txt

    Can someone explain the spider.txt file please? I see it in the includes folder, and it's loaded with over 396 entries, including googlebot, yahoo, Microsoft and other common search engines.

    From what I can tell, spider.txt is used to block these bots, right? I can understand wanting to block malicious bots, but I want the search engines to index my site. If I have these popular search engines in spider.txt, does that mean my site won't get indexed?

    What’s the best way to protect my site, but still allow it to be indexed by the search engines?

    Thanks

  2. #2
    Join Date
    Jan 2004
    Posts
    66,444
    Plugin Contributions
    279

    Re: please explain spider.txt

    The spiders.txt file is used to *identify* bots when they visit the site, so that the site knows that they're bots and lets them do the indexing that's needed.
    It is not used to block them from indexing your site. It only blocks their ability to place purchases on your site ... since that would confuse things significantly.
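A minimal sketch of that idea, written in Python for illustration rather than Zen Cart's own PHP. It assumes each spiders.txt entry is compared as a case-insensitive substring of the visitor's user agent, and that a match means no session is started. The entry list and the function names `is_spider` and `should_start_session` are hypothetical, not taken from Zen Cart.

```python
# Hypothetical sample of spiders.txt entries: one lowercase token per line.
SPIDER_ENTRIES = ["googlebot", "yahoo! slurp", "slurp", "msnbot"]


def is_spider(user_agent, entries=SPIDER_ENTRIES):
    """Return True if any non-empty entry occurs in the lowercased user agent."""
    ua = user_agent.lower()
    for entry in entries:
        token = entry.strip().lower()
        if token and token in ua:
            return True
    return False


def should_start_session(user_agent):
    """Identified spiders get no session: they can still browse and index
    pages, but they cannot hold a cart or place an order."""
    return not is_spider(user_agent)


bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
shopper = "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"
print(should_start_session(bot))      # False
print(should_start_session(shopper))  # True
```

Nothing here refuses the request from the bot; the match only changes whether a session is created, which is why listing a search engine does not stop it from indexing the site.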

  3. #3
    Join Date
    May 2008
    Posts
    8
    Plugin Contributions
    0

    Re: please explain spider.txt

    Ahh, OK. That makes me feel better.

  4. #4
    Join Date
    Aug 2008
    Posts
    115
    Plugin Contributions
    0

    Re: please explain spider.txt

    I recently noticed a logged-in "guest" entry in "Who's Online" with a host of searchme.com and with charlotte in its user agent line. Checking spiders.txt revealed no entry for either searchme or charlotte (searchme's spider).

    By comparison, I noticed that yahoo had a spiders.txt entry of:
    "yahoo! slurp"
    ...as well as an apparent entry of just:
    "slurp"

    Would someone outline the procedure for correctly identifying and adding a new spider to spiders.txt?

    Should one add "charlotte" to spiders.txt, or "searchme charlotte", or both?

    Or is there someplace online that publishes an up-to-date list of all currently known spiders?

    (I wish I'd copied searchme's Host and User Agent entry as it appeared in admin, but they're no longer online.)

    Thanks for any feedback...

  5. #5
    Join Date
    Aug 2008
    Posts
    115
    Plugin Contributions
    0

    Re: please explain spider.txt

    Ah, it's back ...here's the sample:

    Host: crawl1.nat.svl.searchme.com
    User Agent: Mozilla/5.0 (compatible; Charlotte/1.1; http://www.searchme.com/support/)

    I added "charlotte" to my spiders list and the Zen Cart admin now identifies them as a spider ...and no session! However, short of trial and error, my questions above still stand:

    1) Would someone outline the procedure for correctly identifying and adding a new spider to spiders.txt, and how to know what text to include?

    2) Is there someplace online that publishes an up-to-date list of all currently known spiders?
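One way to take some of the trial and error out of choosing the text is to check candidate entries against the user agent exactly as it was logged. The sketch below is Python for illustration only, and it assumes spiders.txt entries are matched as case-insensitive substrings of the user agent, which is consistent with "charlotte" working above but is not confirmed in this thread. The helper name `matches` and the Yahoo user agent string are illustrative assumptions.

```python
def matches(entry, user_agent):
    """True if the entry occurs anywhere in the user agent, ignoring case."""
    token = entry.strip().lower()
    return bool(token) and token in user_agent.lower()


# User agent copied from the Who's Online sample above.
charlotte_ua = ("Mozilla/5.0 (compatible; Charlotte/1.1; "
                "http://www.searchme.com/support/)")

for candidate in ["charlotte", "searchme", "searchme charlotte"]:
    print(candidate, "->", matches(candidate, charlotte_ua))

# Illustrative Yahoo user agent (an assumption, not taken from this thread).
yahoo_ua = ("Mozilla/5.0 (compatible; Yahoo! Slurp; "
            "http://help.yahoo.com/help/us/ysearch/slurp)")

for candidate in ["yahoo! slurp", "slurp"]:
    print(candidate, "->", matches(candidate, yahoo_ua))
```

Under that assumption, "charlotte" and "searchme" both match the logged string, while "searchme charlotte" does not, because those two words never appear next to each other in it. A short, distinctive token such as "slurp" also covers the longer "yahoo! slurp", which may be why both appear in the list.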
    Last edited by stride-r; 24 Jan 2009 at 12:09 AM.

  6. #6
    Join Date
    Oct 2009
    Location
    Bronx, NY
    Posts
    22
    Plugin Contributions
    0

    Re: please explain spider.txt

    This post helped tremendously. I had originally set 'Force Cookie Use' under 'sessions' to true after I tested adding to cart from a different computer and ran into the warning that cookies must be enabled (they already were on that computer). I set it back to false and used the updated spiders.txt file DrByte provided. It looks to be working now, and I much prefer it this way. Thanks!

 

 
