spiders.txt still not catching fatbot (thefind.com)

**litepockets** · 30 Dec 2011, 04:08 AM

OK, some quick background: I've always seen thefind.com crawling my site in whos_online.php, but the IPs have never been marked as a spider, despite my best efforts to correctly add the UA to the spiders.txt file.

I have scoured the forums and found the latest spiders.txt file that Dr. Byte posted in the forum, which to my knowledge contains the header:
$Id: spiders.txt 18372 2011-02-08 21:21:42Z drbyte $

thefind.com's bot UA is FatBot 2.0, or more specifically: Mozilla/5.0 (compatible; FatBot 2.0; http://www.thefind.com/crawler)

Based on the above referenced spiders.txt file, the "Bot" part of "FatBot" should trigger the admin to identify it as a spider, since the very first line after the header in spiders.txt is, in fact, "bot". Needless to say, it doesn't. So of course, the next step would be to simply add "fatbot" on a new line of the spiders.txt file. OK, did it, no dice. "FatBot 2.0"? Nope, no dice.

This really wouldn't be a problem for me, but at any given time I have a dozen or so different IPs from thefind crawling my site, and each one is being assigned a new session ID. It's really starting to **** me off. What the heck do I need to do to prevent sessions for this dang thing?

BTW, applicable sessions settings are:
cookie domain: true
force cookie use: true (full ssl - not on shared ip)
check ssl session id: false
check user agent: false
check ip address: false
prevent spider sessions: true
recreate session: true
ip to host conv status: true

Please note: IP addresses for true guests shown in the screenshot are blurred intentionally to protect their anonymity, and full domain paths are blurred due to the adult nature of the website (I also don't feel the path is germane to the issue at hand). But as you'll see, each IP from thefind has its own unique session. Googlebot is correctly pinned as a spider, so why not FatBot?

**Ajeh** · 30 Dec 2011, 05:12 AM

Those are caught on my site without any issues ...

Check your file:
/includes/spiders.txt

and make sure that the first bot under the ID is:
bot

**litepockets** · 30 Dec 2011, 07:20 AM

Originally Posted by litepockets

Based on the above referenced spiders.txt file, the "Bot" part of "FatBot" should trigger the admin to identify it as a spider since the very first line after the header in spiders.txt is, in fact, "bot".

Hi Ajeh, I referenced that in the OP. Not sure what the heck the issue is.

**Website Rob** · 30 Dec 2011, 08:17 AM

litepockets,
$Id: spiders.txt 18372 2011-02-08 21:21:42Z drbyte $ - is 1.5

$Id: spiders.txt 16983 2010-07-25 17:40:59Z drbyte $ - is 1.3.9h

If using ZC v1.3.9h then why not try using the appropriate file?

Ajeh, I noticed this in 1.3.9h and 1.5 RC3, 'spiders.txt' file.

gulperbot
hï¿¤mï¿¤hï¿¤kki
hÃ¤mÃ¤hÃ¤kki
hamahakki

Am I missing something or are those actual Robot names that will be recognized?

**DrByte** · 30 Dec 2011, 03:21 PM

litepockets,
Perhaps your site's code isn't fully recognizing mixed-case names. So, "bot" won't match the "Bot" in "fatBot".
Workaround: since "thefind.com" is mentioned in the UA string and is all lowercase, you can add "thefind" to the end of your own spiders.txt file.
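The case-sensitivity hypothesis above can be checked with a small sketch. This is a hypothetical illustration in Python, not Zen Cart's actual code: the function names are made up for the example, and the Googlebot UA string is included only for comparison. It shows why a case-sensitive substring test would let "bot" catch Googlebot but miss FatBot, and why "thefind" works either way.

```python
# Hypothetical illustration of the matching hypothesis; not Zen Cart's code.
FATBOT_UA = "Mozilla/5.0 (compatible; FatBot 2.0; http://www.thefind.com/crawler)"
# Googlebot UA, added here only for comparison (not quoted in this thread).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def matches_case_sensitive(user_agent: str, entry: str) -> bool:
    """Plain substring test: 'bot' does not match 'Bot'."""
    return entry in user_agent


def matches_case_insensitive(user_agent: str, entry: str) -> bool:
    """Lowercase both sides before the substring test."""
    return entry.lower() in user_agent.lower()


if __name__ == "__main__":
    for name, ua in (("FatBot", FATBOT_UA), ("Googlebot", GOOGLEBOT_UA)):
        for entry in ("bot", "thefind"):
            print(
                name,
                repr(entry),
                "case-sensitive:", matches_case_sensitive(ua, entry),
                "case-insensitive:", matches_case_insensitive(ua, entry),
            )
```

If the site really compares case-sensitively, only an entry that appears in the UA in exactly that case (such as "thefind") can match, which is the reasoning behind the workaround.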

**litepockets** · 2 Jan 2012, 10:45 PM

Originally Posted by Website Rob

litepockets,
$Id: spiders.txt 18372 2011-02-08 21:21:42Z drbyte $ - is 1.5

$Id: spiders.txt 16983 2010-07-25 17:40:59Z drbyte $ - is 1.3.9h

If using ZC v1.3.9h then why not try using the appropriate file?

Rob, I appreciate the input, but I downloaded the file per Dr. Byte's suggestion in this post: http://www.zen-cart.com/forum/showpo...5&postcount=32. As you can see, he stated to use that version, which is what was going to be used in v1.5 (turns out that the 1.5 version has been modified from this, but nonetheless it's still Dr. Byte's suggested version for 1.3.9h).

Originally Posted by DrByte

litepockets,
Perhaps your site's code isn't fully recognizing mixed-case names. So, "bot" won't match the "Bot" in "fatBot".
Workaround: since "thefind.com" is mentioned in the UA string and is all lowercase, you can add "thefind" to the end of your own spiders.txt file.

Dr. Byte, thanks, I'll give "thefind" a try. I had recently tried "thefind.com" but had no success. I'll report back.

**litepockets** · 2 Jan 2012, 11:19 PM

Dr. Byte,

The addition of "thefind" to the end of the spiders.txt file seemed to bring about mixed results...

Good News: for the first time, I actually saw their dang bot correctly identified as a spider.

Bad News:
1. Correct identification isn't consistent. It reported 1 IP with the fatbot UA string as a spider, and another IP with an identical UA string as a guest (chronologically AFTER the one that was correctly reported).
2. Googlebot is suddenly not being reported as a spider.
3. It would appear that a true guest is being incorrectly tagged as a spider.

Screenshot:

Attached: my current spiders.txt

Also worth noting, I have already followed the suggestions given by Ajeh in these posts:
http://www.zen-cart.com/forum/showpo...52&postcount=4
http://www.zen-cart.com/forum/showpo...28&postcount=5

**DrByte** · 3 Jan 2012, 03:01 AM

Sounds like something wrong with the many addons you've got.
I can't replicate your problem here.

**litepockets** · 3 Jan 2012, 03:11 AM

Dr. Byte,

Thank you for trying. Any general idea of which file(s) to examine other than those outlined?

**mcqueeneycoins** · 19 Jun 2012, 12:28 PM

I'm having the same trouble the other users have experienced with this spider not being recognized as a spider. I have added thefind, thefind.com, fatlens, fatbot, FatBot, and any possible combination I can think of to my spiders.txt file, but no luck. At the moment, this spider has 25 separate pages/sessions running. Is there any other way I can get this properly classified?

Thanks!
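One way to narrow this down offline is to run the spiders.txt entries against the UA string copied from whos_online.php. The sketch below is a hypothetical Python helper, not part of Zen Cart. It assumes the check lowercases the UA and does a plain substring test against each trimmed line, skipping the `$Id:` header; that behavior is an assumption and is not confirmed anywhere in this thread. Under that assumption, an entry containing capitals (such as "FatBot") could never match, so the helper flags those too.

```python
# Hypothetical diagnostic; the matching rules here are assumptions.
def load_entries(lines):
    """Return trimmed, non-empty entries, skipping the $Id: header line."""
    entries = []
    for line in lines:
        entry = line.strip()  # also removes stray \r from Windows line endings
        if not entry or entry.startswith("$Id:"):
            continue
        entries.append(entry)
    return entries


def diagnose(user_agent, lines):
    """Report which entries match the lowercased UA and which contain capitals."""
    ua = user_agent.lower()
    entries = load_entries(lines)
    return {
        "matched": [e for e in entries if e in ua],
        "has_uppercase": [e for e in entries if e != e.lower()],
    }


if __name__ == "__main__":
    sample_lines = [
        "$Id: spiders.txt 18372 2011-02-08 21:21:42Z drbyte $\n",
        "bot\r\n",
        "FatBot\n",
        "thefind\n",
        "fatlens\n",
    ]
    ua = "Mozilla/5.0 (compatible; FatBot 2.0; http://www.thefind.com/crawler)"
    print(diagnose(ua, sample_lines))
```

To test a real file, pass `open("includes/spiders.txt").readlines()` as `lines`. If entries match here but the store still opens sessions for the bot, the problem is more likely in the session-handling code or an addon than in spiders.txt itself.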
