  1. #1
    Join Date
    Apr 2013
    Location
    United States
    Posts
    11
    Plugin Contributions
    0

    canonicallink url improperly sanitized

    The canonicalLink on product pages gets created on line 43 of includes/init_includes/init_canonical by zen_href_link:

    Code:
    39 /**
    40  * for products (esp those linked to multiple categories):
    41  */
    42   case (strstr($current_page, '_info') && isset($_GET['products_id'])):
    43     $canonicalLink = zen_href_link($current_page, ($includeCPath ? 'cPath=' . zen_get_generated_category_path_rev(zen_get_products_category_id($_GET['products_id'])) . '&' : '') . 'products_id=' . $_GET['products_id'], 'NONSSL', false);
    44     break;
    This function is defined in includes/functions/html_output.php. The problem is that just before returning, the URL gets sanitized on line 94, replacing & with &amp;
    Code:
    94     $link = preg_replace('/&/', '&amp;', $link);
    The result in the document head is like
    Code:
    <link rel="canonical" href="http://example.com/index.php?main_page=product_info&amp;products_id=1" />
    instead of

    Code:
    <link rel="canonical" href="http://example.com/index.php?main_page=product_info&products_id=1" />
    Googlebot then attempts to follow the link, which triggers a missing resource (404) error on the web server.


    Any suggestions on how to fix this with minimal effort? I could add yet another argument to zen_href_link and use it to skip the sanitization, but maybe there's a better way. Maybe there is no need to use zen_href_link at all to create that link, but I am too unfamiliar with Zen Cart to know whether that's really the case.

  2. #2
    Join Date
    Jan 2004
    Posts
    66,373
    Blog Entries
    7
    Plugin Contributions
    274

    Re: canonicallink url improperly sanitized

    Quote Originally Posted by moogawooga View Post
    Googlebot then attempts to follow the link, which triggers a missing resource (404) error on the web server.
    Changing &amp; to & in an href is opposite to pretty much every piece of guidance on proper formatting.

    How exactly did you arrive at the conclusion that this is specifically what's causing a 404 for Googlebot? Excessive details encouraged.

    Zen Cart - putting the dream of business ownership within reach of anyone!
    Donate to: DrByte directly or to the Zen Cart team as a whole

    Remember: Any code suggestions you see here are merely suggestions. You assume full responsibility for your use of any such suggestions, including any impact ANY alterations you make to your site may have on your PCI compliance.
    Furthermore, any advice you see here about PCI matters is merely an opinion, and should not be relied upon as "official". Official PCI information should be obtained from the PCI Security Council directly or from one of their authorized Assessors.

  3. #3
    Join Date
    Apr 2013
    Location
    United States
    Posts
    11
    Plugin Contributions
    0

    Re: canonicallink url improperly sanitized

    My mistake about who triggers the 404: I must have looked at the next line for the user agent.
    Nevertheless, the errors are still there, triggered by non-bots. Here's a representative error from the Apache log:

    Code:
    198.241.217.15 - - [29/Apr/2013:20:02:43 -0400] "GET /index.php?main_page=product_info&amp;products_id=1145 HTTP/1.1" 404 36785 "http://MYSITE/index.php?main_page=product_info&cPath=64&products_id=1145&zenid=f31648d257b3a7eb6aa129dee97fcb31" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"
    The way I tracked it down was by looking at the source code of the referring URL (the second URL in the log entry): the only place where the URL that triggers the error appeared was the canonical link in the head. And of course, I double-checked that a browser (tested in Chrome, Firefox & Safari) is happy with http://MYSITE/index.php?main_page=product_info&products_id=1145 (gets to the correct product) and unhappy with http://MYSITE/index.php?main_page=product_info&amp;products_id=1145 (404, and Zen Cart returns 'product not found').
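
    For anyone who wants to check their own logs the same way, here is a rough PHP sketch that splits an Apache "combined" log entry into its request and referrer and flags requests that still contain a literal &amp;. The regular expression and the $entry array are my own illustration, not part of Zen Cart, and the referrer is shortened (zenid dropped) for readability.

```php
<?php
// Shortened copy of the log entry above (zenid removed from the referrer).
$line = '198.241.217.15 - - [29/Apr/2013:20:02:43 -0400] '
      . '"GET /index.php?main_page=product_info&amp;products_id=1145 HTTP/1.1" 404 36785 '
      . '"http://MYSITE/index.php?main_page=product_info&cPath=64&products_id=1145" '
      . '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"';

// Combined format: host ident user [time] "method path protocol" status bytes "referer" "agent"
$pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/';

$entry = null;
if (preg_match($pattern, $line, $m)) {
    $entry = [
        'request' => $m[4],          // the URL that was actually requested
        'status'  => (int) $m[5],    // HTTP status code
        'referer' => $m[7],          // the page the request came from
        'agent'   => $m[8],          // user agent string
    ];
    // Flag requests whose URL still contains an undecoded "&amp;".
    $entry['has_literal_entity'] = strpos($entry['request'], '&amp;') !== false;
    print_r($entry);
}
```

    Note that the cPath appears only in the referrer, not in the URL that returned the 404.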

  4. #4
    Join Date
    Apr 2013
    Location
    United States
    Posts
    11
    Plugin Contributions
    0

    Re: canonicallink url improperly sanitized

    Forgot to add that once I saw the issue, I checked that it exists in a vanilla install of ZC 1.5.1. The URL in the canonical link contains &amp; instead of &, and is thus invalid.

  5. #5
    Join Date
    Apr 2013
    Location
    United States
    Posts
    11
    Plugin Contributions
    0

    Re: canonicallink url improperly sanitized


    Quote Originally Posted by DrByte View Post
    Changing &amp; to & in an href is opposite to pretty much every piece of guidance on proper formatting.

    I do not think this has anything to do with 'formatting', as having a correct URL is unrelated to the representation of ASCII symbols in HTML.


    P.S. Sorry for the multiple responses. I hate vBulletin, and I'm not accustomed to a 7-minute edit window.
    Last edited by moogawooga; 30 Apr 2013 at 09:20 AM.

  6. #6
    Join Date
    Aug 2005
    Location
    Vic, Oz
    Posts
    1,905
    Plugin Contributions
    5

    Re: canonicallink url improperly sanitized

    Quote Originally Posted by moogawooga View Post
    My mistake about who triggers the 404: I must have looked at the next line for the user agent.
    Nevertheless, the errors are still there, triggered by non-bots. Here's a representative error from the Apache log:

    Code:
    198.241.217.15 - - [29/Apr/2013:20:02:43 -0400] "GET /index.php?main_page=product_info&amp;products_id=1145 HTTP/1.1" 404 36785 "http://MYSITE/index.php?main_page=product_info&cPath=64&products_id=1145&zenid=f31648d257b3a7eb6aa129dee97fcb31" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"
    The way I tracked it down was by looking at the source code of the referring URL (the second URL in the log entry): the only place where the URL that triggers the error appeared was the canonical link in the head. And of course, I double-checked that a browser (tested in Chrome, Firefox & Safari) is happy with http://MYSITE/index.php?main_page=product_info&products_id=1145 (gets to the correct product) and unhappy with http://MYSITE/index.php?main_page=product_info&amp;products_id=1145 (404, and Zen Cart returns 'product not found').
    You are "partially" right.

    This is the correct URL to put directly into a browser's address bar:
    Code:
    http://MYSITE/index.php?main_page=product_info&products_id=1145
    This will cause an error if put directly into a browser's address bar:
    Code:
    http://MYSITE/index.php?main_page=product_info&amp;products_id=1145
    However, this second form (with &amp;) is the correct way to write the link in a web page's HTML.
    Tens of thousands of Zen Cart sites using this code are working perfectly!
    So there is something else going on here.
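
    To illustrate why the address-bar test fails, here is a minimal PHP sketch (not Zen Cart code; the variable names are made up for the example) of how PHP splits the query string in each case. It assumes PHP's default arg_separator.input of "&".

```php
<?php
// Query string as typed into the address bar with a literal "&amp;".
$typedWithEntity = 'main_page=product_info&amp;products_id=1145';
// Query string after a browser has decoded "&amp;" from an href.
$decodedFromHtml = 'main_page=product_info&products_id=1145';

parse_str($typedWithEntity, $broken);
parse_str($decodedFromHtml, $working);

// With the literal entity, the second parameter name starts with "amp;",
// so there is no "products_id" key for the product page to read.
var_dump(isset($broken['products_id']));   // bool(false)
var_dump(isset($working['products_id']));  // bool(true)
```

    That matches the 'product not found' page seen when the &amp; form is typed in by hand: the request arrives without a usable products_id.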

    Link to your site please?
    Last edited by gilby; 30 Apr 2013 at 09:45 AM.

  7. #7
    Join Date
    Jan 2004
    Posts
    66,373
    Blog Entries
    7
    Plugin Contributions
    274

    Re: canonicallink url improperly sanitized

    Actually, unless you've changed the core code to force the inclusion of cPath in the canonical URL for the product page, there would not be a cPath in it. Thus, the "error" you quoted from your logs couldn't have come from the canonical link.

    Thus it suggests you've either changed the core code, or you've got something else that's causing the errors which you claim are invalid.

    I'm inclined to think your 404 errors are triggered by something else. You removed your URL from your logs, so we can't actually test the true URL to be certain about the validity of the rest of the data.

  8. #8
    Join Date
    Feb 2012
    Location
    mostly harmless
    Posts
    1,809
    Plugin Contributions
    8

    Re: canonicallink url improperly sanitized

    gilby and DrByte are correct. Here are some reference documents from the W3C on how to handle an ampersand that appears in the URI of an element attribute (such as inside "<a href="URI"></a>"), and from the W3C validation service help.

    Why? Because & is a reserved character in (X)HTML. It signals the start of an "HTML (or XHTML) entity". When the web browser (or search engine bot) parses the (X)HTML, it will read "&amp;", know this is the entity representing "&", and correctly load / follow the link (using "&" instead of "&amp;").

    As gilby pointed out, when you TYPE a link into a browser window, the URI is not being read from (X)HTML (and thus does not need to be parsed) and is used "as-is". So when you type something into the browser's address bar, you would use just "&", not "&amp;", in the URI.

    I'd guess there is something else that differs on your web server or Zen Cart installation which is the root cause of the 404s.
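
    If you want to see the decoding step for yourself, here is a small PHP sketch that feeds the canonical link from the first post to an HTML parser and reads the href back. It uses PHP's bundled DOM extension purely as a stand-in for what a browser or bot does; the $html, $href and $raw names are just for this example.

```php
<?php
// The markup as it appears in the page source, ampersand written as an entity.
$html = '<html><head><link rel="canonical" '
      . 'href="http://example.com/index.php?main_page=product_info&amp;products_id=1" />'
      . '</head><body></body></html>';

libxml_use_internal_errors(true);   // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

// The parser decodes entities in attribute values while reading the markup.
$href = $doc->getElementsByTagName('link')->item(0)->getAttribute('href');
echo $href, PHP_EOL;

// The same decoding step, done by hand on the raw attribute text.
$raw = 'http://example.com/index.php?main_page=product_info&amp;products_id=1';
echo html_entity_decode($raw, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

    Both lines print the URL with a plain "&", which is the URL a client would actually request.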
    The glass is not half full. The glass is not half empty. The glass is simply too big!
    Where are the Zen Cart Debug Logs? Where are the HTTP 500 / Server Error Logs?
    Zen Cart related projects maintained by lhûngîl : Plugin / Module Tracker
