  1. #1
    Join Date
    Sep 2008
    Location
    Cleethorpes
    Posts
    1,227
    Plugin Contributions
    6

    Strip HTML and WORD formatting

    Does anybody know of a reliable way to strip out redundant HTML tags, inline styles and Word formatting from product descriptions?

    For example:

    <span style="font-family: verdana, arial, sans-serif">MY CONTENT</span>

    would become:

    MY CONTENT

    or

    <p>MY CONTENT</p>
    <p></p>
    <p></p>

    would become

    <p>MY CONTENT</p>

    or

    <br /><br /><br />MY CONTENT

    would become

    <br />MY CONTENT

    or

    <SPAN lang=EN-IE style="mso-ansi-language: EN-IE">
    <p class="MSO Normal">
    <UL style="MARGIN-TOP: 0cm" type=circle><li class=MsoNormal style='mso-list:l3 level1 lfo3;tab-stops:list 36.0pt'>
    <o:p>MY CONTENT</o:p></li></p></SPAN>

    would become

    <p>MY CONTENT</p>


    I found the two things below but am not sure how to implement them. I've tried both and neither appears to work, though it could be something I am doing wrong. I would like to target the '.mainDescription' content in product descriptions.

    This link explains the approach: http://tim.mackey.ie/CommentView,gui...6602d5718.aspx

    and also this code:

    function cleanHTML(input) {
      // 1. Replace line breaks and Mso classes with a space
      var stringStripper = /(\n|\r| class=(")?Mso[a-zA-Z]+(")?)/g;
      var output = input.replace(stringStripper, ' ');
      // 2. Strip Word-generated HTML comments
      var commentStripper = new RegExp('<!--(.*?)-->', 'g');
      output = output.replace(commentStripper, '');
      // 3. Remove these tags but leave any content between them
      var tagStripper = new RegExp('<(/)*(meta|link|span|\\?xml:|st1:|o:|font)(.*?)>', 'gi');
      output = output.replace(tagStripper, '');
      // 4. Remove these elements entirely, including everything between the tags
      var badTags = ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'];
      for (var i = 0; i < badTags.length; i++) {
        tagStripper = new RegExp('<' + badTags[i] + '.*?' + badTags[i] + '(.*?)>', 'gi');
        output = output.replace(tagStripper, '');
      }
      // 5. Remove attributes such as ' style="..."' (double-quoted values only)
      var badAttributes = ['style', 'start'];
      for (var j = 0; j < badAttributes.length; j++) {
        var attributeStripper = new RegExp(' ' + badAttributes[j] + '="(.*?)"', 'gi');
        output = output.replace(attributeStripper, '');
      }
      return output;
    }
    Nick Smith - Venture Design and Print
    https://venturedesignandprint.co.uk
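
The function posted above does not touch the empty-paragraph and repeated-`<br />` cases from the examples. A minimal sketch of how those two might be handled, assuming a regex-based approach is acceptable for this markup; `collapseEmptyMarkup` is a hypothetical helper name and not part of the posted script:

```javascript
// Hypothetical helper (not part of the posted cleanHTML function).
// Regex-based, so it assumes reasonably simple markup.
function collapseEmptyMarkup(html) {
  return html
    // Drop paragraphs that contain only whitespace, &nbsp; or <br> tags.
    .replace(/<p\b[^>]*>(?:\s|&nbsp;|<br\s*\/?>)*<\/p>/gi, '')
    // Collapse runs of two or more <br> tags into a single <br />.
    .replace(/(?:<br\s*\/?>\s*){2,}/gi, '<br />')
    .trim();
}
```

It could be run on the output of `cleanHTML`, so that paragraphs left empty by the tag stripping are removed as well.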

  2. #2
    Join Date
    Jan 2004
    Posts
    66,380
    Blog Entries
    7
    Plugin Contributions
    274

    Re: Strip HTML and WORD formatting

    I believe the CKEditor plugin has a button to "Paste from Word" which does this cleanup for you.
    .

    Zen Cart - putting the dream of business ownership within reach of anyone!
    Donate to: DrByte directly or to the Zen Cart team as a whole

    Remember: Any code suggestions you see here are merely suggestions. You assume full responsibility for your use of any such suggestions, including any impact ANY alterations you make to your site may have on your PCI compliance.
    Furthermore, any advice you see here about PCI matters is merely an opinion, and should not be relied upon as "official". Official PCI information should be obtained from the PCI Security Council directly or from one of their authorized Assessors.

  3. #3
    Join Date
    Sep 2008
    Location
    Cleethorpes
    Posts
    1,227
    Plugin Contributions
    6

    Re: Strip HTML and WORD formatting

    OK, I realise that. However, this is a database of hundreds, probably a thousand, products, so I would rather not edit every one of them by hand, which is why I am looking towards jQuery or JavaScript, or even PHP.

  4. #4
    Join Date
    Sep 2008
    Location
    Cleethorpes
    Posts
    1,227
    Plugin Contributions
    6

    Re: Strip HTML and WORD formatting

    Do you think it could be done with this?

    http://php.net/manual/en/function.strip-tags.php

    added to

    <?php echo stripslashes($products_description); ?>

    in tpl_product_info_display.php

  5. #5
    Join Date
    Jan 2004
    Posts
    66,380
    Blog Entries
    7
    Plugin Contributions
    274

    Re: Strip HTML and WORD formatting

    No. Running strip_tags would remove ALL your formatting, including intentional formatting.

    Sounds like the problem is in your original data, not a Zen Cart issue. So, better to go back to the source of your original data. Clean that up, then re-import.
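
To illustrate the difference between stripping every tag and keeping the intentional formatting, here is a rough sketch of an allowlist approach. It is written in JavaScript to match the script posted earlier in the thread; the `ALLOWED_TAGS` list and the `stripToAllowedTags` name are assumptions for illustration, and regex handling of HTML like this is fragile (it will misread an attribute value that contains `>`), so treat it as a cleanup aid rather than a security sanitizer.

```javascript
// Tags to keep. This list is an assumption; adjust it for your own descriptions.
const ALLOWED_TAGS = new Set(['p', 'br', 'ul', 'ol', 'li', 'strong', 'em', 'b', 'i']);

// Hypothetical helper: keeps allowed tags (without attributes), drops all other
// tags but leaves the text between them in place.
function stripToAllowedTags(html) {
  return html
    // Remove HTML comments, including multi-line ones.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Visit every opening or closing tag, including namespaced ones like <o:p>.
    .replace(/<(\/?)([a-zA-Z][\w:-]*)[^>]*>/g, (match, slash, name) => {
      const tag = name.toLowerCase();
      if (!ALLOWED_TAGS.has(tag)) return '';             // drop the tag itself
      if (slash) return '</' + tag + '>';                // closing tag
      return tag === 'br' ? '<br />' : '<' + tag + '>';  // opening tag, attributes removed
    });
}
```

Note that the contents of `<style>` or `<script>` elements would be left behind as text, so a step like number 4 in the script from the first post would still be needed before this one.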

  6. #6
    Join Date
    Feb 2008
    Posts
    529
    Plugin Contributions
    0

    Re: Strip HTML and WORD formatting

    Quote Originally Posted by Nick1973 View Post
    Does anybody know of a reliable way to strip out redundant html tags, inline styles and WORD formatting from product descriptions?

    ...
    This is a real hack and probably wrong-headed in many ways, but I had a similar issue once and did the following to clean things up:

    To remove extraneous formatting from all products, I exported a copy of the db, opened it in Notepad++, went down to the bottom of the db where the product descriptions are, did a 'find' on the offending formatting and a 'replace' with nothing. I took a (big) chance and did a 'replace all'. Things like 'find <br /><br /><br />' and 'replace with <br />' also worked. When done, I saved that db, dropped the original db in phpMyAdmin (after backing up), then imported the changed db. It worked for me. I found a few products that needed tweaking, but this saved a huge amount of time overall.

    I'm not recommending this approach. Only noting it's how I muddled through a similar problem.
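
The same find-and-replace idea can be sketched as a short list of literal pairs applied in order. This is only an illustration: the `REPLACEMENTS` entries and the function names are hypothetical, and a literal replace-all carries the same risk as in the editor, since it will also change any matching text outside the product descriptions.

```javascript
// Hypothetical list of [find, replaceWith] pairs, applied top to bottom.
const REPLACEMENTS = [
  ['<br /><br /><br />', '<br />'],
  [' class="MsoNormal"', ''],
];

// Literal (non-regex) replace-all, like a plain-text 'replace all' in an editor.
function replaceAllLiteral(text, find, replaceWith) {
  return text.split(find).join(replaceWith);
}

// Apply every pair in order to one piece of text.
function applyReplacements(text, replacements) {
  return replacements.reduce(
    (out, [find, replaceWith]) => replaceAllLiteral(out, find, replaceWith),
    text
  );
}
```

Running it on a copy of the exported text first and comparing the result with the original gives a chance to spot unwanted changes before anything is re-imported.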

  7. #7
    Join Date
    Sep 2008
    Location
    Cleethorpes
    Posts
    1,227
    Plugin Contributions
    6

    Re: Strip HTML and WORD formatting

    Ok Dr Byte, however this is also a common issue when clients try to copy and paste data direct from MS Word. Regardless of how much you try to train them not to, they will always do something different. I'm fully aware the problem exists in the original data, but it would take me an age to go through every single product and strip out the code by searching and deleting. It's not a workable option. And neither is copying and pasting the text for each product through the CKEditor plugin button "Paste from Word". I could be at it for weeks, probably very thin from malnutrition, and on the verge of insanity.

    The idea is to eventually override any inline styles with CSS too.

    JavaScript and jQuery can both do the tasks I am after; however, I am not sure how they should be implemented.

  8. #8
    Join Date
    Jan 2004
    Posts
    66,380
    Blog Entries
    7
    Plugin Contributions
    274

    Re: Strip HTML and WORD formatting

    Quote Originally Posted by Nick1973 View Post
    Ok Dr Byte, however this is also a common issue when clients try to copy and paste data direct from MS Word. Regardless of how much you try to train them not to, they will always do something different.
    Not denying that.

    Quote Originally Posted by Nick1973 View Post
    The idea is to eventually override any inline styles with CSS too.
    Ideal for sure.

    Quote Originally Posted by Nick1973 View Post
    Both javascript and jquery do the tasks I am after, however I am not sure how they should be implemented.
    I'm not sure how javascript and jquery can be used to clean up the data in your database. Are you planning to write a script to loop through every record, from a jquery task running in your browser?

    Please expand more on what you envision here ...
    Last edited by DrByte; 15 Apr 2016 at 06:25 PM. Reason: typo

  9. #9
    Join Date
    Sep 2008
    Location
    Cleethorpes
    Posts
    1,227
    Plugin Contributions
    6

    Re: Strip HTML and WORD formatting

    The scripts already exist. I'm not after a script to rewrite/clean the entire database. I can most likely override spans with CSS, that is fine and I am aware of how that can be done.

    The scripts I posted earlier in this conversation are third-party scripts that are supposed to seek out certain HTML/Word elements and disable them while keeping everything in between those elements. However, I could not get them to work with Zen Cart. They do not rewrite the stored content; they just strip away or disable certain tags when the page is displayed.
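
For the display-time approach described here, a rough sketch of how a cleaning function might be applied to the '.mainDescription' content in the browser. `cleanDescriptions` is a hypothetical name; `root` is assumed to be the page's `document` (or any object with a `querySelectorAll` method), and `cleaner` is any function that takes an HTML string and returns the cleaned string.

```javascript
// Hypothetical helper: runs `cleaner` over the HTML of every element matching
// '.mainDescription' and writes the result back only when it changed.
// Returns the number of elements that were modified.
function cleanDescriptions(root, cleaner) {
  const nodes = root.querySelectorAll('.mainDescription');
  let modified = 0;
  for (const node of nodes) {
    const original = node.innerHTML;
    const cleaned = cleaner(original);
    if (cleaned !== original) {
      node.innerHTML = cleaned;
      modified += 1;
    }
  }
  return modified;
}
```

In a page this would be called once the DOM is ready, for example `cleanDescriptions(document, cleanHTML)` inside a `DOMContentLoaded` handler. It only changes what the visitor's browser shows; the stored descriptions stay exactly as they are.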

  10. #10
    Join Date
    Jan 2004
    Posts
    66,380
    Blog Entries
    7
    Plugin Contributions
    274

    Re: Strip HTML and WORD formatting

    Okay. So you've got some scripts. The stuff you posted earlier is javascript.

    You said you don't want to clean the entire database.

    But you said you don't want to edit each product manually.
    So that means you *do* want to clean all of them in some automated way.

    But javascript code such as you've posted is designed to run in the browser. And the browser has no access to the database.
    So you need a script to pull each record from the database individually, put it in your browser, and then perform some sort of cleaning on the data, and then send the data back to your store's admin to save the changes back to the database.

    Am I correct that you're asking one of us to provide all of that for you?
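
The pull, clean and save loop described in this post could be sketched roughly as follows. Everything about data access is an assumption: `loadRecords` and `saveRecord` are hypothetical callbacks, and how they would actually reach the store's database (an admin page, an export file, a server-side script) is left open.

```javascript
// loadRecords and saveRecord are hypothetical callbacks supplied by the caller.
// cleaner is any function that takes an HTML string and returns the cleaned string.
function cleanAllRecords(loadRecords, saveRecord, cleaner) {
  const records = loadRecords(); // expected shape: [{ id, description }, ...]
  let changed = 0;
  for (const record of records) {
    const cleaned = cleaner(record.description);
    if (cleaned === record.description) continue; // nothing to save for this row
    saveRecord(record.id, cleaned);
    changed += 1;
  }
  return { total: records.length, changed };
}
```

Skipping rows whose description does not change keeps the number of writes down and makes the returned counts a quick sanity check on how much of the catalogue was affected.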

 

 