Page 1 of 2 12 LastLast
Results 1 to 10 of 14
  1. #1
    Join Date
    Aug 2007
    Location
    Gijón, Asturias, Spain
    Posts
    2,586
    Plugin Contributions
    30

    Default using utf8mb4_unicode_ci for database character set and collation

    It seems that in general the consensus is that one really should be using character set utf8mb4 and collation utf8mb4_unicode_ci over utf-8/utf8-general_ci.

    I was looking at implementing this as part of an upgrade.

    Changing the database collation: ok.
    Changing all the table's collation: ok.
    Converting all the table's columns to utf8mb4: not ok.

    The issue is there are a few tables with columns that have keys that are too long (varchar 255) and so this query

    ALTER TABLE my_database.admin_menus CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    Will return this error

    #1071 - Specified key was too long; max key length is 1000 bytes
    It is not possible to increase the limit (without recompiling mysql).

    There are six tables that cause this error, the rest convert ok.

    I've looked at the columns in those tables and all have config options and keys etc., that don't go over 15 chars or so.

    Only the sessions table has anything of size in it, and that is 33 chars.
    All of these columns need not use utf8mb4 at all, but for the same of being consistent...

    So I'm throwing out this idea for general discussion/comment: the concept of changing to utf8mb4, reducing those columns sizes (to 250 works) etc...
    Last edited by torvista; 10 Jan 2018 at 05:27 PM.
    Steve
    github.com/torvista: Spanish Language Pack, Google reCaptcha, Structured Data, Multiple Copy-Move-Delete, Image Checker, BackupMySQL Admin/Auto...

  2. #2
    Join Date
    Aug 2007
    Location
    Gijón, Asturias, Spain
    Posts
    2,586
    Plugin Contributions
    30

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    This is the most concise article I found:
    https://mathiasbynens.be/notes/mysql-utf8mb4

    and reading further seems there is no argument to be had, it is the way to go...comments please?
    Steve
    github.com/torvista: Spanish Language Pack, Google reCaptcha, Structured Data, Multiple Copy-Move-Delete, Image Checker, BackupMySQL Admin/Auto...

  3. #3
    Join Date
    Sep 2009
    Location
    Stuart, FL
    Posts
    12,478
    Plugin Contributions
    88

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    There are six tables that cause this error, the rest convert ok.

    Just to note that the number will (currently) grow for Zen Cart 1.5.6:
    Code:
    # Updates
    ALTER TABLE products_options MODIFY products_options_comment varchar(256) default NULL;
    
    # allow longer image paths
    ALTER TABLE products MODIFY products_image varchar(255) default NULL;
    ALTER TABLE products_attributes MODIFY attributes_image varchar(255) default NULL;
    ALTER TABLE banners MODIFY banners_image varchar(255) NOT NULL default '';
    ALTER TABLE categories MODIFY categories_image varchar(255) default NULL;
    ALTER TABLE manufacturers MODIFY manufacturers_image varchar(255) default NULL;
    ALTER TABLE record_artists MODIFY artists_image varchar(255) default NULL;
    ALTER TABLE record_company MODIFY record_company_image varchar(255) default NULL;
    
    ALTER TABLE coupons ADD coupon_calc_base TINYINT(1) NOT NULL DEFAULT '0';
    ALTER TABLE coupons ADD coupon_order_limit INT( 4 ) NOT NULL DEFAULT '0';
    ALTER TABLE coupons ADD coupon_is_valid_for_sales TINYINT(1) NOT NULL DEFAULT 1;
    ALTER TABLE coupons ADD coupon_product_count TINYINT(1) NOT NULL DEFAULT '0';
    ALTER TABLE coupons_description MODIFY coupon_name VARCHAR(64) NOT NULL DEFAULT '';
    
    # Add fields for easier order reconstruction/edit
    ALTER TABLE orders ADD order_weight FLOAT NOT NULL DEFAULT '0';
    ALTER TABLE orders MODIFY shipping_method VARCHAR(255) NOT NULL DEFAULT '';
    ALTER TABLE orders MODIFY order_total decimal(15,4) default NULL;
    ALTER TABLE orders MODIFY order_tax decimal(15,4) default NULL;
    
    ALTER TABLE orders_products ADD products_weight float NOT NULL default '0';
    ALTER TABLE orders_products ADD products_virtual tinyint( 1 ) NOT NULL default '0';
    ALTER TABLE orders_products ADD product_is_always_free_shipping tinyint( 1 ) NOT NULL default '0';
    ALTER TABLE orders_products ADD products_quantity_order_min float NOT NULL default '1';
    ALTER TABLE orders_products ADD products_quantity_order_units float NOT NULL default '1';
    ALTER TABLE orders_products ADD products_quantity_order_max float NOT NULL default '0';
    ALTER TABLE orders_products ADD products_quantity_mixed tinyint( 1 ) NOT NULL default '0';
    ALTER TABLE orders_products ADD products_mixed_discount_quantity tinyint( 1 ) NOT NULL default '1';
    ALTER TABLE orders_products_download ADD products_attributes_id int( 11 ) NOT NULL default '0';
    Great post, @torvista; that article is good reading.

  4. #4
    Join Date
    Aug 2007
    Location
    Gijón, Asturias, Spain
    Posts
    2,586
    Plugin Contributions
    30

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    Thanks, so I'll have to examine the 156 upgrade sql when it happens...

    Meanwhile, here is a description of what I have done, with no obvious effects (ill or otherwise).

    https://github.com/torvista/zen-cart...ase-to-utf8mb4

    I'd appreciate those in the know having a look in case I have missed something....
    Steve
    github.com/torvista: Spanish Language Pack, Google reCaptcha, Structured Data, Multiple Copy-Move-Delete, Image Checker, BackupMySQL Admin/Auto...

  5. #5
    Join Date
    Jan 2004
    Posts
    66,373
    Blog Entries
    7
    Plugin Contributions
    274

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    Ya, changing will force a bunch of tables to have to limit the number of storable characters to 191 down from 255 ... which then means a whole lot of places in code will need altering to accommodate the loss of those 64 characters of storage.
    .

    Zen Cart - putting the dream of business ownership within reach of anyone!
    Donate to: DrByte directly or to the Zen Cart team as a whole

    Remember: Any code suggestions you see here are merely suggestions. You assume full responsibility for your use of any such suggestions, including any impact ANY alterations you make to your site may have on your PCI compliance.
    Furthermore, any advice you see here about PCI matters is merely an opinion, and should not be relied upon as "official". Official PCI information should be obtained from the PCI Security Council directly or from one of their authorized Assessors.

  6. #6
    Join Date
    Aug 2007
    Location
    Gijón, Asturias, Spain
    Posts
    2,586
    Plugin Contributions
    30

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    Understood.
    What is the big picture view then? Given from what I have read, the MySQL implementation of "utf-8" is flawed, hence the consensus to change.
    While this issue is not really an "issue" (problem) at all with respect to Zen Cart, should it not be changed at some point in the future?
    Steve
    github.com/torvista: Spanish Language Pack, Google reCaptcha, Structured Data, Multiple Copy-Move-Delete, Image Checker, BackupMySQL Admin/Auto...

  7. #7
    Join Date
    Jan 2004
    Posts
    66,373
    Blog Entries
    7
    Plugin Contributions
    274

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    I must retract part of my last statement. The 191-character limit is for index sizes (which you referred to in an earlier post), so except for tables which have critical needs to index longer than 191 characters, or whose primary key is longer than 191 characters, the field lengths themselves don't need to change.

    Also, some articles I've read (including comments in mathias' post you quoted) suggest that MySQL 5.7 may not have this limitation.

    So I think the overall schema changes are small, after which your proposed conversion is probably the ideal approach to flip the collation when needed.

    I think the biggest challenge here though is that PHP supports up to 256 characters for a session_id ... but since our sessions table uses session_id as its primary key (for legitimate reasons) and primary keys ought to be indexing the entire length of the column, that creates a quandary for both the sessions and whos_online tables, as far as the session_id field and corresponding indexes are concerned.

    Without resolving the sessions and whos_online issues, I imagine the following changes will cover the other tables' needs:
    https://github.com/zencart/zencart/pull/1596

    Granted, if we avoid InnoDB engine storage, at least for affected tables, isn't this moot?
    .

    Zen Cart - putting the dream of business ownership within reach of anyone!
    Donate to: DrByte directly or to the Zen Cart team as a whole

    Remember: Any code suggestions you see here are merely suggestions. You assume full responsibility for your use of any such suggestions, including any impact ANY alterations you make to your site may have on your PCI compliance.
    Furthermore, any advice you see here about PCI matters is merely an opinion, and should not be relied upon as "official". Official PCI information should be obtained from the PCI Security Council directly or from one of their authorized Assessors.

  8. #8
    Join Date
    Jan 2004
    Posts
    66,373
    Blog Entries
    7
    Plugin Contributions
    274

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    Thinking about this more ... PHP only supports english letters/digits. Since none of those are 4-multibyte, we won't trigger the > 191 issue will we?
    .

    Zen Cart - putting the dream of business ownership within reach of anyone!
    Donate to: DrByte directly or to the Zen Cart team as a whole

    Remember: Any code suggestions you see here are merely suggestions. You assume full responsibility for your use of any such suggestions, including any impact ANY alterations you make to your site may have on your PCI compliance.
    Furthermore, any advice you see here about PCI matters is merely an opinion, and should not be relied upon as "official". Official PCI information should be obtained from the PCI Security Council directly or from one of their authorized Assessors.

  9. #9
    Join Date
    Aug 2007
    Location
    Gijón, Asturias, Spain
    Posts
    2,586
    Plugin Contributions
    30

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    Though trial and error I found the varchar length that would be accepted for a column conversion, would vary (240-249), I assume that is due to data already in the column?

    It seems all should be set to 191?

    I haven't actually implemented these changes to my production database due to your comment
    a whole lot of places in code will need altering to accommodate the loss of those 64 characters of storage.
    and also since this is a solution to a problem I don't have!

    So, are code changes necessary?
    Steve
    github.com/torvista: Spanish Language Pack, Google reCaptcha, Structured Data, Multiple Copy-Move-Delete, Image Checker, BackupMySQL Admin/Auto...

  10. #10
    Join Date
    Jan 2004
    Posts
    66,373
    Blog Entries
    7
    Plugin Contributions
    274

    Default Re: using utf8mb4_unicode_ci for database character set and collation

    Quote Originally Posted by torvista View Post
    So, are code changes necessary?
    Apart from the index changes I mentioned (which is schema, not code), I don't think any other changes are required.
    No code changes needed, as far as I can tell at this point.
    Save for maybe zc_install recognizing utf8mb4 instead of just utf8.

    Like you, this is "solving a problem I don't have" ... yet.
    .

    Zen Cart - putting the dream of business ownership within reach of anyone!
    Donate to: DrByte directly or to the Zen Cart team as a whole

    Remember: Any code suggestions you see here are merely suggestions. You assume full responsibility for your use of any such suggestions, including any impact ANY alterations you make to your site may have on your PCI compliance.
    Furthermore, any advice you see here about PCI matters is merely an opinion, and should not be relied upon as "official". Official PCI information should be obtained from the PCI Security Council directly or from one of their authorized Assessors.

 

 
Page 1 of 2 12 LastLast

Similar Threads

  1. v150 Character set/database collation/date settings
    By mi16chap in forum General Questions
    Replies: 2
    Last Post: 1 Feb 2012, 01:19 AM
  2. Database Collation
    By Ninja Scott in forum Upgrading to 1.5.x
    Replies: 0
    Last Post: 4 Jan 2012, 08:09 PM
  3. 1.5.0RC1 popup says default database character set is Latin1
    By cochlear in forum Installing on a Linux/Unix Server
    Replies: 1
    Last Post: 25 Oct 2011, 04:53 PM
  4. Help on Database Character Set
    By pcontente in forum General Questions
    Replies: 2
    Last Post: 27 Feb 2009, 06:35 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
disjunctive-egg
Zen-Cart, Internet Selling Services, Klamath Falls, OR