Hi,
It has long been my intention to look at Ceon URI Mapping's database structure with a view to making it scale better for larger sites.
As no-one reported any speed issues with the module until recently this has always been placed on the back-burner. Now that someone has finally had a problem with the speed of the module (and even though 3.6.3 has some significant speed enhancements for larger sites) it's time to reconsider the database structure for version 4.0.0.
As such I'd like to put a few options out there and get feedback from as many people as possible...
There are a few general facts about URIs on sites on which I'll be basing my restructuring. Please post here to let me know if these facts, as identified by me, apply or don't apply for your own/your clients' stores...
- Most sites don't use a slash at the end of any URI except the home page (i.e. http://mydomain.com/ is the home page but http://mydomain.com/my-category-one is used instead of http://mydomain.com/my-category-one/).
- The most used URI on a site is the home page.
- The second most used URIs on a site are category and product URIs.
- The next most URIs are manufacturer pages or "other" Zen Cart pages (eZ-pages, contact page etc.).
- The least used URIs are the historical URIs. The store itself only ever links using the "current" URIs so it never uses historical URIs. Historical URIs are only used when other sites linking to the site haven't updated their URIs yet. With these old links, after a certain amount of time, they are used less and less frequently as people/search engines have started to update their links and use the new URIs.
- Many URIs on a site share the same "base paths". I.e. URIs for products in a subcategory of /category-one/subcategory-one all start with "/category-one/subcategory-one": "/category-one/subcategory-one/product-name".
- The latter parts of URIs differ more than the first parts (e.g. most product names are different but many products share the same categories/subcategories).
- Few URIs are longer than 255 characters.
- Most URIs are less than 100 characters.
So first off, I'd appreciate it if people could let me know which of the above do and do not apply for them. Please give an overview of how your store/your client's store differs if they indeed do differ. E.g. if most URIs are actually more than 100 characters long, how long are they in general (what's the "mode" length? What lengths are used the most frequently?)?
Next, I'd like you to take a look at the following and add your comments..
Does anyone use slashes at the end of their URIs?
I'm thinking that the first enhancement I'll make is to add an option "Store sometimes/always uses slashes at the end of URIs other than the home page's URI". If this is set to "No", as it will be by default, then the queries to look up the URI for the current page will use the very quick LIKE or possibly = operator rather than a REGEXP operator (with no use of wildcards for the quickest lookups).
No longer one table to rule them all..
The next change is to rip the database table apart and instead use several specific tables.
By using tables for specific "types" of Zen Cart pages/URIs, the most frequently used tables can be made to be set sizes, resulting in the fastest possible lookups, and quick and efficient table modifications by the database server software (that last bit isn't important but is a nice side effect/bonus).
Category/Product/EZ-Page Tables
The primary table would be a categories_products_ezpages_short_mappings table which would only hold mappings for category pages, product pages and ez-page pages. Since the length of the uri column would be set (CHAR not TEXT or VARCHAR), there could be several versions of this table, each used for URIs of particular lengths. E.g.:
categories_products_ezpages_short_mappings - URIs 1 to 100 chars in length
categories_products_ezpages_medium_mappings - URIs 101 to 255 chars in length
The secondary table would be categories_products_ezpages_large_mappings. It would be the same as the above but use a TEXT column type for the uri field, for URIs longer than 255 characters.
The main reason for splitting the "set sized" categories_products_ezpages tables into more than one table would be that it would result in more efficient use of disk space as having URIs allowed to be 1 to 255 characters long in one table means that LOTs of URIs are going to have a lot of wasted white space in the 255 characters used to store the URI.
I'd like to hear what people think about this.. how many tables should this be split up into? Should "small" be 1 to 60 instead of 1 to 100? etc.? Let me hear your thoughts.
The disadvantage of having several fixed width tables for categories_products_ezpages is that building the links to these pages in the store can mean having several extra lookups per link, as each table is examined in turn until the appropriate URI is found.
All of these tables will have indexes though so the actual queries will be extremely quick for each table being examine to find the URI for a given category/product/ez-page URI. The thing is.. does having to make 2/3/4 super quick lookups outweigh the benefit of using less disk space and should instad one table be used for all category/product/ez-page URIs of between1 and 255 characters?
Regardless of whether the categories_products_ezpages set length table is one table or split into several, I think that there most likely will be a set width table for category/product/ez-page URIs (what I've called the primary table above) AND a non-set width table for category/product/ez-page URIs (the secondary table mentioned above) which uses TEXT for the URI.. that is essential to support URIs longer than 255 characters. Most stores don't need that but some do/will.
The set width and "large" tables will have indexes, but the set width table(s) will have the advantage of being able to have an index on the uri column.
The alternative is that one table can still be used for all the products/categories/expages but then the index probably couldn't make much use of the uri field.. Or could it? Could the index simply use a uri column of a limited length of 100/160/whatever and would that bring the benefit of quick lookups of about the same speed, without wasting disk space using CHAR column types?
I'd really love anyone with database expertise to comment on the various alternative options!
As URI mappings are primarly used in a READ mode rather than WRITE, updating variable width records isn't much of an issue.
Other Tables: "Other Zen Cart Pages" Table
There could then be a table for "other zen cart pages" which simply maps URIs to main_pages.
As there are't that many "other" pages on Zen Cart this is likely to be quite a small table so I don't know if it should exist on its own. Having to query it separately from other tables might be slower than simply including its mappings in another table.
Other Tables: "Other Zen Cart Pages with Query String Parameters" Table
To support manufacturers pages etc (i.e. pages with &manufacturers_id=XXX or &news_article_id=XX in the query string) there has to be a table with URI mappings for them.
As I was saying above, maybe all "Other Zen Cart Pages" should simply be included in this table?
The issue to address for this table is whether to make the query_string_parameters field a fixed length. I think it should be as I can't imagine anyone using more than 255 characters for this field. However, where should the line be drawn? How many characters should it support?
If the CHAR type is to be used, then supporting fewer characters means smaller database tables, but with the entire table using fixed length records, the length won't affect the speed of queries. So should the ultimate flexibility be given and use 255 characters but waste lots of disk space (relatively speaking) or should fewer characters be used?
Either way, this table isn't likely to be overly large unless a store uses it to map lots of news articles etc. I've rarely come across a store that uses lots and lots of manufacturers). Please let me know either way!
(As an aside, for the table records to be a set/fixed size NULL can't be supported. That's fine, the code can be updated to allow the use of empty query string parameters to signify "NULL").
Other Tables: "Alternative URIs" Table
I've judged this to be the least likely table of all the "current" URI mapping tables to be used so to allow ultimate fleixibility both its uri field and its alternative_uri field will use the TEXT column type. It will be the most inefficient table as it will require full table scans for each lookup carried out on it but I don't think many people use the alternative URI functionality. TELL ME IF THAT ISN'T TRUE!
Other Tables: "Historical" Tables
As it would be expected for historical URIs to be used less and less frequently over time as other sites link to the "current" URIs I'm tempted to have just one table for historical URIs with a TEXT column for the URI, just as the current Ceon URI Mappings (3.6.3) table has.
The alternative would be to have several historical tables, again using fixed and non-fixed width URIs for quicker lookups of shorter URIs.
Any thoughts on this are welcome.
Please comment on anything!
Okay, that's enough from me for now.. please let me know as much as you can about how your current stores work in relation to the above and what way you think things should work in 4.0.0.
If you don't know about any of the issues, don't feel the need to comment about those specific issues but please do comment on whatever you can.. I'd rather hear from 50 people about small parts of the above than just 2 about all of the above! All comments and opinions are welcome!
Please remember that due to the auto-upgrade functionality employed in the software, these changes will be performed by scripts, so don't worry about the amount of time any particular option would take for the database to be upgraded, let the electrons in the computer carry that out work, you won't have to do anything except click once to start the process.
The one thing about these updates is that the automated upgrade scripts may take some time on sites with many thousands of mappings so I'm intending to use the set_time_limit(0) command.. any servers which don't support that command will not be able to upgrade.. does this affect anyone? If so let me know now if your server doesn't let software set its own script time limit!
All the best...
Conor
ceon
Bookmarks