| Database operations often tend to be the main | | | | “active”, “inactive”, “pending”, |
| bottleneck for most web applications today. It’s not | | | | “expired” etc… |
| only the DBA’s (database administrators) that have | | | | There is even a way to get a “suggestion” from |
| to worry about these performance issues. We as | | | | MySQL itself on how to restructure your table. When |
| programmers need to do our part by structuring tables | | | | you do have a VARCHAR field, it can actually |
| properly, writing optimized queries and better code. | | | | suggest you to change that column type to ENUM |
| Here are some MySQL optimization techniques for | | | | instead. This done using the PROCEDURE |
| programmers. | | | | ANALYSE() call. Which brings us to: |
| 1. Optimize Your Queries For the Query Cache | | | | 10. Get Suggestions with PROCEDURE ANALYSE() |
| Most MySQL servers have query caching enabled. | | | | PROCEDURE ANALYSE() will let MySQL analyze the |
| It’s one of the most effective methods of | | | | columns structures and the actual data in your table to |
| improving performance, that is quietly handled by the | | | | come up with certain suggestions for you. It is only |
| database engine. When the same query is executed | | | | useful if there is actual data in your tables because |
| multiple times, the result is fetched from the cache, | | | | that plays a big role in the decision making. |
| which is quite fast. | | | | For example, if you created an INT field for your |
| The main problem is, it is so easy and hidden from the | | | | primary key, however do not have too many rows, it |
| programmer, most of us tend to ignore it. Some things | | | | might suggest you to use a MEDIUMINT instead. Or if |
| we do can actually prevent the query cache from | | | | you are using a VARCHAR field, you might get a |
| performing its task. | | | | suggestion to convert it to ENUM, if there are only few |
| // query cache does NOT work | | | | unique values. |
| $r = mysql_query("SELECT username FROM user | | | | You can also run this by clicking the “Propose table |
| WHERE signup_date >= CURDATE()"); | | | | structure” link in phpmyadmin, in one of your table |
| // query cache works! | | | | views. |
| $today = date("Y-m-d"); | | | | Keep in mind these are only suggestions. And if your |
| $r = mysql_query("SELECT username FROM user | | | | table is going to grow bigger, they may not even be |
| WHERE signup_date >= '$today'"); | | | | the right suggestions to follow. The decision is |
| The reason query cache does not work in the first line | | | | ultimately yours. |
| is the usage of the CURDATE() function. This applies | | | | 11. Use NOT NULL If You Can |
| to all non-deterministic functions like NOW() and | | | | Unless you have a very specific reason to use a |
| RAND() etc… Since the return result of the function | | | | NULL value, you should always set your columns as |
| can change, MySQL decides to disable query caching | | | | NOT NULL. |
| for that query. All we needed to do is to add an extra | | | | First of all, ask yourself if there is any difference |
| line of PHP before the query to prevent this from | | | | between having an empty string value vs. a NULL |
| happening. | | | | value (for INT fields: 0 vs. NULL). If there is no reason |
| 2. EXPLAIN Your SELECT Queries | | | | to have both, you do not need a NULL field. (Did you |
| Using the EXPLAIN keyword can give you insight on | | | | know that Oracle considers NULL and empty string as |
| what MySQL is doing to execute your query. This can | | | | being the same?) |
| help you spot the bottlenecks and other problems with | | | | NULL columns require additional space and they can |
| your query or table structures. | | | | add complexity to your comparison statements. Just |
| The results of an EXPLAIN query will show you which | | | | avoid them when you can. However, I understand |
| indexes are being utilized, how the table is being | | | | some people might have very specific reasons to |
| scanned and sorted etc… | | | | have NULL values, which is not always a bad thing. |
| Take a SELECT query (preferably a complex one, | | | | From MySQL docs: |
| with joins), and add the keyword EXPLAIN in front of | | | | “NULL columns require additional space in the row |
| it. You can just use phpmyadmin for this. It will show | | | | to record whether their values are NULL. For MyISAM |
| you the results in a nice table. For example, let’s | | | | tables, each NULL column takes one bit extra, rounded |
| say I forgot to add an index to a column, which I | | | | up to the nearest byte.” |
| perform joins on: | | | | 12. Prepared Statements |
| After adding the index to the group_id field: | | | | There are multiple benefits to using prepared |
| Now instead of scanning 7883 rows, it will only scan 9 | | | | statements, both for performance and security |
| and 16 rows from the 2 tables. A good rule of thumb is | | | | reasons. |
| to multiply all numbers under the “rows” column, | | | | Prepared Statements will filter the variables you bind to |
| and your query performance will be somewhat | | | | them by default, which is great for protecting your |
| proportional to the resulting number. | | | | application against SQL injection attacks. You can of |
| 3. LIMIT 1 When Getting a Unique Row | | | | course filter your variables manually too, but those |
| Sometimes when you are querying your tables, you | | | | methods are more prone to human error and |
| already know you are looking for just one row. You | | | | forgetfulness by the programmer. This is less of an |
| might be fetching a unique record, or you might just be | | | | issue when using some kind of framework or ORM. |
| just checking the existence of any number of records | | | | Since our focus is on performance, I should also |
| that satisfy your WHERE clause. | | | | mention the benefits in that area. These benefits are |
| In such cases, adding LIMIT 1 to your query can | | | | more significant when the same query is being used |
| increase performance. This way the database engine | | | | multiple times in your application. You can assign |
| will stop scanning for records after it finds just 1, | | | | different values to the same prepared statement, yet |
| instead of going thru the whole table or index. | | | | MySQL will only have to parse it once. |
| // do I have any users from Alabama? | | | | Also latest versions of MySQL transmits prepared |
| // what NOT to do: | | | | statements in a native binary form, which are more |
| $r = mysql_query("SELECT * FROM user WHERE | | | | efficient and can also help reduce network delays. |
| state = 'Alabama'");if (mysql_num_rows($r) > 0) { | | | | There was a time when many programmers used to |
| // ... | | | | avoid prepared statements on purpose, for a single |
| } | | | | important reason. They were not being cached by the |
| // much better: | | | | MySQL query cache. But since sometime around |
| $r = mysql_query("SELECT 1 FROM user WHERE | | | | version 5.1, query caching is supported too. |
| state = 'Alabama' LIMIT 1");if (mysql_num_rows($r) > | | | | To use prepared statements in PHP you check out |
| 0) { | | | | the mysqli extension or use a database abstraction |
| // ... | | | | layer like PDO. |
| } | | | | // create a prepared statementif ($stmt = |
| 4. Index the Search Fields | | | | $mysqli->prepare("SELECT username FROM user |
| Indexes are not just for the primary keys or the unique | | | | WHERE state=?")) { |
| keys. If there are any columns in your table that you | | | | // bind parameters |
| will search by, you should almost always index them. | | | | $stmt->bind_param("s", $state); |
| As you can see, this rule also applies on a partial string | | | | // execute |
| search like “last_name LIKE ‘a%’”. When | | | | $stmt->execute(); |
| searching from the beginning of the string, MySQL is | | | | // bind result variables |
| able to utilize the index on that column. | | | | $stmt->bind_result($username); |
| You should also understand which kinds of searches | | | | // fetch value |
| can not use the regular indexes. For instance, when | | | | $stmt->fetch();printf("%s is from %sn", $username, |
| searching for a word (e.g. “WHERE post_content | | | | $state); |
| LIKE ‘%apple%’”), you will not see a benefit | | | | $stmt->close(); |
| from a normal index. You will be better off using mysql | | | | } |
| fulltext search or building your own indexing solution. | | | | 13. Unbuffered Queries |
| 5. Index and Use Same Column Types for Joins | | | | Normally when you perform a query from a script, it |
| If your application contains many JOIN queries, you | | | | will wait for the execution of that query to finish before |
| need to make sure that the columns you join by are | | | | it can continue. You can change that by using |
| indexed on both tables. This affects how MySQL | | | | unbuffered queries. |
| internally optimizes the join operation. | | | | There is a great explanation in the PHP docs for the |
| Also, the columns that are joined, need to be the same | | | | mysql_unbuffered_query() function: |
| type. For instance, if you join a DECIMAL column, to an | | | | “mysql_unbuffered_query() sends the SQL query |
| INT column from another table, MySQL will be unable | | | | query to MySQL without automatically fetching and |
| to use at least one of the indexes. Even the character | | | | buffering the result rows as mysql_query() does. This |
| encodings need to be the same type for string type | | | | saves a considerable amount of memory with SQL |
| columns. | | | | queries that produce large result sets, and you can |
| // looking for companies in my state | | | | start working on the result set immediately after the |
| $r = mysql_query("SELECT company_name FROM | | | | first row has been retrieved as you don’t have to |
| users | | | | wait until the complete SQL query has been |
| LEFT JOIN companies ON (users.state = | | | | performed.” |
| companies.state) | | | | However, it comes with certain limitations. You have to |
| WHERE users.id = $user_id"); | | | | either read all the rows or call mysql_free_result() |
| // both state columns should be indexed | | | | before you can perform another query. Also you are |
| // and they both should be the same type and | | | | not allowed to use mysql_num_rows() or |
| character encoding | | | | mysql_data_seek() on the result set. |
| // or MySQL might do full table scans | | | | 14. Store IP Addresses as UNSIGNED INT |
| 6. Do Not ORDER BY RAND() | | | | Many programmers will create a VARCHAR(15) field |
| This is one of those tricks that sound cool at first, and | | | | without realizing they can actually store IP addresses |
| many rookie programmers fall for this trap. You may | | | | as integer values. With an INT you go down to only 4 |
| not realize what kind of terrible bottleneck you can | | | | bytes of space, and have a fixed size field instead. |
| create once you start using this in your queries. | | | | You have to make sure your column is an UNSIGNED |
| If you really need random rows out of your results, | | | | INT, because IP Addresses use the whole range of a |
| there are much better ways of doing it. Granted it | | | | 32 bit unsigned integer. |
| takes additional code, but you will prevent a bottleneck | | | | In your queries you can use the INET_ATON() to |
| that gets exponentially worse as your data grows. | | | | convert and IP to an integer, and INET_NTOA() for |
| The problem is, MySQL will have to perform RAND() | | | | vice versa. There are also similar functions in PHP |
| operation (which takes processing power) for every | | | | called ip2long() and long2ip(). |
| single row in the table before sorting it and giving you | | | | $r = "UPDATE users SET ip = |
| just 1 row. | | | | INET_ATON('{$_SERVER['REMOTE_ADDR']}') |
| // what NOT to do: | | | | WHERE user_id = $user_id"; |
| $r = mysql_query("SELECT username FROM user | | | | 15. Fixed-length (Static) Tables are Faster |
| ORDER BY RAND() LIMIT 1"); | | | | When every single column in a table is |
| // much better: | | | | “fixed-length”, the table is also considered |
| $r = mysql_query("SELECT count(*) FROM user"); | | | | “static” or “fixed-length”. Examples of |
| $d = mysql_fetch_row($r); | | | | column types that are NOT fixed-length are: |
| $rand = mt_rand(0,$d[0] - 1); | | | | VARCHAR, TEXT, BLOB. If you include even just 1 of |
| $r = mysql_query("SELECT username FROM user | | | | these types of columns, the table ceases to be |
| LIMIT $rand, 1"); | | | | fixed-length and has to be handled differently by the |
| So you pick a random number less than the number of | | | | MySQL engine. |
| results and use that as the offset in your LIMIT clause. | | | | Fixed-length tables can improve performance because |
| 7. Avoid SELECT * | | | | it is faster for MySQL engine to seek through the |
| The more data is read from the tables, the slower the | | | | records. When it wants to read a specific row in a |
| query will become. It increases the time it takes for the | | | | table, it can quickly calculate the position of it. If the row |
| disk operations. Also when the database server is | | | | size is not fixed, every time it needs to do a seek, it |
| separate from the web server, you will have longer | | | | has to consult the primary key index. |
| network delays due to the data having to be | | | | They are also easier to cache and easier to |
| transferred between the servers. | | | | reconstruct after a crash. But they also can take more |
| It is a good habit to always specify which columns you | | | | space. For instance, if you convert a VARCHAR(20) |
| need when you are doing your SELECT’s. | | | | field to a CHAR(20) field, it will always take 20 bytes |
| // not preferred | | | | of space regardless of what is it in. |
| $r = mysql_query("SELECT * FROM user WHERE | | | | By using “Vertical Partitioning” techniques, you |
| user_id = 1"); | | | | can separate the variable-length columns to a |
| $d = mysql_fetch_assoc($r);echo "Welcome | | | | separate table. Which brings us to: |
| {$d['username']}"; | | | | 16. Vertical Partitioning |
| // better: | | | | Vertical Partitioning is the act of splitting your table |
| $r = mysql_query("SELECT username FROM user | | | | structure in a vertical manner for optimization reasons. |
| WHERE user_id = 1"); | | | | Example 1: You might have a users table that contains |
| $d = mysql_fetch_assoc($r);echo "Welcome | | | | home addresses, that do not get read often. You can |
| {$d['username']}"; | | | | choose to split your table and store the address info |
| // the differences are more significant with bigger | | | | on a separate table. This way your main users table |
| result sets | | | | will shrink in size. As you know, smaller tables perform |
| 8. Almost Always Have an id Field | | | | faster. |
| In every table have an id column that is the PRIMARY | | | | Example 2: You have a “last_login” field in your |
| KEY, AUTO_INCREMENT and one of the flavors of | | | | table. It updates every time a user logs in to the |
| INT. Also preferably UNSIGNED, since the value can | | | | website. But every update on a table causes the |
| not be negative. | | | | query cache for that table to be flushed. You can put |
| Even if you have a users table that has a unique | | | | that field into another table to keep updates to your |
| username field, do not make that your primary key. | | | | users table to a minimum. |
| VARCHAR fields as primary keys are slower. And | | | | But you also need to make sure you don’t |
| you will have a better structure in your code by | | | | constantly need to join these 2 tables after the |
| referring to all users with their id’s internally. | | | | partitioning or you might actually suffer performance |
| There are also behind the scenes operations done by | | | | decline. |
| the MySQL engine itself, that uses the primary key | | | | 17. Split the Big DELETE or INSERT Queries |
| field internally. Which become even more important, the | | | | If you need to perform a big DELETE or INSERT |
| more complicated the database setup is. (clusters, | | | | query on a live website, you need to be careful not to |
| partitioning etc…). | | | | disturb the web traffic. When a big query like that is |
| One possible exception to the rule are the | | | | performed, it can lock your tables and bring your web |
| “association tables”, used for the many-to-many | | | | application to a halt. |
| type of associations between 2 tables. For example a | | | | Apache runs many parallel processes/threads. |
| “posts_tags” table that contains 2 columns: | | | | Therefore it works most efficiently when scripts finish |
| post_id, tag_id, that is used for the relations between | | | | executing as soon as possible, so the servers do not |
| two tables named “post” and “tags”. | | | | experience too many open connections and |
| These tables can have a PRIMARY key that contains | | | | processes at once that consume resources, especially |
| both id fields. | | | | the memory. |
| 9. Use ENUM over VARCHAR | | | | If you end up locking your tables for any extended |
| ENUM type columns are very fast and compact. | | | | period of time (like 30 seconds or more), on a high |
| Internally they are stored like TINYINT, yet they can | | | | traffic web site, you will cause a process and query |
| contain and display string values. This makes them a | | | | pileup, which might take a long time to clear or even |
| perfect candidate for certain fields. | | | | crash your web server. |
| If you have a field, which will contain only a few | | | | If you have some kind of maintenance script that |
| different kinds of values, use ENUM instead of | | | | needs to delete large numbers of rows, just use the |
| VARCHAR. For example, it could be a column named | | | | LIMIT clause to do it in smaller batches to avoid this |
| “status”, and only contain values such as | | | | congestion. |