How to Break a MySQL Left Join
Posted on February 19th, 2008 by Greg Allard in MySQL | Comments
Care must be taken when placing conditions on the results of the right-hand table of a LEFT JOIN because it could easily become a normal JOIN since MySQL is using a NULL row for every column in the right-hand table when no records exist.
Since I like to use a soft delete on all tables (hidden but not deleted), I need to put a condition to alter the results from the right-hand table. The proper way to place a condition on this table is to use:
... WHERE ( right_table.foo = 'bar' -- the condition you want OR right_table.foo IS NULL -- it's cool if it's empty ) ...
Here is an Example
Here are some example SQL statements to see exactly what is happening. First we create two tables with a one-to-many relationship.
CREATE TABLE tasks ( -- left task_id MEDIUMINT( 5 ) NOT NULL AUTO_INCREMENT PRIMARY KEY , task_text TEXT NOT NULL , task_deleted TINYINT( 1 ) NOT NULL , task_last_mod INT( 11 ) NOT NULL , task_created INT( 11 ) NOT NULL ) ENGINE = MYISAM; CREATE TABLE logs ( -- right log_id MEDIUMINT( 6 ) NOT NULL AUTO_INCREMENT PRIMARY KEY , task_id MEDIUMINT( 5 ) NOT NULL , log_text TEXT NOT NULL , log_length INT( 11 ) NOT NULL , log_deleted TINYINT( 1 ) NOT NULL , log_last_mod INT( 11 ) NOT NULL , log_created INT( 11 ) NOT NULL ) ENGINE = MYISAM;
After creating the tables we are ready to add some testing data.
INSERT INTO tasks ( task_id , task_text , task_deleted , task_last_mod , task_created ) VALUES ( 1 , 'Get me a beer.', 0, 1203372686 , 1203372686 ), ( 2 , 'Something ridiculous.', 0, 1203372686 , 1203372686 ), ( 3 , 'a logless task', 0, 1203372686, 1203372686 ); INSERT INTO logs ( log_id , task_id , log_text , log_length , log_deleted , log_last_mod , log_created ) VALUES ( 1, 1, 'got a beer', 120, 0, 1203372686, 1203372686 ), ( 2, 2, 'this took forever', 900, 0, 1203372686, 1203372686 ), ( 3, 1, 'a deleted one', 900, 1, 1203372686, 1203372686 ), ( 4, 1, 'moar beer', 120, 0, 1203372686, 1203372686 );
Simple SELECT
This will get all tasks, count the logs, and sum the logged time whether there is a log or not for the task.
SELECT task_text, count(tasks.task_id) AS log_count, sum(log_length) AS total_time FROM tasks LEFT JOIN logs ON tasks.task_id = logs.task_id WHERE task_deleted = 0 GROUP BY logs.task_id;
This query produces expected results as seen below.
task_text | log_count | total_time |
---|---|---|
a logless task | 1 | NULL |
Get me a beer. | 3 | 1140 |
Something ridiculous. | 1 | 900 |
Improper Conditional SELECT
If we were to place a condition on the right-hand table to limit the results to only count and sum the logs that are active, at first we might try:
SELECT task_text, count(tasks.task_id) AS log_count, sum(log_length) AS total_time FROM tasks LEFT JOIN logs ON tasks.task_id = logs.task_id WHERE task_deleted = 0 AND log_deleted = 0 -- this breaks the left join GROUP BY logs.task_id;
This produces the results in the following table. Notice that the logless task has disappeared and that the log count and sum for the Get me a beer task has changed to the desired values. This is the same as a normal JOIN statement.
task_text | log_count | total_time |
---|---|---|
Get me a beer. | 2 | 240 |
Something ridiculous. | 1 | 900 |
The Proper Way
The proper way to achieve the desired results is to use the following query:
SELECT task_text, count(tasks.task_id) AS log_count, sum(log_length) AS total_time FROM tasks LEFT JOIN logs ON tasks.task_id = logs.task_id WHERE task_deleted = 0 AND ( log_deleted = 0 OR log_deleted IS NULL ) GROUP BY logs.task_id;
Producing:
task_text | log_count | total_time |
---|---|---|
a logless task | 1 | NULL |
Get me a beer. | 2 | 240 |
Something ridiculous. | 1 | 900 |
As you can see the logless task is back and the values for Get me a beer are accurate.
I use webfaction to host a lot of my django projects. It has an easy setup that will get you developing quickly and a great community of talented programmers. There is also a quick setup for rails, wordpress, and a lot more.
Related posts:
- Careful with Numeric Data Types in MySQL TINYINT can be set to length of 4, but that doesn’t mean the max value is 9999. A tiny int...
- Conditions on Count or Sum in MySQL When selecting count, sum, or some other aggregate function, the value isn’t determined until after the WHERE clause so a...
- Tips for MySQL to PostgreSQL Switch If you’ve decided to move a few tables from MySQL to PostgreSQL, these few tips might help. I won’t get...