Care must be taken when placing conditions on the results of the right-hand table of a LEFT JOIN because it could easily become a normal JOIN since MySQL is using a NULL row for every column in the right-hand table when no records exist.

Since I like to use a soft delete on all tables (hidden but not deleted), I need to put a condition to alter the results from the right-hand table. The proper way to place a condition on this table is to use:

...
WHERE (
	right_table.foo = 'bar'  -- the condition you want
		OR
	right_table.foo IS NULL  -- it's cool if it's empty
)
...

Here is an Example

Here are some example SQL statements to see exactly what is happening. First we create two tables with a one-to-many relationship.

CREATE TABLE tasks (  -- left
 	task_id MEDIUMINT( 5 ) NOT NULL
 		AUTO_INCREMENT PRIMARY KEY ,
	 task_text TEXT NOT NULL ,
	 task_deleted TINYINT( 1 ) NOT NULL ,
	 task_last_mod INT( 11 ) NOT NULL ,
	 task_created INT( 11 ) NOT NULL
	) ENGINE = MYISAM;
 
CREATE TABLE logs (  -- right
	 log_id MEDIUMINT( 6 ) NOT NULL
	 	AUTO_INCREMENT PRIMARY KEY ,
	 task_id MEDIUMINT( 5 ) NOT NULL ,
	 log_text TEXT NOT NULL ,
	 log_length INT( 11 ) NOT NULL ,
	 log_deleted TINYINT( 1 ) NOT NULL ,
	 log_last_mod INT( 11 ) NOT NULL ,
	 log_created INT( 11 ) NOT NULL
) ENGINE = MYISAM;

After creating the tables we are ready to add some testing data.

INSERT INTO tasks (
 	task_id ,
 	task_text ,
	task_deleted ,
	task_last_mod ,
	task_created
)
VALUES (
	 1 , 'Get me a beer.', 0,
	 1203372686 , 1203372686
), (
	 2 , 'Something ridiculous.', 0,
	 1203372686 , 1203372686
), (
	 3 , 'a logless task', 0,
	 1203372686, 1203372686
);
 
INSERT INTO logs (
	 log_id ,
	 task_id ,
	 log_text ,
	 log_length ,
	 log_deleted ,
	 log_last_mod ,
	 log_created
)
VALUES (
	 1, 1, 'got a beer', 120,
	 0, 1203372686, 1203372686
), (
	 2, 2, 'this took forever', 900,
	 0, 1203372686, 1203372686
), (
	 3, 1, 'a deleted one', 900,
	 1, 1203372686, 1203372686
), (
	 4, 1, 'moar beer', 120,
	 0, 1203372686, 1203372686
);

Simple SELECT

This will get all tasks, count the logs, and sum the logged time whether there is a log or not for the task.

SELECT task_text, count(tasks.task_id) AS log_count,
	 sum(log_length) AS total_time
FROM tasks LEFT JOIN logs
ON tasks.task_id = logs.task_id
WHERE task_deleted = 0
GROUP BY logs.task_id;

This query produces expected results as seen below.

task_text log_count total_time
a logless task 1 NULL
Get me a beer. 3 1140
Something ridiculous. 1 900

Improper Conditional SELECT

If we were to place a condition on the right-hand table to limit the results to only count and sum the logs that are active, at first we might try:

SELECT task_text, count(tasks.task_id) AS log_count,
	 sum(log_length) AS total_time
FROM tasks LEFT JOIN logs ON tasks.task_id = logs.task_id
WHERE task_deleted = 0
AND log_deleted = 0  -- this breaks the left join
GROUP BY logs.task_id;

This produces the results in the following table. Notice that the logless task has disappeared and that the log count and sum for the Get me a beer task has changed to the desired values. This is the same as a normal JOIN statement.

task_text log_count total_time
Get me a beer. 2 240
Something ridiculous. 1 900

The Proper Way

The proper way to achieve the desired results is to use the following query:

SELECT task_text, count(tasks.task_id) AS log_count,
	 sum(log_length) AS total_time
FROM tasks LEFT JOIN logs ON tasks.task_id = logs.task_id
WHERE task_deleted = 0
AND (
	log_deleted = 0
	OR
	log_deleted IS NULL
)
GROUP BY logs.task_id;

Producing:

task_text log_count total_time
a logless task 1 NULL
Get me a beer. 2 240
Something ridiculous. 1 900

As you can see the logless task is back and the values for Get me a beer are accurate.

I use webfaction to host a lot of my django projects. It has an easy setup that will get you developing quickly and a great community of talented programmers. There is also a quick setup for rails, wordpress, and a lot more.

Related posts:

  1. Careful with Numeric Data Types in MySQL  TINYINT can be set to length of 4, but that doesn’t mean the max value is 9999. A tiny int...
  2. Conditions on Count or Sum in MySQL  When selecting count, sum, or some other aggregate function, the value isn’t determined until after the WHERE clause so a...
  3. Tips for MySQL to PostgreSQL Switch  If you’ve decided to move a few tables from MySQL to PostgreSQL, these few tips might help. I won’t get...