Joining Tables in MySQL

Revision as of 18:17, 22 October 2007 by Neil (Talk | contribs) (Equi-Join (aka the Inner Join))

Table joins provide a way of associating data that resides in different tables during data retrieval. Suppose we have a table in our MySQL database called product. This table contains data about the products that an on-line electronics store sells. Each product in turn is purchased from a supplier and a single supplier can supply multiple items listed in the product table.

Given this scenario, we have two options for storing contact information about the supplier. One option is to store the supplier contact information with each row in the product table for products sourced from that supplier. Whilst this would work, it is a highly inefficient approach because we would be duplicating the supplier's contact information for every product that the supplier sells to us. Also, if the supplier moves to a new location, we would have to update every single row in the product table associated with that supplier.

A much better approach is to have a separate supplier table which contains the contact information for each supplier, and then reference this table when we want to extract the supplier information for a particular product in the product table. Combining rows from the two tables in this way is known as a join, and it forms the basis of a relational database.

How Does a Join Work?

A join works through the use of keys. Continuing our example, our supplier table contains a column named supplier_id. This column is configured as the primary key (for details on primary keys see Database Basics). The product table contains all of the products sold by our company, including the product code, product name and product description. In addition, it contains the supplier_id of the supplier from which we buy each product. Because this column holds a key from a different table (the suppliers table), it is referred to as a foreign key. When using a SELECT statement to retrieve data from the product table, we can use this foreign key to extract the relevant supplier information from the supplier table for each product.
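To make the key relationship concrete, here is a minimal Python sketch in which plain dictionaries and lists stand in for the two MySQL tables (the variable names `suppliers`, `products` and `supplier_of` are hypothetical). Keying `suppliers` by `supplier_id` mirrors the primary key, since each id identifies exactly one supplier; the last element of each product tuple plays the role of the foreign key. Following that key by hand is essentially what a join does for us inside the database.

```python
# Hypothetical in-memory copy of the sample data, not the article's MySQL setup.
# suppliers is keyed by its primary key, supplier_id, so each id maps to one row.
suppliers = {
    1: ("Microsoft", "1 Microsoft Way"),
    2: ("Apple, Inc.", "1 Infinate Loop"),
    3: ("EasyTech", "100 Beltway Drive"),
    4: ("WildTech", "100 Hard Drive"),
}

# Each product row carries supplier_id as a foreign key into suppliers.
products = [
    (1, "CD-RW Model 4543", "CD Writer", 3),
    (2, "EasyTech Mouse 7632", "Cordless Mouse", 3),
    (3, "WildTech 250Gb 1700", "SATA Disk Drive", 4),
    (4, "Microsoft 10-20 Keyboard", "Ergonomic Keyboard", 1),
    (5, "Apple iPhone 8Gb", "Smart Phone", 2),
]

# Follow the foreign key: look up the supplier row whose primary key matches.
supplier_of = {}
for _code, prod_name, _desc, supplier_id in products:
    supplier_name, _address = suppliers[supplier_id]
    supplier_of[prod_name] = supplier_name

for prod_name, supplier_name in supplier_of.items():
    print(f"{prod_name}: {supplier_name}")
```

Note that each supplier's details are stored once, however many products refer to it.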

Let's begin by looking at our two tables, the supplier table and the product table. First, the supplier table contains the following rows:

mysql> SELECT * FROM suppliers;
+-------------+---------------+-------------------+------------------+
| supplier_id | supplier_name | supplier_address  | supplier_contact |
+-------------+---------------+-------------------+------------------+
|           1 | Microsoft     | 1 Microsoft Way   | Bill Gates       |
|           2 | Apple, Inc.   | 1 Infinate Loop   | Steve Jobs       |
|           3 | EasyTech      | 100 Beltway Drive | John Williams    |
|           4 | WildTech      | 100 Hard Drive    | Alan Wilkes      |
+-------------+---------------+-------------------+------------------+
4 rows in set (0.00 sec)

And our product table contains the following rows:

SELECT * FROM product;
+-----------+----------------------------+-----------------------------+-------------+
| prod_code | prod_name                  | prod_desc                   | supplier_id |
+-----------+----------------------------+-----------------------------+-------------+
|         1 | CD-RW Model 4543           | CD Writer                   |           3 |
|         2 | EasyTech Mouse 7632        | Cordless Mouse              |           3 |
|         3 | WildTech 250Gb 1700        | SATA Disk Drive             |           4 |
|         4 | Microsoft 10-20 Keyboard   | Ergonomic Keyboard          |           1 |
|         5 | Apple iPhone 8Gb           | Smart Phone                 |           2 |
+-----------+----------------------------+-----------------------------+-------------+

As you can see from the above output, the product rows contain a column which holds the supplier_id of the supplier from which the product is obtained. Now that we have the tables created, we can begin to perform some joins.

Performing a Cross-Join

Joining tables involves combining rows from two tables. The most basic type of join is the cross-join, which pairs each row of the first table with every row of the second table. This is of little use in most real situations, but for the sake of completeness, the syntax for a cross-join is as follows:

SELECT column_names FROM table1, table2;

For example, if we were to perform the following command on our sample tables:

SELECT prod_name, supplier_name FROM product, suppliers;

we would get the following output:

+----------------------------+---------------+
| prod_name                  | supplier_name |
+----------------------------+---------------+
| CD-RW Model 4543           | Microsoft     |
| CD-RW Model 4543           | Apple, Inc.   |
| CD-RW Model 4543           | EasyTech      |
| CD-RW Model 4543           | WildTech      |
| EasyTech Mouse 7632        | Microsoft     |
| EasyTech Mouse 7632        | Apple, Inc.   |
| EasyTech Mouse 7632        | EasyTech      |
| EasyTech Mouse 7632        | WildTech      |
| WildTech 250Gb 1700        | Microsoft     |
| WildTech 250Gb 1700        | Apple, Inc.   |
| WildTech 250Gb 1700        | EasyTech      |
| WildTech 250Gb 1700        | WildTech      |
| Microsoft 10-20 Keyboard   | Microsoft     |
| Microsoft 10-20 Keyboard   | Apple, Inc.   |
| Microsoft 10-20 Keyboard   | EasyTech      |
| Microsoft 10-20 Keyboard   | WildTech      |
| Apple iPhone 8Gb           | Microsoft     |
| Apple iPhone 8Gb           | Apple, Inc.   |
| Apple iPhone 8Gb           | EasyTech      |
| Apple iPhone 8Gb           | WildTech      |
+----------------------------+---------------+
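The number of rows in a cross-join is simply the row count of the first table multiplied by that of the second, which is why the output above has 5 × 4 = 20 rows. The following Python sketch checks this using the standard library's `sqlite3` module with an in-memory database as a stand-in for MySQL (an assumption made so the example runs without a MySQL server; the tables are trimmed to the columns needed). It also shows the explicit `CROSS JOIN` keyword, which both MySQL and SQLite accept as an alternative to the comma syntax.

```python
import sqlite3

# In-memory SQLite database standing in for the article's MySQL tables
# (only the columns needed for this example are created).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT)")
conn.execute("CREATE TABLE product (prod_code INTEGER PRIMARY KEY, "
             "prod_name TEXT, supplier_id INTEGER)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?)",
                 [(1, "Microsoft"), (2, "Apple, Inc."), (3, "EasyTech"), (4, "WildTech")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "CD-RW Model 4543", 3), (2, "EasyTech Mouse 7632", 3),
                  (3, "WildTech 250Gb 1700", 4), (4, "Microsoft 10-20 Keyboard", 1),
                  (5, "Apple iPhone 8Gb", 2)])

# Comma syntax with no join condition: every product is paired with every supplier.
comma_rows = conn.execute(
    "SELECT prod_name, supplier_name FROM product, suppliers").fetchall()

# The explicit CROSS JOIN keyword produces the same set of pairs.
cross_rows = conn.execute(
    "SELECT prod_name, supplier_name FROM product CROSS JOIN suppliers").fetchall()

print(len(comma_rows), "rows")  # one row per (product, supplier) pair
print(sorted(comma_rows) == sorted(cross_rows))
```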

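As you can see, every product is listed against every supplier, including suppliers that have nothing to do with it, so it is hard to imagine this being of use in many situations. A much more useful type of join is the Equi-Join, or Inner Join.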

Equi-Join (aka the Inner Join)

The Equi-Join joins rows from two or more tables based on an equality comparison between a specific column in each table. The syntax for this approach is as follows:

SELECT column_names FROM table1, table2 WHERE (table1.column = table2.column);

For example, to extract the product name, supplier name and supplier address for each row in our product table, we would use the following command:

SELECT prod_name, supplier_name, supplier_address FROM product, suppliers WHERE (product.supplier_id = suppliers.supplier_id);

Note that we have to use what is known as the fully qualified name for the supplier_id column in each table since both tables contain a supplier_id. A fully qualified column name is defined by specifying the table name followed by a dot (.) and then the column name.

The above command produces a list of products together with the name and address of the supplier for each product:

+--------------------------+---------------+-------------------+
| prod_name                | supplier_name | supplier_address  |
+--------------------------+---------------+-------------------+
| Microsoft 10-20 Keyboard | Microsoft     | 1 Microsoft Way   |
| Apple iPhone 8Gb         | Apple, Inc.   | 1 Infinate Loop   |
| CD-RW Model 4543         | EasyTech      | 100 Beltway Drive |
| EasyTech Mouse 7632      | EasyTech      | 100 Beltway Drive |
| WildTech 250Gb 1700      | WildTech      | 100 Hard Drive    |
+--------------------------+---------------+-------------------+
5 rows in set (0.00 sec)
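The sketch below (again SQLite via Python's `sqlite3` module standing in for MySQL, an assumption made so it runs without a server) runs the equi-join in two forms: the WHERE-clause form used above, and the explicit `INNER JOIN ... ON` form, which MySQL also accepts and which returns the same rows. It also shows why the fully qualified names matter: with the table prefixes removed, `supplier_id` could refer to either table and the query is rejected. The `ORDER BY prod_code` is added only to make the row order predictable.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (trimmed to the needed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, "
             "supplier_name TEXT, supplier_address TEXT)")
conn.execute("CREATE TABLE product (prod_code INTEGER PRIMARY KEY, "
             "prod_name TEXT, supplier_id INTEGER)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", [
    (1, "Microsoft", "1 Microsoft Way"), (2, "Apple, Inc.", "1 Infinate Loop"),
    (3, "EasyTech", "100 Beltway Drive"), (4, "WildTech", "100 Hard Drive")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "CD-RW Model 4543", 3), (2, "EasyTech Mouse 7632", 3),
    (3, "WildTech 250Gb 1700", 4), (4, "Microsoft 10-20 Keyboard", 1),
    (5, "Apple iPhone 8Gb", 2)])

# Equi-join with the join condition in the WHERE clause.
where_rows = conn.execute(
    "SELECT prod_name, supplier_name, supplier_address FROM product, suppliers "
    "WHERE (product.supplier_id = suppliers.supplier_id) ORDER BY prod_code").fetchall()

# The same join written with explicit INNER JOIN ... ON syntax.
inner_rows = conn.execute(
    "SELECT prod_name, supplier_name, supplier_address FROM product "
    "INNER JOIN suppliers ON (product.supplier_id = suppliers.supplier_id) "
    "ORDER BY prod_code").fetchall()

# Without table prefixes, supplier_id is ambiguous and the query fails.
try:
    conn.execute("SELECT prod_name FROM product, suppliers "
                 "WHERE supplier_id = supplier_id")
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

for row in where_rows:
    print(row)
```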

Performing a Left Join or a Right Join

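Another way to join tables is to use a LEFT JOIN in the SELECT statement. With a LEFT JOIN, the join condition is given in an ON clause rather than in the WHERE clause; the tables are joined first, and any WHERE clause is then applied to the joined rows. The syntax for this type of join is: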

SELECT column_names FROM table1 LEFT JOIN table2 ON (table1.column = table2.column);

Because every product in our table has a matching supplier, the following LEFT JOIN gives us the same result as our Equi-Join:

SELECT prod_name, supplier_name, supplier_address FROM product LEFT JOIN suppliers 
ON (product.supplier_id = suppliers.supplier_id);
+----------------------------+---------------+-------------------+
| prod_name                  | supplier_name | supplier_address  |
+----------------------------+---------------+-------------------+
| CD-RW Model 4543           | EasyTech      | 100 Beltway Drive |
| EasyTech Mouse 7632        | EasyTech      | 100 Beltway Drive |
| WildTech 250Gb 1700        | WildTech      | 100 Hard Drive    |
| Microsoft 10-20 Keyboard   | Microsoft     | 1 Microsoft Way   |
| Apple iPhone 8Gb           | Apple, Inc.   | 1 Infinate Loop   |
+----------------------------+---------------+-------------------+

One key difference with the LEFT JOIN is that it also lists rows from the first table for which there is no match in the second table. For example, suppose we add a product (the Moto Razr) to our product table for which there is no matching supplier in the supplier table. When we run our SELECT statement, the row will still be displayed, but with NULL values for the supplier columns since no such supplier exists:

+----------------------------+---------------+-------------------+
| prod_name                  | supplier_name | supplier_address  |
+----------------------------+---------------+-------------------+
| CD-RW Model 4543           | EasyTech      | 100 Beltway Drive |
| EasyTech Mouse 7632        | EasyTech      | 100 Beltway Drive |
| WildTech 250Gb 1700        | WildTech      | 100 Hard Drive    |
| Microsoft 10-20 Keyboard   | Microsoft     | 1 Microsoft Way   |
| Apple iPhone 8Gb           | Apple, Inc.   | 1 Infinate Loop   |
| Moto Razr                  | NULL          | NULL              |
+----------------------------+---------------+-------------------+
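The following sketch contrasts the two joins before and after an unmatched product is added, using SQLite via Python's `sqlite3` module as a stand-in for MySQL (an assumption so it runs without a server). The article does not say what supplier_id the Moto Razr row holds, so NULL is assumed here, and `run` is a hypothetical helper that swaps in the join type. While every product has a supplier the two joins agree; afterwards only the LEFT JOIN keeps the extra row, with `None` (SQL NULL) in the supplier columns.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (trimmed to the needed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, "
             "supplier_name TEXT, supplier_address TEXT)")
conn.execute("CREATE TABLE product (prod_code INTEGER PRIMARY KEY, "
             "prod_name TEXT, supplier_id INTEGER)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", [
    (1, "Microsoft", "1 Microsoft Way"), (2, "Apple, Inc.", "1 Infinate Loop"),
    (3, "EasyTech", "100 Beltway Drive"), (4, "WildTech", "100 Hard Drive")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "CD-RW Model 4543", 3), (2, "EasyTech Mouse 7632", 3),
    (3, "WildTech 250Gb 1700", 4), (4, "Microsoft 10-20 Keyboard", 1),
    (5, "Apple iPhone 8Gb", 2)])

QUERY = ("SELECT prod_name, supplier_name, supplier_address FROM product {} suppliers "
         "ON (product.supplier_id = suppliers.supplier_id) ORDER BY prod_code")

def run(join_type):
    """Hypothetical helper: run QUERY with the given join keyword."""
    return conn.execute(QUERY.format(join_type)).fetchall()

# While every product has a supplier, LEFT JOIN and INNER JOIN agree.
same_before = run("LEFT JOIN") == run("INNER JOIN")

# Hypothetical unmatched product: its supplier_id is assumed to be NULL.
conn.execute("INSERT INTO product VALUES (6, 'Moto Razr', NULL)")

inner_rows = run("INNER JOIN")  # the Moto Razr row is dropped
left_rows = run("LEFT JOIN")    # the Moto Razr row is kept with NULL supplier data

print(len(inner_rows), len(left_rows))
print(left_rows[-1])
```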

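The opposite effect can be achieved using a RIGHT JOIN, whereby all the rows in the second table (i.e. our supplier table) will be displayed regardless of whether that supplier has any products in our product table. For example, if we add a supplier (Hewlett Packard) that supplies none of our products, it is still listed, with NULL in the product column: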

SELECT prod_name, supplier_name, supplier_address FROM product RIGHT JOIN suppliers 
ON (product.supplier_id = suppliers.supplier_id); 
+--------------------------+-----------------+------------------------+
| prod_name                | supplier_name   | supplier_address       |
+--------------------------+-----------------+------------------------+
| Microsoft 10-20 Keyboard | Microsoft       | 1 Microsoft Way        |
| Apple iPhone 8Gb         | Apple, Inc.     | 1 Infinate Loop        |
| CD-RW Model 4543         | EasyTech        | 100 Beltway Drive      |
| EasyTech Mouse 7632      | EasyTech        | 100 Beltway Drive      |
| WildTech 250Gb 1700      | WildTech        | 100 Hard Drive         |
| NULL                     | Hewlett Packard | 100 Printer Expressway |
+--------------------------+-----------------+------------------------+
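A RIGHT JOIN returns the same rows as a LEFT JOIN with the two tables swapped, so `product RIGHT JOIN suppliers` can also be written as `suppliers LEFT JOIN product`. The sketch below uses that rewrite because SQLite (used here via Python's `sqlite3` module as a stand-in for MySQL) only gained RIGHT JOIN support in version 3.39.0. The supplier_id of 5 for Hewlett Packard is an assumption, since the article does not show it.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (trimmed to the needed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, "
             "supplier_name TEXT, supplier_address TEXT)")
conn.execute("CREATE TABLE product (prod_code INTEGER PRIMARY KEY, "
             "prod_name TEXT, supplier_id INTEGER)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", [
    (1, "Microsoft", "1 Microsoft Way"), (2, "Apple, Inc.", "1 Infinate Loop"),
    (3, "EasyTech", "100 Beltway Drive"), (4, "WildTech", "100 Hard Drive")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "CD-RW Model 4543", 3), (2, "EasyTech Mouse 7632", 3),
    (3, "WildTech 250Gb 1700", 4), (4, "Microsoft 10-20 Keyboard", 1),
    (5, "Apple iPhone 8Gb", 2)])

# Hypothetical supplier with no products; supplier_id 5 is an assumption.
conn.execute("INSERT INTO suppliers VALUES (5, 'Hewlett Packard', '100 Printer Expressway')")

# Equivalent of "product RIGHT JOIN suppliers": keep every supplier row,
# filling the product column with NULL where no product matches.
rows = conn.execute(
    "SELECT prod_name, supplier_name, supplier_address FROM suppliers "
    "LEFT JOIN product ON (product.supplier_id = suppliers.supplier_id) "
    "ORDER BY suppliers.supplier_id, prod_code").fetchall()

for row in rows:
    print(row)
```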

Creating Joins with WHERE and USING

The next step is to incorporate some WHERE clauses into our LEFT and RIGHT joins. Say, for example, that we wish to list only products supplied by Microsoft:

SELECT prod_name, supplier_name, supplier_address FROM product RIGHT JOIN suppliers 
ON (product.supplier_id = suppliers.supplier_id) WHERE supplier_name='Microsoft';
+--------------------------+---------------+------------------+
| prod_name                | supplier_name | supplier_address |
+--------------------------+---------------+------------------+
| Microsoft 10-20 Keyboard | Microsoft     | 1 Microsoft Way  |
+--------------------------+---------------+------------------+
1 row in set (0.00 sec)
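With outer joins it matters where a restriction is placed. The sketch below (SQLite via Python's `sqlite3` module as a stand-in for MySQL; tables trimmed to the needed columns) uses a LEFT JOIN and compares filtering in the WHERE clause, as above, with putting the same test in the ON clause. The WHERE filter is applied to the joined rows and leaves only the Microsoft product, whereas in the ON clause it only limits which supplier rows match, so every product is still listed.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (trimmed to the needed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT)")
conn.execute("CREATE TABLE product (prod_code INTEGER PRIMARY KEY, "
             "prod_name TEXT, supplier_id INTEGER)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?)",
                 [(1, "Microsoft"), (2, "Apple, Inc."), (3, "EasyTech"), (4, "WildTech")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "CD-RW Model 4543", 3), (2, "EasyTech Mouse 7632", 3),
                  (3, "WildTech 250Gb 1700", 4), (4, "Microsoft 10-20 Keyboard", 1),
                  (5, "Apple iPhone 8Gb", 2)])

# Filter in WHERE: applied after the join, so only the Microsoft row survives.
where_rows = conn.execute(
    "SELECT prod_name, supplier_name FROM product LEFT JOIN suppliers "
    "ON (product.supplier_id = suppliers.supplier_id) "
    "WHERE supplier_name = 'Microsoft' ORDER BY prod_code").fetchall()

# Filter in ON: part of the match condition, so the LEFT JOIN still returns
# every product, with NULL supplier_name where the condition fails.
on_rows = conn.execute(
    "SELECT prod_name, supplier_name FROM product LEFT JOIN suppliers "
    "ON (product.supplier_id = suppliers.supplier_id AND supplier_name = 'Microsoft') "
    "ORDER BY prod_code").fetchall()

print(where_rows)
print(on_rows)
```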

The USING clause further simplifies the task of creating joins. The purpose of USING is to avoid the need for fully qualified names (such as product.supplier_id and suppliers.supplier_id) when referencing columns that reside in different tables but have the same name. For example, to list the products supplied by Microsoft by joining on product.supplier_id and suppliers.supplier_id, we can simply use the following syntax:

SELECT prod_name, supplier_name, supplier_address FROM product 
LEFT JOIN suppliers USING (supplier_id) WHERE supplier_name='Microsoft'; 

This results in the following output:

+--------------------------+---------------+------------------+
| prod_name                | supplier_name | supplier_address |
+--------------------------+---------------+------------------+
| Microsoft 10-20 Keyboard | Microsoft     | 1 Microsoft Way  |
+--------------------------+---------------+------------------+
1 row in set (0.00 sec)
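`USING (supplier_id)` expresses an equality join on the identically named column in both tables, so it requires that column name to exist in each table. The sketch below (SQLite via Python's `sqlite3` module as a stand-in for MySQL, an assumption so it runs without a server) checks that the USING form returns the same rows as the ON form with fully qualified names, and reruns the Microsoft query from above.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (trimmed to the needed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, "
             "supplier_name TEXT, supplier_address TEXT)")
conn.execute("CREATE TABLE product (prod_code INTEGER PRIMARY KEY, "
             "prod_name TEXT, supplier_id INTEGER)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", [
    (1, "Microsoft", "1 Microsoft Way"), (2, "Apple, Inc.", "1 Infinate Loop"),
    (3, "EasyTech", "100 Beltway Drive"), (4, "WildTech", "100 Hard Drive")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "CD-RW Model 4543", 3), (2, "EasyTech Mouse 7632", 3),
    (3, "WildTech 250Gb 1700", 4), (4, "Microsoft 10-20 Keyboard", 1),
    (5, "Apple iPhone 8Gb", 2)])

# USING names the shared column once instead of qualifying it for each table.
using_rows = conn.execute(
    "SELECT prod_name, supplier_name, supplier_address FROM product "
    "LEFT JOIN suppliers USING (supplier_id) ORDER BY prod_code").fetchall()

# The equivalent join written with fully qualified names in an ON clause.
on_rows = conn.execute(
    "SELECT prod_name, supplier_name, supplier_address FROM product "
    "LEFT JOIN suppliers ON (product.supplier_id = suppliers.supplier_id) "
    "ORDER BY prod_code").fetchall()

# The article's USING query restricted to Microsoft products.
ms_rows = conn.execute(
    "SELECT prod_name, supplier_name, supplier_address FROM product "
    "LEFT JOIN suppliers USING (supplier_id) WHERE supplier_name = 'Microsoft'").fetchall()

print(using_rows == on_rows)
print(ms_rows)
```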