Joining Tables with SQL

There are a number of ways to join tables with SQL and it is crucial to understand the differences between these or you may end up creating a flawed dataset. Any table can be linked to another table as long as there is a field that exists in both; this is known as a “common key.”

The most important and most often used joins are INNER JOIN, FULL JOIN, and LEFT/RIGHT JOIN. There are some very complex uses of these that won’t be covered in this book, but if you want more in depth information on this topic, Learning SQL by Alan Beaulieu is a great resource.

To highlight the logic behind all the different joining operations, let’s assume that we have only two tables (Figure 1) each with only 5 rows of data. The common key for these is Customer_ID.

An INNER JOIN is where only those values in the common field that exist in both tables will be returned. For example in Figure 2, Customer_ID A and C exist in both tables, therefore the Sales and City values will be returned only for those particular rows. Notice that the prefix t1 and t2 before Customer_ID are the table names that were declared.

select t1.Customer_ID, Sales, t2.Customer_ID, City

from Table_1   T1

INNER JOIN  Table_2   T2  ON t1.Customer_ID = t2.Customer_ID

Figure 2

Another way to perform an INNER JOIN, which will give you the exact same results, is to set the common keys between the two tables equal to each other. The code for an inner join through a WHERE statement can be seen in the code below. Although there is nothing wrong with this, for the sake of consistency I generally use the “INNER JOIN” syntax. 

select t1.Customer_ID, Sales, t2.Customer_ID, City

from Table_1  T1, Table_2  T2 

where t1.Customer_ID = t2.Customer_ID

 A FULL JOIN (also called FULL OUTER JOIN) is where every value is returned whether it exists in both tables or not. In Figure 3 you can see that every customer ID, sales value, and city is returned.

select t1.Customer_ID, Sales, t2.Customer_ID, City

from Table_1   T1

FULL JOIN  Table_2   T2  ON t1.Customer_ID = t2.Customer_ID

Figure 3

A function called COALESCE allows you to combine common fields, so even if a value exists in one field and not another, the field will still be populated. In the example below, using COALESCE returns an ID for every row which we are calling New_ID.

select coalesce(t1.Customer_ID, t2.Customer_ID) as New_ID, Sales, City

from Table_1   T1

FULL JOIN  Table_2   T2  ON t1.Customer_ID = t2.Customer_ID

Figure 4

A LEFT JOIN (also called LEFT OUTER JOIN) is where every value on the left table (first table denoted in the code) is returned and only those values in the common field are returned from the right table. In Figure 5, we can see that every value from Table 1 is returned but only ‘Chicago’ and ‘New York’ are returned from Table 2. A RIGHT JOIN is the inverse of a LEFT JOIN and can be seen in Figure 6. 

select t1.Customer_ID, Sales, t2.Customer_ID, City

from Table_1  T1

LEFT JOIN  Table_2  T2  ON t1.Customer_ID = t2.Customer_ID

Figure 5
Figure 6

There is one other method of merging tables that is extremely valuable and that is the UNION statement. A UNION is when you are taking multiple tables with the same fields and merging them together. It is not exactly a join because you are not appending new fields, only new rows. In the example below, the two tables have the same fields Customer_ID and Age. We want to have one table that has all of these values together. The code below will give the result seen in Figure 8.

Figure 7.1
Figure 7.2

select Customer_ID, Age

from Table_3 

UNION

select Customer_ID, Age

from Table_4 

Figure 8

           Now we can apply these learnings and join the tables in the retail_db database. One thing to note is that MySQL Workbench does not have the capability for FULL OUTER JOIN. There are some complex work arounds to do this, but from what I have seen, FULL OUTER JOIN is not as commonly used as INNER JOIN, LEFT JOIN, and UNION anyway.

           We first want to retrieve data that has a customer’s sales history and city in the same row, but only for those customers who fall into both tables. This is done using an INNER JOIN as seen below.

select t1.Customer_ID, Sales, t2.Customer_ID, City

from retail_db.sales_tbl   T1

INNER JOIN  retail_db.location_tbl   T2 

ON t1.Customer_ID = t2.Customer_ID;;

Figure 9

Since not every customer is found in the retail_db.location_tbl table, if we want to return the sales history and demographic information using a LEFT JOIN on retail_db.sales_tbl, some values in City will show up as null. This can be seen in Figure 10 below.

select t1.Customer_ID, Sales, t2.Customer_ID, City

from retail_db.sales_tbl   T1

LEFT JOIN  retail_db.location_tbl  T2 

ON t1.Customer_ID = t2.Customer_ID;;

Figure 10

             Finally we can use UNION to combine retail_db.sales_tbl and retail_db.sales_tbl_new.

select *

from retail_db.sales_tbl   T1 

UNION

select *

from retail_db.sales_tbl_new   T2;;

Figure 11