There are a number of ways to join tables with SQL and it is crucial to understand the differences between these or you may end up creating a flawed dataset. Any table can be linked to another table as long as there is a field that exists in both; this is known as a “common key.”
The most important and most often used joins are INNER JOIN, FULL JOIN, and LEFT/RIGHT JOIN. There are some very complex uses of these that won’t be covered in this book, but if you want more in depth information on this topic, Learning SQL by Alan Beaulieu is a great resource.
To highlight the logic behind all the different joining operations, let’s assume that we have only two tables (Figure 1) each with only 5 rows of data. The common key for these is Customer_ID.
An INNER JOIN is where only those values in the common field that exist in both tables will be returned. For example in Figure 2, Customer_ID A and C exist in both tables, therefore the Sales and City values will be returned only for those particular rows. Notice that the prefix t1 and t2 before Customer_ID are the table names that were declared.
select t1.Customer_ID, Sales, t2.Customer_ID, City
from Table_1 T1
INNER JOIN Table_2 T2 ON t1.Customer_ID = t2.Customer_ID
Another way to perform an INNER JOIN, which will give you the exact same results, is to set the common keys between the two tables equal to each other. The code for an inner join through a WHERE statement can be seen in the code below. Although there is nothing wrong with this, for the sake of consistency I generally use the “INNER JOIN” syntax.
select t1.Customer_ID, Sales, t2.Customer_ID, City
from Table_1 T1, Table_2 T2
where t1.Customer_ID = t2.Customer_ID
A FULL JOIN (also called FULL OUTER JOIN) is where every value is returned whether it exists in both tables or not. In Figure 3 you can see that every customer ID, sales value, and city is returned.
select t1.Customer_ID, Sales, t2.Customer_ID, City
from Table_1 T1
FULL JOIN Table_2 T2 ON t1.Customer_ID = t2.Customer_ID
A function called COALESCE allows you to combine common fields, so even if a value exists in one field and not another, the field will still be populated. In the example below, using COALESCE returns an ID for every row which we are calling New_ID.
select coalesce(t1.Customer_ID, t2.Customer_ID) as New_ID, Sales, City
from Table_1 T1
FULL JOIN Table_2 T2 ON t1.Customer_ID = t2.Customer_ID
A LEFT JOIN (also called LEFT OUTER JOIN) is where every value on the left table (first table denoted in the code) is returned and only those values in the common field are returned from the right table. In Figure 5, we can see that every value from Table 1 is returned but only ‘Chicago’ and ‘New York’ are returned from Table 2. A RIGHT JOIN is the inverse of a LEFT JOIN and can be seen in Figure 6.
select t1.Customer_ID, Sales, t2.Customer_ID, City
from Table_1 T1
LEFT JOIN Table_2 T2 ON t1.Customer_ID = t2.Customer_ID
There is one other method of merging tables that is extremely valuable and that is the UNION statement. A UNION is when you are taking multiple tables with the same fields and merging them together. It is not exactly a join because you are not appending new fields, only new rows. In the example below, the two tables have the same fields Customer_ID and Age. We want to have one table that has all of these values together. The code below will give the result seen in Figure 8.
select Customer_ID, Age
from Table_3
UNION
select Customer_ID, Age
from Table_4
Now we can apply these learnings and join the tables in the retail_db database. One thing to note is that MySQL Workbench does not have the capability for FULL OUTER JOIN. There are some complex work arounds to do this, but from what I have seen, FULL OUTER JOIN is not as commonly used as INNER JOIN, LEFT JOIN, and UNION anyway.
We first want to retrieve data that has a customer’s sales history and city in the same row, but only for those customers who fall into both tables. This is done using an INNER JOIN as seen below.
select t1.Customer_ID, Sales, t2.Customer_ID, City
from retail_db.sales_tbl T1
INNER JOIN retail_db.location_tbl T2
ON t1.Customer_ID = t2.Customer_ID;;
Since not every customer is found in the retail_db.location_tbl table, if we want to return the sales history and demographic information using a LEFT JOIN on retail_db.sales_tbl, some values in City will show up as null. This can be seen in Figure 10 below.
select t1.Customer_ID, Sales, t2.Customer_ID, City
from retail_db.sales_tbl T1
LEFT JOIN retail_db.location_tbl T2
ON t1.Customer_ID = t2.Customer_ID;;
Finally we can use UNION to combine retail_db.sales_tbl and retail_db.sales_tbl_new.
select *
from retail_db.sales_tbl T1
UNION
select *
from retail_db.sales_tbl_new T2;;