SQL Querying Multiple Tables

From NovaOrdis Knowledge Base
Revision as of 00:52, 24 May 2024 by Ovidiu (talk | contribs) (→‎Join)
Jump to navigation Jump to search

External

Internal

Overview

More than one table can be used in a FROM query clause, and when that happens, it is said that the query performs a join before filtering and returning the results.

Join

The mechanism of linking two or more tables is referred to as a join. For two tables, the most generic way of combining rows is to take every row from the first table and combine it with every row with the other table. This is known as the cartesian product of two relations, or a cross join. If the two tables have N and M rows respectively, the result will have N x M rows. In case of three tables, with N, M and P rows respectively, the result will contain N x M x P rows. See the Cross Join section below for an example of how that works.

In most cases, cartesian products are not very useful, they produce a lot of redundant data. We are usually interested in combining rows that are related to each other, and dropping everything else. This is done with an inner join. The inner join can be thought of as performing the cartesian product, then retaining only the rows that satisfy the join condition. Inner joins are the most commonly used type of join. This mental model is confirmed by results of an inner join without a join condition: the cartesian product. Other types of joins are the outer joins: left outer join and right outer join.

Examples

All examples provided in this article are based on three tables (person, address and country), with the following schema:

person 

Column     | Type                                                  id | name             | company_id
-----------+---------                                           ------+------------------+-------------
id         | integer  # primary key                                1  | Alice            | 10
name       | text                                                  2  | Bob              | 20 
company_id | integer  # foreign key that references company(id)    3  | Charlie          | 30 

company

Column     | Type                                                  id | name             | city_id
-----------+---------                                           ------+------------------+-------------
id         | integer # primary key                                10  | Moonphone        | 100
name       | text                                                 20  | Vortextime       | 200  
city_id    | integer # foreign key that references city(id)       30  | Bluestone        | 300 

city

Column     | Type                                                  id | name             
-----------+---------                                           ------+------------------
id         | integer # primary key                                100 | San Francisco
name       | text                                                 200 | New York  
                                                                  300 | Chicago

Inner Join

An inner join is performed by specifying more than one table in the FROM clause with the following syntax. Each table that participates to the join, except the first one, is preceded by the JOIN keyword and followed by the ON keyword:

SELECT ...
  FROM table_one
    JOIN table_two ON <join_condition>
    JOIN table_three ON <join_condition>
    ...

A JOIN keyword without any qualifier signifies an inner join. For clarity, INNER can be specified:

SELECT ...
  FROM table_one
    INNER JOIN table_two ON <join_condition>
    INNER JOIN table_three ON <join_condition>
    ...

These two forms are equivalent.

The complete syntax to inner join two tables is:

 SELECT person.name AS name, company.name AS company
 FROM person 
   JOIN company ON person.company_id = company.id

  name   |  company
---------+------------
 Alice   | Moonphone
 Bob     | Vortextime
 Charlie | Bluestone

The complete syntax to inner join three tables is:

 SELECT person.name AS name, company.name AS company, city.name AS city 
 FROM person
   JOIN company ON person.company_id = company.id 
   JOIN city ON company.city_id = city.id;

  name   |  company   |     city
---------+------------+---------------
 Alice   | Moonphone  | San Francisco
 Bob     | Vortextime | New York
 Charlie | Bluestone  | Chicago

Join Condition

Tables are usually joined using foreign keys. The join condition states that we should associate the rows from the first table with rows from other table where the foreign key column value from the first table row is equal with the primary key column value from other table row.

Left Outer Join

Right Outer Join

Cross Join

A cross join produces the cartesian product of the tables participating in join.

For two tables:

 
SELECT * FROM person 
  CROSS JOIN company;

 id |  name   | company_id | eye_color | id |    name    | city_id
----+---------+------------+-----------+----+------------+---------
  1 | Alice   |         10 | blue      | 10 | Moonphone  |     100
  2 | Bob     |         20 | black     | 10 | Moonphone  |     100
  3 | Charlie |         30 | black     | 10 | Moonphone  |     100
  1 | Alice   |         10 | blue      | 20 | Vortextime |     200
  2 | Bob     |         20 | black     | 20 | Vortextime |     200
  3 | Charlie |         30 | black     | 20 | Vortextime |     200
  1 | Alice   |         10 | blue      | 30 | Bluestone  |     300
  2 | Bob     |         20 | black     | 30 | Bluestone  |     300
  3 | Charlie |         30 | black     | 30 | Bluestone  |     300
(9 rows)

This is the same as performing an inner join without a join condition.

 
SELECT * FROM person, company;

For three tables:

 
SELECT COUNT(*) FROM person 
  CROSS JOIN company
  CROSS JOIN city;

 count
-------
   27

Using Table Aliases in Joins

The SQL syntax permits to alias each of the join-participating tables, so you can identify which tables you are referring to when you reference columns in the SELECT clause, by using table aliases:

 FROM table_one AS table_one_alias
   JOIN table_two AS table_two_alias ON <join_condition>
   JOIN table_three AS table_three_alias ON <join_condition>
   ...

For permanent, temporary and virtual tables aliasing is optional, their original names are sufficient to reference columns. However for derived tables, the alias is useful:

SELECT person.name, derived_table_alias.name 
FROM person 
  JOIN (SELECT id, UPPER(name) AS name FROM company) AS derived_table_alias ON person.company_id = derived_table_alias.id;