Table Expressions#

A table expression computes a table. It contains a FROM clause that is optionally followed by WHERE, GROUP BY, and HAVING clauses.

1. The FROM Clause#

Derives a table from one or more other tables given in a table reference list:

FROM table_reference [, table_reference [, ...]]

A table reference can be a (schema-qualified) table name or a derived table (e.g. a subquery, a JOIN construct).

If more than one table reference is listed, the tables are cross-joined - the Cartesian product of their rows is formed. The result is a virtual table.

For tables which are parents, the table reference produces rows of all descendants, unless the keyword ONLY precedes the table name. Either way, only columns in the parent are produces, and those added in subtables are ignored.

1.1 Joined Tables#

Derived from two other (real or derived) tables. The general syntax is:

T1 join_type T2 [ join_condition ]

Joins can be chained or nested. () can be used to control join order, or else they’re evaluated left-to-right.

1.1.1 Cross Join#

Rows produced are a Cartesian product (every possible combination) of T1 and T2.

T1 CROSS JOIN T2 | T1, T2 | T1 INNER JOIN T2 ON TRUE

If T1 had N rows and T2 had M rows, N * M rows are produced.

CREATE TABLE T1 (num  int, name  char(1));
INSERT INTO T1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
CREATE TABLE T2 (num  int, letters  char(3));
INSERT INTO T2 VALUES (1, 'xxx'), (3, 'yyy'), (5, 'zzz');
mydb=> SELECT * FROM T1, T2; 
 num | name | num | letters 
-----+------+-----+---------
   1 | a    |   1 | xxx
   1 | a    |   3 | yyy
   1 | a    |   5 | zzz
   2 | b    |   1 | xxx
   2 | b    |   3 | yyy
   2 | b    |   5 | zzz
   3 | c    |   1 | xxx
   3 | c    |   3 | yyy
   3 | c    |   5 | zzz
(9 rows)

Note: Joins bind more tightly than commas. FROM T1 CROSS JOIN T2 INNER JOIN T3 ON condition is not the same as FROM T1, T2 INNER JOIN T3 ON condition, since condition can reference T1 in the first case but not the second.

1.1.2 Qualified Joins#

T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2

INNER(default) and OUTER are optional. LEFT, RIGHT and FULL imply an outer join.

The join_condition is specified in ON, USING or NATURAL. It determines which rows from the 2 source tables are considered to “match”.

  • INNER JOIN:

    • Only rows that “match” in both tables are included.

      mydb=> SELECT * FROM T1 INNER JOIN T2 ON T1.num = T2.num;
       num | name | num | letters 
      -----+------+-----+---------
         1 | a    |   1 | xxx
         3 | c    |   3 | yyy
      (2 rows)
      
  • LEFT OUTER JOIN:

    • An inner join is performed.

    • Rows in T1 with no “match” in T2 are added, with null values in columns of T2.

      mydb=> SELECT * FROM T1 LEFT JOIN T2 USING (num);
       num | name | letters 
      -----+------+---------
         1 | a    | xxx
         2 | b    | 
         3 | c    | yyy
      (3 rows)   
      
  • RIGHT OUTER JOIN:

    • An inner join is performed.

    • Rows in T2 with no “match” in T1 are added, with null values in columns of T1.

      mydb=> SELECT * FROM T1 RIGHT JOIN T2 USING (num);
       num | name | letters 
      -----+------+---------
         1 | a    | xxx
         3 | c    | yyy
         5 |      | zzz
      (3 rows)
      
  • FULL OUTER JOIN:

    • An inner join is performed.

    • Rows in T1 with no “match” in T2 are added, with null values in columns of T2.

    • Rows in T2 with no “match” in T1 are added, with null values in columns of T1.

      mydb=> SELECT * FROM T1 FULl JOIN T2 USING (num);
       num | name | letters 
      -----+------+---------
         1 | a    | xxx
         2 | b    | 
         3 | c    | yyy
         5 |      | zzz
      (4 rows)
      

The ON clause takes a boolean value expression, same as in WHERE. A pair of rows from T1 and T2 match if the expression evaluates to true.

The USING clause is used when both sides of the join use the same name(s) for the joining column(s). It takes a comma-separated list of the shared column names. Joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.

Importantly, only one of the shared columns is returned in JOIN USING, whereas JOIN ON still includes them.

NATURAL is a shorthand form of USING that forms a USING list consisting of all column names that appear in both tables. If there are no common names, NATURAL JOIN behaves like JOIN ... ON TRUE, resulting in a cross join.

Note: A restriction placed on the ON clause is processed before the join, while one placed in a WHERE clause is processed after the join. This matters a lot in outer joins:

mydb=> SELECT * FROM T1 LEFT JOIN T2 ON T1.num = T2.num AND T2.letters = 'xxx';
 num | name | num | letters 
-----+------+-----+---------
   1 | a    |   1 | xxx
   2 | b    |     | 
   3 | c    |     | 
(3 rows)

mydb=> SELECT * FROM T1 LEFT JOIN T2 ON T1.num = T2.num WHERE T2.letters = 'xxx';
 num | name | num | letters 
-----+------+-----+---------
   1 | a    |   1 | xxx
(1 row)

1.2 Table & Column Aliases#

Table aliases are temporary names given to tables, mainly for notational convenience. You may not refer to the original name afterwards.

FROM table_reference [AS] alias
FROM table_reference [AS] alias ( column1 [, column2 [, ...]

AS is optional. alias can be any identifier.

The second form gives temporary names to the table as well as its columns. If fewer column aliases are specified, the remaining columns are not renamed.

You must use an alias when joining a table to itself, or if the table reference is a subquery:

SELECT
    *
FROM
    people AS parent
JOIN people AS child ON
    parent.id = child.parent_id;

1.3 Subqueries#

Subqueries specifying a derived table must be enclosed in () and assigned a table alias.

FROM (SELECT ... FROM some_table) AS alias_name

A subquery can also be a VALUES list:

FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
    AS names(first, last)

1.4 Table Functions#

These are functions that produce a set of rows of either base (scalar) data types or composite data types (table rows).

Columns from tabke functions can be used in SELECT, JOIN or WHERE clauses just like tables, views or subqueries.

Table functions may be combined using ROWS FROM, returning parallell columns; number of rows is that of the largest function result, with smaller results padded with null.

function_call [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]
ROWS FROM( function_call [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

Using WITH ORDINALITY adds a bigint column numbering the columns of the function result set, starting from 1, named ‘ordinality’ (default).

The special table function UNNEST acts on array parameters, returning columns as if UNNEST had been called on each parameter separately and combined using ROWS FROM:

UNNEST( array_expression [, ... ] ) [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]

If no table_alias is set, the function name is used as the table name. In ROWS FROM (), the function’s first name is used.

If column aliases are not supplied, then for a function returning a base data type, the column name is also the same as the function name. For a function returning a composite type, the result columns get the names of the individual attributes of the type.

1.5 Lateral Subqueries#

Preceding subqueries in FROM with the key word LATERAL allows them to reference columns provided by preceding items. Without LATERAL, each subquery is evaluated independently.

SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;

LATERAL is optional in table functions since they can reference preceding items anyways.

A LATERAL item can appear at top level in the FROM list, or within a JOIN tree.

FROM items containing LATERAL cross-references are processed as:

  • For each row of the FROM item providing the cross-refereced column(s), or set of rows of mutliple FROM items providing the columns, the LATERAL item is evaluated using that row or row set’s values of the columns.

  • The resulting row(s) are joined as usual with the rows they were computed from.

  • Repeat for every row or set of rows from the column source table(s).

LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined.

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them.

2. The WHERE Clause#

Filters rows of the derived virtual table from FROM.

WHERE search_condition

where search_condition is any value expression that returns a boolean value. Only rows evaluating to true are kept (false, null are discarded).

Note: The join condition of an inner join can be written in the WHERE clause or in the JOIN clause:

FROM a, b WHERE a.id = b.id AND b.val > 5
-- is equivalent to
FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5

3. GROUP BY & HAVING Clauses#

GROUP BY groups rows in a table with similar values in the listed columns, to eliminate redundancy in the output and/or compute aggregates.

In general, if a table is grouped, columns not listed in GROUP BY cannot be referenced except in aggregate expressions.

In strict SQL, GROUP BY can only group by columns of the source table. But postgres columns in the select list and value expressions.

HAVING can be used to include only groups of interest. Expressions in a HAVING clause can refer to grouped expressions ungrouped expressions involving an aggregate function.

SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs resulting in a single group row. The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.

4. GROUPING SETS, CUBE & ROLLUP#

Grouping sets group rows just like GROUP BY clauses.

CREATE TABLE items (
    brand       varchar(20),
    size        varchar(3),
    sales       decimal(7, 2)
);
INSERT INTO items
    VALUES ('Foo', 'L', 10), ('Foo', 'M', 20), ('Bar', 'M', 15), ('Bar', 'L',  5);
mydb=> SELECT * FROM items;
 brand | size | sales 
-------+------+-------
 Foo   | L    | 10.00
 Foo   | M    | 20.00
 Bar   | M    | 15.00
 Bar   | L    |  5.00
(4 rows)

mydb=> SELECT brand, size, sum(sales) FROM items GROUP BY GROUPING SETS ((brand), (size), ());
 brand | size |  sum  
-------+------+-------
       |      | 50.00
 Foo   |      | 30.00
 Bar   |      | 20.00
       | L    | 15.00
       | M    | 35.00
(5 rows)

Each sublist of GROUPING SETS may specify zero or more columns / expressions, and is interpreted as if directly in a GROUP BY clause.

An empty grouping set () means that all rows are aggregated down to a single group.

References to the grouping columns / expressions are replaced by null values in result rows for grouping sets in which those columns do not appear.

ROLLUP ( e1, e2, e3, ... )
-- is equivalent to
GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)

ROLLUP is commonly used for analysis over heirachical data e.g. total salary by department, division and company-wide total.

CUBE ( a, b, c )
-- is equivalent to 
GROUPING SETS (
    ( a, b, c ),
    ( a, b    ),
    ( a,    c ),
    ( a       ),
    (    b, c ),
    (    b    ),
    (       c ),
    (         )
) -- power set(all possible subsets)

Sublist elements in CUBE and ROLLUP clauses are treated as single units:

CUBE ( (a, b), (c, d) )
-- is equivalent to
GROUPING SETS (
    ( a, b, c, d ),
    ( a, b       ),
    (       c, d ),
    (            )
)

ROLLUP ( a, (b, c), d )
-- is equivalent to
GROUPING SETS (
    ( a, b, c, d ),
    ( a, b, c    ),
    ( a          ),
    (            )
)

CUBE and ROLLUP can either be used directly in GROUP BY, or nested inside a GROUPING SETS clause.

Nesting a GROUPING SET clause inside another treats all elements of the inner clause as if directly written in the outer clause.

If multiple grouping items are specified in a single GROUP BY, the final list of grouping sets is the cross product of the individual items:

GROUP BY a, CUBE (b, c), GROUPING SETS ((d), (e))
-- is equivalent to
GROUP BY GROUPING SETS (
    (a, b, c, d), (a, b, c, e),
    (a, b, d),    (a, b, e),
    (a, c, d),    (a, c, e),
    (a, d),       (a, e)
)

The final set of grouping sets might contain duplicates, which can be removed using the DISTINCT clause directly on the GROUP BY:

GROUP BY ROLLUP (a, b), ROLLUP (a, c)
-- is equivalent to
GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a, b),
    (a, c),
    (a),
    (a),
    (a, c),
    (a),
    ()
)

GROUP BY DISTINCT ROLLUP (a, b), ROLLUP (a, c)
-- is equivalent to
GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a, c),
    (a),
    ()
)

5. Window Function Processing#

Window functions are evaluated after grouping, aggregation and HAVING filtering are performed. They won’t see the original rows from FROM/WHERE.

Multiple window functions having syntactically equivalent PARTITION BY and ORDER BY clauses are guaranteed to be evaluated in a single pass over the data.

Currently, window functions always require presorted data, and so query output will be ordered according to one or another of the window functions’ PARTITION BY/ORDER BY clauses. Use an explicit top-level ORDER BY if you wish to guarantee a particular order.