Basics#
SQL (Structured Query Language) is a domain-specific language used to manage / process data stored in relational database management systems (RDBMS) e.g MySQL, PostgreSQL, SQLite. A relation is a table.
A table is a named collection of rows. Several tables can be grouped in a database. A collection of databases managed by a single server instance is called a database cluster.
Each row has the same set of named columns. The order of rows within a table is not guaranteed, but you can sort values for display.
Each column has a specific data type, and a fixed order in each row.
Spaces, tabs and newlines can be used freely in SQL commands.
-- introduces single-line comments.
1. Creating a table#
Use a CREATE TABLE statement with column names and their data types:
$ psql mydb
psql (15.3)
Type "help" for help.
mydb=>
mydb=> CREATE TABLE products (
mydb(> name varchar(50),
mydb(> items_in_stock int,
mydb(> price numeric(7, 2)
mydb(> );
CREATE TABLE
2. Populating a table#
Use an INSERT command:
mydb=> INSERT INTO products (name, items_in_stock, price) VALUES ('Apples', 100, 25);
INSERT 0 1
You can list columns in any order, with their respective values:
mydb=> INSERT INTO products (price, name, items_in_stock) VALUES (10, 'Bananas', 32);
INSERT 0 1
You can insert values without specifying columns (not recommended):
mydb=> INSERT INTO products VALUES ('Cherries', 74, 2.5);
INSERT 0 1
You can also use the COPY command to load large amounts of data from flat-text files (e.g. txt, csv) into a table.
Tip
The psql \copy command is more user-friendly when fetching/storing data in a file accessible to the psql client:
3. Querying a table#
Use a SELECT statement:
mydb=> SELECT * FROM products;
name | items_in_stock | price
----------+----------------+-------
Apples | 100 | 25.00
Bananas | 32 | 10.00
Cherries | 74 | 2.50
(3 rows)
* is shorthand for “all columns”. You can specify columns (recommended):
mydb=> SELECT name, price FROM products;
name | price
----------+-------
Apples | 25.00
Bananas | 10.00
Cherries | 2.50
(3 rows)
You can include expressions:
mydb=> SELECT name, items_in_stock * price AS inventory_value FROM products;
name | inventory_value
----------+-----------------
Apples | 2500.00
Bananas | 320.00
Cherries | 185.00
(3 rows)
You can use a WHERE clause to filter results:
mydb=> SELECT * FROM products WHERE items_in_stock < 50;
name | items_in_stock | price
---------+----------------+-------
Bananas | 32 | 10.00
(1 row)
You can use an ORDER BY clause to sort results:
mydb=> SELECT * FROM products ORDER BY price;
name | items_in_stock | price
----------+----------------+-------
Cherries | 74 | 2.50
Bananas | 32 | 10.00
Apples | 100 | 25.00
4. Joining tables#
A join query accesses multiple tables (or multiple instances of the same table) at once:
We’ll need another table to experiment with joins:
mydb=> CREATE TABLE suppliers (
mydb(> name varchar(70),
mydb(> product_name varchar(50),
mydb(> unit_price numeric(7, 2),
mydb(> last_delivery_date date
mydb(> );
CREATE TABLE
mydb=> INSERT INTO suppliers VALUES
mydb-> ('ACME Fruits Ltd', 'Bananas', 8.5, '2023-07-23'),
mydb-> ('Green Thumb Corp.', 'Spinach', 5.95, '2023-07-24'),
mydb-> ('Jolly Grocers', 'Apples', 23.80, '2023-07-24');
INSERT 0 3
mydb=> SELECT * FROM suppliers;
name | product_name | unit_price | last_delivery_date
-------------------+--------------+------------+--------------------
ACME Fruits Ltd | Bananas | 8.50 | 2023-07-23
Green Thumb Corp. | Spinach | 5.95 | 2023-07-24
Jolly Grocers | Apples | 23.80 | 2023-07-24
(3 rows)
It is good practice to qualify column names (e.g table.colname) and use aliases to avoid issues with duplicate column names:
mydb=> SELECT * FROM products JOIN suppliers ON name = product_name; -- this will fail
ERROR: column reference "name" is ambiguous
LINE 1: SELECT * FROM products JOIN suppliers ON name = product_name...
mydb=> SELECT * FROM products p JOIN suppliers s ON p.name = s.product_name; -- 2 "name" columns
name | items_in_stock | price | name | product_name | unit_price | last_delivery_date
---------+----------------+-------+-----------------+--------------+------------+--------------------
Apples | 100 | 25.00 | Jolly Grocers | Apples | 23.80 | 2023-07-24
Bananas | 32 | 10.00 | ACME Fruits Ltd | Bananas | 8.50 | 2023-07-23
(2 rows)
mydb=> SELECT s.name AS supplier_name, p.name AS product_name, p.price,
mydb-> s.unit_price AS purchase_price, p.items_in_stock, s.last_delivery_date
mydb-> FROM products p JOIN suppliers s ON p.name = s.product_name;
supplier_name | product_name | price | purchase_price | items_in_stock | last_delivery_date
-----------------+--------------+-------+----------------+----------------+--------------------
Jolly Grocers | Apples | 25.00 | 23.80 | 100 | 2023-07-24
ACME Fruits Ltd | Bananas | 10.00 | 8.50 | 32 | 2023-07-23
(2 rows)
The default is an inner join, which returns only rows that match the join condition. To include all possible results from both tables, we can use a full outer join:
mydb=> SELECT s.name AS supplier_name, p.name AS product_name, p.price,
mydb-> s.unit_price AS purchase_price, p.items_in_stock, s.last_delivery_date
mydb-> FROM products p FULL OUTER JOIN suppliers s ON p.name = s.product_name
mydb-> ORDER BY supplier_name;
supplier_name | product_name | price | purchase_price | items_in_stock | last_delivery_date
-------------------+--------------+-------+----------------+----------------+--------------------
ACME Fruits Ltd | Bananas | 10.00 | 8.50 | 32 | 2023-07-23
Green Thumb Corp. | | | 5.95 | | 2023-07-24
Jolly Grocers | Apples | 25.00 | 23.80 | 100 | 2023-07-24
| Cherries | 2.50 | | 74 |
(4 rows)
More on joins later.
5. Aggregate functions#
Aggregate functions compute a single result from multiple input rows e.g. count, sum, avg, max and min.
mydb=> SELECT sum(price * items_in_stock) FROM products AS total_inventory_value;
sum
---------
3005.00
(1 row)
mydb=> SELECT min(price), max(price), avg(price) FROM products;
min | max | avg
------+-------+---------------------
2.50 | 25.00 | 12.5000000000000000
(1 row)
To include aggregate functions in WHERE clauses, you can use a subquery. WHERE clauses determine which rows to include, and so are processed before aggregate functions.
mydb=> SELECT name, price FROM products WHERE price = min(price);
ERROR: aggregate functions are not allowed in WHERE
LINE 1: SELECT name, price FROM products WHERE price = min(price);
^
mydb=> SELECT name, price FROM products WHERE price = (SELECT min(price) FROM products);
name | price
----------+-------
Cherries | 2.50
(1 row)
Aggregate functions are often used in GROUP BY clauses:
mydb=> INSERT INTO suppliers (name, product_name, unit_price, last_delivery_date) VALUES
mydb-> ('Planet Farms', 'Apples', 24.10, '2023-07-25'),
mydb-> ('City Merchants', 'Bananas', 9.00, '2023-07-25');
INSERT 0 2
mydb=> SELECT product_name, count(name) AS num_suppliers, min(unit_price) AS min_price,
mydb-> max(unit_price) AS max_price, avg(unit_price) AS avg_price
mydb-> FROM suppliers
mydb-> GROUP BY product_name;
product_name | num_suppliers | min_price | max_price | avg_price
--------------+---------------+-----------+-----------+---------------------
Apples | 2 | 23.80 | 24.10 | 23.9500000000000000
Bananas | 2 | 8.50 | 9.00 | 8.7500000000000000
Spinach | 1 | 5.95 | 5.95 | 5.9500000000000000
(3 rows)
mydb=> SELECT s.product_name, avg(s.unit_price) AS avg_purchase_price, p.price AS sale_price
mydb-> FROM products p JOIN suppliers s ON p.name = s.product_name
mydb-> GROUP BY s.product_name, p.price
mydb-> ORDER BY sale_price DESC;
product_name | avg_purchase_price | sale_price
--------------+---------------------+------------
Apples | 23.9500000000000000 | 25.00
Bananas | 8.7500000000000000 | 10.00
You can filter grouped rows with a HAVING clause.
mydb=> SELECT product_name, avg(unit_price) AS avg_purchase_price
mydb-> FROM suppliers
mydb-> GROUP BY product_name
mydb-> HAVING avg(unit_price) < 10;
product_name | avg_purchase_price
--------------+--------------------
Bananas | 8.7500000000000000
Spinach | 5.9500000000000000
(2 rows)
Note
The fundamental difference between WHERE and HAVING is that WHERE selects input rows before grouping & aggregation, whereas HAVING selects group rows after groups and aggregates are computed.
HAVING clauses usually contain aggregate functions, but this isn’t a must. In such cases, WHERE clauses would be more efficient; we’d avoid doing grouping and aggregate calculations for all rows that fail the WHERE check.
6. Updates#
You can update existing rows using the UPDATE command:
mydb=> UPDATE products SET price = 3 WHERE name = 'Cherries';
UPDATE 1
mydb=> SELECT * FROM products;
name | items_in_stock | price
----------+----------------+-------
Apples | 100 | 25.00
Bananas | 32 | 10.00
Cherries | 74 | 3.00
(3 rows)
7. Deletions#
You can remove rows using the DELETE command:
mydb=> DELETE FROM products WHERE name = 'Bananas';
DELETE 1
mydb=> SELECT * FROM products;
name | items_in_stock | price
----------+----------------+-------
Apples | 100 | 25.00
Cherries | 74 | 3.00
(2 rows)
Caution
DELETE FROM tablename; will remove all rows. Be specific.
Tip
Start with a SELECT query to check the rows that would be selected. Then switch to a DELETE command.
You can use DROP TABLE to remove a table:
mydb=> DROP TABLE products;
DROP TABLE
mydb=> SELECT * FROM products;
ERROR: relation "products" does not exist
LINE 1: SELECT * FROM products;
^