Let's dig into the basics of the SQL query syntax by querying a table called
invoices. Each row in this table contains all the information related to a single invoice: customer details, the country in which the invoice was issued, its total amount, VAT rate, and so on.
Start with a simple query that retrieves (selects, in SQL terminology) all the data in the table:
SELECT * FROM learnsql.invoices;
SELECTis exactly the same as
select; column names, on the other hand, are case-sensitive!
* wildcard symbol, which translates to "all the columns in this table"; in alternative, we can explicitly select only the columns we're interested in, for example:
SELECT id, business_name, country, total, vat_rate FROM learnsql.invoices;
Let's now change the query so that it only returns invoices issued in Germany:
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE country = 'Germany';
New syntax: the
WHERE keyword introduces the conditions’ block: now the database will test each row against our conditions (a single one, in this case) and only return the rows that fulfill them.
Think about the
WHERE clauses as working on two different axes of your table: the
SELECT clause works on the vertical axis, creating a projection that restricts the columns which are returned to you; the
WHERE clause works instead on the horizontal axis, creating a filter that restricts the rows which are returned to you.
Let's now do the opposite and select only the invoices which have not been issued in Germany:
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE country <> 'Germany';
Try it in the Notebook
There are multiple operators that we can use to filter data, depending on the column type. For example, on an integer column like `total` we could use:
SELECT * FROM learnsql.invoices WHERE total > 10000; SELECT * FROM learnsql.invoices WHERE total >= 10000; -- greater or equal to SELECT * FROM learnsql.invoices WHERE total < 10000; SELECT * FROM learnsql.invoices WHERE total <= 10000; -- less than or equal to
Did you know? In Query.me you can have multiple queries in a single SQL block: just remember to add a trailing `;` to each. When you want to execute one of them, take care of selecting it before hitting the Execute button: if you forget to do so, you'll only see the results of the last one.
If you look closely at the results of the previous queries, you'll notice that some rows contain the `null` value in the `business_name` column: `null` is actually not a value, but a keyword indicating the absence of value. When writing conditions that involve `null`, we need to use a slightly different syntax:
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name IS NULL; SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name IS NOT NULL; # don't do this! SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name = NULL; SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name <> NULL;
IS [NOT] NULL allows you to filter on columns that may contain
Important: using the
<> operators in combination with
null will return unexpected results, so take care to avoid doing so. Some databases will return a syntax error and outright refuse to run the query, while others might silently accept the query, but there's no guarantee that they will return the expected results.
Filtering data by a single condition is not very effective when our table has several columns, but luckily we can actually have multiple ones: when using more than one condition, we have to tell the database how we would like them to be combined.
Let's imagine that have two filters in mind:
Do we want a row to be returned only when it matches both conditions (= the invoice was issued in Germany and its total amount is greater than 100)? Then we should use the
Do we want a row to be returned if it matches at least one condition? Then we should use the
SELECT * FROM learnsql.invoices WHERE country = 'Germany' AND total > 100; SELECT * FROM learnsql.invoices WHERE country = 'Germany' OR total > 100;
OR keywords, to combine conditions.
Deep dive: Database engines try their utmost to optimize query executions, so they'll short-circuit boolean conditions whenever possible. What this means is that they will evaluate the first condition and, depending on the chosen operator (
OR), they might decide that there's no need to evaluate any remaining condition. These are the situations in which short-circuiting will happen:
AND: if the first condition evaluates to
FALSE, the row is immediately discarded from the result set
OR: if the first condition evaluates to
TRUE, the row is immediately added to the result set
In all other situations, it's necessary to also evaluate the next conditions performing the same steps as above, until it's clear whether the row belongs to the result set or not. In the worst case, all conditions will need to be evaluated.
References: Boolean logic
When using both
OR in the same query, we have to be aware of their precedence rules, which are best explained with a practical example.
Let's suppose that we would like to select all invoices that have already been paid and have been charged either 21% or a 19% VAT rate. Our first attempt might look like this:
SELECT paid, vat_rate FROM learnsql.invoices WHERE paid IS TRUE AND vat_rate = 0.21 OR vat_rate = 0.19;
If you execute the query and inspect the results, you'll quickly spot some rows where
paid is actually
false, which is not what we wanted! The problem is that we assumed that our query would be executed as follows (note the added parentheses, to make logical precedence explicit):
SELECT paid, vat_rate FROM learnsql.invoices WHERE paid IS TRUE AND (vat_rate = 0.21 OR vat_rate = 0.19);
whereas the database executed it as follows (again, note the parentheses):
SELECT paid, vat_rate FROM learnsql.invoices WHERE (paid IS TRUE AND vat_rate = 0.21) OR vat_rate = 0.19;
thus returning rows for invoices which have already been paid and had 21% VAT rate, as well as those that had 19% VAT rate (regardless of their payment status).
In order to avoid this, we have to make our intended precedence explicit by using parentheses, as in the first of the above two queries.
New syntax: when filtering on boolean columns such as
paid, we have to use the
IS TRUE / IS FALSE syntax.
Side note: there’s actually a better way to write the above query, which doesn’t require us to use
SELECT paid, vat_rate FROM learnsql.invoices WHERE paid IS TRUE AND vat_rate IN (0.21, 0.19);
IN operator takes as input a comma-separated list of values that will be accepted for that column.
Are you ready to practice what you’ve just learned? Head over to the Notebook and try to solve the exercises you find there!
Note: the solution to each exercise is in the collapsed block right below it: you can expand it by clicking the arrow icon that appears on the far left when you hover it.