Basic query syntax, part-2

Sorting data

Let's now introduce another SQL feature: sorting. As the name implies, it enables us to enforce an order on the data returned by our query. As an example, let's sort our query results so that the most expensive one will be returned first and the least expensive one will be returned last:

SELECT id, business_name, total, vat_rate
FROM learnsql.invoices
WHERE business_name IS NOT NULL
ORDER BY total DESC;

Try it in the Notebook

New syntax: ORDER BY <column_name> ASC / DESC allows you to sort data by specific column(s), specifying if it should be sorted in ascending order (from the lowest value to the highest one, ASC keyword) or descending order (from the highest value to the lowest one, DESC keyword).

We're not limited to sorting by a single column, by the way:

SELECT id, business_name, total, vat_rate
FROM learnsql.invoices
WHERE business_name IS NOT NULL
ORDER BY business_name ASC, total DESC;

Try it in the Notebook

Now the database will first sort all rows (that match our conditions) by business_name in ascending order and then, whenever multiple rows have the same business_name, further sort them by total in descending order.

Limiting results

We can instruct the database to only return (up to) a certain number of rows, even if more would fulfill our query criteria. While not mandatory, this is usually regarded as a good practice when working with big tables (think: more than a few hundred rows) because it reduces the amount of data that needs to be transferred over the network, thus making queries run faster and making the database load more predictable.

This feature is heavily used in pretty much every mobile app and website: think about the infinite scrolling mechanism of many social media or the links to proceed to the next page of results in search engines, blogs, e-commerce websites, and so on...

We can of course provide our own limit, which will override the default one:

SELECT *
FROM learnsql.invoices
LIMIT 100 OFFSET 101;

New syntax: LIMIT <number> OFFSET <number> indicates the maximum number of rows that should be returned by the database; the OFFSET keyword can then be used to select the next batch of results (in the above example, it will return the second "page" of results, meaning the next 100 rows starting from the one in position 101); OFFSET is an optional keyword and omitting is the same as writing OFFSET 0.

Try it in the Notebook

Sorting when applying limits

It's important to sort data when using the LIMIT keyword: in fact, unless an ORDER BY clause is provided by the query, running the same query multiple times might return different rows every time!

SELECT id, business_name, total, vat_rate
FROM learnsql.invoices
WHERE business_name IS NOT NULL
ORDER BY business_name ASC, total DESC;

Try it in the Notebook

Now, you might be wondering why DBMSs don’t have a default sorting mechanism, to provide stable results on each query execution. The answer is that sorting data is an expensive operation, which gets more expensive as the dataset grows: database engines try to optimize their execution plans to the utmost, so they will not perform unrequested operations!

Aliasing

Assigning a different name to a column in our query (note: this doesn’t affect original data!) can occasionally be useful, in particular when we apply some transformations to it before returning it in our query. Let’s, for example, return the full name of all private customers by concatenating (||operator) their last and first name, separated by a comma and a whitespace:

SELECT
last_name || ' , ' || first_name AS full_name
FROM learnsql.invoices
WHERE last_name IS NOT NULL AND first_name IS NOT NULL;

Try it in the Notebook

The AS keyword that you see on the second line enables us to change the name of the resulting column in our dataset to full_name.

This also gives us the opportunity to mention another feature we haven’t made use of, so far: we are not limited to returning column values as they are, we can actually return expressions based on column values that may better fit our reporting needs. For example, the following are all possible:

SELECT
last_name || ', ' || firstname AS full_name,
total * vat_rate AS total_incl_vat,
total - discont_applied AS amount
FROM learnsql.invoices
WHERE last_name IS NOT NULL;

Try it in the Notebook

Selecting distinct values

Let’s say that we want to get a list of all the countries in which we have issued invoices, but we only want to see each country once: simply selecting the country colum won’t do in this case, because the same country may appear multiple times in the results (according to the number of invoices issued in that country). Luckily, SQL provides us with a way to do just that:

SELECT
distinct country
FROM learnsql.invoices;

Try it in the Notebook

The distinct statement causes the DBMS to “reduce” the results so that we see each country exactly once, which is precisely what we wanted!

Query readability

As you might have noticed, the queries in this article have been written over multiple lines, unlike the ones in previous articles: for the database there's no difference at all, but for humans there is! As our queries get more complex, it's a good practice to write queries over multiple lines in order to improve their readability, so we'll keep writing them this way from now on.

For the same reason, it’s common to write keywords in capital letters, so that they can be immediately spotted in the query. Further down the line, as our queries become even more complex, we’ll also start using indentation.

Image caption: Keywords in capital letters can be quickly spotted in the query; the use of multiple lines enables us to divide the query into its logical parts.

Practice

Are you ready to practice what you’ve just learned? Head over to the Notebook and try to solve the exercises you find there!

Note: the solution to each exercise is in the collapsed block right below it: you can expand it by clicking the arrow icon that appears on the far left when you hover it.


Jump to