Demystifying SQL 102: DQL Basic Commands

SQL DQL: Basic Commands

At the heart of querying data, lies the domain of Data Querying Language (DQL). As you may remember from the previous post (and if you don't then I encourage you to give it a try, SQL DDL: Basic Commands), Data Definition Language (DDL) commands allowed us to create and define a database and its tables, i. e., it allowed us to define the database schema. DQL commands, on the other hand, are used to query and manipulate data stored in databases. 

In this blog post, we'll unravel the core DQL commands. From the fundamental SELECT statement to filtering with WHERE, sorting with ORDER BY, and limiting results with LIMIT, we'll embark on a journey to master the essentials of data retrieval.

Up next, we will review the most common DQL commands, but first, I will encourage you to visit the SQL DDL: Basic Commands blog post to learn about DB Fiddle and how to use it. This is a free-online tool where you can select distinct SQL dialects, create your own databases and tables, as well as develop SQL queries to test and share with your colleagues (see note below).

NOTE: We will use MySQL latest version available for the next explanations, so make sure to select MySQL to follow the next steps.
The reason behind this is because MySQL is one of the most popular open-source DBMS systems. 
MySQL has a large and active community of users and developers, which means you'll find a wealth of resources, tutorials, and forums 
to help you with any questions or issues you encounter. MySQL is also used widely in the industry, so you will find yourself learning
a dialect used in real-world scenarios and work places.

Data Preparation

First, we will create a database and a table in DB Fiddle. Once again, we will use MySQL to create and manipulate the data. Below you will find the SQL code used to create both a Database and a Table so we can use it for this exercise.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE DATABASE sqldojo_db;

USE sqldojo_db;
CREATE TABLE sqldojo_pets_table (name VARCHAR(55), owner VARCHAR(55), species VARCHAR(55), sex CHAR(1), birth DATE);

INSERT INTO sqldojo_pets_table (name, owner, species, sex, birth)
VALUES 
('Kika', 'Margaret Roberts', 'Dog', 'F', '2022-10-01'),
('Haru', 'Margaret Roberts', 'Dog', 'F', '2021-01-02'),
('Doggo', 'Robert Martin', 'Dog', 'M', '2017-11-08'),
('Katze', 'Martin Suarez', 'Cat', 'M', '2019-10-01'),
('Maito', 'Andrew Silva', 'Cat', 'F', '2018-08-12');

Creating a DB, Table and some records.
Creation of a table with some rows in MySQL Server within DB Fiddle.


Now that there is a table with some available data, we can proceed and use DQL commands to query and manipulate this data. Here is a very brief blog by Beau Carnes about the most known SQL commands: Basic SQL Commands - The List of Database Queries and Statements You Should Know.

SELECT & FROM Statements

The SELECT statement is the fundamental command for retrieving data from one or more tables. With this command you state which columns from your tables you want to retrieve. This command is not usually use as a stand alone command, it is accompanied with the FROM statement, as usually you need to specify the database and table from which to retrieve the columns; this is the FROM statement, where you can state which table and database use to retrieve the data.

1
2
3
4
SELECT 
  name,
  owner
FROM sqldojo_db.sqldojo_pets_table



SELECT & FROM statement example
SELECT & FROM statement example


To retrieve all the available columns in a table, you can simply use the * character instead of listing all the columns:

1
2
3
SELECT 
  *
FROM sqldojo_db.sqldojo_pets_table

SELECT all columns
SELECT all columns


As mentioned before, SELECT is not usually used as a stand alone command, although you can do it in case you want to select or display data that is not retrieved from a particular table. For example, you can use it to retrieve the current date or to show some number or perform some operation (see video below):

1
2
3
4
SELECT
  1 as constantnumber,
  CURDATE() as currentdate,
  1*5/10 as mathresult;


SELECT Statement as a stand alone command
SELECT Statement as a stand alone command

Is also worth mentioning that any column you want to retrieve can be renamed or given an alias. This is simply done by using the keyword "AS" followed by the new name to give to the column. This alias will not affect the original table, it will only be used to display the results.

Retrieving the data we want is important, but most of the time we won't be retrieving all the rows from a table. We may only want to retrieve certain rows based on a given condition (maybe they meet some certain condition or have an specific value). To do so, we will need the help of the WHERE statement.

WHERE Statements

After defining from which table you will obtain data from, you can add a WHERE statement to filter the data that you are interested in. For example, for the table we created, we may only want to retrieve cats; if that's the case, we have a column called "species" which contains if the pet is either a cat or a dog. The WHERE statement goes after the FROM statement:

1
2
3
4
5
SELECT
    name,
    species
FROM sqldojo_db.sqldojo_pets_table
WHERE species = 'Cat'

WHERE clause example
WHERE clause example

This WHERE clause can be as complex as you can imagine! You can add more rules/filters using some logical operators like AND or OR to meet multiple rules/conditions. For example, maybe we not only want Cats but also MALE CATS. As you may have already figured out, we have the "species" column and also the "sex" column, so we would need both of them to filter the desired results:

1
2
3
4
5
6
SELECT
    name,
    species
FROM sqldojo_db.sqldojo_pets_table
WHERE species = 'Cat'
 AND sex = 'M'

WHERE clause with multiple conditions
WHERE clause with multiple conditions

Selecting the rows and columns we want is useful, but sometimes we may want to take a look at some ordered data. Maybe see the results in alphabetical order, or by age. This can be accomplished with the ORDER BY statement.

ORDER BY Statement

You may find useful to sort the results of your SQL query based on one or multiple columns. For example, we may want to get the records of all the dogs
and shown them starting from the oldest to the youngest one; for this we can simply use the birth date column. Additionally, the ORDER BY employs two sorting option for each column we use to define the ORDER BY; these two sorting option is either ascending (ASC) or descending (DESC), these will basically say "show me the results based on the lowest to biggest value of column X". 

The ORDER BY statement goes at the end of the query (after the SELECT, FROM and WHERE statement). 

ORDER BY Statement
ORDER BY Statement

If no sorting order is explicitly mentioned in the query, it will use the default one (ASC). 

LIMIT Statement

If we want to only obtain certain number of rows from the SQL query, we can use the LIMIT statement. This will basically limit the number of returned records from the SELECT statement. If there are less records than the LIMIT specified, then it will show all the records (as these are less than the limit).

LIMIT Statement
LIMIT Statement


TOP Statement

The TOP statement is very similar to the LIMIT statement, but the syntax differ a little. The TOP stattement also specifies the number of records from top to bottom to return. This statement is placed right after the SELECT statement instead. Unfortunately, MySQL does not support the TOP statement, but here is an example:

1
2
3
4
5
6
SELECT TOP 2
    name,
    species
FROM sqldojo_db.sqldojo_pets_table
WHERE species = 'Cat'
 AND sex = 'M'


GROUP BY Statement

The GROUP BY command is used to aggregate data based on a common attribute. It is very powerful to summarise data and obtain insights. You can see it as a pivotal tool for understanding trends and distributions. The GROUP BY statement is written at the end of the SQL query, and is followed by the columns (attributes) defining the grouping criteria. The grouping criteria can be build with multiple columns.

When we aggregate data, we usually want to obtain a calculation on the set of values that were grouped. For example, maybe we want to group by the type of pet and obtain the total number of each pet; or maybe we want to know the average pet age; or even to know how many pets are for each type and sex (like 20 male cats, 10 female cats, 40 female dogs and 2 male dogs). To obtain these type of calculations, we make use of aggregate functions.

Aggregate Functions

Usually, the GROUP BY Statement coexists with aggregate functions. These functions are functions to consolidate your data and find insights. Up next we will define some of the most common and useful aggregate functions, although it is important to know that each SQL Flavour has different ones and for different data types (e.g., PostgreSQL has a function to aggregate the data in a list or array; this is the ARRAY_AGG function, which is not available in MySQL). Here is a quick overview of these functions, but you can also find multiple resources in the web, such as this one Aggregate Fucntions Cheatsheet.

COUNT Function

The COUNT function returns the number of rows or non-null values within a column in a group from a database table. To count the number of rows, you simply specify the * character.

1
2
3
4
5
SELECT 
  species,
  COUNT(*) AS total
FROM sqldojo_db.sqldojo_pets_table
GROUP BY species

And to COUNT the number of non-null values within a column, you specify the column name inside the function COUNT(column_name). If the column has a NULL value, then this is not counted in the result.

1
2
3
4
5
SELECT 
  species,
  COUNT(sex) AS valid_sex
FROM sqldojo_db.sqldojo_pets_table
GROUP BY species

COUNT Aggregate Function
COUNT Aggregate Function

Depending on the type of data, you may want to get the sum of the values within a column instead of the count or number of rows or non-null values. The next function will help us calculate this.

SUM Function

If we have some dataset with numerical values, for example, the number of vaccines applied to the pet, we may want to obtain the sum or total amount of vaccines applied per species so we can analyse which pet type is vaccinated the most. If there are two dog records, one with 2 vaccines and the other with 3, the COUNT function will result in 2 (only two records with valid column for the dogs), but in this case we will not obtain the total number of vaccines applied to all the dogs (we would expect to obtain 2+3=5). The SUM function comes handy for these cases. Let's add this new column 'vaccines' to the database table and apply this function:

1
2
3
4
5
6
SELECT 
  species,
  COUNT(*) AS total_records,
  SUM(vaccines) AS total_vaccines
FROM sqldojo_db.sqldojo_pets_table
GROUP BY species


SUM Aggregate Function
SUM Aggregate Function


What if, instead of looking for the SUM we may want to obtain the average number of vaccines applied? In this case we make use of the AVG function that we will discuss in the next parragraph.

AVG Function

If the average value is the desired metric to obtain, we have two alternatives: obtain the SUM and then divide by the COUNT, or simply use the built-in function AVG which yields the same result in a simpler way.

Lets say we want to obtain the average number of vaccines applied to each pet species:

1
2
3
4
5
6
7
8
SELECT 
  species,
  COUNT(*) AS total_records,
  SUM(vaccines) AS total_vaccines,
  AVG(vaccines) AS avg_vaccines,
  SUM(vaccines)/COUNT(*) 
FROM sqldojo_db.sqldojo_pets_table
GROUP BY species

AVG Aggregate Function
AVG Aggregate Function

And what if we want to see what the maximum value is or the minimum value is for a group? Maybe obtain the minimum number of vaccines or the maximum (most recent) date of birth? We have to use the MAX and MIN functions.

MAX & MIN Function

a

1
2
3
4
5
6
7
8
SELECT 
  species,
  COUNT(*) AS total_records,
  SUM(vaccines) AS total_vaccines,
  AVG(vaccines) AS avg_vaccines,
  SUM(vaccines)/COUNT(*) 
FROM sqldojo_db.sqldojo_pets_table
GROUP BY species

MAX & MIN Aggregate Functions
MAX & MIN Aggregate Functions

DQL commands are the backbone of data retrieval and manipulation. Understanding how to wield DQL will open doors to a world of insights and possibilities.

SQL Formatting and Execution

It is worth mentioning that formatting is not required in SQL (indentation, line jumps, etc.) Nonetheless, formatting is encouraged for easy readability, debugging and maintainability. Here are some best practices for formatting SQL code:

  • Do not write all the query on one line
  • Capitalize the SQL keywords/commands
  • Use underscore rather than spaces in the field names
  • End queries with a semicolon

 SQL is not executed from top to bottom (in the order the code is written). The order of execution is:

  1. FROM
  2. Selection
  3. Results (limit, order, etc.)

Comments

Popular Posts