Structured Query Language (SQL)

SQL, meaning Structured Query Language, is a programming language used to communicate with a relational database.

It uses simple English sentences that allow to SELECT (find), INSERT (add), UPDATE (change) or DELETE (remove) data.

SQL is a declarative language: in a SQL statement you describe what needs to be done, but not how to do it; the later, as well as the execution, is handled by the Relational DataBase Management System.

SQL Sub-Languages

One can think of SQL as three separate sub-languages which their specificities:

  • DDL: Data Definition Language: define the structure of a database and the tables and columns within it
  • DML: Data Manipulation Language: retrieve or modify data stored in a database (ex: SELECT queries)
  • DCL: Data Control Language: determine what various users are allowed to do when interacting with a database, in other words, the “security” of a database

Source

Data vs Schema

A RDBMS such as PostgreSQL knows how to interpret the syntax of an SQL statement thanks to the role that data and schema plays in a database.

Schema relates to the structure of a database, such as the name of a table and its columns, data types etc.

Data relates to the contents of a database, i.e., the actual values associated with rows and columns in a database table.

The combination of the two is what creates the structured data that we can interact with.

Source

Convention

SQL is case-insensitive, meaning it doesn’t care if its statement is written in lowercase or uppercase. This is not true for data, though.

The convention is to use uppercase for commands and lowercase for names, for clarity.

Database Naming

Database names shall be self-descriptive, no abbreviation nor acronym, and names shall be written in snake_case.

SQL Tables

Tables, sometimes called relations and the relationships between those are what provide structure to our data. Data are saved in the columns of our table.

Tables can be used to represent real world abstractions of business logic, such as customer or an order.

A basic table is create with the following statement:

CREATE TABLE table_name (
  column_1_name column_1_data_type [constraints, ...],
  column_2_name column_2_data_type [constraints, ...],
  -- ...
);

Each line (finished by a coma ,) represents one column of our table. The first part is the name of the column. Except for the convention described above, there is nothing particular to be aware of. The second part is the data type of the column (see below). The third part is one or more constraints (optional).

Data Types

A data type describes the values that is allowed for that column of the table. It’s both for organization and for protection of the database (to refrain from entering invalid type). Commons data types are described below:

Data TypeDescription
serial(Deprecated) Create identifier columns id (non-null auto-incrementing integer) for the database
IDENTITYNew feature replacing serial 1
char(N)Column can only contain a string of N characters in length (remaining filled with space characters)
varchar(N)Same as char(N) without the constrain of remaining string length
booleanBoolean data type (true / false, or t / f in PostgreSQL
integer or INTInteger data type
decimal(precision, scale)Decimal number with a total digits of precision and total digits in the fractional part of scale
timestampDate and tie in YYYY-MM-DD HH:MM:SS format
dateDate without the time

Constraints

Constraints are rules to put on columns in order to define what data values are allowed. While optional, most of the time it is recommended to add them as it helps maintaining the integrity and quality of the data that the column is storing.

Often used constraints include:

ConstraintsDescription
UNIQUEPrevents duplicate value in that column
NOT NULLCannot be left empty
DEFAULT fooIf no value is set at creation time, default value of foo is set in that field

Alter a Table

One can modify a table using the ALTER TABLE statement. The syntax is as follows:

ALTER TABLE
  table_name
  modify_clause
  arguments;

To rename a table, one can use the following:

ALTER TABLE
  foo RENAME TO bar;

To rename a column, one can use the following:

ALTER TABLE
  bar RENAME COLUMN foobar TO barfoo;

Alter a Column’s Datatype

To alter a column’s datatype, one can use ALTER COLUMN in conjunction with ALTER TABLE:

ALTER TABLE
  bar
ALTER COLUMN
  barfoo TYPE char(10);

Adding/Removing Constraints

Constraints are added/removed rather than changed. This is because you can have several constraints in a column.

ALTER TABLE
  bar
ADD
  [CONSTRAINT constraint_name]
  constraint_clause;

constraint_name is optional, as indicated by the brackets.

Depending on the type of constraint, the syntax can vary. This is true for so called “table constraints” and “column constraints”.

The difference is mainly syntactic.

NOT NULL for example is always a column constraint, which is why it has its special command for adding constraint to an existing table:

ALTER TABLE
  bar
ALTER COLUMN
  barfoo
SET
  NOT NULL;

To remove a constraint, we use the following syntax:

ALTER TABLE
  bar DROP CONSTRAINT constraint_name;

To drop the DEFAULT clause on data type serial, we use the following:

ALTER TABLE
  bar
ALTER COLUMN
  id DROP DEFAULT;

Adding/Removing a Column

The syntax is the ADD COLUMN clause to ALTER TABLE statement:

ALTER TABLE
  bar
ADD
  COLUMN last_login timestamp NOT NULL DEFAULT NOW();

Removing a column uses DROP COLUMN:

ALTER TABLE
  bar DROP COLUMN piyo;

Dropping Tables

This dangerous action can be done with the following command:

DROP TABLE bar;

This action is irreversible.

Source

Data

Insertion Statement Syntax

The most basic insertion statement is as follows:

INSERT INTO
  table_name (column1_name, column2_name,...)
VALUES
  (data_for_column1, data_for_column2,...);

Three pieces of information are to be provided:

  • The table name
  • The columns’ names
  • The values to be stored in each columns

Will columns’ names can be omitted under certain conditions, it’s generally best to write both the column and its data to avoid error down the road.

A successful insertion will produce the following result (meaning is in comment):

INSERT 0 1
--     |  \
--    oid  |
--       count

To insert several data sequentially in one command, you can separate each row with a comma ,:

INSERT INTO
  table_name (column1_name, column2_name,...)
VALUES
  (data_for_column1, data_for_column2,...),
  (data_for_column1, data_for_column2,...);

Source

Select Query Syntax

The basic syntax is as follow:

SELECT
  column_name,
  ...
FROM
  table_name
WHERE
  condition
[ORDER BY
  column_name [ASC]];

RDBMS treats everything that is not a keyword (SELECT, FROM etc) as an identifiers. Case does not matter, therefore SELECT and select are treated the same. Generally it is best to avoid using reserved keywords as identifiers, like year. However, if it ’s not possible, using double quote "year" lets the RDBMS knows the statement should be treated specifically as an identifier rather than a keyword.

Comparison Operators

List of often used operators:

OperatorDescription
<less than
>greater than
<=less than or equal to
>=greater than or equal to
=equal
<> or !=not equal

There are also comparison predicates that have a special syntax:

Comparison PredicatesDescription
BETWEENbetween two values
NOT BETWEENnot between two values
IS DISTINCT FROMis distinct from
IS NOT DISTINCT FROMis not distinct from
IS NULLis null
IS NOT NULLis not null

NULL is a special value in SQL. Although it is a value, it cannot be used for selection like WHERE column_name = NULL. Instead, we use WHERE column_name IS NULL.

Logical Operators

There are three:

  • AND
  • OR
  • NOT

There use with WHERE clause are pretty self-explanatory, for example with OR:

SELECT
  column_name
FROM
  table_name
WHERE
  condition
  OR other_condition;

String Matching Operators

String or pattern matching allows to conditionally search fro string patterns. We use LIKE (case-sensitive) and ILIKE (case-insensitive):

SELECT
  column_name
FROM
  table_name
WHERE
  condition LIKE '%foo';

Percentage % is the wildcard character for any number of characters. For one character, we can use underscore _ instead.

There is also a SIMILAR TO that works like LIKE but for regular expression pattern.

LIMIT and OFFSET

SELECT statements are great to return all kind of results. However when there are a lot of results to display, it is generally better to separate the data by “pages”. This is called “pagination”.

For example, to limit the selection to 1 user, we can use the following statement:

SELECT
  *
FROM
  users
LIMIT
  1;

This statement limits the selection to the first row of the table. If we wanted to skip the first result and only return the second row, we can use OFFSET:

SELECT
  *
FROM
  users
LIMIT
  1 OFFSET 1;

This would return the second row, and only the second row

DISTINCT

When joining multiple table, one thing that can often happens is data duplication. The DISTINCT qualifier helps up remediate to this problem.

SELECT
  DISTINCT full_name
FROM
  users;

SQL Functions

Functions performs operation on fields or data and return the result, or perform a transformations.