SQL Functions

I. Aggregate Functions

Aggregate functions perform a calculation on a set of values and return a single value. They are typically used with the GROUP BY clause to summarize or group data.

1. `COUNT()` function

The COUNT function counts the number of rows that match the specified criteria.

-- Count total number of employees
SELECT COUNT(emp_id) 
FROM employees; 

-- Count female employees born after 1970-01-01
SELECT COUNT(emp_id)
FROM employees
WHERE sex = 'F' AND birth_date > '1970-01-01'; 

2. `AVG()` function

The AVG function calculates the average value of a set of numbers.

-- Calculate average salary of male employees
SELECT AVG(salary)
FROM employees
WHERE sex = 'M';

3. `SUM()` function

The SUM function calculates the total sum of a set of values.

-- Calculate total sum of all employee salaries
SELECT SUM(salary)
FROM employees;

4. `MIN()`

The MIN function returns the smallest value in a column.

SELECT MIN(salary) FROM employees;

5. `MAX()`

The MAX function returns the largest value in a column.

SELECT MAX(salary) FROM employees;

II. Other Functions/Clause

1. `GROUP BY`

GROUP BY is a clause that organizes data into groups based on one or more columns, allowing aggregate functions to be applied to each group independently.

-- Group employees by branch and count employees in each branch 
SELECT branch_id, COUNT(*) as employee_count 
FROM employees 
GROUP BY branch_id;

-- Calculate average salary for each department
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;

`HAVING`

HAVING is a clause used to specify a search condition for a group or an aggregate. It's often used with GROUP BY clauses to filter the results of aggregate functions.

Example: The query groups employees by department, calculates the average salary for each department, and then only shows departments where the average salary is over 50,000.

SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;

Note

The general rule is: any column in the SELECT clause that is not inside an aggregate function must be included in the GROUP BY clause.

To break this down further:

Columns used with aggregate functions (like COUNT, SUM, AVG, MAX, MIN) do not need to be in the GROUP BY clause.
All other columns in the SELECT statement that are not aggregated must be in the GROUP BY clause.→

Consider this table:

| department | salary |
|------------|--------|
| HR         | 50000  |
| HR         | 60000  |
| IT         | 70000  |
| IT         | 80000  |

Example 1:

SELECT department, AVG(salary)
FROM employees
GROUP BY department

In this query:

department is in the GROUP BY because it's not aggregated
salary is not in the GROUP BY because it's inside the AVG function

Example 2:

SELECT department, AVG(salary) 
FROM employees

→ This query would fail in most SQL databases because department is not aggregated and there's no GROUP BY clause.

Here, you're asking the database to show you the department column values alongside the aggregated AVG(salary) result. However, without a GROUP BY, which department should be shown with the overall average salary? There's no logical way to choose one.

`ROLLUP()`

ROLLUP is a GROUP BY extension that produces a result set containing multiple levels of subtotals, as well as a grand total. It generates all possible subtotal combinations based on the grouping columns specified.

Example:

-- Sales Table:
-- +------+----------+--------+
-- | year | quarter  | amount |
-- +------+----------+--------+
-- | 2023 | Q1       | 100    |
-- | 2023 | Q2       | 150    |
-- | 2023 | Q3       | 200    |
-- | 2023 | Q4       | 180    |
-- | 2024 | Q1       | 120    |
-- | 2024 | Q2       | 160    |
-- +------+----------+--------+

SELECT year, quarter, SUM(amount) as total_sales
FROM sales
GROUP BY ROLLUP(year, quarter);

-- Result:
-- +------+----------+-------------+
-- | year | quarter  | total_sales |
-- +------+----------+-------------+
-- | 2023 | Q1       | 100         |
-- | 2023 | Q2       | 150         |
-- | 2023 | Q3       | 200         |
-- | 2023 | Q4       | 180         |
-- | 2023 | NULL     | 630         | -- Subtotal for 2023
-- | 2024 | Q1       | 120         |
-- | 2024 | Q2       | 160         |
-- | 2024 | NULL     | 280         | -- Subtotal for 2024
-- | NULL | NULL     | 910         | -- Grand Total
-- +------+----------+-------------+

2. `OVER()`

The OVER() clause in SQL is used with window functions to perform calculations across a set of rows that are related to the current row.

The empty OVER() clause means it's applied to the whole result set.

Example:

SUM(sales) OVER(): This calculates the sum of all sales across the entire table. So, total_sales shows 750 for every row, which is the sum of all sales.
COUNT(*) OVER(): This counts the total number of rows in the table. So, product_count shows 5 for every row, which is the total number of products.

-- Sales Table:
-- +----+----------+--------+
-- | id | product   | sales |
-- +----+-----------+-------+
-- | 1  | Apple     | 100   |
-- | 2  | Banana    | 150   |
-- | 3  | Cherry    | 200   |
-- | 4  | Date      | 120   |
-- | 5  | Elderberry| 180   |
-- +----+----------+--------+

SELECT 
    id,
    product,
    sales,
    SUM(sales) OVER() AS total_sales,
    COUNT(*) OVER() AS product_count
FROM 
    sales
ORDER BY 
    id;

-- Result:
-- +----+------------+-------+-------------+---------------+
-- | id | product    | sales | total_sales | product_count |
-- +----+------------+-------+-------------+---------------+
-- | 1  | Apple      | 100   | 750         | 5             |
-- | 2  | Banana     | 150   | 750         | 5             |
-- | 3  | Cherry     | 200   | 750         | 5             |
-- | 4  | Date       | 120   | 750         | 5             |
-- | 5  | Elderberry | 180   | 750         | 5             |
-- +----+------------+-------+-------------+---------------+

3. `EXTRACT()`

The function retrieves a specific part (e.g., year, month) of a single date or timestamp.

-- Table: bookings
--+---------+-------------+-------------+------------------------+
--| id      | memid       | facid       | starttime              |
--+---------+-------------+-------------+------------------------+
--| 1       | 3           | 2           | 2012-07-05 09:00:00    |
--| 2       | 3           | 2           | 2012-09-14 10:30:00    |
--| 3       | 1           | 4           | 2012-08-18 12:00:00    |
--| 4       | 2           | 3           | 2012-11-25 14:45:00    |
--| 5       | 5           | 1           | 2012-10-10 08:15:00    |
--+---------+-------------+-------------+------------------------+

-- Extract year, month, and day from the starttime column of the bookings table
SELECT
    id,
    EXTRACT(YEAR FROM starttime) AS year,     -- Extract the year part
    EXTRACT(MONTH FROM starttime) AS month,   -- Extract the month part
    EXTRACT(DAY FROM starttime) AS day        -- Extract the day part
FROM bookings;

--+------------+------+-------+-----+
--| id         | year | month | day |
--+------------+------+-------+-----+
--| 1          | 2012 | 7     | 5   |
--| 2          | 2012 | 9     | 14  |
--| 3          | 2012 | 8     | 18  |
--| 4          | 2012 | 11    | 25  |
--| 5          | 2012 | 10    | 10  |
--+------------+------+-------+-----+

3. `ROW_NUMBER()`

ROW_NUMBER() is a window function that assigns a unique integer value to each row within a partition of a result set. It starts with 1 for the first row in each partition and increments by 1 for each subsequent row.

Example:

-- Students Table:
-- +----+----------+--------+-------+
-- | id | name     | class  | score |
-- +----+----------+--------+-------+
-- | 1  | Alice    | A      | 85    |
-- | 2  | Bob      | B      | 92    |
-- | 3  | Charlie  | A      | 78    |
-- | 4  | David    | B      | 88    |
-- | 5  | Eve      | A      | 95    |
-- | 6  | Frank    | B      | 75    |
-- +----+----------+--------+-------+

SELECT 
    id,
    name,
    class,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS overall_rank
FROM 
    students
ORDER BY 
    class, score DESC;

-- Result:
-- +----+---------+-------+-------+--------------+
-- | id | name    | class | score | overall_rank |
-- +----+---------+-------+-------+--------------+
-- | 5  | Eve     | A     | 95    | 1            |           
-- | 1  | Alice   | A     | 85    | 3            |           
-- | 3  | Charlie | A     | 78    | 5            |           
-- | 2  | Bob     | B     | 92    | 2            |           
-- | 4  | David   | B     | 88    | 4            |           
-- | 6  | Frank   | B     | 75    | 6            |      
-- +----+---------+-------+-------+--------------+

4. `RANK()`

The RANK() function is a window function that assigns a rank to each row within a partition of a result set. The rank is determined by the ORDER BY clause within the OVER() clause. Key characteristics:

It assigns the same rank to ties (rows with equal values).
It leaves gaps in the ranking when there are ties.
The first rank is always 1, not 0.

Example

-- Students Table:
-- +----+----------+-------+
-- | id | name     | score |
-- +----+----------+-------+
-- | 1  | Alice    | 85    |
-- | 2  | Bob      | 92    |
-- | 3  | Charlie  | 78    |
-- | 4  | David    | 92    |
-- | 5  | Eve      | 85    |
-- | 6  | Frank    | 78    |
-- +----+----------+-------+

SELECT 
    id,
    name,
    score,
    RANK() OVER (ORDER BY score DESC) AS rank
FROM 
    students
ORDER BY 
    score DESC, name;

-- Result:
-- +----+---------+-------+------+
-- | id | name    | score | rank |
-- +----+---------+-------+------+
-- | 2  | Bob     | 92    | 1    |
-- | 4  | David   | 92    | 1    |
-- | 1  | Alice   | 85    | 3    |
-- | 5  | Eve     | 85    | 3    |
-- | 3  | Charlie | 78    | 5    |
-- | 6  | Frank   | 78    | 5    |
-- +----+---------+-------+------+

I. Aggregate Functions​

1. COUNT() function​

2. AVG() function​

3. SUM() function​

4. MIN()​

5. MAX()​

II. Other Functions/Clause​

1. GROUP BY​

HAVING​

Note​

ROLLUP()​

2. OVER()​

3. EXTRACT()​

3. ROW_NUMBER()​

4. RANK()​