SQL Functions
I. Aggregate Functions
Aggregate functions perform a calculation on a set of values and return a single value. They are typically used with the GROUP BY
clause to summarize or group data.
1. COUNT()
function
The COUNT
function counts the number of rows that match the specified criteria.
-- Count total number of employees
SELECT COUNT(emp_id)
FROM employees;
-- Count female employees born after 1970-01-01
SELECT COUNT(emp_id)
FROM employees
WHERE sex = 'F' AND birth_date > '1970-01-01';
2. AVG()
function
The AVG
function calculates the average value of a set of numbers.
-- Calculate average salary of male employees
SELECT AVG(salary)
FROM employees
WHERE sex = 'M';
3. SUM()
function
The SUM
function calculates the total sum of a set of values.
-- Calculate total sum of all employee salaries
SELECT SUM(salary)
FROM employees;
4. MIN()
The MIN
function returns the smallest value in a column.
SELECT MIN(salary) FROM employees;
5. MAX()
The MAX
function returns the largest value in a column.
SELECT MAX(salary) FROM employees;
II. Other Functions/Clause
1. GROUP BY
GROUP BY
is a clause that organizes data into groups based on one or more columns, allowing aggregate functions to be applied to each group independently.
-- Group employees by branch and count employees in each branch
SELECT branch_id, COUNT(*) as employee_count
FROM employees
GROUP BY branch_id;
-- Calculate average salary for each department
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
HAVING
HAVING
is a clause used to specify a search condition for a group or an aggregate. It's often used with GROUP BY
clauses to filter the results of aggregate functions.
Example: The query groups employees by department, calculates the average salary for each department, and then only shows departments where the average salary is over 50,000.
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
Note
The general rule is: any column in the SELECT
clause that is not inside an aggregate function must be included in the GROUP BY
clause.
To break this down further:
- Columns used with aggregate functions (like
COUNT
,SUM
,AVG
,MAX
,MIN
) do not need to be in the GROUP BY clause. - All other columns in the
SELECT
statement that are not aggregated must be in theGROUP BY
clause.→
Consider this table:
| department | salary |
|------------|--------|
| HR | 50000 |
| HR | 60000 |
| IT | 70000 |
| IT | 80000 |
SELECT department, AVG(salary)
FROM employees
GROUP BY department
In this query:
department
is in theGROUP BY
because it's not aggregatedsalary
is not in theGROUP BY
because it's inside theAVG
function
SELECT department, AVG(salary)
FROM employees
→ This query would fail in most SQL databases because department
is not aggregated and there's no GROUP BY
clause.
Here, you're asking the database to show you the department
column values alongside the aggregated AVG(salary)
result. However, without a GROUP BY
, which department
should be shown with the overall average salary? There's no logical way to choose one.
ROLLUP()
ROLLUP
is a GROUP BY
extension that produces a result set containing multiple levels of subtotals, as well as a grand total. It generates all possible subtotal combinations based on the grouping columns specified.
Example:
-- Sales Table:
-- +------+----------+--------+
-- | year | quarter | amount |
-- +------+----------+--------+
-- | 2023 | Q1 | 100 |
-- | 2023 | Q2 | 150 |
-- | 2023 | Q3 | 200 |
-- | 2023 | Q4 | 180 |
-- | 2024 | Q1 | 120 |
-- | 2024 | Q2 | 160 |
-- +------+----------+--------+
SELECT year, quarter, SUM(amount) as total_sales
FROM sales
GROUP BY ROLLUP(year, quarter);
-- Result:
-- +------+----------+-------------+
-- | year | quarter | total_sales |
-- +------+----------+-------------+
-- | 2023 | Q1 | 100 |
-- | 2023 | Q2 | 150 |
-- | 2023 | Q3 | 200 |
-- | 2023 | Q4 | 180 |
-- | 2023 | NULL | 630 | -- Subtotal for 2023
-- | 2024 | Q1 | 120 |
-- | 2024 | Q2 | 160 |
-- | 2024 | NULL | 280 | -- Subtotal for 2024
-- | NULL | NULL | 910 | -- Grand Total
-- +------+----------+-------------+
2. OVER()
The OVER()
clause in SQL is used with window functions to perform calculations across a set of rows that are related to the current row.
Example:The empty
OVER()
clause means it's applied to the whole result set.
SUM(sales) OVER()
: This calculates the sum of all sales across the entire table. So,total_sales
shows 750 for every row, which is the sum of all sales.COUNT(*) OVER()
: This counts the total number of rows in the table. So,product_count
shows 5 for every row, which is the total number of products.
-- Sales Table:
-- +----+----------+--------+
-- | id | product | sales |
-- +----+-----------+-------+
-- | 1 | Apple | 100 |
-- | 2 | Banana | 150 |
-- | 3 | Cherry | 200 |
-- | 4 | Date | 120 |
-- | 5 | Elderberry| 180 |
-- +----+----------+--------+
SELECT
id,
product,
sales,
SUM(sales) OVER() AS total_sales,
COUNT(*) OVER() AS product_count
FROM
sales
ORDER BY
id;
-- Result:
-- +----+------------+-------+-------------+---------------+
-- | id | product | sales | total_sales | product_count |
-- +----+------------+-------+-------------+---------------+
-- | 1 | Apple | 100 | 750 | 5 |
-- | 2 | Banana | 150 | 750 | 5 |
-- | 3 | Cherry | 200 | 750 | 5 |
-- | 4 | Date | 120 | 750 | 5 |
-- | 5 | Elderberry | 180 | 750 | 5 |
-- +----+------------+-------+-------------+---------------+
3. EXTRACT()
The function retrieves a specific part (e.g., year, month) of a single date or timestamp.
-- Table: bookings
--+---------+-------------+-------------+------------------------+
--| id | memid | facid | starttime |
--+---------+-------------+-------------+------------------------+
--| 1 | 3 | 2 | 2012-07-05 09:00:00 |
--| 2 | 3 | 2 | 2012-09-14 10:30:00 |
--| 3 | 1 | 4 | 2012-08-18 12:00:00 |
--| 4 | 2 | 3 | 2012-11-25 14:45:00 |
--| 5 | 5 | 1 | 2012-10-10 08:15:00 |
--+---------+-------------+-------------+------------------------+
-- Extract year, month, and day from the starttime column of the bookings table
SELECT
id,
EXTRACT(YEAR FROM starttime) AS year, -- Extract the year part
EXTRACT(MONTH FROM starttime) AS month, -- Extract the month part
EXTRACT(DAY FROM starttime) AS day -- Extract the day part
FROM bookings;
--+------------+------+-------+-----+
--| id | year | month | day |
--+------------+------+-------+-----+
--| 1 | 2012 | 7 | 5 |
--| 2 | 2012 | 9 | 14 |
--| 3 | 2012 | 8 | 18 |
--| 4 | 2012 | 11 | 25 |
--| 5 | 2012 | 10 | 10 |
--+------------+------+-------+-----+
3. ROW_NUMBER()
ROW_NUMBER()
is a window function that assigns a unique integer value to each row within a partition of a result set. It starts with 1 for the first row in each partition and increments by 1 for each subsequent row.
-- Students Table:
-- +----+----------+--------+-------+
-- | id | name | class | score |
-- +----+----------+--------+-------+
-- | 1 | Alice | A | 85 |
-- | 2 | Bob | B | 92 |
-- | 3 | Charlie | A | 78 |
-- | 4 | David | B | 88 |
-- | 5 | Eve | A | 95 |
-- | 6 | Frank | B | 75 |
-- +----+----------+--------+-------+
SELECT
id,
name,
class,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS overall_rank
FROM
students
ORDER BY
class, score DESC;
-- Result:
-- +----+---------+-------+-------+--------------+
-- | id | name | class | score | overall_rank |
-- +----+---------+-------+-------+--------------+
-- | 5 | Eve | A | 95 | 1 |
-- | 1 | Alice | A | 85 | 3 |
-- | 3 | Charlie | A | 78 | 5 |
-- | 2 | Bob | B | 92 | 2 |
-- | 4 | David | B | 88 | 4 |
-- | 6 | Frank | B | 75 | 6 |
-- +----+---------+-------+-------+--------------+
4. RANK()
The RANK()
function is a window function that assigns a rank to each row within a partition of a result set. The rank is determined by the ORDER BY
clause within the OVER()
clause.
Key characteristics:
- It assigns the same rank to ties (rows with equal values).
- It leaves gaps in the ranking when there are ties.
- The first rank is always 1, not 0.
-- Students Table:
-- +----+----------+-------+
-- | id | name | score |
-- +----+----------+-------+
-- | 1 | Alice | 85 |
-- | 2 | Bob | 92 |
-- | 3 | Charlie | 78 |
-- | 4 | David | 92 |
-- | 5 | Eve | 85 |
-- | 6 | Frank | 78 |
-- +----+----------+-------+
SELECT
id,
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank
FROM
students
ORDER BY
score DESC, name;
-- Result:
-- +----+---------+-------+------+
-- | id | name | score | rank |
-- +----+---------+-------+------+
-- | 2 | Bob | 92 | 1 |
-- | 4 | David | 92 | 1 |
-- | 1 | Alice | 85 | 3 |
-- | 5 | Eve | 85 | 3 |
-- | 3 | Charlie | 78 | 5 |
-- | 6 | Frank | 78 | 5 |
-- +----+---------+-------+------+