SQL aggregate functions. SQL query language. SELECT Statement: Advanced Features



  • Aggregate functions are used like field names in the SELECT list, with one exception: they take a field name as an argument. SUM and AVG accept only numeric fields. COUNT, MAX and MIN accept both numeric and character fields. For character fields, MAX and MIN compare values by their character codes (or by the collation in effect), which for plain ASCII text amounts to alphabetical order. Some DBMSs allow nested aggregates, but this is a deviation from the ANSI standard, so such queries are not portable.


For example, you can calculate the number of students who passed exams in each discipline. To do this, run a query grouped by the "Discipline" field that outputs the name of the discipline and the number of rows in the group for that discipline. Using the * character as the argument of COUNT means counting all the rows in the group.

SELECT R1.Discipline, COUNT(*)
FROM R1
GROUP BY R1.Discipline;

If rows with an undefined score should not be counted, add a condition to the WHERE clause:

SELECT R1.Discipline, COUNT(*)
FROM R1
WHERE R1.Score IS NOT NULL
GROUP BY R1.Discipline;

Rows with an undefined score will not be included in the set of tuples before the grouping, so the number of tuples in the group for the discipline "Information Theory" will be 1 less.

A similar result can be obtained if you write the request in the following way:

SELECT R1.Discipline, COUNT(R1.Score)
FROM R1
GROUP BY R1.Discipline;

The function COUNT(attribute name) counts the number of defined (non-NULL) values in a group, as opposed to COUNT(*), which counts the number of rows in the group. Indeed, the group for the discipline "Information Theory" contains 4 rows, but only 3 defined values of the attribute "Score".
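The difference between COUNT(*) and COUNT(column) can be checked with a minimal sketch that runs the query through Python's built-in sqlite3 module. The rows are invented for illustration and only mimic the "Session" table R1 described above; the column names Full_name, Discipline and Score are assumptions.

```python
import sqlite3

# Hypothetical data: four "Information Theory" rows, one with an undefined score.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R1 (Full_name TEXT, Discipline TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO R1 VALUES (?, ?, ?)",
    [
        ("Petrov", "Databases", 5),
        ("Sidorov", "Databases", 4),
        ("Petrov", "Information Theory", 4),
        ("Sidorov", "Information Theory", 3),
        ("Ivanov", "Information Theory", 5),
        ("Orlov", "Information Theory", None),  # score not defined
    ],
)

# COUNT(*) counts rows; COUNT(Score) counts only non-NULL scores.
counts = conn.execute(
    """
    SELECT Discipline, COUNT(*), COUNT(Score)
    FROM R1
    GROUP BY Discipline
    ORDER BY Discipline
    """
).fetchall()

for discipline, n_rows, n_scores in counts:
    print(discipline, n_rows, n_scores)
```

With this sample data the "Information Theory" group has 4 rows but only 3 counted scores.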


Rules for handling NULL values in aggregate functions

If some values in the column are NULL, they are excluded when the function result is calculated.

If all values in a column are NULL, then MAX, MIN, SUM and AVG return NULL, and COUNT returns 0 (zero).

If the table is empty, COUNT(*) = 0.
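The NULL rules above can be observed directly. The sketch below uses SQLite through Python's sqlite3 module; the two tables (all_null_t and empty_t) are hypothetical and exist only for this demonstration. Other DBMSs should behave the same way for these standard functions, but that is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_null_t (x INTEGER)")  # every value is NULL
conn.execute("CREATE TABLE empty_t (x INTEGER)")     # no rows at all
conn.executemany("INSERT INTO all_null_t VALUES (?)", [(None,), (None,)])

query = "SELECT MAX(x), MIN(x), SUM(x), AVG(x), COUNT(x), COUNT(*) FROM {}"
all_null = conn.execute(query.format("all_null_t")).fetchone()
empty = conn.execute(query.format("empty_t")).fetchone()

# Python's None stands for SQL NULL.
print(all_null)
print(empty)
```

Only COUNT(*) differs between the two cases, because it counts rows rather than values.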

Aggregate functions can also be used without a preceding grouping operation. In this case the whole relation is treated as one group, and one value is calculated for that group.

Rules for interpreting aggregate functions

Aggregate functions can be included in the output list and then they are applied to the entire table.

SELECT MAX(Score) FROM R1 gives the maximum score for the session;

SELECT SUM(Score) FROM R1 gives the sum of all scores for the session;

SELECT AVG(Score) FROM R1 gives the average score for the entire session.



Referring again to the "Session" database (table R1), let's find the number of successfully passed exams:

SELECT COUNT(*) AS Passed_exams
FROM R1
WHERE Score > 2;


The argument of an aggregate function can be an individual table column. To calculate, for example, the number of distinct values of a column in a group, use the DISTINCT keyword together with the column name. Let's calculate the number of different scores received in each discipline:

SELECT R1.Discipline, COUNT(DISTINCT R1.Score)
FROM R1
WHERE R1.Score IS NOT NULL
GROUP BY R1.Discipline;


The same result is obtained if the explicit condition in the WHERE clause is omitted, in which case the query looks like this:

SELECT R1.Discipline, COUNT(DISTINCT R1.Score)
FROM R1
GROUP BY R1.Discipline;

The function COUNT(DISTINCT R1.Score) counts only defined (non-NULL) distinct values.

In some DBMSs, AVG applied to an integer column returns an integer. To get the desired fractional result in that case too, first convert the "Score" column to a real (decimal) type; the calculated average will then no longer be truncated to an integer. The query looks like this:

SELECT R2.Group, R1.Discipline, COUNT(*) AS Total, AVG(CAST(Score AS DECIMAL(3,1))) AS Average_point
FROM R1, R2
WHERE R1.Full_name = R2.Full_name AND R1.Score IS NOT NULL
AND R1.Score > 2
GROUP BY R2.Group, R1.Discipline;

Here the CAST() function converts the "Score" column to a real (decimal) data type.
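The effect of the conversion can be sketched in SQLite via Python's sqlite3 module, with one caveat: SQLite's AVG already returns a floating-point value, so the truncation is shown here with integer division of SUM by COUNT instead. The table contents are hypothetical, and REAL is used in place of DECIMAL(3,1) because SQLite has no fixed-point type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R1 (Discipline TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO R1 VALUES (?, ?)",
    [("Databases", score) for score in (4, 5, 5, 3)],  # invented scores, sum = 17
)

truncated, exact = conn.execute(
    """
    SELECT SUM(Score) / COUNT(Score),   -- integer / integer: fraction is dropped
           AVG(CAST(Score AS REAL))     -- average over values cast to a real type
    FROM R1
    """
).fetchone()

print(truncated, exact)
```

In a DBMS whose AVG keeps the integer type of its argument, the cast inside AVG plays the same role as the cast shown in this sketch.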


You cannot use aggregate functions in the WHERE clause because the conditions in this section are evaluated in terms of a single row, and aggregate functions are evaluated in terms of groups of rows.

The GROUP BY clause allows you to define a subset of the values in a particular field in terms of another field and apply an aggregate function to the subset. This makes it possible to combine fields and aggregate functions in a single SELECT clause. Aggregate functions can be used both in the SELECT list, which produces the output rows, and in the HAVING clause, which filters the groups that have been formed. In either case, each aggregate function is calculated for each selected group. The values obtained can be displayed in the result or used in the group selection condition.

Let's build a query that displays the groups in which more than one grade of 2 (a failing grade) was received in a single discipline:

SELECT R2.Group
FROM R1, R2
WHERE R1.Full_name = R2.Full_name AND
R1.Score = 2
GROUP BY R2.Group, R1.Discipline
HAVING COUNT(*) > 1;
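A minimal sketch of this HAVING query, run in SQLite through Python's sqlite3 module. All rows are invented, the column names are assumptions, and the column called Group is quoted because GROUP is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R1 (Full_name TEXT, Discipline TEXT, Score INTEGER)")
conn.execute('CREATE TABLE R2 (Full_name TEXT, "Group" TEXT)')
conn.executemany(
    "INSERT INTO R2 VALUES (?, ?)",
    [("Petrov", "101"), ("Sidorov", "101"), ("Ivanov", "102"), ("Orlov", "102")],
)
conn.executemany(
    "INSERT INTO R1 VALUES (?, ?, ?)",
    [
        ("Petrov", "Databases", 2),
        ("Sidorov", "Databases", 2),
        ("Ivanov", "Databases", 2),
        ("Orlov", "Databases", 4),
        ("Petrov", "Information Theory", 2),
        ("Ivanov", "Information Theory", 5),
    ],
)

# WHERE keeps only failing rows, GROUP BY forms (group, discipline) pairs,
# HAVING keeps the pairs that contain more than one failing row.
failing_groups = conn.execute(
    """
    SELECT R2."Group"
    FROM R1, R2
    WHERE R1.Full_name = R2.Full_name AND R1.Score = 2
    GROUP BY R2."Group", R1.Discipline
    HAVING COUNT(*) > 1
    """
).fetchall()

print(failing_groups)
```

Only group 101 qualifies here: it has two failing grades in "Databases", while every other (group, discipline) pair has at most one.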


We have a database "Bank" consisting of one table F, which stores the relation F with information about accounts in the branches of a certain bank.

Find the total account balance in each branch. You could make a separate query for each branch, selecting SUM from the table for that branch, but GROUP BY lets you do it all in one command:

SELECT Branch, SUM(Balance)
FROM F
GROUP BY Branch;

GROUP BY applies aggregate functions independently to each group identified by the value of the Branch field. A group consists of the rows with the same Branch value, and SUM is applied separately to each such group, i.e. the total account balance is calculated separately for each branch. The field to which GROUP BY is applied has, by definition, only one value per output group, as does the result of an aggregate function.



Suppose you want to see only those branches whose total account balance exceeds $5,000, together with their total balances. To display branches with total balances over $5,000, you must use the HAVING clause. The HAVING clause defines the criteria used to remove specific groups from the output, just as the WHERE clause does for individual rows.

The correct command would be:

SELECT Branch, SUM(Balance)
FROM F
GROUP BY Branch
HAVING SUM(Balance) > 5000;

Arguments in the HAVING clause obey the same rules as in a SELECT clause that uses GROUP BY: they must have one value per output group.


The following command is not allowed:

SELECT Branch, SUM(Balance)
FROM F
GROUP BY Branch
HAVING Opening_date = '27/12/2004';

The Opening_date field cannot be used in the HAVING clause because it can have more than one value per output group. To avoid this situation, the HAVING clause should refer only to aggregates and to fields listed in GROUP BY. The correct way to write the above query is:

SELECT Branch, SUM(Balance)
FROM F
WHERE Opening_date = '27/12/2004'
GROUP BY Branch;


The meaning of this query is as follows: find the amount of balances for each branch of accounts opened on December 27, 2004.
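The contrast between filtering rows with WHERE and filtering groups with HAVING can be sketched as follows, using SQLite via Python's sqlite3 module. The accounts are invented, the column names are assumptions, and dates are stored as ISO strings for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE F (Account INTEGER, Branch TEXT, Balance INTEGER, Opening_date TEXT)"
)
conn.executemany(
    "INSERT INTO F VALUES (?, ?, ?, ?)",
    [
        (1, "Pskov", 3000, "2004-12-27"),
        (2, "Pskov", 4000, "2004-12-27"),
        (3, "Pskov", 1000, "2005-01-10"),
        (4, "Uryupinsk", 2000, "2004-12-27"),
        (5, "Uryupinsk", 500, "2003-05-05"),
    ],
)

# WHERE removes rows before grouping: only accounts opened on the given day are summed.
opened_that_day = conn.execute(
    """
    SELECT Branch, SUM(Balance)
    FROM F
    WHERE Opening_date = '2004-12-27'
    GROUP BY Branch
    ORDER BY Branch
    """
).fetchall()

# HAVING removes whole groups after grouping: only branches with a large total remain.
large_branches = conn.execute(
    """
    SELECT Branch, SUM(Balance)
    FROM F
    GROUP BY Branch
    HAVING SUM(Balance) > 5000
    ORDER BY Branch
    """
).fetchall()

print(opened_that_day)
print(large_branches)
```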

As stated earlier, HAVING can only use arguments that have the same value per output group. In practice, references to aggregate functions are the most common, but fields selected with GROUP BY are also valid. For example, we want to see the total account balances of branches in St. Petersburg, Pskov and Uryupinsk:

SELECT F.Branch, SUM(Balance)
FROM F, Q
WHERE F.Branch = Q.Branch
GROUP BY F.Branch
HAVING F.Branch IN ('St. Petersburg', 'Pskov', 'Uryupinsk');

Therefore, predicates in the selection condition of the HAVING clause can refer directly only to columns listed as grouping columns in the GROUP BY clause. All other columns can appear only inside the aggregate functions COUNT, SUM, AVG, MIN and MAX, which then calculate an aggregate value for the entire group of rows. The result of the HAVING clause is a grouped table containing only those row groups for which the selection condition evaluates to TRUE.

In particular, if HAVING is present in a query that does not contain GROUP BY, the result is either an empty table or the result of the preceding clauses of the table expression, treated as a single group without grouping columns. Let's look at an example. Say we want to display the total balance over all branches, but only if it is more than $100,000. The query contains no grouping operation but does contain a HAVING clause:

SELECT SUM(Balance)
FROM F
HAVING SUM(Balance) > 100000;

If the total balance is more than $100,000, we will see it in the resulting relation; otherwise we will get an empty relation.


The GROUP BY clause of the SELECT statement groups data (rows) by the value of a column, of several columns, or of expressions. The result is a set of summary rows.

Each column in the select list must be present in the GROUP BY clause; the only exceptions are constants and columns that are operands of aggregate functions.

A table can be grouped by any combination of its columns.

Aggregate functions are used to get a single summary value from a group of rows. All aggregate functions perform calculations on a single argument, which can be either a column or an expression. Any aggregate function evaluates to a single value that is displayed in a separate column of the result.

Aggregate functions are specified in the column list of a SELECT statement, which can also contain a GROUP BY clause. If the SELECT statement does not contain a GROUP BY clause, and the selection column list contains at least one aggregate function, then it must not contain simple columns. On the other hand, a select-column list can contain column names that are not arguments to the aggregate function if those columns are used as arguments to the GROUP BY clause.

If the query contains a WHERE clause, then aggregate functions calculate a value for the selection results.

The aggregate functions MIN and MAX calculate the smallest and largest column values, respectively. Arguments can be numbers, strings, and dates. All NULL values are removed prior to calculation (i.e., not taken into account).

The SUM aggregate function calculates the total sum of the column values. Only numbers can be arguments. Using the DISTINCT parameter removes any duplicate values in the column before SUM is applied. Likewise, all NULL values are removed before this aggregate function is applied.

The AVG aggregate function returns the average of all values in a column. Arguments can also only be numbers, and all NULL values are removed before evaluation.

The COUNT aggregate function has two different forms:

  • COUNT (col_name) - counts the number of values in the col_name column; NULL values are not counted
  • COUNT (*) - counts the number of rows in the table; rows containing NULL values are also counted

If the query uses the DISTINCT keyword, all duplicate column values are removed before the COUNT function is applied.

The COUNT_BIG function (Transact-SQL) works like COUNT. The only difference between the two is the type of result they return: COUNT_BIG always returns BIGINT values, while COUNT returns INTEGER values.

The HAVING clause defines a condition that is applied to a group of rows. It has the same meaning for row groups as the WHERE clause has for the rows of the table (WHERE is applied before grouping, HAVING after).

How can you find out the number of PC models produced by a particular vendor? How do you determine the average price of computers with the same specifications? These and many other questions about statistical information can be answered using summary (aggregate) functions. The standard provides the following aggregate functions: COUNT, SUM, AVG, MIN and MAX.

All of these functions return a single value. The functions COUNT, MIN and MAX are applicable to any data type, while SUM and AVG are used only for numeric fields. The difference between COUNT(*) and COUNT(<field name>) is that the second does not take NULL values into account.

Example. Find the minimum and maximum price for personal computers:

Example. Find the number of computers available from manufacturer A:

Example. If we are interested in the number of different models produced by manufacturer A, then the query can be formulated as follows (using the fact that in the Product table each model is recorded once):

Example. Find the number of different models available from manufacturer A. The query is similar to the previous one, in which it was required to determine the total number of models produced by manufacturer A. Here you also need to find the number of different models in the PC table (i.e. available for sale).
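The four examples above can be sketched as queries against SQLite through Python's sqlite3 module. The schema (Product with maker and model, PC with code, model and price) and every row are assumptions made for illustration; the original queries are not reproduced in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (maker TEXT, model TEXT)")
conn.execute("CREATE TABLE PC (code INTEGER, model TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("A", "1232"), ("A", "1233"), ("A", "1299"), ("B", "1121")],
)
conn.executemany(
    "INSERT INTO PC VALUES (?, ?, ?)",
    [(1, "1232", 600), (2, "1232", 400), (3, "1233", 950), (4, "1121", 850)],
)

# Condition shared by the queries that look only at maker A's computers.
from_a = "model IN (SELECT model FROM Product WHERE maker = 'A')"

min_price, max_price = conn.execute("SELECT MIN(price), MAX(price) FROM PC").fetchone()
units_from_a = conn.execute(f"SELECT COUNT(*) FROM PC WHERE {from_a}").fetchone()[0]
# Each model appears once in Product, so COUNT(*) is enough there.
models_made_by_a = conn.execute(
    "SELECT COUNT(*) FROM Product WHERE maker = 'A'"
).fetchone()[0]
# A model can appear several times in PC, so DISTINCT is needed.
models_available_from_a = conn.execute(
    f"SELECT COUNT(DISTINCT model) FROM PC WHERE {from_a}"
).fetchone()[0]

print(min_price, max_price, units_from_a, models_made_by_a, models_available_from_a)
```

In this sample, maker A produces three models, but only two of them are present in the PC table.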

To ensure that only unique values are used when computing statistics, the argument of an aggregate function can be preceded by the DISTINCT parameter. The other parameter, ALL, is the default and means that all values returned in the column are counted.

If we need to get the number of PC models produced by each manufacturer, we will need the GROUP BY clause, which syntactically follows the WHERE clause.

GROUP BY clause

The GROUP BY clause is used to define groups of output rows to which aggregate functions (COUNT, MIN, MAX, AVG and SUM) can be applied. If this clause is missing and aggregate functions are used, then all columns mentioned in SELECT must appear inside aggregate functions, and these functions are applied to the entire set of rows that satisfy the query predicate. Otherwise, all columns of the SELECT list that are not inside aggregate functions must be specified in the GROUP BY clause. As a result, all output rows of the query are divided into groups characterized by the same combinations of values in these columns. After that, aggregate functions are applied to each group. Note that for GROUP BY all NULL values are treated as equal, i.e. when grouping by a field containing NULL values, all such rows fall into one group.
If a query with a GROUP BY clause has no aggregate functions in the SELECT clause, it simply returns one row from each group. This feature, like the DISTINCT keyword, can be used to eliminate duplicate rows in the result set.
Let's look at a simple example:
SELECT model, COUNT (model) AS Qty_model, AVG (price) AS Avg_price
FROM PC
GROUP BY model;

This query determines, for each PC model, the number of units and their average price. All rows with the same model value form a group, and the SELECT output calculates the number of values and the average price for each group. The query results in the following table:
model  Qty_model  Avg_price
1121   3          850.0
1232   4          425.0
1233   3          843.33333333333337
1260   1          350.0
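Two remarks made above about GROUP BY (all NULL values fall into one group, and grouping without aggregate functions acts like DISTINCT) can be checked with a small sketch in SQLite via Python's sqlite3 module. The Items table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (category TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?)",
    [("a",), ("a",), (None,), (None,), ("b",)],
)

# Both NULL rows end up in a single group (SQLite sorts NULL first).
group_sizes = conn.execute(
    "SELECT category, COUNT(*) FROM Items GROUP BY category ORDER BY category"
).fetchall()

# Without aggregates, GROUP BY returns one row per group, just like DISTINCT.
grouped = conn.execute(
    "SELECT category FROM Items GROUP BY category ORDER BY category"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT category FROM Items ORDER BY category"
).fetchall()

print(group_sizes)
print(grouped == distinct)
```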

If the SELECT list also contained a date column, these figures could be calculated for each specific date. To do this, add the date as a grouping column; the aggregate functions are then calculated for each combination of values (model, date).

There are several specific rules for performing aggregate functions:

  • If the query returns no rows (or no rows for a given group), there is no input data for any of the aggregate functions. In this case, the result of COUNT is zero, and the result of all other functions is NULL.
  • The argument of an aggregate function cannot itself contain aggregate functions (a function of a function). That is, a single query cannot, say, return the maximum of the average values.
  • The result of the COUNT function is an integer (INTEGER). Other aggregate functions inherit the data types of the values being processed.
  • If the SUM function produces a result that exceeds the maximum value of the data type used, an error occurs.
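The first two rules in the list above can be sketched in SQLite through Python's sqlite3 module. The PC rows are invented. The derived-table workaround for "maximum of the averages" is a common technique, not something the text above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PC (model TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO PC VALUES (?, ?)",
    [("1232", 600), ("1232", 400), ("1233", 950), ("1121", 850)],
)

# Rule 1: no input rows -> COUNT is 0, the other aggregates are NULL (None).
no_rows = conn.execute(
    "SELECT COUNT(price), SUM(price), MAX(price) FROM PC WHERE price > 10000"
).fetchone()

# Rule 2: an aggregate cannot be nested directly inside another aggregate.
try:
    conn.execute("SELECT MAX(AVG(price)) FROM PC GROUP BY model")
    nested_allowed = True
except sqlite3.OperationalError:
    nested_allowed = False

# Workaround: compute the averages in a subquery, then aggregate its result.
max_avg = conn.execute(
    """
    SELECT MAX(avg_price)
    FROM (SELECT AVG(price) AS avg_price FROM PC GROUP BY model)
    """
).fetchone()[0]

print(no_rows, nested_allowed, max_avg)
```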

So, if the query does not contain a GROUP BY clause, the aggregate functions in the SELECT clause are executed over all the resulting rows of the query. If the query contains a GROUP BY clause, each set of rows that has the same values in the column or group of columns specified in GROUP BY constitutes a group, and aggregate functions are executed for each group separately.

HAVING clause

While the WHERE clause defines a predicate for filtering rows, the HAVING clause is applied after grouping to define a similar predicate that filters groups by the values of aggregate functions. This clause is needed to test values obtained with an aggregate function not from individual rows of the record source defined in the FROM clause, but from groups of such rows. Therefore, such a check cannot be placed in the WHERE clause.

Hello! Today we will get acquainted with the aggregate functions in SQL, we will analyze in detail their work with data from the tables that we created in the previous lessons.

General concept

In the last tutorial, we got acquainted with how data queries are built. Aggregate functions exist so that we can summarize the retrieved data in some way, that is, process it the way we want.

These functions are invoked using keywords included in the SELECT query; how they are written is described below. To be clear, here are some of the capabilities of aggregate functions in SQL:

  • Sum selected values
  • Find the arithmetic mean of values
  • Find the minimum and maximum of values

Examples of SQL Aggregate Functions

We will go over the most commonly used functions and provide a few examples.

SUM function

This function sums the values of a field in a SELECT query. It is quite a useful function, and its syntax is quite simple, like that of all other aggregate functions in SQL. To make this clear, let's start right away with an example:

Get the sum of all orders from the Orders table that were completed in 2016.

We could simply output the sum of all orders, but that seems too simple. Let's recall the structure of our table:

onum  amt   odate       cnum  snum
1001  128   2016-01-01  9     4
1002  1800  2016-04-10  10    7
1003  348   2017-04-08  2     1
1004  500   2016-06-07  3     3
1005  499   2017-12-04  5     4
1006  320   2016-03-03  5     4
1007  80    2017-09-02  7     1
1008  780   2016-03-07  1     3
1009  560   2017-10-07  3     7
1010  900   2016-01-08  6     8

The following code will perform the desired selection:

SELECT SUM(amt) FROM Orders WHERE odate BETWEEN '2016-01-01' AND '2016-12-31';

As a result, we get:

SUM (amt)
4428

In this query, we used the SUM function, followed in parentheses by the field to sum. Then we specified a condition in WHERE that selects only rows from 2016. This condition could be written differently, but for now the SUM aggregate function is what matters.

AVG function

The next function calculates the arithmetic mean of the data field, which we will specify as a parameter. The syntax for this function is identical to the sum function. Therefore, let's go straight to the simplest task:

Display the average order value from the Orders table.

And immediately the request:

SELECT AVG (amt) FROM Orders;

As a result, we get:

It's also worth saying that, unlike SUM and AVG, the MIN and MAX functions can work with character parameters. That is, you can write a query like MIN(odate) (in this case, the date is stored as a character string), and 2016-01-01 will be returned.

This works because these functions compare strings character by character using their character codes (for example, ASCII).

Another important point is that we can perform some simple math operations in a SELECT query, for example, a query like this:

SELECT (MAX (amt) - MIN (amt)) AS "Difference" FROM Orders;

Will return a response like this:

Obviously, the number of orders is 10, but on a large table the COUNT function is very convenient. To count unique sellers, use DISTINCT, as in COUNT(DISTINCT snum), because one seller can serve multiple orders.
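The functions discussed so far can be tried on the Orders table shown earlier. The sketch below loads those ten rows into an in-memory SQLite database through Python's sqlite3 module; the column types are assumptions, and other DBMSs may format the results differently (for example, the number of decimals in AVG).

```python
import sqlite3

# The ten rows of the Orders table from this article.
orders = [
    (1001, 128, "2016-01-01", 9, 4),
    (1002, 1800, "2016-04-10", 10, 7),
    (1003, 348, "2017-04-08", 2, 1),
    (1004, 500, "2016-06-07", 3, 3),
    (1005, 499, "2017-12-04", 5, 4),
    (1006, 320, "2016-03-03", 5, 4),
    (1007, 80, "2017-09-02", 7, 1),
    (1008, 780, "2016-03-07", 1, 3),
    (1009, 560, "2017-10-07", 3, 7),
    (1010, 900, "2016-01-08", 6, 8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (onum INTEGER, amt INTEGER, odate TEXT, cnum INTEGER, snum INTEGER)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", orders)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


total_2016 = scalar(
    "SELECT SUM(amt) FROM Orders WHERE odate BETWEEN '2016-01-01' AND '2016-12-31'"
)
average = scalar("SELECT AVG(amt) FROM Orders")
first_date = scalar("SELECT MIN(odate) FROM Orders")
difference = scalar("SELECT MAX(amt) - MIN(amt) FROM Orders")
order_count = scalar("SELECT COUNT(*) FROM Orders")
seller_count = scalar("SELECT COUNT(DISTINCT snum) FROM Orders")

print(total_2016, average, first_date, difference, order_count, seller_count)
```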

GROUP BY clause

Now let's look at 2 important clauses that extend the functionality of our SQL queries. The first of these is GROUP BY, which groups rows by a given field and then performs the specified action for each group. For example:

Display the sum of all orders for each seller separately.

That is, we now need to take the order amounts for each seller in the Orders table and sum them up. The GROUP BY clause in SQL makes this quite easy:

SELECT snum, SUM (amt) AS "Sum of all orders" FROM Orders GROUP BY snum;

And in the end we get:

snum  Sum of all orders
1     428
3     1280
4     947
7     2360
8     900

As you can see, SQL has allocated a group for each salesperson and calculated the sum of all their orders.

HAVING clause

This clause is used as an addition to the previous one. It sets conditions for selecting groups after grouping. If the condition is met, the group is included in the result; if not, the group is discarded. Consider the following code:

SELECT snum, SUM(amt) AS "Sum of all orders" FROM Orders GROUP BY snum HAVING MAX(amt) > 1000;

This creates a group for each seller and calculates the total of the orders in the group, but keeps the group only if its largest order amount is more than 1000. There is only one such seller, so only his group is selected and the sum of all his orders is returned:

snum  Sum of all orders
7     2360

Why not use the WHERE clause here? Because WHERE is evaluated for each row before the groups are formed, an aggregate function in WHERE causes an error; that is why SQL has the HAVING clause.
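Both the grouped query and the error mentioned above can be reproduced with a short sketch in SQLite via Python's sqlite3 module. Only the amt and snum columns of the Orders table are loaded here, which is a simplification; the exact error message is specific to the DBMS.

```python
import sqlite3

# (amt, snum) pairs taken from the Orders table in this article.
orders = [
    (128, 4), (1800, 7), (348, 1), (500, 3), (499, 4),
    (320, 4), (80, 1), (780, 3), (560, 7), (900, 8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (amt INTEGER, snum INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", orders)

# One output row per seller.
per_seller = conn.execute(
    "SELECT snum, SUM(amt) FROM Orders GROUP BY snum ORDER BY snum"
).fetchall()

# HAVING filters the groups by an aggregate value.
big_sellers = conn.execute(
    "SELECT snum, SUM(amt) FROM Orders GROUP BY snum HAVING MAX(amt) > 1000"
).fetchall()

# The same condition in WHERE is rejected, because WHERE works on single rows.
try:
    conn.execute("SELECT snum, SUM(amt) FROM Orders WHERE MAX(amt) > 1000 GROUP BY snum")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(per_seller)
print(big_sellers)
print(where_error)
```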

Examples for aggregate functions in SQL

1. Write a query that computes the total amount of all orders completed on January 1, 2016.

SELECT SUM(amt) FROM Orders WHERE odate = '2016-01-01';

2. Write a query that counts the number of distinct non-null values for city in the customers table.

SELECT COUNT(DISTINCT city) FROM customers;

3. Write a query that selects the smallest order amount for each customer.

SELECT cnum, MIN(amt) FROM orders GROUP BY cnum;

4. Write a query that selects customers whose names begin with the letter G.

SELECT cname FROM customers WHERE cname LIKE 'G%';

5. Write a query that selects the highest rating in each city.

SELECT city, MAX(rating) FROM customers GROUP BY city;

Conclusion

That is all for now. In this article, we have learned about aggregate functions in SQL. We went through the basic concepts and basic examples that may come in handy later.


The following subsections describe other SELECT clauses that can be used in queries, as well as aggregate functions and set operators. So far we have covered the WHERE clause; in this article we will look at the GROUP BY, ORDER BY, and HAVING clauses and give some examples of using them in combination with the aggregate functions supported in Transact-SQL.

GROUP BY clause

The GROUP BY clause groups a selected rowset to obtain a summary rowset based on the values of one or more columns or expressions. A simple use of the GROUP BY clause is shown in the example below:

USE SampleDb;
SELECT Job
FROM Works_On
GROUP BY Job;

In this example, employee positions are selected and grouped.

In the example above, the GROUP BY clause creates a separate group for each possible value (including NULL) of the Job column.

The use of columns in the GROUP BY clause must meet certain conditions. In particular, each column in the query's select list must also appear in the GROUP BY clause. This requirement does not apply to constants and to columns that are part of an aggregate function. (Aggregate functions are discussed in the next subsection.) This makes sense because only columns in the GROUP BY clause are guaranteed to have one value per group.

A table can be grouped by any combination of its columns. The example below demonstrates grouping the rows of the Works_on table by two columns:

USE SampleDb;
SELECT ProjectNumber, Job
FROM Works_On
GROUP BY ProjectNumber, Job;

The result of this query:

From the query results, you can see that there are nine groups with different combinations of project number and job. The sequence of column names in the GROUP BY clause does not have to be the same as in the SELECT column list.

Aggregate functions

Aggregate functions are used to get summary values. All aggregate functions can be divided into the following categories:

    ordinary aggregate functions;

    statistical aggregate functions;

    user-defined aggregate functions;

    analytical aggregate functions.

Here we will look at the first three types of aggregate functions.

Regular aggregate functions

Transact-SQL supports the following six aggregate functions: MIN, MAX, SUM, AVG, COUNT, COUNT_BIG.

All aggregate functions perform calculations on a single argument, which can be either a column or an expression. (The only exception is the second form of the two functions COUNT and COUNT_BIG, namely COUNT(*) and COUNT_BIG(*), respectively.) Any aggregate function evaluates to a single value that appears in a separate result column.

Aggregate functions are specified in the column list of a SELECT statement, which can also contain a GROUP BY clause. If the SELECT statement does not contain a GROUP BY clause, and the list of select columns contains at least one aggregate function, then it must not contain simple columns (other than columns used as arguments to the aggregate function). Therefore, the code in the example below is incorrect:

USE SampleDb;
SELECT LastName, MIN(Id)
FROM Employee;

Here, the LastName column of the Employee table should not be in the select column list because it is not an argument to the aggregate function. On the other hand, a select list of columns can contain column names that are not arguments to the aggregate function if those columns are used as arguments to the GROUP BY clause.

An aggregate function argument can be preceded by one of two possible keywords:

ALL

Indicates that calculations are performed on all values in the column. This is the default.

DISTINCT

Specifies that only unique column values are used for calculations.

Aggregate functions MIN and MAX

The aggregate functions MIN and MAX calculate the smallest and largest values in a column, respectively. If the query contains a WHERE clause, the MIN and MAX functions return the smallest and largest values for rows that meet the specified conditions. The example below shows the use of the MIN aggregate function:

USE SampleDb;
-- Returns 2581
SELECT MIN(Id) AS "Minimum Id value"
FROM Employee;

The result returned in the example above is not very informative. For example, the surname of the employee who owns this number is unknown. But it is not possible to get this last name in the usual way, because, as mentioned earlier, it is not allowed to explicitly specify the LastName column. In order to get the surname of this employee together with the lowest personnel number of the employee, a subquery is used. The example below demonstrates the use of such a subquery, where the subquery contains the SELECT statement from the previous example:
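A minimal sketch of such a subquery, run in SQLite through Python's sqlite3 module rather than in Transact-SQL. The Employee rows are invented, except that the smallest Id is set to 2581 to match the value quoted above. Note that SQLite, unlike Transact-SQL, would accept LastName next to MIN(Id) without GROUP BY, so the subquery form is used here because it is the portable one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(9031, "Jones"), (2581, "Smith"), (10102, "Brown")],  # hypothetical names
)

# The inner query finds the smallest Id; the outer query returns that employee's row.
lowest = conn.execute(
    """
    SELECT Id, LastName
    FROM Employee
    WHERE Id = (SELECT MIN(Id) FROM Employee)
    """
).fetchall()

print(lowest)
```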

The result of executing the query:

The use of the MAX aggregate function is shown in the example below:

MIN and MAX can also accept strings and dates as arguments. For a string argument, the values are compared using the sort order (collation) in effect. For date and time arguments, the lowest column value is the earliest date, and the highest is the latest.

The DISTINCT keyword can be used with the MIN and MAX functions. All NULL values are removed from the argument columns before MIN and MAX are applied.

SUM aggregate function

The SUM aggregate function calculates the total sum of the column values. The argument to this aggregate function must always be of a numeric data type. The use of the SUM aggregate function is shown in the example below:

USE SampleDb;
SELECT SUM(Budget) "Total budget"
FROM Project;

This example calculates the total budgets for all projects. The result of executing the query:

In this example, the aggregate function groups all of the project budget values and determines their total. For this reason, the query contains an implicit grouping function (like all similar queries). The implicit grouping function from the example above can be specified explicitly, as shown in the example below:

USE SampleDb;
SELECT SUM(Budget) "Total budget"
FROM Project
GROUP BY ();

Using the DISTINCT parameter removes any duplicate values in the column before the SUM function is applied. Likewise, all NULL values are removed before this aggregate function is applied.

Aggregate function AVG

The AVG aggregate function returns the arithmetic mean of all values in a column. The argument to this aggregate function must always be of a numeric data type. All NULL values are removed from the argument before the AVG function is applied.

The use of the AVG aggregate function is shown in the example below:

USE SampleDb;
-- Returns 133833
SELECT AVG(Budget) "Average budget for the project"
FROM Project;

This query calculates the arithmetic mean of all project budgets.

Aggregate functions COUNT and COUNT_BIG

The COUNT aggregate function has two different forms:

COUNT (col_name)
COUNT (*)

The first form of the function counts the number of values in the col_name column. If the query uses the DISTINCT keyword, all duplicate column values are removed before the COUNT function is applied. This form of the COUNT function does not take NULL values into account when counting the number of column values.

The use of the first form of the COUNT aggregate function is shown in the example below:

USE SampleDb;
SELECT ProjectNumber, COUNT(DISTINCT Job) "Jobs in Project"
FROM Works_on
GROUP BY ProjectNumber;

This query counts the number of different positions for each project. The result of this query:

As you can see from the example query, NULL values \u200b\u200bwere not taken into account by the COUNT function. (The sum of all values \u200b\u200bin the job column is 7, not 11 as it should be.)

The second form of the COUNT function, i.e. the COUNT (*) function counts the number of rows in a table. And if the SELECT statement of the query with the COUNT (*) function contains a WHERE clause with a condition, the function returns the number of rows that satisfy the specified condition. Unlike the first version of the COUNT function, the second form does not ignore NULL values, since this function operates on strings, not columns. The example below demonstrates the use of the COUNT (*) function:

USE SampleDb; SELECT Job AS "Job type", COUNT (*) "Workers needed" FROM Works_on GROUP BY Job;

This query counts the number of rows for each job across all projects. The result of executing the query:

The COUNT_BIG function works in the same way as COUNT. The only difference between the two is the type of the result: COUNT_BIG always returns BIGINT values, while COUNT returns INTEGER values.
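To compare the forms of COUNT described in this section side by side, here is a minimal sketch that again uses Python's sqlite3 module in place of SQL Server. The four Works_on rows are made up for illustration; one of them has a NULL job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Works_on (ProjectNumber TEXT, Job TEXT)")
conn.executemany(
    "INSERT INTO Works_on VALUES (?, ?)",
    [("p1", "Analyst"), ("p1", "Analyst"), ("p1", "Manager"), ("p1", None)],
)

# COUNT(*) counts rows, COUNT(Job) counts non-NULL values,
# COUNT(DISTINCT Job) counts non-NULL values after removing duplicates.
all_rows, job_values, distinct_jobs = conn.execute(
    "SELECT COUNT(*), COUNT(Job), COUNT(DISTINCT Job) FROM Works_on"
).fetchone()
print(all_rows, job_values, distinct_jobs)
```

With these rows the three counts differ: every row is counted by COUNT(*), the NULL job is skipped by COUNT(Job), and the duplicate "Analyst" is additionally collapsed by COUNT(DISTINCT Job).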

Statistical aggregate functions

The following functions make up the group of statistical aggregate functions:

VAR

Calculates the sample variance of all values in a column or expression.

VARP

Calculates the population variance of all values in a column or expression.

STDEV

Calculates the sample standard deviation (the square root of the corresponding variance) of all values in a column or expression.

STDEVP

Calculates the population standard deviation of all values in a column or expression.
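The difference between the sample and population variants can be illustrated outside the database. The sketch below uses Python's statistics module on a made-up list of values; the mapping of each Python function to VAR, VARP, STDEV and STDEVP is an analogy for illustration, not a description of how SQL Server computes them internally.

```python
import statistics

# Hypothetical column values.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_variance = statistics.variance(values)       # like VAR: divides by n - 1
population_variance = statistics.pvariance(values)  # like VARP: divides by n
sample_stdev = statistics.stdev(values)             # like STDEV
population_stdev = statistics.pstdev(values)        # like STDEVP

print(sample_variance, population_variance, sample_stdev, population_stdev)
```

The sample variants are always slightly larger than the population variants for the same data, because the sum of squared deviations is divided by n - 1 instead of n.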

User-defined aggregate functions

The Database Engine also supports the implementation of user-defined functions. This capability allows users to supplement the system aggregate functions with functions that they can implement and install themselves. These functions represent a special class of user-defined functions and are discussed in detail later.

HAVING clause

The HAVING clause defines a condition that is applied to a group of rows. Thus, this clause has the same meaning for groups of rows as the WHERE clause has for the rows of the corresponding table. The syntax for the HAVING clause is as follows:

HAVING condition

Here the condition parameter represents a condition and contains aggregate functions or constants.

The use of the HAVING clause in conjunction with the COUNT (*) aggregate function is shown in the example below:

USE SampleDb;

-- Returns "p3"
SELECT ProjectNumber
FROM Works_on
GROUP BY ProjectNumber
HAVING COUNT (*) < 4;

In this example, the system uses the GROUP BY clause to group all rows by the values in the ProjectNumber column. After that, the number of rows in each group is counted and the groups containing fewer than four rows (three or fewer) are selected.

The HAVING clause can also be used without aggregate functions, as shown in the example below:

USE SampleDb;

-- Will return "Consultant"
SELECT Job
FROM Works_on
GROUP BY Job
HAVING Job LIKE 'C%';

This example groups the rows in the Works_on table by job title and eliminates jobs that do not start with the letter "C".

The HAVING clause can also be used without the GROUP BY clause, although this is not common practice. In this case, all rows of the table are treated as a single group.
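The difference between filtering rows with WHERE and filtering groups with HAVING can be sketched as follows. The example runs on SQLite through Python's sqlite3 module, and the Works_on rows (including the EmpId column) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Works_on (EmpId INTEGER, ProjectNumber TEXT, Job TEXT)")
conn.executemany(
    "INSERT INTO Works_on VALUES (?, ?, ?)",
    [
        (1, "p1", "Analyst"), (2, "p1", "Manager"), (3, "p1", "Clerk"), (4, "p1", None),
        (5, "p2", "Clerk"), (6, "p2", "Analyst"), (7, "p2", None), (8, "p2", "Manager"),
        (9, "p3", "Clerk"), (10, "p3", "Analyst"), (11, "p3", "Manager"),
    ],
)

# HAVING filters whole groups after grouping: only p3 has fewer than four rows.
small_projects = [row[0] for row in conn.execute(
    "SELECT ProjectNumber FROM Works_on "
    "GROUP BY ProjectNumber HAVING COUNT(*) < 4 ORDER BY ProjectNumber"
)]

# WHERE removes rows before grouping, so the group sizes seen by HAVING change.
after_where = [row[0] for row in conn.execute(
    "SELECT ProjectNumber FROM Works_on WHERE Job IS NOT NULL "
    "GROUP BY ProjectNumber HAVING COUNT(*) < 4 ORDER BY ProjectNumber"
)]

print(small_projects, after_where)
```

In the second query the rows with a NULL job are dropped first, so every project ends up with three rows and all three groups pass the HAVING condition.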

ORDER BY clause

The ORDER BY clause determines the sort order of the rows in the result set returned by the query. This clause has the following syntax:

ORDER BY {[col_name | col_number [ASC | DESC]]}, ...

The col_name parameter specifies the column to sort by. The col_number parameter is an alternative way to specify the sort column: it identifies a column by its position in the select list of the SELECT statement (1 is the first column, 2 is the second column, and so on). The ASC parameter specifies sorting in ascending order, and the DESC parameter specifies sorting in descending order. The default is ASC.

Columns named in the ORDER BY clause do not have to appear in the select list. This does not apply to SELECT DISTINCT queries, however: in such queries, the columns specified in the ORDER BY clause must also appear in the select list. In addition, this clause cannot contain columns from tables that are not listed in the FROM clause.

As you can see from the syntax of the ORDER BY clause, sorting the result set can be done on multiple columns. This sorting is shown in the example below:

This example selects the department number, last name, and first name of employees whose personnel numbers are less than 20,000, sorted by last name and first name. The result of this query:

Columns in the ORDER BY clause can be specified by their position in the select list instead of by name. Accordingly, the clause in the example above can be rewritten as follows:

This alternative of specifying columns by position rather than by name is useful when the ordering criterion contains an aggregate function. (The other option is to give the expression a column alias and use that alias in the ORDER BY clause.) However, it is generally recommended to specify ORDER BY columns by name rather than by number, so that the query is easier to maintain if columns are added to or removed from the select list. Specifying ORDER BY columns by number is shown in the example below:

USE SampleDb; SELECT ProjectNumber, COUNT (*) "Number of Employees" FROM Works_on GROUP BY ProjectNumber ORDER BY 2 DESC;

Here, for each project, the project number and the number of employees participating in it are selected, ordering the result in descending order by the number of employees.

Transact-SQL places NULL values at the top of the list when sorting in ascending order and at the end of the list when sorting in descending order.
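Sorting by several columns, sorting by column position, and the placement of NULL values can be tried out with the sketch below. It uses SQLite through Python's sqlite3 module, which happens to order NULL values the same way as described above for Transact-SQL; the Employee rows and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Ann", "Smith"), (2, "Bob", "Jones"), (3, "Al", "Smith"), (4, "Cy", None)],
)

# Sort by two columns, first by name and then by position in the select list.
by_name = conn.execute(
    "SELECT LastName, FirstName FROM Employee ORDER BY LastName, FirstName"
).fetchall()
by_position = conn.execute(
    "SELECT LastName, FirstName FROM Employee ORDER BY 1, 2"
).fetchall()

# Descending order moves the NULL last name to the end of the list.
descending = conn.execute(
    "SELECT LastName FROM Employee ORDER BY LastName DESC"
).fetchall()

print(by_name)
print(descending)
```

Both ascending queries return the same rows in the same order, with the NULL last name first; the two "Smith" rows are then ordered by first name.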

Using ORDER BY Clause to Paginate Results

Paging through query results can either be implemented in the client application or delegated to the database server. In the first case, all rows are sent to the application, which selects the rows it needs and displays them. In the second case, only the rows needed for the current page are selected on the server and sent for display. As you might expect, server-side paging usually performs better, because only the rows needed for display are sent to the client.

To support server-side paging, SQL Server 2012 introduces two new clauses of the SELECT statement: OFFSET and FETCH. The use of these two clauses is demonstrated in the example below. Here, a query against the AdventureWorks2012 database (which you can find in the source) retrieves the business entity ID, job title, and birth date of all female employees, sorted by job title in ascending order. The resulting rowset is split into pages of 10 rows and the third page is displayed:

The OFFSET clause specifies the number of result rows to skip. This number is applied after the rows have been sorted by the ORDER BY clause. The FETCH NEXT clause specifies how many of the filtered and sorted rows to return. The argument of this clause can be a constant, an expression, or the result of another query. The FETCH NEXT clause is equivalent to the FETCH FIRST clause.

The main goal of server-side paging is to be able to implement generic page forms using variables. This can be done with a SQL Server batch.
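The arithmetic behind paging with variables can be sketched as follows. The helper page_window is a hypothetical name, and because the sketch runs on SQLite (via Python's sqlite3 module) it spells the paging clause as LIMIT ... OFFSET ...; the corresponding Transact-SQL spelling is given in a comment. The Employee rows are made up.

```python
import sqlite3

def page_window(page_number, page_size):
    """Return (rows_to_skip, rows_to_fetch) for a 1-based page number."""
    return (page_number - 1) * page_size, page_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, JobTitle TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(i, f"Title {i:03d}") for i in range(1, 36)],
)

skip, fetch = page_window(page_number=3, page_size=10)

# Transact-SQL equivalent of the paging clause used here:
#   ORDER BY JobTitle OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
page_ids = [row[0] for row in conn.execute(
    "SELECT Id FROM Employee ORDER BY JobTitle LIMIT ? OFFSET ?", (fetch, skip)
)]
print(skip, fetch, page_ids)
```

For the third page of 10 rows, the first 20 sorted rows are skipped and the next 10 are returned. In both dialects the skipping is applied to the sorted result, which is why paging requires an ORDER BY clause to be meaningful.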

SELECT Statement and IDENTITY Property

The IDENTITY property allows you to define the values of a particular table column as an automatically incremented counter. Columns of a numeric data type such as TINYINT, SMALLINT, INT, and BIGINT can have this property. For such a column, the Database Engine automatically generates sequential values, starting at the specified initial value. Therefore, the IDENTITY property can be used to generate unique numeric values for the selected column.

A table can only contain one column with the IDENTITY property. The owner of the table has the ability to specify the initial value and the increment, as shown in the example below:

USE SampleDb;

CREATE TABLE Product (
    Id INT IDENTITY (10000, 1) NOT NULL,
    Name NVARCHAR (30) NOT NULL,
    Price MONEY
);

INSERT INTO Product (Name, Price)
VALUES ('Product1', 10), ('Product2', 15), ('Product3', 8), ('Product4', 15), ('Product5', 40);

-- Will return 10004
SELECT IDENTITYCOL FROM Product WHERE Name = 'Product5';

-- Analogous to the previous statement
SELECT $identity FROM Product WHERE Name = 'Product5';

This example first creates a Product table containing an Id column with the IDENTITY property. The values in the Id column are created automatically by the system, starting from 10,000 and increasing by 1 for each subsequent value: 10,000, 10,001, 10,002, etc.

Several system functions and variables are associated with the IDENTITY property. For instance, the example code uses the $identity system variable. As the results of executing this code show, this variable automatically refers to the column with the IDENTITY property. You can also use IDENTITYCOL instead.

The initial value and increment of the column with the IDENTITY property can be found using the functions IDENT_SEED and IDENT_INCR respectively. These functions are applied as follows:

USE SampleDb;

SELECT IDENT_SEED ('Product'), IDENT_INCR ('Product');

As mentioned, IDENTITY values are set automatically by the system. However, you can explicitly specify values for particular rows by setting the IDENTITY_INSERT option to ON before inserting the explicit value:

SET IDENTITY_INSERT table_name ON

Because the IDENTITY_INSERT parameter can be used to set a column with the IDENTITY property to any value, including a duplicate one, the IDENTITY property does not usually enforce uniqueness of the column values. Therefore, UNIQUE or PRIMARY KEY constraints should be applied to enforce the uniqueness of the column values.

When you insert rows into a table after IDENTITY_INSERT has been set to ON, the system generates the next value of the IDENTITY column by incrementing the highest current value in that column.

CREATE SEQUENCE statement

Using the IDENTITY property has several significant disadvantages, the most important of which are:

    the property is bound to one specific table;

    a new value of the column cannot be obtained in advance, only by using it;

    the IDENTITY property can only be specified when a column is created.

For these reasons, SQL Server 2012 introduces sequences, which have the same semantics as the IDENTITY property but without the disadvantages listed above. In this context, a sequence is a database feature that allows you to specify counter values for different database objects, such as columns and variables.

Sequences are created using the CREATE SEQUENCE statement. The CREATE SEQUENCE statement is defined in the SQL standard and is supported by other relational database systems such as IBM DB2 and Oracle.

The example below shows how to create a sequence in SQL Server:

USE SampleDb; CREATE SEQUENCE dbo.Sequence1 AS INT START WITH 1 INCREMENT BY 5 MINVALUE 1 MAXVALUE 256 CYCLE;

In the example above, the values of Sequence1 are generated automatically by the system, starting at 1 and increasing by 5 for each successive value. Thus, the START WITH clause specifies the initial value, and the INCREMENT BY clause specifies the step. (The step can be either positive or negative.)

The next two optional clauses, MINVALUE and MAXVALUE, specify the minimum and maximum values of the sequence object. (Note that MINVALUE must be less than or equal to the initial value, and MAXVALUE cannot be greater than the upper limit of the data type specified for the sequence.) The CYCLE clause indicates that the sequence starts over after its maximum value (or its minimum value, for a sequence with a negative step) is exceeded. The default is NO CYCLE, which means that exceeding the maximum or minimum sequence value raises an exception.

The main feature of sequences is their independence from tables, i.e. they can be used with any database object such as table columns or variables. (This property has a positive effect on storage and thus on performance. There is no need to store a specific sequence; only the last value is stored.)

New sequence values are generated with the NEXT VALUE FOR expression, as shown in the example below:

USE SampleDb;

-- Returns 1
SELECT NEXT VALUE FOR dbo.Sequence1;

-- Returns 6 (next step)
SELECT NEXT VALUE FOR dbo.Sequence1;
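The behaviour of Sequence1, including the CYCLE option, can be modelled with a short Python generator. This is only a simplified model written for illustration, not SQL Server's implementation: the function name sequence and its parameters are hypothetical, and each value it yields corresponds to one evaluation of NEXT VALUE FOR.

```python
from itertools import islice

def sequence(start, increment, minvalue, maxvalue, cycle=False):
    """Yield successive values the way a sequence object would hand them out."""
    value = start
    while True:
        yield value
        value += increment
        if value > maxvalue or value < minvalue:
            if not cycle:
                # Corresponds to NO CYCLE: running past the limit is an error.
                raise OverflowError("sequence reached its limit (NO CYCLE)")
            # With CYCLE, an ascending sequence restarts at MINVALUE
            # and a descending one at MAXVALUE.
            value = minvalue if increment > 0 else maxvalue

# START WITH 1 INCREMENT BY 5 MINVALUE 1 MAXVALUE 256 CYCLE
values = list(islice(sequence(1, 5, 1, 256, cycle=True), 53))
print(values[:2], values[51], values[52])
```

The first two values match the comments in the example above. The 52nd value reaches MAXVALUE, and because CYCLE is set the 53rd value wraps around instead of raising an error.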

You can use the NEXT VALUE FOR expression to assign the result of a sequence to a variable or column cell. The example below shows the use of this expression to assign results to a column:

USE SampleDb;

CREATE TABLE Product (
    Id INT NOT NULL,
    Name NVARCHAR (30) NOT NULL,
    Price MONEY
);

INSERT INTO Product VALUES (NEXT VALUE FOR dbo.Sequence1, 'Product1', 10);
INSERT INTO Product VALUES (NEXT VALUE FOR dbo.Sequence1, 'Product2', 15);
-- ...

The example above first creates a table called Product with three columns. Next, two INSERT statements insert two rows into this table. The first two values in the first column will be 11 and 16, because the two earlier NEXT VALUE FOR calls already returned 1 and 6.

The example below shows the use of the catalog view sys.sequences to view the current value of a sequence without using it:

Typically, the NEXT VALUE FOR expression is used in an INSERT statement to instruct the system to insert the generated values. This expression can also be used as part of a multirow query using the OVER clause.

To change the properties of an existing sequence, use the ALTER SEQUENCE statement. One of the most important uses of this statement is the RESTART WITH option, which resets the specified sequence. The example below uses the ALTER SEQUENCE statement to reset almost all properties of Sequence1:

USE SampleDb; ALTER SEQUENCE dbo.sequence1 RESTART WITH 100 INCREMENT BY 50 MINVALUE 50 MAXVALUE 200 NO CYCLE;

A sequence is removed with the DROP SEQUENCE statement.

Set operators

In addition to the operators discussed earlier, Transact-SQL supports three other set operators: UNION, INTERSECT, and EXCEPT.

UNION operator

The UNION operator combines the results of two or more queries into a single result set that contains all rows belonging to any of the queries in the union. Consequently, the union of two tables is a new table containing all the rows that appear in one or both of the original tables.

The general form of the UNION operator looks like this:

select_1 UNION [ALL] select_2 {[UNION [ALL] select_3]} ...

The select_1, select_2, ... parameters are the SELECT statements that form the union. If the ALL option is used, all rows are displayed, including duplicates. In the UNION operator, the ALL option has the same meaning as in the SELECT list, with one difference: it is the default for the SELECT list but must be specified explicitly for UNION.

In its original form, SampleDb is not suitable for demonstrating the use of the UNION operator. Therefore, this section creates a new EmployeeEnh table that is identical to the existing Employee table but has an additional City column. This column indicates the place of residence of the employees.

Creating the EmployeeEnh table gives us an opportunity to demonstrate the use of the INTO clause in the SELECT statement. The SELECT INTO statement performs two operations. First, a new table is created with the columns listed in the SELECT list. Then the rows from the original table are inserted into the new table. The new table name is specified in the INTO clause, and the source table name is specified in the FROM clause.

The example below shows the creation of the EmployeeEnh table from the Employee table:

USE SampleDb; SELECT * INTO EmployeeEnh FROM Employee; ALTER TABLE EmployeeEnh ADD City NCHAR (40) NULL;

In this example, the SELECT INTO statement creates the EmployeeEnh table and inserts all rows from the Employee source table into it, and then the ALTER TABLE statement adds the City column to the new table. The added City column does not contain any values yet. Values can be inserted into this column through Management Studio or by using the following code:

USE SampleDb;

UPDATE EmployeeEnh SET City = 'Kazan' WHERE Id = 2581;
UPDATE EmployeeEnh SET City = 'Moscow' WHERE Id = 9031;
UPDATE EmployeeEnh SET City = 'Yekaterinburg' WHERE Id = 10102;
UPDATE EmployeeEnh SET City = 'St. Petersburg' WHERE Id = 18316;
UPDATE EmployeeEnh SET City = 'Krasnodar' WHERE Id = 25348;
UPDATE EmployeeEnh SET City = 'Kazan' WHERE Id = 28559;
UPDATE EmployeeEnh SET City = 'Perm' WHERE Id = 29346;

We are now ready to demonstrate the use of the UNION operator. The example below shows a query that builds the union of the EmployeeEnh and Department tables using this operator:

USE SampleDb; SELECT City AS "City" FROM EmployeeEnh UNION SELECT Location FROM Department;

The result of this query:

Only compatible tables can be combined using the UNION operator. Compatible means that both select lists must contain the same number of columns, and the corresponding columns must have compatible data types. (For example, the INT and SMALLINT data types are compatible.)

The result of a union can be ordered only by using the ORDER BY clause in the final SELECT statement, as shown in the example below. The GROUP BY and HAVING clauses can be used within the individual SELECT statements, but not for the union itself.

The query in this example selects employees who either work in department d1 or who started working on a project before January 1, 2008.

The UNION operator supports the ALL option. With this option, duplicates are not removed from the result set. An OR operator can be used in place of the UNION operator if all SELECT statements connected by one or more UNION operators refer to the same table. In this case, the set of SELECT statements is replaced by a single SELECT statement with a set of OR conditions.
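The effect of the ALL option can be seen in a minimal sketch that runs on SQLite through Python's sqlite3 module. The table contents and the Number column of Department are hypothetical; only the City and Location columns come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeEnh (Id INTEGER, City TEXT)")
conn.execute("CREATE TABLE Department (Number TEXT, Location TEXT)")
conn.executemany(
    "INSERT INTO EmployeeEnh VALUES (?, ?)",
    [(2581, "Kazan"), (9031, "Moscow"), (28559, "Kazan")],
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("d1", "Moscow"), ("d2", "Perm")],
)

# UNION removes duplicate rows from the combined result.
union_cities = [row[0] for row in conn.execute(
    "SELECT City FROM EmployeeEnh UNION SELECT Location FROM Department ORDER BY City"
)]

# UNION ALL keeps every row from both queries.
union_all_cities = [row[0] for row in conn.execute(
    "SELECT City FROM EmployeeEnh UNION ALL SELECT Location FROM Department ORDER BY City"
)]

print(union_cities)
print(union_all_cities)
```

Note that the single ORDER BY clause at the end sorts the combined result and refers to the column name from the first SELECT statement.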

INTERSECT and EXCEPT Operators

The two other set operators, INTERSECT and EXCEPT, compute the intersection and the difference, respectively. In this context, the intersection is the set of rows that belong to both tables, and the difference of two tables is the set of all rows that belong to the first table but are not present in the second. The example below shows the use of the INTERSECT operator:

Transact-SQL does not support the use of the ALL parameter with either the INTERSECT operator or the EXCEPT operator. The use of the EXCEPT operator is shown in the example below:

Remember that these three set operators have different execution precedence: INTERSECT has the highest precedence, while EXCEPT and UNION have equal precedence and are evaluated from left to right in the order in which they appear. Parentheses can be used to override this order. Failure to pay attention to precedence when using several different set operators can lead to unexpected results.
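The intersection and difference can be illustrated with the sketch below, which runs on SQLite through Python's sqlite3 module with hypothetical rows. Each query uses only one set operator, because SQLite does not follow the same precedence rules as Transact-SQL when several set operators are mixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeEnh (Id INTEGER, City TEXT)")
conn.execute("CREATE TABLE Department (Number TEXT, Location TEXT)")
conn.executemany(
    "INSERT INTO EmployeeEnh VALUES (?, ?)",
    [(2581, "Kazan"), (9031, "Moscow"), (28559, "Kazan"), (10102, "Yekaterinburg")],
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("d1", "Moscow"), ("d2", "Perm")],
)

# INTERSECT: cities that appear in both tables.
common = [row[0] for row in conn.execute(
    "SELECT City FROM EmployeeEnh INTERSECT SELECT Location FROM Department ORDER BY City"
)]

# EXCEPT: cities of employees that are not department locations.
only_employees = [row[0] for row in conn.execute(
    "SELECT City FROM EmployeeEnh EXCEPT SELECT Location FROM Department ORDER BY City"
)]

print(common, only_employees)
```

Both operators return distinct rows, so "Kazan" appears only once in the difference even though two employees live there.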

CASE expressions

In database application programming, it is sometimes necessary to modify the representation of data. For example, people can be classified using the codes 1, 2, and 3 for men, women, and children, respectively. This programming technique can reduce the time required to implement a program. The Transact-SQL CASE expression makes this type of encoding easy to implement.

Unlike most programming languages, CASE is not a statement, but an expression. Therefore, a CASE expression can be used almost anywhere Transact-SQL allows expressions. The CASE expression has two forms:

    simple CASE expression;

    searched CASE expression.

The syntax for a simple CASE expression is as follows:

A statement with a simple CASE expression searches the list of expressions in the WHEN clauses for the first one that matches expression_1 and then evaluates the corresponding THEN clause. If no expression in the WHEN list matches, the ELSE clause is evaluated.

The syntax for a searched CASE expression is as follows:

A searched CASE expression looks for the first condition that evaluates to true and then evaluates the corresponding THEN clause. If none of the conditions is met, the ELSE clause is evaluated. The use of the searched CASE expression is shown in the example below:

USE SampleDb; SELECT ProjectName, CASE WHEN Budget > 0 AND Budget ... 100000 AND Budget ... 150000 AND Budget ...

The result of this query:

This example weights the budgets of all projects and displays the calculated weights along with the corresponding project names.
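Both forms of the CASE expression can be tried out with the following sketch, which runs on SQLite through Python's sqlite3 module. The budget thresholds, the weights 1 to 3, the BudgetWeight alias and the table rows are assumptions chosen for illustration; they are not the values used in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Project (ProjectName TEXT, Budget REAL)")
conn.executemany(
    "INSERT INTO Project VALUES (?, ?)",
    [("Apollo", 120000), ("Gemini", 95000), ("Mercury", 185600)],
)

# Searched CASE: the first WHEN condition that is true decides the result.
weights = conn.execute("""
    SELECT ProjectName,
           CASE
               WHEN Budget <= 100000 THEN 1
               WHEN Budget <= 150000 THEN 2
               ELSE 3
           END AS BudgetWeight
    FROM Project
    ORDER BY ProjectName
""").fetchall()

# Simple CASE: one expression is compared with each WHEN value in turn.
label = conn.execute("""
    SELECT CASE 2
               WHEN 1 THEN 'men'
               WHEN 2 THEN 'women'
               WHEN 3 THEN 'children'
               ELSE 'unknown'
           END
""").fetchone()[0]

print(weights, label)
```

Because the WHEN conditions are checked in order, the second condition only needs an upper bound: any budget that reaches it has already failed the first condition.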

The example below shows another way to use a CASE expression, where the WHEN clause contains subqueries that are part of the expression:

USE SampleDb;

SELECT ProjectName,
    CASE
        WHEN p1.Budget > (SELECT AVG (p2.Budget) FROM Project p2) THEN 'above average'
    END "Budget category"
FROM Project p1;

The result of this query is as follows:
