Aggregate functions of the data composition system. SQL language. Formation of queries to the database Which sql aggregate function finds the maximum value

Additional functions have been added to SQL that allow you to compute generic group values. To use aggregate functions, a preliminary grouping operation is assumed. What is the essence of the grouping operation? When grouping, the entire set of tuples of a relation is divided into groups, in which tuples are collected that have the same attribute values \u200b\u200bthat are specified in the grouping list.

For example, let's group R1 by the value in the Discipline column. We will get 4 groups for which we can calculate some group values, for example, the number of tuples in a group, the maximum or minimum value of the Score column.

This is done using aggregate functions. Aggregate functions calculate a single value for the entire table group. A list of these functions is presented in table 5.7.

Table 5.7.Aggregate functions

R1
Full name Discipline Assessment
Group 1 Petrov F.I. Database
K. A. Sidorov Database
Mironov A.V. Database
Stepanova K.E. Database
Krylova T.S. Database
Vladimirov V.A. Database
Group 2 K. A. Sidorov Information theory
Stepanova K.E. Information theory
Krylova T.S. Information theory
Mironov A.V. Information theory Null
Group 3 Trofimov P.A. Networks and telecommunications
Ivanova E.A. Networks and telecommunications
Utkina N.V. Networks and telecommunications
Group 4 Vladimirov V.A. English language
Trofimov P.A. English language
Ivanova E.A. English language
Petrov F.I. English language i

Aggregate functions are used like field names in a SELECT statement, with one exception: they take the field name as an argument. Only numeric fields can be used with the SUM and AVG functions. Both numeric and character fields can be used with the COUNT, MAX, and MIN functions. When used with character fields, MAX and MIN will translate them to the ASCII equivalent and process them in alphabetical order. Some DBMSs allow the use of nested aggregates, but this is a deviation from the ANSI standard with all the ensuing consequences.



For example, you can calculate the number of students who passed exams in each discipline. To do this, you need to execute a query with a grouping by the field "Discipline" and display as a result the name of the discipline and the number of lines in the group for this discipline. Using the * character as an argument to the COUNT function means counting all the lines in the group.

SELECT R1 Discipline. СОUNТ (*)

GROUP BY R1 Discipline

Result:

If we want to count the number of those who passed the exam in any discipline, then we need to exclude undefined values \u200b\u200bfrom the initial ratio before grouping. In this case, the request will look like this:

SELECT R1 Discipline. COUNT (*)

FROM R1 WHERE R1.

IS NOT NULL score

GROUP BY Rl. Discipline

We get the result:

In this case, the line with the student

Mironov A, V. Information theory Null

will not be included in the set of tuples before the grouping, so the number of tuples in the group for the discipline "Information Theory" will be 1 less.

Aggregate functions can also be used without the preliminary grouping operation, in this case the whole relationship is considered as one group and for this group one value per group can be calculated.



Referring again to the Session database (tables Rl, R2, R3), we find the number of successfully passed exams:

WHERE Score\u003e 2:

This is, of course, different from selecting a field, since a single value is always returned, no matter how many rows are in the table. The argument of aggregate functions can be separate columns of tables. But in order to calculate, for example, the number of distinct values \u200b\u200bof a certain column in a group, it is necessary to use the DISTINCT keyword together with the column name. Let's calculate the number of different grades received for each discipline:

SELECT Rl Discipline.

COUNT (DISTINCT R1.Evaluation)

WHERE R1 IS NOT NULL

GROUP BY Rl. Discipline

Result:

The result can include a grouping field value and multiple aggregate functions, and multiple fields can be used in grouping conditions. In this case, groups are formed by a set of specified grouping fields. Aggregate operations can be applied to the concatenation of multiple source tables. For example, let's pose the question: determine for each group and each discipline the number of those who successfully passed the exam and the average score for the discipline.

SELECT R2. Group. R1 Discipline. COUNT (*), AVP (Assessment)

WHERE Rl. Full name \u003d R2. Full name AND

Rl. IS NOT NULL AND

Rl. Score\u003e 2

GROUP BY R2. Group. Rl Discipline

Result:

We cannot use aggregate functions in the WHERE clause because predicates are evaluated in terms of a single row and aggregate functions are evaluated in terms of groups of rows.

The GROUP BY clause allows you to define a subset of the values \u200b\u200bin a particular field in terms of another field and apply an aggregate function to the subset. This makes it possible to combine fields and aggregate functions in a single SELECT clause. Aggregate functions can be used both in the expression for outputting the results of a SELECT row, and in the expression for processing the generated HAVING groups. In this case, each aggregate function is calculated for each selected group. The values \u200b\u200bobtained when calculating aggregate functions can be used to display the corresponding results or for the selection condition of groups.

Let's construct a query that displays the groups in which more than one deuce was obtained in one discipline in exams:

SELECT R2. Group

WHERE Rl. Full name \u003d R2. Full name AND

Rl. Score \u003d 2

GROUP BY R2. Group. R1 Discipline

HAVING count (*)\u003e 1

In what follows, as an example, we will work not with the Session DB, but with the Bank DB, which consists of a single table F, which stores the relation F containing information about accounts in the branches of a certain bank:

F \u003d ;

Q \u003d (Branch, City);

because on this basis, you can better illustrate the work with aggregate functions and grouping.

For example, suppose we want to find the total account balance in branches. You can make a separate query for each of them by selecting SUM (Remaining) from the table for each branch. GROUP BY, however, will allow you to put them all in one command:

SELECT Branch, SUM

GROUP BY Branch:

GROUP BY applies aggregate functions independently for each group identified by the value of the Branch field. A group consists of lines with the same value of the Branch field, and the SUM function is applied separately for each such group, that is, the total account balance is calculated separately for each branch. The value of the field to which GROUP BY is applied has, by definition, only one value per output group, as does the result of an aggregate function. Therefore, we can combine an aggregate and a field in one query. You can also use GROUP BY with multiple fields.

Let's suppose that we would like to see only those total values \u200b\u200bof account balances that exceed $ 5000. To see total balances over $ 5000, you must use the HAVING clause. The HAVING clause defines the criteria used to remove specific groups from the output, just as the WHERE clause does for individual rows.

The correct command would be:

SELECT Branch, SUM (Balance)

GROUP BY Branch

HAVING SUM (Remaining)\u003e 5000;

Arguments in the HAVING clause follow the same rules as in the SELECT clause, which uses GROUP BY. They must have one value per output group.

The following command will be prohibited:

SELECT Branch SUM (Balance)

FROM F GROUP BY Branch

HAVING Opening Date \u003d 12/27/1999;

The OpenDate field cannot be used in a HAVING clause because it can have more than one value per display group. To avoid this situation, the HAVING clause should only refer to the aggregates and fields selected by GROUP BY. There is a correct way to make the above query:

SELECT Branch, SUM (Balance)

WHERE DateOpened \u003d "12/27/1999"

GROUP BY Branch;

The meaning of this query is as follows: find the amount of balances for each branch of accounts opened on December 27, 1999.

As stated earlier, HAVING can only use arguments that have the same value per output group. In practice, references to aggregate functions are the most common, but fields selected with GROUP BY are also valid. For example, we want to see the total account balances of branches in St. Petersburg, Pskov and Uryupinsk:

SELECT Branch.SUM (Remaining)

WHERE F. Branch \u003d Q. Branch

GROUP BY Branch

HAVING IN Branch ("St. Petersburg". "Pskov". "Uryupinsk");

Therefore, in the arithmetic expressions of predicates included in the selection condition of the HAVING clause, you can directly use only the specifications of the columns specified as grouping columns in the GROUP BY clause. The rest of the columns can be specified only within the specifications of the aggregate functions COUNT, SUM, AVG, MIN and MAX, which in this case calculate some aggregate value for the entire group of rows. The situation is similar with the subqueries included in the predicates of the selection condition of the HAVING section: if the subquery uses the characteristic of the current group, then it can be specified only by referring to the grouping columns.

The result of the HAVING clause is a grouped table containing only those row groups for which the result of evaluating the search condition is TRUE. In particular, if a HAVING clause is present in a table expression that does not contain GROUP BY, then the result of its execution will be either an empty table or the result of the previous sections of the table expression, considered as one group without grouping columns.

Nested SQL Queries

Now let's return to the "Session" database and consider using its example the use of nested queries.

With SQL, you can nest queries inside each other. Typically, the inner query generates a value that is checked in the outer query predicate (in the WHERE or HAVING clause) to determine whether it is true or not. The EXISTS predicate can be used in conjunction with a subquery, which returns true if the subquery output is not empty.

When combined with other features of the select operator, such as grouping, a subquery is a powerful tool for achieving the desired result. In the FROM part of the SELECT statement, it is permissible to apply synonyms to table names if, when forming a query, we need more than one instance of some relation. Synonyms are specified using the AS keyword, which can be omitted altogether. Therefore, the FROM part might look like this:

FROM Rl AS A, Rl AS B

FROM Rl A. Rl B:

both expressions are equivalent and are treated as applying a SELECT statement to two instances of R1.

For example, let's show how some queries to the Session database look in SQL:

  • List of those who passed all required exams.

WHERE Score\u003e 2

HAVING COUNT (*) \u003d (SELECT COUNT (*)

WHERE R2.Group \u003d R3.Group AND Full name and full name)

Here, the inline query defines the total number of exams that each student in the student's group must take, and this number is compared to the number of exams that this student has taken.

  • List of those who were supposed to take the DB exam, but have not yet passed.

SELESTHIO

WHERE R2.Fpynna \u003d R3.Group AND Discipline \u003d "DB" AND NOT EXISTS

(SELECT full name FROM Rl WHERE full name \u003d name and full name AND Discipline \u003d "DB")

The EXISTS (SubQuery) predicate is true when the SubQuery is not empty, that is, it contains at least one tuple, otherwise the EXISTS predicate is false.

The NOT EXISTS predicate is reversed - true only when the SubQuery is empty.

Notice how NOT EXISTS with a subquery allows you to do without the difference operation. For example, the wording of a query with the word “all” can be done as if with a double negation. Consider an example of a base that simulates the supply of individual parts by individual suppliers, it is represented by one SP relationship "Suppliers-parts" with the scheme

SP (Supplier_number. Part_number) P (part_number. Name)

This is how the response to the request is formulated: "Find suppliers who supply all the parts."

SELECT DISTINCT SUPPLIER_NUMBER FROM SP SP1 WHERE NOT EXISTS

(SELECT part_number

FROM P WHERE NOT EXISTS

(SELECT * FROM SP SP2

WHERE SP2.supplier_number \u003d SP1.supplier_number AND

sp2.part_number \u003d P.part_number)):

In fact, we have reformulated this request as: "Find suppliers such that there is no part that they would not supply." It should be noted that this query can also be implemented through aggregate functions with a subquery:

SELECT DISTINCT VendorNumber

GROUP BY VendorNumber

HAVING CounKDISTINCT part_number) \u003d

(SELECT Count (part_number)

The SQL92 standard extends comparison operators to multiple comparisons using the ANY and ALL keywords. This extension is used when comparing the value of a specific column with the data column returned by a subquery.

The ANY keyword in any comparison predicate means that the predicate will be true if the comparison predicate is true for at least one value from the subquery. The ALL keyword requires the comparison predicate to be true when compared against all rows in the subquery.

For example, let's find students who have passed all the exams for a grade no lower than "good". We work with the same base "Session", but add to it one more relation R4, which characterizes the delivery of laboratory work during the semester:

R 1 \u003d (name, discipline, assessment);

R 2 \u003d (name, group);

R 3 \u003d (Groups, Discipline)

R 4 \u003d (name, discipline, number_lab_work, grade);

Select R1.Full Name From R1 Where 4\u003e \u003d All (Select Rl.

Where R1.Fio \u003d R11.Fio)

Let's look at another example:

Select students who have an exam score not less than at least one score on their laboratory work in this discipline:

Select R1.

From R1 Where R1.Evaluation\u003e \u003d ANY (Select R4.

Where Rl. Discipline \u003d R4. Discipline AND R1. Phio \u003d R4. Phio)

SQL Outer Joins

The SQL2 standard has expanded the concept of conditional joins. In the SQL1 standard, when joining relations, only the conditions specified in the WHERE clause of the SELECT statement were used, and in this case, only tuples of the original relations, for which these conditions were defined and true, were concatenated by the specified conditions into the resulting relation. However, in reality, it is often necessary to join tables in such a way that all rows from the first table are included in the result, and instead of those rows of the second table for which the join condition is not met, the result would be undefined values. Or vice versa, all rows from the right (second) table are included, and missing parts of rows from the first table are padded with null values. Such joins were called outer joins, as opposed to joins defined by the SQL1 standard, which came to be called inner.

In general, the syntax for the FROM part in the SQL2 standard is as follows:

FROM<список исходных таблиц> |

< выражение естественного объединения > |

< выражение объединения >

< выражение перекрестного объединения > |

< выражение запроса на объединение >

<список исходных таблиц>::= <имя_таблицы_1>

[synonym name for table_1] [...]

[,<имя_таблицы_п>[ <имя синонима таблицы_n> ] ]

<выражение естественного объединениям:: =

<имя_таблицы_1> NATURAL (INNER | FULL | LEFT | RIGHT) JOIN<имя_таблицы_2>

<выражение перекрестного объединениям: = <имя_таблицы_1> CROSS JOIN<имя_таблицы_2>

<выражение запроса на объединением:=

<имя_таблицы_1> UNION JOIN<имя_таблицы_2>

<выражение объединениям:= <имя_таблицы_1> (INNER |

FULL | LEFT | RIGHT) JOIN (ON condition)<имя_таблицы_2>

In these definitions, INNER stands for an inner join, LEFT stands for a left join, that is, the result includes all rows of table 1, and parts of the resulting tuples for which there were no corresponding values \u200b\u200bin table 2 are padded with NULL values \u200b\u200b(undefined). The RIGHT keyword means a right outer join, and unlike a left join, in this case all rows of table 2 are included in the resulting relation, and the missing parts from table 1 are supplemented with null values. The FULL keyword defines a full outer join: both left and right. With a full outer join, both right and left outer joins are performed, and the resulting relation includes all rows from Table 1, padded with null values, and all rows from Table 2, padded with null values.

The OUTER keyword means external, but if the FULL, LEFT, RIGHT keywords are specified, the union is always considered external.

Let's consider some examples of outer joins. Let's go back to the Session DB again. Let's create a relationship in which all the marks received by all students on all exams they had to take will stand. If a student did not pass this exam, then instead of a grade, he will have an undefined value. To do this, we will successively perform the natural inner join of tables R2 and R3 by the Group attribute, and connect the resulting relationship with the left outer natural join with the table R1 using the Full Name and Discipline columns. At the same time, the standard allows to use a parenthesis structure, since the result of the union can be one of the arguments in the FROM part of the SELECT statement.

SELECT Rl. Full name, R1. Discipline. Rl.Evaluation

FROM (R2 NATURAL INNER JOIN R3) LEFT JOIN Rl USING (name. Discipline)

Result:

Full name Discipline Assessment
Petrov F.I. Database
K. A. Sidorov Database 4
L. V. Mironov Database
Stepanova K.E. Database
Krylova T.S. Database
Vladimirov V.A. Database
Petrov F.I. Information theory Null
K. A. Sidorov Information theory
Mironov A.V. Information theory Null
Stepanova K.E. Information theory
Krylova T.S. Information theory
Vladimirov V.A. Information theory Null
Petrov F.I. English language
K. A. Sidorov English language Null
Mironov A.V. English language Null
Stepanova K.E. English language Null
Krylova T.S. English language Null
Vladimirov V.A. English language
Trofimov P.A. Networks and telecommunications
Ivanova E.A. Networks and telecommunications

Let's consider one more example, for this we take the "Library" database. It consists of three relations, the names of the attributes are typed here in Latin letters, which is necessary in most commercial DBMS.

BOOKS (ISBN, TITL. AUTOR. COAUTOR. YEARJZD, PAGES)

READER (NUM_READER. NAME_READER, ADRESS. HOOM_PHONE. WORK_PHONE. BIRTH_DAY)

EXEMPLARE (INV, ISBN, YES_NO. NUM_READER. DATE_IN. DATE_DUT)

Here the BOOKS table describes all the books present in the library, it has the following attributes:

  • ISBN - unique book code;
  • TITL - title of the book;
  • AUTOR - author's surname;
  • COAUTOR - name of the co-author;
  • YEARIZD - year of publication;
  • PAGES is the number of pages.

The READER table stores information about all readers of the library and contains the following attributes:

  • NUM_READER - unique number of the library card;
  • NAME_READER - the surname and initials of the reader;
  • ADRESS - reader's address;
  • HOOM_PHONE - home phone number;
  • WORK_PHONE - work phone number;
  • BIRTH_DAY - the date of birth of the reader.

The EXEMPLARE table contains information about the current state of all instances of all books. It includes the following columns:

  • INV - unique inventory number of the book copy;
  • ISBN is a book code that identifies which book it is and refers to information from the first table;
  • YES_NO - sign of the presence or absence of this instance in the library at the current moment;
  • NUM_READER - number of the library card, if the book is issued to the reader, and Null otherwise;
  • DATE_IN - if the reader has the book, then this is the date when it was issued to the reader; a DATE_OUT is the date the reader should return the book to the library.

Let's define a list of books for each reader; if the reader has no books, then the book instance number is NULL. To perform this search, we need to use a left outer join, that is, we take all rows from the READER table and join them with rows from the EXEMPLARE table, if there is no row in the second table with the corresponding library card number, then the EXEMPLARE.INV attribute will be be undefined NULL:

SELECT READER.NAME_READER, EXEMPLARE.INV

FROM READER RIGHT JOIN EXEMPLARE ON READER.NUM_READER \u003d EXEMPLARE.NUM_READER

The outer join operation, as we already mentioned, can be used to form sources in the FROM clause, so, for example, the following query text will be valid:

FROM (BOOKS LEFT JOIN EXEMPLARE)

LEFT JOIN (READER NATURAL JOIN EXEMPLARE)

At the same time, for books, not a single copy of which is in the hands of the readers, the values \u200b\u200bof the library card number and the dates of taking and returning the book will be undefined.

Cross join in the interpretation of the SQL2 standard corresponds to the extended Cartesian product operation, that is, the operation of joining two tables, in which every row of the first table is joined to every row of the second table.

Operation unification requestis equivalent to the operation of set-theoretic union in algebra. In this case, the requirement for the equivalence of the schemes of the initial relations remains. A join request is executed as follows:

SELECT - query

UNION SELECT - query

UNION SELECT - query

All queries involved in a join operation must not contain expressions, that is, calculated fields.

For example, you need to display a list of readers who are holding the Idiot book or the Crime and Punishment book. Here's how the query will look:

SELECT READER. NAME_READER

FROM READER, EXEMPLARE.BOOKS

BOOKS.TITLE \u003d "(! LANG: Idiot"!}

SELECT READER.NAME_READER

FROM READER, EXEMPLARE, BOOKS

WHERE EXEMPLARE.NUM_READER \u003d READER.NUM_READER AND

EXEMPLRE.ISBN \u003d BOOKS.ISBN AND

BOOKS.TITLE \u003d "(! LANG: Crime and Punishment"!}

By default, duplicate tuples are always excluded when executing a union query. Therefore, if there are readers who have both books on hand, they will still be included in the resulting list only once.

A join request can combine any number of original queries.

So, to the previous request, you can add more readers who are holding the book "Castle":

SELECT READER. NAME_READER

FROM READER. EXEMPLARE, BOOKS

WHERE EXEMPLARE.NUM_READER \u003d READER.NUM_READER AND.

EXEMPLRE.ISBN \u003d BOOKS.ISBN AND

BOOKS.TITLE \u003d "(! LANG: Castle"!}

In the case when you need to keep all the rows from the original relationship, you must use the ALL keyword in the union operation. If duplicate tuples are saved, the flow diagram of the union query will look like this:

SELECT - query

SELECT - query

SELECT - query

However, you can achieve the same result by simply modifying the WHERE clause of the first part of the original query, ORing the local conditions, and eliminating duplicate tuples.

SELECT DISTINCT READER.NAME_READER

FROM READER. EXEMPLARE.BOOKS

WHERE EXEMPLARE.NUM_READER \u003d READER.NUM_READER AND

EXEMPLRE.ISBN \u003d BOOKS.ISBN AND

BOOKS.TITLE \u003d "(! LANG: Idiot" OR!}

BOOKS.TITLE \u003d "(! LANG: Crime and Punishment" OR!}

BOOKS.TITLE \u003d "(! LANG: Castle"!}

None of the original queries in the UNION operation should contain ordering clauses of the ORDER BY result, but the result of the join can be ordered by specifying the ORDER BY clause specifying the list of ordering columns after the text of the last original SELECT query.

the value in the Score column. Table 5.7. Aggregate functions
Function Result
COUNT The number of rows or non-empty field values \u200b\u200bthat the query selected
SUM The sum of all selected values \u200b\u200bfor this field
AVG The arithmetic mean of all selected values \u200b\u200bin this field
MIN The smallest of all selected values \u200b\u200bfor this field
MAX Largest of all selected values \u200b\u200bfor this field
R1
Full name Discipline Assessment
Group 1 Petrov F.I. Database 5
K. A. Sidorov Database 4
Mironov A.V. Database 2
Stepanova K.E. Database 2
Krylova T.S. Database 5
Vladimirov V.A. Database 5
Group 2 K. A. Sidorov Information theory 4
Stepanova K.E. Information theory 2
Krylova T.S. Information theory 5
Mironov A.V. Information theory Null
Group 3 Trofimov P.A. Networks and telecommunications 4
Ivanova E.A. Networks and telecommunications 5
Utkina N.V. Networks and telecommunications 5
Group 4 Vladimirov V.A. English language 4
Trofimov P.A. English language 5
Ivanova E.A. English language 3
Petrov F.I. English language 5

Aggregate functions are used like field names in a SELECT statement, with one exception: they take the field name as an argument. Only numeric fields can be used with the SUM and AVG functions. Both numeric and character fields can be used with the COUNT, MAX, and MIN functions. When used with character fields, MAX and MIN will translate them to the ASCII equivalent and process them in alphabetical order. Some DBMSs allow the use of nested aggregates, but this is a deviation from the ANSI standard with all the ensuing consequences.

For example, you can calculate the number of students who passed exams in each discipline. To do this, you need to execute a query grouped by the "Discipline" field and display as a result the name of the discipline and the number of rows in the group for this discipline. Using the * character as an argument to the COUNT function means counting all the lines in the group.

SELECT R1 Discipline, COUNT (*) FROM R1 GROUP BY R1 Discipline

Result:

If we want to count the number of those who passed the exam in any discipline, then we need to exclude undefined values \u200b\u200bfrom the original ratio before grouping. In this case, the request will look like this:

We get the result:

In this case, the line with the student

Mironov A.V. Information theory Null

will not fall into the set of tuples before the grouping, so the number of tuples in the group for the discipline " Information theory"will be 1 less.

Can be applied aggregate functions also without the operation of preliminary grouping, in this case the whole relation is considered as one group and for this group one value can be calculated per group.

Referring again to the Session database (tables R1, R2, R3), we find the number of successfully passed exams:

This is, of course, different from selecting a field, since a single value is always returned, no matter how many rows are in the table. The argument aggregate functions there can be separate columns of tables. But in order to calculate, for example, the number of distinct values \u200b\u200bof a certain column in a group, it is necessary to use the DISTINCT keyword together with the column name. Let's calculate the number of different grades received for each discipline:

Result:

The result can include the value of the grouping field and several aggregate functions, and multiple fields can be used in grouping conditions. In this case, groups are formed by a set of specified grouping fields. Aggregate operations can be applied to the concatenation of multiple source tables. For example, let's pose the question: determine for each group and each discipline the number of those who successfully passed the exam and the average score for the discipline.

Result:

We cannot use aggregate functions in the WHERE clause because predicates are evaluated in terms of a single row, and aggregate functions - in terms of line groups.

The GROUP BY clause allows you to define a subset of the values \u200b\u200bin a particular field in terms of another field and apply an aggregate function to the subset. This makes it possible to combine fields and aggregate functions in a single SELECT clause. Aggregate functions can be used both in the expression for outputting the results of the SELECT row, and in the expression of the condition for processing the formed HAVING groups. In this case, each aggregate function is calculated for each selected group. Calculated values aggregate functions, can be used to display relevant results or for group selection conditions.

Let's build a query that displays the groups in which more than one two was received in one discipline in exams:

In the future, as an example, we will work not with the "Session" database, but with the "Bank" database, which consists of a single table F, which stores the relation F containing information about accounts in the branches of a certain bank:

F \u003d (N, full name, Branch, Opening Date, Closing Date, Balance); Q \u003d (Branch, City);

because on this basis, you can better illustrate the work with aggregate functions and grouping.

For example, suppose we want to find the total account balance in branches. You can make a separate query for each of them by selecting SUM (Remaining) from the table for each branch. GROUP BY, however, will allow you to put them all in one command:

SELECT Branch, SUM (Balance) FROM F GROUP BY Branch;

GROUP BY applies aggregate functions independently for each group, defined by the value of the Branch field. The group consists of lines with the same Branch field value, and

Introduction

SQL (Structured Query Language) - Structured Query Language - standard query language for working with relational databases.

The first international standard for the SQL language was adopted in 1989 (hereinafter we will call it SQL / 89 or SQL1). Sometimes the SQL1 standard is also called the ANSI / ISO standard and the vast majority of the database systems available on the market fully support this standard.

At the end of 1992, a new international standard for the SQL language was adopted (hereinafter referred to as SQL / 92 or SQL2). And it is not without its drawbacks, but at the same time it is significantly more accurate and complete than SQL / 89. Most database vendors are currently making changes to their products to be more compliant with the SQL2 standard.

The last standard for the SQL language was released in 1996. It is called SQL3.

SQL cannot be fully attributed to traditional programming languages: it does not contain traditional operators for controlling program execution, operators for describing types and much more, it only contains a set of standard operators for accessing data stored in a database. SQL statements are embedded in the underlying programming language, which can be any standard language like C ++, PL, COBOL, etc. In addition, SQL statements can be executed directly interactively.

1. SQL structure.

SQL contains the following sections:

1. Data definition language (DDL) operators.

Operator Meaning Act
CREATE TABLE Create table Creates a new table in the database
DROP TABLE Delete table Removes a table from the database
ALTER TABLE Change table Modifies the structure of an existing table
CREATE VIEW Create view Creates a virtual table i.e. a table that doesn't actually exist but is modeled using this operator.
ALTER VIEW Change view Changes the structure or content of the virtual table
DROP VIEW Delete view Removes the description of the virtual table. You do not need to delete the table itself, because. it doesn't really exist.
CREATE INDEX Create Index Creates a special physical structure, called an index, to speed up data access
DROP INDEX Delete index Removes the created structure
CREATE SYNONYM Create synonym
DROP SYNONYM Remove synonym

2. Data Manipulation Language (DML) Operators



3. Data Query Language (DQL)

4. Transaction Management Tools (DCL)

5. Data Administration Tools (DDL)

Programmatic SQL

2. Data types

The following data types are supported in SQL / 89: CHARACTER, NUMERIC, DECIMAL, INTEGER, SMALLINT, FLOAT, REAL, DOUBLE PRECISION. These data types are classified into character string, exact number, and approximate number types.

The following data types have been added in the SQL92 standard:

VARCHAR (n) - variable length character strings

NCHAR (N) - constant length localized character strings

NCHAR VARYING (n) - variable length strings of localized characters

BIT (n) - constant length bit string

BIT VARYING (n) - variable length bit string

DATE calendar date

TIMESTAMP (precision) date and time

INTERVAL time interval

3. SELECT statement

Select is the only search operator that replaces all relational algebra operations.

The syntax diagram of the SELECT operator is shown in Fig. 1


Here, the ALL keyword means that all rows that satisfy the query conditions are included in the resulting rowset. The DISTINCT keyword means that only different rows are included in the result set, i.e. duplicate result rows are not included in the set. If no keyword is present, then this situation is interpreted as the presence of the keyword ALL.

The * means that all columns from the original query tables are included in the result set.

The FROM part specifies a list of the initial relations (tables) of the query.

The WHERE part specifies the selection conditions for the result term or the conditions for joining the tuples of the source tables.

The GROUP BY part specifies a list of grouping fields.

The HAVING part sets predicates-conditions imposed on each group.

The ORDER BY part specifies a list of ordering fields for the result.

The following predicates can be used in the condition expression for the WHERE part:

Pattern comparison predicate LIKE and NOT LIKE

· Predicate EXIST and NOT EXIST.

Comparison predicates { =, <>, >,<,>=,<=,}. The syntax diagram of comparison predicates is shown in Fig. 2.


predicate IN - is included in the set / is not included in the set.

The IN or NOT IN predicate can also be used to compare the tested expression with a subquery, in which case the syntax diagram is shown in Fig. five.

The IN predicate is true when the value of the attribute specified in the tested expression for the current tuple matches at least one of the set of values \u200b\u200bobtained as a result of executing the corresponding subquery or contained in the list of values. Conversely, the NOT IN predicate is true only when the value of the specified attribute in the current tuple does not match any of the set of values \u200b\u200bdefined by the inline subquery or the specified list of values.


predicate LIKE - includes (is similar)

The pattern can contain _ underscores to denote any single character;

% percent symbol - to represent any arbitrary sequence of characters.

The LIKE predicate is true when the value of the attribute specified by the column name in the current tuple includes the specified<шаблон>.

The NOT LIKE predicate is true when the value of the attribute in the current tuple does not include the given one<шаблон>.

Predicate NULL - unknown, undefined

The syntax diagram of the predicate is shown in Fig. 7.


All predicates considered earlier can be used in search conditions.

Putting aside our acquaintance with grouping for a while, consider in detail the first three lines of the SELECT statement:

SELECT - a keyword that tells the DBMS that this command is a request. All queries begin with this word followed by a space. It can be followed by a selection method - with the removal of duplicates ( DISTINCT), or without deleting ( ALL, is the default). This is followed by a list of columns, separated by commas, which are selected by a query from tables, or the symbol ‘ * ’To select the entire line. Any columns not listed here will not be included in the result dataset. This, of course, does not mean that they will be deleted or their information will be erased from the tables, because the query does not affect the information in the tables - it only shows the data.

FROM - a keyword that must be presented in each request. The FROM keyword is followed by one or more spaces and then a list of source tables that are used in the query. The table names are separated from each other by commas. You can assign alias names to tables, which is useful for performing a join operation on a table with itself or for accessing the current record of an outer query from a nested subquery (nested subqueries are not considered here). An alias is a temporary table name that is used only in this query and is not used further. The alias is separated from the base table name by at least one space. The syntax diagram of the FROM part is shown in Fig. nine.


All subsequent parts of the SELECT statement are optional.

· WHERE - a keyword followed by a condition predicate that defines the records that will be included in the resulting query dataset.

Consider a relationship database that simulates the passing of a session at some institution. Let it consist of three relations,,. We will assume that they are represented by tables R1, R2 and R3, respectively.

R1 \u003d (name, discipline, grade)

R2 \u003d (name, group)

R3 \u003d (Group, Discipline)

Here are some examples of using the SELECT statement.

· List of all groups (no repetitions) where exams are to be passed

SELECT DISTINCT Groups
FROM R3

· List of students who passed the DB exam with excellent marks

SELECT name
FROM R1
WHERE Discipline \u003d "DB" AND Assessment = 5

· A list of all students who need to take something, along with the name of the discipline.

SELECT Name, Discipline
FROM R2, R3
WHERE R1 group = R2.Group

Here the WHERE part sets the conditions for joining the relations R1 and R2. In the absence of join conditions in the WHERE part, the result will be equivalent to the extended Cartesian product and, in this case, each student would be assigned all disciplines from the R2 relation, and not those that his group should take.

· List of slobs with multiple twos

SELECT Full name
FROM R1 a, R1 b
WHERE a.name \u003d b.name AND
a.Discipline <> b. discipline AND
a.Evaluation<= 2 AND b.Evaluation.<= 2

Here we have used aliases to name the relationship R1 a and b, since we need to work with two instances of this relationship to write the search conditions.

It is clearly seen from these examples that the logic of the operation of the selection operator (Cartesian product-selection-projection) does not coincide with the order of description of data in it (first, a list of fields for projection, then a list of tables for a Cartesian product, then a join condition). The fact is that SQL was originally developed for use by the end user, and they tried to bring it closer to a natural language, not an algorithmic language. Of course, English was chosen as a natural language, as an international language widely used in computing and programming. For this reason, SQL at first causes confusion and annoyance among professional programmers who are beginning to learn it, who are used to talking to a machine in algorithmic languages.

The presence of undefined Null values \u200b\u200bincreases the flexibility of processing information stored in the database. In our examples, we can assume a situation when a student came to the exam, but did not pass it for some reason, in this case, the grade in a certain discipline for a given student has an undefined meaning. In this situation, the question can be posed: "Find students who came to the exam, but did not pass it, indicating the name of the discipline." The Select statement will look like this:

SELECT Name, Discipline

WHERE AssessmentIS NULL

I want to make a reservation right away that all the examples given earlier are conditional. Why? Wouldn't they work in real databases? Are they wrong? Everything is correct here except the names of attributes or table columns. In most DBMS (Database Management Systems) it is not allowed to name columns in national languages, these are database objects and language objects and it is required that they be named according to the rules for naming identifiers in a given language. Most often, the name of an attribute can be a sequence of letters of the Latin alphabet and numbers, starting with a letter, without some special characters (for example, spaces, periods, commas, percent signs,% and other special characters) and having some length restrictions. These restrictions are different in different DBMS, for example, in MS SQL Server 2000 - the length of an attribute name can be up to 128 characters. Long attribute names are inconvenient for writing a query, but very short single-letter names do not allow preserving the semantics of the meaning of a table column, so they choose some compromise and name them shortly, but conveniently, so that you don't have to look into the full description of the database when writing each query. In addition, the names of attributes, as well as the names of other objects, should not coincide with the keywords of the SQL language - i.e. those words that are included in the language operators.

Therefore, from the point of view of correctness, we should have represented the "Session" database schema in the form

R1 \u003d (St_name, Discipline, Mark)

R2 \u003d (St_name, N_group)

R3 \u003d (N_group, Discipline)

And change all requests accordingly.

Using Aggregate Functions and Subqueries in a Select Statement

Queries can calculate the generalized grouped value of fields in the same way as the value of a single field. This is done using aggregate functions. Aggregate functions produce a single value for the entire table group. List of these functions:

Aggregate functions are used like field names in a SELECT statement, with one exception: they take the field name as an argument. Only numeric fields can be used with the SUM and AVG functions. Both numeric and character fields can be used with the COUNT, MAX, and MIN functions. When used with character fields, MAX and MIN will translate them to the ASCII equivalent, and process them in alphabetical order. Some DBMSs allow the use of nested aggregates, but this is a deviation from the ANSI standard with all the ensuing consequences.

Referring again to the Session database (tables R1, R2, R3), we find the number of successfully passed exams:

SELECT COUNT (*)
FROM R1
WHERE Mark\u003e 2;

This is, of course, different from selecting a field, since a single value is always returned, no matter how many rows are in the table. Because of this, aggregate functions and fields cannot be selected at the same time unless the special GROUP BY clause is used.

The GROUP BY clause allows you to define a subset of values, hereinafter referred to as a group, and apply an aggregate function to that group. A group is formed from all rows for which the values \u200b\u200bof the grouping fields specified in the GROUP BY clause have the same value. This makes it possible to combine fields and aggregate functions in a single SELECT clause. The syntax diagram of the use of aggregate functions is shown in Fig. 10 Aggregate functions can be used both in the expression for outputting the results of a row SELECTand in the expression for the processing condition of the formed groups HAVING... In this case, each aggregate function is calculated for each selected group. The values \u200b\u200bobtained when calculating aggregate functions can be used to display the corresponding results or for the selection condition of groups.

When using aggregate functions, remember that only the values \u200b\u200bof the grouping fields and possibly the values \u200b\u200bof the aggregate functions can be present in the result set. It is not permissible to group by one value, but display other values. This will be a syntax error.

For example, a request like this will always fail:

Select A

Group By B

Indeed, let's figure it out. What do we want to find? We are trying to output some column value AND from the table T , and at the same time we group by another column, column IN.We perform grouping - this means we collect all rows with the same values \u200b\u200bof column B into one group and further, and then it is not clear, we display the value of column A, but in one group there can be many values, different values \u200b\u200bof column A. So what value are we output? This is incomprehensible to us and the computer. That is why he refuses to fulfill such a request and claims that we have a syntax error.


Let's go back to our Session database, but add a few more attributes to it. Firstly, there may be namesakes among students, therefore, to identify a student, we will use the intent of a student record book, which always uniquely identifies a student. And secondly, suppose that a student can make several attempts to pass the exam in the same discipline, and for this we introduce in relation R1 the date of the next attempt to pass the exam. And finally, the third addition, we will assume that many groups in different specialties study at our University, then the schema of our database will be as follows

Sessia (N_zach, Discipline, Mark, Data_ex)

Example 21... Get the total number of suppliers (keyword COUNT ):

SELECT COUNT (*) AS N

As a result, we get a table with one column and one row containing the number of rows from table P:

Using aggregate functions with groupings

Example 23 . For each part, obtain the total supplied quantity (keyword GROUP BY …):

SUM (PD.VOLUME) AS SM

GROUP BY PD.DNUM;

This request will be executed as follows. First, the rows in the source table will be grouped so that each group includes rows with the same DNUM values. Then the VOLUME field will be summed up inside each group. From each group, one row will be included in the result table:

Comment... In the list of selectable fields of a SELECT statement containing a GROUP BY clause, you can include onlyaggregate functions and fields, which are included in the grouping condition... The following query will throw a syntax error:

SUM (PD.VOLUME) AS SM

GROUP BY PD.DNUM;

The reason for the error is that the list of selected fields includes the PNUM field, which excludedto the GROUP BY section. Indeed, each resulting group of lines can contain several lines with variousthe values \u200b\u200bof the PNUM field. One summary line will be formed from each group of lines. At the same time, there is no definite answer to the question of what value to select for the PNUM field in the summary row.

Comment... Some SQL dialects do not consider this a bug. The query will execute, but it is impossible to predict what values \u200b\u200bwill be entered in the PNUM field in the result table.

Example 24 ... Get part numbers with a total shipment exceeding 400 (keyword HAVING …):

Comment... The condition that the total supplied quantity should be more than 400 cannot be formulated in the WHERE clause, since you cannot use aggregate functions in this section. Conditions using aggregate functions should be placed in a special section HAVING:

SUM (PD.VOLUME) AS SM

GROUP BY PD.DNUM

HAVING SUM (PD.VOLUME)\u003e 400;

As a result, we get the following table:

Comment... One query may contain both the conditions for selecting rows in the WHERE clause and the conditions for selecting groups in the HAVING clause. Group selection conditions cannot be transferred from the HAVING clause to the WHERE clause. Likewise, row selection conditions cannot be moved from a WHERE clause to a HAVING clause, except for conditions that include fields from the GROUP BY list.

Using subqueries

A very convenient tool that allows you to formulate queries in a more understandable way is the ability to use subqueries nested in the main query.

Example 25 ... Get a list of suppliers whose status is less than the maximum status in the supplier table (comparison with a subquery):

WHERE P.STATYS<

(SELECT MAX (P.STATUS)

Comment... Because the P.STATUS field is compared with the result of the subquery, then the subquery must be formulated to return a table consisting of exactly one row and one column.

Comment

    Execute oncenested subquery and get the maximum status value.

    Scan the supplier table P, each time comparing the value of the supplier status with the result of the subquery, and select only those rows in which the status is less than the maximum.

Example 26 ... Using a predicate IN

(SELECT DISTINCT PD.PNUM

WHERE PD.DNUM \u003d 2);

Comment... In this case, the nested subquery can return a table containing multiple rows.

Comment... The result of executing the query will be equivalent to the result of the following sequence of actions:

    Execute oncenested subquery and get a list of vendor numbers supplying part number 2.

    Scan the vendor table P, each time checking to see if the subquery contains the vendor number.

Example 27 ... Using a predicate EXIST ... Get a list of suppliers supplying part number 2:

PD.PNUM \u003d P.PNUM AND

Comment... The result of executing the query will be equivalent to the result of the following sequence of actions:

    Scan vendor table P, each time performing a subquerywith the new value of the vendor number taken from table P.

    Include in the query result only those rows from the supplier table for which the nested subquery returned a non-empty set of rows.

Comment... Unlike the two previous examples, the nested subquery contains a parameter (external reference) passed from the main query - the P.PNUM provider number. Such subqueries are called correlated (correlated ). The xref can have different values \u200b\u200bfor each candidate row evaluated by the subquery, so the subquery must be re-executed for each row selected in the main query. Such subqueries are specific to the EXIST predicate, but can be used in other subqueries as well.

Comment... It might seem that queries with correlated subqueries will run slower than queries with uncorrelated subqueries. In fact, this is not the case, since how the user formulated the request, does not definehow this request will be executed. SQL is non-procedural, but declarative. This means that the user making the request simply describes what should be the result of the requestand how this result will be obtained - the DBMS itself is responsible for this.

Example 28 ... Using a predicate NOT EXIST ... Get a list of suppliers not supplying part number 2:

PD.PNUM \u003d P.PNUM AND

Comment... As in the previous example, this uses a correlated subquery. The difference is that the main query will select those rows from the suppliers table for which the nested subquery will not return a single row.

Example 29 ... Obtain names of suppliers supplying all parts:

SELECT DISTINCT PNAME

PD.DNUM \u003d D.DNUM AND

PD.PNUM \u003d P.PNUM));

Comment... This query contains two nested subqueries and implements a relational operation dividing relations.

The innermost subquery is parameterized by two parameters (D.DNUM, P.PNUM) and has the following meaning: select all rows containing delivery data of the supplier with the PNUM number of the part with the DNUM number. Negation NOT EXIST indicates that this supplier is not supplying the part. An external subquery to it, which itself is a nested and parameterized parameter P.PNUM, makes sense: select a list of parts that are not supplied by the PNUM supplier. Negation of NOT EXIST means that for a supplier with a PNUM number there should be no parts that would not be supplied by that supplier. This exactly means that only the suppliers supplying all parts are selected in the external request.

Important! If a function parameter is of the String type and contains a field name that contains spaces, then such a field name must be enclosed in square brackets.
For example: "[Number of Turns]".

1. Amount (Total) - calculates the sum of the values \u200b\u200bof the expressions passed to it as an argument for all detail records. You can pass an Array as a parameter. In this case, the function will be applied to the contents of the array.

Example:
Amount (Sales.AmountTurnover)

2. Count - calculates the number of values \u200b\u200bother than NULL. You can pass an Array as a parameter. In this case, the function will be applied to the contents of the array.

Syntax:
Quantity ([Various] Parameter)

When specifying getting different values, you should specify Distinct before the method parameter Number.

Example:
Quantity (Sales. Contractor)
Quantity (Various Sales. Contractor)

3. Maximum - gets the maximum value. You can pass an Array as a parameter. In this case, the function will be applied to the contents of the array.

Example:
Maximum (Balances.Quantity)

4. Minimum - gets the minimum value. You can pass an Array as a parameter. In this case, the function will be applied to the contents of the array.

Example:
Minimum (Balances.Quantity)

5. Average - gets the average for non-NULL values. You can pass an Array as a parameter. In this case, the function will be applied to the contents of the array.

Example:
Average (Balances.Amount)

6. Array - forms an array containing the parameter value for each detailed record.

Syntax:
Array ([Various] Expression)

A table of values \u200b\u200bcan be used as a parameter. In this case, the result of the function will be an array containing the values \u200b\u200bof the first column of the table of values \u200b\u200bpassed as a parameter. If the expression contains the Array function, then this expression is considered to be aggregate. If the keyword Different is specified, then the resulting array will not contain duplicate values.

Example:
Array (Counterparty)

7.ValueTable - forms a table of values \u200b\u200bcontaining as many columns as the function's parameters. Detailed records are obtained from datasets that are needed to get all the fields involved in the function parameter expressions.

Syntax:
ValueTable ([Various] Expression1 [AS ColumnName1] [, Expression2 [AS ColumnName2], ...])

If the function parameters are residual fields, then the resulting table of values \u200b\u200bwill contain values \u200b\u200bfor records by unique combinations of dimensions from other periods. In this case, values \u200b\u200bare obtained only for residual fields, dimensions, accounts, period fields and their details. The values \u200b\u200bof other fields in records from other periods are considered to be NULL. If an expression contains the ValuesTable function, then this expression is considered to be aggregate. If the keyword Different is specified, then the resulting table of values \u200b\u200bwill not contain rows containing the same data. Each parameter can be followed by an optional AS keyword and a name that will be assigned to the column of the table of values.

Example:
Values \u200b\u200bTable (Various Nomenclature, Feature Nomenclature AS Feature)

8. Minimize (GroupBy) - designed to remove duplicates from an array.

Syntax:
Collapse (Expression, ColumnNumbers)

Options :

  • Expression - expression of type Array orValuesTable, the values \u200b\u200bof the elements of which are to be collapsed;
  • Column numbers - (if the expression is of the ValueTable type) the String type. Numbers or names (separated by commas) of the columns of the table of values, among which you want to search for duplicates. By default - all columns.
Example:
Minimize (Values \u200b\u200bTable (PhoneNumber, Address), "PhoneNumber");

9. GetPart - Retrieves a value table containing specific columns from the original value table.

Syntax:
GetPart (Expression, ColumnNumbers)

Options :

  • Expression - type Table of Values. The table of values \u200b\u200bfrom which to get the columns;
  • Column numbers - type String. Numbers or names (separated by commas) of the columns of the table of values \u200b\u200bto be obtained.
Returned Value: TableValues, which contains only the columns that are specified in the parameter.

Example:
GetPart (Collapse (Values \u200b\u200bTable (PhoneNumber, Address), "PhoneNumber"), "PhoneNumber");

10. Order - designed to order the elements of the array and the table of values.

Syntax:
Arrange (Expression, ColumnNumbers)

Options :

  • Expression - Array or ValuesTable from which to get the columns;
  • Column numbers - (if the expression is of the ValueTable type) numbers or names (separated by commas) of the values \u200b\u200btable columns by which you want to order. May contain the direction of ordering and the need for auto-ordering: Descending / Ascending + Auto-ordering.
Returned Value: Array orValuesTable, with ordered elements.

Example:
Sort (Values \u200b\u200bTable (PhoneNumber, Address, CallDate), "CallDate Descending");

11. JoinStrings - designed to combine strings into one line.

Syntax:
JoinStrings (Value, Element Separator, Column Separators)

Options :

  • Value - expressions to be combined into one line. If it is an Array, then the elements of the array will be concatenated into a string. If ValuesTable is, then all columns and rows of the table will be combined into a row;
  • Item Separator - a string containing text to be used as a separator between array elements and rows of the table of values. By default - line feed character;
  • Column Dividers - a string containing text to be used as a separator between the columns of the table of values. Default "; ".
Example:
ConnectStrings (ValueTable (PhoneNumber, Address));

12. GroupProcessing - returns the DataCompositionGroupProcessingData object. The object in the Data property is placed in the form of a table of values \u200b\u200bof the grouping values \u200b\u200bfor each expression specified in the parameter of the Expressions function. When using hierarchical grouping, each level of the hierarchy is handled separately. Values \u200b\u200bfor hierarchical records are also placed in the data. The object's CurrentItem property contains a row of the values \u200b\u200btable for which the function is currently being calculated.

Syntax:
GroupProcessing (Expressions, Hierarchy Expressions, GroupName)

Options :

  • Expressions... Expressions to evaluate. A string that contains the expressions to be evaluated, separated by commas. After each expression, there may be an optional AS keyword and a column name in the resulting table of values. Each expression forms a column of the table of values \u200b\u200bof the Data property of the DataGroupProcessingDataCompositionData object.
  • ExpressionsHierarchies... Expressions to evaluate for hierarchical records. This is the same as the Expressions parameter, except that the Hierarchy Expressions parameter is used for hierarchical records. If the parameter is not specified, the expressions specified in the Expression parameter are used to calculate values \u200b\u200bfor hierarchical records.
  • Group Name... The name of the grouping in which to calculate the processing grouping. Line. If not specified, then the calculation is performed in the current grouping. If the calculation is in the table and the parameter contains an empty string, or is not specified, then the value is calculated for grouping - a row. The layout composer, when generating the data composition layout, replaces the given name with the grouping name in the resulting layout. If the grouping is not available, then the function will be replaced with a NULL value.
13. Every - if at least one entry has the value False, then the result is False, otherwise True.

Syntax:
Everyone (Expression)

Parameter:

  • Expression - Boolean type.
Example:
Each()

14. Any - if at least one entry has the value True, then the result is True, otherwise False

Syntax:
Any (Expression)

Parameter:

  • Expression - Boolean type.
Example:
Any()

15.StandardGeneralCollectionDeviation (Stddev_Pop) - calculates the standard deviation of the population. Calculated by the formula: SQRT (GeneralCollection (X) Variance).

Syntax:
GeneralCollection StandardDev (Expression)

Parameter:

  • Expression - Number type.

Example:

X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT General Population Standard Deviation (Y) FROM Table
Result: 805.694444

16.StandardSample Deviation (Stddev_Samp) - calculates the cumulative standard deviation. Calculated by the formula: SQRT (SampleDispersion (X)).

Syntax:
SampleDefusion (Expression)

Parameter:

  • Expression - Number type.
Return type Number.

Example:

X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT Sample Standard Deviation (Y) FROM Table
Result: 28.3847573

17.Sampling variance (Var_Samp) - calculates the typical difference of a series of numbers without considering the NULL values \u200b\u200bin this set. Calculated by the formula: (Sum (X ^ 2) - Sum (X) ^ 2 / Amount (X)) / (Amount (X) - 1). If Count (X) \u003d 1, then NULL is returned.

Syntax:
Sample Variance (Expression)

Parameter:

  • Expression - Number type.
Example:
SELECT GeneralCollection Variance (Y) FROM Table
Result: 716.17284

19.Covariance of GeneralPop (Covar_Pop) - calculates the covariance of a number of numeric pairs. It is calculated by the formula: (Sum (Y * X) - Sum (X) * Sum (Y) / n) / n, where n is the number of pairs (Y, X) in which neither Y nor X are NULL.

Syntax:
GeneralCovariance (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT Covariance of General Population (Y, X) FROM Table
Result: 59.4444444

20.CovarianceSample (Covar_Samp) - calculates the typical difference of a series of numbers without considering NULL values \u200b\u200bin this set. It is calculated by the formula: (Sum (Y * X) - Sum (Y) * Sum (X) / n) / (n-1), where n is the number of pairs (Y, X) in which neither Y nor X is NULL.

Syntax:
Sample covariance (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT CovarianceSamples (Y, X) FROM Table
Result: 66.875

21. Correlation (Corr) - calculates the correlation coefficient of a number of numeric pairs. Calculated by the formula: GeneralCovariance (Y, X) / (GeneralCollection Standard Deviation (Y) * GeneralCollection Standard Deviation (X)). Pairs where Y or X are NULL are not counted.

Syntax:
Correlation (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT Correlation (X, Y) FROM Table
Result: 0.860296149

22. RegressionSlope (Regr_Slope) - calculates the slope of the line. It is calculated by the formula: Covariance of GeneralCollection (Y, X) / Variance of GeneralCollection (X). Calculated without considering NULL pairs.

Syntax:
Regression Slope (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT Regression Slope (Y, X) FROM Table
Result: 8.91666667

23. RegressionIntercept (Regr_Intercept) - calculates the Y-point of intersection of the regression line. Calculated by the formula: Average (Y) - Regression Slope (Y, X) * Average (X). Calculated without considering NULL pairs.

Syntax:
Regression Segment (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
SELECT Regression Count (Y, X) FROM Table
Position: 9

25. RegressionR2 (Regr_R2) - calculates the coefficient of determination. Calculated without considering NULL pairs.

Syntax:
Regression R2 (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Return value:
  • Null - if the Variance of GeneralCollection (X) \u003d 0;
  • 1 - if Variance of GeneralCollection (Y) \u003d 0 AND Variance of GeneralCollection (X)<>0;
  • POW (Correlation (Y, X), 2) - if Variance of GeneralCollection (Y)\u003e 0 AND Variance of GeneralCollection (X)<>0.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT Regression R2 (Y, X) FROM Table
Result: 0.740109464

26. Regression MeanX (Regr_AvgX) - calculates the average of X after eliminating X and Y pairs, where either X or Y are empty. Average (X) is calculated without considering NULL pairs.

Syntax:
Regression Mean X (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT Regression Mean X (Y, X) FROM Table
Result: 5

27. RegressionAverageY (Regr_AvgY) - calculates the average Y after eliminating X and Y pairs, where either X or Y are empty. Average (Y) is calculated without considering NULL pairs.

Syntax:
Regression Mean Y (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT Regression Mean Y (Y, X) FROM Table
Result: 24.2222222

28. RegressionSXX (Regr_SXX) - calculated by the formula: RegressionNumber (Y, X) * Variance of GeneralCollection (X). Calculated without considering NULL pairs.

Syntax:
Regression SXX (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Returns the sum of squares of independent expressions used in a linear regression model. The function can be used to assess the statistical validity of a regression model.

Example:
SELECT Regression SYY (Y, X) FROM Table
Result: 6445.55556

30. RegressionSXY (Regr_SXY) - is calculated by the formula: RegressionNumber (Y, X) * Covariance of GeneralCollection (Y, X). Calculated without considering NULL pairs.

Syntax:
Regression SXY (Y, X)

Options :

  • Y - Number type;
  • X - Number type.
Example:
X1 2 3 4 5 6 7 8 9
Y7 1 2 5 7 34 32 43 87
SELECT RegressionSXY (Y, X) FROM Table
Position: 535

31. Rank

Syntax:
PlaceInOrder (Order, Hierrachia Order, GroupName)

Options :

  • Order - type String. Contains expressions in the sequence of which you want to place group records, separated by commas. The ordering direction is controlled by the words Ascending, Descending. You can also follow the field with the string Autoordering, which means that when ordering links, you must use the ordering fields defined for the referenced object. If no sequence is specified, the value is calculated in the grouping sequence;
  • Order of Hierrachia - type String. Contains ordering expressions for hierarchical records;
  • Group Name - type String. The name of the grouping in which to calculate the processing grouping. If not specified, then the calculation is performed in the current grouping. If the calculation is in the table and the parameter contains an empty string, or is not specified, then the value is calculated for grouping - a row. The layout composer, when generating the data composition layout, replaces the given name with the grouping name in the resulting layout. If the grouping is not available, then the function will be replaced with a NULL value.
If the sequence contains two or more records with the same collating field values, the function returns the same values \u200b\u200bfor all records.

Example:
PlaceOrder ("[Number of Turns]")

32. ClassificationABC (ClassificationABC)

Syntax:
ABC Classification (Value, Number of Groups, PercentageFor Groups, GroupName)

Options :

  • Value - type String. by which you want to calculate the classification. The string in which the expression is specified;
  • Number of Groups - Number type. Specifies the number of groups to split into;
  • PercentageFor Groups - type String. As much as how many groups you need to split minus 1. Separated by commas. If not set, then automatically;
  • Group Name - type String. The name of the grouping in which to calculate the processing grouping. If not specified, then the calculation is performed in the current grouping. If the calculation is in the table and the parameter contains an empty string, or is not specified, then the value is calculated for grouping - a row. The layout composer, when generating the data composition layout, replaces the given name with the grouping name in the resulting layout. If the grouping is not available, then the function will be replaced with a NULL value.
The result of the function will be a class number, starting from 1, which corresponds to class A.

Example:
Classification ABC ("Amount (Gross Profit)", 3, "60, 90")

Did you like the article? To share with friends: