Structured data concept. Definition and purpose of the database. General concept of data structure

Initially, the programming process provided for the programmer to write all algorithms directly in the machine language. This approach exacerbated the already difficult task of developing algorithms and too often resulted in bugs that needed to be discovered and corrected [a process known as debugging] before the work could be considered complete.

The first step towards making programming easier was to eliminate the use of numbers to write commands and operands directly in the form in which they are used in the machine. For this purpose, the development of programs began to widely use the mnemonic notation of various commands instead of their hexadecimal representation. For example, instead of the digital code of the register load command, the programmer could now write LOD, and instead of the command code for copying the register contents into memory, he could use the STO mnemonic notation. For writing operands, rules were developed according to which the programmer could assign descriptive names to some memory areas [they are often called identifiers] and use them when writing program commands instead of addresses of the corresponding memory cells. Such identifiers are commonly referred to as variables. This emphasizes that by changing the value allocated to a given chunk of memory, we change the value associated with the identifier assigned to that chunk during program execution.

When a variable is declared in a program, their type is usually determined at the same time. The data type determines both the interpretation of specific data and the operations that can be performed on it. Data types include Integer, Real, Character, and Boolean.

The Integer type is used to denote numeric data that is whole numbers. In memory, they are most often represented in binary's complement code. You can perform common arithmetic and comparison operations with Integer data.

The Real type is designed to represent numeric data that can contain non-integer values. They are usually stored in memory as binary floating point numbers. The operations that can be performed on Real data are similar to those performed on Integer data. However, the manipulations that must be performed to add two items of data of type Real are different from the manipulations required to perform actions on variables of type Integer.

The Character type is used for data consisting of characters that are stored in memory as ASCII or UNICODE codes. Data of this type can be compared with each other [determine which of the two characters precedes the other in alphabetical order]; check if one character string is another, and also concatenate two strings into one, longer string, appending one of them after the other [concatenation operation].

Boolean refers to data that can only have two values \u200b\u200bTrue and False. An example of such data is the result of an operation comparing two numbers. Boolean data operations include checking if the current value of a variable is True or False.

The main memory of the machine is organized in the form of separate cells with sequentially increasing addresses. However, these cells are often used as a basis for implementing other data placement methods. For example, text is usually viewed as a long string of characters, while sales information can be viewed as a rectangular table of numeric values, each representing the number of deals made by a particular employee on a particular day. The challenge is to provide the user with the means to manipulate such abstract structures, instead of delving into the details of the true organization of data in the main memory of the machine. To use a computer correctly, it is necessary to have a good knowledge of the structural relationships between data, the basic methods of representing structures within a computer, as well as methods of working with them. For connections between data in a computer, the following information structures are used: array, record, list, tree, stack, queue.

Arrays

An array is a structure that contains several elements of the same type. Indexes are used to order the elements of an array. Indexes are written in parentheses after the array name. An array with one index is called one-dimensional, with two it is called two-dimensional, and so on.

Recording

A record is a structure that does not have to be of the same type. The individual elements of a record are called fields. The field, in turn, can also be a record.

record Student (
FirstName,
LastName,
Group
)

Lists

A list is a set of records, each of which contains a special field - a pointer. The pointer associates a record with some other record, or contains Null values \u200b\u200bthat indicate that the value of the pointer is undefined.

Entries in a singly linked list have one pointer each, and they are linked in a chain:

The arrow in the figure indicates the contents of the pointer, and the word Data indicates the collections of fields that store data. The list can be organized using a two-dimensional array, where all the elements with the first index equal to 0 are intended to store data, and the elements with the first index equal to 1 are pointers.


In this list, entries containing letters of the English alphabet are arranged in alphabetical order. The first entry in the list contains the character "A", the second contains "B", and so on.

To work with a list, you need to be able to perform three basic operations:

Pass () - traversing or moving along the list;
Add () - adding a new entry to the list;
Delete () - deletes an entry from the list.

In addition to operations for working with the list, two more variables are needed:

variable Head, which stores information about the first entry in the list
the variable Current, which points to the current entry in the list

The table summarizes the descriptions of some operations on the list, an example of the implementation of which is given above.

Operation namePseudocode
Move one step down the list

function Pass (Current) (
if (M Null) then Current: \u003d M;
return (Current);
}

function Add (Current, New) (
M: \u003d M;
M: \u003d New;
return;
}

Add to the list the entry pointed to by the New variable

function Delete (Current) (
if (M Null) then
M: \u003d M];
return;
}

The records in a doubly linked list are linked together in a chain, but they have two pointer fields. One of them points to the previous item in the list, the other to the next item. This structure allows you to move through the list in two directions: forward and backward.

A circular list is a list whose last entry points to the first. There is no null entry in these lists.


A tree is a branched list, each entry of which can contain multiple pointers. The entries in the tree are called nodes. Nodes where all pointers are empty are called leaves. The top starting node of the tree is called the root node. In many problems it is sufficient to use binary [binary] trees, the nodes of which have no more than two pointers.

Example. You want to evaluate the mathematical expression (3 + 7) * (2 / (3-1)). Let's represent this expression as a tree:

Each node of this tree is a record of the following form:

record Node (
Operation
Number
LeftPointer
RightPointer
)

The leaves of the tree contain numbers, the rest of the nodes are symbols of operations.

Having implemented the described tree on a two-dimensional array, we get the following picture:


To calculate the value of a tree, you need to calculate the values \u200b\u200bof the right and left subtrees, and then perform the resulting operation on them. The pseudocode of the algorithm that solves the problem will be:

function Calculate (Current) (
if (M \u003d Null) then
Result: \u003d M;
else (
R1: \u003d Calculate (M);
R2: \u003d Calculate (M);
Result: \u003d R1 (M) R2;
}
return (Result);
}

A stack is a last-in-first-out data structure. The data stored on the stack is accessed through the top. Data is pushed onto the stack sequentially. The element pushed onto the stack very first is at the bottom, and in order to pop it off the stack, you must first pop all the data that was pushed onto the stack later.

When working with a stack, two emergencies are possible: an attempt to read data from an empty stack; an attempt to push an item onto the stack when the number of items on the stack has reached the maximum number allowed.

A queue is a first-in-first-out data structure. There is a variable amount of data in the queue. When enqueued, data is added to the tail; when retrieved, it is taken from the head.

Hash table

Hashing is a technique that provides direct access to records without using any additional structures. The process can be summarized as follows. The space where the data is stored is divided into several segments. Records are distributed across these segments according to some algorithm called a hashing algorithm that converts the value of the key field to a segment number. Each record is stored in a segment defined by this process. Therefore, a record can be retrieved by hashing the value of its key field and reading the records of the corresponding segment. A data structure constructed in this way is called a hash table.

For example, if you need to organize a hash table for storing uppercase letters of the English alphabet, then you can choose ASCII character codes as keys, and the hashing algorithm will cut off the least significant five bits and form an array element index for storing the character on their basis:

In general, a hashing algorithm should, based on the key value, produce an index value within the boundaries of the array and distribute the keys evenly across the array elements. Failure to comply with the last requirement leads to situations where several records fall into the same segment. These situations are called collisions.

The data stored in the computer memory is a collection of zeros and ones (bits). Bits are combined in sequence: bytes, words, etc. Each part of the RAM that can hold one byte or word is assigned a sequence number (address).

What is the meaning of the data, what symbols are they expressed - alphabetic or numeric, what does this or that number mean - all this is determined by the processing program. All data necessary for solving practical problems are divided into several different types, and the concept a type communicates not only with the presentation of data in the address space, but also with the way they are processed.

Any data can be classified as one of two types: basic (simple), the form of which is determined by the computer architecture, or complex, designed by the user to solve specific problems.

Simple data types are symbols, numbers, etc. elements, further crushing of which does not make sense. Data structures (complex types) are formed from elementary data.

Some structures:

· Array(function with finite scope) - a simple collection of data elements of the same type, a means of operating with a group of data of the same type. A single element of the array is specified by an index. An array can be one-dimensional, two-dimensional, etc. The types of one-dimensional arrays of variable length are structures of the type ring, stack, queue and deque.

· Recording(Cartesian product) - a collection of data elements of different types. In the simplest case, the record contains a constant number of elements, which are called fields... A collection of records of the same structure is called file... (A file is also called a data set in external memory, such as a magnetic disk). In order to be able to retrieve individual records from the file, each record is assigned a unique name or number, which serves as its identifier and is located in a separate field. This identifier is called key.

Data structures such as an array or a record occupy a constant volume in the computer memory, therefore they are called static structures. Static structures also include a bunch of.

There are a number of structures that can change their length - the so-called dynamic structures... These include tree, list, link.

An important structure for placing items that requires a non-linear address space is wood... There are many data structures that can be represented as trees. These are, for example, classification, hierarchical, recursive and other structures. For more information on trees, see section 1.2.1.

Figure: 1.1. Data type classification

1.1.2 Generic Data Structures or Models

Above, we examined several types of structures that are collections of data elements: array, tree, record. A more complex data type can include these structures as members. For example, the elements of a record can be an array, a stack, a tree, etc.

There is a wide variety of complex data types, but studies carried out on a large practical material have shown that among them several of the most common can be distinguished. Generalized structures are also called data modelssince they reflect the user's perception of real-world data.

Any data model must contain three components:

1. data structure - describes the user's point of view on the presentation of data.

2. set of allowed operations executed on the data structure. The data model assumes, at a minimum, the presence of a data definition language (DLS), which describes the structure of their storage, and a data manipulation language (DML), which includes data extraction and modification operations.

3. integrity constraints - a mechanism for maintaining the correspondence of data to the subject area based on formally described rules.

In the process of historical development, the following data models were used in the DBMS:

· hierarchical,

· network,

· relational.

Recently, an object-oriented approach to data presentation has become increasingly important.

1.2 Data Access Methods

Data presentation issues are closely related to the operations by which this data is processed. These operations include: fetch, change, enable and an exception data. All of the above operations are based on the operation accesswhich cannot be considered regardless of the presentation method.

In search problems, it is assumed that all data is stored in memory with a certain identification and, speaking of access, mean, first of all, access to data (called keys) that uniquely identifies the associated data sets.

Suppose we need to organize access to a file containing a set of identical records, each of which has a unique key field value. The easiest way to search is to sequentially scan each record in the file until you find the one whose key value matches the search criteria. Obviously, this method is quite inefficient, since the records in the file are not ordered by the value of the key field. Sorting the records in the file is also not applicable because it is even more time consuming and must be done after each record is added. Therefore, they proceed as follows - the keys, along with pointers to the corresponding records in the file, are copied into another structure that allows you to quickly perform sorting and searching operations. When accessing data, first in this structure, the corresponding key value is found, and then a record from the file is obtained from the pointer stored with it.

There are two classes of methods that implement key data access:

· tree search methods,

· hashing methods.

1.2.1 tree search methods

Definition: A tree is a finite set consisting of one or more elements, called nodes, such that:

1. between the nodes there is a relationship of the "source - generated" type;

2. there is only one node without a source node. It is called the root;

3. all nodes except the root have only one source; each node can have multiple child nodes;

4. the "original - generated" relation acts only in one direction, i.e. no descendant of some node can become an ancestor for it.

The number of spawned individual nodes (the number of subtrees of a given root) is called its degree... A knot with a zero degree is called a leaf or end knot. The maximum value of the degree of all nodes of a given tree is called by the degree of the tree.

If in a tree between generated nodes that have a common source, their order is considered essential, then the tree is called orderly... In search problems, ordered trees are almost always considered.

An ordered tree whose degree is at most 2 is called binary tree. The binary tree is especially used when searching in RAM. Search algorithm: first, the search argument is compared with the key at the root. If the argument matches the key, the search is over, if it does not match, then in the case when the argument is less than the key, the search continues in the left subtree, and in the case when there is more than the key in the right subtree. Increasing the level by 1, repeat the comparison, considering the current node as the root.

Example: Let a list of students be given, containing their names and average grade (see table 1.1). The name of the student is used as a key. Assuming all records are of fixed length, then the record number can be used as a pointer. In this case, the offset of the record in the file will be calculated as ([record_number] -1) * [record_length] ... Let the search argument be "Petrov". Figure 1.2 shows one possible binary search tree and search path for this dataset.

Table 1.1

Vasiliev

Kuznetsov

Tikhomirov

Figure: 1.2. Binary tree search

Note that the following rule for comparing string variables is used here: the value of a character is considered to correspond to its ordinal number in the alphabet. Therefore, "I" is less than "K", and "K" is less than "C". If the current characters in the compared strings are the same, then the characters in the following positions are compared.

Binary trees are especially effective when the set of keys is not known in advance, or when this set is changing rapidly. Obviously, with a variable set of keys, it is better to have balanced tree.

Definition: A binary tree is called balanced if the height of the left subtree of each node differs from the height of the right subtree by no more than 1.

When searching for data in external memory, the problem of reducing the number of data transfers from the external memory to the main memory is very important. Therefore, in this case, in comparison with binary trees, strongly branching trees will be more profitable. their height is smaller, then the search will require fewer calls to external memory. In this case, B-trees (B - balanced)

Definition: A B-tree of order n is a strongly branching tree of degree 2n + 1 with the following properties:

  1. Each node, except for the root, contains at least n and at most 2n keys.
  2. The root contains at least one and at most 2n keys.
  3. All leaves are located at the same level.
  4. Each intermediate node contains two lists: an ascending list of keys and a corresponding list of pointers (there is no list of pointers for leaf nodes).

For such a tree:

· sequential access can be organized relatively easily, since all leaves are at the same level;

· when adding and changing keys, all changes are limited, as a rule, to one node.

Figure: 1.3 Balanced tree

AT-tree in which the true values \u200b\u200bare contained only in the leaves (leaf nodes) is called AT +- tree... The internal nodes of such a tree contain separator keys that specify the range of key changes for subtrees.

You can read more about the different types of balanced trees, as well as methods of their implementation, in the literature, a list of which is given at the end of the page. It should be noted that B- trees are best suited only for organizing access to fairly simple (one-dimensional) data structures. To access more complex structures, such as, for example, spatial (multidimensional) data, in recent years, they increasingly use R-trees.

R-wood ( R-Tree) is an index structure for accessing spatial data proposed by A. Guttman (University of California, Berkeley). The R-tree allows arbitrary execution of operations of adding, deleting and searching data without periodic reindexing.

Records are used to represent data, each of which has a unique identifier (tuple-identifier). Each leaf node (leaf) of the tree contains a record of the form ( I, tuple-identifier) where I - n-dimensional box containing pointers to spatial data (also called minimal bounding rectangle, MBR), and each element in tuple-identifier contains the upper and lower bounds of the parallelepiped in the corresponding dimension.

Non-end nodes contain records of the form (I, childnode-pointer) where I the minimum bounding box for the MBR of all nodes derived from this one. Childnode-pointer is a pointer to derived nodes.

Let be M and m <= M/2 respectively, the maximum and minimum number of elements that can be placed in a node. Then the properties of the R-tree can be described as follows:

· R-Tree is a highly balanced tree, i.e. all leaves are at the same level.

· The root node has at least two children.

· For each element (I, childnode-pointer) in the non-terminal node I is an the smallest possible a parallelepiped, i.e. contains all parallelepipeds of derived nodes.

· Each end node (sheet) contains from m before M index records.

· For each index entry (I, tuple-identifier) at the end node I is a parallelepiped that contains n-dimensional data object pointed to tuple-identifier .

1.2.2 hashing

This method is used when the entire set of keys is known in advance and can be located in RAM for the duration of processing. In this case, a special function is constructed that uniquely maps a set of keys to a set of pointers, called a hash function (from the English word "to hash" - cut, grind). Having such a function, you can calculate the address of the record in the file by the given search key. In general, the key data used to determine the address of a record is organized in a table called a hash table.

If the set of keys is unknown in advance or is very large, then the idea of \u200b\u200bunambiguously calculating the address of a record by its key is abandoned, and the hash function is considered simply as a function that scatters a set of keys into a set of addresses.

Annotation: The general concept of a data structure as a performer who organizes work with data is given: storage, addition and deletion, search, etc. Implementation of some structures based on others is considered, in particular, implementation based on an array. The most important of the simplest data structures are presented: the queue and the stack, as well as their continuous array-based implementations. Numerous examples of using the stack in programming are given. The reverse Polish notation of the formula (the sign of the operation after the arguments) and the method of its calculation on a stack machine are considered. The graphical language PostScript is considered as an example of using reverse Polish notation. The material is illustrated by the "Stack Calculator" project, implemented in the C language.

Data structures

"Algorithms + data structures \u003d programs". This is the title of a book by Niklaus Wirth, a famous Swiss programming specialist, author of the Pascal, Modula-2, Oberon languages. The development of a structured approach to programming is associated with Wirth's name. N. Wirth is also known as a brilliant teacher and author of classical textbooks.

Both components of the program, highlighted by N. Wirth, are equally important. Not only an imperfect algorithm, but also an unsuccessful organization of work with data can slow down the program by tens, and sometimes even millions of times. On the other hand, possession programming theory and the ability to systematically apply it in practice allows you to quickly develop effective and at the same time aesthetically beautiful programs.

General concept of data structure

Data structure is a performer who organizes work with data, including storage, addition and deletion, modification, search, etc. Data structure maintains a certain order of access to them. A data structure can be thought of as a kind of warehouse or library. When describing a data structure, you need to list the set of actions that are possible for it, and clearly describe the result of each action. We will call such actions regulations... From a programmatic point of view, a data structure prescription system corresponds to a set of functions that work on shared variables.

Data structures are most conveniently implemented in object-oriented languages. In them, the class corresponds to the data structure, the data itself is stored in the member variables of the class (or the data is accessed through the member variables), the set of class methods corresponds to the prescription system. As a rule, in object-oriented languages, data structures are implemented as a library of standard classes: these are the so-called container classes C ++ languages \u200b\u200bincluded in the standard class library STL, or classes that implement various data structures from the library Java Developer Kit Java language.

However, data structures can be just as successfully implemented in traditional programming languages \u200b\u200bsuch as Fortran or C. In this case, one should adhere to the object-oriented programming style: clearly identify the set of functions that work with the data structure, and restrict access to data only by this set of functions. The data itself is implemented as static (not global) variables. When programming in C, the data structure corresponds to two source files:

  1. header, or h-file, which describes the interface of the data structure, i.e. a set of function prototypes corresponding to the data structure prescription system;
  2. implementation file, or C-file, which defines static variables that store and access data, and also implements functions that correspond to the data structure prescription system

Data structure usually implemented based on a simpler basic structurepreviously implemented, or based on an array and a set of simple variables. A clear distinction should be made between describing a data structure from a logical point of view and describing its implementation. There can be many different implementations, but from a logical point of view (i.e., from the point of view of an external user) they are all equivalent and differ, perhaps, only in the speed of prescription execution.


Structures and data types. Arrays, trees, lists, graphs. Data operations.

The data stored in the computer memory is a collection of zeros and ones (bits). Bits are combined in sequence: bytes, words, etc. Each part of the RAM that can hold one byte or word is assigned a sequence number (address).

What is the meaning of the data, what symbols are they expressed - alphabetic or numeric, what does this or that number mean - all this is determined by the processing program. All data necessary for solving practical problems are subdivided into several types, and the concept of type is associated not only with the presentation of data in the address space, but also with the way of processing them.

Any data can be classified as one of two types: basic (simple), the form of which is determined by the computer architecture, or complex, designed by the user to solve specific problems.

Simple data types are symbols, numbers, etc. elements, further crushing of which does not make sense. Data structures (complex types) are formed from elementary data.

Some structures:

Array (function with finite scope) is a simple collection of data elements of the same type, a means of operating with a group of data of the same type. A single element of the array is specified by an index. An array can be one-dimensional, two-dimensional, etc. Ring, stack, queue, and deque are the types of one-dimensional variable length arrays.

If an array always occupies a contiguous section of memory, then a list is the simplest example of a so-called dynamic data structure. In dynamic data structures, an object is contained in various memory areas, the number and composition of which can change during operation. The unity of such an object is maintained by combining its parts in the class description.

The simplest linear list is a linear sequence of elements. For each of them, except for the last, there is the next element, and for each, except for the first, the previous one. The list is traditionally depicted as a sequence of elements, each of which contains a link (pointer) to the next and / or previous element, however, note that physically in the representation of the list elements there may not be any links.

A typical set of operations on a list will include adding, removing and searching for its elements, calculating the length of the list, and sequentially processing the elements (iterating over) the list.

As with arrays, many class libraries include the ability to describe and manipulate lists (for example, the CList of the MFC class library). Despite this, it often becomes necessary to describe your own data structures in the form of lists containing more suitable operations for the problem being solved, simpler (and, therefore, more efficient) than standard ones, or with specific features (for example, ordered lists ).

Typically, when describing a list, the representation of each item in the list is described as a separate class. This class has a link to the next and / or previous element as its attribute.

Record (Cartesian product) - a collection of data elements of different types. In the simplest case, a record contains a constant number of elements, which are called fields. A collection of records of the same structure is called a file. (A file is also called a data set in external memory, such as a magnetic disk). In order to be able to extract individual records from the file, each record is assigned a unique name or number that serves as its identifier and is located in a separate field. This identifier is called a key.

Data structures such as an array or a record occupy a constant volume in the computer memory, therefore they are called static structures. There are also many static structures.

There are a number of structures that can change their length - the so-called dynamic structures. These include tree, list, link.

An important structure that requires a non-linear address space to place its elements is a tree. There are many data structures that can be represented as trees. These are, for example, classification, hierarchical, recursive and other structures.

Generalized structures or data models.

Above, we considered several types of structures that are collections of data elements: array, tree, record. A more complex data type can include these structures as members. For example, the elements of a record can be an array, a stack, a tree, etc.

There is a wide variety of complex types of data, but studies carried out on a large practical material have shown that among them several of the most common can be distinguished. Generic structures are also called data models because they reflect the user's view of real-world data.

Any data model must contain three components:

Data structure - describes the user's point of view on the presentation of data.

The set of valid operations performed on the data structure. The data model assumes, at a minimum, the presence of a data definition language (DLS), which describes the structure of their storage, and a data manipulation language (DML), which includes data extraction and modification operations.

Integrity constraints are a mechanism for maintaining compliance with domain data based on formally described rules.

In the process of historical development, the following data models were used in the DBMS:

Hierarchical - This model has one main object and the rest - subordinates - objects that are at different levels of the hierarchy. Object relationships form a hierarchical tree with one root object.

Networked - The networked approach to organizing data is an extension of the hierarchical one. In hierarchical structures, a child record must have exactly one ancestor; in a network data structure, a descendant can have any number of ancestors.

In the network data model, any object can be both master and subordinate at the same time, and can participate in the formation of any number of relationships with other objects.

Relational - In a relational model, data is broken down into sets that make up a tabular structure. This table structure consists of individual data items called fields. A single set or group of fields is known as a record.

Data access methods.

Data presentation issues are closely related to the operations by which this data is processed. These operations include fetching, modifying, including and excluding data. All of the above operations are based on the access operation, which cannot be considered regardless of the presentation method.

In search problems, it is assumed that all data is stored in memory with a certain identification and, speaking of access, they mean, first of all, access to data (called keys) that uniquely identifies the data sets associated with them.

Suppose we need to organize access to a file containing a set of identical records, each of which has a unique key field value. The easiest way to search is to sequentially scan each record in the file until you find the one whose key value matches the search criteria. Obviously, this method is quite inefficient, since the records in the file are not ordered by the value of the key field. Sorting the records in the file is also not applicable because it is even more time consuming and must be done after each record is added. Therefore, they proceed as follows - the keys, together with pointers to the corresponding records in the file, are copied into another structure, which allows you to quickly perform sorting and searching operations. When accessing data, first in this structure, the corresponding key value is found, and then a record from the file is obtained from the pointer stored with it.

There are two classes of methods that implement key data access:

Tree search methods,

Hash methods.

Graph theory is an important part of computational mathematics. With the help of this theory, a large number of problems from various fields are solved. The graph consists of many vertices and many edges that connect the vertices. From the point of view of graph theory, it doesn't matter what meaning is put into vertices and edges. The vertices can be populated areas, and the edges of the road connecting them, or the vertices are subroutines, connected by vertices by edges means the interaction of subroutines. Often the direction of the arc in the graph matters. If an edge has a direction, it is called an arc, and a graph with directed edges is called a digraph.

Let us now give a more formally basic definition of graph theory. The graph G is an ordered pair (V, E), where V is a nonempty set of vertices, E is the set of pairs of elements of the set V, and a pair of elements from V is called an edge. An ordered pair of elements from V is called an arc. If all pairs in E are ordered, then the graph is called directed.

A path is any sequence of vertices in a digraph such that in this sequence, vertex b can follow vertex a only if there is an arc following from a to b. Similarly, you can define a path consisting of arcs. A path starting at one vertex and ending at one vertex is called a cycle. A graph in which there are no cycles is called acyclic.

An important special case of a graph is a tree.

Definition: A tree is a finite set consisting of one or more elements, called nodes, such that:

There is a parent-child relationship between the nodes;

There is only one node that has no source. It is called the root;

All nodes except the root have only one source; each node can have several children;

The origin-generated relationship operates in only one direction, i.e. no descendant of some node can become an ancestor for it.

The number of spawned individual nodes (the number of subtrees of a given root) is called its degree. A knot with a zero degree is called a leaf or end knot. The maximum value of the degree of all nodes in a given tree is called the degree of the tree.

If in a tree between the generated nodes that have a common initial, their order is considered essential, then the tree is called ordered. In search problems, ordered trees are almost always considered.

An ordered tree whose degree is at most 2 is called a binary tree. The binary tree is especially used when searching in RAM. Search algorithm: first, the search argument is compared to the key in the root. If the argument matches the key, the search is over, if it does not match, then in the case when the argument is less than the key, the search continues in the left subtree, and in the case when there is more than the key in the right subtree. Increasing the level by 1 repeat the comparison, considering the current node as the root.

Binary trees are especially effective when the set of keys is not known in advance, or when this set is changing rapidly. Obviously, with a variable set of keys, it is better to have a balanced tree.

Definition: A binary tree is said to be balanced if the height of the left subtree of each node differs from the height of the right subtree by no more than 1.

Hashing.

This method is used when the entire set of keys is known in advance and can be stored in RAM for the duration of processing. In this case, a special function is constructed that uniquely maps a set of keys to a set of pointers, called a hash function (from the English "to hash" - cut, grind). Having such a function, you can calculate the address of the record in the file by the given search key. In general, the key data used to determine the address of a record is organized in a table called a hash table.

If the set of keys is unknown in advance or is very large, then the idea of \u200b\u200bunambiguously calculating the address of a record by its key is abandoned, and the hash function is considered simply as a function that scatters a set of keys into a set of addresses.

Data structure is a software unit that allows you to store and process a lot of similar or logically related information in computing devices. If you want to add, find, change, or delete information, the framework will provide a specific package of options that makes up its interface.

What does the concept of a data structure include?

This term can have several close, but still distinctive meanings. It:

  • abstract type;
  • implementation of an abstract type of information;
  • an instance of a data type, for example a specific list.

If we talk about the data structure in the context of functional programming, then it is a special unit that is saved during changes. It can be said informally as a single structure, although there may be different versions.

What forms the structure?

It is formed using links and operations on them in a specific programming language. It should be said that different types of structures are suitable for the implementation of different applications, some, for example, have a very narrow specialization and are suitable only for the production of specified tasks.

If you take B-trees, then they are usually suitable for building databases and only for them. At the same hour, hash tables are still widely used in practice to create various dictionaries, for example, to demonstrate domain names in the Internet addresses of PCs, and not only to form databases.

During the development of a particular software, the complexity of the implementation and the quality of the functionality of programs directly depends on the correct use of data structures. This understanding of things gave impetus to the development of formal development methods and programming languages, where structures, not algorithms, are placed on the leading positions in the program architecture.

It is worth noting that many programming languages \u200b\u200bhave an established type of modularity, which allows data structures to be safely used in various applications. Java, C # and C ++ are prime examples. Now the classical structure of the data used is presented in standard libraries of programming languages \u200b\u200bor it is directly built into the language itself. For example, hash tables are built into Lua, Python, Perl, Ruby, Tcl and others. The C ++ Standard Template Library is widely used.

Comparing structure in functional and imperative programming

It should be noted right away that it is more difficult to design structures for functional languages \u200b\u200bthan for imperative ones, at least for two reasons:

  1. In fact, all structures often use assignments in practice, which are not used in a purely functional style.
  2. Functional structures are flexible systems. In imperative programming, old versions are simply replaced with new ones, but in functional programming everything works as it did. In other words, in imperative programming, structures are ephemeral, but in functional programming, they are constant.

What does the structure include?

Often, the data that programs work with is stored in arrays built into the used programming language, a constant, or in a variable length. An array is the simplest structure with information, but some tasks require greater efficiency of some operations, so other structures are used (more complicated).

The simplest array is suitable for frequent access to the installed components by their indices and their changes, and removing elements from the middle is O (N) O (N). If you need to remove items in order to solve certain problems, you will have to use a different structure. For example, a binary tree (std :: set) allows you to do this in O (logN) O (log\u2061N), but it does not support working with indexes, it only iterates through the elements and searches them by value. Thus, we can say that the structure differs in the operations that it is capable of performing, as well as the speed of their execution. For example, consider the simplest structures that do not provide efficiency gains, but have a well-defined set of supported operations.

Stack

This is one of the types of data structures, presented in the form of a limited, simple array. The classic stack only supports three options:

  • Push an item onto the stack (Complexity: O (1) O (1)).
  • Pop an item from the stack (Complexity: O (1) O (1)).
  • Checking if the stack is empty or not (Complexity: O (1) O (1)).

To clarify how the stack works, you can apply the cookie jar analogy in practice. Imagine that there are several cookies at the bottom of the vessel. You can put a couple more pieces on top, or you can, on the contrary, take one cookie on top. The rest of the cookies will be covered with the top ones, and you won't know anything about them. This is the case with the stack. To describe the concept, the abbreviation LIFO (Last In, First Out) is used, which emphasizes that the component that got into the stack last will be the first one and will be popped out of it.

Queue

It is another type of data structure that supports the same set of options as the stack, but has opposite semantics. The abbreviation FIFO (First In, First Out) is used to describe the queue, because the element that was added first is retrieved first. The name of the structure speaks for itself - the principle of operation completely coincides with the queues, which can be seen in a store, supermarket.

Dec

This is another kind of data structure, also called a double-ended queue. The option supports the following set of operations:

  • Insert element to the beginning (Difficulty: O (1) O (1)).
  • Extract component from the beginning (Complexity: O (1) O (1)).
  • Adding an element to the end (Complexity: O (1) O (1)).
  • Retrieving an item from the end (Complexity: O (1) O (1)).
  • Check if the deck is empty (Difficulty: O (1) O (1)).

Lists

This data structure defines a sequence of linearly connected components, for which the operations of adding components to any place in the list and deleting it are allowed. A linear list is specified by a pointer to the beginning of the list. Typical list operations include traversing, finding a specific component, inserting an element, deleting a component, combining two lists into a single whole, splitting a list into pairs, and so on. It should be noted that in a linear list, in addition to the first, there is a previous component for each element, not including the last one. This means that the components of the list are in an ordered state. Yes, processing such a list is not always convenient, because there is no possibility of moving in the opposite direction - from the end of the list to the beginning. However, in a linear list, you can step by step through all the components.

There are also ring lists. This is the same structure as a linear list, but it has an additional relationship between the first and last components. In other words, the first component is next to the last item.

In this list, the elements are equal. Distinguishing the first and the last is a convention.

Trees

This is a collection of components, which are called nodes, in which there is a main (one) component in the form of a root, and all the rest are divided into many non-intersecting elements. Each set is a tree, and the root of each tree is a descendant of the root of the tree. In other words, all components are interconnected by parent-child relationships. As a result, a hierarchical structure of nodes can be observed. If nodes do not have children, then they are called leaves. Above the tree, such operations are defined as: adding a component and removing it, traversing, searching for a component. Binary trees play a special role in computer science. What it is? This is a special case of a tree, where each node can have no more than a couple of children that are the roots of the left and right subtree. If, in addition, for the nodes of the tree, the condition is satisfied that all the values \u200b\u200bof the components of the left subtree are less than the values \u200b\u200bof the root, and the values \u200b\u200bof the components of the right subtree are greater than the root, then such a tree is called a binary search tree, and it is intended for quickly finding elements. How does the search algorithm work in this case? The search value is compared with the value of the root, and depending on the result, the search either ends or continues, but exclusively in the left or right subtree. The total number of comparison operations will not exceed the height of the tree (this is the largest number of components on the path from the root to one of the leaves).

Graphs

Graphs are a collection of components called vertices, along with a complex of relationships between these vertices, called edges. The graphic interpretation of this structure is presented in the form of a set of points that correspond to the vertices, and some pairs are connected by lines or arrows, which correspond to the edges. The last case suggests that the graph should be called directed.

Graphs can describe objects of any structure, they are the main means for describing complex structures and the functioning of all systems.

Learn more about abstract structure

To build an algorithm, it is required to formalize the data, or, in other words, it is necessary to bring the data to a specific information model, which has already been studied and written. Once the model is found, it can be argued that an abstract structure has been established.

This is the main data structure that demonstrates the features, qualities of an object, the relationship between the components of an object and the operations that can be done on it. The main task is to search and display forms of information presentation that are comfortable for computer correction. It should be noted right away that informatics as an exact science works with formal objects.

Analysis of data structures is performed by the following objects:

  • Integers and real numbers.
  • Boolean values.
  • Symbols.

For processing all elements on a computer, there are corresponding algorithms and data structures. Typical objects can be combined into complex structures. You can add operations on them, rules to certain components of this structure.

The data organization structure includes:

  • Vectors.
  • Dynamic structures.
  • Tables.
  • Multidimensional arrays.
  • Graphs.

If all the elements are chosen successfully, then this will be the key to the formation of effective algorithms and data structures. If we apply the analogy of structures and real objects in practice, then we can effectively solve existing problems.

It is worth noting that all data organization structures exist separately in programming. They worked a lot on them in the eighteenth and nineteenth centuries, when there was still no trace of a computer.

It is possible to develop an algorithm in terms of an abstract structure, however, to implement an algorithm in a specific programming language, you will need to find a technique for representing it in data types, operators that are supported by a specific programming language. To create structures such as a vector, a plate, a string, a sequence, in many programming languages \u200b\u200bthere are classic data types: one-dimensional or two-dimensional array, string, file.

We figured out the characteristics of data structures, now it is worth paying more attention to understanding the concept of structure. When solving absolutely any problem, you need to work with some kind of data in order to perform operations on information. Each task has its own set of operations, but a certain set is used in practice more often to solve various tasks. In this case, it is useful to come up with a certain way of organizing the information that will allow you to perform these operations as efficiently as possible. As soon as such a method appeared, we can assume that you have a "black box" in which data of a certain kind will be stored and which will perform certain operations with data. This will allow you to take your mind off the details and concentrate fully on the specific features of the problem. This "black box" can be implemented in any way, while it is necessary to strive for the most productive implementation.

Who needs to know this?

It is worth getting acquainted with the information for novice programmers who want to find their place in this area, but do not know where to go. These are the basics in every programming language, so it will not be superfluous to immediately learn about data structures, and then work with them using specific examples and with a specific language. It should not be forgotten that each structure can be characterized by logical and physical representations, as well as a set of operations on these representations.

Do not forget: if you are talking about a particular structure, then keep in mind its logical representation, because the physical representation is completely hidden from the "external observer".

Besides, keep in mind that the logical representation is completely independent of the programming language and the computer, while the physical, on the contrary, depends on the translators and computers. For example, a two-dimensional array in Fortran and Pascal can be represented in the same way, but the physical representation in the same computer in these languages \u200b\u200bwill be different.

Do not rush to start learning specific structures, it is best to understand their classification, familiarize yourself with everything in theory and preferably in practice. It is worth remembering that variability is an important feature of structure, and it indicates a static, dynamic or semi-static position. Learn the basics before getting into more global things, this will help you further develop.

Did you like the article? To share with friends: