Organization of the memory of computing systems. Computing systems with shared memory

Topic 3.1 Organization of computations in computing systems

Purpose and characteristics of computing systems. Organization of computations in computing systems. Parallel computers; the concepts of instruction stream and data stream. Associative systems. Matrix systems. Pipeline computation. Instruction pipeline, data pipeline. Superscalar execution.

The student must

know:

The concept of an instruction stream;

The concept of a data stream;

Types of computing systems;

Architectural features of computing systems

Computing systems

A computing system (CS) is a set of interconnected and interacting processors or computers, peripheral equipment, and software, designed to collect, store, process, and distribute information.

The creation of computing systems pursues the following main goals:

· Increasing system performance by accelerating data processing;

· Increasing the reliability and accuracy of calculations;

· Providing the user with additional services, etc.

Topic 3.2

Classification of computing systems by the number of instruction streams and data streams: SISD, SIMD, MISD, MIMD.

Classification of multiprocessor computing systems by the way shared memory is implemented: UMA, NUMA, COMA. Comparative characteristics, hardware and software features.

Classification of multicomputer systems: MPP, NOW, and COW. Purpose, characteristics, and features.

Examples of computing systems of various types. The advantages and disadvantages of the various types of computing systems.

Classification of computing systems

A distinctive feature of a computing system, compared with a classical computer, is the presence of several processing units that perform parallel processing.

Parallel execution of operations significantly increases system performance. It can also significantly increase availability (if one component of the system fails, another can take over its function), as well as the reliability of results, when operations are duplicated and their results compared.

Computing systems can be divided into two groups:

· multi-machine;

· multiprocessor.

A multi-machine computing system consists of several separate computers. Each computer in a multi-machine system has the classical architecture, and such systems are widely used. However, the benefit of such a computing system is obtained only when solving problems of a special structure: the problem must be divided into as many weakly coupled subtasks as there are computers in the system.

A multiprocessor architecture provides several processors within one computer, so multiple data streams and multiple instruction streams can be organized in parallel, and several fragments of one task can be executed simultaneously. The speed advantage of multiprocessor computing systems over uniprocessor ones is obvious.

The disadvantage is the possibility of conflict situations when multiple processors access the same memory area.

A feature of multiprocessor computing systems is the presence of shared RAM as a shared resource (Figure 11).

Figure 11 - Architecture of a multiprocessor computing system

Flynn's classification

Of all the classification schemes for computing systems, the most widely used is the one proposed in 1966 by M. Flynn. It is based on the concept of a stream, understood as a sequence of instructions or data elements processed by a processor. Flynn distinguishes four architecture classes by the number of instruction streams and data streams:

· SISD (single instruction stream, single data stream). This class includes classical von Neumann machines. Pipelining does not affect the classification, so both the CDC 6600, with scalar functional units, and the CDC 7600, with pipelined ones, fall into the SISD class.

· MISD (multiple instruction streams, single data stream). In this architecture, several processors handle the same data stream. An example would be a system whose processors all receive the same distorted signal, each processing it with its own filtering algorithm. Nevertheless, neither Flynn nor other specialists in computer architecture have so far produced a convincing real system built on this principle. A number of researchers assign pipelined systems to this class, but this view has not gained final acceptance. An empty class should not be considered a flaw of the classification: such classes can be useful when developing new concepts in the theory and practice of building computing systems.

· SIMD (single instruction stream, multiple data streams): instructions are issued by one control processor and executed simultaneously by all processing elements on those elements' local data.

· MIMD (multiple instruction streams, multiple data streams): a set of computers, each running its own program on its own initial data.
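
As a rough illustration (plain Python, not real hardware), the four classes can be modeled as ways of pairing instruction streams with data streams; the functions and operations below are illustrative assumptions:

```python
# Toy model of Flynn's classes: "programs" are lists of operations,
# "data streams" are values the operations are applied to.

def sisd(program, data):
    """One instruction stream applied to one data stream."""
    for op in program:
        data = op(data)
    return data

def simd(op, data_streams):
    """One instruction applied to every data stream in lockstep."""
    return [op(d) for d in data_streams]

def mimd(programs, data_streams):
    """Each processor runs its own program on its own data."""
    return [sisd(p, d) for p, d in zip(programs, data_streams)]

double = lambda x: x * 2
inc = lambda x: x + 1

print(sisd([double, inc], 3))           # (3*2)+1 = 7
print(simd(double, [1, 2, 3]))          # [2, 4, 6]
print(mimd([[double], [inc]], [5, 5]))  # [10, 6]
```

MISD has no entry here, which mirrors the text: no convincing real system of that class exists.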

Flynn's scheme is the most common starting point for an initial assessment of a computing system, since it immediately reveals the system's basic operating principle. However, Flynn's classification also has obvious drawbacks: some architectures cannot be unambiguously assigned to one class or another, and the MIMD class is excessively overloaded.

Existing computing systems of the MIMD class form three subclasses: symmetric multiprocessors (SMP), clusters, and massively parallel systems (MPP). This classification is based on a structural-functional approach.

Symmetric multiprocessors (SMP) consist of a set of processors that have equal access to memory and external devices and run under a single operating system (OS). A special case of SMP is the single-processor computer. All SMP processors share memory with a single address space.

Using SMP provides the following capabilities:

· Scaling applications at low initial cost: applications run unmodified on new, more powerful hardware;

· Creation of applications in familiar software environments;

· The same access time to all memory;

· The ability to send messages with high bandwidth;

· Support for coherence of the caches and main-memory blocks, and for atomic synchronization and locking operations.

A cluster system is formed from modules united by a communication system or by shared external memory devices, for example disk arrays.

The cluster size varies from a few modules to several tens of modules.

Within both shared and distributed memory, several models of memory system architectures are implemented. Figure 12 shows the classification of such models used in computing systems of the MIMD class (it is also true for the SIMD class).

Figure 12 - Classification of models of memory architectures of computing systems

In systems with shared memory, all processors have equal opportunity to access a single address space. The shared memory can be built as a single block or as a set of modules; the modular organization is the usual one.

Computing systems with shared memory in which any processor accesses memory uniformly, in the same time, are called systems with uniform memory access, denoted by the abbreviation UMA (Uniform Memory Access). This is the most common memory architecture for parallel computing systems with shared memory.

Technically, UMA systems assume a node connecting each of the p processors with each of the m memory modules. The simplest way to build such a system, combining several processors (Pi) with a single memory (M) through a common bus, is shown in Figure 13a. In this case, however, only one processor can use the bus at a time, so the processors must compete for access to it. While processor Pi fetches an instruction from memory, the other processors Pj (j ≠ i) must wait until the bus is free. If the system contains only two processors, they can run at near-maximum performance, because their bus accesses can be interleaved: while one processor decodes and executes an instruction, the other can use the bus to fetch its next instruction from memory. When a third processor is added, however, performance begins to degrade. With ten processors on the bus, the performance curve (Figure 13b) becomes horizontal, so adding an 11th processor gains nothing. The bottom curve in this figure illustrates the fact that the memory and the bus have a fixed bandwidth, determined by the memory cycle time and the bus protocol, and in a multiprocessor system with a shared bus this bandwidth is divided among several processors. If the processor cycle time is longer than the memory cycle time, many processors can share the bus; in practice, however, the processor is usually much faster than memory, so this scheme is not widely used.
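
The saturation effect described above can be sketched with a toy throughput model; the demand and capacity numbers are illustrative assumptions, not hardware parameters:

```python
# Toy model of a shared bus: n processors each generate `demand` memory
# requests per unit time, but the bus carries at most `bus_capacity`
# requests per unit time. Aggregate throughput saturates at the bus limit.

def system_throughput(n_processors, demand=1.0, bus_capacity=10.0):
    return min(n_processors * demand, bus_capacity)

for n in (1, 2, 5, 10, 11, 20):
    print(n, system_throughput(n))
# Beyond 10 processors the curve is flat: an 11th processor gains nothing.
```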

An alternative way of building a multiprocessor computing system with UMA shared memory is shown in Figure 13c. Here the bus is replaced by a switch that routes processor requests to one of several memory modules. Although there are several memory modules, they all belong to a single address space. The advantage of this approach is that the switch can serve several requests in parallel: each processor can be connected to its own memory module and access it at the maximum allowed speed. Contention between processors arises when two of them try to access the same memory module at the same time; in that case one processor is granted access and the others are blocked.

Unfortunately, the UMA architecture does not scale well. The most common systems contain 4-8 processors, much more rarely 32-64. In addition, such systems cannot be considered fault-tolerant: the failure of one processor or memory module brings down the entire system.

Figure 13 - Shared memory:

a) combining processors using a bus and a system with local caches;

b) system performance as a function of the number of processors on the bus;

c) a multiprocessor computing system with shared memory built from separate memory modules

Another approach to building a shared-memory computing system is non-uniform memory access, designated NUMA (Non-Uniform Memory Access). Here, as before, there is a single address space, but each processor has local memory. A processor accesses its own local memory directly, which is much faster than reaching remote memory through a switch or network. Such a system can be supplemented with global memory; the local storage devices then act as fast cache for the global memory. This scheme can improve system performance, but it cannot indefinitely postpone the flattening of the performance curve. If each processor has a local cache (Figure 13a), there is a high probability (p > 0.9) that a required instruction or data item is already in local memory. This high probability of a local hit greatly reduces the number of processor accesses to global memory and thus improves efficiency. The knee of the performance curve (the upper curve in Figure 13b), the point up to which adding processors is still effective, moves to about 20 processors, and the point where the curve becomes horizontal moves to about 30 processors.
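
A minimal sketch of the effect of the local-hit probability p on average access time; the latencies are assumed, illustrative values:

```python
# Expected memory access time in a NUMA node: with probability p the
# access is local (fast), otherwise it goes to remote memory (slow).

def avg_access_time(p_local, t_local_ns, t_remote_ns):
    return p_local * t_local_ns + (1.0 - p_local) * t_remote_ns

# Assumed latencies: 10 ns local, 200 ns remote.
print(avg_access_time(0.90, 10, 200))  # ~29 ns
print(avg_access_time(0.95, 10, 200))  # ~19.5 ns
```

Raising p above 0.9 keeps the average close to the local latency, which is why the local caches pay off.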

Within the NUMA concept, several different approaches are implemented, denoted by the abbreviations COMA, CC-NUMA, and NCC-NUMA.

In the cache-only memory architecture (COMA, Cache Only Memory Architecture), the local memory of each processor is built as a large cache providing fast access from the "owning" processor. The caches of all processors are collectively considered the global memory of the system; there is no separate global memory. The fundamental feature of the COMA concept is its dynamic nature. Here data is not statically bound to a specific memory module and has no unique address that remains unchanged for the entire lifetime of the variable. Data is transferred to the cache memory of the processor that last requested it, so a variable is not fixed to a unique address and can reside in any physical cell at any time. Moving data from one local cache to another does not involve the operating system, but it does require complex and expensive memory-management hardware; so-called cache directories are used to organize this regime. Note also that the last copy of a data item is never removed from the cache.

Since in the COMA architecture data is moved to the local cache memory of the owning processor, such systems have a significant performance advantage over the other NUMA architectures. On the other hand, if a single variable, or two different variables stored in the same line of the same cache, are required by two processors, that cache line must be moved back and forth between the processors on every access. Such effects can depend on the details of memory allocation and lead to unpredictable situations.

The cache-coherent non-uniform memory access model (CC-NUMA, Cache Coherent Non-Uniform Memory Architecture) is fundamentally different from COMA. A CC-NUMA system uses ordinary physically distributed memory rather than cache-only memory; no pages or data are copied between memory locations, and no software messaging is involved. There is simply a single memory, whose parts are physically connected, plus intelligent hardware. Hardware-based cache coherence means that no software is required to maintain multiple copies of updated data or to transfer them; the hardware handles all of this. Accesses to the local memory modules of different nodes of the system can proceed simultaneously and are faster than accesses to remote memory modules.

The difference of the cache-incoherent non-uniform memory access model (NCC-NUMA, Non-Cache Coherent Non-Uniform Memory Architecture) from CC-NUMA is obvious from the name. The memory architecture assumes a single address space but does not ensure the consistency of global data at the hardware level; managing such data is left entirely to software (applications or compilers). Although this seems to be a disadvantage of the architecture, the approach turns out to be very useful for increasing the performance of computing systems with DSM-type memory architectures, considered in the section "Distributed Memory Architecture Models".

In general, NUMA-based computing systems with shared memory are called virtual shared-memory architectures. This type of architecture, CC-NUMA in particular, has recently been regarded as an independent and rather promising class of MIMD computing systems.

Distributed memory architecture models. In a distributed-memory system, each processor has its own memory and can address only it. Some authors call systems of this type multi-machine systems or multicomputers, emphasizing that the blocks from which the system is built are themselves small computing systems with a processor and memory. Distributed-memory architectures are usually denoted as architectures with no direct remote memory access (NORMA, No Remote Memory Access). The name reflects the fact that each processor has access only to its local memory; access to remote memory (the local memory of another processor) is possible only by exchanging messages with the processor that owns that memory.

This organization has a number of advantages. First, there is no competition for the bus or switches when accessing data: each processor can fully use the bandwidth of the communication path with its own local memory. Second, the absence of a shared bus means that there are no associated restrictions on the number of processors: the size of the system is limited only by the network of processors. Third, the problem of cache coherence is removed. Each processor has the right to independently change its data without worrying about matching copies of data in its own local cache memory with the caches of other processors.
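
The message-based remote access of the NORMA model can be sketched as follows; the Node class and the queue-based "network" are illustrative assumptions, not a real message-passing API:

```python
# Toy sketch of NORMA: a processor cannot address another node's memory
# directly; it must request the value by exchanging messages.
import queue

class Node:
    def __init__(self):
        self.memory = {}            # local memory: only this node addresses it
        self.inbox = queue.Queue()  # incoming messages from other nodes

    def handle_requests(self):
        """Serve pending remote-read requests by sending reply messages."""
        while not self.inbox.empty():
            addr, reply_box = self.inbox.get()
            reply_box.put(self.memory.get(addr))

def remote_read(owner, addr):
    """Read from another node's memory purely via message exchange."""
    reply_box = queue.Queue()
    owner.inbox.put((addr, reply_box))  # request message to the owner
    owner.handle_requests()             # owner services its inbox
    return reply_box.get()              # reply message with the value

a, b = Node(), Node()
b.memory[0x10] = 99
print(remote_read(b, 0x10))  # 99: obtained only through messages
```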

The student must

know:

Classification of computing systems;

Examples of computing systems of various types.

be able to:

- choose the type of computing system in accordance with the problem being solved.



Organization of the memory subsystem in a PC

The storage devices of the PC memory subsystem can be arranged into the following hierarchy (Table 9.1):

Table 9.1. The hierarchy of the PC memory subsystem

Memory type                        | 1985: access time / typical size / price per byte | 2000: access time / typical size / price per byte
Registers (super-operative memory) | 0.2-5 ns / 16/32 bit / $3-100                     | 0.01-1 ns / 32/64/128 bit / $0.1-10
Cache (high-speed buffer)          | 20-100 ns / 8 KB-64 KB / ~$10                     | 0.5-2 ns / 32 KB-1 MB / $0.1-0.5
Main (operational) memory          | ~0.5 µs / 1 MB-256 MB / $0.02-1                   | 2-20 ns / 128 MB-4 GB / $0.01-0.1
External (mass) storage            | 10-100 ms / 1 MB-1 GB / $0.002-0.04               | 5-20 ms / 1 GB-0.5 TB / $0.001-0.01

Processor registers constitute its context and store data used by the currently executing processor instructions. Processor registers are usually accessed by their mnemonic symbols in processor instructions.

The cache is used to match the speed of the CPU and main memory. Computing systems use a multi-level cache: level I (L1) cache, level II (L2) cache, etc. Desktop systems typically use a two-level cache, while server systems use a three-level cache. The cache stores instructions or data that are likely to be sent to the processor for processing in the near future. The operation of the cache is transparent to software, so the cache is usually not software accessible.

RAM stores, as a rule, functionally complete software modules (the operating system kernel, executable programs and their libraries, device drivers, etc.) and their data, directly involved in the work of programs, and is also used to save the results of calculations or other data processing before sending them to an external memory, to a data output device or communication interfaces.

Each memory cell is assigned a unique address. The methods of organizing memory allocation give programmers the ability to use the entire computer system efficiently. These methods include the continuous ("flat") memory model and the segmented memory model. In the flat model, a program operates in a single contiguous address space, a linear address space, in which memory cells are numbered sequentially and continuously from 0 to 2^n - 1, where n is the width of the CPU address. In the segmented model, memory is presented to the program as a group of independent address blocks called segments. To address a byte of memory, the program must use a logical address consisting of a segment selector and an offset: the selector chooses a specific segment, and the offset points to a specific cell within the address space of the chosen segment.
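
A minimal sketch of the two addressing models described above; the segment table contents are made-up values for illustration:

```python
# Flat model: the linear address *is* the cell number, 0 .. 2**n - 1.
def flat_address(linear, n_bits=32):
    assert 0 <= linear < 2 ** n_bits
    return linear

# Segmented model: logical address = (segment selector, offset).
# Hypothetical segment base addresses:
SEGMENT_BASES = {0: 0x0000, 1: 0x4000, 2: 0x8000}

def linear_from_logical(selector, offset):
    """Translate a logical address into a linear address."""
    return SEGMENT_BASES[selector] + offset

print(hex(flat_address(0x1234)))            # 0x1234
print(hex(linear_from_logical(1, 0x0123)))  # 0x4123
```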



Memory-allocation methods make it possible to organize a computing system in which the working address space of a program exceeds the RAM actually available in the system, the shortfall being covered by external, slower or cheaper memory (hard drive, flash memory, etc.). This concept is called virtual memory. In this case, the linear address space can be mapped to the physical address space either directly (the linear address is the physical address) or through the paging mechanism. In the latter case, the linear address space is divided into equal-sized pages that make up virtual memory, and paging maps the required virtual memory pages onto the physical address space.
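
The paging translation described above can be sketched as follows, assuming 4 KB pages and a made-up single-level page table (real MMUs use multi-level tables and TLBs):

```python
PAGE_SIZE = 4096  # bytes, so the low 12 bits of an address are the offset
PAGE_TABLE = {0: 7, 1: 3, 2: 12}  # virtual page -> physical frame (assumed)

def translate(linear_address):
    """Map a linear address to a physical address via the page table."""
    page, offset = divmod(linear_address, PAGE_SIZE)
    frame = PAGE_TABLE[page]  # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # page 1, offset 0x234 -> frame 3 -> 0x3234
```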

In addition to the implementation of the virtual memory system, external storage devices are used for long-term storage of programs and data in the form of files.

Cache memory

Cache memory is a high-speed memory located on the same die as the CPU or external to the CPU. The cache serves as a high-speed buffer between the CPU and the relatively slow main memory. The idea behind cache memory is based on predicting the most likely CPU accesses to RAM. This approach is based on the principle of temporal and spatial locality of the program.



If the CPU has accessed an object in RAM, it is highly likely that the CPU will soon access that object again; an example is the code or data inside loops. This behavior is described by the principle of temporal locality, according to which frequently used objects of main memory should be kept "closer" to the CPU (in the cache).

Three write methods are used to reconcile the contents of the cache and RAM:

  • Write through - simultaneously with the cache memory, the RAM is updated.
  • Buffered write through - information is delayed in the cache buffer before being written to RAM and rewritten to RAM in those cycles when the CPU is not accessing it.
  • Write back - the change bit in the tag field is used, and the line is rewritten into RAM only if the change bit is 1.

All methods except write-through allow writes to RAM to be deferred and grouped, to increase performance.
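
A toy sketch contrasting write-through and write-back; dicts stand in for RAM and cache, and real hardware operates on whole lines rather than single words:

```python
class WriteThroughCache:
    def __init__(self, ram):
        self.ram, self.cache = ram, {}

    def write(self, addr, value):
        self.cache[addr] = value
        self.ram[addr] = value          # RAM updated on every write

class WriteBackCache:
    def __init__(self, ram):
        self.ram, self.cache, self.dirty = ram, {}, set()

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)            # change bit set; RAM not touched yet

    def evict(self, addr):
        if addr in self.dirty:          # written back only if changed
            self.ram[addr] = self.cache[addr]
            self.dirty.discard(addr)
        self.cache.pop(addr, None)

ram = {}
wb = WriteBackCache(ram)
wb.write(0x10, 42)
print(0x10 in ram)   # False: the write is deferred
wb.evict(0x10)
print(ram[0x10])     # 42: written back on eviction
```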

Two types of data blocks are distinguished in the cache memory structure:

  • data memory (the cached data itself, duplicated from RAM);
  • tag memory (tags indicating the location of the cached data in RAM).

The data storage of the cache is divided into lines, blocks of fixed length (for example, 32, 64, or 128 bytes). Each cache line can hold a contiguous, aligned block of bytes from RAM. Which block of RAM is mapped to a given cache line is determined by the line's tag and by the mapping algorithm. By the algorithm for mapping RAM into the cache, three types of cache memory are distinguished:

  • fully associative cache;
  • direct-mapped cache;
  • set-associative cache.

In a fully associative cache, the cache controller can place any block of RAM in any line of the cache (Figure 9.1). In this case, the physical address is split into two parts: the offset within the block (cache line) and the block number. When a block is placed in the cache, its block number is stored in the tag of the corresponding line. When the CPU requests a block from the cache, a cache miss can be detected only after the tags of all lines have been compared with the block number.

One of the main advantages of this mapping method is good cache utilization, since there is no restriction on which cache line a given block can occupy. The disadvantages include the complex hardware implementation, which requires a large amount of circuitry (mainly comparators), increasing both the access time of such a cache and its cost.

Figure 9.1. Fully associative 8×8 cache for a 10-bit address

An alternative way to map RAM to the cache is the direct-mapped cache (or one-way set-associative cache). In this case, the memory address (block number) uniquely determines the cache line into which the block will be placed. The physical address is split into three parts: the offset within the block (cache line), the cache line number, and the tag. A given block always goes into one strictly defined cache line, replacing the block stored there if necessary. When the CPU requests a block, it needs to check the tag of only one line to determine a hit or a cache miss.
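
The address split for a direct-mapped cache can be sketched as follows, with assumed parameters (64-byte lines, 64 lines; not the geometry of any particular figure):

```python
LINE_SIZE = 64    # bytes  -> 6 offset bits
NUM_LINES = 64    # lines  -> 6 index bits

def split_address(addr):
    """Split a physical address into (tag, line index, offset)."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr // (LINE_SIZE * NUM_LINES)
    return tag, index, offset

# Two addresses 64*64 = 4096 bytes apart map to the SAME line with
# different tags: the conflict that causes repeated line reloads.
print(split_address(0x0040))  # (0, 1, 0)
print(split_address(0x1040))  # (1, 1, 0)
```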

The obvious advantages of this algorithm are its simplicity and low implementation cost. Its disadvantage is low efficiency due to likely frequent reloads of lines. For example, when every 64th memory cell of the system in Figure 9.2 is accessed in turn, the cache controller is forced to reload the same cache line constantly, leaving the rest unused.

Figure 9.2. Direct-mapped 8×8 cache for a 10-bit address

Despite its obvious shortcomings, this technique has been used successfully, for example in the Motorola MC68020 microprocessor, to organize the first-level instruction cache (Figure 9.3). This microprocessor has a direct-mapped cache of 64 lines of 4 bytes. In addition to the 24 bits specifying the address of the cached block, the line tag contains a valid bit that determines whether the line is valid (if the valid bit is 0, the line is considered invalid and will not produce a cache hit). Data accesses are not cached.

Figure 9.3. Organization of the cache memory in the Motorola MC68020 microprocessor

A compromise between the first two algorithms is the set-associative (partially associative) cache (Figure 9.4). In this organization, the cache lines are combined into groups (sets) of 2, 4, 8, ... lines; according to the number of lines in a set, one speaks of a 2-way, 4-way, etc. set-associative cache. On a memory access, the physical address is divided into three parts: the offset within the block (cache line), the set number, and the tag. A memory block whose address corresponds to a certain set can be placed in any line of that set, and the corresponding value is placed in the line's tag. Within the selected set, the principle of associativity is observed; on the other hand, a given block can fall only into a strictly defined set, which echoes the direct-mapped organization. To identify a hit or a cache miss, the processor needs to check the tags of only one set (2/4/8/... lines).
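
A minimal sketch of a 2-way set-associative lookup with LRU replacement; the parameters and the replacement policy are illustrative assumptions:

```python
LINE_SIZE, NUM_SETS, WAYS = 64, 4, 2

class SetAssocCache:
    def __init__(self):
        # each set holds at most WAYS tags, most recently used last
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return True on a hit, False on a miss (and load the block)."""
        block = addr // LINE_SIZE
        s, tag = block % NUM_SETS, block // NUM_SETS
        ways = self.sets[s]
        if tag in ways:          # hit: only this set's tags were checked
            ways.remove(tag)
            ways.append(tag)
            return True
        if len(ways) == WAYS:    # miss with a full set: evict the LRU way
            ways.pop(0)
        ways.append(tag)
        return False

c = SetAssocCache()
print(c.access(0))    # False: cold miss
print(c.access(0))    # True:  hit
print(c.access(256))  # False: block 4 maps to the same set, second way
print(c.access(0))    # True:  both blocks coexist in the 2-way set
```

In a direct-mapped cache the last two accesses would conflict and force reloads; the second way absorbs the conflict.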

Figure 9.4. 2-way set-associative 8×8 cache for a 10-bit address

This mapping algorithm combines the advantages of the fully associative cache (good memory utilization) and of the direct-mapped cache (simplicity, low cost, and speed), being only slightly inferior to each of the original algorithms in its own strength. That is why the set-associative cache is the most widespread (Table 9.2).

Table 9.2. IA-32 cache subsystem specifications

                     | Intel486           | Pentium      | Pentium MMX  | P6             | Pentium 4
L1 instruction cache:
  Type               | 4-way assoc.       | 2-way assoc. | 4-way assoc. | 4-way assoc.   | 8-way assoc.
  Line size, bytes   |                    |              |              |                |
  Total size         | 8/16 KB            |              |              | 8/16 KB        | 12K µops
L1 data cache:
  Type               | shared with instr. | 2-way assoc. | 4-way assoc. | 2/4-way assoc. | 4-way assoc.
  Line size, bytes   |                    |              |              |                |
  Total size         |                    |              |              | 8/16 KB        |
L2 cache:
  Type               | external           | external     | 4-way assoc. | 4-way assoc.   | 8-way assoc.
  Line size, bytes   |                    |              |              |                |
  Total size, KB     |                    |              | 256/512      | 128-2048       | 256/512

Notes: The Intel486 uses a single level-1 cache for instructions and data. In the Pentium Pro, the L1 data cache is an 8 KB 2-way set-associative cache; in the other P6 models it is a 16 KB 4-way set-associative cache. The Pentium 4 uses an L1 micro-op cache (trace cache) instead of an L1 instruction cache.

The cache memory can be organized according to the Princeton architecture (a mixed cache for instructions and data, as in the Intel486). This obvious solution (and an inevitable one for von Neumann systems whose cache is external to the CPU) is not always the most efficient. Splitting the cache into an instruction cache and a data cache (the Harvard cache architecture) improves cache efficiency for the following reasons:

  • Many modern processors have a pipelined architecture in which pipeline blocks run in parallel. Thus, instruction fetching and instruction data access occur at different stages of the pipeline, and the use of separate cache memory allows these operations to be performed in parallel.
  • The instruction cache can be implemented as read-only and therefore does not require any write-back algorithms, which makes this cache simpler, cheaper, and faster.

That is why all the latest IA-32 models, starting with the Pentium, use the Harvard architecture to organize the L1 cache.

The criterion for efficient cache operation is a reduction in the average memory access time compared with a system without cache memory. The average access time can be estimated as

T_avg = T_hit · R_hit + T_miss · (1 − R_hit)

where T_hit is the cache access time on a hit (including the time needed to identify a hit or a miss), T_miss is the time required on a miss to load a block from main memory into a cache line and then deliver the requested data to the processor, and R_hit is the hit rate.

Obviously, the closer the hit rate is to 1, the closer the average access time is to the hit time. The hit rate is determined primarily by the cache architecture and size. The effect of the presence, absence, and size of cache memory on CPU performance is shown in Table 9.3.
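
The average-access-time estimate above, expressed as a function; the latencies in the example are assumed values:

```python
def t_avg(t_hit, t_miss, r_hit):
    """Average memory access time: hit time weighted by the hit rate,
    plus miss time weighted by the miss rate."""
    return t_hit * r_hit + t_miss * (1.0 - r_hit)

# Assumed: 2 ns on a cache hit, 50 ns to service a miss.
print(t_avg(2.0, 50.0, 0.90))  # ~6.8 ns
print(t_avg(2.0, 50.0, 0.99))  # ~2.48 ns
```

Note how strongly the last few percent of hit rate matter: going from 90% to 99% cuts the average time by more than half.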


Memory is a key part of a computing system. The organization of the interaction between the processor and memory determines the main characteristics of the system; the remaining elements connect this core to external devices and the outside world. Memory is connected to the memory controller (memory management unit) via the address bus, the data bus, and the control bus. The width of the data bus determines how many bits can be read from memory simultaneously (in parallel). Each bit is stored in a memory element; memory elements of different memory types are built on different physical principles of recording and storing information. Memory elements are combined into memory cells; all elements of a cell are addressed simultaneously, in the same way, and are organized so that they can output data to the data bus at the same time. Such combined cells form a word. The number of bits of data read from memory at one time is called the fetch width. To store one byte, 8 memory elements are used; eight-bit memory cells are organized with an 8-line data bus.

Memory microcircuits (chips) are used to build memory modules, which are installed in special slots (connectors) of the computer system. The most common today are DIMM modules, memory modules with two rows of pins.

The width of the address bus determines the address space, that is, the number of memory cells that can be addressed directly. If the width of the address bus is n, then the number of all possible binary combinations (the number of addresses) is N = 2^n.
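The relation N = 2^n is easy to verify directly; the bus widths below are just illustrative values:

```python
def address_space(n_bits: int) -> int:
    """Number of directly addressable cells for an n-bit address bus."""
    return 2 ** n_bits

print(address_space(16))  # 65536 cells (64 KiB with byte addressing)
print(address_space(32))  # 4294967296 cells (4 GiB)
```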

Figure 1. Organization of communication between the memory system and the processor

The memory of a computing device can perform three operations:

a) information storage;

b) recording of information;

c) reading information.

Memory characteristics:

Memory capacity defines the maximum amount of information stored in memory and is measured in bits, bytes, kilobytes, megabytes, gigabytes, terabytes, etc.

Specific capacity is defined as the ratio of the memory capacity to the volume it physically occupies.

The recording density is defined as the amount of information per unit area or per unit length of the storage medium.

Memory access time. Memory performance is determined by the duration of operations when accessing memory. The access time for writing or reading is the sum of the time to locate the memory cell at the given address and the time of the actual write or read, respectively.
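The definition above is a simple sum; the nanosecond values here are assumed purely for illustration:

```python
def access_time(search_ns: float, transfer_ns: float) -> float:
    """Access time = time to locate the cell + time of the actual
    read or write (per the definition above)."""
    return search_ns + transfer_ns

print(access_time(30.0, 10.0))  # -> 40.0 ns total access time
```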

Memory classification:

Random access memory

For random access (electronic) memory, the access time does not depend on the location of the desired memory area; the cell is selected by address using electronic circuits.

Direct circular access

Direct circular access is used when accessing disk memory. The storage medium rotates continuously, so the ability to access the same memory area is cyclical.

Sequential access

Sequential access to data is possible when using magnetic tape as a carrier, where sequential scanning of sections of the carrier is necessary to find the required data.

Addressless memory

Stack and associative storage devices can be classified as addressless. When accessing addressless memory, the cell address is not specified in the memory access command. In stack memory devices, the address of the current cell is kept in a special address register, and accesses to the stack use the address from this register. In associative memory, information is searched by attribute (tag): the tags of all memory cells are compared with an associative attribute, which is written to a special feature register before the comparison operation is performed.
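Both addressless access styles can be sketched as follows. This is a minimal behavioral sketch with assumed class and method names, not a model of any real storage device:

```python
class Stack:
    """Stack memory: the cell address lives in an internal address
    register (the stack pointer); access commands carry no address."""
    def __init__(self):
        self._cells = []

    def push(self, word):
        self._cells.append(word)   # address register advances implicitly

    def pop(self):
        return self._cells.pop()   # address taken from the register


class AssociativeMemory:
    """Associative memory: a read compares the tag of every cell with
    the associative attribute held in the feature register."""
    def __init__(self):
        self._cells = []           # list of (tag, data) pairs

    def write(self, tag, data):
        self._cells.append((tag, data))

    def read(self, feature):
        # Compare the feature against all tags (sequentially here;
        # simultaneously in real hardware).
        return [data for tag, data in self._cells if tag == feature]


s = Stack()
s.push(10); s.push(20)
print(s.pop())             # -> 20 (the most recently written word)

cam = AssociativeMemory()
cam.write("red", 1); cam.write("blue", 2); cam.write("red", 3)
print(cam.read("red"))     # -> [1, 3]
```

In neither case does the caller supply a cell address: the stack derives it from its internal register, and the associative memory finds matching cells by content.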

Memory classification by functional purpose:

ROM (Read-Only Memory) is used to store permanent data and utility programs.

SRAM here denotes super-operative (register) memory: a set of general-purpose registers (RON) designed to store operands and the results of operations inside the processor.

RAM (Random Access Memory) is used to store the executable program and its working data. If any register can be accessed for reading and writing at its address, such a register structure forms random-access RAM.

Classification by the way information is stored:

Static memory

In static storage devices, the LSI memory elements are bistable flip-flops (elements with two stable states, hence the name of the memory).

Dynamic memory

In dynamic memory devices, cheaper LSIs are used in which the storage element is a capacitor. The capacitor discharges over time (hence the "dynamics"), so the stored potential must be maintained by periodically recharging the capacitor. This process is called regeneration (refresh).
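Why regeneration is needed can be sketched numerically. The decay rate and read threshold below are invented for illustration and do not correspond to any particular DRAM part:

```python
DECAY_PER_MS = 0.02   # assumed fractional charge loss per millisecond
THRESHOLD = 0.5       # below this level the stored bit can no longer be read

def charge_after(ms: float, start: float = 1.0) -> float:
    """Remaining capacitor charge after `ms` milliseconds without refresh."""
    return start * (1 - DECAY_PER_MS) ** ms

def needs_refresh(ms_since_refresh: float) -> bool:
    """True once the charge has decayed below the readable threshold."""
    return charge_after(ms_since_refresh) < THRESHOLD

print(needs_refresh(10))   # False: charge still well above the threshold
print(needs_refresh(64))   # True: the cell must be regenerated before this
```

Regeneration simply reads each cell and rewrites it, restoring the full charge before it decays past the threshold.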

Persistent memory

In read-only storage devices, the memory element is a fusible link or a semiconductor diode acting as a destructible jumper. In reprogrammable ROMs, cells built on MOS transistors with a floating, insulated gate are used to record and store information: information is written electrically, and as current flows through the source-drain channel, charges are deposited on the gate and are stored indefinitely. Information is erased by applying a voltage of the opposite sign to the source-drain section (in reprogrammable ROM with electrical erasure) or by ultraviolet irradiation (in ROM with ultraviolet erasure).

Holographic memory

In holographic storage devices, information is stored in the volume of a holographic crystal as a record of the interference of two waves, a reference wave and an information wave. This promising form of storage offers high data density and is currently under development.

Biological memory

In biological storage devices, information is recorded using a change in the state of organic molecules that have the ability to store charge and exchange electrons.

Memory on magnetic media

In external storage devices on magnetic media, information is stored in the form of sections of the ferromagnetic surface of a disk or magnetic tape magnetized in a certain direction.

Optical memory

In optical external storage devices, information is recorded in the form of sections having different scattering coefficients for a directed laser beam.

Memory is one of the main components of any computer. Its capacity and speed largely determine the performance of the entire computing system. This section has covered the most important technologies for creating and organizing memory.
