Multithreaded Applications

Threads and processes are related concepts in computing. Both are sequences of instructions that must be executed in a specific order. Instructions in separate threads or processes, however, can run in parallel.

Processes exist at the operating-system level and correspond to what users see as programs or applications. A thread, in turn, exists within a process, which is why threads are sometimes called "lightweight processes". Each process consists of one or more threads. Multiple processes allow a computer to perform several tasks "simultaneously"; multiple threads allow a single process to divide its work for parallel execution. On a multiprocessor computer, different processes or threads can run on different processors, enabling truly parallel work.

Completely parallel processing is not always possible. Threads sometimes need to synchronize: one thread may wait for the result of another, or one thread may need exclusive access to a resource that another thread is currently using. Synchronization problems are a common cause of errors in multithreaded applications. Sometimes a thread can end up waiting for a resource that will never become available; this condition is called deadlock.
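To make the deadlock scenario concrete, here is a minimal C++ sketch (std::thread and mutexes are covered in detail later in this text). The two-mutex setup is this example's own illustration, and running it will genuinely hang:

#include <chrono>
#include <mutex>
#include <thread>

std::mutex a, b;

int main() {
    // t1 locks a then waits for b; t2 locks b then waits for a.
    // Each holds the resource the other needs: neither can ever proceed.
    std::thread t1([] {
        std::lock_guard<std::mutex> la(a);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::lock_guard<std::mutex> lb(b); // waits forever
    });
    std::thread t2([] {
        std::lock_guard<std::mutex> lb(b);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::lock_guard<std::mutex> la(a); // waits forever
    });
    t1.join(); // the program hangs here
    t2.join();
    return 0;
}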

The first thing to understand is that a process consists of at least one thread. In the classical OS model, each process has an address space and a single thread of control; in fact, this is what defines a process.

On one hand, a process can be thought of as a way to group related resources. A process has an address space containing program text and data, as well as other resources: open files, child processes, pending alarms, signal handlers, credentials, and more. Bundling these resources together in the form of a process makes them much easier to manage.

On the other hand, a process can be thought of as a thread of execution, or simply a thread. A thread has a program counter that keeps track of which instruction to execute next, registers that hold its current working variables, and a stack that records the execution history, with one frame for each procedure that has been called but has not yet returned. Although a thread must execute within some process, the two concepts must be distinguished: processes are used to group resources, while threads are the entities scheduled for execution on the CPU.

The concept of threads adds to the process model the possibility of several largely independent executions running in the same process environment. Multiple threads running in parallel within one process are analogous to multiple processes running in parallel on one computer: in the first case the threads share the address space, open files, and other resources; in the second, the processes share physical memory, disks, printers, and other resources. Threads have some of the properties of processes, which is why they are sometimes called lightweight processes. The term multithreading likewise describes the use of multiple threads in a single process.

Any thread consists of two components:

· a kernel object, through which the operating system manages the thread and where statistical information about the thread is stored (additional threads are likewise created by the kernel);
· a thread stack, which holds the parameters of all executing functions and the local variables the thread needs to run its code.

To sum up: the main difference between processes and threads is that processes are isolated from each other and use separate address spaces, while threads within a process can share the same address space while doing their work without interfering with one another. This is the appeal of multithreaded programming: by splitting an application into several sequential threads, we can increase performance, simplify the user interface, and achieve scalability (if the application runs on a multiprocessor system, its threads execute on different processors and the program runs noticeably faster).

1. A thread defines a sequence of code execution within a process.

2. A process does not execute anything itself; it simply serves as a container for threads.

3. Threads are always created in the context of some process, and their entire lifetime is confined to it.

4. Threads can execute the same code and manipulate the same data, and can share kernel object handles, because the handle table belongs to the process, not to individual threads (see the sketch after this list).

5. Since threads consume significantly fewer resources than processes, try to solve your problems with additional threads rather than new processes (but be sensible about it).
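To make point 4 concrete, here is a minimal C++ sketch in which two threads update one variable living in the process's single shared address space, something separate processes could not do without explicit shared memory (the counts and names are this example's own):

#include <iostream>
#include <mutex>
#include <thread>

int shared_value = 0;   // lives in the single address space of the process
std::mutex mtx;         // protects shared_value from concurrent updates

void add(int n) {
    for (int i = 0; i < n; i++) {
        std::lock_guard<std::mutex> lock(mtx);
        ++shared_value;  // both threads modify the very same variable
    }
}

int main() {
    std::thread t1(add, 100000);
    std::thread t2(add, 100000);
    t1.join();
    t2.join();
    std::cout << "shared_value = " << shared_value << std::endl; // 200000
    return 0;
}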

Multitasking is the ability of an operating system or programming environment to process several tasks in parallel (or pseudo-parallel). True operating-system multitasking is possible only in distributed computing systems.

[Figure: screenshot of Debian 7.1 "Wheezy" running the GNOME desktop environment, Firefox, Tor, and VLC Player. The desktop of a modern operating system, reflecting the activity of several processes.]

There are two types of multitasking:

· Process-based multitasking (concurrently executing programs). Here a program is the smallest unit of code that the operating system scheduler can manage. This is the kind most familiar to users (for example, working in a text editor while listening to music).

· Thread-based multitasking. The smallest unit of managed code is a thread (one program can perform two or more tasks at the same time).

Multithreading is a specialized form of multitasking.


Properties of a multitasking environment

Primitive multitasking environments provide pure "resource sharing": each task is assigned a fixed chunk of memory, and tasks are invoked at strictly defined time intervals.

More advanced multitasking systems allocate resources dynamically: a task starts in memory or leaves memory depending on its priority and on the system's strategy. Such a multitasking environment has the following features:

· Each task has its own priority, according to which it receives CPU time and memory.

· The system organizes task queues so that all tasks receive resources, depending on their priorities and the system's strategy.

· The system handles interrupts, through which tasks can be activated, deactivated, and deleted.

· At the end of its time slice, the kernel temporarily moves a task from the running state to the ready state, giving resources to other tasks. When memory is scarce, pages of non-running tasks can be swapped out to disk and later, after a time determined by the system, restored to memory.

· The system protects a task's address space from unauthorized interference by other tasks.

· The system protects its kernel's address space from unauthorized interference by tasks.

· The system detects crashes and hangs of individual tasks and terminates them.

· The system resolves conflicts over access to resources and devices, avoiding deadlocks and system-wide hangs caused by waiting for locked resources.

· The system guarantees each task that it will sooner or later be activated.

· The system handles real-time requests.

· The system provides inter-process communication.

Difficulties in implementing a multitasking environment

The main difficulty in implementing a multitasking environment is reliability: protecting memory, handling failures and interrupts, and preventing hangs and deadlocks.

Besides being reliable, a multitasking environment must be efficient: the resources spent maintaining it must not noticeably interfere with processes, slow down their work, or sharply limit the available memory.

Multithreading is a property of a platform (an operating system, a virtual machine, etc.) or of an application whereby a process spawned in the operating system can consist of several threads running "in parallel", that is, with no prescribed ordering in time. For some tasks, this separation achieves more efficient use of computer resources.

Such threads are also called threads of execution; informally they are simply called "threads".

The essence of multithreading is quasi-multitasking at the level of a single executing process: all threads run in the address space of the process. In addition, all threads of a process share not only a common address space but also common file descriptors. A running process has at least one (main) thread.

Multithreading (as a programming paradigm) should not be confused with multitasking or multiprocessing, even though operating systems that implement multitasking usually implement multithreading as well.

The advantages of multithreading in programming include the following:

· Simplification of the program, in some cases, through the use of a common address space.

· Creating a thread costs less time than creating a process.

· Higher process performance through parallelizing processor computations and I/O operations.


Types of Thread Implementation

· Threads in user space. Each process maintains its own thread table, analogous to the kernel's process table.

Disadvantages of this approach:

1. There is no timer interrupt within a single process, so a running thread cannot be preempted.

2. When the process makes a blocking system call, all of its threads are blocked.

3. Implementation complexity.

· Threads in kernel space. Along with the process table, the kernel maintains a thread table.

· "Fibers" (eng. fibers). Multiple user-mode threads running in a single kernel-mode thread. The kernel space thread consumes noticeable resources, primarily physical memory and the kernel-mode address range for the kernel-mode stack. Therefore, the concept of "fiber" was introduced - a lightweight flow performed exclusively in user mode. Each stream can have multiple "fibers".

Interaction of threads

In a multithreaded environment, problems often arise from the use of the same data or devices by concurrently executing threads. To solve such problems, thread-interaction mechanisms such as mutexes, semaphores, critical sections, and events are used.

· Mutexes (from "mutually exclusive access") are synchronization objects that are in a signaled state when not owned by any thread. At most one thread owns the object at any given moment, hence the name: concurrent access to the shared resource is excluded. After all necessary actions are done, the mutex is released, giving other threads access to the shared resource. An object may support recursive acquisition by the thread that already holds it: the second acquisition increments a counter without blocking the thread and must later be matched by the same number of releases. The critical section in Win32 behaves this way. Some implementations, such as FAST_MUTEX in the Windows kernel, do not support recursion and deadlock the thread on an attempt at recursive acquisition (a sketch of the recursive case follows this list).

· Semaphores represent a pool of available resources that can be acquired by several threads at once until the pool is exhausted; additional threads must then wait until the required number of resources is available again. Semaphores are very efficient because they allow concurrent access to resources. A semaphore is a logical extension of a mutex: a semaphore with a counter of 1 is equivalent to a mutex, but the counter may exceed 1 (see the semaphore sketch after this list).

· Events. An event is an object storing one bit of information, "signaled or not", with the operations "signal", "reset to non-signaled state", and "wait" defined on it. Waiting on a signaled event is a no-op: the thread continues execution immediately. Waiting on a non-signaled event suspends the thread until another thread (or the second phase of an interrupt handler in the OS kernel) signals the event. It is possible to wait for several events in "any" or "all" mode. It is also possible to create an event that automatically resets to the non-signaled state after the first (and only) waiting thread wakes up (such an object underlies the implementation of the "critical section" object). Events are actively used in MS Windows, both in user mode and in kernel mode. The Linux kernel has a similar object, the wait queue.

· Critical sections provide synchronization similar to mutexes, except that the objects representing them are visible only within a single process. Events, mutexes, and semaphores can also be used in a single-process application, but critical-section implementations in some operating systems (such as Windows NT) provide a faster, more efficient mutual-exclusion mechanism: the acquire and release operations are optimized for the uncontended single-thread case to avoid system calls into the OS kernel. Like a mutex, the object representing a critical section can be used by only one thread at a time, which makes critical sections extremely useful for guarding access to shared resources.

· Condition variables (condvars). They are similar to events but are not objects occupying memory: only the address of a variable is used, the notion of "contents of the variable" does not exist, and the address of an arbitrary object can serve as a condition variable. Unlike events, setting a condition variable to the signaled state has no consequence if no thread is currently waiting on it. Setting an event in the same situation stores the "signaled" state inside the event itself, so subsequent threads wishing to wait on it continue immediately. To use such an object fully, one also needs an operation that releases the mutex and waits on the condition variable atomically (see the sketch after this list). Condition variables are widely used in UNIX-like operating systems. Debates about the merits of events versus condition variables are a notable part of the debates about the merits of Windows versus UNIX.

· IO completion port (IOCP). A "queue" object implemented in the OS kernel and accessible through system calls with the operations "put a structure at the tail of the queue" and "take the next structure from the head of the queue"; the latter call suspends the thread when the queue is empty, until another thread puts a structure into it. The most important feature of an IOCP is that structures can be placed in it not only by an explicit system call from user mode but also implicitly, inside the OS kernel, as a result of an asynchronous I/O operation completing on one of the file descriptors. To achieve this effect, one uses the "bind file descriptor to IOCP" system call. In this case the structure placed in the queue contains the error code of the I/O operation and, on success, the number of bytes actually transferred. The completion-port implementation also limits the number of threads that run on a single processor/core after a structure is dequeued. The object is specific to MS Windows; it allows server software to process incoming connection requests and data chunks with fewer threads than clients (there is no need to create a separate thread, with its resource costs, for each new client).

· ERESOURCE. A mutex supporting recursive acquisition, with shared-or-exclusive semantics: the object can be free, acquired by an arbitrary number of threads in shared mode, or acquired by exactly one thread in exclusive mode. Any acquisition attempt violating this rule blocks the thread until the object is freed and the acquisition becomes permissible. There are also TryToAcquire-style operations that never block: they either acquire the object or return FALSE without doing anything. ERESOURCE is used in the Windows kernel, especially in file systems: for example, any open disk file corresponds to an FCB structure containing two such objects that synchronize access to the file's size. One of them, the paging I/O resource, is acquired exclusively on the file-truncation path and guarantees that at the moment of truncation there is no active I/O to the file from the cache or from memory mapping.

· Rundown protection. A semi-documented object in the Windows kernel (its calls are present in header files but not in the documentation). It is a counter with "increment", "decrement", and "wait" operations. Waiting blocks the thread until decrements bring the counter to zero. In addition, an increment can fail: while a wait is in progress, all increment operations fail.
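A few of the primitives described above can be sketched in portable C++. First, recursive mutex acquisition: C++ provides std::recursive_mutex, and this minimal example assumes a function that re-enters itself while holding the lock:

#include <iostream>
#include <mutex>

std::recursive_mutex rmtx;

// walk() re-enters itself while already holding rmtx; with a plain
// std::mutex the second lock would deadlock the thread.
void walk(int depth) {
    std::lock_guard<std::recursive_mutex> lock(rmtx); // bumps the owner count
    std::cout << "depth " << depth << std::endl;
    if (depth > 0)
        walk(depth - 1);
} // each scope exit performs one matching release

int main() {
    walk(3);
    return 0;
}

Next, semaphores. Standard C++ gained them only in C++20 (later than the C++17 code shown further below), so this sketch assumes a C++20 compiler:

#include <chrono>
#include <iostream>
#include <semaphore>
#include <thread>
#include <vector>

// At most 3 of the 8 threads may hold a "resource slot" at once.
std::counting_semaphore<3> pool(3);

void worker(int id) {
    pool.acquire(); // blocks while all 3 slots are taken
    std::cout << "thread " << id << " acquired a slot\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    pool.release(); // return the slot to the pool
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; i++)
        threads.emplace_back(worker, i);
    for (auto& thr : threads)
        thr.join();
    return 0;
}

Finally, condition variables: a minimal sketch of the atomic "release the mutex and wait" operation described above, using std::condition_variable:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool ready = false; // the condition itself is stored separately

void consumer() {
    std::unique_lock<std::mutex> lock(mtx);
    // wait() atomically releases mtx and suspends the thread;
    // the predicate guards against spurious wakeups.
    cv.wait(lock, [] { return ready; });
    std::cout << "consumer: got the signal" << std::endl;
}

int main() {
    std::thread t(consumer);
    {
        std::lock_guard<std::mutex> lock(mtx);
        ready = true; // without this flag, a notify before wait would be lost
    }
    cv.notify_one();
    t.join();
    return 0;
}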

Earlier posts covered multithreading in Windows using CreateThread and other WinAPI calls, as well as multithreading in Linux and other *nix systems using pthreads. If you write in C++11 or later, std::thread and the other multithreading primitives introduced in that language standard are available to you. Below you will see how to work with them. Unlike WinAPI and pthreads, code written with std::thread is cross-platform.

Note: the code below was tested on GCC 7.1 and Clang 4.0 under Arch Linux, GCC 5.4 and Clang 3.8 under Ubuntu 16.04 LTS, GCC 5.4 and Clang 3.8 under FreeBSD 11, and Visual Studio Community 2017 under Windows 10. CMake older than 3.8 cannot tell the compiler to use the C++17 standard specified in the project properties. To compile the code with Clang on *nix systems, the libc++ package must be installed. For Arch Linux the package is available in the AUR. Ubuntu has a libc++-dev package, but you may run into problems that prevent the code from building out of the box; a workaround is described on StackOverflow. On FreeBSD, you need to install the cmake-modules package to compile the project.

Mutexes

Below is the simplest example of using threads and mutexes:

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex mtx;
static int counter = 0;
static const int MAX_COUNTER_VAL = 100;

void thread_proc(int tnum) {
    for (;;) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            if (counter == MAX_COUNTER_VAL)
                break;
            int ctr_val = ++counter;
            std::cout << "Thread " << tnum << ": counter = " <<
                ctr_val << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    // can't use const auto& here since .join() is not marked const
    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

Note how std::mutex is wrapped in std::lock_guard in accordance with the RAII idiom. This approach guarantees that the mutex is released when the scope is exited in any case, including when exceptions are thrown. To acquire several mutexes at once without risking deadlock, there is the std::scoped_lock class. However, it appeared only in C++17 and therefore may not work everywhere. For earlier versions of C++ there is the functionally similar std::lock template, although it requires extra code to correctly release the locks in RAII style.
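A minimal sketch of both approaches; transfer_cpp17() requires C++17, while the std::lock variant works in C++11 (the function names are this example's own):

#include <mutex>

std::mutex m1, m2;

void transfer_cpp17() {
    // Locks both mutexes atomically with deadlock avoidance; RAII release.
    std::scoped_lock lock(m1, m2);
    // ... operate on data protected by m1 and m2 ...
}

void transfer_cpp11() {
    std::lock(m1, m2); // deadlock-avoiding lock of both mutexes
    // adopt_lock tells lock_guard the mutex is already locked; RAII release.
    std::lock_guard<std::mutex> lg1(m1, std::adopt_lock);
    std::lock_guard<std::mutex> lg2(m2, std::adopt_lock);
    // ... operate on shared data ...
}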

RWLock

Often a situation arises in which an object is read more often than it is written. In this case, instead of an ordinary mutex it is more efficient to use a read-write lock, aka RWLock. An RWLock can be held by several threads at once for reading, but by only one thread for writing. In C++, RWLock corresponds to the std::shared_mutex and std::shared_timed_mutex classes:

#include <chrono>
#include <iostream>
#include <shared_mutex>
#include <thread>
#include <vector>

// std::shared_mutex mtx; // will not work with GCC 5.4
std::shared_timed_mutex mtx;

static int counter = 0;
static const int MAX_COUNTER_VAL = 100;

void thread_proc(int tnum) {
    for (;;) {
        {
            // see also std::shared_lock
            std::unique_lock<std::shared_timed_mutex> lock(mtx);
            if (counter == MAX_COUNTER_VAL)
                break;
            int ctr_val = ++counter;
            std::cout << "Thread " << tnum << ": counter = " <<
                ctr_val << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

By analogy with std::lock_guard, the std::unique_lock and std::shared_lock classes are used to acquire an RWLock, depending on how we want to hold it. The std::shared_timed_mutex class appeared in C++14 and works on all modern platforms (I won't vouch for mobile devices, game consoles, and so on). Unlike std::shared_mutex, it has try_lock_for, try_lock_until, and other methods that try to acquire the mutex within a given time. I strongly suspect that std::shared_mutex should be cheaper than std::shared_timed_mutex. However, std::shared_mutex appeared only in C++17, which means it is not supported everywhere. In particular, the still widely used GCC 5.4 does not know about it.
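A small sketch of the timed interface; the 50 ms budget and the helper name are this example's own choices:

#include <chrono>
#include <iostream>
#include <shared_mutex>

std::shared_timed_mutex rw;

void try_write() {
    // Give up if the exclusive (writer) lock cannot be taken within 50 ms.
    if (rw.try_lock_for(std::chrono::milliseconds(50))) {
        // ... modify the shared data ...
        rw.unlock();
    } else {
        std::cout << "timed out, doing something else" << std::endl;
    }
}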

Thread Local Storage

Sometimes you need a variable that looks like a global one but that each thread sees independently: all threads see the same name, but each has its own local value. This is what Thread Local Storage, or TLS, was invented for (it has nothing to do with Transport Layer Security!). Among other things, TLS can be used to significantly speed up the generation of pseudo-random numbers. An example of using TLS in C++:

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex io_mtx;
thread_local int counter = 0;
static const int MAX_COUNTER_VAL = 10;

void thread_proc(int tnum) {
    for (;;) {
        counter++;
        if (counter == MAX_COUNTER_VAL)
            break;
        {
            std::lock_guard<std::mutex> lock(io_mtx);
            std::cout << "Thread " << tnum << ": counter = " <<
                counter << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

The mutex is used here solely to synchronize the output to the console. No synchronization is required to access thread_local variables.
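As a sketch of the pseudo-random-number use case mentioned above: each thread keeps its own generator, so no locking is needed around the generator itself (the per-thread seeding is this example's own choice):

#include <iostream>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

std::mutex io_mtx; // only for console output

int thread_rand() {
    // One generator per thread: no locks, no contention on the RNG state.
    thread_local std::mt19937 gen(std::random_device{}());
    thread_local std::uniform_int_distribution<int> dist(0, 99);
    return dist(gen);
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; i++) {
        threads.emplace_back([i] {
            int v = thread_rand();
            std::lock_guard<std::mutex> lock(io_mtx);
            std::cout << "Thread " << i << ": " << v << std::endl;
        });
    }
    for (auto& thr : threads)
        thr.join();
    return 0;
}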

Atomic variables

Atomic variables are often used to perform simple operations without mutexes. For example, you may need to increment a counter from multiple threads; instead of wrapping an int in std::mutex, it is more efficient to use std::atomic_int. C++ also offers std::atomic_char, std::atomic_bool, and many other types. Lock-free algorithms and data structures are also implemented using atomic variables. It is worth noting that these are very difficult to develop and debug, and they do not always run faster than comparable lock-based algorithms and data structures.

Sample code:

#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

static std::atomic_int atomic_counter(0);
static const int MAX_COUNTER_VAL = 100;

std::mutex io_mtx;

void thread_proc(int tnum) {
    for (;;) {
        int ctr_val = ++atomic_counter;
        if (ctr_val >= MAX_COUNTER_VAL)
            break;

        {
            std::lock_guard<std::mutex> lock(io_mtx);
            std::cout << "Thread " << tnum << ": counter = " <<
                ctr_val << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::vector<std::thread> threads;

    int nthreads = std::thread::hardware_concurrency();
    if (nthreads == 0) nthreads = 2;

    for (int i = 0; i < nthreads; i++) {
        std::thread thr(thread_proc, i);
        threads.emplace_back(std::move(thr));
    }

    for (auto& thr : threads) {
        thr.join();
    }

    std::cout << "Done!" << std::endl;
    return 0;
}

Note the use of the hardware_concurrency function. It returns an estimate of the number of threads that can run in parallel on the current system. For example, on a machine with a quad-core processor supporting Hyper-Threading, it returns 8. It can also return zero if the estimate could not be made or the function is simply not implemented.

For some information on how atomic variables work at the assembly level, see the article Cheat Sheet for Basic Assembly Instructions x86/x64.
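As a sketch of the primitive underlying lock-free structures, here is a compare-and-swap loop implementing an atomic "store maximum" (the function and variable names are illustrative):

#include <atomic>

std::atomic<int> maximum(0);

// Atomically set `maximum` to max(maximum, value) without any locks.
void atomic_store_max(int value) {
    int cur = maximum.load();
    // compare_exchange_weak writes `value` only if `maximum` still equals
    // `cur`; on failure it reloads `cur` and we retry.
    while (cur < value && !maximum.compare_exchange_weak(cur, value)) {
        // `cur` now holds the freshly observed value; the loop re-checks it.
    }
}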

Conclusion

As far as I can see, this all works really well. That is, when writing cross-platform applications in C++, you can safely forget about WinAPI and pthreads. In pure C, cross-platform threads have also existed since C11. But they are still not supported by Visual Studio (I checked) and are unlikely ever to be. It's no secret that Microsoft sees no interest in developing C support in its compiler, preferring to concentrate on C++.

Many primitives remain behind the scenes: std::condition_variable (and _any), std::(shared_)future, std::promise, std::async, and others. I recommend cppreference.com for reviewing them. It may also make sense to read the book C++ Concurrency in Action, but I must warn you that it is no longer new, contains a lot of filler, and essentially retells a dozen articles from cppreference.com.
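A minimal sketch of two of those primitives, std::async and std::future, to show the flavor:

#include <future>
#include <iostream>

int slow_square(int x) { // stand-in for an expensive computation
    return x * x;
}

int main() {
    // Launch the computation in another thread and get a future to its result.
    std::future<int> fut = std::async(std::launch::async, slow_square, 7);
    // ... do other work here while slow_square runs ...
    std::cout << "result = " << fut.get() << std::endl; // blocks until ready
    return 0;
}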

The full source code for this note is, as usual, on GitHub. And how do you write multithreaded C++ applications?

Clay Breshears

Introduction

Intel's methodology for introducing multithreading includes four main stages: analysis, design and implementation, debugging, and performance tuning. This is the approach used to turn sequential code into a multithreaded application. Work during the first, third, and fourth stages is covered quite widely, while information on implementing the second stage is clearly insufficient.

Many books have been published on parallel algorithms and parallel computing. However, these publications mainly cover message passing, distributed-memory systems, or theoretical parallel computing models that are sometimes inapplicable to real multicore platforms. If you are ready to get serious about multithreaded programming, you will probably need knowledge of developing algorithms for these models. That said, the practical applicability of these models is rather limited, so many software developers end up having to adapt them to real platforms themselves.

It is no exaggeration to say that developing multithreaded applications is first and foremost a creative activity, and only then a scientific one. In this article you will learn about eight simple rules that will help you expand your base of concurrent programming practices and improve the efficiency of threading your applications.

Rule 1. Select the operations performed in the program code independently of each other

Parallel processing can be applied only to those operations in sequential code that are performed independently of each other. A good example of how independent actions lead to a single real result is the construction of a house. It involves workers of many trades: carpenters, electricians, plasterers, plumbers, roofers, painters, bricklayers, gardeners, and so on. Of course, some of them cannot start before others have finished (roofers will not start until the walls are built, and painters will not paint walls that have not been plastered). But on the whole, the people involved in the construction act independently of each other.

Consider another example: the work cycle of a DVD rental shop that receives orders for films. The orders are distributed among the shop's employees, who look for the films in the warehouse. If one worker pulls from the shelf a disc with an Audrey Hepburn film, this in no way affects another worker retrieving an action movie with Arnold Schwarzenegger, and still less their colleague searching for discs of the new season of "Friends". In our example we assume that any problems caused by films being out of stock were resolved before the orders arrived, and that packaging and shipping one order does not affect the processing of the others.

In your work you will probably also encounter computations that can only be processed in a specific sequence rather than in parallel, because different iterations or steps depend on each other and must be performed in strict order. Here is an example from nature. Imagine a pregnant doe: since gestation lasts about eight months, a fawn will not appear in one month no matter what, even if eight does become pregnant at the same time. On the other hand, eight reindeer working at the same time would do their job perfectly if they were all harnessed to Santa's sleigh.

Rule 2. Apply parallelism at a coarse granularity

There are two approaches to partitioning sequential program code for parallelism: bottom-up and top-down. In both, during code analysis the segments that consume a significant share of the execution time (so-called "hot spots") are identified. Parallelizing those segments (where possible) gives the maximum performance gain.

The bottom-up approach threads the hot spots directly. If parallelizing the spots found is impossible, the application's call stack should be examined to find other segments that take a long time to run and are amenable to parallel execution. Say you are working on an application that compresses images. Compression can be implemented with several independent parallel threads processing separate segments of an image. However, even if you managed to thread the hot spots this way, do not neglect analysis of the call stack, which may reveal segments suitable for parallelization at a higher level of the code. This way you can coarsen the granularity of the parallel processing.

In the top-down approach, the work of the program is analyzed as a whole, and its individual segments are identified whose execution together completes the entire task. If the main code segments show no obvious independence, analyze their constituent parts to find independent computations. Code analysis also reveals the modules that consume the most CPU time. Consider how to implement threading in a video-encoding application. Parallel processing can be implemented at the lowest level, for the independent pixels of one frame, or at a higher level, for groups of frames that can be processed independently of other groups. If the application is meant to process several video files at once, parallelizing at that level may be even simpler, and the granularity there is the coarsest.

The granularity of parallel computing is the amount of computation performed between synchronizations of threads: the less often threads synchronize, the coarser the granularity. Fine-grained threading carries the risk that the system overhead of managing threads outweighs the useful computation those threads perform, and increasing the number of threads for the same amount of computation only complicates processing. Coarse-grained multithreading, by contrast, incurs less system overhead and has more potential for scaling with additional threads. To implement coarse-grained parallelism, it is recommended to use the top-down approach and create threads at a high level in the call stack.
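As a sketch of coarse granularity, here is a C++ example summing an array: all the real work runs with no synchronization at all, and each thread synchronizes exactly once, to merge its result (the chunking scheme and thread count are this example's own choices):

#include <iostream>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1'000'000, 1);
    long long total = 0;
    std::mutex total_mtx;

    const int nthreads = 4;
    std::vector<std::thread> threads;
    size_t chunk = data.size() / nthreads;

    for (int t = 0; t < nthreads; t++) {
        threads.emplace_back([&, t] {
            size_t begin = t * chunk;
            size_t end = (t == nthreads - 1) ? data.size() : begin + chunk;
            // Coarse granularity: the whole chunk is summed lock-free...
            long long local = std::accumulate(data.begin() + begin,
                                              data.begin() + end, 0LL);
            // ...and each thread takes the lock exactly once, to merge.
            std::lock_guard<std::mutex> lock(total_mtx);
            total += local;
        });
    }
    for (auto& thr : threads)
        thr.join();

    std::cout << "total = " << total << std::endl; // 1000000
    return 0;
}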

Rule 3. Build scalability into your code to improve performance as the number of cores grows.

Not long ago, quad-core processors joined dual-core ones on the market, and Intel has already announced an 80-core processor capable of a trillion floating-point operations per second. Since the number of cores in processors will only grow over time, your code must have adequate potential for scalability. Scalability is the ability of an application to respond adequately to increases in system resources (number of cores, memory size, bus frequency, etc.) or in the amount of data. With core counts rising in future processors, write scalable code whose performance will grow as system resources increase.

To paraphrase one of C. Northcote Parkinson's laws, "data processing expands to fill the available system resources". That is, as computing resources grow (for example, the number of cores), they will most likely all be used to process data. Let's return to the video-compression application discussed above. Adding cores to the processor is unlikely to change the size of the frames being processed; instead, more threads will process each frame, reducing the number of pixels per thread. As a result, organizing extra threads increases the overhead and makes the parallelism finer-grained. Another, more likely scenario is an increase in the size or number of video files to encode. In that case, organizing additional threads to process the larger (or additional) video files divides the whole volume of work right at the point where the growth occurred. An application with such capabilities has high potential for scalability.

Designing and implementing parallelism through data decomposition provides greater scalability than functional decomposition. The number of independent functions in program code is usually limited and does not change while the application runs. Since each independent function is given its own thread (and, accordingly, a processor core), adding cores will not produce additional threads and therefore will not raise performance. Data-decomposition models, by contrast, offer greater potential for application scalability precisely because the amount of data to process can grow with the number of processor cores.

Even if the program code threads independent functions, additional threads can still be used when the input load grows. Let's return to the house-building example discussed above. The peculiarity of construction is that it comprises a limited number of independent tasks. However, if you were told to build twice as many floors, you would probably hire additional workers in some trades (painters, roofers, plumbers, and so on). Consequently, you should develop applications that can adapt to the data decomposition that results from increased workload. If your code implements functional decomposition, consider organizing additional threads as the number of processor cores increases.

Rule 4. Use thread-safe libraries

If you need a library to handle the hot spots in your code, be sure to consider ready-made functions instead of writing your own. In short, do not reinvent the wheel by developing code segments whose functionality is already provided by optimized library routines. Many libraries, including the Intel Math Kernel Library (Intel MKL) and Intel Integrated Performance Primitives (Intel IPP), already contain multithreaded functions optimized for multi-core processors.

It is worth noting that when using routines from multithreaded libraries, you need to make sure that a given call will not disturb the normal operation of threads: if routine calls are made from two different threads, each call must return correct results. If routines refer to and update shared library variables, a data race may occur, which will adversely affect the reliability of the results. To work correctly with threads, a library routine must either be reentrant (that is, update nothing but local variables) or use synchronization to protect access to shared resources. Conclusion: before using any third-party library in your code, read its documentation to make sure it is thread-safe.

Rule 5. Use an Appropriate Multithreading Model

Suppose the functions of multithreaded libraries are clearly not enough to parallelize all the suitable code segments, and you have to think about organizing threads yourself. Don't rush to create your own (cumbersome) thread structure if the OpenMP library already contains all the functionality you need.

The downside of implicit multithreading (such as OpenMP) is that it does not allow precise control over threads.

If you only need to parallelize resource-intensive loops, or the additional flexibility that explicit threads provide is of secondary importance to you, then there is no point in doing extra work. The more complex the multithreading implementation, the greater the likelihood of errors in the code and the harder its subsequent maintenance.

The OpenMP library is focused on data decomposition and is especially well suited for threading loops that work with large amounts of data. Even when only data decomposition applies to an application, there may be additional requirements (for example, from an employer or customer) that rule out the use of OpenMP, leaving explicit threading as the only option. In that case, OpenMP can still be used for preliminary threading to estimate the potential performance gains, the scalability, and the approximate effort that subsequently splitting the code with explicit multithreading would require.

Rule 6. The result of the program code should not depend on the sequence of execution of parallel threads

For sequential program code it is easy to say which statement executes after which. In multithreaded code, the execution order of threads is not defined and is decided by the operating system scheduler. Strictly speaking, it is almost impossible to predict the order in which threads will be launched to perform an operation, or which thread the scheduler will launch next. The scheduler's decisions are aimed primarily at reducing application latency, especially when the application runs on a processor with fewer cores than there are threads. If a thread blocks, because it needs data that is not in the cache or because it must perform an I/O request, the scheduler suspends it and starts a thread that is ready to run.

Data races are a direct result of this uncertainty in thread scheduling. It can be a mistake to assume that one thread will change the value of a shared variable before another thread reads that value. With luck, the execution order of threads on a particular platform stays the same across all launches of the application. But the smallest change in the state of the system (for example, the layout of data on the hard disk, the speed of memory, or even a deviation of the AC supply from its nominal frequency) can provoke a different thread execution order. Thus, code that works correctly only with a certain sequence of threads is prone to data races and deadlocks.

From the standpoint of performance, it is preferable not to restrict the order of thread execution. A strict execution order should be imposed only when truly necessary, according to a predetermined criterion; in that case the threads are launched in the order dictated by the provided synchronization mechanisms. For example, imagine two friends reading a newspaper spread out on a table. They may read at different speeds and read different articles, and it does not matter who finishes the spread first: in any case, each has to wait for the other before turning the page. There are no restrictions on the time or the order in which they read the articles; they read at their own pace, and synchronization occurs only at the moment the page is turned.

Rule 7. Use thread-local storage. Assign locks to specific data areas as needed

Synchronization inevitably loads the system; it does nothing to speed up obtaining the results of parallel computations, but it ensures their correctness. Synchronization is necessary, but it should not be overused. To minimize it, use thread-local storage or memory areas assigned to specific threads (for example, array elements marked with the identifiers of the corresponding threads).

Temporary variables rarely need to be shared between threads; such variables should be declared or allocated locally in each thread. Variables holding intermediate results of threads should likewise be local to the corresponding threads. Synchronization is required only to combine these intermediate results in a shared memory area. To minimize potential contention, it is preferable to update this shared area as rarely as possible. Explicit multithreading methods provide thread-local storage APIs that preserve the integrity of local data from the start of one multithreaded code segment to the start of the next (or from one call of a multithreaded function to the next execution of the same function).

If thread-local storage is not possible, access to shared resources is synchronized with objects such as locks. Here it is important to assign locks to specific data blocks correctly, which is easiest when the number of locks equals the number of data blocks. A single lock covering multiple memory areas is appropriate only when all those areas are always accessed together in the same critical section of the program code.

What if you need to synchronize access to a large amount of data, say an array of 10,000 elements? A single lock for the whole array is certain to become a bottleneck in the application. But do you really have to create a lock for every element? Even if 32 or 64 parallel threads access the data, that would guard a rather large memory area against conflicts whose probability is only about 1%. Fortunately, there is a golden mean: so-called "modulo locks". With N modulo locks, each one synchronizes access to a 1/N-th part of the shared data area. For example, with two such locks, one guards the even elements of the array and the other the odd ones. A thread accessing an element determines its parity and takes the corresponding lock. The number of modulo locks is chosen taking into account the number of threads and the probability that several threads will access the same memory area simultaneously.
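A minimal sketch of modulo locks in C++; four locks and the array size are this example's own choices:

#include <array>
#include <mutex>
#include <vector>

constexpr size_t N_LOCKS = 4;
// Lock i guards all elements whose index % N_LOCKS == i.
std::array<std::mutex, N_LOCKS> locks;
std::vector<int> data(10000, 0);

void update(size_t idx, int delta) {
    // Take only the lock covering this element's residue class.
    std::lock_guard<std::mutex> lock(locks[idx % N_LOCKS]);
    data[idx] += delta;
}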

Note that two different locking mechanisms must never be used to synchronize access to the same memory area. Recall Segal's law: "A man with a watch knows what time it is. A man with two watches is never sure." Suppose two different locks control access to a variable, with one segment of code using the first lock and another segment using the second. Then the threads executing those segments will race on the shared data they access simultaneously.

Rule 8. Change the software algorithm if required to implement multithreading

The criterion for evaluating the performance of applications, sequential and parallel alike, is execution time, and the asymptotic order of growth is a suitable estimate for an algorithm. This theoretical metric is almost always useful for evaluating an application's performance: all other things being equal, an application whose running time grows as O(n log n) (quicksort) will run faster than one whose running time grows as O(n^2) (selection sort), even though both produce the same result.

The better the asymptotic order of execution, the faster the parallel application runs. However, even the most efficient sequential algorithm cannot always be divided into parallel threads. If a program's hot spot is too difficult to parallelize and there is no way to thread at a higher level of its call stack, you should consider switching to a different sequential algorithm that is easier to parallelize than the original. Of course, there are other ways as well to prepare code for threading.

As an illustration of the last statement, consider multiplying two square matrices. Strassen's algorithm has one of the best asymptotic orders, O(n^2.81), which is much better than the O(n^3) of the ordinary triple-nested-loop algorithm. In Strassen's algorithm each matrix is divided into four submatrices, after which seven recursive multiplications of n/2 × n/2 submatrices are performed. To parallelize the recursive calls, one could create a new thread for them that sequentially performs the seven independent submatrix multiplications until the submatrices reach a specified size. In that case the number of threads grows exponentially, and the amount of work done by each newly created thread shrinks as the submatrices get smaller. Another option is to organize a pool of seven threads working simultaneously, each performing one submatrix multiplication; when the pool finishes, the Strassen method is called recursively to multiply the submatrices (as in the sequential version of the code). If the system running such a program has more than eight processor cores, some of them will be idle.

The matrix multiplication algorithm based on the triple nested loop is much easier to parallelize. Data decomposition is applied: the matrices are divided into rows, columns, or submatrices, and each thread performs certain calculations. Such an algorithm is implemented with OpenMP pragmas inserted at some level of the loop, or by explicitly organizing the threads that divide the matrices. Implementing this simpler sequential algorithm requires far fewer changes to the program code than implementing a multithreaded Strassen algorithm.

So, now you know eight simple rules for effectively converting sequential code to parallel. By following these guidelines, you will be able to create multithreaded solutions significantly faster, with increased reliability, optimal performance, and fewer bottlenecks.


Andrey Kolesov

Before we start examining the principles of creating multithreaded applications for the Microsoft .NET Framework, a caveat: although all the examples are given in Visual Basic .NET, the methodology is essentially the same for all programming languages that support .NET, including C#. VB was chosen to demonstrate the technology of building multithreaded applications primarily because previous versions of this tool provided no such capability.

Beware: Visual Basic .NET can do THIS too!

As you know, Visual Basic (up to and including version 6.0) never allowed the creation of multithreaded software components (EXE, ActiveX DLL, and OCX). Recall that the COM architecture includes three threading models: Single Thread, Single-Threaded Apartment (STA), and Multi-Threaded Apartment (MTA). VB 6.0 can create programs of the first two types. The STA variant provides a pseudo-multithreaded mode: several threads really do run in parallel, but the code of each is protected from outside access (in particular, threads cannot use shared resources).

Visual Basic .NET can now implement free threading in its native form. More precisely, in .NET this mode is supported at the level of the common class library and the Common Language Runtime, so VB.NET gained access to these capabilities along with the other .NET programming languages.

At one time the VB developer community, while voicing dissatisfaction with many upcoming changes in the language, greeted with great approval the news that the new version of the tool would allow the creation of multithreaded programs (see "Waiting for Visual Studio .NET", BYTE/Russia No. 1/2001). However, many experts gave more restrained assessments of this innovation. Here, for example, is the opinion of Dan Appleman, a well-known developer and author of numerous books for VB programmers: "I am afraid of multithreading in VB.NET due to human rather than technological factors... because VB programmers usually do not have experience designing and debugging multithreaded applications."

Indeed, like other low-level programming tools (for example, the Win API), free threading on the one hand provides more opportunities for building high-performance, scalable solutions, and on the other hand imposes higher demands on developer qualifications. The problem is aggravated by the fact that finding errors in a multithreaded application is very difficult: they usually appear randomly, as a result of a particular interleaving of parallel computations, and often such a situation simply cannot be reproduced. That is why traditional debugging by repeatedly running the program usually does not help here. The only way to use multithreading safely is to design the application well, adhering to all the classic principles of "good programming".

The problem with VB programmers is that although many of them are experienced professionals well aware of the pitfalls of multithreading, using VB6 could have dulled their vigilance. When accusing VB of being limited, we sometimes forget that many of those limitations stemmed from its improved safety features, which prevent or eliminate developer errors. For example, VB6 automatically creates a separate copy of all global variables for each thread, thereby preventing possible conflicts between them. In VB.NET such problems fall entirely on the programmer's shoulders. Remember also that using a multithreaded model instead of a single-threaded one does not always improve performance; it may even decrease (even on multiprocessor systems!).

However, none of the above should be taken as advice to stay away from multithreading. You just need a good sense of when such modes are worth using, and an understanding that a more powerful development tool always places higher demands on the programmer's qualifications.

Parallel processing in VB6

Of course, it was possible to organize pseudo-parallel data processing in VB6 as well, but those possibilities were very limited. For example, several years ago I needed to write a procedure that pauses program execution for a specified number of seconds (a corresponding SLEEP statement was available back in Microsoft Basic/DOS). It is easy to implement yourself as the following simple subroutine:

Its performance can be easily verified, for example, by using the following code for handling a button click on a form:

To solve this problem in VB6, you need to uncomment the DoEvents call inside the Do...Loop of the SleepVB procedure; this function transfers control to the operating system and returns the number of open forms in the VB application. But note that displaying the "Another hello!" message box in turn blocks execution of the entire application, including the SleepVB procedure.

By using global variables as flags, you can also make it possible to terminate a running SleepVB procedure early. SleepVB is, in turn, the simplest example of a computational process that completely consumes processor resources. If you perform useful calculations instead of spinning in an empty loop, keep in mind that a DoEvents call takes quite a long time, so it should be made at fairly large intervals.

To see how limited VB6's support for parallel computing is, replace the DoEvents call with output to a label:

Label1.Caption = Timer

In this case, not only will the Command2 button fail to respond, but even the content of the label will not change during the five seconds.

For another experiment, add a wait call to the code for Command2 (this can be done since the SleepVB procedure is reentrant):

Private Sub Command2_Click()
    Call SleepVB(5)
    MsgBox "Another hello!"
End Sub

Next, launch the application and click Command1, then after 2-3 seconds click Command2. The first message to appear is "Another hello!", although its process was started later. This is because the DoEvents function only checks for events on the visual elements, not for the presence of other computational threads. The VB application actually runs in a single thread, so control returned to the event procedure that was started last.

.NET Thread Control

Building multithreaded .NET applications relies on a group of .NET Framework base classes described in the System.Threading namespace. The key role belongs to the Thread class, through which almost all thread-management operations are performed. From this point on, everything said about working with threads applies to all .NET programming tools, including C#.

For a first acquaintance with the creation of parallel streams, we will create a Windows application with a form on which we will place the ButtonStart and ButtonAbort buttons and write the following code:

I would like to draw attention to three things right away. First, the Imports keyword makes it possible to refer to the classes described in a namespace by their short names. I deliberately also used Imports to define a short alias for a long namespace name (VB = Microsoft.VisualBasic) that can then be used in the program text; this makes it immediately clear which namespace the Timer object belongs to.

Second, I used #Region directives to visually separate the code I wrote from the code generated automatically by the form designer (the latter is not shown here).

Third, the declarations of the input parameters of event procedures have been deliberately omitted (this will occasionally be done below as well), so as not to distract from what matters here.

Start the application and click ButtonStart. The loop that waits for the specified time interval has started, this time (unlike the VB6 example) in an independent thread. This is easy to verify: all the visual elements of the form remain responsive. For example, clicking ButtonAbort aborts the procedure via the Abort method (whereas closing the form with the system Close button does not abort it!). To visualize the dynamics of the process, place a label on the form and add output of the current time to the waiting loop of the SleepVBNET procedure:

Label1.Text = _
    "Current time = " & VB.TimeOfDay

Execution of the SleepVBNET procedure (which in this case is already a method of the new object) will continue even if you add to the ButtonStart code a message box announcing the start of calculations after the thread has started (Fig. 1).

A more complex option: a thread as a class

For further experiments with threads, let's create a new VB console application consisting of a regular code module with a Main procedure (which starts executing when the application starts) and a WorkerThreadClass class module:
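The listing is again missing here. Judging by the Main procedure shown later (from which the names Start, Finish, ThreadName and SymThread are taken) and the behavior described below, it might have looked roughly like this sketch; the loop bodies in particular are my assumption:

Imports System.Threading

Module MainModule
    Sub Main()
        Dim oWorker1 As New WorkerThreadClass()
        Dim Thread1 As New Thread(AddressOf oWorker1.WorkerThread)
        ' pass the initial data:
        oWorker1.Start = 1
        oWorker1.Finish = 10
        oWorker1.ThreadName = "Count 1"
        oWorker1.SymThread = "."
        Thread1.Start()
        MsgBox("First thread started")
    End Sub
End Module

Public Class WorkerThreadClass
    Public Start, Finish As Integer
    Public ThreadName, SymThread As String

    ' model of a computational process: prints a scrolling
    ' line of characters while doing some arithmetic
    Public Sub WorkerThread()
        Dim i, j As Integer
        Dim a As Double
        For i = Start To Finish
            For j = 1 To 5000000
                a = Math.Sqrt(j) + a   ' add or remove operations to taste
            Next
            Console.Write(SymThread)
        Next
        Console.WriteLine()
        Console.WriteLine(ThreadName & " finished")
    End Sub
End Class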

Let's launch the created application. A console window will appear, in which you will see a scrolling line of characters showing the model of the running computational process (WorkerThread). Then a message box will appear, issued by the calling process (Main), and finally we will see the picture shown in Fig. 2 (if you are not satisfied with the execution speed of the modeled process, then remove or add some arithmetic operations with the "a" variable in the WorkerThread procedure).

Please note: the message box "First thread started" appears with a noticeable delay after the WorkerThread process starts (in the form-based example of the previous section such a message would appear almost instantly after clicking ButtonStart). This is most likely because, when working with a form, event procedures take precedence over the thread being started, whereas in a console application all procedures have the same priority. We will discuss priorities later; for now, let's give the calling thread (Main) the highest priority:

Thread.CurrentThread.Priority = _
    ThreadPriority.Highest
Thread1.Start()

Now the window appears almost immediately. As you can see, there are two ways to obtain an instance of the Thread object. So far we have used the first: creating a new Thread object (Thread1) and working with it. The second is to obtain the Thread object of the currently running thread via the static CurrentThread property. This is how the Main procedure raised its own priority, but it could do the same for any other thread, for example:

Thread1.Priority = ThreadPriority.Lowest
Thread1.Start()

To show the possibilities of managing a running process, add the following lines of code at the end of the Main procedure:
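The added lines did not survive in this copy; reconstructing from the behavior described below, they might have looked like this (a sketch, not the author's original code):

Thread.Sleep(2000)            ' let the worker run for two seconds
Thread1.Suspend()             ' freeze the worker from the main thread
MsgBox("Thread is suspended")
Thread1.Resume()              ' let the worker continue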

Now start the application, working the mouse as you go (hopefully you have tuned the delay in WorkerThread so that the process is neither too fast nor too slow).

First, "Process 1" will start in the console window, and the message "First thread started" appears. "Process 1" is running, and you quickly click the OK button in the message box.

Further - "Process 1" continues, but after two seconds the message "Thread is suspended" appears. Process 1 froze. Click OK on the message box: Process 1 continued and completed successfully.

In this snippet we used the Sleep method to suspend the current thread. Note that Sleep is a static method and applies only to the current thread, never to an arbitrary Thread instance. The language syntax lets you write Thread1.Sleep or Thread.Sleep, but in either case it is the current thread that is put to sleep.

The Sleep method can also be called with an argument of 0: in this case the current thread releases the unused remainder of its allotted time slice.

Another interesting use of Sleep is with the Timeout.Infinite value: the thread is suspended indefinitely, until another thread wakes it with the Thread.Interrupt method.
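A minimal sketch of these two Sleep variants (Thread.Sleep, Timeout.Infinite and Interrupt are real members of System.Threading; the scenario is invented for illustration, and note that Interrupt wakes the sleeper by raising a ThreadInterruptedException in it):

' yield the rest of the current time slice
Thread.Sleep(0)

' sleep until some other thread calls Interrupt on this one
Try
    Thread.Sleep(Timeout.Infinite)
Catch ex As ThreadInterruptedException
    Console.WriteLine("Woken up by Interrupt")
End Try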

To suspend another thread from the current one (without blocking the current thread itself), call the Suspend method on that thread's Thread object; execution can later be continued with the Resume method, which is exactly what the code above does.

A little about thread synchronization

Thread synchronization is one of the main concerns when writing multithreaded applications, and the System.Threading namespace provides a wide range of tools for it. For now we will look only at the Thread.Join method, which lets you wait for a thread to finish. To see how it works, replace the last lines of Main with this code:
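Once more the original listing is missing; a minimal sketch of what it probably contained:

Thread1.Join()   ' block Main until the worker thread finishes
MsgBox("Process 1 has ended")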

Thread Priority Management

Processor time slices are allocated among threads according to priorities, which are set via the Thread.Priority property. Threads created at run time can take one of five values: Highest, AboveNormal, Normal (the default), BelowNormal and Lowest. To see how priorities affect thread execution speed, let's write the following code for the Main procedure:

Sub Main () "description of the first process Dim Thread1 As Thread Dim oWorker1 As New WorkerThreadClass () Thread1 = New Thread (AddressOf _ oWorker1.WorkerThread)" Thread1.Priority = _ "ThreadPriority.BelowNormal" transfer the initial data: oWorker1.Start = 1 oWorker1.Finish = 10 oWorker1.ThreadName = "Countdown 1" oWorker1.SymThread = "." "description of the second process Dim Thread2 As Thread Dim oWorker2 As New WorkerThreadClass () Thread2 = New Thread (AddressOf _ oWorker2.WorkerThread)" transfer the initial data: oWorker2.Start = 11 oWorker2.Finish = 20 oWorker2.ThreadName = "Count 2" oWorker .SymThread = "*" "" starting a race Thread.CurrentThread.Priority = _ ThreadPriority.Highest Thread1.Start () Thread2.Start () "Waiting for the processes to finish Thread1.Join () Thread2.Join () MsgBox (" Both processes ended ") End Sub

Note that a single class is used here to create both threads. Launch the application and watch the dynamics of the two threads executing (Fig. 3). You can see that on the whole they run at the same speed, with the first slightly ahead because it was started earlier.

Now, before starting the first thread, set its priority one level lower:

Thread1.Priority = _
    ThreadPriority.BelowNormal

The picture changes dramatically: the second thread takes almost all the time away from the first (Fig. 4).

Note also the use of the Join method. With it we implement a fairly common form of thread synchronization, in which the main program waits for several parallel computations to complete.

Conclusion

We have only touched on the basics of developing multithreaded .NET applications. One of the most difficult and practically important topics is thread synchronization. Besides the Thread object described in this article (it has many methods and properties we did not consider here), a very important role in thread management is played by the Monitor and Mutex classes, as well as the lock (C#) and SyncLock (VB.NET) statements.

A more detailed treatment of this technology is given in separate chapters of the books [1] and [2], from which I would like to quote a few passages (with which I completely agree) as a very short summary on the topic of multithreading in .NET.

"If you're a beginner, you might be surprised to find that the overhead of creating and dispatching threads can make a single-threaded application run faster ... So always try to test both single-threaded and multi-threaded prototypes."

"You have to be careful about your multithreading design and tightly control access to shared objects and variables."

"Do not consider multithreading as the default approach."

"I asked an audience of experienced VB programmers if they would get free threading in a future version of VB. Almost everyone raised their hands. Then I asked who knows what he was doing. This time only a few people raised their hands. and there were knowing smiles on their faces. "

"If you are not intimidated by the difficulties of designing multithreaded applications, when used correctly, multithreading can dramatically improve application performance."

For my part, I would add that the technology for creating multithreaded .NET applications (like many other .NET technologies) is on the whole practically independent of the language used. So I advise developers to study the various books and articles regardless of which programming language the author chose to demonstrate a particular technique.

Literature:

  1. Dan Appleman. Transition to VB.NET: Strategies, Concepts, Code. Trans. from English. St. Petersburg: Piter, 2002. 464 pp., ill.
  2. Tom Archer. C# Basics. The Latest Technologies. Trans. from English. Moscow: Russian Edition, 2001. 448 pp., ill.

Multitasking and multithreading

Let's start with this simple statement: 32-bit Windows operating systems support multitasking (multiprocessing) and multithreading modes of data processing. It is possible to discuss how well they do it, but that is another question.

Multitasking is a mode of operation where a computer can perform several tasks at the same time, in parallel. It is clear that if a computer has one processor, then we are talking about pseudo-parallelism, when the OS, according to some rules, can quickly switch between different tasks. A task is a program or part of a program (application) that performs some logical action and is the unit for which the OS allocates resources. In a somewhat simplified form, we can assume that in Windows a task is each software component implemented as a separate executable module (EXE, DLL). For Windows, the concept of "task" has the same meaning as "process", which, in particular, means the execution of program code strictly in the address space allocated for it.

There are two main types of multitasking: cooperative and preemptive. The first, implemented in earlier versions of Windows, switches between tasks only when the active task itself calls into the OS (for example, for I/O). Each task is thus responsible for returning control to the OS, and if a task failed to do so (for example, got stuck in a loop), this quite often froze the whole computer.

Preemptive multitasking is a mode in which the OS itself is responsible for giving each thread its due time slice, after which it automatically interrupts the thread (if there are requests from other tasks) and decides what to run next. This mode used to be called "time-sharing".

What is a thread? A thread is an autonomous computational process allocated not at the OS level but within a task. The fundamental difference between a thread and a "process-task" is that all the threads of a task execute in a single address space, that is, they can work with shared memory. Herein lie both their advantage (parallel data processing) and their disadvantage (a threat to the program's reliability). Keep in mind that with multitasking the OS bears primary responsibility for protecting applications from one another, while with multithreading that responsibility falls on the developer.

Note that multitasking on single-processor systems can increase the overall throughput of the system (though not always: as the number of switches grows, so does the share of resources consumed by the OS itself). But the execution time of any specific task always increases, if only slightly, because of the extra work the OS performs.

If the processor is heavily loaded with tasks (with minimal downtime for I / O, for example, in the case of solving purely mathematical problems), a real overall performance increase is achieved only when using multiprocessor systems. Such systems allow different parallelization models - at the task level (each task can occupy only one processor, while threads are executed only in pseudo-parallelism) or at the thread level (when one task can occupy several processors with its threads).

Here we may also recall that in the operation of powerful time-shared computing systems (whose ancestor was the IBM System/360 family of the late 1960s), one of the most pressing tasks was choosing the optimal multitasking control scheme, including dynamically, taking various parameters into account. In principle, multitasking control is a function of the operating system, but the effectiveness of any particular scheme is directly tied to the architecture of the computer as a whole, and of the processor in particular. For example, the same high-performance IBM System/360 worked well in time-shared systems for business tasks, yet was completely unsuitable for "real-time" problems. In that area, significantly cheaper and simpler minicomputers such as the DEC PDP-11/20 were the clear leaders at the time.

What topic raises the most questions and difficulties for beginners? When I asked my teacher and Java programmer Alexander Pryakhin about this, he immediately replied: “Multithreading”. Thanks to him for the idea and help in preparing this article!

We will look into the inner world of the application and its processes, figure out what the essence of multithreading is, when it is useful, and how to implement it - using Java as an example. If you’re learning a different OOP language, don’t worry: the basic principles are the same.

About threads and where they come from

To understand multithreading, let's first understand what a process is. A process is a piece of virtual memory and resources that the OS allocates to run a program. If you open several instances of the same application, the system will allocate a process for each. In modern browsers, a separate process can be responsible for each tab.

You have probably come across the Windows "Task Manager" (in Linux it is "System Monitor") and you know that unnecessary running processes load the system, and the most "heavy" of them often freeze, so they have to be terminated forcibly.

But users love multitasking: nothing pleases them more than opening a dozen windows and jumping back and forth between them. Hence a dilemma: applications must run simultaneously, yet the load on the system must stay low enough that nothing slows down. When the hardware cannot keep up with its owners' needs, the problem has to be solved at the software level.

We want the processor to execute more instructions and process more data per unit of time, that is, to fit more executed code into each time slice. Think of a unit of code execution as an object: that is a thread.

A complex job is easier to approach if you break it into several simple ones. The same applies to memory: a "heavy" process is divided into threads that take up fewer resources and deliver their code to the processor more readily (how exactly, see below).

Each application has at least one process, and each process has at least one thread, which is called the main thread and from which, if necessary, new ones are launched.

Difference between threads and processes

    Threads use the memory allocated to their process, while processes each require their own memory space. Therefore, threads are created and terminated faster: the system does not need to allocate a new address space for them each time and then release it.

    Processes each work with their own data and can exchange anything only through the mechanism of interprocess communication. Threads access each other's data and resources directly: what one changes is immediately visible to all. A thread can control its "fellow" threads within the process, while a process controls only its own child processes. For these reasons, switching between threads is faster and communication between them is simpler.

What is the conclusion from this? If you need to process a large amount of data as quickly as possible, break it up into chunks that can be processed by separate threads, and then piece the result together. It's better than spawning resource-hungry processes.

But then why does a popular application like Firefox take the route of creating multiple processes? Because for a browser it is isolated tabs that make the work reliable and flexible: if something goes wrong in one process, the whole program does not have to be terminated, and at least part of the data can be saved.

What is multithreading

So we come to the main point. Multithreading is when the application's process is split into threads that are processed in parallel, within the same unit of time, by the processor.

The computational load is distributed between two or more cores, so that the interface and other program components do not slow down each other's work.

Multithreaded applications can also run on single-core processors, but then the threads execute in turn: the first runs and its state is saved, the second is allowed to run and is saved in turn, then control returns to the first or passes to a third, and so on.

Busy people complain that they only have two hands. Processes and programs can have as many hands as needed to complete the task as quickly as possible.

Wait for a signal: synchronization in multi-threaded applications

Imagine that several threads try to change the same data area at the same time. Whose changes will ultimately be accepted, and whose will be discarded? To avoid confusion over shared resources, threads need to coordinate their actions. To do this they exchange information using signals: each thread tells the others what it is doing and what changes to expect. In this way the threads' knowledge of the current state of resources is synchronized.

Basic Synchronization Tools

Mutual exclusion (abbreviated as mutex) - a "flag" handed to the thread that is currently allowed to work with shared resources. It blocks other threads' access to the occupied memory area. There can be several mutexes in an application, and they can be shared between processes. There is a catch: a mutex forces the application to go to the operating system kernel every time, which is costly.

Semaphore - allows you to limit the number of threads that may access a resource at a given moment. This reduces the load on the processor when executing code with bottlenecks. The problem is that the optimal number of threads depends on the user's machine.

Event - you define a condition upon whose occurrence control is transferred to the desired thread. Threads exchange event data in order to develop and logically continue each other's actions: one receives the data, another checks its correctness, a third saves it to the hard disk. Events differ in the way they are reset. If several threads must be notified of an event, you have to call the reset function manually to stop the signal. If there is only one target thread, you can create an auto-reset event, which stops the signal by itself once it reaches the thread. Events can be queued for flexible flow control.

Critical section - a more complex mechanism that combines a spin counter and a semaphore. The counter lets the thread spin for the desired time before the semaphore comes into play. The advantage is that the kernel is involved only if the section is busy and the semaphore has to be engaged; the rest of the time the thread runs in user mode. Alas, a critical section can be used only within a single process.
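As a bridge back to the .NET half of this article: the same primitives appear in the System.Threading namespace (Mutex, Semaphore, AutoResetEvent, and the Monitor class behind VB.NET's SyncLock). A minimal VB.NET sketch of mutual exclusion, with an invented scenario, just to make the idea concrete:

Imports System.Threading

Module SyncDemo
    Private counter As Integer = 0
    Private lockObj As New Object()   ' guards counter

    Sub Main()
        Dim t1 As New Thread(AddressOf Work)
        Dim t2 As New Thread(AddressOf Work)
        t1.Start() : t2.Start()
        t1.Join() : t2.Join()
        ' always 200000; without SyncLock, some updates would be lost
        Console.WriteLine("Counter = " & counter)
    End Sub

    Sub Work()
        For i As Integer = 1 To 100000
            SyncLock lockObj   ' only one thread at a time in here
                counter += 1
            End SyncLock
        Next
    End Sub
End Module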

How to implement multithreading in Java

The Thread class is responsible for working with threads in Java. Creating a new thread to execute a task means creating an instance of the Thread class and associating it with the code you want. This can be done in two ways:

    subclass Thread;

    implement the Runnable interface in your class, and then pass the class instances to the Thread constructor.

For now we will not touch on deadlocks, situations where threads block each other's work and freeze; we will leave that for the next article.

Java multithreading example: ping pong with mutexes

If you think something terrible is about to happen, exhale. We will approach synchronization objects almost playfully: two threads will toss a ball to each other under the protection of a mutex. In essence, though, you will see a real application in which only one thread at a time may work with shared data.

First, let's create a class that inherits from the Thread class we already know, and write a kickBall method:

public class PingPongThread extends Thread {
    PingPongThread(String name) {
        this.setName(name);   // override the thread name
    }

    @Override
    public void run() {
        Ball ball = Ball.getBall();
        while (ball.isInGame()) {
            kickBall(ball);
        }
    }

    private void kickBall(Ball ball) {
        if (!ball.getSide().equals(getName())) {
            ball.kick(getName());
        }
    }
}

Now let's take care of the ball. It will not be an ordinary ball, but one with a memory: it can tell who hit it, from which side, and how many times. To do this we use a mutex, which will collect information about the work of each thread and thereby let the isolated threads communicate with each other. After the 15th hit we will take the ball out of the game, so as not to injure it too badly.

public class Ball {
    private int kicks = 0;
    private static Ball instance = new Ball();
    private String side = "";

    private Ball() { }

    static Ball getBall() {
        return instance;
    }

    synchronized void kick(String playername) {
        kicks++;
        side = playername;
        System.out.println(kicks + " " + side);
    }

    String getSide() {
        return side;
    }

    boolean isInGame() {
        return (kicks < 15);
    }
}

And now two player threads are entering the scene. Let's call them, without further ado, Ping and Pong:

public class PingPongGame {
    PingPongThread player1 = new PingPongThread("Ping");
    PingPongThread player2 = new PingPongThread("Pong");
    Ball ball;

    PingPongGame() {
        ball = Ball.getBall();
    }

    void startGame() throws InterruptedException {
        player1.start();
        player2.start();
    }
}

"Full stadium of people - time to start the match." We will officially announce the opening of the meeting - in the main class of the application:

public class PingPong {
    public static void main(String[] args) throws InterruptedException {
        PingPongGame game = new PingPongGame();
        game.startGame();
    }
}

As you can see, there is nothing frightening here. This is only an introduction to multithreading, but you already know how it works and can experiment: limit the game's duration by time rather than by the number of hits, for example. We will return to multithreading later and look at the java.util.concurrent package, the Akka library, and the volatile mechanism. We will also talk about implementing multithreading in Python.
