What are neural networks, what tasks can they solve, and who are the major players in the neural network market

Artificial intelligence, neural networks, machine learning - what do all these popular concepts really mean? To most uninitiated people, myself included, they have always seemed like something fantastic, but in fact their essence lies on the surface. I have long had the idea of writing about artificial neural networks in simple language: to learn for myself, and to tell others, what this technology is, how it works, and what its history and prospects are. In this article I have tried not to wander into the jungle, but to describe this promising direction in the world of high technology simply and popularly.

A bit of history

The concept of artificial neural networks (ANN) first arose in attempts to simulate the processes of the brain. The first major breakthrough in this area was the creation of the McCulloch-Pitts neural network model in 1943. The scientists developed the first model of an artificial neuron and proposed a design for a network of these elements that could perform logical operations. Most importantly, they argued that such a network is capable of learning.

The next important step was Donald Hebb's development in 1949 of the first learning algorithm for ANNs, which remained fundamental for the next several decades. In 1958, Frank Rosenblatt developed the perceptron, a system that mimics the processes of the brain. At the time, the technology had no analogues, and it is still fundamental to neural networks. In 1986, almost simultaneously and independently of each other, American and Soviet scientists substantially improved the fundamental method for training the multilayer perceptron. In 2007, neural networks underwent a rebirth: the British computer scientist Geoffrey Hinton developed deep learning algorithms for multilayer neural networks, which are now used, for example, in self-driving cars.

Briefly about the essentials

In the general sense of the word, neural networks are mathematical models that work on the principle of the networks of nerve cells in an animal organism. ANNs can be implemented in both software and hardware. For ease of understanding, a neuron can be imagined as a cell with many input holes and one output. The calculation algorithm determines how the numerous incoming signals are combined into an outgoing one. Real values are fed to each neuron input and then propagated along the interneuron connections (synapses). A synapse has one parameter - weight - due to which the input information changes as it moves from one neuron to another. The easiest way to picture how this works is colour mixing: if blue, green and red neurons have different weights, the information of the neuron with the greater weight will dominate in the next neuron.

The neural network itself is a system of many such neurons (processors). Individually these processors are quite simple (much simpler than a personal computer), but connected into a large system, the neurons are capable of performing very complex tasks.

Depending on the field of application, a neural network can be interpreted in different ways. From the point of view of machine learning, an ANN is a pattern recognition method. From a mathematical point of view, it is a multi-parameter problem. From the point of view of cybernetics, it is a model of adaptive control in robotics. For artificial intelligence, an ANN is a fundamental building block for modelling natural intelligence with computational algorithms.

The main advantage of neural networks over conventional algorithms is their ability to learn. In the general sense of the word, learning consists in finding the correct coupling coefficients between neurons, as well as in generalizing data and identifying complex relationships between input and output signals. Successful training of a neural network means that the system will be able to produce the correct result from data that was not present in the training sample.

Present situation

However promising this technology may be, ANNs are still very far from the capabilities of the human brain and human thinking. Nevertheless, neural networks are already being used in many areas of human activity. So far they cannot make highly intellectual decisions, but they can replace a person where one was previously required. Among the numerous areas of ANN application are self-learning control of production processes, unmanned vehicles, image recognition, intelligent security systems, robotics, quality monitoring, voice interfaces, analytics systems and much more. This widespread use of neural networks is due, among other things, to the emergence of various ways to accelerate ANN training.

Today the market for neural networks is huge - billions and billions of dollars. As practice shows, most neural network technologies around the world differ little from each other. However, using neural networks is a very expensive activity, which in most cases only large companies can afford. Developing, training and testing neural networks requires large computing power, and the big players in the IT market obviously have plenty of it. Among the main companies leading development in this area are Google DeepMind, Microsoft Research, IBM, Facebook and Baidu.

Of course, all this is good: neural networks are developing, the market is growing, but so far the main task has not been solved. Humanity has failed to create a technology that is even close in capabilities to the human brain. Let's take a look at the main differences between the human brain and artificial neural networks.

Why are neural networks still far from the human brain?

The most important difference, which fundamentally changes the principle and efficiency of the system, is the way signals are transmitted in artificial neural networks versus a biological network of neurons. In an ANN, neurons transmit real values, that is, numbers. In the human brain, impulses of fixed amplitude are transmitted, and these impulses are almost instantaneous. This gives the human network of neurons a number of advantages.

First, communication lines in the brain are much more efficient and economical than in ANNs. Second, the impulse scheme makes the technology simple to implement: it is enough to use analog circuits instead of complex computing mechanisms. Finally, impulse networks are resistant to interference, whereas real-valued signals are affected by noise, which increases the chance of errors.

Conclusion

Of course, the last decade has seen a real boom in the development of neural networks, primarily because ANN training has become much faster and easier. So-called "pre-trained" neural networks are also being actively developed, and they can significantly speed up the adoption of the technology. And although it is too early to say whether neural networks will someday fully reproduce the capabilities of the human brain, the likelihood that ANNs will replace humans in a quarter of existing professions within the next decade looks ever more plausible.

For those who want to know more

  • The Great Neural War: What Google Is Really Up to
  • How cognitive computers can change our future

The author of a previous article showed how easy it is to create a neural network for recognizing pictures. But there is one "but": what he described is not a neural network. Before his next article, I want to show how to solve the same problem using a Kohonen neural network.

So, we will recognize numbers written in white on black, such as these:

The pictures are 45 by 45 pixels, which means our neural network will have 45 * 45 = 2025 inputs.
For simplicity, we will only recognize the digits 0 through 5, so we will have 6 neurons - one for each answer.

The structure of our neural network:

Each connection from a network input to a neuron has its own weight. An impulse passing through a connection changes: impulse = impulse * connection_weight.
A neuron receives impulses from all inputs and simply sums them. The neuron with the highest total impulse wins. Simple enough - let's implement it!

Classes for representing network elements (C#):
// A network input
public class Input
{
    // Connections to neurons
    public Link[] OutgoingLinks;
}

// Connects an input to a neuron
public class Link
{
    // The target neuron
    public Neuron Neuron;
    // Link weight
    public double Weight;
}

public class Neuron
{
    // All incoming links of the neuron
    public Link[] IncomingLinks;
    // The charge accumulated by the neuron
    public double Power { get; set; }
}

Creating and initializing the network is a boring business; anyone interested can see the attached source. I will dwell only on the fact that the color of a pixel is a number from 0 to 255, where 0 is black, 255 is white, and the values in between are shades of gray.

The state of the KohonenNetwork class is an array of inputs and an array of neurons:
public class KohonenNetwork
{
    private readonly Input[] _inputs;
    private readonly Neuron[] _neurons;
    ...
}

Let's assume that our network is already trained. Then, to find out what is shown in a picture, we call the Handle method, where everything is multiplied, summed up, and the maximum is found:
// Pass a vector through the neural network
public int Handle(int[] input)
{
    for (var i = 0; i < _inputs.Length; i++)
    {
        var inputNeuron = _inputs[i];
        foreach (var outgoingLink in inputNeuron.OutgoingLinks)
        {
            outgoingLink.Neuron.Power += outgoingLink.Weight * input[i];
        }
    }
    var maxIndex = 0;
    for (var i = 1; i < _neurons.Length; i++)
    {
        if (_neurons[i].Power > _neurons[maxIndex].Power)
            maxIndex = i;
    }
    // reset the accumulated charge of all neurons:
    foreach (var outputNeuron in _neurons)
    {
        outputNeuron.Power = 0;
    }
    return maxIndex;
}

But before asking the network anything, it needs to be trained. For training, we present pictures and indicate what is drawn on them:


Learning is a change in the weights of the connections:
public void Study(int[] input, int correctAnswer)
{
    // only the neuron corresponding to the correct answer is adjusted
    var neuron = _neurons[correctAnswer];
    for (var i = 0; i < neuron.IncomingLinks.Length; i++)
    {
        var incomingLink = neuron.IncomingLinks[i];
        incomingLink.Weight = incomingLink.Weight + 0.5 * (input[i] - incomingLink.Weight);
    }
}
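To see how the pieces fit together, here is a minimal usage sketch. The KohonenNetwork constructor arguments and the LoadTrainingImages helper are assumptions for illustration; the real creation and loading code is in the attached source.

// Hypothetical construction: 45 * 45 inputs, 6 output neurons
var network = new KohonenNetwork(45 * 45, 6);

// LoadTrainingImages() is a hypothetical helper returning pixel
// arrays paired with the digit they depict
foreach (var example in LoadTrainingImages())
{
    network.Study(example.Pixels, example.Digit); // pull weights toward the example
}

// unknownPixels: the 45 * 45 pixel array of the picture to recognize;
// the index of the winning neuron is the recognized digit
var answer = network.Handle(unknownPixels);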

After training on two fonts, the neural network can distinguish digits from other fonts as well. In particular, it passes a control test on digits like these:
Of course, such a toy is not suitable for recognizing captchas - everything stops working as soon as you move, stretch or rotate the image.
Still, it shows that using neural networks is not so difficult if you start with simple examples.

Today, on every corner, people shout about the benefits of neural networks, yet only a few really understand what they are. If you turn to Wikipedia for an explanation, your head will spin from the towering citadels of scientific terms and definitions built there. If you are far from this science, and the confused, dry language of university textbooks brings only confusion and no ideas, then let's try to understand the problem of neural networks together.

To understand a problem, you need to find its root cause, which lies right on the surface. Remembering Sarah Connor, we realize with a shudder that the pioneers of computer development, Warren McCulloch and Walter Pitts, once pursued the goal of creating the first artificial intelligence.

Neural networks are an electronic prototype of a self-learning system. Like a child, a neural network absorbs information, chews it over, gains experience and learns. In the process of learning, such a network develops, grows and can draw its own conclusions and make its own decisions.

If the human brain consists of neurons, then let us conventionally agree that an electronic neuron is a kind of imaginary box with many input holes and one output. The neuron's internal algorithm determines the order in which the received information is processed, analyzed and transformed into a single useful lump of knowledge. Depending on how well the inputs and outputs work, the whole system either thinks quickly or, conversely, slows down.

Important: Typically, analog information is used in neural networks.

Let us repeat: there can be many input streams of information (scientifically, these connections between the initial information and our "neuron" are called synapses), and they are all different in nature and unequal in significance. For example, a person perceives the surrounding world through the organs of sight, touch and smell, and it is logical that sight is more important than smell. In different life situations we rely on different senses: in complete darkness, touch and hearing come to the fore. By the same analogy, synapses in neural networks have different significance in different situations, which is usually denoted by the connection weight. When writing code, a minimum threshold for passing information is set: if the connection weight is above the given value, the result of the neuron's check is positive (equal to one in the binary system); if below, negative. It is logical that the higher the bar is set, the more accurately the neural network will work, but the longer it will take.

For a neural network to work correctly, you need to spend time training it - this is the main difference from simple programmable algorithms. Like a small child, a neural network needs an initial information base, but if you write the initial code correctly, then the neural network itself will be able not only to make the right choice from the available information, but also to make independent assumptions.

When writing the initial code, you need to explain your actions literally on your fingers. If we work, for example, with images, then at the first stage their size and class matter to us. The first characteristic tells us the number of inputs, while the second helps the neural network itself make sense of the information. Ideally, having loaded the primary data and compared the topology of the classes, the neural network will then be able to classify new information on its own. Say we decide to upload a 3x5 pixel image. Simple arithmetic tells us the number of inputs: 3 * 5 = 15. The classification itself determines the total number of outputs, i.e. neurons. Another example: the neural network needs to recognize the letter "C". The given threshold is full correspondence with the letter; this requires one neuron with the number of inputs equal to the image size.

Let's say the size will be the same 3x5 pixels. By feeding the program various pictures of letters or numbers, we will teach it to determine the image of the symbol we need.

As in any teaching, the student should be punished for a wrong answer, while for a correct one we give nothing. If the program perceives a correct answer as False, we increase the input weight at each synapse. If, on the contrary, the program considers an incorrect result to be True, we decrease the weight at each input to the neuron. It is more logical to start learning by acquainting the network with the symbol we need. The first result will be incorrect, but after the weights are corrected, the program will work correctly in further work. This example of an algorithm for constructing a neural network is called a perceptron.
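A minimal C# sketch of this punishment/reward rule, assuming a single neuron with binary inputs, a fixed threshold, and a unit learning step (the names and the step size are illustrative, not from the original):

// A sketch of the perceptron update rule described above,
// assuming one neuron, binary inputs and a fixed threshold.
public class PerceptronNeuron
{
    public double[] Weights;      // one weight per input (synapse)
    public double Threshold;      // the minimum bar for a positive answer

    public bool Check(int[] input)
    {
        double sum = 0;
        for (var i = 0; i < input.Length; i++)
            sum += Weights[i] * input[i];
        return sum >= Threshold;  // True: "this is my symbol"
    }

    // correctAnswer: what the teacher says the answer should be
    public void Punish(int[] input, bool correctAnswer)
    {
        var answer = Check(input);
        if (answer == correctAnswer) return;     // correct answer: change nothing
        var delta = correctAnswer ? +1.0 : -1.0; // raise or lower the weights
        for (var i = 0; i < input.Length; i++)
            Weights[i] += delta * input[i];      // only active inputs are adjusted
    }
}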


There are also more complex modes of neural network operation, in which incorrect data is fed back, analyzed, and the network draws logical conclusions of its own. An online predictor of the future, for example, is just such a programmed neural network. Such programs can learn with or without a teacher and are called adaptive resonance networks. Their essence is that the neurons already have expectations about what information they want to receive and in what form. Between expectation and reality lies a thin threshold of so-called neuron vigilance, which helps the network classify incoming information correctly and not miss a single pixel. The peculiarity of an AR neural network is that it learns on its own from the very beginning and independently determines the vigilance threshold of its neurons, which in turn affects how it classifies information: the more vigilant the network, the more meticulous it is.

We now have the very basics of what a neural network is, so let's summarize. A neural network is an electronic prototype of human thinking. It consists of electronic neurons and synapses - the streams of information entering and leaving a neuron. Neural networks are programmed to learn either with a teacher (a programmer who uploads the primary information) or independently (making assumptions based on the received information and on expectations defined by the same programmer). Using a neural network, you can build almost any system: from simple pattern recognition in pixel images to psychodiagnostics and economic analytics.

Neural networks

A simple neural network diagram: input elements are marked in green, the output element in yellow.

Artificial neural networks (ANN) are mathematical models, as well as their software or hardware implementations, built on the principle of the organization and functioning of biological neural networks - the networks of nerve cells of a living organism. The concept arose in the study of the processes occurring in the brain during thinking and in attempts to model those processes. The first such brain model was the perceptron. These models later came to be used for practical purposes, usually in forecasting problems.

Neural networks are not programmed in the usual sense of the word - they are trained. The ability to learn is one of the main advantages of neural networks over traditional algorithms. Technically, training consists in finding the coefficients of the connections between neurons. During training, the neural network can identify complex dependencies between input and output data and generalize. This means that, if training succeeds, the network will return correct results for data that was absent from the training sample.

Known Applications

Clustering

Clustering means dividing a set of input signals into classes when neither the number of classes nor their attributes is known in advance. After training, such a network can determine which class an input signal belongs to. The network can also signal that an input signal does not belong to any of the identified classes - a sign of new data absent from the training sample. Thus, such a network can identify new, previously unknown signal classes; the correspondence between the classes identified by the network and the classes existing in the subject area is established by a human. Clustering is performed, for example, by Kohonen neural networks.

Experimental selection of network characteristics

After choosing the overall structure, you need to select the network parameters experimentally. For perceptron-like networks, these are the number of layers, the number of blocks in the hidden layers (for Ward networks), the presence or absence of bypass connections, and the transfer functions of the neurons. When choosing the number of layers and neurons, one should keep in mind that the network's ability to generalize is higher the greater the total number of connections between neurons. On the other hand, the number of connections is bounded from above by the number of records in the training data.

Experimental selection of training parameters

After choosing a specific topology, you need to select the training parameters. This step is especially important for supervised networks. The correct choice of parameters determines not only how quickly the network's answers converge to the correct ones. For example, a low learning rate increases convergence time but sometimes avoids network paralysis. Increasing the momentum can either increase or decrease the convergence time, depending on the shape of the error surface. Given this contradictory influence of the parameters, their values should be chosen experimentally, guided by a training-completion criterion (for example, minimizing the error or limiting the training time).

Training the network itself

During training, the network scans the training sample in a certain order - sequential, random, etc. Some unsupervised networks, such as Hopfield networks, look through the sample only once. Others, such as Kohonen networks and supervised networks, scan the sample many times; one full pass over the sample is called a training epoch. In supervised learning, the initial data set is divided into two parts - the training sample proper and the test data; the split can be arbitrary. The training data is fed to the network for training, while the test data is used to calculate the network error (test data is never used to train the network). Thus, if the error on the test data decreases, the network is indeed generalizing. If the error on the training data continues to decrease while the error on the test data grows, the network has stopped generalizing and is simply "memorizing" the training data. This phenomenon is called overfitting, and in such cases training is usually stopped. Other problems may also appear during training, such as paralysis or the network getting stuck in a local minimum of the error surface. It is impossible to predict in advance which problem will arise, or to give unambiguous recommendations for solving it.
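A sketch of the stopping rule just described. The TrainOneEpoch and ComputeError methods, and the patience counter, are hypothetical stand-ins for whatever routines a concrete network provides:

// A sketch of early stopping on the test-set error.
const int maxEpochs = 1000;
const int patience = 5;     // how many worsening epochs to tolerate
var bestTestError = double.MaxValue;
var epochsWithoutImprovement = 0;

for (var epoch = 0; epoch < maxEpochs; epoch++)
{
    network.TrainOneEpoch(trainingSet);            // hypothetical: one full pass
    var testError = network.ComputeError(testSet); // hypothetical: error on held-out data

    if (testError < bestTestError)
    {
        bestTestError = testError;
        epochsWithoutImprovement = 0;
    }
    else if (++epochsWithoutImprovement >= patience)
    {
        break; // test error no longer improves: the network has begun to memorize
    }
}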

Checking the adequacy of training

Even in the case of seemingly successful training, the network does not always learn exactly what its creator wanted. There is a known case where a network was trained to recognize images of tanks in photographs, but it later turned out that all the tanks had been photographed against the same background. As a result, the network "learned" to recognize that type of terrain instead of learning to recognize tanks. Thus, the network "understands" not what is required of it, but whatever is easiest to generalize.

Classification by type of input information

  • Analog neural networks (use information in the form of real numbers);
  • Binary neural networks (operate with information presented in binary form).

Classification by the nature of learning

  • Supervised learning - the output space of neural network solutions is known;
  • Unsupervised learning - the neural network forms the output space of solutions based on input only. Such networks are called self-organizing;
  • Reinforcement Learning is a system for assigning penalties and rewards from the environment.

Signal transmission time classification

In a number of neural networks, the activation function may depend not only on the connection weights w_ij but also on the time it takes a pulse (signal) to travel through the communication channel, τ_ij. Therefore, in general, the activating (transmitting) function of the connection c_ij from element u_i to element u_j has the form c_ij(t) = w_ij * u_i(t - τ_ij). A synchronous network is one in which the transmission time τ_ij of every connection is either zero or a fixed constant τ. An asynchronous network is one in which the transmission time τ_ij is different for each connection between elements u_i and u_j, but also constant.

Classification by the nature of ties

Feedforward networks

All connections are directed strictly from input neurons to output neurons. Examples of such networks are Rosenblatt's perceptron, the multilayer perceptron, and Ward networks.

Recurrent neural networks

The signal from the output neurons or hidden-layer neurons is partially fed back to the inputs of the input-layer neurons (feedback). The recurrent Hopfield network "filters" the input data by returning to a stable state, which makes it possible to solve data compression problems and build associative memory. Bidirectional networks are a special case of recurrent networks: in them there are connections between layers both from the input layer to the output layer and in the opposite direction. The classic example is Kosko's neural network.

Radial basis functions

Artificial neural networks that use radial basis functions as activation functions (such networks are abbreviated RBF networks). The general form of a radial basis function is

f(x) = φ(||x - c|| / σ), for example f(x) = exp(-||x - c||² / (2σ²)),

where x is the vector of the neuron's input signals, c is the center, σ is the width of the function window, and φ(y) is a decreasing function (most often equal to zero outside a certain segment).
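As a concrete illustration, here is a sketch of the Gaussian variant of such an activation in C# (a minimal example, not code from any RBF package; it uses Math from System):

// A sketch of a Gaussian radial basis activation: the neuron responds
// strongly near its center c and falls off toward zero with distance.
static double GaussianRbf(double[] x, double[] c, double sigma)
{
    double squaredDistance = 0;
    for (var i = 0; i < x.Length; i++)
    {
        var d = x[i] - c[i];
        squaredDistance += d * d;
    }
    return Math.Exp(-squaredDistance / (2 * sigma * sigma));
}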

A radial basis function network is characterized by three features:

1. The only hidden layer

2. Only neurons of the hidden layer have a nonlinear activation function

3. The synaptic weights of the connections between the input and hidden layers are equal to one

For the training procedure, see the literature.

Self-organizing maps

Such networks are competitive neural networks with unsupervised learning that perform visualization and clustering tasks. They are a method of projecting a multidimensional space into a space of lower dimension (most often two-dimensional) and are also used to solve modelling and forecasting problems. They are one variant of Kohonen's neural networks. Self-organizing Kohonen maps are primarily used for visualization and initial ("exploratory") data analysis.

The signal is fed to all neurons of the Kohonen network at once; the weights of the corresponding synapses are interpreted as the coordinates of the node's position, and the output signal is formed on the "winner takes all" principle: only the neuron closest (in terms of synapse weights) to the input signal has a nonzero output. During training, the synapse weights are adjusted so that the lattice nodes "settle" in places of local data concentration, that is, they describe the cluster structure of the data cloud; on the other hand, the connections between neurons correspond to the neighborhood relations between the corresponding clusters in the feature space.

It is convenient to think of such maps as two-dimensional grids of nodes located in multidimensional space. Initially, a self-organizing map is a grid of nodes connected by links. Kohonen considered two options for connecting nodes - in a rectangular and a hexagonal grid - the difference is that in a rectangular grid, each node is connected to 4 neighboring nodes, and in a hexagonal grid - to 6 nearest nodes. For two such grids, the process of constructing the Kohonen network differs only in the place where the neighbors closest to the given node are moved.

The initial embedding of the grid into the data space is arbitrary. The author's SOM_PAK package offers a random initial arrangement of the nodes in space and an arrangement of the nodes in a plane. After that, the nodes begin to move through space according to the following algorithm:

  1. A data point x is selected at random.
  2. The map node closest to x is found (the BMU - Best Matching Unit).
  3. This node is moved a given step toward x. It does not move alone, however: it drags along a certain number of nearby nodes from some neighborhood on the map. Of all the moving nodes, the central node - the one closest to the data point - shifts the most, and the others shift less the further they are from the BMU. Map tuning has two stages: an ordering stage and a fine-tuning stage. At the first stage, large neighborhood values are used and the movement of the nodes is collective - as a result the map "straightens out" and roughly reflects the structure of the data; at the fine-tuning stage the neighborhood radius is 1-2 and the individual positions of the nodes are adjusted. In addition, the displacement magnitude decays uniformly with time: it is large at the beginning of each training stage and close to zero at the end.
  4. The algorithm repeats for a certain number of epochs (clearly, the number of steps can vary greatly depending on the task); a sketch of one such training step follows below.
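A compact C# sketch of one iteration of this algorithm. The node array layout, the Gaussian neighborhood, and the gridDistance helper are illustrative assumptions, not code from SOM_PAK (uses Math and Func from System):

// A sketch of one SOM training step: nodes[] holds the weight vectors,
// gridDistance(i, j) returns the distance between nodes i and j on the map.
static void SomStep(double[][] nodes, double[] x, double learningRate,
                    double radius, Func<int, int, double> gridDistance)
{
    // Steps 1-2: find the Best Matching Unit, the node closest to x in data space
    var bmu = 0;
    var best = double.MaxValue;
    for (var i = 0; i < nodes.Length; i++)
    {
        double d = 0;
        for (var k = 0; k < x.Length; k++)
        {
            var diff = nodes[i][k] - x[k];
            d += diff * diff;
        }
        if (d < best) { best = d; bmu = i; }
    }

    // Step 3: move the BMU and its map neighbors toward x; nodes closer
    // to the BMU on the map move more, distant ones barely move.
    for (var i = 0; i < nodes.Length; i++)
    {
        var g = gridDistance(bmu, i);
        var h = Math.Exp(-(g * g) / (2 * radius * radius));
        for (var k = 0; k < x.Length; k++)
            nodes[i][k] += learningRate * h * (x[k] - nodes[i][k]);
    }
}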

Known network types

  • Hamming network;
  • Neocognitron;
  • Chaotic neural network;
  • Counterpropagation network;
  • Radial basis functions network (RBF-network);
  • Generalized Regression Network;
  • Probabilistic network;
  • Siamese neural network;
  • Adaptive resonance networks.

Differences from machines with von Neumann architecture

A long period of evolution has given the human brain many qualities that are absent in machines with von Neumann architecture:

  • Massive parallelism;
  • Distributed representation of information and computation;
  • The ability to learn and generalize;
  • Adaptivity;
  • The property of contextual information processing;
  • Fault tolerance;
  • Low power consumption.

Neural networks - universal approximators

Neural networks are universal approximating devices and can simulate any continuous automaton with any accuracy. A generalized approximation theorem is proved: with the help of linear operations and a cascade connection, one can obtain a device from an arbitrary nonlinear element that calculates any continuous function with any predetermined accuracy. This means that the nonlinear characteristic of a neuron can be arbitrary: from sigmoidal to an arbitrary wave packet or wavelet, sine or polynomial. The complexity of a particular network may depend on the choice of a nonlinear function, but with any nonlinearity, the network remains a universal approximator and, with the correct choice of structure, can approximate the functioning of any continuous automaton as accurately as desired.

Application examples

Forecasting financial time series

The input data is a year of stock prices. The task is to predict tomorrow's price. The following transformation is carried out: the prices for today, yesterday and the day before yesterday are lined up in a row; the next row is shifted by one day, and so on. On the resulting set, a network with 3 inputs and one output is trained: the output is the price for a date, and the inputs are the prices for that date minus 1 day, minus 2 days and minus 3 days. We feed the trained network the prices for today, yesterday and the day before yesterday and get an answer for tomorrow. It is easy to see that in this case the network simply models the dependence of one parameter on the three previous ones. If it is desirable to take some other parameter into account (for example, the overall index for the industry), it must be added as an input (and included in the examples), the network retrained, and new results obtained. For the most accurate training it is worth using the error backpropagation method, as the most predictable and simplest to implement.
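A sketch of the described sliding-window transformation (the method and array names are illustrative; uses List from System.Collections.Generic):

// Builds training rows: each example pairs three consecutive prices
// with the price that follows them.
static void MakeWindows(double[] prices, List<double[]> inputs, List<double> outputs)
{
    for (var day = 3; day < prices.Length; day++)
    {
        // inputs: price minus 3, minus 2, minus 1 days
        inputs.Add(new[] { prices[day - 3], prices[day - 2], prices[day - 1] });
        // output: the price on the day itself, the answer to learn
        outputs.Add(prices[day]);
    }
}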

Psychodiagnostics

A series of works by M. G. Dorrer and co-authors studies whether psychological intuition can be developed in neural network expert systems. The results offer an approach to revealing the mechanism of the "intuition" that neural networks display when solving psychodiagnostic tasks. The result is an intuitive approach to psychodiagnostics, non-standard for computer techniques, which dispenses with constructing an explicit model of the described reality and thus makes it possible to shorten and simplify work on psychodiagnostic methods.

Chemoinformatics

Neural networks are widely used in chemical and biochemical research. Currently, neural networks are one of the most widespread chemoinformatics methods for finding quantitative structure-property relationships, which is why they are actively used both to predict the physicochemical properties and biological activity of chemical compounds and in the directed design of compounds and materials with predetermined properties, including the development of new drugs.

Notes

  1. McCulloch W.S., Pitts W. A logical calculus of the ideas immanent in nervous activity // In: Automata, ed. C.E. Shannon and J. McCarthy. Moscow: Foreign Literature Publishing House, 1956. pp. 363-384. (Russian translation of the 1943 English article.)
  2. Widrow B. Pattern Recognition and Adaptive Control.
  3. Widrow B., Stearns S. Adaptive Signal Processing. Moscow: Radio i Svyaz, 1989. 440 p.
  4. Werbos P.J. Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA, 1974.
  5. Galushkin A.I. Synthesis of multilayer pattern recognition systems. Moscow: Energiya, 1974.
  6. Rumelhart D.E., Hinton G.E., Williams R.J. Learning Internal Representations by Error Propagation // Parallel Distributed Processing, vol. 1, pp. 318-362. Cambridge, MA: MIT Press, 1986.
  7. Bartsev S.I., Okhonin V.A. Adaptive information processing networks. Krasnoyarsk: Institute of Physics, Siberian Branch of the USSR Academy of Sciences, 1986. Preprint N 59B. 20 p.
  8. BaseGroup Labs - Practical application of neural networks in classification problems.
  9. This type of encoding is sometimes referred to as a "1 of N" code.
  10. Open Systems - an introduction to neural networks.
  11. Mirkes E.M. Logically transparent neural networks and the production of explicit knowledge from data // In: Neuroinformatics / A.N. Gorban, V.L. Dunin-Barkovsky, A.N. Kirdin et al. Novosibirsk: Nauka, Siberian Enterprise RAS, 1998. 296 p. ISBN 5020314102.
  12. Mention of this story in Popular Mechanics magazine.
  13. http://www.intuit.ru/department/expert/neuro/10/ INTUIT.ru - Recurrent networks as associative storage devices.
  14. Kohonen T. Self-Organizing Maps. Berlin - New York: Springer-Verlag. First edition 1989, second edition 1997, third extended edition 2001. ISBN 0-387-51387-6, ISBN 3-540-67921-9.
  15. Zinoviev A.Yu. Visualization of multidimensional data. Krasnoyarsk: Krasnoyarsk State Technical University Press, 2000. 180 p.
  16. Gorban A.N. Generalized approximation theorem and computational capabilities of neural networks // Siberian Journal of Computational Mathematics, 1998. Vol. 1, No. 1. pp. 12-24.
  17. Gorban A.N., Rossiev D.A., Dorrer M.G. MultiNeuron - Neural Networks Simulator for Medical, Physiological, and Psychological Applications // WCNN'95, Washington, DC: World Congress on Neural Networks, International Neural Network Society Annual Meeting, Renaissance Hotel, Washington, DC, USA, July 17-21, 1995.
  18. Dorrer M.G. Psychological intuition of artificial neural networks. Dissertation, 1998.
  19. Baskin I.I., Palyulin V.A., Zefirov N.S. Application of artificial neural networks in chemical and biochemical research // Vestnik Moskovskogo Universiteta, Ser. 2: Chemistry, 1999. Vol. 40, No. 5.
  20. Halberstam N.M., Baskin I.I., Palyulin V.A., Zefirov N.S. Neural networks as a method for finding structure-property dependencies of organic compounds // Russian Chemical Reviews, 2003. Vol. 72, No. 7. pp. 706-727.
  21. Baskin I.I., Palyulin V.A., Zefirov N.S. Multilayer perceptrons in the study of structure-property relationships for organic compounds // Russian Chemical Journal (Journal of the D.I. Mendeleev Russian Chemical Society), 2006. Vol. 50. pp. 86-96.

Links

  • Artificial Neural Network for PHP 5.x - Serious project for the development of neural networks in the PHP 5.X programming language
  • Forum on Neural Networks and Genetic Algorithms
  • Mirkes E.M., Neuroinformatics: a textbook for students, with programs for laboratory work
  • Step-by-step examples of the implementation of the most famous types of neural networks in MATLAB, Neural Network Toolbox
  • A selection of materials on neural networks and intelligent analysis
  • An example of the application of neural networks in stock price prediction

Accordingly, the neural network takes two numbers as input and must produce another number - the answer - at the output. Now about the neural networks themselves.

What is a neural network?


A neural network is a sequence of neurons connected by synapses. The structure of the neural network came to the programming world straight from biology, and thanks to it the machine acquires the ability to analyze and even memorize various information. Neural networks can not only analyze incoming information but also reproduce it from their memory. (For those interested, be sure to watch two TED Talks videos: Video 1, Video 2.) In other words, a neural network is a machine interpretation of the human brain, which contains millions of neurons transmitting information in the form of electrical impulses.

What are neural networks?

For now we will consider examples using the most basic type of neural network - the feedforward network (hereinafter FFN). In future articles I will introduce more concepts and tell you about recurrent neural networks. An FFN, as the name implies, is a network whose neuron layers are connected in series and in which information always travels in only one direction.

What are neural networks for?

Neural networks are used to solve complex problems that require analytical calculations similar to those of the human brain. The most common uses for neural networks are:

Classification - distributing data by parameters. For example, a set of people is given at the input, and you need to decide which of them to give a loan to and which not. This work can be done by a neural network analyzing information such as age, solvency, credit history and so on.

Prediction - the ability to predict the next step. For example, the rise or fall of a stock based on the situation in the stock market.

Recognition - currently the most widespread use of neural networks. It is used in Google when you search for a photo, or in phone cameras when they detect the position of your face and highlight it, and much more.

Now, to understand how neural networks work, let's take a look at their components and parameters.

What is a neuron?

A neuron is a computing unit that receives information, performs simple calculations on it and passes it on. Neurons are divided into three main types: input (blue), hidden (red) and output (green). There is also a bias neuron and a context neuron, which we will talk about in the next article. When a neural network consists of a large number of neurons, the term layer is introduced. Accordingly, there is an input layer that receives information, n hidden layers (usually no more than 3) that process it, and an output layer that outputs the result. Each neuron has 2 main parameters: input data and output data. For an input neuron, input = output. For the rest, the combined information of all neurons from the previous layer arrives in the input field, is then normalized using the activation function (for now just picture it as f(x)) and ends up in the output field.


It is important to remember that neurons operate with numbers in the range [0,1] or [-1,1]. But what, you ask, about numbers that fall outside this range? At this stage, the simplest answer is to divide 1 by that number. This process is called normalization, and it is very often used in neural networks. More on this later.

What is a synapse?


A synapse is a connection between two neurons. A synapse has one parameter - weight. Thanks to it, the input information changes as it is transmitted from one neuron to another. Say three neurons transmit information to the next one; then we have three weights, one for each of these neurons, and the information from the neuron with the greater weight will be dominant in the next neuron (as in colour mixing). In fact, the set of weights of a neural network - the weight matrix - is a kind of brain of the whole system. It is thanks to these weights that the input information is processed and turned into a result.

It is important to remember that during initialization of the neural network the weights are assigned randomly.

How does a neural network work?


In this example, a part of a neural network is depicted, where the letters I denote the input neurons, the letter H a hidden neuron, and the letter w the weights. The formula shows that the input information is the sum of all input data multiplied by the corresponding weights. Let us give 1 and 0 as input and let w1 = 0.4 and w2 = 0.7. The input of neuron H1 will then be: 1 * 0.4 + 0 * 0.7 = 0.4. Now that we have the input, we can get the output by plugging the input into the activation function (more on that later). Now that we have the output, we pass it on, and we repeat this for all layers until we reach the output neuron. Running such a network for the first time, we will see that the answer is far from correct, because the network is not trained. To improve its results we will train it. But before we learn how to do that, let's introduce a few terms and properties of a neural network.
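The same calculation as a short C# sketch, with the sigmoid chosen as f(x) (a minimal illustration of the worked example above; the class and variable names are not from the original):

// A sketch of the forward step just described, using the sigmoid as f(x).
using System;

class ForwardPassDemo
{
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    static void Main()
    {
        double i1 = 1, i2 = 0;     // inputs
        double w1 = 0.4, w2 = 0.7; // weights into H1

        var h1Input = i1 * w1 + i2 * w2; // 1 * 0.4 + 0 * 0.7 = 0.4
        var h1Output = Sigmoid(h1Input); // normalized by the activation function

        Console.WriteLine($"H1 input = {h1Input}, H1 output = {h1Output:F2}");
    }
}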

Activation function

An activation function is a way of normalizing input data (we talked about this earlier). That is, if you have a large number at the input, passing it through the activation function gives you an output in the range you need. There are many activation functions, so we will consider the most basic ones: linear, sigmoid (logistic) and hyperbolic tangent. Their main difference is their range of values.

Linear function


This function is almost never used, except when you need to test a neural network or transfer a value without transformations.

Sigmoid


This is the most common activation function; its range of values is (0,1), and the function itself is f(x) = 1 / (1 + e^(-x)). Most examples on the web use it, and it is also sometimes called the logistic function. Note that it cannot capture negative values, so if your case involves them (for example, stocks can go not only up but also down), you will need a function that does.

Hyperbolic tangent


It makes sense to use the hyperbolic tangent, f(x) = (e^(2x) - 1) / (e^(2x) + 1), only when your values can be both negative and positive, since the range of the function is [-1,1]. Using this function with only positive values is inappropriate and will noticeably worsen the results of your neural network.

Training set

A training set is a sequence of data on which the neural network operates. In our case of exclusive or (XOR), we have only 4 different outcomes, that is, 4 training sets: 0 xor 0 = 0, 0 xor 1 = 1, 1 xor 0 = 1, 1 xor 1 = 0.
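In code, that training set is just a small table. A sketch in C# (the array layout is illustrative):

// The XOR training set: two inputs per row, one expected answer per row.
double[][] trainingInputs =
{
    new[] { 0.0, 0.0 },
    new[] { 0.0, 1.0 },
    new[] { 1.0, 0.0 },
    new[] { 1.0, 1.0 },
};
double[] expectedOutputs = { 0, 1, 1, 0 };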

Iteration

This is a kind of counter that increases every time the neural network goes through one training set. In other words, it is the total number of training sets traversed by the neural network.

Epoch

When the neural network is initialized, this value is set to 0; its ceiling is set manually. The larger the epoch count, the better trained the network and, accordingly, the better its result. The epoch increases each time we go through the entire set of training sets - in our case, 4 sets, i.e. 4 iterations.


It is important not to confuse iterations with epochs and to understand the order in which they increment: first the iteration increases n times, then the epoch, and not vice versa. In other words, you cannot first train the neural network on one set only, then on another, and so on. You must train on each set once per epoch. This way you avoid errors in the calculations.

Error

The error is a percentage that represents the discrepancy between the expected and the received answers. The error is computed every epoch and should decline; if it does not, you are doing something wrong. The error can be calculated in different ways, but we will consider only three main ones: Mean Squared Error (hereinafter MSE), Root MSE and Arctan. There is no usage restriction as with activation functions, so you are free to choose whichever method gives you the best results; just bear in mind that each method counts errors differently. With Arctan the error will almost always be larger, since it works on the principle: the bigger the difference, the bigger the error. Root MSE will give the smallest error; therefore, MSE, which keeps a balance in the error calculation, is used most often.


MSE = ((i1 - a1)² + ... + (in - an)²) / n

Root MSE = sqrt(MSE)

Arctan = (arctan²(i1 - a1) + ... + arctan²(in - an)) / n

where i is the ideal (expected) answer, a is the received answer, and n is the number of training sets.
The principle of calculating the error is the same in all cases: for each set we take the difference between the received answer and the ideal one. Then we either square it or take the squared arctangent of this difference, after which we divide the resulting number by the number of sets.
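The three error measures as a C# sketch - a direct transcription of the formulas above (the class and method names are illustrative):

// A sketch of the three error measures described above.
using System;
using System.Linq;

static class Errors
{
    // Mean squared error: average of squared differences
    public static double Mse(double[] ideal, double[] actual) =>
        ideal.Zip(actual, (i, a) => (i - a) * (i - a)).Sum() / ideal.Length;

    // Root MSE: square root of the MSE, always the smallest of the three
    public static double RootMse(double[] ideal, double[] actual) =>
        Math.Sqrt(Mse(ideal, actual));

    // Arctan error: average of squared arctangents of the differences
    public static double ArctanError(double[] ideal, double[] actual) =>
        ideal.Zip(actual, (i, a) => Math.Pow(Math.Atan(i - a), 2)).Sum() / ideal.Length;
}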

A task

Now, to test yourself, calculate the result of the given neural network using the sigmoid, and its error using MSE.

Data: I1 = 1, I2 = 0, w1 = 0.45, w2 = 0.78, w3 = -0.12, w4 = 0.13, w5 = 1.5, w6 = -2.3.
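If you want to check your answer, here is a sketch of the computation. It assumes the 2-2-1 topology implied by the six weights (w1 and w3 feed H1, w2 and w4 feed H2, w5 and w6 feed the output) and an ideal answer of 1, since 1 xor 0 = 1; if the diagram in the original wires the weights differently, adjust accordingly.

// A sketch of the self-test, assuming a 2-2-1 network.
using System;

class XorSelfTest
{
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    static void Main()
    {
        double i1 = 1, i2 = 0;
        double w1 = 0.45, w2 = 0.78, w3 = -0.12, w4 = 0.13, w5 = 1.5, w6 = -2.3;

        var h1 = Sigmoid(i1 * w1 + i2 * w3); // hidden neuron 1
        var h2 = Sigmoid(i1 * w2 + i2 * w4); // hidden neuron 2
        var o1 = Sigmoid(h1 * w5 + h2 * w6); // output neuron

        var ideal = 1.0;                   // 1 xor 0 = 1
        var mse = Math.Pow(ideal - o1, 2); // MSE over this single set

        Console.WriteLine($"O1 = {o1:F2}, MSE = {mse:F2}");
    }
}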
