Translation

H.264 is a video compression standard, and it is ubiquitous: it is used to compress video on the internet, on Blu-ray discs, and in phones, security cameras, and drones. Practically everyone uses H.264 now.

H.264 is also a remarkable piece of engineering. It is the result of more than 30 years of work toward a single goal: reducing the bandwidth required to transmit high-quality video.

From a technical point of view, this is very interesting. This article describes how some of the compression mechanisms work at a high level, and I will try not to bore you with the details. It is also worth noting that most of the techniques outlined below apply to video compression in general, not just to H.264.

Why compress anything at all?

Uncompressed video is a sequence of two-dimensional arrays containing information about the pixels of each frame. Thus, it is a three-dimensional (2 spatial and 1 temporal) byte array. Each pixel is encoded in three bytes - one for each of the three primary colors (red, green, and blue).

1080p @ 60 Hz = 1920 x 1080 x 60 x 3 bytes => ~370 MB/s of data.

This would be almost impossible to use. A 50 GB Blu-ray disc could hold only about 2 minutes of video. Copying it would not be easy either: even SSDs would have trouble writing it from memory to disk.
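The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes three bytes per pixel, as described, and a disc capacity of 50 decimal gigabytes.

```python
# Raw data rate of uncompressed 1080p video at 60 frames per second.
WIDTH, HEIGHT = 1920, 1080
FPS = 60
BYTES_PER_PIXEL = 3  # one byte each for red, green, and blue

bytes_per_second = WIDTH * HEIGHT * FPS * BYTES_PER_PIXEL
megabytes_per_second = bytes_per_second / 1_000_000

# How long a 50 GB disc lasts at that rate (decimal gigabytes assumed).
DISC_BYTES = 50 * 1_000_000_000
seconds_on_disc = DISC_BYTES / bytes_per_second

print(f"{megabytes_per_second:.0f} MB/s")
print(f"{seconds_on_disc / 60:.1f} minutes on a 50 GB disc")
```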

So yes, compression is necessary.

I will definitely answer this question. But first, I'll show you something. Take a look at the Apple homepage:

I saved the image and will use two files as an example:

Impressive. What other tricks are there?

Color processing

The human eye is not very good at distinguishing similar shades of color. It easily picks up the smallest differences in brightness, but not in color. So there must be a way to discard unnecessary color information and save even more space.

In TVs, RGB colors are converted to YCbCr, where Y is the luminance component (essentially the brightness of a black and white picture) and Cb and Cr are the color components. RGB and YCbCr are equivalent in terms of information entropy.
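As a rough illustration of the conversion, here is a minimal sketch. The text does not say which coefficients are meant; the ones below are the full-range BT.601 values used by JPEG/JFIF, chosen here as an assumption (real H.264 streams often use BT.709 or a limited range instead). The round trip shows that, before any rounding, the two representations carry the same information.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to Y, Cb, Cr (full-range BT.601, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr, up to small floating-point error."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


pixel = (200, 30, 90)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(*pixel))
print([round(v) for v in restored])
```

A neutral gray pixel maps to Cb = Cr = 128, which is why a black and white receiver can simply ignore the two color components.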

Why complicate things then? Isn't RGB enough?

In the days of black and white TVs, there was only the Y component. With the advent of color TVs, engineers faced the task of transmitting a color RGB image alongside the black and white one. Instead of using two separate channels, they decided to encode the color into the Cb and Cr components and transmit them together with Y; color TVs would then convert the color and brightness components back into the usual RGB.

But here's the trick: the luma component is encoded at full resolution, while each color component is encoded at only a quarter of it. The loss can be neglected because the eye and brain do not distinguish shades well. This cuts the size of the image in half with minimal visible difference. 2 times! The car would now weigh 10 kg!

This technique of encoding color at reduced resolution is called chroma subsampling. It has been around for a long time and is not limited to H.264.
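The "quarter resolution, half the size" claim can be sketched as follows. The 2x2 averaging used here is an assumption made for illustration (encoders may filter differently); the layout, with each color plane halved in both directions, is commonly written 4:2:0.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (list of rows, even dimensions)."""
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[0]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out


def sample_count(width, height, subsampled):
    """Number of samples per frame: one luma plane plus two chroma planes."""
    luma = width * height
    chroma = (width // 2) * (height // 2) if subsampled else width * height
    return luma + 2 * chroma


full = sample_count(1920, 1080, subsampled=False)
reduced = sample_count(1920, 1080, subsampled=True)
print(full, reduced, reduced / full)
```

Each color plane shrinks to a quarter, so the frame goes from 3 units of data to 1 + 0.25 + 0.25 = 1.5 units, which is the factor of two mentioned above.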

These are the most significant techniques for reducing size in lossy compression. We got rid of most of the fine detail and halved what remained by discarding color information.

Can we go even further?

Yes. Trimming a single picture is just the first step. Up to this point, we have analyzed a single frame. It's time to look at temporal compression, where we work with a group of frames.

Motion compensation

H.264 is a standard that allows motion compensation.

Motion compensation? What is it?

Imagine you are watching a tennis match. The camera is fixed and shoots from one angle, and the only thing that moves is the ball. How would you encode this? You would do the usual thing, right? A three-dimensional array of pixels, two dimensions in space and one in time, right?

But why? Most of the image stays the same. The court, the net, and the audience do not change; the only thing that moves is the ball. What if you stored a single image of the background and one image of the ball moving over it? Wouldn't that save a lot of space? You see what I'm getting at, don't you? Motion compensation?

And that's exactly what H.264 does. H.264 splits the image into macroblocks, usually 16x16 pixels, which are used to calculate motion. One frame, usually called an I-frame, is stored whole and contains everything. Subsequent frames can be either P-frames or B-frames. In a P-frame, a motion vector is encoded for each macroblock relative to previous frames, so the decoder has to start from the most recent I-frame and apply the changes from each subsequent frame in turn until it reaches the current one.

The situation is even more interesting with B-frames, where prediction runs in both directions, based on frames that come both before and after them. Now you understand why the video at the beginning of the article weighs so little: it is just 3 I-frames with macroblocks moving around in them.

With this technology, only motion vector differences are encoded, thereby providing a high compression ratio for any motion video.
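To make the idea of a motion vector concrete, here is a toy block-matching search. It is a sketch under simplifying assumptions, not how a real H.264 encoder works: frames are tiny grayscale grids, blocks are 4x4 rather than 16x16, the search is exhaustive, and the helper names (`sad`, `find_motion_vector`, `make_frame`) are invented for this example.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size) for i in range(size)
    )


def find_motion_vector(prev, cur, bx, by, size=4, search=2):
    """Search prev for the block that best matches cur's block at (bx, by)."""
    h, w = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(cur, bx, by, prev, bx, by, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= w - size and 0 <= y <= h - size:
                cost = sad(cur, bx, by, prev, x, y, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(ball_x, ball_y, size=8):
    """Black frame with a bright 2x2 'ball' at the given position."""
    frame = [[0] * size for _ in range(size)]
    for j in range(2):
        for i in range(2):
            frame[ball_y + j][ball_x + i] = 255
    return frame


prev = make_frame(2, 2)
cur = make_frame(3, 2)  # the ball moved one pixel to the right
vector, cost = find_motion_vector(prev, cur, bx=2, by=2)
print(vector, cost)
```

The vector points from the block in the current frame to the place where the same content sat in the previous frame; with a perfect match the leftover difference is zero, so only the vector needs to be stored.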

We've covered spatial and temporal compression. With quantization we reduced the data size many times over; with chroma subsampling we halved what was left; and now, with motion compensation, we store only 3 frames out of the 300 that were originally in the video in question.

It looks impressive. Now what?

Now we finish up with traditional lossless entropy coding. Why not?

Entropy coding

Even after the lossy compression stages, I-frames contain redundant data. The motion vectors of the macroblocks in P-frames and B-frames also carry a lot of repeated information, since macroblocks often move identically, as can be seen in the original video.

This redundancy can be eliminated by entropy coding. And you don't have to worry about the data itself, since this is a standard lossless compression technology, which means everything can be restored.
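One simple lossless code of this kind is the signed Exp-Golomb code, which H.264 uses for a number of its syntax elements (the bulk of the data goes through the more elaborate CAVLC or CABAC coders, which are not shown here). The sketch below is only an illustration of the principle: small, frequent values get short codes, and decoding restores the input exactly. The sample list of motion vector differences is made up.

```python
def encode_signed(value):
    """Signed Exp-Golomb code word for an integer, as a string of bits."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def decode_signed_stream(bitstring):
    """Decode a concatenation of signed Exp-Golomb code words."""
    values, pos = [], 0
    while pos < len(bitstring):
        zeros = 0
        while bitstring[pos] == "0":  # count the leading-zero prefix
            zeros += 1
            pos += 1
        code_num = int(bitstring[pos:pos + zeros + 1], 2) - 1
        pos += zeros + 1
        if code_num % 2:
            values.append((code_num + 1) // 2)
        else:
            values.append(-(code_num // 2))
    return values


# Hypothetical motion vector differences: mostly zero, as in a static scene.
differences = [0, 0, 1, 0, -1, 0, 0, 0]
stream = "".join(encode_signed(v) for v in differences)
print(stream, len(stream), "bits")
print(decode_signed_stream(stream) == differences)
```

Stored as one byte each, the eight values would take 64 bits; the variable-length code needs far fewer because zeros cost a single bit.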

Now that's it! H.264 is based on the aforementioned technologies. This is what the standard is about.

Fine! But I am curious to know how much our car weighs now.

The original video was shot at a non-standard resolution of 1232x1154. If you count, you get:

5 sec. @ 60 fps = 1232 x 1154 x 60 x 3 x 5 bytes => 1.2 GB
Compressed video => 175 KB

If we scale the result to the agreed one-ton weight of the car, we get a weight of 0.14 kg. 140 grams!
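The same calculation in code, as a quick check. It assumes the sizes are in bytes and that the compressed file is 175 decimal kilobytes; the one-ton car is the analogy used in the text.

```python
# Uncompressed size: width x height x fps x bytes per pixel x seconds.
raw_bytes = 1232 * 1154 * 60 * 3 * 5
compressed_bytes = 175 * 1000  # assuming decimal kilobytes

ratio = raw_bytes / compressed_bytes
car_kg = 1000 / ratio  # a one-ton car shrunk by the same factor

print(f"raw size: {raw_bytes / 2**30:.2f} GiB")
print(f"compression ratio: about {ratio:.0f} to 1")
print(f"car weight: {car_kg:.2f} kg")
```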

Yes, it's magic!

Of course, I have presented a very simplified picture of the results of decades of research in this area. If you want to know more, then

Video compression has been a stumbling block in the design of video surveillance systems since the advent of Internet Protocol (IP) video in the 1990s. Since then, video coding standards have gone through many stages of research. The compression standard attracting the industry's attention today is H.265, or HEVC (High Efficiency Video Coding). It is the successor to H.264, which is currently the dominant IP video coding technology. We will try to work out what its prospects are, today and in the future.

H.265: understanding what and why

The H.265 standard is a significant step forward in video coding. One of its advantages is that it doubles the compression efficiency of H.264. So when transmitting images of similar quality, H.265 uses only half the bitrate of the previous codec. This drastically reduces bandwidth and storage requirements, allowing better use of both hardware and software. Users, in fact, get more features at a lower cost. Because of this, most hardware manufacturers support the implementation of the H.265 compression standard for video surveillance. So soon we will be able to see H.265 as the next standard.

But despite all the advantages, H.265 is still far from mass adoption. The question arises: can users somehow optimize the transmission of images before the revolution occurs in the field of video surveillance? After all, the popularity of high-definition video is growing, and demand creates supply.

Recent advances for the current H.264 codec optimize bitrate in three ways: predictive encoding, noise suppression, and "long-term" bitrate control. This has resulted in a 75% reduction in storage requirements for H.264. Because of these innovations and some other factors, it is highly likely that both standards will coexist peacefully in the market for the next 5-10 years.

Barriers to H.265 adoption

Integration of H.265 technology is likely to be hampered by the availability of optimized H.264 encoding, as well as the cost of upgrading existing systems to H.265. Additional complications will also arise with the change in manufacturing processes for the release of equipment that supports H.265 and with patents, which we will talk about later. In principle, H.264 remains a viable and workable standard for the vast majority of CCTV systems. Today it fully fulfills its functions - and, admittedly, quite well.

At the higher cost, users should be confident that the upgrade to H.265 is really worth it.

Limitations of laboratory tests

In tests conducted by the Joint Collaborative Team on Video Coding (JCT-VC), H.265 achieved twice the compression ratio of its predecessor, H.264. But, as you might expect, these tests were carried out in a laboratory environment, far removed from many of the difficulties that arise when the standard is actually used.

What one wants from H.265 is real-time encoding that balances algorithm complexity against compression capability. In practice, the H.265 codec may not deliver the claimed 100% improvement over H.264.

The H.264 standard has been deployed in the industry for over 10 years and has evolved with it, with support from all chipset manufacturers and access to a wide variety of encoders and decoders. It has been tested and proven in practice. In this respect, H.265 has a lot of catching up to do.

Patent costs

Another problem that may hinder the mass adoption of the H.265 standard is the need to pay for patent licenses. Many businesses already hold an H.264 license, whereas H.265 was not very common in the industry in its early days, and the companies that hold its patents are not affiliated with one another. Low demand for the new standard results in a much higher patent cost. This is a key issue that security companies should seriously consider: how it will affect production and, as a result, the price tag for the end user. When a new standard is introduced, price really matters, especially if users have to replace both the front end and the back end of the system to benefit from improved video compression. When paying several times more, the consumer must be sure that the upgrade is really worth it.

Optimized H.264 encoding technologies

Despite the above arguments, the main reason we believe H.265 will not become the dominant encoding solution anytime soon is a simple lack of demand: a number of innovative manufacturers have implemented optimized H.264 encoding technologies, and there is simply no need for H.265 yet. This could be called "a solution to a problem that has not yet arisen."

Since the launch of H.264 in 2003, the security industry has been developing high-performance video encoders in an effort to improve picture quality for video surveillance systems. Add to this the growing popularity of high-quality video and the rising demands for bitrate and resolution, and it becomes apparent that the cost of system components as a whole has increased. The sheer volume of video data captured by CCTV cameras means that users must invest in ever more storage.

Predictive coding

How is the H.264 codec improved? First, basic research on video compression is being done in various industries. For example, in any video from cameras, users first pay attention to moving objects, and then to the static part of the picture. If the background does not change, it can be encoded as a keyframe. Optimized H.264 technologies use predictive coding to reduce the bitrate spent on a static background image. By applying this predictive coding throughout the system, users save significant bandwidth and storage costs.

Noise reduction

Another important element of H.264 optimization is noise reduction.

Noise, an unwanted electrical signal that shows up in the video stream, seriously interferes with the digital video signal. It fills the background of the image with many spurious pixels caused by fluctuations in light, temperature, or other signals in the air. Optimized H.264 technologies, however, use intelligent analysis algorithms to suppress most of the noise by encoding the foreground objects of the image at a higher bit rate than the background. The result: crisp, color-accurate images.

Long term bitrate control

Finally, the bitrate requirements for a particular scene can fluctuate throughout the day. For example, a typical street scene at night has little movement in the foreground, so the bitrate requirements are low. During the day, vehicles and pedestrians moving in the foreground and background raise the requirements considerably. Modern H.264 encoding technologies handle this by calculating an overall average bitrate and then automatically allocating bitrate to the times of day when it is needed. This is done at the level of the device's configured setpoint values. The main advantage of long-term bitrate control is that users can accurately predict their video storage requirements and size their storage accordingly.
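Once the long-term average bitrate is fixed, the storage estimate is simple arithmetic. The sketch below is an illustration only: the function name and the example figures (2,000 kbps, 30 days, 16 cameras) are assumptions, not values from any particular product.

```python
def storage_gigabytes(average_kbps, days, cameras=1):
    """Storage needed for continuous recording at a fixed average bitrate."""
    seconds = days * 24 * 60 * 60
    total_bits = average_kbps * 1000 * seconds * cameras
    return total_bits / 8 / 1_000_000_000  # decimal gigabytes


# Hypothetical example: 30 days of footage at an average of 2,000 kbps.
print(storage_gigabytes(2000, days=30))
print(storage_gigabytes(2000, days=30, cameras=16))
```

Because the average is held over the whole day, the quiet night hours and the busy daytime hours cancel out, and the estimate does not depend on how the bits were distributed.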

***

Today, these advantages of H.264 exceed what the H.265 standard offers. Among other things, H.264 has a number of other advantages: compatibility with existing systems, lower cost of production, a wider range of products on which the codec can be applied, and lower patent risk.

Video compression standards tend to follow a roughly 10-year cycle. The MPEG-2 format was introduced in 1994, H.264 launched in 2003, and H.265 launched in 2013. The historical context matters here because video coding standards respond not only to technological change but also to trends across the video industry. When MPEG-2 was the standard, the industry focused mainly on DVD players and TV resolutions, where this format was used. The emergence of H.264 coincided with the introduction of HD technology, advanced IT technologies, and the mobile Internet.

Uses of H.264 have included HD digital TV, Internet video, mobile video, CCTV, Blu-ray, and more. Since H.265 is just entering the scene, we believe it will be most widely used in ultra-HD technology and cloud storage applications.

Development prospects for video compression technologies

After the launch of H.265, the members of the Joint Collaborative Team on Video Coding (JCT-VC) began looking ahead to the future of this segment. In 2015 they formed the Joint Video Exploration Team (JVET), with a focus on further improving compression capabilities. Their latest test data show a 20% improvement in compression performance over H.265. At the same time, another organization, the Alliance for Open Media (AOM), has brought together a number of Internet-oriented companies, including Microsoft, Google, Intel, and Amazon, in an effort to arrive at a free standard for Internet video. The plan is for this free standard to accelerate technology updates in the online world at a breakneck pace.

Competition for these standards is likely to be tough - and it could also mean that the 10-year compression cycle will fade into oblivion, and new standards will appear in a much shorter time frame.

In the near future I want to post a note about the WD TV Live HD player, so I'll touch on a sore subject for hardware players: why video playback problems occur. Often the cause is a needlessly overloaded H.264 stream. The H.264 standard provides many signal compression mechanisms; here is a table in which each profile is assigned the set of capabilities that can be used in a stream. Examples of profiles are the Constrained Baseline Profile (CBP), Baseline Profile (BP), Main Profile (MP), High Profile (HiP), and so on. There is also the concept of a level, which defines numerical limits within a particular profile. Levels are designated by numbers from 1.0 to 5.1. The combination is usually written in the form Profile@Llevel, as in the High@L4.1 and High@L5.1 designations discussed below.

The benchmark for quality is considered to be a stream from a Blu-ray disc, whose video stream corresponds to the High@L4.1 profile. According to the table, High@L4.1 limits the stream to a maximum of 62,500 Kbps and provides the following modes (I cite the highest): 1,280×720@68.3 (9), 1,920×1,080@30.1 (4), 2,048×1,024@30.0 (4). The number after the @ is the frame rate, and the number in parentheses is the number of reference frames (or reframes). Reframes is the number of frames that the current frame can refer to during decoding. This parameter determines how much memory the decoder needs, and increasing it probably puts some additional load on the decoder as well. So for Blu-ray at Full HD resolution this parameter is only 4. I checked the Blu-rays I had at hand, and this is indeed the case, as is compliance with this profile.

However, videos downloaded from the network often have higher profiles, and the number of reframes sometimes reaches 19! You can view the properties of a stream with a free utility. I did this and found that about 20% of the films I have come with inflated reframes and inflated profiles. This subset typically has the High@L5.1 profile. For reference, here are its characteristics: a stream of up to 300,000 Kbps (!), and maximum modes of 1,920×1,080@120.5 (16) plus two 4,096-pixel-wide modes with 5 reference frames each. Such an insane bitrate is not physically supported by a Blu-ray disc (whose maximum bitrate is 48 Mbit/s), and it will not squeeze through a 100 Mbps network; judging by the maximum resolutions, the profile is intended for encoding video for digital cinemas. Why this happens is understandable: people just set everything to the maximum and compress without thinking, and as a result we get problems that, fortunately, the makers of HD players are heroically fighting, though with varying success.
Soon I will write about how they manage to fight.
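The reference-frame counts quoted in parentheses above (4 for Full HD on Blu-ray, 16 for the higher level) can be derived from per-level limits. The sketch below uses limit values as I recall them from Annex A of the H.264 specification (macroblocks per second and decoded-picture-buffer size in macroblocks); treat the table, the cap of 16 frames, and the function names as assumptions to verify against the standard before relying on them.

```python
# Per-level limits recalled from H.264 Annex A; assumed, verify before use.
LEVEL_LIMITS = {
    "4.1": {"max_mbps": 245_760, "max_dpb_mbs": 32_768},
    "5.1": {"max_mbps": 983_040, "max_dpb_mbs": 184_320},
}


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover a frame (rounded up)."""
    return -(-width // 16) * -(-height // 16)


def max_reference_frames(level, width, height):
    """How many frames fit in the decoder's picture buffer, capped at 16."""
    return min(16, LEVEL_LIMITS[level]["max_dpb_mbs"] // macroblocks(width, height))


def max_frame_rate(level, width, height):
    """Highest frame rate allowed by the macroblock-rate limit."""
    return LEVEL_LIMITS[level]["max_mbps"] / macroblocks(width, height)


for level in ("4.1", "5.1"):
    frames = max_reference_frames(level, 1920, 1080)
    fps = max_frame_rate(level, 1920, 1080)
    print(f"Level {level}, 1920x1080: {frames} reference frames, {fps:.1f} fps")
```

A player built to the lower level only has to reserve buffer memory for a handful of Full HD frames, which is why a file encoded with 16 reference frames can fail on it even when the bitrate is modest.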

27.03.2009

In our age of marketing and the wholesale devaluation of value judgments, it's hard to take anyone at their word. As soon as there is a whiff of serious money, bought opinions from authoritative figures appear, research results are forged, and nameplates bearing age-old brand names flit from product to product. The horror is that, strictly speaking, you cannot trust the press either. Well, if you can't but really want to, then you can ...

Following the latest trends in digital video compression, the Security News editors try to pay attention not only to the positive assessments of the world's industry experts, but also to notes of skepticism. With luck, some harsh criticism turns up too. The two expert opinions we publish here are mostly positive, although some signs suggest they are merely disguised as "objective". We invite domestic experts to join the discussion: a few years ago, every forecast in the Russian industry press pointed to wavelet coding. Why did the other solution "win": for technical reasons, or in pursuit of profit? And did it win at all? We await your opinions.

Not so long ago I had the opportunity to attend two exhibitions: ISC West in Las Vegas and IFSEC in the United Kingdom. The strength of these events is that they show accurately which way the market wind is blowing and what is occupying the minds of colleagues in the industry. As the technical manager of a company that makes IP video management software, I was very interested in separating the wheat from the chaff.

Having taken part in both exhibitions before, I knew perfectly well that the press there would be interested only in "the latest and greatest". Once the media latch on to a topic, they seem to start a race to see who can best present the latest of the latest and the greatest of the great. However, let's not forget that a couple of years ago IP video surveillance was just such a "hot" topic, and today it is already becoming a de facto standard, significantly outpacing analog technologies in its development.

The new video compression format, H.264, has been hotly debated this year. Let me remind you that it was developed jointly by two international standardization organizations, ITU-T and ISO/IEC; the format is also known as MPEG-4 Part 10 AVC (Advanced Video Coding).

Squeeze even harder

Surveillance systems' appetite for storage and network bandwidth is growing: no one wants to give up high frame rates and high resolution. Hence the expectation of greater efficiency from video compression methods. The H.264 encoder can reduce the size of a digital video file by more than 80% compared with a Motion JPEG compressed signal while maintaining the same visual quality. Compared with the most popular version of the MPEG-4 format, MPEG-4 Part 2 Simple Profile (SP), the H.264 codec usually saves 40-50 percent of the video file size.

The megapixel camera sector is growing, and until recently, the increased storage requirements for high-definition cameras were considered the main constraint on its growth. The use of the H.264 codec can significantly speed up the process of introducing megapixel cameras.

In my personal opinion, the H.264 format will almost completely supplant MPEG-4 (Part 2) in just a few years. And providers of video management solutions will begin to build in support for the new format in the near future, as will all the leading manufacturers of video cameras.

A fly in the ointment

However, there are factors that temper the enthusiasm for the new product; in fact, the technology is still at the very beginning of its path. Yes, the codec reduces the load on data networks and saves money on video storage. But it can only be used with high-performance cameras. The new compression algorithm uses much more complex mathematics than previous standards (decoding, for example, takes about twice as much computation as the equivalent procedure for MPEG-4 Part 2 SP), so the demand for computing power grows accordingly. At the same time, the H.264 standard itself was adopted relatively long ago, about five years back, and some industries other than ours have already embraced it. For example, it is used in the new generation of high-definition consumer discs (the Blu-ray format).

How it works

H.264 is a block-based hybrid video coding standard that uses motion compensation. The compensation itself relies on motion vectors for regions of the frame to predict changes in the image. Since two successive video frames are highly correlated, it is possible to encode not the whole picture but only the motion vectors of its various parts; what gets encoded is then the difference between the current frame and the matching regions found in other frames (the so-called reference frames), offset from their original positions. This technique is called inter prediction.

There are two main methods of inter prediction: one based on a single reference frame (P macroblocks) and a bidirectional one (B macroblocks), which uses a combination of two reference frames. To provide access to arbitrary parts of the video and to improve error resilience, the standard also provides for so-called intra coding, in which the encoded data does not depend on the content of any other picture, unlike inter prediction.
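The difference between the two kinds of prediction can be sketched with small blocks flattened into lists of pixel values. This is a deliberately simplified illustration: it assumes the motion vectors have already been applied (so the reference blocks are aligned with the current one), it combines the two references with a plain rounded average, and it leaves out the transform and quantization of the residual. The function names are made up for this example.

```python
def predict_p(reference):
    """P-type prediction: copy the block from a single reference frame."""
    return list(reference)


def predict_b(past, future):
    """B-type prediction: rounded average of two reference blocks."""
    return [(a + b + 1) // 2 for a, b in zip(past, future)]


def residual(current, prediction):
    """What the encoder still has to store after prediction."""
    return [c - p for c, p in zip(current, prediction)]


def reconstruct(prediction, res):
    """What the decoder does: prediction plus the stored residual."""
    return [p + r for p, r in zip(prediction, res)]


past = [100, 100, 100, 100]      # block in the earlier reference frame
future = [120, 120, 120, 120]    # block in the later reference frame
current = [110, 111, 110, 109]   # block being encoded (a gradual fade)

res_p = residual(current, predict_p(past))
res_b = residual(current, predict_b(past, future))
print("P residual:", res_p)
print("B residual:", res_b)
print(reconstruct(predict_b(past, future), res_b) == current)
```

In this fade-like example the bidirectional prediction leaves a much smaller residual than the single-reference one, which is the kind of case where B macroblocks pay off.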

The H.264 standard divides the image into macroblocks of up to 16x16 pixels each. Macroblocks are combined into groups (slices), usually in scan order, so a single picture can be encoded as one or more groups. Grouping macroblocks makes it possible to use different error correction methods and different types of macroblock coding, as well as tools such as separate coding of fields (as groups) in interlaced video.

In color video, the luminance component is encoded separately from the color components; to take advantage of the characteristics of human vision, the color signal is, as a rule, subsampled relative to the luminance signal. By and large, there are no fundamental differences between the new format and previous video coding standards (including MPEG-4 Part 2): all of them are block-based in one way or another, and all are hybrid.

New tools

In addition to improvements to existing encoding tools, the H.264 format includes a number of new ones. The most important are the built-in adaptive deblocking filter, which significantly reduces blocking artifacts; the use of more than two reference frames for more accurate prediction; the division of macroblocks into smaller blocks (down to 4x4 pixels); prediction in intra coding; and an integer transform that replaces the Discrete Cosine Transform (DCT) used in earlier standards.
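As an illustration of the integer transform mentioned above, here is a sketch of the forward 4x4 core transform using the integer matrix usually given for H.264. The scaling that normally accompanies it (folded into the quantization step) is omitted, so the output values are unnormalized; the helper names are assumptions made for this example.

```python
# Integer approximation of a 4x4 DCT basis, as commonly cited for H.264.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Compute CF * block * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


flat_block = [[10] * 4 for _ in range(4)]  # a perfectly uniform 4x4 block
for row in forward_transform(flat_block):
    print(row)
```

For a uniform block all the energy lands in the top-left coefficient and everything else is exactly zero. Because only integer additions and multiplications are involved, encoder and decoder cannot drift apart through floating-point rounding, which is a commonly cited reason for preferring an integer transform to a floating-point DCT.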

The H.264 format also introduces a network abstraction layer (NAL), which sits on top of the video coding layer (VCL) and is responsible for representing the coded video in a form that integrates easily with a variety of protocols and transport mechanisms. This makes the format very attractive for networks based on the Internet Protocol (IP).

What's the bottom line?

The main result of all the improvements in coding technology embodied in the H.264 standard is that the new format really surpasses all previous digital video compression algorithms in its characteristics - and therefore can be considered the highest achievement in the field of digital video coding today.

So, is H.264 worth all the media hype around it? With the advent of the new format, video compression standards began to change rapidly - and today they are already able to maintain or even reduce the load on the bandwidth of data transmission networks when switching to high-definition video. And this is very valuable.

However, let us remember that all the delights of the new coding technology, and of the increasingly powerful megapixel cameras that have poured into the market, can only be realized on a solid management platform on which video surveillance solutions are built. Using 100% open platforms for IP video management lets you integrate new technology into your existing server infrastructure without having to replace the system hardware completely.

Truth or marketing. Will H.264 live up to users' expectations?

Tom Galvin, director of NetVideo Consulting, is formerly vice president of engineering at GE Security.
Based on materials from Security Dealer and Integrator magazine.

So the race to implement the H.264 video compression standard has started. Manufacturers are adopting this format as the standard for their DVRs, network cameras, and encoders, vying with one another in promising up to 50 percent less video data than with MPEG-4 compression. A 50% reduction is a big claim, as it can have a huge impact on the total cost of ownership of video surveillance systems. Lower bitrates mean more footage in the same storage, less load on the network infrastructure, or better video quality at the same data rates.

Guided by purely professional interest, I decided to answer the question: does the codec live up to the numerous promises made for it? And, so that the answer is not unfounded, to back up the conclusion with a direct comparison of the compression efficiency of the MPEG-4 and H.264 algorithms. The most interesting question: is H.264 really capable of lowering bitrates without losing video quality?

The H.264 standard owes its origin to two different groups of experts who came together specifically to create it. The product of their joint efforts has been known under various names. It was christened "H.264" by ITU-T, the body that coordinates telecommunication standards for the International Telecommunication Union. The International Organization for Standardization (ISO) calls the same standard MPEG-4 Part 10 / Advanced Video Coding (AVC), because it is an extension of the MPEG-4 suite of standards already successfully implemented in a vast number of video surveillance products. The US security industry has adopted the somewhat less aristocratic but shorter name: "just" H.264.

The new standard defines a number of mathematical principles, the application of which in video compression can achieve more successful results than is observed in previously adopted standards. Many of the algorithms described in it are very demanding on the computing power of the equipment or are not applicable in a number of specific applications. To provide the required flexibility in application, the standard defines seven different profiles. A profile is a set of characteristics provided for a specific group of practical applications of the standard. Many of the video surveillance products are likely to be based on the baseline profile. The baseline profile is intended for hardware devices that have limited processing power, but require the lowest possible signal time delay. The other profiles are designed for a wide range of applications, from broadcast TV and high definition DVD (Blu-ray) to mobile telephony.

Whose pie is tastier?

For the "culinary competition" I used two encoders of different formats - H.264 and MPEG-4 - from Axis Communications, applying them to two typical video surveillance scenes. The first scene was filmed with a PTZ camera located in the parking lot, and the second - with a fixed camera mounted above the door in the foyer of the business center. Both scenes were filmed in 4CIF resolution at 30 frames per second. I used the NetVideo Device Manager software to measure the bitrates coming from each of the digital video stream sources. Through a rather tedious process of trial and error, I adjusted the compression ratios to achieve visually equivalent levels of video quality from both sources.

In both scenes, the device using H.264 compression recorded a decrease in average data rate of about 50 percent.

The measured signal time delay for both devices was approximately 100 milliseconds. The delay includes the time spent on digitizing the video signal, compressing the data stream and transmitting it over the network, decoding and displaying it on a personal computer screen. A delay of 100 milliseconds is a very small value and therefore cannot affect the efficiency of PTZ control.

I repeated the comparison tests in various scenes, and everywhere there was a difference between the displayed signals obtained using the MPEG-4 and H.264 compression formats. Typical artifacts, known as blocking effects, are significantly more noticeable in MPEG-4 than in H.264 at relatively high compression ratios.

As the compression ratio of the video streams processed by the MPEG-4 and H.264 encoders increased (with a corresponding decrease in bitrate and visual quality), I noticed that the "blocks" in the MPEG-4 signal became more noticeable, while the picture compressed in H.264 remained "smooth", shedding artifacts at the cost of image detail.

The way the H.264 codec "deals" with blocking artifacts is due to such format properties as the ability to reduce the block size down to 4x4 pixels, as well as the use of a deblocking filter that smooths out the contrast areas between adjacent blocks.
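The idea of smoothing block boundaries can be illustrated with a deliberately simplified sketch. This is not the H.264 in-loop deblocking filter, which is considerably more elaborate; the function name `smooth_block_edges`, the `threshold` parameter and the sample values are assumptions made only for this example. It works on a single row of pixel values and softens small steps at block boundaries while leaving large steps (probably real edges in the scene) untouched.

```python
def smooth_block_edges(row, block_size=4, threshold=10):
    """Soften small steps at block boundaries in one row of pixel values.

    Toy stand-in for a deblocking filter; the real H.264 filter is far
    more elaborate and adapts to the coding parameters of each block.
    """
    out = list(row)
    for edge in range(block_size, len(row), block_size):
        left, right = row[edge - 1], row[edge]
        step = right - left
        # A small step is likely a compression artifact; a large one is
        # probably a real edge in the scene and is left untouched.
        if 0 < abs(step) <= threshold:
            out[edge - 1] = left + step // 4
            out[edge] = right - step // 4
    return out


# Three 4-pixel blocks: a small artificial step (100 -> 108)
# followed by a genuine edge (108 -> 200).
row = [100, 100, 100, 100, 108, 108, 108, 108, 200, 200, 200, 200]
smoothed = smooth_block_edges(row)
print(smoothed)
```

In this toy example the step between the first two blocks is softened, while the large jump to 200 is preserved. Every extra pass of this kind over every block edge costs processing time, which is why such filtering raises hardware requirements.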

Deblocking requires a lot of computational resources, so to implement it, the encoders in video devices need more powerful (and therefore more expensive!) processors.

Decoders capable of decoding an H.264 signal must also have more processing power. The software H.264 decoder that took part in our "competition", running on a personal computer, used about twice as much processing power as its MPEG-4 counterpart; this was observed with both test scenes, in the parking lot and in the lobby. For software applications that display several camera signals simultaneously, this can significantly affect the hardware requirements of the PCs used.

Although the lower bitrate of the H.264 codec comes at the cost of higher requirements for computing resources, in my opinion the H.264 format is a serious step forward in the development of video surveillance systems. The benefit of adopting the H.264 standard can take the form of a deeper archive, a lower cost of storing video data, or better image quality. I think the H.264 format will become ubiquitous as a video compression standard in the security industry, significantly reducing the operating costs of video surveillance systems with higher resolutions and frame rates.

Added: 2017-08-31 12:11:30

Today, all modern video surveillance systems are digital in one way or another, that is, the information always ends up in a digital representation. For more efficient storage and transmission over the network, the video therefore has to be compressed using certain algorithms.

Basic concepts

Almost everyone knows that video is a sequence of static images that change over time. And these images are composed of an array of pixels.

A pixel is the smallest logical element of an image that changes color depending on its content.

A frame is an array of all pixels that are generated by a video camera at a specific point in time. At the moment, the most common frame sizes in video surveillance systems are: 960x576 (WD1), 1280x720 (HD), 1920x1080 (FullHD), 2688x1520 (4Mpix) and 2560x1920 (5 Mpix).

The frame rate is the rate at which frames succeed one another on the monitor. In most cases, 25 frames per second is the maximum. In professional jargon, equipment capable of recording and generating a video stream at 25 fps is labeled RealTime. At this rate, the human eye perceives a moving image as smooth and free of jerks, as in reality.

Bit rate is the number of bits of information used to store or transmit video or audio content per unit of time (bps). The bitrate also reflects the compression ratio of the data stream. In video surveillance systems, the bitrate can be constant (CBR - Constant Bitrate) or variable (VBR - Variable Bitrate). A constant bitrate matches the specified value and remains unchanged throughout the entire file. Its main advantage is that the size of the final file can be predicted. With a variable bitrate, the codec chooses the value based on the desired quality settings, so the bitrate may change over the course of the encoded video fragment.
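The difference between the two modes for file-size prediction can be shown with a small sketch. The function names and all bitrate figures below are hypothetical and chosen only for illustration: with a constant bitrate the size follows directly from the duration, while with a variable bitrate it can only be computed from the actual per-second values.

```python
def cbr_size_bytes(bitrate_bps, duration_s):
    """File size of a constant-bitrate recording, known in advance."""
    return bitrate_bps * duration_s // 8


def vbr_size_bytes(per_second_bps):
    """File size of a variable-bitrate recording: sum of what was actually used."""
    return sum(per_second_bps) // 8


# One hour at a constant 4 Mbps (hypothetical setting).
cbr = cbr_size_bytes(4_000_000, 3600)

# One hour of VBR: 50 quiet minutes at 1 Mbps, 10 busy minutes at 6 Mbps
# (hypothetical trace, one value per second).
vbr_trace = [1_000_000] * 3000 + [6_000_000] * 600
vbr = vbr_size_bytes(vbr_trace)

print(f"CBR: {cbr / 1e6:.0f} MB, VBR: {vbr / 1e6:.0f} MB")
```

The CBR size could have been computed before recording started; the VBR size depends on how much activity the scene happened to contain.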

Key frames (I-frames) are frames that contain complete information about the current image.

Predicted frames (P-frames) are frames that contain information only about the difference between the current and the previous picture.
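The relationship between the two frame types can be sketched with a toy encoder. This is only an illustration under strong assumptions: a frame is a flat list of pixel values, a predicted frame stores just the pixels that changed, and nothing is lost, whereas real codecs use motion-compensated prediction and lossy coding of the differences. The names `encode`, `decode` and `key_interval` are invented for this example.

```python
def encode(frames, key_interval=4):
    """Encode frames as ('I', all pixels) or ('P', {pixel index: new value})."""
    stream, previous = [], None
    for n, frame in enumerate(frames):
        if n % key_interval == 0:
            # Key frame: store the complete picture.
            stream.append(("I", list(frame)))
        else:
            # Predicted frame: store only what differs from the previous frame.
            changes = {
                i: new
                for i, (new, old) in enumerate(zip(frame, previous))
                if new != old
            }
            stream.append(("P", changes))
        previous = frame
    return stream


def decode(stream):
    """Rebuild every frame by applying the stored differences in order."""
    frames, current = [], None
    for kind, payload in stream:
        if kind == "I":
            current = list(payload)
        else:
            current = list(current)
            for i, value in payload.items():
                current[i] = value
        frames.append(current)
    return frames


frames = [
    [10, 10, 10, 10],
    [10, 10, 99, 10],   # one pixel changed
    [10, 10, 10, 99],   # the bright pixel moved
    [10, 10, 10, 10],   # back to the empty scene
]
stream = encode(frames)
print(stream)
```

Note that a predicted frame cannot be decoded on its own: the decoder has to start from the preceding key frame and apply each difference in turn.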

All compression algorithms used in video surveillance systems are based on lossy technologies. That is, during the compression process, part of the redundant information is cut off.

Why does a video need to be compressed?

For clarity, let's calculate the uncompressed video stream from a FullHD camera at 25 frames per second. A frame with a resolution of 1920x1080 contains 2,073,600 pixels. Take the simplest color coding, RGB24, where 8 bits are allocated to each of the Red, Green and Blue components, so that one pixel occupies 24 bits. One 1080p frame therefore requires 49,766,400 bits, or about 47.5 Mbit (counting 1 Mbit as 2^20 bits). We need 25 such frames per second, so the uncompressed bitrate is 47.5 x 25 = 1187.5 Mbps, or about 1.16 Gbps (roughly 1.24 Gbps in decimal units). Storing one hour of video from a 2 Mpix IP camera would therefore take more than 500 GB of disk space, and a gigabit network would not have enough bandwidth to carry the stream.

Note that the maximum bitrate of a video stream with identical parameters compressed with the H.264 codec is typically 8 Mbps, almost 150 times lower than that of uncompressed video. Clearly, without compression algorithms, video surveillance systems would cost tens or even hundreds of times more than they do now.
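The arithmetic in this section can be reproduced with a few lines of code. The sketch assumes, as the figures in the text suggest, that a megabit is counted as 2^20 bits; the constant names are chosen only for this example.

```python
WIDTH, HEIGHT = 1920, 1080      # FullHD frame
BITS_PER_PIXEL = 24             # RGB24: 8 bits per color component
FPS = 25
H264_MAX_MBPS = 8               # typical maximum quoted in the text

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FPS
bytes_per_hour = bits_per_second // 8 * 3600

MBIT = 2 ** 20                  # assumption: binary megabits, as in the text
raw_mbps = bits_per_second / MBIT
ratio = raw_mbps / H264_MAX_MBPS

print(f"bits per frame:  {bits_per_frame}")
print(f"raw bitrate:     {raw_mbps:.1f} Mbps")
print(f"one hour:        {bytes_per_hour / 1e9:.0f} GB")
print(f"raw vs H.264:    about {ratio:.0f} to 1")
```

The exact product comes out slightly below the rounded 1187.5 Mbps because 47.5 Mbit per frame is itself a rounded value.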

Modern compression algorithms

Time does not stand still, and requirements for picture quality are constantly growing. The bandwidth of communication channels and the capacity of storage would not keep up with this growth at all, were it not for the constant improvement of compression algorithms.

H.264 standard

The H.264 compression algorithm has dominated video surveillance systems for quite a long time.

H.264 compression consists in eliminating redundant data and reducing its volume using numerous algorithms, which we will not consider in detail in this article.

When setting up encoding in video surveillance systems, there are three main profiles of the H.264 codec:

The Baseline profile puts a minimal load on the decoder processor, with low compression. It is designed for viewing a camera on a computer over a local network.

The Main profile puts an average load on the processor, with high compression. This profile is universal and suitable for high-performance PCs and most DVRs.

The High profile provides maximum compression with a heavy load on the decoder. The bitrate with this profile will be 2-3 times lower than with the Baseline profile. On a video server based on Intel or AMD processors, unlike on a video recorder, this load is spread across the whole system.

The future-proof H.265 standard

The H.265 High Efficiency Video Coding (HEVC) compression format is a significant step forward in digital video coding. Its main advantage is that it is almost twice as efficient as the previous H.264 standard. That is, thanks to the new algorithm, transmitting the signal requires half the network bandwidth, and storing it requires half the storage capacity. This allows the use of much less expensive software and hardware.

By the way, the new standard supports resolutions up to 35 Mpix (8192 x 4320, or 8K), and the maximum block size has been increased to 64x64 = 4096 pixels (the H.264 macroblock is 16x16 = 256 pixels).
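To get a feel for what the larger block means, the sketch below counts how many blocks of the maximum size are needed to cover a FullHD frame, assuming square blocks of 16x16 pixels (256 pixels) and 64x64 pixels (4096 pixels). The helper name `blocks_per_frame` is made up for this example, and real encoders split these blocks further depending on the content.

```python
import math


def blocks_per_frame(width, height, block):
    """Number of block x block tiles needed to cover the whole frame."""
    return math.ceil(width / block) * math.ceil(height / block)


h264_blocks = blocks_per_frame(1920, 1080, 16)   # 16x16 = 256 pixels
h265_blocks = blocks_per_frame(1920, 1080, 64)   # 64x64 = 4096 pixels

print(f"16x16 blocks per FullHD frame: {h264_blocks}")
print(f"64x64 blocks per FullHD frame: {h265_blocks}")
```

Fewer, larger blocks mean less per-block overhead in large uniform areas, which matters more and more as the resolution grows.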

Parallel coding, provided by the H.265 standard, makes it possible to simultaneously process different parts of the frame, which significantly speeds up playback and makes it possible to fully use modern multi-core processors.

In addition, the new standard has received a technology of random access to the image (Clean Random Access), which allows decoding of a randomly selected frame without the need to process the previous images in the stream. This is especially desirable when monitoring requires you to quickly switch to a specific channel.

Despite all the advantages, H.265 is still far from widespread use. Firstly, it requires updated hardware; secondly, using the codec requires paying patent licensing fees; and thirdly, there are some discrepancies between the efficiency obtained in the laboratory and in real conditions.

In the long term, H.265 is likely to replace H.264 as the premier video compression solution.

Optimized H.264+ format

The H.264+ compression algorithm is an innovative format designed specifically for use in video surveillance systems. In fact, H.264+ is a modified H.264 (AVC) codec, optimized for video surveillance tasks and their specific features.

In video obtained from security cameras, the scene is constant and hardly changes, moving objects of interest may be absent for a long time, and noise arising in poor lighting conditions significantly affects image quality. The updated format takes all these features into account and handles them with the following technologies, which increase the compression ratio:

predictive coding based on the background model;
noise suppression;
long-term video stream management.

Predictive coding. All modern compression algorithms combine intra-frame and inter-frame compression. In intra-frame compression, reference I-frames are encoded independently of other frames, while predicted P-frames refer to I-frames and other P-frames (inter-frame compression). With inter-frame compression, the efficiency depends heavily on the choice of the reference frame. Since the background in video surveillance is stable, it is best to use it as the reference I-frame, which increases the compression efficiency for stationary objects and reduces the amount of data spent on reference frames. An intelligent prediction algorithm selects key frames from among those with the fewest moving objects.
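The selection rule described above can be sketched as follows. This is a minimal illustration, not the actual H.264+ algorithm, whose details are not given here: frames are flat lists of pixel values, the background model is assumed to be already known, and the names `moving_pixels`, `pick_reference` and `tolerance` are invented for the example.

```python
def moving_pixels(frame, background, tolerance=8):
    """Count pixels that differ noticeably from the background model."""
    return sum(abs(p - b) > tolerance for p, b in zip(frame, background))


def pick_reference(frames, background):
    """Return the index of the frame with the fewest moving pixels."""
    return min(
        range(len(frames)),
        key=lambda i: moving_pixels(frames[i], background),
    )


background = [50, 50, 50, 50, 50, 50]
frames = [
    [50, 200, 200, 50, 50, 50],    # an object covers two pixels
    [51, 49, 50, 52, 50, 50],      # only slight sensor noise
    [50, 50, 50, 200, 200, 200],   # an object covers three pixels
]
best = pick_reference(frames, background)
print(f"use frame {best} as the reference")
```

The frame that contains only noise is closest to the pure background, so it makes the best reference for predicting the stationary parts of the following frames.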

Noise suppression. Typically, moving objects are encoded together with the static background to maintain quality, which means that background noise is encoded along with the background. H.264+ uses special algorithms to separate the background from the moving objects and encodes the background with a higher compression ratio. This technology partially suppresses noise and reduces the bitrate.
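One simple way to picture this is to quantize background pixels more coarsely than pixels that belong to a moving object. The sketch below is only an assumption about how such a scheme might look, not the method used in H.264+; the step sizes, the `tolerance` threshold and the function names are hypothetical.

```python
def quantize(value, step):
    """Round a value to the nearest multiple of step (coarser step, fewer levels)."""
    return round(value / step) * step


def encode_row(row, background, fg_step=2, bg_step=16, tolerance=8):
    """Quantize background pixels coarsely and moving-object pixels finely."""
    out = []
    for pixel, bg in zip(row, background):
        is_foreground = abs(pixel - bg) > tolerance
        out.append(quantize(pixel, fg_step if is_foreground else bg_step))
    return out


background = [48, 48, 48, 48, 48, 48]
noisy_row = [47, 50, 46, 202, 198, 49]   # noise around 48, object near 200
coded = encode_row(noisy_row, background)
print(coded)
```

In this toy example the noisy background values all collapse to the same level, which is cheap to encode, while the object keeps its detail.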

Long-term video stream management. With background noise reduction, the video bitrate depends on the size of the background portion of the image. For example, when shooting outdoors in the daytime, the background accounts for a very small part of the image, since there are many moving people and cars in the frame, and the bitrate increases significantly. Conversely, at night the bitrate decreases, as there are far fewer moving objects. The H.264+ format has algorithms that track the intensity of the video stream and automatically change the compression ratio depending on the time of day. This stream control technology not only reduces the size of the video archive but also preserves the image quality of moving objects.
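The dependence of bitrate on the share of the frame occupied by moving objects can be illustrated with a crude linear model. The model and every number in it are assumptions made for this example only; they are not taken from the H.264+ format.

```python
def estimated_bitrate_kbps(moving_share, fg_kbps=6000, bg_kbps=500):
    """Blend two hypothetical full-frame bitrates by the moving share of the frame.

    fg_kbps: bitrate if the whole frame were moving objects (assumed).
    bg_kbps: bitrate if the whole frame were static background (assumed).
    """
    return moving_share * fg_kbps + (1 - moving_share) * bg_kbps


day = estimated_bitrate_kbps(0.75)    # busy street: 75% of the frame moving
night = estimated_bitrate_kbps(0.0)   # empty street: background only

# Average over a day split evenly between the two regimes.
daily_average = (12 * day + 12 * night) / 24

print(f"day: {day:.0f} kbps, night: {night:.0f} kbps, average: {daily_average:.1f} kbps")
```

Because the quiet hours cost so little, the long-term average stays well below the daytime peak, which is what allows the archive to shrink without lowering the quality of moving objects.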

Disadvantages of video compression

When compression algorithms are used, so-called artifacts can sometimes be clearly seen in the image, for example the image breaking up into 8x8 pixel blocks or the loss of fine image details (blurring).

Conclusion

The H.264 compression algorithm remains the most popular standard for the vast majority of video surveillance systems, and today it does its job fully. The innovative H.265 format has not yet become widespread because of the obstacles noted above, but it has every chance of replacing its predecessor. The optimized H.264+ algorithm is not widely adopted either, as only a few manufacturers use it.