DLP technology. Comparison of DLP systems. DLP principle of operation

Even the most fashionable IT terms should be used appropriately and as accurately as possible, if only so as not to mislead consumers. Calling yourself a DLP solution provider has definitely become fashionable. For example, at the recent CeBIT 2008 exhibition, the label “DLP solution” could often be seen on the stands not only of little-known antivirus and proxy server vendors, but even of firewall vendors. Sometimes it felt as though around the next corner you would find some CD ejector (a program that controls the opening of the CD drive) with the proud slogan of a corporate DLP solution. And, oddly enough, each of these manufacturers, as a rule, had a more or less logical explanation for positioning its product this way (apart, naturally, from the desire to profit from a fashionable term).

Before considering the market for DLP systems and its main players, we should decide what we mean by a DLP system. There have been many attempts to name this class of information systems: ILD&P - Information Leakage Detection & Prevention (“identification and prevention of information leaks”, a term proposed by IDC in 2007), ILP - Information Leakage Protection (“protection against information leaks”, Forrester, 2006), ALS - Anti-Leakage Software (“anti-leakage software”, E&Y), Content Monitoring and Filtering (CMF, Gartner), and Extrusion Prevention System (by analogy with Intrusion Prevention System).

But the name DLP - Data Loss Prevention (or Data Leak Prevention, protection against data leaks), proposed in 2005, was the one eventually adopted as the commonly used term. In essence, such systems protect “data from internal threats”. Internal threats here are understood as abuse (intentional or accidental) of their powers by employees of the organization who have legitimate rights to access the relevant data.

The most coherent and consistent criteria for classifying a system as DLP were put forward by Forrester Research in the course of its annual market research. It proposed four criteria by which a system can be classified as DLP.

1. Multichannel. The system must be able to monitor several possible channels of data leakage. In a networked environment, this means at least e-mail, Web and IM (instant messengers), not just scanning mail traffic or database activity. At the workstation, it means monitoring file operations and clipboard use, as well as control of e-mail, Web and IM.

2. Unified management. The system should have unified means of managing the information security policy, and of analyzing and reporting events across all monitoring channels.

3. Active protection. The system should not only detect violations of the security policy but also, if necessary, enforce it, for example by blocking suspicious messages.

4. Considering both content and context.

Based on these criteria, in 2008 Forrester selected 12 software vendors for review and assessment (they are listed below in alphabetical order, with the company each vendor acquired in order to enter the DLP market given in parentheses):

  1. Code Green;
  2. InfoWatch;
  3. McAfee (Onigma);
  4. Orchestria;
  5. Reconnex;
  6. RSA / EMC (Tablus);
  7. Symantec (Vontu);
  8. Trend Micro (Provilla);
  9. Verdasys;
  10. Vericept;
  11. Websense (PortAuthority);
  12. Workshare.

To date, of these 12 vendors, only InfoWatch and Websense are represented on the Russian market to any degree. The rest either do not operate in Russia at all or have only announced their intention to start selling DLP solutions (Trend Micro).

When considering the functionality of DLP systems, analysts (Forrester, Gartner, IDC) introduce a categorization of protected objects, that is, the types of information objects subject to monitoring. Such a categorization makes it possible, as a first approximation, to assess the scope of application of a particular system. There are three categories of monitoring objects.

1. Data-in-motion (data in motion) - e-mail messages, instant messengers, peer-to-peer networks, file transfer, Web traffic, and other types of messages that can be transmitted over communication channels.

2. Data-at-rest (stored data) - information on workstations, laptops, file servers, specialized storages, USB devices and other types of data storage devices.

3. Data-in-use (data in use) - information currently being processed.

At the moment, there are about two dozen domestic and foreign products on our market that have some properties of DLP systems. Brief information about them, in the spirit of the above classification, is given in Tables 1 and 2. Table 1 also introduces a parameter called “centralized data storage and audit”, which means the ability of the system to store data from all monitoring channels in a single repository for further analysis and audit. This functionality has recently become particularly important, not only because of the requirements of various legislative acts but also because of its popularity with customers (judging by the experience of implemented projects). All information in these tables is taken from open sources and the marketing materials of the respective companies.

Based on the data given in Tables 1 and 2, we can conclude that today there are only three DLP systems in Russia (from InfoWatch, Perimetrix and Websense). The recently announced integrated product from Jet Infosystems (SKVT + SMAP) can be added to them, since it will cover several channels and have unified management of security policies.
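The count of three can be checked mechanically against Table 2. The following is only a minimal sketch: the rows are transcribed from Table 2, while the names `TABLE_2` and `full_dlp_products` are illustrative and not part of any vendor tool or of the Forrester study.

```python
# Rows transcribed from Table 2: (company, product, multichannel,
# unified management, active protection, content and context).
TABLE_2 = [
    ("InfoWatch", "IW Traffic Monitor", True, True, True, True),
    ("Perimetrix", "SafeSpace", True, True, True, True),
    ("Jet Infosystems", "Dozor Jet (SKVT)", False, False, True, True),
    ("Jet Infosystems", "Dozor Jet (SMAP)", False, False, True, True),
    ("Smart Line Inc.", "DeviceLock", False, False, False, False),
    ("SecurIT", "Zlock", False, False, False, False),
    ("Smart Protection Labs Software", "SecrecyKeeper", True, True, True, False),
    ("SpectorSoft", "Spector 360", True, True, True, False),
    ("Lumension Security", "Sanctuary Device Control", False, False, False, False),
    ("WebSense", "Websense Content Protection", True, True, True, True),
    ("Informzashita", "Security Studio", True, True, True, False),
    ("Primetek", "Insider", True, True, True, False),
    ("AtomPark Software", "StaffCop", True, True, True, False),
    ("SoftInform", "SearchInform Server", True, True, False, False),
    ("Info Defense", "Infoperimeter", True, True, False, False),
]


def full_dlp_products(rows):
    """Return (company, product) pairs that meet all four criteria."""
    return [
        (company, product)
        for company, product, *criteria in rows
        if all(criteria)
    ]


for company, product in full_dlp_products(TABLE_2):
    print(company, "-", product)
```

Each Dozor Jet component taken separately fails the multichannel and unified management criteria, which is why only the announced combination of the two is mentioned as a candidate.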

It is rather difficult to talk about the market shares of these products in Russia, since most of the mentioned manufacturers do not disclose sales volumes, the number of clients and protected workstations, limiting themselves only to marketing information. We can only say for sure that the main suppliers at the moment are:

  • Dozor systems that have been on the market since 2001;
  • InfoWatch products sold since 2004;
  • WebSense CPS (started selling in Russia and worldwide in 2007);
  • Perimetrix (a young company, the first version of its products was announced on its website at the end of 2008).

In conclusion, I would like to add that belonging or not to the class of DLP systems does not make products worse or better - it's just a matter of classification and nothing more.

Table 1. Products presented on the Russian market and possessing certain properties of DLP systems

Company | Product | Data-in-motion protection | Data-in-use protection | Data-at-rest protection | Centralized storage and auditing
InfoWatch | IW Traffic Monitor | Yes | Yes | No | Yes
InfoWatch | IW CryptoStorage | No | No | Yes | No
Perimetrix | SafeSpace | Yes | Yes | Yes | Yes
Jet Infosystems | Dozor Jet (SKVT) | Yes | No | No | Yes
Jet Infosystems | Dozor Jet (SMAP) | Yes | No | No | Yes
Smart Line Inc. | DeviceLock | No | Yes | No | Yes
SecurIT | Zlock | No | Yes | No | No
Smart Protection Labs Software | SecrecyKeeper | No | Yes | No | No
SpectorSoft | Spector 360 | Yes | No | No | No
Lumension Security | Sanctuary Device Control | No | Yes | No | No
WebSense | Websense Content Protection | Yes | Yes | Yes | No
Informzashita | Security Studio | No | Yes | Yes | No
Primetech | Insider | No | Yes | No | No
AtomPark Software | StaffCop | No | Yes | No | No
SoftInform | SearchInform Server | Yes | Yes | No | No

Table 2. Compliance of products on the Russian market with the criteria for belonging to the class of DLP systems

Company | Product | Multichannel | Unified management | Active protection | Considering both content and context
InfoWatch | IW Traffic Monitor | Yes | Yes | Yes | Yes
Perimetrix | SafeSpace | Yes | Yes | Yes | Yes
Jet Infosystems | Dozor Jet (SKVT) | No | No | Yes | Yes
Jet Infosystems | Dozor Jet (SMAP) | No | No | Yes | Yes
Smart Line Inc. | DeviceLock | No | No | No | No
SecurIT | Zlock | No | No | No | No
Smart Protection Labs Software | SecrecyKeeper | Yes | Yes | Yes | No
SpectorSoft | Spector 360 | Yes | Yes | Yes | No
Lumension Security | Sanctuary Device Control | No | No | No | No
WebSense | Websense Content Protection | Yes | Yes | Yes | Yes
Informzashita | Security Studio | Yes | Yes | Yes | No
Primetek | Insider | Yes | Yes | Yes | No
AtomPark Software | StaffCop | Yes | Yes | Yes | No
SoftInform | SearchInform Server | Yes | Yes | No | No
Info Defense | Infoperimeter | Yes | Yes | No | No

DLP technology

Digital Light Processing (DLP) is an advanced technology invented by Texas Instruments. Thanks to it, it became possible to create very small, very light (3 kg, is that really a weight?) and nevertheless quite powerful (more than 1000 ANSI lm) multimedia projectors.

A brief history of creation

A long time ago, in a distant galaxy ...

In 1987, Dr. Larry J. Hornbeck invented the Digital Micromirror Device (DMD). This invention capped a decade of Texas Instruments research into micromechanical Deformable Mirror Devices (DMD again). The essence of the invention was the abandonment of flexible mirrors in favor of an array of rigid mirrors with only two stable positions.

In 1989, Texas Instruments became one of four companies selected to implement the "projector" portion of the U.S. High-Definition Display program, funded by the Advanced Research Projects Agency (ARPA).

In May 1992, TI demonstrated for ARPA the first DMD-based system supporting the modern resolution standard.

A High-Definition TV (HDTV) version of DMD based on three high-definition DMDs was shown in February 1994.

Mass sales of DMD chips began in 1995.

DLP technology

The key element of DLP multimedia projectors is an array of microscopic mirrors (the DMD) made from an aluminum alloy with very high reflectance. Each mirror is attached to a rigid substrate, which is connected to the matrix base through movable plates. Electrodes connected to CMOS SRAM cells are located at opposite corners of the mirrors. Under the action of an electric field, the substrate with the mirror takes one of two positions, which differ by exactly 20° thanks to the limiters located on the matrix base.

These two positions correspond to the reflection of the incoming light flux, respectively, into the lens and an effective light absorber, which ensures reliable heat dissipation and minimal light reflection.

The data bus and the matrix itself are designed to provide 60 frames per second or more with a color depth of 16 million colors.

The matrix of mirrors together with CMOS SRAM make up the DMD crystal - the basis of DLP technology.

The small size of the crystal is impressive. Each mirror of the matrix measures 16 microns or less on a side, and the distance between the mirrors is about 1 micron. The crystal, and more than one of them, easily fits in the palm of your hand.

In total, if Texas Instruments is not deceiving us, there are three types of crystals (or chips) with different resolutions. They are:

  • SVGA: 848 × 600; 508,800 mirrors
  • XGA: 1024 × 768, with black aperture (inter-mirror space); 786,432 mirrors
  • SXGA: 1280 × 1024; 1,310,720 mirrors
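The mirror counts and the "fits in the palm" claim are easy to verify with a little arithmetic. The sketch below assumes that the 16 micron figure is the side of a mirror and the 1 micron figure is the gap, giving a 17 micron pitch; that pitch, and the names `CHIPS`, `mirror_count` and `array_size_mm`, are assumptions made for illustration only.

```python
# Resolutions as listed above: name -> (columns, rows).
CHIPS = {"SVGA": (848, 600), "XGA": (1024, 768), "SXGA": (1280, 1024)}

# Assumed mirror pitch in microns: 16 micron mirror plus 1 micron gap.
PITCH_UM = 16 + 1


def mirror_count(name):
    """Total number of mirrors on the chip."""
    columns, rows = CHIPS[name]
    return columns * rows


def array_size_mm(name):
    """Approximate width and height of the mirror array in millimetres."""
    columns, rows = CHIPS[name]
    return columns * PITCH_UM / 1000, rows * PITCH_UM / 1000


for name in CHIPS:
    width, height = array_size_mm(name)
    print(f"{name}: {mirror_count(name):,} mirrors, about {width:.1f} x {height:.1f} mm")
```

Under these assumptions even the largest array is only a couple of centimetres across.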

So we have a matrix; what can we do with it? Well, of course, illuminate it with as powerful a light flux as possible and place an optical system in the path of one of the two reflection directions, to focus the image on the screen. It makes sense to place a light absorber in the path of the other direction so that the unneeded light does not cause trouble. Now we can project monochrome pictures. But where is the color? Where are the levels of brightness?

And this, it seems, was exactly the invention of comrade Larry mentioned in the first paragraph of the section on the history of DLP. If you still do not see what the point is, brace yourself, because you may be in for a shock :) This elegant and, once stated, quite obvious solution is currently the most advanced one in the field of image projection.

Remember the children's trick with a spinning flashlight, whose light at some point merges into a glowing circle. It is this quirk of our vision that finally allows us to abandon analog imaging systems in favor of completely digital ones. After all, even digital monitors are analog in nature at the last stage.

But what happens if we force the mirror to switch from one position to the other at a high frequency? If we neglect the switching time of the mirror (and thanks to its microscopic size this time really can be neglected), then with equal time spent in each position the apparent brightness will simply be halved. By changing the ratio of the time the mirror spends in one position to the time it spends in the other, we can easily change the apparent brightness of the image. And since the cycle frequency is very, very high, there will be no visible flicker at all. Eureka. Although it is nothing special, all of this has been known for a long time :)
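The relationship between mirror timing and apparent brightness can be sketched in a few lines. This is an illustration under stated assumptions, not a description of TI's actual controller: it assumes a 60 Hz frame, an 8-bit brightness level, and a binary-weighted schedule in which bit k of the level keeps the mirror "on" for 2**k time slots. All function names are hypothetical, and the nonlinearity of human vision is ignored.

```python
FRAME_TIME = 1 / 60  # seconds per frame; an assumed refresh rate


def mirror_on_time(level, bits=8, frame_time=FRAME_TIME):
    """Time the mirror reflects light into the lens during one frame.

    Bit k of `level` switches the mirror on for 2**k time slots, so the
    total on-time is proportional to `level`.
    """
    slot = frame_time / (2 ** bits - 1)  # duration of the shortest slot
    return sum(slot * (1 << k) for k in range(bits) if (level >> k) & 1)


def apparent_brightness(level, bits=8, frame_time=FRAME_TIME):
    """Fraction of full brightness: on-time divided by frame time."""
    return mirror_on_time(level, bits, frame_time) / frame_time


for level in (0, 64, 128, 255):
    print(level, round(apparent_brightness(level), 3))
```

A level near the middle of the range gives roughly half brightness, which matches the "equal time in both positions" case described above.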

Well, now for the final touch. If the switching speed is high enough, then we can successively place light filters in the path of the luminous flux and thereby create a color image.

That is, in fact, the whole technology. We will trace its further evolutionary development on the example of the device of multimedia projectors.

DLP Projectors Device

Texas Instruments does not manufacture DLP projectors itself; many other companies do, such as 3M, ACER, PROXIMA, PLUS, ASK PROXIMA, OPTOMA CORP., DAVIS, LIESEGANG, INFOCUS, VIEWSONIC, SHARP, COMPAQ, NEC, KODAK, TOSHIBA, etc. Most of the projectors produced are portable, weighing from 1.3 to 8 kg, with a brightness of up to 2000 ANSI lumens. There are three types of projectors.

Single Matrix Projector

The simplest type, which we have already described, is the single-matrix projector, where a rotating disk with color filters (blue, green and red) is placed between the light source and the matrix. The rotational speed of the disk determines the frame rate we are used to.

The image is formed in turn by each of the primary colors, as a result, a normal full-color image is obtained.

All, or almost all, portable projectors are of the single matrix type.

A further development of this type of projectors was the introduction of a fourth, transparent light filter, which makes it possible to significantly increase the brightness of the image.
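To get a feel for the time scales involved in sequential color, here is a rough timing calculation. It is only a sketch under simplifying assumptions that are not taken from the text: the disk makes one revolution per frame, all filter segments are equal in size, the frame rate is 60 Hz, and each color uses 8 bits of brightness. The function names are hypothetical.

```python
def field_time(frame_rate, segments):
    """Time available to each filter segment within one frame, in seconds."""
    return 1.0 / (frame_rate * segments)


def shortest_pulse(frame_rate, segments, bits=8):
    """Shortest mirror pulse needed for `bits` brightness levels per color."""
    return field_time(frame_rate, segments) / (2 ** bits - 1)


for segments in (3, 4):  # red/green/blue, then the same plus a clear segment
    ft = field_time(60, segments)
    sp = shortest_pulse(60, segments)
    print(f"{segments} segments: field {ft * 1e3:.2f} ms, "
          f"shortest pulse {sp * 1e6:.1f} us")
```

Adding the fourth, transparent segment shortens the time left for every color, so the mirrors have to switch even faster for the same number of brightness levels.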

Three Matrix Projector

The most complex type is the three-matrix projector, where the light is split into three color streams and reflected from three matrices at once. Such a projector has the purest color, and its frame rate is not limited by the rotation speed of a disk, as it is in single-matrix projectors.

The exact alignment of the reflected flux from each matrix (convergence) is ensured using a prism, as you can see in the figure.

Dual Matrix Projector

An intermediate type is the dual-matrix projector. In this case, the light is split into two streams: red is reflected from one DMD matrix, and blue and green from the other. The light filter, accordingly, alternately removes the blue or the green component from the spectrum.

The dual-matrix projector offers image quality intermediate between the single-matrix and three-matrix types.

Comparison of LCD and DLP Projectors

Compared to LCD projectors, DLP projectors have several important advantages:

Are there any disadvantages to DLP technology?

Theory is theory, but in practice there is still work to do. The main disadvantage is the immaturity of the technology and, as a consequence, the problem of mirrors sticking.

The fact is that with such microscopic dimensions, small details tend to "stick together", and a mirror with a base is no exception.

Despite the efforts made by Texas Instruments to invent new materials that reduce the adhesion of micromirrors, the problem does exist, as we saw when testing the Infocus LP340 multimedia projector. But, I must say, it does not particularly interfere with life.

Another problem is not so obvious and lies in the optimal selection of the mirror switching modes. Every DLP projector company has its own opinion on this.

And the last thing. Despite the minimal time it takes the mirrors to switch from one position to the other, this process leaves a barely noticeable trail on the screen. A kind of free antialiasing.

Technology development

  • In addition to the introduction of a transparent light filter, work is constantly underway to reduce the inter-mirror space and the area of the post that attaches the mirror to the substrate (the black dot in the middle of the image element).
  • By dividing the matrix into separate blocks and expanding the data bus, the switching frequency of the mirrors is increased.
  • Work is underway to increase the number of mirrors and reduce the size of the matrix.
  • The power and contrast of the luminous flux are constantly increasing. Three-matrix projectors with over 10,000 ANSI lm and contrast ratios of over 1000:1 are already in use in cutting-edge digital cinemas.
  • DLP technology is completely ready to replace CRT technology for displaying images in home theaters.

Conclusion

This is not all that could be said about DLP technology; for example, we did not touch on the use of DMD matrices in printing. But we will wait until Texas Instruments confirms the information available from other sources, so as not to pass off a fake on you. I hope this short story is enough to give you, if not the most complete, then at least a sufficient understanding of the technology, so that you need not torment salespeople with questions about the advantages of DLP projectors over others.


Thanks to Alexey Slepynin for his help in preparing the material

To be fairly consistent in the definitions, we can say that information security began precisely with the advent of DLP systems. Prior to that, all products that dealt with "information security", in fact, protected not information, but the infrastructure - the places where data is stored, transmitted and processed. The computer, application or channel in which confidential information is located, processed or transmitted is protected by these products in the same way as the infrastructure in which completely harmless information circulates. That is, it was with the advent of DLP products that information systems finally learned to distinguish confidential information from non-confidential. Perhaps, with the integration of DLP technologies into the information infrastructure, companies will be able to save a lot on information protection - for example, use encryption only in cases where confidential information is stored or transmitted, and not encrypt information in other cases.

However, this is a matter for the future; at present these technologies are used mainly to protect information from leaks. Information categorization technologies form the core of DLP systems. Each manufacturer considers its methods of detecting confidential information to be unique, protects them with patents and comes up with special trademarks for them. After all, the remaining elements of the architecture (protocol interceptors, format parsers, incident management and data warehouses) are identical for most manufacturers, and for large companies they are even integrated with other information infrastructure security products. Basically, two main groups of technologies are used to categorize data in products that protect corporate information from leaks: linguistic (morphological, semantic) analysis and statistical methods (Digital Fingerprints, Document DNA, anti-plagiarism). Each technology has its own strengths and weaknesses, which determine its area of application.

Linguistic analysis

The use of stop words ("secret", "confidential" and the like) to block outgoing e-mail messages in mail servers can be considered the progenitor of modern DLP systems. Of course, this does not protect against intruders - it is not difficult to remove the stop word, which is most often placed in a separate stamp of the document, and the meaning of the text will not change at all.

The development of linguistic technology was pushed forward at the beginning of this century by the creators of e-mail filters, above all in order to protect e-mail from spam. Today reputation-based methods prevail in anti-spam technologies, but at the beginning of the century there was a real linguistic war between projectile and armor, that is, between spammers and anti-spammers. Remember the simplest methods for fooling stop-word filters? Replacing letters with similar-looking letters from other encodings or with digits, transliteration, and randomly placed spaces, underscores or line breaks in the text. Anti-spammers quickly learned to deal with such tricks, but then graphic spam and other cunning types of unwanted mail appeared.
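The tricks listed above, and the obvious countermeasure, can be shown with a toy filter. This is a minimal sketch, not the algorithm of any real product: the stop-word list, the `LOOKALIKES` substitution table and the function names are all made up for illustration, and only a few digit and symbol substitutions are handled.

```python
import re

STOP_WORDS = {"secret", "confidential"}  # illustrative list

# A few look-alike substitutions that spammers (and leakers) rely on.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "5": "s", "@": "a", "$": "s"}
)


def naive_has_stop_word(text):
    """Plain substring match, easily fooled by obfuscation."""
    return any(word in text.lower() for word in STOP_WORDS)


def normalize(text):
    """Undo look-alike characters and strip separators between letters."""
    text = text.lower().translate(LOOKALIKES)
    return re.sub(r"[\s_\-.]+", "", text)


def has_stop_word(text):
    """Match stop words after normalization."""
    flat = normalize(text)
    return any(word in flat for word in STOP_WORDS)


for message in ("S3CR3T_plan", "c o n f i d e n t i a l", "Lunch at noon?"):
    print(repr(message), naive_has_stop_word(message), has_stop_word(message))
```

Stripping all separators makes the filter robust to inserted spaces but can also join neighboring words into an accidental match, which is one small example of why such filters kept growing more complicated.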

However, it is impossible to use anti-spam technologies in DLP products without serious revision. Indeed, to combat spam, it is enough to divide the information flow into two categories: spam and non-spam. The Bayesian method, which is used to detect spam, gives only a binary result: "yes" or "no". This is not enough to protect corporate data from leaks - you cannot simply divide information into confidential and non-confidential. You need to be able to classify information by functional affiliation (financial, production, technological, commercial, marketing), and within classes, categorize it by access level (for free distribution, for limited access, for official use, secret, top secret, and so on).

Most modern linguistic analysis systems use not only context analysis (that is, in what context and in combination with which other words a particular term is used) but also semantic analysis of the text. The larger the analyzed fragment, the better these technologies work: on a large piece of text the analysis is more accurate, and the category and class of the document are more likely to be determined correctly. For short messages (SMS, instant messengers), nothing better than stop words has yet been invented. The author faced this problem in the fall of 2008, when thousands of messages such as "we are being laid off", "they will take away our license" and "outflow of depositors" went out to the Internet through instant messengers from workplaces in many banks, and the banks had to block them immediately to keep them from reaching their clients.

Technology advantages

The advantages of linguistic technologies are that they work directly with the content of documents, that is, they do not care where and how the document was created, what stamp is on it, and what the file is called - the documents are protected immediately. This is important, for example, when processing drafts of confidential documents or to protect incoming documents. If the documents created and used within the company can somehow be named, stamped or marked in a specific way, then the incoming documents may have stamps and labels that are not accepted in the organization. Drafts (if, of course, they are not created in a secure document management system) may also already contain confidential information, but not yet contain the necessary labels and marks.

Another advantage of linguistic technologies is their ability to learn. If you have pressed the "Not spam" button in your mail client at least once in your life, then you already know what the client side of a linguistic engine training system looks like. Note that you do not need to be a certified linguist or know what exactly will change in the database of categories: you just point out a false positive to the system, and it does the rest itself.

The third advantage of linguistic technologies is their scalability. Processing time is proportional to the volume of information and does not depend at all on the number of categories. Until recently, building a hierarchical base of categories (historically called the BKF, the content filtering base, although this name no longer reflects its real meaning) looked like a kind of shamanism practiced by professional linguists, so setting up the BKF could safely be counted among the shortcomings. But with the release in 2010 of several "autolinguist" products at once, building the primary base of categories has become extremely simple: you point the system to the places where documents of a certain category are stored, and it determines the linguistic characteristics of that category by itself and learns on its own from false positives. So ease of setup has now been added to the advantages of linguistic technologies.

And one more advantage of linguistic technologies, which I would like to note in the article, is the ability to detect categories in information flows that are not related to documents within the company. A tool for controlling the content of information flows can identify categories such as illegal activities (piracy, distribution of prohibited goods), the use of company infrastructure for their own purposes, damage to the company's image (for example, spreading defamatory rumors), and so on.

Disadvantages of technology

The main disadvantage of linguistic technologies is their dependence on the language. A linguistic engine designed for one language cannot be used to analyze another. This was especially noticeable when American manufacturers entered the Russian market: they were not ready for Russian word formation or for the existence of six encodings. It was not enough to translate categories and keywords into Russian. In English, word formation is quite simple, and case is expressed by prepositions, that is, when the case changes, the preposition changes and not the word itself. Most English nouns become verbs without any change to the word, and so on. In Russian everything is different: one root can give rise to dozens of words in different parts of speech.

In Germany, American manufacturers of linguistic technologies were faced with another problem - the so-called "compounds", compound words. In German, it is customary to attach definitions to the main word, as a result of which words are obtained, sometimes consisting of ten roots. In the English language there is no such thing, there the word is a sequence of letters between two spaces, respectively, the English linguistic engine was unable to process unfamiliar long words.

For the sake of fairness, it should be said that now these problems have been largely resolved by American manufacturers. The language engine had to be pretty much redesigned (and sometimes rewritten), but the large markets in Russia and Germany are certainly worth it. It is also difficult to process multilingual texts with linguistic technologies. However, most engines still cope with two languages, usually the national language + English - this is quite enough for most business tasks. Although the author has come across confidential texts containing, for example, Kazakh, Russian and English at the same time, this is more the exception than the rule.

Another disadvantage of linguistic technologies for monitoring the entire range of corporate confidential information is that not all confidential information takes the form of coherent text. Although information in databases is stored in text form, and extracting text from a DBMS is not a problem, the extracted information most often consists of proper names (full names, addresses, company names) and numeric data (account numbers, credit card numbers, balances, etc.). Linguistic processing of this kind of data does little good. The same can be said of CAD/CAM formats, that is, drawings, which often contain intellectual property, as well as program code and media (video/audio) formats: some text can be extracted from them, but processing it is also inefficient. Three years ago this also applied to scanned texts, but the leading manufacturers of DLP systems quickly added optical character recognition and coped with this problem.

But the biggest and most often criticized shortcoming of linguistic technologies is still the probabilistic approach to categorization. If you have ever read a letter marked "Probably SPAM", you will understand what I mean. If this happens with spam, where there are only two categories (spam / not spam), you can imagine what will happen when several dozen categories and confidentiality classes are loaded into the system. Although training the system can achieve 92-95% accuracy, for most users this means that roughly one in every twelve to twenty movements of information will be assigned to the wrong class, with all the consequences for the business (a leak or the interruption of a legitimate process).

It is not customary to count the complexity of developing a technology among its disadvantages, but it cannot be ignored. Developing a serious linguistic engine that categorizes texts into more than two categories is a science-intensive and rather complicated technological process. Applied linguistics is a rapidly developing science that received a strong impetus from the spread of Internet search, but today there are only a handful of workable categorization engines on the market: there are just two of them for the Russian language, and for some languages they simply have not yet been developed. Therefore, in the DLP market there are only a couple of companies that are able to fully categorize information on the fly. It can be assumed that when the DLP market grows to multi-billion dollar sizes, Google will easily enter it. With its own linguistic engine, tested on trillions of search queries in thousands of categories, it will not be difficult for it to immediately grab a serious piece of this market.

Statistical Methods

The problem of computer search for significant citations (why exactly "significant" - a little later) interested linguists in the 70s of the last century, if not earlier. The text was broken into pieces of a certain size, and a hash was taken from each of them. If a certain sequence of hashes occurred in two texts at the same time, then with a high probability the texts in these areas coincided.
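The idea described above can be sketched as follows. This is an illustration rather than any product's algorithm: SHA-256 is an arbitrary choice of hash, the function names are hypothetical, and the checked text is scanned with a sliding window (an added assumption) so that a quote is found even when it is not aligned to the same piece boundaries as in the sample.

```python
import hashlib


def piece_hash(piece):
    """Hash of one piece of text."""
    return hashlib.sha256(piece.encode("utf-8")).hexdigest()


def fingerprint(sample, size):
    """Hashes of consecutive, non-overlapping pieces of `size` characters."""
    return {
        piece_hash(sample[i:i + size])
        for i in range(0, len(sample) - size + 1, size)
    }


def count_matches(text, sample_fingerprint, size):
    """Count windows of `text` whose hash occurs in the sample fingerprint."""
    return sum(
        1
        for i in range(len(text) - size + 1)
        if piece_hash(text[i:i + size]) in sample_fingerprint
    )


sample = "Merger terms: price 42 per share, closing date 1 March, no leaks."
fp = fingerprint(sample, size=8)

leak = "FYI " + sample[8:40] + " bye"
print(count_matches(leak, fp, 8))
print(count_matches("Lunch menu: soup, salad and tea today.", fp, 8))
```

The more pieces of the sample turn up in the checked text, the more likely it is that the two texts coincide in those places.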

A by-product of research in this area is, for example, the "alternative chronology" of Anatoly Fomenko, a respected scholar who worked on "text correlations" and once compared Russian chronicles of different historical periods. Surprised at how much the chronicles of different centuries coincide (more than 60%), in the late 70s he put forward the theory that our chronology is several centuries shorter. Therefore, when some DLP company entering the market offers a "revolutionary citation search technology", it is very likely that the company has not created anything other than a new brand name.

Statistical technologies treat texts not as a coherent sequence of words, but as an arbitrary sequence of characters, therefore they work equally well with texts in any languages. Since any digital object - even a picture, even a program - is also a sequence of characters, the same methods can be used to analyze not only textual information, but also any digital objects. And if the hashes in two audio files coincide, one of them probably contains a quote from the other, so statistical methods are effective means of protecting against audio and video leaks, which are actively used in music studios and film companies.

It's time to get back to the concept of "meaningful quote". The key characteristic of a complex hash removed from the protected object (which in different products is called either Digital Fingerprint or Document DNA) is the step with which the hash is removed. As can be understood from the description, such a "fingerprint" is a unique characteristic of an object and at the same time has its own size. This is important because if you take prints from millions of documents (and this is the storage capacity of an average bank), then you will need enough disk space to store all the prints. The size of such a fingerprint depends on the hash step - the smaller the step, the larger the fingerprint. If you take a hash in steps of one character, then the size of the print will exceed the size of the sample itself. If you increase the step (for example, 10,000 characters) to reduce the "weight" of the print, then at the same time, the likelihood that a document containing a quotation from a sample of 9,900 characters in length will be confidential, but slip through unnoticed, increases.

On the other hand, taking a very small step of a few characters to increase detection accuracy can push the number of false positives to an unacceptable level. For text, this means you should not take a hash of every letter: all words consist of letters, and the system would treat the mere presence of letters in a text as a quote from the sample. Manufacturers usually recommend an optimal hash step at which the quote size is sufficient and the fingerprint stays small, from 3% (text) to 15% (compressed video) of the size of the original. Some products let you change the size of a significant quote, that is, increase or decrease the hash step.
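The trade-off around the hash step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real products use proprietary algorithms, while here the sample is cut into aligned blocks of `step` characters, each block is hashed with SHA-1, and the checked document is scanned with a sliding window of the same length. The function names `fingerprint` and `contains_quote` are hypothetical.

```python
import hashlib


def fingerprint(sample: str, step: int) -> set[str]:
    """Hash consecutive, non-overlapping blocks of `step` characters.

    Assumption for this sketch: block length equals the hash step.
    A trailing block shorter than `step` is ignored.
    """
    return {
        hashlib.sha1(sample[i:i + step].encode("utf-8")).hexdigest()
        for i in range(0, len(sample) - step + 1, step)
    }


def contains_quote(document: str, prints: set[str], step: int) -> bool:
    """Slide a `step`-character window over the document.

    Returns True as soon as one window hashes to a value stored
    in the sample's fingerprint.
    """
    for i in range(len(document) - step + 1):
        window = document[i:i + step]
        if hashlib.sha1(window.encode("utf-8")).hexdigest() in prints:
            return True
    return False
```

In this scheme a smaller step yields more hashes per sample, so the fingerprint grows, while a quote shorter than the step may not contain a single complete block and then goes undetected. A quote of at least `2 * step - 1` characters always covers one full block.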

Technology advantages

As the description shows, detecting a quote requires a sample object. Statistical methods can tell with good accuracy (up to 100%) whether the file being checked contains a significant quote from the sample. In other words, the system takes no responsibility for categorizing documents: that work rests entirely on the person who categorized the files before the fingerprints were taken. This greatly simplifies protection when infrequently changing, already categorized files are stored in one or more known places in the enterprise. It is then enough to take a fingerprint of each of these files, and the system will, according to its settings, block the transfer or copying of files containing significant quotes from the samples.

The independence of statistical methods from the language of the text, and their applicability to non-textual information, is also an indisputable advantage. They are good for protecting static digital objects of any type - pictures, audio / video, databases. I will cover the protection of dynamic objects in the "disadvantages" section.

Disadvantages of technology

As with linguistics, the disadvantages of the technology are the flip side of its merits. The simplicity of training the system (point the system at a file, and the file is protected) shifts the responsibility for training onto the user. If a confidential file ends up in the wrong place, or has not been indexed through negligence or malicious intent, the system will not protect it. Accordingly, companies concerned about leaks of confidential information should establish a procedure for checking how confidential files are indexed by the DLP system.

Another disadvantage is the physical size of the fingerprint. The author has repeatedly seen impressive fingerprint-based pilot projects in which a DLP system blocks, with 100% probability, the forwarding of documents containing meaningful quotes from three hundred sample documents. However, after a year of running the system in production, the fingerprint of each outgoing message is compared not with three hundred but with millions of sample fingerprints, which significantly slows down the mail system and causes delays of tens of minutes.

As promised above, I will describe my experience of protecting dynamic objects with statistical methods. The time needed to take a fingerprint depends directly on the file size and format. For a text document like this article it takes a fraction of a second; for an hour-and-a-half MP4 movie, tens of seconds. For rarely modified files this is not critical, but if an object changes every minute or even every second, a problem arises: after each change, a new fingerprint must be taken. Code that a programmer is working on is not the hardest case; databases used in billing, automated banking systems (ABS) or call centers are much worse. If taking the fingerprint takes longer than the object stays unchanged, the problem has no solution. This is not such an exotic case: for example, fingerprinting a database that stores the phone numbers of a federal cellular operator's customers takes several days, while the database changes every second. So when a DLP vendor claims that their product can protect your database, mentally add the word "quasi-static".
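The condition for an object to be protectable by fingerprints can be written as a one-line check. The sketch below is illustrative only: the function name is hypothetical, and the figures (half a second for a text document, three days for the subscriber database) are assumed values standing in for "a fraction of a second" and "several days".

```python
def fingerprint_stays_valid(fingerprint_seconds: float,
                            change_interval_seconds: float) -> bool:
    """True if a fingerprint can be completed before the object changes again."""
    return fingerprint_seconds < change_interval_seconds


# Hypothetical figures, for illustration only.
DAY = 24 * 60 * 60

# A text document: fingerprinted in half a second, edited about once a day.
text_document = fingerprint_stays_valid(
    fingerprint_seconds=0.5, change_interval_seconds=DAY)

# A subscriber database: fingerprinted in three days, changed every second.
subscriber_db = fingerprint_stays_valid(
    fingerprint_seconds=3 * DAY, change_interval_seconds=1)
```

Only objects for which the check holds are "quasi-static" in the sense used here.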

Unity and struggle of opposites

As the previous section shows, the strength of one technology appears exactly where the other is weak. Linguistics does not need samples, categorizes data on the fly, and can protect information that, accidentally or intentionally, has not been fingerprinted. Fingerprints give the best accuracy and are therefore preferred for automated use. Linguistics works well with texts, fingerprints with other storage formats.

Therefore, most leading companies use both technologies in their products, one as the main technology and the other as a supplement. The reason is historical: a company's products initially used only one technology, the one in which the company had advanced furthest, and the second was added later in response to market demand. For example, InfoWatch used to rely only on the licensed linguistic technology Morph-OLogic, and Websense only on PreciseID, which belongs to the Digital Fingerprint category, but both companies now use both methods. Ideally, the two technologies should be used not in parallel but in series. For example, fingerprints do a better job of determining the type of a document, such as a contract or a balance sheet. A linguistic base created specifically for that category can then be applied, which saves considerable computing resources.

Several types of technology used in DLP products remain outside the scope of this article. These include, for example, the structure analyzer, which finds formal structures in objects (credit card numbers, passport numbers, tax identification numbers and so on) that cannot be detected by either linguistics or fingerprints. The topic of labels of various kinds has also not been covered, from entries in a file's attribute fields or simply a special file name to special cryptocontainers. The latter technology is becoming obsolete, as most manufacturers prefer not to reinvent the wheel and instead integrate with DRM systems such as Oracle IRM or Microsoft RMS.
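As a rough illustration of what a structure analyzer does, the sketch below looks for 16-digit payment card numbers. The article does not describe how real analyzers are built, so the approach here is an assumption: a regular expression finds candidate digit groups, and the Luhn checksum used by payment cards filters out random digit strings. The names `CARD_PATTERN`, `luhn_valid` and `find_card_numbers` are hypothetical.

```python
import re

# 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum, ignoring any non-digit separators."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return substrings that look like card numbers and pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text)
            if luhn_valid(m.group())]
```

The checksum step matters for false positives: an arbitrary 16-digit order or invoice number matches the pattern but usually fails the Luhn check.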

DLP products are a rapidly developing area of information security; some manufacturers release new versions very often, more than once a year. We look forward to the emergence of new technologies for analyzing the corporate information field that will make the protection of confidential information more effective.

The choice of a specific DLP system depends on the required level of data security and is always made individually.

What is a DLP system

DLP systems (Data Leak Prevention, that is, means of preventing data leakage) are technologies and technical devices that prevent the leakage of confidential information from information systems.

DLP systems analyze data streams and control their movement within a defined protected perimeter of the information system. These streams can be FTP connections, corporate and web mail, local connections, instant messages and data sent to a printer. If confidential information is detected in a stream, a system component is triggered that blocks the transmission.

In other words, DLP systems stand guard over confidential and strategically important documents whose leakage from information systems can cause irreparable damage to the company and violate Federal Laws No. 98-FZ "On Commercial Secrets" and No. 152-FZ "On Personal Data". Protection of information against leakage is also addressed in GOST R ISO / IEC 17799-2005, "Information technology. Practical rules for information security management".

Confidential information can leak as a result of hacking and intrusion, through the carelessness or negligence of the company's employees, or through the efforts of insiders who deliberately hand it over. This is why DLP systems are the most reliable technologies for protecting against leakage of confidential information: they detect protected information by content, regardless of the document language, signature, transmission channel and format.

A DLP system also controls all channels used day to day to transmit information in electronic form. Information streams are processed automatically according to the established security policy. If an action involving confidential information conflicts with the security policy established by the company, the data transfer is blocked. At the same time, the company's authorized representative responsible for information security receives an instant message warning of the attempt to transfer confidential information.

Implementing a DLP system first of all supports compliance with a number of PCI DSS requirements concerning the level of information security of an enterprise. DLP systems also automatically audit protected information by its location and provide automated control, in line with the company's rules for the movement of confidential information, handling and preventing incidents of unlawful disclosure of classified information. Based on incident reports, the data leakage prevention system monitors the overall level of risk and controls information leakage both through retrospective analysis and through immediate response.

DLP systems are installed in both small and large enterprises, preventing information leakage, thereby protecting the company from financial and legal risks that arise from the loss or transfer of important corporate or confidential information.

Today the DLP market is one of the fastest growing among all information security products. However, the domestic information security sphere does not quite keep up with global trends, and therefore the DLP systems market in our country has its own peculiarities.

What are DLP systems and how do they work?

Before talking about the market for DLP systems, you need to decide what, in fact, is meant when it comes to such solutions. DLP systems are commonly understood as software products that protect organizations from leaks of confidential information. The abbreviation DLP itself stands for Data Leak Prevention, that is, the prevention of data leaks.

Systems of this kind create a secure digital "perimeter" around the organization, analyzing all outgoing and, in some cases, incoming information. It is not only Internet traffic that must be controlled, but also a number of other information flows: documents taken outside the protected perimeter on external media, printed on a printer, sent to mobile devices via Bluetooth, and so on.

Since a DLP system must prevent leakage of confidential information, it necessarily has built-in mechanisms for determining the degree of confidentiality of a document found in intercepted traffic. As a rule, there are two most common ways: by parsing special markers of the document and by parsing the content of the document. Currently, the second option is more widespread, since it is resistant to modifications made to the document before it is sent, and also makes it easy to expand the number of confidential documents with which the system can work.

DLP Side Tasks

In addition to their main task of preventing information leaks, DLP systems are also well suited for solving a number of other tasks related to monitoring personnel actions.

Most often, DLP systems are used for the following tasks outside their core purpose:

  • control over employees' use of working time and work resources;
  • monitoring employee communications to identify behind-the-scenes infighting that can harm the organization;
  • control over the legality of employees' actions (preventing the printing of forged documents, etc.);
  • identifying employees who are sending out resumes, so that a replacement for the position can be sought promptly.

Because many organizations consider some of these tasks (especially control of working time) a higher priority than protection against information leaks, a number of programs have emerged that are designed specifically for them but in some cases can also serve as a means of protecting an organization from leaks. Such programs differ from full-fledged DLP systems in that they lack advanced tools for analyzing intercepted data; the analysis must be performed manually by an information security specialist, which is practical only for very small organizations (up to ten monitored employees).
