Standard extensions of text document files: studying text file formats

A set of rules for storing data in a file is called the file format. Different types of files, such as text files, raster graphics, and so on, use different formats. In general, one type of file can have several different formats, although the terms "file type" and "file format" are often used interchangeably. The file format is usually indicated by the file name extension, which is added to the file name when the file is saved in a particular format, for example, DOC, GIF, and so on.

As a rule, file formats are created for use with a specific application program. For example, graphic objects created in the well-known CorelDRAW vector graphics package are saved as files with the CDR extension, and images produced by another graphics package, CorelXara, are written to disk as files with the XAR extension. Some formats are not tied to specific applications; that is, they are universal. One of the best-known universal formats is TXT (the DOS text file format).

Compression is often used to save space on the storage medium. There are many ways to compress files, and the choice of method depends on the original file format. Typically, the higher the compression ratio, the slower the read and write operations.

Compression algorithms fall into two groups: lossless algorithms, which preserve all data, and lossy algorithms, which may discard some data.

Lossless compression guarantees that all the data that was in the file before compression is present after the file is unpacked. Lossless compression is used when saving text or numeric data, such as spreadsheets or documents. Well-known examples are the ZIP and ARJ archive formats.
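To illustrate the lossless property, here is a minimal Python sketch using the standard `zlib` module, which implements DEFLATE, the method most commonly used inside ZIP archives. The sample text and the two compression levels are arbitrary choices; the point is only that decompression returns exactly the original bytes.

```python
import zlib

# Repetitive sample text compresses well; the content is arbitrary.
original = ("Lossless compression keeps every byte. " * 50).encode("utf-8")

fast = zlib.compress(original, level=1)  # faster, usually larger output
best = zlib.compress(original, level=9)  # slower, usually smaller output

# Lossless: unpacking restores the data bit for bit.
restored = zlib.decompress(best)

print(len(original), len(fast), len(best))
print(restored == original)
```

The `level` parameter shows the speed/ratio trade-off mentioned above: higher levels spend more time searching for repetitions.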

Here is a brief description of the main formats in use:

§ American Standard Code for Information Interchange, ASCII (TXT). A text file format developed by the American Standards Association, the predecessor of the American National Standards Institute (ANSI). It is supported by virtually all operating systems and programs. It is a text file in a DOS encoding: pictures cannot be inserted, there is no formatting, it works on all machines, and such files are typically small.

§ ANSI (TXT). A text file format using the ANSI encoding (a Microsoft Windows code page).

§ MS Word for DOS and Windows (.DOC). A document format developed by Microsoft Corporation and supported by MS-DOS programs and most word processors. It preserves the original formatting of documents as well as character styles. Besides text, files in this format can contain graphic images with various parameters. It supports 256 colors and does not support compression. It is used mainly to exchange formatted text between different platforms and applications.

§ HyperText Markup Language, HTML (HTM, HTML). The markup language for hypertext documents. Web pages on the Internet are created with this language. HTML documents are plain ASCII text files that can be viewed and edited in any text editor. They differ from ordinary text files in that they contain special commands, called tags, that define how the document is formatted. Once you master HTML, you can create Web pages: by adding tags to ordinary text, you tell the browser how to display the text and where to place it on the page. If you have studied Java and JavaScript, you know how to extend the capabilities of HTML by placing scripting-language commands inside tags.

§ Portable Document Format, PDF (.PDF). This document storage format, developed by Adobe, aims to become an open typographic standard for the Web and is seen as an alternative to HTML. A drawback of HTML is that documents converted to HTML usually lose their original formatting, and HTML offers a very limited choice of typefaces for viewing. By contrast, users of Adobe Acrobat and other PDF tools for creating, distributing, and viewing documents in their original form know that readers will see the publication exactly as it was made. PDF is indispensable when you need an exact copy of a document. A successful example of PDF used for Russian-language documents is the Moscow News website, whose electronic materials exactly reproduce the printed paper edition.

§ Standard Generalized Markup Language (SGML). The markup language from which HTML was derived. It is a toolkit of mechanisms for creating structured documents marked up with tags. Compared to HTML, it provides more flexible and versatile formatting capabilities on the Web. However, SGML is also considerably more complex, so PDF is used as a simpler tool. The strength of SGML lies in its cross-platform, structural approach to describing document content. SGML is in fact a metalanguage; that is, it is intended for describing the markup languages used to create documents.
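The difference between the DOS and ANSI variants of TXT comes down to the code page used to map bytes to characters. The Python sketch below assumes Russian-language settings, where the DOS code page is usually CP866 and the Windows "ANSI" code page is CP1251 (both are standard Python codecs); other locales use other code pages.

```python
text = "Привет"  # Russian for "hello"

dos_bytes = text.encode("cp866")   # DOS encoding (Russian locale)
win_bytes = text.encode("cp1251")  # Windows ANSI encoding (Russian locale)

# Both are single-byte encodings: one byte per character...
print(len(text), len(dos_bytes), len(win_bytes))
# ...but the same letters get different byte values.
print(dos_bytes != win_bytes)

# Reading a Windows file as if it were DOS text gives gibberish.
print(win_bytes.decode("cp866", errors="replace"))

# The ASCII range (codes 0-127) is identical in both code pages.
print("Hello".encode("cp866") == "Hello".encode("cp1251"))
```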

Software for processing text and graphics data

One of the most common functions of a modern personal computer is the preparation of a variety of text documents.

There are two main groups of programs for preparing text documents: text editors and word processors.

Text editors are, as a rule, programs that create text files without formatting (that is, they do not let you set off parts of the text with different fonts and typefaces). Editors of this kind are indispensable for writing the source code of computer programs.

Word processors can format text and insert into a document graphics and other objects that do not fit the classical notion of "text". Note that this division is a matter of convention: the variety of text-processing programs is such that you can find an editor with almost any set of features.

Some word processors are so-called WYSIWYG editors. The name comes from the first letters of the phrase What You See Is What You Get. Calling a program a WYSIWYG editor means that the document looks on screen exactly as it will look when printed. Editors of this type include Word and StarWriter.

Some modern editors follow an "almost WYSIWYG" approach: the document on screen looks slightly different from the printed version, which is done deliberately to make more efficient use of the document window. Examples of "almost WYSIWYG" editors are Netscape Composer and KLyX.

Formats of text files

Text files are the most common type of data in the computer world, and several problems are associated with them. The first is the very large number of characters needed to support different languages. American programmers work with the 128 characters of the US-ASCII character set (American Standard Code for Information Interchange). For other languages even 256 characters are often not enough, so a gradual transition to Unicode is under way. Its original 16-bit form reserved two bytes per character, which allows 65,536 different characters to be encoded; today Unicode provides for far more characters than that and is usually stored in variable-length encodings such as UTF-8 or UTF-16.
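The arithmetic behind these character-set sizes, and the number of bytes a character actually takes in common Unicode encodings, can be checked with a short Python sketch. The sample characters are arbitrary; `utf-16-le` is used instead of `utf-16` so that Python does not prepend a 2-byte byte-order mark.

```python
# Number of distinct codes for 7, 8 and 16 bits per character.
print(2 ** 7, 2 ** 8, 2 ** 16)  # ASCII, 8-bit code pages, 16-bit Unicode

def encoded_sizes(ch: str) -> dict:
    """Bytes needed to store one character in several Unicode encodings."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(encoded_sizes("A"))           # Latin letter (inside ASCII)
print(encoded_sizes("Ж"))           # Cyrillic letter
print(encoded_sizes("\U0001F600"))  # emoji outside the 16-bit range
```

The last line shows why "two bytes per character" no longer holds in general: characters beyond 65,536 need a pair of 16-bit units in UTF-16.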

The second problem is that people want printed documents to contain graphics, diagrams, notes, and headings, and to use different fonts. Documents distributed on the Internet (online documents) may also contain animation, sound, and links to various network resources.

Many text files are transmitted as plain text. Plain text is hard to make attractive and easy to read because it offers no choice of fonts, no graphics, headings, subheadings, and so on. These additional features are called markup.

When discussing text markup, a distinction is made between physical and logical markup. Physical markup specifies the exact appearance of each fragment, for example, "centered text, 14 point, bold, Times typeface". Logical markup specifies the logical role of the fragment, for example, "this is a chapter heading". As a rule, the two kinds of markup are intended for different situations. To print text on a printer, you need physical markup: decisions must be made about the size of the margins and paragraph indents. Early word processors used only physical markup, specifying the font, size, and style of each fragment.

When documents are exchanged with other people, physical markup imposes a number of restrictions, especially for online documents: screen sizes, resolutions, and available fonts differ from one system to another. For these reasons, logical markup is used more and more. In some cases it is practically a necessity: when creating electronic documents such as Web pages, or when preparing and publishing large works such as books.
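The relationship between the two kinds of markup can be sketched in Python: the document itself carries only logical roles, and a separate, hypothetical style table turns each role into physical formatting for a given output device. The role names (`chapter-heading`, `paragraph`) and the fonts are assumptions chosen for illustration; the print style reuses the "14 point, bold, Times, centered" example from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Style:
    """Physical markup: exactly how a fragment should look."""
    font: str
    size_pt: int
    bold: bool = False
    align: str = "left"

# Logical markup: each fragment only states what it *is*.
document = [
    ("chapter-heading", "Formats of text files"),
    ("paragraph", "Text files are the most common type of data."),
]

# Hypothetical style tables, one per output device.
PRINT_STYLES = {
    "chapter-heading": Style("Times", 14, bold=True, align="center"),
    "paragraph": Style("Times", 12),
}
SCREEN_STYLES = {
    "chapter-heading": Style("Arial", 20, bold=True),
    "paragraph": Style("Arial", 16),
}

def render(doc, styles):
    """Resolve logical roles into physical formatting for one device."""
    return [(styles[role], text) for role, text in doc]

for style, text in render(document, PRINT_STYLES):
    print(style, "->", text)
```

Because the text itself never changes, the same document can be retargeted to a screen, a printer, or another system simply by choosing a different style table.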

Various approaches are used to preserve a document's markup when text is sent from one machine to another. Word processors and desktop publishing systems use specially designed file formats that contain not only the text but also information about how it should be formatted. The main problem here is that such formats are incompatible, although the most sophisticated programs can usually read files in the formats of competing programs. Examples of this approach are the word processors Word and StarWriter.

In another approach, special markup commands are inserted directly into the text of the document. Even if you do not have software that supports the format, you can still make sense of the text. There are many ways of representing markup like this, including:

HyperText Markup Language (HTML), used in the World Wide Web;

TeX and LaTeX, used by many academic publications and popular with mathematicians, physicists, chemists, and even musicians.

Examples of programs that mark up text in this way are Netscape Composer and LyX (KLyX).

Files created by different editors often have distinctive extensions that let you guess how the text is marked up without looking inside the document. Files created by plain-text editors usually have the extension .txt, and those prepared in the LyX editor have the extension .lyx. The Word processor saves files in MS Word format (extension .doc) by default, but it also supports other formats, for example RTF (extension .rtf). Documents containing HTML markup commands have the extension .html or .htm.
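Guessing the markup method from the extension is essentially a table lookup. A minimal Python sketch, with a hypothetical `guess_markup` helper whose table covers only the extensions named above:

```python
from pathlib import Path

# Extension -> how the text is marked up (only the cases named in the text).
MARKUP_BY_EXTENSION = {
    ".txt": "plain text, no markup",
    ".lyx": "LyX document markup",
    ".doc": "MS Word format",
    ".rtf": "RTF control words",
    ".html": "HTML markup commands",
    ".htm": "HTML markup commands",
}

def guess_markup(filename: str) -> str:
    """Guess the markup method from the file name alone (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    return MARKUP_BY_EXTENSION.get(suffix, "unknown")

print(guess_markup("Report.DOC"))  # MS Word format
print(guess_markup("index.htm"))   # HTML markup commands
print(guess_markup("README"))      # unknown
```

This is only a guess: the extension is a label, not a guarantee of what the file actually contains.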

Obviously, it is impossible to list all text editors. Many of them are tailored to a particular kind of work. The list below covers only a small fraction of the available editors.

Editors of unformatted texts

Notepad - built into the Windows operating system; easy to understand and simple to use;

McEdit - resembles the Edit editor from MS-DOS; a component of the Midnight Commander (mc) file manager for Linux;

KEdit - the simplest text editor, part of the KDE environment for Linux;

KWrite is a text editor with a number of advanced settings compared to other simple text editors;

Emacs - combines the functions of a file manager and a text editor; one of its distinguishing features is the ability to create macros; it is available on all Unix clones, including Linux, and can also be used in MS Windows.

Editors that create text with markup elements

Word - used to create a wide variety of printed documents; a component of the Microsoft Office applications for MS Windows;

StarWriter - part of the StarOffice suite; it resembles Word in appearance and functionality and works equally well in MS Windows and Linux;

LyX (KLyX in KDE) - a modern text editor for people who want a professional-looking document while spending a minimum of time creating it; the editor inserts TeX and LaTeX markup commands into the text;

Netscape Composer - inserts HTML markup commands into the text, there are versions for both Linux and MS Windows.

When processing information related to images on a monitor, three main areas are usually distinguished: image recognition, image processing, and computer graphics.

The main task of image recognition is to convert an existing image into a formal symbolic description. Image recognition, or computer vision (COMPUTER VISION), is a set of methods that produce a description of an input image or assign the image to a certain class (as happens, for example, when sorting mail). One of the tasks of COMPUTER VISION is so-called skeletonization, in which the underlying framework of an object, its "skeleton", is recovered.
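Skeletonization can be made concrete with a classic thinning method. The Python sketch below implements the Zhang-Suen thinning algorithm (one well-known approach, not necessarily the one the text has in mind) on a binary image stored as a list of rows, where 1 is an object pixel and 0 is background; it assumes the outermost rows and columns are background.

```python
def zhang_suen(image):
    """Thin a binary image (lists of 0/1) to a one-pixel-wide skeleton.

    Assumes the border rows and columns are 0 (background).
    """
    img = [row[:] for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    b = sum(n)  # number of object neighbours
                    # a = number of 0 -> 1 transitions going around the pixel
                    a = sum(n[i] == 0 and n[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Delete all marked pixels at once, after the whole scan.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img

# A horizontal bar three pixels thick...
bar = [[0] * 9 for _ in range(7)]
for r in range(2, 5):
    for c in range(1, 8):
        bar[r][c] = 1

# ...is thinned to a line along its middle row.
for row in zhang_suen(bar):
    print("".join("#" if v else "." for v in row))
```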

Image processing (IMAGE PROCESSING) deals with tasks in which both the input and the output are images: for example, transmitting an image while removing noise and compressing the data, or converting one type of image into another (from color to black and white). Image processing is thus understood as operations on images (image transformation). The goal of image processing can be either improvement according to some criterion (restoration, enhancement) or a special transformation that changes the image radically.
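The color-to-black-and-white example can be sketched in a few lines. This Python version assumes 8-bit RGB pixels and uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common choice among several; the threshold of 128 is also an arbitrary assumption.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel with 0-255 channels to a 0-255 gray level."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def to_black_and_white(gray, threshold=128):
    """Map a gray level to pure black (0) or pure white (255)."""
    return 255 if gray >= threshold else 0

def convert(image):
    """Both input and output are images: lists of rows of pixels."""
    return [[to_black_and_white(to_gray(p)) for p in row] for row in image]

color_image = [
    [(255, 255, 255), (255, 0, 0)],
    [(0, 0, 0), (200, 220, 90)],
]
print(convert(color_image))
```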

When processing images, we confine ourselves to digital images only. Digital image transformations can be divided into two types:

Image restoration - compensating for existing distortion (for example, distortion caused by poor shooting conditions);

Image enhancement - deliberately altering the image to improve its visual perception or to convert it into a form convenient for further processing.
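Two simple, commonly used operations illustrate the two types. The Python sketch below works on one row of 8-bit gray levels for brevity (real filters work on 2-D neighbourhoods): a 3-pixel median filter, a typical way of compensating for impulse noise, and a linear contrast stretch, a typical enhancement. The function names and sample values are illustrative assumptions.

```python
def median_filter(row, size=3):
    """Replace each interior pixel by the median of its neighbourhood.

    Suppresses isolated noisy pixels; edge pixels are copied unchanged.
    """
    half = size // 2
    out = list(row)
    for i in range(half, len(row) - half):
        window = sorted(row[i - half:i + half + 1])
        out[i] = window[half]
    return out

def stretch_contrast(row, new_min=0, new_max=255):
    """Linearly map the row's darkest..brightest range onto new_min..new_max."""
    lo, hi = min(row), max(row)
    if hi == lo:  # flat row: nothing to stretch
        return list(row)
    scale = (new_max - new_min) / (hi - lo)
    return [round(new_min + (v - lo) * scale) for v in row]

noisy = [10, 10, 255, 10, 10]     # one "salt" pixel
print(median_filter(noisy))       # isolated spike removed

dull = [100, 110, 120, 130, 140]  # low-contrast row
print(stretch_contrast(dull))     # spread over the full 0-255 range
```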

Computer graphics (COMPUTER GRAPHICS) produces images when the source information is not pictorial in nature: for example, visualizing experimental data as graphs, histograms, or charts, displaying information on the screen in computer games, or synthesizing scenes in simulators.

Computer graphics has now become established as the science of the hardware and software used to produce a wide variety of images, from simple drawings to realistic images of natural objects. It is used in almost all scientific and engineering disciplines to make information clearer, easier to perceive, and easier to convey, as well as in medicine, advertising, entertainment, and other fields. Practically no modern program can do without computer graphics. Graphics work takes up to 90% of the working time of teams developing mass-market software.

The final product of computer graphics is an image. Such an image can be used in many fields: it can be a technical drawing, an illustration of a part in an operating manual, a simple diagram, an architectural view of a proposed building or a design brief, an advertising illustration, or a frame of an animated film.

Computer graphics is a science whose subject is the creation, storage, and processing of models and their images with the help of a computer; that is, it is the branch of computer science that deals with obtaining various images (pictures, technical drawings, animation) on a computer.

In computer graphics, the following tasks are considered:

Representing the image in computer graphics;

Preparing the image for visualization;

Creating an image;

Performing operations on the image.

Computer graphics is usually understood as the automation of preparing, converting, storing, and reproducing graphic information with a computer. Graphic information here means models of objects and their images.

When the user can control the characteristics of objects, one speaks of interactive computer graphics, that is, the ability of a computer system to create graphics and carry on a dialogue with a person. Today almost any program can be considered an interactive computer graphics system.

Interactive computer graphics also means using computers to prepare and display images, but with the user able to change the image quickly while it is being displayed; that is, it assumes that graphics can be worked with in a real-time dialogue.

Interactive graphics is an important part of computer graphics, where the user is able to dynamically manage the contents of the image, its shape, size and color on the display surface using interactive control devices.

Historically, the first interactive systems were computer-aided design (CAD) systems, which appeared in the 1960s. They mark a significant stage in the evolution of computers and software. In an interactive computer graphics system, the user sees on the display an image representing some complex object and can make changes to the description (model) of that object. Such changes can include entering and editing individual elements, setting numerical values for parameters, and other input operations based on what the user sees in the image.

Raster graphics, general information

A computer raster image is represented as a rectangular matrix, each cell of which is represented by a colored dot.

The basis of the raster representation is the pixel (dot) together with its color. To describe, for example, a red ellipse on a white background, you must specify the color of every point of the ellipse and of the background. The image is represented by a large number of points: the more there are, the better the image and the larger the file. That is, the same picture can be presented with better or worse quality depending on the number of points per unit length, the resolution (usually measured in dots per inch, dpi, or pixels per inch, ppi).
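The red-ellipse example can be written out directly. In this Python sketch (the function names and sizes are assumptions for illustration), a raster image is a matrix of RGB values in which every cell, background included, gets an explicit color, and the pixel count needed for a given print size follows from the resolution.

```python
WHITE = (255, 255, 255)
RED = (255, 0, 0)

def red_ellipse(width, height, a, b):
    """Raster image (rows of RGB tuples): red ellipse with semi-axes a, b on white."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            inside = ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1
            row.append(RED if inside else WHITE)  # every point gets a color
        image.append(row)
    return image

def pixels_for(width_in, height_in, ppi):
    """Pixels needed to print width_in x height_in inches at ppi pixels per inch."""
    return round(width_in * ppi), round(height_in * ppi)

img = red_ellipse(21, 11, a=8, b=4)
for row in img:
    print("".join("#" if p == RED else "." for p in row))

print(pixels_for(2, 1, 72), pixels_for(2, 1, 300))
```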

Bitmap images resemble a sheet of squared paper in which every cell is filled in either black or white, and together the cells form a picture. The pixel is the basic element of a raster image: the image is built from these elements. In other words, raster graphics describe images using colored dots (pixels) arranged on a grid.

When editing raster graphics, you edit pixels, not lines. Raster graphics are resolution-dependent, because the information describing the image is tied to a grid of a particular size. Editing a raster image can change the quality of its presentation. In particular, resizing it can make the edges of the image ragged, because the pixels are redistributed across the grid. Outputting a raster image to a device with a lower resolution than that of the image itself lowers its quality.

Quality is also determined by the number of colors and shades that each point of the image can take. The more shades an image contains, the more bits are needed to describe each point. Red, for example, may be color number 001 (three bits) or 00000001 (eight bits). Thus, the higher the image quality, the larger the file.
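The link between bits per point, the number of available colors, and file size is simple arithmetic. The Python sketch below computes the uncompressed pixel-data size only; it deliberately ignores file headers, row padding, palettes, and compression, so real files will differ.

```python
def colors_for(bits_per_pixel):
    """Number of distinct colors a pixel with this many bits can take."""
    return 2 ** bits_per_pixel

def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data size, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8

# The same color number written with 3 and with 8 bits.
print(format(1, "03b"), format(1, "08b"))

for bits in (1, 8, 24):
    print(bits, "bits:", colors_for(bits), "colors,",
          raw_size_bytes(640, 480, bits), "bytes for 640x480")
```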

Raster representation is usually used for photographic images with many details or shades. Unfortunately, scaling such pictures in any direction usually degrades their quality. When the number of points is reduced, small details are lost and lettering becomes distorted (although this may be less noticeable if the displayed size of the picture is reduced as well, that is, if the resolution is preserved). Adding pixels reduces the sharpness and brightness of the image, because the new points must be given shades intermediate between two or more neighboring colors.
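Both effects can be reproduced on a single row of gray levels. The Python sketch below uses two standard resampling methods, nearest-neighbour and linear interpolation (assumed here as representative; image editors offer others): shrinking drops pixels, and therefore detail, while enlarging with interpolation invents in-between shades.

```python
def resample_nearest(row, new_len):
    """Pick the nearest source pixel for each output position."""
    n = len(row)
    return [row[i * n // new_len] for i in range(new_len)]

def resample_linear(row, new_len):
    """Blend the two neighbouring source pixels (needs len(row), new_len >= 2)."""
    n = len(row)
    out = []
    for i in range(new_len):
        x = i * (n - 1) / (new_len - 1)  # position in source coordinates
        j = min(int(x), n - 2)           # index of the left neighbour
        t = x - j                        # distance past the left neighbour
        out.append(row[j] * (1 - t) + row[j + 1] * t)
    return out

stripes = [10, 200, 10, 200]
print(resample_nearest(stripes, 2))  # fine stripes vanish: detail lost

edge = [0, 255]                      # sharp black/white edge
print(resample_nearest(edge, 4))     # blocky, but no new shades
print(resample_linear(edge, 3))      # a new intermediate gray appears
```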

Raster graphics can convey the whole range of shades and subtle effects present in a real image. A raster image is closer to a photograph: it reproduces the main characteristics of a photo, such as lighting, transparency, and depth of field, more accurately.

Most often, raster images are obtained by scanning photographs and other images, by using a digital camera, or by capturing a video frame. Bitmap images can also be created directly in raster graphics programs, or produced in vector graphics programs by converting vector images.

Common raster formats include .tif, .gif, .jpg, .png, .bmp, and .pcx.

Vector graphics, general information

Vector graphics describe images using straight and curved lines, called vectors, together with parameters that describe colors and layout. For example, the image of a tree leaf (see Figure 1) is described by the points through which a line passes, forming the outline of the leaf. The color of the leaf is given by the color of the outline and of the area inside it.

In vector graphics, unlike raster graphics, the image is built from mathematical descriptions of objects such as circles and lines. Although at first glance this may seem more complicated than using raster arrays, for some types of images mathematical description is the simpler approach.

The key idea of vector graphics is that objects are described by a combination of computer commands and mathematical formulas. This lets the output device calculate where the actual dots should be placed when the objects are drawn. This feature gives vector graphics a number of advantages over raster graphics, but it is also the cause of its shortcomings.

Vector graphics are often called object-oriented or drawing graphics. Simple objects such as circles, lines, spheres, and cubes are called primitives and are used to create more complex objects; in vector graphics, objects are created by combining other objects.

To create vector drawings, you need one of the many illustration packages. The advantage of vector graphics is that the description is simple and takes up little memory. The disadvantage is that a detailed vector object can become too complex: it may not print the way the user expects, or may not print at all if the printer misinterprets or does not understand the vector commands.

When vector elements are edited, the parameters of the straight and curved lines describing their shape change. You can move elements and change their size, shape, and color without affecting the quality of their visual presentation. Vector graphics are resolution-independent; that is, they can be displayed on a variety of output devices with different resolutions without loss of quality.
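Resolution independence can be seen by keeping one vector description and rasterizing it for different devices. In this Python sketch, the `Circle` class, the 1 x 1 inch canvas, and the pixel-center sampling rule are assumptions for illustration: the description stays three numbers, and the output device computes as many dots as its resolution calls for.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Circle:
    """Vector description: center and radius, in inches."""
    cx: float
    cy: float
    r: float

def rasterize(circle, width_in, height_in, ppi):
    """Compute which pixels of a device with `ppi` resolution the circle covers."""
    w, h = round(width_in * ppi), round(height_in * ppi)
    grid = []
    for py in range(h):
        y = (py + 0.5) / ppi  # pixel center, in inches
        grid.append([((px + 0.5) / ppi - circle.cx) ** 2
                     + (y - circle.cy) ** 2 <= circle.r ** 2
                     for px in range(w)])
    return grid

shape = Circle(0.5, 0.5, 0.4)
for ppi in (10, 100, 300):
    grid = rasterize(shape, 1, 1, ppi)
    filled = sum(map(sum, grid))
    # The covered fraction approaches the exact area pi * r^2 as ppi grows.
    print(ppi, "ppi:", filled, "pixels, fraction", filled / ppi ** 2)
print("exact:", math.pi * shape.r ** 2)
```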

The vector representation describes image elements by mathematical curves, specifying their colors and fills.

Another advantage is that vector images can be scaled in any direction without loss of quality: enlarging or reducing an object simply means increasing or decreasing the corresponding coefficients in its mathematical formulas. Unfortunately, the vector format becomes inefficient for images with many shades or fine details (for example, photographs), because every tiny highlight is then represented not by a set of single-colored dots but by a complex mathematical formula or a collection of graphic primitives, each of which is a formula. This makes the file larger. In addition, when an image is converted from bitmap to vector form (for example, with Adobe Streamline or Corel OCR-TRACE), the result inherits the bitmap's inability to be enlarged correctly: increasing the linear dimensions does not add details or shades per unit area. This limitation is imposed by the resolution of the input devices (scanners, digital cameras, and so on).

Elements (objects) of vector graphics. Objects and their attributes

The main logical element of vector graphics is the geometric object. Objects include simple geometric shapes (the so-called primitives: rectangle, circle, ellipse, line), compound shapes built from primitives, and color fills, including gradients.

An important object in vector graphics is the spline, a curve used to describe a geometric shape. Modern TrueType and PostScript fonts are built on splines.
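The splines used in these fonts are Bézier curves: TrueType outlines use quadratic curves and PostScript Type 1 outlines use cubic ones. Below is a Python sketch of evaluating a Bézier curve with de Casteljau's algorithm (repeated linear interpolation between control points); the control points are arbitrary examples.

```python
def lerp(p, q, t):
    """Point a fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def bezier_point(control_points, t):
    """Evaluate a Bezier curve of any degree at t in [0, 1] (de Casteljau)."""
    pts = list(control_points)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

def sample(control_points, segments):
    """Approximate the curve by segments + 1 points, e.g. for drawing."""
    return [bezier_point(control_points, i / segments) for i in range(segments + 1)]

quadratic = [(0, 0), (1, 2), (2, 0)]      # TrueType-style segment
cubic = [(0, 0), (0, 1), (1, 1), (1, 0)]  # PostScript-style segment

print(bezier_point(quadratic, 0.5))
print(bezier_point(cubic, 0.5))
print(sample(cubic, 4))
```

The curve is stored as a handful of control points; only when it is drawn is it turned into as many points as the output needs.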

Vector objects are easily transformed and modified with almost no effect on image quality. Scaling, rotation, and bending can be reduced to two or three elementary transformations of vectors.
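As an illustration of reducing transformations to elementary operations, the Python sketch below represents scaling, rotation, and translation as 3 x 3 matrices in homogeneous coordinates (a standard technique, assumed here rather than taken from the text) and combines them by matrix multiplication. Only the object's nodes are transformed; the outline is then redrawn from them, so no quality is lost.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def rotate(angle_deg):
    a = math.radians(angle_deg)
    return [[math.cos(a), -math.sin(a), 0], [math.sin(a), math.cos(a), 0], [0, 0, 1]]

def translate(dx, dy):
    return [[1, 0, dx], [0, 1, dy], [0, 0, 1]]

def apply(m, points):
    """Transform (x, y) nodes with matrix m."""
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in points]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# The rightmost matrix acts first: scale by 2, rotate 90 degrees, then shift.
combined = matmul(translate(10, 0), matmul(rotate(90), scale(2, 2)))
print([(round(x, 9), round(y, 9)) for x, y in apply(combined, square)])
```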

Whereas the basic element of a raster image is the dot, in vector graphics it is the line. A line is described mathematically as a single object, so the amount of data needed to display an object with vector graphics is much smaller than with raster graphics.

The line is the elementary object of vector graphics. Like any object, a line has properties: shape (straight or curved), thickness, color, and style (solid, dotted). Closed lines also acquire the fill property: the area they enclose can be filled with other objects (textures, maps) or with a chosen color. The simplest open line is bounded by two points, called nodes. Nodes also have properties whose parameters affect the shape of the line's end and how it joins other objects. All other vector objects are made up of lines. For example, a cube can be made up of six connected rectangles, each of which is in turn formed by four connected lines.
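The object model described here maps naturally onto data classes. In this Python sketch the class and field names are hypothetical: a `Line` stores only its two nodes plus its attributes, a closed chain of four lines forms a rectangle that may be filled, and six rectangles (laid out flat as an unfolded cube net, a simplification of the 3-D cube) give the 24 lines of the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    """Elementary vector object: two nodes plus display attributes."""
    start: tuple
    end: tuple
    thickness: float = 1.0
    color: str = "black"
    style: str = "solid"  # e.g. "solid" or "dotted"

def rectangle(x, y, w, h, **attrs):
    """Four connected lines forming a closed outline."""
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return [Line(corners[i], corners[(i + 1) % 4], **attrs) for i in range(4)]

def is_closed(lines):
    """A chain is closed when each line ends where the next one starts."""
    return all(a.end == b.start for a, b in zip(lines, lines[1:] + lines[:1]))

# Six unit squares laid out as an unfolded cube (a cross-shaped net).
net_positions = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 1), (1, 2)]
faces = [rectangle(x, y, 1, 1, color="blue") for x, y in net_positions]

print(len(faces), "faces,", sum(len(f) for f in faces), "lines")
print(all(is_closed(f) for f in faces))  # closed outlines can take a fill
print(faces[0][0])
```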

Every PC user constantly encounters different text file formats but rarely thinks about the rich history of these formats and of the programs that made it possible to read books, work with text, and create all the necessary documents directly on a computer.

The history of text files is almost as long as that of personal computers themselves: masterpieces were already being written in the earliest predecessors of today's Notepad. So what text file formats, and what programs for working with them, exist? First we need to understand what text files are for, how they differ, and what they have in common. What unites all text formats is their main task: storing text. They differ in their processing capabilities, in how the stored information can be accessed, and in their compatibility with other programs.

Traditionally, the simplest text format is TXT. It is both the most modest in features and the oldest text format. Because of its simplicity (TXT is limited to the text itself and its division into paragraphs), it is supported by a huge number of applications on a wide variety of platforms.

As personal computers spread and their sales grew, Microsoft created another popular format, the Rich Text Format (RTF). RTF is text marked up with special "control words" that make it possible not only to apply but also to preserve complex formatting and to insert formulas, tables, figures, footnotes, and notes.
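To show what "control words" look like, the Python sketch below builds a tiny RTF document by hand. It uses a handful of real RTF control words (`\rtf1`, `\ansi`, `\deff0`, `\fonttbl`, `\fs24` for 12-point text, `\b`, `\i`, `\par`, and `\uN` for non-ASCII characters); the `make_rtf` helper and its (text, style) run format are assumptions made up for this example, and only a minimal subset of the format is covered.

```python
def rtf_escape(text):
    """Escape text for RTF: backslash and braces are special; non-ASCII uses \\uN?."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit UTF-16 code unit; '?' is the fallback
            # character shown by readers that do not understand \u.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append(f"\\u{n}?")
    return "".join(out)

def make_rtf(runs):
    """Build a one-paragraph RTF document from (text, style) runs.

    style is "" (plain), "b" (bold) or "i" (italic).
    """
    body = []
    for text, style in runs:
        escaped = rtf_escape(text)
        body.append(f"{{\\{style} {escaped}}}" if style else escaped)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
    return header + "".join(body) + "\\par}"

doc = make_rtf([("Plain, ", ""), ("bold", "b"), (" and ", ""),
                ("italic", "i"), (" text.", "")])
print(doc)
```

The output is pure ASCII, which is one reason RTF travels well between platforms and can even be read, as markup, in a plain text editor.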

However, RTF is considerably less capable than the DOC format, also created by Microsoft, specifically for its Microsoft Office suite. Created more than fifteen years ago, DOC offers a great many features for formatting and processing text and for creating, editing, and placing images, diagrams, tables, and other elements. Note that these features work fully correctly only in MS Word. This is mainly because Microsoft does not make the current specifications of the DOC format available and does not allow competitors and independent developers to use the format's capabilities to the full. This is one of the main reasons why other text file formats are widely used alongside DOC today.

The main difference between DOC and TXT is that DOC is a binary format, which makes it unreadable in simple editors such as WordPad, Lexicon, and Atlantis. Moreover, DOC files created in different versions of MS Word are sometimes incompatible.

Text files in these formats can be opened and edited in a great many programs. Besides MS Word, mentioned earlier, the most common are StarOffice from Sun Microsystems, WordPerfect from Corel, and the free OpenOffice.org suite.

With the spread of e-book readers, other types of text files, such as FB2 and LRF, are gaining popularity.

To make it possible to use different text formats on different platforms, many programs called converters have been created. Text file converters let you convert the source text from one format to another and then use it on various devices and platforms.

Converters are used not only to move text from one format to another, but also to create files that can be used on devices unable to "read" the original files. For example, some e-book readers that do not support popular text file formats can easily open LRF or FB2 files produced from the originals by converter programs.
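A converter can be very small when the target format is simple. The Python sketch below is a toy converter from plain text to HTML, assuming that paragraphs in the source are separated by blank lines; the `txt_to_html` name is hypothetical, and real converters (for example, to FB2 or LRF) must also produce the metadata and structure those formats require.

```python
import html

def txt_to_html(text, title="Converted document"):
    """Turn blank-line-separated plain-text paragraphs into a minimal HTML page."""
    text = text.replace("\r\n", "\n")  # accept Windows line endings too
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # html.escape turns <, > and & into entities so they display literally.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<title>{html.escape(title)}</title>\n</head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )

source = "Text files are simple.\n\nConverters move text between formats & devices."
print(txt_to_html(source))
```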

Text data was once stored in only one kind of container, TXT; there were no others. Today there are perhaps close to fifty. Some we use constantly, others we encounter rarely, and of the existence of still others we do not even suspect. Let us look at the most common text containers in terms of convenience of use.

TXT ("plain text")

The ancestor of the "genre", still actively used today. Because the text is stored as a plain sequence of characters, in a single-byte encoding the file size in bytes equals the number of characters, including the non-printing ones (spaces, tabs, end-of-paragraph marks, and others, also called formatting characters). This keeps files small. However, the possibilities for formatting such documents are very limited: in effect, it is just text. Text data can also be stored in files with extensions other than TXT; the extension is not binding. If you rename a .txt file to .doc, nothing changes, because the internal structure stays the same. Likewise, changing a .doc extension to .txt leaves you with the same Word file. Why, then, are these three letters after the dot needed? So that the system can choose the right program to open the file by default.
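Because the extension is only a label, programs that need to know what a file really contains usually look at its first bytes, the so-called signature or "magic number". The Python sketch below checks a few well-known signatures (the OLE2 compound-file header used by legacy binary .doc files, the ZIP header used by .docx and .odt, and the `{\rtf` prefix of RTF); the `sniff` function is hypothetical and covers only these cases.

```python
import os
import tempfile

OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .doc (also .xls, .ppt)
ZIP_SIGNATURE = b"PK\x03\x04"                           # .docx, .odt and other ZIP containers

def sniff(data: bytes) -> str:
    """Guess a document's real format from its leading bytes, ignoring the name."""
    if data.startswith(OLE2_SIGNATURE):
        return "binary DOC (OLE2 compound file)"
    if data.startswith(ZIP_SIGNATURE):
        return "ZIP container (e.g. DOCX or ODT)"
    if data.startswith(b"{\\rtf"):
        return "RTF"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown (binary, or text in a legacy code page)"
    return "plain text (ASCII/UTF-8)"

# An RTF document saved under a misleading .txt name is still RTF.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "letter.txt")
    with open(path, "wb") as f:
        f.write(b"{\\rtf1\\ansi Hello\\par}")
    with open(path, "rb") as f:
        print(os.path.basename(path), "->", sniff(f.read()))
```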

RTF (Rich Text Format)

A free, cross-platform format for storing marked-up text documents, created by Microsoft in 1987. Today it is widespread, and most modern text editors support it. An RTF file created on Windows can be read and edited on other platforms (Apple, Linux, and others). It is the de facto standard in publishing. However, not all programs produce it equally well: it has been observed that in documents created in OpenOffice, formatting is sometimes lost and part of the text turns into unreadable characters.

RTF lets you apply and save fairly complex formatting and insert footnotes, headers and footers, drawings, tables, and formulas, although here it is still inferior to DOC. It also loses to DOC on file size: complex documents are stored more compactly in DOC files (simple ones, the other way round). RTF does win over DOC in security, however, because it does not use macros, so Word files infected with macro viruses can be "cured" by saving them in RTF. RTF is also resistant to file corruption: changing even a single byte of a DOC file can make it impossible to open in Word, whereas corruption in an RTF file usually costs only the damaged fragment of text.

DOC (from the English "document")

Originally this extension denoted simple unformatted text files, but in the early 1990s Microsoft effectively "privatized" it, so today DOC is associated only with this company's products. The format offers extensive formatting capabilities (including scripts and macros), which has made it less compatible with third-party text editors. A DOC file contains a great deal of information about fonts, character spacing, paragraph indents, and line spacing, even if you do not need any of it. Because of this additional information, a DOC file containing only text is larger than the corresponding RTF file. However, when graphics and images are included in the document, DOC produces the smaller file and offers greater compatibility. Unlike TXT and RTF, DOC is a binary format, which makes it unreadable in simple text editors; Notepad, for example, can show the contents of an RTF file as readable markup. DOC is as popular as RTF.

DOCX

With the advent of Office 2007, Microsoft moved to new formats based on Office Open XML (visually distinguished by the letter "x" added to the end of the extensions). A DOCX file is a ZIP archive containing the text as XML, graphics and other data; ZIP compression reduces the file size. The documents can be opened in Office 2000/XP/2003 only if the Microsoft Office Compatibility Pack is installed (it can be downloaded from the official Microsoft website; the file size is 27.8 MB). If you need to quickly convert DOCX to another format, you can use the site http://docx-converter.com/. If you use the latest version of Office and plan to send files to others, save the documents in RTF or DOC.
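Because a DOCX file is an ordinary ZIP archive, its text can be reached with standard tools. The following Python sketch opens the archive with `zipfile`, reads the `word/document.xml` part and joins the `<w:t>` text pieces of each `<w:p>` paragraph. The helper name `docx_paragraphs` is hypothetical, and the in-memory archive built for the demo is a minimal stand-in: a real DOCX also contains `[Content_Types].xml`, relationship files and more.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source) -> list[str]:
    """Return the plain text of each <w:p> paragraph (source: path or file object)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Text inside a paragraph is split across runs; join all <w:t> pieces.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


# Minimal in-memory stand-in for a DOCX (a real file has many more parts).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>DOCX</w:t></w:r></w:p>"
        "</w:body></w:document>",
    )

print(docx_paragraphs(buf))  # ['Hello, DOCX']
```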

ODT / ODF ("Open Document Format")

ODF is the common name of an open document format for office applications (text documents, spreadsheets, drawings, databases, presentations). Text documents are stored in files with the ODT extension. The standard was developed by the industry consortium OASIS and is based on XML. On May 1, 2006, it was adopted as the international standard ISO/IEC 26300. ODF is available to everyone and can be used without restriction, making it a free alternative to Microsoft's closed formats. To read and write ODF in Microsoft products, the Sun ODF Plugin for Microsoft Office was released. Support for ODF in Microsoft Office 2007 was to be introduced with Service Pack 2. Unfortunately, ODF is still far less widespread than RTF and DOC.
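ODT follows the same container idea as DOCX: a ZIP archive whose first entry, `mimetype`, is stored uncompressed and holds `application/vnd.oasis.opendocument.text`, with the document text in `content.xml`. The Python sketch below checks that marker and pulls out the `<text:p>` paragraphs. The function names are hypothetical, and the two-file archive built for the demo is far smaller than a real ODT, which also carries `META-INF/manifest.xml`, styles and metadata.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

ODT_MIME = "application/vnd.oasis.opendocument.text"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def is_odt(source) -> bool:
    """True if the ZIP's first entry is 'mimetype' holding the ODT media type."""
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        if not names or names[0] != "mimetype":
            return False
        return zf.read("mimetype").decode("ascii").strip() == ODT_MIME


def odt_paragraphs(source) -> list[str]:
    """Plain text of every <text:p> element in content.xml."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


# Build a minimal, hypothetical ODT-like archive in memory for the demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # The mimetype entry comes first and is stored without compression.
    zf.writestr("mimetype", ODT_MIME, compress_type=zipfile.ZIP_STORED)
    zf.writestr(
        "content.xml",
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        "<office:body><office:text><text:p>Hello, ODF</text:p></office:text></office:body>"
        "</office:document-content>",
        compress_type=zipfile.ZIP_DEFLATED,
    )

print(is_odt(buf), odt_paragraphs(buf))  # True ['Hello, ODF']
```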

HTML

(from the English Hypertext Markup Language - "hypertext markup language")

The standard markup language for documents on the Internet (extension .htm/.html). Web pages are created using HTML (or XHTML). HTML was developed by the British scientist Tim Berners-Lee in 1991 as a language for exchanging scientific and technical documentation that could be used by people who are not layout specialists. Text with HTML markup was meant to be reproduced on various devices without stylistic or structural distortion; later, however, the widespread adoption of multimedia and graphic design undermined these plans. Viewing HTML documents requires no special editors: the standard tools built into the OS are enough. In openness, indexability, convertibility and readability, HTML surpasses all other formats. Unfortunately, graphics are saved in a separate folder. Internet Explorer can save text and graphics in a single MHT document, but other browsers may not open such a file.
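HTML's convertibility is easy to demonstrate: because the markup is plain text with tags, a few lines of Python using the standard `html.parser` module turn a page into readable text. The sketch below is a simplified converter under stated assumptions (the class and function names are hypothetical, and the set of "block" tags is deliberately small): it starts a new line at those block tags and skips `<script>` and `<style>` contents, which is far from a full rendering.

```python
from html.parser import HTMLParser

# Tags that start a new line in the output (a deliberately small, assumed set).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects text content, ignoring anything inside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


page = ("<html><head><style>p {color: red}</style></head>"
        "<body><h1>Formats</h1><p>HTML is <b>open</b>.</p></body></html>")
print(html_to_text(page))
```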

CHM (Compiled HTML)

A CHM file is essentially a set of compiled HTML documents, something like an archive of web pages, which is why it is smaller. CHM files are viewed with a utility built into Windows 98/NT and later; third-party viewers also exist. To create .chm files, you can use the free HTML Help Workshop tool. The format is now widely used for the help files of various applications.

PDF

(Portable Document Format)

A cross-platform electronic document format created by Adobe Systems using a number of PostScript features. It is intended primarily for presenting printed matter in electronic form. Documents can be viewed with the official free Adobe Reader program as well as with programs from other developers. Its convenience lies in solving the problems of "floating" formatting, incorrect display of embedded graphics and missing fonts: the file is displayed on any platform exactly as it was created. The traditional way to create a PDF document is to prepare the document in its own program and then export it to PDF. Some programs, such as OpenOffice.org, can export directly (without using a virtual printer); MS Word does not yet offer this option. PDF is the de facto standard for most documentation.

DjVu ("deja vu")

A lossy image compression technology designed specifically for storing scanned documents (books, magazines, manuscripts, etc.) in which formulas, diagrams, drawings and handwriting make full text recognition extremely laborious. It is also an effective solution when all the nuances of the design must be preserved, for example in historical documents. DjVu is very common: many libraries use it to store scanned scientific books. It is sometimes called a "text-graphic" format. The essence of DjVu technology is the automatic splitting of an image into several layers (for example, text, a company logo and a raster photo), each compressed with the algorithm best suited to it. In addition, a DjVu file can contain a built-in interactive table of contents and active areas (links), which allow convenient navigation. On average, a DjVu file is 15 to 20 times smaller than the same image in GIF format.

XML-formats

("Extensible Markup Language")

Quite a few text formats have been created for a single device or program, for example e-book formats. These include Rocket eBook (.rb), Microsoft Reader (.lit), PalmDoc, MobiPocket (.prc), etc. As a rule, they are built on XML. The most successful and widespread of them is the FictionBook format (FB2), currently the most advanced and promising e-book format. Its only drawback is the long time needed to prepare the source text, which is repaid by the convenience of reading. FictionBook emphasizes document structure: tags mark out different parts of the text (chapters, headings, quotations, insets). How everything looks on the screen depends on the reader program. If you want the document to be displayed in a particular way, you can attach a style sheet.
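FictionBook's emphasis on structure means a program can navigate a book by its tags. The Python sketch below parses a tiny, hand-written FB2 document with `xml.etree.ElementTree` and returns the book title and the chapter headings. It assumes the FictionBook 2.0 namespace `http://www.gribuser.ru/xml/fictionbook/2.0` and a simplified structure (`description/title-info/book-title`, then `body/section/title`); real books carry much more metadata, and the `outline` function name is hypothetical.

```python
import xml.etree.ElementTree as ET

FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"
NS = {"fb": FB2_NS}

# A tiny hand-written FB2 document (simplified; real files have more metadata).
SAMPLE = f"""<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="{FB2_NS}">
  <description>
    <title-info><book-title>Sample Book</book-title></title-info>
  </description>
  <body>
    <section>
      <title><p>Chapter 1</p></title>
      <epigraph><p>An epigraph.</p></epigraph>
      <p>First chapter text.</p>
    </section>
    <section>
      <title><p>Chapter 2</p></title>
      <p>Second chapter text.</p>
    </section>
  </body>
</FictionBook>"""


def outline(fb2_bytes: bytes):
    """Return (book title, list of top-level section headings)."""
    root = ET.fromstring(fb2_bytes)
    title = root.findtext("fb:description/fb:title-info/fb:book-title", namespaces=NS)
    headings = []
    for section in root.iterfind("fb:body/fb:section", NS):
        heading = section.find("fb:title", NS)
        if heading is None:
            headings.append("")
            continue
        # A <title> may hold several <p> lines; join their non-empty text with spaces.
        headings.append(" ".join(t.strip() for t in heading.itertext() if t.strip()))
    return title, headings


print(outline(SAMPLE.encode("utf-8")))  # ('Sample Book', ['Chapter 1', 'Chapter 2'])
```

How the chapters are then rendered is left to the reader program, as the text notes.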

Why do we need text formats?

Today the three most common text formats are TXT, RTF and DOC. What do they have in common, and how do they differ? What they share is that they all store textual information. They differ in the text-formatting and processing capabilities they provide, and in how accessible the information stored in them is to different programs.

The simplest text format

TXT is the oldest format and the most modest in its capabilities. All it allows is entering the text itself and preserving paragraph breaks. In certain situations this simplicity becomes a virtue, bringing universality and transparency: TXT can easily be read by different applications on different platforms. Moreover, many programs that do not work with text directly can still save text in TXT format.
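As a minimal illustration of how little structure TXT carries, the Python sketch below treats each non-empty line as a paragraph and accepts any of the common line-ending conventions (DOS/Windows CRLF, Unix LF, classic Mac CR). The one-line-per-paragraph rule and the function names are assumptions; some TXT files instead wrap lines by hand and separate paragraphs with blank lines.

```python
def txt_paragraphs(text: str) -> list[str]:
    """Split plain text into paragraphs, assuming one paragraph per line."""
    # str.splitlines() treats "\r\n", "\n" and "\r" (among others) as line breaks.
    return [line for line in text.splitlines() if line.strip()]


def to_dos_text(paragraphs: list[str]) -> str:
    """Join paragraphs with DOS/Windows-style CRLF line breaks."""
    return "\r\n".join(paragraphs) + "\r\n"


sample = "First paragraph.\r\nSecond one.\n\nThird.\r"
paras = txt_paragraphs(sample)
print(paras)  # ['First paragraph.', 'Second one.', 'Third.']
```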

TXT-processors

Many remember Lexicon from the DOS era, a word processor that handled the TXT format at a fairly high level. Today the main tool for working with TXT is the standard Windows Notepad. Anyone who finds its features insufficient can always find an editor to suit their taste and needs on the Web, including free ones. For example, with the freeware program Vega by Konstantin Sheremetyev you are unlikely to see a message that the opened text file is too large: according to the author, Vega version 2.04 opens files of up to 2 GB (!), while the program itself takes only 9.5 KB (by comparison, Notepad in Windows XP "weighs" about 65 KB). Moreover, Vega is even more convenient than Notepad and requires no installation. Here is another example of what can be done with "plain text": the text you are reading was typed in the UltraEdit editor from IDM Computer Solutions. Its strong point is special display and processing of programming-language syntax, but it can work wonders with the plainest text too. Those who appreciate handy Russified programs that are ergonomic and, most importantly, "aware" of the specifics of Cyrillic encodings should take a look at the Patriot program.
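Editors that are "aware" of Cyrillic encodings have to decide how to interpret the bytes of a TXT file, since the same letters are stored differently in DOS (cp866), Windows (cp1251), KOI8-R and UTF-8. The Python sketch below is a deliberately crude, hypothetical heuristic, not how Patriot or any other named program works: it accepts UTF-8 if the bytes decode strictly, and otherwise picks the single-byte encoding whose decoding yields the most lowercase Russian letters. Short or unusual texts can fool it.

```python
# Lowercase Russian letters: ordinary running text is dominated by them.
RUSSIAN_LOWER = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")
SINGLE_BYTE_CANDIDATES = ("cp866", "cp1251", "koi8_r")  # DOS, Windows, KOI8-R


def guess_cyrillic_encoding(data: bytes) -> str:
    """Crude guess at the encoding of Russian plain text."""
    # UTF-8 has strict byte-sequence rules, so a clean strict decode is strong
    # evidence (pure ASCII also lands here, which is harmless).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    def score(encoding: str) -> int:
        text = data.decode(encoding, errors="replace")
        return sum(ch in RUSSIAN_LOWER for ch in text)

    # max() keeps the earlier candidate when scores tie.
    return max(SINGLE_BYTE_CANDIDATES, key=score)


raw = "привет мир".encode("cp866")  # as a DOS editor would have saved it
enc = guess_cyrillic_encoding(raw)
print(enc, raw.decode(enc))
```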

Formatting and universality

RTF stands for Rich Text Format, a format created by Microsoft Corporation. RTF is text marked up with special "control words", which lets it produce and preserve fairly complex formatting and include footnotes, headers and footers, drawings, tables and formulas, although in handling these additional objects RTF is inferior to DOC. It also loses to DOC in file size: formatting text with "control words" instead of a style table does not make for compactness. However, RTF beats DOC on security: its internal structure has no provision for storing macro code, so it is immune to macro viruses.
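To show what "control words" look like, the Python sketch below writes a minimal RTF document by hand: `\rtf1` opens the file, `\b` switches on bold inside a brace group, `\par` ends a paragraph, and non-ASCII characters are written as `\uN?` escapes (a signed 16-bit code followed by a `?` fallback). The helper names and the single-font header are assumptions for illustration; the output should open in common RTF readers, but it exercises only a tiny part of the specification.

```python
def rtf_escape(text: str) -> str:
    """Escape text for the body of an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # backslash and braces are special in RTF
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: one \uN? per UTF-16 code unit, N as a signed 16-bit number,
            # "?" being the fallback shown by readers that ignore \u.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append(f"\\u{n}?")
    return "".join(out)


def make_rtf(paragraphs: list[str]) -> str:
    """Build a minimal RTF document; the first paragraph is set in bold."""
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\uc1"
    body = []
    for i, para in enumerate(paragraphs):
        text = rtf_escape(para)
        if i == 0:
            text = r"{\b " + text + "}"  # the brace group limits \b to this text
        body.append(text + r"\par" + "\n")
    return header + "\n" + "".join(body) + "}"


doc = make_rtf(["Formats", "Braces {like these} are escaped", "Привет"])
print(doc)
```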

RTF-processors

RTF is the primary or a supported format in many, if not most, word processors. One good tool, for example, is Hieroglyph by Mikhail Morozov. This program implements not only Russian spell checking but also automatic switching of the keyboard layout. The Atlantis word processor from Rising Sun Solutions, available in both commercial and free versions, will surely suit many users thanks to its well-thought-out interface, large number of keyboard shortcuts, customizable toolbar and other features. The Patriot editor mentioned above can also work with RTF.

The "largest" text format

The DOC format offers the widest text-processing and formatting capabilities, including footnotes and comments, as well as creating, placing and editing tables, diagrams, images and other elements. True, all of these features are implemented fully and correctly only in MS Word, helped by Microsoft's policy of not disclosing the current specifications of this popular format. Although other programs also "understand" DOC, their developers do not always manage to recognize it correctly. Unlike TXT and RTF, DOC is a binary format, which makes it unreadable in simple text editors; moreover, its own versions are not fully compatible with one another.

DOC-processors

The main and, for the reasons above, "irreplaceable" word processor for working with DOC is MS Word, which implements all the features of this format most fully. Third-party developments have added a great deal of productivity and functionality to Word: add-ons, macros and programs of all kinds are available in large numbers on the Web. Word's competitors include, for example, Corel's WordPerfect, Sun Microsystems' StarOffice and the free OpenOffice.org. Whether you work in Word or in other programs, keep the format-compatibility problem in mind and save a document as DOC only if you are sure that no incompatibilities will arise.

Applicability of formats

There is no basis for claiming that one of the formats discussed is worse than the others without considering the specific tasks it is to be used for. Since we are not setting ourselves the task of doing layout in a word processor, the choice is almost unambiguous. To prepare texts ranging from medium to very large size and to ensure that any layout program "fully understands" the typed text, it is most convenient to use the simplest, most compact and most versatile means of typing and storing text: the TXT format. As for using other text formats in layout, a great deal depends on how well they are supported by the particular layout program.

OpenOffice.org is an international open-source project aimed at creating a universal office suite that runs on different operating platforms, with an open API and an XML-based file format. In practice, OpenOffice.org is a set of programs developed within this project: a word processor, a spreadsheet, a graphics editor, a presentation program and a data access system. In terms of capabilities it is comparable to similar commercial programs and can well be considered an alternative to them. At present, OpenOffice.org is released under a dual license, LGPL and SISSL. Despite the differences between these licenses, OpenOffice.org is free for the end user.

OpenOffice.org traces its origins to the StarOffice office suite, developed by the German company StarDivision in the mid-1990s. In the fall of 1999, Sun bought StarDivision. In June 2000, StarOffice 5.2 was released under the Sun brand for MS Windows, Linux and Solaris. On October 13, 2000, the StarOffice source code was opened (excluding the code of some modules developed by third-party companies), and this day is officially considered the birthday of OpenOffice.org. Today, volunteers from all over the world, as well as Sun programmers, work on the OpenOffice.org code.

Currently, two products are built from the same source code developed by the OpenOffice.org community: StarOffice, which adds components under a proprietary license, and the free OpenOffice.org. In OpenOffice.org, most of the proprietary components present in StarOffice are replaced with free counterparts.

(According to cnews.ru.)