What is a file system and how to find out the type of file system on disk. File systems. File system structure

The file system determines how data will be stored on disk, and what principles of access to stored information can be used when reading it.

We are used to seeing information on our PC as files neatly (or not so neatly :)) arranged in folders. Meanwhile, the computer works with data in a completely different way. For it, there are no whole files on the hard drive: it "sees" only individually addressed sectors containing raw bytes. Moreover, the contents of one file are not always stored in neighboring sectors (so-called data fragmentation).

How does the computer “understand” where to look for, say, our text document on the desktop? It turns out that the hard drive's file system is responsible for this. Today we will find out what file systems are and what their features are.

What is a file system?

To understand what a file system is, an analogy helps. Imagine that a hard disk is a box in which multi-colored cubes are stored. These cubes are pieces of different files, kept in fixed-size cells called clusters. They can simply be piled in a heap or arranged in a specific order. If these imaginary cubes are stored not in a chaotic pile but according to some logic, we can say that something like a file system is present.

The file system determines the order in which data is stored on the disk and the principles of access to it; however, the type of file system also depends heavily on the type of medium. For example, magnetic tape, which supports only sequential recording of data blocks, suits only a single-level file system with sequential access to clusters, whereas a modern SSD can use any multi-level file system with random access.

By the order in which data blocks are stored, file systems, as we have just seen, can be divided into those that store clusters with file fragments sequentially and those that store them in arbitrary order. By the number of levels, file systems can be divided into single-level (flat) and tree-like (multi-level) ones.

In the first case, all files appear as a single flat list; in the second, as a hierarchy. The nesting depth is usually unlimited, and the tree grows either from a single root directory (the "root" on UNIX) or from several root directories (logical drives on Windows).

File systems also differ in the mechanisms they use to protect the data structure from failures. One of the most modern mechanisms for making a file system resilient is journaling. It records all actions performed on files in special service files called "journals" (or "logs").

Journaling can be full: for each operation, a backup copy is made not only of the state of the clusters but also of all the data being written. This kind of journaling is often used in databases, but it significantly slows down the system and increases the size of the journal (in effect, the journal duplicates all data written to the file system).

Much more often, only logical operations are journaled, optionally together with the state of the file system clusters. That is, the journal records only that, say, a file named "file.txt" of 52 KB was written to such-and-such clusters; the contents of the file itself do not appear in the journal. This approach avoids duplicating data, speeds up work with files and makes the journal several times smaller. Its only drawback is that data being written at the moment of a failure may be lost (if there is no copy of it), but the file system itself will remain in a working state.
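Metadata-only journaling can be illustrated with a toy model. The sketch below is a simplification, not the mechanism of any particular file system: the class ToyJournaledFS and its crash_at parameter are hypothetical. The journal records only which clusters a file occupies, never its contents, and a change counts only once its commit record has been written; after a simulated failure, recovery replays committed changes and discards incomplete ones.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournaledFS:
    """Hypothetical file system with metadata-only (logical) journaling."""
    allocation: dict = field(default_factory=dict)  # file name -> cluster numbers
    journal: list = field(default_factory=list)     # ("begin"/"commit", ...) records

    def write_file(self, name, clusters, crash_at=None):
        # 1. Journal the intended metadata change (file contents are NOT journaled)
        self.journal.append(("begin", name, tuple(clusters)))
        if crash_at == "before_commit":
            return  # simulated failure: transaction left incomplete
        # 2. Commit record: from now on the change must survive a crash
        self.journal.append(("commit", name))
        if crash_at == "before_apply":
            return  # simulated failure after commit, before updating metadata
        # 3. Apply the change to the "on-disk" metadata
        self.allocation[name] = tuple(clusters)

    def recover(self):
        """Replay committed transactions; return the names of discarded ones."""
        pending = {}
        for record in self.journal:
            if record[0] == "begin":
                pending[record[1]] = record[2]
            else:  # "commit"
                self.allocation[record[1]] = pending.pop(record[1])
        self.journal.clear()
        return sorted(pending)


fs = ToyJournaledFS()
fs.write_file("file.txt", [10, 11, 12])
fs.write_file("b.txt", [20], crash_at="before_apply")
fs.write_file("c.txt", [30], crash_at="before_commit")
print(fs.recover())     # ['c.txt']
print(fs.allocation)    # {'file.txt': (10, 11, 12), 'b.txt': (20,)}
```

Here c.txt is dropped because its commit record never reached the journal, while b.txt survives because it was committed before the simulated crash. Whatever was in c.txt's clusters may be lost, but the metadata stays consistent, which is exactly the trade-off described above.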

Formatting

Since we are talking about file systems in the context of modern computers with hard disks or SSDs, we will focus on multi-level file systems with random access to clusters. The most popular of them today are FAT32, NTFS, exFAT, ext3/ext4, ReiserFS and HFS+.

The file system on a disk is changed by formatting. Formatting writes special service structures to the initial sectors of the disk (or partition) that define how data is accessed. At the same time, clusters containing existing data are, as a rule, cleared or marked as free and available for rewriting. The exceptions are special cases of file system conversion (for example, from FAT32 to NTFS), in which the entire data structure is preserved.

To format a disk, you can use the standard tools of the operating system (for example, Linux console commands or the disk context menu in Windows), the functions available at the preparatory stage of OS installation, or special programs. The only thing to keep in mind is that your operating system may not support the file system you choose without additional drivers (for example, ext3/ext4 on Windows).

There is also the concept of low-level formatting. Originally, it meant wiping the disk and writing special auxiliary information to it that was used to position the read/write heads. On modern hard disks this can no longer be done in software (only with special factory equipment), but the term low-level formatting has survived, although its meaning has changed somewhat.

Today it is performed with special software (HDD Low Level Format Tool on Windows) or commands (dd on Linux). It overwrites every sector of the hard disk with zeros, completely destroying all markup. After that, the file system effectively disappears, and Windows shows the disk as RAW. To use the disk again after such formatting, you need to format it with one of the usual higher-level file systems.
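In practice, the type of file system on a volume is easiest to find with system tools, for example lsblk -f or blkid on Linux, or fsutil fsinfo volumeinfo C: on Windows. Such tools look for signatures at known offsets at the start of the volume. The sketch below checks only a few well-known signatures and returns "RAW" when none matches, mirroring what Windows shows for a zeroed disk; the function name is hypothetical, and real tools check many more signatures and consistency fields.

```python
def detect_fs(image: bytes) -> str:
    """Guess the file system from the first 4 KiB of a volume (not a whole disk)."""
    if len(image) >= 512:
        oem_id = image[3:11]                      # OEM name field of the boot sector
        if oem_id == b"NTFS    ":
            return "NTFS"
        if oem_id == b"EXFAT   ":
            return "exFAT"
        if image[82:90] == b"FAT32   ":           # file system label of FAT32
            return "FAT32"
        if image[54:59] in (b"FAT12", b"FAT16"):  # same label for FAT12/FAT16
            return image[54:59].decode("ascii")
    # ext2/3/4: superblock at byte 1024, magic 0xEF53 (little-endian) at offset 56
    if len(image) >= 1082 and image[1080:1082] == b"\x53\xef":
        return "ext2/3/4"
    # HFS+/HFSX: volume header at byte 1024 starts with "H+" or "HX"
    if len(image) >= 1026 and image[1024:1026] in (b"H+", b"HX"):
        return "HFS+"
    return "RAW"  # no known signature, as after zero-filling the disk


print(detect_fs(bytes(4096)))  # RAW
```

To try it on a real volume, read the first 4096 bytes of a partition image or device (reading a device usually requires administrator rights) and pass them to detect_fs.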

File System Features

Well, now let's look at some features of the most common file systems.

FAT32

One of the oldest disk file systems still widely used today is FAT32 (File Allocation Table). Thanks to its prevalence, it is supported by the widest range of equipment, from car radios to powerful modern computers. Most flash drives sold today are also formatted with FAT32.

This file system first appeared in Windows 95 OSR2 in 1996 as the logical development of the even earlier FAT16 (1983). One of the main reasons for the transition was the appearance of hard drives that were large for the time, with capacities over 2 GiB, the maximum partition size in FAT16 (a gibibyte, 2^30 bytes, is the binary counterpart of the decimal gigabyte, 10^9 bytes). FAT32 allows up to 268,435,445 clusters of at most 32 KB each, which amounts to about 8 TiB per volume. However, if the clusters are only 512 bytes (one sector), the maximum volume size is just under 128 GiB (about 137 GB).
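Both limits follow from multiplying the maximum number of clusters by the cluster size. A short check of the arithmetic; the constant is the cluster count quoted above, and the function name is illustrative:

```python
GIB, TIB = 2**30, 2**40
MAX_FAT32_CLUSTERS = 268_435_445  # cluster count quoted in the text


def max_volume_bytes(clusters: int, cluster_size: int) -> int:
    """Largest addressable volume: number of clusters x cluster size."""
    return clusters * cluster_size


for cluster_size in (512, 32 * 1024):
    total = max_volume_bytes(MAX_FAT32_CLUSTERS, cluster_size)
    print(f"{cluster_size:>6}-byte clusters: {total:,} bytes = "
          f"{total / GIB:.4f} GiB = {total / TIB:.6f} TiB")
```

With 32 KB clusters the product is 8,796,092,661,760 bytes, a hair under 8 TiB; with 512-byte clusters it is 137,438,947,840 bytes, just under 128 GiB.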

FAT32, as its name implies, is built around a table: the file allocation table, which records which clusters belong to each file, while file names, creation times and last-access dates are kept in directory entries. There is no journaling, so reading and writing in this file system are faster than, for example, in NTFS, which maintains more complete journals. It is largely because of this good performance that FAT32 is still widely used today.

The main drawback of FAT32 today is its maximum file size of 4 GiB. Files larger than this have to be split into parts, which makes working with them more difficult. FAT32 also has other limitations in Windows: for example, the standard Windows tools cannot create FAT32 partitions larger than 32 GB. Flash drives of 64 GB or more therefore have to be formatted with special software or on Linux.
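The 4 GiB ceiling comes from the 32-bit file length field in FAT directory entries, so the largest possible file is 2^32 - 1 bytes. A small sketch of how many parts an oversized file needs; the function names are hypothetical, and real splitting tools often choose smaller part sizes.

```python
FAT32_MAX_FILE_SIZE = 2**32 - 1  # 32-bit size field: 4 GiB minus 1 byte


def fits_on_fat32(file_size: int) -> bool:
    return file_size <= FAT32_MAX_FILE_SIZE


def parts_needed(file_size: int, part_size: int = FAT32_MAX_FILE_SIZE) -> int:
    """Number of pieces a file must be split into so that each fits on FAT32."""
    return max(1, -(-file_size // part_size))  # ceiling division


movie = 10 * 2**30  # a hypothetical 10 GiB video file
print(fits_on_fat32(movie), parts_needed(movie))  # False 3
```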

In that case the drive remains usable, but both reading and writing will be noticeably slow. Therefore, drives larger than 32 GB are better formatted with other file systems, such as exFAT or NTFS.

NTFS

While the Windows 95/98 line continued the tradition of the already outdated DOS, the new NT line was aimed at innovation from the start. So with the release of Windows NT 3.1 in 1993, a new file system was created specifically for it: NTFS (New Technology File System).

This file system is still the main one for all modern versions of Windows: it is fast, theoretically supports volumes of up to 16 EiB (exbibytes, 2^60 bytes) with a maximum cluster size of 64 KB, imposes practically no limit on file size, and offers quite rich functionality. For example, NTFS is a journaling file system and supports per-user access rights to individual data, which FAT32 lacks.

As in FAT32, NTFS is built around a table, but it is a more advanced database called the MFT (Master File Table). The rows of this table correspond to the files stored on the volume, and the columns contain the attributes of these files (creation date, size, access rights, etc.).
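Each MFT row is a file record (usually 1024 bytes) that starts with the signature FILE and holds a sequence of attributes, each beginning with a 32-bit type code and a 32-bit length. The sketch below lists the attribute types in one record. It is deliberately simplified: it ignores the update sequence (fixup) values NTFS writes into each sector of a record and names only three well-known attribute types; the function name is hypothetical.

```python
import struct

# Type codes of a few well-known NTFS attributes
ATTRIBUTE_NAMES = {0x10: "$STANDARD_INFORMATION", 0x30: "$FILE_NAME", 0x80: "$DATA"}
END_OF_ATTRIBUTES = 0xFFFFFFFF


def list_attributes(record: bytes) -> list:
    """Return the attribute types found in a single MFT file record."""
    if record[:4] != b"FILE":
        raise ValueError("not an MFT file record")
    (offset,) = struct.unpack_from("<H", record, 0x14)  # offset of first attribute
    found = []
    while offset + 8 <= len(record):
        attr_type, attr_length = struct.unpack_from("<II", record, offset)
        if attr_type == END_OF_ATTRIBUTES or attr_length == 0:
            break
        found.append(ATTRIBUTE_NAMES.get(attr_type, hex(attr_type)))
        offset += attr_length  # jump to the next attribute header
    return found
```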

In addition, NTFS maintains the USN journal (Update Sequence Number journal). Like FAT32, which stores a last-access time for each file, this journal records information about changes to files. However, a last-access time is of little practical use, whereas the USN journal keeps a record of what was changed, which helps restore the previous state of the file system after failures.

Another feature of NTFS is support for alternate data streams (ADS). Initially, they were conceived to distinguish between various processes. Later (in Windows 2000) they were used to store some file attributes (author name, icon, etc.), similar to how this was done in HFS on MacOS. In modern Windows, alternate streams can store almost any kind of information; some viruses even use them to hide their presence in the system.

The point is that alternate streams are not shown by Windows Explorer and are effectively invisible to users and most programs. However, special software lets you view them and even use them, for example, to hide data. Alternate streams are convenient to inspect with NTFS Stream Explorer, and Xp-lore can use them to hide files.

Other NTFS features worth mentioning are encryption, data compression, soft and hard links to files (hard links to folders, alas, are not possible), per-user disk quotas and, of course, access rights to files.

NTFS was originally created exclusively for Windows, but today it is supported by most media players (flash drives can also be formatted with it) and by Linux and MacOS (although with some write restrictions). Support for NTFS on popular game consoles is weak, however: of these, only the Xbox One supports it.

exFAT

As flash drives grew in capacity in the second half of the 2000s, it became clear that the ubiquitous FAT32 would soon exhaust its potential. Using journaled NTFS on flash drives, with their limited number of rewrite cycles and slower operation, was not entirely advisable. Therefore, in 2006 Microsoft released a new file system, exFAT (Extended FAT), together with the Windows Embedded CE 6.0 operating system.

It became the logical continuation of FAT32, which is why it is sometimes called FAT64. The new file system's trump card was the removal of practical file size restrictions and an increase of the theoretical partition size limit to 16 EiB (as in NTFS). At the same time, because it has no journaling, exFAT has kept its high data access speed and compactness.

Another advantage of exFAT was the ability to increase the cluster size to 32 MB, which significantly optimized the storage of large files (for example, video). In addition, data storage in exFAT is organized in such a way as to minimize fragmentation and rewriting of the same clusters. All this has been done, again, for the sake of optimizing the operation of flash drives, for which the file system was originally developed.
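The benefit of large clusters for big files is simple arithmetic: the file system has to track far fewer clusters per file. The sketch counts the clusters a hypothetical 4 GiB video occupies with 4 KB and with 32 MB clusters (1,048,576 versus 128); the function name is illustrative. The flip side is that even a tiny file occupies a whole 32 MB cluster, so very large clusters suit drives that hold mostly large files.

```python
KIB, MIB, GIB = 2**10, 2**20, 2**30


def clusters_for(file_size: int, cluster_size: int) -> int:
    """Clusters occupied by one file (a file always takes whole clusters)."""
    return -(-file_size // cluster_size)  # ceiling division


video = 4 * GIB  # hypothetical 4 GiB video file
for cluster_size in (4 * KIB, 32 * MIB):
    count = clusters_for(video, cluster_size)
    print(f"{cluster_size // KIB:>6} KiB clusters: {count:,} clusters")
```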

Because exFAT is a relatively new file system, there are some restrictions on its use. In Windows, full support appeared only in Vista SP1 (although an update is also available for Windows XP SP2). MacOS supports exFAT starting with version 10.6.5, while Linux requires a separate driver (it is built into some distributions, and some support only reading).

ext2, ext3 and ext4

While NTFS has been the “ruler” of Windows for more than a decade, the Linux camp has traditionally had great variety, including in file systems. Still, there is one family that most distributions use by default: the ext (Extended File System) family, created specifically for Linux starting in 1992.

The most widely used is the second version, ext2, which, like NTFS, appeared back in 1993. Unlike NTFS, however, ext2 is not a journaling file system. This is both an advantage and a drawback. The advantage is that it is one of the fastest file systems for writing data; the lack of journaling also makes it preferable for flash drives and SSDs. The price for this speed is low fault tolerance.

To make ext2 more reliable, an improved version, ext3, was developed in 2001. It added journaling, which can work in three modes: "writeback" (only file system metadata is journaled), "ordered" (only metadata is journaled, but file data is always written to disk BEFORE the related metadata is committed to the journal) and "journal" (both the metadata and the modified file data are journaled).

There were no other major innovations, and performance dropped noticeably compared with ext2. So as early as 2006 a prototype of the next generation appeared, ext4, whose final release took place in 2008. The fourth extended file system kept journaling but significantly increased the speed of reading data, making it even faster than ext2!

Other innovations worth noting are an increase in the maximum partition size to 1 EiB (from 32 TiB in ext2 and ext3), an increase in the maximum file size to 16 TiB (from 2 TiB in earlier versions), and the appearance of extents. Extents let the file system address not single blocks, as other file systems do (ext3 in particular), but contiguous runs of consecutive blocks of up to 128 MB each, which significantly improves performance and reduces fragmentation.
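Here is a sketch of the idea behind extents: collapsing a list of block numbers into (start, length) runs of consecutive blocks. It is a simplification: real ext4 extents also record the logical position within the file and are stored in an on-disk tree. The cap of 32,768 blocks per extent corresponds to 128 MiB with 4 KiB blocks, matching the figure above; the function name is hypothetical.

```python
MAX_EXTENT_BLOCKS = 32_768  # 32,768 x 4 KiB blocks = 128 MiB per extent


def to_extents(blocks):
    """Group physical block numbers into (first_block, length) extents."""
    extents = []
    for block in blocks:
        if extents:
            start, length = extents[-1]
            # Extend the current extent if this block directly follows it
            if block == start + length and length < MAX_EXTENT_BLOCKS:
                extents[-1] = (start, length + 1)
                continue
        extents.append((block, 1))
    return extents


# A block-mapped file needs one pointer per block; extents need one entry per run
print(to_extents([100, 101, 102, 103, 200, 201, 57]))  # [(100, 4), (200, 2), (57, 1)]
```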

Today, almost all Linux systems support some version of the ext file systems by default, and practically all systems released since 2010 support ext4. Accessing ext partitions from Windows and MacOS requires special software and/or drivers.

ReiserFS

Another young and promising Linux file system is ReiserFS. Developed by a team led by the American developer Hans Reiser, it became the first journaling file system added to the Linux kernel, in version 2.4.1 in 2001, just before ext3 support was added.

Like ext3, which came after it, ReiserFS made full or partial journaling possible in Linux. Unlike ext3, however, it allowed larger files (up to 8 TiB versus 2 TiB) and file names of up to 4032 bytes (limited to 255 characters by the Linux kernel).

Another ReiserFS feature that users liked was the ability to resize a partition without unmounting it. ext2 had no such function; it later appeared in ext3, but here too ReiserFS was first.

Despite a number of advantages over the alternative file systems of its time, ReiserFS had drawbacks too. The most significant were rather weak fault tolerance when the metadata structure was damaged and an inefficient defragmentation algorithm. Therefore, in 2004 work began on an improved file system, which became known as Reiser4.

Despite a number of innovations, improvements and fixes, the new file system has remained the preserve of a few enthusiasts. The reason is that in 2006 Hans Reiser murdered his wife and was taken into custody, and later imprisoned. His company Namesys, which was developing Reiser4, was consequently disbanded. Since then, the file system has been supported and refined by a group of developers led by the Russian developer Eduard Shishkin.

As a result, Reiser4 support has still not been added to the Linux kernel, whereas ReiserFS is available there, so many distributions continue to use it as the default file system.

HFS

Speaking of file systems typical of particular operating systems, we cannot fail to mention MacOS with its HFS (Hierarchical File System). Its first version appeared in 1985 along with the Macintosh System 1.0 operating system.

By modern standards this file system was very inefficient, so in 1998, together with Mac OS 8.1, an improved version was released: HFS+ (Mac OS Extended), which is supported to this day.

Like its predecessor, HFS+ divides the disk into 512-byte blocks (by default), which it groups into allocation blocks (clusters) that store file data. However, HFS+ uses 32-bit allocation block numbers instead of 16-bit ones. This removes restrictions on the size of stored files and supports volumes of up to 8 EiB (up to 16 EiB in recent revisions).

Other advantages of HFS+ include journaling (the journal is kept in a separate hidden area of the volume) and support for multiple data streams (forks). Moreover, whereas NTFS sets no clear rules on what alternate streams may contain, HFS+ defines two specific streams: a data fork (the main contents of the file) and a resource fork (file metadata and resources).

HFS+ is almost ideal for traditional HDDs, but, like ReiserFS discussed above, it lacks the most effective defragmentation algorithms. Therefore, as SSDs spread and were adopted in Apple devices, HFS+ is increasingly being replaced by APFS (Apple File System), developed in 2016, which appeared in desktop macOS High Sierra (10.13) and mobile iOS 10.3.

In many ways, APFS is similar to exFAT in how it optimizes reading and writing; unlike exFAT, however, it protects its metadata against failures (using copy-on-write rather than a classic journal), supports access rights, has improved encryption and compression algorithms, and, thanks to 64-bit addressing, can work with volumes of up to 9 YiB (no joke: yobibytes)!

The only drawback of APFS is that it is supported only by modern Apple devices and is not yet available on other platforms.

File System Comparison

Today we have examined many popular file systems, so it makes sense to collect all the data about them in a single table:

Characteristics / FS	FAT32	NTFS	exFAT	ext2	ext4	ReiserFS	HFS+	APFS
Year introduced	1996	1993	2006	1993	2006	2001	1998	2016
Scope of application	virtually all devices, removable drives	Windows, removable drives, Linux	removable drives, Windows Vista+, Linux	Linux, removable drives	Linux	Linux	MacOS	MacOS
Max file size	4 GiB	16 EiB	16 EiB	2 TiB	16 TiB	8 TiB	16 EiB	9 YiB
Max volume size	8 TiB	16 EiB	64 ZiB	32 TiB	1 EiB	16 TiB	16 EiB	9 YiB
Journaling	-	+	-	-	+	+	+	+
Access rights management	-	+	-	+	+	+	+	+

Conclusions

As you can see, each operating system has its own optimal file system that lets it work with data most efficiently: for Windows it is NTFS, for MacOS it is HFS+ or APFS. The only exception is the many Linux distributions, which use more than a dozen file systems, each with its own advantages and disadvantages.

Most Windows users need to remember only the three most common file systems: FAT32 for small flash drives and old equipment, NTFS for most computers, and exFAT for large flash drives and external SSDs (whether a system disk should be formatted with exFAT is still debated because of its lack of journaling and greater susceptibility to failures).

P.S. It is allowed to freely copy and cite this article provided that an open, active link to the source is provided and Ruslan Tertyshny’s authorship is preserved.

Material for the review lecture No. 33

for specialty students

"Information Technology Software"

associate Professor, Department of IWT, Ph.D. Livak E.N.

FILE MANAGEMENT SYSTEMS

Basic concepts, facts

Purpose. Features of the FAT, VFAT, FAT32, HPFS and NTFS file systems. File systems of UNIX (s5, ufs) and Linux (Ext2FS). System areas of a disk (partition, volume). Principles of file allocation and of storing file location information. Organization of directories. Restricting access to files and directories.

Skills

Using knowledge of the file system structure to protect and restore computer information (files and directories). Organization of access control for files.

File systems. File system structure

Data on the disk is stored as files. A file is a named part of a disk.

File management systems are designed for file management.

The ability to deal with data stored in files at the logical level is provided by the file system. It is the file system that determines how data is organized on any data medium.

Thus, a file system is a set of specifications and the corresponding software responsible for creating, destroying, organizing, reading, writing, modifying and moving file information, as well as for controlling access to files and managing the resources that files use.

File management system is the main subsystem in the vast majority of modern operating systems.

The file management system:

· Links all system processing programs through shared data;

· Solves the problems of centralized distribution of disk space and data management;

· Provides the user with the ability to perform operations on files (creation, etc.), to exchange data between files and various devices, to protect files from unauthorized access.

Some operating systems may have several file management systems, which enables them to work with several file systems.

We will try to distinguish between a file system and a file management system.

The term “file system” defines the principles of access to data organized into files.

The term “file management system” refers to a specific implementation of a file system, i.e. a set of software modules that provide work with files in a particular OS.

So, to work with files organized according to a given file system, a corresponding file management system must be developed for each OS. Such a file management system works only in the OS for which it was created.

In the Windows family of operating systems, the main file systems are VFAT, FAT32 and NTFS.

Consider the structure of these file systems.

In the FAT file system, the disk space of any logical drive is divided into two areas:

· System area and

· Data area.

The system area is created and initialized during formatting and is subsequently updated whenever the file structure is modified.

The system area consists of the following components:

· Boot sector containing the boot record;

· Reserved sectors (may be absent);

· File allocation tables (FAT, File Allocation Table);

· Root directory (ROOT).

These components are located on the disk one after another.

The data area contains the files and directories subordinate to the root directory.

The data area is divided into so-called clusters. A cluster is one or more adjacent sectors of the data area; it is also the smallest addressable unit of disk space that can be allocated to a file. That is, a file or directory always occupies a whole number of clusters. To create and write a new file to disk, the operating system allocates several free clusters for it. These clusters do not have to be consecutive. For each file, the list of all cluster numbers allocated to it is stored.

Dividing a data region into clusters instead of using sectors allows you to:

· Reduce the size of the FAT table;

· Reduce file fragmentation;

· Shorten file cluster chains, which speeds up file access.

However, clusters that are too large make inefficient use of the data area, especially when there are many small files (on average, half a cluster is lost per file).

Modern file systems (FAT32, HPFS, NTFS) address this problem by keeping clusters small (typically 4 KB or less).
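The "half a cluster per file" estimate can be checked with a small calculation. The sketch computes the unused tail (slack) of a file's last cluster; assuming file sizes are spread evenly over whole multiples of the cluster size (real file sizes are not uniform), the average slack comes out to (cluster_size - 1) / 2 bytes, i.e. about half a cluster, and 255.5 bytes for one-sector (512-byte) clusters. The function names are illustrative.

```python
def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes wasted in the last, partially filled cluster of a file."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size - file_size


def average_slack(sizes, cluster_size: int) -> float:
    sizes = list(sizes)
    return sum(slack_bytes(s, cluster_size) for s in sizes) / len(sizes)


for cluster_size in (512, 4096, 32768):
    # Every file size from 1 byte to 8 clusters, each equally likely (assumption)
    avg = average_slack(range(1, 8 * cluster_size + 1), cluster_size)
    print(f"{cluster_size:>5}-byte clusters: average slack {avg:.1f} bytes")
```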

The map of the data area is the file allocation table (FAT). Each FAT element (12, 16 or 32 bits long) corresponds to one disk cluster and describes its state: free, busy, or bad.

· If the cluster is allocated to a file (i.e., busy), the corresponding FAT element contains the number of the next cluster of that file;

· The last cluster of a file is marked with a value in the range FF8h-FFFh for a 12-bit FAT (FFF8h-FFFFh for a 16-bit FAT);

· A free cluster is marked with zero, 000h (0000h);

· A cluster unsuitable for use (bad) is marked with FF7h (FFF7h).

Thus, in the FAT table, clusters belonging to one file are linked in chains.
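Following such a chain can be sketched in a few lines. The sketch assumes a 16-bit FAT already loaded into a Python list (element i describes cluster i, with data clusters numbered from 2) and uses the 16-bit values listed above; the function name and the sample table are hypothetical.

```python
FREE = 0x0000     # free cluster
BAD = 0xFFF7      # cluster unsuitable for use
END_MIN = 0xFFF8  # FFF8h-FFFFh mark the last cluster of a file


def cluster_chain(fat, first_cluster):
    """Return all cluster numbers of a file, starting from its first cluster."""
    chain = []
    cluster = first_cluster
    while True:
        # Data clusters start at 2; a chain longer than the table must contain a loop
        if not 2 <= cluster < len(fat) or len(chain) >= len(fat):
            raise ValueError(f"damaged chain at cluster {cluster:#06x}")
        chain.append(cluster)
        next_cluster = fat[cluster]
        if next_cluster >= END_MIN:
            return chain  # end-of-file marker reached
        if next_cluster in (FREE, BAD):
            raise ValueError(f"cluster {cluster} points to a free or bad cluster")
        cluster = next_cluster


# Clusters 0 and 1 are reserved; the file occupies clusters 2 -> 3 -> 7
fat = [0xFFF8, 0xFFFF, 3, 7, FREE, FREE, BAD, 0xFFFF]
print(cluster_chain(fat, 2))  # [2, 3, 7]
```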

The file allocation table is stored immediately after the boot record of the logical disk; its exact location is given in a special field of the boot sector.

It is stored as two identical copies, one after the other. If the first copy is damaged, the second one is used.

Because the FAT is used very intensively during disk access, it is usually loaded into RAM (into the I/O buffer or cache) and kept there as long as possible.

The main disadvantage of FAT is slow file handling. When a file is created, the rule is simply to allocate the first free cluster. This leads to disk fragmentation and complex cluster chains, which in turn slow down work with files.
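The first-free rule and the fragmentation it causes can be seen in a small simulation. It uses the 16-bit FAT conventions (0000h free, FFFFh end of chain); the function name and the sample table are hypothetical, and real FAT drivers may differ in details.

```python
FREE, END = 0x0000, 0xFFFF  # 16-bit FAT values: free cluster, end of chain


def allocate_first_free(fat, n_clusters):
    """Allocate n clusters the FAT way: always take the lowest-numbered free ones."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be positive")
    free = [c for c in range(2, len(fat)) if fat[c] == FREE][:n_clusters]
    if len(free) < n_clusters:
        raise OSError("not enough free clusters")
    for current, following in zip(free, free[1:]):
        fat[current] = following  # link each cluster to the next one
    fat[free[-1]] = END           # mark the last cluster of the file
    return free


# Clusters 3 and 6 already belong to other files, leaving "holes" between them
fat = [0xFFF8, 0xFFFF, FREE, END, FREE, FREE, END, FREE, FREE]
print(allocate_first_free(fat, 4))  # [2, 4, 5, 7]: the new file is fragmented
```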

The DiskEditor utility can be used to view and edit the FAT table.

Detailed information about the file itself is stored in another structure, the root directory. Each logical drive has its own root directory (ROOT).

The root directory describes files and other directories. Each element of a directory is a file descriptor.

The descriptor of each file and directory includes:

· Name

· Extension

· Date of creation or last modification

· Creation time or last modification

· Attributes (archived, directory attribute, volume attribute, system, hidden, read-only)

· File length (for the directory - 0)

· Reserved field that is not used

· The number of the first cluster in the chain of clusters allocated to a file or directory; Having received this number, the operating system, referring to the FAT table, will also find out all the other cluster numbers of the file.
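The descriptor fields above occupy a fixed 32-byte record. The sketch below decodes one such record, assuming the classic FAT12/FAT16 layout described in the text (8-byte name, 3-byte extension, attribute byte, 10 reserved bytes, time, date, first cluster, 32-bit length) and the standard DOS date and time encoding. In FAT32 part of the reserved area holds the creation time and the high half of the cluster number, which this sketch ignores; the function name is hypothetical.

```python
import struct

ATTRIBUTE_BITS = {0x01: "read-only", 0x02: "hidden", 0x04: "system",
                  0x08: "volume label", 0x10: "directory", 0x20: "archive"}


def parse_dir_entry(entry: bytes) -> dict:
    """Decode one 32-byte FAT directory entry (classic FAT12/FAT16 layout)."""
    name, ext, attr, _reserved, time, date, first_cluster, size = struct.unpack(
        "<8s3sB10sHHHI", entry)
    return {
        "name": name.decode("ascii").rstrip(),
        "extension": ext.decode("ascii").rstrip(),
        "attributes": [label for bit, label in ATTRIBUTE_BITS.items() if attr & bit],
        # Date: bits 15-9 year since 1980, bits 8-5 month, bits 4-0 day
        "date": (1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F),
        # Time: bits 15-11 hours, bits 10-5 minutes, bits 4-0 seconds / 2
        "time": (time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2),
        "first_cluster": first_cluster,  # start of the chain in the FAT
        "size": size,                    # file length in bytes (0 for directories)
    }


# A hypothetical entry: REPORT.TXT, archive bit set, 2004-03-15 10:30:20
raw = struct.pack("<8s3sB10sHHHI", b"REPORT  ", b"TXT", 0x20, bytes(10),
                  (10 << 11) | (30 << 5) | 10,
                  ((2004 - 1980) << 9) | (3 << 5) | 15, 5, 1234)
print(parse_dir_entry(raw))
```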

So, suppose the user runs a file. The operating system searches for the file with the requested name by looking through the file descriptors in the current directory. When it finds the required entry, it reads the number of the file's first cluster and then determines the remaining cluster numbers from the FAT table. The data from these clusters is read into RAM and combined into one contiguous area. The operating system then transfers control to the file, and the program starts running.

The DiskEditor utility can also be used to view and edit the root directory (ROOT).

VFAT file system

The VFAT (Virtual FAT) file system first appeared in Windows for Workgroups 3.11 and was designed for file I/O in protected mode.

This file system is used in Windows 95.

It is also supported in Windows NT 4.

VFAT is the native 32-bit file system of Windows 95. It is controlled by the VFAT.VXD driver.

VFAT uses 32-bit code for all file operations; it can use 32-bit protected-mode drivers.

However, the elements of the file allocation table remain 12- or 16-bit, so the same on-disk data structure (FAT) is used. That is, the table format in VFAT is the same as in FAT.

Along with "8.3" names, VFAT supports long file names. (VFAT is often described as FAT with support for long names.)
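Long names are stored in extra descriptors placed before the ordinary 8.3 entry; each holds up to 13 UTF-16 characters and carries a checksum of the 8.3 name so that the pieces can be matched to it. A sketch of both calculations, assuming every character of the name takes one UTF-16 unit; the function names are hypothetical, while the checksum rotation follows the algorithm given in Microsoft's FAT specification.

```python
def lfn_entries_needed(long_name: str) -> int:
    """Extra directory entries needed for a long file name (13 characters each)."""
    return -(-len(long_name) // 13)  # ceiling division


def short_name_checksum(short_name: bytes) -> int:
    """Checksum of the 11-byte 8.3 name ("NAME    EXT") stored in each long-name entry."""
    if len(short_name) != 11:
        raise ValueError("8.3 name must be exactly 11 bytes")
    checksum = 0
    for byte in short_name:
        # Rotate right by one bit, then add the next byte (modulo 256)
        checksum = (((checksum & 1) << 7) + (checksum >> 1) + byte) & 0xFF
    return checksum


long_name = "Annual financial report 2004.txt"
print(lfn_entries_needed(long_name), short_name_checksum(b"ANNUAL~1TXT"))
```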

The main disadvantage of VFAT is the large amount of space lost to clustering on large logical disks, along with limits on the size of the logical disk itself.

FAT32 file system

This is a new implementation of the idea of using a FAT table.

FAT32 is a fully self-contained 32-bit file system.

It was first used in Windows 95 OSR2 (OEM Service Release 2).

Currently, FAT32 is used in Windows 98 and Windows ME.

It contains numerous enhancements and additions to previous FAT implementations.

1. It uses disk space much more efficiently because it uses smaller clusters (4 KB); this is estimated to save up to 15%.

2. It has an extended boot record that keeps copies of critical data structures, which makes the disk more resistant to damage to those structures.

3. It can use the backup copy of the FAT instead of the primary one.

4. The root directory can be moved, i.e. it can be located anywhere on the disk, which removes the limit on its size (512 entries in FAT16, where the root directory occupied a fixed area of the disk).

5. Improved root directory structure.

New fields were added, for example, creation time, creation date, last access date and a checksum.

As before, a long file name occupies several descriptors.

HPFS file system

HPFS (High Performance File System) is a high-performance file system.

HPFS first appeared in OS/2 1.2 and LAN Manager.

Let us list the main features of HPFS.

· The main difference lies in the basic principles of placing files on disk and of storing information about their location. Thanks to these principles, HPFS offers high performance and fault tolerance and is a reliable file system.

· Disk space in HPFS is allocated not in clusters (as in FAT) but in blocks. In the current implementation the block size is one sector, although in principle it could be different. (In effect, a block is a cluster that is always one sector in size.) Placing files in such small blocks uses disk space more efficiently, since the unproductive loss of free space averages only half a sector, 256 bytes, per file. Recall that the larger the cluster, the more disk space is wasted.

· HPFS tries to place a file in adjacent blocks or, if that is not possible, to place it so that its extents (fragments) are physically as close to each other as possible. This approach significantly reduces the time needed to position the hard disk's read/write heads and the rotational latency (the wait, once the head is on the desired track, for the required sector to arrive under it). Recall that FAT simply allocates the first free cluster.

Extents are fragments of a file located in adjacent sectors of the disk. An unfragmented file has exactly one extent; a fragmented file has several.

· HPFS uses balanced trees (B-trees) to store and search for information about file locations (directories are stored in the middle of the disk and are sorted automatically), which significantly improves its performance compared with FAT.
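Why sorted directories pay off can be shown with a simplified model: a directory whose entries are kept in name order can be searched by halving the range at each step instead of scanning every descriptor, as FAT does. HPFS actually keeps directories in balanced trees on disk, which also keep insertions cheap; the sorted Python list and bisect module below are a stand-in chosen only to illustrate the logarithmic lookup, and the class name is hypothetical.

```python
import bisect


class SortedDirectory:
    """Hypothetical directory kept sorted by name and searched by binary search."""

    def __init__(self):
        self.names = []         # file names in sorted order
        self.first_blocks = []  # parallel list: first block of each file

    def add(self, name: str, first_block: int) -> None:
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            raise FileExistsError(name)
        self.names.insert(i, name)  # inserting at position i keeps the list sorted
        self.first_blocks.insert(i, first_block)

    def lookup(self, name: str):
        """Return the first block of a file, or None; O(log n) comparisons."""
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.first_blocks[i]
        return None


d = SortedDirectory()
for name, block in [("readme.txt", 40), ("autoexec.bat", 12), ("config.sys", 25)]:
    d.add(name, block)
print(d.names, d.lookup("config.sys"))  # ['autoexec.bat', 'config.sys', 'readme.txt'] 25
```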

· HPFS provides special extended file attributes that make it possible to control access to files and directories.

Extended attributes (EAs) allow additional information to be stored with a file. For example, a file can be associated with its own graphic image (icon), a description, a comment, information about its owner, etc.

HPFS partition structure

An HPFS partition begins with three control blocks:

· Boot block (BootBlock);

· Super block (SuperBlock);

· Spare (reserve) block (SpareBlock).

They occupy 18 sectors.

The rest of the HPFS disk space is divided into bands of adjacent sectors. Each band occupies 8 MB of disk space.

Each band has its own sector allocation bitmap, which shows which sectors of the band are in use and which are free. Each sector of a data band corresponds to one bit in its bitmap: if the bit is 1, the sector is in use; if it is 0, the sector is free.
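With 512-byte sectors, an 8 MB band has 16,384 sectors, so its one-bit-per-sector bitmap takes 2,048 bytes (four sectors). The sketch below models one band's bitmap using the bit convention stated above; the bit order inside each byte (least significant bit first) and the class name `BandBitmap` are assumptions made for illustration.

```python
SECTOR_SIZE = 512                               # bytes per sector
BAND_SIZE = 8 * 1024 * 1024                     # 8 MB per band, as in the text
SECTORS_PER_BAND = BAND_SIZE // SECTOR_SIZE     # 16384 sectors
BITMAP_BYTES = SECTORS_PER_BAND // 8            # one bit per sector -> 2048 bytes


class BandBitmap:
    """Allocation bitmap of one band: bit = 1 means busy, 0 means free."""

    def __init__(self) -> None:
        self.bits = bytearray(BITMAP_BYTES)     # all sectors start out free

    def is_busy(self, sector: int) -> bool:
        return bool(self.bits[sector // 8] & (1 << (sector % 8)))

    def mark(self, sector: int, busy: bool) -> None:
        mask = 1 << (sector % 8)
        if busy:
            self.bits[sector // 8] |= mask
        else:
            self.bits[sector // 8] &= ~mask & 0xFF

    def free_count(self) -> int:
        busy = sum(bin(byte).count("1") for byte in self.bits)
        return SECTORS_PER_BAND - busy


print(SECTORS_PER_BAND, BITMAP_BYTES, BITMAP_BYTES // SECTOR_SIZE)  # 16384 2048 4
```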

The bitmaps of two neighboring bands lie side by side on the disk, and the bands themselves are arranged the same way. The resulting sequence of bands and bitmaps is shown in Fig.

Compare this with FAT, which has only one “bitmap” for the entire disk (the FAT table); to work with it, the read/write heads have to travel, on average, across half the disk.

It is precisely to reduce the positioning time of the read/write heads that HPFS divides the disk into bands.

Let us now consider the control blocks.

Boot block (BootBlock)

It contains the volume name, the volume serial number, the BIOS parameter block, and the boot program.

The boot program finds the file OS2LDR, reads it into memory, and transfers control to this OS loader, which in turn loads the OS/2 kernel, OS2KRNL, from disk into memory. OS2KRNL then uses the information in CONFIG.SYS to load all the other required program modules and data blocks into memory.

The boot block is located in sectors 0 through 15.

SuperBlock (super block)

Contains

· Pointer to the list of bitmaps (bitmap block list). This list lists all the blocks on the disk that contain the bitmaps used to detect free sectors;

· Pointer to the list of bad blocks (bad block list). When the system detects a damaged block, the block is added to this list and is no longer used to store information;

· Pointer to the directory band;

· Pointer to the file node (F-node) of the root directory;

· Date of the last check of the section by the CHKDSK program;

· Information about the size of the band (in the current implementation of HPFS - 8 MB).

The super block is located in sector 16.

SpareBlock (spare block)

Contains

· Pointer to the emergency replacement map (hotfix map, or hotfix areas);

· Pointer to the list of spare free blocks (directory emergency free block list);

· A number of system flags and descriptors.

This block is located in sector 17 of the disk.

The spare block provides the high fault tolerance of HPFS and makes it possible to recover damaged data on the disk.

The principle of file allocation

To reduce the positioning time of the hard disk's read/write heads, HPFS tries to:

1) place the file in adjacent blocks;

2) if this is not possible, place the extents of the fragmented file as close to each other as possible.

To do this, HPFS uses statistics and also tries to provisionally reserve at least 4 KB of space at the end of files that are growing.
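The two-step placement policy can be sketched over a simple free-space map. This is an illustrative simplification, not the HPFS allocator: it ignores the statistics and the 4 KB reservation, represents the map as a Python list of booleans (True = busy), and the names `find_run`, `allocate` and `to_extents` are hypothetical. The `hint` parameter stands for the position the new data should be close to, for example the current end of the file.

```python
def find_run(busy, count, hint):
    """Start of a free run of `count` sectors placed as close to `hint` as possible, or None."""
    best = None
    run_start = None
    for i, is_busy in enumerate(busy + [True]):   # sentinel closes a trailing free run
        if not is_busy and run_start is None:
            run_start = i
        elif is_busy and run_start is not None:
            if i - run_start >= count:
                # Clamp the hint into the range of valid start positions in this run.
                start = min(max(hint, run_start), i - count)
                if best is None or abs(start - hint) < abs(best - hint):
                    best = start
            run_start = None
    return best


def to_extents(sectors):
    """Merge a sorted list of sector numbers into (start, length) extents."""
    extents = []
    for s in sectors:
        if extents and extents[-1][0] + extents[-1][1] == s:
            extents[-1] = (extents[-1][0], extents[-1][1] + 1)
        else:
            extents.append((s, 1))
    return extents


def allocate(busy, count, hint):
    """Allocate `count` sectors near `hint` and return the extents used."""
    start = find_run(busy, count, hint)
    if start is not None:
        chosen = list(range(start, start + count))    # step 1: one contiguous extent
    else:
        free = [i for i, b in enumerate(busy) if not b]
        if len(free) < count:
            raise OSError("no space left on device")
        nearest = sorted(free, key=lambda i: abs(i - hint))[:count]
        chosen = sorted(nearest)                      # step 2: extents kept close together
    for i in chosen:
        busy[i] = True
    return to_extents(chosen)
```

When no single run is long enough, the fallback picks the free sectors nearest the hint, so the resulting extents stay physically close.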

Principles for storing file location information

Each file and directory on the disk has its own file node (F-node), a structure that contains information about the location of the file and its extended attributes.

Each F-node occupies one sector and is always located near its file or directory (usually immediately before it). An F-node contains:

· Length

· The first 15 characters of the file name,

· Special service information,

· Statistics on access to the file,

· Advanced file attributes,

· A list of access rights (or only part of this list, if it is very large); if the extended attributes are too large for the file node, then a pointer to them is written to it.

· Associated information about the location of the file and its place in the directory hierarchy, etc.

If the file is contiguous, its placement on the disk is described by two 32-bit numbers: the first points to the first block of the file, and the second is the length of the extent (the number of consecutive blocks belonging to the file).

If the file is fragmented, then the placement of its extents is described in the file node with additional pairs of 32-bit numbers.

An F-node can hold at most eight extents. If the file has more extents, its F-node holds a pointer to an allocation block, which can contain up to 40 pointers to extents or, as in a directory tree block, pointers to other allocation blocks.
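Translating a block number within a file into a disk block number then amounts to walking the list of (start, length) pairs. A sketch, assuming the extents are listed in file order (the real on-disk HPFS structures record more than this) and using hypothetical names:

```python
from dataclasses import dataclass

MAX_FNODE_EXTENTS = 8   # more extents than this require an allocation block


@dataclass
class Extent:
    start: int    # first disk block of the extent (a 32-bit number in HPFS)
    length: int   # number of consecutive blocks in the extent (also 32-bit)


def disk_block(extents, file_block):
    """Map a block number inside the file to a block number on the disk."""
    offset = file_block
    for ext in extents:
        if offset < ext.length:
            return ext.start + offset
        offset -= ext.length
    raise IndexError("block lies beyond the end of the file")


def needs_allocation_block(extents):
    """True if the extents no longer fit directly into the F-node."""
    return len(extents) > MAX_FNODE_EXTENTS


fragmented = [Extent(1000, 4), Extent(2000, 2)]
print([disk_block(fragmented, b) for b in range(6)])  # [1000, 1001, 1002, 1003, 2000, 2001]
```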

Directory structure and placement

Directories are stored in a band located in the center of the disk.

This band is called the directory band (DirBand).

When it is full, HPFS begins placing directories in other bands.

The location of this information structure in the middle of the disk significantly reduces the average positioning time of the read / write heads.

However, a much larger contribution to HPFS performance than placing the directory band in the middle of the logical drive comes from using balanced binary trees to store and search for file location information.

Recall that in the FAT file system a directory has a linear structure with no particular ordering, so searching for a file requires scanning the directory sequentially from the very beginning.

In HPFS, a directory is a balanced tree with entries in alphabetical order.

Each entry in the tree contains

· File attributes,

· Pointer to the corresponding file node,

· Information about the time and date of creation of the file, the time and date of the last update and access,

· The length of the data containing the extended attributes,

· File access counter,

· File name length

· The name itself

· And other information.

When searching for a file in a directory, HPFS scans only the relevant branches of the binary tree. This is many times more efficient than sequentially reading all the directory entries, as the FAT system does.
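The difference between a sequential scan and a search over alphabetically ordered entries can be shown by counting comparisons. In this sketch a sorted Python list searched by bisection stands in for the balanced tree (both need a logarithmic number of comparisons); the on-disk tree of 2 KB directory blocks is not modelled, and all names are hypothetical.

```python
def linear_lookup(entries, name):
    """FAT-style search: scan an unordered directory from the start.
    Returns (found, number_of_comparisons)."""
    for comparisons, entry in enumerate(entries, start=1):
        if entry == name:
            return True, comparisons
    return False, len(entries)


def sorted_lookup(sorted_entries, name):
    """Bisection over alphabetically ordered entries (a stand-in for the balanced tree).
    Returns (found, number_of_comparisons)."""
    lo, hi, comparisons = 0, len(sorted_entries), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_entries[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_entries) and sorted_entries[lo] == name
    return found, comparisons


names = [f"FILE{i:05d}.TXT" for i in range(10_000)]   # already in alphabetical order
print(linear_lookup(names, "FILE09999.TXT"))          # (True, 10000)
print(sorted_lookup(names, "FILE09999.TXT"))
```

For 10,000 entries the ordered search needs at most 14 comparisons, while the sequential scan may need all 10,000.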

In the current HPFS implementation, directories are allocated in blocks of 2 KB. The size of an entry describing a file depends on the length of the file name: if the name occupies 13 bytes (8.3 format), a 2 KB block holds up to 40 file entries. The blocks are linked to each other through a list.

Problems

Renaming a file may trigger a so-called rebalancing of the tree. Creating, renaming, or deleting a file can cause a cascade of directory block splits, and this operation may require allocating additional blocks on a full disk. As a result, a rename can fail for lack of disk space even though the file itself has not grown. To avoid this “disaster”, HPFS maintains a small pool of free blocks for use in such emergencies. A pointer to this pool of free blocks is stored in the SpareBlock.

Principles of placing files and directories on disk in HPFS:

· Information about the location of files is distributed across the disk, and the data of each file is placed (where possible) in adjacent sectors close to the information about its location;

· Directories are located in the middle of disk space;

· Directories are stored in a balanced binary tree with entries arranged in alphabetical order.

HPFS Storage Reliability

Any file system should have means of correcting errors that occur when information is written to disk. HPFS uses an emergency replacement mechanism (hotfix).

If HPFS encounters a problem while writing data to disk, it displays a corresponding error message. HPFS then stores the data that was supposed to be written to the defective sector in one of the spare sectors reserved in advance for this purpose. The list of spare blocks is stored in the HPFS spare block. When an error is detected while writing data to a normal block, HPFS selects one of the spare blocks and stores the data in it. The file system then updates the emergency replacement map (hotfix map) in the spare block.

This map is simply a set of pairs of double words, each of which is a 32-bit sector number.

The first number indicates the defective sector, and the second indicates the sector among the available spare sectors that was selected to replace it.

After the defective sector has been replaced with a spare, the hotfix map is written to disk and a pop-up window informs the user about the disk error. Every time the system writes or reads a disk sector, it consults the hotfix map and replaces the number of each defective sector with the number of the spare sector that holds its data.

It should be noted that this number conversion does not significantly affect system performance, since it is performed only when the disk is physically accessed, but not when reading data from the disk cache.
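The hotfix map can be modelled as a table of (defective sector, spare sector) pairs that is consulted on every physical access. In this sketch the class name `HotfixMap`, the little-endian byte order used by `to_bytes`, the example sector numbers, and taking spares in list order are all assumptions, not details from the text.

```python
import struct


class HotfixMap:
    """Emergency replacement (hotfix) map: defective sector -> spare sector."""

    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)   # spare sectors reserved in advance
        self.remap = {}                     # pairs of 32-bit sector numbers

    def replace(self, bad_sector):
        """Record a replacement after a write to bad_sector has failed."""
        if not self.spares:
            raise OSError("no spare sectors left")
        spare = self.spares.pop(0)
        self.remap[bad_sector] = spare
        return spare

    def translate(self, sector):
        """Applied on every physical read or write: substitute the spare if one exists."""
        return self.remap.get(sector, sector)

    def to_bytes(self):
        """Serialize the map as pairs of little-endian 32-bit sector numbers."""
        return b"".join(struct.pack("<II", bad, spare) for bad, spare in self.remap.items())


hotfix = HotfixMap(spare_sectors=[5000, 5001])
hotfix.replace(123)                                    # a write to sector 123 failed
print(hotfix.translate(123), hotfix.translate(124))    # 5000 124
```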

File system NTFS

The NTFS (New Technology File System) file system includes a number of significant improvements and changes that clearly set it apart from other file systems.

Note that, with rare exceptions, NTFS partitions can be used directly only from Windows NT, although implementations exist for some other operating systems that can read files from NTFS volumes.

However, there are no full-fledged implementations for working with NTFS outside of Windows NT yet.

NTFS is not supported on the widespread Windows 98 and Windows Millennium Edition operating systems.

Key features of NTFS

· Work on large disks is efficient (much more efficient than FAT);

· There are tools to restrict access to files and directories, so NTFS partitions provide local security for both files and directories;

· A transaction mechanism with logging of file operations has been introduced, which significantly increases reliability;

· Many restrictions on the maximum number of disk sectors and / or clusters have been removed;

· Unlike the FAT and HPFS file systems, NTFS file names can contain any characters, including the full sets of national alphabets, because names are stored in Unicode, a 16-bit representation that allows 65,536 different characters. The maximum length of a file name in NTFS is 255 characters.

· NTFS also has built-in compression tools that can be applied to individual files, entire directories and even volumes (and subsequently cancel or assign them as you wish).

NTFS File System Volume Structure

An NTFS partition is called a volume. The maximum possible volume size (and file size) is 16 EB (2^64 bytes).

Like other systems, NTFS divides the disk space of a volume into clusters, blocks of data addressed as units. NTFS supports cluster sizes from 512 bytes to 64 KB; a cluster of 2 or 4 KB is considered standard.

All disk space in NTFS is divided into two unequal parts.

The first 12% of the disk is allocated to the so-called MFT zone, the space into which the main service metafile, the MFT, can grow.

No other data can be written to this area. The MFT zone is always kept empty so that the MFT file fragments as little as possible as it grows.

The remaining 88% of the volume is the usual file storage space.

The MFT (Master File Table) is in effect a directory of all other files on the disk, including itself. It is used to determine the location of files.

The MFT consists of fixed-size records. The size of an MFT record (minimum 1 KB, maximum 4 KB) is set when the volume is formatted.

Each entry corresponds to a file.

The first 16 entries are reserved for system use and are not accessible to the operating system; they are called metafiles, and the very first metafile is the MFT itself.

These first 16 MFT entries are the only part of the disk with a strictly fixed position. For reliability, a copy of these 16 entries is stored in the middle of the volume.

The rest of the MFT file can be located, like any other file, in arbitrary places on the disk.

Metafiles are service files: each of them is responsible for some aspect of the system. Metafiles are located in the root directory of an NTFS volume. Their names all begin with "$", although it is difficult to obtain any information about them using standard tools. The main metafiles and their purposes are shown in the table.

Metafile name	Purpose
$MFT	The Master File Table itself
$MFTMirr	A copy of the first 16 MFT entries, placed in the middle of the volume
$LogFile	Logging support file
$Volume	Service information: volume label, file system version, etc.
$AttrDef	List of standard file attributes on the volume
	Root directory
$Bitmap	Volume free space map
$Boot	Boot sector (if the partition is bootable)
$Quota	File recording users' rights to use disk space (this file only started working in Windows 2000 with NTFS 5.0)
$UpCase	Table of correspondence between uppercase and lowercase letters in file names. In NTFS, file names are stored in Unicode (about 65 thousand different characters), so finding uppercase and lowercase equivalents is a non-trivial task

The corresponding MFT record stores all information about the file:

· file name,

· the size;

· File attributes;

· The position on the disk of individual fragments, etc.

If one MFT record is not enough to hold the information, several records are used, and they need not be consecutive.

If the file is small enough, its data is stored directly in the MFT, in the space left over after the main information in the same MFT record.

A file on an NTFS volume is identified by a so-called file reference (File Reference), a 64-bit number made up of:

· a file number, which corresponds to the record number in the MFT, and

· a sequence number, which is incremented each time that MFT record is reused, allowing NTFS to perform internal integrity checks.
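The text does not say how the 64 bits are divided. In the NTFS on-disk format as commonly documented, the low 48 bits hold the MFT record number and the high 16 bits the sequence number; the sketch below assumes that split, and its function names are hypothetical.

```python
RECORD_BITS = 48                      # low 48 bits: MFT record number; high 16: sequence number
RECORD_MASK = (1 << RECORD_BITS) - 1


def make_file_reference(record_number: int, sequence_number: int) -> int:
    """Pack an MFT record number and a sequence number into one 64-bit value."""
    if not (0 <= record_number <= RECORD_MASK and 0 <= sequence_number < 1 << 16):
        raise ValueError("field out of range")
    return (sequence_number << RECORD_BITS) | record_number


def split_file_reference(ref: int):
    """Return (record_number, sequence_number)."""
    return ref & RECORD_MASK, ref >> RECORD_BITS


def is_stale(ref: int, current_sequence: int) -> bool:
    """A stored reference is stale if its MFT record has since been reused."""
    return split_file_reference(ref)[1] != current_sequence


ref = make_file_reference(42, 3)
print(hex(ref), split_file_reference(ref))   # 0x300000000002a (42, 3)
```

The staleness check is what the sequence number buys: a reference saved before the record was reused no longer matches the record's current sequence number.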

Each file in NTFS is represented by streams: it does not have “just data” as such, it has streams.

One of the streams is the file data.

Most file attributes are streams too.

Thus the only essential part of a file is its number in the MFT; everything else, including its streams, is optional.

This approach can be put to use: for example, you can attach an additional stream to a file and write arbitrary data to it.

The standard attributes for files and directories in an NTFS volume have fixed names and type codes.

An NTFS directory is a special file that stores references to other files and directories.

The directory file is divided into blocks, each of which contains:

· the file name,

· basic attributes, and

· a reference to the file's MFT record.

The root directory of the disk is no different from ordinary directories, except for a special link to it from the beginning of the MFT metafile.

The internal directory structure is a binary tree, as in HPFS.

The number of files in the root and non-root directories is unlimited.

The NTFS file system supports the NT security object model: NTFS treats directories and files as different kinds of objects and maintains separate (though overlapping) lists of permissions for each type.

NTFS provides file-level security: access rights to volumes, directories, and files can depend on the user account and the groups to which it belongs. Each time a user accesses a file system object, the user's access rights are checked against the object's list of permissions. If the user has sufficient rights, the request is granted; otherwise, it is rejected. This security model applies both to users logged on locally to NT computers and to remote network requests.

NTFS also has certain self-healing options. NTFS supports various mechanisms for checking the integrity of the system, including transaction logging, which allows you to play back file operations using a special system log.

When file operations are journaled, the file management system records changes in a special service file. At the start of an operation that changes the file structure, a corresponding record is made. If a failure occurs during the operation, this start record remains marked as incomplete. When the file system integrity check runs after the machine reboots, these incomplete operations are rolled back and the files are restored to their original state. If the operation completes normally, it is marked as completed in the same logging file.
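The undo side of this mechanism can be sketched with an in-memory log. This is a toy model under stated assumptions: a dictionary stands in for the volume, the record layout and all names are hypothetical, and real NTFS recovery also redoes committed work, which the text does not describe and the sketch omits.

```python
class Journal:
    """A minimal undo log: a 'begin' record with the old value, then a 'commit' record."""

    def __init__(self):
        self.records = []   # stands in for the special logging service file

    def begin(self, op_id, key, old_value):
        self.records.append(("begin", op_id, key, old_value))

    def commit(self, op_id):
        self.records.append(("commit", op_id))


def update(journal, volume, op_id, key, new_value, crash=False):
    """Change volume[key], logging the old value first; crash=True simulates a failure."""
    journal.begin(op_id, key, volume.get(key))
    volume[key] = new_value
    if crash:
        return                       # failure before the operation is marked complete
    journal.commit(op_id)


def recover(journal, volume):
    """After a reboot, roll back every operation that was started but never committed."""
    committed = {rec[1] for rec in journal.records if rec[0] == "commit"}
    for rec in reversed(journal.records):
        if rec[0] == "begin" and rec[1] not in committed:
            _, _, key, old_value = rec
            if old_value is None:
                volume.pop(key, None)    # the object did not exist before the operation
            else:
                volume[key] = old_value  # restore the original state


volume = {"dir/a": 1}
log = Journal()
update(log, volume, 1, "dir/a", 2)               # completes normally
update(log, volume, 2, "dir/a", 3, crash=True)   # interrupted
recover(log, volume)
print(volume)   # {'dir/a': 2}
```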

The main drawback of NTFS is that service data takes up a lot of space (for example, each directory entry takes 2 KB); on small partitions, service data can occupy up to 25% of the medium.

As a result, NTFS cannot be used to format floppy disks, and it should not be used for partitions smaller than 100 MB.

Unix OS file systems

In the UNIX world, there are several different types of file systems with their own external memory structure. The most famous are the traditional UNIX System V (s5) file system and the UNIX BSD (ufs) file system family.

Let us consider s5.

A UNIX file is a set of characters with random access.

The file has the structure that the user imposes on it.

The Unix file system is a hierarchical, multi-user file system.

The file system has a tree structure. The vertices (intermediate nodes) of the tree are directories with links to other directories or files. The leaves of the tree correspond to files or empty directories.

Note. Strictly speaking, the Unix file system is not a tree: the tree hierarchy can be broken, because several names can be associated with the same file contents.

Disk structure

The disk is divided into blocks. The size of the data block is determined when formatting the file system with the mkfs command and can be set to 512, 1024, 2048, 4096 or 8192 bytes.

Below we assume a block size of 512 bytes (the sector size).

Disk space is divided into the following areas (see Fig.):

· Boot block;

· Superblock (control information);

· Array of i-nodes;

· Area for storing the contents (data) of files;

· Set of free blocks (linked to a list);

Fig.: boot block | super block | i-node | ... | i-node

Note. In the UFS file system, all of this is repeated for each cylinder group (except the boot block), and a special area is allocated to describe each cylinder group.

Boot block

The boot block is located in block 0. (Recall that its placement in block 0 of the system device is dictated by the hardware, since the hardware loader always reads block 0 of the system device. This is the last remaining hardware-dependent part of the file system.)

The boot block contains a bootstrap program used to start UNIX initially. In s5 file systems, only the boot block of the root file system is actually used; in other file systems this area is present but unused.

Super block

It contains operational information about the state of the file system, as well as data on the file system settings.

In particular, the superblock contains the following information

· The number of i-nodes (inodes);

· File system (partition) size;

· List of free blocks;

· List of free i-nodes;

· and other.

Note that free disk space forms a linked list of free blocks. The beginning of this list is stored in the superblock.

The elements of the list are arrays of 50 elements (if a block is 512 bytes, an element is 16 bits):

· Elements 1 through 48 hold the numbers of free blocks in the file data area;

· Element 0 holds a pointer to the continuation of the list;

· The last element (No. 49) holds a pointer to the next free element in the array.

If some process requires a free block to expand the file, then the system selects an array element by pointer (to a free element), and the block with the number stored in this element is provided to the file. If the file is shortened, then the released numbers are added to the array of free blocks and the pointer to the free element is adjusted.

Since the size of the array is 50 elements, two critical situations are possible:

1. When we free blocks of files, and they cannot fit in this array. In this case, one free block is selected from the file system and the completely filled array of free blocks is copied to this block, after which the value of the pointer to the free element is reset, and in the zero element of the array, which is in the superblock, the number of the block that the system selected to copy the contents of the array is written. At this moment, a new element of the list of free blocks is created (each with 50 elements).

2. When the free block numbers in the array are exhausted. In this case the system checks element 0 of the array: if it is not zero, the list has a continuation, which is read into the copy of the superblock in RAM; if it is zero, there are no free blocks left.
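Both critical situations can be seen in a small simulation. The sketch follows the classic UNIX scheme on which s5 is based rather than the exact layout described above: the pointer to the next free element is kept in a separate counter `nfree` instead of in element 49, so every slot after element 0 holds a block number. The simulated disk (a dictionary) and the class name are assumptions.

```python
NICFREE = 50   # size of the free-block array kept in the superblock


class S5FreeList:
    """Chained list of free blocks, with the head array held in the in-memory superblock."""

    def __init__(self):
        self.disk = {}               # simulated disk: block number -> saved (count, array)
        self.nfree = 1               # number of occupied slots in self.free
        self.free = [0] * NICFREE    # free[0] links to the next chunk; 0 marks the end

    def release(self, blkno):
        """Return block `blkno` to the free list (e.g. when a file is shortened)."""
        if self.nfree == NICFREE:
            # Situation 1, array full: copy it into the freed block, which becomes the link.
            self.disk[blkno] = (self.nfree, list(self.free))
            self.nfree = 0
        self.free[self.nfree] = blkno
        self.nfree += 1

    def allocate(self):
        """Take a free block for a growing file; return None if the disk is full."""
        self.nfree -= 1
        blkno = self.free[self.nfree]
        if blkno == 0:               # reached the end-of-list marker: no free blocks
            self.nfree = 1
            return None
        if self.nfree == 0:
            # Situation 2, array exhausted: free[0] is the link block; load the next chunk.
            self.nfree, saved = self.disk.pop(blkno)
            self.free = list(saved)
        return blkno


fl = S5FreeList()
for b in range(100, 200):      # 100 free blocks: more than fit in one array
    fl.release(b)
taken = []
while (b := fl.allocate()) is not None:
    taken.append(b)
print(len(taken), sorted(taken) == list(range(100, 200)))   # 100 True
```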

List of free i-nodes. This is a buffer of 100 elements holding the numbers of up to 100 i-nodes that are currently free.

The superblock is always kept in RAM, so all operations (releasing and allocating blocks and i-nodes) take place in RAM, which minimizes disk exchanges.

However, if the contents of the superblock have not been written to disk when the power is turned off, problems arise: the real state of the file system no longer matches the contents of the superblock. This places demands on the reliability of the system hardware.

Note. In UFS file systems, multiple copies of the superblock are kept to increase robustness (one copy per cylinder group).

I-node area

This is an array of file descriptors called i-nodes. Each i-node occupies 64 bytes.

Each index descriptor (i-node) of a file contains:

· File type (file / directory / special file / FIFO / socket)

· Attributes (access rights) - 10

· File Owner Identifier

· File Owner Group Identifier

· File creation time

· File modification time

· Last access time to file

· File length

· The number of links to this i-node from various directories

· File block addresses

Note: the i-node does not contain the file name.

Let us take a closer look at how the addressing of the blocks holding the file is organized. The address field holds the numbers of the first 10 blocks of the file.

If the file is larger than ten blocks, the following mechanism comes into play: the 11th element of the field holds the number of a block containing 128 (256) references to blocks of the file. If the file is larger still, the 12th element is used: it holds the number of a block containing 128 (256) block numbers, each of which refers to a block containing 128 (256) numbers of file blocks. If the file is even larger, the 13th element is used, which adds one more level of nesting.

Thus the maximum file size is $(10 + 128 + 128^2 + 128^3) \times 512$ bytes.

This can be represented as follows:

· Address of the 1st block of the file

· Address of the 2nd block of the file

· ...

· Address of the 10th block of the file

· Address of the single indirect block (a block with 256 block addresses)

· Address of the double indirect block (a block with 256 addresses of blocks of addresses)

· Address of the triple indirect block (a block with addresses of blocks with addresses of blocks of addresses)
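Under the text's assumption of 512-byte blocks with 128 addresses per block, the formula evaluates to 1,082,201,088 bytes (about 1 GB), and any logical block number can be mapped to the address slot and indirect-block indices that lead to it. A sketch with hypothetical function names:

```python
BLOCK_SIZE = 512
ADDRS_PER_BLOCK = 128   # 512-byte block / 4-byte block number (256 with 2-byte numbers)
NDIRECT = 10            # the first 10 address slots point straight at data blocks


def max_file_size(n=ADDRS_PER_BLOCK, block=BLOCK_SIZE):
    """(10 + n + n**2 + n**3) blocks of `block` bytes each."""
    return (NDIRECT + n + n**2 + n**3) * block


def locate(file_block, n=ADDRS_PER_BLOCK):
    """Return (address slot, indices inside the indirect blocks) for a logical block.

    Slots are numbered from 0, so slots 10, 11, 12 are the 11th, 12th and 13th elements.
    """
    if file_block < NDIRECT:
        return file_block, ()
    file_block -= NDIRECT
    for level, slot in ((1, 10), (2, 11), (3, 12)):
        span = n ** level                      # blocks reachable through this slot
        if file_block < span:
            path = tuple(file_block // n ** depth % n for depth in range(level - 1, -1, -1))
            return slot, path
        file_block -= span
    raise ValueError("block lies beyond the maximum file size")


print(max_file_size())                        # 1082201088
print(locate(9), locate(10), locate(138))     # (9, ()) (10, (0,)) (11, (0, 0))
```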

File protection

Now let's pay attention to the identifiers of the owner and group and the security bits.

UNIX has a three-level user hierarchy:

The first level is all users.

The second level is user groups (all users are divided into groups).

The third level is a specific user (Groups consist of real users). In connection with this three-level organization of users, each file has three attributes:

1) The owner of the file. This attribute is associated with one specific user, which is automatically assigned by the system as the owner of the file. You can become the owner by default by creating a file, and there is also a command that allows you to change the owner of the file.

2) File access protection. Access to each file is limited in three categories:

· Owner’s rights (what the owner can do with this file; in general, not necessarily everything);

· The rights of the group to which the file owner belongs. The owner is not included here (for example, the file may be closed for reading to the owner while all other members of the group are free to read it);

· All other users of the system;

Three actions are regulated in these three categories: reading from a file, writing to a file, and executing a file (in the mnemonics of the system R, W, X, respectively). Each file in these three categories defines which user can read, which user can write, and who can run it as a process.
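The check can be sketched with the permission constants from Python's `stat` module. Only one category applies to a given user, owner first, then group, then others, which is why the owner in the example from the text can be denied reading while the rest of the group may read. The sketch ignores the superuser, who bypasses these checks on real systems; the user and group IDs are illustrative.

```python
import stat

# Permission bits for each of the three categories.
BITS = {
    "owner":  {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group":  {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "others": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Check the requested actions ('r', 'w', 'x' in any combination).

    Exactly one category is selected: owner, else group, else others.
    """
    if uid == file_uid:
        category = "owner"
    elif file_gid in gids:
        category = "group"
    else:
        category = "others"
    return all(mode & BITS[category][action] for action in want)


mode = 0o040   # owner: ---, group: r--, others: ---
print(may_access(mode, 1000, 100, 1000, {100}, "r"))   # False: the owner
print(may_access(mode, 1000, 100, 1001, {100}, "r"))   # True: another group member
```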

Directory Organization

A directory from the point of view of the OS is a regular file that contains data on all files that belong to the directory.

A directory entry consists of two fields:

1) the number of the i-node (serial number in the array of i-nodes) and

2) the file name.

Each directory contains two special names: ‘.’ - the directory itself; ‘..’ is the parent directory.

(For the root directory, the parent refers to it itself.)

A directory may contain several entries that refer to the same i-node, but it may not contain two entries with the same name. Thus an arbitrary number of names can be associated with the contents of a file. This is called linking, and a directory entry that refers to a file is called a link.

Files exist independently of directory entries, and links in directories really point to physical files. The file “disappears” when the last link pointing to it is deleted.

So, to access a file by name, the operating system:

1. Finds this name in the directory containing the file,

2. receives the number of the i-node of the file,

3. by the number finds the i-node in the region of i-nodes,

4. from the i-node receives the addresses of the blocks in which the file data is located,

5. at block addresses reads blocks from the data area.
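The five steps can be traced on a toy in-memory file system. Everything here is hypothetical: the dictionaries stand in for the i-node area and the data area, directory blocks are lists of (i-node number, name) pairs as described above, and using i-node 2 for the root directory is an assumption. The root's ".." entry points back to the root itself, and one file has two names (two links to the same i-node).

```python
# Hypothetical in-memory file system: i-node number -> i-node, block number -> contents.
inodes = {
    2: {"type": "dir", "blocks": [10]},          # root directory
    5: {"type": "dir", "blocks": [11]},          # /etc
    7: {"type": "file", "blocks": [20, 21]},     # one file with two names
}
blocks = {
    10: [(2, "."), (2, ".."), (5, "etc")],       # directory blocks: (i-node number, name)
    11: [(5, "."), (2, ".."), (7, "motd"), (7, "motd.link")],
    20: b"Hello, ",
    21: b"UNIX\n",
}


def lookup(path, root_ino=2):
    """Resolve a path name to an i-node number (steps 1 and 2, repeated per component)."""
    ino = root_ino
    for name in (part for part in path.split("/") if part):
        directory = inodes[ino]
        if directory["type"] != "dir":
            raise NotADirectoryError(name)
        entries = [e for b in directory["blocks"] for e in blocks[b]]
        for entry_ino, entry_name in entries:   # 1. find the name in the directory
            if entry_name == name:
                ino = entry_ino                 # 2. take the i-node number
                break
        else:
            raise FileNotFoundError(name)
    return ino


def read_file(path):
    inode = inodes[lookup(path)]                          # 3. find the i-node by number
    return b"".join(blocks[b] for b in inode["blocks"])   # 4-5. read blocks at its addresses


print(read_file("/etc/motd"))                            # b'Hello, UNIX\n'
print(lookup("/etc/motd.link") == lookup("/etc/motd"))   # True
```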

Disk partition structure in EXT2 FS

The entire partition space is divided into blocks. A block can have a size of 1, 2 or 4 kilobytes. A block is an addressable unit of disk space.

Blocks are combined into block groups. The block groups in the file system and the blocks within each group are numbered sequentially starting from 1. The first block on the disk has number 1 and belongs to group 1. The total number of blocks on the disk (in the disk partition) is a divisor of the disk size expressed in sectors. The number of blocks, however, does not have to be a multiple of the group size, because the last block group may be incomplete. The start of each block group has an address that can be computed as ((group number - 1) * (number of blocks in a group)).

Each group of blocks has the same structure. Its structure is presented in the table.

The first element of this structure (the superblock) is the same for all groups, and all the others are specific to each group. A copy of the superblock is stored in the first block of each block group (except group 1, whose first block holds the boot record). The superblock is the starting point of the file system. It is 1024 bytes in size and is always located at an offset of 1024 bytes from the beginning of the file system. Several copies of the superblock are kept because this element of the file system is extremely important: the duplicates are used to restore the file system after a crash.

The information stored in the superblock is used to organize access to the rest of the data on the disk. The superblock determines the size of the file system, the maximum number of files in the partition, and the amount of free space, and it records where to look for unallocated areas. When the OS starts, the superblock is read into memory; all changes to the file system are first made in this in-memory copy and written to disk only periodically. This improves performance, since many users and processes are constantly updating files. On the other hand, the superblock must be written to disk when the system shuts down, so the computer must not be switched off simply by cutting the power. Otherwise, at the next boot, the information in the superblock will not match the real state of the file system.
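Reading the superblock from a raw partition image shows where this information lives. The field offsets and the 0xEF53 signature used below come from the ext2 on-disk format as documented for Linux, not from the text, so treat them as assumptions of this sketch; the synthetic image and its values are illustrative only. Checking the signature is also a simple way to tell whether a partition holds ext2.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the superblock always starts 1024 bytes into the file system
SUPERBLOCK_SIZE = 1024
EXT2_SUPER_MAGIC = 0xEF53  # signature stored at offset 56 of the superblock


def read_ext2_superblock(image: bytes) -> dict:
    """Decode a few superblock fields from a raw ext2 image (little-endian on disk)."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
    if len(sb) < SUPERBLOCK_SIZE:
        raise ValueError("image too small to hold a superblock")
    (inodes_count, blocks_count, _reserved_blocks, free_blocks,
     free_inodes, first_data_block, log_block_size) = struct.unpack_from("<7I", sb, 0)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 32)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 40)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError("no ext2 signature found")
    return {
        "inodes_count": inodes_count,
        "blocks_count": blocks_count,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "block_size": 1024 << log_block_size,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        # the last group may be incomplete, hence the rounding up
        "group_count": -(-(blocks_count - first_data_block) // blocks_per_group),
    }


# A synthetic 2 KB image containing only a superblock, for demonstration.
image = bytearray(2048)
struct.pack_into("<7I", image, 1024, 128, 1024, 51, 900, 100, 1, 0)
struct.pack_into("<I", image, 1024 + 32, 8192)   # blocks per group
struct.pack_into("<I", image, 1024 + 40, 128)    # inodes per group
struct.pack_into("<H", image, 1024 + 56, EXT2_SUPER_MAGIC)
info = read_ext2_superblock(bytes(image))
print(info["block_size"], info["group_count"])   # 1024 1
```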

The superblock is followed by the block group descriptors (Group Descriptors). A descriptor contains:

The address of the block containing the block bitmap of this group;

Address of the block containing the inode bitmap of this group;

The address of the block containing the inode table of this group;

The counter of the number of free blocks in this group;

The number of free inodes in this group;

The number of inodes in this group that are directories

and other data.

The information that is stored in the group description is used to find the bitmaps of blocks and inodes, as well as the inode table.
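The descriptor fields listed above can be decoded from the descriptor table. The 32-byte record layout used here (three 32-bit block numbers, three 16-bit counters, then padding and reserved bytes) is taken from the ext2 on-disk format rather than from the text, so it is an assumption of this sketch, as are the function name and sample values.

```python
import struct

GROUP_DESC_FORMAT = "<3I3H"   # block bitmap, inode bitmap, inode table, three counters
GROUP_DESC_SIZE = 32          # the remaining 14 bytes are padding and reserved space


def read_group_descriptors(table: bytes, group_count: int) -> list:
    """Decode `group_count` consecutive ext2 block group descriptors."""
    descriptors = []
    for g in range(group_count):
        (block_bitmap, inode_bitmap, inode_table,
         free_blocks, free_inodes, used_dirs) = struct.unpack_from(
            GROUP_DESC_FORMAT, table, g * GROUP_DESC_SIZE)
        descriptors.append({
            "block_bitmap": block_bitmap,    # block holding this group's block bitmap
            "inode_bitmap": inode_bitmap,    # block holding this group's inode bitmap
            "inode_table": inode_table,      # first block of this group's inode table
            "free_blocks": free_blocks,
            "free_inodes": free_inodes,
            "used_dirs": used_dirs,          # inodes in this group that are directories
        })
    return descriptors


table = bytearray(2 * GROUP_DESC_SIZE)
struct.pack_into(GROUP_DESC_FORMAT, table, 0, 3, 4, 5, 100, 50, 2)
struct.pack_into(GROUP_DESC_FORMAT, table, GROUP_DESC_SIZE, 8195, 8196, 8197, 200, 60, 1)
print(read_group_descriptors(bytes(table), 2)[1]["inode_table"])   # 8197
```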

The Ext2 file system is characterized by:

hierarchical structure
consistent processing of data arrays
dynamic file extension
protection of information in files,
treating peripheral devices (such as terminals and tape devices) as files.

Internal File Representation

Each file in Ext2 has a unique index (inode). The inode contains the information any process needs to access the file. Processes access files through a well-defined set of system calls and identify a file by a character string that serves as its path name. Each path name uniquely identifies a file, so the kernel converts the name into the file's inode. The inode includes a table of the disk addresses of the file's data. Since each disk block is addressed by its number, this table stores a set of disk block numbers. For greater flexibility, the kernel allocates space to a file one block at a time, which allows the file's data to be scattered throughout the file system. This layout, however, makes finding the data more complicated. The address table contains the list of block numbers holding the file's data.

File inodes

Each file on the disk has a corresponding inode, identified by its serial number, the inode number. This means that the number of files that can be created in the file system is limited by the number of inodes, which is either set explicitly when the file system is created or calculated from the physical size of the disk partition. Inodes exist on disk in static form, and the kernel reads them into memory before working with them.

The file inode contains the following information:

Type and access rights of the file.

File Owner Identifier (Owner Uid).

File size in bytes.

The time the file was last accessed (Access time).

File creation time.

The time the file was last modified.

File deletion time.

Group Identifier (GID).

Links count

The number of blocks occupied by the file.

Flags (File flags)

Reserved for OS

Pointers to the blocks in which the file data is written (an example of direct and indirect addressing in Fig. 1)

File Version (for NFS)

File ACL

Directory ACL

Fragment address

Fragment number

Fragment size

Directories

Directories are files.

The kernel stores a directory's data the same way it does for a regular file, using an index structure and blocks with direct and indirect addressing levels. Processes can read directories the same way they read ordinary files, but the kernel reserves the exclusive right to write to a directory, which ensures that the directory structure stays correct.

When a process uses a file path, the kernel searches the directories for the corresponding inode number. After the file name has been converted into an inode number, the inode is loaded into memory and used for subsequent requests.
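The name-to-inode translation can be sketched as a loop over path components. The directory contents below are a hypothetical in-memory stand-in for directory blocks, and `lookup_path` is an illustrative name, not a kernel function; the small cache keeps already translated names in memory so repeated lookups skip the directory search.

```python
ROOT_INO = 2  # ext2 reserves inode 2 for the root directory

# Hypothetical directory contents: directory inode -> {name: inode}
DIRECTORIES = {
    2: {".": 2, "..": 2, "home": 11},
    11: {".": 11, "..": 2, "alice": 12},
    12: {".": 12, "..": 11, "notes.txt": 13},
}
_cache: dict[tuple[int, str], int] = {}  # (directory inode, name) -> inode


def lookup_path(path: str, cwd: int = ROOT_INO) -> int:
    """Translate a path name into an inode number, one component at a time."""
    ino = ROOT_INO if path.startswith("/") else cwd
    for name in (part for part in path.split("/") if part):
        key = (ino, name)
        if key not in _cache:
            entries = DIRECTORIES.get(ino)
            if entries is None:
                raise NotADirectoryError(path)  # a non-directory in the middle of the path
            if name not in entries:
                raise FileNotFoundError(path)
            _cache[key] = entries[name]
        ino = _cache[key]
    return ino


print(lookup_path("/home/alice/notes.txt"))  # 13
```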

Additional features of EXT2 FS

In addition to the standard Unix features, EXT2fs provides some additional features that Unix file systems usually do not support.

File attributes let you change how the kernel behaves when working with sets of files. Attributes can be set on a file or on a directory; in the latter case, files created in that directory inherit them.

Some features related to file attributes can be selected when the file system is mounted: mount options let the administrator choose how files are created. On a file system with BSD semantics, files are created with the same group identifier as their parent directory. System V semantics are somewhat more complicated. If the setgid bit is set on the directory, newly created files inherit the directory's group identifier, and subdirectories inherit both the group identifier and the setgid bit. Otherwise, files and directories are created with the primary group identifier of the calling process.
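The two group-ownership rules can be summarized as a decision function. This is a sketch of the semantics described above, not kernel code; the function and parameter names are hypothetical, and `bsd_groups` stands for the mount option (on Linux, ext2 spells these options `grpid`/`bsdgroups` and `nogrpid`/`sysvgroups`).

```python
from typing import NamedTuple


class NewOwnership(NamedTuple):
    gid: int
    setgid: bool  # whether the new object receives the setgid bit from its parent


def new_object_group(parent_gid: int, parent_setgid: bool, process_gid: int,
                     is_dir: bool, bsd_groups: bool) -> NewOwnership:
    """Pick the group of a newly created file or directory."""
    if bsd_groups:
        # BSD semantics: always take the parent directory's group;
        # this rule does not propagate the setgid bit.
        return NewOwnership(parent_gid, False)
    if parent_setgid:
        # System V semantics, setgid parent: inherit the group, and
        # subdirectories also inherit the setgid bit itself.
        return NewOwnership(parent_gid, is_dir)
    # System V semantics otherwise: the creating process's primary group.
    return NewOwnership(process_gid, False)


print(new_object_group(100, True, 500, is_dir=True, bsd_groups=False))
```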

EXT2fs can perform synchronous updates, similar to BSD. A mount option lets the administrator specify that all metadata (inodes, bitmap blocks, indirect blocks and directory blocks) is written to disk synchronously whenever it is modified. This gives a high level of safety for the written information but also leads to poor performance. In practice this option is rarely used: besides the performance cost, it can still lead to the loss of user data, and such losses are not flagged when the file system is checked.

EXT2fs lets you choose the logical block size when the file system is created: 1024, 2048 or 4096 bytes. Larger blocks speed up I/O operations (because fewer disk requests are needed) and therefore reduce head movement. On the other hand, larger blocks waste disk space: the last block of a file is usually only partly filled, so the larger the block, the more space is lost.
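The trade-off is simple arithmetic. A minimal sketch for a single 5000-byte file (the file size is an arbitrary example, and the function name is hypothetical); it ignores the extra indirect blocks a large file would also need.

```python
def block_usage(file_size: int, block_size: int) -> tuple[int, int]:
    """Return (blocks needed, bytes wasted in the last block) for one file."""
    blocks = -(-file_size // block_size)  # ceiling division without floats
    return blocks, blocks * block_size - file_size


for bs in (1024, 2048, 4096):
    blocks, wasted = block_usage(5000, bs)
    print(f"{bs:>5}-byte blocks: {blocks} blocks, {wasted} bytes wasted")
```

The output shows 5 blocks and 120 wasted bytes with 1 KB blocks versus 2 blocks and 3192 wasted bytes with 4 KB blocks: fewer requests, more lost space.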

EXT2fs supports fast symbolic links, which use no data blocks: the name of the target file is stored in the inode itself rather than in a data block. This saves disk space and speeds up the processing of symbolic links. The space available in the inode is limited, of course, so not every link can be stored as a fast link: the target name must fit into the 60 bytes the inode normally uses for block pointers, which limits it to about 60 characters. The designers also planned to extend this scheme to small files.
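Whether a link qualifies can be checked against the size of the pointer area (15 pointers of 4 bytes). The sketch assumes, as in the Linux ext2 driver, that one byte is kept for the terminating NUL, so targets of up to 59 bytes are stored in the inode; the function name is hypothetical.

```python
INODE_POINTER_AREA = 15 * 4  # 15 block pointers of 4 bytes = 60 bytes in the inode


def is_fast_symlink(target: str) -> bool:
    """True if the link target fits inside the inode instead of a data block.

    Assumes one byte of the 60 is reserved for the terminating NUL.
    """
    return len(target.encode()) + 1 <= INODE_POINTER_AREA


print(is_fast_symlink("/usr/lib/libc.so.6"))  # True: stored in the inode
print(is_fast_symlink("/srv/" + "x" * 80))    # False: needs a data block
```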

EXT2fs keeps track of the file system state. The kernel uses a separate field in the superblock to record it. When the file system is mounted read/write, its state is set to "Not Clean"; when it is unmounted or remounted read-only, the state is set to "Clean". During boot, this field is used to decide whether the file system needs to be checked. The kernel also records some errors in this field: when it detects an inconsistency, it marks the file system as "Erroneous", and the file system checker then checks it even if its state is otherwise "Clean".
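The state field behaves like a pair of bit flags. The sketch assumes the values used in the ext2 headers (`EXT2_VALID_FS` = 0x0001 for "Clean", `EXT2_ERROR_FS` = 0x0002 for "Erroneous"); the class and function names are hypothetical, and the real kernel logic is more involved.

```python
EXT2_VALID_FS = 0x0001  # cleanly unmounted ("Clean")
EXT2_ERROR_FS = 0x0002  # errors detected by the kernel ("Erroneous")


class SuperblockState:
    """Hypothetical model of how the kernel updates the state field."""

    def __init__(self) -> None:
        self.state = EXT2_VALID_FS

    def mount_read_write(self) -> None:
        self.state &= ~EXT2_VALID_FS  # "Not Clean" while mounted read/write

    def unmount(self) -> None:
        self.state |= EXT2_VALID_FS   # back to "Clean"; an error flag stays set

    remount_read_only = unmount       # same effect on the state field

    def report_error(self) -> None:
        self.state |= EXT2_ERROR_FS


def check_needed(state: int) -> bool:
    """Check at boot if the fs was not cleanly unmounted or has errors recorded."""
    return not state & EXT2_VALID_FS or bool(state & EXT2_ERROR_FS)


sb = SuperblockState()
sb.mount_read_write()
print(check_needed(sb.state))  # True: crashed while mounted read/write
```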

Going without checks for a long time can cause problems, so EXT2fs provides two ways to force regular checks. The superblock holds a mount counter that is incremented each time the file system is mounted read/write. When the counter reaches its maximum (also stored in the superblock), the checker checks the file system even if its state is "Clean". The time of the last check and the maximum interval between checks are stored in the superblock as well; once that interval has elapsed, the check is started regardless of the file system state.
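The two periodic-check rules can be written as one predicate over the superblock values (in ext2 these are the fields `s_mnt_count`, `s_max_mnt_count`, `s_lastcheck` and `s_checkinterval`). The function name is hypothetical, and treating a non-positive maximum count or a zero interval as "disabled" is an assumption of this sketch.

```python
import time
from typing import Optional


def periodic_check_due(mount_count: int, max_mount_count: int,
                       last_check: float, check_interval: float,
                       now: Optional[float] = None) -> bool:
    """Decide whether to check the fs even though it is marked clean."""
    now = time.time() if now is None else now
    if max_mount_count > 0 and mount_count >= max_mount_count:
        return True  # mounted too many times since the last check
    if check_interval > 0 and now >= last_check + check_interval:
        return True  # too much time since the last check
    return False


DAY = 24 * 3600
print(periodic_check_due(5, 20, last_check=0, check_interval=180 * DAY, now=200 * DAY))  # True
```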

Performance optimization

EXT2fs includes many optimizations that increase the speed of reading and writing files.

EXT2fs makes heavy use of the buffer cache. When a block has to be read, the kernel issues an I/O request for several adjacent blocks (read-ahead), trying to ensure that the next block to be read is already in the cache. This pays off mainly when files are read sequentially.
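The effect of read-ahead on the number of disk requests can be shown with a toy cache. This is a hypothetical model, not the kernel's buffer cache: a Python list stands in for the device, and a cache miss fetches the requested block plus the following ones in a single request.

```python
class ReadAheadCache:
    """Toy buffer cache: a miss fetches the block plus the following ones."""

    def __init__(self, disk: list[bytes], window: int = 4) -> None:
        self.disk = disk        # stands in for the device, one item per block
        self.window = window    # how many blocks one disk request covers
        self.cache: dict[int, bytes] = {}
        self.disk_requests = 0

    def read_block(self, n: int) -> bytes:
        if n not in self.cache:
            self.disk_requests += 1
            for i in range(n, min(n + self.window, len(self.disk))):
                self.cache[i] = self.disk[i]
        return self.cache[n]


disk = [bytes([i]) * 1024 for i in range(16)]
cache = ReadAheadCache(disk, window=4)
data = b"".join(cache.read_block(i) for i in range(16))  # sequential read
print(cache.disk_requests)  # 4 requests instead of 16
```

With `window=1` the same sequential read would need 16 requests; for random access, read-ahead helps little.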

EXT2fs also contains many optimizations for data placement. Block groups are used to keep related inodes and data blocks together: the kernel always tries to place a file's data blocks in the same group as its inode. This reduces drive head movement when the inode and its data blocks are read.
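Because ext2 numbers inodes from 1, the group of inode n is (n - 1) divided by the number of inodes per group. The placement preference can then be sketched as "try the inode's group first". The cyclic fallback to the following groups is a simplification assumed for this sketch (the real allocator uses more elaborate heuristics), and the function names are hypothetical.

```python
def inode_group(ino: int, inodes_per_group: int) -> int:
    """Block group that holds inode `ino` (inodes are numbered from 1)."""
    return (ino - 1) // inodes_per_group


def pick_group_for_block(ino: int, inodes_per_group: int, free_blocks: list[int]) -> int:
    """Prefer the inode's own group; otherwise scan the following groups cyclically."""
    start = inode_group(ino, inodes_per_group)
    n = len(free_blocks)
    for step in range(n):
        g = (start + step) % n
        if free_blocks[g] > 0:
            return g
    raise OSError("no free blocks in any group")


# Inode 2049 lives in group 1; group 1 is full, so the data goes to group 2.
print(pick_group_for_block(2049, 2048, [10, 0, 7]))  # 2
```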

When a new block is allocated while writing to a file, EXT2fs preallocates up to 8 adjacent blocks. This keeps performance high on a heavily loaded system and lets files occupy adjacent blocks, which speeds up reading them later.
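The benefit of preallocation shows up when two files are written in alternation. The allocator below is a hypothetical model (a flat bitmap, first-fit search, names invented for the sketch): each fresh allocation reserves a run of up to 8 adjacent blocks for that file, and unused reserved blocks are returned on release.

```python
PREALLOC_BLOCKS = 8  # up to 8 adjacent blocks reserved per allocation


class PreallocatingAllocator:
    """Hypothetical block allocator that reserves a run of adjacent blocks per file."""

    def __init__(self, nblocks: int) -> None:
        self.used = [False] * nblocks
        self.reserved: dict[str, list[int]] = {}  # file -> preallocated, unused blocks

    def _first_free(self) -> int:
        for b, taken in enumerate(self.used):
            if not taken:
                return b
        raise OSError("file system full")

    def allocate(self, file_id: str) -> int:
        spare = self.reserved.get(file_id)
        if spare:
            return spare.pop(0)  # consume the file's own reservation first
        run = [self._first_free()]
        while (len(run) < PREALLOC_BLOCKS and run[-1] + 1 < len(self.used)
               and not self.used[run[-1] + 1]):
            run.append(run[-1] + 1)
        for b in run:
            self.used[b] = True  # the whole run is taken off the free map
        self.reserved[file_id] = run[1:]
        return run[0]

    def release(self, file_id: str) -> None:
        """Give unused preallocated blocks back, e.g. when the file is closed."""
        for b in self.reserved.pop(file_id, []):
            self.used[b] = False


alloc = PreallocatingAllocator(32)
# Two files written in alternation still get contiguous blocks.
order = [("A", alloc.allocate("A")), ("B", alloc.allocate("B")),
         ("A", alloc.allocate("A")), ("B", alloc.allocate("B"))]
print(order)  # [('A', 0), ('B', 8), ('A', 1), ('B', 9)]
```

Without preallocation the same writes would interleave as blocks 0, 1, 2, 3, fragmenting both files.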

A computer usually has several drives. Each drive is assigned a name consisting of a Latin letter followed by a colon, for example A:, B:, C:. By convention, A: and B: are floppy drives, while C:, D: and so on are hard drives, optical drives or electronic disks.

An electronic disk is a portion of RAM that appears to the user as an ordinary disk. Exchanging data with an electronic disk is much faster than with an electromechanical storage device, and since an electronic disk has no electromechanical parts, nothing wears out. However, its contents are lost when the power is turned off.

A physical magnetic disk can be divided into several logical disks, which appear on screen exactly like physical disks. A logical drive is a part of a physical hard disk that has its own name.

The disk that holds the operating system is called the system (or boot) disk. The boot disk is most often C:. When removing viruses or recovering from system crashes, the operating system is often loaded from a floppy disk; bootable optical discs are also available.

Before information can be recorded on a new magnetic disk, the disk must be formatted. Formatting is the preparation of a disk for recording information.

During formatting, service information (the disk layout) is written to the disk; it is later used when writing and reading data and for maintaining the correct disk rotation speed. Formatting also sets aside a system area that consists of three parts:

- boot sector;
- file allocation tables;
- root directory.

The boot sector (Boot Record) is located in logical sector 0 of every disk. It contains data about the disk format and a short program used during the operating system boot procedure.

The first sector of a hard disk holds the Master Boot Record (MBR). Its partition table marks which partition (logical drive) the operating system should be booted from.
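The classic MBR layout is fixed: four 16-byte partition entries start at offset 446, the sector ends with the signature bytes 0x55 0xAA at offset 510, and a status byte of 0x80 marks the active (bootable) partition. A minimal parsing sketch; the class and function names are hypothetical, and the sample sector is built in memory.

```python
import struct
from typing import NamedTuple


class Partition(NamedTuple):
    active: bool   # status byte 0x80 marks the partition to boot from
    type_id: int   # partition type byte, e.g. 0x07 (NTFS/exFAT) or 0x0C (FAT32 LBA)
    first_lba: int # first sector of the partition
    sectors: int   # partition length in sectors


def parse_mbr(sector: bytes) -> list[Partition]:
    """Decode the four primary entries of a classic MBR partition table."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    parts = []
    for i in range(4):
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        status, type_id = entry[0], entry[4]
        first_lba, sectors = struct.unpack_from("<II", entry, 8)
        if type_id != 0:  # type 0 marks an unused slot
            parts.append(Partition(status == 0x80, type_id, first_lba, sectors))
    return parts


mbr = bytearray(512)
mbr[446:462] = bytes([0x80, 0, 0, 0, 0x07, 0, 0, 0]) + struct.pack("<II", 2048, 1_000_000)
mbr[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(mbr)))
```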

The file allocation table (File Allocation Table, FAT) follows the boot sector and describes where all the files on the drive are located, as well as which parts of the drive are defective. The FAT is followed by an exact copy, which increases the reliability of storing this critical table.
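Each FAT entry holds the number of the next cluster of the same file, so a file is a chain that starts at the first cluster recorded in its directory entry. The sketch assumes FAT16 conventions (entries 0 and 1 reserved, 0x0000 free, 0xFFF7 bad cluster, 0xFFF8 to 0xFFFF end of chain); the function name is hypothetical.

```python
FAT16_BAD = 0xFFF7
FAT16_EOC_MIN = 0xFFF8  # values 0xFFF8-0xFFFF end a cluster chain


def cluster_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Follow a file's clusters through a FAT16 table, starting at its first cluster."""
    chain, seen = [], set()
    cluster = first_cluster
    while cluster < FAT16_EOC_MIN:
        if cluster < 2 or cluster == FAT16_BAD or cluster in seen:
            raise ValueError(f"corrupt chain at cluster {cluster}")
        chain.append(cluster)
        seen.add(cluster)
        cluster = fat[cluster]
    return chain


# Entries 0 and 1 are reserved; this file occupies clusters 2 -> 3 -> 5.
fat = [0xFFF8, 0xFFFF, 3, 5, 0x0000, 0xFFFF]
print(cluster_chain(fat, 2))  # [2, 3, 5]
```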

The root directory (Root Directory) always follows the FAT copy. It contains the list of files and directories at the top level of the disk. The data area begins directly after the root directory.
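The positions of these areas follow from fields stored in the boot sector. The sketch assumes the classic FAT12/FAT16 boot-sector fields starting at offset 11 (bytes per sector, sectors per cluster, reserved sectors, number of FATs, root directory entries, total sectors, media byte, sectors per FAT) and 32-byte directory entries. The example values are those of a standard 1.44 MB floppy; the function name is hypothetical.

```python
import struct


def fat_layout(boot_sector: bytes) -> dict[str, int]:
    """Locate the FATs, root directory and data area from a FAT12/16 boot sector."""
    (bytes_per_sector, sectors_per_cluster, reserved, num_fats,
     root_entries, _total16, _media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 11)
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    first_fat = reserved                                # right after the boot sector
    root_dir = first_fat + num_fats * sectors_per_fat   # after the FAT and its copy
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "first_fat": first_fat,
        "fat_copy": first_fat + sectors_per_fat,        # meaningful when num_fats >= 2
        "root_dir": root_dir,
        "data": root_dir + root_dir_sectors,            # data follows the root directory
    }


boot = bytearray(512)
struct.pack_into("<HBHBHHBH", boot, 11, 512, 1, 1, 2, 224, 2880, 0xF0, 9)
boot[510:512] = b"\x55\xaa"
print(fat_layout(bytes(boot)))
```

For the floppy this gives the FAT at sector 1, its copy at sector 10, the root directory at sector 19 (14 sectors for 224 entries) and the data area at sector 33.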

A file system is the part of the operating system that organizes and stores files and performs operations on them.

Windows supports the concept of a file as an unstructured sequence of bytes.

An application can read these bytes in any order. File storage is typically organized on a direct-access device as a set of fixed-size blocks. The main task of the file management subsystem is to associate a symbolic file name with the disk blocks that contain the file's data.

File naming

In Windows, a file is created and named with the Win32 function CreateFile. The maximum length of the full file name when creating a file is MAX_PATH, which equals 260, but the system allows file names of up to about 32,000 characters in Unicode.
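Names longer than MAX_PATH are passed to the wide-character (W) versions of the functions, such as CreateFileW, as absolute paths with the `\\?\` prefix; the system then accepts up to 32,767 characters but stops normalizing the path (no `/` conversion, no `.` or `..` resolution). The sketch below only builds such a string and does not call the Windows API; the helper name is hypothetical.

```python
def to_extended_length(path: str) -> str:
    """Return `path` with the extended-length prefix used by Windows wide-character APIs.

    The prefix disables the system's normalization of the path, so forward
    slashes are converted here and "." / ".." components must already be gone.
    """
    prefix = "\\\\?\\"                        # the four characters \\?\
    if path.startswith(prefix):
        return path                           # already extended
    path = path.replace("/", "\\")
    if path.startswith("\\\\"):
        return prefix + "UNC\\" + path[2:]    # \\server\share -> \\?\UNC\server\share
    if len(path) >= 3 and path[1] == ":" and path[2] == "\\":
        return prefix + path                  # C:\dir -> \\?\C:\dir
    raise ValueError("expected an absolute drive-letter or UNC path")


print(to_extended_length(r"C:\data\report.txt"))
```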

The system is able to distinguish between upper-case and lower-case letters in file names. Applications usually recognize a file's type by its name: for example, files with the .exe extension are executable. The associations between file names and the programs that handle them are stored in the registry.

File attributes

In Windows, a file is regarded not just as a sequence of bytes but as a collection of attributes, and the file data is only one of them, the so-called unnamed data stream.

Attributes are stored as <attribute name, attribute value> pairs in the file's record in the Master File Table (MFT).

NTFS file attribute list

- Standard information: flag bits (read-only, archive) and time stamps.
- File name. The file name is stored in Unicode.
- Security descriptor (file access control).
- Data: the unnamed and named data streams.
- Object identifier: a 128-bit identifier (GUID) unique within the volume. A file can be opened by this identifier as well as by its name.
- Volume information.
- Index information, used for directories.
- EFS (Encrypting File System) data, used for encryption.

Organization of files and access to them. The concept of asynchronous I/O

The Windows file subsystem deals with files whose bytes can be read in any order, because the block number is calculated from the current position within the file. Such files are called direct access files.
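Both halves of that statement fit in a few lines: the block index and the offset inside the block follow from the position by integer division, and a program can jump straight to any position. The 4096-byte block size and the function names are assumptions of this sketch; the read uses Python's ordinary `seek` and `read`.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block (cluster) size for the illustration


def locate(position: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte position in a file to (block index within the file, offset in block)."""
    return divmod(position, block_size)


def read_at(path: str, position: int, size: int) -> bytes:
    """Direct access: jump to `position` and read, skipping everything before it."""
    with open(path, "rb") as f:
        f.seek(position)
        return f.read(size)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 64)  # 16 KiB of sample data
    name = tmp.name
try:
    print(locate(10_000))              # (2, 1808): third block, offset 1808
    print(read_at(name, 10_000, 4))    # b'\x10\x11\x12\x13'
finally:
    os.remove(name)
```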

An important achievement of the Windows developers is giving the user the ability to perform asynchronous I/O: the process that initiates an I/O operation does not wait for it to complete but continues computing.
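Windows provides this through overlapped I/O (for example, ReadFile on a handle opened with FILE_FLAG_OVERLAPPED and an OVERLAPPED structure). The sketch below does not use that API: it imitates the same pattern portably with a worker thread from Python's `concurrent.futures`. The process starts the read, keeps computing, and waits only when it needs the data; the file and function names are illustrative.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path: Path) -> bytes:
    return path.read_bytes()


def busy_work(n: int) -> int:
    """Stands in for the computation done while the read is in progress."""
    return sum(i * i for i in range(n))


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "data.bin"
    path.write_bytes(b"x" * 1_000_000)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_file, path)  # start the I/O, do not wait
        result = busy_work(100_000)            # keep computing meanwhile
        data = pending.result()                # block only when the data is needed
    print(len(data), result)
```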

Directories. The logical structure of the file archive

The file system on a disk is a hierarchical structure, organized by means of special files, directories, whose entries share the same internal table format (file name, file type: regular file or directory, attributes).

In a full file name, Windows supports the notation "." for the current directory and ".." for the parent directory.
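The effect of "." and ".." can be tried with Python's `ntpath` module, which applies Windows path rules on any platform. The resolution is purely textual: it does not consult the file system, so mount points and links are not taken into account.

```python
import ntpath  # Windows path rules, usable on any platform

# "." stays in the current directory, ".." steps up to the parent.
print(ntpath.normpath(r"C:\docs\.\reports\..\notes.txt"))  # C:\docs\notes.txt
print(ntpath.normpath(r"C:\..\notes.txt"))                 # C:\notes.txt (nothing above the root)
print(ntpath.normpath(r"a\..\..\b"))                       # ..\b (relative paths keep leading ..)
```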

Disk partitions. Mount operation

In Windows, physical disks are usually divided into logical disks, sometimes called partitions (this is a low-level operation). Conversely, several physical disks are sometimes combined into one logical disk. Logical drive names are stored in the "\??" directory of the object namespace. By specifying a drive letter, an application gains access to the file system on that drive.

Windows lets the user create a mount point, that is, associate an empty directory with the directory of a logical drive. Once the operation succeeds, the contents of the two directories are the same.

NTFS file system

On Windows, the file system is integrated into the I / O system.

Clusters

Disks are usually divided into blocks (sectors) of 512 bytes, but it is more convenient to work with larger blocks called clusters. The cluster size equals the sector size multiplied by the cluster factor and can be set when the disk is formatted. By default, it is 4 KB.
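The relation between sectors, clusters and file size is plain arithmetic. A small sketch with hypothetical function names: 8 sectors of 512 bytes give the 4 KB default, and a 10,000-byte file then occupies 3 clusters.

```python
SECTOR_SIZE = 512


def cluster_size(cluster_factor: int, sector_size: int = SECTOR_SIZE) -> int:
    """Cluster size = sector size x cluster factor (sectors per cluster)."""
    return sector_size * cluster_factor


def clusters_needed(file_size: int, cluster_bytes: int) -> int:
    return -(-file_size // cluster_bytes)  # ceiling division


default = cluster_size(8)                            # 8 x 512 B = 4096 B
print(default, clusters_needed(10_000, default))     # 4096 3
```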

NTFS supports cluster sizes of 512, 1024, 2048, 4096 and 8192 bytes and of 16, 32 and 64 KB. The optimal size is a compromise in the range from 1 to 8 KB. NTFS compression is not supported for cluster sizes larger than 4096 bytes. The system distinguishes between clusters numbered from the start of the volume (logical cluster numbers, LCN) and clusters numbered from the start of a file (virtual cluster numbers, VCN).
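NTFS describes where a file's clusters lie as runs (extents): each run says that a stretch of the file's clusters, counted from the start of the file (VCN), is stored at consecutive clusters counted from the start of the volume (LCN). The sketch works on an already decoded run list; the compact on-disk encoding of runs is not shown, and the names are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class Run(NamedTuple):
    vcn: int     # first cluster of the run, counted from the start of the file
    lcn: int     # where that run starts on the volume
    length: int  # number of clusters in the run


def vcn_to_lcn(runs: list[Run], vcn: int) -> int:
    """Translate a file-relative cluster number into a volume cluster number."""
    i = bisect_right([r.vcn for r in runs], vcn) - 1  # last run starting at or before vcn
    if i < 0 or vcn >= runs[i].vcn + runs[i].length:
        raise ValueError(f"VCN {vcn} is not mapped")
    return runs[i].lcn + (vcn - runs[i].vcn)


# A fragmented file: clusters 0-3 at LCN 100-103, clusters 4-9 at LCN 5000-5005.
runs = [Run(0, 100, 4), Run(4, 5000, 6)]
print([vcn_to_lcn(runs, v) for v in (0, 3, 4, 9)])  # [100, 103, 5000, 5005]
```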