Files and file system. File management, file types, file system, file attributes

One of the main tasks of the operating system is to provide convenience to the user when working with data stored on disks. To do this, the OS replaces the physical structure of the stored data with some user-friendly logical model. The logical model of the file system is materialized in the form of a directory tree, symbolic compound file names, and commands for working with files. The basic element of this model is a file, which, like the file system as a whole, can be characterized by both a logical and a physical structure.

A file is a named area of external memory that can be written to and read from. Files are stored in non-volatile memory. There are exceptions, however; one of them is the RAM disk ("ramdisk"), a structure in RAM that mimics a file system.

The main purposes of using the file:

Long-term and reliable storage of information. Durability is achieved through the use of storage devices that do not depend on power supply, and high reliability is determined by the means of protecting access to files and the general organization of the OS program code, in which hardware failures most often do not destroy information stored in files.

Sharing information. Files provide a natural and easy way to share information between applications and users by providing a human-readable symbolic name and persistence of the stored information and file location. The user should have convenient tools for working with files, including directories that combine files into groups, search tools for files by attributes, a set of commands for creating, modifying and deleting files. A file can be created by one user and then used by a completely different user, while the creator of the file or the administrator can determine the access rights of other users to it. These goals are implemented in the OS by the file system.

The file system (FS) is a part of the operating system that includes:

The collection of all files on a disk;

Sets of data structures used to manage files, such as file directories, file descriptors, tables for allocating free and used disk space;

A complex of system software tools that implement various operations on files, such as creating, destroying, reading, writing, naming and searching for files.



The file system lets programs work with a file as an abstract object through a small set of simple operations. Programs do not need to deal with the actual location of data on disk, data buffering, and so on: the FS takes over all of these functions. The FS allocates disk space, supports file naming, maps file names to the corresponding addresses in external memory, provides access to data, and supports file sharing, protection and recovery. The FS thus acts as an intermediate layer that hides the complexity of the physical organization of long-term storage, presents programs with a simpler logical model of that storage, and gives them a set of easy-to-use commands for manipulating files.
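As a rough illustration of this bookkeeping, the Python sketch below keeps a directory that maps file names to lists of blocks, plus a table of free blocks. The class name ToyFS, its method names and the 4-byte block size are invented for the example; no real file system is this simple.

```python
BLOCK_SIZE = 4  # deliberately tiny so the block arithmetic is easy to follow


class ToyFS:
    """A toy in-memory file system (hypothetical, for illustration only)."""

    def __init__(self, n_blocks: int) -> None:
        self.blocks = [b""] * n_blocks       # the "disk"
        self.free = list(range(n_blocks))    # table of free blocks
        self.directory = {}                  # name -> (block numbers, size)

    def create(self, name: str, data: bytes) -> None:
        if name in self.directory:
            raise FileExistsError(name)
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        if needed > len(self.free):
            raise OSError("no free space")
        used = [self.free.pop(0) for _ in range(needed)]
        for i, block_no in enumerate(used):
            self.blocks[block_no] = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        self.directory[name] = (used, len(data))

    def read(self, name: str) -> bytes:
        used, size = self.directory[name]
        return b"".join(self.blocks[b] for b in used)[:size]

    def delete(self, name: str) -> None:
        used, _ = self.directory.pop(name)
        self.free.extend(used)               # blocks become available again


if __name__ == "__main__":
    fs = ToyFS(n_blocks=8)
    fs.create("a.txt", b"hello world")
    print(fs.read("a.txt"), fs.directory["a.txt"])
```

The caller only ever uses a name; which blocks hold the data is known to the directory alone, which is the layering the paragraph above describes.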

The tasks solved by FS depend on the way of organizing the computational process as a whole. The simplest type is FS in single-user and single-program operating systems, such as MS-DOS. The main functions in such a FS are aimed at solving the following tasks:

File naming;

Application programming interface;

Mapping the FS logical model to the physical organization of the data storage;

FS resilience to power failures, hardware and software errors.

FS tasks become more complex in single-user multiprogramming operating systems, which are designed for one user but allow that user to run several processes simultaneously. OS/2 was one of the first operating systems of this type. A new task is added to those listed above: sharing a file among several processes. In this case the file is a shared resource, which means that the FS must solve the whole range of problems associated with such resources. In particular, the FS should provide means for locking a file and its parts, preventing races, eliminating deadlocks, reconciling copies, and so on. In multi-user systems another task appears: protecting the files of one user from unauthorized access by another user. The functions of an FS that operates as part of a network operating system are more complex still.



The FS supports several functionally different file types, which typically include regular files, directory files, special files, named pipes, memory-mapped files, and others.

Regular files, or simply files, contain information of an arbitrary nature. Most modern operating systems do not limit or control the content and structure of a regular file in any way: the content is determined by the application that works with it. For example, a text editor creates text files consisting of strings of characters represented in some character encoding. These can be documents, program source code, and so on. Text files can be displayed on the screen and printed. Binary files do not use character codes; they often have a complex internal structure, as in executable program code or an archive file. Every operating system must be able to recognize at least one file type: its own executable files.

Directories are a special type of file that contains system reference information about a set of files. In many operating systems a directory can contain files of any type, including other directories, which creates a tree-like structure that is easy to search. Directories establish the correspondence between file names and the characteristics the file system uses to manage files. These characteristics include, in particular, information (or a pointer to another structure containing this data) about the file type and location on disk, the access rights to the file, and the dates of its creation and modification. In all other respects the file system treats directories like regular files.

Special files are dummy files associated with I/O devices; they are used to unify the mechanism for accessing files and external devices. They let the user perform I/O operations with the ordinary commands for writing to and reading from a file. These commands are first processed by FS programs and then, at some stage of request execution, converted by the OS into commands that control the corresponding device.

Modern file systems support other file types as well, such as symbolic links, named pipes, and memory-mapped files.

Users refer to files by symbolic names. Human memory limits the number of objects a user can refer to by name, and a hierarchical organization of the namespace allows these limits to be greatly extended. This is why most file systems have a hierarchical structure, in which levels are created by allowing a lower-level directory to be part of a higher-level directory. The graph describing the directory hierarchy can be a tree or a network. Directories form a tree if a file is allowed to belong to only one directory, and a network if a file can belong to several directories at once. For example, in MS-DOS and Windows directories form a tree structure, while in UNIX they form a network. In a tree structure each file is a leaf. The top-level directory is called the root directory, or root. With this organization users do not have to memorize the names of all files; it is enough to have a rough idea of which group a particular file belongs to. The hierarchical structure is convenient for multi-user work: each user's files are kept in that user's own directory or subtree of directories, and at the same time all files in the system are logically linked. A special case of the hierarchical structure is the single-level organization, in which all files belong to one directory.

This article covers files, namely: file management, file types, file structure, and file attributes.

File system

One of the main tasks of the OS is to make it convenient for the user to work with data stored on disks. To do this, the OS replaces the physical structure of the stored data with a user-friendly logical model, which is presented as a directory tree displayed by utilities such as Norton Commander, Far Manager or Windows Explorer. The basic element of this model is the file, which, like the file system as a whole, can be characterized by both a logical and a physical structure.

File management

File: a named area of external memory for reading and writing data.

Files are stored in non-volatile memory. An exception is the RAM disk (ramdisk), where a structure that simulates a file system is created in RAM.

The file system (FS) is the OS component that organizes the creation and storage of, and access to, named data sets, that is, files.

The file system includes:

  • The collection of all files on disk.
  • Sets of data structures used to manage files (file directories, file descriptors, tables for allocating free and used disk space).
  • A set of system software tools that implement various operations on files: creation, destruction, reading, writing, naming, search.

The tasks solved by FS depend on the way of organizing the computational process as a whole. The simplest type is FS in single-user and single-program operating systems. The main functions in such a FS are aimed at solving the following tasks:

  • File naming.
  • Programming interface for applications.
  • Mapping the FS logical model to the physical organization of the data storage.
  • FS resilience to power failures, hardware and software errors.

FS tasks become more complex in single-user multitasking operating systems, which are designed for one user but make it possible to run several processes simultaneously. A new task is added to those listed above: sharing a file among several processes.

In this case the file is a shared resource, which means that the FS must solve the whole range of problems associated with such resources. In particular, it must provide means for locking a file and its parts, reconciling copies, preventing races, and eliminating deadlocks. In multi-user systems another task appears: protecting the files of one user from unauthorized access by another user.

The functions of an FS that works as part of a network OS are more complex still.

The main purpose of a file system and of the corresponding file management system is to organize convenient management of data stored as files: instead of low-level access that specifies the physical addresses of the record we need, we use logical access that specifies the file name and the record within it.

The terms "file system" and "file management system" must be distinguished. A file system defines, first of all, the principles of access to data organized as files. The term "file management system" refers to a specific implementation of a file system, that is, the set of software modules that support work with files in a specific OS.

Example

The FAT (File Allocation Table) file system has several implementations as a file management system:

  • The system developed for the first PCs was simply called FAT (it is now known as FAT-12). It was designed for floppy disks and for some time was also used with hard drives.
  • It was then improved to work with larger hard drives, and this new implementation was called FAT-16. This name is also applied to the file management system of MS-DOS itself.
  • The OS/2 implementation of the file management system is called super-FAT (the main difference is support for extended attributes for each file).
  • There is also a version of the file management system for Windows 9x/NT and later (FAT-32).

File types

Regular files contain information of an arbitrary nature that the user enters into them or that is produced by system and user programs. The content of a regular file is determined by the application that works with it.

Regular files can be of two types:

  1. Program (executable) files: programs in machine code or in the command language of the OS that perform some function (in MS-DOS and Windows they have the extensions .exe, .com, .bat).
  2. Data files - all other types of files: text and graphic documents, spreadsheets, databases, etc.

Directories are, on the one hand, groups of files combined by the user for some reason (for example, files containing game programs, or the files that make up one software package). On the other hand, a directory is a special type of file that contains system reference information about a set of files grouped by users according to some informal criterion (the information includes file type, location on disk, access rights, and creation and modification dates).

Special files are dummy files associated with I/O devices; they are used to unify the mechanism for accessing files and external devices. Special files let the user perform I/O operations with the usual commands for writing to or reading from a file. These commands are first processed by FS programs and then, at some stage of request execution, converted by the OS into commands that control the corresponding device. In MS-DOS, for example, PRN and LPT1 stand for the printer port and CON for the console (keyboard); these are symbolic names, and for the OS they are files.

Example: copy con text1 (copies keyboard input into the file text1).
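The same idea can be tried on current systems from Python, whose standard library exposes the name of the null device as os.devnull (/dev/null on Unix, nul on Windows). In the sketch below the device is written to with the ordinary file interface; the helper name write_to_device is made up for the example.

```python
import os


def write_to_device(text: str, device: str = os.devnull) -> int:
    """Write text to a device using the same calls as for a regular file.

    Returns the number of characters accepted by the write call.
    """
    with open(device, "w") as dev:
        return dev.write(text)


if __name__ == "__main__":
    # The null device discards everything, but the call looks like any file write.
    print(write_to_device("hello"))
```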

File structure

File structure - the whole set of files on the disk and the relationships between them (the order in which files are stored on the disk).

Types of file structures:

  • simple, or single-level: the directory is a linear sequence of files.
  • hierarchical, or multilevel: a directory can itself be part of another directory and can contain many files and subdirectories. The hierarchical structure can be of two types: "tree" and "network". Directories form a tree if a file is allowed to belong to only one directory (MS-DOS, Windows) and a network if a file can belong to several directories at once (UNIX).

The file structure can be represented as a graph describing the hierarchy of directories and files.



Filename types

Files are identified by names. Users give files symbolic names, subject to OS restrictions on both the characters used and the length of the name. In early file systems these limits were very narrow. In the popular FAT file system, the length of names is limited by the well-known 8.3 scheme (8 characters for the name itself, 3 characters for the extension), and under UNIX System V a name cannot contain more than 14 characters.

However, it is much more convenient for the user to work with long names, since they allow you to give a file a truly mnemonic name, by which, even after a sufficiently long period of time, it will be possible to remember what the file contains. Therefore, modern file systems tend to support long symbolic file names.

For example, Windows NT specifies in its NTFS file system that a file name can be up to 255 characters, not including the terminating null character.

Moving to long names raises a compatibility issue with previously created applications that use short names. For applications to access files in accordance with previous conventions, the file system must be able to provide equivalent short names (aliases) to files that have long names. Thus, one of the most important tasks is the problem of generating the corresponding short names.
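To give a feel for the alias problem, here is a deliberately simplified Python sketch that derives an 8.3-style short name from a long one. It is not the algorithm any particular OS uses: the function name short_alias, the rule of keeping six characters plus "~" and a counter, and the handling of invalid characters are all assumptions made for illustration, and collision handling is left to the caller through the index parameter.

```python
def short_alias(long_name: str, index: int = 1) -> str:
    """Return a simplified 8.3-style alias for long_name (illustrative only)."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:                      # no extension at all
        stem, ext = long_name, ""
    # Keep only letters and digits, in upper case.
    short_stem = "".join(c for c in stem.upper() if c.isalnum())
    short_ext = "".join(c for c in ext.upper() if c.isalnum())[:3]
    # If anything was dropped or the stem is too long, truncate and number it.
    if len(short_stem) > 8 or short_stem != stem.upper():
        short_stem = short_stem[:6] + "~" + str(index)
    return short_stem + ("." + short_ext if short_ext else "")


if __name__ == "__main__":
    print(short_alias("Annual Report 2008.docx"))
    print(short_alias("readme.txt"))
```

Two long names with the same first six characters would produce the same alias, so a real implementation has to check the directory and increase the counter until the alias is unique.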

Symbolic names can be of three types: simple, full (compound), and relative:

  1. A simple name identifies a file within one directory. It is assigned to a file subject to the rules on permitted characters and name length.
  2. A full name is the chain of simple symbolic names of all directories on the path from the root to the given file, together with the disk name and the file name. The full name is therefore compound: its simple names are separated from each other by the delimiter used in the OS.
  3. A file can also be identified by a relative name. The relative file name is defined in terms of the "current directory". At any moment one of the directories is current; the user selects it with an OS command. The file system keeps the name of the current directory so that it can be combined with relative names to form the full file name.

In a tree-like file structure there is a one-to-one correspondence between a file and its full name: "one file, one full name". In a network file structure a file can belong to several directories and can therefore have several full names; here the correspondence is "one file, many full names".

Example: define all three name types for the file 2.doc, assuming the current directory is 2008_year.

  • Simple name: 2.doc
  • Full name: C:\2008_year\Documents\2.doc
  • Relative name: Documents\2.doc
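The three names in this example can be reproduced with Python's standard pathlib module. PureWindowsPath only manipulates path strings, so the sketch runs on any OS and does not need the files to exist; the variable names are chosen for the illustration.

```python
from pathlib import PureWindowsPath

full = PureWindowsPath(r"C:\2008_year\Documents\2.doc")   # full name
current = PureWindowsPath(r"C:\2008_year")                # current directory

simple = full.name                      # simple name: last component only
relative = full.relative_to(current)    # relative name: path from the current directory

# The file system forms the full name from the current directory and the relative name.
rebuilt = current / relative

if __name__ == "__main__":
    print(simple)
    print(relative)
    print(rebuilt == full)
```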

File attributes

Attributes are an important characteristic of a file. Attributes are information describing the properties of a file. Examples of possible file attributes:

  • Read-only flag;
  • Hidden file flag (Hidden);
  • System file flag (System);
  • Archive file flag;
  • File type (regular file, directory, special file);
  • File owner;
  • File creator;
  • File access password;
  • Information about permitted file access operations;
  • Time of creation, last access and last modification;
  • Current file size;
  • Maximum file size;
  • Temporary flag (remove when the process completes);
  • Lock flag.

In different types of file systems, different sets of attributes can be used to characterize files. For example, in a single-user OS the attribute set will lack characteristics related to users and security (file creator, file access password, etc.).

The user can access attributes using the tools the file system provides for this purpose. Typically the values of any attribute may be read, but only some may be changed: for example, you can change a file's permissions, but you cannot change its creation date or its current size.
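As a small illustration, the Python sketch below reads a few attributes of a file with os.stat and changes one of them with os.chmod, both from the standard library. The helper name describe and the choice of which attributes to report are assumptions made for the example; which attributes exist and which can be changed depends on the OS and file system.

```python
import os
import stat
import tempfile
import time


def describe(path: str) -> dict:
    """Collect a few attributes of the file at path."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "special file"
    return {
        "type": kind,
        "size": st.st_size,                              # current file size in bytes
        "mode": stat.filemode(st.st_mode),               # e.g. '-rw-------'
        "modified": time.ctime(st.st_mtime),             # last modification time
        "read_only": not (st.st_mode & stat.S_IWUSR),    # owner has no write permission
    }


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hello")
    os.close(fd)
    print(describe(path))
    os.chmod(path, stat.S_IRUSR)                  # attribute the user may change
    print(describe(path)["read_only"])
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)   # restore so the file can be removed
    os.remove(path)
```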

File permissions

Determining access rights to a file means defining for each user a set of operations that he can apply to a given file. Different file systems can have their own list of differentiated access operations. This list can include the following operations:

  • file creation.
  • destruction of the file.
  • writing to file.
  • opening a file.
  • closing the file.
  • reading from a file.
  • appending to a file.
  • search in file.
  • getting file attributes.
  • setting new attribute values.
  • renaming.
  • file execution.
  • reading a directory, etc.

In the most general case, access rights can be described by an access rights matrix, in which the columns correspond to all files in the system, the rows to all users, and the permitted operations are indicated at the intersection of a row and a column.
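A minimal sketch of such a matrix in Python, stored sparsely as nested dictionaries (rows are users, columns are files, each cell is a set of permitted operations). The user names, file names and the function is_allowed are invented for the example.

```python
# Rows: users. Columns: files. Cell: set of permitted operations.
ACCESS_MATRIX = {
    "alice": {"report.txt": {"read", "write"}, "tool.exe": {"read", "execute"}},
    "bob":   {"report.txt": {"read"}},
}


def is_allowed(user: str, file: str, operation: str) -> bool:
    """Look up the cell (user, file); a missing cell means no rights at all."""
    return operation in ACCESS_MATRIX.get(user, {}).get(file, set())


if __name__ == "__main__":
    print(is_allowed("alice", "report.txt", "write"))
    print(is_allowed("bob", "report.txt", "write"))
    print(is_allowed("bob", "tool.exe", "execute"))
```

With many users and files the full matrix becomes very large and mostly empty, which is one reason systems group users into categories instead of storing a cell for every pair.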

On some systems users can be divided into separate categories, and uniform access rights are defined for all users of one category. In UNIX, for example, all users are divided into three categories: the owner of the file, the members of the owner's group, and everyone else.


Hierarchical file system

All files in the file structure are arranged in a tree. The root of the tree is the so-called root of the file system. A leaf node is a file, which may be either a data file or a directory. The non-leaf nodes are directories. Files in such a system can be named in different ways. The first is to name a file relative to the nearest directory. The files nearest to the directory F0 are F1 (which is itself a directory) and F2. That is, if the system somehow knows that we are working in directory F0, we can refer to the files in this directory simply by their names (F1 and F2). Accordingly, names must be unique at one level (within the same directory). Since the structure is a tree, we can also speak of the full file name, which is the path from the root of the tree to the file. For example, the path to the file F3 is "/F0/F1/F3". We can work with both full and short file names. And since, by the property of a tree, the path to each leaf is unique, the problem of unique naming is solved immediately.
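The naming scheme can be sketched in Python with a nested dictionary standing in for the directory tree. The names F0 to F3 come from the text; the exact layout (F2 as a data file, F3 as the only entry of F1) and the function full_names are assumptions made for the illustration.

```python
# A directory is a dict of its entries; a data file is represented by None.
TREE = {"F0": {"F1": {"F3": None}, "F2": None}}


def full_names(directory: dict, prefix: str = ""):
    """Yield the full name of every node, walking the tree from the root."""
    for name, child in directory.items():
        path = prefix + "/" + name
        yield path
        if isinstance(child, dict):          # descend into subdirectories
            yield from full_names(child, path)


if __name__ == "__main__":
    for path in full_names(TREE):
        print(path)
```

Because each node has exactly one parent, each node is produced exactly once, which is the uniqueness property the paragraph relies on.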

This organization first appeared in the Multics OS, which was developed in the second half of the 1960s by MIT together with General Electric and Bell Labs. That was a long time ago, but such a good and elegant solution has since been adopted in many operating systems.

Within the hierarchy, each file can be given attributes related to access rights; both files and directories can have such attributes. The structural organization of such a file system therefore suits a multi-user system: on the one hand there is no naming problem, and on the other hand such a system can grow large without difficulty.

OS data protection

Identification is the ability of the OS to recognize a specific user and, depending on who that user is, take the necessary actions to protect data and so on. MS-DOS, for example, is a single-user OS. There are systems that allow users to be registered, but the users are unrelated to each other (some IBM mainframe operating systems are an example), which means they cannot be organized into groups. Yet it would be convenient to gather users into separate groups: a laboratory, a department, a study group of students, and so on.

In a hierarchical organization of users there is the concept of a group, and a group consists of real users.

When a specific user is registered, that user must be assigned to a group.

Since users are divided into groups, resources can be shared with a group by analogy with sharing between specific users (that is, a user can make files available to all members of a group).

Such a division into groups can also be multi-level, with a corresponding distribution of rights and capabilities.

A small note: there are now operating systems in which access rights can be not only hierarchical but also more complex, for example breaking the hierarchy (a file can be made available to a specific user from a group in another branch of the tree).

That is probably all that needs to be said about the properties and functions of the OS. Naturally, we have not considered all of them. Some things were deliberately left out because we are looking at the OS in a simplified model. Our goal is not to study a specific OS but to learn how to classify operating systems: from what points of view to look at them and how to compare different types of OS.

Lecture 7

Unix OS

Today we begin our consideration of the Unix OS, since we will use it as the example for many of the design decisions made in operating systems.

In the mid-1960s, AT&T's Bell Laboratories took part in the research and development of Multics, one of the first operating systems in the modern sense. It was a multi-user time-sharing OS, and it was in this system that the basic solutions for organizing file systems were proposed, in particular the hierarchical tree-like file system. This was roughly 1965. Some time later, the Unix OS grew out of this work.

One version of the story says that the company had an unused PDP-7 computer with very little software for it, and that what was needed was a machine that would support convenient user work, in particular convenient input of information. A well-known group of people, among them Thompson and Ritchie, started developing a new OS on this machine. According to another version, they were implementing a new game, and since the tools they needed were either unavailable or inconvenient, they decided to experiment with this machine. The result was the Unix OS.

A distinctive feature of the system was that it was one of the first operating systems written in a language other than assembly language. To support writing this system software, language work was carried out in parallel: it started from BCPL, from which came the B language, which operated on machine words; then came NB, which abstracted away from machine words; and finally the C language. In 1973 the Unix OS was rewritten in C. The result was an OS with about 90% of its code written in a high-level language that does not depend on the machine architecture or instruction set, and about 10% written in assembly language; that 10% includes the most time-critical parts of the OS kernel.

Many programmers at the time were somewhat shocked by this. Few believed that such an OS would be viable, since high-level languages were always associated with great inefficiency. But C was designed in such a way that it allowed efficient programs to be written and translated into quite efficient machine code.

Of these design properties, it should be noted that "C" is heavily built on working with pointers. When we write a program in assembly language, very often we need to manipulate addresses to achieve the desired result. The ability to operate on pointers is the first property of “C” that allows you to efficiently translate a program in this language into machine code.

If we look at an ordinary assembly program, we notice that when programming some blocks we often rely on side effects (for example, while evaluating an expression we can obtain intermediate results and set them aside somewhere). The same can be done in C. The concept of an expression in C was therefore much broader than in other languages of the time. In expressions, in addition to new operations such as pointer manipulation, offsets and shifts, a fundamentally new operation appeared: the assignment operation. Why is it new? Because many languages before C, and after it, had no assignment operation, only an assignment statement. The difference is this: with an assignment statement, first, the right-hand side cannot itself contain an assignment (so side effects cannot be used), and second, the left-hand side is a reference to a single memory location. Allowing assignment inside an expression made side effects usable: the values of subexpressions can be kept and used elsewhere, which reduces the number of exchanges with RAM, and that is a source of efficiency.
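The difference between an assignment statement and an assignment inside an expression can be illustrated in Python, which has both: the ordinary "=" statement and, since version 3.8, the ":=" assignment expression. The sketch below uses ":=" in a loop condition, in the spirit of the familiar C idiom of assigning and testing in one expression; it illustrates the concept and says nothing about how C compilers generate code. The function name count_chunks is made up.

```python
import io


def count_chunks(stream, size: int):
    """Read stream in pieces of at most size bytes; return (pieces, total bytes)."""
    pieces = total = 0
    # The assignment is part of the loop condition: the value just read is
    # both stored in chunk and tested for emptiness in a single expression.
    while (chunk := stream.read(size)):
        pieces += 1
        total += len(chunk)
    return pieces, total


if __name__ == "__main__":
    print(count_chunks(io.BytesIO(b"abcdefghij"), 4))
```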

These, and probably only these, properties of the language determined its longevity, its suitability for programming system components, and the possibility of efficient translation into the code of various machines. From a professional point of view, C is a terrible language. The main requirement for programming languages today is programming safety: the facilities of the language should minimize the number of possible errors.

The properties of such safe languages include the following:


    1. Strict type control. For example, if we try to multiply an integer variable by a floating-point one, the language reports an error. No implicit type conversions are allowed by default.

    2. Control over access to program memory. If a number was written to memory as an integer, it can be read from there only as an integer, not as a floating-point number or a character. In C and some other languages, the pointer provides uncontrolled access to memory; moreover, through a pointer we lose all information about the type, and we can also deceive functions about their actual and formal parameters.

    3. Control over the interaction of modules. Many errors arise when a function is declared with one set of parameters but called with a different one; the difference can be both in the number of parameters and in their types. In C, even with the ANSI C version, which tried to solve this problem in part, it is always possible to deceive a function and pass it a parameter of a different type, or to pass one parameter instead of six.

On these three points, C is not a good language. Nevertheless, such is the mentality of programmers that for some reason the most long-lived languages are conceptually bad ones; besides C, Fortran can be added to this list.
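What the second point guards against can be shown without writing C. The Python sketch below uses the standard struct module to store a number as a 4-byte integer and then read the same bytes back as a 4-byte floating-point number, which is the kind of reinterpretation an unchecked pointer makes possible. The function name and the sample value are chosen for the illustration.

```python
import struct


def reinterpret_int_as_float(n: int) -> float:
    """Store n as a 32-bit little-endian integer, then read those bytes as a float."""
    raw = struct.pack("<i", n)            # 4 bytes written "as an integer"
    return struct.unpack("<f", raw)[0]    # the same 4 bytes read "as a float"


if __name__ == "__main__":
    # 1065353216 is 0x3F800000, the IEEE 754 single-precision pattern for 1.0.
    print(reinterpret_int_as_float(1065353216))
```

The integer 1065353216 and the floating-point value 1.0 share one bit pattern, so nothing in the memory itself reveals which type was intended; only the language can enforce that.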

So, 1973: the Unix OS has appeared, and it is already written in C. What are the main properties this OS already possessed?

The first property is the concept of the file: the main object the OS operates on is the file. A file is a collection of data; from the Unix point of view an external device is a file; a directory is a file that contains information about the files belonging to it; and so on. Today the file approach is applied in Unix to almost everything.

The second property, which continues or follows from the first, is that the OS is built in a very interesting way. In earlier operating systems each command was hard-coded inside the system and could not be modified or removed, nor could a new command be created. In Unix the problem of user commands is solved very elegantly thanks to two points. First, Unix defines a standard interface for passing parameters from outside into a process. Second, all commands are implemented as files, which means that new commands can be freely added to the system and made available to a single user, to a group of users, or to everyone, and commands can likewise be deleted.
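Both points can be observed from Python's standard library. In the sketch below the "command" is simply the Python interpreter's own executable file (sys.executable), started as a separate process with subprocess.run, and the parameters reach the new process through its argument list. The helper name run_command and the file names passed as arguments are invented for the example.

```python
import subprocess
import sys

# A one-line program for the child process: print the arguments it received.
CHILD_CODE = "import sys; print(sys.argv[1:])"


def run_command(*args: str) -> str:
    """Start an executable file as a new process and pass it parameters."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_command("notes.txt", "backup.txt"))
```

Any other executable file could take the place of the interpreter here; the launching code would not change, which is what makes adding a new command a matter of adding a file.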

Let's start by looking at the specific properties of the Unix OS. The first thing we will consider is the file system, the organization of work with files.

Unix file system

The Unix file system is a hierarchical, multi-user file system. It can be represented as a tree.

At the root of the tree is the root directory; the nodes other than the leaves of the tree are directories. Leaves can be files (in the traditional sense, named data sets) or empty directories (directories with which no file is associated). The system defines the concept of a file name: a name associated with a data set within the directory to which the file belongs. For example, directory D1 owns the files N1, N2 and N3; directory D0 owns N4, N5 and D1, the last of which is also a file, but a special one. So a name is associated with a data set in the context of a directory. In addition, there is the concept of a full name. The full name is the unique path from the root of the file system to a specific file. The first character of the name is the root directory "/", and then all directories are listed, separated by forward slashes, until the desired file is reached. For example, file N3 has the full name "/D0/D1/N3". Because such a path is unique for each file in any directory, files in different directories can have the same name. For example, the name N4 is present in both the D0 and D4 directories, but these are different files, since their full paths differ (/D4/N4 and /D0/N4).

Comment. Strictly speaking, the Unix file system is not a tree. Everything said above is correct, but the system makes it possible to break the neat and convenient tree hierarchy, because several names can be associated with the same file content. Situations may arise in which, for example, "/D4/N3" and "/D0/D1/N1" are in fact one file with two names.
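The "one file, two names" situation can be reproduced with a hard link. The Python sketch below creates a file in a temporary directory, gives it a second name with os.link, and checks that both names refer to the same file. The names N1 and N3 are borrowed from the text; placing both in one temporary directory and the function name two_names_one_file are simplifications made for the example, and the sketch assumes a file system that supports hard links.

```python
import os
import tempfile


def two_names_one_file():
    """Create a file, add a second name for it, and inspect the result."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "N1")
        second = os.path.join(root, "N3")
        with open(first, "w") as f:
            f.write("shared content")
        os.link(first, second)               # second name for the same content
        with open(second) as f:
            content = f.read()               # read through the second name
        same = os.path.samefile(first, second)
        names = os.stat(first).st_nlink      # how many names the file has
        return same, names, content


if __name__ == "__main__":
    print(two_names_one_file())
```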

One more note. Unix OS uses a three-level user hierarchy:

The first level is all users. They are subdivided into groups and, accordingly, the groups consist of real users. Because of this three-tiered user organization, each file has three attributes:

1) The owner of the file. This attribute is associated with one specific user, which is automatically assigned by the system as the owner of the file. You can become the owner by default by creating a file, and there is also a command that allows you to change the owner of a file.

2) Protection of access to the file. Access to each file (from the system kernel file to an ordinary text file) is regulated for three categories of users:

Owner rights (what the owner may do with this file; in general, not necessarily everything);

The rights of the group to which the file owner belongs. The owner is not included here (for example, the file can be closed for reading by the owner while all other members of the group can freely read it);

All other users of the system.

Three actions are regulated for each of these three categories: reading from the file, writing to the file, and executing the file (in the system mnemonics R, W, and X, respectively). For each file, these three categories define which users can read it, which can write to it, and which can run it as a process.
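The check described above can be sketched as follows. This is a simplified model, assuming a nine-bit mode word (three bits per category), a single group per user, and no superuser; the function name and the numeric user and group numbers are hypothetical.

```python
import stat

READ, WRITE, EXECUTE = 4, 2, 1  # the R, W, X bits within one category


def is_allowed(mode, file_uid, file_gid, uid, gid, action):
    """Return True if user (uid, gid) may perform `action` on the file."""
    if uid == file_uid:
        bits = (mode >> 6) & 7   # owner category
    elif gid == file_gid:
        bits = (mode >> 3) & 7   # group category (the owner was handled above)
    else:
        bits = mode & 7          # all other users
    return bool(bits & action)


# The standard library can render a mode word in the familiar rwx notation.
print(stat.filemode(0o100754))  # -rwxr-xr--
```

With mode 0o040 the owner cannot read the file while other members of the group can, which is exactly the case mentioned in the text: the owner is checked first and is not treated as a member of the group category.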

This is some preliminary data on the file system. Now let's look at the structure of the file system on disk.

First, let's define some concepts:

For any computing system, the concept of a system external storage device is defined. This is the device that the system's hardware loader accesses to start the OS. The essence is as follows: almost any computing system has a range of its address space located in ROM. The ROM contains a small program which, when the computer is turned on, accesses a fixed block of the system external storage device, reads it into memory, and transfers control to a fixed address within the block that was read.

The block that was read is assumed to be the software loader, which starts up the OS. Note that while the hardware loader in the vast majority of machines is system-independent (that is, it does not know which OS will be loaded), the software loader is already an OS component: it knows which specific OS will be loaded and where the data needed for loading is located.

In any system, it is customary to divide the space of the external storage device into data areas called blocks. The block size (the logical block in the OS) is a fixed attribute. In the various versions of Unix, the block size was a parameter that depended on the OS variant. For simplicity and consistency, we will assume that the logical block of the external storage device is 512 bytes.

So, let's look at the structure of the file system. Let's represent the address space of the system external storage device as a sequence of blocks.

We will assume that there are N + M - 1 such blocks.

The first block is the boot block. Its location in block zero of the system device is determined by the hardware, since the hardware loader always reads a specific block of the system device (block zero). This is the last component of the file system that is hardware dependent.

The next block is the superblock of the file system. It contains operational information about the state of the file system, as well as information about the settings for the file system. In particular, the superblock has information about


    • the number of inodes (IDs) in the file system;

    • the size of the file system;

    • free blocks of files;

    • free IDs;

    • some other data that we will not list because their purpose is too specialized.

The third area is the inode area. An ID (inode) is a special file system data structure that is in one-to-one correspondence with a file: one and only one ID is associated with each file content. The IDs occupy not a single block but a range of blocks, whose size is determined by a file system generation parameter (the number of IDs specified in the superblock). Each ID contains, among other things, the following information:


    • privilege / security code;


    • file length;


Next are the file blocks. This is the space of the external storage device that contains all the information in files and about files that does not fit in the areas already listed.

The last data area is located differently in different systems, but for simplicity of presentation we will assume that it comes immediately after the file blocks. This is the save area.

This is a conceptual diagram of the structure of the filesystem. Now let's go back and look at some of its parts in more detail.

The areas of free file blocks and free IDs are of particular interest. In Unix, the influence of two factors is visible. First, the file system was developed when an external storage device of 5-10 MB was considered very large, and the authors' efforts to use it economically are visible in the algorithms for working with the system. Second, the file system is designed to optimize access, the criterion being the number of exchanges the file system performs for its own needs, not related to reading or writing file data.

The superblock contains the list of free file blocks, which consists of 50 elements. The essence of working with this list is as follows. A buffer of 50 elements (given a 512-byte block, each element is a 16-bit word holding a block number) contains the numbers of free blocks of the file block space (in the elements from 2 to 49). Element 0 contains a pointer to the continuation of the array, and the last element contains a pointer to a free element in the array (the N/B pointer).

If a process needs a free block to extend a file, the system selects an array element using the N/B pointer (block number), and that block is given to the file. If a file is shortened, the numbers of the freed blocks are added to the free block array and the N/B pointer is adjusted.

Since the size of the array is 50 elements, two critical situations are possible:


    1. When we release file blocks and they no longer fit in this array. In this case, one free block is selected from the file system and the completely filled array of free blocks is copied into it, after which the N/B pointer is zeroed and the number of the block chosen for the copy is written to element zero of the array in the superblock. Thus, if we keep freeing blocks, a list is formed that holds all the free blocks of the file system.

    2. When we have taken all the free blocks and the elements of the free block array are exhausted. If element zero of the array is zero, the entire file system space has been exhausted. If it is nonzero, the array has a continuation, which is read into the copy of the superblock in RAM.

In most cases, getting a free block or releasing one requires no additional exchange. An additional exchange is needed only when the contents of the 49 elements are exhausted. This gives good buffering, which reduces OS overhead.
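The two critical situations can be illustrated with a small simulation. It is a sketch under stated assumptions, not the actual kernel code: the in-memory array is a Python list whose length plays the role of the N/B pointer, the disk is a dictionary, the block chosen to hold a full array is the block being released, and block numbers start at 1 so that 0 can mean "no continuation". The class and attribute names are invented.

```python
CAPACITY = 50  # size of the free block array kept in the superblock


class FreeBlockList:
    def __init__(self):
        self.slots = [0]      # slots[0]: number of the continuation block (0 = none)
        self.disk = {}        # block number -> saved copy of a full array
        self.disk_reads = 0
        self.disk_writes = 0

    def release(self, block):
        """Return a block to the free list."""
        if len(self.slots) == CAPACITY:
            # Situation 1: the array is full. Copy it into the released block
            # and start a new array whose element 0 points to that block.
            self.disk[block] = self.slots
            self.disk_writes += 1
            self.slots = [block]
        else:
            self.slots.append(block)

    def allocate(self):
        """Take one free block from the list."""
        if len(self.slots) > 1:
            return self.slots.pop()
        link = self.slots[0]
        if link == 0:
            raise OSError("file system space is exhausted")
        # Situation 2: the array is empty but has a continuation. Read it back;
        # the block that stored it is now free and is handed out.
        self.slots = self.disk.pop(link)
        self.disk_reads += 1
        return link
```

Releasing 120 blocks and then allocating them again costs only two simulated writes and two simulated reads in this model, which shows the buffering effect described above.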

List of free IDs. This is a buffer of 100 elements. It holds the numbers of 100 IDs that are free at the moment. When a new ID is needed, its number is taken from this list; when an ID is released, its number is entered into this array. If the array is already full when another ID is freed (the 101st element), the number is not recorded anywhere. If the list of IDs is exhausted, the system “runs” through the inode area and rebuilds the contents of this buffer.
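A minimal model of this buffer might look as follows. It assumes that the inode area can be represented by a list of in-use flags and that ID numbers start at 1; the class and method names are hypothetical.

```python
CACHE_SIZE = 100  # size of the free ID buffer in the superblock


class FreeInodeCache:
    def __init__(self, in_use):
        self.in_use = in_use   # in_use[n - 1] is True if ID number n is occupied
        self.cache = []        # numbers of IDs known to be free
        self.rescans = 0

    def _rescan(self):
        """Run through the inode area and refill the buffer."""
        self.rescans += 1
        free = [n + 1 for n, used in enumerate(self.in_use) if not used]
        self.cache = free[:CACHE_SIZE]

    def allocate(self):
        if not self.cache:
            self._rescan()
        if not self.cache:
            raise OSError("no free IDs")
        number = self.cache.pop()
        self.in_use[number - 1] = True
        return number

    def release(self, number):
        self.in_use[number - 1] = False
        if len(self.cache) < CACHE_SIZE:
            self.cache.append(number)
        # Otherwise the number is not recorded; a later rescan will find it.
```

The ID that does not fit in the buffer is not lost: it is simply marked free in the inode area and is picked up by the next rescan.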

In a situation when a file needs to be created and a new ID is needed, but there are no elements in the array, the search for a free ID is started. If it finds nothing, two situations are possible:


    1. There are no more free blocks for files;

    2. No more new IDs.

That is the information kept in the superblock. What conclusions and comments can be drawn?

    • the superblock is always in RAM;

    • all operations for freeing and allocating file blocks and for allocating and freeing IDs occur in RAM (minimizing disk exchanges). If the contents of the superblock are not written to disk and the power is turned off, problems will arise (a discrepancy between the real state of the file system and the contents of the superblock). But this is already a requirement on the reliability of the system hardware.

Lecture 8

Index Descriptors

Let's take a closer look at index descriptors. An ID is a Unix object that is put in one-to-one correspondence with the contents of a file. That is, there is exactly one content for each ID and vice versa, except when the file is associated with an external device. Recall the contents of the ID:


    • field defining the file type (directories and all other files);

    • privilege / security code;

    • the number of links to this ID from all possible directories of the file system (a zero value means the ID is free);

    • file length in bytes;

    • dates and times (time of the last write, date of creation, etc.);

    • file block addressing field.

As you can see, there is no file name in the ID. Let's see how the addressing of the blocks that hold the file is organized.

The addressing field contains the numbers of the first ten blocks of the file; that is, if the file is small, all information about the location of its data is held directly in the ID. If the file exceeds ten blocks, a list structure comes into play: the 11th element of the addressing field contains the number of a block from the file block space that holds 128 references to blocks of this file. If the file is larger still, the 12th element is used. It contains the number of a block that holds 128 block numbers, each of which refers to a block holding 128 numbers of blocks of the file. And if the file is even larger, the 13th element is used, where the nesting depth of the list is increased by one more level.

Thus, we can get a file of size (10 + 128 + 128^2 + 128^3) * 512 bytes.
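The arithmetic can be checked with a few lines of code. The sketch below uses the numbers from the text (10 direct entries, 128 references per block, 512-byte blocks); the function names are invented, and the second function only shows which level of the addressing scheme a given logical block of a file falls into.

```python
BLOCK_SIZE = 512   # bytes per logical block
DIRECT = 10        # block numbers stored directly in the ID
PER_BLOCK = 128    # block numbers that fit in one indirect block


def max_file_blocks():
    """Largest number of data blocks one ID can address."""
    return DIRECT + PER_BLOCK + PER_BLOCK ** 2 + PER_BLOCK ** 3


def indirection_level(block_index):
    """0 = direct, 1 = via element 11, 2 = via element 12, 3 = via element 13."""
    limit = DIRECT
    for level in range(4):
        if block_index < limit:
            return level
        limit += PER_BLOCK ** (level + 1)
    raise ValueError("block index is beyond the maximum file size")


print(max_file_blocks() * BLOCK_SIZE)  # 1082201088 bytes, slightly over 1 GiB
```

Only the first ten blocks are reachable without reading an extra block; block 10 (counting from zero) already requires one additional read of the indirect block.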

If we ask why all this is needed (tables of free blocks, IDs, and so on), recall that we are considering the interaction between the hardware and software of a computing system. Such a file system design greatly reduces the number of real exchanges with the external storage device, and the layered buffering in Unix makes the number of these exchanges smaller still.

Consider the next area, the save area. In the diagram, it is shown immediately after the file blocks. In fact, it can be placed in different ways: in front of the file blocks, in some file, or elsewhere, for example, on another storage device. It all depends on the specific implementation of the system.

Processes are swapped out to the save area; it is also used to speed up the launch of the most frequently launched processes (using the so-called T-bit of the file).

We have examined the structure of the file system and its organization on the system device. The structure and the algorithms for working with it are quite simple; this is deliberate, so that the overhead of running the system stays within reasonable limits.

Filesystem elements:

Directories

We said that all information in Unix is stored in files. There are no special tables that are located outside the file system and are used by the system, with the exception of those tables that the OS creates in RAM while it is running.

From the point of view of the OS, a directory is a regular file that contains information about all the files that belong to the directory.

We say that directory “A” contains files “B”, “C”, and “D”, of which “B” and “C” can be either files or directories, and “D” is obviously a directory.

The directory has the following structure. It consists of elements that combine two fields - ID number and file name:

Directory = ((ID, Name), (ID, Name), ..., (ID, Name))

What is the ID number? It is the ordinal number of the item in the inode list. So, the first element of this list, ID #1, belongs to the root directory “/”.

In general, a directory may contain multiple entries referring to the same ID, but a directory may not contain entries with the same name. That is, an arbitrary number of names can be associated with the contents of the file. When creating a directory, two entries are always created in it:

(ID_of_the_directory_itself, “.”) and (ID_of_the_parent_directory, “..”)

So in the picture, file “A” has ID #7, “D” has ID #5, “F” has ID #10, and “G” has ID #101. In this case, the directory file D will have the following contents:

{{ 5, “.” },

(For the root directory, the parent refers to itself.)

How is a directory file different from a regular file? It is distinguished by the type field in the inode.

Let's see schematically how full names and directory references can be used. At any given time, the system defines a current directory for the user, that is, a directory whose full name is prepended to every file name that does not begin with the “/” character. If the current directory is “D”, we can refer simply to the file “F” or the file “G”. But if the current directory is “D” and you want to get to the file “B”, you cannot simply use the name “B”, since it does not belong to directory “D”. File “B” can be reached by specifying its full name from the root, or by using the special file “..”; in that case file “B” has the name “../B”. If when opening we refer to “..”

In order to open file “B” in this case, a number of indirect operations must be performed: take the parent's ID, use it to fetch the contents of the directory file “A”, find in “A” the entry with the name “B”, and determine its ID. This procedure is fairly laborious, but since files are opened and closed fairly rarely, there is no harm in this.
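The lookup procedure can be sketched as a walk over (ID, name) pairs. The IDs of A, D, F, and G are taken from the text; everything else is an assumption made for illustration: the IDs of B and C (8 and 9), the placement of A directly under the root (ID 1), and the names of the table and the function.

```python
ROOT_ID = 1

# Each directory is modeled as a mapping from name to ID number.
directories = {
    1: {".": 1, "..": 1, "A": 7},                    # root: parent refers to itself
    7: {".": 7, "..": 1, "B": 8, "C": 9, "D": 5},    # directory A
    5: {".": 5, "..": 7, "F": 10, "G": 101},         # directory D
}


def resolve(path, current_id):
    """Return the ID number that the path leads to."""
    node = ROOT_ID if path.startswith("/") else current_id
    for name in path.split("/"):
        if not name:
            continue  # skip the empty parts produced by "/" separators
        entries = directories.get(node)
        if entries is None:
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(name)
        node = entries[name]
    return node
```

For example, with D (ID 5) as the current directory, resolve("../B", 5) first follows “..” to ID 7, then finds the entry “B” in directory A, exactly as in the steps listed above.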

Thanks to this organization of directories, the contents of a file are separated from its name. A file need not have a single name.

Since several names can be associated with one file, the file can be opened simultaneously by several processes (generally speaking, several processes can also open the file under one name, so the essence of the problem does not change). How is synchronization organized in this case? As we will see later, this is handled correctly.

Device files

Files of this kind are distinguished by their type, and they are interpreted as follows. In principle, device files have no content; they are only an ID and a name associated with it. The ID holds information about the type of device associated with the file. Unix divides all devices into two types: byte-oriented and block-oriented. Byte-oriented devices are those with which data is exchanged byte by byte (for example, a keyboard); block-oriented devices are those with which data is exchanged in blocks. The ID contains a field indicating this characteristic, and also a field that specifies the number of the driver associated with the device. In the system, each driver is associated with one specific device, but a device can have several drivers. The field that specifies the driver number is actually an index into the driver table for the corresponding class of devices (there are two tables, one for block devices and one for byte devices). The ID also holds a numeric parameter that can be passed to the driver to specify details of the operation.
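On a present-day Unix-like system these fields can be inspected from Python. The sketch below assumes that the driver number corresponds to what is now called the major device number and the numeric parameter to the minor number; the function name is invented, and os.major, os.minor, and os.makedev are available only on Unix.

```python
import os
import stat


def describe_device(mode, rdev):
    """Classify a device file from the type bits and device number in its ID."""
    if stat.S_ISCHR(mode):
        kind = "byte-oriented"
    elif stat.S_ISBLK(mode):
        kind = "block-oriented"
    else:
        return None  # not a device file
    # Major number: index into the driver table; minor: parameter for the driver.
    return kind, os.major(rdev), os.minor(rdev)


# Usage on a real device file, if it exists on this system.
if os.path.exists("/dev/null"):
    info = os.stat("/dev/null")
    print(describe_device(info.st_mode, info.st_rdev))
```

The function takes the mode and device number as plain values, so it can also be tried on synthetic values built with os.makedev.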

This is what can be said about special files associated with external devices.

Data exchange with files

The next aspect of the system organization of the file system is the organization of data exchange with a file. Let's define the concepts related to low-level I/O. Unix defines special functions called system calls. These calls access the OS directly and perform system functions. In use they hardly differ from library functions, but in implementation and effect the difference is substantial: a library function is loaded into the body of the process, whereas a system call immediately transfers control to the OS, which performs the requested action. Unix provides the following set of functions for low-level I/O (through system calls):

open(...) - to work with the contents of a file, a process must register this fact with the system. The parameters of this function are a string containing the file name and attributes for the file operation mode (read only, read-write, etc.). The function returns a number called a file descriptor (FD). The body of the user process, along with the data associated with the process, contains some service information, in particular the file descriptor table. Like all tables in Unix, it is positional: the row number in the table corresponds to the FD with that number. The file name and other attributes are associated with the FD. FD numbering is local to the process; that is, FDs are unique within one process.

The number of simultaneously open files (more precisely, the maximum number of FDs associated with files) for a process is regulated by the system.

So, the open(...) function opens an existing file.

creat(...) is a function for creating and opening a new file; its parameters are the file name and some open parameters, just as for open.

read(...) / write(...) - their parameters are the FD number and some access parameters. These functions read from and write to a file.

close(...) - completes work with the file. After this function is executed, the FD of the file is released.

These are all system calls. Unix also allows I/O to be performed through library functions (for example, fopen, fread, fwrite, fclose, ...).
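The sequence of calls can be tried from Python, whose os module exposes thin wrappers over these system calls. This is an illustrative sketch: the file name and the function name are invented, and since Python has no separate creat wrapper, the file is created with os.open and the flags O_WRONLY | O_CREAT | O_TRUNC, which POSIX defines as equivalent to creat.

```python
import os
import tempfile


def write_then_read(text):
    """Write text through one file descriptor and read it back through another."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "example.txt")

        # Create the file and open it for writing; the result is an FD (an integer).
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        os.write(fd, text.encode())
        os.close(fd)  # the FD is released and may be reused

        # Open the existing file for reading.
        fd = os.open(path, os.O_RDONLY)
        data = os.read(fd, 4096)
        os.close(fd)
        return data.decode()
```

By contrast, the built-in open() of Python, like fopen in C, is a library layer that adds its own buffering on top of these calls.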

Let's consider the organization of exchange in Unix from the system's point of view.

When organizing the exchange, the system divides all data into two categories: data associated with the user process and data associated with the OS.

The first OS-related data table is the open file inode table (TIDOF). Each record in it contains a copy of the ID of a file that is open in the system. Through the copy of the ID, we access the blocks of the file. Each record also contains a field indicating how many times files using the given ID are currently open in the system. That is, if the same file is opened on behalf of two processes, one record is created in the TIDOF, and each opening of this ID increases the counter by one.

Next is the file table. It contains information about the name of the open file and has a link to the ID of this file in the TIDOF.
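The relationship between the three tables can be modeled roughly as follows. This is a simplified sketch, not the kernel's actual data layout: the TIDOF is reduced to a counter per ID number, a file table entry holds only a name and an ID number, each process's FD table is a plain list, and all class and method names are invented.

```python
class OpenFileTables:
    def __init__(self):
        self.inode_table = {}  # ID number -> open count (stands in for the TIDOF)
        self.file_table = []   # entries of the system-wide file table

    def open(self, fd_table, name, inode_number):
        """Register one more opening of the file; return the FD for this process."""
        self.inode_table[inode_number] = self.inode_table.get(inode_number, 0) + 1
        self.file_table.append({"name": name, "inode": inode_number})
        entry = len(self.file_table) - 1
        # The FD is a position in the process's own table: take the lowest free one.
        for fd, slot in enumerate(fd_table):
            if slot is None:
                fd_table[fd] = entry
                return fd
        fd_table.append(entry)
        return len(fd_table) - 1

    def close(self, fd_table, fd):
        """Release the FD and decrease the counter of the corresponding ID."""
        entry = fd_table[fd]
        fd_table[fd] = None
        inode_number = self.file_table[entry]["inode"]
        self.file_table[entry] = None
        self.inode_table[inode_number] -= 1
        if self.inode_table[inode_number] == 0:
            del self.inode_table[inode_number]
```

If two processes open the same file, each gets its own FD (possibly the same number, since FDs are unique only within a process), while the TIDOF keeps a single record with a counter of 2.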

This scheme will be discussed in more detail in the next

All programs and data are stored on the computer's external memory devices in the form of files.

Definition. A file is a named area of memory (a sequence of bytes of arbitrary length) on a disk or other medium that is stored and processed as a whole. The data stored in files can be texts, programs, encoded graphic or sound information, and so on.

A file has a name and attributes and is characterized by its size in bytes and by the date and time of its creation or last modification.

Note.

A file name may be complete or incomplete. The full (compound) name of a file in MS-DOS consists of two parts, the file name and the extension, separated by a period. The extension, also called the file type, may be absent, in which case the file name is incomplete.

The symbols used in the file name and extension are taken from the following set:

Uppercase (capital) and lowercase (small) letters of the Latin alphabet;

Symbols: - _ $ # & @ ! % ( ) { } ' ~ ^

The file name can be from one to eight characters long, and the extension from zero to three (for operating systems such as MS-DOS). In Windows these restrictions are less stringent: the file name can contain up to 255 characters.
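These naming rules can be expressed as a short check. The sketch assumes the character set listed above plus the digits 0-9, which MS-DOS also accepts but which are missing from the list; the function and variable names are invented for illustration.

```python
import re

# Letters, digits (an addition to the list in the text), and the listed symbols.
_ALLOWED = r"A-Za-z0-9\-_$#&@!%(){}'~^"

# One to eight characters, then optionally a period and zero to three characters.
_PATTERN = re.compile(rf"[{_ALLOWED}]{{1,8}}(\.[{_ALLOWED}]{{0,3}})?")


def is_valid_dos_name(name):
    """Check a file name against the MS-DOS 8.3 rules described above."""
    return _PATTERN.fullmatch(name) is not None
```

For example, AUTOEXEC.BAT passes the check, while TOOLONGNAME.TXT (an eleven-character name) and A.HTML (a four-character extension) do not.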

Some of the file extensions (types) are standard:

COM - file ready for execution (1st type);

EXE - a file ready for execution (2nd type) or an executable file, the main file of any user program;

BAT - batch (command) file;

TXT - text file of any type;

MDB - Access DBMS file;

XLS - Excel spreadsheet file;

DOC - a text file containing documentation for a software product or a Microsoft Word editor file;

BMP - graphic file in bitmap (raster) format;

ARJ, RAR, ZIP - archived files, etc.

Definition. A file attribute is a parameter that defines the rules for viewing and editing the file's content.

The file may have the following attributes:

R (Read-only) - "only for reading". When an attempt is made to update or destroy such a file using the OS system tools, an error message will be displayed. The file attribute is set to protect the file from accidental modification or destruction.

H (Hidden) - "Hidden file". When viewing a directory using standard OS tools, information about the hidden file is not displayed.

S (System) - "System file". These files are used by the operating system.

A (Archive) - "Archive file". This attribute is set each time a new file is created and cleared by the archive and backup software.
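In FAT file systems these attributes are stored as individual bits of one byte in the directory entry. The sketch below uses the bit values of that layout; the function names are invented, and the backup step only models the clearing of the archive bit described above.

```python
# Bit values of the attribute byte in a FAT directory entry.
READ_ONLY = 0x01
HIDDEN = 0x02
SYSTEM = 0x04
ARCHIVE = 0x20


def attribute_letters(attributes):
    """Render an attribute byte as a string of the letters R, H, S, A."""
    letters = ""
    for letter, bit in (("R", READ_ONLY), ("H", HIDDEN), ("S", SYSTEM), ("A", ARCHIVE)):
        if attributes & bit:
            letters += letter
    return letters


def after_backup(attributes):
    """A backup program clears the archive bit and leaves the others untouched."""
    return attributes & ~ARCHIVE
```

A newly created read-only file would carry READ_ONLY | ARCHIVE, shown as "RA"; after a backup only "R" remains until the file is modified again.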

Definition. A directory is a special file that contains information about other files and directories, namely:

Full name;

Time and date of creation or last modification;

Size in bytes;

Attributes;

Some other information about the file structure of the disk.

Note.

The expression "the file is in the directory" or "the file is contained in the directory" means that information about this file is contained in this directory (a directory here is a reference list, an index).

A magnetic disk always has a main, or root, directory, which is created when the disk is formatted. The number of files that can be registered in the root directory depends on the type and capacity of the disk. A large number of files in the root directory is inconvenient for the user. In addition, a situation may arise where the capacity of the root directory is insufficient for all the files to be stored on the disk. Therefore, any OS provides the ability to create a hierarchical directory system on a disk. In this system, the elements of a directory can be not only ordinary files but also other directories (subdirectories), which, in turn, may include lower-level subdirectories along with files. The number of items in a subdirectory is limited (in practice) only by the capacity of the disk. Typically, subdirectories are simply called directories for brevity.

Note.

The root directory is denoted by \ (backslash). Each disk has only one root directory, and it cannot be removed by software.

The naming rules for non-root directories are the same as for files, but extensions are usually not used.

All modern operating systems provide for the creation of a file system designed to store data on external media and provide access to it.

For media with a small number of files (up to several dozen), a single-level file system can be used, in which the root directory is a linear sequence of file names. Such a directory can be compared to the table of contents of a children's book, which contains only the titles of individual stories.

If hundreds of files are stored on a medium, then for convenience a multilevel hierarchical file system with a tree structure is used. Such a system can be compared, for example, to the table of contents of a book containing sections, chapters, subsections, and paragraphs.

Examples of file systems used in PCs are FAT-16, FAT-32, NTFS (New Technology File System), and others.

Each disk has its own file structure, which is formed according to the following rules:

Files can have the same name in different directories, but the file names must be different in the same directory;

There are no restrictions on the order of files and directories in a directory;

The depth of directory nesting is limited by the maximum number of characters in the directory path.

The directories of the OS form a hierarchical structure called the directory tree, in which the main directory forms the “root” of the tree (hence its second name, “root directory”) and the remaining directories are like branches.

Note.

If files and/or subdirectories are combined into a directory, they are said to be included (nested) in that directory. However, this association does not mean that they are grouped in one place on the disk in any way.

When accessing a file, you must specify the path to it according to a specification with the following format:

[device] [directory path] filename [.type]

Square brackets mean that the corresponding part of the format can be omitted. The device part denotes the medium on which the file is located or to which it is written. If no medium is specified, the current device is used by default. The directory path specifies a route from the root or current directory to the file.

Definition. The current device is the device (medium) with which the user is working at present. Its name is the default value for the device name in the file specification.

Note.

Directory names in the path are separated by “\”. If the path begins with “\”, the file search starts from the root directory. If the path is omitted, the current directory is implied.

Definition. The current directory is the directory that is currently open on the current device. Sometimes the term working directory is used, meaning the current directory of the current device. Its name is the default value for the directory name in the file specification.

A hard magnetic disk can be divided by software into several parts that can be worked with as separate disks. These parts are called logical disks or partitions, each of which, like a separate device, is assigned a name in the form of a Latin letter followed by “:”. As a rule, the floppy disk drive (FDD) is named A:, and the hard disk (HDD) partitions are named starting with C:. Other external memory devices in the PC (CD-ROM, streamer, etc.) are given the letters that follow the name of the last hard disk partition in alphabetical order. The logical disk (or device) from which the operating system is loaded is called the system disk.
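A file specification of this form can be taken apart with Python's ntpath module, which implements DOS/Windows path rules on any platform. This is an illustrative sketch; the function name and the sample paths are invented.

```python
import ntpath


def parse_specification(spec):
    """Split a specification into (device, directory path, file name, type)."""
    device, rest = ntpath.splitdrive(spec)       # "C:" or "" if omitted
    directory, filename = ntpath.split(rest)     # path part and last component
    name, extension = ntpath.splitext(filename)  # extension keeps its period
    return device, directory, name, extension.lstrip(".")


print(parse_specification("C:\\DOCS\\REPORT.TXT"))  # ('C:', '\\DOCS', 'REPORT', 'TXT')
print(parse_specification("REPORT.TXT"))            # ('', '', 'REPORT', 'TXT')
```

An empty device or directory part in the result corresponds to the defaults described above: the current device and the current directory.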
