Uniform Resource Identifier (uri), its purpose and components

1.4. Uniform Resource Identifier (URI)

To fully understand how HTML documents interact, navigate between pages, and from where the user's computer receives data when working with the network, you need to consider how and what is accessed using the Global Network.

Many types of resources hosted on the Internet, whether they are HTML documents, pictures, or archive files, are most often files on the hard drive of a computer (server) connected to a network. Each resource is associated with a value that can uniquely determine its location - the universal resource identifier or URI (Universal Resource Identifier). URIs are widely used both when a user accesses a resource on his own (when, for example, the user himself enters the URI in the address bar of the browser), and when navigating between web pages. URIs are also used in an HTML document to tell the browser where to look for resources (such as pictures) used in the document itself.

Note

The notation URL is also often used in the literature. It should be noted that a URI is a more general concept that includes a URL: any URL is a Uniform Resource Identifier and follows the same rules as a URI.

The resource identifier URI consists of three parts: the name of the mechanism for accessing the resource, the domain name of the computer, and the file path of the resource. To clarify this, consider an example:

Here you can see that HTTP (Hyper Text Transfer Protocol) is used to access the resource, which in this case is an HTML document. The resource is stored on a computer with the domain name somesite.com in the ex_1.html file located in the /info/examples folder.

URIs can also refer to parts of HTML documents, for example:

Using this URI, you can access the portion of an HTML document named description (how to create names for fragments of HTML documents will be discussed in Chapter 5).

URIs also allow you to refer to resources within the same computer. This specifies the relative path of the resource. For example, to refer to the file /info/files/file1.jpg from an HTML document located in the /info/examples folder, it is enough to specify the URI /files/file1.jpg. In HTML documents, such links indicate the paths of figures and other objects used in documents, but not directly stored in them.

In general, URIs are considered to be case-insensitive. However, to be completely sure that the URI is interpreted correctly, still pay attention to the case of characters in the URI of hyperlinks, images, etc. This is useful for eliminating situations where, for example, when the site is running on a Windows computer, all hyperlinks work, but site on a UNIX server refuse to work (in UNIX, file names are case sensitive).

Working with URIs

Every day we use Uniform Resource Identifiers (URIs) when looking for something on the WWW. URIs are needed to identify and request a new kind of resource. Using a URI, you can access not only Web pages, but also an FTP server, a Web service, and local files.

The term is often used instead of URI Uniform Resource Locator (URL). URI is a generic term used to refer to resources. A URL is a URI associated with popular URI schemes such as http, ftp, and mailto. In technical documentation, the term URL is no longer used.

Another term you may already know - Uniform Resource Name (URN). A URN is a standardized URI used to identify a resource regardless of its location on the network.

Let's parse the parts of a URI that refers to a page on the Global Knowledge Web site:

http://www.globalknowledge.net:80/training/generic.asp?pageid=1078&country=DACH

    The first part of the URI is called scheme (scheme). A scheme defines a URI namespace and can narrow the syntax of an expression following the scheme. Many schemes are named after the respective protocols (like http, ftp) they use, but this is not mandatory. In our example, the schema identifier is http. Circuit Delimiter(// in this example) separates the schema from the rest of the URL.

    The schema delimiter is followed by the server name or IP address in dotted decimal notation, such as www.globalknowledge.net.

    Behind the server name or IP address is a port number that identifies the connection to a specific application on the server. If no port number is specified, the default port number for that protocol is used (for example, port 80 for HTTP).

    Way specifies the page (and directory) of the requested resource. It does not necessarily represent a physical file on the server, but can be created dynamically. In this case, the path is /training/generic.asp.

    From the path of the symbol? separated the last part of this URI, called query. In our example, the request is defined by the string pageid=1078&country=DACH. A query string can consist of several components, each of which specifies a variable and a value, combined with the & symbol. Multiple query components can be combined with the & symbol. So, in our example, the first component is pageid=1078 with the pageid variable set to 1078, and the second component is country=DACH.

    Sections within a resource can be identified with fragments. Fragments are used to link to sections within an HTML page. In Web page development, snippets are also called bookmarks. The # character separates the fragment identifier from the path. In the URL http;//www.microsoft.com/net/basics/glossary.asp#NETFramework, the fragment is the string #NETFramework.

If the # character is added to the query string, then it is no longer a fragment. A URL can contain a query string or a fragment, but not both.

Multiple characters are reserved in URIs - they cannot appear in hostnames or paths because they are special delimiter characters. The following characters are reserved in the URI:

; / ? : @ & = + $ ,

Uri class from the System namespace encapsulates the Uniform Resource Identifier. It contains properties and methods for parsing, comparing, and combining URIs.

You can create a Uri object by passing a URI string to the constructor:

Uri baseURI = new Uri("http://site");

If there is already a base Uri object, you can create a new URI by combining the base URI with a relative URI:

Uri baseURI = new Uri("http://site"); Uri newURI = new Uri(baseURI, "my/csharp/web/level2/2_2.php");

If the base URI already contains a path, it is ignored. The new URI is based only on the scheme, port, and server name.

The Uri class has several read-only static fields that allow you to get some of the commonly used schemes:

Uri.UriSchemeFile

The file scheme is used to access files locally or on shared network resources, which can be named according to the universal naming convention ( Universal Naming Convention, UNC).

Uri.UriSchemeFtp

The FTP protocol with the ftp scheme is used to receive files from an ftp server and, conversely, to put files on an ftp server.

Uri.UriSchemeGopher

The gopher protocol was the forerunner of HTTP. It provided hierarchical browsing capabilities text information about content, in which FTP excelled. But it was soon replaced by the HTTP protocol.

Uri.UriSchemeHttp, Uri.UriSchemeHttps

These two schemes are well known: http and https. The https scheme is used for secure exchange.

Uri.UriSchemeMailto

The mailto scheme is used to send mail messages.

Uri.UriSchemeNews, Uri.UriSchemeNntp

The news and nntp schemas are used in newsgroups that use the NNTP protocol.

The Uri class has static methods for checking the correct scheme and hostname: Uri.CheckSchemeName() returns true if the scheme name is correct and the method UriCheckHostName() not only checks the host name, but also returns a UriHostNameType enumeration value indicating the host type.

The Uri class has many read-only properties that allow you to access all parts of a URI. In the following table, we use the above URI as an example to demonstrate the use of properties:

AbsoluteUri This property shows the full URI. If the specified port number for the protocol is equal to the default port number, the Uri constructor automatically removes it. For our example, the value of the AbsoluteUri property looks like this: http://www.globalknowledge.net/t raining/generic.asp?pageid=1078&country=DACH. If a filename is passed to the constructor of the Uri class, the AbsoluteUri property automatically precedes the filename with the file:// scheme.
Scheme The scheme is the first part of the URI, and in this case this property returns the value http.
Host The Host property shows the hostname from the URI: www.globalknowledge.net
Authority If the port number is equal to the default number used by the protocol, the Authority property shows the same string as the Host property. If a different port number is used, then the Authority property also shows the port number.
hostnametype The host name type depends on the name used. In this case, the same value of the UriHostNameType enumeration that was discussed above is obtained.
port Using the Port property, the port number is obtained - 80.
AbsolutePath An absolute path starts after the port number in the URI and ends before the query string. In this case, it is /training/generic.asp.
localpath The local path gives the value /training/generic.asp. As you can see, for an HTTP request, there is no difference between AbsolutePath and LocalPath. The difference appears if the URI refers to a shared network resource. For a URI of the form file:\\server\share\directory\file.txt, the LocalPath property returns only the directory and file names, while the AbsolutePath property includes the server and share names.
Query The Query property shows the string following the path: ?pageid=1078&country=DACH.
PathAndQuery The PathAndQuery property gives the combination of the path and the query string: /training/generic.asp?pageid=1078&country=DACH.
Fragment If the path is followed by a fragment, it is returned in the Fragment property. The path can only be followed by a query string or a fragment. The fragment is identified by the symbol #
segments The Segments property returns an array of strings formed from the path. In this case, we have three segments: /, training/ and generic.asp.
UserInfo The username set in the URI can be read from the UserInfo property. Passing usernames is common in the FTP protocol, and if a non-anonymous user is specified, such as ftp:// [email protected], then the UserInfo property will return myuser.

In addition to those listed, there are several other properties that return boolean values ​​if the URI represents a file, a UNC path, a loopback address, or if the protocol uses the default port number. These are the IsFile, IsUnc, IsLoopback and IsDefaultPort properties, respectively.

To access any network resources, you need to know where they are located and how they can be accessed. The World Wide Web uses a standardized addressing and identification scheme, taking into account the addressing and identification experience of e-mail, Gopher, WAIS, telnet, ftp, and the like. - URL, Uniform Resource Locator.

URI(Uniform Resource Identifier, Uniform Resource Identifier) ​​(RFC 2396, August 1998) A compact character string for identifying an abstract or physical resource. A resource is any object belonging to some space. Includes and overrides previously defined URLs (RFC 1738/RFC 1808) and URNs (RFC 2141, RFC 2611).

The URI is intended to uniquely identify any resource.

Some subsets of URIs:

URN(Uniform Resource Name) - A private "urn:" URI scheme with a "namespace" subset that must be unique and unchanged even if the resource no longer exists or is not available.

It is assumed that, for example, the browser knows where to look for this resource.

Syntax:

urn:namespace: data1.data2,more-data where namespace specifies how the data after the second ":" is used.

URN example:

urn:ISBN: 0-395-36341-6

ISBN - thematic classifier for publishers

0-395-36341-6 - specific number of the subject of the book or magazine



Upon receiving a URN, the client program accesses the ISBN (the "topic classifier for publishers" directory on the Internet). And he receives a decoding of the subject number "0-395-36341-6" (for example: "quantum chemistry").

URN is widely used in P2P networks (like edonkey).

An example URN pointing to an Adobe Photoshop v8.0 disk image on the edonkey network:

urn:ed2k://|file|AdobePhotoshopv8.0.iso|940769280|b34c101c90b6dedb4071094cb1b9f2d3|/

ed2k - points to a network

Adobe Photoshop v8.0.iso - file name

940769280 - size in bytes

- file identifier (calculated using a hash function)

Uniform Resource Locator URL:

URL(Uniform Resource Locator, RFC 1738) - a unified resource locator (pointer), a standardized way to record a resource address in the WWW and the Internet. The URL has a flexible and extensible structure for the most natural location of resources on the network, which identifies the resource by the way it is accessed (for example, its "location on the network"), instead of identifying it by the name or other attributes of this resource.

URL examples:

http://www.ipm.kstu.ru/index.php

ftp://www.ipm.kstu.ru/

A limited set of ASCII characters is used to represent an address.

The general form of the address can be represented as follows:

<схема>://<логин>:<пароль>@<хост>:<порт>/<полный-путь-к-ресурсу >

resource access schema: http, ftp, gopher, mailto, news, telnet, file, man, info, whatis, ldap, wais, etc.

Login: Password- username and password used to access the resource

host the domain name of the host or its IP address.

Port- host port to connect

full-path-to-resource - specifying information about the location of the resource (depends on the protocol).

URL examples:

http://example.com #default start page request

http://www.example.com/site/map.html #request the given page in the given directory

http://example.com:81/script.php #connect to a non-standard port

http://example.org/script.php?key=value #request with parameter passing to the script

ftp://user: [email protected]#connect to ftp server with authorization

http://192.168.0.1/example/www #connect to network address

file:///srv/www/htdocs/index.html #opening local file

gopher://example.com/1 #connect to gopher server

URL - Uniform Resource Locators explicitly describes how to get to the object.

The advent of URLs has become a significant innovation on the Internet. However, from its inception to the present day, the URL standard has a serious drawback - it can only use a limited set of characters, even smaller than in ASCII: letters, numbers and only some punctuation marks- .

If we want to use Cyrillic characters, or hieroglyphs, or, say, specific French characters in the URL, then the characters we need must be recoded in a special way.

In the Russian-language Wikipedia, one sees examples of URL encoding every day, since the Russian language uses Cyrillic characters. For example, a line like:

http://en.wikipedia.org/wiki/Microcredit

encoded in the URL as:

http://ru.wikipedia.org/wiki/%D0%9C%D0%B8%D0%BA%D1%80%D0%BE%D0%BA%D1%80%D0%B5%D0%B4%D0 %B8%D1%82

Such a conversion occurs in two stages: first, each Cyrillic character is encoded in Unicode (UTF-8) into a sequence of two bytes, and then each byte of this sequence is written in hexadecimal representation:

M → D0 and 9C → %D0%9C

and → D0 and B8 → %D0%B8

to → D0 and BA → %D0%BA

p → D1 and 80 → %D1%80, etc.

Each such hexadecimal byte code, according to the URL specification, is preceded by a percent sign (%) - hence even the English term "percent-encoding" originated, denoting the way characters are encoded in URLs and URIs.

Since the letters of all alphabets are subjected to such a transformation, except for the basic Latin alphabet, a URL with words in the vast majority of languages ​​(except English, Italian, Latin) can become unreadable for a person.

All this is in conflict with the principle of internationalism proclaimed by all the leading organizations of the Internet, including the W3C and ISOC. The IRI (International Resource Identifier) ​​standard is designed to solve this problem - international resource identifiers in which Unicode characters could be used without problems, and which therefore would not infringe on the rights of other languages.

Other URL schemes

HTTP Scheme.

The scheme specifies its identifier, machine address, TCP port, path in the server directory, variables and their values, label.

Syntax:

http://[ [:@][:][?]]

http - schema name

user - username

host - host name

port - port number

query(<имя-поля>=<значение>{&<имя-поля>=<значение>) - query string

Defined in RFC 2068. By default, port=80.

Examples:
http://ipm.kstu.ru/internet/index.php

This is the most common type of URI used in WWW documents. The schema name (http) is followed by a path consisting of the domain address of the machine and the full address of the HTML document in the HTTP server tree.

An IP address can also be used as a machine address:

http://195.208.44.20/internet/index.php

If the HTTP protocol server is running on something other than 80 TCP port, then this is reflected in the address:

http://195.208.44.20:8080/internet/index.php

http://195.208.44.20/internet/index.php#metka1
The "#" character separates the document name from the label name.

Variables and their values ​​are passed as follows:
http://ipm.kstu.ru/internet/index.php?var1=value1&vard2=value2

The values ​​"var1" and "var2" are variable names, and "value1" and "value2" are their values.

FTP scheme

This scheme allows you to address FTP file archives.

Syntax:

ftp://[ [:@][:]

ftp - schema name

user - username

password - user password

host - host name

port - port number

url-path - the path to the file and the file itself

Defined in RFC 1738. By default, port=21, user=anonymous, password=email address, if a name is specified but no password, it is requested in the dialog.

looks like:

//...//[;type= ], where :

Examples: ftp://ipm.kstu.ru/students/name/

To specify a username and password, you need to write like this:
ftp://name: [email protected]://ipm.kstu.ru/students/name/

In this case, these parameters are separated from the machine address by the "@" symbol, and from each other by a colon.

Mailto schema

This scheme is for sending mail.

Syntax:

mailto:[ {,,...}][?]

mailto - schema name

e-mail-1 ( @) - first email address

user - username

host - host name

e-mail-2 - second email address

query(<имя-поля-заголовка>=<значение>{&<имя-поля-заголовка>=<значение>) - query string

mailto: [email protected]

In this schema, fields and their values ​​are passed:

mailto: [email protected]?subject=Subject_of_the_mail&body=Text_to_be_embedded_in_the_mail

The recipient's address can also be written as the value of the to field:

mailto: [email protected]?subject=Subject_of_the_mail&body=Text_to_be_embedded_in_the_mail

What is HTTP?

The first document (but not a standard) is RFC1945 (Hypertext Transfer Protocol -- HTTP/1.0 T. Berners-Lee, R. Fielding, H. Frystyk May 1996)

Latest version is RFC2616 (Hypertext Transfer Protocol -- HTTP/1.1 R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee June 1999)

Hypertext Transfer Protocol is a hypertext transfer protocol, a high-level (namely, application-level) protocol. Used by the WWW service to transfer Web pages.

HTTP (HyperText Transfer Protocol, RFC 2616, current version HTTP/1.1) is a hypertext transfer protocol. This protocol was originally intended for the exchange of hypertext documents, now its capabilities have been significantly expanded (in particular, streaming support features have been added).

HTTP is a typical client-server protocol, messages are exchanged according to the "request-response" scheme in the form of ASCII commands. A feature of the HTTP protocol is the ability to specify in the request and response how the same resource is represented by various parameters: format, encoding, language, etc. It is thanks to the ability to specify the message encoding method that the client and server can exchange binary data, although this protocol is text.

HTTP is an application layer protocol but is also used as a "transport" for other application protocols such as SOAP, XML-RPC, WebDAV.

The HTTP protocol defines a request-response method of interaction between a client program and a server program within the technology. world wide Web.

To load a web page into a client browser, it sends a message installed on the server computer special program, called http-server, the corresponding request and processes the data received from it. In this case, the functions of the browser are to request a certain page from the server, get it and display it on the user's screen. The server, on the other hand, receives the request, searches for the requested document, and gives the client either the contents of the found file or an error message if such a file was not found or access to it was denied for some reason. An important point for understanding this process is that the http server does not parse the content of the transmitted document. Roughly speaking, the http-server does not care what is inside the requested file, it only transfers it to the browser, and the latter takes all the work of structuring and displaying the information received.

The search for the requested page is carried out in a certain directory, which is allocated on the server computer for this site - a link to this directory is present in the address entered by the user. In the case when the request is made not to a specific document, but to the site as a whole, the http server automatically substitutes the so-called "start page" instead of the name of the transferred file, which is named index.htm or index.html (in some cases - default. htm or default.html). This document must be located in the root directory designated for hosting your site, or, if otherwise specified, in a directory called WWW. All other files can be placed either in the same directory or in nested directories, which is sometimes convenient, especially when the site contains several thematic sections or headings.

In addition to the subfolders you create, in which you are free to place almost any content you need, the server directory usually contains several more directories that should be mentioned separately. Firstly, this is the CGI-BIN folder, where CGI scripts and other interactive applications launched from your site are located, as well as several service directories necessary for the normal operation of the server. At the initial stage, they simply should not be paid attention to. Sometimes in the same directory where index.html is stored, there is a row additional files: not_found.html - the document that is displayed if the http server could not find the file requested by the user, forbidden.html - displayed as an error message if access to the requested document is denied, and finally robots.txt - the file , which specifically describes the rules for indexing your site by search engines.

In most cases, and especially when publishing a home page on servers that provide free hosting, users are denied access to service directories and the CGI-BIN folder, and changing the contents of not_found and forbidden.html files is also impossible. This should be taken into account if you plan to include any interactive content in your resource that requires at least the ability to place files in one of the service folders. In some cases, you may not be allowed to create nested directories on the server, in which case the user will have to be content with only one directory reserved for your needs.

From the foregoing, it becomes clear that the client browser can only receive and process information from the server, and place and modify it only if the upload of files to the server is implemented based on the HTTP protocol using special CGI scripts included in the server web. -interface. In all other cases, you have to use the so-called ftp-server, to which you can transfer the necessary files using special software, automatically uploading them to the directory designated for your site. In both cases, you will need to know your login name and password to access the system. It should also be remembered that most server programs (in particular, Apache for UNIX-compatible platforms) distinguish between lowercase and uppercase characters, so all file names and their extensions should be written in lowercase letters to avoid errors, and always in Latin. The latter is due to differences in the processing of Russian language encodings, which are typical for certain servers.

The HTTP protocol works as follows: the client program establishes a TCP connection with the server (standard port number is 80) and issues an HTTP request to it. The server processes this request and issues an HTTP response to the client.

The interaction between the client and the Web server is carried out by exchanging messages. HTTP messages are divided into client-to-server requests and server-client responses.

Request and response messages have a common format. Both message types look like this: first comes a start-line, then possibly one or more header fields, also called headers, then an empty line (that is, a line consisting of CR and LF characters), indicating the end of the header fields, and then possibly the body of the message:

initial string

header field 1

header field 2

header field N

message body

HTTP headers

The format of the initial string of the client and server differ and will be discussed below. There are four types of headers:

General headers (general-headers), which can be present both in the request and in the response;

Request headers (request-headers), which can only be present in the request;

Response headers (response-headers), which can only be present in the response;

Entity-headers that refer to the message body and describe its content.

Each heading consists of a title, a colon character ":" and a value. The most important headings are shown in Table 1.

Table 1

HTTP headers

header Purpose
Object Titles
allow Lists the methods supported by the server
content-encoding The way the body of the message is encoded, for example to reduce the size
content length Message length in bytes
content-type Contains the MIME content type designation of the response. Depending on the Content-Type value, the browser treats the response as an HTML page, gif image or jpeg, as a file to be saved to disk, or whatever, and takes the appropriate action. Some content types: text/html - text in HTML format (web page); text/plain - plain text (similar to "notepad"); image/jpeg - image in JPEG format; image/gif - the same, in GIF format; It can also pass an encoding for text data. For example: charset=windows-1251 charset=koi8-rus Content-Length - length of response content in bytes (file size). Last-Modified - date and time when the document was last modified.
ETag A unique resource tag on the server that allows you to compare resources
Expires Date and time when the resource on the server will be changed, and it needs to be retrieved again
Last Modified Date and time the content was last modified
Response headers
Age Number of seconds to retry the request to get new content
location The URI of the resource to access to get the content
Retry-After Date and time or number of seconds after which the request must be retried to receive a successful response
server The name of the server software that sent the response
Request headers
Accept A list of content types supported by the browser in order of preference by this browser, for example: Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd. ms-powerpoint, */* This is obviously necessary for the case when the server can issue the same document in different formats. The value of this parameter is mainly used by CGI scripts to generate a response adapted for a given browser.
Accept Charset Character encodings in which the client can accept text content
Accept Encoding The way the server can encode the message
Host Host and port number from which the document is requested
If-Modified-Since If-Match If-None-Match If-Range If-Unmodified-Since Request headers for conditionally accessing a resource
Range Request a part of a document
User agent The name of the client software - the value is the "code name" of the browser, for example: Mozilla/4.0 (compatible; MSIE 5.0; Windows 95; DigExt)
General headings
connection Connection (connection) - can take the values ​​Keep-Alive and close. Keep-Alive ("keep alive") means that after the issuance of this document, the connection with the server is not interrupted, and more requests can be issued. Most browsers work in Keep-Alive mode, since it allows you to "download" an html page and pictures for it in one connection to the server. Once set, Keep-Alive persists until the first error or explicitly specified in the next Connection: close request. close - The connection is closed after the response to this request.
Date Date and time the message was generated
pragma Special, implementation-specific commands regarding the content being transferred
Transfer Encoding How the message is encoded during transmission

In some headers, the value is a date and time. They must be in the format described in RFC 1123, for example:

The body of the message contains the actual transmitted information - the payload of the message. The message body is a sequence of octets (bytes). The body of the message may be encoded, with the encoding method specified in the header of the Content-Encoding object.

A request message from a client to a server consists of a request-line, headers (general, requests, object), and optionally a message body.

The request string starts with a method, followed by the requested resource ID, protocol version, and trailing end-of-line characters:

<Метод> <Идентификатор> <Версия HTTP>

The method specifies the method to apply to the requested resource. For example, the GET method says that the client wants to get the contents of the resource. The identifier identifies the requested resource. The HTTP version is indicated by a string like this:

http/<версия>.<подверсия>

HTTP protocol methods

Consider the basic methods of the HTTP protocol.

The OPTIONS method requests information about the connection options (eg, methods, document types, encodings) that the server supports for the requested resource. This method allows the client to determine the options and/or requirements associated with the resource, or the capabilities of the server, without taking any action on the resource or initiating a download of it.

If the server's response is not an error message, then the entity headers contain information that can be thought of as connection options. For example, the Allow header lists all methods supported by the server for a given resource.

If the requested resource identifier is an asterisk ("*"), then the OPTIONS request is intended to address the server as a whole.

If the requested resource identifier is not an asterisk, then the OPTIONS request applies to the options that are available when connecting to the specified resource.

The GET method allows you to get any information related to the requested resource. In most cases, if the requested resource identifier points to a document (for example, Text Document, graphic image, video), then the server returns the contents of this document (the contents of the file). If the requested resource is a data-generating application (program), then generated data is returned in the body of the response message rather than a binary image of the executable. This is used, for example, when creating CGI applications. If the identifier of the requested resource points to a directory (directory, folder), then, depending on the server settings, either the contents of the directory (list of files) or the contents of one of the files located in this directory (usually index.html or default.htm). In the latter case, the folder name can be specified either with or without the "/" character at the end. In the absence of this symbol at the end of the identifier, the server issues one of the redirect responses (with status codes 301 or 302).

A distinction is made between "conditional GET", in which the request message includes the If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range request headers. The conditional GET method requests the transfer of an object only if it satisfies the conditions described in the provided headers. The conditional GET method is intended to reduce unnecessary network load, since it allows not to reload data already saved by the client.

A distinction is also made between "partial GET", in which the request message includes a Range request header. A partial GET requests the transfer of only part of an object. The partial GET method is intended to reduce unnecessary network load by requesting only part of an object when the other part has already been loaded by the client. The value of the Range header is the range of bytes to be received. Bytes are numbered from 0. The start and end bytes of a range are separated by a "-" character. If you need to get several ranges, then they are listed separated by commas.

The HEAD method is identical to GET, except that the server does not return a message body in the response. The meta-information contained in the HTTP headers of a response to a HEAD request is identical to the information provided in response to a GET request. This method can be used to obtain information about the request object without passing the object body directly. The HEAD method is often used to test hypertext links.

The POST method is used for a request in which the addressed server takes the data included in the message body (object) of the request and sends it for processing to the application specified as the requested resource. POST is designed to be a common method to implement the following functions:

Annotation of existing resources;

Posting a message to a bulletin board (BBS), newsgroups, mailing lists, or similar group of articles;

Passing a block of data, such as the result of an input in a form, to a processing process;

Execution of queries to databases (DB);

In fact, the function performed by the POST method is determined by the application pointed to by the identifier of the requested resource. Along with the GET method, the POST method is used when creating CGI applications. The browser can generate requests with the POST method when submitting forms. To do this, the FORM element of the HTML document containing the form must have a METHOD attribute with a value of POST.

An action performed by the POST method may perform an action on the server and not pass any content as a result of the operation. In this case, depending on whether the response includes a message body describing the result or not, the response status code can be either 200 (OK) or 204 (No Content).

If the resource on the server has been created, the response contains a 201 (Created) status code and includes a Location response header.

The body of the message that is passed in the request with the PUT method is stored on the server, and the identifier of the requested resource will be the identifier of the saved document. If the identifier of the requested resource points to an already existing resource, then the object included in the body of the message is treated as a modified version of the resource located on the server. If a new resource is created, then the server informs the user agent about it by means of a response with a status code of 201 (Created, Created).

The fundamental difference between the POST and PUT methods is different meaning the identifier of the requested resource. The URI in a POST request identifies the resource that handles the object included in the body of the message. This resource can be an application that receives the data. In contrast, a URI in a PUT request identifies the entity included in the request as the body of the message, that is, the user agent assigns that URI to the included resource.

The DELETE method asks the server to delete the resource that has the requested identifier. A request with this method may be rejected by the server if the user does not have permission to delete the requested resource.

The TRACE method is used to return a passed request at the HTTP protocol layer. The recipient of the request (the Web server) sends the received message back to the client as a response object body with a status code of 200 (OK). A TRACE request must not contain a message body.

TRACE allows the client to see what the server is receiving at the other end and use that information for testing or diagnostics.

If the request is successful, then the response contains the entire request message in the body of the response message, and the Content-Type object header is set to "message/http".

Response codes

After receiving and interpreting the request message, the server responds with an HTTP response message.

The first line of the response is the Status-Line. It consists of the protocol version, a numeric status code, an explanatory phrase, separated by spaces, and trailing end-of-line characters:

<Версия HTTP> <Код состояния> <Поясняющая фраза>

The protocol version has the same value as in the request.

The Status-Code element is an integer three-digit (three-digit) code of the result of understanding and satisfying the request. The Reason-Phrase is a short textual description of the status code. The status code is for software processing and the explanatory phrase is for users.

The first digit of the status code determines the class of the response. The last two digits have no specific role in the classification. There are 5 values ​​for the first digit:

1xx: Information codes - request received, processing continues.

2xx: Success codes - The action was successfully received, understood and processed.

3xx: Redirect Codes - Further action must be taken to complete the request.

4xx: Client Error Codes - The request has a syntax error or cannot be completed.

5xx: Server Error Codes - The server is unable to fulfill a valid request.

Reason-Phrases for each status code are listed in RFC 2068 and are recommended but may be replaced with equivalent ones without affecting the protocol. For example, in localized Russian-language versions of HTTP servers, these phrases are replaced by Russian ones. Table 2 shows the HTTP server response codes.

table 2

HTTP Server Response Codes

The code Explanatory phrase according to RFC 2068 Equivalent explanatory phrase in Russian
1xx: Information codes
Continue Continue
2xx: Success codes
OK OK
Created Created
no content No content
Reset content Reset content
Partial content Partial content
3xx: Redirect codes
Moved Temporarily Temporarily relocated
Not Modified Not modified
4xx: Client error codes
Bad Request Broken Request
Unauthorized Unauthorized
not found Not found
Method Not Allowed Method not allowed
Request Timeout Request timed out
Conflict Conflict
Length Required Required Length
Request Entity Too Large Request object is too big
5xx: Server error codes
Internal Server Error Internal Server Error
Not Implemented Not implemented
Service Unavailable Service is unavailable
HTTP Version Not Supported Unsupported version of HTTP

The status line is followed by the headers (general, response, and object) and optionally the body of the message.

One of the most important functions of a Web server is to provide access to a portion of the local file system. To do this, a certain directory is specified in the server settings, which is the root for this server. To publish a document, that is, to make it available to users, who "visited" this server (having made a connection with it via HTTP), need to copy this document to the root directory of the Web server or to one of its subdirectories. When connecting via the HTTP protocol, a process is created on the server with user rights, which, as a rule, does not exist in reality, but is specially created to view server resources. Setting Rights and Permissions this user You can control access to Web resources.

Consider the simplest example HTTP request. If we type the address http://yandex.ru in the address window of the browser, the browser will determine the IP address of the yandex.ru server and send it the following HTTP request on the 80th port:

GET http://yandex.ru/ HTTP/1.0

Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*

Accept-Language: en

Cookie: yandexuid=2464977781018373381

User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
Host: yandex.ru

Referer: narod.ru

Proxy Connection: Keep Alive

The request is sent in unencrypted text form. The most important part of the request is located in the first line: This is the request type (GET), the URL of the requested document (http://yandex.ru) and the version of the HTTP protocol (HTTP/1.0). The query parameters are listed below. Each line corresponds to one parameter. The line starts with the name of the parameter, followed by a colon and the value of the parameter.

Accept - the type of data that the browser can accept (in MIME encoding).

Accept-Language is the preferred language in which the browser wants to accept data. User-Agent - the type of program that sent the request.

Host - DNS (or IP) name of the host to which the request is addressed.

Cookie - cookies (data that was stored by the server on the client's local disk when visiting this host last time).

Referer - the host from whose page we are sending the request. So, for example, if we are on the http://narod.ru page and click on the http://yandex.ru link there, then the request will be sent to the yandex.ru host, and the referer request field will contain the host name narod.ru.

The set of query parameters is not fixed. In addition to the above, there may be other parameters.

The most interesting parameters are referer and cookie. These parameters are mainly used to identify the user by the server.

The GET request may contain data passed from the client to the server. They are transmitted directly via a URL using the CGI protocol. The data is separated from the URL by a “?” and are connected with the sign “&”:

GET ?<параметр 1>=<значение 1>&<параметр 2>=<значение 2>&…

This type of data transfer to the server is convenient, but has limitations on the volume. Too large data arrays cannot be transmitted via URL. For such purposes, there is another type of request: a POST request. A POST request is very similar to a GET request, with the only difference being that the data in the POST request is passed separately from the request header itself:

The request body must be separated from the header by an empty line. If the server encounters an empty string in a POST request, then everything that follows is considered the body of the request (transmitted data). Note the following: the format of the data in the body of the POST request is arbitrary. Although the CGI format is most commonly used, it is not required. In addition, a POST request does not require a request body, and can pass data through a URL as well.

In addition to the CGI format, sometimes to transfer large amounts of information (for example, files) use the so-called. multipart format (the format of the transmitted data is determined by the Content-Type parameter):

Modern browsers contain tools for web developers to get some information about the post requests that are being sent. If you need to look at the headers of just a couple of requests, using them will be easier and faster than other methods.

If you are using Firefox, you can use its web console. It displays the request headers and the content of the transmitted cookies. To launch it, open the browser menu, click on the "Web Development" item and select "Web Console". In the panel that appears, activate the "Network" button. Enter the method name in the filter field - post. Depending on your goals, click on the form button that sends the required request or refresh the page. The console will display the submitted request. Click on it with your mouse to see more details.

Google Browser Chrome has powerful debugging tools. To use them, click on the wrench icon, and then expand the "Customize and control Google Chrome" item. Select "Tools" and launch "Developer Tools". On the toolbar, select the Network tab and submit the request. Find the required request in the list and click on it to view the details.

IN Opera browser there are built-in tools for Opera Dragonfly developers. To launch them, right-click on the desired page and select the "Inspect Element" item from the context menu. Go to the "Network" tab of the developer tools and submit the desired request. Find it in the list and expand it to examine the server's headers and responses.

Internet Explorer 9 contains a suite called "F12 Developer Tools" that provides detailed information on completed requests. They are launched by pressing the F12 button or using the "Tools" menu containing the item of the same name. To view the request, go to the "Network" tab. Find a given query in the summary and double-click to expand the details.

Chrome browsers and Internet Explorer 9 contain built-in tools that allow you to examine the submitted post request in detail. For full details, use them or Firefox with the Firebug plugin installed. It is very handy for frequently examining queries, for example, when debugging websites.

If you want to see a request sent by a program other than the browser, use Fiddler's HTTP debugger. It works as a proxy server and intercepts requests from any program, and provides very detailed information on their headers and content.

URI (Uniform Resource Identifier) is a uniform (uniform) resource identifier. URI - a character string that allows you to identify any resource: a document, image, file, service, e-mail box, etc. First of all, we are talking, of course, about the resources of the Internet and the World Wide Web. URI provides a simple and extensible way to identify resources. URI extensibility means that there are already several identification schemes within URIs, and more will be created in the future.

Relationship between URI, URL and URN

Venn diagram showing subsets of the URI scheme: URL and URN.

A URI is either a URL or a URN or both.

  • A URL is a URI that, in addition to identifying a resource, also provides information about the location of that resource.
  • A URN is a URI that only identifies a resource in a specific namespace (respectively, in a specific context), but does not specify its location. For example, the URN urn:ISBN:0-395-36341-1 is a URI that points to resource (book) 0-395-36341-1 in the ISBN namespace, but unlike a URL, a URN does not point to the location of that resource: it does not say in which store it can be bought or on which site to download.

Since a URI does not always indicate how to get a resource, unlike a URL, but only identifies it, this makes it possible to describe using RDF (Resource Description Framework) resources that cannot be obtained via the Internet (for example, a person, a car, city, etc.).

History

In 1990, in Geneva, Switzerland, within the walls of the European Council for Nuclear Research, the URL resource locator was invented by British scientist Tim Berners-Lee. Since the URL is the most used subset of the URI, the same year 1990 is considered to be the birth year of the URI. But strictly speaking, the URI concept was only documented in June 1994 in RFC 1630.

A new version of the URI was defined in 1998 in RFC 2396, at the same time the word Universal name was changed to Uniform.

disadvantages

The URL was a fundamental innovation on the Internet, so URI principles were documented to be fully compatible with URLs. This is where the big drawback of URI came from, which came as a legacy from the URL. URIs, like URLs, can only use a limited set of Latin characters and punctuation marks (even smaller than in ASCII). In other words, if we want to use Cyrillic characters, or hieroglyphs, or, say, specific characters of the French language, in the URI, then we will have to encode the URI in the same way that URLs with Unicode characters are encoded in Wikipedia. For example, a line like:

https://en.wikipedia.org/wiki/Cyrillic

encoded in the URL as:

https://ru.wikipedia.org/wiki/%D0%9A%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0

Since letters of all alphabets are subjected to such a transformation, except for the Latin alphabet used in English, URIs with words in other languages ​​(even European ones) lose their ability to be perceived by people. And this is in gross contradiction with the principle of internationalism proclaimed by all the leading organizations of the Internet, including the W3C and ISOC. This problem is intended to be solved by the IRI standard. Internationalized Resource Identifier) are international resource identifiers in which Unicode characters could be used without problems, and which would not infringe on the rights of other languages. Also, the creator of the URI, Tim Berners-Lee, said that the domain name system underlying the URL is bad decision, which imposes a hierarchical architecture on resources that is not suitable for the hypertext web.

URI structure

URI = [scheme ":"] hierarchical - part [ "?" request ] [ "#" fragment ]

In this entry:

Scheme

resource access scheme (often indicates a network protocol), e.g. http, ftp, file, ldap, mailto, urn

hierarchical-part

contains data, usually organized in a hierarchical form, which, together with data in a non-hierarchical component inquiry, are used to identify a resource within the scope of a URI scheme. Usually hierarchical-part contains the path to the resource (and possibly, before it, the address of the server on which it is located) or the resource identifier (in the case of a URN).

Inquiry

this optional URI component is described above.

Fragment

(also optional)

Allows you to indirectly identify a secondary resource by referring to the primary and specifying additional information. A secondary identifiable resource may be some part or subset of a primary, some representation of it, or another resource defined or described by such a resource.

Parsing the URI structure. For the so-called "parsing" URI (eng. parsing), that is, to decompose a URI into its component parts and then identify them, it is most convenient to use the regular expression system that is now available in almost all modern programming languages. RFC 3986 recommends using the following pattern to parse a URI:

This pattern includes the 9 groups indicated above by numbers (for more on patterns and groups, see Regular expressions), which most fully and accurately parse the typical structure of a URI, where:

  • group 2 - scheme,
  • group 4 - source,
  • group 5 - path,
  • group 7 - request,
  • group 9 - fragment.

Thus, if using this pattern to parse, for example, such a typical URI:

http://www.ics.uci.edu/pub/ietf/uri/#Related

then the 9 above pattern groups will produce the following results respectively:

  1. http:
  2. //www.ics.uci.edu
  3. www.ics.uci.edu
  4. /pub/ietf/uri/
  5. no result
  6. no result
  7. #Related
  8. related

URI examples:

Absolute URIs

  • https://ru.wikipedia.org/wiki/URI
  • ftp://ftp.is.co.za/rfc/rfc1808.txt
  • file://C:\UserName.HostName\Projects\Wikipedia_Articles\URI.xml
  • file:///C:/file.wsdl
  • file:///Users/John/Documents/Projects/Web/MyWebsite/about.html
  • ldap:///c=GB?objectClass?one
  • mailto: [email protected]
  • sip: [email protected]
  • news:comp.infosystems.www.servers.unix
  • data:text/plain;charset=iso-8859-7,%be%be%be
  • tel:+1-816-555-1212
  • telnet://192.0.2.16:80/
  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2

2) Relative URIs

  • /relative/URI/with/absolute/path/to/resource.txt
  • //example.org/scheme-relative/URI/with/absolute/path/to/resource.txt
  • relative/path/to/resource.txt
  • ../../../resource.txt
  • resource.txt
  • /resource.txt#frag01
  • #frag01

[empty string] - equivalent to parsing the identifier by the parser with the result [empty string], i.e. the link leads to the default object in the default schema

DNS Service

DNS - domain name system. DNS domain names are synonyms for an IP address, just like the names in your phone's address book are synonyms phone numbers. They are character, not numeric; they are more convenient for memorization and orientation; they carry meaning. www.irnet.ru → DNS tables →193.232.70.36 Domain names are also unique, i.e. There are no two identical domain names in the world. Domain names, unlike IP addresses, are optional, they are purchased separately.

Rice. 2. Hierarchy in the DNS system.

Also unique are the addresses that are indicated on envelopes when delivering letters by regular mail. There are no countries in the world with the same name. And if the names of cities are sometimes repeated, then in combination with the division into larger administrative units such as districts and regions, they become unique. And street names should not be repeated within the same city. Thus, an address based on geographic and administrative names uniquely identifies a destination. Domains have a similar hierarchy. Domain names are separated from each other by dots: lingvo.yandex.ru, krkime.com.

DNS has the following characteristics:

  • Distribution of administration. Different people or organizations are responsible for different parts of the hierarchical structure.
  • Distribution of information storage. Each network node must necessarily store only those data that are included in its area of ​​responsibility, and (possibly) addresses root DNS servers.
  • Information caching. Knot maybe store a certain amount of data not from their area of ​​​​responsibility to reduce the load on the network.
  • Hierarchical structure, in which all nodes are combined into a tree, and each node can either independently determine the work of downstream nodes, or delegate(transmit) them to other nodes.
  • Reservation. Several servers (usually) are responsible for storing and maintaining their nodes (zones), separated both physically and logically, which ensures the safety of data and the continuation of work even if one of the nodes fails.

Domain levels. There are three levels of domains.

Domains first or top level are divided into two groups:

1) These are domains with territorial affiliation, for example: .ru .by .ua .de .us, etc. That is, these are domains that are assigned to a particular country. Using them, you can, for example, determine which country a particular site belongs to.

2) The second group of first-level domains are domains of some specific purpose. For example: .com - for commercial organizations, .info - for informational sites, .tv - for television companies, etc. These domains can determine the specific focus of the site. Although, to tell the truth, in recent times they have been increasingly used for any purpose and often do not adhere to their purpose.

Top-level domains cannot be used as the address of your site. They serve to create domains second level , so you can register a second-level domain on any of the first-level domains. The second-level domain consists of the following elements: www.site_name.first-level domain. For example: www.webmastermix.ru. It is recommended to use second-level domain names for the website address. They are best read and remembered by people, as well as perceived by search engines. Therefore, most sites have domain names of this particular level.

In addition, there are domains third level . They are created on the basis of second-level domains. The third-level domain looks like this: www.forum.webmastermix.ru. By registering a second-level domain, you can independently create as many third-level domains on its basis as you like. You can register a domain name for your site using special services.

WEB TECHNOLOGIES: HTML, JAVASCRIPT

The first part of the didactic block above this topic was devoted to Internet technologies. Now we are starting to study the technologies used in the World Wide Web, or web technologies.

First you need to understand the basic concepts of web technologies: a website and a web page. A web page is the smallest logical unit of the World Wide Web, which is a document that is uniquely identified by a unique URL. A website is a set of thematically related web pages located on the same server and owned by the same owner. In a particular case, a website may be represented by a single web page. The World Wide Web is the collection of all websites.

The basis of the entire World Wide Web is the HTML hypertext markup language - Hyper Text Markup Language (Fig. 3). It serves for the logical (semantic) markup of a document (web page). It is sometimes misused to control the way web page content is displayed on a monitor screen or when output to a printer, which is fundamentally contrary to the ideology adopted in world wide web.

Rice. 3. Web technologies

Cascading Style Sheets (CSS) are designed to control the display of web page content. CSS is in many ways similar to the styles used in the popular word processor word.

Scripting languages ​​are used to give web pages dynamism (drop-down menus, animations). The standard scripting language on the World Wide Web is JavaScript. The core of the JavaScript language is ECMAScript.

HTML, CSS, JavaScript are the languages ​​with which you can create arbitrarily complex websites. But this is just linguistic provision, while in browsers, documents are represented as a set of objects, the set of types of which is the Browser Object Model (BOM). The browser object model is unique to each model, and thus there are problems when creating cross-browser applications. Therefore, the Web Consortium proposed the Document Object Model (DOM), which is a standard way of representing web pages using a set of objects.

The syntax of modern HTML is described using the Extensible Markup Language XML. XML will allow you to create your own markup languages, similar to HTML in the form of a DTD. There are many such languages: for representing mathematical and chemical formulas, knowledge, etc.

As can be seen from the above, all web technologies are closely interconnected. Understanding this fact will make it easier to understand the purpose of a particular mechanism used to create web applications.

EMAIL

Electronic mail (email, e-mail, from English electronic mail) - technology and the services it provides for sending and receiving electronic messages(called "letters" or "e-mails") over a distributed computer network. The main difference from other messaging systems is the possibility of delayed delivery and a developed system of interaction between independent mail servers.

E-mail makes it possible to send and receive messages, reply to letters of correspondents automatically using their addresses, send copies of a letter to several recipients at once, forward the received letter to another address, use logical names instead of addresses (numerical or domain names), create several mailbox subsections for all kinds of correspondence, include text files in letters, use the "mail reflector" system to conduct discussions with a group of your correspondents, etc. To send a mail message by e-mail, you must specify the mailbox address. An e-mail subscriber's mailbox is an area on a mail server's hard drive dedicated to a user.

Development Internet technologies led to the emergence of modern messaging protocols that provide great opportunities for processing letters, a variety of services and ease of use. So, for example, the SMTP protocol, which works on the client-server principle, is designed to send messages from a computer to an addressee. Normally, access to the SMTP server is not password protected, so any known server on the network can be used to send mail. Unlike servers for sending letters, access to servers for storing messages is protected by a password. Therefore, you must use a server or service that has Account. These servers use the POP and IMAP protocols, which differ in the way messages are stored.

In accordance with the POP3 protocol, messages arriving at a specific address are stored on the server until they are downloaded to the computer during the next session. After downloading the messages, you can disconnect from the network and start reading mail. Thus, the use of mail via the POP3 protocol is the fastest and most convenient to use.

The IMAP protocol is convenient for those people who use a permanent connection to the network. Messages received at the address are also stored on the server, but, unlike POP3, when checking mail, only the message headers will be downloaded first. The letter itself can be read after selecting the message header (it will be downloaded from the server). It is clear that with a dial-up connection, working with mail using this protocol leads to unjustified loss of time.

There are several protocols for receiving and transmitting mail between multi-user systems.

Short description some of them:

1) SMTP (Simple Mail Transfer Protocol) is a network protocol designed for the transmission of e-mail in TCP / IP networks, and the transmission must necessarily be initiated by the transmitting system itself.

MTA (Mail Transfer Agent) - mail transfer agent - is the main component of the Internet mail transfer system, which represents this network computer for the network e-mail system. Usually, users do not work with the MTA, but with the MUA (Mail User Agent) program - an e-mail client. Schematically, the principle of interaction is shown in the figure.

2) POP, POP2, POP3 (Post Office Protocol)- three fairly simple non-interchangeable protocols designed to deliver mail to a user from a central mail server, delete it from it, and identify a user by name/password. POP includes SMTP, which is used to transfer mail originating from a user. Mail messages can be received in the form of headers, without receiving the whole letter.

After a connection is established, the POP3 protocol goes through three successive states.

      1. Authorization The client goes through the authentication procedure
      2. The client transaction receives information about the state of the mailbox, accepts and deletes mail.
      3. The server update deletes the selected emails and closes the connection.

3) IMAP2, IMAP2bis, IMAP3, IMAP4, IMAP4rev1 (Internet Message Access Protocol) - provides the user with rich opportunities for working with mailboxes located on a central server

o IMAP stores mail on the server in file directories, and also provides the client with the ability to search for strings in mail messages on the server itself.

o IMAP2 - used in rare cases.

o IMAP3 - incompatible solution, not used.

o IMAP2bis - IMAP2 extension, allows servers to parse the MIME structure (Multipurpose Internet Mail Extensions) of a message, is still used today.

o IMAP4 is a redesigned and enhanced IMAP2bis that can be used anywhere.

o IMAP4rev1 - Extends IMAP with a large set of features, including those used by DMSP (Distributed Mail System for Personal Computers).

4) ACAP (Application Configuration Access Protocol) - a protocol designed to work with IMAP4; adds the possibility of a search subscription and a subscription to bulletin boards, mailboxes and is used to search for address books.

5) DMSP (or PCMAIL) is a protocol for receiving / sending mail, the peculiarity of which is that a user can have more than one workstation in his use. The workstation contains status information about mail, the directory through which the exchange takes place, which, when connected to the server, is updated to the current state on the mail server.

6) MIME is a standard that defines mechanisms for sending various types of information via email, including text in languages ​​other than English that use character encodings other than ASCII, and 8-bit binary content such as pictures, music, movies and programs.

Independent work.

Run the example given in the text (handout) save in own folder on the desktop.

9.2. Working with a teacher:

If there are any difficulties or erroneous actions, contact the teacher to correct the errors.

By the end of the lesson, show the teacher a report on the work performed and receive a credit for this work.

9.3. Control of the initial and final level of knowledge:

Computer testing .


Similar information.


URI (Uniform Resource Identifier) ​​is a compact string of characters for identifying an abstract or physical resource. A resource is any object belonging to some space. The need for a URI has been clear to the designers of the WWW since the inception of the system. it was supposed to combine into a single information environment tools using various ways identification of information resources. A specification was developed that included calls to FTP, Gopher, WAIS, Usenet, E-mail, Prospero, Telnet, X.500 and, of course, HTTP (WWW). As a result, a universal specification was developed that allows you to expand the list of addressable resources due to the emergence of new schemes.

Place of application of URI - hypertext links that are written in tags And . Embedded graphic objects are also addressed by the URI specification in the tags And . The URI implementation for WWW is called URL (Uniform Resource Locator). More precisely, a URL is an implementation of a URI scheme mapped to a resource access algorithm by network protocols. There is also a URN (Uniform Resource Name) that maps a URI to a namespace on the web.

The appearance of the URN is due to the desire to address parts of the MIME mail message. Principles of constructing a WWW address. URIs are based on the following principles:

· Extensibility - New address schemes should easily fit into existing URI syntax.

Completeness - whenever possible, any of the existing schemes should be described by means of a URI.

· Readability - the address had to be easily readable by the user, which is generally typical for WWW technology - documents along with links can be developed in a regular text editor.

Before looking at the various address representation schemes, here is an example of a simple URI address:

http://polyn.net.kiae.su/polyn/index.html

The colon is preceded by the address scheme identifier - "http". This name is separated by a colon from the remainder of the URI, which is called "path". In this case, the path consists of the domain address of the machine on which the HTTP server is installed and the path from the root of the server tree to the "index.html" file. In addition to the above complete record URI, there is a simplified one. It assumes that by the time it is used, many resource address parameters have already been determined (protocol, machine address on the network, some path elements). Under such assumptions, the author of hypertext pages can only indicate the relative address of the resource, i.e. an address relative to certain underlying resources.

URL (Uniform Resource Locator, Uniform Resource Locator) is a subset of URI schemes that identifies a resource by the way it is accessed (for example, its "location on the network"), instead of identifying it by the name or other attributes of this resource. The URL explicitly describes how to get to the object.

Syntax: :, where:

scheme="http" | ftp | gopher | "mailto" | news | telnet | "file" | man | info | whatis | "ldap" | "wais" | ...- schema name

scheme-specific-part– depends on the scheme. In scheme-specific-part hexadecimal values ​​can be used in the form: %5f. The non-printable octets must be encoded: 00-1F, 7F, 80-FF.

URL examples:

http://www.ipm.kstu.ru/index.php

ftp://www.ipm.kstu.ru/

URN (Uniform Resource Name) is a private "urn:" URI scheme with a "namespace" subset that must be unique and unchanged even if the resource no longer exists or is not available.

It is assumed that, for example, the browser knows where to look for this resource.

Syntax: urn: namespace: data1.data2,more-data, where namespace specifies how the data after the second ":" is used.

URN example:

urn: ISBN: 0–395–36341–6

ISBN - thematic classifier for publishers,

0-395-36341-6 - a specific number of the subject of a book or magazine

Upon receiving a URN, the client program accesses the ISBN (the "subject classifier for publishers" directory on the Internet). And he receives a transcript of the subject number “0–395–36341–6” (for example: “quantum chemistry”). URN is relatively recent, not included in current versions of HTML, and directory services are not yet developed, so URNs are not as widely used as URLs.

Internet Resource Addressing Schemes

There are 3 addressing schemes for Internet resources. The scheme specifies its identifier, machine address, TCP port, path in the server directory, variables and their values, label.

HTTP Scheme. This is the basic scheme for WWW. The schema specifies its identifier, machine address, TCP port, path in the server directory, search criteria, and label.

Syntax: http://[ [:@][:][?]]

http- schema name

user- Username

password– user password

host- host name

port- port number

url-path- the path to the file and the file itself

query (<имя–поля>=<значение>{&<имя–поля>=<значение>) – query string

By default, port=80.

Here are some examples of URIs for the HTTP scheme:

http://polyn.net.kiae.su/polyn/manifest.html

This is the most common kind of URI used in WWW documents. The schema name (http) is followed by a path consisting of the domain address of the machine and the full address of the HTML document in the HTTP server tree.

An IP address can also be used as a machine address:

http://144.206.160.40/risk/risk.html

If the HTTP protocol server is running on a different TCP port than 80, then this is reflected in the address:

http://144.206.130.137:8080/altai/index.html

http://polyn.net.kiae.su/altai/volume4.html#first

FTP scheme. This scheme allows you to address FTP file archives from World Wide Web client programs. In this case, the program must support the FTP protocol. In this scheme, it is possible to specify not only the name of the scheme, the address of the FTP archive, but also the user ID and even his password.

Syntax: ftp://[ [:@][:]

ftp- schema name

user- Username

password– user password

host- host name

port- port number

url-path- the path to the file and the file itself

By default, port=21, user=anonymous, password=email address.

Most often, this scheme is used to access public FTP archives:

ftp://polyn.net.kiae.su/pub/0index.txt

In this case, a link to the archive "polyn.net.kiae.su" is recorded with the identifier "anonymous" or "ftp" (anonymous access). If there is a need to specify the user ID and password, then you can do this before the machine address:

ftp://nobody: [email protected]/users/local/pub

In this case, these parameters are separated from the machine address by the "@" symbol, and from each other by a colon.

TELNET scheme. This scheme provides access to the resource in remote terminal mode. Usually the client calls additional program to work over the telnet protocol. When using this scheme, a user ID is required, and a password is allowed.

Syntax: telnet://[ [:@][:]/

telnet- schema name

user- Username

password– user password

host- host name

port- port number

By default, port=23.

Example: telnet://name: [email protected]

In reality, access is made to public resources, and the identifier and password are publicly known, for example, they can be found in Hytelnet databases.

telnet://guest: [email protected]

It can be seen from the above examples that the specification of URI resource addresses is quite general and allows you to identify almost any resource on the Internet. At the same time, the number of resources can be expanded by creating new schemes.

WWW service

The WWW (World Wide Web) service is intended for the exchange of hypertext information, built according to the "client-server" scheme. The browser (Internet Explorer, Opera...) is a multi-protocol client and HTML interpreter. And like a typical interpreter, the client performs various functions depending on the commands (tags). These functions include not only the placement of text on the screen, but the exchange of information with the server as the received HTML text is analyzed, which most clearly occurs when displaying graphic images embedded in the text.

An HTTP server (Apache, IIS...) handles client requests for a file. In the beginning, the WWW service was based on three standards:

· HTML (HyperText Markup Lan–guage) – language of hypertext markup of documents;

URL (Universal Resource Locator) – universal way addressing resources in the network;

· HTTP (HyperText Transfer Protocol) – hypertext information exchange protocol.

Scheme of the WWW server

A WWW server is a part of a global or intracorporate network that allows network users to access hypertext documents located on this server. To interact with the WWW server, a network user must use specialized software - a browser (from the English browser) - a viewer.

Let's take a closer look at how the WWW server works:

1. The network user launches a browser, the functions of which include:

establishment of connection with the server;

obtaining the required document;

display of the received document;

Responding to user actions - access to a new document. After launch, the browser, at the user's command, or automatically establishes a connection with a given WWW server and sends it a request-receiving a given document.

2. The WWW server searches for the requested document and returns the results to the browser.

3. The browser, having received the document, displays it to the user and waits for his reaction. Possible options:

Entering the address of a new document;

Printing, searching, other operations on the current document;

· activation (pressing) of special zones of the received document, called links (link) and associated with the address of the new document. In the first and third cases, a request for a new document occurs.

Liked the article? Share with friends: