HTTP in Simple Language: The Hypertext Transfer Protocol

At the heart of the web is the Hypertext Transfer Protocol (HTTP), an application-layer protocol. HTTP is described in RFC 1945 and RFC 2616. It is implemented by two programs, a client and a server, which run on different end systems and communicate by exchanging HTTP messages. The protocol defines both the structure of these messages and the way they are exchanged. Before studying HTTP in detail, let us first introduce some web terminology.

A web page (also called a document) consists of objects. An object is simply a file, such as an HTML file, a JPEG or GIF image, a Java applet, or an audio clip, that can be addressed by its own Uniform Resource Locator (URL). Most web pages consist of a base HTML file and the objects it references. For example, if a web page contains a base HTML file and five images, it consists of six objects. The base HTML file refers to the other objects of the page by their URLs. A URL has two components: the host name of the server that stores the object and the path to the object. For example, in the URL www.someSchool.edu/someDepartment/picture.gif, the host name is www.someSchool.edu and the path is /someDepartment/picture.gif.
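The host/path split can be reproduced with Python's standard `urllib.parse.urlsplit`. This is a minimal sketch: `host_and_path` is a hypothetical helper, and the `http://` prefix is added because `urlsplit` only recognizes the host part when a scheme is present.

```python
from urllib.parse import urlsplit


def host_and_path(url: str) -> tuple[str, str]:
    """Return (host name, path to the object) for an absolute http:// URL."""
    parts = urlsplit(url)
    # netloc keeps the host exactly as written (parts.hostname would lower-case it);
    # this URL has no explicit port, so netloc is just the host name
    return parts.netloc, parts.path


host, path = host_and_path("http://www.someSchool.edu/someDepartment/picture.gif")
print(host)  # www.someSchool.edu
print(path)  # /someDepartment/picture.gif
```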

A browser is the web's user agent: it displays requested web pages and provides many additional navigation and configuration features. Browsers also implement the client side of HTTP, so in the context of the web the terms “browser” and “client” are used interchangeably. Popular browsers include Netscape Navigator and Microsoft Internet Explorer.

A web server stores objects, each addressed by its own URL. Web servers implement the server side of HTTP. Popular web servers include Apache and Microsoft Internet Information Server.

HTTP defines how web clients (such as browsers) request web pages from servers and how servers transfer those pages to clients. We discuss client-server interaction in more detail later, but the basic idea is shown in Fig. 2.4. When a user requests a web page (for example, by clicking a hyperlink), the browser sends the server HTTP requests for the objects that make up the page. The server receives the requests and replies with HTTP response messages that contain the requested objects. In 1997, almost all web browsers and web servers supported HTTP version 1.0, described in RFC 1945. In 1998, the transition to version 1.1, described in RFC 2616, began. HTTP 1.1 is backward compatible with HTTP 1.0: any server or browser running version 1.1 can fully interoperate with a browser or server that supports only version 1.0.

Both HTTP 1.0 and HTTP 1.1 use TCP as the transport-layer protocol. The HTTP client first opens a TCP connection to the server; once the connection is established, the client and server processes access TCP through their socket interfaces. As noted earlier, a socket is the “door” between a process and the transport-layer protocol.

The client sends requests into its socket and receives responses from it; likewise, the server receives requests from its socket and sends responses back through it. Once the client has passed a request to its socket, the request is "in the hands" of TCP. Recall that TCP provides reliable data transfer: every request sent by the client and every response sent by the server arrives at the other end exactly as it was sent. This illustrates one of the advantages of a layered architecture: HTTP does not need to worry about lost or corrupted data or to retransmit packets. All of that "rough" work is done by TCP and the lower-layer protocols.

Note that the server keeps no information about a client after serving it. If, for example, a client requests the same resource twice in a row, the server simply serves it again, without telling the client that the request is a duplicate. Because an HTTP server maintains no information about its clients, HTTP is said to be a stateless protocol.

Now that the Internet is in such widespread use, viruses that install themselves in the browser are especially common. Our site has several articles about such malicious programs, but Time to Read stands out among them. This virus can get onto a careless user's computer and seriously spoil the experience of using the browser: the user sees extra advertising, is constantly redirected to the Time to Read website, and runs into many other problems. But first things first.

Like most Trojans, Time to Read has a simple task: to show the user as much advertising as possible so that its creators are paid for the impressions, for clicks, and for redirects to partner sites. The sites that pay for this kind of virus-based promotion are most often fraudulent resources or pages infected with something more serious than an advertising Trojan.

After getting onto a computer, the Time to Read virus shows the following "symptoms":

Extra advertising constantly appears on sites, including pop-up banners that block the page content until you click on them;
Computer security settings are changed, which is dangerous for a computer with a permanent Internet connection;
The start page of every browser is changed to the Time to Read website, which tries to present itself as a search and news portal;
The browser is automatically redirected to third-party sites. Important: an unknown site that the Time to Read virus redirects you to carries a high risk of downloading other viruses onto the computer.

If you notice the symptoms listed above, your computer is infected with the Time to Read virus. Remove it promptly to avoid the more serious problems it can cause.

To remove the Time to Read virus, first download and install two programs: AdwCleaner and CCleaner. These applications handle most of the removal automatically; the user only has to perform a few simple steps manually.

The process of removing the Time to Read virus from a computer is as follows:

First, delete all temporary files from the computer so that the virus cannot restore itself after removal. To do this, open the following folder:

On Windows 7: (system drive):\Users\Username\AppData\Local\Temp
On Windows 10: (system drive):\Users\Administrator\AppData\Local\Temp

Do not delete files selectively: delete everything in the Temp folder, since any of these files may be dangerous.

Note that the resolver cache must be reset from the administrator profile.


This completes the removal of the Time to Read virus. We recommend restarting the computer before using the browser again.

Most often, the Time to Read virus gets onto a computer through the user's own carelessness. A few basic recommendations can significantly reduce the risk of infection with this Trojan:

Download programs only from trusted sites. If an application is free, it is better to download it from the developer's site;
When installing programs, pay close attention to every “checkbox” in the installer. Developers often use “full installation” to mean installing the application together with partner software, which may be malicious. We also recommend reading the user agreement, which may state that a particular partner program will be installed by default;
Do not download programs from unknown developers that promise incredible functionality.

Following these simple rules significantly reduces the risk of your computer being infected with the Time to Read virus, which can cause a lot of trouble.

HTTP, the HyperText Transfer Protocol, is the foundation of the World Wide Web. Its main purpose is to transfer hypertext over the network, and it precisely specifies the format of the messages that clients and servers exchange.

HTTP 1.1 is described in RFC 2616.

Each client-server interaction consists of a single ASCII request followed by a single response, formatted in the style of RFC 822 (MIME).

In practice, HTTP usually runs on port 80, although a different port can be configured. TCP/IP is not formally required, but it remains the preferred transport, because TCP takes care of splitting messages into segments and reassembling them, so neither the browser nor the server has to.

Note that HTTP can be used not only for the web but also by other, object-oriented applications.

URL

Client-server communication on the web is built on requests, and each request is addressed using a URL (Uniform Resource Locator). Let us recall what a URL is.

A URL has a clear, simple structure consisting of the following elements:

Protocol;
Host;
Port;
Path to the resource;
Query (request parameters).
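As a sketch, assuming Python's standard `urllib.parse` module, the five URL elements listed above map onto the fields returned by `urlsplit`. The example URL, the helper name `url_components`, and the `DEFAULT_PORTS` table (80 for http, 443 for https) are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlsplit

# Ports assumed when the URL does not name one explicitly
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_components(url: str) -> dict:
    """Break a URL into protocol, host, port, resource path and query."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # note: hostname is lower-cased by urlsplit
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",  # an empty path means the site root
        "query": parse_qs(parts.query),  # "a=1&b=2" -> {"a": ["1"], "b": ["2"]}
    }


print(url_components("http://example.com:8080/docs/page.html?lang=en&page=2"))
print(url_components("http://example.com")["port"])  # no port given, so the default is used
```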

Note: plain http provides simple, unencrypted connections. Secure connections use https, which is safer for exchanging data.

HTTP Request Methods

One part of the URL specifies the host we want to communicate with. But that is not enough: we also need to specify the action to perform. This is done with one of the methods defined by HTTP.

HTTP Methods

Method / Description
HEAD / Read a web page header
GET / Read a web page
POST / Append to a web page
PUT / Store a web page
TRACE / Echo the incoming request back
DELETE / Remove a web page
OPTIONS / Query the server's options (parameters)
CONNECT / Reserved for proxies that can switch to acting as a tunnel (e.g., SSL tunneling)

Let's examine HTTP methods in more detail.

GET method. Requests a page (file, object) encoded according to the MIME standard. This is the most frequently used method. Its request line has the form:
GET FileName HTTP/1.1

HEAD method. Requests only the message header; the page itself is not sent. This lets you find out when the page was last modified, which is useful for managing a page cache, or simply check that the requested URL works.
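A minimal sketch of the GET/HEAD difference, using Python's standard `http.client` against a throwaway local server. `DemoHandler`, its page, and its `Last-Modified` date are hypothetical; the point is only that HEAD returns the same headers as GET but an empty body.

```python
import http.client
import http.server
import threading


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server that serves one small page for GET and HEAD."""

    PAGE = b"<html><body>Hello</body></html>"

    def send_page_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.PAGE)))
        # Fixed, made-up modification date that a cache could compare against
        self.send_header("Last-Modified", "Mon, 01 Jan 2024 00:00:00 GMT")
        self.end_headers()

    def do_GET(self):
        self.send_page_headers()
        self.wfile.write(self.PAGE)  # GET: headers followed by the page

    def do_HEAD(self):
        self.send_page_headers()  # HEAD: the same headers, but no page

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch(method, host, port, path="/"):
    """Send one request and return (status, headers, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    for method in ("GET", "HEAD"):
        status, headers, body = fetch(method, "127.0.0.1", port)
        print(method, status, headers["Last-Modified"], len(body), "body bytes")
    server.shutdown()
```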

PUT method. Stores (uploads) a page on the server. The body of a PUT request contains the page to be stored, which may be MIME-encoded. This method requires client authentication.

POST method. Adds content to an existing page, for example, a new post to a forum.

DELETE method. Deletes the page. The server must confirm that the user has the right to delete it.

TRACE method. A debugging method: it tells the server to send the request back, so the client can find out whether its request was altered on the way to the server.
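A rough sketch of what a server does with TRACE, again using Python's standard library. `TraceHandler` is a hypothetical local server that echoes the request line and headers back with the `message/http` content type; production servers often disable TRACE for security reasons.

```python
import http.client
import http.server
import threading


class TraceHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical demo server that answers TRACE by echoing the request it received."""

    def do_TRACE(self):
        # Rebuild the received request: start line, each header, then the empty line
        echoed = self.requestline + "\r\n"
        echoed += "".join(f"{name}: {value}\r\n" for name, value in self.headers.items())
        echoed += "\r\n"
        data = echoed.encode("iso-8859-1")
        self.send_response(200)
        self.send_header("Content-Type", "message/http")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # silence per-request logging


def trace(host, port, path="/", extra_headers=None):
    """Send a TRACE request and return the echoed request as text."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("TRACE", path, headers=extra_headers or {})
        return conn.getresponse().read().decode("iso-8859-1")
    finally:
        conn.close()


if __name__ == "__main__":
    server = http.server.HTTPServer(("127.0.0.1", 0), TraceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(trace("127.0.0.1", server.server_address[1], extra_headers={"X-Debug": "1"}))
    server.shutdown()
```

Comparing the echoed text with what the client meant to send shows whether anything on the way (for example, a proxy) changed the request.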

CONNECT method. Reserved for use with a proxy that can switch to acting as a tunnel (for example, for SSL tunneling).

OPTIONS method. Lets you query the properties of the server or of any particular file.

In every request-response exchange between client and server, the server always returns a response. It may be a web page or just a status line with a status code. You have almost certainly met one of these codes: 404, Page not found.

Status Code Groups

1xx: Informational.

Code 100 - the server agrees to handle the client's request;

2xx: Success.

Code 200 - the request was processed successfully;
Code 204 - No Content.

3xx: Redirection.

Code 301 - the requested page has moved permanently;
Code 304 - the cached copy of the page is still valid.

4xx: Client error.

Code 404 - Page not found.

5xx: Server error.

Code 500 - Internal server error.
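Python's standard `http.HTTPStatus` enumeration records the standard reason phrase for each code, which makes it easy to check the meaning of the codes mentioned in this section. A small sketch (`reason_phrase` is a hypothetical helper):

```python
from http import HTTPStatus


def reason_phrase(code: int) -> str:
    """Look up the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase  # raises ValueError for unknown codes


for code in (100, 200, 204, 301, 304, 404, 500):
    print(code, reason_phrase(code))
```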

Below is a description of the main aspects of HTTP, the network protocol that has let your browser load web pages from the early 1990s to this day. This article is written for those who are just starting to work with computer networks and to develop network applications, and who still find it hard to read the official specifications on their own.

HTTP is a widely used data transfer protocol, originally designed to transfer hypertext documents (that is, documents that may contain links to other documents).

The acronym HTTP stands for HyperText Transfer Protocol. In the OSI model, HTTP is an application-layer protocol (the 7th, uppermost layer). The version described here, HTTP 1.1, is specified in RFC 2616; that RFC has since been superseded, first by RFCs 7230-7235 and later by RFCs 9110-9112.

HTTP follows a client-server model. The client application creates a request and sends it to the server; the server software processes the request, generates a response, and sends it back to the client. The client can then send further requests, which are processed in the same way.

The classic task for HTTP is exchanging data between a user application that accesses web resources (usually a web browser) and a web server. Today, HTTP is what makes the World Wide Web work.

HTTP is also often used to carry other application-layer protocols, such as SOAP, XML-RPC, and WebDAV. In this case HTTP is said to serve as a "transport".

Many software APIs also use HTTP for data transfer; the data itself can be in any format, for example XML or JSON.

HTTP data is usually transferred over TCP/IP connections. Server software normally listens on TCP port 80 (and if no port is specified explicitly, client software uses port 80 by default for plain HTTP connections), although any other port can be used.

How to send an HTTP request?

The easiest way to get to grips with HTTP is to access a web resource manually. Imagine that you are a browser, and your user really wants to read articles by Anatoly Alizar.

Suppose the user typed the following into the address bar:

http://alizar.site/

So you, as the web browser, now need to connect to the web server at alizar.site.

Any suitable command-line utility will do, for example telnet:

telnet alizar.site 80

If you change your mind, press Ctrl+] to get to the telnet> prompt, then type quit and press Enter to close the connection. Instead of telnet, you can also use nc (or ncat), whichever you prefer.

Once you are connected to the server, you need to send an HTTP request. This is very easy: an HTTP request can consist of just two lines.

To form an HTTP request, you write a start line and at least one header: the Host header, which is mandatory and must be present in every request. The reason is that the domain name is resolved to an IP address on the client side, so when the TCP connection is opened, the remote server has no information about which name was used to reach it: it could have been, for example, alizar..ru or m .. In every case, however, the network connection is opened to the host 212.24.43.44, and the server is not told in any way which domain name was originally used. That is why this name must be passed in the Host header.

The start (initial) line of an HTTP 1.1 request has the form:

Method URI HTTP/Version

For example (such a start line indicates that the main page of the site is being requested):

GET / HTTP/1.1
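The minimal request described above (a start line, the Host header, and an empty line) can be assembled as bytes. This sketch assumes CRLF (`\r\n`) line endings, which the HTTP specification uses on the wire; `build_request` is a hypothetical helper. Sending these bytes over a TCP connection to port 80 is what typing the lines into telnet accomplishes.

```python
def build_request(method: str, path: str, host: str, version: str = "HTTP/1.1") -> bytes:
    """Build a minimal HTTP request: start line, Host header, empty line."""
    lines = [
        f"{method} {path} {version}",  # start line: method, URI, protocol version
        f"Host: {host}",               # mandatory in HTTP/1.1 requests
        "",                            # empty line: end of the headers
        "",                            # gives the final CRLF after the empty line
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_request("GET", "/", "alizar.site"))
```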

Of course, any technology becomes much simpler and clearer once you actually start using it.

Good luck and fruitful learning!


The standard protocol for transferring data on the World Wide Web is HTTP (HyperText Transfer Protocol). It specifies the messages that clients and servers can exchange. Each interaction consists of one ASCII request followed by one response in a format resembling RFC 822 MIME. All clients and all servers must follow this protocol. It is defined in RFC 2616.

Connections

Normally, the browser interacts with the server by opening a TCP connection to port 80 on the server, although this is not formally required. The benefit of using TCP is that neither browsers nor servers have to deal with lost, duplicated, or overly long messages, or with acknowledgments. TCP takes care of all of this.

In HTTP 1.0, once the connection was established, a single request was sent and a single response came back; then the TCP connection was released. At that time, a typical web page consisted entirely of HTML text, so this approach was adequate. Within a few years, however, pages came to include many icons, images, and other decorations, and setting up a separate TCP connection just to transfer a single icon is clearly wasteful and expensive.

This led to HTTP 1.1, which supports persistent connections. With them, a client can open a TCP connection, send a request and receive a response, and then send and receive additional requests and responses over the same connection. This reduces the overhead of constantly setting up and tearing down connections. It also became possible to pipeline requests, that is, to send request 2 before the response to request 1 has arrived.
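A minimal sketch of a persistent connection in Python: one `http.client.HTTPConnection` sends two requests over the same TCP connection. The local `KeepAliveHandler` server is hypothetical; it speaks HTTP/1.1 and records the client address of each request, so you can check that both requests arrived over one connection. `http.client` does not pipeline, so this shows connection reuse, not pipelining.

```python
import http.client
import http.server
import threading


class KeepAliveHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local HTTP/1.1 server that keeps connections open."""

    protocol_version = "HTTP/1.1"  # allow several requests per connection
    seen_clients = []  # (ip, port) of the TCP connection that carried each request

    def do_GET(self):
        self.seen_clients.append(self.client_address)
        body = f"object {self.path}".encode()
        self.send_response(200)
        # Content-Length tells the client where this response ends on the shared connection
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_many(host, port, paths):
    """Fetch several objects over a single persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            bodies.append(conn.getresponse().read())  # read fully before the next request
    finally:
        conn.close()
    return bodies


if __name__ == "__main__":
    server = http.server.HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(fetch_many("127.0.0.1", server.server_address[1], ["/index.html", "/logo.gif"]))
    print("client connections used:", set(KeepAliveHandler.seen_clients))
    server.shutdown()
```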

Although HTTP was designed for use on the web, it was intentionally made more general than necessary, with an eye toward future object-oriented applications. For this reason, it supports operations, called methods, beyond simply requesting a web page. This generality is what later made SOAP possible. Each request consists of one or more ASCII lines, and the first word of the first line is the name of the method being invoked. The built-in methods are listed in the table in Fig. 6. Besides these general methods, particular objects may also have their own specific methods. Method names are case-sensitive: GET is a valid method, but get is not.

Figure 6 - Built-in HTTP request methods

The GET method asks the server for a page (which in general means an object, but in practice it is usually just a file) encoded according to the MIME standard. Most of the server requests are GET requests.

The HEAD method asks only for the message header, without the page itself. It can be used to find out when a page was last changed, for example to collect information for an index, or simply to check that a URL is valid.

The PUT method is the reverse of GET: instead of reading a page, it writes one. This makes it possible to build a collection of web pages on a remote server. The body of the request contains the page, which may be MIME-encoded. The lines following the PUT command may include various headers, for example Content-Type or authentication headers proving that the caller has the right to perform the requested operation.

The POST method is somewhat similar to PUT. It also carries a URL, but instead of replacing existing data, it “appends” new data to it (in a general sense), for example, posting a message to a newsgroup or adding a file to a BBS bulletin board. In practice, PUT is rarely used on ordinary websites, whereas POST is widely used, for example to submit HTML forms.
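To make the PUT/POST distinction concrete, here is a toy in-memory server, a hypothetical sketch rather than how any real server stores pages: PUT replaces the stored page and POST appends to it. The client side uses Python's standard `http.client`.

```python
import http.client
import http.server
import threading


class StoreHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical in-memory server: PUT replaces a page, POST appends to it."""

    pages = {}  # path -> page bytes, shared by all requests

    def _read_body(self):
        length = int(self.headers.get("Content-Length", 0))
        return self.rfile.read(length)

    def _reply(self, code, body=b""):
        self.send_response(code)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):
        self.pages[self.path] = self._read_body()  # replace whatever was stored
        self._reply(200)

    def do_POST(self):
        old = self.pages.get(self.path, b"")
        self.pages[self.path] = old + self._read_body()  # append to existing data
        self._reply(200)

    def do_GET(self):
        if self.path in self.pages:
            self._reply(200, self.pages[self.path])
        else:
            self._reply(404, b"not found")

    def log_message(self, *args):
        pass  # silence per-request logging


def call(host, port, method, path, body=None):
    """Send one request (http.client sets Content-Length for a bytes body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path, body=body)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = http.server.HTTPServer(("127.0.0.1", 0), StoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    call("127.0.0.1", port, "PUT", "/forum", b"first post")
    call("127.0.0.1", port, "POST", "/forum", b"\nsecond post")
    print(call("127.0.0.1", port, "GET", "/forum"))
    server.shutdown()
```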

The DELETE method, unsurprisingly, deletes a page. As with PUT, authentication and permissions play an important role here. Even if the user is permitted to delete the page, there is no guarantee that DELETE will succeed: even if the remote HTTP server agrees, the file itself may be protected against modification or moving.

The TRACE method is for debugging. It tells the server to send the request back. This is especially useful when requests are being processed incorrectly and the client wants to know what request the server actually received.

The CONNECT method is reserved for use with proxies that can switch to acting as a tunnel (for example, for SSL tunneling).

The OPTIONS method allows the client to find out from the server about its properties or about the properties of a particular file.

Every request receives a response consisting of a status line and possibly additional information (for example, a web page or part of one). The status line contains a three-digit status code indicating whether the request succeeded or, if not, why it failed. The first digit divides all responses into five main groups, as shown in the table in Fig. 7. Codes starting with 1 (1xx) are rarely used in practice. Codes starting with 2 mean that the request was processed successfully and the data (if any was requested) has been sent. 3xx codes tell the client to try its luck elsewhere, either at a different URL or in its own cache.

Figure 7 - Groups of status codes contained in server responses

Codes starting with 4 mean that the request failed for some reason related to the client: for example, a nonexistent page was requested or the request itself was incorrect. Finally, 5xx codes report server errors that have occurred either as a result of a program error or due to temporary overload.
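The grouping by first digit is simple arithmetic: integer division by 100. A minimal sketch (the group names follow the text; `status_group` is a hypothetical helper):

```python
# Response groups keyed by the first digit of the status code
STATUS_GROUPS = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_group(code: int) -> str:
    """Return the group of a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_GROUPS[code // 100]  # e.g. 404 // 100 == 4


for code in (100, 200, 304, 404, 503):
    print(code, status_group(code))
```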

HTTP example

Since HTTP is a text protocol, you can easily talk to a server from a terminal, which in this case takes the place of the browser. All you need is a TCP connection to port 80 of the server. Readers can try this scenario themselves (preferably on a UNIX system, since some other systems may not display the connection status). The sequence of commands is as follows:

Figure 8 - HTTP protocol command sequence

This sequence of commands establishes a telnet connection (that is, a TCP connection) to port 80 of the IETF web server located at www.ietf.org.

The output of the session is redirected to a log file, which can be viewed afterward. Next comes the GET command, which gives the name of the requested file and the protocol. It is followed by the mandatory line with the Host header. The empty line after it is also required: it tells the server that there are no more request headers. Finally, the close command (a command of the telnet program itself) terminates the connection.
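The session described above can be approximated in Python with a plain TCP socket, writing the raw reply to a file named log as the telnet redirection does. This is a sketch under assumptions: `fetch_to_log` is a hypothetical helper, and it adds a `Connection: close` header (not part of the original session) so that the server closes the connection and the read loop ends. The live IETF site may now answer with a redirect to HTTPS rather than with the page itself.

```python
import socket


def fetch_to_log(host, path, log_path, port=80, timeout=10.0):
    """Send a GET request like the telnet session and save the raw reply to log_path."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # assumption: ask the server to close when done
        "\r\n"                   # empty line: end of the request headers
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        with open(log_path, "wb") as log:
            # recv() returns b"" once the server has closed the connection
            while chunk := sock.recv(4096):
                log.write(chunk)


# Usage mirroring the text (needs network access; the site may have changed):
# fetch_to_log("www.ietf.org", "/rfc.html", "log")
```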

The log file can be viewed with any text editor. Unless the IETF site has changed in the meantime, it should begin roughly as shown in the listing in Figure 9.

Figure 9 - The beginning of the output of the file “www.ietf.org/rfc.html”

The first three lines of this listing are produced by telnet, not by the remote site. The line beginning with HTTP/1.1 is the IETF server's response, saying that it is willing to communicate using HTTP/1.1. It is followed by a series of headers and, finally, by the content of the requested file itself. Note in particular the ETag header, a unique page identifier used for caching, and X-Pad, a nonstandard header that works around a bug in some browsers.
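Reading such a log programmatically mostly means separating the status line from the headers and the body. The simplified parser below is a sketch: it assumes the telnet lines have already been removed, ignores repeated and folded headers, and uses a made-up sample response (the `ETag` value is invented). `parse_response` is a hypothetical helper.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)  # e.g. "HTTP/1.1", "200", "OK"
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, code, reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b'ETag: "abc123"\r\n'
    b"X-Pad: avoid browser bug\r\n"
    b"\r\n"
    b"<html>...</html>"
)
version, code, reason, headers, body = parse_response(sample)
print(version, code, reason, headers["etag"], headers["x-pad"])
```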