What is a search engine, how does the search work

GBPOU Republic of Mordovia

"Saransk Medical College"

Summary of classes on the topic:

“Internet search services. Search serversWww»

Prepared by: teacher

Gorina A.D.

Saransk, 2016

Discipline:information technology in professional activities

Lesson #:3.1.1

Theme:Search services Internet. WWW Search Servers

Note: The name of the lesson appears on slide 1 of the presentation.

Purpose:mastering the theoretical foundations of the topic being studied (search server, global network protocols, technical means of global networks)

Providing classes:computer, interactive board, multimedia projector, lecture notes, presentation

Type of occupation:lesson - lecture

Learning technology:developmental education

Teaching methods:lecture, work with a book

Competences:

OK 1. Understand the nature and social significance of their future profession, show a steady interest in it.

OK 2. Organize your own activities, choose standard methods and ways of performing professional tasks, evaluate their implementation and quality.

OK 3. Make decisions in standard and non-standard situations and take responsibility for them.

OK 4. To carry out the search and use of information necessary for the effective performance of professional tasks, professional and personal development.

OK 5. Use information and communication technologies in professional activities.

PC 2.1. To present information in a form understandable for the patient, to explain to him the essence of the interventions.

PC 2.6. Maintain approved medical records.

Interdisciplinary communication:

Used Books:Ugrinovich, N.D. Computer science and information technology. Textbook for grades 10-11

1. Organizational moment: 3-5 min

(mark absent, check the appearance of students, the sanitary condition of the office)

2. Material Statement:   53-58 min

1) Computer network and its types. Global network hardware

2) Types of servers. Global network protocols

3) Internet search services. WWW Search Servers

Nowadays, computers are increasingly not used in isolation, but are interconnected for constant or occasional interaction and transfer of information. This interaction is a computer network.

A computer network is a collection of computers, connected by communication channels, through which information is exchanged through signals and the user solves joint information tasks.

Note: on slide 2 of the presentation the definition is given for the record of students.

Creation of computer networks caused by the practical need of users remote friend   from a friend's computers in the same information. Networks provide users with an opportunity not only to quickly exchange information, but also to work together on printers and other peripheral devices, and even to simultaneously process documents.

All variety of computer networks can be classified according to the group of features:

by geographic location;

depending on the availability of a host computer in the network;

by the way of connecting computers through communication channels;

by type of transmission medium.

Note: slide 3 of the presentation presents this classification, which will be deciphered.

By geographical location: local, distributed (corporate), regional and global. Note: slide 4 of the presentation presents this classification, which will be deciphered.

Depending on the presence in the network of the host computer: a network of the "client-server" type, peer-to-peer networks

By way of connecting computers through communication channels (topology): tire, ring, star, snowflake.

Note: on slides 5-7 of the presentation, this classification is presented, which will be deciphered. Connection of computers in each case is shown.

By the type of transmission medium, they are divided into coaxial networks, on a twisted pair, fiber-optic, with the transmission of information via radio channels, in the infrared range.

Local computer networks (LAN, LAN)

A group of up to 12-15 computers located within one or several premises, using a common channel of information exchange and one set of network equipment and managed by a single software package.

A local network unites computers installed in one room (for example, a school computer class consisting of 8-12 computers) or in one building (for example, in a school building several dozens of computers installed in different subject cabinets can be connected into a local network). Such networks are used to exchange files between network users, as well as to use shared resources available to all network users.

Each computer connected to a local network must have a special card (network adapter). In between computers ( network adapters) are connected with cables.

Distributed (corporate)

Many organizations that are interested in protecting information from unauthorized access (for example, military, banking, etc.) create their own, so-called corporate networks. A corporate network can unite thousands and tens of thousands of computers located in various countries and cities (for example, Microsoft’s MSN network).

Regional (MAN)

Local networks do not allow sharing information with users located, for example, in different parts of the city. Regional networks that connect computers within one region (city, country, continent) come to the rescue.

A network of computers that belong to the same region. They have various transmission channels, the number of computers is unlimited.

Global (telecommunication, WAN)

A set of computers located at a great distance from each other, with various channels of information transfer and data exchange.

In 1969, the ARPAnet computer network was created in the United States, combining the computer centers of the Ministry of Defense and a number of academic organizations. This network was designed for a narrow purpose: mainly to study how to maintain communication in the event of a nuclear attack and to help researchers in the exchange of information. As this network grew, many other networks were created and developed. Even before the onset of the era personal computers   the creators of ARPAnet started developing the program Internetting Project. The success of this project led to the following results. First, the largest internet network in the United States was created. Secondly, various options for the interaction of this network with a number of other US networks were tested. This created the prerequisites for the successful integration of many networks into a single global network. This “network of networks” is now everywhere referred to as the Internet (Russian publications widely use Russian-language spelling - the Internet).
  Currently, tens of millions of computers connected to the Internet store a huge amount of information (hundreds of millions of files, documents, etc.) and hundreds of millions of people use the information services of the global network.

The Internet is a global computer network that brings together many local, regional and corporate networks and includes tens of millions of computers.

Each local or corporate network usually has at least one computer that has a permanent connection to the Internet using a high-bandwidth link (Internet server).

The reliability of the global network is provided by the redundancy of communication lines: as a rule, servers have more than two communication lines connecting them to the Internet.

The basis of the Internet is more than one hundred million servers that are constantly connected to the network.

Internet servers can connect using local networks   or dial-up telephone lines hundreds of millions of network users.

In small local networks, all computers are usually equal in rights, that is, users independently decide what resources of their computer (disks, directories, files) to make publicly available over the network. Such networks are called peer-to-peer.

If more than ten computers are connected to the local network, the peer-to-peer network may not be sufficiently productive. To increase performance, as well as to ensure greater reliability when storing information in the network, some computers are specially allocated for storing files or application programs. Such computers are called servers, and a local network is called a client-server network.

Unallocated servers are servers to which neither a monitor nor a keyboard is connected.

Dedicated servers - which do not differ about work conventional computerswhose speed is higher than in unallocated; a dedicated server not only manages the network, but is also a stand-alone computer with a high-speed processor, a large amount of memory; designed for servicing client computers.

Network topology

The general scheme of connecting computers to local networks is called network topology. Network topologies may vary.

Bus - all computers are connected in parallel to one cable (communication lines), with plugs located at the edges of the cable. With such a connection, computers can transmit only in turns. There is no central computer in this structure. On the edges of the cable, you need to have special matching devices - terminators.

Advantages:

Simplicity and cost effectiveness

Reliability (resistance to breakage of one computer)

Disadvantages:

Sensitivity to cable system faults. If the cable is damaged in at least one place, then problems arise for the entire network.

Ring - all computers are connected to the same cable, each computer has two neighbors. In such a network, information is transmitted between stations in a ring with a reception in each network controller. Repeating is performed through buffer drives made on the basis of random access memory, therefore, when the output of a single network controller fails, the work of the entire ring may be disrupted.

Dignity:

Ease of device implementation

Disadvantage:

Low reliability

High cable consumption

Star (radial structure) - in the center is the server to which workstations are connected, i.e. each computer is connected to its cable.

Tree topology - the implementation of the hierarchical subordination of computers.

Snowflake is a combination of computers across servers using various topologies.

Global network hardware

Note: on slide 8 of the presentation there is a list of these technical means.

The technical means of the global network include computers, communication channels, special equipment (switches, hubs), modems.

Signals on computer networks are transmitted via communication channels - these can be radio waves, optical fiber, satellite communications. The most common communication channel is cable.

Coaxial cable - signals are transmitted over a copper core, and the metal shield is grounded at one of the ends; besides, the metal screen protects the copper core from external influence.

Twisted pairs - a set of eight wires, twisted in pairs, thus, to provide protection from electromagnetic interference. Each twisted pair cable connects only one computer to the network; therefore, a connection violation affects only this computer, which allows you to quickly find and fix faults.

Fiber optic cables - transmit data in the form of light pulses on glass wires. Such cables provide the highest transmission speed; not subject to electromagnetic interference; more convenient transportation.

Wireless radio communications can be used to organize networks within large rooms.

Hub (hub) - a device that provides simultaneous operation of several subscribers on one channel ;   grouping signals of several subchannels and sending them to one channel with higher bandwidth.

Switch (switch) - a device that allows you to share the bandwidth between end stations. The switch remembers the addresses of senders and receivers, the port numbers to which the device’s communication lines are connected, and based on this data, a table is constructed, according to which the signal is split.

A modem (modulator and demodulator) is a device used in communication systems for physically interfacing an information signal with its propagation medium, where it cannot exist without adaptation and performs the function of modulating and demodulating this signal.

All computers in the global network can be divided into:

Note: slide 9 of the presentation presents a list of these computers.

1) Workstations - a computer that uses either its own resources or resources of another computer, usually a server, for solving information tasks.

2) Server - a computer whose resources are accessible from network workstations. It performs the functions of network maintenance, the organization of shared resources, provides centralized management of the entire network, defines message transfer routes, and provides access to network resources through it. peripherals, and on its disks shared programs are located.

3) Host computer (network server) - a special communication center, which is most often created on the basis of several powerful computers and ensures reliable round-the-clock transmission of information, its storage and simultaneous operation of many users. In addition to network functions, it can perform user tasks.

4) Gateway (router, router) - a server that provides communication between local networks using different data transfer protocols. It connects many computers with various operating systems, applications, hardware platforms.

5) Firewall - a gateway computer that restricts access to computer networks from the outside. Designed to protect information inside.

Types of servers

Note: slides 10-11 of the presentation contain a list of these servers.

File Server — Stores Data Files

Print servers - with one or more printers, used for printing documents transmitted over the network.

Application Server - provides access to network applications, thanks to them, users can work in applications that are missing on their computers.

Registration servers - designed to ensure the security of databases, they contain information about users.

Servers-Web-provide requests to network resources.

Servers email   - provide electronic mailboxes   for letters addressed to network users.

Remote access servers - provide a dial-up connection, i.e. with their help, another computer can access the server or the network by telephone line.

Terminal servers provide access to remote computers   or terminals.

Telephone servers perform the role of answering machines, transmit voice messages, and forward calls.

Cluster servers - provide the integration of many servers into clusters, i.e. independent groups of computer systems operating autonomously.

Proxy servers - serve as intermediate servers between user workstations and the Internet, improve system security.

Fax servers - are the central point of the network, designed to receive and send faxes, distribute incoming faxes to users.

BOOTH - servers - using the BOOTH protocol, the OS loads client computers that do not have hard drives   and provide information on configuring the network protocol.

DHSP servers — assign IP addresses and configuration settings to computers that are clients of DHSP servers.

Servers routers (routers) are powerful computers or specialized intelligent devices connecting different networks or areas, they determine the most efficient way to move information, determine the addresses of recipients and senders, minimize the workload of lines and packet routes.

Servers bridges devicesproviding data transfer between two networks, provide more remote access   in comparison with routers, can connect local networks, produce packet filtering.

A firewall is a device that restricts access to a computer network from outside.

Global network protocols

Note: Slide 12 presents a list of network protocols.

Global networks operate stably thanks to unified information exchange protocols. Global network protocols are more complicated than local ones. This is due to the fact that the servers use different software. The higher the level, the closer it is to the user. There are 7 levels of protocols that define the principles of interaction between computers of global networks:

Physical - the lowest, it determines the type and characteristics of communication lines. Along the lines of communication signals are transferred from computer to computer, while their physical nature may vary.

Logical - for each protocol of the physical layer, a logic layer protocol has been developed that controls the transmission of information over physical lines.

SLIP is an Internet protocol for serial communication.

PPP is a protocol for communication between nodes.

Ethernet is a protocol for local area networks.

Network - responsible for routing - choosing the shortest route for following information through the network.

IP - interworking protocol is a system of 32-bit physical addresses of computers connected to global networks.

ARP - address determination protocol.

Transport - manages the transfer of information across networks.

TCP is a message transfer control protocol. Splits messages into small fragments, provides each header with a header, combines these fragments into a single whole, and simultaneously checks for errors.

UDP is a universal data transfer protocol. Is used for fast transfer   information. Streaming protocol Messages that did not reach the addressee are not repeated.

Session - is responsible for the installation, maintenance and destruction of relevant data channels, for their security. AT normal work   such protocols 3 and 4 of the example are almost not used, they are needed for non-standard communication conditions.

UUCP is a copy protocol from Unix to Unix.

SSL is a secure connection layer.

Executive - engaged in the maintenance of application programs.

SMTP is a mail transport protocol.

POP3 - office email protocol version 3.

IMAP is a protocol for accessing messages on the Internet.

HTTP - hypertext transfer protocol.

FTP - data transfer protocol.

NNTP - network protocol   broadcast news.

NFS - distributed file system.

Most commonly used in pairs. The first two are email. With the advent of the WWW service, HTTP was developed, which provides verification and identification of the user, protection against interception and confidentiality of information.

Applied - Services - Network Services

Means to provide certain information services for network users are called Internet services. Internet services are divided into information and communication.

Internet communication services (services)

Note: on slide 13, the presentations are presented in the form of a list of Internet services, this classification, which will be deciphered.

Email - e-mail. This is the oldest and most massive network service.

E-mail - e-mail - a system that allows you to share by email   via modem

This service is provided by special mail servers that receive messages from clients and send them through the chain to the recipient's mail servers. These messages are accumulated and when the recipient establishes communication with the server, they are automatically transferred to the recipient’s computer. When registering on the Internet, each user receives a unique email address.

Email address structure

<идентификатор_абонента>@<домен>

Subscriber ID is the registered user name.

Domain defines the mail computer to which this subscriber is connected.

Sample email address:

Email scheme

the mail program places the letter in the mail-header (envelope) and sends it to the network using SMTP;

the message is transmitted over the network from one computer to another via an internetwork protocol;

when a code message arrives at the desired computer, the mail agent (postman) delivers it to mailbox   the recipient. The recipient retrieves the message using the POP3 protocol.

Teleconferencing service (USENET mailing lists). Mailing lists are a special address through which incoming messages are considered by special programs and sent to those recipients who subscribed to messages on this topic. Teleconferencing combines both communication and information functions.

Direct communication forums (IRC, chat conferences, ICQ) - real-time communication between participants.

Internet telephony (IP telephony) -   voice communication via the network in real time. Skype

Instant messengers (instant messaging system) - instant messaging service.

Internet Information Services

Data services - FTP - storage of a set of files for different purposes.

WWW - World Wide Web - a distributed information system with hyperlinks existing on the technical basis of the world wide web. Appeared in 1993.

Runet is the Russian part of the world wide web the Internet.

Web browsers are application programs that provide network services and are used to quickly obtain information from global networks. Web browser is a WWW client program.

Web browsers provide viewing of almost all types of information and access to global network resources. For consumers, the browser can be delivered as a stand-alone (stand-alone) application or as part of bundled software.

Browser Internet Explorer   supplied as part of operating system Microsoft Windows.

Mozilla firefox   - separately or as part of Linux distributions.

Safari - as part of the operating mac systems   OS X and as a standalone application for Microsoft Windows.

Google chrome, Opera and other browsers - as independent programs in a variety of options for various operating environments.

A web page is each individual document having its own address. An extension of a html or htm web page.

A web server is a computer running a WWW server program.

A Web site is a collection of interrelated Web pages.

Comprehensive Online Services

On-line translators and dictionaries

Online stores

Electronic payment systems (QIWI, Yandex.Money, WebMoney)

Ways to actively display information on the World Wide Web:

guest books - softwarethat is used on websites and allows their visitors to leave various suggestions, comments, brief notes addressed to the owner or future visitors. Thus, the guest book is the most simplified version of a web forum;

forums - a class of web applications for organizing communication of website visitors;

blogs are a website whose main content is regularly added posts containing text, images or multimedia;

wiki projects — a website, structure, and content that users can independently modify using the tools provided by the site itself;

social networks   - platform, online service or website designed to build, display and organize social relationships, visualization of which are social graphs. Examples: VKontakte, Odnoklassniki, My [email protected], Facebook, Google+, Myspace, Among friends and others;

content management systems.

3. Fastening a new material: 15-27 min

Questions for self-control:

1. What is the Internet?

2. List the basic Internet services to which the user has access?

3. What parts is the email address?

4. What is WWW? What is a web page?

5. What is the function of hyperlinks in WWW technology?

6. List the main elements of the Internet Explorer browser window?

7. What is the browser address bar for?

8. How to determine the addresses of recently visited pages?

9. What are browsers? Give examples.

10. What is an email subscriber mailbox?

4. Homework: 2 minutes

5. Summarizing: 5-10 min

(marks are given, their comment)

Classification of information (what can be searched on the Internet).

From the point of view of the consumer, all information on the Internet can be divided into telecommunication information markets (Fig. 1.).


Search engines (classification and scope).

The search for the necessary information in a large amount of sufficiently diverse information is a task that humanity has been solving for many centuries. As the volume of information resources grew, sufficiently sophisticated search tools and techniques were developed to find the necessary document. Catalogs (alphabetic, systematic and subject) are used as the main tool for searching information in libraries. However, each tool has its drawbacks. With large amounts of information (which are characteristic of the Internet), the search for information becomes a very complicated procedure. In order to find the necessary information in Inernet, you must have special knowledge and skills. A specialist with such knowledge and skills and searching for information on incoming orders is called an information broker. He knows how classifiers are structured, how systematists interpret them, what tools exist for searching information in Inernet, technological methods and methods of searching, features of various search engines, etc. In an interview with the customer, he studies his information need and turns it into a search prescription. In our country, specialists of such a profile are rare, although the need for them is already being felt.

Information retrieval systems (IPS) of three types are available on the Internet: classification, vocabulary and subject.

ClassificationIRS use a hierarchical organization of information, which is described using a classifier. The sections of the classifier are called headings. For example, a systematic catalog is used in librarianship for this purpose.

The classifier is developed and improved by a team of authors. Then it is used by another team of specialists, called systematizers, who, knowing the classifier, read the documents and assign classification indexes to them indicating which sections of the classifier these documents correspond to. As an example of a classification IPA on the Internet, Yahoo! In which at the same time more than 100 systematists work, Excite, Look Smart, Yellow Web, “Constellation of the Internet”, “Au”.

Classification IPA have a number of specific disadvantages. The development of a classifier is associated with an assessment of the relative importance of various areas of human activity. Any assessment is a social action - it is associated with society, culture, social groupto which the person producing the assessment belongs. Therefore, classifiers created by different teams in different countries vary greatly. In addition, systematizers have difficulty interpreting materials written in foreign languages ​​(not only source documents, but also classifiers). Since an absolutely strict classification cannot be made to anyone, there are always documents that can be attributed to several sections of the classifier.

In complex cases, systematizers (when it is not clear to which of the sections the document should be assigned) use two methods: sendingand link.The reference (in Yahoo! It is denoted by the @ sign) is placed in those sections of the classifier in which this document did not fall - it indicates to which rubric it is assigned by the systematizer. The link is used in cases where similar information may be in other sections of the classifier.

VocabularyIRSs use a database built from words found in Internet documents a. In such a database with each word is stored a list of documents from which it is taken. Since all morphological units in the dictionary are ordered, the search for the desired word can be performed quite quickly, without sequential browsing.

One word to find the required information is quite difficult. Therefore, each dictionary has its own query language, allowing you to combine words that most fully characterize the desired information.

Alta Vista, Rambler, Me ndex, Aport relate to the Internet.

IPS wordlists are capable of issuing lists of documents containing millions of links. Even a simple viewing of such lists is difficult. Therefore, many IPS dictionary entries provide the ability to rank search results — the most important documents are placed at the top of the list. In the query language of such IPAs, special tools are provided, for example, in the Alta Vista advanced search mode, you can specify a list of terms that increase the rank of the found document (which is especially important for this IPS, since it shows only the first 200 found documents). Rambler and I ndex allow you to specify the weight of each of the terms, which allows you to quite accurately adjust the order of the found documents.

Forecasting is the core of any trading system, which is why competently reproduced ones can make you an archive of money.

AT subjectIPA with a search image associated with the resources of the Network, containing the necessary information and links to related sites. In such IPA ring link structures are created. Thus, the server contains several tens of thousands of thematic rings (the average ring size is about 12 servers, but there are also giant rings that include thousands of servers). While the rings were small, the search for information was not a problem. To facilitate the search on the specified server uses its own classification and vocabulary IPS to help find the necessary information.

With the help of information retrieval systems, it is possible to search for quite specific information objects, the list of which is shown in Fig. 2

Description of search engines. Search systemAlta Vista.

Each search engine has its own query language, which defines the rules according to which requests for information search are formulated.

In the classification and vocabulary IPS, the request is based on the keywords that are the most striking characteristic of the information sought (in fact, without these words this information   can not do). It is better if these keywords have a specific meaning inherent only in the information material sought, which distinguishes this material from all others.

The AltaVista search engine is classified as an IPA dictionary and is one of the most informative. You can contact her at the following addresses:

2) The search image may consist of one or several keywords.

3) Depending on the method of combining keywords in a search query, simple and complex queries are distinguished.

4) A complex query differs from a simple one in that it is possible to indicate the date when the desired document was created (in order to highlight materials that have last update   after the specified date), special search logic (defined by using AND, OR, NOT, NEAR operators), choose one of three options for ordering search results when they are displayed: “as a result only”, “compact form”, and “standard form” (the latter is used by default), and use parentheses to separate the logically independent parts of the query.

5) Key words can be typed in different registers of the keyboard - depending on this, the search engine will search differently.

The presence of a capital letter in a keyword will cause the search engine to search for words with such a spelling as in the query with a simple search. If capital letters were not used, the search engine takes into account any spellings of these words. For example, if the search prescription consists of one word Computer, then informational materials containing this word in this style will be found. If this word does not contain capital letters, then the search will take into account words in such types as computer, COMPUTER, COMPuter, etc. It must be taken into account that when using a search image consisting of only one word computer, AltaVista provides about 2000 links. It is almost impossible to view such a number of links, which means that information retrieval cannot be considered effective (if a request is properly written, the necessary information is among the first two dozen links).

6) In the event that the correct spelling of a word is unknown, or a set of words with the same root is of interest, the uncertainty operator “*” (asterisk) is used. By placing this symbol after any sequence of letters (at least three), the influence of which must be taken into account when searching, you can perform a wide search in which the keyword will be modified: the search will be omitted both for a set of letters that are hard-coded to an asterisk, and for words containing any letters (up to 5) instead of an asterisk. For example, if you specify the comp * keyword, then the search will be taken into account as the key ones - computer, computers, compute, etc.

7) To connect several keywords, the operators “space”, “quotation marks”, logical operators “+”, “-”, AND, OR, NOT, NEAR can be used.

8) The operator "space" connects the words in the search prescription in such a way that each of these words is used separately for the search. In this case, the order of words in the query does not matter. The search process takes into account only the distance of each word from the beginning of the document and the frequency of its use in the document.

9) The operator "quotes" connects words so that they form a phrase in which all the words specified in the instruction in the document are next to each other and in the same sequence as indicated in the instruction. Therefore, if you specify a search prescription in the form of the words “personal computer” and in the form of “computer personal”, the search results will be different.

10) The “+” operator connecting words tells the search engine that the main word (first) should be searched for in the document, but the document should be shown in the search result only if the rest of the words from the search prescription are found in the text. The operator is placed immediately before each minor word. For example, on a search image:

computer + personal + digital will be searched for the main word computer, but the text will only be relevant if it contains the words personal and digital.

11) The “-” operator in front of a word means that the main word should be used in the text without a secondary one. For example, the search prescription computer - personal tells the search engine to search for the main word computer, but the word personal should not appear in the text (that is, they are interested in materials about computers, but not personal ones).

12) AND, OR, NOT, NEAR operators are used in complex queries.

13) The AND operator (you can use the & symbol instead) defines that the words it joins should appear together (that is, in simple queries it is equivalent to the “+” sign).

14) The OR operator (instead of it, you can use the “|” sign) determines that the words it joins are independent of each other (in simple queries, it is equivalent to a space).

15) The NOT operator denotes negation (in simple queries it is equivalent to the sign “-”).

16) The operator NEAR (instead of it you can use the symbol “~”) determines that the keyword specified by it in the search text is no more than 10 words from the main one (for example, in the search prescription:

provider * NEAR “very cheap” provides that in the text you are looking for the word “provider” and the phrase “very cheap” are not at different ends of the text, but next to each other - there can be no more than 10 words between them).

17) To limit the search, special commands (tags) are used: anchor, applet, title, url, host, link, image, from, subject.

18) The anchor command allows you to find the word contained in the “body” of a link on the web. To do this, after the anchor command, the search word is indicated with a colon. For example, the search image contains:

anchor: home This search will find the entire set of pages containing the word home inside the links, including the following link: “If you would like to go home, press here”.

19) The applet command allows you to find the Java module specified by the name. For example, if the Java module is called word, then you can find it by writing the search image: applet: word.

20) The title command is used if the search word is in the text header. For example, at the request of the form: title: links, documents containing the word links in the title, including the text with the title “Cool Links” will be found.

21) The url command instructs to search for a url-address containing the given word. For example, if you do not know in which root domain is the host computer MESI, you can set the search prescription: url: mesi. Among the many addresses with such a word will be the address.

22) The host command allows you to find out which Web sites are on a given host computer. For example, in order to find out which sites are on a host, it is necessary to type the query: host: intel. ru. If the request specifies only a part of the name, then the search will find sites that have different addresses but contain the specified part of the name.

Using this command, you can search in a given country. For example, on request host: *. ru + kreml information about Moscow, Ryazan and other Kremlins will be found. It should be remembered that the search is conducted only for sites registered in the AltaVista search system; other sites are not accessible to it.

23) The link command allows you to find the addresses of pages (sites) that contain a link to a specific (specified in the search image) Web page. For example, in order to find out who is linking to the site, you must specify the prescription: link:. The result will be a list of pages that contain links to the mesi site. ru.

24) The image command allows you to find an illustration on the Internet. For this you need to know the name of the file in which it is stored. The command format is the same.

25) The from command allows you to search in Usenet newsgroups for mail sent by a specific person, whose name is indicated after the colon in the command. For example: from: Ivan + Fedorov (or Ivan + Fedorov).

26) The subject command allows you to search for messages in Usenet newsgroups on a specific topic specified in the search prescription.

The AltaVista search engine can work (and search) in different languages, including Russian.

The described principles of search engine management are in many respects similar to those used in other search engines.

Search systemYandex.

In 1997 to the address: a new Russian search engine Yandex has opened (or I ndex). By recruiting search opportunities   it is not inferior to the most sophisticated search engines in the West, is specifically designed for Russian-language queries and takes into account the peculiarities of Russian vocabulary, offers several more opportunities for intelligent search.

Similar to AltaVista, Yandex distinguishes between upper and lower case letters. If the keyword is written in capital letters, the search engine does not distinguish between uppercase and uppercase letters, that is, when specifying the computer keyword in the search, Computer and COMPUTER, etc. will be taken into account. Then, as if the search image contains at least one capital letter, the search will be take into account only the words that have this type.

In Yandex, there is no need to use the operator of uncertainty (similar to the asterisk in AltaVista), since when you set a keyword in capital letters, the words in other cases, in different declinations, in singular and plural, will be used in the search process.

In Yandex, just like in AltaVista, you can build simple and complex queries. But building complex queries requires a higher qualification of the person conducting the search.

To connect keywords in simple queries, operators are used, denoted by the symbols: &, |, ~, (,). Among them, only the tilde (~) has another purpose - in Yandex it denotes negation (and is equivalent to the sign “-” in AltaVista). However, these operators have a significant feature: the keywords that they combine must be within the same paragraph.

Doubling the carrier indicates that words must be within the entire text (and not just a single paragraph).

Yandex has a “search with distance” - you can specify that keywords in the search text should be no more than, for example, three words (and in one paragraph). The distance is given by the / character, followed by a digit that defines the distance. For example, setting the prescription:

round / 3ball search engine will search for documents containing within the same paragraph the words “round” and “ball”, moreover, they may be separated by no more than three words. If the distance is set to a negative number, this means that the second word precedes the first.

Instead of one word in the search prescription, you can use whole expressions. Logically independent elements of these expressions may be enclosed in brackets.

Features search for information on the Internet.

The Internet, as a global information exchange tool, is often used to find the necessary data. There are many ways to search for information (in brackets there are cases when this search method is most applicable): Search using search engines(specific things) Catalogs and Link Collections(more general concepts) Ratings(most popular resources) Conferences, chat rooms and links pages on thematic sites(rare, specialized things). The limited temporal, physical, and financial capabilities of people most often force them to use special directories and search engines (search engines) for this - a kind of librarians who index the array of information available to them on the Internet. This section highlights the features and describes the general rules for the most well-known directories and search engines.

Catalogsrepresent a systematic group of addresses, combined, as a rule, on the subject. To the convenience of their application can be attributed to the fact that if the user knows the topic of the desired document, he will investigate the appropriate branch of the directory, without being distracted by extraneous, irrelevant documents. However, the volume of the catalog is limited by the physical capabilities of the editorial group and its subjectivity in the choice of material. They lack information on narrow, special topics, and the very subject of the document sought cannot always be formulated within the classification classification. Below are the capabilities of foreign and Russian catalogs.

Yahoo! - The most popular directory containing extensive information about tens of thousands of Web-sites. The first level of the hierarchy contains 14 thematic categories, which branch out into another 4-5 sub-levels. It has its own search engine that allows you to: 1) search on the basis of Yahoo!, Usenet or email addresses; 2) to limit the search to materials posted for the last day, week, month, year or 3 years; 3) to issue articles containing at least one keyword or all keywords; 4) search by single-word words or only by the specified key; 5) to display the results of 10, 25, 50 or 100 on one page (more details below). Excite Reviews - Contains reviews of 60 thousand Internet sites (hierarchical catalog) City. Net - information about countries and cities.

Galaxy - Hierarchical directory with detailed description   thematic categories on the first page. Carries out a search by search category, by one or several keywords, a brief and detailed display of search results, go to the Gopher and Telnet pages.

Yellow Pages - Search for information about 16 million American companies in various fields of activity, as well as personal data and email addresses of individuals.

Russia on the Net - The first catalog of Russian resources.

Internet constellation - Covers about 400 servers. The possibility of truncation of terms. Contains the names and brief characteristics of servers. Attractive graphics. Small search area, weak hierarchy.

Yellow Pages Internet - About 1200 Web servers. Large amount of information, well thought out structure.

Treasure Online - Web directory of resources on the server Relcom. AU! - Young, fast-growing directory.


Search enginesin total, more than 150 are known, varying by region of coverage, the principles of conducting a search (and therefore


input language and the nature of perceived requests), the volume of the index base, the speed of updating information, the ability to search for "non-standard" information and the like. The main criteria for selecting search servers are the size of the server's index base and the degree of sophistication of the search engine itself, that is, the level of complexity of the queries it perceives. Traditionally, search engines have three elements:

1. Robot   (krouler, spider, agent), which moves on the web and collects information; Crawlers browse headlines and return only the first link.

Spiders are programs that perform a general search for information on the Web and report the content of a found document, indexing it and extracting summary information.

Agents - the most "intelligent" of the search tools. They can do more than just look: they can even perform transactions on your behalf. Already, they can search for sites of specific subjects and return lists of sites, sorted by their attendance. Search engine administrators can determine which sites or types of sites agents should visit and index. Agents can process the content of documents, find and index other types of resources, not just pages. Some, for example, index every single word in a meeting document, while others index only the most important 100 words in each, index the size of the document and the number of words in it, the title, headings and subheadings, and so on. They can also be programmed to extract information from existing databases.

The online community has adopted the “Standard for robots exemptions”. This standard describes the use of simple structured text fileavailable at a known location on the server (" / robots. txt") and used to determine which part of the server links should be ignored by robots. All" smart "search engines first access this file, which must be present on each server. To date, this file is required by search engines only of such systems as Altavista, Excite, Infoseek, Lycos, OpenText and WebCrawler. This tool can also be used to warn robots about black holes. Each type of robot can send certain commands if it is known that this robot It specializes in a specific area. This standard is free, but it is very easy to implement and there is considerable pressure on robots to try to obey them.

1. Databasewhich contains all the information collected by the robots.

It is very difficult to index an arbitrary document located on the Web. The first robots simply kept the name of the document and the anchors (anchor) in the text itself, but the newest robots already use more advanced mechanisms and generally consider the full content of the document. The indexed information is sent to the database (DB) of the search engine. The type of index constructed determines which search can be done by the search engine user and how the information obtained will be interpreted. People can put information directly into the index, filling out a special form for the section in which they would like to place their information. The database is automatically updated for a certain period of time so that dead links are found and deleted.

2. User Interfaceto interact with the search engine database. When a user searches for information on the Internet, he fills out a search form on the search engine page. Keywords, dates and other criteria can be used here. The criteria in the search form must meet the criteria used by agents when indexing web resources. Both the format and the semantics of the queries vary depending on the search engine used and the specific subject area. Requests are made so that the search area is as specific as possible and narrowed down. The preference is given to use of several narrow requests in comparison with one advanced. Query languagesthe various search engines are mainly a combination of the following functions (Table 3).

Boolean algebra operators AND, OR, NOT:


Based on the search string entered by the user, the query object is searched in the database and a list of relevant links is displayed. The number of documents received as a result of a search on request


can be huge. However, thanks rankingdocuments used in most search engines, on the first pages of the list almost all documents will be relevant (ideally). Basic principles of definition relevancefollowing:

1. The number of query words in the text content of the document (in html code).

2. Tags in which these words are located.

3. The location of the search words in the document.

4. The proportion of words (density), for which relevance is determined, in the total number of words in a document.

These principles are applied by all search engines. And the following are used by some, but fairly well-known (like AltaVista, HotBot).

5. Time - how long the page is in the database of the search server. Many sites live a maximum of a month. If the site exists for a long time, it means that the owner is very experienced in this topic.

6. Citation index - the number of links to this page from other pages registered in the database.

Exist features of the showthe resulting list - some search engines only show links; others display links with the first few sentences contained in the document or the title of the document along with the link.

The result of the query (list of links) is processed in two stages. At the first stage (automatic processing), obviously irrelevant sources are cut off, which are in the sample due to the imperfection of the search engine or insufficient “intelligence” of the query. Further (manual) processing is performed by the user by sequentially referring to each of the resources found and analyzing the information there. When a user clicks on a link from the list, before requesting the corresponding document from the server on which it is located, search engines enter into their database a mark on user preferences. Collected information about user behavior(query wording and resources selected from the list) is successfully used in advertising campaigns on the web.

The most famous non-Russian search engines are compared below.

Altavista . Covers over 30 million pages on 225,000 servers, provides access to 3 million articles in 14,000 Usenet newsgroups. It has two modes: Simple query and Advanced query. In Simple mode, you can enter patterns for searching with at least three specified characters at the beginning of a word. If the word contains at least one capital letterSearch is case sensitive. Below the input line, search tips are displayed. In Advanced mode, you can create complex queries based on the logical operators AND, OR, NOT, NEAR and specify the criteria for sorting the results obtained. You can specify a range of publication dates. Provides the ability to search for images. Convenient interface. High speed, multiple search prescription, the ability to search in Russian, taking into account the morphology. The system does not order the search results, so it is advisable to use it for a specific or exhaustive search. Indexing in this system is carried out using a robot. At the same time, the robot has the following priorities:

Key phrases in< Meta >   tags;

Key phrases at the beginning of the page;

Key phrases by the number of occurrences in the presence of words; If there are no tags on the page, use the first 30 words that are indexed and shown instead of the description (tag description) The most interesting feature of AltaVista is the advanced search. It is worth mentioning here that, unlike many other systems, AltaVista supports the single NOT operator. In addition, there is also the operator NEAR, which implements the possibility of contextual search, when the terms should be placed side by side in the text of the document. AltaVista allows searching for key phrases, while it has a rather large phraseological dictionary. Among other things, when searching in AltaVista, you can specify the name of the field where the word should appear: hypertext link, applet, image name, title and a number of other fields. Unfortunately, the ranking procedure is not described in detail in the system documentation, but it can be seen that the ranking is applied both with a simple search and with an advanced query. In fact, this system can be attributed to the system with extended boolean search.

Hotbot - Covers 54 million pages. Search in Russian is possible. It is a popular search tool due to the presence of mechanisms for constructing complex search queries. Basically, the 1st page of results obtained in response to search query, comes from Direct Hit, then results from Inktomi are taken. A directory listing is provided by the Open Directory. HotBot began to provide its services in May 1996, and in October 1998 it was purchased by Lycos.

Infoseek . Covers 1.5 million pages. Query language allows you to use all possible options   logical expressions. Less complete than on other servers, search results, uncomfortable interface. In this system, the robot creates an index, but it does not index


whole site, but only the specified page. At the same time, the robot has the following priorities:

Headline Words< title >   have the highest priority;

Words in the keywords tag, description and frequency of occurrences of repetition in the text itself;

When repeating the same words next to it, it throws out from the index

Allows up to 1024 characters for the keywords tag, 200 characters for the description tag;

If no tags were used, index the first 200 words on the page and use as a description;

The Infoseek system has a fairly developed information retrieval language, which allows not only to indicate which terms should appear in the documents, but also to weigh them in a peculiar way. This is achieved with the help of special signs "+" - the term must be in the document, and "-" - the term must be absent in the document. In addition, Infoseek allows you to carry out what is called a contextual search. This means that, using a special query form, you can require consistent joint occurrence of words. You can also indicate that some words must be co-located not only in one document, but even in a separate paragraph or heading. It is possible to specify key phrases that are a whole, up to word order. Ranking when issuing is carried out by the number of query terms in the document, by the number of query phrases minus common words. All of these factors are used as nested procedures. Summing up, we can say that Infoseek refers to traditional systems with a term weighting element in the search.

Infoseek Ultra - 50 million pages of WWW, search in Russian, image search is possible.

Lycos . Covers 68 million pages. You can select search options: one, several keywords or a phrase; truncation of terms; restrictions on the number of matches; the degree to which the search results match the keywords; the form of the output of the results (short or detailed); The number of terms found on each page. Low speed and speed of updating information. Lycos uses the following indexing mechanism:

Words in< title >   header have the highest priority;

The words at the top of the page;

Like most systems, Lycos makes it possible to use a simple query and a more sophisticated search method. In a simple query, a sentence in natural language is introduced as a search criterion, after which Lycos normalizes the query, removing the so-called stop words from it, and only then proceeds to its execution. Almost immediately, information about the number of documents per word, and later a list of links to formally relevant documents, is issued. The list against each document indicates its measure of proximity to the query, the number of words from the query that hit the document, and an estimated measure of proximity that may be more or less formally calculated. While you can not enter logical operators in the line along with the terms, but the use of logic through the menu system Lycos allows. This feature is used to build an extended request form for sophisticated users who have already learned how to work with this mechanism. Thus, it is clear that Lycos belongs to a system with a query language like "Like this", but its expansion is planned for other ways of organizing search prescriptions. In October 1998, Lycos acquired HotBot, which, at present, is used as a separate service.

Wais is one of the most sophisticated Internet search engines. It does not implement only the search for fuzzy sets and probabilistic search. Unlike many search engines, the system allows you to build not only embedded Boolean queries, read the formal relevance of various proximity measures, weigh the terms of the query and document, but also correct the query by relevance. The system also allows the use of truncations of terms, dividing documents into fields and maintaining distributed indexes. It is no coincidence that this system was chosen as the main search engine for implementing the Britannica encyclopedia on the Internet.

Yahoo . The secret of Yahoo's success lies in people. Yahoo has about 150 editors to compile and edit the contents of their directories. Yahoo has a database of more than 1 million indexed sites. Also, in the event of a shortage of its own database, Yahoo uses the Google database (until July 2000, Yahoo used the Inktomi database). Yahoo is the oldest search engine that launched its services in 1994. Yahoo language is simple enough: all words should be entered separated by a space, they are connected with a bunch of AND or OR. When issuing does not indicate the degree of compliance of the document with the request, but only underlined the words from the request that are encountered in the document. It does not normalize vocabulary and does not analyze the "common" words. Good search results are obtained only when the user knows that there is certain information in the Yahoo database. Ranking is based on the number of query terms in the document. Yahoo belongs to the class of simple traditional systems with limited search capabilities.



Prospects for the development of Internet search tools

The following Internet trends are undoubtedly:

Growth of available information and information needs of users

Expanding the boundaries of the Internet through the accession of new countries

Strengthening the commercialization of services

Increase speed, bandwidth and number of ways to access the network

Deepen the differentiation of services by target audiences (circles of interest)

Combining homogeneous services into single portals (queuing sites)

Influences of “give” - the protocol affect the development of means of collecting information about the behavior of users on the web

All this will push the automation of search tools and semantic information processing such as:

Personal autonomous intelligent agents (such as “Search +”)

Personalization and intellectualization of search engines on search portals (setting the method for displaying a list of links, using the Cookies mechanism, filling in special questionnaires and “subscribing to a query”, semantic ranking of query results)

Foreign search servers:

Russian servers are better suited for searching in Russian, while foreign servers are better suited in foreign languages, although, for example, Google does a good job of searching in many languages. We will talk more about the most popular search engines later, considering the advanced search capabilities, since each of these systems has its own characteristics. Now we’ll focus on some basic rules for building search queries that are common to all search engines.

Despite the claims of many search engine owners that queries can be written in virtually a language that people use to communicate with each other, this is far from the case. Most likely, the time will not come soon when the computer and the person will be able to communicate in a natural (for a person) language. However, we must pay tribute to search engines: recently they have become much better understood by the user, and the search results are now more in line with expectations than it was a few years ago. This happened largely due to the introduction of new language technologies.

It follows from the above that in practice it has become easier for an ordinary user to find the necessary information. Search engines are now looking for not only the requested word, but also its word forms, which allows you to make search results more accurate. For example, if the search query contains the word smart, then its results will contain not only this word, but also its derivatives: smart, smart, as well as the mind and even the mind. Of course, pages with word forms will not be among the first search results, but elements of artificial intelligence are obvious. This fact is useful to consider when building search queries. Now I will talk about a few more such facts.

Search engines are not case sensitive when processing a request. Therefore, requests for holidays in Turkey and holidays in Turkey are identical from the point of view of a search engine.

It should also be remembered that the use of punctuation marks in search queries is not necessary, more precisely, not even necessary, since they are also ignored by search servers. But many traditional punctuation marks can be used to construct complex advanced queries, the search results for which are usually much closer to the expected ones.

Most search engines (except, perhaps, Google) also ignore short words that do not carry meaning. In Russian, these are prepositions, conjunctions, etc., in foreign ones, for example, articles.

Many search engines allow you to deal with typos and incorrect keyboard layout. For sure many readers often happen to type with the included english layout   Keyboard Russian word, and it turned out, for example, gfhjdjp instead of the engine. The same “Yandex” will immediately determine that something is wrong here and at the top of the search results page will display a link: Perhaps you were looking for: steam locomotiveby clicking on which you can get a page with the correct results. Similarly, you can deal with typos. If the search server seems to have a mistake or typo in the word, he will warn about it with the same phrase: Perhaps you were looking for.

Let's talk about which words are better to take for a query to a search server. First, from the topic of interest to the user, you need to take the most important words that reflect only the essence of the question. For example, if you need material on the topic “Catching penguins in Antarctica in the polar night”, then it is not necessary to write the entire phrase, the search results in this case, most likely, will not suit the requester, since there will be a lot of excess. There is an expression “A machine should work, a person should think”, and it is said about such a situation. The task of the user in the preparation of a search query is to highlight keywords, the task of a search server is to best handle the entered query. In the example in question, the words catching penguins can be considered key. After all, it is known that, apart from Antarctica, they are not found anywhere else, but the “polar night” should be discarded, if only because in reality it is quite difficult to work in such conditions.

This ironic example illustrates that the user, based on his knowledge and logic, must select only the necessary keywords, without overloading the request with unnecessary terms.

Consider an example illustrating common mistakes of novice users when searching the Internet. At the request of the puzzle about musical instruments, the search engine did not produce useful results. Then the user decides to correct the request, adding to it and writing: riddles for children about musical instruments - the search results were even worse than the previous one. In such a situation, it is said that the query conditions have become more stringent, in contrast to the softer ones established in the previous case. For this example, a good solution, as practice has shown, was the search for the keyword riddles. There are many such sites on the Internet, and if you go to the site itself and search a little in its sections, you can easily find the information of interest.

Arguing about which words should be used in a search query, we can formulate several rules:

Choose only the most important keywords related to the topic in question;

Words should not be too many or too few; some consider the query consisting of three to four words to be optimal, but in different cases this number may vary;

In case of unsatisfactory search results, try to apply softer conditions for the query, but in no case less stringent;

If you are not satisfied with the search results on one search server, then try searching on another; mechanisms of work at servers vary, so the results can vary radically.

I hope that the above information will help readers find the necessary information on the web. If you still cannot find something, then advanced search methods will come to the rescue.

In order to provide a more efficient search on the Internet, search servers offer the option of advanced search as well as search using the query language. To distinguish between these concepts, I will give their definitions.

Advanced Search- the ability to search with an indication of many different parameters. For this purpose, search engines have separate pages on which you can set such parameters. The principles of advanced search are similar in most search engines.

Query language- a system of commands that allows you to change the query parameters from the main search string using special commands. Focused on advanced users.

Consider additional search features on the examples of search engines "Yandex" and Google. Why on them? Because Yandex is the most popular search engine on the Russian-language Internet, and Google is the most popular search engine in the world. However, with success, you can use other search engines, but for now we will focus on the two mentioned.

Search server "Yandex"

This search engine is one of the oldest in the Russian segment of the Network. Yandex began its activities in 1997, when the Internet began to develop in the post-Soviet space. Gradually gaining momentum, Yandex has today become the most popular search engine on the Russian-language Internet, with a daily audience of over 4,000,000 people. About half of all Russian-speaking Internet users use its services. When searching the Internet, Yandex was one of the first to take into account the morphology of the Russian language, that is, to use various forms of the word, as mentioned above.

It is worth mentioning another interesting feature of the Google search server called the button I'm lucky. Its pressing leads to the fact that the search result is not a page with a list of links, but a transition to the first found site. This button is useful when searching, for example, sites of large organizations. If you type in the search bar of Moscow State University and click I'm lucky, then the site of Moscow State University will immediately open.

Alternative search tools

Despite the fact that in today's Internet search engines are the main way to search for information, there are other methods to find it. Such alternative ways   sometimes help to find what could not be found using a search engine.

Metasearch

Despite the universality of search engines, the search results for each of them are almost always different. To search separately on each search engine for information of interest is quite difficult and tiring. It is easier to find the necessary information with the help of a meta-search, which allows you to search for the keywords entered by the user from several search servers automatically. Then the results are grouped according to the following principle: the more search engines have found a specific site and the higher its position in the search results, the higher it will be in the meta-search. Working with a metasearch server, from the user's point of view, is practically no different from a search on a regular search engine: the same input of keywords, the same button click To find.

The most powerful servers of meta-search in the Russian-speaking Internet - Nigma.ru ( www.nigma.ru) and MetaBot.ru ( www.metabot.ru).

The search engine Nigma.ru supports searching the following search engines: Google, Yahoo !, Msn, Yandex, Rambler, Altavista, Aport. You can search in Russian or english language. Provides a simple language searches, as well as advanced search. Nigma.ru (fig. 3.3) searches for documents taking into account Russian spelling, and also corrects possible errors. In addition to the traditional search for web pages, you can search by section: Pictures, Libraries, Musicand even Presents.



Fig. 3.3.Metasearch Nigma.ru


MetaBot.ru meta-search engine (Fig. 3.4) supports searching by a large number of search engines, among which there are little-known, one can even say exotic: Alltheweb, Google, Inktomi, Yandex, Northernlight, Altavista, Lycos, Webtop, Aport, Euroseek, Rambler , Links2go, Excite.



Fig. 3.4.The main window of the meta-search server MetaBot.ru


However, according to the administration of MetaBot.ru, the number and composition of the search engines surveyed may vary. When searching using this server, you can select one of its types:

  RUSSIAN SEARCH- Search in the Russian-language Internet;

  THE WHOLE WORLD- Search all over the Internet;

  SEARCH FILES- search for information itself;

  MP3 / VIDEO- search for music and video.

Like Nigma.ru, MetaBot.ru has a simple search query language, but unfortunately it doesn’t have an advanced search option.

Web directories

Another alternative search method is web directories. This is essentially an Internet site that contains links to various web resources. These links are grouped into sections, which may have subsections, which, in turn, are also divided into subsections, etc. Entering, for example, in the section “Business and Finance”, you can see the subsections “Management”, “Currency”, “ Lawyers ”,“ Security ”,“ Banks ”,“ Taxes ”, etc. Usually, in the web directory there is an internal search in directory resources. Links to specific websites are usually accompanied by a small description of the resource, which makes the search more convenient.

At the dawn of the Internet, when the number of sites on the web was relatively small, search in web directories was the primary means of searching for information on the World Wide Web. Over time, the Internet has grown, and it has become increasingly difficult for administrators of such directories to ensure that they work and to keep the information provided up to date. Webmasters who created their websites were given the opportunity to add links to their resource in the web directory themselves. However, it became more difficult to track the correctness of the added links and descriptions to the administrators of web directories. Therefore, gradually, web directories gave way to search engines that could automatically index the contents of Internet sites, and human participation was minimal, and the results obtained were often more consistent with the requirements.

However, web directories are still alive today. Many major Internet portals have web directories. Many search engine owners consider it nice to have their web directory. Here are some of the current web directories:

Catalog of Russian Web Servers - www.weblist.ru/russian;

Often with the help of web directories you can find interesting and informative resources that could not be found using search engines.

Find answers to questions

Strictly speaking, what is being discussed now is not a pure search. However, considering the ways of searching on the Internet, one cannot but mention expert nodes. As already mentioned, people have not yet taught computers to understand living human language. Sometimes a situation arises when it is impossible to find the answer to an interesting question, formulating this very question only with the key words: it is required to state the essence of the problem in a living language. For example, you bought an unknown fruit on the market and no one has to ask what it is (one wonders, why did you buy it, but this is not important anymore). It is important that no matter how hard you try, traditional and many alternative ways of searching the Internet will not be able to help here. It is in this case that the expert node will come to the rescue - a specialized website, a kind of information center, supported by the knowledge and judgments of living people.

The work of the expert site in many ways resembles the work of the forum (for more on the forums, see Chapter 9, section “Dating, chatting, searching for friends”) and is structured as follows: one person asks a question in the usual “human” language, while other visitors give the answer is again in ordinary language. In this example, the fruit can be compared with the question: “Who knows the name of the fruit of such and such color, such and such smell, of such and such size that looks like this or that?”

Since the number of visitors to expert sites is large, as a rule, the answer lies within a few minutes. Most often, the result of a question is a highly qualified and individual answer or advice, which no one can give. search system. All questions are divided into categories, almost like in web directories, which makes it easier for participants to communicate with each other.

For the Russian-speaking Internet, the phenomenon of expert nodes is not yet very common. Of the more or less large ones, it is possible to name only the project [email protected] ( otvet.mail.ru). The project has more than 7.5 million participants, and their number is constantly growing (Fig. 3.5).



Fig. 3.5.Project [email protected], allowing you to find the answer to any question


Users who are ready to ask a question in English and get an answer on it can resort to the services of foreign expert nodes, such as AskMe ( askme.com) or LookSmart Live.

Having understood the whole search for web pages and documents, you should pay attention to the search for files of all formats, as it has some peculiarities compared to the text search. By files it is meant files in their pure form: pictures, audio and video files, programs, archives, etc. (in fact, a web page is also a set of files). Consider some of the subtleties of finding pictures, music and other files.

Search for pictures on the web

Billions of pictures stored on the Web can be used for a variety of purposes: as a picture Desktop, as an illustration of a scientific or other work, for creating your own postcards or presentations, etc. Searching pictures is more difficult than text, since indexing pictures for a search engine is more problematic than text indexing, because computers have not yet learned to recognize patterns.

No matter how clever computers may seem, no matter how hard their computing power is, modern computers are still not able to cope with a task that a five-year-old child can accomplish, for example, to distinguish a cat from a dog. And despite the fact that the end user when searching for a picture describes it with words, and not images. Defining a list of keywords for a particular image on the Web is the main problem facing search engines.

To solve it, the following parameters are analyzed: words appearing on the page next to the picture; the name of the link leading to the drawing; image file name; the name of the site, etc. Whatever it was, but you can find pictures on the Internet.

However, when searching for pictures, it is necessary to take into account the presence of the problems described above for finding them. Far from always, the figure found will depict what was expected. This must be remembered when building a search query. It's one thing when you need to find an image of a bicycle (that is, a specific object), and quite another when you need a picture on an abstract topic, such as jogging. In the latter case, you will have to experiment and try, for example, the following keyword options: “jogging”, “runner”, “athlete”, etc. Thus, unlike text search, when searching for pictures, if they cannot be found right away, you need to use related and sometimes distant concepts.

Consider the tools used to search for images on the web. As mentioned above, using meta-search servers, you can search for pictures by simply entering keywords in search string   and selecting the search function for pictures. By the way, conventional search engines also provide the ability to search for pictures. On home page   almost any search engine has a link Picturesor Drawingsby clicking on which you can search for pictures. For example, the Yandex image search service is located at www.images.yandex.rucorresponding to the service provided by Google, you can use the page www.images.google.com. The result of the query will be a page with reduced copies of the images found (Fig. 3.6).



Fig. 3.6.Image Search Results for Piggy Google


Clicking on any of these miniature images, you can see its full-size version and, of course, save the drawing on the computer's hard drive.

In addition to the usual search engines, you can use specialized services for image search, the task of which is solely to find images. Among them - GoGraph.com ( www.gograph.com) and Picsearch ( www.picsearch.com). A common drawback of these services is their English-language interface. Briefly describe both of these search engine.

This server is, rather, not a search engine, but a web directory of pictures with a search function. As in the usual web directory, on GoGraph.com (Fig. 3.7), all pictures are structured into categories, which allows you to search for images manually, without resorting to search functions.



Fig. 3.7.Search directory GoGraph.com


You can search for all types of files or select one or more of the following: Animated gifs(Gif-animation), Photos(Photo), Icons(icons), Clip art(clipart). Some images, in particular images with high resolution, will be available only for money.

Picsearch is an easy-to-use service with simple interface, the possibility of an advanced search on various parameters and even your own mini-language of search queries (+ commands are supported (it is necessary to include the word) and - (delete the word)). As stated on the main page of the service, you can search for more than 2 000 000 000 pictures. Picsearch (Fig. 3.8) works pretty quickly and, despite its English-language interface, does a good job with handling Russian-language queries.



Fig. 3.8.Home page of image search server Picsearch


If you enter a bike request on the page, then after processing and issuing the result, the server will offer to try other frequently used queries: children bicycle, water bicycle, mountain bike. This approach confirms that the understanding of the Russian language at Picsearch is organized at a high level.

Perhaps this is all the basic information relating to the search for images on the web. Consider the equally useful and interesting features of the Internet search - search for multimedia files.

Search for music and video

Like image search, multimedia file search has features. From the point of view of search engines, multimedia files are indexed easier than images. Firstly, most modern audio and video formats support the ability to store textual information inside the file itself, which is read by the search server and then used when searching for such files. Secondly, unlike pictures, multimedia files are usually placed on the web specifically for downloading. Therefore, such files usually have a name corresponding to the content (the file is called the same as the musical composition or film it represents), as well as a link that points to the address of such a file and contains the name of the piece.

In this regard, ordinary search engines are often used to search for multimedia files, which bring good results. And you can search without the use of special search engine tools, that is, a multimedia file is searched in the same way as a regular web page.

The question of the presence of audio and video on the Internet is of great concern to the copyright holders of such products, because multimedia files are often distributed illegally, which is detrimental to the legitimate copyright holders. In recent years, the circulation of audio and video products on the Internet has become more and more civilized. Some large portals that offered to download MP3 music earlier (and, as a rule, free of charge) have now ceased to exist or retrained and transferred their activities to the legal direction.

An example of this is the large-scale music portal MP3Search.ru ( www.mp3search.ru), which was originally conceived by the creators as a system for finding free music files on the web. However, over time, the project was forced to change the tactics of work, and today it is one of the largest resources on the Russian-language Internet involved in the legal distribution of musical compositions (of course, for the appropriate fee). The MP3Search.ru portal presents a large amount of music of different styles and directions; you can search manually by the principle of a web directory or using the search string.

When searching for a video, there is also a similar situation. Illegal sites gradually cease to exist, giving way to legal ones. It is worth mentioning the search server GoGo.ru ( www.gogo.ru), which was one of the first in the Russian-speaking Internet to provide a video search service. Search for the required video on this service simple: just type the keywords in the search bar and select a category Video. In the search results will be exclusively resources that distribute video legally.

Search by FTP servers

You can find a variety of file types on current FTP servers today, including music and video. A significant part of FTP-resources has paid or limited access, however, many of them are publicly available. Having accumulated some experience and having determined the most preferable resources for themselves, the user can search for files on them, simply moving from folder to folder. However, it is much more convenient to use the services of FTP search engines that search for files on various FTP servers.

The peculiarity of the file search is that, besides the name and type, they do not have any signs that could identify them. These are not even pictures, the contents of which can be determined using special algorithms. The contents of the file can not be classified. True, multimedia files often contain some information about, say, the name of the song or artist. However, archives, programs, and many other file types cannot be classified. As a result, the main feature by which a file can be found is its name.

Taking into account this specificity, FTP search engines work. The ability to search on FTP servers as an additional function is provided by some ordinary search engines. For example, "Rambler" ( http://ftpsearch.rambler.ru/db/ftpsearch/) or MetaBot.ru meta-search server already mentioned in the context of web page search ( www.metabot.ru). In addition, there are specialized search engines focused exclusively on search on FTP-servers. The most famous of the Russian - FileSearch.ru ( www.filesearch.ru). One of the most powerful foreign FTP search engines is FreewareWeb ( www.freewareweb.com).

Consider the capabilities of such search servers on the example of FileSearch.ru (the work of other FTP search engines, including English-language ones, is built in a similar way).

The main page of FileSearch.ru, like most search engines, looks simple enough, however, it is replete with ads. In addition to the actual search string, there is a drop-down list in which you can select the type of files to search from the following options:

  file / directory- Search all files and folders;

  music (mp3)- search for MP3 music;

  Images- image search;

  video- search for video files;

  server- Search FTP server.

In another drop-down list, you can select the geographical boundaries of the search: in Russiaor on all(i.e. worldwide). Having set the necessary parameters, you should type the name of the necessary file or its part and click the button To find- Search results appear almost instantly.

Board

The file name can be specified using the special characters * and?, Which mean, respectively, any group of characters and any single character.

There is also a link on the main page of the search server. Driversby clicking on which you can get to the search page for drivers for various computer devices. Selecting the type of device and its manufacturer, you will need to specify a specific model, as a result of which links to the found drivers stored on FTP servers will appear.

Like most self-respecting search engines, FileSearch.ru provides advanced search capabilities. Going to the home page of the search link Advanced Search, you can get to the page (Fig. 3.9), where you can change some of the file search parameters.



Fig. 3.9.FileSearch.ru advanced search page


This page contains almost no settings that require explanation. Among the possibilities, it should be noted that the display is turned off in the search results of files intended for use on Unix systems. It is advisable to use this opportunity when you need to find files for Windows, so as not to overload the search results with unnecessary data. Sometimes useful is the possibility of limiting the file size, if, for example, it is known that the desired file cannot be less than a certain size or to discard too large files, the download of which to a computer can be time consuming.

There is also a search query language on FileSearch.ru, which is in general similar to the languages ​​of traditional search engines, so we will not dwell on it in detail. You can view the list of language commands at www.filesearch.ru/help/regexp.html.

Let me remind you that in order to download files from an FTP server, you will need one of the FTP client programs (see Chapter 2, Section “Downloading and Uploading Files via FTP”).

Summing up the story about the possibilities of searching the Internet, it is worth mentioning another new phenomenon on the Web - a visual search, the essence of which is in a visual and colorful presentation of the search results.

I also note that for a successful search from the user requires the acquisition of certain skills. Also, remember: “He who seeks will always find!”

The most popular web service today is just a search engine. Everything is explainable, because those times when the representatives of the first Internet users were able to watch the news on the network have long gone.

Information appears and accumulates so much that it became very difficult for a person to find exactly that which he would need. Imagine how you would search on the Internet, if an ordinary user would have to search for information do not understand where. Do not understand where, because you will not find much information by manual search.

Search engine what is it?

It is good if the user already knows the sites where you may have the necessary information, but what to do otherwise? In order to make life easier for a person in finding the right information on the Internet, search engines or just search engines were invented. The search engine performs one very important function, without which the Internet would not be the same as we used to see it - this is searching for information on the network.

Search system   - This is a special web site or in another way a website that provides users with hyperlinks to pages, sites responding to a given search query.

To be a little more accurate, then search for information on the Internet, carried out thanks to the software and hardware functional set and a web interface for interacting with users.

For human interaction with the search engine, a web interface was created, that is, a visible and understandable shell. This approach of search engine developers makes it easy for many people to search. As a rule, it is on the Internet that a search is performed using search engines, but there are also search systems for FTP servers, certain types of goods on the world wide web, or news information, or other search directions.

The search can be carried out not only on the text content of sites, but also on other types of information that a person can search for: images, videos, sound files   etc.

How is the search engine search?

Self searching on the Internet, just like browsing a web site is possible with internet browser browser   . Only after the user has specified his query in the search bar, the search itself is performed directly.

Any search engine contains the software part on which the entire search engine is based, it is called the search engine - it is a software package and provides the ability to search for information. After referring to the search engine, forming a search query by a person and entering it into the search bar, the search engine generates a page with a list of search results that are most relevant, according to the search engine, are located above.

Search Relevance - searches for the most relevant materials for a user’s request and the location of hyperlinks on a search page with more accurate results than others. The distribution of the results itself is called site ranking.

So how does the search engine prepare for the issuance of its materials and how is the search for information by the search engine? A unique robot for each search engine or, in a different way, a bot that also has a number of other synonyms like a crawler or a spider contributes to gathering information in the network, and the search system itself can be divided into three stages:

The first stage of the search engine operation can be attributed to scanning sites on the global network and collecting copies of web pages to their own servers. This forms a huge amount of as yet unprocessed and unsuitable information for search results.

The second stage of the search engine is reduced to putting in order the information received from the sites at the first stage. This sorting is performed, which in the shortest time will favor the very high-quality search that users actually expect from the search engine. The stage is called indexing, which means that the pages are already prepared for issuance, and the current database will be considered an index.

Just the third stage and causes search results, after receiving a request from your client, relying on or near the key words specified in the request. This contributes to the selection of the most relevant information request, and its subsequent issuance. Since the information is very, very much, the search engine ranks in accordance with its algorithms.
The best search engine is the one that will be able to provide the most correctly responding to the user's query material. But even here there may be results that are affected by people interested in promoting their website, such sites, although not always, but often appear in the search results, but not for long.

Although world leaders have already been identified in many regions, search engines continue to develop their high-quality search. The better the search they can provide, the more people will use it.

How to use a search engine?

What is a search engine and how it works is already clear, but how to use it correctly? Most sites always have a search bar, and next to it is the Find or Search button. A query is entered into the search box, after which you need to click the search button or, as it happens more often, press the Enter key on the keyboard and within seconds you get the result of the query in the form of a list.

But to get the right answer to the search query, it’s not always possible the first time. To ensure that the search for the desired does not become painful, it is necessary to correctly compose a search query and follow the recommendations described below.

We make a search query correctly

Below are tips on using a search engine. Following some tricks and rules in the search for information in a search engine will provide an opportunity to get the desired result much faster. Follow these guidelines:

  1. Proper spelling of words provides the maximum number of matches with the desired information object (Although modern search engines have already learned how to correct spelling errors, but this advice should not be neglected).
  2. By using synonyms in the query, it is possible to cover a wider search range.
  3. Sometimes changing a word in the query text can bring more results. Re-generate the query.
  4. Bring concreteness to the query, use exact entries of the phrases that should determine the main essence of the search.
  5. Experiment with keywords. Using keywords and phrases can help determine the core essence, and the search engine will produce a more relevant result.

So such a search engine is nothing more than the opportunity to find information of interest and is usually completely free to use it, learn something, understand something or make the right conclusion for yourself. Many no longer represent their lives without voice searchin which the text does not have to be typed, you only need to say your request, and the input device here is a microphone. All this testifies to the continuous development of search technologies on the Internet and the need for them.

Like this article? Share with friends: