Search network. How to write a search engine

Instructions

Divide your search engine into three parts. The first part is the interface of the future web search engine, written in PHP. The second part is the index (a MySQL database), which stores all the information about the pages. The third part is the search robot, written in Delphi, which indexes web pages and puts their data into the index.

Let's start with the interface. Create an index.php file and divide the page into two parts using tables. The first part is the search form, the second is the search results. At the top, create a form that sends its data to index.php using the GET method. The form has three elements: a text field and two buttons. One button sends the request, the other clears the field (this button is optional).

Name the text field "search". Give the first button (the one that sends the request) the label "Search" and the name "button", which is the key the code below checks. Leave the name of the form as it is: "form1".
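
The form described above could look roughly like the sketch below. The function name `renderSearchForm` and the exact markup are assumptions made for illustration; only the field name "search", the form name "form1", the GET method and the target index.php come from the steps above, and the button name "button" is chosen to match the `$_GET["button"]` check used later.

```php
<?php
// Hypothetical helper: returns the HTML of the search form as a string.
function renderSearchForm(string $currentQuery = ''): string
{
    // Escape the previous query before echoing it back into the page.
    $value = htmlspecialchars($currentQuery, ENT_QUOTES, 'UTF-8');

    return
        '<form name="form1" action="index.php" method="get">' . "\n" .
        '  <input type="text" name="search" value="' . $value . '">' . "\n" .
        // The submit button: its name becomes the key in $_GET.
        '  <input type="submit" name="button" value="Search">' . "\n" .
        // Optional second button: restores the field to its initial value.
        '  <input type="reset" value="Clear">' . "\n" .
        '</form>' . "\n";
}
```

Calling `echo renderSearchForm($_GET["search"] ?? "");` near the top of index.php would print the form with the previous query already filled in.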

Include the config file that holds the database connection settings.

include "config.php";

Check whether the "Search" button was clicked.

if (isset($_GET["button"])) { /* code executed if the "Search" button is pressed */ } else { /* code executed if the "Search" button is not pressed */ }

If the button was clicked, check whether a search query was sent.

if (isset($_GET["search"])) { $search = $_GET["search"]; }

If a search query is present, its text is assigned to the variable $search.

if ($search != "" && strlen($search) > 2) { /* database search code */ } else { echo "An empty search query was specified or the search string contains less than 3 characters."; }

If the search query satisfies the condition above, run the search script itself.

Run a loop that will print the search results through printf.
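
The search script and the printing loop are not shown in the steps above, so here is a minimal sketch of how they might look. Everything specific in it is an assumption: the `pages` table with `url`, `title` and `body` columns, the use of PDO, simple LIKE matching, and all function names. Only the length check and the use of printf follow the text.

```php
<?php
// Assumed schema (not given in the article): pages(url, title, body).

// The rule from the steps above: non-empty and longer than 2 characters.
function isValidQuery(string $search): bool
{
    return $search !== '' && strlen($search) > 2;
}

// Escape LIKE wildcards so user input is matched literally, then wrap in %...%.
// Backslash is replaced first so the escapes added afterwards are not doubled.
function likePattern(string $search): string
{
    $escaped = str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $search);
    return '%' . $escaped . '%';
}

// Run the search through a prepared statement, which avoids SQL injection.
function searchPages(PDO $db, string $search): array
{
    $stmt = $db->prepare(
        'SELECT url, title FROM pages WHERE title LIKE :t OR body LIKE :b LIMIT 20'
    );
    $pattern = likePattern($search);
    $stmt->execute([':t' => $pattern, ':b' => $pattern]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// The loop that prints each result row through printf.
function printResults(array $rows): void
{
    foreach ($rows as $row) {
        printf(
            "<p><a href=\"%s\">%s</a></p>\n",
            htmlspecialchars($row['url'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($row['title'], ENT_QUOTES, 'UTF-8')
        );
    }
}
```

Assuming config.php creates a PDO connection in a variable such as `$db`, the "database search code" branch would then reduce to `printResults(searchPages($db, $search));`.
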
That's all. With the necessary knowledge, you can add whatever elements you need to the search engine and work out your own algorithm for building it.

Popular websites attract users not only with original design and interesting thematic content, but also with functional services. People go to the Internet for information, searching daily for materials of interest to them. So it makes sense to create a search engine on your website, giving users the ability to quickly find what they need among hand-selected resources.

You will need

- browser;
- Internet connection;
- the right to edit the content or templates of the pages of the site.

Instructions

Start building a custom search system based on Google technologies. Open the search engine management panel in your browser at http://www.google.ru/cse/... You need a Google account to work with the system. Click the "Create user search system" button. If you are not logged in at the moment, click the "Login" link, enter your account details in the form and click the "Login" button. If you don't have a Google account, create one by clicking the "Create an account right now" link and following the suggested steps.

Enter the basic parameters of the custom search system. Fill in the "Name" and "Description" fields and select the interface language in the "Language" drop-down list. In the "Sites to search" text box, enter the list of resources whose content should appear in the search results. Click "Next".

Get the JavaScript code for installing the search engine on the site. Select all the content in the text box on the current page, copy it to the clipboard and save it to a temporary file.

If you have already decided on a mail server, you can proceed to registering an email address. The registration process is about the same on any portal: you fill out a questionnaire and specify a security question in case you forget your email password. Fill out the questionnaire carefully, since if your mailbox is hacked you will have to provide the registrar with the data from it. Therefore, if you decide to use a pseudonym or deliberately fake data, save it in a safe place.

Note

Email is generally offered free of charge. But some sites provide it for a monthly subscription fee, with an attractive, exclusive domain name and many additional features. Before buying a mailbox, consider everything the free services offer, and only then look at commercial offers.

Useful advice

When choosing a mail server, pay special attention to popular portals that offer a mail service. As a rule, such portals are time-tested and guarantee reliability and functionality.

To improve a site's reliability, keep its information safe, increase its traffic, reduce its load and so on, set up a mirror site. The idea is that when the main resource is unavailable for whatever reason, the visitor lands on the spare resource, that is, the site mirror.

Many newcomers to "webmastering" (let's call it that) at some point get a "brilliant" idea: "Why not whip up my own search engine?! Sell advertising, rake in the cash!" I confess, I have had this too... 3 times.

Runet search engine - Yandex killer

I collected links on the topic, began to study, and dug through everything I could find on Aport and Yandex. I downloaded several free engines with spiders, but I didn't have enough "knowledge" even to install them. Necessity is the mother of invention: I took a directory script (no database, just txt files) with a search over its list of sites and began to fill it with sites, first myself, then with a hired moderator. And what do you think? Of course, the idea failed, but new ideas appeared, which turned into a search engine for books. More on that below.

Book search engine

Having rummaged through the few options on the Runet (about 2004-2007), I picked two bookstores, Hummingbirds and Bolero. The reason was simple: in both cases, the partner interface let you download databases of the stores' goods. There was little information in the databases: the title of the book, the author, the address on the store's website. But this was enough to create a directory plus a search engine. Moreover, annotations were also shown for the books (they were parsed in real time from the store sites; yes, I had no idea about caching back then, just as I did not use an automatic redirect...).

The book search engine was not a success, but the catalog brought in tons of rich traffic from Yandex and, accordingly, book sales. Most of the purchases were delivered by mail, cash on delivery, so it took months for the payments to reach the account... Russian Post.

Google killer

My main line of work was in the "bourgeois" web, in particular with PPC, mainly with Yumax, so I chose their feed as the "engine" for the next search engine. Armed with PHP (or rather, with reworked parsers from the book catalogs), I learned to add extra information to the results for the user's request: pictures, etc. (just like now 🙂).

And then something wonderful happened. The search engines MSN (now Bing) and Google began to index the results of "my search engine" and delight me with traffic, which in turn was generously paid for by Yumax.

And while my colleagues churned out doorways, I churned out search engines like this: different designs, different sources of additional information... Why make doorways and redirect traffic to the feed, risking a ban for the redirect, when you can create, for example, thematic mini-sites without a redirect? White doorways, I think they are called now. The idyll did not last long: less than a year. Algorithm changes, first at MSN and then at Google, buried such solutions (more precisely, made them much less effective).

Somewhere during the collapse of the "system" at MSN, I "out of grief" took one of the banned domains (this site) and moved onto it a blog that had previously been running on some forum or within the site of an advertising agency.

3 times! 3 times I stepped on the same rake: some people don't even learn from their own mistakes :)


Have you ever wondered how search engines such as Yandex or Google work? If you had to write a search engine from scratch, where would you start? Surely many of you have already written simple content sites with an internal search, and that search was implemented very simply, using the SQL LIKE operator. Do you think Yandex works like that too? 🙂

Describing all the mechanisms implemented in modern search engines is clearly not a task for one post (and I can't tell you that much anyway), so here I will cover the part of search engines that is most significant and least known to many: the index. But let's not rush.

In general, the entire search engine can be roughly divided into 3 parts: user interface, search agent and index.

The user interface is familiar to everyone: google.com, ya.ru. This is usually just a search box. A search agent is a program that crawls sites, collecting page texts and URLs from them. The search agent stores the collected information in the index.
Well, the most important part is the index, or search database.

The index stores all information collected by search agents - Internet pages.

In general terms, the agent's job is to collect information: it goes to a page of a site, takes the text from it, pulls the links out of it and sends them along with the text to the index. What happens next deserves a closer look, since it is the main job of a search engine.
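
The agent's step of taking the text and pulling out the links can be sketched as follows. This is a deliberately crude illustration that works on an HTML string already downloaded: it uses regular expressions and strip_tags where a real crawler would use a proper HTML parser, and the function names are hypothetical.

```php
<?php
// Collect the href values of <a> tags, without duplicates.
function extractLinks(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    return array_values(array_unique($m[1]));
}

// Reduce a page to its visible text.
function extractText(string $html): string
{
    // Drop script and style blocks, whose contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before every tag so neighbouring words do not merge.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// What the agent would hand over to the index for one page.
function processPage(string $url, string $html): array
{
    return [
        'url'   => $url,
        'text'  => extractText($html),
        'links' => extractLinks($html),
    ];
}
```

The links returned here are exactly as written in the page, so relative addresses such as `/a.html` would still need to be resolved against the page URL before the agent can follow them.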

How exactly is the data stored in the index? What is the structure of the index tables? This is just one of the key details of the search engine.

Considering the search engine in a very abstract way, the index can be divided into three tables: dictionary, documents, and links.

To make it clearer, imagine three tables:

words (dictionary) with fields:
id, name

documents (documents) with fields:
id, document

and relations with fields:
word_id, doc_id

Before adding the page text to the index, the search engine breaks it down into words. Having obtained the list of words from the document, it adds to its dictionary (the words table) those words that are not there yet. The document itself is saved in the documents table.
After that, word-to-document links are added to the relations table; they record which words occur in which document.
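
To make the indexing steps concrete, here is a small in-memory sketch. The class `TinyIndex` and its methods are hypothetical; plain PHP arrays stand in for the words, documents and relations tables, whereas a real index would keep them in a database. Lowercasing uses strtolower, which only handles ASCII letters.

```php
<?php
// In-memory stand-ins for the three tables described above.
class TinyIndex
{
    public array $words = [];      // words:     name => id
    public array $documents = [];  // documents: id => document text
    public array $relations = [];  // relations: rows of [word_id, doc_id]

    // Break text into distinct lowercase words (runs of letters and digits).
    public static function tokenize(string $text): array
    {
        $tokens = preg_split(
            '/[^\p{L}\p{N}]+/u',
            strtolower($text),
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        return array_values(array_unique($tokens));
    }

    // Save the document, extend the dictionary, and link words to the document.
    public function addDocument(string $text): int
    {
        $docId = count($this->documents) + 1;
        $this->documents[$docId] = $text;

        foreach (self::tokenize($text) as $word) {
            // Add the word to the dictionary only if it is not there yet.
            if (!isset($this->words[$word])) {
                $this->words[$word] = count($this->words) + 1;
            }
            $this->relations[] = [
                'word_id' => $this->words[$word],
                'doc_id'  => $docId,
            ];
        }
        return $docId;
    }
}
```

After adding two documents that both contain "PHP", the dictionary holds a single entry for "php" while the relations list holds two rows for it, one per document.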

The next complex process that is implemented in search engines is the selection and ranking of information at the user's request.

After you go to google.com and type “php” into the search, a very complex mechanism is launched, the purpose of which is to show you a list of documents related to the request, in descending order of relevance.

How is this implemented? Everything is very complicated here. First, the search engine must select the relevant documents, that is, the documents in which the specified words occur. With the tables mentioned above, one can already imagine in general terms how this is done. But ranking (ordering) is where the problems begin, and each search engine solves them in its own way.
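
As a rough illustration of the selection step, the sketch below keeps only the documents that contain every query word and orders them by a naive score: the total number of occurrences of the query words. This scoring is a stand-in chosen for simplicity, not how any real search engine ranks. The function name and the `$postings` layout (word => [doc_id => occurrence count]) are assumptions made for the example.

```php
<?php
// $postings: word => [doc_id => number of occurrences in that document].
// Returns [doc_id => score], best match first.
function selectDocuments(array $postings, array $queryWords): array
{
    $scores = null;

    foreach ($queryWords as $word) {
        $docs = $postings[$word] ?? [];

        if ($scores === null) {
            // First word: every document containing it is a candidate.
            $scores = $docs;
            continue;
        }

        // Later words: keep only candidates that contain this word too,
        // adding its occurrence count to the running score.
        $next = [];
        foreach ($scores as $docId => $score) {
            if (isset($docs[$docId])) {
                $next[$docId] = $score + $docs[$docId];
            }
        }
        $scores = $next;
    }

    $scores = $scores ?? [];
    arsort($scores); // highest score first, document ids preserved as keys
    return $scores;
}
```

With the three-table layout described earlier, `$postings` could be built by joining relations with words, provided the relations table also stored how often each word occurs in each document.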

This is where very sophisticated clustering and classification algorithms come in: they divide all documents into groups and assign a category to each document. These categories already give some information about how relevant each document is. Besides this factor, modern search engines take a huge number of others into account.

You should really distinguish web search engines (Google, Yandex, etc.) from relatively small information search engines. The former are much larger in scale than the latter, which means that their structure is much more complex.

Small search engines include projects such as Sphinx and Lucene.

That's all. That was a short and useful excursion into search engines. 🙂
