The Secret behind Google Search
Thursday, November 16, 2006
Google runs on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing. Parallel processing is a method of computation in which many calculations are performed simultaneously, significantly speeding up data processing. A search, which typically takes less than half a second, is the result of a complex journey that usually makes at least two stops, often thousands of miles apart.
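To make the idea of parallel processing concrete, here is a minimal sketch in Python. It is not Google's code: the function names `count_term` and `parallel_count` are hypothetical, and a thread pool on one machine stands in for thousands of separate computers. The shape is what matters: split the data into chunks, let each worker handle one chunk, then combine the partial answers.

```python
from concurrent.futures import ThreadPoolExecutor


def count_term(chunk, term):
    """Count how many documents in one chunk contain the term as a whole word."""
    return sum(1 for doc in chunk if term in doc.lower().split())


def parallel_count(chunks, term, workers=4):
    """Hand each chunk to a separate worker, then add up the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda chunk: count_term(chunk, term), chunks)
        return sum(partial_counts)


if __name__ == "__main__":
    chunks = [
        ["google search", "web page"],
        ["search index", "search engine"],
    ]
    print(parallel_count(chunks, "search"))
```

In a real cluster each chunk would live on a different machine, so the speed-up comes from many computers working at once rather than from threads on one computer.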
2.) Googlebots feed key information from a web page to Google’s central network. This includes the page’s URL, the full text of the page, and references to images and other links.
3.) At the central network, the information is indexed. Every word that could be used in a search query is listed, along with references to the web pages where that word can be found.
4.) The index is broken into “shards” and sent to data centers, which are facilities around the world made up of thousands of servers wired together. Because data centers may hold slightly different versions of the index, depending on when they last received an update, users in different places may get slightly different results for the same search.
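The indexing and sharding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_index` and `shard_index` are hypothetical names, words are split on whitespace, and each word is assigned to a shard by a hash of the word. How Google actually divides its index is not described here, so the hashing scheme is purely an assumption.

```python
import zlib
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of page URLs where it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def shard_index(index, num_shards):
    """Split the index into num_shards smaller indexes (assumed: by word hash)."""
    shards = [{} for _ in range(num_shards)]
    for word, urls in index.items():
        # crc32 gives the same value on every run, so a word always
        # lands in the same shard.
        shard_id = zlib.crc32(word.encode("utf-8")) % num_shards
        shards[shard_id][word] = urls
    return shards


if __name__ == "__main__":
    pages = {
        "a.example": "google search is fast",
        "b.example": "search the index",
    }
    index = build_index(pages)
    for shard_id, shard in enumerate(shard_index(index, 3)):
        print(shard_id, sorted(shard))
```

Each shard is small enough for one group of servers to hold, and a data center that has not yet received the newest shards would answer from an older copy, which is why results can differ slightly from place to place.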
When people search Google, they are asking it to find every instance of a term in its index and to rank the corresponding pages by relevance.
1.) The user types a query into the search box, typically two or three words. Such short queries can make finding the most relevant results challenging. In addition, about one in 10 queries may be misspelled.
2.) Before processing the query, Google determines the user’s approximate location from the IP address. This helps Google route the query to the nearest server or data center, and to display geographically relevant AdWords advertisements.
3.) The query is sent to the central network and redirected to the nearest server or data center.
4.) At the data center, the search term is run through the index. Matching results are sent back to the central network and then to the user, along with short summaries of the web pages called “snippets”.
Google determines which web pages are most relevant to a search term by using its ranking algorithm, a formula that weighs more than 200 measurements, such as the number of times the search term appears on the page. It also considers page impressions and the popularity of the site.
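A formula that weighs many measurements can be pictured as a weighted sum. The sketch below is only a toy: the real formula and its weights are not public, and the signal names `keyword_count` and `popularity`, the weights, and the functions `score` and `rank` are all made up for illustration.

```python
def score(signals, weights):
    """Weighted sum of a page's signals; a missing signal counts as zero."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


def rank(candidates, weights):
    """Order page URLs from highest score to lowest."""
    return sorted(
        candidates,
        key=lambda url: score(candidates[url], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical weights: popularity counts twice as much as keyword count.
    weights = {"keyword_count": 1.0, "popularity": 2.0}
    candidates = {
        "a.example": {"keyword_count": 3, "popularity": 0.5},
        "b.example": {"keyword_count": 1, "popularity": 2.0},
    }
    print(rank(candidates, weights))
```

With these invented numbers, the page that mentions the keyword less often still ranks first because it is more popular, which shows why repeating a keyword many times is not enough on its own.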