B Cognizance- IIITA e-magazine

SEARCH ENGINES

by Sumit Miglani, M.TECH(WCC) Class of 2007, IIIT Allahabad

Searching is one of the first things people try when they start using the Internet. But because the Web is not indexed in any standard manner, finding information among millions and millions of web pages can seem difficult. Search engines are the key to finding specific information on the vast expanse of the World Wide Web.

A search engine can be defined as “a program that searches documents for specified keywords and returns a list of the documents where the keywords were found.”

Without sophisticated search engines, it would be virtually impossible to locate anything on the Web unless you already knew its specific URL (Uniform Resource Locator), especially as the Internet continues to grow rapidly. But do you know how search engines work? And do you know what makes some search engines more effective than others?

There are basically three types of search engines, those that are:

• powered by crawlers, or spiders;

• powered by human submissions; and

• a combination of the two.

Crawler-based engines send crawlers, or spiders, out into cyberspace. These crawlers visit a Web site, read the information on the actual site, read the site's meta tags and also follow the links that the site connects to. The crawler returns all that information to a central depository where the data is indexed. The crawler will periodically return to the sites to check for any information that has changed, and the frequency with which this happens is determined by the administrators of the search engine. Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the Web, and people then search through what they have found.
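The crawl loop described above can be sketched roughly as follows. This is only a minimal illustration, not how any real search engine is implemented: the FAKE_WEB dictionary, the example URLs, and the names PageParser and crawl are all hypothetical, and the "Web" here is an in-memory dictionary so that the sketch runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "Web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "http://a.example/": (
        '<html><head><meta name="keywords" content="search, engines"></head>'
        '<body>Welcome to A. <a href="http://b.example/">B</a></body></html>'
    ),
    "http://b.example/": (
        '<html><body>Page B links back. <a href="http://a.example/">A</a> '
        '<a href="http://c.example/">C</a></body></html>'
    ),
    "http://c.example/": "<html><body>Page C has no links.</body></html>",
}


class PageParser(HTMLParser):
    """Collects the visible text, named meta tags and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch=FAKE_WEB.get):
    """Visit pages breadth-first, following links, and return the central depository."""
    depository = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in depository:
            continue  # already visited on this pass
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        parser = PageParser()
        parser.feed(html)
        depository[url] = {
            "text": " ".join(parser.text_parts),
            "meta": parser.meta,
            "links": parser.links,
        }
        queue.extend(parser.links)  # follow the links the page connects to
    return depository


pages = crawl("http://a.example/")
```

Starting from a single page, the crawler reaches every page that is linked, directly or indirectly, from it. Re-running crawl at intervals chosen by the administrators would correspond to the periodic revisits mentioned above.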

Human-powered search engines rely on humans to submit information that is subsequently indexed and catalogued. Only information that is submitted is put into the index.

In both cases, when you query a search engine to locate information, you are actually searching through the index that the search engine has created; you are not searching the Web itself. These indices are giant databases of information that is collected, stored and subsequently searched. This explains why a search on a commercial search engine, such as Yahoo! or Google, will sometimes return results that are in fact dead links. Since the search results are based on the index, if the index hasn't been updated since a Web page became invalid, the search engine treats the page as still active even though it no longer is. It will remain that way until the index is updated.
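To see why a stale index produces dead links, consider the small sketch below. It assumes a very simple word-to-URL index (often called an inverted index); the function names build_index and search and the example pages are hypothetical, and real engines store far more than this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, using only the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical snapshot of the Web at indexing time.
web = {
    "http://a.example/": "search engines index the web",
    "http://b.example/": "spiders crawl the web",
}
index = build_index(web)

del web["http://b.example/"]          # the page disappears from the Web
stale = search(index, "crawl web")    # the old index still lists the dead page

index = build_index(web)              # the index is finally updated
fresh = search(index, "crawl web")    # the dead link is gone
```

The search function never looks at web at all, only at index, which is exactly why the removed page keeps appearing until the index is rebuilt.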

In the Web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results, especially for more obscure queries.

So why will the same search on different search engines produce different results? Part of the answer is that no two indices are exactly the same; each depends on what the spiders found or what humans submitted. More importantly, not every search engine uses the same algorithm to search through its index. The algorithm is what a search engine uses to determine how relevant the information in the index is to what the user is searching for.

One of the elements that a search engine algorithm scans for is the frequency and location of keywords on a Web page. Pages on which the keywords appear more frequently are typically considered more relevant. This, in broad terms, is how search engines work.
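A toy version of ranking by keyword frequency and location might look like the sketch below. The weights are pure assumptions made for illustration (a keyword in the title counts three times as much as one in the body), as are the names tokenize, score and rank; actual search engines do not publish their algorithms and use many more signals.

```python
import re

# Assumed weights: a keyword in the title counts more than one in the body.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query):
    """Sum weighted keyword counts over the title and body of one page."""
    title_words = tokenize(page["title"])
    body_words = tokenize(page["body"])
    return sum(
        TITLE_WEIGHT * title_words.count(term) + BODY_WEIGHT * body_words.count(term)
        for term in tokenize(query)
    )


def rank(pages, query):
    """Return URLs with a non-zero score, most relevant first (ties broken by URL)."""
    scored = [(score(page, query), url) for url, page in pages.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for page_score, url in scored if page_score > 0]


# Hypothetical indexed pages.
pages = {
    "u1": {"title": "Search engines", "body": "How search engines work"},
    "u2": {"title": "Cooking", "body": "search for recipes, search again, search more"},
    "u3": {"title": "Gardening", "body": "plants"},
}
ranking = rank(pages, "search")
```

Here the first page outranks the second even though the second repeats the keyword more often, because under the assumed weights a match in the title is worth more than a match in the body. Changing the weights changes the order, which is one reason two engines can rank the same pages differently.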

In conclusion, search engines make a vast and impressive body of information available to users with speed and convenience.