HITS stands for Hyperlink-Induced Topic Search. It is a link-analysis algorithm that search engines can use to rank pages, and it was developed by Jon Kleinberg. HITS uses hubs and authorities to describe the relationships between pages. Hubs are highly valued lists of links for a given query. An authoritative page is one that many hubs link to, and a hub is a page that links to many authorities.
How does Hyperlink-Induced Topic Search work?
The first step the algorithm takes is to retrieve the pages that match the search query, that is, the words people type into a search engine to obtain a specific result. Then, it performs its computation only on these results, without taking other websites into consideration.
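The idea of computing only on the query results can be sketched as follows. This is a minimal illustration, not how any particular search engine does it: the toy graph `web_links`, the helper `restrict_to_results`, and the list `query_results` are all hypothetical names made up for the example. Kleinberg's original formulation also expands the result set with pages that link to or from it; the sketch leaves that step out.

```python
# Hypothetical toy web graph: each page maps to the pages it links to.
web_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c", "e"],
    "e": ["d"],
}


def restrict_to_results(links, result_pages):
    """Keep only the result pages and the links that run between them."""
    keep = set(result_pages)
    return {
        page: [target for target in links.get(page, []) if target in keep]
        for page in keep
    }


# Assume the search engine returned these pages for the query.
query_results = ["a", "b", "c"]
subgraph = restrict_to_results(web_links, query_results)
```

Pages "d" and "e" drop out entirely, so any scores computed on `subgraph` depend only on the pages retrieved for this query.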
After that, authority and hub values are assigned to each page, and an iterative process begins. Each iteration performs two updates: the authority update, which sets a page's authority score to the sum of the hub scores of the pages that link to it, and the hub update, which sets a page's hub score to the sum of the authority scores of the pages it links to. For HITS, an authoritative website is a site that has valuable content. These types of websites tend to rank higher on the search engine results page because they are considered ‘expert’ pages.
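The two updates can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `hits`, the fixed iteration count, the starting score of 1.0, and the example graph are choices made for this sketch. Scores are rescaled after every pass so they stay bounded, which is the usual way to make the iteration settle.

```python
import math


def hits(links, iterations=50):
    """Return (authority, hub) score dicts for a graph given as page -> linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    authority = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking to each page.
        new_authority = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                new_authority[target] += hub[source]

        # Hub update: sum the authority scores of the pages each page links to.
        new_hub = {
            p: sum(new_authority[t] for t in links.get(p, [])) for p in pages
        }

        # Rescale both score vectors to unit length (fall back to 1.0 if all zero).
        a_norm = math.sqrt(sum(v * v for v in new_authority.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        authority = {p: v / a_norm for p, v in new_authority.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return authority, hub


# Hypothetical query subgraph: three list-like pages pointing at two content pages.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
    "auth1": [],
    "auth2": [],
}
authority, hub = hits(links)
```

In this example "auth1" ends up with the highest authority score because all three hubs link to it, while "hub1" and "hub2" outscore "hub3" as hubs because they link to both authorities.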
Much like PageRank (another algorithm that identifies and ranks sites), HITS takes the linkage of documents on the web into account.
However, HITS differs from PageRank in a few respects:
• It is executed at query time, not at indexing time. The hub and authority scores assigned to a page are query-specific. This means that the ranking will always consider the keyword or content that people are searching for.
• Whereas algorithms like PageRank compute one score per document, HITS computes two.
• It is run on a small subset of documents rather than on all documents, as PageRank is.
• Search engines do not commonly use it.