Research

Web Information Retrieval

We are interested in all forms of retrieving information from the Web. In particular, we have developed the Absorbing Model link analysis method, as well as selective methods for effectively choosing appropriate features on a per-query basis. Query performance predictors are also a key aspect of our selective retrieval approaches. Moreover, we have developed the xQuAD probabilistic framework to diversify the documents retrieved for a query.
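As an illustration of diversification in the spirit of xQuAD, the following is a minimal sketch of a greedy re-ranker. It assumes the probability estimates (relevance of a document to the query, importance of each query aspect, and coverage of each aspect by each document) are already available as dictionaries; the function name, parameter names and the default value of lam are hypothetical choices for this example, not taken from any released implementation.

```python
def xquad_rerank(rel, aspect_weight, coverage, lam=0.5, k=10):
    """Greedy diversification sketch (illustrative, not a reference implementation).

    rel:           dict doc -> estimated relevance of doc to the query
    aspect_weight: dict aspect -> estimated importance of the aspect for the query
    coverage:      dict (doc, aspect) -> estimated relevance of doc to the aspect
    lam:           trade-off between relevance (0.0) and diversity (1.0)
    """
    selected = []
    remaining = set(rel)
    # For each aspect, the probability that no selected document satisfies it yet.
    not_covered = {a: 1.0 for a in aspect_weight}

    while remaining and len(selected) < k:
        def score(doc):
            diversity = sum(
                w * coverage.get((doc, a), 0.0) * not_covered[a]
                for a, w in aspect_weight.items()
            )
            return (1.0 - lam) * rel[doc] + lam * diversity

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        # Aspects covered by the chosen document now contribute less novelty.
        for a in aspect_weight:
            not_covered[a] *= 1.0 - coverage.get((best, a), 0.0)
    return selected


rel = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
aspect_weight = {"a": 0.5, "b": 0.5}
coverage = {("d1", "a"): 0.9, ("d2", "a"): 0.9, ("d3", "b"): 0.9}
print(xquad_rerank(rel, aspect_weight, coverage, lam=0.5))
```

In this toy input, d2 is more relevant than d3 but covers the same aspect as d1, so with lam=0.5 the sketch promotes d3 above d2; with lam=0.0 it falls back to the plain relevance order.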

Expert/Entity Search

Often, the best answer to a query is a person rather than a document. We have developed the novel Voting Model for ranking experts in response to a query. Through experiments, we have shown the benefit of expert-query term proximity and query expansion for this task. Additionally, we have applied the Voting Model to ranking blogs in the blogosphere, and entities on the Web.
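The idea of treating retrieved documents as votes for the people associated with them can be sketched as follows. This is a minimal example that assumes a document ranking with scores has already been produced for the query, and that each document is linked to zero or more candidates; it aggregates votes by summing document scores (a CombSUM-style aggregation), which is only one of several possible ways to combine votes. The function and variable names are hypothetical.

```python
from collections import defaultdict


def rank_candidates(doc_scores, doc_candidates):
    """Rank candidates by summing the scores of the retrieved documents
    associated with them (CombSUM-style voting; illustrative sketch).

    doc_scores:     dict doc -> retrieval score for the query
    doc_candidates: dict doc -> iterable of candidates associated with doc
    """
    votes = defaultdict(float)
    for doc, score in doc_scores.items():
        # Each retrieved document votes for every candidate in its profile.
        for candidate in doc_candidates.get(doc, ()):
            votes[candidate] += score
    # Highest total first; names break ties deterministically.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


doc_scores = {"d1": 2.0, "d2": 1.5, "d3": 1.0}
doc_candidates = {"d1": ["alice"], "d2": ["bob"], "d3": ["bob", "carol"]}
print(rank_candidates(doc_scores, doc_candidates))
```

Here the candidate associated with two moderately scored documents outranks the candidate associated with the single top-scored document, which is the kind of evidence accumulation that a voting approach is meant to capture.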

Blog & News Search

We are joint organisers of the TREC Blog track, and we created the Blogs06 and Blogs08 test collections. Moreover, we have developed models for identifying opinionated blog posts. We have also shown how to identify the important news stories of a given day using the blogosphere.

Divergence From Randomness Weighting Models

We were involved in the development of the Divergence From Randomness (DFR) framework of weighting models. In particular, we developed new methods and training strategies for document length normalisation, as part of the Smooth project. More recently, these have spawned field-based models, such as PL2F and ML2, and proximity term dependence models, such as pBiL2. All of these weighting models are available in the Terrier IR platform.

Efficient Indexing and Retrieval

Information retrieval concerns not just identifying the right documents, but doing so quickly. We have developed a MapReduce indexing strategy. We have also compared different partitioning schemes for distributed retrieval.
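To give a flavour of how indexing fits the MapReduce paradigm, the following minimal single-process sketch simulates the generic pattern: a map step emits one posting per term and document, the postings are grouped by term, and a reduce step produces each term's sorted posting list. It illustrates the general idea only and is not the specific indexing strategy referred to above; whitespace tokenisation and all function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def map_document(doc_id, text):
    """Map step: emit (term, (doc_id, term_frequency)) for one document."""
    counts = Counter(text.lower().split())
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def reduce_postings(term, postings):
    """Reduce step: build the posting list for one term, ordered by doc_id."""
    return term, sorted(postings)


def build_index(docs):
    """Simulate map, shuffle (group by term) and reduce in one process."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for term, posting in map_document(doc_id, text):
            grouped[term].append(posting)
    return dict(reduce_postings(term, plist) for term, plist in grouped.items())


docs = {1: "web search", 2: "web web retrieval"}
index = build_index(docs)
print(index["web"])
```

In a real cluster the grouping step is performed by the framework's shuffle phase, and each reducer writes its share of the inverted index; the sketch keeps everything in memory purely for readability.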