Home arrow Sindicacion
Sindicacion
Information Retrieval (Online First?)
Articles recently accepted for publication in this journal

  • Lumping algorithms for computing Google?s PageRank and its derivative, with attention to unreferenced nodes

    Abstract  
    In this paper, we introduce five type nodes for lumping the Web matrix, and give a unified presentation of some popular lumping methods for PageRank. We show that the PageRank problem can be reduced to solving the PageRank corresponding to the strongly non-dangling and referenced nodes, and the full PageRank vector can be easily derived by some recursion formulations. Our new lumping strategy can reduce the original PageRank problem to a much smaller one, and it is much cheaper than the recursively reordering scheme. Furthermore, we discuss sensitivity of the PageRank vector, and present a lumping algorithm for computing its first order derivative. Numerical experiments show that the new algorithms are favorable when the matrix is large and the damping factor is high.

    • Content Type Journal Article
    • Pages 1-24
    • DOI 10.1007/s10791-012-9183-2
    • Authors
      • Qing Yu, Department of Arts and Sciences, Xuzhou Higher Normal School, Xuzhou, 221116 Jiangsu, People?s Republic of China
      • Zhengke Miao, School of Mathematical Sciences, Xuzhou Normal University, Xuzhou, 221116 Jiangsu, People?s Republic of China
      • Gang Wu, School of Mathematical Sciences, Xuzhou Normal University, Xuzhou, 221116 Jiangsu, People?s Republic of China
      • Yimin Wei, School of Mathematical Sciences and Shanghai Key Laboratory of Contemporary Applied Mathematics, Fudan University, Shanghai, 200433 People?s Republic of China


  • On the role of novelty for search result diversification

    Abstract  
    Re-ranking the search results in order to promote novel ones has traditionally been regarded as an intuitive diversification strategy. In this paper, we challenge this common intuition and thoroughly investigate the actual role of novelty for search result diversification, based upon the framework provided by the diversity task of the TREC 2009 and 2010 Web tracks. Our results show that existing diversification approaches based solely on novelty cannot consistently improve over a standard, non-diversified baseline ranking. Moreover, when deployed as an additional component by the current state-of-the-art diversification approaches, our results show that novelty does not bring significant improvements, while adding considerable efficiency overheads. Finally, through a comprehensive analysis with simulated rankings of various quality, we demonstrate that, although inherently limited by the performance of the initial ranking, novelty plays a role at breaking the tie between similarly diverse results.

    • Content Type Journal Article
    • Pages 1-25
    • DOI 10.1007/s10791-011-9180-x
    • Authors
      • Rodrygo L. T. Santos, School of Computing Science, University of Glasgow, Glasgow, G12 8QQ UK
      • Craig Macdonald, School of Computing Science, University of Glasgow, Glasgow, G12 8QQ UK
      • Iadh Ounis, School of Computing Science, University of Glasgow, Glasgow, G12 8QQ UK


  • Reranking web search results for diversity

    Abstract  
    Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking. Smoothed language models and topic models derived by Latent Dirichlet allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with ?-nDCG scores used in manual evaluation efforts.

    • Content Type Journal Article
    • Pages 1-20
    • DOI 10.1007/s10791-011-9179-3
    • Authors
      • Ralf Krestel, L3S Research Center - Leibniz Universität Hannover, Appelstrasse 9a, 30167 Hannover, Germany
      • Peter Fankhauser, DFKI - German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, 66123 Saarbrcken, Germany


  • Coverage-based search result diversification

    Abstract  
    Traditional retrieval models may provide users with less satisfactory search experience because documents are scored independently and the top ranked documents often contain excessively redundant information. Intuitively, it is more desirable to diversify search results so that the top-ranked documents can cover different query subtopics, i.e., different pieces of relevant information. In this paper, we study the problem of search result diversification in an optimization framework whose objective is to maximize a coverage-based diversity function. We first define the diversity score of a set of search results through measuring the coverage of query subtopics in the result set, and then discuss how to use them to derive diversification methods. The key challenge here is how to define an appropriate coverage function given a query and a set of search results. To address this challenge, we propose and systematically study three different strategies to define coverage functions. They are based on summations, loss functions and evaluation measures respectively. Each of these coverage functions leads to a result diversification method. We show that the proposed coverage based diversification methods not only cover several state-of-the-art methods but also allows us to derive new ones. We compare these methods both analytically and empirically. Experiment results on two standard TREC collections show that all the methods are effective for diversification and the new methods can outperform existing ones.

    • Content Type Journal Article
    • Pages 1-25
    • DOI 10.1007/s10791-011-9178-4
    • Authors
      • Wei Zheng, University of Delaware, 209 Evans Hall, Newark, DE 19716, USA
      • Xuanhui Wang, Yahoo! Labs, 4401 Great America Parkway, Santa Clara, CA 95054, USA
      • Hui Fang, University of Delaware, 209 Evans Hall, Newark, DE 19716, USA
      • Hong Cheng, The Chinese University of Hong Kong, William M. W. Mong Engineering Building, Shatin, NT, Hong Kong


  • Architecture and evaluation of BRUJA, a multilingual question answering system

    Abstract  
    Given a user question, the goal of a Question Answering (QA) system is to retrieve answers rather than full documents or even best-matching passages, as most Information Retrieval systems currently do. In this paper, we present BRUJA, a QA system for the management of multilingual collections. BRUJ rkstions (English, Spanish and French). The BRUJA architecture is not formed with three monolingual QA systems but instead uses English as Interlingua to make usual QA tasks such as question classifications and answer extractions. In addition, BRUJA uses Cross Language Information Retrieval (CLIR) techniques to retrieve relevant documents from a multilingual collection. On the one hand, we have more documents to find answers from but on the other hand, we are introducing noise into the system because of translations to the Interlingua (English) and the CLIR module. The question is whether the difficulty of managing three languages is worth it or whether a monolingual QA system delivers better results. We report on in-depth experimentation and demonstrate that our multilingual QA system gets better results than its monolingual counterpart whenever it uses good translation resources and, especially, CLIR techniques that are state-of-the-art.

    • Content Type Journal Article
    • Pages 1-20
    • DOI 10.1007/s10791-011-9177-5
    • Authors
      • M. Á. García-Cumbreras, SINAI Research Group, Computer Science Department, University of Jaén, Jaén, Spain
      • F. Martínez-Santiago, SINAI Research Group, Computer Science Department, University of Jaén, Jaén, Spain
      • L. A. Ureña-López, SINAI Research Group, Computer Science Department, University of Jaén, Jaén, Spain