-
Lumping algorithms for computing Google?s PageRank and its derivative, with attention to unreferenced nodes
Abstract In this paper, we introduce five type nodes for lumping the Web matrix, and give a unified presentation of some popular lumping
methods for PageRank. We show that the PageRank problem can be reduced to solving the PageRank corresponding to the strongly
non-dangling and referenced nodes, and the full PageRank vector can be easily derived by some recursion formulations. Our
new lumping strategy can reduce the original PageRank problem to a much smaller one, and it is much cheaper than the recursively
reordering scheme. Furthermore, we discuss sensitivity of the PageRank vector, and present a lumping algorithm for computing
its first order derivative. Numerical experiments show that the new algorithms are favorable when the matrix is large and
the damping factor is high.
- Content Type Journal Article
- Pages 1-24
- DOI 10.1007/s10791-012-9183-2
- Authors
- Qing Yu, Department of Arts and Sciences, Xuzhou Higher Normal School, Xuzhou, 221116 Jiangsu, People?s Republic of China
- Zhengke Miao, School of Mathematical Sciences, Xuzhou Normal University, Xuzhou, 221116 Jiangsu, People?s Republic of China
- Gang Wu, School of Mathematical Sciences, Xuzhou Normal University, Xuzhou, 221116 Jiangsu, People?s Republic of China
- Yimin Wei, School of Mathematical Sciences and Shanghai Key Laboratory of Contemporary Applied Mathematics, Fudan University, Shanghai, 200433 People?s Republic of China
-
On the role of novelty for search result diversification
Abstract Re-ranking the search results in order to promote novel ones has traditionally been regarded as an intuitive diversification
strategy. In this paper, we challenge this common intuition and thoroughly investigate the actual role of novelty for search
result diversification, based upon the framework provided by the diversity task of the TREC 2009 and 2010 Web tracks. Our
results show that existing diversification approaches based solely on novelty cannot consistently improve over a standard,
non-diversified baseline ranking. Moreover, when deployed as an additional component by the current state-of-the-art diversification
approaches, our results show that novelty does not bring significant improvements, while adding considerable efficiency overheads.
Finally, through a comprehensive analysis with simulated rankings of various quality, we demonstrate that, although inherently
limited by the performance of the initial ranking, novelty plays a role at breaking the tie between similarly diverse results.
- Content Type Journal Article
- Pages 1-25
- DOI 10.1007/s10791-011-9180-x
- Authors
- Rodrygo L. T. Santos, School of Computing Science, University of Glasgow, Glasgow, G12 8QQ UK
- Craig Macdonald, School of Computing Science, University of Glasgow, Glasgow, G12 8QQ UK
- Iadh Ounis, School of Computing Science, University of Glasgow, Glasgow, G12 8QQ UK
-
Reranking web search results for diversity
Abstract Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query
terms. Diversification of search results offers a way to supply the user with a better balanced result set increasing the
probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach
based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different
document representations as the basis for reranking. Smoothed language models and topic models derived by Latent Dirichlet allocation.
To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries
together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial
search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate
this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our
results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage
in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models
achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic
evaluation strategy using Kullback-Leibler divergence correlates well with ?-nDCG scores used in manual evaluation efforts.
- Content Type Journal Article
- Pages 1-20
- DOI 10.1007/s10791-011-9179-3
- Authors
- Ralf Krestel, L3S Research Center - Leibniz Universität Hannover, Appelstrasse 9a, 30167 Hannover, Germany
- Peter Fankhauser, DFKI - German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, 66123 Saarbrcken, Germany
-
Coverage-based search result diversification
Abstract Traditional retrieval models may provide users with less satisfactory search experience because documents are scored independently
and the top ranked documents often contain excessively redundant information. Intuitively, it is more desirable to diversify
search results so that the top-ranked documents can cover different query subtopics, i.e., different pieces of relevant information.
In this paper, we study the problem of search result diversification in an optimization framework whose objective is to maximize
a coverage-based diversity function. We first define the diversity score of a set of search results through measuring the
coverage of query subtopics in the result set, and then discuss how to use them to derive diversification methods. The key
challenge here is how to define an appropriate coverage function given a query and a set of search results. To address this
challenge, we propose and systematically study three different strategies to define coverage functions. They are based on
summations, loss functions and evaluation measures respectively. Each of these coverage functions leads to a result diversification
method. We show that the proposed coverage based diversification methods not only cover several state-of-the-art methods but
also allows us to derive new ones. We compare these methods both analytically and empirically. Experiment results on two standard
TREC collections show that all the methods are effective for diversification and the new methods can outperform existing ones.
- Content Type Journal Article
- Pages 1-25
- DOI 10.1007/s10791-011-9178-4
- Authors
- Wei Zheng, University of Delaware, 209 Evans Hall, Newark, DE 19716, USA
- Xuanhui Wang, Yahoo! Labs, 4401 Great America Parkway, Santa Clara, CA 95054, USA
- Hui Fang, University of Delaware, 209 Evans Hall, Newark, DE 19716, USA
- Hong Cheng, The Chinese University of Hong Kong, William M. W. Mong Engineering Building, Shatin, NT, Hong Kong
-
Architecture and evaluation of BRUJA, a multilingual question answering system
Abstract Given a user question, the goal of a Question Answering (QA) system is to retrieve answers rather than full documents or even
best-matching passages, as most Information Retrieval systems currently do. In this paper, we present BRUJA, a QA system for
the management of multilingual collections. BRUJ rkstions (English, Spanish and French). The BRUJA architecture is not formed
with three monolingual QA systems but instead uses English as Interlingua to make usual QA tasks such as question classifications
and answer extractions. In addition, BRUJA uses Cross Language Information Retrieval (CLIR) techniques to retrieve relevant
documents from a multilingual collection. On the one hand, we have more documents to find answers from but on the other hand,
we are introducing noise into the system because of translations to the Interlingua (English) and the CLIR module. The question
is whether the difficulty of managing three languages is worth it or whether a monolingual QA system delivers better results.
We report on in-depth experimentation and demonstrate that our multilingual QA system gets better results than its monolingual
counterpart whenever it uses good translation resources and, especially, CLIR techniques that are state-of-the-art.
- Content Type Journal Article
- Pages 1-20
- DOI 10.1007/s10791-011-9177-5
- Authors
- M. Á. García-Cumbreras, SINAI Research Group, Computer Science Department, University of Jaén, Jaén, Spain
- F. Martínez-Santiago, SINAI Research Group, Computer Science Department, University of Jaén, Jaén, Spain
- L. A. Ureña-López, SINAI Research Group, Computer Science Department, University of Jaén, Jaén, Spain
|