Syndication
ScienceDirect Publication: Information Processing & Management

  • The Editor in Chief would like to extend his thanks to the below reviewers
    Publication year: 2012
    Source: Information Processing & Management, Available online 2 February 2012

    [No author name available]


  • Document replication strategies for geographically distributed web search engines
    Publication year: 2012
    Source: Information Processing & Management, Available online 1 February 2012

    Enver Kayaaslan, B. Barla Cambazoglu, Cevdet Aykanat

    Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.

    Highlights

      • Document replication strategies for geographically distributed search engines.
      • Search quality, average query response time, and query workload criteria.
      • Selective, partial document replication is superior to full or no replication.
      • Experiments with a real-life setting and a large query log.
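
    As a rough illustration of selective replication under a capacity constraint, the sketch below keeps in each data center only the documents that its own region requests most often. It is not one of the three strategies from the paper: the function names, the toy query log, and the fixed per-center capacity are all assumptions made for this example.

```python
from collections import Counter


def select_replicas(regional_hits, capacity):
    """Choose which documents each data center stores.

    regional_hits: hypothetical log, data center -> list of document ids
                   that queries from that region needed.
    capacity:      assumed maximum number of documents per data center.
    """
    placement = {}
    for center, hits in regional_hits.items():
        counts = Counter(hits)
        # Most requested first; ties broken by document id for determinism.
        ranked = sorted(counts, key=lambda doc: (-counts[doc], doc))
        placement[center] = set(ranked[:capacity])
    return placement


def local_hit_rate(regional_hits, placement):
    """Fraction of requests answered by the regional replica alone."""
    total = sum(len(hits) for hits in regional_hits.values())
    local = sum(
        1
        for center, hits in regional_hits.items()
        for doc in hits
        if doc in placement[center]
    )
    return local / total if total else 0.0


# Toy data, invented for illustration only.
log = {"eu": ["a", "a", "b", "c"], "us": ["c", "c", "d"]}
placement = select_replicas(log, capacity=1)
rate = local_hit_rate(log, placement)
```

    With a capacity of one document per center, the sketch stores "a" in "eu" and "c" in "us", and four of the seven logged requests are served locally. The remaining requests are the ones a real system would have to forward to another data center or answer with reduced quality.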




  • Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling
    Publication year: 2012
    Source: Information Processing & Management, Available online 30 January 2012

    Zhengchen Zhang, Shuzhi Sam Ge, Hongsheng He

    In this paper, a document summarization framework for storytelling is proposed to extract essential sentences from a document by exploiting the mutual effects between terms, sentences and clusters. There are three phases in the framework: document modeling, sentence clustering and sentence ranking. The story document is modeled by a weighted graph whose vertices represent the sentences of the document. The sentences are clustered into different groups to find the latent topics in the story. To alleviate the influence of unrelated sentences in clustering, an embedding process is employed to optimize the document model. The sentences are then ranked according to the mutual effect between terms, sentences and clusters, and high-ranked sentences are selected to compose the summary of the document. The experimental results on the Document Understanding Conference (DUC) data sets demonstrate the effectiveness of the proposed method in document summarization. The results also show that the embedding process for sentence clustering renders the system more robust with respect to different cluster numbers.

    Highlights

      • A document summarization method for storytelling is proposed which extracts important sentences from a given story to compose the summary.
      • The sentences in a document are ranked following a mutual-reinforcement rule.
      • The sentence rank is determined by term ranks and cluster ranks.
      • An embedding method is employed to improve the performance of sentence clustering.
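
    The mutual-reinforcement idea can be sketched in a simplified form that uses only terms and sentences: a sentence scores highly if it contains high-scoring terms, and a term scores highly if it occurs in high-scoring sentences. The code below is an assumption-laden toy, not the paper's method; it leaves out the clustering and embedding steps, and the function name, whitespace tokenization, and iteration count are hypothetical.

```python
def mutual_reinforcement(sentences, iterations=50):
    """Iteratively score sentences and terms against each other."""
    # Naive whitespace tokenization, assumed for brevity.
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    term_score = {w: 1.0 for w in vocab}
    sent_score = [1.0] * len(sentences)

    for _ in range(iterations):
        # A sentence is important if it contains important terms.
        sent_score = [sum(term_score[w] for w in toks) for toks in tokenized]
        norm = sum(sent_score) or 1.0
        sent_score = [x / norm for x in sent_score]

        # A term is important if it occurs in important sentences.
        term_score = {w: 0.0 for w in vocab}
        for toks, score in zip(tokenized, sent_score):
            for w in toks:
                term_score[w] += score
        norm = sum(term_score.values()) or 1.0
        term_score = {w: v / norm for w, v in term_score.items()}

    return sent_score, term_score


# Toy story, invented for illustration only.
story = ["the fox ran", "the fox jumped", "rain"]
sent_score, term_score = mutual_reinforcement(story)
# Index of the highest-scoring sentence (first one wins ties).
top = max(range(len(story)), key=lambda i: sent_score[i])
```

    In this toy input the two sentences that share vocabulary reinforce each other and end up with equal scores, while the isolated sentence "rain" falls far behind. A summarizer built on this would then keep the top-ranked sentences.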




  • Managing the investment in information security technology by use of a quantitative modeling
    Publication year: 2012
    Source: Information Processing & Management, Available online 28 January 2012

    Rok Bojanc, Borka Jerman-Blažič, Metka Tekavčič

    This paper presents a mathematical model for optimal security-technology investment evaluation and decision-making, based on a quantitative analysis of the security risks and a digital-assets assessment in an organization. The model makes use of a quantitative analysis of different security measures that counteract individual risks by identifying the information-system processes in an enterprise and the potential threats. The model comprises the target security levels for all the identified core business processes and the probability of a security incident together with the possible loss the organization may suffer. The model allows in-depth analyses and computations providing quantitative assessments of different options for investments, which translate into recommendations that facilitate the selection of the best solution and the associated decision-making. The model was tested using empirical examples and mathematical simulations with data from a real business environment.

    Highlights

      • Innovative quantitative model for evaluating investments in information security technology.
      • Simulation of random events and probability elements in provision of risk management.
      • Examples based on empirical research.
      • Standard procedure for selecting optimal security solutions and associated investment.
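
    The abstract does not give the model's equations. A common starting point for this kind of analysis, assumed here purely for illustration, is to take the expected loss as the probability of an incident multiplied by the loss it would cause, and to judge each security measure by the risk reduction it buys net of its cost. The function names and all figures below are hypothetical.

```python
def expected_loss(probability, loss):
    """Expected loss from one risk: incident probability times loss."""
    return probability * loss


def net_benefit(p_before, p_after, loss, cost):
    """Risk reduction achieved by a measure, minus what the measure costs."""
    reduction = expected_loss(p_before, loss) - expected_loss(p_after, loss)
    return reduction - cost


def rosi(p_before, p_after, loss, cost):
    """Return on security investment: net benefit per unit of cost."""
    return net_benefit(p_before, p_after, loss, cost) / cost


# Invented figures: a 50% yearly incident probability and a loss of 100000.
P_BEFORE, LOSS = 0.5, 100_000
# Hypothetical options: name -> (incident probability afterwards, cost).
options = {"firewall": (0.25, 10_000), "training": (0.4, 2_000)}

by_net = max(options, key=lambda n: net_benefit(P_BEFORE, options[n][0], LOSS, options[n][1]))
by_rosi = max(options, key=lambda n: rosi(P_BEFORE, options[n][0], LOSS, options[n][1]))
```

    With these made-up numbers the two criteria disagree: the firewall yields the larger net benefit (15000 against 8000), while training yields the larger return per unit spent (4.0 against 1.5). Which criterion to use is a modeling decision that the abstract does not settle.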




  • A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization
    Publication year: 2012
    Source: Information Processing & Management, Available online 17 January 2012

    Jieming Yang, Yuanning Liu, Xiaodong Zhu, Zhen Liu, Xiaoxu Zhang

    Feature selection, which can reduce the dimensionality of the vector space without sacrificing the performance of the classifier, is widely used in text categorization. In this paper, we propose a new feature selection algorithm, named CMFS, which comprehensively measures the significance of a term both in inter-category and intra-category. We evaluated CMFS on three benchmark document collections, 20-Newsgroups, Reuters-21578 and WebKB, using two classification algorithms, Naïve Bayes (NB) and Support Vector Machines (SVMs). The experimental results, comparing CMFS with six well-known feature selection algorithms, show that the proposed method CMFS is significantly superior to Information Gain (IG), Chi statistic (CHI), Document Frequency (DF), Orthogonal Centroid Feature Selection (OCFS) and DIA association factor (DIA) when the Naïve Bayes classifier is used and significantly outperforms IG, DF, OCFS and DIA when Support Vector Machines are used.

    Highlights

      • The term is comprehensively measured both in inter-category and intra-category.
      • We compared the proposed method with six well-known feature selection algorithms.
      • The proposed algorithm can significantly improve the performance of classifiers.
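
    The abstract does not state how CMFS combines the two measurements. One plausible reading, assumed here only for illustration, multiplies an intra-category weight P(term | category) by an inter-category weight P(category | term) and keeps each term's best score over all categories. The function names, the unsmoothed frequency estimates, and the toy corpus are hypothetical and should not be taken as the paper's definition.

```python
from collections import Counter, defaultdict


def score_terms(docs):
    """docs: list of (category, tokens) pairs. Returns term -> score."""
    per_category = defaultdict(Counter)  # category -> term frequencies
    overall = Counter()                  # term -> frequency in all docs
    for category, tokens in docs:
        per_category[category].update(tokens)
        overall.update(tokens)

    category_size = {c: sum(tf.values()) for c, tf in per_category.items()}
    scores = {}
    for term, total in overall.items():
        best = 0.0
        for category, tf in per_category.items():
            p_term_given_cat = tf[term] / category_size[category]  # intra
            p_cat_given_term = tf[term] / total                    # inter
            best = max(best, p_term_given_cat * p_cat_given_term)
        scores[term] = best
    return scores


def select_features(docs, k):
    """Keep the k best-scoring terms; ties are broken alphabetically."""
    scores = score_terms(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration only.
corpus = [
    ("sport", ["goal", "match", "the"]),
    ("sport", ["goal", "the"]),
    ("tech", ["chip", "the", "code"]),
    ("tech", ["chip", "the"]),
]
scores = score_terms(corpus)
selected = select_features(corpus, 2)
```

    On this corpus the terms "chip" and "goal" are selected: each is frequent inside one category and absent from the other. The word "the" is just as frequent within a category but is spread evenly across both, so its inter-category weight halves its score.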