Syndication
Journal of Intelligent Information Systems (Online First™)
Articles recently accepted for publication in this journal

  • Challenges and solutions in the opinion summarization of user-generated content

    Abstract  
    The present is marked by the influence of the Social Web on societies and people worldwide. In this context, users generate large amounts of data, much of it containing opinions, which has proven useful for many real-world applications. In order to extract knowledge from user-generated content, automatic methods must be developed. In this paper, we present different approaches to multi-document summarization of opinion from blogs and reviews. We apply these approaches to: (a) identify positive and negative opinions in blog threads in order to produce a list of arguments in favor of and against a given topic and (b) summarize the opinion expressed in reviews. Subsequently, we evaluate the proposed methods on two distinct datasets, analyze the quality of the obtained results, and discuss the errors produced. Although much remains to be done, the approaches we propose obtain encouraging results and point to clear directions in which further improvements can be made.

    • Content Type Journal Article
    • Pages 1-24
    • DOI 10.1007/s10844-011-0194-z
    • Authors
      • Alexandra Balahur, European Commission Joint Research Centre, Via E. Fermi 2749, 21027 Ispra, Italy
      • Mijail Kabadjov, European Commission Joint Research Centre, Via E. Fermi 2749, 21027 Ispra, Italy
      • Josef Steinberger, European Commission Joint Research Centre, Via E. Fermi 2749, 21027 Ispra, Italy
      • Ralf Steinberger, European Commission Joint Research Centre, Via E. Fermi 2749, 21027 Ispra, Italy
      • Andrés Montoyo, DLSI, University of Alicante, Ap. de Correos 99, 03080 Alicante, Spain


  • Adaptive two-level optimization for selection predicates of multiple continuous queries

    Abstract  
    A data stream is a massive, unbounded sequence of data elements continuously generated at a rapid rate. Query processing for such a data stream should also be continuous and rapid, which imposes strict time and space constraints. To guarantee these constraints, our previous study (Lee and Lee, Information Sciences 178:2416–2432, 2008) proposed a scheme called an Attribute Selection Construct (ASC) for an attribute of a data stream. As an optimization technique for that scheme, this paper proposes a new strategy that determines the evaluation order of multiple ASCs for a given query set at two different levels: macro and micro. Based on the two levels, it also proposes two strategies, macro-sequence and hybrid-sequence, that find the optimized full evaluation sequence of all the ASCs. In addition, it provides an adaptive strategy that periodically rearranges the evaluation sequence of multiple ASCs. The performance of the proposed technique is verified by a series of experiments.
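
    As a rough illustration of why the evaluation order of selection predicates matters, the sketch below counts predicate evaluations for a conjunctive filter with short-circuiting. It is not the ASC scheme from the paper; the names `count_evaluations` and `order_by_selectivity`, the toy stream, and the two predicates are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Element = Dict[str, int]
Predicate = Callable[[Element], bool]


def count_evaluations(stream: Sequence[Element],
                      predicates: Sequence[Predicate]) -> Tuple[int, int]:
    """Return (matches, evaluations) for a short-circuiting conjunctive filter."""
    matches = 0
    evaluations = 0
    for element in stream:
        passed = True
        for pred in predicates:
            evaluations += 1
            if not pred(element):
                passed = False
                break  # remaining predicates are skipped for this element
        if passed:
            matches += 1
    return matches, evaluations


def order_by_selectivity(sample: Sequence[Element],
                         predicates: Sequence[Predicate]) -> List[Predicate]:
    """Sort predicates so the one passing the fewest sample elements runs first."""
    def pass_rate(pred: Predicate) -> float:
        return sum(1 for e in sample if pred(e)) / len(sample)
    return sorted(predicates, key=pass_rate)


# Hypothetical stream: attribute "a" cycles over 0..9, "b" over 0..99.
stream = [{"a": i % 10, "b": i % 100} for i in range(1000)]
loose = lambda e: e["a"] < 9   # passes about 90% of elements
tight = lambda e: e["b"] < 5   # passes about 5% of elements

naive = count_evaluations(stream, [loose, tight])
tuned = count_evaluations(stream, order_by_selectivity(stream[:100], [loose, tight]))
```

    Both orders return the same matches, but running the more selective predicate first needs fewer evaluations. Re-estimating the pass rates on a fresh sample from time to time would be one simple way to mimic the periodic rearrangement the abstract mentions.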

    • Content Type Journal Article
    • Pages 1-18
    • DOI 10.1007/s10844-011-0192-1
    • Authors
      • Hyun-Ho Lee, Department of Non-commissioned officers, Anyang Science University, San39-1 Anyang3-Dong Manan-Gu, Anyang-Si, Gyeonggi-Do, Korea
      • Won-Suk Lee, Department of Computer Science, Yonsei University, 134 Sedaemoongu, Shinchondong, Seoul, Korea


  • BRACID: a comprehensive approach to learning rules from imbalanced data

    Abstract  
    In this paper we consider induction of rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining majority classes. The minority class is usually of primary interest. However, most rule-based classifiers are biased towards the majority classes and have difficulty recognizing the minority class correctly. In this paper we discuss sources of these difficulties related to data characteristics or to the algorithm itself. Among the problems related to the data distribution, we focus on the role of small disjuncts, overlapping of classes and presence of noisy examples. Then, we show that standard techniques for induction of rule-based classifiers, such as sequential covering, top-down induction of rules or classification strategies, were created with the assumption of a balanced data distribution, and we explain why they are biased towards the majority classes. Some modifications of rule-based classifiers have already been introduced, but they usually concentrate on individual problems. Therefore, we propose a novel algorithm, BRACID, which more comprehensively addresses the issues associated with imbalanced data. Its main characteristics include a hybrid representation of rules and single examples, bottom-up learning of rules and a local classification strategy using nearest rules. The usefulness of BRACID has been evaluated in experiments on several imbalanced datasets. The results show that BRACID significantly outperforms the well-known rule-based classifiers C4.5rules, RIPPER, PART, CN2 and MODLEM, as well as other related classifiers such as RISE or K-NN. Moreover, it is comparable to or better than the studied approaches specialized for imbalanced data, such as generalizations of rule algorithms or combinations of SMOTE + ENN preprocessing with PART. Finally, it improves the support of minority class rules, leading to better recognition of the minority class examples.

    • Content Type Journal Article
    • Pages 1-39
    • DOI 10.1007/s10844-011-0193-0
    • Authors
      • Krystyna Napierala, Institute of Computing Science, Poznan University of Technology, Poznan, Poland
      • Jerzy Stefanowski, Institute of Computing Science, Poznan University of Technology, Poznan, Poland


  • A distributed architecture for efficient parallelization and computation of knowledge-based temporal abstractions

    Abstract  
    Today, data storage capabilities as well as computational power are rapidly increasing. On the one hand, this improvement makes it possible to generate and store a great amount of temporal (time-oriented) data for future query, analysis and discovery of new knowledge. On the other hand, systems and experts are encountering new problems in processing this increased amount of data. The rapid growth in stored time-oriented data necessitates the development of new methods for handling, processing, and interpreting large amounts of temporal data. One approach is to use an automatic summarization process based on predefined knowledge, such as the Knowledge-Based Temporal-Abstraction (KBTA) method. This method enables one to summarize and reduce the amount of raw data by creating higher level interpretations based on predefined domain knowledge. Unfortunately, the task of temporal abstraction is inherently computationally expensive, especially when an enormous volume of multivariate data has to be handled and when complex patterns need to be considered. In this research, we address the scalability problem of a temporal-abstraction task that involves processing significantly large amounts of raw data. We propose a new computational framework, the Distributed KBTA (DKBTA), which efficiently distributes the abstraction process among several parallel computational nodes, in order to achieve an acceptable computation time. The DKBTA framework distributes the temporal-abstraction process along one or more computational axes, each of which enables parallelization of one or more temporal-abstraction tasks into which the main temporal-abstraction task is decomposed, such as by different subject groups, concept types, or abstraction types. We have implemented the DKBTA framework and have evaluated it in a preliminary fashion in the medical and the information security domains, with encouraging results.
    In our small-scale evaluation, only distribution along the subjects axis and sometimes along the concept-type axis seemed to consistently enhance performance, and only for computations involving individual subjects and not functions of sets of subjects; but this observation might depend on the number of processing units. Additionally, since the communication between the processing units was based on the TCP protocol, we could not observe any speedup even when using two processing units on the same machine. In our further evaluations we plan to use a shared memory architecture in order to exchange data between processing units.
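
    To make "distribution along the subjects axis" concrete, here is a minimal sketch that abstracts each subject's raw samples independently and hands the subjects to a pool of workers. It is not the DKBTA implementation: the 38.0 threshold, the HIGH/NORMAL state names, and the functions `abstract_subject` and `abstract_all` are assumptions for illustration, and a thread pool stands in for the separate processing units.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import Dict, List, Tuple

Sample = Tuple[int, float]        # (time, value)
Interval = Tuple[str, int, int]   # (state, start time, end time)


def abstract_subject(readings: List[Sample]) -> List[Interval]:
    """Merge one subject's time-ordered samples into state intervals."""
    def state(value: float) -> str:
        # Hypothetical domain knowledge: a single fixed threshold.
        return "HIGH" if value >= 38.0 else "NORMAL"

    intervals = []
    for label, run in groupby(sorted(readings), key=lambda r: state(r[1])):
        samples = list(run)
        intervals.append((label, samples[0][0], samples[-1][0]))
    return intervals


def abstract_all(data: Dict[str, List[Sample]],
                 workers: int = 2) -> Dict[str, List[Interval]]:
    """Abstract every subject independently, one task per subject."""
    subjects = sorted(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(abstract_subject, (data[s] for s in subjects)))
    return dict(zip(subjects, results))


data = {
    "s1": [(1, 36.8), (2, 38.4), (3, 38.9), (4, 37.0)],
    "s2": [(1, 37.1), (2, 37.2)],
}
result = abstract_all(data)
```

    Per-subject tasks share no state, which is why this axis splits cleanly. A function over a set of subjects would first have to gather the per-subject results, which fits the abstract's remark that such computations did not benefit.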

    • Content Type Journal Article
    • Pages 1-38
    • DOI 10.1007/s10844-011-0190-3
    • Authors
      • Asaf Shabtai, Department of Information Systems Engineering and Deutsche Telekom Laboratories at Ben-Gurion University, Ben-Gurion University, Beer-Sheva, Israel
      • Yuval Shahar, Department of Information Systems Engineering and Deutsche Telekom Laboratories at Ben-Gurion University, Ben-Gurion University, Beer-Sheva, Israel
      • Yuval Elovici, Department of Information Systems Engineering and Deutsche Telekom Laboratories at Ben-Gurion University, Ben-Gurion University, Beer-Sheva, Israel


  • Towards an effective automatic query expansion process using an association rule mining approach

    Abstract  
    The steady growth in the size of textual document collections is a key driver of progress for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for meeting the user's needs effectively. In this situation, the query expansion technique offers an interesting solution for obtaining a complete answer while preserving the quality of retained documents. This mainly relies on an accurate choice of the terms added to an initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about index term co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising track consists in the application of data mining methods to the extraction of dependencies between terms. In this paper, we present a novel approach for mining knowledge supporting query expansion that is based on association rules. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. Thus, our association rule mining method implements results from Galois connection theory and compact representations of rule sets in order to reduce the huge number of potentially useful associations. An experimental study has examined the application of our approach to some real collections, whereby automatic query expansion has been performed. The results of the study show a significant improvement in the performance of the information retrieval system, in terms of both recall and precision, as confirmed by significance testing with the Wilcoxon test.
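
    The idea of expanding a query with terms that co-occur with it can be sketched with plain support and confidence over term pairs. This is only a generic illustration, not the Galois-connection-based compact rule representation described in the abstract; the functions `mine_rules` and `expand`, the thresholds, and the toy documents are assumptions.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Rules = Dict[str, List[Tuple[str, float]]]


def mine_rules(docs: Sequence[Sequence[str]],
               min_support: int = 2,
               min_confidence: float = 0.6) -> Rules:
    """Mine single-term rules a -> b from document-level co-occurrence."""
    term_count: Counter = Counter()
    pair_count: Counter = Counter()
    for doc in docs:
        terms = set(doc)
        term_count.update(terms)
        # Ordered pairs, so a -> b and b -> a are counted separately.
        pair_count.update(permutations(sorted(terms), 2))

    rules: Rules = {}
    for (a, b), n in pair_count.items():
        confidence = n / term_count[a]   # share of docs with a that also have b
        if n >= min_support and confidence >= min_confidence:
            rules.setdefault(a, []).append((b, confidence))
    return rules


def expand(query: Sequence[str], rules: Rules) -> List[str]:
    """Append rule consequents of each query term, highest confidence first."""
    expanded = list(query)
    for term in query:
        ranked = sorted(rules.get(term, []), key=lambda r: (-r[1], r[0]))
        for consequent, _ in ranked:
            if consequent not in expanded:
                expanded.append(consequent)
    return expanded


docs = [
    ["stream", "query", "window"],
    ["stream", "query"],
    ["stream", "sensor"],
    ["rule", "mining"],
    ["rule", "mining", "query"],
]
rules = mine_rules(docs)
expanded = expand(["mining"], rules)
```

    Raising `min_support` or `min_confidence` shrinks the rule set at the cost of fewer expansion terms, a crude version of the size-versus-knowledge trade-off the abstract refers to.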

    • Content Type Journal Article
    • Pages 1-39
    • DOI 10.1007/s10844-011-0189-9
    • Authors
      • Chiraz Latiri, URPAH Team, Computer Sciences Department, Faculty of Sciences of Tunis, El Manar University, Tunis, Tunisia
      • Hatem Haddad, URPAH Team, Computer Sciences Department, Faculty of Sciences of Tunis, El Manar University, Tunis, Tunisia
      • Tarek Hamrouni, URPAH Team, Computer Sciences Department, Faculty of Sciences of Tunis, El Manar University, Tunis, Tunisia