Real-life data is often affected by noise. To cope with this issue, classification techniques robust to noisy data are needed. Bayesian approaches are known to be fairly robust to noise. However, to compute probability estimates, state-of-the-art Bayesian approaches adopt a lazy pattern-based strategy, which shows some limitations when coping with data affected by a notable amount of noise. This paper proposes RIB (Robust Itemset-based Bayesian classifier), a novel eager, pattern-based Bayesian classifier that discovers frequent itemsets from training data and exploits them to build accurate probability estimates. Enforcing a minimum frequency of occurrence on the considered itemsets reduces the sensitivity of the probability estimates to noise. Furthermore, learning a Bayesian Network that also considers high-order dependences among data, which are usually neglected by traditional Bayesian approaches, appears to be more robust to noise and data overfitting than selecting a small subset of patterns tailored to each test instance. The experiments demonstrate that RIB is, on average, more accurate than most state-of-the-art classifiers, both Bayesian and non-Bayesian, on benchmark datasets into which different kinds and levels of noise are injected. Furthermore, its performance on the same datasets prior to noise injection is competitive with that of state-of-the-art classifiers.
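The minimum-frequency idea mentioned in the abstract can be illustrated with a small sketch. This is not the RIB algorithm itself, whose details the abstract does not give. It is a toy example, under the assumption that items are (attribute, value) pairs and that the class label is treated as one more item. The names `frequent_itemsets`, `conditional`, `min_support`, and `max_size`, as well as the data, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, min_support, max_size=2):
    """Count itemsets of up to max_size items and keep the frequent ones.

    Each row is a dict mapping attribute -> value; an item is an
    (attribute, value) pair. Returns {frozenset_of_items: count}.
    """
    counts = Counter()
    for row in rows:
        items = sorted(row.items())
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(rows)
    # Discard itemsets whose relative frequency is below the threshold.
    return {s: c for s, c in counts.items() if c / n >= min_support}


def conditional(freq, itemset, given):
    """Estimate P(itemset | given) from frequent-itemset counts.

    Returns None when the estimate would rely on an infrequent itemset,
    that is, on too few training rows to be trusted.
    """
    joint = frozenset(itemset) | frozenset(given)
    given = frozenset(given)
    if joint not in freq or given not in freq:
        return None
    return freq[joint] / freq[given]


# Hypothetical toy training set; "play" acts as the class attribute.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

freq = frequent_itemsets(rows, min_support=0.5)
p_supported = conditional(freq, [("windy", "no")], [("play", "yes")])
p_unsupported = conditional(freq, [("outlook", "rainy")], [("play", "no")])
```

Here `p_supported` is backed by two training rows, while `p_unsupported` is `None` because the pair (rainy, no) occurs only once and falls below the threshold. A single mislabeled or corrupted row therefore cannot create a probability estimate on its own, which is the intuition behind filtering by frequency.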
|Title:||RIB: A Robust Itemset-based Bayesian approach to classification|
|Publication date:||2014|
|Digital Object Identifier (DOI):||10.1016/j.knosys.2014.08.015|
|Appears in categories:||1.1 Journal article|