University of Borås

Please use this identifier to cite or link to this item: http://hdl.handle.net/2320/14712

Files in This Item:

File: CIDM_2014_OC.pdf
Size: 95.94 kB
Format: Adobe PDF
Title: Accurate and Interpretable Regression Trees using Oracle Coaching
Authors: Johansson, Ulf
Sönströd, Cecilia
König, Rikard
Department: University of Borås. School of Business and IT
Issue Date: 2014
Citation: 5th IEEE Symposium Computational Intelligence and Data Mining, 9-12 December, Orlando, FL, USA
ISBN: 978-1-4799-4518-4/14
Pages: 194-201
Publisher: IEEE
Media type: text
Publication type: conference paper, peer reviewed
Keywords: Oracle coaching
Regression trees
Predictive modeling
Interpretable models
Subject Category: Subject categories::Engineering and Technology::Computer and Information Science::Computer Science
Subject categories::Social Sciences::Computer and Information Science::Computer and Information Science::Computer Science
Research Group: CSL@BS
Area of Research: Machine learning
Data mining
Strategic Research Area: none
Abstract: In many real-world scenarios, predictive models need to be interpretable, thus ruling out many machine learning techniques known to produce very accurate models, e.g., neural networks, support vector machines and all ensemble schemes. Most often, tree models or rule sets are used instead, typically resulting in significantly lower predictive performance. The overall purpose of oracle coaching is to reduce this accuracy vs. comprehensibility trade-off by producing interpretable models optimized for the specific production set at hand. The method requires production set inputs to be present when generating the predictive model, a demand fulfilled in most, but not all, predictive modeling scenarios. In oracle coaching, a highly accurate, but opaque, model is first induced from the training data. This model (“the oracle”) is then used to label both the training instances and the production instances. Finally, interpretable models are trained using different combinations of the resulting data sets. In this paper, the oracle coaching produces regression trees, using neural networks and random forests as oracles. The experiments, using 32 publicly available data sets, show that the oracle coaching leads to significantly improved predictive performance, compared to standard induction. In addition, it is also shown that a highly accurate opaque model can be successfully used as a preprocessing step to reduce the noise typically present in data, even in situations where production inputs are not available. In fact, just augmenting or replacing training data with another copy of the training set, but with the predictions from the opaque model as targets, produced significantly more accurate and/or more compact regression trees.
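The procedure outlined in the abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbour averager stands in for the opaque oracle (the paper uses neural networks and random forests), the regression tree is a deliberately small greedy learner, and the function names and the `combination` labels are hypothetical.

```python
def knn_oracle(X, y, k=5):
    """Stand-in opaque model (assumption): mean target of the k nearest training points."""
    def predict(x):
        dist = lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x))
        nearest = sorted(range(len(X)), key=dist)[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def _sse(values):
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(X, y, max_depth=3, min_leaf=5):
    """Greedy regression tree: a leaf is a float, a split is a dict."""
    mean = sum(y) / len(y)
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return mean
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(len(X)) if X[i][j] <= t]
            right = [y[i] for i in range(len(X)) if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return mean
    _, j, t = best
    li = [i for i in range(len(X)) if X[i][j] <= t]
    ri = [i for i in range(len(X)) if X[i][j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1, min_leaf),
        "right": fit_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, min_leaf),
    }

def predict_tree(tree, x):
    """Follow splits until a leaf value is reached."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def oracle_coach(X_train, y_train, X_prod, combination="train+prod", **tree_kwargs):
    """Train an oracle, let it label the data, then fit a tree on one combination.

    The combination names are illustrative labels, not taken from the paper.
    """
    oracle = knn_oracle(X_train, y_train)
    y_train_oracle = [oracle(x) for x in X_train]
    y_prod_oracle = [oracle(x) for x in X_prod]
    if combination == "train+prod":           # original targets + oracle-labelled production
        X, y = X_train + X_prod, y_train + y_prod_oracle
    elif combination == "oracle_train+prod":  # oracle targets for both sets
        X, y = X_train + X_prod, y_train_oracle + y_prod_oracle
    elif combination == "prod":               # oracle-labelled production inputs only
        X, y = X_prod, y_prod_oracle
    elif combination == "oracle_train":       # no production inputs needed
        X, y = X_train, y_train_oracle
    else:
        raise ValueError("unknown combination: " + combination)
    return fit_tree(X, y, **tree_kwargs)
```

The `oracle_train` option corresponds to the case mentioned at the end of the abstract, where the training targets are replaced by the oracle's predictions and no production inputs are required. Only the production inputs are ever used; production targets are never seen by either model.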
Sponsorship: This work was supported by the Swedish Foundation for Strategic Research through the project High-Performance Data Mining for Drug Effect Detection (IIS11-0053), the Swedish Retail and Wholesale Development Council through the project Innovative Business Intelligence Tools (2013:5) and the Knowledge Foundation through the project Big Data Analytics by Online Ensemble Learning (20120192).
URI: http://hdl.handle.net/2320/14712
Sustainable development: -
Appears in Collections:Konferensbidrag / Conference papers (Informatics)

All items in Borås Academic Digital Archive are protected by copyright, with all rights reserved.