
    Towards a Better Integration of Data Mining and Decision Support

    via Computational Intelligence

    Sylvain Delisle

    Département de mathématiques et d'informatique

    Université du Québec à Trois-Rivières

    Québec, Canada, G9A 5H7

    [email protected] www.uqtr.ca/~delisle

    Abstract

    Despite a high level of activity and a large number of researchers involved, current research in data mining is plagued with several serious problems that should be regarded as top-priority challenges. First, large-scale, visible results and benefits seem rare. Second, the field has become utterly specialized and a lot of work remains to be done at the "big picture" level. Finally, data mining is very poorly integrated with the decision support level. This paper concentrates on this last challenge and presents a research proposal in that direction.

    1. Introduction

    With the relentless growth of information available on the Internet, combined with the storage of gigantic quantities of data on personal and corporate computers, individuals and businesses often find themselves in situations where they feel overwhelmed by the sheer amount of data they must process to make time-constrained but informed decisions.

    To add to this already challenging state of affairs, the last couple of years have brought along a new twist: valuable information could be "hidden" in your personal/corporate/Web data, and data mining (DM) may allow you to uncover it. And so DM and various related methodologies, techniques and tools are now becoming more and more widely used in order to uncover this implicit and potentially useful information or knowledge.

    Unfortunately, despite the huge interest and sometimes unrealistic hopes it has generated, DM still faces several serious challenges. One of these is the fact that large-scale, systematic, and predictable applications of DM to the benefit of numerous organizations and businesses seem rather uncommon. Another challenge is the fact that the majority of current progress in DM research seems to be based on utterly-specialized (computer science- or mathematics-related) techniques, whereas research on strategic, methodological, and even epistemological aspects of DM is rare: too few people seem to be working at the level of the forest and too many at the level of the individual tree! Yet another challenge, and one on which we will focus in this paper, is the fact that DM remains essentially very poorly integrated with the decision support level (and its corresponding computerized systems) it is supposed to serve, thus not fully realizing its very own raison d'être.

    In this paper, we present the core of our new research project currently in its initial stage: integrating DM and decision support (DS) through computational intelligence. Our main goal is to make DM more accessible and effective for decision makers to support application-oriented DS. We believe there is currently a lack of research in this area; progress in that direction could foster significant advancement in applied DM.

    2. Research Goals

    The long term goal of this research project is to discover and evaluate new solutions to the crucial and very challenging problem of poor integration between DM and DS. These solutions will be based mainly on computational intelligence concepts and tools (e.g. ontologies and reasoning systems), but also on new development methodologies that take DM aspects early into consideration in the development cycle of information and DS systems. The combination of these two domains, knowledge and methodology, constitutes a most promising avenue relative to the current state of the art in computer science. Our short term objective is to find solutions to the following specific problems which constitute serious limitations in typical situations of DM and DS integration:

    i) Current DM processes make very little use of already existing corporate knowledge, even when such knowledge has been (partially) formalized and computerized. Consequently, DM is more tedious than needed and it tends to produce already known information.

    ii) Most DM projects occur as a posteriori development for which specific databases and processing capabilities must be (re-)developed, and for which special (non-interactive and non-robust) means are used to make DM results usable by the DS level. Thus, the lack of methodological and technical integration with standard information systems causes DM projects to appear as extra work for which the return on investment can hardly be measured.

    iii) There is a lack of standardisation in DM that generates a lot of redundant work and complicates in a major way the evaluation and comparison of tools, methods, and results. Steps must be taken towards the adoption of effective DS-oriented DM standards, at least at the local (e.g. enterprise) level.

    Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA05) 529-4188/05 $20.00 2005 IEEE

    Because we live in an information-based world, the problem and research objectives identified above are of the greatest importance, both at the scientific and economic levels. They are directly related to the all-pervading problem of information overload for which DM offers valuable (though partial) solutions, even more so if adequately coupled with modern businesses' DS systems. We think our project offers great potential for advancing the state of the art in DM applied to information and DS systems.

    3. Related Work

    The problem of poor DM and DS integration has only recently been clearly acknowledged in the scientific literature and little work has been done along the lines presented here. Some recent conference calls for papers confirm the relevance of our viewpoint, such as [1]: "Recently emerged the idea that database management systems should provide support and technology for data mining, as it occurred for OLAP and data warehouses in the last decade." [15], as several others, mentions "integration of data warehousing, OLAP and data mining" in its list of preferred topics of interest. Some conferences are aimed more specifically at decision making and less so at the technological infrastructure supporting it, as in the work of [3], while others are associated with the field of multiagent systems ([17]): "Integrated knowledge intensive systems emerge in all domains of business and engineering, when intelligent decision support requires knowing how this knowledge is produced, measured, communicated, and interpreted" (see also [27] for an agent-based approach to DM). The recent domain of grid intelligence, see e.g. [16], is concerned with large-scale distributed enterprise information systems and can surely be considered promising as it strives to integrate the so-called data and computational grids with the knowledge grid (see also [8]).

    One of the rare references dealing in some depth with DM and DS is that of [19], with chapters dedicated to various aspects of the integration of DM and DS; these researchers have proposed the equation "data mining + decision tools = better business". Work in meta-learning is also quite relevant as it attempts to support DM: see METAL (www.metal-kdd.org), [6], and [24]. Other recent work in the field of KDD (Knowledge Discovery in Databases) is that of [13], in which an ontology is used to model domain knowledge to support the KDD process, that of [20], where an environment for the rapid development of pre-DM processing chains is introduced, and that of [25], in which expert system and machine learning technologies are combined to support DM. Work dealing with conceptual queries and online/Web (interactive) DM is also of interest since it must take into account some elements of the DS dimension: see [9], for instance.

    In the domain of DS systems, the Journal of Decision Support Systems published in 2002 a special issue on directions for the next decade containing several key papers: [7] propose to integrate DS and knowledge management processes using DM; [21] propose the concept of a knowledge warehouse to manage the knowledge of the firm; and [22] summarize the evolution of DS systems and emphasize the importance of data warehouses, OLAP, and DM in those systems. Also of interest is case-based reasoning (CBR), which offers valuable tools for DS, as exemplified in the works of [2], [18], and [23]. Because decision makers tend to use analogy-based reasoning processes when solving problems and because they usually do so by considering only a very limited number of cases, especially in the field of business administration, we hypothesize that CBR constitutes a promising approach for our research project (more on this in the following section).

    4. For a Better Integration of DM and DS

    Holsapple & Whinston ([14]) consider that DS systems' raison d'être is to increase the productivity of decision makers through implementing their abilities to manipulate knowledge, by facilitating problem-solving, and by providing an aid for non-structured problems. Nowadays, this requires the constant processing of phenomenal quantities of data and information and always involves the use of computerized systems. DM methods and tools hold the promise of facilitating the decision maker's life by extracting hidden information from a sea of data, and compacting it into a form easily amenable to decision making. But as we argued before, DM is poorly integrated with DS and, consequently, cannot effectively support decision makers.

    We start from the work of Bolloju et al. ([7]) in order to depict the typical situation addressed by our proposal, from a DS perspective. Indeed, as shown in Figure 1 below, DM and DS systems are two distinct components that do not usually interact through a computerized tool. Thus, DM and DS are not integrated at all; Bolloju et al. do not consider this question in their work, which focuses on knowledge management. From a DM perspective and in relation to our objectives, we must remark that all too often in DM or machine learning, the zero-knowledge (tabula rasa) hypothesis is used and, not very surprisingly, leads to results that can be qualified as "already known information or knowledge" (e.g. rediscovering database functional dependencies).

    What we propose, in order to support a fruitful integration of DM and DS, is to better exploit existing knowledge (be it background or specialized, or, relative to the enterprise, internal or external) via what we will be referring to as KIVs, i.e. Knowledge (DM-DS) Integration Vectors. We define a KIV as a source of data, information or knowledge that plays a role in DM and DS, and in their integration. Examples of KIVs are data models, decision models, general and domain-specific ontologies, and general and domain-specific rule bases. All KIVs will be exploitable and managed by KIVE, i.e. the KIV Environment we propose. KIVE could be added to the architecture depicted in Figure 1 as a new system interfacing with decision makers and interconnected with existing resources. KIVE will contain a case-based system that will index (see [5]) all relevant information to facilitate DM, and ultimately decision making, from the decision maker's viewpoint.
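To make the notion of a KIV more concrete, here is a minimal sketch of how a catalogue of KIVs might be represented. It is only an illustration under our own assumptions: the class names (`KIVKind`, `KIV`, `KIVCatalogue`), the fields, and the example entries are hypothetical and are not part of the KIVE design described in this paper.

```python
from dataclasses import dataclass
from enum import Enum


class KIVKind(Enum):
    # The four kinds mirror the examples of KIVs listed in the text.
    DATA_MODEL = "data model"
    DECISION_MODEL = "decision model"
    ONTOLOGY = "ontology"
    RULE_BASE = "rule base"


@dataclass(frozen=True)
class KIV:
    # All field names are hypothetical, chosen for illustration only.
    name: str
    kind: KIVKind
    domain: str = "general"        # "general" or the name of a specific domain
    internal: bool = True          # internal or external to the enterprise
    linked_resources: tuple = ()   # e.g. names of databases or DS systems


class KIVCatalogue:
    """Registers KIVs by name and finds them by kind and/or domain."""

    def __init__(self):
        self._by_name = {}

    def register(self, kiv):
        if kiv.name in self._by_name:
            raise ValueError(f"KIV already registered: {kiv.name}")
        self._by_name[kiv.name] = kiv

    def find(self, kind=None, domain=None):
        """Return the KIVs matching the given filters, sorted by name."""
        hits = [
            k for k in self._by_name.values()
            if (kind is None or k.kind is kind)
            and (domain is None or k.domain == domain)
        ]
        return sorted(hits, key=lambda k: k.name)


# Made-up entries, for illustration only.
catalogue = KIVCatalogue()
catalogue.register(KIV("sales_schema", KIVKind.DATA_MODEL, domain="sales"))
catalogue.register(KIV("pricing_rules", KIVKind.RULE_BASE, domain="sales"))
catalogue.register(KIV("upper_ontology", KIVKind.ONTOLOGY, internal=False))
```

A call such as `catalogue.find(domain="sales")` would return the two sales-related entries. How KIVs should actually be represented is one of the open questions of this project.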

    Figure 1 (Source: Bolloju et al. 2002 [7]). EIS = Enterprise Information System; DSS = Decision Support System(s); model bases = models used by decision makers.

    DS is now an important application domain of case-based reasoning ([4], [12]) as it embodies well the processing performed by actual decision makers, including the potential usefulness of cases with regard to decision explanation ([10]). The case-based approach we suggest here is somewhat similar to that of [20], although much more elaborate. KIVE will not only consider pre-DM processing, but all aspects of DM related to DS. Also, contrary to [20], we do not exclude unsuccessful cases as they can also be quite informative for DM purposes. Thus, KIVE will constitute a DS-oriented DM environment that contains the following main components: a user interface suitable for decision makers that includes a DS-oriented DM wizard; a catalogue of KIVs which are linked to other relevant resources, information and DS systems; a feature/attribute selection and engineering tool; a catalogue of DM algorithms; a catalogue of (positive and negative) commented cases that will be indexed on multiple criteria, including information on the decision problem related to each case; and a case learner and knowledge acquisition module.
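The catalogue of commented cases indexed on multiple criteria could be sketched as follows. This is a hedged illustration, not the KIVE implementation: the `Case` fields, the index criteria (`problem`, `algorithm`, `kiv`, `successful`), and the sample cases are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # Hypothetical fields; the paper does not fix a case format.
    case_id: str
    decision_problem: str   # the decision problem the DM task served
    dm_algorithm: str       # a name from the catalogue of DM algorithms
    kivs: frozenset         # names of the KIVs that were used
    successful: bool        # negative cases are kept, not discarded
    comment: str = ""


class CaseCatalogue:
    """Stores cases and indexes each one on several criteria."""

    def __init__(self):
        self._cases = {}
        self._index = defaultdict(set)   # (criterion, value) -> case ids

    def add(self, case):
        if case.case_id in self._cases:
            raise ValueError(f"duplicate case id: {case.case_id}")
        self._cases[case.case_id] = case
        self._index[("problem", case.decision_problem)].add(case.case_id)
        self._index[("algorithm", case.dm_algorithm)].add(case.case_id)
        self._index[("successful", case.successful)].add(case.case_id)
        for kiv in case.kivs:
            self._index[("kiv", kiv)].add(case.case_id)

    def lookup(self, **criteria):
        """Return the cases satisfying every given criterion, sorted by id."""
        ids = set(self._cases)
        for criterion, value in criteria.items():
            ids &= self._index.get((criterion, value), set())
        return [self._cases[i] for i in sorted(ids)]


# Made-up cases, for illustration only.
cases = CaseCatalogue()
cases.add(Case("c1", "control production costs", "decision tree",
               frozenset({"production_schema"}), True, "clear split found"))
cases.add(Case("c2", "control production costs", "association rules",
               frozenset({"production_schema"}), False,
               "only rediscovered known dependencies"))
cases.add(Case("c3", "assess overall performance", "regression",
               frozenset({"benchmark_model"}), True))
```

For example, `cases.lookup(problem="control production costs", successful=False)` returns only the negative case `c2`, which is the kind of information a tabula rasa approach would discard.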

    The development and implementation of the proposed KIVE environment is a challenging project that will allow us to study fundamental issues related to the problem of DM-DS integration. Ultimately, this will support significant progress in applied DM. We now consider many key questions (in no particular order) that will motivate our thinking during this research project:

    KIVs include a wide variety of data, information, and knowledge. What is the best way to represent KIVs, especially for DM purposes and in a format (and language) amenable to DS? How can we ensure that existing corporate knowledge is properly formalized and accessible in due time to supply reliable KIVs?

    What types of metadata and ontologies should be developed to successfully guide the DM process? How can recent work on meta-learning be put to use here? Should metadata be related to a specific DM methodology and/or to a specific theory of knowledge?

    How should cases be defined, or derived from databases ([20], [26]), in order to serve the objectives of both DM and DS? Meta-learning seems promising, as well as knowledge extraction from DM experts. And what is the best way to process (i.e. index, retrieve, assess similarity, adapt) them within KIVE's case system?

    How should the knowledge base of KIVE's wizard be developed in the first place, and what type of rule should it use? Should it put more emphasis on DM or on DS aspects? Should it be data- or model-oriented, or both? Should the wizard's rule-based system be replaced altogether by a case system instead (with learning capacities)?

    To what extent can the "trial and error" strategy, which is often used in DM, be minimized during the interaction with the KIVE interactive wizard when no pre-existing case is available? Here again, meta-learning could be useful. How can we define a productive concept of "interactive DS through DM" when many DM processes may require a lot more time than what is usually understood by "online, interactive"?

    How can standard development methodologies for information systems and database systems be revised or adjusted to include right from the start relevant requirements and specifications related to DM and DS? Or should entirely new methodologies be devised?

    How should DM and DS integration be balanced (i.e. with more emphasis on DM or on DS) to better serve the needs of typical decision makers? Should DM-specific and DS-specific functions be accessible separately (through the wizard) when needed? Can we define precisely the notion of DM-DS integration and propose a theoretical model supporting it? Can this integration be measured or quantified?
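As one possible reading of the questions above on retrieving cases and assessing their similarity, the sketch below retrieves the stored cases closest to a new DM task and suggests a technique from them. Everything in it is an assumption made for illustration: the Jaccard measure, the feature sets, the scoring rule (successful cases add their similarity, unsuccessful ones subtract it), and the toy case base. The adaptation step is not covered.

```python
from collections import defaultdict


def jaccard(a, b):
    """Similarity of two feature sets: shared features over all features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(case_base, query_features, k=3):
    """Return (similarity, case) pairs for the k most similar cases, best first."""
    scored = [(jaccard(query_features, c["features"]), c) for c in case_base]
    scored.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return scored[:k]


def recommend_technique(case_base, query_features, k=3):
    """Suggest a DM technique from the k nearest cases.

    Successful neighbours add their similarity to their technique's score,
    unsuccessful ones subtract it. Returns None if no score is positive.
    """
    scores = defaultdict(float)
    for similarity, case in retrieve(case_base, query_features, k):
        sign = 1.0 if case["successful"] else -1.0
        scores[case["technique"]] += sign * similarity
    if not scores:
        return None
    best = max(sorted(scores), key=lambda t: scores[t])
    return best if scores[best] > 0 else None


# Toy case base with made-up features, for illustration only.
CASE_BASE = [
    {"id": "c1", "features": {"tabular", "numeric target", "small sample"},
     "technique": "regression", "successful": True},
    {"id": "c2", "features": {"tabular", "categorical target", "small sample"},
     "technique": "decision tree", "successful": True},
    {"id": "c3", "features": {"tabular", "categorical target", "many attributes"},
     "technique": "association rules", "successful": False},
    {"id": "c4", "features": {"transactions", "no target"},
     "technique": "association rules", "successful": True},
]

query = {"tabular", "categorical target", "many attributes", "small sample"}
suggestion = recommend_technique(CASE_BASE, query)
```

Keeping the negative case `c3` in the case base is what lets the sketch steer away from a technique that failed on a similar task, in line with our choice not to exclude unsuccessful cases.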

    5. Conclusion

    We have briefly presented the core of our new research project currently in its initial stage. Consequently, this paper has put forward more questions than definite answers. We believe finding answers to some of these questions could trigger significant advancement in applied DM, especially in the context of decision support.

    Our goal is to develop an intelligent environment (named KIVE) to support decision makers in their appropriate application of DM so that they can efficiently make better decisions in a given domain. So the first order goal is "DM for DS", i.e. DM for application-domain-related DS. However, intelligently supporting DM involves DM-related DS (e.g. how to select the most appropriate DM technique during the DM process). Thus, the second order goal is "DS for DM". Both first and second order goals will be tackled with DM, DS, and computational intelligence concepts and techniques.

    For instance, KIVE should be able to allow a manager to perform an in-depth analysis, involving DM, of the last two years' production data in order to derive decision trees, classification rules, or regression functions that would help him/her better understand how to control production costs and thus be in a position to make better decisions in that area. Another example could be the use of DM on enterprise data in order to derive a parameterized model of its overall performance (see our related work in [11]).
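To illustrate the production-cost example, the following sketch derives the simplest possible classification rule, a one-level decision tree (a "decision stump"), from a handful of records. The data, the attribute names (`overtime_hours`, `over_budget`), and the function `fit_stump` are invented for this illustration; a real analysis would rely on an established DM tool and on actual production data.

```python
def fit_stump(rows, feature, label):
    """Find the threshold t on `feature` for which the rule
    'feature > t implies label is True' misclassifies the fewest rows.

    Returns (threshold, errors). The threshold is None when the feature
    has fewer than two distinct values, so no split is possible.
    """
    values = sorted({r[feature] for r in rows})
    # Candidate thresholds lie midway between consecutive distinct values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    best_t, best_errors = None, len(rows) + 1
    for t in candidates:
        errors = sum((r[feature] > t) != r[label] for r in rows)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


# Made-up monthly production records, for illustration only.
PRODUCTION = [
    {"overtime_hours": 10, "over_budget": False},
    {"overtime_hours": 25, "over_budget": False},
    {"overtime_hours": 40, "over_budget": False},
    {"overtime_hours": 55, "over_budget": True},
    {"overtime_hours": 70, "over_budget": True},
    {"overtime_hours": 90, "over_budget": True},
]

threshold, errors = fit_stump(PRODUCTION, "overtime_hours", "over_budget")
print(f"Rule: over budget when overtime_hours > {threshold} ({errors} errors)")
```

On this toy data the derived rule separates the two groups at 47.5 overtime hours with no errors. The point is the form of the result: a rule a manager can read and act on, rather than the learning method itself.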

    This work is oriented towards the elaboration of new conceptual models and computer-oriented techniques, and their implementation and testing in software and related methodologies, but all this while keeping the "forest" in mind. We hypothesize that the current state of the art in DM is limited by hyper-specialized sub-disciplines not sufficiently taking into consideration other DM-related disciplines and challenges from a broader perspective.

    DM is a multidisciplinary field. It obviously involves statistical and computer-science related techniques, but it also involves decision theory and epistemological aspects at an equally important level. We think that much more research is needed on the synergistic combination of the various sub-fields of DM, especially, as we have argued here, with regard to decision support and computational intelligence.

    6. Acknowledgments

    We thank the anonymous reviewers who provided judicious comments that helped us improve our paper. We acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC).

    7. References

    [1] ACM/SAC (2005), The 2005 ACM Symposium on Applied Computing, Special Track on Data Mining, CFP, Santa Fe (New Mexico, USA), March 2005.


    [2] Angehrn, A.A. & S. Dutta (1998), "Case-Based Decision Support", Communications of the ACM, 41(5).

    [3] Arnott, D., G. Pervan, P. O'Donnell & G. Dodson (2004), "An Analysis of Decision Support Systems Research: Preliminary Results", The 2004 IFIP International Conference on Decision Support Systems, Prato (Italy), July 2004.

    [4] Avesani, P., S. Ferrari & A. Susi (2003), "Case-Based Ranking for Decision Support Systems", The ICCBR 2003 Conference, Lecture Notes in Artificial Intelligence 2689 (Springer), 35-49, Trondheim (Norway), June 2003.

    [5] Azuaje, F., W. Dubitzky, N. Black & K. Adamson (2000), "Retrieval Strategies for Case-Based Reasoning: A Categorized Bibliography", The Knowledge Engineering Review, 15(4), 371-379.

    [6] Blanzieri, B., P. Giorgini, P. Massa & S. Recla (2001), "Data Mining, Decision Support and Meta-Learning: Towards an Implicit Culture Architecture for KDD", Workshop on Positions, Developments and Future Directions, 2001 ECML/PKDD Conference, Freiburg (Germany), September 2001.

    [7] Bolloju, N., M. Khalifa & E. Turban (2002), "Integrating Knowledge Management into Enterprise Environments for the Next Generation Decision Support", Decision Support Systems, 33, 163-176.

    [8] Cannataro, M. & D. Talia (2003), "The Knowledge Grid", Communications of the ACM, 46(1).

    [9] Chen, Q., X. Wu & X. Zhu (2004), "OIDM: Online Interactive Data Mining", The 2004 IEA/AIE Conference, Lecture Notes in Artificial Intelligence 3029 (Springer-Verlag), 66-76, Ottawa (Canada), May 2004.

    [10] Cunningham, P., D. Doyle & J. Loughrey (2003), "An Evaluation of the Usefulness of Case-Based Explanation", The ICCBR 2003 Conference, Lecture Notes in Artificial Intelligence 2689 (Springer), 122-130, Trondheim (Norway), June 2003.

    [11] Delisle, S., J. St-Pierre & T. Copeck (to appear), "A Hybrid Diagnostic-Advisory System for Small and Medium-Sized Enterprises: A Successful AI Application", Applied Intelligence (The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies), Springer.

    [12] ECCBR (2004), The 7th European Conference in Case-Based Reasoning, CFP, Madrid (Spain), August 2004.

    [13] Euler, T. & M. Scholz (2004), "Using Ontologies in a KDD Workbench", Workshop on Knowledge Discovery and Ontologies, 2004 ECML/PKDD Conference, Pisa (Italy), September 2004.

    [14] Holsapple, C.W. & A.B. Whinston (1996), Decision Support Systems: A Knowledge-Based Approach, West.

    [15] ICDM (2004), The Fourth IEEE International Conference on Data Mining, CFP, Brighton (U.K.), November 2004.

    [16] KGGI (2003), Workshop on Knowledge Grid and Grid Intelligence, The 2003 IEEE/WIC International Conference on Web Intelligence / Intelligent Agent Technology, Halifax (Nova-Scotia, Canada), October 2003.

    [17] KIMAS (2005), The 2005 IEEE International Conference on Integration of Knowledge Intensive Multi-Agent Systems, Waltham (Massachusetts, USA), April 2005.

    [18] Lee, J.K. & J.K. Kim (2002), "A Case-Based Reasoning Approach for Building a Decision Model", Expert Systems, 19(3).

    [19] Mladenic, D., N. Lavrac, M. Bohanec & S. Moyle (2003), Data Mining and Decision Support (integration and collaboration), Kluwer.

    [20] Morik, K. & M. Scholz (2004), "The MiningMart Approach to Knowledge Discovery in Databases", in Intelligent Technologies for Information Analysis, N. Zhong & J. Liu (eds.), Springer.

    [21] Nemati, H.R., D.M. Steiger, L.S. Iyer & R.T. Herschel (2002), "Knowledge Warehouse: An Architectural Integration of Knowledge Management, Decision Support, Artificial Intelligence and Data Warehousing", Decision Support Systems, 33, 143-161.

    [22] Shim, J.P., M. Warkentin, J.F. Courtney, D.J. Power, R. Sharda & C. Carlsson (2002), "Past, Present, and Future of Decision Support Technology", Decision Support Systems, 33, 111-126.

    [23] Sun, B., L.D. Xu, X. Pei & H. Li (2003), "Scenario-Based Knowledge Representation in Case-Based Reasoning Systems", Expert Systems, 20(2).

    [24] Vilalta, R., C. Giraud-Carrier, P. Brazdil & C. Soares (2004), "Using Meta-Learning to Support Data Mining", International Journal of Computer Science and Applications, 1(1), 31-45.

    [25] Weiss, S.M., S.J. Buckley, S. Kapoor & S. Damgaard (2003), "Knowledge-Based Data Mining", The 2003 SIGKDD Conference, Washington (D.C., USA), August 2003.

    [26] Yang, Q. & H. Cheng (2003), "Case Mining from Large Databases", The ICCBR 2003 Conference, Lecture Notes in Artificial Intelligence 2689 (Springer), 691-702, Trondheim (Norway), June 2003.

    [27] Zhang, Z., C. Zhang & S. Zhang (2003), "An Agent-Based Hybrid Framework for Database Mining", Applied Artificial Intelligence, 17, 383-398.
