what-if simulation modeling in business intelligence

16
International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008 1 Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited. What-if Simulation Modeling in Business Intelligence Matteo Golfarelli 1 [email protected] Stefano Rizzi 1 [email protected] DEIS - University of Bologna, Italy 1 ABSTRACT Optimizing decisions has become a vital factor for companies. In order to be able to evaluate beforehand the impact of a decision, managers need reliable previsional systems. Though data warehouses enable analysis of past data, they are not capable of giving anticipations of future trends. What-if analysis fills this gap by enabling users to simulate and inspect the behavior of a complex system under some given hypotheses. A crucial issue in the design of what-if applications is to find an adequate formalism to conceptually express the underlying simulation model. In this paper we report on how, within the framework of a comprehensive design methodology, this can be accomplished by extending UML 2 with a set of stereotypes. Our proposal is centered on the use of activity diagrams enriched with object flows, aimed at expressing functional, dynamic, and static aspects in an integrated fashion. The paper is completed by examples taken from a real case study in the commercial area. Keywords: What-if analysis, Business intelligence, Data warehouse, UML, Simulation. 1. Introduction Market conditions increasingly force companies to reduce waste and optimize decisions. This has become not only a critical, but a vital factor for companies. In this direction, business intelligence (BI) provides a set of tools and techniques that enable a company to transform its business data into timely and accurate information for the decisional process. BI platforms are used by decision makers to get a comprehensive knowledge of the business and of the factors that affect it, as well as to define and support their business strategies. The goal is to enable data-based decisions aimed at gaining competitive advantage, improving operative performance, responding more quickly to changes, increasing profitability and creating added value for a company (Rizzi, 2009a). As summarized by the so-called BI pyramid shown in Figure 1, BI platforms make it possible for companies to extract and process their own business data and then transform those data into information useful for the decision-making process. The information obtained in this way is then contextualized and enhanced by the decision-makers’ own skills and experience, generating knowledge that is used to make conscious and well-informed decisions (Golfarelli & Rizzi, in press). The BI pyramid demonstrates that data warehouses, that have been playing a lead role within BI platforms in supporting the decision process over the last decade, are no more than the starting point for the application of more advanced techniques that aim at building a bridge to the real decision-making process. This is because data warehouses are aimed at enabling analysis of past data, but they are not capable of giving anticipations of future trends. Indeed, in order to be able to evaluate beforehand the impact of a strategic or tactical move, decision makers need reliable previsional systems. So, almost at the top of the BI pyramid, what-if analysis comes into play.

Upload: ngotram

Post on 04-Jan-2017

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008 1

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

What-if Simulation Modeling

in Business IntelligenceMatteo Golfarelli1

[email protected]

Stefano Rizzi1

[email protected]

DEIS - University of Bologna, Italy1

ABSTRACTOptimizing decisions has become a vital factor for companies. In order to be able to evaluate beforehand theimpact of a decision, managers need reliable previsional systems. Though data warehouses enable analysisof past data, they are not capable of giving anticipations of future trends. What-if analysis fills this gap byenabling users to simulate and inspect the behavior of a complex system under some given hypotheses. Acrucial issue in the design of what-if applications is to find an adequate formalism to conceptually expressthe underlying simulation model. In this paper we report on how, within the framework of a comprehensivedesign methodology, this can be accomplished by extending UML 2 with a set of stereotypes. Our proposal iscentered on the use of activity diagrams enriched with object flows, aimed at expressing functional, dynamic,and static aspects in an integrated fashion. The paper is completed by examples taken from a real case studyin the commercial area.

Keywords: What-if analysis, Business intelligence, Data warehouse, UML, Simulation.

1. IntroductionMarket conditions increasingly forcecompanies to reduce waste and optimizedecisions. This has become not only a critical,but a vital factor for companies. In thisdirection, business intelligence (BI) provides aset of tools and techniques that enable acompany to transform its business data intotimely and accurate information for thedecisional process. BI platforms are used bydecision makers to get a comprehensiveknowledge of the business and of the factorsthat affect it, as well as to define and supporttheir business strategies. The goal is to enabledata-based decisions aimed at gainingcompetitive advantage, improving operativeperformance, responding more quickly tochanges, increasing profitability and creatingadded value for a company (Rizzi, 2009a).

As summarized by the so-called BI pyramidshown in Figure 1, BI platforms make itpossible for companies to extract and process

their own business data and then transformthose data into information useful for thedecision-making process. The informationobtained in this way is then contextualized andenhanced by the decision-makers’ own skillsand experience, generating knowledge that isused to make conscious and well-informeddecisions (Golfarelli & Rizzi, in press).

The BI pyramid demonstrates that datawarehouses, that have been playing a lead rolewithin BI platforms in supporting the decisionprocess over the last decade, are no more thanthe starting point for the application of moreadvanced techniques that aim at building abridge to the real decision-making process. Thisis because data warehouses are aimed atenabling analysis of past data, but they are notcapable of giving anticipations of future trends.Indeed, in order to be able to evaluatebeforehand the impact of a strategic or tacticalmove, decision makers need reliableprevisional systems. So, almost at the top of theBI pyramid, what-if analysis comes into play.

Page 2: What-if Simulation Modeling in Business Intelligence

2 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Figure 1. The business intelligence pyramid

What-if analysis is a data-intensivesimulation whose goal is to inspect the behaviorof a complex system (i.e., the enterprisebusiness or a part of it) under some givenhypotheses cal led scenar ios . Morepragmatically, what-if analysis measures howchanges in a set of independent variablesimpact on a set of dependent variables withreference to a simulation model offering asimplified representation of the business,designed to display significant features of thebusiness and tuned according to the historicalenterprise data (Kellern et al., 1999).

Example 1. A simple example of what-ifquery in the marketing domain is: How wouldmy profits change if I run a 3×2 (pay 2, take 3)promotion for one week on all audio productson sale? Answering this query requires asimulation model to be built. This model, thatmust be capable of expressing the complexrelationships between the business variablesthat determine the impact of promotions onproduct sales, is then run against the historicalsale data in order to determine a reliableforecast for future sales.

Among the killer applications for what-ifanalysis, it is worth mentioning profitabilityanalysis in commerce, hazard analysis infinance, promotion analysis in marketing, andeffectiveness analysis in production planning(Rizzi, 2009b). Less traditional, yet interestingapplications described in the literature areurban and regional planning supported byspatial databases, index selection in relationaldatabases, and ETL maintenance in datawarehousing systems.

Surprisingly, though a few commercial toolsare already capable of performing forecastingand what-if analysis, very few attempts havebeen made so far outside the simulationcommunity to address methodological andmodeling issues in this field (Golfarelli et al.,2006). On the other hand, facing a what-ifproject without the support of a designmethodology is very time-consuming, and doesnot adequately protect designers and customersagainst the risk of failure.

From this point of view, a crucial issue is tofind an adequate formalism to conceptuallyexpress simulation models. Such formalism, byproviding a set of diagrams that can bediscussed and agreed upon with the users, couldfacilitate the transition from the requirementsinformally expressed by users to theirimplementation on the chosen platform.Besides, as stated by Balci (1995), it couldpositively affect the accuracy in formulatingsimulation problems and help the designer todetect errors as early as possible in the life-cycle of the project. Unfortunately, nosuggestion to this end is given in the literature,and commercial tools do not offer any generalmodeling support.

In this paper we show how, in the light ofour experience with real case studies, aneffective conceptual description of thesimulation model for a what-if application inthe context of BI can be accomplished byextending UML 2 with a set of stereotypes. Asconcerns static aspects we adopt as a referencethe multidimensional model, used to describeboth the source historical data and theprediction; the YAM2 (Abelló et al., 2006)UML extension for modeling multidimensionalcubes is adopted to this end. From thefunctional and dynamic point of view, ourproposal is centered on the use of activitydiagrams enriched with object flows. Inparticular, while control flows allow sequential,concurrent, and alternative computations to beeffectively represented, object flows are used todescribe how business variables and cubes aretransformed during simulation. The approach tosimulation modeling proposed integrates andcompletes the design methodology presented byGolfarelli et al. (2006).

Page 3: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2007 3

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

The paper is structured as follows. In thesecond section we survey the literature onmodeling and design for what-if analysis. In thethird section we outline the methodologicalframework that provides the context for ourproposal. The fourth section enunciates a wishlist for a conceptual formalism to supportsimulation modeling. The fifth sectiondiscusses how we employed UML 2 forsimulation modeling. The sixth sectionproposes some examples taken from a casestudy concerning branch profitability andexplains how we built the simulation model.Finally, the seventh section draws theconclusion.

2. Related Works and ToolsThere are a number of papers related to what-ifanalysis in the literature. In several cases, theyjust describe its applications in different fieldssuch as e-commerce (Bhargava et al., 1997)hazard analysis (Baybutt, 2003), spatialdatabases (Klosterman, 1999; Lee & Gahegan,2000), and index selection for relationaldatabases (Chaudhuri & Narasayya, 1998).Other papers, such as the one by Fossett et al.(1991), are focused on the design of simulationexperiments and the validation of simulationmodels. In (1999), Armstrong & Brodie surveya set of alternative approaches to forecasting,and give useful guidelines for selecting the bestones according to the availability and reliabilityof knowledge.

In the literature about simulation, differentformalisms for describing simulation modelsare used, ranging from colored Petri nets (Leeet al., 2006) to event graphs (Kotz et al., 1994)and flow charts (Atkinson & Shorrocks, 1981).The common trait of these formalisms is thatthey mainly represent the dynamic aspects ofthe simulation, almost completely neglectingthe functional (how are data transformed duringthe simulation?) and static (what data areinvolved and how are they structured?) aspectsthat are so relevant for data-intensivesimulations like those at the core of what-ifanalysis in BI.

A few related works can be found in thedatabase literature. Dang & Embury (2004) useconstraint formulae to create hypothetical

scenarios for what-if analysis, whileKoutsoukis et al. (1999) explore therelationships between what-if analysis andmultidimensional modeling. Balmin et al.(2000) present the SE S A M E system forformulating and efficiently evaluating what-ifqueries on data warehouses; here, scenarios aredefined as ordered sets of hypotheticalmodifications on multidimensional data. In allthese papers, no emphasis is placed onmodeling and design issues.

In the context of data warehousing, there arerelevant similarities between simulationmodeling for what-if analysis and the modelingof ETL (Extraction, Transformation andLoading) applications; in fact, both ETL andwhat-if analysis can be seen as a combinationof elemental processes each transforming aninput data flow into an output. Vassiliadis et al.(2002) propose an ad hoc graphical formalismfor conceptual modeling of ETL processes.While such proposal is not based on anystandard formalisms, other proposals extendUML by explicitly modeling the typical ETLmechanisms. For example, Trujillo & Lujan-Mora (2003) represent ETL processes by aclass diagram where each operation (e.g.,conversion, log, loader, merge) is modeled as astereotyped class. All these proposals cannot beconsidered as feasible alternatives to ours, sincethe expressiveness they introduce is specificallyoriented to ETL modeling. On the other hand,they strengthen our claim that extending UMLis a promising direction for achieving a bettersupport to the design activities in the area of BI.

Finally, we mention two relevant approachesfor UML-based multidimensional modeling(Abelló et al., 2006; , Lujan-Mora et al., 2006).Both define a UML profile formultidimensional modeling based on a set ofspecific stereotypes, and represent cubes atthree different abstraction levels. However, theapproach by Abelló et al. (2006) is preferred tothe one by Lujan-Mora et al. (2006) for thepurpose of this work since it allows for easilymodeling different aggregation levels over thebase cube, which is essential in simulationmodeling for what-if analysis.

Recently, what-if analysis has been gainingwide attention from vendors of businessintelligence tools. For instance, both SAP

Page 4: What-if Simulation Modeling in Business Intelligence

4 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

SEM-BPS (Strategic Enterprise Management -Business Planning and Simulation) and SASForecast Server enable users to makeassumptions on the enterprise state or futurebehavior, as well as to analyze the effects ofsuch assumptions by relying on a wide set offorecasting models. Also Microsoft AnalysisServices provides some limited support forwhat-if analysis. Other commercial tools thatcan be used for what-if analysis to some extentare Applix TM1, Powersim Studio andQlikView.

Also spreadsheets and OLAP tools are oftenused to support what-if analysis. Spreadsheetsoffer an interactive and flexible environmentfor specifying scenarios, but lack seamlessintegration with the bulk of historical data.Conversely, OLAP tools lack the analyticalcapabilities of spreadsheets and are notoptimized for scenario evaluation.

3. Methodological FrameworkAs summarized in Figure 2, a what-ifapplication is centered on a simulation model.The simulation model establishes a set ofcomplex relationships between some businessvariables corresponding to significant entitiesin the business domain (e.g., products,branches, customers, costs, revenues, etc.). Inorder to simplify the specification of thesimulation model and encourage itsunderstanding by users, we functionallydecompose it into scenarios, each describingone or more alternative ways to construct apredic t ion of interest for the user. Theprediction typically takes the form of amultidimensional cube, meant as a set of cellsof a given type, whose dimensions andmeasures correspond to business variables, tobe interactively explored by the user by meansof an OLAP front-end. A scenario ischaracterized by a subset of business variables,called source variables, and by a set ofadditional parameters, called scenarioparameters, whose value the user has to enterin order to execute the simulation model andobtain the prediction. While business variablesare related to the business domain, scenarioparameters convey information technicallyrelated to the simulation, such as the type of

regression adopted for forecasting and thenumber of past years to be considered forregression. Distinguishing source variablesamong business variables is important since itenables the user to understand which are the“levers” that she can independently adjust todrive the simulation; also non-source businessvariables are involved in scenarios, where theycan be used to store simulation results. Eachscenario may give rise to different simulations,one for each assignment of values to the sourcevariables and to the scenario parameters.

Figure 2. What-if analysis at a glance

Example 2. In the promotion domain ofExample 1, the source variables for thescenario are the type of promotion, its duration,and the product category it is applied to;possible scenario parameters are theforecasting algorithm and its tuningparameters. The specific simulation expressedby the what-if query reported in the text isdetermined by giving values “3×2”, “oneweek” and “audio”, respectively, to the threesource variables. The prediction is a sales cubewith dimensions Week and Product andmeasures Revenue, Cost and Profit, whichthe user could effectively analyze by means ofany OLAP front-end.

Designing a what-if application requires amethodological framework; the one weconsider, presented by Golfarelli et al. (2006)and sketched in Figure 3, relies on the sevenphases sketched in the following:1. Goal analysis aims at determining which

business phenomena are to be simulated, andhow they will be characterized. The goalsare expressed by (i) identifying the set ofbusiness variables users want to

Page 5: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2007 5

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Figure 3. A methodology for what-if analysis application design

monitor and their granularity; and (ii)outlining the relevant scenarios in terms ofsource variables users want to control.

2. Business modeling builds a simplified modelof the application domain in order to helpthe designer understand the businessphenomenon, enable her to refine scenarios,and give her some preliminary indicationsabout which aspects can be neglected orsimplified for simulation.

3. Data source analysis aims at understandingwhat information is available to drive thesimulation, how it is structured and how ithas been physically deployed, withparticular regard to the cube(s) that storehistorical data.

4. Multidimensional modeling structurallydescribes the prediction by taking intoaccount the static part of the business modelproduced at phase 2 and respecting therequirements expressed at phase 1. Veryoften, the structure of the prediction is acoarse-grain view of the historical cube(s).

5. Simulation modeling defines, based on thebusiness model, the simulation modelallowing the prediction to be constructed, foreach given scenario, from the source dataavailable.

6. Data design and implementation, duringwhich the cube type of the prediction and thesimulation model are implemented on thechosen platform, to create a prototype fortesting.

7. Validation evaluates, together with the users,how faithful the simulation model is to thereal business model and how reliable the

prediction is. If the approximationintroduced by the simulation model isconsidered to be unacceptable, phases 4 to 7are iterated to produce a new prototype.The five analysis/modeling phases (1 to 5)

require a supporting formalism. Standard UMLcan be used for phases 1 (use case diagrams), 2(a class diagram coupled with activity and statediagrams) and 3 (class, component anddeployment diagrams), while any formalism forconceptual modeling of multidimensionaldatabases can be effectively adopted for phase4 (e.g., (Abelló et al., 2006) or (Lujan-Mora etal., 2006)). On the other hand, finding in theliterature a suitable formalism to give broadconceptual support to phase 5 is much harder.

4. A Wish List for SimulationPhase 5, simulation modeling, is the core phaseof design. In the light of our experience withreal case studies of what-if analysis in the BIcontext, we enunciate a wish list for aconceptual formalism to support it:# 1 The formalism should be capable of

coherently expressing the simulation modelaccording to three perspectives: functional,that describes how business variables aretransformed and derived from each otherduring simulation; dynamic, required todefine the simulation workflow in terms ofsequential, concurrent and alternativetasks; static, to explicitly represent howbusiness variables are aggregated duringsimulation.

Page 6: What-if Simulation Modeling in Business Intelligence

6 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

#2 It should provide constructs for expressingthe specific concepts of what-if analysis,such as business variables, scenarioparameters, predictions, etc.

# 3 I t should support hierarchicaldecomposition, in order to provide multipleviews of the simulation model at differentlevels of abstraction.

#4 It should be extensible so that designerscan effectively model the peculiarities ofthe specific application domain they aredescribing.

# 5 It should be easy to understand, toencourage the dialogue between designersand final users.

#6 It should rely on a standard notation tominimize the learning effort.

UML perfectly fits requirements #4 and #6,and requirement #5 to some extent. Inparticular, it is well known that the stereotypingmechanism allows UML to be easily extended.As to requirements #1 and #3, the UMLdiagrams that best achieve integration offunctional, dynamic and static aspects whileallowing hierarchical decomposition areactivity diagrams. Within UML 2, activitydiagrams take a new semantics inspired by Petrinets, which makes them more flexible andprecise than in UML 1 (OMG, 2008). Theirmost relevant features for the purpose of thiswork are summarized below:• An activity is a graph of activity nodes (that

can be action, control or object nodes)connected by activity edges (either controlflows or object flows).

• An action node represents a task within anactivity; it can be decorated by the rakesymbol to denote that the action isdescribed by a more detailed activitydiagram.

• A control node manages the control flowwithin an activity; control nodes formodeling decision points, fork andsynchronization points are provided.

• An object node denotes that one or moreinstances of a given class are availablewithin an activity, possibly in a given state.Input and output objects for activities aredenoted by overlapping them to the activityborderline.

• Control flows connect action nodes andcontrol nodes; they are used to denote theflow of control within the activity.

• Object flows connect action nodes to objectnodes and vice versa; they are used todenote that objects are produced orconsumed by tasks.

• Selection and transformation behaviors canbe applied to object nodes and flows toexpress selection and projection queries onobject flows.

Though activity diagrams are a nice startingpoint for simulation modeling since theyalready support advanced functional anddynamic modeling, some extensions arerequired in order to attain the desiredexpressiveness as suggested by requirement #2.In particular, it is necessary to define anextension allowing basic multidimensionalmodeling of objects in order to express howsimulation activities are performed on data atdifferent levels of aggregation.

With regard to this we recall that, as statedby OMG (2008), it is expected that the UML 2Diagram Interchange specification will supporta form of integration between activity and classdiagrams, by allowing an object node to belinked to a class diagram that shows theclassifier for that object and its relations toother elements. In this way, while activitiesprovide a functional view of processes,associated class diagrams can be used to showstatic details. This argument has a relevantweight in our approach, since we associateactivity diagrams with class diagrams tobasically model the structure of cubes and theirrelationships with business variables.

5. Expressing Simulation Modelsin UML 2In our proposal, the core of simulationmodeling is a set of UML 2 diagrams organizedas follows:1. A use case diagram that reports a what-if

analysis use case including one or morescenario use cases.

2. One or more class diagrams that staticallyrepresent scenarios and multidimensionalcubes. A scenario is a class whose attributesare scenario parameters; it is related via an

Page 7: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2007 7

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

aggregation to the business variables that actas source variables for the scenario. Cubesare represented in terms of their dimensions,levels and measures.

3. An activity diagram (called scenar iodiagram) for each scenario use case. Eachscenario diagram is hierarchically explodedinto activity diagrams at increasing level ofdetail. All activity diagrams represent, asobject nodes, the business variables, thescenario parameters and the cubes that areproduced and consumed by tasks.

5.1. Static ModelingRepresentation of cubes is supported by YAM2

(Abelló et al., 2006), a UML extension forconceptual multidimensional modeling. YAM2

models concepts at three different detail levels:upper, intermediate, and lower. At the upperlevel, stars are described in terms of facts anddimensions. At the intermediate level, a fact isexploded into cells at different aggregationgranularities, and the aggregation levels foreach dimension are shown. Finally, at the lowerlevel, measures of cells and descriptors oflevels are represented.

In our approach, the intermediate and lowerlevels are considered. The intermediate level isused to model, through the «cell» stereotype(represented by the C icon), the aggregationgranularities at which data are processed byactivities, and to show the combinations ofdimension levels («level» stereotype,represented by the L icon) that define thosegranularities. The lower level allows singlemeasures of cells to be described as attributesof cells, and their type to be separately modeledthrough the «KindOfMeasure» stereotype.

In order to effectively use YAM2 forsimulation modeling, three additionalstereotypes are introduced for modeling,respectively, scenarios, business variables andscenario parameters:name: scenariobase class: classicon: Sdescription: classes of this stereotype

represent scenariosconstraints: a s c e n a r i o class is an

aggregation of b u s i n e s svariable classes (that representits source variables)

its source variables)

name: business variablebase class: classicon: Bdescription: classes of this stereotype

represent business variablestaggedvalues:

isNumerical (type Boolean,indicates whether the businessvariable can be used as ameasure)isDiscrete (type Boolean,indicates whether the businessvariable can be used as adimensions)

name: scenario parametericon: SPbase class: attributedescription: attributes of this stereotype

represent parameters that modeluser settings concerningscenarios

constraints: a scenario parameter attributebelongs to a scenario class

5.1. Dynamic ModelingAs already mentioned, activity diagrams atdifferent levels of detail are used todynamically model how simulation is carriedout. The main features of this representation aresummarized below:• The rake symbol denotes the actions that

will be further detailed in subdiagrams.• Object nodes that represent cubes of class

<Class> cells are named as Cube of<Class>. The state in the object node isused to express the current state of thecubes being processed.

• Other object nodes represent businessvariables and scenario parameters used byactivities.

• The «datastore» stereotype is used torepresent an object node that stores non-transient information, such as a cube storinghistorical data.

• Selection operations on cubes arerepresented by decorating object flows witha selection behavior («selection»

Page 8: What-if Simulation Modeling in Business Intelligence

8 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

stereotype). Projection operations on cubes(i.e., restricting the set of measures to beprocessed) are represented by decoratingobject flows with a transformation behavior(«transformation» stereotype).

Based on the characteristics of each specificapplication domain, some ad hoc stereotypescan be defined to model recurrent types ofactions. The action stereotypes we definedbased on our experience are listed below:name: olapbase class: actiondescription: actions of this stereotype

transform cubes by applyingOLAP operators; mainly, theyaggregate, select and projectcube cells

constraints: at least one input and oneoutput object flow must connectan olap action to Cube of<Class> object nodes

name: regressionbase class: actiondescription: actions of this stereotype carry

out regression to extrapolatefuture data from past data

constraints: at least one input object flowand one output object flow mustconnect a regression action toCube of <Class> object nodes

name: apportionbase class: actiondescription: actions of this stereotype

apportion values of aggregatecube cells among a set of finercube cells according to a givendriver

constraints: at least one input object flowand one output object flow mustconnect a regression action toCube of <Class> object nodes

name: user inputbase class: actiondescription: actions of this stereotype allow

manual input of data

constraints: at least one output object flowmust exit a user input action

6. A Case StudyOrogel S.p.A. is a large Italian company in thearea of deep-frozen food. It has a number ofbranches scattered on the national territory,each typically entrusted with selling and/ordistribution of products. Its data warehouseincludes a number of data marts, one of whichdedicated to commercial analysis and centeredon a sales cube with dimensions Month ,Product, Customer, and Branch.

The managers of Orogel are willing to carryout an in-depth analysis on the profitability(i.e., the net revenue) of branches. Moreprecisely, they wish to know if, and to whatextent, it is convenient for a given branch toinvest on either selling or distribution, withparticular regard to the possibility of takingnew customers or new products. Thus, the fourscenarios chosen for prototyping are: (i)analyze profitability during next 12 months incase one or more new products weretaken/dropped by a branch; and (ii) analyzeprofitability during next 12 months in case oneor more new customers were taken/dropped bya branch. Decision makers ask for analyzingprofitability at different levels of detail; thefinest granularity required for the prediction isthe same of the sales cube.

The main issue in simulation modeling is toachieve a good compromise between reliabilityand complexity. To this end, in constructing thesimulation model we adopted a two-stepapproach that consists in first forecasting pastdata, then “stirring” the forecasted dataaccording to the event (new product or newcustomer) expressed by the scenarios. Wemainly adopted statistical techniques for boththe forecasting and the stirring steps; inparticular, linear regression is employed toforecast unit prices, quantities and costs startingfrom historical data taken from the commercialdata mart and from the profit and loss accountduring a past period taken as a reference. Basedon the decision makers’ experience, and aimedat avoiding irrelevant statistical fluctuations

Page 9: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2007 9

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Sales decision support

Monthly reporting on

revenues

OLAP analysis of budget variance

What -if analysis of profitability

Managementcontroller

Branch managerAdd

customerAdd

product

«include» «include»

Sales manager

Sales agent

Generalmanager

Remove customer

Remove product

«include»«include»

Figure 4. Use-case diagram

Product{isDiscrete}

Customer{isDiscrete}

Month{isDiscrete}

Branch{isDiscrete}

Year{isDiscrete}

EconomicCategory{isDiscrete}

quantitySold: QuantityunitPrice: PricefixedCosts: CostvariableCosts: Cost/netRevenue: Revenue

Sale

quantitySold: Quantity

YearlyQuantity

C

C

BL BL

BL BL BLBL

«KindOfMeasure»Quantity

{isNumerical}

«KindOfMeasure»Cost

{isNumerical}

unitPrice*quantitySold - fixedCosts - variableCosts

costValue: Cost

Profit&LossAccount

CostElement{isDiscrete} BL

C

B

B

Figure 5. Multidimensional class diagram

while capturing significant trends, we adopteddifferent granularities for forecasting thedifferent measures of the prediction cube(Golfarelli, 2006).

6.1. Representing the SimulationModel: Static AspectsThe four what-if use cases (one for eachscenario) are part of a use case diagram that, assuggested by List et al. (2000), expresses howthe different organization roles take advantageof BI in the context of sales analysis. Figure 4

shows a part of the use-case diagram for ourcase study. For space reasons, in this paper wewill discuss only the Add product use case.

The class diagram shown in Figure 5 gives a(partial) specification of the multidimensionalstructure of the cubes involved. Sale is the basecell of the sales cube; its measures areq u a n t i t y S o l d , u n i t P r i c e , fixedCosts,variableCosts and netRevenue (the latter isderived from the others), while the dimensionsare Product, Customer, Month and Branch.Aggregations from one level to another

Page 10: What-if Simulation Modeling in Business Intelligence

10 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Product{isDiscrete}

Branch{isDiscrete}

BL

BL

regressionLength: Integer forecastTechnique: String

AddProduct

QtyScaling

UnitPriceScaling

SPSP

referenceBranch

toBranch

S

B

B

Figure 6. Scenario class diagram

Scenario management

Forecast Stir product

stored in the data warehouse

stored in <PL_ACC.XLS>

Cube of Sale[forecast]

What -if analysis of profitability

«datastore»Cube of Sale

[history]

«datastore»Cube of Sale[prediction]

«datastore»Cube of Profit&LossAccount

[history]

Figure 7. Scenario diagram

«userInput»Input

parameters

«userInput»Input source

variables

Scenario management

AddProduct

UnitPriceScaling

Product

QtyScaling

Branch

Figure 8. Activity diagram for Scenario management

represent roll-up hierarchies (e.g., products rollup to economic categories). Both dimensionlevels and measure types are furtherstereotyped as business variables.YearlyQuantity is a cell derived from Sale byaggregating by EconomicCategory, Year and

Branch and projecting on quantitySold; it willbe used in activity diagrams to modelaggregated data for regression. Finally, theProfit&LossAccount base cell stores the costsby Year, CostElement and Branch.

Page 11: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2007 11

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Figure 6 reports an additional class diagramthat statically represents the AddProductscenario in terms of its parameters and sourcevariables:• f o r e c a s t T e c h n i q u e and

regressionLength are scenario parameters.The first one specifies if forecast should bebased on regression (which means thattrends for the future are extrapolated frompast data) or on judgment (which meansthat trends for the future are manuallyentered by users). The second one stores thenumber of past years to be used as input forregression.

• A m o n g s o u r c e v a r i a b l e s ,AddProduct.addedProduct is the specificproduct that users may decide to add to abranch, while AddProduct.toBranch is thespecific branch that product is added to.AddProduct.referenceBranch is a branch,t h a t a l r e a d y s e l l sAddProduct.addedProduct, and that userschoose as a reference for estimatingquantities and unit prices for sellingA d d P r o d u c t . a d d e d P r o d u c t inA d d P r o d u c t . t o B r a n c h . Finally,UnitPriceScaling and QtyScaling storethe percentage change in unit price andquantity sold that users expect inAddProduct.toBranch with reference toAddProduct.referenceBranch.

6.2. Representing the SimulationModel: Dynamic AspectsThe Add product use case is expanded in thescenario diagram reported in Figure 7, thatprovides a high-level overview of the wholesimulation process. The S c e n a r i omanagement action is aimed at enteringvalues for the scenario parameters and thesource variables. The cube storing historicalsales data is represented as an object nodecalled Cube of Sale , with state [history] andstereotyped as «datastore». The Forecastaction takes this cube and the cube storing theprofit and loss account, and produces in outputa Cube of Sale with state [forecast] storingthe sales trends for next year. This cube is thentransformed by the Stir product action, thatsimulated the addition of a product to a branch,

to produce the final prediction in the form of aCube of Sale with state [prediction].

The action nodes of the context diagram areexploded into a set of hierarchical activitydiagrams whose level of abstraction may bepushed down to describing tasks that can beregarded as atomic. Some of them are reportedhere in a simplified form and briefly discussedbelow:• Activity Scenario management (Figure

8) enables users to set all source variablesand scenario parameters. This is representedby decorating actions with the «userinput» stereotype.

• Activity Forecast (Figure 9) is aimed atextrapolating sale data for the next twelvemonths, represented by the Cube of Saleobject node with state [forecast]. This isdone separately for each single measure ofthe Sale cell. In particular, forecastinggeneral costs requires to extrapolate thefuture fixed and variable costs from the pastprofit and loss accounts, and scale variablecosts based on the forecasted quantities.Input and output objects nodes forForecast are emphasized by placing themon the activity borderline. The «selection»and «transformation» stereotypes are usedto express, respectively, that only the salesdata of the last few years have to bese lec ted (as def ined by theregressionLength scenario parameter) andwhich measure(s) from cube cells are to beprocessed. States [ g c F o r e c a s t ] ,[qtyForecast] and [upForecast] denoteforecast cubes where only cost, quantity andprice measures have been calculated,respectively. These three cubes are thenjoined together through the Drill-acrossaction.

• The quantity forecast granularity suggestedby users i s Y e a r , Branch,EconomicCategory. The forecast for nextyear (Figure 10) can be done, depending onthe value taken by the forecastTechniquescenario parameter, either by judgment (theyearly quantities for next year are directlyentered by the user) or by regression (basedon the yearly quantities sold during the lastregressionLength years, stored in a Cubeof YearlyQuantity with state [history]). In

Page 12: What-if Simulation Modeling in Business Intelligence

12 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Forecast

Forecast quantity

Forecast unit price

Forecast general costs

general costs fornext 12 months

quantities fornext 12 months

Cube of Sale[history]

«transformation»Sale.quantitySold

Cube of Sale[history]

«transformation»Sale.unitPrice

Cube of Sale[qtyForecast]

Cube of Sale[gcForecast]

Cube of Sale[upForecast]

unit prices fornext 12 months

«datastore»Cube of

Profit&LossAccount[history]

«datastore»Cube of Sale

[history]

Cube of Sale[forecast]

«selection»last AddProduct.regressionLength years

Cube of Sale[history]

Cube of Sale[qtyForecast]

Cube of Sale[qtyForecast]

«transformation»Sale.fixedCosts,

Sale.variableCosts

Figure 9. Activity diagram for Forecast

both cases, the total quantities by branchand economic category are stored in aCube of YearlyQuantity with state[forecast]. This cube is then apportionedon the single months, products andcustomers proportionally to the quantitiessold during the last 12 months. Note theuse of names to express specific roles ofobject flows within actions. For instance,the time span object flow in input toQuantity regression carries the temporalspan used for regression; the driver objectflow in input to Quantity apportiondenotes the flow carrying the cube whosecells provide the historical data used as adriver to proportionally distribute yearlyquantities among months, products andcustomers.

• Finally, Figure 11 explodes the St i rproduct activity, that simulates the effectso f a d d i n g a n e w p r o d u c t

(AddProduct.addedProduct) to a branch(AddProduct.toBranch) by reproducingthe sales of that product in a differentbranch (AddProduct.referenceBranch)where that product is already sold. First, thepast sales of the product are scaledaccording to the user-specified percentagesstored in the q t y S c a l i n g andunitPriceScaling source variables (actionScale quantities and unit prices ), andt h e y a r e a s c r i b e d t o t h eAddProduct.toBranch branch. This actionproduces a Cube of Sale object node withs t a t e [ a d d e d P r o d u c t ] . Then,cannibalizationi on forecasted sales for theother products (Cube of Sale with state[forecast]) is simulated by applying aproduct correlation matrix (Cube ofC o r r e l a t i o n ) built by judgmentaltechniques, i.e., by user input. Finally, fixedcosts are properly redistributed on the

Page 13: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2007 13

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

«regression»Quantity

regression

«apportion»Quantity apportion by Month, Product,

Customer

«driver»

«history»«length»

Forecast quantity

«olap»Group by Year,

Branch, Econ.Cat.

«userInput»Input yearly quantities

[AddProduct.forecastTechnique='regression']

quantitiesfor next year

quantities fornext 12 months

«datastore»Cube of Sale

[history]

AddProduct Cube of Sale[history]

Cube of Sale[qtyForecast]

Cube of YearlyQuantity[forecast]

Cube of YearlyQuantity[history]

«transformation»AddProduct.regressionLength

«selection»last 12 monthsCube of Sale

[history]

Figure 10. Activity diagram for Forecast quantity

single forecasted sales, changing the state ofthe Cube of Sale object node from[cannibalized] to [prediction].

6.3. Building the Simulation ModelIn this section we give an overview of theapproach we pursued to build the UMLsimulation model for Orogel. The startingpoints are the use case diagram, the businessmodel and the multidimensional modelobtained, respectively, from phases 1, 2 and 4of the methodology outlined in the thirdsection. For simplicity, we assume that themultidimensional model is already coded inYAM2.1. The class diagram is created first, by

extending the multidimensional modelthat describes the prediction with thestatic specification of scenarios, sourcevariables and scenario parameters.

2. For each scenario reported in the use casediagram, a high-level scenario diagram is

created. This diagram should show themacro-phases of simulation, the maindata sources and the prediction. Theobject nodes should be namedconsistently with the classes diagram.

3. Each activity in each scenario diagram isiteratively exploded and refined intoadditional activity diagrams. As new,more detailed activities emerge, businessvariables and scenario parameters fromthe class diagram may be included inactivity diagrams. Relevant aggregationlevels for processing business variableswithin activities may be identified, inwhich case they are increasingly reportedon the class diagram. Refinement goes onuntil the activities are found that areelemental enough to be understood by anexecutive designer/programmer.

We chose to build activity diagrams in a top-down fashion since, in our experience, thisapproach provides a stronger thread for

Page 14: What-if Simulation Modeling in Business Intelligence

14 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

QtyScaling

«scale»Scale quantities and unit prices

Cannibalize

Distribute fixed costs

«scaling factor»

«scaling factor»

«userInput»Input

correlations

Stir product

UnitPriceScaling

Cube of Sale[forecast]

«selection»Sale.Product=AddProduct.addedProduct AND

Sale.Branch=AddProduct.referenceBranch

Cube of Sale[addedProduct]

Cube of Correlation

Cube of Sale[cannibalized]

«datastore»Cube of Sale

[prediction]

Cube of Sale[forecast]

Cube of Sale[forecast]

Figure 11. Activity diagram for Stir product

reasoning with users, especially in the verycommon case that, in the beginning, users havelittle or no idea about how the basic laws thatrule their business world should be coded.

In order to validate the simulation model, weused 2003 and 2004 data to forecast theprofitability for 2005. A comparison with theactual data for 2005 yielded an average error ofabout 18% on the total profitability of thesingle branches, which decision makers judgedto be very promising. The error on the totalprofitability for 2005 was significantly lower(about 7%) due to a compensation effect.

7. ConclusionTo sum up, our approach to simulationmodeling fulfills the wish list proposed in thefourth section as follows: (#1) Static, functionaland dynamic aspects are modeled in anintegrated fashion by combining use case, classand activity diagrams; (#2) Specific constructsof what-if analysis are modeled through theUML stereotyping mechanism; (#3) Multiple

levels of abstraction are provided by bothactivity diagrams, through hierarchicaldecomposition, and class diagrams, through thethree detail levels provided by YAM2; (#4)Extensibility is provided by applying thestereotyping mechanism; (#5) Thoughcompletely understanding the implications of aUML diagram is not always easy for businessusers, the precision and methodological rigorencouraged by UML let them more fruitfullyinteract with designers, thus allowing solutionsto simulation problems to emerge easily andclearly during analysis even when, in thebeginning, users have little or no idea abouthow the basic laws that rule their businessworld should be coded; (#6) UML is a standard.

In practice, the approach proved successfulin making the design process fast, well-structured and transparent. A critical evaluationof the proposed approach against its possiblealternatives unveils that the decisive factor isthe choice of adopting UML as the modelinglanguage rather than devising an ad hocformalism. Indeed, adopting UML poses some

Page 15: What-if Simulation Modeling in Business Intelligence

International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2007 15

Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Figure 12. Scenario management window for defining hypotheses on general costs

constraints in the syntax of diagrams (forinstance, the difficulty of directly showing onactivity diagrams the aggregation level at whichcells are processed); on the other hand it bringssome undoubted advantages to the designer,namely, the fact of relying on a standard andwidespread formalism. Besides, usinghierarchical decomposition of activity diagramsto break down the complexity of modelingincreases the scalability of the approach.

The simulation model designed has beenprototyped in C#. Oracle 9i is the platformchosen for hosting the predictions and as arepository for business variables and modelparameters. Business Objects is used for OLAPanalysis of predictions. A screenshot of theGUI used to input business variables andscenario parameters is reported in Figure 12; inparticular, the form used to formulatehypotheses about general costs is shown.

We conclude by remarking that the proposedformalism is oriented to support simulationmodeling at the conceptual level, which in ouropinion will play a crucial role in reducing theoverall effort for design and in simplifying itsreuse and maintenance. Devising a formalismcapable of adequately expressing the simulationmodel at the logical level, so that it can bedirectly translated into an implementation, is asubject for our future work.

REFERENCESAbelló, A., Samos, J., & Saltor F. (2006). YAM2: a

multidimensional conceptual model extendingUML. Information Systems, 31(6), 541-567.

Armstrong, S., & Brodie, R. (1999). Forecasting formarketing. In G. Hooley and M. Hussey (Eds.),Quantitative methods in marketing (pp. 92-119).Int. Thompson Business Press.

Atkinson, W. D., & Shorrocks, B. (1981). Competitionon a Divided and Ephemeral Resource: ASimulation Model. Journal of Animal Ecology, 50,461-471.

Baybutt, P. (2003). Major hazards analysis – Animproved process hazard analysis method. ProcessSafety Progress, 22(1), 21-26.

Balci, O. (1995). Principles and Techniques ofSimulation Validation, Verification, and Testing. InProceedings Winter Simulation Conference (pp.147-154). Arlington, USA.

Balmin, A., Papadimitriou, T., & Papakonstantinou, Y.(2000). Hypothetical Queries in an OLAPEnvironment. In Proceedings Conference on VeryLarge Data Bases (pp. 242-253). Cairo, Egypt.

Bhargava, H. K., Krishnan, R., & Muller, R. (1997).Electronic Commerce in Decision Technologies: ABusiness Cycle Analysis. International Journal ofElectronic Commerce, 1(4), 109-127.

Chaudhuri, S., & Narasayya, V. (1998). AutoAdminwhat-if index analysis utility. SIGMOD Records,27(2), 367-378.

Dang, L., & Embury, S. M. (2004). What-If Analysiswith Constraint Databases. In Proceedings BritishNational Conference on Databases. Edinburgh,Scotland.

Fossett, C., Harrison, D., & Weintrob, H. (1991). Anassessment procedure for simulation models: a casestudy. Operations Research, 39(5), 710-723.

Page 16: What-if Simulation Modeling in Business Intelligence

16 International Journal of Data Warehousing and Mining, X(X), X-X, Oct-Dec 2008

Copyright © 2008, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

Golfarelli, M., Rizzi, S., & Proli, A. (2006). DesigningWhat-if Analysis: Towards a Methodology. InProceedings International Workshop on DataWarehousing and OLAP (pp. 51-58). Arlington,USA.

Golfarelli, M., & Rizzi, S. (in press). Data warehousedesign: Modern principles & methodology.McGraw-Hill Professional.

Kellner, M., Madachy, R., & Raffo, D. (1999). Softwareprocess simulation modeling: Why? What? How?Journal of Systems and Software, 46(2-3), 91-105.

Klosterman, R. (1999). The What if? collaborativesupport system. Environment and Planning, B:Planning and Design, 26, 393-408.

Kotz, D., Toh, S. B., & Radhakrishnan, S. (1994). ADetailed Simulation Model of the HP 97560 DiskDrive (Tech. Rep.). Hanover, USA: DartmouthCollege.

Koutsoukis, N. S., Mitra, G., & Lucas, C. (1999).Adapting on-line analytical processing for decisionmodelling: the interaction of information anddecision technologies. Decision Support Systems,26(1), 1-30.

Lee, C., Huang, H. C., Liu, B., & Xu, Z. (2006).Development of timed colour Petri net simulationmodels for air cargo terminal operations. Computersand Industrial Engineering, 51(1), 102-110.

Lee, I., & Gahegan, M. (2000). What-if Analysis forPoint Data Sets Using Generalised VoronoiDiagrams. In Proceedings International Conferenceon GeoComputation. Greenwich, UK.

List, B., Schiefer, J., & Tjoa, A. M. (2000). Process-Oriented Requirement Analysis Supporting the DataWarehouse Design Process – A Use Case DrivenApproach. Proceedings 11th InternationalConference Database and Expert SystemsApplications (pp. 593-603). London, UK.

Lujan-Mora, S., Trujillo, J., & Song, I.-Y. (2006). AUML profile for multidimensional modeling in datawarehouses. Data & and Knowledge Engineering,59(3), 725-769.

OMG, (2008). UML: Superstructure, version 2.0.Retrieved December 10, 2008, from http://www.omg.org.

Rizzi, S. (2009a). Business Intelligence. In L. Liu and T.Özsu (Eds.), Encyclopedia of Database Systems.Springer.

Rizzi, S. (2009b). What-if analysis. In L. Liu and T.Özsu (Eds.), Encyclopedia of Database Systems.Springer.

Trujillo, J., & Lujan-Mora, S. (2003). A UML basedapproach for modelling ETL processes in datawarehouses. In Proceedings InternationalConference on Conceptual Modeling (pp. 307-320).Chicago, USA.

Vassiliadis, P., Simitsis, A., & Skiadopoulos, S. (2002).Conceptual modeling for ETL processes. InProceedings International Workshop on DataWarehousing and OLAP (pp. 14-21). McLean,USA.

i Cannibalization is the process by which a new

product gains sales by diverting sales from existingproducts, which may deeply impact the overallprofitability.