

Gaining IS Business Value through Big Data Analytics

Thirty Sixth International Conference on Information Systems, Fort Worth 2015 1

Gaining IS Business Value through Big Data Analytics: A Case Study of the Energy Sector

Completed Research Paper

Mariya Sodenkamp University of Bamberg

Kapuzinerstraße 16, 96047 Bamberg [email protected]

Ilya Kozlovskiy University of Bamberg

Kapuzinerstraße 16, 96047 Bamberg [email protected]

Thorsten Staake

University of Bamberg / ETH Zurich Kapuzinerstraße 16, 96047 Bamberg / Rämistrasse 101, 8092 Zürich

[email protected]

Abstract

Following decades of stability and comfortable margins, utility companies today face strong pressure from regulatory bodies and competitors. As a response to the market dynamics, many have initiated a transformation from a “provider” to a service company, yet realize that the customer insights necessary to successfully develop and market new services are sparse. We argue that the required information is contained in consumption data that is available to utility companies. We demonstrate how data analytics and machine learning make sense out of such data and add value to organizations. Using datasets containing annual electricity consumption information of private households, we apply and test in field experiments a Support Vector Machines algorithm that predicts the probability that an individual customer will sign up on an energy efficiency portal. We show that signup rates can be doubled and argue that classification tools provide customer insights at low cost and at scale.

Keywords: Business value of IS/value of IS, Decision Support Systems (DSS), Data analysis, Green IT/IS

Introduction

Ways to enhance the business value of Information Systems (IS) in organizations have been, and are predicted to remain, one of the major topics of interest for IS researchers and practitioners. Recent years have witnessed the emergence of the field of big data analytics and its role in decision support as a new frontier of IS, able to transform the competitive landscape and to improve organizational performance (Goes, 2014; Sharma et al., 2014). As a consequence, much attention has been paid to the value that organizations could create through the use of big data and analytics technologies (Mithas et al., 2013; Gillon et al., 2012; Newell & Marabelli, 2015). Although the pathways from investments in those technologies to economic utility are not obvious, many researchers have documented empirical evidence of the value of being a data-driven organization (Sharma et al., 2014; Mithas et al., 2011; Saldanha et al., 2013; Mithas et al., 2012). La Valle et al. (2011) and Davenport (2006) describe many examples of successful use of data analytics and report that top-performing organizations make successful decisions using rigorous analysis at more than double the rate of lower-performing organizations.


However, while there is some evidence that investments in analytics can create value, deeper analysis of the thesis ‘analytics leads to value’ is needed (Sharma et al., 2014). This is particularly challenging due to the ambiguity and fuzziness of the ‘IS value’ construct, because a substantial part of IS value manifests itself in intangible value items, such as improved customer knowledge, one-to-one marketing effectiveness, customer satisfaction, and customer surplus (Davern & Wilkin, 2010; Sharma et al., 2014). One of the most prominent domains dealing with intangible values is recent work on Green IS and energy informatics, which shifts the focus of IS value towards sustainability goals that include environmental, economic, and societal issues (Dyllick & Hockerts, 2002; Schryen, 2013; Porter & Kramer, 2006; Malhotra et al., 2013). Jenkin et al. (2011) define ‘Green IS’ as the development and use of information systems to support or enable environmental sustainability initiatives; such systems thus tend to have an indirect and positive effect. The thematic scope of the Green IS movement encompasses different topics, such as improved eco-efficiency, eco-equity, and eco-effectiveness of business processes through automation, and the development of sustainable strategies with the aid of decision support systems (Thambusamy & Salam, 2010; vom Brocke et al., 2013; Seidel et al., 2013; Melville, 2010).

Although big data analytics is undoubtedly a source of unprecedented power to radically transform and improve entire business sectors, with major impacts on society as a whole (Campolargo, 2015; Jayaraman, 2014), the role of this transformation theme, especially for Green IS, is still an understudied topic in both IS literature and practice. This is where the motivation for our research emerges.

Because domestic energy consumption is increasing and, in Western countries, accounts for 20 to 30 percent of the total energy use (EEA 2001; EIA 2009), the present work investigates the role of data-analytics-driven IS in stimulating energy-efficient behavior among private households while achieving low abatement cost (spending per kilowatt-hour (kWh) saved) and realizing gains in terms of customer retention or upselling. Recently, Loock et al. (2013) have shown that information systems providing specific feedback on individual households are extremely valuable for energy consultancies, e.g., to identify households that show a mismatch between energy demand and household characteristics, to formulate suitable saving advice that reflects disposable income, appliance structure, etc., and to design targeted motivational cues that engage customers in energy efficiency campaigns. In general, recent consumption feedback studies documented savings in the range of 2-6%, with larger effects occurring in settings where the feedback is specifically tailored to the recipient (Fischer, 2008; Van Houwelingen, 1989; Ayres et al., 2013; Tiefenbeck et al., 2013; Vassileva, 2012).

Despite strong evidence that specific information on the recipient of a behavioral intervention can boost the performance of saving campaigns (Allcott & Mullainathan, 2010), large-scale implementations that utilize these insights have been dismissed with the argument of high costs: gathering household information might be possible for research studies but is often too expensive when targeting hundreds of thousands of households. Moreover, very limited, incomplete, and inaccurate responses to questionnaires (Jayaweera & Hossein, 2013) make surveys inappropriate as a data source for large-scale deployments. This is especially unfortunate for the emerging smart metering infrastructure, which makes high-resolution consumption data available that, without further information on individual households, cannot be turned into truly effective decisions in the form of feedback interventions.

We argue that classifying utility customers according to their intention to sign up for an energy efficiency service, based on electricity consumption data that is already available for billing purposes, is possible at low cost and at scale, thereby solving a prominent business problem of a large industry, improving the effectiveness of energy conservation campaigns, and ultimately increasing the customer value and adoption of related services. Thus, the core research-guiding question of our paper can be formulated as follows:

How can supervised machine learning be used to predict household intention to register on the energy efficiency web portal?

This article reports the results of a project conducted in collaboration with BEN Energy AG, a Swiss company developing cloud-based software for the utility industry in private sector housing. The joint project was undertaken to develop and test in field studies machine learning algorithms that determine household characteristics (such as probability to sign up on energy efficiency web portals, interest in green power/biogas, type of heating, potential to produce/consume photovoltaic energy, etc.) based on energy consumption data, customer core data (location, title, age, etc.), and other consumption-related information (neighborhood, socio-demographic statistics, weather, etc.). This supports the design and advancement of customized measures that promote efficient electricity use. The obtained portfolios of customers' characteristics can also be used for the elaboration and adoption of efficiency-promoting tariffs, the prediction and shifting of electricity and gas demand, as well as for targeted marketing campaigns for ecological products and services.

A field experiment with two different treatment groups (randomized controlled trials) was launched to engage customers of one of the largest utility companies in Germany to register on an energy efficiency web portal developed by BEN Energy AG, which is designed to motivate customers to reduce their electricity consumption. A similar portal was described by Loock et al. (2013); it allows consumers to periodically record their electricity meter readings and provides feedback on their consumption behaviour.

Real-world consumption-relevant data was used by the Support Vector Machines (SVMs) algorithm to select the customers who are likely to sign up. The training and test data stem from a Swiss utility company. The results suggest that the targeted campaigns reduce the costs by more than 50%.
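To make the link between signup rates and campaign cost concrete, the following minimal sketch computes the cost per signup of a letter campaign. The letter cost and the signup rates are hypothetical placeholders rather than figures from the study, and the function name `cost_per_signup` is an assumption introduced for illustration.

```python
def cost_per_signup(letters_sent: int, cost_per_letter: float, signup_rate: float) -> float:
    """Campaign cost divided by the expected number of signups."""
    expected_signups = letters_sent * signup_rate
    if expected_signups <= 0:
        raise ValueError("letters_sent and signup_rate must be positive")
    return letters_sent * cost_per_letter / expected_signups


# Hypothetical inputs: 5,000 letters at EUR 1.00 each (not values from the paper).
random_cost = cost_per_signup(5000, 1.00, signup_rate=0.02)    # random addressing
targeted_cost = cost_per_signup(5000, 1.00, signup_rate=0.04)  # doubled signup rate
reduction = 1 - targeted_cost / random_cost                    # share of cost saved
```

With a fixed number of letters, the cost per signup is inversely proportional to the signup rate, so doubling the rate halves the cost per signup, and a reduction of more than 50% requires more than doubling it.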

The remainder of this paper is organized as follows. After having outlined the problem of IS value creation through big data analytics, we review prior Green IS research to delineate the research gap addressed by the present study. We then present our data analytics approach, which implements an SVM algorithm to find residential utility customers for the purpose of campaign targeting. This approach lays the foundation for the design and implementation of a real-world Green IS as well as for its evaluation. We continue with a description of the practical demonstration phase, in which we used a sample of 5’000 electricity consumers. For evaluation purposes, we provide details of the data analysis and the results of the statistical tests. We close with a discussion of our main findings, theoretical and practical implications, limitations, and suggestions for further research.

IS Business Value through Data Analytics

Much attention is currently being paid, in both the academic and practitioner literature on IS, management, and social science research, to the value that organizations could create through the use of big data and business analytics (Sharma et al., 2014; Constantiou & Kallinikos, 2015; Gillon et al., 2012; Mithas et al., 2013). Recently, the IBM Tech Trends Report (2011) identified business analytics as one of the major technology trends characterized by usually low risks and quick paybacks, based on a survey of over 4’000 information technology professionals from 93 countries and 25 industries. Chen et al. (2012, p. 1166–1168) suggest that business analytics and related technologies can help organizations to ‘better understand its business and markets’ and ‘leverage opportunities presented by abundant data and domain-specific analytics’. As Markus (2015) points out, the potential consequences of big data analytics applications for organizations, individuals, and society as a whole go far beyond the confines of strategic management. The data trail left by people is useful for companies to manage employees, investigate markets, streamline and automate core business processes, as well as target and personalize products and services for clients and customers, based on developing algorithms that can make predictions about individuals by recognizing complex patterns in data sets compiled from multiple sources (Newell & Marabelli, 2015; Goes, 2014; Constantiou & Kallinikos, 2015). For instance, Davenport (2014) describes examples of business analytics using extensive data sources, including Customer Relationship Management (CRM) and sales management systems, as well as video, GPS, and Wi-Fi.

The Challenge of Sense-Making in Machine Learning

Lycett (2013) argues that even though business analytics tools make it easy to spot statistical patterns, trends, and relationships, the critical step of understanding the causes behind those patterns is still important in order to undertake actions that generate value. Shanks et al. (2010) and Sharma et al. (2013) suggest that the process of generating insights from data should involve actors from different parts of the organization, including analysts and business managers, as they use the data and analysis as a means to understand the phenomena that the data represent. This process, known as ‘datafication’ (Lycett, 2013), implies the selection of relevant datasets and the development of meaningful machine learning algorithms. Furthermore, since there is no one-to-one correspondence between an insight and a specific course of action to exploit that insight, it is part of the organizational decision process to convert insights into decisions, such as matching knowledge about households and the habits of their inhabitants with appropriate efficiency measures or products. An interpretive paradigm is involved not only in analyzing information requirements, applying algorithms, and accepting the insights generated via machine learning as valid and useful, but also in ‘deciding’ to deploy them to run operations in an unguided manner, and in ‘accepting’ the refinements to the algorithms generated via machine learning as valid. Human sense-making is also a crucial point for the machine learning process itself, independently of the algorithms being used. In particular, unsupervised learning methods group data-based observations into similarity clusters (‘patterns’) that require interpretation by an analyst to enable conversion of these insights into actions. In supervised learning, the extraction of features from the data for dimensionality reduction is an engineering task requiring analytical expertise.

Big Data and Business Analytics in the Energy Sector

Green IS can play a key supporting role in enhancing performance by strengthening the influence of top management commitment (managerial competency) on green product and service design and green manufacturing (transformation-based competencies) (Jayaraman, 2014). From an organizational perspective, sustainability value has been widely operationalized as endeavoring to achieve societal goals within commercial goals in such a way as to optimize social, environmental, and economic dimensions simultaneously. For example, Watson et al. (2010) and Loos et al. (2011) argue for applying IS thinking and skills to reduce energy consumption and CO2 emissions. ‘Responsive’ Green IS aim to mitigate harmful value chains, for example by simulating a smart online city to raise awareness of climate and energy issues and to drive engagement, or by optimizing driving routes to reduce CO2 emissions (Malhotra et al., 2013; Washburn et al., 2009). ‘Strategic’ Green IS allow companies to proactively transform value chain activities to benefit society both economically and environmentally, for example by developing carbon management software, using it internally, and selling it externally (Malhotra et al., 2013).

From the perspective of utilities in the private housing sector, efficiency regulation, changing market models, and increasing market liberalization result in severe pressure on companies’ revenue. Intensive customer engagement is therefore of utmost importance to reduce churn rates, to tap new sources of growth, and to establish new business models. Although utilities have a large customer base, their knowledge about individual households is small. This adversely affects both the development of innovative, household-specific services and the utilities’ key performance indicators (KPIs). Because they provide customer insights, business analytics and machine learning can help utilities to improve both monetary KPIs (e.g., higher customer satisfaction from targeted service development and offerings) and environmental KPIs (e.g., share of green tariffs, reach of energy efficiency campaigns). The latter will finally translate into energy efficiency gains, as targeted campaigns result in higher signup rates and thus lower cost per kWh saved.

A series of interviews with mid-sized and large utility companies in Germany, Switzerland, and the Czech Republic, provided by BEN Energy AG, indicates that average spending on data analytics for customer segmentation, targeting, and engagement amounted to EUR 0.15 per household per year in 2014 and is expected to grow to EUR 0.90 in 2018 and to EUR 2.50 in 2022. This means that the total spending on relevant analytics services among utility companies will grow from EUR 32.5mn in 2014 to EUR 216.8mn in 2022.

Overall, the potential significance of data analytics in the private energy sector is threefold:

Economic significance: targeted, transparent, and cost-effective managerial actions toward residential energy consumers. The tailored interventions enabled by household-specific information may yield considerably lower cost per kWh saved (or shifted) than rebates and meet higher public acceptance than prohibitive regulations. For instance, load shifting, billing processes, customer (call center) services, and claims management of utility companies will benefit from the application of personalized information. To name just a few examples: selecting customers for load-shifting campaigns benefits considerably from knowing when inhabitants are at home; conservation campaigns tailored to kids benefit from some knowledge of the number of minors in a family; and promotions for thermal insulation yield much higher return rates if specifically addressed to individuals who live in houses rather than in apartments.


Social significance: value-added services. Individual consumers may benefit from closer knowledge of their own energy consumption (e.g., through detailed bills with comprehensive consumption analyses and individual benchmarks), tailored saving recommendations, and personal advice regarding potentially appropriate efficiency products, services, and home automation systems, along with other value-added services.

Ecological significance: energy conservation and decrease of CO2 emissions. Efficiency campaigns proved to be much more effective in triggering behavioral change if saving advice is household specific and if measures utilize comparisons (social norms) that reflect the household characteristics (Bohner & Schlüter, 2014). Given a fixed budget, lower cost per kWh saved or shifted leads to larger total effects.

Research Objective and Theoretical Context

Based on strong hypothetical evidence that big data analytics can become an important component of Green IS and create value for organizations, as discussed in the previous sections, the objective of the present research is to demonstrate the practical feasibility of machine learning for improving decision making in utility companies using real-world data. The research employs a design science approach and a field study, which could serve as a blueprint for future research and application.

In particular, we seek to answer the following four questions:

Question 1 (Q1): Can a supervised classification algorithm find more utility customers who would register on the energy efficiency web portal than random customer addressing?

Question 2 (Q2): Do the sign-up predictions that are based on the model from one geographical region hold for other regions?

Question 3 (Q3): Can the model be used to increase the number of signups by addressing a fixed number of customers?

Question 4 (Q4): Can the model be used to determine the optimal number of customers that have to be addressed?
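One possible reading of Q3 and Q4 can be sketched as follows, under assumptions that are not taken from the paper: customers are ranked by predicted signup probability, and a letter is considered worthwhile while its expected value exceeds its cost. The function names, the value per signup, and the cost per letter are all hypothetical.

```python
def optimal_campaign_size(probabilities, value_per_signup, cost_per_letter):
    """Number of top-ranked customers whose expected value exceeds the letter cost."""
    count = 0
    for p in sorted(probabilities, reverse=True):
        if p * value_per_signup <= cost_per_letter:
            break  # every remaining customer has an even lower probability
        count += 1
    return count


def expected_signups(probabilities, n):
    """Expected signups when addressing the n customers with the highest scores (Q3)."""
    return sum(sorted(probabilities, reverse=True)[:n])


# Hypothetical predicted probabilities and economics (not values from the study).
predicted = [0.30, 0.05, 0.20, 0.01, 0.10]
n_best = optimal_campaign_size(predicted, value_per_signup=20.0, cost_per_letter=1.5)
signups_top_n = expected_signups(predicted, n_best)
```

In this toy example the break-even probability is 1.5 / 20 = 0.075, so three of the five customers would be addressed.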

We rely upon Simon’s classic model of decision making as a three-step process comprising intelligence, design, and choice (Simon, 1947), and on the respective guidelines by Sharma et al. (2014) for translating data into decision value. These guidelines imply the following three stages of using business analytics to obtain KPI gains: the data to insight stage, the insight to decision stage, and the decision to value stage. According to these guidelines, we elaborated a general procedure to reap value from the consumption-relevant data of private energy consumers, which is schematically illustrated in Figure 1. This procedure is associated with four main information blocks: (1) Data, including all potentially relevant data sources about residential customers for the utility’s problem at hand; (2) Insights, which mean knowledge of the individual household characteristics needed to launch targeted interventions; they should be inferred from the initial data using machine learning methods; (3) Interventions, which are targeted campaigns toward the households with specific characteristics; (4) Value, which is a result of targeted interventions by the utility company; the information about gained value can subsequently be used to improve future recognition of dwelling characteristics and campaigns. We then followed this procedure for developing and testing in field experiments machine learning algorithms to predict signup rates on the energy efficiency web portal of a large utility company in Germany, using real-world energy consumption and location data of private dwellings. The procedure and results of our experiments are described in the following sections.


Figure 1. Gaining Green IS business value through household consumption data analytics

Experimental Design

We approach this research as a design science research project (Hevner et al., 2004) with the focus on building a supervised classification algorithm as an IT artifact, and use the principles outlined by Peffers et al. (2008), since our research is motivated by the development of a novel artifact to solve an existing organisational problem. Moreover, the presented artifact was tested within a field experiment: a customer engagement campaign of a large utility company in Germany. In order to answer the questions Q1-Q4, we built two treatment groups. Group 1 (control) included a sample of 5’000 randomly selected customers from the set of 109’270 households. Group 2 (experimental) included 5’000 customers from the remaining household base selected by our machine learning algorithm. All 10’000 customers were sent identical physical letters in November 2014 with an invitation to register for an online service of this utility. The registration data was evaluated four weeks later.
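The construction of the two groups can be sketched as follows. This is a minimal illustration and not the authors' implementation: the per-customer scores are random stand-ins for the classifier output, and the names `build_groups`, `scores`, and `seed` are assumptions. Only the group sizes (5’000 each) and the household base (109’270) come from the text.

```python
import random


def build_groups(customer_ids, scores, group_size, seed=0):
    """Return a random control group and a model-selected experimental group."""
    rng = random.Random(seed)
    control = rng.sample(customer_ids, group_size)      # Group 1: random selection
    in_control = set(control)
    remaining = [c for c in customer_ids if c not in in_control]
    # Group 2: highest predicted scores among the remaining household base
    remaining.sort(key=lambda c: scores[c], reverse=True)
    experimental = remaining[:group_size]
    return control, experimental


# Hypothetical stand-in scores; in the study they would come from the classifier.
score_rng = random.Random(1)
customer_ids = list(range(109270))
scores = {c: score_rng.random() for c in customer_ids}
control, experimental = build_groups(customer_ids, scores, group_size=5000)
```

Drawing the control group first and selecting the experimental group only from the remaining households keeps the two groups disjoint, so no customer receives two letters.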

Machine Learning for Converting Data into Insights

Our analysis relies on supervised machine learning (classification, particularly Support Vector Machines) to infer customer engagement potential based on consumption-relevant data (annual energy usage, location information, salutation, etc.). In general, the classification problem refers to the assignment of observations to predefined, unordered, homogeneous classes (Chaochang, 2002; Zopounidis & Doumpos, 2000). Supervised classification implies that the function mapping objects described by the data into categories is constructed from so-called training instances, i.e., data with respective class labels or rules. This is realized in a two-step process: first, a prediction model is built either from known class labels or from a set of rules; then, new data is automatically classified based on this model (Thrun & Pratt, 2012; Blum & Langley, 1997). This approach is useful to gain knowledge about customer characteristics based on the data describing individual consumption behavior. Bijmolt et al. (2010) define supervised machine learning as an important element of customer engagement analytics. An early example of this approach in the energy domain is a classification by Chicco et al. (2004) used to detect examples of inefficient billing practices in non-residential buildings. Tsekouras et al. (2007) and Mutanen et al. (2011) use automatic labeling for the classification of electricity consumers and for the study of the electricity behavior of each customer. Recently, Beckel et al. (2014), Sodenkamp et al. (2015), and Hopf et al. (2014) applied different classification methods to derive predefined energy-efficiency-relevant household characteristics (such as age of house, floor area, number of residents, social class, entertainment devices, etc.) from smart meter data. Knowledge about such characteristics can be used to develop novel tariff schemes, improve network management, or perform load forecasting.

[Figure 1 diagram labels. 1. Data: energy consumption, core customer data, prices/tariffs, weather, socio-demographics, entries on the web portals. 2. Insights (obtained via machine learning): type of heating, eco-friendly, energy saving potential, solar panel potential, social class, family with children. 3. Targeted interventions (toward customers): efficiency products, normative feedback, saving tips, customized energy consulting, green/flexible tariffs, customer engagement solutions. 4. Value: increased revenues, customer loyalty, customer surplus, energy efficiency, income forecasts, new data for learning.]
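The two-step process described above (build a model from labeled training instances, then score new data) can be illustrated with a small linear SVM. This is a hedged sketch, not the authors' implementation: it trains by batch subgradient descent on the regularized hinge loss, uses mirrored toy data with two hypothetical standardized features, and ranks customers by the raw decision value. Probability estimates, which for SVMs are commonly obtained with Platt scaling, are omitted. All names and parameter values are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b by batch subgradient descent on the hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]       # gradient of the L2 penalty
        label_sum = 0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:     # margin violated: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                label_sum += yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b += lr * label_sum / n
    return w, b


# Step 1: build the model from training instances with known labels
# (+1 = signed up, -1 = did not sign up). Toy data, not from the study.
signed_up = [(1.0, 0.5), (1.5, -0.2), (0.8, 0.9)]
not_signed_up = [(-x1, -x2) for x1, x2 in signed_up]
X = signed_up + not_signed_up
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)

# Step 2: score new customers and address those with the highest decision values.
new_customers = {"A": (1.2, 0.3), "B": (-0.9, -0.4), "C": (0.1, 0.05), "D": (-1.4, 0.1)}
scores = {name: dot(w, x) + b for name, x in new_customers.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
addressed = ranked[:2]
```

Ranking by decision value is enough to select a fixed number of addressees; calibrated probabilities would only be needed when the scores themselves enter a cost calculation.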

The upper part of Figure 2 shows the two main components of our IT artifact: feature extraction and classification. The feature extraction component takes a household's consumption-related data as input and computes from it a set of representative values, i.e., independent features that each correlate well with the class, as suggested by Domingos (2012). These are useful for inferring previously specified class labels within the household classification component (Beckel et al., 2014).

Figure 2. Design of the IT artifact. The upper part of the picture shows the main components of supervised machine learning: feature extraction and classification. The bottom part shows the design steps necessary to specify these components.

The definition of features is a crucial step in data modeling. It comprises the extraction of representative low-level features (such as relative change in consumption, statistics, etc.) and the conversion of the data into a usable format (Domingos, 2012). It is an engineering task, and there is no single generally accepted method for reducing the data to a handful of important characteristics (Domingos, 2012). This is where the expertise of an analyst plays a significant role. Furthermore, dimension reduction techniques can be used to reduce the complexity of the problem if the data volume is large (van der Maaten, 2009).
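As an illustration of such low-level features, the sketch below turns a household's annual consumption history into a small feature dictionary. The specific features, their names, and the input values are hypothetical; the text names only relative change in consumption and statistics as examples.

```python
from statistics import mean, pstdev


def extract_features(annual_kwh):
    """Compute simple features from a household's annual consumption history (kWh)."""
    first, last = annual_kwh[0], annual_kwh[-1]
    average = mean(annual_kwh)
    return {
        "mean_kwh": average,                        # typical consumption level
        "std_kwh": pstdev(annual_kwh),              # variability across years
        "relative_change": (last - first) / first,  # change from first to last year
        "last_to_mean_ratio": last / average,       # latest year relative to average
    }


# Hypothetical household with three years of annual readings.
features = extract_features([4000.0, 3800.0, 3600.0])
```

Each feature reduces the raw series to one number, which is the kind of compact, usable format a classifier expects as input.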

The definition of class labels is another stage requiring the input of skilled managers and psychologists, as classes represent the insights to be gained and underlie the measures to be taken toward the households (Beckel et al., 2012). In supervised machine learning, knowledge about the actual class labels is required for a sample of objects (‘training set’). If such data is not available, it should be possible to solicit this information (e.g., through surveys or third parties). The training data must be unbiased and collected from a random sample of customers.

The choice of the classifier depends on the problem and the data at hand; it is the responsibility of the involved analysts. Classifiers typically differ in terms of implementation, computational complexity, and the assumptions they make about the data distribution. Additionally, the internal classifier parameters (‘hyperparameters’) usually have to be adjusted manually or optimized with a selected technique (Hossain et al., 2013; Bergstra et al., 2011).

Furthermore, the ‘test data’ with known class labels is held out for the final stage: evaluation of the generalization power of the classifier. Depending on the required properties of the classifier (e.g., whether all classes should be recognized or only one particular class is of interest), an appropriate metric should be chosen (e.g., precision, accuracy, recall).
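The three metrics named above can be computed directly from the four counts of a two-class contingency table. The following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from our datasets.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from contingency counts."""
    total = tp + fp + fn + tn
    return {
        # Share of predicted positives that are real positives.
        "precision": tp / (tp + fp),
        # Share of real positives that were found.
        "recall": tp / (tp + fn),
        # Share of all samples that were classified correctly.
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=30, fp=70, fn=20, tn=880)
```

With strongly imbalanced classes, as in this hypothetical example, accuracy can be high while precision stays low, which is why the metric should follow the purpose of the classifier.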

[Figure 2 diagram. Input: energy consumption data of a household. Machine learning: feature extraction, household classification. Output (examples): probability to switch to a green tariff; type of heating (electric / not electric). System design: definition of features, definition of class labels, choice of classifier, results evaluation.]


In some cases, the defined features may be meaningless for the algorithm because they do not correlate with the characteristics to be derived; including them can lead to over-fitting. To avoid this, an analyst must apply suitable feature selection with the help of cross-validation. This means dividing the training set into random disjoint subsets and iteratively using each subset for validation while the others are used for training. In this way, all combinations of features are tested and the best-performing ones are selected.
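The procedure can be sketched as follows. This is an illustration under stated assumptions, not our implementation: a simple nearest-centroid rule stands in for the actual classifier, accuracy stands in for the evaluation metric, and all function names and the synthetic data are hypothetical.

```python
import itertools
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split the sample indices into k random, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (an assumption, not the SVM used in the paper)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, feature_idx, k=5):
    """Mean validation accuracy over k folds, using only the given feature columns."""
    Xs = [[row[j] for j in feature_idx] for row in X]
    scores = []
    for fold in k_fold_indices(len(Xs), k):
        held_out = set(fold)
        train = [i for i in range(len(Xs)) if i not in held_out]
        train_X = [Xs[i] for i in train]
        train_y = [y[i] for i in train]
        hits = sum(nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / k


def select_features(X, y, k=5):
    """Evaluate every non-empty feature subset and return the best-scoring one."""
    n_features = len(X[0])
    candidates = [subset
                  for size in range(1, n_features + 1)
                  for subset in itertools.combinations(range(n_features), size)]
    return max(candidates, key=lambda subset: cv_accuracy(X, y, subset, k))


# Synthetic data: feature 0 separates the classes, feature 1 carries no class information.
y = [i % 2 for i in range(40)]
X = [[10.0 * label + 0.1 * (i % 3), float((i // 2) % 2)] for i, label in enumerate(y)]
best_subset = select_features(X, y)
```

An exhaustive search over subsets is only feasible for a small number of features, such as the handful considered here; for larger feature sets a greedy or filter-based selection would be needed.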

Experimental Data

For our field studies, we used the following four real-world datasets provided to us by BEN Energy AG.

Dataset A contains 10'000 entries on electricity customers of a mid-sized Swiss utility company (Company A).

Dataset B contains 10'350 entries on electricity customers of another mid-sized Swiss utility company (Company B) serving households in a different geographical area than Company A.

The variables in both datasets A and B provide the salutation, anonymized address information (street, postal code, and city), as well as annual energy consumption in kWh for three consecutive years. For each year, the number of days over which the consumption occurred is also available. We used datasets A and B as a basis for algorithm training and performance tests, as information regarding registration for the targeted service is available for each customer in these sets.

The customers in data samples A and B were randomly selected from the entire customer bases of the utility companies, taking into account the following constraints:

• Annual energy consumption is less than 40'000 kWh. This was done to ensure that the chosen customers are households and not small enterprises. The threshold value was chosen empirically, based on the distribution of consumption values of known households.

• Households are on a street with at least 9 other customers of the same utility. This was done to enable normative feedback (i.e., neighborhood comparison) during the engagement campaign.

Dataset C1 contains 109’270 entries on electricity customers of a large German utility company (Company C). Similarly to the datasets A and B, variables in C1 provide salutation and anonymized address information (street, postal code, and city), as well as annual energy consumption. In contrast to datasets A and B, the energy consumption was available for only one year. This dataset was used to predict the probability to register for the targeted service.

Dataset C2 contains 173’626 entries on other electricity customers of Company C from the same region as customers in C1. The variables describe the anonymized address information (street, postal code, and city) and annual energy consumption for one year. The salutation is not given here.

The households in dataset C1 were selected randomly from the entire customer base of utility company C. The households in C2 were selected in such a way that for each street in dataset C1 there are at least 10 households from C1 and C2 with available consumption information.

The data entries represent typical residential customers of the utility companies. The annual consumption distribution of the considered datasets is represented in Figure 3 (on the logarithmic scale).


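Figure 3. Histogram of the annual electricity consumption of households (x-axis: annual electricity consumption in kWh, logarithmic scale; y-axis: density; series: Dataset A, Dataset B, and Datasets C1 and C2)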

Analysis

In this section, we address questions Q1-Q4 by operationalizing our IT artifact and focusing on the household consumption readings and customer core data described in the previous section.

Question 1 (Q1): Can a supervised classification algorithm find more utility customers who would register on the energy efficiency web portal than by using random customer addressing?

To answer this question, we applied the classification procedure to predict household memberships to either class "Signed up" or "No response". The training data is provided by the dataset A. We split this dataset into a training set (90%) and a test set (10%). These proportions are common for the supervised learning tasks (Zaki & Wagner, 2014).

To evaluate the classification results, it is necessary to compute a function of four quantities: the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Given a target class ‘Signed up’ and another class ‘No response’, TP indicates the number of samples of ‘Signed up’ that are correctly classified as ‘Signed up’. TN denotes the number of samples of ‘No response’ that are correctly classified as ‘No response’. FP counts the number of samples of ‘No response’ that are incorrectly classified as ‘Signed up’ and, finally, FN indicates the number of samples of ‘Signed up’ that are incorrectly classified as ‘No response’. These four values form the so-called contingency table, presented in Table 1.

| Predicted values | Real values: Signed up | Real values: No response |
|---|---|---|
| Signed up | True positive (TP) | False positive (FP) |
| No response | False negative (FN) | True negative (TN) |

Table 1. Generic contingency table

Since targeting implies an attempt to identify which consumers should be made an offer based on their prior behavior (Ha et al., 2002), and we aim at finding the customers to whom an invitation to register on the web portal should be sent, only the customers identified as ‘Signed up’ are interesting for the campaign. Using the algorithm described below, utility companies would only address the customers categorized as ‘Signed up’ to maximize the mailing response rate, or in other words, to minimize the engagement cost per customer. The achieved signup rate is the same as the classification precision (Pr), which can be calculated by:

Pr = TP/(TP + FP).  

The precision can take values between 0 and 1, where 0 means that none of the addressed customers signed up for the service, and 1 means that all addressed customers signed up. To optimize the number of signups, the precision for the class ‘Signed up’ must be maximized.

As a classifier, we use SVMs with a standard Gaussian kernel. An SVM creates a non-linear mapping from the feature space to a higher-dimensional space. The decision boundary is created by separating the classes with a linear hyper-surface and by maximizing the distance to that hyper-surface (Cortes & Vapnik, 1995; Zaki & Wagner, 2014). Further, we use variable class weights to assign different priorities to the misclassification of different classes in the optimization problem (Joachims, 1998). Such weights are needed because the number of signups is much smaller than the total number of addressed customers. The class weights are set according to the class distribution in dataset A: 20 for the positive class ("Signed up") and 1 for the negative class ("No response"). To determine the individual signup probabilities, we use the probabilistic version of SVMs by Platt (1999), which transforms the classifier scores into probabilities with the help of logistic regression (Niculescu-Mizil & Caruana, 2005).
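The score-to-probability step can be illustrated with a simplified sketch. It fits a sigmoid of the form p = 1 / (1 + exp(a·s + b)) to classifier scores by plain gradient descent on the logistic loss; it omits the regularized targets and the optimization routine of Platt (1999), and the scores, labels, learning rate, and function names are hypothetical.

```python
import math


def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit p = 1 / (1 + exp(a * s + b)) to (score, label) pairs by gradient descent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # Gradient of the logistic loss with respect to a and b.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(score, a, b):
    """Map a raw classifier score to a probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Hypothetical decision scores and true labels (1 = 'Signed up', 0 = 'No response').
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_sigmoid(scores, labels)
```

Because higher scores go along with the positive class in this example, the fitted slope a is negative and the resulting probabilities increase monotonically with the score, so the ranking of customers is preserved.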

In the next step, we defined and calculated five features to be used by the prediction algorithm. For this, we drew on our own expertise and on interviews with utility representatives. The goal was to construct features that are able to distinguish between the classes. As only one to three consumption data points are available for each household, a large number of features could lead to over-fitting.

(i) The basic feature is the overall consumption (OC), which can be calculated from the annual consumption (AC) and the number of consumption days (NCD):

OC = AC ∗ 365 / NCD.

Since the consumption is highly skewed, we apply a log transformation to the consumption values to obtain a symmetric distribution, as shown in Figure 3. Further, we use the standard SVMs with a symmetric Gaussian kernel. The log transformation helps to reduce the absolute errors (Schölkopf & Smola, 2002).
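As a minimal sketch, the normalization to a full year and the subsequent log transformation can be written as follows. The function names are hypothetical, and the base-10 logarithm is an assumption, since the base of the logarithm is not fixed by the text.

```python
import math


def overall_consumption(annual_consumption_kwh, consumption_days):
    """Scale the metered consumption (AC) over NCD days to a 365-day year (OC)."""
    return annual_consumption_kwh * 365 / consumption_days


def log_consumption(annual_consumption_kwh, consumption_days):
    """Log-transformed overall consumption (base 10 is an assumption)."""
    return math.log10(overall_consumption(annual_consumption_kwh, consumption_days))


# Hypothetical household: 3'000 kWh metered over 300 days.
oc_example = overall_consumption(3000, 300)
```

The normalization makes households with billing periods of different lengths comparable before any further feature is derived from the consumption value.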

The motivation behind the choice of this feature was the assumption that customers with high consumption might be interested in registering on the portal to better track their own progress.

(ii) As the second feature, we consider the consumption trend expressed by the relative change in the log consumption.

Similarly to the previous point, we presumed that households with increasing consumption might be interested in registering on the portal to get interactive assistance toward consumption reduction.

(iii) Next, we consider different approaches to neighbourhood comparison as the features.

By constructing this feature, we expected to detect households that consume more than their neighbors and might want to cut down their energy costs and reduce the environmental burden.

• The first possibility is to consider the neighbourhood at the postal code level. This creates large areas that are homogeneous in size.

• The second possibility is to take the street as a neighbourhood. This creates smaller areas that, due to the big differences in street lengths, are inhomogeneous in size. Therefore, we use this definition.

For each neighborhood, the mean and the standard deviation of the log consumption are calculated. The neighborhood comparison is then made based on the Z-scores. Since the household composition and the electricity consumption are more homogeneous within a single area (e.g., similar building type, house age, and social class), this feature allows us to measure the difference in consumption between similar households. The relative contribution of this feature to the classification precision is about 4%.

The relative contribution of features (i)-(iii) to the classification precision is 26%.
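The neighborhood comparison of feature (iii) can be sketched as follows, with the street as the neighborhood. The function name, the record layout, and the use of the population standard deviation are assumptions made for illustration.

```python
import math
from collections import defaultdict


def neighborhood_z_scores(households):
    """households: list of (household_id, street, log_consumption) tuples."""
    by_street = defaultdict(list)
    for _, street, value in households:
        by_street[street].append(value)

    # Mean and (population) standard deviation of the log consumption per street.
    stats = {}
    for street, values in by_street.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stats[street] = (mean, math.sqrt(variance))

    z_scores = {}
    for household_id, street, value in households:
        mean, std = stats[street]
        # A street without any variation gives no basis for comparison.
        z_scores[household_id] = (value - mean) / std if std > 0 else 0.0
    return z_scores


# Hypothetical households on two streets.
z = neighborhood_z_scores([
    ("a", "Main Street", 3.0),
    ("b", "Main Street", 3.5),
    ("c", "Main Street", 4.0),
    ("d", "Side Street", 3.2),
])
```

A positive Z-score marks a household that consumes more than its neighbors on the same street, a negative one a household that consumes less.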


(iv) Datasets A, B, and C1 contain the salutation, which we map to one of three forms (‘Madam’, ‘Sir’, or ‘Madam and Sir’) and use as a feature. The idea behind this feature was to distinguish between the behaviour of men and women, as well as between singles and families. However, the salutations do not necessarily reflect the real household members, which impairs the usefulness of this feature.

(v) As the last feature, we estimate the city size by calculating the number of different customers from each city in the datasets. Our motive for building this feature was to differentiate between the behaviour of inhabitants of urban and suburban areas.

By investigating the IT artifact in detail, we can interpret how the selected features correlate with signup probabilities:

• The households with average consumption are more likely to sign up than the households with low or high consumption.

• The households where the consumption increased or stayed nearly the same are more likely to sign up than the households that decreased their consumption.

• The households that consume more than other households in their neighbourhood are more likely to sign up.

The model was evaluated on the test set. The results are presented in the form of a contingency matrix in Table 2. A utility company would use the predicted classes to address only the customers who are classified as ‘Signed up’. With this method, a signup rate of 6.7% (32/479) could be achieved. Without the targeting approach, the company would have to address either all or a random subset of customers, thereby achieving a signup rate of 4.6% (46/1000). This corresponds to an improvement of 46% for the targeted selection compared to the random selection, which means that the utility company can run a more effective campaign by using the prediction results. The actual signup rate increases because both classes, ‘Signed up’ and ‘No response’, can be identified better than by random chance. Comparing the baseline random method (4.6%) with the proposed classification (6.7%), the denominator decreases to 48% of the baseline (from 1000 to 479) and the numerator to 70% of the baseline (from 46 to 32). For a random selection, the decrease of denominator and numerator would be equal. Since the decrease of the numerator is smaller in our case, the classification precision rises to 146% of that of the random selection (70%/48%), i.e., it improves by 46%. Therefore, we can answer Q1 positively: A utility company can find more customers who would register on the energy efficiency web portal using supervised classification than by using random customer addressing.

| Predicted classes | Real classes: Signed up | Real classes: No response | Total |
|---|---|---|---|
| Signed up | 32 | 447 | 479 |
| No response | 14 | 507 | 521 |
| Total | 46 | 954 | 1000 |

Table 2. Results of the classification of the test set from dataset A

Question 2 (Q2): Do the signup predictions that are based on the model from one geographical region hold for other regions?

The datasets A and B stem from different utility companies in different regions, and can be used to answer Q2. For this, we take the model constructed from dataset A in the previous step and apply it to dataset B. The results are presented in Table 3. Here, the random selection would achieve a signup rate of 5.0%, while addressing only the identified potential customers leads to a signup rate of 6.6%. This means an improvement of 32% for a targeted campaign compared to random selection. We can therefore conclude that the model generalizes to data from another region, and we answer Q2 positively.


| Predicted classes | Real classes: Signed up | Real classes: No response | Total |
|---|---|---|---|
| Signed up | 294 | 4157 | 4451 |
| No response | 220 | 5679 | 5899 |
| Total | 514 | 9836 | 10350 |

Table 3. Results of application of the model to dataset B
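The comparison between targeted and random selection can be reproduced from the contingency counts of Tables 2 and 3. The sketch below is only a check of the arithmetic; the function and variable names are hypothetical.

```python
def precision_and_lift(tp, fp, fn, tn):
    """Return targeted signup rate, random-selection signup rate, and their ratio."""
    precision = tp / (tp + fp)                    # signup rate among addressed customers
    baseline = (tp + fn) / (tp + fp + fn + tn)    # signup rate under random selection
    return precision, baseline, precision / baseline


# Counts from Table 2 (test set of dataset A) and Table 3 (dataset B).
precision_a, baseline_a, lift_a = precision_and_lift(tp=32, fp=447, fn=14, tn=507)
precision_b, baseline_b, lift_b = precision_and_lift(tp=294, fp=4157, fn=220, tn=5679)
```

Computed from unrounded rates, the ratios are about 1.45 for dataset A and about 1.33 for dataset B; the improvements of 46% and 32% reported in the text follow from the rounded signup rates.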

Question 3 (Q3): Can the model be used to increase the number of signups by addressing a fixed number of customers?

To answer Q3, we use the operationalized IT artifact and calculate the signup probability for each household with the help of the probabilistic version of the SVM (Platt, 1999). The customers can then be ranked based on the estimated probabilities. Using the model, we would only address the most likely customers (Saharon et al., 2005).

Addressing more customers leads to a larger number of signups, but this dependency is linear only if random selection is used. Choosing a small number of top-ranked customers results in a higher signup rate, but still in a small absolute number of signups. On the other hand, choosing many customers does not necessarily lead to improvements over random selection, because nearly all (even low-ranked) customers are addressed. The dependency between the absolute number of signups and the number of addressed customers is shown in Figure 4.

If a company wants to address a fixed subset of customers, Figure 4 can be used to find the predicted number of signups with the model and with random selection. For instance, if 40% of the customers are addressed, then 1.8% of all customers sign up under random selection and 2.5% under targeted selection. In this case, the targeted campaign leads to an improvement of 39%. As a result, Q3 can be answered positively.
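Selecting a fixed share of customers from the ranking can be sketched as follows. The function name, the customer identifiers, and the probabilities are hypothetical.

```python
def select_top_fraction(probabilities, fraction):
    """Return the ids of the top-ranked share of customers.

    probabilities: dict mapping customer id to predicted signup probability.
    fraction: share of the customer base to address, between 0 and 1.
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    n_selected = round(len(ranked) * fraction)
    return ranked[:n_selected]


# Hypothetical predicted signup probabilities.
predicted = {"h1": 0.02, "h2": 0.30, "h3": 0.10, "h4": 0.05, "h5": 0.08}
addressed = select_top_fraction(predicted, fraction=0.4)
```

Random selection would instead draw the same number of customers without regard to the ranking, which is the baseline the targeted selection is compared against.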

Furthermore, we can validate the answer to Q3 given above by applying our IT artifact to the datasets C1 and C2. The feature ‘consumption trend’ was not used at this stage, because only a single annual consumption value is available for these households. We therefore repeated the performance tests with datasets A and B, leaving this feature out. The results are presented in Table 4.

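| | 3-year consumption: Dataset A | 3-year consumption: Dataset B | 1-year consumption: Dataset A | 1-year consumption: Dataset B | Random selection: Dataset A | Random selection: Dataset B |
|---|---|---|---|---|---|---|
| Signup rate | 6.7% | 6.6% | 5.6% | 5.8% | 4.6% | 5.0% |
| TP | 32 | 294 | 25 | 231 | 46 | 514 |
| FP | 447 | 4157 | 421 | 3752 | 954 | 9836 |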

Table 4. Signup rates for the model with 1 and 3 years of consumption information


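Figure 4. Dependency between the number of addressed customers and the number of signups (x-axis: portion of customers addressed [%]; y-axis: portion of customers signed up [%]; series: model selection and random selection).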

The results of the campaign are presented in Table 5. The service registration rate in the control group was only 2.1%, whereas the signup rate in the experimental group reached 4.4%. The signup rate of the experimental group is thus 204% of that of the control group, which corresponds to a relative improvement of 104%. The signup rate is the proportion of signed-up customers in each group. To compare the proportions between the two groups, we applied Fisher's exact test (Fisher, 1922). The difference is significant at p<0.01. This shows that applying the supervised learning model to select the best customers improves the signup rate (positive answer to Q3).

It is important to note, however, that the signup rate of the control group is smaller than the signup rate in the datasets A and B. The predicted signup rates are also smaller than the resulting signup rates. This can be explained by the underlying assumption that the signup rates for random selection are identical among different groups. The most likely reason for these discrepancies is that the campaigns were conducted in different regions (Germany vs. Switzerland).

| | Signed up | No response | Total | Signup rate |
|---|---|---|---|---|
| Control group | 107 | 4893 | 5000 | 2.14% |
| Experimental group | 218 | 4782 | 5000 | 4.36% |

Table 5. Results of the 3rd mailing experiment
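The significance statement can be checked against the counts in Table 5 with a two-sided Fisher's exact test. The following sketch implements the test with the standard library only; the function names are hypothetical, and the small tolerance in the comparison of log probabilities is an implementation choice.

```python
import math


def log_comb(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    observed = log_p(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely as the observed one.
    p_value = sum(math.exp(log_p(x)) for x in range(lo, hi + 1)
                  if log_p(x) <= observed + 1e-7)
    return min(p_value, 1.0)


# Control group: 107 signed up, 4893 no response; experimental group: 218 and 4782.
p_value = fisher_exact_two_sided(107, 4893, 218, 4782)
```

For these counts the resulting p-value lies far below the 0.01 threshold stated in the text.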

Question 4 (Q4): Can the model be used to determine the optimal number of customers that have to be addressed?

To answer Q4, we need to evaluate the benefits and costs of addressing individual households. The engagement costs per household have to be balanced against the respective consumer profit margins.


Given the benefit ($B$) per signup and the cost ($C$) of addressing one household, the total benefit ($T$) of a campaign can be calculated from the number of signups ($N_S$) and the total number of addressed customers ($N_A$):

$T = B \cdot N_S - C \cdot N_A$

Moreover, it is possible to calculate the expected value ($E$) of each individual customer using the predicted signup probability ($p$):

$E = p \cdot B - C$

To maximize the total benefit, the campaign should be launched for the customers with a positive expected value. We can conclude from

$p \cdot B - C > 0 \Leftrightarrow p \cdot B / C - 1 > 0 \Leftrightarrow p > C/B$

that a customer has a positive expected value if the predicted signup probability is higher than the break-even point $C/B$. Hence, the model makes it possible to find the optimal subset of customers. This leads to a positive answer to question Q4.
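The break-even rule can be written down in a few lines. In this sketch the function names are hypothetical; the values of 17 EUR signup benefit and 1 EUR mailing cost are those used for Figure 5, and the two probabilities are purely illustrative.

```python
def expected_value(p, benefit, cost):
    """Expected value E = p * B - C of addressing one customer."""
    return p * benefit - cost


def break_even_probability(benefit, cost):
    """Signup probability C / B above which addressing a customer pays off."""
    return cost / benefit


def worth_addressing(p, benefit, cost):
    """True if the customer has a positive expected value."""
    return expected_value(p, benefit, cost) > 0


# Signup benefit of 17 EUR and mailing cost of 1 EUR, as in Figure 5.
threshold = break_even_probability(benefit=17.0, cost=1.0)
```

With these values the threshold is 1/17, so only customers with a predicted signup probability above roughly 5.9% would be addressed.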

For a more detailed cost-benefit analysis, we can calculate the expected value of addressing a specific portion of customers. Figure 5 shows the expected total value, i.e., the sum of the expected values that can be achieved by addressing the top-ranked customers. The numbers used in this figure were calculated based on the concluded campaign, but are presented as relative numbers to keep the results generally applicable. As described above, the total benefit reaches its maximum at the point where the expected value of the next customer is zero, and decreases afterwards. Addressing too many customers can even lead to losses if the overall signup rate is low enough. In this case, the expected benefit from addressing a single random customer is negative, and the campaign would be unprofitable without targeted customer selection. This is the main motivation for conducting targeted marketing campaigns (Kim & Street, 2004).
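The shape of such a curve can be illustrated with a short sketch that accumulates the expected values of the ranked customers. The probabilities are hypothetical and the function name is an assumption; benefit and cost are the 17 EUR and 1 EUR used for Figure 5.

```python
from itertools import accumulate


def expected_profit_curve(probabilities, benefit, cost):
    """Cumulative expected total value when addressing the top 1, 2, ... customers."""
    ranked = sorted(probabilities, reverse=True)
    return list(accumulate(p * benefit - cost for p in ranked))


# Hypothetical predicted signup probabilities for five customers.
curve = expected_profit_curve([0.02, 0.30, 0.10, 0.05, 0.08], benefit=17.0, cost=1.0)

# Number of top-ranked customers that maximizes the expected total value.
best_n = max(range(len(curve)), key=curve.__getitem__) + 1
```

In this example the maximum is reached after three customers, exactly those whose predicted probability exceeds 1/17; adding the remaining two lowers the expected total value again.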

Further, we can assess the value of the campaign by using the model to rank the customers but the actual signups to calculate the total benefit T. These values are also presented in Figure 5. The experimental results for the total benefit nearly coincide with the expected total benefits.

Discussion and Conclusions

In this paper, we have demonstrated how data analytics and supervised learning can become a truly valuable and highly scalable mechanism of decision support in energy utility companies. As an integral part of energy efficiency endeavors and green information systems in organizations, big data analytics makes it possible to gain insights about millions of individual customers within a very short time. From the company’s perspective, these insights help to increase customer retention, strengthen customer engagement, and establish new sales channels. Ultimately, this yields a considerably lower cost per kWh saved (or shifted) than rebates and meets with higher public acceptance than prohibitive regulations. Load shifting, marketing, billing processes, customer (call center) services, and claims management can benefit from the utilization of household-specific information.

The benefits for customers include, but are not limited to, closer knowledge of their own consumption, tailored normative feedback, offerings of appropriate efficiency products and services, effective home automation systems, and other value-added services. Nevertheless, privacy protection mechanisms are crucial to carefully balance the positive outcomes (better, more interesting efficiency services) and the ‘dual’ effects (potential threats to privacy) (Pool, 1983; Markus, 2014; Newell & Marabelli, 2015; Markus, 2015).

The presented approach demonstrates how to use yearly consumption data of private households that is collected by conventional meters and is already available to almost all utility companies. The adoption of smart electricity meters will enable the collection of more detailed consumption data, which might help to gain more valuable insights. Privacy, however, has so far been a significant barrier to smart metering adoption.


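Figure 5. Predicted profits for addressing different numbers of customers, for a mailing cost of 1 EUR and a signup benefit of 17 EUR (x-axis: portion of customers addressed [%]; y-axis: total profit per customer, benefits minus costs [EUR]; series: total benefit and expected total benefit; annotations mark the optimal range and the region where addressing too many customers leads to losses)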

The results of the field studies helped us to empirically answer the question about the feasibility of data analytics and IS value creation in the private housing sector. Based on the real-world consumption data of more than 300’000 households in Germany and Switzerland, we have developed and tested algorithms for selecting customers who are likely to register for an energy efficiency service. The developed methods helped to double the signup rate, or in other words, to halve the campaign cost. The developed model was based on data that is typically available to utility companies. Combined with the demonstrated transferability of the model between different geographical regions, it offers tremendous potential for immediate large-scale application.

Beyond the immediate results of our experimental studies, we believe that this paper opens a window to new research opportunities surrounding the application of Data Analytics and Green IS. The presented approach may be taken as a blueprint for solving other similar problems, e.g., identification of customers for targeted tariff upselling and heating system upgrade advertisement, churn rate prediction, etc. Thereby, the acquisition of relevant data sources, the identification of classification variables for dimensionality reduction, and the construction of prediction models are cornerstones of future development. Jointly with powerful interventions, such data-driven information systems make it possible to better understand customer habits and yield a good cost-benefit ratio.


| Notation | Definition |
|---|---|
| AC | Annual consumption |
| B | Benefit |
| C | Cost |
| E | Expected value |
| FN | False negative |
| FP | False positive |
| N_A | Number of addressed customers |
| N_S | Number of signups |
| NCD | Number of consumption days |
| OC | Overall consumption |
| p | Predicted probability |
| Pr | Precision |
| SVM | Support vector machine |
| T | Total benefit |
| TN | True negative |
| TP | True positive |

Table 6. Table of notations

References Allcott, H., and Mullainathan, S. 2010. "Behavioral science and energy policy". Science (5970:327), pp.

1204-1205. Ayres, I., Raseman, S., and Shih, A. 2013. "Evidence from two large field experiments that peer

comparison feedback can reduce residential energy usage". Journal of Law, Economics, and Organization (29:5), pp. 992-1022.

Baker, J., Song, J., and Jones, D. 2008. "Refining the IT business value model: evidence from a longitudinal investigation of healthcare firms". In Proceedings of the International Conference on Information Systems. Paris, France, Association for Information Systems, 14-17 December.

Beckel, C., Sadamori, L., Staake, T., and Santini, S. 2014. "Revealing household characteristics from smart meter data". Energy (78), pp. 397-410.

Beckel, C., Sadamori, L., and Santini, S. 2012. "Towards automatic classification of private households using electricity consumption data." Proceedings of the Fourth ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings. ACM.

Bergstra, J.S., Bardenet, R., Bengio, Y., and Kégl, B. 2011. "Algorithms for hyper-parameter optimization". In Advances in Neural Information Processing Systems. pp. 2546-2554.

Bijmolt, T.H., Leeflang, P. S., Block, F., Eisenbeiss, M., Hardie, B. G., Lemmens, A., and Saffert, P. 2010. Analytics for customer engagement. Journal of Service Research, (13:3), pp. 341-356.

Blum, A. L., and Langley, P. 1997. "Selection of relevant features and examples in machine learning". Artificial intelligence (97:1), pp. 245-271.

Bohner, G., and Schlüter, L. E. 2014. "A room with a viewpoint revisited: Descriptive norms and hotel guests' towel reuse behavior". PloS one, (9:8).

Page 17: Gaining IS Business Value through Big Data Analytics: A ... · Gaining IS Business Value through Big Data Analytics Thirty Sixth International Conference on Information Systems, Fort

Gaining IS Business Value through Big Data Analytics

Thirty Sixth International Conference on Information Systems, Fort Worth 2015 17

vom Brocke, J., Watson, R.T., Dwyer, C., Elliot, S., and Melville, N. 2013. "Green information systems: Directives for the IS". Communications of the Association for Information Systems. pp. 33:30.

Campolargo, M. 2015. "EU must look to 'internet of things, big data and the cloud'". The Parliament Magazine: Politics, Policy and People. https://www.theparliamentmagazine.eu/articles/opinion/eu-must-look-internet-things-big-data-and-cloud (Accessed on 03.05.2015).

Chaochang, C. 2002. "A case-based customer classification approach for direct marketing." Expert Systems with Applications (22:2), pp. 163-168.

Chen, H., Chiang, R.H.L., and Storey, V.C. 2012. "Business Intelligence and Analytics: From Big Data to Big Impact." MIS Quarterly (36:4), pp. 1165-1188.

Chicco, G., Napoli, R., Piglione, F., Postolache, P., Scutariu, M., and Toader, C. 2004. "Load pattern-based classification of electricity customers." Power Systems, IEEE Transactions on (19:2), pp. 1232-1239.

Constantiou, I. D., and Kallinikos, J. 2014. "New games, new rules: big data and the changing context of strategy". Journal of Information Technology (30), pp. 44-57.

Cortes, C., and Vapnik, V. 1995. "Support-vector networks." Machine Learning (20:3), pp. 273-297.

Davenport, T. H. 2014. "What Businesses Can Learn From Sports Analytics". MIT Sloan Management Review, Summer 2014.

Davenport, T. H. 2006. "Competing on analytics". Harvard Business Review (January), pp. 99–107.

Davern, M. J., and Wilkin, C. L. 2010. "Towards an integrated view of IT value measurement". International Journal of Accounting Information Systems (11:1), pp. 42–60.

Domingos, P. 2012. "A few useful things to know about machine learning". Communications of the ACM (55:10), pp. 78-87.

Dyllick, T., and Hockerts, K. 2002. "Beyond the business case for corporate sustainability." Business Strategy and the Environment (11:2), pp. 130-141.

EEA. 2001. Indicator Fact Sheet Signals 2001 – Chapter Households, European Environment Agency, Copenhagen.

EIA. 2009. Annual Energy Review 2009, U.S. Energy Information Administration, Washington, DC.

Fischer, C. 2008. "Feedback on household electricity consumption: a tool for saving energy?" Energy Efficiency (1:1), pp. 79-104.

Fisher, R. A. 1922. "On the interpretation of χ² from contingency tables, and the calculation of P". Journal of the Royal Statistical Society (85:1), pp. 87–94.

Gillon, K., Brynjolfsson, E., Griffin, J., Gupta, M., and Mithas, S. 2012. "Panel–business analytics: radical shift or incremental change?", Proceedings of the 32nd International Conference on Information Systems (16–19 December), Orlando, Florida: Association for Information Systems.

Goes, P. B. 2014. "Big data and IS research". MIS Quarterly (38:3), pp. iii-viii.

Ha, S.H., Bae, S.M., and Park, S.C. 2002. "Customer's time-variant purchase behavior and corresponding marketing strategies: an online retailer's case". Computers & Industrial Engineering (43:4), pp. 801-820.

Hevner, A. R., March, S. T., Park, J., and Ram, S. 2004. "Design science in information systems research," MIS Quarterly (28:1), pp. 75-105.

Hopf, K., Sodenkamp, M., Kozlovskiy, I., and Staake, T. 2014. "Feature extraction and filtering for household classification based on smart electricity meter data". Computer Science-Research and Development, pp. 1-8.

Hossain, M. R., Oo, A. M. T., and Ali, A. B. M. S. 2013. "The Combined Effect of Applying Feature Selection and Parameter Optimization on Machine Learning Techniques for Solar Power Prediction". American Journal of Energy Research (1:1), pp. 7-16.

van Houwelingen, J. H., and van Raaij, W. F. 1989. "The effect of goal-setting and daily electronic feedback on in-home energy use". Journal of Consumer Research (16:1), pp. 98-105.

IBM. 2011. The 2011 IBM Tech Trends Report: The Clouds are Rolling In...Is Your Business Ready? November 15. https://www.ibm.com/developerworks/community/blogs/ff67b471-79df-4bef-9593-4802def4013d/entry/2011_ibm_tech_trends_report_the_clouds_are_rolling_in_is_your_business_ready5?lang=en (Accessed April 30, 2015).

Jayaraman, V., Paulraj, A., Li, S., and Shang, K. C. 2014. "Environmental Competencies and Competitive Advantage: Is Green IS the missing link?" Academy of Management Proceedings (2014:1), pp. 12458. Academy of Management.

Jayaweera, T., and Haeri, H. 2013. "The Uniform Methods Project: Methods for Determining Energy Efficiency Savings for Specific Measures." U.S. Department of Energy, pp. 275-3000. http://energy.gov/sites/prod/files/2013/05/f0/53827-12.pdf.

Jenkin, T. A., Webster, J., and McShane, L. 2011. "An agenda for ‘Green’ information technology and systems research." Information and Organization (21:1), pp. 17-40.

Joachims, T. 1998. "Text categorization with support vector machines: Learning with many relevant features". Springer Berlin Heidelberg, 1998.

Kallinikos, J., and Constantiou, I. D. 2015. "Big data revisited: a rejoinder". Journal of Information Technology, (30:1), pp. 70-74.

Kim, Y., and Street, W.N. 2004. "An intelligent system for customer targeting: a data mining approach". Decision Support Systems, (37:2), pp. 215-228.

La Valle, S., Lesser, E., Shockley, R., Hopkins, M.S., and Kruschwitz, N. 2011. "Big data, analytics and the path from insights to value". MIT Sloan Management Review (52:2), pp. 21–32.

Loock, C. M., Staake, T., and Thiesse, F. 2013. "Motivating energy-efficient behavior with green IS: an investigation of goal setting and the role of defaults". MIS Quarterly (37:4), pp. 1313-1332.

Loos, P., Nebel, W., Gómez, J.M., Hasan, H., Watson, R.T., vom Brocke, J., Seidel, S., and Recker, J. 2011. "Green IT: a matter of business and information systems engineering?" Business & Information Systems Engineering (4:3), pp. 245-252.

Lycett, M. 2013. "‘Datafication’: making sense of (Big) data in a complex world". European Journal of Information Systems (22:4), pp. 381–386.

van der Maaten, L.J.P., Postma, E.O., and van den Herik, H.J. 2009. "Dimensionality reduction: A comparative review." Journal of Machine Learning Research (41:10.1), pp. 66-71.

Markus, M. L. 2014. "Information Technology and Organizational Structure", in H. Topi and A. Tucker (eds.) Information Systems and Information Technology, Computing Handbook, Volume II. Chapman and Hall, CRC Press, pp. 67, 61–22.

Markus, M. L. 2015. "New games, new rules, new scoreboards: the potential consequences of big data". Journal of Information Technology. Forthcoming.

Malhotra, A., Melville, N. P., and Watson, R. T. 2013. "Spurring impactful research on information systems for environmental sustainability". MIS Quarterly (37:4), pp. 1265-1274.

Melville, N. P. 2010. "Information systems innovation for environmental sustainability," MIS Quarterly (34:1), pp. 1-21.

Mithas, S., Lee, M. R., Earley, S., Murugesan, S., and Djavanshir, R. 2013. "Leveraging big data and business analytics", IEEE IT Professional (15:6), pp. 18–20.

Mithas, S., Ramasubbu, N., and Sambamurthy, V. 2011. "How information management capability influences firm performance". MIS Quarterly (35:1), pp. 237–256.

Mithas, S., Tafti, A. R., Bardhan, I. R., and Goh, J. M. 2012. "Information technology and firm profitability: mechanisms and empirical evidence". MIS Quarterly (36:1), pp. 205–224.

Mutanen, A., Ruska, M., Repo, S., and Järventausta, P. 2011. "Customer classification and load profiling method for distribution systems." Power Delivery, IEEE Transactions on (26:3), pp. 1755-1763.

Newell, S., and Marabelli, M. 2015. "Strategic opportunities (and challenges) of algorithmic decision-making: A call for action on the long-term societal effects of ‘datification’". The Journal of Strategic Information Systems, (24:1), pp. 3-14.

Niculescu-Mizil, A., and Caruana, R. 2005. "Predicting good probabilities with supervised learning." Proceedings of the 22nd international conference on Machine learning.

Peffers, K., Tuunanen, T., Rothenberger, M.A., and Chatterjee, S. 2007. "A design science research methodology for information systems research". Journal of Management Information Systems (24:3), pp. 45-77.

Platt, J. 1999. "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods." Advances in large margin classifiers (10:3), pp. 61-74.

Pool, I.d.S. 1983. "Forecasting the Telephone: A retrospective technology assessment of the telephone". Norwood, NJ: Ablex.

Porter, M. E., and Kramer, M. R. 2006. “Strategy and Society: The Link Between Competitive Advantage and Corporate Responsibility”. Harvard Business Review (84:12), pp. 78-92.

Saharon, R., Perlich, C., and Zadrozny, B. 2005. "Ranking-based evaluation of regression models." Data Mining, Fifth IEEE International Conference on. IEEE.

Saldanha, T., Mithas, S., and Krishnan, M.S. 2013. "The role of business analytics in customer-involvement and innovation". Proceedings of the 23rd Workshop on Information Technologies and Systems 2013 (WITS 2013) (Purao S and Sharman R Eds.), Milan, Italy.

Schölkopf, B., and Smola, A. J. 2002. "Learning with kernels: Support vector machines, regularization, optimization, and beyond". MIT press.

Schryen, G. 2013. "Revisiting IS business value research: what we already know, what we still need to know, and how we can get there". European Journal of Information Systems (22:2), pp. 139–169.

Seidel, S., Recker, J., and vom Brocke, J. 2013. "Sensemaking and sustainable practicing: functional affordances of information systems in green transformations," MIS Quarterly (37:4), pp. 1275-1299.

Shanks, G., Sharma, R., Seddon, P., and Reynolds, P. 2010. "The impact of strategy and maturity on business analytics and firm performance: a review and research agenda". Australasian Conference on Information Systems, Association for Information Systems, Brisbane, Australia.

Sharma, R., Mithas, S., and Kankanhalli, A. 2014. "Transforming decision-making processes: a research agenda for understanding the impact of business analytics on organisations". European Journal of Information Systems, (23:4), pp. 433-441.

Simon, H.A. 1947. "Administrative Behavior: A Study of Decision-Making Processes in Administrative Organization". Palgrave Macmillan, New York.

Sodenkamp, M., Hopf, K., and Staake, T. 2014. "Using Supervised Machine Learning to Explore Energy Consumption Data in Private Sector Housing". Handbook of Research on Organizational Transformations through Big Data Analytics, pp. 320.

Thambusamy, R., and Salam, A. F. 2010. “Corporate Ecological Responsiveness, Environmental Ambidexterity and IT-Enabled Environmental Sustainability Strategy”. In Proceedings of the 31st International Conference on Information Systems, St. Louis, MO, December 12-15.

Thrun, S., and Pratt, L., eds. 2012. "Learning to learn". Springer Science & Business Media.

Tiefenbeck, V., Tasic, V., Staake, T., and Fleisch, E. 2013. "Contrasting the effects of real-time feedback on resource consumption between single- and multi-person households", SSES Annual Meeting 2013, Neuchatel, Switzerland, June 2013.

Tsekouras, G. J., Hatziargyriou, N.D., and Dialynas, E.N. 2007. "Two-stage pattern recognition of load curves for classification of electricity customers." Power Systems, IEEE Transactions on (22:3), pp. 1120-1128.

Vassileva, I., Odlare, M., Wallin, F., and Dahlquist, E. 2012. "The impact of consumers' feedback preferences on domestic electricity consumption". Applied Energy (93), pp. 575-582.

Washburn, D., Nelson, L. E., King, O., and Yates, S. 2009. “The Rise of the Green Enterprise: A Primer for IT Lead Involvement,” Forrester Research, Inc.

Watson, R. T., Boudreau, M.C., and Chen, A.J. 2010. "Information systems and environmentally sustainable development: energy informatics and new directions for the IS community." MIS Quarterly (34:1), pp. 4.

Yong-Seog, K., and Street, W. N. 2004. "An intelligent system for customer targeting: a data mining approach." Decision Support Systems (37:2), pp. 215-228.

Zaki, M. J., and Meira Jr., W. 2014. "Data Mining and Analysis: Fundamental Concepts and Algorithms". Cambridge University Press.

Zopounidis, C., and Doumpos, M. 2000. "PREFDIS: A multicriteria decision support system for sorting decision problems". Computers & Operations Research, (27:7), pp. 779-797.