

106 | ACTUARIAL SOCIETY 2014 CONVENTION, CAPE TOWN, 22–23 OCTOBER 2014

Big Data analytics
By K Bhoola, K Kruger, J Peick, P Pio and NA Tshabalala

Presented at the Actuarial Society of South Africa's 2014 Convention, 22–23 October 2014, Cape Town International Convention Centre

ABSTRACT
The amount of structured and unstructured data becoming available to the insurance industry continues to grow rapidly. Analysing these large datasets, also referred to as Big Data, can provide helpful information to avoid risks, discover new opportunities, identify customer trends and develop new products. Hence, Big Data analysis is fast becoming the competitive, innovative edge insurers are looking for. Although data analysis is not new to the insurance industry, the volume and range of data available are constantly changing. The true value of Big Data is only realised when relevant information can be extracted rapidly and structured in a way that supports fact-based decisions. This paper sets out the history and definition of Big Data, the challenges and opportunities around Big Data using case studies that may be applied in the local South African insurance industry, as well as the technology and tools needed to analyse Big Data. It also explores the roles actuaries can play in the Big Data analytics and insurance space. A short introduction to data governance and regulations, as well as a possible outlook on what the future might hold, is also included.

KEYWORDS
Big Data; insurance; tools; data governance

CONTACT DETAILS
Kerisha Bhoola, KPMG Services (Pty) Ltd, Wanooka Building, 1 Albany Road, Parktown, 2193
Tel: 079 300 5038; Email: [email protected]


1. INTRODUCTION

1.1 Big Data analytics has transformed the way in which many organisations deal with data, as businesses are using the power of insights provided by Big Data to establish more information around their customers and business practices almost instantaneously (Kuketz, unpublished). According to him, the biggest value created by these timely, meaningful insights from large datasets is often the effective business decision-making that they enable. He states that extracting valuable insights from very large amounts of structured and unstructured data, drawn from disparate sources in different formats, presents a multitude of business benefits, including real-time monitoring and forecasting of events that impact either business performance or operations. It has been observed from industry practice that some companies outsource this capability, while others train their staff to improve their analytical skills so that they can integrate data with the tools and techniques used to analyse Big Data.

1.2 According to Schroeck & Shockley (unpublished), it is estimated that over 90% of global data was created in the past two years, and more than 80% of that data is unstructured, originating from sources such as internal electronic documents, social media and government data, and therefore not accessible to traditional IT systems. They mention that new techniques, technological advances in hardware and tooling, and (open-source) platforms for dealing with such data are now becoming available.

1.3 Figure 1 below illustrates the estimated adoption of Big Data Analytics by larger organisations from 2012 to 2017.

Figure 1 Estimated adoption of Big Data Analytics


1.4 Definition of Big Data

1.4.1 The term 'Big Data' was included in an update of the Oxford English Dictionary (2013):

Big data n. Computing (also with capital initials) data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data.

1.4.2 The following definition is outlined in Wikipedia:

Big data is a blanket term for any collection of datasets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.

1.4.3 Sorge (unpublished) highlights structured and unstructured data as the key components of data. She outlines that structured data refers to data that resides in a fixed field within a record or file. This includes data contained in business process applications utilising databases and spreadsheets. According to her, structured data is more organised and usually simpler to manage and analyse. It can also be stored in tools such as Excel, Access and other traditional software.

1.4.4 Sorge (op. cit.) states that unstructured data includes, for example, unformatted paper-based documents, streaming instrument data, machine logs, webpages, PDF files, PowerPoint presentations, emails, blog entries, wikis and word processing documents.
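The structured/unstructured distinction can be made concrete with a short sketch (Python is used purely for illustration; the policy records and email text are hypothetical). Structured data sits in fixed fields that can be queried directly, while unstructured text must first have information extracted from it before analysis.

```python
import csv
import io
import re

# Structured data: fixed fields per record, directly queryable.
structured = io.StringIO("policy_id,premium\nP001,1200\nP002,950\n")
rows = list(csv.DictReader(structured))
total_premium = sum(int(r["premium"]) for r in rows)

# Unstructured data: free text; information must be extracted first,
# here with a simple pattern match for policy identifiers.
email = "Hi, please update policy P002, the client moved to Parktown."
policy_ids = re.findall(r"P\d{3}", email)

print(total_premium)  # 2150
print(policy_ids)     # ['P002']
```

The structured side needs no interpretation step at all, which is why traditional IT systems handle it comfortably while unstructured sources require extra tooling.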

1.5 Characteristics of Big Data

1.5.1 Volume

1.5.1.1 Normandeau (unpublished) describes Big Data as incorporating enormous volumes of data. He states that, unlike in the past when data was created mainly by humans, data is now also generated by machines, networks and human interaction on systems such as social media, which results in a large increase in the volume of data that can be analysed.

1.5.1.2 According to Dave (unpublished), there has been exponential growth in data storage as data can now exist in the form of videos, music and large images on social media channels. He has observed that as a database grows, the applications and architecture built to support the data need to be re-evaluated regularly. He states that re-evaluating the same data from multiple angles may also produce new findings.

1.5.1.3 Laney (unpublished) suggests that the factors contributing to the increase in data volume include transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. According to him, as data storage costs reduce, the focus shifts to determining relevance within large data volumes and to methods of using analytics to create value from relevant data.


1.5.2 Velocity

1.5.2.1 Big Data velocity deals with the pace at which data continually flows in from sources such as business processes, machines, networks and human interaction on platforms such as social media sites and mobile devices (Normandeau, op. cit.).

1.5.2.2 Dave (op. cit.) draws attention to the fact that data growth and social media have changed how we look at data. For example, news channels and radio stations have changed how fast we receive the news, and people now rely on social media to update them on the latest happenings (Dave, op. cit.). He states that social media users pay little attention to old messages and statuses, discarding them to focus on the most recent ones. Data movement has become almost real-time, and the update window has shrunk considerably. According to Normandeau (op. cit.), such real-time data can help businesses make valuable decisions that provide strategic competitive advantages, if the velocity can be handled.
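The idea that only the most recent items in a fast-moving stream receive attention can be sketched with a fixed-size window (an illustrative toy, not a production streaming design; the message names are hypothetical):

```python
from collections import deque

# Keep only the N most recent messages, discarding older ones as new
# items arrive -- mirroring how social media users focus on the latest
# posts while older ones fall away.
WINDOW = 3
recent = deque(maxlen=WINDOW)

stream = ["msg1", "msg2", "msg3", "msg4", "msg5"]
for message in stream:
    recent.append(message)  # when full, the oldest item is dropped automatically

print(list(recent))  # ['msg3', 'msg4', 'msg5']
```

Real streaming platforms apply the same windowing principle at scale, processing items as they arrive rather than storing everything for later batch analysis.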

1.5.3 Variety

1.5.3.1 Normandeau (op. cit.) describes variety as referring to the many sources and types of data, both structured and unstructured. It is evident that data can now be sourced in the form of emails, photos, videos, monitoring devices, PDFs and audio. This variety of unstructured data creates challenges for storing, mining and analysing data (Normandeau, op. cit.). Figure 2 below outlines the key characteristics discussed above.

Figure 2 Characteristics of Big Data


1.5.4 Veracity

1.5.4.1 Coops (unpublished) describes veracity as referring to the confidence one has in one's data. According to him, it refers to the accuracy of the data and how it may relate to business value. If a company does not trust its data, then the data may not be useful for strategic decision making.

1.5.5 Value

1.5.5.1 Coops (op. cit.) describes value as the aspect that ensures that the findings obtained from the analysis are insightful and can be applied practically within the business context. According to him, it refers to the extent to which a company can leverage its data to solve its business problems.

1.6 The Story of Big Data

1.6.1 The first attempt to quantify the growth rate in the volume of data, or what has become known as the 'information explosion' (a term first used in 1941, according to the Oxford English Dictionary), was made approximately seventy years ago. This section is based on the article 'A Very Short History Of Big Data' by Press (unpublished). It covers the major milestones in the history of sizing data volumes, the key milestones in the evolution of Big Data, and observations pertaining to the data or information explosion.

1.6.2 In 1944, Fremont Rider, Wesleyan University Librarian, published The Scholar and the Future of the Research Library. He estimated that American university libraries were doubling in size every sixteen years. Rider also speculated that the Yale Library in 2040 would have "approximately 200,000,000 volumes, which will occupy over 6,000 miles of shelves … [requiring] a cataloging staff of over six thousand persons." (Press, op. cit.).

1.6.3 In 1967, B.A. Marron and P.A.D. de Maine published ‘Automatic data compression’, stating that “The ‘information explosion’ noted in recent years makes it essential that storage requirements for all information be kept to a minimum.” They referred to (Press, op. cit.):

a fully automatic and rapid three-part compressor which can be used with ‘any’ body of information to greatly reduce slow external storage requirements and to increase the rate of information transmission through a computer.

1.6.4 In 1975, the Ministry of Posts and Telecommunications in Japan started conducting the Information Flow Census, tracking the volume of information circulating in Japan. The census introduced ‘amount of words’ as the single unit of measurement across all media. The 1975 census found that information supply was increasing much faster than information consumption and in 1978 it reported that (Press, op. cit.):

the demand for information provided by mass media, which are one-way communication, has become stagnant, and the demand for information provided by personal telecommunications media, which are characterized by two-way communications, has drastically increased … Our society is moving toward a new stage … in which more priority is placed on segmented, more detailed information to meet individual needs, instead of conventional mass-reproduced conformed information.

1.6.5 In 1980 I.A. Tjomsland gave a speech titled “Where Do We Go From Here?” at the Fourth IEEE Symposium on Mass Storage Systems, in which he said (Press, op. cit.):

Those associated with storage devices long ago realized that Parkinson’s First Law may be paraphrased to describe our industry—‘Data expands to fill the space available’ … I believe that large amounts of data are being retained because users have no way of identifying obsolete data; the penalties for storing obsolete data are less apparent than are the penalties for discarding potentially useful data.

1.6.6 In 1981, the Hungarian Central Statistics Office started a research project covering the country’s information industries. The research is currently ongoing (Press, op. cit.).

1.6.7 In 1983, Ithiel de Sola Pool published ‘Tracking the Flow of Information’ in Science. After analysing the growth trends in 17 major communications media from 1960 to 1977, he concluded that (Press, op. cit.):

words made available to Americans (over the age of 10) through these media grew at a rate of 8.9 percent per year … words actually attended to from those media grew at just 2.9 percent per year … In the period of observation, much of the growth in the flow of information was due to the growth in broadcasting … But toward the end of that period [1977] the situation was changing: point-to-point media were growing faster than broadcasting.

1.6.8 In 1986, Hal B. Becker published ‘Can users really absorb data at today’s rates? Tomorrow’s?’ in Data Communications. Becker estimated that (Press, op. cit.):

the recording density achieved by Gutenberg was approximately 500 symbols (characters) per cubic inch—500 times the density of [4,000 B.C. Sumerian] clay tablets. By the year 2000, semiconductor random access memory should be storing 1.25 × 10^11 bytes per cubic inch.
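Becker's figures imply an enormous jump in density. A quick check of the implied ratio, under the simplifying assumption that one symbol is roughly one byte (an assumption not made in the quote itself):

```python
# Becker's estimates of recording density, per cubic inch.
clay_tablet = 1          # symbols per cubic inch (baseline unit)
gutenberg = 500          # 500 times the clay-tablet density
semiconductor = 1.25e11  # bytes per cubic inch, projected for the year 2000

# Ratio of projected semiconductor density to Gutenberg's density,
# treating one symbol as roughly one byte for comparison purposes.
ratio = semiconductor / gutenberg
print(f"{ratio:.1e}")  # 2.5e+08, i.e. about 250 million times denser
```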

1.6.9 In 1990, Peter J. Denning published ‘Saving All the Bits’ in American Scientist. He stated (Press, op. cit.):

The imperative [for scientists] to save all the bits forces us into an impossible situation: The rate and volume of information flow overwhelm our networks, storage devices and retrieval systems, as well as the human capacity for comprehension … What machines can we build that will monitor the data stream of an instrument, or sift through a database of recordings, and propose for us a statistical summary of what's there? … it is possible to build machines that can recognize or predict patterns in data without understanding the meaning of the patterns. Such machines may eventually be fast enough to deal with large data streams in real time … With these machines, we can significantly reduce the number of bits that must be saved, and we can reduce the hazard of losing latent discoveries from burial in an immense database. The same machines can also pore through existing databases looking for patterns and forming class descriptions for the bits that we've already saved.

1.6.10 By 1996, digital storage had become more cost-effective for storing data than paper, according to R.J.T. Morris and B.J. Truskowski in 'The Evolution of Storage Systems', IBM Systems Journal, July 1, 2003. In 1997, Michael Lesk published 'How much information is there in the world?' Lesk concluded that (Press, op. cit.):

There may be a few thousand petabytes of information all told; and the production of tape and disk will reach that level by the year 2000. So in only a few years, (a) we will be able [to] save everything–no information will have to be thrown out, and (b) the typical piece of information will never be looked at by a human being.

1.6.11 In 1998, K.G. Coffman and Andrew Odlyzko published ‘The Size and Growth Rate of the Internet’. They mentioned that (Press, op. cit.):

the growth rate of traffic on the public Internet, while lower than is often cited, is still about 100% per year, much higher than for traffic on other networks. Hence, if present growth trends continue, data traffic in the U. S. will overtake voice traffic around the year 2002 and will be dominated by the Internet.

1.6.12 In 1999, Steve Bryson, David Kenwright, Michael Cox, David Ellsworth and Robert Haimes published 'Visually exploring gigabyte datasets in real time' in the Communications of the ACM. It was the first CACM article to use the term "Big Data". The article opened with the following (Press, op. cit.):

Very powerful computers are a blessing to many fields of inquiry. They are also a curse; fast computations spew out massive amounts of data. Where megabyte datasets were once considered large, we now find datasets from individual simulations in the 300GB range. But understanding the data resulting from high-end computations is a significant endeavor. As more than one scientist has put it, it is just plain difficult to look at all the numbers. And as Richard W. Hamming, mathematician and pioneer computer scientist, pointed out, the purpose of computing is insight, not numbers.

1.6.13 In 2000, Peter Lyman and Hal R. Varian at UC Berkeley published 'How Much Information?' According to Press (op. cit.), it was the first comprehensive study to quantify, in computer storage terms, the total amount of new and original information (not counting copies) created in the world annually and stored in the following physical media: paper, film, optical (CDs and DVDs), and magnetic. The study discovered that in 1999 the world produced about 250 megabytes for every man, woman and child on earth (Press, op. cit.). It also found that 'a vast amount of unique information is created and stored by individuals' and that 'not only is digital information production the largest in total, it is also the most rapidly growing'. A follow-up study in 2003 by the same researchers found that the world 'produced about 5 exabytes of new information in 2002 and that 92% of the new information was stored on magnetic media, mostly in hard disks' (Press, op. cit.).
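The 250-megabyte figure can be converted into a rough world total, assuming a 1999 population of about six billion (the population figure is our assumption, not a number quoted above):

```python
# Lyman & Varian, 1999: roughly 250 MB of new stored information per person.
mb_per_person = 250
world_population = 6e9  # approximate 1999 world population (assumption)

# Total new information, converted from megabytes to exabytes.
total_bytes = mb_per_person * 1e6 * world_population
exabytes = total_bytes / 1e18
print(exabytes)  # 1.5 -- about 1.5 exabytes of new information in 1999
```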

1.6.14 In 2001, Doug Laney, an analyst with the META Group, published a research note titled '3D Data Management: Controlling Data Volume, Velocity, and Variety'. Ten years later, the "3Vs" had become the standard defining principles of Big Data (Press, op. cit.).

1.6.15 In 2005, Tim O’Reilly published ‘What is Web 2.0’ in which he stated that “data is the next Intel inside.” O’Reilly stated (Press, op. cit.):

As Hal Varian remarked in a personal conversation last year, ‘SQL is the new HTML.’ Database management is a core competency of Web 2.0 companies, so much so that we have sometimes referred to these applications as ‘infoware’ rather than merely software.

1.6.16 In 2008, Cisco released the 'Cisco Visual Networking Index – Forecast and Methodology, 2007–2012', part of an 'ongoing initiative to track and forecast the impact of visual networking applications'. It predicted that (Press, op. cit.):

‘IP traffic will nearly double every two years through 2012’ and that it will reach half a zettabyte in 2012. The forecast held well, as Cisco’s latest report (May 30, 2012) estimates IP traffic in 2012 at just over half a zettabyte and notes it ‘has increased eightfold over the past five years.’

1.6.17 In 2009, Roger E. Bohn and James E. Short published ‘How Much Information? 2009 Report on American Consumers’. The study found that in 2008 (Press, op. cit.):

Americans consumed information for about 1.3 trillion hours, an average of almost 12 hours per day. Consumption totaled 3.6 zettabytes and 10,845 trillion words, corresponding to 100,500 words and 34 gigabytes for an average person on an average day.

Bohn, Short and Chaitanya Baru followed this up in January 2011 with 'How Much Information? 2010 Report on Enterprise Server Information', in which they estimate that in 2008 the world's servers processed 9.57 zettabytes of information, almost 10 to the 22nd power, or ten million million gigabytes. This was 12 gigabytes of information daily for the average worker, or about 3 terabytes of information per worker per year. The world's companies on average processed 63 terabytes of information annually (Press, op. cit.).
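The daily and annual per-worker figures quoted above are mutually consistent if one assumes roughly 250 working days per year; the conversion below is ours, not the report's:

```python
# Bohn & Short (2011): 12 GB of information per worker per day,
# stated as about 3 TB per worker per year.
gb_per_day = 12
working_days = 250  # assumed working days per year

tb_per_year = gb_per_day * working_days / 1000  # 1000 GB per TB
print(tb_per_year)  # 3.0 -- matches the quoted annual figure
```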


1.6.18 In 2010, Kenneth Cukier published in The Economist a Special Report titled, ‘Data, data everywhere’. He stated that (Press, op. cit.):

… the world contains an unimaginably vast amount of digital information which is getting ever vaster more rapidly … The effect is being felt everywhere, from business to science, from governments to the arts. Scientists and computer engineers have coined a new term for the phenomenon: ‘big data’.

1.6.19 In 2011, Martin Hilbert and Priscila Lopez published 'The World's Technological Capacity to Store, Communicate, and Compute Information' in Science. According to Press (op. cit.), they estimated that global information storage capacity grew at a compound annual growth rate of 25% per year between 1986 and 2007. They also estimated that in 1986, 99.2% of all storage capacity was analogue, whereas by 2007, 94% of storage capacity was digital (Press, op. cit.).
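The quoted 25% compound annual growth rate implies roughly a hundredfold increase in storage capacity over the 1986–2007 period, which a short calculation confirms:

```python
# Hilbert & Lopez: storage capacity grew at a 25% compound annual
# rate from 1986 to 2007, i.e. over 21 years.
cagr = 0.25
years = 2007 - 1986

growth_factor = (1 + cagr) ** years
print(round(growth_factor))  # 108 -- roughly a hundredfold increase
```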

1.6.20 In 2012, Danah Boyd and Kate Crawford published 'Critical Questions for Big Data' in Information, Communication & Society. They define Big Data as (Press, op. cit.):

a cultural, technological, and scholarly phenomenon that rests on the interplay of: (1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large datasets. (2) Analysis: drawing on large datasets to identify patterns in order to make economic, social, technical, and legal claims. (3) Mythology: the widespread belief that large datasets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.

2. BIG DATA ANALYTICS IN INSURANCE

2.1 Introduction

2.1.1 Data and information are the basis of the insurance industry. As larger datasets become available and more innovative tools for analysing Big Data and extracting business value from it are created, the actuary's role has also developed and evolved. According to the IBM white paper 'Insurance in the age of analytics' (unpublished b):

Insurance is a business based on information, analysis and relationships. Its core function of profitable risk management relies on the ability to apply the requisite mathematics against data on exposures, investments and markets. Whether kept in ledgers or filing cabinets, written on magnetic tapes and disks, or pushed up to the ubiquitous cloud, insurers have had to embrace information technology (IT) to manipulate and store critical business information. Recently, the core analytics paradigm (for actuarial and pricing) has been extended to encompass larger and larger datasets of company and customer activities.

2.1.2 The insurance industry is built on the capability of analysing data to understand and evaluate risks. The actuarial and underwriting professions emerged at the beginning of the modern insurance era in the 17th century (Breading & Smallwood, unpublished). These roles are both dependent upon the analysis of data. Recently, new technologies have been introduced that can be applied in data analysis, giving insurers new strategic and operational insights into their businesses (Breading & Smallwood, op. cit.).

2.1.3 According to Breading & Smallwood (op. cit.), insurance organisations are inundated with data, and the volumes are growing rapidly due to telematics, social media, and data from other unstructured sources. Today, Big Data technologies such as Hadoop are entering the insurance market, introducing new approaches to rapidly analysing large amounts of data from many sources (Breading & Smallwood, op. cit.).
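Hadoop is a distributed framework, but the MapReduce idea at its core can be sketched on a single machine in plain Python (a toy illustration of the paradigm, not Hadoop's actual API; the claim records are hypothetical): records are mapped to key-value pairs, grouped by key, and then reduced.

```python
from collections import defaultdict

# Toy MapReduce over illustrative claim records: count claims per peril.
claims = [
    {"peril": "hail", "amount": 5000},
    {"peril": "fire", "amount": 12000},
    {"peril": "hail", "amount": 3000},
]

# Map: emit (key, value) pairs from each record.
mapped = [(c["peril"], 1) for c in claims]

# Shuffle: group the emitted values by key.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce: aggregate each group to a single result.
counts = {key: sum(values) for key, values in grouped.items()}
print(counts)  # {'hail': 2, 'fire': 1}
```

Hadoop distributes exactly these map, shuffle and reduce stages across a cluster, which is what makes it suited to datasets too large for a single machine.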

2.1.4 Given the growth and expansion of the data analytics tools and techniques that have become available, it is now possible to leverage a wide range of data across every part of an insurance company, including the actuarial and underwriting units (Breading & Smallwood, op. cit.). They also state that competitive demands require that new insights be generated much more rapidly, in some cases in real time. This analytics capability can allow an insurer to manage its customer engagement, financial management, risk assessment and operational units more effectively (Breading & Smallwood, op. cit.).

2.1.5 According to research carried out by Novarica, reported in Josefowicz & Diana (unpublished) and based on a survey of 86 insurers, more than a quarter of insurers were using or planned to use Internet clickstreams, audio data, mobile geospatial data, telematics data or social media content for analysis. The usage of mobile data, historical stock market data, video data or sensor data was less common amongst these insurers. It was also found that relatively few insurers capture, persist and analyse Big Data within their computing environments, and those that do typically leverage traditional computing, storage, database and analytics technology. Josefowicz & Diana (op. cit.) noted that very few insurers had invested in specialised Big Data infrastructure, as most were still maturing and expanding their use of traditional data analytics and predictive models to improve processes, reduce losses and generally improve their book of business.

2.1.6 According to Schroeck & Shockley (op. cit.), IBM's Big Data study in 2013 found that 74% of the insurance companies surveyed reported that the use of information (including Big Data) and analytics was creating a competitive advantage for their organisations, compared with 63% of cross-industry respondents. Only 35% of insurance companies reported such an advantage in IBM's 2010 New Intelligent Enterprise Global Executive Study and Research Collaboration, so this represents a 111% increase in two years.
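The quoted 111% figure follows directly from the two survey percentages:

```python
# Share of insurers reporting a competitive advantage from analytics.
share_2010 = 35  # percent, IBM 2010 study
share_2013 = 74  # percent, IBM 2013 study

# Relative increase between the two surveys, in percent.
increase = (share_2013 - share_2010) / share_2010 * 100
print(round(increase))  # 111 -- the "111% increase" cited above
```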

2.1.7 Jennings (unpublished) states that industries may be hyping the idea of collecting and analysing data whose volumes were too large to deal with in the past. The curve outlined in Figure 3 below illustrates the various expectations that can be created in respect of Big Data Analytics over time.


2.2 The Role of Big Data Analytics in Insurance

2.2.1 According to industry practice, some insurers currently use software solutions that produce management dashboards and support reporting, ad-hoc analysis, scenario planning and predictive modelling.

2.2.2 The traditional management approaches and culture of insurance companies have been conservative and relatively unchanged over the past few decades (Breading & Smallwood, op. cit.). According to them, a shift is currently underway, with many insurers adopting a 'management by analytics' approach to running their businesses. They state that this shift, fuelled by Big Data and high-performance analytics, is enabling insurers to select more profitable business, implement more precise pricing, manage the risk portfolio holistically, improve fraud detection and increase investment returns. High-performance analytics is an area where strong alignment between business and IT can create powerful new capabilities within an insurer's organisation (Breading & Smallwood, op. cit.).

2.2.3 Life insurers and property and casualty insurers have lagged behind other financial-services sectors, but they are now catching up in their adoption of predictive and optimisation models in business processes such as sales, marketing and service (Sherer, Brown & Grepin, unpublished). They state that the overall effect of these developments will be greater depth and breadth of analytics talent throughout organisations, significant improvements in management processes, and new products that deliver greater value to customers and to society.

Figure 3 Big Data hype cycle

2.2.4 While the impetus to invest in analytics has never been greater for insurance companies, the key for insurers is to motivate their highly skilled experts to adopt and train in the use of the newest tools and use them with creativity, confidence, and consistency (Sherer, Brown & Grepin, op. cit.).

2.2.5 Business Capabilities

2.2.5.1 Insurers may pursue analytics initiatives in three key areas: customer-centric, risk-centric, and finance-centric activities (Breading & Smallwood, op. cit.). Figure 4 identifies a number of important areas where analytics are already being applied by leading insurers (Breading & Smallwood, op. cit.).

2.2.6 Risk-Centric Analytics

2.2.6.1 According to Breading & Smallwood (op. cit.), insurers assess the probability and expected costs of specific exposures, illnesses, and death. They state that complex models for product design, pricing, underwriting, loss reserving, and CAT modelling form the basis for determining what type of risks the company will assume and how profitable the portfolio of contracts is projected to be.

2.2.6.2 According to Breading & Smallwood (op. cit.), risk-centric analytics shape the core of the insurance business with new analytics capabilities introducing an opportunity for insurers to strengthen this core. For example, consider property

Figure 4 Role of Data Analytics in insurance


insurance, where insurers are moving toward by-peril rating (Breading & Smallwood, op. cit.). They are leveraging external data on individual perils such as hail, wildfire, coastal storm surge, crime, and dozens of other factors. The need to build and run models to assess all the exposures for individual properties or groups of properties in a portfolio is crucial to properly evaluating and pricing specific risks (Breading & Smallwood, op. cit.). The ability to capitalise on high performance analytics to rapidly assess different combinations, multiple times a day, will keep an insurer a step or two ahead of the competition (Breading & Smallwood, op. cit.).
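The by-peril approach described above can be illustrated with a toy calculation in which the total pure premium is the sum of per-peril expected losses, each scaled by a property-specific exposure factor derived from external data. All rates, factors and the expense loading below are invented for illustration, not real figures:

```python
# Sketch of by-peril property rating: the total pure premium is the sum of
# expected losses per peril, each driven by property-specific external data.

def peril_pure_premium(base_rate, exposure_factor):
    """Expected annual loss for one peril = base rate x property-specific factor."""
    return base_rate * exposure_factor

def by_peril_premium(property_factors, base_rates, expense_loading=1.25):
    """Sum per-peril pure premiums, then apply a loading for expenses and profit."""
    pure = sum(
        peril_pure_premium(base_rates[p], f) for p, f in property_factors.items()
    )
    return pure * expense_loading

# Hypothetical base rates per peril (currency units per year at factor 1.0)
base_rates = {"hail": 40.0, "wildfire": 25.0, "storm_surge": 60.0, "crime": 15.0}

# A coastal property: high storm-surge exposure, low wildfire exposure
coastal = {"hail": 0.8, "wildfire": 0.2, "storm_surge": 2.5, "crime": 1.0}
print(round(by_peril_premium(coastal, base_rates), 2))
```

Rerunning the same calculation with refreshed exposure factors is cheap, which is what makes the "multiple times a day" reassessment described above feasible.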

2.2.6.3 For personal vehicle insurance, the area of telematics provides a compelling example of how Data Analytics are creating high business value on data collected from remote devices (Breading & Smallwood, op. cit.). According to them, in the early stages of telematics, insurers leverage only a tiny portion of the massive amounts of data streaming in from vehicles installed with telematics devices. This implies that basic rating factors such as miles driven, location, and speed are used to improve risk assessment and better match the price to the risk. A UK insurance company using telematics had reported that better driving habits resulted in a 30% reduction in the number of claims while another UK insurer similarly used telematics to help a large client reduce accident-causing risky driving by 53% (Sherer, Brown & Grepin, op. cit.). According to Breading & Smallwood (op. cit.), the next stage will be for analytics capabilities to evaluate a wide variety of data about driver behaviour, vehicle performance, and location factors to gain new insights on risks and provide more vehicle safety and maintenance advice to policyholders.

2.2.7 Customer-centric Analytics

2.2.7.1 According to Breading & Smallwood (op. cit.), insurers have been using software tools to assist in segmenting markets, identifying prospects, measuring campaign effectiveness, and spotting cross-selling opportunities. As observed in industry practice, many insurers would like to reorient their business to focus on the customer, instead of focusing on products and internal operations. This requires a much deeper and more granular understanding of customer wants, needs, and behaviours.

2.2.7.2 The knowledge and insights from agents, brokers, and company employees are important in understanding customers, but becoming a truly customer-centric organisation requires sophisticated customer-centric analytics (Breading & Smallwood, op. cit.). In addition, there is scope for insurers to gather new types of information that are relevant to customers, including information from social media sites and external data on demographics and location-based perils (Breading & Smallwood, op. cit.).

2.2.7.3 According to Breading & Smallwood (op. cit.), Big Data Analytics will increasingly enable insurers to make customer decisions in real-time, even as interactions are in process. They highlight that the analysis of web navigation, social media channels, and data entry patterns will allow automated or human intervention to provide the right information or assistance. They also state that call centre


conversations between prospects or customers and representatives will be analysed in real-time for key phrases, voice modulations, and questions to identify when new opportunities are presented or when intervention is required to address a problem.

2.2.7.4 Data analytics can be used to perform a detailed and relevant customer needs analysis. Automating the discussion between customers and advisors about complex insurance products such as living annuities, based on a customer’s needs and resources, can enhance the sales process (Breading & Smallwood, op. cit.).

2.2.7.5 By knowing their customers and their needs, insurers can maximise retention initiatives by suggesting the next best offer (Sherer, Brown & Grepin, op. cit.). The IBM White Paper ‘Harnessing the power of big data and analytics for insurance’ (2013) outlines an example of a large US insurer that conducted extensive analysis of customer information files, transaction data and call centre interactions to identify customers who would respond positively to contact with an agent. Based on the analysis, the company then developed new product offerings. The result was a significant increase in offer response rates and up to a 40% improvement in retention rates (IBM, 2013).

2.2.7.6 The IBM White Paper ‘Harnessing the power of big data and analytics for insurance’ (2013) outlines an example of AEGON Hungary Composite Insurance Co. Ltd, a subsidiary of the AEGON Group and one of the largest providers of life insurance, asset insurance and pension products for individuals and businesses in Hungary. The company had huge amounts of raw customer data but lacked the ability to turn it into insight and cross-selling opportunities (IBM, 2013). Using powerful statistical analysis and modelling solutions, the company developed an innovative methodology to connect life events and situations to insurance needs; based on aggregate profiles and predictive behaviour models, the insurer can craft insurance offerings to individual requirements. As a result of these efforts, the company improved customer response by 78% through a targeted direct marketing campaign and increased policy purchases by 3% (IBM, 2013).

2.2.7.7 Brokers can propose the right action to the right policyholder at the right time to maximise cross-sell, up-sell, strategic lifetime value, profitability and loyalty (Sherer, Brown & Grepin, op. cit.). As outlined in the IBM White Paper ‘Harnessing the power of big data’ (2013), after several years of rapid growth, a large Korean non-life insurer wanted to boost revenues by improving its competitive position. The company needed to enhance the effectiveness of its large, distributed network of affiliated agents by providing them with the insights and tools to identify opportunities and implement more targeted and relevant cross-selling offers (IBM, 2013). The company embarked on a comprehensive customer segmentation and market-targeting solution. By applying analytics and predictive modelling to customer account data and transaction histories, the solution enables agents to conduct segmentation based on the probability that customers will adopt complementary or higher-value insurance services (IBM, 2013).


2.2.7.8 According to IBM (2013), by analysing customers’ purchase of various products and services, the insurer can optimise cross-selling strategies, fine-tune marketing material and deliver targeted customer offerings. The insurer can also more accurately predict which insurance products are the most appropriate for each customer. Offering the right mix of services improves the effectiveness and efficiency of the company’s sales force, while the more personalised approach helps agents forge closer bonds with customers, which enhances loyalty (IBM, 2013).

2.2.8 Finance-centric Analytics

2.2.8.1 According to Breading & Smallwood (op. cit.), efficient capital allocation and optimum investment returns are critical to an insurer’s financial performance. Insurers frequently use custom-built approaches to augment financial management, using capital asset pricing models (CAPM) for the purposes of asset liability management and to value and manage assets for least risk and maximum return (Breading & Smallwood, op. cit.).
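As a minimal illustration of the CAPM mentioned above, the expected return on an asset is the risk-free rate plus the asset's beta times the market risk premium. The numeric inputs below are hypothetical:

```python
def capm_expected_return(risk_free, beta, market_return):
    """CAPM: E[R_i] = r_f + beta_i * (E[R_m] - r_f)."""
    return risk_free + beta * (market_return - risk_free)

# Illustrative inputs: 6% risk-free rate, asset beta 1.2, 11% expected market return
print(round(capm_expected_return(0.06, 1.2, 0.11), 4))
```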

2.2.8.2 More complex models may be built to address areas such as asset and liability matching, investment portfolio optimisation, embedded value calculations, and econometric modelling (Breading & Smallwood, op. cit.). They also state that an increasingly complex business and economic environment is pushing insurers to do more with analytics so that they can dynamically manage the business, quickly make adjustments in response to changing conditions, and react to requests from regulators.

2.2.8.3 Breading & Smallwood (op. cit.) state that considerable value can be obtained from being able to combine real-time insight from the operational side of the business with extensive external information concerning insurance and economics – and then being able to view, within hours or even minutes, multiple what-if scenarios about investment directions, portfolio mix and asset and liability matching. From their work, it can be concluded that the ability to make business decisions based on an advanced understanding of financial implications and associated probabilities is becoming a key competitive advantage for insurers.

2.3 Insurance Interviews on Analytics

2.3.1 FC Business Intelligence recently held interviews with insurance analytics experts to gain their views on the role of data analytics in the insurance industry. The following experts were interviewed: Paul Hately (Global Head of Protection Partners and Managing Director, Life & Health for Swiss Re), Grant Mitchell (Chief Actuary for The Co-Operative Insurance) and Christina Jones (Head of Customer Analytics, EMEA Consumer Lines for AIG). These analytics experts will also be presenting their views at the Analytics for Insurance Europe Conference in October 2014.

2.3.2 The experts highlighted several uses of big data analytics within their organisations. This included using predictive models as a substitute for medical underwriting on term life insurance policies in order to make these products easier to


buy. Data analytics has also been used in the predictive modelling of persistency and customer lifetime value.

2.3.3 Several benefits of using data analytics in insurance were identified. These included using analytics to detect patterns in data and then to predict outcomes, for example hurricane paths, floods, buying patterns, motor accidents, health insurance claims, mortality and lapses. These insights can be used to enhance business processes and may translate into more customer-friendly product solutions and/or lower costs. Other benefits include pricing and risk selection, where additional insight can help an insurer identify better risks and price more accurately, driving financial benefit. Data analytics can also be used in fraud detection, customer propositions and operational efficiency. In terms of customer-centric areas, data analytics can be used to optimise call centre staffing based on predicted call volumes (reducing wait times and improving customer satisfaction) and to improve the digital customer experience by analysing the conversion funnel in the quote process.
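The call-centre staffing use just mentioned can be sketched with a back-of-envelope workload calculation; a production model would typically use the Erlang C formula, and all figures below are hypothetical:

```python
import math

def agents_required(calls_per_hour, avg_handle_minutes, target_occupancy=0.85):
    """Back-of-envelope staffing: workload in agent-hours divided by a target
    occupancy. A production model would use Erlang C to hit a wait-time target;
    this is the simplest workload view."""
    workload_hours = calls_per_hour * avg_handle_minutes / 60.0
    return math.ceil(workload_hours / target_occupancy)

# Hypothetical forecast: 120 calls per hour, 6-minute average handle time
print(agents_required(120, 6))
```

Feeding a predicted call volume per interval into a calculation like this is what turns a demand forecast into a staffing schedule.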

2.3.4 There were also several challenges identified in terms of deploying analytics in an organisation. These included the following:

a) Knowledge: improving awareness at all levels of the power of data and analytics.

b) Regulation: making sure that data is always used ethically. There may be cases where the regulations have not been written with analytics of portfolios in mind but with a focus on protecting the individual.

c) Access to data: organisations such as reinsurers are very reliant on data from client organisations, which can be a challenge in terms of access, purpose and structure.

d) Budget: companies may have a limited budget and resource pool to grow their data analytics capability. As a result, there is a need to identify quick wins that can be used to prove the business case for further investment.

e) Historic IT infrastructure: this can present challenges in developing big data analytics processes, including data extraction processes.

2.3.5 One of the questions asked focused on using data analytics for the advantage of customers. The key advantages outlined included the following:

a) Analytics should improve accuracy in pricing, and customers who present a better risk should get a cheaper price as a result. Pricing based on a fair risk assessment is one of many customer advantages. Telematics is an example, where the customer’s driving patterns result in a fairer price than traditional pricing.

b) If insurers can use data and analytics to reduce uncertainty in a portfolio, then this may result in prices having less allowance for uncertainty. This may result in a reduction in price at a portfolio level.

c) Analytics can also help develop propositions that are better tailored to individual customer needs. This may be in terms of the cover offered under the policy, or the servicing model available.


d) Using data to reduce the cost of distribution will also benefit customers, i.e. more of their premiums will go towards paying benefits and less towards getting the product to them.

e) There are also other benefits: vehicle renewal dates are a trigger widely used by insurance companies to contact prospects and existing customers with a vehicle insurance offer. Using renewal dates and analysing behavioural data for triggered insurance offers enhances marketing efficiency for both the customer and the insurance company.

2.4 The Role of Actuaries in Big Data Analytics

2.4.1 According to Sherer, Brown & Grepin (op. cit.), the analytics performed by actuaries are critically important to an insurer’s continued existence and profitability. However, they note that over the past 15 years, revolutionary advances in computing technology and the explosion of new digital data sources have expanded and reinvented the core disciplines of insurers. Today’s big data analytics in insurance pushes far beyond the boundaries of traditional actuarial science (Sherer, Brown & Grepin, op. cit.).

2.4.2 According to research carried out by Novarica (Josefowicz & Diana, op. cit.) based on 86 insurers, 71% were using analytics in the actuarial modelling and risk analysis area and 56% were using it in pricing.

2.4.3 Pricing

2.4.3.1 With much better access to third-party data from a wide variety of sources, insurers can ask new questions, better understand many different types of risks and price for them more appropriately (Sherer, Brown & Grepin, op. cit.). They outline the following examples:

a) Which combination of demographic factors and treatment options will have the biggest impact on the life expectancies of people with Parkinson’s disease?

b) Which combination of corporate behaviours in health and safety management is predictive of lower worker compensation claims?

c) What is the probability that, within a given geographic radius, a person will die from a car accident or lose their house in a flood?

2.4.3.2 Sherer, Brown & Grepin (op. cit.) cover an example outlining a new health risk model that has been created by combining actuarial data with medical science, demographic trend information, and government data. This forward- and backward-looking tool for modelling longevity risk operates by capturing data from traditional mortality tables and adding data on medical advances and emerging lifestyle trends such as less smoking, more exercise, and healthier diets (Sherer, Brown & Grepin, op. cit.). According to them, innovations in analytics modelling will also enable insurers to underwrite many other emerging risks that are underinsured, including, for example, industry-wide business interruption stemming from natural disasters.
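A highly simplified sketch of the longevity-modelling idea described above is to take a base mortality rate from a traditional table and project it forward with an assumed annual improvement factor capturing medical and lifestyle trends. The base rate and improvement assumption below are invented for illustration:

```python
def projected_mortality(base_qx, annual_improvement, years):
    """Project a mortality rate forward under a constant annual improvement
    assumption (e.g. from medical advances or healthier lifestyles):
    q_x(t) = q_x * (1 - i)^t."""
    return base_qx * (1 - annual_improvement) ** years

# Hypothetical: base q_65 of 0.015, 1.5% annual improvement, projected 20 years
print(round(projected_mortality(0.015, 0.015, 20), 6))
```

A real model would vary the improvement rate by age, cohort and cause of death; the point of the sketch is only the mechanism of layering trend data on top of a static table.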


2.4.4 Product Development

2.4.4.1 It can be observed from industry practice that the ability to offer customers the policies they need at the most competitive premiums is a big advantage for insurers. According to StackIQ (unpublished), this is more of a challenge today, when contact with customers is mainly online or over the phone instead of in person. Scoring models of customer behaviour based on demographics, account information, collection performance, driving records, health information, and other data can aid insurers in tailoring products and premiums for individual customers based on their needs and risk factors (StackIQ, op. cit.).

2.4.5 Risk Management

2.4.5.1 Actuaries can access new sources of data and build statistical and financial models to better understand and quantify risk. These Big Data analytical applications include behavioural models based on customer profile data compiled over time together with other data that is relevant to specific types of products (StackIQ, op. cit.). StackIQ (op. cit.) also outlines an example where an insurer could assess the risks inherent in insuring real estate by analysing satellite data of properties, weather patterns, and regional employment statistics.

2.4.5.2 In terms of catastrophe modelling, being proactive when extreme weather is predicted can lessen the extent of claims payments and accelerate responses by insurers (StackIQ, op. cit.). With the ability to gather data directly from customers and other sources in real time, insurers can gather more actionable and practical information (StackIQ, op. cit.).

2.4.6 Predictive Modelling

2.4.6.1 Predictive modelling can be defined as using past data to predict the probability of a future event occurring (Barnes, unpublished). According to Batty et al. (unpublished), this is a discipline that actuaries have practised for a long time. They state that one of the oldest examples of statistical analysis guiding business decisions is the use of mortality tables to price annuities and life insurance policies (which originated in the work of John Graunt and Edmund Halley in the 17th century). Likewise, throughout much of the 20th century, general insurance actuaries have either implicitly or explicitly used Generalised Linear Models and Empirical Bayes techniques for the pricing of short-term insurance policies (Batty et al., op. cit.).
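As a simplified stand-in for the Generalised Linear Models mentioned above, the multiplicative structure that a Poisson GLM with a log link produces can be sketched with one-way relativities: each rating-factor level gets a multiplier relative to a base level, and premiums are the product of the base rate and the multipliers. The frequencies, base rate and average claim cost below are invented for illustration:

```python
def relativities(freq_by_level, base_level):
    """One-way relativities: each level's claim frequency relative to the base level.
    A fitted GLM would estimate these jointly; one-way ratios are the simplest proxy."""
    base = freq_by_level[base_level]
    return {level: f / base for level, f in freq_by_level.items()}

# Hypothetical observed claim frequencies by rating-factor level
freq_by_age = {"young": 0.15, "middle": 0.10, "senior": 0.12}
freq_by_area = {"urban": 0.14, "rural": 0.07}

rel_age = relativities(freq_by_age, "middle")   # young ~ 1.5, senior ~ 1.2
rel_area = relativities(freq_by_area, "rural")  # urban ~ 2.0

def pure_premium(age, area, base_frequency=0.10, avg_claim_cost=8000.0):
    """Expected annual claim cost under a multiplicative frequency model."""
    return base_frequency * rel_age[age] * rel_area[area] * avg_claim_cost

print(round(pure_premium("young", "urban"), 2))
```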

2.4.6.2 According to research carried out by Novarica (Josefowicz & Diana, op. cit.), there are several applications of predictive modelling. These are outlined below:

(a) Underwriting risk score, in which the analysis of multiple factors results in a single score to assist underwriters to evaluate prospective risk. Underwriting risk scores can be used to drive automated underwriting or provide a threshold at which risks must be reviewed by a human underwriter.


(b) Profitability models, looking at a risk, book of business, or channel, given underwriting, pricing, loss history, consumer information or other factors.

(c) Financial projections, which predict the financial performance (revenues, operational costs, loss costs and more) of the company or line of business.

(d) Stochastic modelling, a statistical modelling technique that assesses large samples of historical and predictive data to develop probabilities of events occurring, such as claims or fraud.

(e) Claims severity and fraud scoring which help adjusters predict the probable severity of a claim and its likelihood of being fraudulent. As with underwriting risk score, claims scoring can be used to drive automated decisioning around claims.
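The threshold-driven routing described in (a) and (e), where a single score either drives an automated decision or refers the case to a human reviewer, can be sketched as follows; the score scale and thresholds are illustrative assumptions:

```python
def underwriting_decision(risk_score, auto_accept=0.30, auto_decline=0.80):
    """Route an application from a single risk score in [0, 1]: low scores are
    accepted automatically, high scores declined, and the band in between is
    referred to a human underwriter. Thresholds here are invented."""
    if risk_score < auto_accept:
        return "accept"
    if risk_score > auto_decline:
        return "decline"
    return "refer"

print([underwriting_decision(s) for s in (0.1, 0.5, 0.9)])
```

The same pattern applies to claims scoring: the band between the thresholds controls how much work reaches human adjusters.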

3. PRACTICAL APPLICATIONS OF BIG DATA ANALYTICS IN INSURANCE

3.1 Case Studies

3.1.1 Telematics

3.1.1.1 According to Holiday (unpublished), insurance telematics is the process by which data is collected and analysed to enable driver behaviour to be assessed, and the level of risk to be calculated on an individual basis. This allows an insurance company to assess individual risk much more accurately and therefore provide a price that is fair and tailored to the customer (Holiday, op. cit.). Based on observed industry practice, it is noted that telematics devices are currently used in South Africa as well. The Discovery Insure mobile app that is currently being advertised in the market, for example, allows users to track how well they drive and receive insurance quotes from Discovery based on this.

3.1.1.2 The following case study has been provided by Voelker (unpublished). Two prominent vehicle-insurance providers in the UK and US specialise in customising policy premiums on the basis of drivers’ risk profiles. The challenges they faced in maintaining top-line margins in a recessionary vehicle insurance market included:
a) reducing fraudulent claims by tracking real-time driving data;
b) reducing the costs of injury claims;
c) incorporating real-time data within traditional policies;
d) traditional vehicle insurance policies not factoring in driving habits; and
e) negligent drivers paying the same premium as customers driving safely.

3.1.1.3 The solution involved deploying a telematics system comprising an in-vehicle device that records real-time driving data and transmits it over a wireless network. The components of this solution included the following:
a) an on-board diagnostic port (OBD) device that recorded real-time driving data;
b) data shared wirelessly with the insurer;
c) drivers rated on the basis of miles driven and the frequency of sudden stops; and
d) monthly discounts earned by customers according to their ratings.
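A toy version of the rating step, scoring a driver from miles driven and sudden-stop frequency and mapping the score to a monthly discount, might look as follows. The weights and the discount cap are invented for illustration:

```python
def driver_score(miles_driven, sudden_stops):
    """Toy telematics rating: penalise sudden stops per 100 miles and scale
    to a 0..100 score. The penalty weight is an invented assumption."""
    stops_per_100mi = 100.0 * sudden_stops / max(miles_driven, 1)
    return max(0.0, 100.0 - 15.0 * stops_per_100mi)

def monthly_discount(score, max_discount=0.30):
    """Map the score linearly to a premium discount, capped at max_discount."""
    return max_discount * score / 100.0

# Hypothetical month of driving: 800 miles, 4 sudden stops
score = driver_score(miles_driven=800, sudden_stops=4)
print(score, monthly_discount(score))
```

A production model would include many more signals (speed, time of day, location) and would calibrate the weights against claims experience rather than assume them.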


3.1.1.4 The following benefits were achieved:
a) insight into individual driving habits, leading to cost-effective individualised policy pricing;
b) safe driving efforts leading to an approximately 30% drop in premium charges; and
c) suggestions on improving driving habits, leading to safer roads and lower carbon emissions.

3.1.2 Social Media

3.1.2.1 The following case study has been provided by Maremont & Scism (unpublished). The US arm of one of the largest life insurance providers globally traditionally used medical tests to ascertain customers’ lifestyle patterns, and decide insurance coverage accordingly. It faced the following challenges:
a) pathological tests are costly and can be manipulated; and
b) the existing process was less efficient and less consumer-friendly.

3.1.2.2 In dealing with these challenges, the insurer analysed the online behaviour of approximately 60,000 insurance applicants. The following process was carried out:
a) unstructured data from customers’ online shopping, social media activities, etc. was captured;
b) the data was analysed to categorise customers as runners/hikers, dieters or couch potatoes;
c) predictive modelling tools were applied to estimate the customer’s longevity; and
d) coverage was decided on the basis of the life expectancy implied by the lifestyle pattern.

3.1.2.3 The following benefits were achieved:
a) social media analytics were considered more efficient and customer-friendly;
b) insurers expected to save approximately $125 per applicant by eliminating conventional medical tests; and
c) Facebook page ‘likes’ and Tweets proved a better indicator of lifestyle risks such as high blood pressure.

3.1.3 Agriculture

3.1.3.1 The following case study has been provided by Khaliq (unpublished). The Climate Corp. (TCC) provides coverage of agricultural yields against extreme weather impacts. Policies are customised according to growing plans and locations. It faced the following challenges:
a) managing 200 TB of historical data and 20 TB of unstructured farming data stored on the Amazon cloud;
b) analysis of five trillion data points across twenty million farming fields in the US; and
c) increasing data volumes.


3.1.3.2 The solution comprised the following components:
a) processing data to generate 10,000 predicted weather scenarios;
b) quantifying risk to crop yield and building corresponding insurance policies;
c) acquiring weather data from major climate models; and
d) incorporating weather data into a real-time pricing engine.
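The scenario-generation and pricing steps above can be sketched as a small Monte Carlo exercise: draw weather scenarios, map each to a yield-loss fraction, and take the mean loss as a pure premium rate. The rainfall distribution and the loss mapping below are an invented stand-in for TCC's actual models:

```python
import random
import statistics

def simulate_crop_loss(n_scenarios=10_000, seed=42):
    """Sketch of scenario-based crop pricing. Each scenario draws a growing-season
    rainfall, converts a shortfall below a threshold into a yield-loss fraction,
    and the mean loss across scenarios is the pure premium rate. All parameters
    (rainfall distribution, 450 mm threshold, 300 mm scale) are invented."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scenarios):
        rainfall = rng.gauss(600.0, 150.0)          # mm in the growing season
        shortfall = max(0.0, 450.0 - rainfall)      # below 450 mm the crop suffers
        losses.append(min(1.0, shortfall / 300.0))  # loss fraction of insured yield
    return statistics.mean(losses)

rate = simulate_crop_loss()
print(0.0 < rate < 0.1)
```

Repricing under new weather data then just means refreshing the scenario inputs and rerunning the simulation.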

3.1.3.3 The following benefits were achieved:
a) an enhanced ability to process data growing tenfold every year;
b) the ability to analyse more granular geographic data; and
c) significantly scaled-up processes for rapid data acquisition and ingestion.

3.1.4 Healthcare Fitbits

3.1.4.1 Chandler (unpublished) describes the FitBit as a small sensor device that tracks steps, calories and sleep patterns. This device was introduced in 2008 by co-founders Eric Friedman and James Park in San Francisco (Chandler, op. cit.). All recorded data and information is tracked in real time on the device, then captured on a ‘personal web based FitBit Dashboard’, and this information can be accessed regularly by the participants (Chandler, op. cit.). Considering the large amount of health data being collected by these devices, this may present a new opportunity for healthcare professionals to collect patient data in a more efficient manner.

3.1.4.2 The FitBit also covers a social aspect in that it helps individuals set activity goals, challenge friends and log meals during the day, as well as assisting users who are at risk of chronic conditions to improve their health by monitoring their physical activity (Chandler, op. cit.). Health insurers are able to access more data about FitBit users to create detailed risk profiles or adjust the premiums of insured individuals (Chandler, op. cit.). Based on industry practice, some insurers in South Africa currently use this approach, such that one is able to earn points by living a healthy and active lifestyle.

3.1.4.3 A large insurance company in South Africa has begun to use this as a wellness tracking tool to help correctly diagnose illnesses and enable quicker treatment turnaround times, thereby improving its healthcare systems (Akabor, unpublished).

3.1.5 Fraud Investigation and Prevention

3.1.5.1 This case study is outlined in IBM ‘Business Applications of Data Mining’ (Smyth et al., 2002). A US company implemented specific software to prevent and detect vehicle and medical insurance fraud. The software was used to ingest reports and data from all sources and quickly transform unstructured data into a clear format. The software solution helps the company to ingest IP addresses, scanned document references, email addresses, driver’s licence information, criminal records, judgments or known fraudulent connection data to link multiple sources and identify relationships that may otherwise remain anonymous. Based on hundreds of thousands of data points, repeatable patterns could be highlighted, pinpointing entities involved


and associated connections and identifying commonalities that may lead to claim denial. Mass data analysis around medical billing was also possible by following diagnosis and treatment codes because certain treatments follow certain diagnoses. With the detailed analysis, the organisation can make quick, easy and informed decisions regarding fraudulent claims, saving the business and its customers money and providing details and referrals to law enforcement.
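The linking of claims through shared identifiers described above can be sketched with a simple union-find clustering, so that indirect links (claim A shares an email with claim B, which shares a licence number with claim C) surface as one ring. The claims and identifiers below are invented:

```python
from collections import defaultdict

def link_claims(claims):
    """Group claims that share any identifier (email, IP, licence number, ...).
    Union-find merges transitively, so indirect links land in one cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Index claims by identifier, then union every pair sharing one
    by_identifier = defaultdict(list)
    for claim_id, identifiers in claims.items():
        for ident in identifiers:
            by_identifier[ident].append(claim_id)
    for linked in by_identifier.values():
        for other in linked[1:]:
            union(linked[0], other)

    clusters = defaultdict(set)
    for claim_id in claims:
        clusters[find(claim_id)].add(claim_id)
    return [c for c in clusters.values() if len(c) > 1]  # multi-claim rings only

# Invented example: C1-C2 share an email, C2-C3 share a licence number
claims = {
    "C1": {"a@example.com", "10.0.0.5"},
    "C2": {"a@example.com", "DL-998"},
    "C3": {"DL-998"},
    "C4": {"d@example.com"},
}
print(link_claims(claims))
```

Real systems add edge weights and anomaly scores on top of this linkage, but the clustering step is what exposes relationships that "may otherwise remain anonymous".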

3.1.5.2 The benefits of Big Data analysis with regard to fraud detection and prevention include:
a) saving the company and its customers money through better fraud prevention;
b) stronger fraud protection by assimilating multiple data types; and
c) identification of hidden or anonymous relationships.

3.1.6 Telemedicine

3.1.6.1 This case study is outlined in IBM ‘Business Applications of Data Mining’ (Smyth et al., 2002). An Austrian public insurance company provides health, pension and accident coverage to 260,000 railway workers and miners. Responding to a dramatic increase in the percentage of the population suffering from type 2 diabetes, with lifetime costs per patient averaging EUR150 000, the company shifted its focus from traditional to preventive care.

3.1.6.2 Effective screening, treatment and management of at-risk or already afflicted patients can slow or even halt the advance of type 2 diabetes. Proactive treatment requires close cooperation between physician and patient. Success is directly dependent on precise medication management and challenging lifestyle changes, which demand a high degree of motivation. This insurer required a convenient way to monitor compliance on a daily basis, in near-real time, while offering ease of use to facilitate patient autonomy and participation.

3.1.6.3 The insurer developed a telemedicine programme that helps physicians monitor patient compliance with prescribed treatment programmes, including adjusting medications and treatment quickly and remotely. The core of the system is a mobile diabetes memory diary, a custom-designed database solution.

3.1.6.4 The solution integrates medical devices such as blood pressure gauges, blood glucose meters and weight scales with cell phones to help patients monitor their vital values. The solution automatically forwards patient data to the diabetes memory database, and an algorithm compares actual results to expected standards, immediately notifying physicians if results exceed critical thresholds. Physicians then respond rapidly to adjust patient treatment to avert the cascading complications that can worsen the disease. Traditional treatment carries higher financial and human costs, usually occurring after patients suffer symptoms or complications. The solution fundamentally transforms the treatment of type 2 diabetes and helps dramatically reduce the need for emergency treatment or hospitalisation.
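The threshold comparison at the heart of such a system is simple to sketch. The parameter names and limits below are illustrative assumptions, not the insurer's actual clinical standards, which would be set per patient by a physician.

```python
# Hypothetical acceptable ranges; real limits would come from the treating physician.
THRESHOLDS = {
    "blood_glucose_mmol_l": (4.0, 10.0),
    "systolic_bp_mmhg": (90, 140),
    "weight_kg": (50, 120),
}

def critical_parameters(reading):
    """Return the parameters in a patient reading that fall outside their range."""
    alerts = []
    for name, (low, high) in THRESHOLDS.items():
        value = reading.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append(name)
    return alerts

# A reading forwarded from the patient's devices; the glucose value breaches its range.
reading = {"blood_glucose_mmol_l": 12.4, "systolic_bp_mmhg": 128}
alerts = critical_parameters(reading)
```

Any non-empty alert list would trigger the near-real-time notification to the physician described above.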

3.1.6.5 Patients exchange inconvenient, face-to-face, single point-in-time consultations with physicians for continuous home-based tracking and analysis of key parameters using telemedicine, promoting effective, reality-based treatment. More frequent data reporting and analysis helps improve patient health, lower costs and reduce escalated interventions. The solution gathers and analyses patient data, helping identify trends and offering visible progress reports for healthcare providers. It also links the insurance records of each patient with the diabetes telemedicine system and gives role-based access to clinical staff. Stringent data security protects patient privacy with encryption and decryption during transfer by mobile technology.

3.1.6.6 The benefits of this telemedicine system include the following:
a) reduced healthcare costs by emphasising prevention and proactive care;
b) identification of at-risk patients and their quick integration into the programme, further reducing costs; and
c) use of telemedicine and insurance records to calculate the return on investment and measure treatment efficacy.

3.1.7 Measuring Claims Risks
3.1.7.1 This case study is outlined in the article ‘Santam Insurance: Measuring claims risks in real time lowers fraud costs, speeds payments and delights customers’ (IBM, unpublished a). A Santam executive team was looking to incorporate predictive analytics into claims processing. The head of the Analytics Unit assembled a multidisciplinary team comprising business process experts, actuarial and analytics specialists, and key IT staff. The team faced the following challenges:

— The aim was to automate and streamline the part of the claims process where doing so would yield the most value for the business. Their starting point was the assumption that not all claims are the same. Some claims are large or complex (and risky) enough to require detailed human involvement, including on-site inspection by insurance adjusters. Conversely, the majority of smaller or simpler claims did not justify incurring the cost of traditional processing.

— Santam’s plan was to create a new, multi-track processing channel that would accelerate claims based on their risk levels. As each claim came in, its details would feed into an analytical model that would, in essence, predict whether that claim justified fast-track treatment or should receive the standard, or even a higher, level of scrutiny.

3.1.7.2 It was key to obtain buy-in from the employees as well as executive management to enable them to gauge the value that could be extracted from this exercise. The employees attended a series of workshops on what predictive modelling meant for the company as a whole and for individual employees, to get comfortable with its workings. This provided the team with a strong bottom-up understanding of the details of the business, which proved essential in designing the predictive analytics solution.

3.1.7.3 Management also focused on evaluating which of Santam’s data sources would be suitable to feed into the predictive model. The team then set out to implement the solution. Six months later, with the solution entering production, predictive analytics had moved from being an interesting idea to an important element of Santam’s operating model.

3.1.7.4 From the instant a claim is filed, the pieces of information captured by an insurance company (e.g. time of day, place, age, gender and vehicle make and year) accumulate speedily. By that point, and in real time, Santam’s solution has already begun assembling these pieces into a picture of what it means from a risk perspective. Applying business rules from the expertise of Santam’s claims specialists, the solution calculates each claim’s intrinsic risk, and from that, prescribes one of five courses of action, ranging from immediate payment to the triggering of a fraud investigation.
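As a sketch, such a triage step can be expressed as a risk score built from business rules and mapped onto five courses of action. The rules, weights and field names below are hypothetical placeholders; Santam's actual model is proprietary and not described in the source.

```python
ACTIONS = ["immediate_payment", "fast_track", "standard_processing",
           "enhanced_scrutiny", "fraud_investigation"]

def triage(claim):
    """Score a claim with simple business rules and map it to one of five actions."""
    score = 0
    if claim["amount"] > 50_000:              # large claims carry more risk
        score += 2
    if claim["prior_claims_12m"] >= 2:        # frequent recent claimants
        score += 1
    if claim["days_to_report"] > 30:          # late reporting is a warning sign
        score += 1
    if claim["linked_to_known_network"]:      # ties to a suspicious network
        score += 3
    return ACTIONS[min(score, len(ACTIONS) - 1)]

claim = {"amount": 8_000, "prior_claims_12m": 0,
         "days_to_report": 2, "linked_to_known_network": False}
action = triage(claim)  # a small, promptly reported claim scores 0
```

A production system would replace the hand-written rules with a fitted predictive model, but the mapping from score to processing channel is the same idea.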

3.1.7.5 This solution had an immediate impact on the speed and efficiency of Santam’s claims process. Among those claims deemed the lowest risk, the time to resolution was decreased by more than 90%. Across claims as a whole, the 70% of claims that would ordinarily require as long as five days to settle now took under 48 hours.

3.1.7.6 One month after the solution went live, the analytics engine underlying the claims solution detected the existence of a motor insurance fraud syndicate. By blocking that scheme, and detecting and preventing several others, Santam’s claims department saved the company more than US$2.4 million in fraudulent payments in just the first four months after implementation. A more detailed analysis of claims data enabled Santam to create an optimisation calculation that determined when it made sense to repair a car.

3.1.8 Underwriting
3.1.8.1 During the insurance industry’s developmental phase, experienced underwriters considered each risk at an individual level and quoted each insurance package accordingly, based on reasoning, judgement and historic experience (Minor, unpublished). According to Minor (op. cit.), cognitive computing will allow underwriters to assess the unique risks of each individual customer in a personalised and customer-centric approach. This can all happen based on knowledge of the customer, past experiences and future predictions. Cognitive computing allows insurers to analyse significant amounts of unstructured and structured information in real time (Minor, op. cit.).

3.1.8.2 Each customer has their own unique risk profile and cognitive computing will enable insurers to assess these profiles at a customer level, instead of imputing high level characteristics into a rigidly defined product model for an assessment (Minor, op. cit.).

3.2 Methodology and Tools
3.2.1 According to the research carried out by Novarica in Josefowicz & Diana (op. cit.), based on 86 insurers, the tools used extensively as part of insurers’ Big Data initiatives included the following:
a) 50% of insurers used traditional computing, storage, database and analytics technology;
b) 5% used specialised appliances;
c) 8% used specialised databases/file systems; and
d) 8% used specialised analysis tools.

3.2.2 The Big Data market as measured by Kelly (unpublished) considers vendor revenue derived from sales of related hardware, software and services in the United States. As outlined in Figure 5 below, Big Data-related services revenue made up 40% of the total market, followed by hardware at 38% and software at 22%. He states that this is partly due to the open source nature of much Big Data software and related business models of Big Data vendors, as well as the need for professional services to help organisations identify Big Data uses in their businesses, build solutions and maintain and monitor performance.

3.2.3 Valuable business insight can be derived from Big Data. Based on industry experience, leveraging large volumes of data in a variety of forms is mainly supported by architectures such as MapReduce and Hadoop:
a) MapReduce is a divide-and-conquer-based programming framework used for parallel processing (Van der Lans, 2011).
b) Hadoop is an open source technology stack that implements the concept of MapReduce (Wiley, 2012).

Figure 5 Big Data revenue

3.2.4 MapReduce
3.2.4.1 MapReduce is unknowingly used by millions of people daily when searching the internet using Google (Van der Lans, 2011). He states that the MapReduce concept was introduced in 2004 by two Google engineers and was patented by Google in January 2010.

3.2.4.2 MapReduce is not a programming language but rather a programming model focused on breaking down a task into smaller units and distributing processing over a variety of processors in parallel (Van der Lans, 2011).

3.2.4.3 There are two processing steps (Van der Lans, 2011):
a) Map Key/value pairs are mapped. The key component identifies the information (similar to a column in a relational database). The value component is the actual data associated with the relevant key. The Map function uses the key/value pairs to produce intermediate values together with an output key.
b) Reduce The intermediate values generated in the Map step are combined into final values for the same output key.
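The two steps can be illustrated with the classic word-count example in plain Python. This is a single-process sketch of the programming model; a real MapReduce framework runs the Map calls in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map step: emit intermediate (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce step: combine all intermediate values for the same output key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

documents = ["Big Data analytics", "big data tools"]
# Each map_phase call is independent, which is what makes the model parallelisable.
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(intermediate)
```

Here the output key is the word and the final value is its total count across all documents.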

3.2.5 Hadoop
3.2.5.1 Hadoop was created by Doug Cutting as an implementation of MapReduce (Wiley, op. cit.). According to Wiley (op. cit.), he named it after his son’s toy elephant and handed it over to the Apache Software Foundation.

3.2.5.2 Hadoop is an open source software framework consisting of a variety of products and technologies that can be combined, such as the Hadoop Distributed File System (HDFS), MapReduce, Pig and Hive (Wiley, op. cit.). The Hadoop technologies aim to simplify distributed processing to take advantage of Big Data (Wiley, op. cit.).

3.2.5.3 A typical Hadoop cluster consists of one or more master nodes along with worker nodes (Wiley, op. cit.):
a) Master node Hadoop deployments generally have several master nodes that consist of the following processes:
— JobTracker A process that interacts with applications and distributes MapReduce tasks;
— TaskTracker A process that receives tasks from a JobTracker; and
— NameNode A process that stores the directory of files in HDFS and keeps track of where data is located.
b) DataNodes store and replicate data. DataNodes interact with applications once the NameNode has supplied the DataNode’s address.
c) Worker nodes provide the processing power to analyse Big Data. Each worker node includes a DataNode and a TaskTracker.

3.2.5.4 Hadoop architecture consists of three layers to deliver a MapReduce implementation (Wiley, op. cit.):
a) Application layer This end-user access layer consists of a framework for distributed computing and is the interaction point for all applications into Hadoop.
b) MapReduce workload management layer (JobTracker) This layer deals with scheduling and balancing the workload across the distributed system.
c) Data layer A distributed file system used to store information.
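One concrete way the application layer exposes MapReduce is the Hadoop Streaming utility, where the mapper and reducer are ordinary scripts that read stdin and write tab-separated key/value lines, with Hadoop sorting the map output by key in between. The sketch below simulates that contract locally in Python; it is an illustration of the Streaming convention, not an example from the paper.

```python
from itertools import groupby

def mapper(lines):
    """Map side of a Hadoop Streaming job: emit 'word<TAB>1' per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Reduce side: sum the counts per word; input must be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Locally simulate the shuffle/sort the cluster performs between the two phases.
shuffled = sorted(mapper(["big data", "big analytics"]))
result = list(reducer(shuffled))
```

On a real cluster the same two functions would run as separate processes on many TaskTracker nodes, with HDFS supplying the input splits.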


4. DATA AND INFORMATION GOVERNANCE
4.1 Overview

4.1.1 Data governance describes how an enterprise manages its data assets. Governance includes the rules, policies, procedures, roles and responsibilities that guide the overall management of an enterprise’s data. Governance provides the guidance and a framework to ensure that data are accurate, consistent, complete, available, and secure (PTAC, 2011). Data governance is usually enforced via an executive-level data governance board, committee, or other organisational structure that creates and enforces procedures for the business use and technical management of data across the entire organisation (Russom, unpublished).

4.1.2 According to Shek et al. (unpublished), information governance aims to provide a management framework for an organisation’s information based on its levels of business value and corresponding risks. If applied appropriately, information governance can provide a platform to generate fast, high-quality information to help management make key business decisions (Shek et al., op. cit.). In addition, they state that it can assist customers and regulators with insights into how the data has been collected, used, transferred, stored and destroyed. An example of an Information Governance Maturity model is illustrated in Figure 6, which outlines the link between information accuracy and confidence in an organisation. Companies with higher-level maturity models tend to have clear leadership commitment embedded in their business, which includes common governance processes and structures.

4.1.3 Nadhan (unpublished) draws attention to the fact that, in simple terms, the key difference between data and information governance is that data governance is the tool used to ensure that any form of the organisation’s data is accessible to the right people, at the appropriate time, in the correct format, through the right distribution channels, while information governance applies to the knowledge (or understanding) gathered from this data.

Figure 6 Information governance maturity model

4.1.4 According to Shek et al. (op. cit.), protecting client data and confidentiality under strict personal data protection laws requires that the handling of customer data be transparent and subject to clear accountability. They state that by classifying and identifying critical data components and understanding their location and flow, organisations can have improved control over their information assets. They also highlight that a planned and agreed governance structure around information allows organisations to support their business objectives more effectively and efficiently while meeting regulatory requirements. Strong governance supports brand protection and increased enforcement of policies both internally and with third parties (Shek et al., op. cit.). In a world where data loss is heavily publicised and penalised, getting this right is beneficial from a reputation point of view.

4.1.5 As organisations grow, managing information may become more complex; data security breaches now appear in headline news on a regular basis (Shek et al., op. cit.). Brandon (unpublished) outlines an example where Gidani, the licensed operator of the South African national lottery, almost lost its R400-million-a-year contract due to a data breach. The consequences can present challenges as organisations’ earnings and reputation are impacted, calling for measures to be put in place to protect confidential data (Shek et al., op. cit.). Shek et al. also state that differing regulations regarding national secrets and personal data protection guidelines are becoming more complex. In conjunction with the increased focus on information risk and customer data confidentiality, this implies that information governance must also be discussed at Board meetings.

4.2 Data Governance
4.2.1 Introduction
4.2.1.1 According to Sun (unpublished), data governance comprises various components. However, for the purpose of this paper, we will focus on a few fundamental key areas of data and information governance. Understanding how data and information are governed in an organisation is key to compliance and the market brand (Shek et al., op. cit.).

4.2.2 Data Quality
4.2.2.1 Ensuring the accuracy and completeness of data is fundamental within all departments of a business. Accurate and timely information helps to manage services and accountability, and helps to prioritise the best use of a company’s resources. Data quality is achieved through good leadership and governance, policies, systems and processes, people and skills, and finally the effective use of data (Shek et al., op. cit.).

4.2.2.2 At first glance, the quality of Big Data can be expected to follow the same dimensions as traditional data (Kobielus, unpublished). However, when collecting and using information from social media, the data collected may include inaccurate and exaggerated information (Kobielus, 2013).

4.2.2.3 Kobielus (op. cit.) highlights that appropriate predictive weights could be applied to behavioural models that rely heavily on verbal evidence, such as Tweets, logs of interactions with call-centre agents, and responses to satisfaction surveys.
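One way to realise such weighting, sketched here with invented weights and source names, is to down-weight verbal evidence sources relative to verified transactional data when combining risk signals:

```python
# Hypothetical confidence weights: verbal, self-reported channels count for
# less than verified transactional history. Values are illustrative only.
SOURCE_WEIGHTS = {
    "tweet": 0.3,
    "call_centre_log": 0.5,
    "satisfaction_survey": 0.6,
    "claims_history": 1.0,
}

def weighted_risk(signals):
    """Combine per-source risk signals in [0, 1] into a weighted average."""
    total = sum(SOURCE_WEIGHTS[source] * value for source, value in signals)
    weight = sum(SOURCE_WEIGHTS[source] for source, _ in signals)
    return total / weight if weight else 0.0

# An alarming tweet is tempered by a clean, verified claims history.
score = weighted_risk([("tweet", 0.9), ("claims_history", 0.2)])
```

Because the tweet carries less than a third of the weight of the claims history, the combined score sits much closer to the verified evidence.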

4.2.2.4 Big Data is different compared to traditional data in various dimensions. The following table provides an overview of these dimensions and the different data quality aspects to consider:

Table 1 Big Data versus traditional data

Dimension | Traditional Data Quality | Big Data Quality
Frequency of processing | Batch-oriented | Real-time and batch-oriented
Variety of data | Largely structured | Structured, semi-structured, or unstructured
Confidence levels | High | “Noise” needs to be filtered
Timing of data cleansing | Cleansed prior to loading | Might require streaming, in-memory analytics to cleanse data
Critical data elements | Data quality is assessed for critical data elements such as customer address | Quasi- or ill-defined and subject to further exploration, as critical data elements may change iteratively
Location of analysis | Data moves to the data quality and analytics engines | Data quality and analytical engines may move to the data to increase speed
Stewardship | Manages a high percentage of the data | Manages a smaller percentage due to high volumes and/or velocity

4.2.3 Policies, Standards and Strategy
4.2.3.1 Establishing data management policies, standards and an overall data strategy is the primary step to ensure data integrity, consistency and sharing of the enterprise’s data resources. Data policies describe what to do and what not to do, and are more fundamental than detailed data standards (Mosley et al., 2009). They state that data policies are the high-level or detailed rules and procedures that an enterprise utilises to manage its data assets. Data policies might include adherence of data to business rules, enforcing authentication, database recovery, data retention, access rights to data, compliance with laws and regulations, and protection of data resources. Data standards describe how to do something, and are the precise criteria, specifications and rules for the definition, creation, storage and usage of data within an organisation (Mosley et al., 2009). A data strategy is a high-level plan for a company to maintain and improve data quality, security and access. It includes the business plans and involves strategic decisions which are important for any organisation (Mosley et al., 2009).

4.2.3.2 The data policies, standards and strategy must be clear, timely and regularly updated. They should be easy to understand and readily available to users. Data policies should include an established procedure for resolving conflicting or inconsistent use of data (Sun, op. cit.). He also states that the policies, standards and strategy must be followed and enforced, and that the development of data policies, data standards and data strategy should be reviewed and agreed with all involved or impacted parties.

4.3 Information Governance: The South African Legislation
4.3.1 Introduction
4.3.1.1 Data and information security management is very important to any organisation, and businesses have a responsibility to protect their data (Milner, unpublished). Methods and tools must be put in place to mitigate or minimise threats that may occur (Milner, op. cit.). Some aspects of South African legislation aim to allow customers to gain more control over their own information. Legislation requires companies to notify their customers of what personal customer information is held by the company and what will be done with that information, thus helping companies to be more responsible with the data they obtain from their customers (Milner, op. cit.).

4.3.2 Constitution
4.3.2.1 Section 14 of the South African Constitution guarantees the right to privacy. The Constitution states that “Everyone has the right to privacy, which includes the right not to have the privacy of their communications infringed.” This protects information to the extent that it limits the ability of people, organisations and the government to gain, publish, disclose or use information about others. Privacy and data protection are fundamental areas for consideration when working with customers’ information.

4.3.3 Protection of Personal Information
4.3.3.1 This section is based on information provided in Van Kerckhoven (unpublished). She highlights that personal information from the public is becoming easily accessible and that there are security risks associated with this. Thus, in November 2013 the Protection of Personal Information Act (POPI) was signed into law. Although more business-relevant provisions are yet to be implemented, some sections were implemented in April 2014 involving the establishment of the Information Regulator and the guidelines for the creation of associated data protection regulations. She states that, as a result, businesses currently have no legal obligation to comply with its provisions. However, once the remainder of POPI is implemented, businesses will have a one-year grace period to comply with the conditions for processing personal information. Failure to comply with such provisions may result in regulatory fines for non-compliance, civil liability claims, criminal liability, and financial and reputational loss. International precedent shows that businesses that fail to comply have incurred substantial fines, decreasing profits and drops in share price resulting from a lack of customer confidence and trust following a data breach (Van Kerckhoven, op. cit.).


4.3.3.2 According to Van Kerckhoven (op. cit.), POPI has a broad scope and covers all personal information touch-points in the information lifecycle. Therefore, if an organisation collects, analyses, distributes, transfers, stores or destroys personal information as a public or private entity it will be bound to adhere to the data protection conditions prescribed in POPI.

4.3.3.3 Van Kerckhoven (op. cit.) states that, with the growing utilisation of data modelling and interpretation in the global economy, the implications of POPI for businesses deriving value from “Big Data” are significant. This is due to the following:
a) Big Data is founded on the premise that the more complete and high-quality information a business has about a person and the relevant market, the better able it is to present tailored products and service offerings to that person. POPI’s Processing Limitation condition, section 10, prescribes that businesses only collect the minimum amount of personal information required to perform a business transaction and that the collection of such personal information is not excessive. Therefore, for Big Data to be compliant, businesses would need to ensure that each piece of personal information processed is linked to a legitimate business purpose (as required by the condition for Purpose Specification in section 13 of the POPI Act). Furthermore, as the amount of information collected about a person increases, the likelihood that a business can identify that person increases (i.e. information that originally may not have identified a person may do so when combined with other information). Therefore, businesses engaging in Big Data analysis will need to remain cognisant throughout the processing of information to ensure that the identity of persons cannot be derived therefrom (Van Kerckhoven, op. cit.).

b) Additional conditions with which businesses will be required to comply in terms of POPI are the conditions for Openness, section 17, and Data Subject Participation, sections 23 and 24. These conditions require businesses to ensure that data subjects (persons who are identified by the personal information) are informed and aware of the processing of their personal information and the purpose for which it is being processed. Furthermore, data subjects should be given the opportunity to view and amend their personal information processed by a business. This is particularly relevant to Big Data, where profiles of individuals are generated based on information collected from multiple sources (e.g. personal preferences, online behaviours, social networking sites and commercial transactions). Businesses utilising information, including personal information, for Big Data analytics will need to inform data subjects of such collection, particularly where such personal information is not collected directly from the data subjects themselves (e.g. cookies and marketing databases), and obtain informed consent for the processing of such personal information, in terms of section 11 of POPI, in order to be aligned with the conditions for the lawful processing of personal information (Van Kerckhoven, op. cit.).


c) Section 16 of POPI requires businesses to take reasonable and practical steps to ensure that personal information is complete, accurate, not misleading and up-to-date. Therefore, businesses engaging in Big Data analytics will be required to adhere to an additional standard to ensure that the sources from which they procure personal information are accurate and that the individual profiles generated from such personal information are accurate. Misleading or inaccurate profiles may result in businesses being found to be non-compliant with the conditions of the POPI Act (Van Kerckhoven, op. cit.).

d) Although POPI places conditions on the protection of personal information processed by public and private entities, it is not designed to prevent businesses from leveraging the value and benefit of Big Data. POPI is about ensuring that businesses are responsible in their processing of personal information so that personal information is not abused or disclosed, particularly where such information is sensitive (e.g. information related to health, political affiliation or criminal background). Businesses that implement effective POPI solutions will be able to derive the benefits of enhanced data governance practices, data quality and control, which will enable them to better utilise the information in their care (Van Kerckhoven, op. cit.).
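The minimisation and purpose-linkage conditions above lend themselves to a simple automated check: every personal-information field collected must map to a declared business purpose. The field and purpose names below are hypothetical illustrations, not prescribed by POPI.

```python
# Hypothetical register mapping each personal-information field to its declared
# legitimate business purpose (in the spirit of POPI sections 10 and 13).
DECLARED_PURPOSES = {
    "id_number": "policy underwriting",
    "vehicle_make": "premium rating",
    "claims_history": "risk assessment",
}

def minimisation_violations(collected_fields):
    """Return the collected fields that lack a declared business purpose."""
    return [field for field in collected_fields if field not in DECLARED_PURPOSES]

collected = ["id_number", "vehicle_make", "social_media_handle"]
violations = minimisation_violations(collected)  # flags the unlinked field
```

Running such a check against each data source before ingestion gives a concrete, auditable way to demonstrate that collection is not excessive.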

4.3.4 Whose Problem is It?
4.3.4.1 Historically, establishing an information management framework could have been considered an Information Technology (IT) challenge (Shek et al., op. cit.). The chief information officers were expected to deliver the appropriate technology to support critical data reporting, while the chief information security officers, who are mostly aligned with IT functions, were expected to protect it (Shek et al., op. cit.). As the regulation of data usage has improved over recent years, the approach to how information is managed has been forced to change, so that a cross-functional approach to governance is now required (Shek et al., op. cit.). The business as a whole needs to be involved, as the entire business takes ownership of the data (Shek et al., op. cit.).

4.3.4.2 According to Shek et al. (op. cit.), inadequate control over information can directly impact the organisation’s reputation and breaching any regulation or legislation can result in possible personal liability for management. However, organisations find it a challenge to obtain the necessary information to help them decide on what action to take to manage this exposure in a cost-effective way (Shek et al., op. cit.).

4.3.4.3 According to Shek et al. (op. cit.), for organisations to consider all aspects of information governance, they now need input from experts in the various areas of information use and protection across business units within the organisation. This includes Privacy (Legal, Human Resources), IT Security (Technology, Business Continuity), Management Reporting and Analytics (Business Units), Operations (Chief Operating Officer), and Forensics and Regulatory Compliance (Risk and Compliance). Organisations often find it difficult to collaborate across various teams and need to appropriately manage and tackle this broad information management challenge (Shek et al., op. cit.). Although this type of silo mentality across the business increases cost, it is often considered acceptable so long as the business operates effectively, produces reports and protects data in systems (Shek et al., op. cit.). Information governance can provide standardisation and consistency across teams, help with cost control and help organisations react effectively to reporting and regulatory requirements (Shek et al., op. cit.).

4.3.5 Key Questions Insurers are Asking
4.3.5.1 Based on the information gathered by KPMG, the following questions are currently being asked by insurers with regard to data governance:

(a) Does the enterprise have a Data Governance organisation? Is the function across the enterprise or only in some business units? Does Data Governance involve both the business units and the IT group? How mature is the function?

(b) Is there senior management support for Data Governance? Does it reach C-level executives (CEO, COO, and CFO)? Is there a senior-level Executive Sponsor (or sponsors) from the business organisation? Are they actively involved in the Data Governance process? Is there a Steering Committee (or council) including sponsors and senior management?

(c) Is there a central person and/or group responsible for the oversight of all Data Governance? Is there a Data Czar (or similar role) with clear responsibility? Do they have direct responsibility or is it a dotted line? Does the central person or group have authority over localised Data Governance groups?

(d) Is the Data Governance process communicated to the organisation? Is the information about how the data is defined and used readily available? Are Data Governance meeting minutes (such as from the Steering Committee) and decisions published? Is the information well disseminated throughout the enterprise? Do the Business Units know who their representatives are?

(e) Is there a Data Governance conflict resolution process? Is there an escalation process? Do the Data Governance leaders and Steering Committee have the authority to resolve business issues, approve projects, and settle disputes?

(f) Do Data Governance metrics exist in the enterprise? Are the metrics measured regularly and used to drive data improvement efforts? Are there incentives in place to achieve the metrics? Are the results published? Do the metrics include Data Quality measures such as accuracy, completeness, and consistency?

(g) Do the following roles exist: Executive Sponsors, Steering Committee (or Data Governance council), Data Czar (or similar), Data Owners, and Data Stewards? Are the roles clearly defined? Are the roles part of the people’s job descriptions and compensation? (Note: the roles may have a variety of titles in different companies but should have the same basic responsibilities.)

(h) Does Data Governance cover all geographies of the enterprise and are they involved in the Data Governance process? Has the Data Governance function adapted (or can it be adapted) in the case of an acquisition, divestiture, reorganisation, or outsourcing arrangement?
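The Data Quality measures named in question (f) above — completeness, accuracy and consistency — can be expressed as simple calculations over a dataset. The following Python sketch is purely illustrative: the record layout, field names and business rule are assumptions for the example, not drawn from any insurer's data.

```python
# Toy policy extract with a missing premium, a negative premium and a
# duplicated policy number, to exercise each measure.
from datetime import date

policies = [
    {"policy_id": "P001", "premium": 1200.0, "start_date": date(2014, 1, 1)},
    {"policy_id": "P002", "premium": None,   "start_date": date(2014, 3, 1)},
    {"policy_id": "P002", "premium": -50.0,  "start_date": date(2014, 3, 1)},
]

def completeness(records, field):
    """Share of records in which the field is populated."""
    return sum(r[field] is not None for r in records) / len(records)

def accuracy(records, field, rule):
    """Share of populated values that satisfy a business rule."""
    values = [r[field] for r in records if r[field] is not None]
    return sum(rule(v) for v in values) / len(values)

def consistency(records, key):
    """Share of records whose key value is not duplicated."""
    keys = [r[key] for r in records]
    return sum(keys.count(k) == 1 for k in keys) / len(keys)

print(round(completeness(policies, "premium"), 2))               # 0.67
print(round(accuracy(policies, "premium", lambda v: v > 0), 2))  # 0.5
print(round(consistency(policies, "policy_id"), 2))              # 0.33
```

Publishing such measures regularly, as question (f) suggests, is what turns them from one-off checks into governance metrics.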

4.3.5.2 Based on the information gathered by KPMG, the following questions are currently being asked by insurers with regard to data policies and standards:

(a) Do a set of Data Policies exist? Are the Data Policies aligned with the enterprise business objectives? Are the Data Policies up-to-date and are they updated regularly? Do the Data Policies cover the entire enterprise? Are the Data Policies used throughout the enterprise?

(b) Do a set of Data Standards exist? Are the Data Standards up-to-date and are they updated regularly? Do the Data Standards cover the entire enterprise? Are the Data Standards used throughout the enterprise?

(c) How complete and comprehensive are the Data Policies? Have they been developed and endorsed by key impacted parties including the Data Governance team, Data Producers and Data Consumers? Do they include policies for key areas such as: minimising duplication of data; usage of a System of Record; management and distribution of master data; responsible usage of data; usage of a data dictionary and/or metadata library; data retention and archival; legal and regulatory compliance; and access and security?

(d) How complete and comprehensive are the Data Standards? Have they been developed and endorsed by the Data Governance team, Data Modelers, and Data Architects? Do they include standards for key areas such as: naming standards; abbreviations; normalisation; usage of time and date stamp; data formats; etc.?

(e) Is there a clear and consistent process to distribute and disseminate the Data Policies and Data Standards? Does the process also apply to changes and updates? Are they available via company portal or intranet?

(f) Are the Data Policies and Data Standards easy to understand? Are they written in plain language without a lot of technical jargon? Do the business users understand the Data Policies and Data Standards? Do they know who to call with issues and questions?

(g) Is the process to develop and change Data Policies and Data Standards clearly defined? Is there a controlled process as to who can update the Data Policies and Data Standards? Are approval and escalation processes defined? Are all involved parties part of the development and change process?

(h) What is done to enforce the use of Data Policies and Data Standards? How is Data Policy and Data Standards usage known or measured? Is there ongoing monitoring and auditing of the use of Data Policies and Standards?
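Parts of the Data Standards listed in question (d) above, such as naming standards and an agreed time and date stamp format, lend themselves to automated checks. The sketch below is illustrative only; the lower_snake_case convention, the ISO date format and the example column names are assumptions, not a prescribed standard.

```python
import re
from datetime import datetime

NAMING_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")  # assumed lower_snake_case standard
DATE_FORMAT = "%Y-%m-%d"                           # assumed agreed date stamp format

def check_column_name(name):
    """Return a list of naming-standard violations for one column name."""
    issues = []
    if not NAMING_PATTERN.match(name):
        issues.append("not lower_snake_case")
    return issues

def check_date_value(value):
    """Check whether a stored date string matches the agreed format."""
    try:
        datetime.strptime(value, DATE_FORMAT)
        return True
    except ValueError:
        return False

print(check_column_name("ClaimAmt"))   # ['not lower_snake_case']
print(check_column_name("claim_amt"))  # []
print(check_date_value("2014-10-22"))  # True
print(check_date_value("22/10/2014"))  # False
```

Running checks of this kind in a scheduled job is one practical answer to question (h): it makes standards usage measurable rather than a matter of assertion.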


4.3.5.3 Based on the information gathered by KPMG, the following questions are currently being asked by insurers with regard to information governance:

(a) Do I know what information is most valuable to my business? Do I know where it is?

(b) Do my employees have access to information they shouldn’t? Do they know how to handle, label, protect, and transmit restricted or confidential information?

(c) Have I classified what information should be considered confidential and should have restricted access?

(d) How is my information being passed throughout the organisation and to my external contacts? Is it secure?

(e) Am I compliant with all applicable regulatory and legal requirements related to my industry?

(f) Are my information records destroyed at the appropriate time, or are they destroyed too early, too late, or never?

(g) Am I prepared to deal with the media and manage the legal process if I have a breach?

(h) Are there unnecessary information and records handling and management processes that could be eliminated to cut costs?

(i) Do my third-party and vendor contracts consider the privacy and security of my information important? How are they protecting it?

(j) How do I get a handle on the challenges associated with Information Governance?
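Question (f) above, on whether records are destroyed at the appropriate time, is one of the few on this list that can be checked mechanically against a retention schedule. The sketch below is a toy illustration: the seven-year retention period, the record layout and the reference date are assumptions for the example, not a legal or regulatory recommendation.

```python
from datetime import date

RETENTION_YEARS = 7  # assumed retention period for the example

def retention_status(created, destroyed=None, today=date(2014, 10, 22)):
    """Classify a record against the retention schedule."""
    due = date(created.year + RETENTION_YEARS, created.month, created.day)
    if destroyed is None:
        return "overdue for destruction" if today > due else "in retention"
    return "destroyed too early" if destroyed < due else "destroyed on time"

print(retention_status(date(2005, 1, 1)))                    # overdue for destruction
print(retention_status(date(2010, 6, 1)))                    # in retention
print(retention_status(date(2001, 1, 1), date(2009, 1, 1)))  # destroyed on time
print(retention_status(date(2006, 1, 1), date(2010, 1, 1)))  # destroyed too early
```

In practice retention periods differ by record class and jurisdiction, so a real implementation would look the period up per record type rather than hard-coding a single constant.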

5. POSSIBLE OUTLOOK ON THE FUTURE
5.1 Kelly (op. cit.) expects that as the market matures through 2017 and beyond, Big Data applications and cloud-based services will play an increasingly important role. This is illustrated in Figure 7. As the underlying infrastructure develops, he believes that organisations will look to service providers to deliver applications and services that link to Big Data infrastructure and target specific, practical business decisions and challenges.

5.2 The following areas have been noted as potential Big Data and analytics development areas over the next few years (Schiller, 2014):
a) An increased level of merging of internal and external data, as well as structured and unstructured data;
b) Further applications of social and digital media;
c) More companies will start to create technologies that can handle and process different types of data. This will lead to better targeting of customers, better understanding of customer behaviour and risk management processes, as well as continuous feedback into the underwriting, pricing and claims processes;
d) There may be an integration and further development around telematics data;


e) More complex statistical methods may be introduced, with further developed forecasting, machine-learning and data-mining techniques. Self-learning systems may also be developed to handle the vast amounts of data; and

f) The insurance industry may start moving into new products that cover more complex, multiple risks.

Figure 7 Big Data market forecast


BIBLIOGRAPHY

Akabor, N (unpublished). Fitbit now available in South Africa. Nafisa, www.nafisa.co.za/fitbit-now-available-in-south-africa/, 21/07/2014
Barnes, S (unpublished). Predictive Modeling. Casualty Actuarial Society, www.casact.org/newsletter/index.cfm?fa=viewart&id=6476, 18/7/2014
Batty, M, et al. (unpublished). Predictive Modelling for Life Insurance, www.google.co.za/?gws_rd=ssl#q=Predictive+modelling+can+be+defined+as+the+analysis+of+large+data+sets+to+make+inferences+or+identify+meaningful+relationships%2C+and+the+use+of+these+relationships+to+better+predict+future+events+, 22/06/2014
Brandon (unpublished). How data breach nearly cost Lotteries operator its license to trade. Cibecs, http://cibecs.com/blog/2012/03/26/how-data-breach-nearly-cost-gidani-400-million/, 15/7/2014
Breading, M & Smallwood, D (unpublished). What does Big Data really Mean for Insurers? SAS, www.sas.com/resources/whitepaper/wp_49547.pdf, 25/6/2014
Chandler, N (unpublished). How FitBit Works. Electronics – How Stuff Works, http://electronics.howstuffworks.com/gadgets/other-gadgets/fitbit.htm, 20/7/2014
Coops, A (unpublished). Big Data and why it matters. KPMG, www.kpmg.com/au/en/beyond/new-thinking/pages/big-data.aspx, 21/7/2014
Dave, P (unpublished). Big Data – What is Big Data – 3 Vs of Big Data – Volume, Velocity and Variety – Day 2 of 21. SQL Authority, http://blog.sqlauthority.com/2013/10/02/big-data-what-is-big-data-3-vs-of-big-data-volume-velocity-and-variety-day-2-of-21/, 4/7/2014
Essadiq, R (unpublished). Will Technology Like Fitbits Change Big Data In Healthcare? NTC Healthcare, www.ntctexas.com/healthcare/h-blog/bid/71824/Will-Technology-Like-Fitbits-Change-Big-Data-In-Healthcare, 19/07/2014
Gordon, J & Spillecke, D (unpublished). Big Data, Analytics And The Future Of Marketing And Sales. Forbes, www.forbes.com/sites/mckinsey/2013/07/22/big-data-analytics-and-the-future-of-marketing-sales/, 21/7/2014
Holiday, L (unpublished). Insurance telematics: early adopters need to know the facts, www.fleetnews.co.uk/blog/entry/insurance-telematics-early-adopters-need-to-know-the-facts/41714/, 15/06/2014
IBM (unpublished a). Santam Insurance. IBM, www.ibm.com/smarterplanet/global/files/us__en_us__leadership__santam_casestudy_final.pdf/, 16/7/2014
IBM (unpublished b). Insurance in the age of analytics. IBM, www-935.ibm.com/services/multimedia/Insuranceintheageofanalytics.pdf, 18/06/2014
IBM White paper (2012). Insurance in the age of analytics
IBM White paper (2013). Harnessing the power of big data
Jennings (unpublished). Big Data Overhyped? “No,” Say Actual Scientists, www.forbes.com/sites/netapp/2014/08/19/big-data-gartner-hype-cycle-otoh/, 25/08/2014
Josefowicz, M & Diana, F (unpublished). Big data and Analytics in Insurance. Reactions, www.reactionsnet.com/pdf/reactionsbigdatawebcastaug9.pdf, 9/7/2014
Kelly (unpublished). Big Data Vendor Revenue and Market Forecast 2013–2017, http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017, 24/08/2014


Khaliq, S (unpublished). Helping the World’s Farmers Adapt to Climate Change, http://strataconf.com/stratany2012/public/schedule/detail/25140, 12/7/2014
Kobielus (unpublished). Social Data Quality Will Take Back Seat to Data Relevance. Dataversity, www.dataversity.net/social-data-quality-will-take-back-seat-to-data-relevance/, 23/06/2014
Kuketz, D (unpublished). The 7 Biggest Benefits from Big Data. www.utopiainc.com/insights/blog/381-7-biggest-business-benefits-from-big-data, 17/07/2014
Laney, D (unpublished). Big data defined, www.sas.com/en_us/insights/big-data/what-is-big-data.html, 31/05/2014
Livingstone, R (unpublished). The 7 Vs of Big Data. Rob Livingstone Advisory, http://rob-livingstone.com/2013/06/big-data-or-black-hole/, 4/7/2014
Maremont, M & Scism, L (unpublished). Insurers Test Data Profiles to Identify Risky Clients, http://online.wsj.com/news/articles/SB10001424052748704648604575620750998072986?mod=WSJ_hp_MIDDLETopStories&mg=reno64-wsj&url=http%3A%2F%2Fonline.wsj.com%2Farticle%2FSB10001424052748704648604575620750998072986.html%3Fmod%3DWSJ_hp_MIDDLETopStories, 16/7/2014
Milner, JJ (unpublished). 7 things you need to know about South Africa’s new data protection laws. Memeburn, http://memeburn.com/2013/06/7-things-you-need-to-know-about-south-africas-new-data-protection-laws/, 15/7/2014
Minor, K (unpublished). How Big Data and Cognitive Computing are Transforming Insurance: Part 2. IBM, www.ibmbigdatahub.com/blog/how-big-data-and-cognitive-computing-are-transforming-insurance-part-2, 14/7/2014
Mosley, M, Brackett, M, Earley, S & Henderson, D (2009). The DAMA Guide to the Data Management Body of Knowledge. Technics Publications, Bradley Beach
Nadhan, EG (unpublished). Information Governance is more than just Data Governance. Enterprise CIO Forum, www.enterprisecioforum.com/en/blogs/enadhan/information-governance-more-just-data-go, 15/7/2014
Normandeau, K (unpublished). Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity. Inside Big Data, http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/, 4/7/2014
Nosta, J (unpublished). Digital Health In 2014: The Imperative Of Connectivity. Forbes, www.forbes.com/sites/johnnosta/2014/01/02/digital-health-in-2014-the-imperative-of-connectivity/, 22/7/2014
Oxford English Dictionary (2013)
Press, G (unpublished). A Very Short History of Big Data. Winshuttle, www.winshuttle.com/big-data-timeline/, 21/7/2014
Privacy Technical Assistance Centre (PTAC) (2011). Data Governance and Stewardship, http://ptac.ed.gov/sites/default/files/issue-brief-data-governance-and-stewardship.pdf, 23/06/2014
Russom, P (unpublished). Data Governance Strategies, www.pmi.it/file/whitepaper/000373.pdf, 21/06/2014
SAS (unpublished). Big Data – What it is and why it matters. SAS, www.sas.com/en_us/insights/big-data/what-is-big-data.html, 4/7/2014
Schiller, B (2014). Expert Insight on the Evolving Role of Analytics in Insurance


Schroeck, M & Shockley, R (unpublished). Analytics: Real-world use of big data in insurance. IBM, www-935.ibm.com/services/us/gbs/thoughtleadership/big-data-insurance/, 14/7/2014
Shek, H, et al. (unpublished). Information Governance – The growing complexity. KPMG, www.kpmg.com/CN/en/IssuesAndInsights/ArticlesPublications/Documents/Information-Governance-201211.pdf, 15/7/2014
Sherer, L, Brown, B & Grepin, L (unpublished). Unleashing the value of advanced analytics in insurance. McKinsey, http://solutions.mckinsey.com/Index/media/62687/Unleashing_the_value_of_advanced_analytics_in_insurance.pdf, 16/7/2014
Smyth, P, Pednault, E, Liu, B & Apte, C (2002). Business Applications of Data Mining. Communications of the ACM
Sorge, L (unpublished). Big data and its relevance to enterprise quality management. Sword Achiever, www.sword-achiever.com/blog/Big-Data-And-Its-Relevance-To-Enterprise-Quality-Management, 1/6/2014
South African Protection of Personal Information Bill (2009). Protection of Personal Information. Justice 1, 1–52
StackIQ White Paper (unpublished). Capitalising on Big Data Analytics for the Insurance Industry. http://cdn2.hubspot.net/hub/173001/file-18488782-pdf/docs/stackiq_insuranceind_wpp_f.pdf, 24/06/2014
Sun, H (unpublished). Enterprise Information Management: Best Practices in Data Governance. Oracle, www.oracle.com/technetwork/articles/entarch/oea-best-practices-data-gov-400760.pdf, 14/7/2014
Van Der Lans, R (2011). Using SQL-MapReduce for Advanced Analytical Queries, 2nd edition
Van Kerckhoven, A (unpublished). POPI. KPMG, https://www.kpmg.co.za/, 23/6/2014
Voelker, M (unpublished). Telematics: Trouble or Tipping Point for Midsized Carriers? www.propertycasualty360.com/2013/04/04/telematics-trouble-or-tipping-point-for-midsized-c?t=tech-management&page=1, 0/7/204
Wikipedia (unpublished). http://en.wikipedia.org/wiki/Big_data, 22/05/2014
Wiley, J (2012). Hadoop for Dummies, special edition