[ieee 2008 annual reliability and maintainability symposium - las vegas, nv, usa...


Page 1: [IEEE 2008 Annual Reliability and Maintainability Symposium - Las Vegas, NV, USA (2008.01.28-2008.01.31)] 2008 Annual Reliability and Maintainability Symposium - Efficiently represent

1-4244-1461-X/08/$25.00 ©2008 IEEE

Efficiently represent diverse System Field Usage in Reliability Testing

Peter J.M. Sonnemans, PhD, Eindhoven University of Technology
Aravindan Balasubramanian, MSc, Eindhoven University of Technology
Kostas Kevrekidis, MSc, Eindhoven University of Technology
Martin J. Newby, PhD, City University London

Key Words: reliability, testing, system use, operational profile, clustering, grouping, field performance

SUMMARY & CONCLUSIONS

This paper addresses the problem of how to represent diverse field usage of professional systems efficiently, so that field usage can be incorporated in reliability tests. By diverse we mean the variability in system use in the field.

Operational Profiles are constructed from system field data to represent system field use. A clustering technique is introduced and applied strategically to reduce the diversity in the descriptions of system use in the field. In this way the testing effort could be reduced by a factor of 87 while maintaining 70 % similarity with the original system field data.

1 INTRODUCTION

System reliability tests have to be set up in such a way that they represent actual field use as closely as possible; otherwise the outcomes say nothing about the reliability that will be experienced in the field.

Nowadays, professional systems such as high-volume copiers, automotive equipment and medical scanners are sold all over the world and used in very different ways. For a manufacturer it is impossible to test the systems in corresponding detail because of natural business restrictions, mainly budget and time.

The question is then how to efficiently represent diverse field usage in the reliability testing of such professional systems. This paper demonstrates a method for doing this efficiently for a fleet of medical scanning systems used in health care environments.

The system to be tested according to real-life field use is a medical cardio-/vascular scanning system, as can be seen in Figure 1, used for diagnostic and interventional purposes in hospitals, e.g. for catheterization.

The patient, lying on the table, is scanned by an X-ray tube/detector combination that is positioned accurately according to a scanning plan. Scanning data are retrieved and processed, and the resulting images are displayed on monitors for the medical specialist in real time and stored for later examination.

In the second section, diversity in system field usage is described based on available field data. In the third section a generally applicable method is presented to systematically reduce diversity. The method is applied multiple times to the field data to achieve the desired reduction in diversity of the observed data. This strategy is explained in section 4 and applied in section 5, where the results are also presented. In the last section some conclusions are summarized and discussed.

Figure 1 – System to be tested

2 DESCRIBING SYSTEM USE

These systems record a large amount of internal condition data (besides the patient scanning data) for system condition monitoring, error recovery, diagnostic, maintenance and service purposes. These data contain a lot of information about how the specific system is being used, i.e. not only when the system is being operated but also its condition during use. These logging data are the original source we used as the starting point to analyze the actual system use in practice. The system loggings were extracted from the participating hospitals via a remote service network that is operated for service purposes.

In order to analyze system use, we first have to characterize or describe the concept of 'use'. The field of Software Reliability Engineering (SRE) is very helpful in this. SRE is a practice to guide reliability testing in a software-based environment [1, 2] based on a user-oriented approach. It describes a practice and accompanying techniques to plan and guide software development and software testing focused on reliability and is standardized by the IEEE [3]. The prominent technique to incorporate the user perspective here is the use of so-called Operational Profiles (OP) to describe user behavior.

An OP is a pre-defined set of operations that a system can perform and their associated probabilities of occurrence [4, 5]. An operation is a major logical task (or function) of short duration. Although OPs originated in the software-only industry, they are also applicable in situations with hardware interactions [1, 6]. OPs, which we will use to describe the concept 'use', can be represented in tabular form or in a histogram plot showing operations versus their probabilities of occurrence.

Our original data source, a system logging, consists of a long sequence of encrypted technical codes representing the status of the system in time and as such is not very helpful. Such a logging has to be "translated" or converted into an operational profile in order to be useful.

The first step is to determine a list of operations for which an occurrence frequency will be determined and that provides a good overview of (different) system use. A typical operations list contains between 20 and several hundred operations [2]. The operations list was made in co-operation with representatives of different departments to ensure that it captures the complete spectrum of different system use; it contains 98 different operations. We will use this number of operations as an indication of the required test effort for each system.

As a second step, the codes used in the logfiles that indicate the start or the end of an operation were identified, along with additional helpful codes, such as the equipment parameter settings that were used. For the purpose of reliability testing, the "sojourn time" for each operation was also deduced from the logging data. Finally, a software application was developed to 'convert' each system's logfile into a corresponding operational profile, based on these 'start' and 'end' codes.
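The conversion step can be illustrated with a minimal Python sketch. The log codes, the `START_CODES` table and the operation names below are hypothetical, since the actual logfile codes and the conversion software are not described here. The sketch only counts operation starts and normalizes the counts to probabilities of occurrence; it does not deduce sojourn times.

```python
from collections import Counter

# Hypothetical mapping from 'start' codes in a logfile to operation names.
START_CODES = {
    "A01": "set_scan_parameters",
    "M07": "move_table",
    "V12": "view_image",
}

def operational_profile(log_codes):
    """Count operation starts and normalize the counts to probabilities."""
    counts = Counter(START_CODES[c] for c in log_codes if c in START_CODES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}

# Made-up log fragment; codes that are not start codes (here "X99") are ignored.
log = ["A01", "X99", "M07", "M07", "V12", "X99", "M07"]
profile = operational_profile(log)
for op, p in sorted(profile.items()):
    print(f"{op}: {p:.2f}")
```

The resulting dictionary is the tabular form of an operational profile: one probability per operation, summing to one.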

In this way system use was successfully represented by means of operational profiles. However, since each individual system resulted in a unique operational profile, nothing had yet been gained with respect to efficiency. To gain efficiency, the diversity in the system usage descriptions has to be reduced. Clustering techniques, described in the next section, are very helpful in reducing diversity in an objective way.

3 REDUCING DIVERSITY

So-called clustering techniques are very helpful in making combinations in an objective way based on similarity [7, 8, 9 and 10].

A clustering technique, in general, classifies 'cases' into clusters. The aim is to establish a set of clusters such that the 'cases' within a cluster are more 'similar' to each other than they are to 'cases' in other clusters.

To measure the 'similarity' between 'cases' we have to define a distance measure that enables the comparison of all relevant parameters of the 'cases' simultaneously. For this purpose the Euclidean distance $d(a,b)$ was used, i.e. the distance between two points $a$ and $b$ positioned in a Euclidean space whose dimension is the number of relevant parameters. An example based on two parameters $(p, q)$ is given in Figure 2.

Figure 2 – Euclidean distance: $d(a,b) = \sqrt{(p_b - p_a)^2 + (q_b - q_a)^2}$

Subsequently, a distance matrix is constructed, containing the distances between all 'cases'. The two 'cases' with the smallest distance are combined into a cluster (having maximum similarity).
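The construction of the distance matrix and the selection of the first pair to be combined can be sketched as follows. This is a minimal illustration with three made-up two-parameter 'cases' named `a`, `b` and `c`; the values are not taken from the field data.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two points with the same number of parameters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up 'cases', each described by two parameters (p, q).
cases = {"a": (1.0, 2.0), "b": (4.0, 6.0), "c": (1.5, 2.5)}

# Distance matrix stored as a dictionary keyed by unordered pairs of case names.
dist = {
    frozenset(pair): euclidean(cases[pair[0]], cases[pair[1]])
    for pair in combinations(cases, 2)
}

# The pair with the smallest distance is combined into the first cluster.
closest = min(dist, key=dist.get)
print(sorted(closest), round(dist[closest], 3))
```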

The next step is to choose a so-called 'amalgamation rule' [9, 10] that enables the calculation of the distance between a cluster and a 'case' or between two clusters. Since there are no specific reasons to prefer one parameter above another, we applied the so-called 'average-linkage' rule, which calculates the distance as the average distance between all pairs of 'cases' in two different clusters. The two clusters with the smallest (average) distance, and therefore maximum similarity, are then agglomerated and the (average) distances between the remaining clusters are calculated again. We calculate the 'similarity' $s(i,j)$ between two 'cases' $i$ and $j$ as a value complementary to the distance concept, e.g.

$$s(i,j) = \left[ 1 - \frac{d(i,j)}{d_{max}} \right] \qquad (1)$$

where $d_{max}$ is defined as the maximum distance between the cases in the original distance matrix. Other alternatives for defining the similarity are possible.

This process is continued till only one single cluster remains, containing all original 'cases'.

During this clustering process the number of clusters reduces, which is the purpose of the clustering, but at the cost of reducing similarity, i.e. losing accuracy in describing the original set of 'cases'. In practice the 'optimum' number of clusters can be identified in a graph plotting the average similarity within clusters versus the (reducing) number of clusters. An example can be seen in Figure 3. A sudden drop of the average similarity, the knee of the curve, indicates the 'optimum' number of clusters. Continuing to cluster beyond the knee results in clusters that have no similarity left and thereby lose their commonality as a cluster. Stopping before the knee leaves too many clusters.
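The complete clustering process can be sketched in a few lines of plain Python. The sketch below is an illustration under stated assumptions, not the software used for the analysis: the six points are made up, the similarity of each merge is obtained by applying equation (1) to the average-linkage distance of the two merged clusters, and the stopping point is found by a fixed similarity `threshold` rather than by inspecting the curve visually.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points):
    """Agglomerate until one cluster remains.

    Returns one (clusters_left, merge_distance) tuple per merge step.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average distance over all pairs of cases in the two clusters.
            c1, c2 = clusters[pair[0]], clusters[pair[1]]
            total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
            return total / (len(c1) * len(c2))
        best = min(combinations(range(len(clusters)), 2), key=linkage)
        distance = linkage(best)
        merged = clusters[best[0]] + clusters[best[1]]
        clusters = [c for k, c in enumerate(clusters) if k not in best] + [merged]
        merges.append((len(clusters), distance))
    return merges

# Made-up cases forming three well-separated groups of two points each.
points = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 10), (1, 10)]
d_max = max(euclidean(a, b) for a, b in combinations(points, 2))

# Similarity per merge step, using s = 1 - d / d_max.
curve = [(n, 1 - d / d_max) for n, d in average_linkage(points)]

# Stop at the last merge whose similarity is still at or above the threshold.
threshold = 0.7
n_clusters = min([len(points)] + [n for n, s in curve if s >= threshold])
for n, s in curve:
    print(n, round(s, 3))
print("clusters kept:", n_clusters)
```

For these points the similarity stays above 0.9 for the first three merges and then drops sharply, so the sketch keeps three clusters. Libraries such as SciPy offer the same average-linkage procedure; the plain-Python version is used here only to keep every step visible.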

4 EFFICIENCY THROUGH STRATEGIC CLUSTERING

At this moment, a straightforward way to group the systems would be to cluster them based on their individual OPs. Certainly, a reduction in use-description diversity could be achieved; however, since these OPs are quite large (an OP contains about 100 operations), the reduction obtained with such a 'black-box' clustering strategy can be expected to be limited. However, experts in the company know that the systems are used in a limited number of medical application areas, namely Cardiology, Neurology and Radiology.

Furthermore, these experts also claim that during a normal session, the working process executed by every system consists of groups of operations, so-called 'working areas' (WA). Depending on the medical specialist using the system, some working areas are used more intensively than others, but normally all WAs are used in each session. The five distinguished WAs are:

• Acquisition: concerns the operations for setting the scanning parameters and for the actual acquisition of the scanning data.

• Workflow: concerns operations regarding interactions with other systems and the hospital network, as well as operations using the archiving and querying functions of the system. Printing and patient administration also belong to this area.

• Movement: includes the functions for correct movement and positioning of the system during an examination.

• Viewing: concerns the operations for processing the scanning data and for viewing and analyzing the images for diagnostic purposes.

• General: the remaining operations that do not belong to any of the other working areas.

As a matter of fact, substantial parts of the operational system development process of the manufacturer are organized based on these working areas.

This expert knowledge is exploited in a clustering strategy in order to gain efficiency. It was decided to cluster the systems into so-called 'User Groups' (UG), based on their occupancy rate in each of the three application areas, i.e. Cardiology, Neurology and Radiology.

Furthermore, if the experts' claim regarding similarity among User Groups in certain Working Areas is right, further efficiency could be gained by exploiting this similarity through a further reduction of the diversity in the OPs.

Making this latter efficiency gain possible requires describing system use for each WA separately. This requirement led to the decision to split the original OPs into smaller pieces corresponding to each WA, resulting in so-called Partial Operational Profiles (POPs). Given an OP for each system, a corresponding POP was constructed for each WA (containing only that part of the original OP that describes the specific Working Area).
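Splitting an OP into POPs can be sketched as follows. The operation names and the `WORKING_AREA` mapping are hypothetical. The sketch also renormalizes the probabilities within each working area so that every POP sums to one; this renormalization is an assumption made for the illustration and is not stated in the text.

```python
# Hypothetical assignment of operations to working areas.
WORKING_AREA = {
    "set_kv": "Acquisition",
    "start_run": "Acquisition",
    "move_table": "Movement",
    "zoom": "Viewing",
}

def partial_profiles(op):
    """Split an operational profile into one partial profile per working area."""
    grouped = {}
    for operation, p in op.items():
        grouped.setdefault(WORKING_AREA[operation], {})[operation] = p
    # Assumption: renormalize so that each POP is a probability distribution.
    return {
        wa: {o: p / sum(pop.values()) for o, p in pop.items()}
        for wa, pop in grouped.items()
    }

# Made-up operational profile of one system.
op = {"set_kv": 0.1, "start_run": 0.3, "move_table": 0.4, "zoom": 0.2}
pops = partial_profiles(op)
for wa, pop in pops.items():
    print(wa, pop)
```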

Choosing this clustering strategy could result in two possible efficiency gains. In the next section the results of the actual data analysis according to this approach will be presented.

5 ANALYSIS OF FIELD DATA AND RESULTS

Logging data of 121 (medical) systems in use at different hospitals around the globe were gathered and analyzed. Since these systems are used intensively, gathering data for a period of two months was sufficient to give a representative view of the systems' use in the field. Since there are 98 operations to be attended to for each individual system, this implies 121 × 98 = 11,858 operations as a measure of usage diversity, which is related to the effort necessary to test the systems according to their actual usage in the field.

From the logging data the Operational Profiles were constructed for each system, and the clustering technique was applied to cluster the systems into User Groups, based on their occupancy rate in the three different medical application areas, at the cost of decreasing similarity. Figure 3 shows the similarity versus the (decreasing) number of clusters. It can clearly be seen that the similarity remains high during the beginning of the clustering process, meaning that the growing clusters preserve a high similarity for quite a while. At the end there is a sudden drop in similarity, and according to clustering theory this is the moment to consider stopping the clustering process, since beyond it the resulting clusters lose too much similarity and therefore become meaningless.

Figure 3 – Similarity vs. (decreasing) nr. of clusters

In our case we decided that 4 clusters was an acceptable number without too much loss of accuracy in describing the (diverse) usage of the original unclustered systems (about 70 % similarity). Stopping the clustering process around the 70 % similarity level results in 4 clusters (indicated by the 4th-last dot lying above the 0.7 level on the vertical similarity scale).

In Figure 4 a commonly used so-called 'dendrogram' is depicted, illustrating the entire clustering process with U-shaped lines connecting the clusters in a hierarchical tree. In a dendrogram, the horizontal axis at the top represents system numbers and the vertical axis represents the similarity of the clusters. The dendrogram starts at the top with all 121 distinct systems (at the maximum similarity level). In each clustering step the number of clusters reduces, because two systems or clusters of systems are agglomerated into one single new cluster, indicated by a U-shaped line connecting them. Finally the tree reduces to one single cluster (at the minimum similarity level). It can be seen on the vertical axis of Figure 4 that the 70 % similarity level we chose corresponds to the 4 clusters (indicated by the 4 dots present at the 0.7 similarity level) we will use throughout the remainder of this paper.

Figure 4 – Dendrogram

From this dendrogram it can also be seen that the 4 resulting clusters are of different sizes (by starting for each cluster at the dot on the 70 % similarity level and looking up towards the horizontal axis at the top with the original system numbers each cluster contains). From left to right we will call the clusters user group 1 (UG 1) to 4 (UG 4); they contain respectively 8, 17, 85 and 11 systems (which cannot be deduced from the dendrogram due to lack of detail). UG 3 consists of one single system type that is mainly used for Cardiology applications, while the other 3 clusters contain a different system type that is used with different occupancy rates in all application areas.

In Figure 5 the resulting test effort requirement is depicted. The horizontal arrows in the first column represent the 4 UGs (with the number of systems they contain between brackets) and the large arrows at the top represent the five WAs (with the number of operations in the corresponding POPs).

Figure 5 - First reduction of test effort from 11,858 to 392 operations

A lot of efficiency was gained here by clustering the original 121 systems into only 4 User Groups, which have similar occupancy rates in the different medical application areas. This first step of the clustering strategy results in a test effort of 4 UGs × 98 operations = 392 operations to be tested. This is a reduction by a factor of 30, which is quite an efficiency improvement compared with the 11,858 operations at the start.

The second gain of our clustering strategy could be realized by clustering POPs among those User Groups in certain Working Areas where the experts' claim proved to be right. To verify this we again applied the clustering technique in each WA, starting with the POPs of the different UGs and monitoring the similarity of the POPs while they were being clustered. As can be seen in Figure 6, for three Working Areas (i.e. Workflow, Movement and General) large similarity exists among all User Groups, justifying the aggregation of the POPs into one single POP for each of these three Working Areas.

Figure 6 - Second reduction of test effort from 392 to 136 operations

For the other two WAs (i.e. Acquisition and Viewing) relevant differences were found among certain UGs, justifying distinct POPs, although here, also a reduction in the number of POPs could be achieved (i.e. UG 1 and 2 have similar Acquisition POPs and UG 1 to 3 have similar POPs in the Viewing area).

This second clustering step results in 136 operations (adding the number of operations in each required POP), which gives a total reduction by a factor of 87 compared to the original data set.

In total, only 8 POPs were necessary to describe the diverse system use of the entire fleet of 121 systems all over the world.
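The test effort figures can be checked with a short calculation. The number of operations per working area is taken from the labels of Figures 5 and 6. The number of POPs per working area follows from the text under the assumption that the remaining user groups each keep their own POP: one shared POP for Workflow, Movement and General, three for Acquisition (UG 1 and 2 shared, UG 3, UG 4) and two for Viewing (UG 1 to 3 shared, UG 4).

```python
SYSTEMS = 121
USER_GROUPS = 4

# Operations per working area, as labelled in Figures 5 and 6.
OPS_PER_WA = {"Acquisition": 2, "Workflow": 8, "Movement": 20, "Viewing": 34, "General": 34}

# POPs needed per working area after the second clustering step (8 in total).
POPS_PER_WA = {"Acquisition": 3, "Workflow": 1, "Movement": 1, "Viewing": 2, "General": 1}

ops_per_system = sum(OPS_PER_WA.values())   # 98 operations in a full OP
effort_start = SYSTEMS * ops_per_system     # every system tested separately
effort_ug = USER_GROUPS * ops_per_system    # one OP per user group
effort_pop = sum(OPS_PER_WA[wa] * POPS_PER_WA[wa] for wa in OPS_PER_WA)

print(effort_start, effort_ug, effort_pop)  # 11858 392 136
print(round(effort_start / effort_ug), round(effort_start / effort_pop))  # 30 87
```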

For the purpose of reliability testing, the "sojourn time" of each UG in each WA was also deduced from the logging data. Reliability test scenarios for a particular UG (which operations are to be called, and how often) can then simply be generated by sampling from the appropriate POPs of each WA, taking the UG's sojourn time in the WAs into account.
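One possible way to generate such a test scenario is sketched below. The POPs, the operation names and the sojourn-time shares are made up, and the way the sojourn time is taken into account (a working area is drawn with probability proportional to the UG's share of time spent in it) is an assumption for this illustration, since the exact sampling scheme is not specified here.

```python
import random

# Hypothetical POPs of one user group: operation probabilities per working area.
pops = {
    "Acquisition": {"set_kv": 0.25, "start_run": 0.75},
    "Movement": {"move_table": 1.0},
    "Viewing": {"zoom": 0.6, "pan": 0.4},
}

# Hypothetical share of the session time this user group spends in each WA.
sojourn_share = {"Acquisition": 0.5, "Movement": 0.2, "Viewing": 0.3}

def generate_scenario(pops, sojourn_share, length, seed=None):
    """Draw a sequence of (working area, operation) pairs for one test run."""
    rng = random.Random(seed)
    areas = list(sojourn_share)
    area_weights = [sojourn_share[a] for a in areas]
    scenario = []
    for _ in range(length):
        # Assumption: pick the working area in proportion to its sojourn share.
        wa = rng.choices(areas, weights=area_weights, k=1)[0]
        operations = list(pops[wa])
        op_weights = [pops[wa][o] for o in operations]
        scenario.append((wa, rng.choices(operations, weights=op_weights, k=1)[0]))
    return scenario

for wa, operation in generate_scenario(pops, sojourn_share, length=5, seed=1):
    print(wa, operation)
```

Fixing the `seed` makes a generated scenario reproducible, which is convenient when a failing test run has to be repeated.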

6 CONCLUSIONS AND DISCUSSION

This paper shows that diverse system usage can efficiently be represented in reliability testing, as was illustrated by the following steps:

• cluster different forms of system usage into a limited number of user-groups;

• divide the working process into different working areas;

• establish POPs for each working area/user-group combination and aggregate them per working area in case of large similarity among user-groups;

• determine the "sojourn time" of each user-group in each working area;

• build a test scenario for a specific user-group by sampling from the appropriate POPs, according to the "sojourn time" in each working area.

In our example, the usage of 121 systems all over the world is well represented in reliability testing by only 4 user-groups and a total of 8 POPs.

[Labels of Figures 5 and 6: user groups UG 1 (8), UG 2 (17), UG 3 (85), UG 4 (11); working areas Acquisition (2), Workflow (8), Movement (20), Viewing (34), General (34). Figure 5 shows one POP per user group and working area (POP 1.A to POP 4.G); Figure 6 shows the aggregated POP 1 to POP 8.]

Of course there are also some limitations to be taken into consideration. First of all, logging data is not always available or accessible. For consumer products, for example, such data usually does not exist. Fortunately, for professional equipment these logging facilities are becoming more and more common.

Furthermore, this methodology is based on the assumption that current field use of systems is also representative of the use of new equipment (under development). This is a reasonable assumption for incremental system development (often applied to professional systems) but does not hold for radically new product development. There, early user tests could be more helpful. As a last point we would like to mention the software program that converts the raw data from logging files into usable Operational Profiles. If, for some reason, the functionality of a new system is changed or extended, or the coding in the logfiles is changed, this program has to be modified too, which could disappoint the advocate of automatic testing.

In this paper we have demonstrated a successful and practical way to efficiently represent diverse field usage. We believe that the fleet of systems used in this paper as a demonstrative example is not such an exotic product that the method cannot be applied elsewhere.

ACKNOWLEDGEMENT

The work of Mr. R.J.J. van Meeteren is gratefully acknowledged.

REFERENCES

1. Musa, John D., “Software Reliability Engineering”, McGraw-Hill, 1st edition, 1998, ISBN: 0-07-913271-5.

2. Musa, John D., “Software Reliability Engineering: More Reliable Software Faster and Cheaper”, McGraw-Hill, 2nd edition, 2004, ISBN: 1-4184-93872.

3. IEEE Std 982.1-1988, “IEEE standard dictionary of measures to produce reliable software”, IEEE Standards, 30 April 1989.

4. Musa, J.D., Operational Profiles in Software Reliability Engineering, IEEE Software, 10-2, p. 14-32, ISSN 0740-7459, 1993.

5. Musa, J.D., Software Reliability Engineering, Mc-Graw Hill, 1st edt. ISBN 0-07-913271-5, 1998.

6. Cheung, R., “A user oriented software reliability model”, IEEE Transactions on Software Engineering, volume SE-6, March 1980, pgs 118-125.

7. Hartigan, J.A., Clustering Algorithms, John Wiley & Sons, ISBN 0-471-35645-X, 1975.

8. Jain, A.K., Dubes, R.C., Algorithms for Clustering Data, Prentice Hall, ISBN 0-13-022278-X, 1996.

9. Jain, A.K., Murthy, M.N., Flynn, P.J., Data Clustering: A Review, ACM Computing Reviews, Nov. 1999.

10. Sneath P.H.A., Sokal, R.R., “Numerical Taxonomy- The Principles and Practice of Numerical Taxonomy”, W.H. Freeman and Company, 1973, ISBN: 0-7167-0697-0.

BIOGRAPHIES

Peter J.M. Sonnemans, PhD, MSc Eindhoven University of Technology P.O.Box 513, 5600 MB Eindhoven, THE NETHERLANDS

e-mail: [email protected]

Peter Sonnemans is an assistant professor at Eindhoven University of Technology, in the faculty of Industrial Design, where he is responsible for research and education in the field of Business Process Development. He is also connected to Philips Electronics as a senior consultant in the same field.

Aravindan Balasubramanian, MSc Eindhoven University of Technology P.O.Box 513, 5600 MB Eindhoven, THE NETHERLANDS

e-mail: [email protected]

Aravindan Balasubramanian is a PhD student at Eindhoven University of Technology, in the faculty of Technology Management, where he is doing his research in the field of Quality & Reliability Engineering. His research focuses on predicting reliability of the product during the product development process for professional systems.

Kostas Kevrekidis, MSc Eindhoven University of Technology P.O.Box 513, 5600 MB Eindhoven, THE NETHERLANDS

e-mail: [email protected]

Kostas Kevrekidis is a PhD student at Eindhoven University of Technology, in the faculty of Technology Management, where he is doing his research in the field of Quality & Reliability Engineering. His research focuses on field monitoring techniques for capital goods.

M.J. Newby, PhD, MSc City University of London Northampton Square London EC1V 0HB, UNITED KINGDOM

email: [email protected]

Martin Newby is Professor of Statistical Science at City University. He is active in (inter)national professional bodies in statistics and reliability and is adviser to the Committee on Defence Equipment Reliability and Maintenance in the UK. Since 2005 he has also been a Professor at Eindhoven University of Technology.