
PERFORMANCE METRICS: AVOIDING THE PITFALLS

RONALD L. STRAIGHT
Howard University

Public Administration Quarterly, Vol. 23, No. 4 (Winter 2000), pp. 495-516

ABSTRACT

This article discusses the methods often used to select a work group's goals and measures, including brainstorming, the nominal group technique, and multivoting. It describes the pitfalls in selecting and ranking output measures and emphasizes the necessity of considering possible interactions when selecting multiple goals and measures. Although all measures usually are given identical weights, which in certain situations may be optimum, in most cases some measures should weigh more heavily than others.

INTRODUCTION

Economic analysis usually assumes the existence of a profit motive. The effect of this motive is cited in introductory economics courses as one of the key strengths of free markets, in that the generation of normal profits in a competitive environment usually produces the most efficient allocation of resources. This analysis is difficult to translate to a not-for-profit or governmental setting, however, where the power of profits does not apply. This article explores using other measures in situations where profit measures are not available.

Within government, performance measurement has been a hot topic recently. Both Congress and the President have required managers to measure organizational efficiency and effectiveness as one way to improve program performance. Early attempts to measure performance were made as early as the 1930s and again in the 1970s under the Productivity Improvement Council, and the subject received renewed attention with the publication of Vice President Gore's Report of the National Performance Review, passage of the Government Performance and Results Act of 1993, and the Office of Management and Budget's implementation of the Chief Financial Officers Act of 1990. Each of these initiatives stresses the need for improved performance measures. For example, Executive Order 12862, Setting Customer Service Standards, requires surveying customers to improve product or service quality. With all of this interest, the body of literature on performance measures is becoming quite large.

The American Society for Public Administration (1992) found that the "use of performance measurement is still the exception rather than the norm in American government organizations ... [T]here is great potential to improve performance, accountability, and reporting by integrating systematic performance measurement, monitoring, and reporting, and by integrating performance information into regular policy and management processes."

Nyhan and Marlowe (1995) trace the focus on measurement techniques to improve the production of workers to Taylor (1911) and others following his work. They also cite the pioneering work of the New York Bureau of Municipal Research early in this century and note that, with the general decline in productivity growth in the 1970s, performance measurement became of great concern at all levels of government.

The use of various types of measurement is an often-explored issue. These types are usually divided into five classes: inputs, process, outputs, outcomes, and impacts. Behn (1995:319) illustrates several of these, using the example of a public-health program designed to assist expectant mothers:

- Input measures include the number of public-health clinics providing this service, the number of public-health nurses working in these clinics, and the dollars spent on the program.
- Output measures include the number of women who participated in the program, the number of visits these women made to the clinics, and the prenatal instructions that they followed.
- Outcome measures include the number of healthy (and unhealthy) babies born to women who participated in the program.
- Impact measures include the difference between the number of healthy babies born to women who participated in the program and the number of healthy babies who would have been born to these women had they not participated in the program.

Adherents to the total quality management (TQM) philosophy as taught by Deming (1986) emphasize the need for process measures rather than measurement of results. They say it is the process that should be the focus of management attention, and managers following the TQM approach should concentrate on the performance of the processes involved in creating the outputs instead of focusing on indicators of results (McGowan, 1995). This article discusses measures fitting both descriptions and attempts to consider the implications of each measure carefully.

Measurement of results has long been a staple tenet of good management practice. In recent years, however, measurement, particularly of program outcomes or results, has become of increasing concern to both the private and the public sectors. In the public sector, measurement is more difficult due to the lack of the private sector's common measure, profit.

In the administration of the public sector, workshops and training programs have been established to discuss the need for performance measures and the need to measure outcomes rather than inputs or processes. In many government administrative functions, however, the work result is an intermediate product or service that assists in producing a program outcome but which is not an outcome itself. For example, the government purchasing function may be necessary to obtain the furniture to equip a program office. The contracting function produces outcomes such as completed contracts or even delivered products but does not produce program outcomes. Since, as Kaplan and Norton (1992) say, "what you measure is what you get," selecting the best performance measures is of critical importance for managers.

For common administrative support functions for which outcome measures are not typically available, output measures are often used to assess quality, effectiveness, and efficiency. With no outcome measures or for-profit private industry measures, output measures are often a manager's best alternative. However, these measures of intermediate products or services have the potential to be counterproductive if they are not carefully selected and matched to the goals of the program.

A literature review finds many initiatives that have emphasized measurement in public administration, including Vice President Gore's Report of the National Performance Review, the Government Performance and Results Act of 1993 (GPRA, 1993), and the Office of Management and Budget's implementation of the Chief Financial Officers (CFO) Act of 1990. In addition to these official publications, many others have devoted research efforts to the issues of performance measurement. Such issues include the selection of measures that are most relevant to the success of the program, the avoidance of measures that motivate behavior that is counterproductive, and the need to determine a method of weighting measures.

This article discusses the potential pitfalls in selecting and ranking output measures. Frequently, multiple goals and measures are selected without consideration of how they interact. Better information systems that can easily collect more data may make this problem worse. Also, each measure is most often given the same weight as any other. Under some circumstances, several equally weighted measures may be the optimum management tool, but in most cases thoughtful reflection will quickly determine a superior set of measures and weights.

This article explores issues of importance to managers of not-for-profit organizations and public administrators of government agencies. The results are also important for certain business processes within for-profit companies. Within this environment, the selection of goals and metrics through voting or other mechanisms, and the weights assigned to them, will be explored. This article will concentrate on the primary pitfalls that often occur in establishing performance metrics, which fall in the areas of selecting goals, selecting metrics, and selecting weights.

SELECTING GOALS

The pitfalls discussed in this section include: (1) failing to select goals at all or (2) selecting or imposing goals without regard to developing commitment of the team that will have to achieve them.

One of the first steps before selecting performance metrics is to select the goals. Failure to do so results in information or data being collected for a metrics system that may simply be a series of unconnected numbers, having no relationship to the organization's performance and providing no valid indicator of whether it is meeting its goals. In public agencies, the legislature or other outside body may set some goals, but usually the agency will still have the opportunity to establish numerous mission-related goals for its specific requirements.

The managers and the staff should jointly agree on the goals in order to have the entire team committed to their achievement. Several techniques are widely used to assist the staff in developing and selecting the organizational goals. Among these are brainstorming, often using multivoting to sort out the results, the Crawford slip method, and the nominal group technique. The published research literature discusses each method thoroughly, so they are only briefly reviewed here.

Brainstorming involves the group trying to express as many ideas as possible with no initial evaluation of the ideas. It is usually best to review the topic at the beginning to define the subject of the brainstorm clearly (Scholtes, 1988). In setting the goals for an office, the team should first identify the concerns of the stakeholders with the operation of the office and then develop ideas of how to satisfy those concerns. Usually one person writes the suggestions on newsprint easel pads and posts the full sheets around the room so all can see them. List all ideas, even those that may seem wild or crazy. During the brainstorming session, encourage all ideas and save analysis for later.

Some companies use a squirt from a squirt gun or a tossed Nerf ball to reprimand anyone who tries to cut off an idea by too-early discussion (Schrage, 1996). Usually brainstorming proceeds with members stating ideas as they occur to each person. However, if one or two people are more outspoken and tend to overwhelm other members of the group, the team can use structured brainstorming, where the members sit in a circle and each contributes an idea in turn (Bruno, 1995). Bruno also describes silent brainstorming, where team members write down their ideas and the team facilitator reads and posts them.

The Crawford slip method of simultaneous interviewing also organizes information from groups. In this method, the group writes down all the ideas on separate slips of paper, similar to silent brainstorming. The idea inputs consist of one short sentence of simple words per slip. In some situations, the leader may instruct the team on the way the ideas are to be phrased, for example, to start each sentence with an imperative word. The session may have an overall time limit, say 30 minutes. With each idea on its own slip of paper, it is easy to consolidate similar tasks, functions, or action patterns into piles or clusters. Some categories may have one or two slips while others have many, and obvious gaps may stimulate further thoughts on the subject (Crawford, Siegel, and Kerr, 1990). An affinity diagram (also known as the KJ method after its creator, Kawakita Jiro) is a similar technique for grouping the output of a brainstorming session (DoD, 1994).

After the team develops the list of ideas describing the goals, they must use some method to sort out those that seem more promising or more appropriate than the others. Multivoting is one technique used to do this sorting. Usually it involves first consolidating like items; then the items on the list are numbered and each person votes for those that are most important. Usually each person may use a number of votes equal to one-third or one-fifth of the items on the list (Bruno, 1995; Lynch and Werner, 1992).

After the vote, the members eliminate those items that received no or only a few votes and repeat the voting process. The number of votes required to keep an item on the list will depend on the size of the team and should be decided by the team. Scholtes (1988) provides rules of thumb: one or two votes if the team is five or fewer, three votes for teams of 6 to 15 members, and four votes for groups of more than 15 members.
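To make the tallying mechanics concrete, here is a minimal sketch of one multivoting round in Python. The items, ballots, and function name are hypothetical, with the elimination threshold following Scholtes' rule of thumb for a six-person team:

```python
from collections import Counter

def multivote_round(items, ballots, keep_threshold):
    """Tally one multivoting round and drop items below the threshold.

    items: candidate goal statements, in list order
    ballots: one list of votes per person (each person votes for roughly
             one-third to one-fifth of the items on the list)
    keep_threshold: minimum votes to stay on the list (Scholtes' rule of
             thumb: 1-2 for teams of 5 or fewer, 3 for 6-15, 4 for >15)
    """
    tally = Counter(vote for ballot in ballots for vote in ballot)
    return [item for item in items if tally[item] >= keep_threshold]

items = ["reduce cycle time", "improve accuracy", "cut overtime",
         "increase training", "simplify forms", "survey customers"]
# A six-person team, each casting two votes (one-third of six items):
ballots = [["reduce cycle time", "improve accuracy"],
           ["reduce cycle time", "survey customers"],
           ["improve accuracy", "simplify forms"],
           ["reduce cycle time", "improve accuracy"],
           ["survey customers", "simplify forms"],
           ["improve accuracy", "cut overtime"]]
print(multivote_round(items, ballots, keep_threshold=3))
# -> ['reduce cycle time', 'improve accuracy']; revote on the survivors
```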

The nominal group technique is another method for generating ideas and making a selection among them. It is characterized by fewer interactions; thus, it is a group in name only, a "nominal" group. New groups, controversial issues, or difficulty in reaching agreement all suggest this method. This summary follows Scholtes' (1988) description of the technique. The group starts just as in a brainstorming session, by generating multiple ideas in response to the task question under consideration. In this case, the members write down their ideas without discussion. When all are finished, each person in turn reads off one idea, which is listed on a flipchart. After listing all ideas, the group spends some time on clarification and discussion. Clarification or agreement to combine items depends on the person who originated that particular idea.

The next major step in the nominal group technique is to select the key idea(s). If more than 50 items are on the list, it should first be reduced to 50 by allowing group members to withdraw less serious ones, by additional consolidation, or by multivoting. Then each person is given several 3 by 5 cards or pieces of paper with which to assign point values to the items. The number of cards may be four for up to 20 items, six for 20 to 35 items, and eight for 35 to 50 items.

The person's most preferred item is given the highest card number, say 8 when that many cards are used, with the second item receiving a 7, and so on. When all are finished, the members tally the votes for each idea on the flipchart. The highest point total is the group's selection. If, after reviewing the result, the group is in agreement, the members can go on to the next step based on the selected idea. If the group does not agree that the item selected is the most important, the members can push for another item, ask for a revote, or continue with the two or three ideas that received the most votes. A sketch of this scoring step follows below.

In some public administration situations, the above technique may be limited because the legislature or the executive may have established a set of goals. While these situations do not fit the ideal of having those responsible for executing a goal also establish it, the established goal will typically be so broad that it is still important for the workforce team to establish shorter-term, mission-related goals.
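As with multivoting, the card-scoring step above reduces to a simple tally. A sketch, again with hypothetical ideas and rankings, for a list short enough (under 20 items) that each member receives four cards:

```python
from collections import defaultdict

def ngt_scores(rankings, k):
    """Score ideas from each member's ranked cards, most preferred first.

    The top card is worth k points, the next k-1, and so on down to 1.
    """
    points = defaultdict(int)
    for ranking in rankings:
        for position, idea in enumerate(ranking[:k]):
            points[idea] += k - position
    return sorted(points.items(), key=lambda pair: -pair[1])

rankings = [["idea A", "idea C", "idea B", "idea D"],
            ["idea C", "idea A", "idea D", "idea B"],
            ["idea A", "idea B", "idea C", "idea E"]]
for idea, score in ngt_scores(rankings, k=4):
    print(idea, score)
# idea A 11, idea C 9, idea B 6, idea D 3, idea E 1 -- idea A is the
# group's selection unless the members push for a revote
```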

SELECTING METRICS

The pitfalls discussed in this section include (1) selecting metrics not matching the goals, (2) selecting or imposing metrics without regard to developing commitment by the team that will have to achieve them, and (3) selecting metrics that are counterproductive. Motivating the team, considering the types of measures to be used, and deciding the proper points of comparison for selecting metrics are discussed here.

Once the team reaches agreement on the goals, the next step is to establish a set of metrics to check on progress towards those goals. It is important to align performance measurement and the recognition and reward systems with the goals and desired results of the organization (Troy, 1994). An established working group for each program should review the fundamental purpose of each program and compare it to the organization's goals (DOJ, 1995). From this review, the group can identify the specific performance metrics for the program and set target levels of performance.

The group of people who will be measured by the performance metrics should select them. There is always the fear that the group will select easy items or non-challenging targets so they cannot fail. However, in most cases this is not a significant problem. Once the goals are established, whether by the group or by the legislature, the executive, or customers, the metrics to be used may be effectively established. Sets of metrics are usually structured to provide various levels of accomplishment rather than pass/fail, so even an initial low level of accomplishment can be quickly exceeded. However, whether the set of metrics is established by the team, comes from outside the team, or is a combination of the two, it can produce unintended consequences if not well thought out.


Here it is useful to remember "what you measure is what you get" (Kaplan and Norton, 1992:71). Many other authors have also noted that the selected measures influence the behavior of the people being measured (Kamensky, 1993; Eccles, 1991; Behn, 1995; Osborne and Gaebler, 1992). Having those being measured also establish the measures further enhances their motivational importance. Bouckaert (1993), Epstein (1994), and Isham, Narayan, and Pritchett (1995) all discuss the desirability of some type of collaborative participation in selecting performance measures to achieve higher levels of use of and commitment to them, and better program outcomes.

Keehley (1993) advocates both customer feedback and team evaluation as performance measures in organizations practicing TQM. The internal and external customers provide information on satisfaction: external customers on the overall products and services, combined with internal measures of the specific products and services contributing to them. The group establishes its goals, and the leader also evaluates individual contributions to work goals.

The introduction discussed the types of measures: input, process, output, outcome, and impact. Almost all activities need a better level of metrics than inputs. Impact metrics are difficult to devise, although they should be considered and used when possible. Many services provided to the public can be counted as an outcome. However, there are many internal customer services where outputs or processes are the relevant measures. Unfortunately, in these situations counterproductive measures are often chosen.

One output measure for a personnel office could be the number of personnel actions per month. If this were the only measure, poor-quality actions, unnecessary actions, or other undesirable consequences might result from the desire to maximize the number of actions. A better approach would be to balance the total number of actions against a measure of quality, effectiveness, or efficiency. Ammons (1994) reports that perhaps a majority of the standards applied to local governments are process standards, such as law enforcement accreditation standards.

"As output measures are more readily and easily developed than outcome measures, more of these are expected initially in the GPRA-required performance plans, but agencies should move toward increasing the number and quality of outcome measures" (Office of Management and Budget, 1996). This implies that the

This content downloaded from 188.72.96.21 on Sat, 14 Jun 2014 23:13:14 PMAll use subject to JSTOR Terms and Conditions

Page 10: PERFORMANCE METRICS: AVOIDING THE PITFALLS

PAQ WINTER 2000 (503)

most desirable primary measures are measures of outcomes. For example, if a program were to increase the level of employment of persons currently receiving public assistance, a simple measure would be to count those that have secured a job by the end of the time in the program.

However, this measure is subject to a number of factors outside the control of program managers, for example, general economic conditions. Such a measure can also produce counterproductive results if the managers are able to "skim the cream" in selecting participants for the program. In other cases, where a program or an activity within a program may not produce easily measurable outcomes, it may be more appropriate to measure its outputs. For example, managers may wish to measure the performance of a grants-issuing office: the outcome of its work may be difficult to measure, but its outputs (grants issued) may be readily available to evaluate.

The private sector has long used financial measures such as net earnings or return on investment. While these are important measures, they are frequently criticized for being applied only as short-term measures. Thus strategic goals may too often take a back seat to short-term quick fixes. In retail companies, current management action shows up quickly in short-term results, so these measures may be acceptable. However, in biotechnology companies, where current work may not show up in financial results for some time, the number of patent applications and the number of clinical trials may also be counted (Andrews, 1996).

Eccles (1991) also describes the shift in private, for-profit firms from solely concentrating on financial figures to a broader set of performance measures which may include quality, market share, customer satisfaction, human resources, cash flow, manufacturing effectiveness, innovation, customers' perceptions of the firm's nature and professionalism, customer retention rates or perceived value of goods and services.

Deller, Nelson, and Walzer (1992) discuss measures of effectiveness and efficiency and cite other reports noting that public administrators' effectiveness measures include citizen satisfaction and perceptions, service accomplishments, or community conditions. Economists have used microeconomic theory to discuss the optimal supply of public goods, with signals provided by those "voting with their feet" or by property values.

They also note that the generally accepted measure of efficiency is the provision of a certain level of service at the lowest possible cost. Measures cited are dollars or employee hours per unit of output, while noting that these measures are faulted for lack of flexibility and oversimplification. In their study of rural road services, they used miles of road (gravel, low-bituminous, and high-bituminous) as the output measure, noted the difficulty of measuring local government services, and in particular did not consider road surface quality. The input measures were the number of full-time-equivalent employees, capital equipment (road graders and single-axle trucks), and annual purchases of resurfacing materials (for example, gravel and bituminous product).

The possibility of neglecting changes in quality in developing measures is not a new issue. Hatry and Fisk (1971) noted that, when using only a workload measure, the metrics system fails to reflect quality. Hayes (1977) also noted that, in measuring productivity, output varies in quality. Epstein (1984) discussed the need to include quality to counter the unintended effects of using efficiency or effectiveness measures alone. He thinks the degree of difficulty of the work should be discussed in supporting narratives, advocates the use of citizen surveys, and notes that the whole process must be dynamic.

Ammons (1994) finds that few associations offer quantitative guidelines for the operation of local government. Survey responses revealed fewer than a dozen organizations out of 50 reviewed that declared quantitative, nonprocedural yardsticks, excluding practitioner credentials, against which to judge local governments. Among the reported yardsticks, not all were true performance standards. Some were resource standards (typically, staffing levels and per capita expenditures) that sometimes appear self-serving and that Hatry had labeled "pseudo-measures" of performance. The importance of the quality measurement system in a total quality management environment has also been noted: "There is a tendency to understate the importance of quality measurement, as if measurement were some kind of innate public sector trait" (Hyde, 1992:31). Eccles (1991) reports on some measures related to quality: defect rates, response time, and delivery commitments.

However, some metrics, such as the total number of actions, may be useful and often cannot be eliminated. Instead, pair such a metric with a quality measure, with each given an appropriate weight. A review of the current literature finds little practical discussion of approaches to resolving counterproductive situations related to metrics.


Customer satisfaction is often an important metric. Most often customers report a level of satisfaction on a number of dimensions thought to be of interest to them. Such things as timeliness in meeting their needs, courtesy and helpfulness of the staff, or assistance in emergency situations are scored from "very dissatisfied" to "very satisfied." Although these measures rest on subjective judgment, this may be the most important set of metrics. In some cases, when customer satisfaction is of overriding concern, one could argue that this is the only relevant measure.

Often performance indicator is a synonym for performance measure, but occasionally authors make a distinction. A performance indicator may be an indirect way to measure the actual goal. For example, the number of litter baskets emptied may be an indicator of street cleanliness but is less direct than a visual inspection of the streets after the baskets are emptied. As in this case, the indicator is easier to record and administer than a direct measure, and many indicators are sufficiently descriptive of the goal to be useful.

Among the characteristics that Kamensky (1993) cites as part of a good results-oriented measurement system are that it (1) focuses on outcomes and quality, not process; (2) defines outcomes and quality from the client/user's perspective; (3) uses a few select indicators for top managers (with more measures, more frequently, for line managers); (4) emphasizes making sure that data are valid and consistent over time; and (5) ensures that contextual comparisons are provided in relation to standards, baseline data, or other relevant comparisons.

Similarly, Grifel (1993) finds the four elements of a successful system are the support of top management, reporting on a limited number of measures, rewarding accomplishment of objectives (either in managers' compensation or by allowing funds to carry over to the next year), and monitoring of objectives by a trained staff member. Bouckaert (1993) finds that measurement systems have to be valid, legitimate, and functional and cites criteria for performance measurement mentioned by four earlier papers. These papers said measures must be countable, uniform over time (timeliness), controllable, and mutually exclusive or unique. In addition, two of the papers said they must be accurate, understandable or unequivocal, comprehensive, congruent, direct, process definable, and mission-oriented. Finally, mentioned only once were the attributes of having the data readily available, reproducible, objective, choosable, tangible, and homogeneous, having a reasonable data collection cost, having quality identifiable, and discouraging perverse behavior.

Many examples of measures used in other areas have been provided in the literature. For example, among those cited by Sorber (1993) was cost per pupil; Currall and Kohl (1996) cite educational outputs such as student performance on standardized tests; Ammons and Hill (1995), for garbage collection, cite annual costs per household and average tons per route.

Behn (1995) cites birth weight of babies. Ammons (1994:291) finds that "the best examples of results standards were produced by the American Water Works Association (1992) in the form of water quality standards (for example, less than 5 mg/L of zinc in finished water) and by the APHA [American Public Health Association] (1991) in its national public health targets for the year 2000," for example, no more than 15 percent cigarette smokers among people 20 and older.

Sorber (1993:63) says, "The general purpose of performance measurement is to gain insight into a number of relevant aspects of a production or policy process, in order to develop a better ability to 'control' these processes. In this respect four functions of performance measurement are distinguished: (1) providing early warning on the development of output and outcome; (2) improving allocation of resources; (3) improving the efficiency and effectiveness of production and policy processes; and (4) improving accountability, especially in cases of contract management and agencies."

Others have debated the limits of the comparisons that can be made among the measures. Lewis (1948) said an organization must match its results only against itself over time, not against others that were bound to be in different situations of uncertain comparability. Grifel (1993) also says that the data need to be compared from one period to the next. Speaking of the commercial sector, Eccles (1991:134) finds that "[w]hat matters is how a company is doing compared with its current competitors, not with its own past." Sorber (1993:63) notes the use of various performance approaches, comparisons over time, cross-section analysis, and the comparison of actual performance with standards for performance, and says that "the approach used depends on the information needs of the decision makers in a particular case."

Certainly, comparisons within the same office from one period to another are useful, and comparisons to other organizations may also be helpful, but in that case the differences in organizational structure and situations must be taken into account. In his discussion of associations, Ammons (1994:294) finds that carefully developed national statistics on relevant performance indicators provide an excellent basis for comparison, even as norms rather than standards.

This section has discussed motivating the team, considering the types of measures to be used, and deciding the proper points of comparison in selecting metrics. Following these points should help avoid the pitfalls in the process of selecting metrics by (1) selecting metrics that match the goals, (2) selecting metrics as a team effort, and (3) selecting metrics that are not counterproductive. As Eccles (1991) notes, a new philosophy regards performance measurement as an ongoing, evolving process.

SELECTING WEIGHTS

The pitfalls discussed in this section include (1) failing to understand that implicit weights are in place if no other action is taken and (2) assigning weights explicitly without foreseeing their motivational impact.

All metric systems contain weights on the individual measurement elements. Most systems make no mention of these weights and, as a result, the elements appear equally weighted, although almost always some factors being measured are more important than others. It is important to state the weights being applied to each measure if all workers are to work toward the same organizational goal. By seeing clearly how changing the weights applied to a set of measures can dramatically affect the overall score of an operation, managers will be better able to select weights that reflect the emphasis they desire in each area.

For example, suppose the final course examination has four questions, two short answers and two essays, but no information on how many points each is worth. Each student would count on spending more time on the essay questions, as they must have more weight, but how much more: twice as much, five times, ten times? Certainly, someone will ask the question during the examination. It helps to have explicit weights on the examination, and we should do the same with our organizational metrics systems.

By explicitly assigning weights to indicate the importance of each factor, it is possible to combine the factors into a single score for the office as a whole and compare it to previous benchmarks. Using the weights to determine a combined score avoids argument at the end of the period as to whether being up 10 percent in one factor and down 15 percent in another means that performance towards goals this period is better or worse than the period before. Explicitly weighting the factors and combining the scores provides an objective measure of how this period compares with others, as the sketch below illustrates.
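For instance, with assumed weights of 0.6 on the first factor and 0.2 on the second (hypothetical values, chosen only for illustration), the up-10-down-15 question answers itself:

```python
# Explicit weights turn "up 10 percent here, down 15 percent there"
# into simple arithmetic (weights and changes are illustrative).
weights = {"A": 0.6, "B": 0.2}      # factor A counts three times factor B
changes = {"A": +0.10, "B": -0.15}  # change from the prior period
net = sum(weights[f] * changes[f] for f in weights)
print(f"net weighted change: {net:+.3f}")  # +0.030: better than last period
```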

Although previous sections of this article have had numerous references to prior research, much less is available in this area. Epstein (1984:202) discusses the need to treat the system as a whole and notes that one way to sabotage a performance measurement and improvement program is to "[t]reat each measurement and improvement approach separately, without trying to get them to complement each other for increased decision making and performance improvement power."

Eccles (1991:135) discusses the need to tie the measures that management has said are important to the personnel incentive system. He finds that formulas to do this are not usually effective: "[I]f the formula is simple and focuses on a few key variables, it inevitably leaves some important measure out. Conversely, if the formula is complex and factors in all the variables that require attention, people are likely to find it confusing and may start to play games with the numbers. Moreover, the relative importance of the variables is certain to change more often--and faster--than the whole incentive system can change."

More recently, Nyhan and Marlowe (1995) discuss these topics and conclude that some previous attempts at sophisticated mathematical approaches, such as data envelopment analysis, require more mathematical knowledge and familiarity than is generally available in the workforce. Even the Total Organizational Performance System (TOPS) that they set forth may be too complex for many workers to understand. Their example, with eight quality indicators, four weighted at 15 percent and four at 10 percent, and their algebraic approach to data reduction may be too complex, with weights not sufficiently diverse to motivate front-line workers consistently in the direction that managers intend.

Traditionally, businesses have relied on financial measures as their primary or only measure of performance. In a series of papers, Kaplan and Norton (1992, 1993, 1996) discuss the use of a balanced scorecard that includes measures from four perspectives: financial, customer, internal business, and innovation and learning. Their concept puts strategy and vision, not control, at the center, establishing goals in the belief that people will work to achieve them. This is in contrast to the control basis of traditional measures, which specify employees' actions and then assess whether they have performed them. Using this approach forces managers to arrive at a consensus on their strategies and the measures to monitor them, so that the management vision is in terms that are meaningful to the people who will realize it.

Some companies have linked the balanced scorecard approach to their compensation system. When weights are assigned to each objective to calculate incentive compensation, an additional payment may result even though performance may have been low in some areas. Kaplan and Norton (Ibid.) suggest that it may be better to establish minimum threshold levels for critical measures that must be exceeded before allowing any incentive compensation.

Schwartz (1996) discusses combining performance measures in a performance unit costing (PUC) framework, which integrates quality, output, and cost signals. Efficiency is determined by dividing cost by output (C/O). Performance, outcome, or effectiveness is calculated by multiplying output by quality (O x Q). Finally, the PUC is determined by C/(O x Q). This method reduces a set of metrics to a PUC that can illustrate unit cost over time or unit cost reductions due to improved processes. In the basic formula, any proportional change in one variable has the same effect as the same change in any other variable. This may not be desirable in all circumstances.

Using the PUC as the starting point, Sorber and Straight (1995) revised the formula to place increased weight on quality by squaring the quality index term, giving C/(O x Q^2). Such a change significantly increases the importance of the quality measure in the formula and requires a substantial change in cost or output to offset any reduction in quality while maintaining a given overall score. For example, from an established baseline, even a 20 percent reduction in cost cannot be combined with more than a 10.5 percent reduction in quality if the original overall score is to be maintained.
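A quick check of that arithmetic, sketched in Python (the article gives the formulas, not code; the variable names and sample figures are assumptions):

```python
def puc(cost, output, quality, quality_exponent=1):
    """Performance unit cost: cost divided by quality-adjusted output."""
    return cost / (output * quality ** quality_exponent)

# Baseline with the squared-quality variant: cost 100, output 100,
# perfect quality, so the unit cost is 1.0.
baseline = puc(100.0, 100.0, 1.0, quality_exponent=2)

# A 20 percent cost reduction absorbs at most about a 10.5 percent
# quality reduction before the unit cost rises above the baseline:
revised = puc(80.0, 100.0, 0.895, quality_exponent=2)
print(baseline, revised)  # 1.0 vs ~0.9987: essentially unchanged
```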

This discussion illustrates the importance of the weights of the measures being used and how they interact with one another. Suppose the function being evaluated has four primary measures. This number should be few enough that the people whose achievements are evaluated with them can keep each measure in mind as they perform their duties. The goals of the organization, these measures, and the weights assigned to them should have been developed in a group process, as discussed earlier in this article.

The first two columns in Table 1 show an example of measures and weights. In this example, measure A is three times as important as measure B, which is twice as important as either C or D, which are equal.

TABLE 1
EXAMPLE MEASURES, WEIGHTS, AND VALUES

Measure   Weight   Score (perfect)   Value (weight x score)
A          0.60         1.00                 0.60
B          0.20         1.00                 0.20
C          0.10         1.00                 0.10
D          0.10         1.00                 0.10
Total      1.00                              1.00

TABLE 2
SENSITIVITY ANALYSIS: FOUR EXAMPLE SCORE PROFILES AND THEIR WEIGHTED TOTAL VALUES
[Individual table entries are not recoverable from this scan; the discussion below describes the four examples.]

The third and fourth columns of Table 1 show an example of a performance score (perfect in this case) and the value when the score is multiplied by the weight assigned to each measure. To evaluate the assignment of the weights, the managers and staff should use sensitivity analysis to determine whether the applied weights provide the desired motivation. Using the perfect case from Table 1 as the baseline, Table 2 provides other examples with different levels of performance. Examples 1, 2, and 3 provide essentially the same total value score, and Example 4 results in a score that is substantially lower than the others.

To the managers and staff responsible for meeting the goals of the organization, do the results of Examples 1, 2, and 3 all represent equally pleasing progress towards the goals? If so, the selected weights are appropriate and appear to provide the proper motivation in the organization. However, if it is unacceptable to score only 0.87 for measure A even when all the other areas have perfect scores (Example 2), the weights are not set correctly. Likewise, if a very low score in both measures C and D (Example 4), or a case where A, B, and C were perfect and D equaled zero, is unacceptable, the weights are not correct. Here unacceptable means that management does not believe that each of the sample outcomes in the sensitivity analysis is of equal value in meeting organizational goals.

Sensitivity analysis should explore the extremes, which Examples 1, 2, and 3 of Table 2 illustrate. Here, Example 4 may be the most realistic, with no measure scoring the highest possible. An organization's sensitivity analysis should include several examples in the most realistic range as well, as in the sketch below.
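Such an analysis is straightforward to automate. The sketch below uses the Table 1 weights; the score profiles are assumptions in the spirit of Table 2 (only the 0.87 case for measure A is taken from the text):

```python
# Weights follow the 6:2:1:1 ratio of Table 1, normalized to sum to 1.
weights = {"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.1}

def total_value(scores):
    """Weighted sum of the four measure scores."""
    return sum(weights[m] * scores[m] for m in weights)

profiles = {
    "baseline (all perfect)": {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0},
    "A slips to 0.87":        {"A": 0.87, "B": 1.0, "C": 1.0, "D": 1.0},
    "C and D very low":       {"A": 1.0, "B": 1.0, "C": 0.2, "D": 0.2},
}
for name, scores in profiles.items():
    print(f"{name}: {total_value(scores):.3f}")
# baseline 1.000; A at 0.87 still totals 0.922; C and D collapsing to
# 0.2 totals 0.840. If management does not view these outcomes as
# equally acceptable, the weights need adjusting.
```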

In too many situations involving multiple measures, weights are not considered, with the result that all measures are given equal weights. Such systems may not produce the results that managers desire and expect, since it is unlikely that there will be true equality among the several measures in any metric system. Equal weighting sends the wrong signals to the workforce when certainly some measures are more important than others.

For the same reason that it is important to have the work team agree on the goals and the measures, it is important to have the team explicitly agree on the weights. Having the proper weights on the measures, and having the measures aligned with the goals of the organization, makes a powerful metric system aimed at motivating desired behavior from every member of the team responsible for achieving the goals.

CONCLUSION

By using readily available examples, this article illustrates the potential pitfalls cited above and suggests ways to avoid them. A proper performance metric system assists in achieving the goals of the organization. Since people normally do their work based on the elements that are measured, the selection of measures and the weights applied to them are vital to goal accomplishment. By seeing the different results produced from one set of measures by assigning different weights to them, readers can determine which result most closely matches the desired one. Readers can then apply the same techniques to their own situations to improve their selection of metrics and the weights they assign to them. Proper selection of a performance metrics system will avoid motivating counterproductive efforts from the work team and assist in achieving the organization's goals.

Pitfalls to avoid in the process of developing a performance metric system include those involved in:

- Selecting goals: (1) failing to select goals at all and (2) selecting or imposing goals without regard to developing commitment by the team that will have to achieve them;
- Selecting metrics: (1) selecting metrics that don't match the goals, (2) selecting or imposing metrics without team input, and (3) selecting metrics that are counterproductive;
- Selecting weights: (1) failing to understand that implicit weights are in place if no other action is taken and (2) assigning weights explicitly without foreseeing their motivational impact.

REFERENCES

American Society for Public Administration (1992). "Resolution Encouraging the Use of Performance Measurement and Reporting by Government Organizations." Adopted April 14.

Ammons, D.N. (1994). "The Role of Professional Associations in Establishing and Promoting Performance Standards for Local Government." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 17(3):281-298.


Ammons, D.N. and D.J. Hill (1995). "The Viability of Public-Private Competition as a Long-Term Service Delivery Strategy." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 19(1):12-24.

Andrews, K.Z. (1996). "Two Kinds of Performance Measures." HARVARD BUSINESS REVIEW 74(1):8-9. Reporting on C.C. Ittner, D.F. Larcker, and M.V. Rajan (1995). "The Choice of Performance Measures in Annual Performance Contracts." Working paper, August.

Behn, R.D. (1995). "The Big Questions of Public Management." PUBLIC ADMINISTRATION REVIEW 55(4):313-324.

Bouckaert, G. (1993). "Measurement and Meaningful Management." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 17(1):31-43.

Bruno, G. (1995). THE PROCESS ANALYSIS WORKBOOK FOR GOVERNMENT. Milwaukee: ASQC Quality Press.

Crawford, C.C., G.B. Siegel, and J.A. Kerr (1990). "Learning Needs of Contracting Personnel: Feedback from 62 Contracting Educators in a Crawford Slip Method Workshop." NATIONAL CONTRACT MANAGEMENT JOURNAL 23(2):55-66.

Currall, S.C. and S.S. Kohl (1996). "Productivity of Public School Districts: The Employment Relations Model." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 19(3):363-381.

Deller, S.C., C.H. Nelson, and N. Walzer (1992). "Measuring Managerial Efficiency in Rural Government." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 15(3):355-370.

Deming, W.E. (1986). OUT OF THE CRISIS. Cambridge: MIT Press.

DoD (1994). "Section 10. Techniques for Process Improvement." FRAMEWORK FOR MANAGING PROCESS IMPROVEMENT. Department of Defense, Dec. 15. The Electronic College of Process Innovation. Available at http://www.dtic.dla.mil/c3/bpred/3003sa.html.

DOJ (1995). "DOJ Manager's Handbook on Developing Useful Performance Indicators: Managing for Results." Washington, D.C.: Justice Management Division, Management and Planning Staff, Version I, April 1.

Eccles, R.G. (1991). "The Performance Measurement Manifesto." HARVARD BUSINESS REVIEW 69(1):131-137.

Epstein, P. (1984). USING PERFORMANCE MEASUREMENT IN LOCAL GOVERNMENT: A GUIDE TO IMPROVING DECISIONS, PERFORMANCE, AND ACCOUNTABILITY. New York: Van Nostrand Reinhold.

Grifel, S. (1993). "Performance Measurement and Budgetary Decision Making." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 16(4):403-407.

Hatry, H.P. and D.M. Fisk (1971). IMPROVING PRODUCTIVITY AND PRODUCTIVITY MEASUREMENT IN LOCAL GOVERNMENT. Washington, D.C.: Urban Institute.


Hayes, F.O'R. (1977). PRODUCTIVITY IN LOCAL GOVERNMENT. Lexington, MA: Lexington Books.

Hyde, A.C. (1992). "The Proverbs of Total Quality Management: Recharting the Path to Quality Improvement in the Public Sector." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 16(1):25-37.

Isham, J., D. Narayan, and L. Pritchett (1995). "Does Participation Improve Performance? Establishing Causality with Subjective Data." WORLD BANK ECONOMIC REVIEW 9(2), abstract at http://www.worldbank.org/html/extpb/Review/May95.html.

Kamensky, J. (1993). "Program Performance Measures: Designing a System to Manage for Results." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 16(4):395-402.

Kaplan, R.S. and D.P. Norton (1996). "Using the Balanced Scorecard as a Strategic Management System." HARVARD BUSINESS REVIEW 74(1):75-85.

(1993). "Putting the Balanced Scorecard to Work." HARVARD BUSINESS REVIEW 71 (5): 134-147.

(1992). "The Balanced Scorecard- Measures that Drive Performance." HARVARD BUSINESS REVIEW 70(l):71-79.

Keehley, P. (1993). "Does TQM Spell 'Time to Quit Merit'?" PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 16(4):387-394.

Lewis, H.T. (1948). "Evaluating Department Efficiency." HARVARD BUSINESS REVIEW (May):313-328.

Lynch, R.F. and T.J. Werner (1992). CONTINUOUS IMPROVEMENT: TEAMS AND TOOLS. Atlanta: QualTeam, Inc.

McGowan, R.P. (1995). "Total Quality Management: Lessons from Business and Government." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 18(4):321-331.

Nyhan, R.C. and H.A. Marlowe, Jr. (1995). "Performance Measurement in the Public Sector: Challenges and Opportunities." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 18(4):333-348.

Office of Management and Budget (1996). PRIMER ON PERFORMANCE MEASUREMENT. Available at gopher://pula.financenet.gov:70/00/docs/post/perform/primer.

Osborne, D. and T. Gaebler (1992). REINVENTING GOVERNMENT: HOW THE ENTREPRENEURIAL SPIRIT IS TRANSFORMING THE PUBLIC SECTOR. New York: Addison-Wesley.

Scholtes, P.R. (1988). THE TEAM HANDBOOK: HOW TO USE TEAMS TO IMPROVE QUALITY. Madison, WI: Joiner Associates.

Schrage, M. (1996). "Meetings Don't Have to be Dull." WALL STREET JOURNAL 227(84) (April 29):A22.

Schwartz, L. (1996). "Government Performance: Measurement and Improvement." Briefing for the Department of Health and Human Services, April 17.

Sorber, H. (1993). "Performance Measurement in the Central Government Departments of the Netherlands." PUBLIC PRODUCTIVITY AND MANAGEMENT REVIEW 17(1):59-68.

Sorber, K.D. and R.L. Straight (1995). "Measuring Operational Contracting Cost, Output, and Quality Together." PROCEEDINGS, 1995 Acquisition Research Symposium, Washington, D.C.

Taylor, F. (1911). PRINCIPLES OF SCIENTIFIC MANAGEMENT. New York: Harper Collins.

Troy, K. (1994). CHANGE MANAGEMENT: AN OVERVIEW OF CURRENT INITIATIVES. New York: Conference Board.
