
Int. J. Man-Machine Studies (1984) 20, 545-566

An experimental comparison of tabular and graphic data presentation

MATTHEW POWERS, CONDA LASHLEY, PAMELA SANCHEZ AND BEN SHNEIDERMAN†

Department of Computer Science, University of Maryland, College Park, Maryland 20742, U.S.A.

(Received 15 December 1982, and in revised form 4 May 1983)

We present the results of our experiment designed to test the hypothesis that more usable information can be conveyed using a combination of graphical and tabular data than by using either form alone. Our independent variables were memory (recall and non-recall) and form (tables, graphs, or both). Comprehension was measured with a multiple choice exam consisting of three types of questions (retrieve, compare, or compare/calculate answers). Both non-recall and tabular treatments significantly increased comprehension. Combinations of graphs and tables produced slower but more accurate performance. An executive should use the form with which he/she is most familiar and comfortable.

Introduction

Since voluminous data can be condensed into compact tables and graphic representations, the effect of form and amount of data presented on the user's ability to comprehend the information must be understood.

Issues related to data representation have been the topic of several experimental studies. The results of such studies, unfortunately, are not conclusive. Several authors have stressed the importance of the form of data presentation on comprehension, others have discounted this importance, and still others have been unable to draw conclusions one way or another. To help gather more evidence, we have conducted a controlled experiment in an attempt to determine what form of data presentation is the easiest to comprehend and is most accurately recalled at a later time.

Background studies

The importance of the form of data presentation has been a controversial issue. A series of studies, known as the Minnesota experiments, examined numerous factors impacting decision effectiveness (Chervany & Dickson, 1974; Dickson, Senn & Chervany, 1977; Schroeder & Benbasat, 1975; Smith, 1975; Senn, 1973). Several studies in this area have centered around the tabular versus graphical presentation of data. Davis (1981b) summarized the data presentation issue as follows: "Experimental results concerning the effects of report format and level of summarization have been somewhat contradictory". Davis cites three studies which found that report format had an effect on performance, and one study which found no difference due to format.

†All correspondence to be addressed to Ben Shneiderman.

0020-7373/84/060545+22$03.00/0 © 1984 Academic Press Inc. (London) Limited


Of the three studies which found that report formats had an effect on performance, two indicated that those subjects given tabular reports performed better than subjects given graphic reports. The third study found that subjects given graphic format performed better than those with tabular report formats.

One of the studies cited above, which did find that report format had a statistically significant effect on performance, was conducted by Benbasat & Schroeder (1977). In a complex experiment, they studied six factors (such as form of report presented, decision-making style, and number of reports available) in Management Information System (MIS) decision effectiveness. In the experiment, 32 students, enrolled in an operations management course, acted as inventory/production managers of a hypothetical firm. Subjects were given the duties and responsibilities of typical warehouse managers (such as ordering, selling, stocking, shipping, etc.) and placed at a CRT which presented them with typical warehouse situations. Using graphic and/or tabular data, Benbasat & Schroeder reported that graphical reports were preferred by the subjects, reduced the costs, increased the effectiveness of decision-making, and reduced the total number of reports needed.

A study conducted by Lucas (1979) presented several interesting but conflicting conclusions about the graphic/tabular issue. Stanford University graduate students were asked to put themselves in the position of a buyer for a whisky firm. The subjects had to deal with basic supply and demand issues and attempt to maximize profits (high sales, low backstock, no item shortages, etc.). Subjects queried the "company books" to assist them in making accurate decisions. The queries were entered via a CRT and answered through either tabular or graphical report formats. The subjects receiving tabular reports found the output more useful than the subjects receiving the graphic reports. On the other hand, the graphics treatment produced significantly more enjoyment from the exercise and a better understanding of the problem.

Several studies indicated that users with a tabular report performed better than subjects with a graphical report. Lusk & Kersnick (1979) asked subjects to answer 20 questions based on information presented to them in either tabular form or graphic form. Lusk & Kersnick gathered statistical data on such issues as complexity of data, analytic and heuristic decision makers, and complexity of data presented. They commented "Most of the individuals interviewed did not feel as confident about their ability to use the graphical reports as they did about their ability to use tabular reports". Lusk & Kersnick summarized the data-presentation issue by stating that individuals prefer (and will therefore exhibit superior performance using) report formats with which they are familiar. Subjects consistently perceive report formats with which they are familiar as less complex and easier to comprehend. Lusk & Kersnick suggested further research on the questions raised in the experiment. We conclude that computer-produced tables may be perceived as less complicated and easier to comprehend, and are more often preferred by the subjects simply based on the fact that tabular format is a more traditional and familiar form of output.

One factor which may have caused conflicting results was the broad scope of each experiment. Benbasat & Schroeder, as well as Lusk & Kersnick, examined numerous independent and dependent variables (the Benbasat & Schroeder study used six independent and three dependent variables). The independent variable identified as "Form of Report Presentation" was given only a small consideration in relation to the entire study of the MIS. These pioneering experiments may have been hampered by their broad scope, neglect of several interactive system issues and limited criterion for the selection of subjects.

We propose a straightforward and simple paper-and-pencil test to study the comprehension and recall ability based on a small set of data presented in three different forms. This experiment is analogous to the following example: Mr Smith is the head of the accounting department for the XYZ company. He must address the board of directors of the company and present an overview of the company's financial standing. Mr Smith has received the materials necessary to make the report a mere five minutes before his presentation. The report Mr Smith receives contains such information as expenditures, income, dividends on stock, cash on hand, etc. The reports come to him in one of three forms: graphical reports, tabular reports, or a combination of the two. If Mr Smith wishes to appear prepared while in the meeting, he may leave the reports outside the meeting room. On the other hand, he may want to bring the reports inside with him and refer to them while speaking. The six possible cases (three forms of data presentation and two styles of presentation to the board) parallel our experimental design.

Experimental design

EXPERIMENTAL HYPOTHESIS

Our study deals with the presentation of computer-generated information to an individual who must interpret the information. Our hypothesis was that more "usable" information can be conveyed using a combination of graphical and tabular data than by using either graphical data or tabular data alone. We felt that when a user is given both forms, he or she would get a better overall "view" of the information and comprehend it better. The combination of graphic and tabular also allows the user to extract information from the form of presentation that he or she most readily understands (and as stated in a study cited, this is most often the tabular form) and then further enhance his/her understanding with the lesser known presentation mode. In a sense, one form of data may verify the other.

For the recall versus non-recall experimental groups, we felt that comprehension performance would depend on the type of questions asked (type A, B, or C, as described below). As shown in Fig. 1, we feel that a subject's comprehension score would decrease as the difficulty of the questions increases, involving more comparisons and more complex cognitive operations. We felt comprehension scores in the recall group would deteriorate rapidly, while the comprehension scores in the non-recall group would indeed decline, but at a much slower rate. Our two by three experimental design is shown in Table 1.

FIG. 1. Predicted comprehension of type A, B, and C questions. (Line plot; x-axis: question type; one line each for the non-recall and recall groups.)

TABLE 1
Experimental design

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall, N =             12         12           13            37
Recall, N =                 11         13           13            37
Memory summary, N =         23         25           26            74

EXPERIMENTAL PROCEDURES

We read subjects the text of a prepared speech (Appendix A) so that each experimental session would receive exactly the same instructions. The subjects read and signed an experimental consent form. After doing so, the experimental materials (either the tables, graphics, or a combination) were distributed face down and subjects were given 5 min for study. The multiple-choice test was then passed out and subjects were allowed 10 min to complete it. Subjects were not permitted to leave if they finished the questionnaire before the 10 min limit. A "debriefing session" was held immediately following collection of the test materials.

PILOT STUDY

The pilot study was administered to 18 individuals. All subjects did satisfactorily on the test and their results are presented in Appendix B. The administration, grading, and analysis of comments made by the subjects helped us to correct several small flaws in the experiment. The experimental flaws and their subsequent corrections are shown in Appendix C. Most of the corrections to the material simply made the materials easier to read or understand. Several changes, most notably the inclusion of a preliminary set of written instructions, helped us to eliminate several sources of possible experimental biases that may have invalidated the experimental results. The pilot study confirmed that 5 min was adequate for the study of the material and 10 min was adequate for completion of a majority of the questions.

EXPERIMENTAL SUBJECTS

The subjects in our experiment were University of Maryland undergraduates, enrolled in a second semester computer science course. Each of the subjects had approximately the same background; however, a subject with a radically different background should not affect our experimental results (since no special mathematical or English skills were assumed). Only very simple mathematical skills, such as multiplication or addition, were required.


Materials given to the subjects were numbered with a two digit code. The first digit indicated if the materials were to be used for recall (code of 2) or if the materials may be kept during administration of the questionnaire (non-recall, code of 1). The second digit represented the form of the test material given to the subject (tables, graphics, or tables and graphics, where the code was 1, 2, or 3, respectively). Test materials were randomly distributed to the subjects.
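The two digit coding scheme can be illustrated with a short sketch. The digit meanings are taken from the description above; the function and dictionary names are hypothetical and were not part of the original study.

```python
# Illustrative only: names are hypothetical, the digit meanings follow the text.
MEMORY = {1: "non-recall", 2: "recall"}                   # first digit
FORM = {1: "tabular", 2: "graphical", 3: "combination"}   # second digit


def decode_material_code(code: int) -> tuple[str, str]:
    """Split a two digit materials code into (memory, form)."""
    memory_digit, form_digit = divmod(code, 10)
    if memory_digit not in MEMORY or form_digit not in FORM:
        raise ValueError(f"not a valid materials code: {code}")
    return MEMORY[memory_digit], FORM[form_digit]


# The six valid codes correspond to the six experimental groups.
ALL_CODES = [10 * m + f for m in MEMORY for f in FORM]
```

For instance, a questionnaire carrying code 23 would belong to the recall group that studied the combined tables and graphics.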

EXPERIMENTAL MATERIALS

We prepared the tables and graphic output shown in Appendix D. A single page containing five tables was printed on a standard line-printer. This assured that the output was similar to typical computer generated tables. The main table consisted of 20 test scores arranged in random order. The test scores were purely hypothetical but were made to appear as typical grades received on a 20-point test. The test scores were further broken down into tables indicating the distribution of letter grades. It was assumed the standard 90, 80, 70, 60 percent cut-offs apply to the letter grades A, B, C, D, and F, respectively. In addition to the three tables of scores and letter grades on the hypothetical test, tables were also presented to indicate letter grades received on a previous test. The same standards for letter grading applied to the previous test as were mentioned above for the current test. The tables which refer to the previous and current test were clearly labeled using identifiers such as "CURRENT TEST" and "PREVIOUS TEST". Table column labels were also provided to aid in the understanding of the tables. The average for the current test was also provided.
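As a minimal sketch of the letter-grade breakdown described above, the following maps a score on a 20-point test to a letter grade using the 90, 80, 70, 60 cut-offs. It assumes the cut-offs are percentages of the maximum score and that a score exactly on a cut-off earns the higher grade; neither detail is stated in the text, and the function names are hypothetical.

```python
MAX_SCORE = 20  # the hypothetical test was scored out of 20 points
# Assumed: cut-offs are percentages of the maximum and are inclusive.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(score, max_score=MAX_SCORE):
    """Return the letter grade for a raw score."""
    for percent, letter in CUTOFFS:
        # Compare score/max_score >= percent/100 without floating point division.
        if score * 100 >= percent * max_score:
            return letter
    return "F"


def grade_distribution(scores, max_score=MAX_SCORE):
    """Count how many scores fall into each letter grade."""
    counts = {letter: 0 for letter in "ABCDF"}
    for score in scores:
        counts[letter_grade(score, max_score)] += 1
    return counts
```

Under these assumptions a score of 18 or more is an A, 16 to 17 a B, 14 to 15 a C, 12 to 13 a D, and anything lower an F.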

The graphic output simply illustrated the table data in a pictorial form. A line plot was chosen to present the 20 individual test scores. Bar and pie charts were used to report the breakdown of the test score into letter grades. All graphic displays included legends, titles, statement of test average, X and Y axis labels, shading, and X and Y axis values where appropriate. We duplicated the tabular data into graphic form, and therefore there were no misleading labels, missing values, or deceiving shading in the graphic data presented. The graphic output was produced by TELLGRAF, an interactive computer graphics package. The plots were generated on a Tektronix static line-plotter. Students were presented with high quality copies of the original tables and graphics.

We prepared a 27 question multiple-choice test that we felt would test the subject's understanding of the materials (see Appendix E). Each question contained only one correct answer and four incorrect alternative answers. Of the 27 questions presented, each was classified into one of three categories. Type A questions were defined to be simple recall (or in the case where the subjects were allowed to keep the materials, a simple look-up) questions. Typical type A questions were: "What is the average grade on the current test?" or "What was the lowest grade received on the current test?". Type B questions involved recall (or look-up) of several facts presented and then a comparison of those facts. Typical type B questions were "In comparing students that received either an A or a B on the current test to those receiving an A or a B on the previous test, we may conclude..." or "When comparing the results of the previous and the current test, one may conclude that...". Type C questions involved extensive recall (or look-up) combined with a comparison and/or arithmetic manipulation of the data recalled. Typical type C questions were "If a student received a test score equal to the average, his letter grade on the exam would be...". This question involved recalling the average score on the tests, the number of questions on the test (these are both type A questions), and then multiplying the results to determine the letter grade category. Another typical type C question was "Suppose that the person who scored the highest grade on the current test answered eighty percent of the questions correctly. How many questions would this imply were on the test?". This question requires several steps to calculate the answer.
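The arithmetic behind the second type C example can be sketched as follows. The highest score used in the usage note is a hypothetical value chosen for illustration, since the actual test scores appear only in Appendix D; the function name is likewise an assumption.

```python
from fractions import Fraction


def implied_question_count(points_correct: int, percent_correct: int) -> int:
    """Number of questions implied when points_correct is percent_correct % of the test."""
    total = Fraction(points_correct * 100, percent_correct)
    if total.denominator != 1:
        # The stated percentage does not correspond to a whole number of questions.
        raise ValueError("score and percentage imply a fractional question count")
    return int(total)
```

With a hypothetical highest score of 16, `implied_question_count(16, 80)` gives 20, so the subject has to recall one fact and then carry out a division.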

ADMINISTRATION OF THE EXPERIMENT

We administered the experiment to four undergraduate computer science classes. Each class contained approximately 20 predominantly freshman-level students.

The subjects appeared fairly relaxed and confident during testing. During the experiment, questions were allowed to be asked. Two of the subjects asked if they had received enough test materials since they noticed that others around them received different amounts. We explained that, as was stated in the introduction read to the students, the amount of test materials will vary between subjects (see Appendix A, for text of introduction). Three students did not copy the two digit code number from the experimental materials to the questionnaire (as instructed in Appendix A) and these tests were disregarded. The only additional problem that occurred during the experiment was pointed out to us immediately after we administered the experiment to the first group of subjects. A student approached us and pointed out a typing error in question 18 of the questionnaire. The subject stated that he was "thrown off" by the typing error. We decided to ignore all responses to question 18 when grading the tests.

Experimental results

The independent variables in our study were memory (two levels of treatment, non-recall and recall) and form (three levels of treatment, tabular, graphical, and a combination of the two). The number of subjects in each of the six experimental groups varied from 11 to 13. There were 37 subjects in both the non-recall and recall groups. The breakdown of subjects by form was: 23 subjects in the tabular treatment, 25 subjects in the graphical treatment, and 26 subjects in the combination treatment.

TABLE 2
Dependent variable 1: Number of questions answered correctly (out of 26)

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall      Sum        229        142          184           555
                Mean     19.08      11.83        14.15         15.02
                N           12         12           13            37
Recall          Sum        147        143          134           424
                Mean     13.36      11.00        10.31         11.46
                N           11         13           13            37
Memory summary  Sum        376        285          318           979
                Mean     16.35      11.41        12.23         13.23
                N           23         25           26            74

Significance of form < 0.001. Significance of memory < 0.001.
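The summary rows and columns of Table 2 follow from the six cell sums and group sizes, and the relationship can be checked with a short sketch. The cell values below are those printed in Table 2; the function and variable names are hypothetical. Computed this way, the tabular column sum is 229 + 147 = 376 and the grand sum is 979, which are the totals consistent with the printed summary means of 16.35 and 13.23.

```python
# Cell sums and group sizes from Table 2, ordered tabular, graphical, combination.
SUMS = {"non-recall": [229, 142, 184], "recall": [147, 143, 134]}
NS = {"non-recall": [12, 12, 13], "recall": [11, 13, 13]}


def marginals(sums, ns):
    """Return (row totals, column totals, grand total), each as (sum, n) pairs."""
    rows = {m: (sum(sums[m]), sum(ns[m])) for m in sums}
    n_forms = len(next(iter(sums.values())))
    cols = [
        (sum(sums[m][j] for m in sums), sum(ns[m][j] for m in ns))
        for j in range(n_forms)
    ]
    grand = (sum(s for s, _ in rows.values()), sum(n for _, n in rows.values()))
    return rows, cols, grand


def mean(total, n):
    """Mean rounded to two decimals, as reported in the tables."""
    return round(total / n, 2)


rows, cols, grand = marginals(SUMS, NS)
```

The same check applies to Tables 3 to 8 by substituting their cell sums.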

The questionnaires were graded by awarding five points for a correct response. On several questions it was possible to receive partial credit (ranging from zero to four points) for selecting an answer that was nearly correct. Using this five-point grading system, we tested seven dependent variables in an attempt to find significant differences between the six experimental groups. The first dependent variable measured the number of questions a subject answered perfectly correctly (for which five points were awarded). The second dependent variable measured the total number of points accumulated by the subject. The third dependent variable recorded the number of questions the subject attempted to answer. The fourth dependent variable measured the percentage of questions answered correctly of those attempted. The fifth, sixth, and seventh dependent variables recorded the number of points achieved on type A, B, and C questions, respectively. Since there were more type B questions than types A or C, the type B scores were scaled down so that a maximum of 40 points could have been scored in each of the three categories of questions. Tables 2-8 summarize the responses of the 74 participants. The levels of statistical significance (determined by the ANOVA test of SPSS) are supplied below each table.

TABLE 3
Dependent variable 2: Total number of points scored (out of 130)

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall      Sum       1214        826          971          3011
                Mean    101.17      68.83        74.69         81.38
                N           12         12           13            37
Recall          Sum        857        929          871          2657
                Mean     77.91      71.46        67.00         71.81
                N           11         13           13            37
Memory summary  Sum       2071       1755         1842          5668
                Mean     90.04      70.20        70.85         76.60
                N           23         25           26            74

Significance of form = 0.001. Significance of memory = 0.042.

TABLE 4
Dependent variable 3: Number of questions answered (out of 26)

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall      Sum        283        205          219           707
                Mean     23.58      17.08        16.85         19.11
                N           12         12           13            37
Recall          Sum        226        300          273           799
                Mean     20.55      23.08        21.00         21.60
                N           11         13           13            37
Memory summary  Sum        509        505          492          1506
                Mean     22.13      20.20        18.92         20.35
                N           23         25           26            74

Significance of form = 0.022. Significance of memory = 0.009.
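The seven dependent variables described above can be computed from one subject's graded questionnaire roughly as follows. This is a sketch under stated assumptions: unanswered questions are represented as `None`, every category is scaled to a 40-point maximum (which leaves a category of eight questions unchanged), and the 8/10/8 split of question types used in the usage note is inferred from the 40-point maxima rather than stated in the text. All names are hypothetical.

```python
POINTS_PER_QUESTION = 5   # full credit for a correct response
CATEGORY_MAX = 40         # each question type is reported out of 40 points


def score_subject(points, qtypes):
    """Compute the seven dependent variables for one subject.

    points: per-question points (0 to 5), or None if the question was not attempted.
    qtypes: per-question type, one of "A", "B", "C".
    """
    attempted = [p for p in points if p is not None]
    n_correct = sum(1 for p in attempted if p == POINTS_PER_QUESTION)
    result = {
        "dv1_correct": n_correct,
        "dv2_points": sum(attempted),
        "dv3_attempted": len(attempted),
        "dv4_percent_correct": 100 * n_correct / len(attempted) if attempted else 0.0,
    }
    for name, qtype in (("dv5_type_a", "A"), ("dv6_type_b", "B"), ("dv7_type_c", "C")):
        n_questions = qtypes.count(qtype)
        raw = sum(p for p, t in zip(points, qtypes) if t == qtype and p is not None)
        # Scale so that a perfect score in any category equals CATEGORY_MAX.
        scale = CATEGORY_MAX / (POINTS_PER_QUESTION * n_questions) if n_questions else 0.0
        result[name] = raw * scale
    return result
```

For example, with an assumed layout of 8 type A, 10 type B, and 8 type C questions, a subject who answers every type A question correctly, earns 5 points on five type B questions and 3 points on the other five, and skips type C would score 13 correct, 80 points, 18 attempted, and 40, 32, and 0 points on types A, B, and C.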

TABLE 5
Dependent variable 4: Percentage of answers correct of those questions attempted

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall      Sum        970        838         1079          2887
                Mean     80.83      69.83        83.00         78.03
                N           12         12           13            37
Recall          Sum        717        598          645          1960
                Mean     65.18      46.00        49.63         52.97
                N           11         13           13            37
Memory summary  Sum       1687       1436         1724          4847
                Mean     73.35      57.44        66.30         65.50
                N           23         25           26            74

Significance of form = 0.001. Significance of memory < 0.001.

TABLE 6
Dependent variable 5: Points scored on type A questions (out of 40)

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall      Sum        395        286          313           994
                Mean     32.92      23.83        24.08         26.86
                N           12         12           13            37
Recall          Sum        302        335          323           960
                Mean     27.46      25.77        24.85         25.95
                N           11         13           13            37
Memory summary  Sum        697        621          636          1954
                Mean     30.30      24.84        24.40         26.40
                N           23         25           26            74

Significance of form = 0.002. Significance of memory = 0.586.

TABLE 7
Dependent variable 6: Points scored on type B questions (out of 40)

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall      Sum        405        270          343          1018
                Mean     33.75      22.50        26.39         27.51
                N           12         12           13            37
Recall          Sum        260        282          268           810
                Mean     23.64      21.69        20.62         21.89
                N           11         13           13            37
Memory summary  Sum        665        552          611          1828
                Mean     28.91      22.08        23.50         24.70
                N           23         25           26            74

Significance of form = 0.004. Significance of memory = 0.001.

TABLE 8
Dependent variable 7: Points scored on type C questions (out of 40)

                                      Form
Memory                 Tabular  Graphical  Combination  Form summary
Non-recall      Sum        314        203          229           746
                Mean     26.17      16.92        17.62         20.16
                N           12         12           13            37
Recall          Sum        229        241          224           694
                Mean     20.82      18.54        19.54         18.76
                N           11         13           13            37
Memory summary  Sum        543        444          453          1440
                Mean     23.61      17.76        17.43         19.90
                N           23         25           26            74

Significance of form = 0.014. Significance of memory = 0.500.

When examining Tables 2-8 and the bar charts which summarize them (see Figs 2 and 3), it was apparent that the subjects in the non-recall group scored better than their counterparts in the recall group. In some cases, however, such as scores on type A and C questions, the non-recall and recall groups received nearly identical scores. The ANOVA tests of type A and C scores revealed no statistically significant differences between the two groups. However, when we compared non-recall and recall groups on the number of correct responses (P < 0.001), total number of points scored (P = 0.042), percentage correct of those answered (P < 0.001) and score on type B questions (P = 0.001), we found that the independent variable of memory played a significant role. In these four cases, the non-recall group scored significantly better than the recall group. In the case of the third dependent variable, the number of questions attempted, the recall group scored significantly higher (P = 0.009) than the non-recall group.

FIG. 2. Scores on dependent variables 2 and 4 based on independent variable memory. (Bar chart comparing the non-recall and recall groups.)

FIG. 3. Scores on dependent variables 1, 3, 5, 6, and 7 based on independent variable memory. (Bar chart comparing the non-recall and recall groups.)

When we compared the various forms of data that were presented, we discovered a further trend. Information presented in Figs 4-6 indicates that subjects using information presented in tabular form scored significantly better than subjects using the other two forms of data representation. ANOVA tests reported significance levels ranging from 0.001 to 0.083. This indicates that tabular data was significantly superior to graphics or graphic/tabular data. In addition to the dominance of the tabular form of presentation, we also noted that the graphics and the graphic/tabular combination achieved almost identical scores on most of the seven dependent measures. The graphic and graphic/tabular combination were nearly equal on such measures as number correct (11.41, 12.23), total points scored (70.20, 70.85), number of questions attempted (20.20, 18.92), and points on type A (24.84, 24.40), type B (22.08, 23.50), and type C (17.76, 17.43) questions.

FIG. 4. Comprehension versus type of question. (Line plot; x-axis: question type; one line each for the tabular, graphic, and combination forms.)

FIG. 5. Scores on dependent variables 1 and 3 based on independent variable form. (Bar chart comparing the tabular, graphic, and combination forms.)

FIG. 6. Scores on dependent variables 2 and 4 based on independent variable form. (Bar chart comparing the tabular, graphic, and combination forms.)

Discussion of experimental findings

As expected, the subjects in the non-recall group consistently scored higher than their counterparts in the recall group. Individuals who have the information in front of them should be able to extract the information better than individuals who must recall such information. In several dependent variables measured, namely the points scored on type A and C questions, the recall and non-recall scores achieved were nearly identical. By definition, type A questions were devised to test only information that was easily derivable from the materials presented. This type of information is therefore easily and accurately stored and retrieved. As shown in Table 6, the mean number of points scored on type A questions by both recall and non-recall groups (26.86, 25.95) are almost equal. This indicates that information needed to answer type A questions was easily stored and accurately retrieved by both treatment groups. Similar conclusions can be drawn concerning the near equality of the means of non-recall and recall subjects in answering type C questions (see Table 8). Subjects in both the non-recall and recall groups performed poorly on type C questions (as compared with type A questions), yet scores remained remarkably similar. We surmise that the recall group did not properly store or could not properly recall such complicated relationships, while the non-recall group simply could not determine the correct answers, given the materials presented.

On the other hand, the true differences between the non-recall and recall groups became evident in examination of type B question results. The non-recall participants scored significantly higher than their counterparts in the recall group. We feel that type B questions represent the point at which memorization and accurate recall begin to break down. The recall group understood the questions (as they did on the type A questions) and were not overwhelmed by the request to recall vast amounts of data and analyze it (as they were in answering type C questions). They did, however, begin to lose accuracy when several facts needed to be recalled and compared (type B questions). Our hypothesis, which was stated previously and illustrated in Fig. 1, was only partially confirmed. Both recall and non-recall groups had similar type A scores and the recall group's comprehension score did, indeed, decline at a faster rate than that of their counterparts in the non-recall group. What we did not account for, as shown in Fig. 7, was that the two groups would again achieve similar averages at type C comprehension scores. We conclude that, when comparing the non-recall and recall groups, the type A questions are equally easy, the type C questions are equally difficult, and the type B questions, involving several comparisons and more extensive recall of data, illustrate the true differences between the recall and non-recall groups.

FIG. 7. Actual comprehension of type A, B, and C questions, plotted by question type for the non-recall and recall groups.

Individuals in the recall group did score significantly higher when the number of questions answered was examined (see Table 4). We feel that since the recall group had no materials to refer to, the subjects simply answered the questions to the best of their ability and proceeded to the next question. The non-recall group, on the other hand, could refer back to materials if they were unsure of the correct answer. This slowed down the non-recall group, so their total number of questions attempted was lower than that of their counterparts in the recall group. We also found evidence that casts doubt on the subjects' ability to "answer to the best of their ability". In dependent variable four, the percentage correct of those questions attempted, we found that most recall group members correctly answered fewer than half of the questions they attempted. This fact could be attributed to outright guessing by the subjects. These figures may indicate that either the subjects in the recall group had poor storage and retrieval techniques or the recall subjects simply guessed at questions too difficult to answer and proceeded to the next question.
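The distinction between the number of questions attempted and the percentage correct of those attempted can be made concrete with a short sketch. The function name and the two subjects below are hypothetical and purely illustrative; the numbers are not data from the experiment, and partial credit is ignored for simplicity.

```python
def percent_correct_of_attempted(num_correct, num_attempted):
    """Return the percentage of attempted questions that were answered correctly."""
    if num_attempted == 0:
        return 0.0  # nothing attempted, so nothing to score
    return 100.0 * num_correct / num_attempted


# Hypothetical subjects (illustrative numbers only, not experimental data).
# One attempts many questions quickly; the other attempts fewer but checks the materials.
fast_subject = percent_correct_of_attempted(num_correct=10, num_attempted=25)
careful_subject = percent_correct_of_attempted(num_correct=12, num_attempted=15)

print(fast_subject)     # 40.0
print(careful_subject)  # 80.0
```

Under these assumed numbers, the subject who attempts more questions has the lower percentage correct, which is the pattern the discussion above attributes to guessing.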

When our second independent variable, form of materials, was studied, we also came to several interesting conclusions. The subjects that received tabular data consistently scored significantly better than their counterparts in either the graphics or the graphics/tabular group (see Figs 4-6). We feel that the differences can be attributed to several factors. First, as stated by Lusk & Kersnick (1979), the tabular format was probably a more familiar and natural form of data presentation for our subjects. Students in an academic setting are rarely presented with test results in pie chart form. In most cases, test results are displayed to a student in tabular form. Secondly, we feel that subjects in the graphics/tabular combination group were overwhelmed by the amount of data presented to them. By examining such variables as number of questions answered and points scored on type A questions (Tables 4 and 6), we saw that the groups receiving both graphics and tabular data scored lower than the groups receiving either form alone. With such voluminous data in front of them, the subjects could not reap the benefits of either form of data, but instead were overwhelmed by it. Lastly, we feel that the design and format of the experiment was geared towards the subjects in the tabular group. The tabular group received data in much the same form as that in which they were required to recall it. For example, the highest score achieved on our hypothetical test clearly appeared in the tables provided. When a subject in the tabular group was asked to recall the highest score on the test, the information had been seen in that form and could be easily recalled. Subjects in the graphic treatment group, however, needed to recode the general "pictures" of the data provided into the appropriate answer. The graphic treatment gave the subjects a good overall "view" of the data, but did not easily supply facts to the subjects. The questions demanded recall of specific facts provided by the materials presented; the questions were therefore geared towards the tabular treatment.

Experimental conclusions

We predicted that the combination of graphic and tabular data would yield a better "view" of the data and therefore produce better comprehension of it. This hypothesis appears to have been supported only partially. When dependent variable four, the percentage correct of those questions attempted, was examined, we found that the non-recall graphic/tabular combination achieved the highest percentage correct. This verified that subjects given voluminous data would not necessarily answer more questions, but they would answer the questions attempted with a high degree of accuracy. If one's goal is to increase speed of performance, then the combination of tabular and graphic data should be avoided (in this case our experiment indicates tabular data alone would be most appropriate). However, if one's goal is to increase the accuracy of performance, then the combination of graphics and tabular data appears to be the most conducive to that goal.

Our second hypothesis stated that the degree of comprehension should be determined by the type of question asked. When we compared type A, B, and C questions over all 74 subjects, we found that scores steadily declined (averages of 26.4, 24.7, and 19.9) as the complexity of the questions increased. In all of the six experimental groups, as the questions involved more extensive recall, comparison of information, and complex operations on the data, the comprehension scores decreased steadily. When non-recall and recall groups are compared, our hypothesis (as pictured in Fig. 1) was partially verified. Although the recall and non-recall groups started at the same level of comprehension and did, indeed, decline steadily, our hypothesis failed to predict that comprehension levels would again be equal as the questions became extremely complicated.

Recommendations

Our recommendations fall into two broad categories: recommendations to future experimental researchers and recommendations to system designers who are users of tabular and graphic presentation methods.

Experimenters may wish to alter the questionnaire to include a larger number of questions or a varied question format (to include short answer, fill-in-the-blank, or subjective type questions). The materials presented to the subjects could include a larger amount of data, a text description of the data as an additional presentation form, or varying graphic and tabular formats as the volume of data grows. A valuable dependent variable would be the retention of information after an hour, day, or week.

Future experimenters should also examine the classifications of questions (type A, B, or C) chosen for this experiment and determine whether a more accurate scheme could be devised to classify question difficulty. Most importantly, we recommend that future practitioners should involve a larger number and larger variety of subjects in the experiment. Our experiment did not prove that graphics data was significantly better in aiding comprehension than tabular data. However, a similar experiment in which subjects are given (or already possess) formal training in graphic interpretation may alter the results significantly.

System designers dealing with graphic and tabular data should attend to the advice of Lusk & Kersnick (1979): the most familiar form of data presentation is often perceived as the easiest to comprehend. Our experiment gave limited support to this notion, so users of graphic and tabular data should be aware of their predisposition to one form of presentation over another. A system designer may wish to give the intended user of data an option of receiving any combination of graphic and tabular data. For example, a menu selection process could be implemented so that the same data could be presented in a number of ways. Users could select the form(s) they feel most comfortable with and can comprehend the most easily.
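The menu selection idea can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names, the menu keys ("table", "graph", "both"), and the text-based bar chart are hypothetical, and the letter-grade counts are those of the current test in Appendix D.

```python
def render_table(counts):
    """Return the data as a plain-text table, one row per category."""
    lines = ["Grade  Students"]
    for grade, n in counts.items():
        lines.append(f"{grade:<5}  {n:>8}")
    return "\n".join(lines)


def render_bar_chart(counts):
    """Return the data as a text bar chart, one '#' per student."""
    return "\n".join(f"{grade} | {'#' * n} {n}" for grade, n in counts.items())


# Hypothetical menu: each choice maps to the renderers applied to the same data.
RENDERERS = {
    "table": [render_table],
    "graph": [render_bar_chart],
    "both": [render_table, render_bar_chart],
}


def present(counts, choice):
    """Present the same data in the form(s) the user selected from the menu."""
    if choice not in RENDERERS:
        raise ValueError(f"unknown presentation form: {choice!r}")
    return "\n\n".join(render(counts) for render in RENDERERS[choice])


grades = {"A": 4, "B": 3, "C": 6, "D": 2, "F": 5}
print(present(grades, "both"))
```

Keeping the data separate from the renderers means that a further presentation form could be added to the menu without touching the data itself.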

Recent developments in graphic hardware and software have made graphics an economically sound alternative to traditional hard-copy output of tables. However, users must recognize drawbacks that plague both tabular and graphic output of data. First, both tabular and graphic data can overwhelm users when presented in great quantities or in a hard-to-understand fashion. Simplicity and moderation should always be considered. Secondly, both graphics and tabular data can distort results.

The hypothetical "Mr Smith" described earlier should request both tabular and graphical output for use in his presentation. He should review the tables prior to presenting the graphic data to the board of directors. In order to make a clear presentation of the data, he should reference the graphics while citing details gained from his study of the tables.

References

BENBASAT, I. & SCHROEDER, R. G. (1977). An experimental investigation of some MIS design variables. The Management Information Systems Quarterly, 1, 37-49.

CHERVANY, N. L. & DICKSON, G. W. (1974). An experimental evaluation of information overload in a production environment. Management Science, 20, 1335-1344.

DAVIS, D. L. (1981a). An experimental investigation of the form of information presentation, psychological type of the user, and performance within the context of a management information system. Ph.D. Thesis, University of Florida (unpublished).

DAVIS, D. L. (1981b). Unpublished report.

DICKSON, G. W., SENN, J. A. & CHERVANY, N. L. (1977). Research in management information systems: the Minnesota experiments. Management Science, 23, 913-923.

LUCAS, H. C. (1979). An experimental investigation of the use of graphics in decision making, pp. 11-17. School of Business, New York University.

LUSK, E. L. & KERSNICK, M. (1979). The effect of cognitive style and report format on task performance: the MIS design consequences. Management Science, 25, 787-798.

SCHROEDER, R. G. & BENBASAT, I. (1975). An experimental evaluation of the relationship of uncertainty in the environment to information used by decision makers. Decision Sciences, 6, 556-567.

SENN, J. A. (1973). Information system structure and purchasing decision effectiveness: an experimental study. Ph.D. Thesis, University of Minnesota (unpublished).

SMITH, H. R. (1975). Experimental comparison of database inquiry techniques. Ph.D. Thesis, University of Minnesota (unpublished).

Appendix A. Instructions to subjects

TEXT OF INSTRUCTIONS READ TO SUBJECTS PRIOR TO TEST ADMINISTRATION

We are taking a human factors in computer science course and conducting an experiment in an attempt to determine the best way to communicate information by computer-generated reports. Each of you will be given different types and amounts of computer generated output. We will give you five minutes to study the materials presented. You will then be given a 27 question multiple-choice questionnaire.

The questions will deal with the data values only. For instance, if we gave you a report with two charts on it such as: (draw sample charts on board). Where this chart contains information about 1980 population and this chart contains information about 1970 populations (referring to the charts on board). We will question you about the information contained in the charts. We will not ask you if the chart containing 1980 information was located in the upper right-hand corner of the report. The questions will deal with very specific aspects about the data. You will be required to perform several calculations on the data presented. You will be given 10 minutes to complete the questionnaire and you should attempt to answer as many questions correctly as possible; but, we do not expect you to finish. If you do finish, please remain seated for the remainder of the time. Since you are limited to 10 minutes, you may skip questions that you do not understand or that you find extremely difficult. Partial credit is given, so you will be rewarded for any "educated guesses". Please do not write on the reports. However, you may do any figuring on the questionnaire. Are there any questions?

We will now hand out the consent forms. Please read them and sign them.

(Hand out materials.) You have five minutes.

(After five minutes.) Please place the two digit code number found on the materials on the questionnaire booklets when you get them.

(Hand out questionnaires.) If the code number placed on the questionnaire is a 21, 22, or a 23, then please hand in your materials. You now have 10 minutes to work on the questionnaire booklet.

Appendix B. Pilot study results

                                        Form

Memory                 Tabular     Graphic     Combination     Form summary

Non-recall    Sum      287         234         253             774
              Mean     95.67       78.00       84.33           86.0
              Num.                                             9

Recall        Sum      274         138         251             663
              Mean     91.33       46.0        83.67           73.67
              Num.                                             9

Memory        Sum      561         372         504             Totals
summary       Mean     93.5        62.00       84.00           79.83
              Num.     6           6           6               18

Three subjects per cell in all cases. Form was statistically significant at the 0.010 level; memory was not statistically significant.
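The means in the pilot table follow directly from the cell sums and the three subjects per cell, and a reader can check them with a short sketch such as the one below. The variable and function names are our own (hypothetical); only the sums and the cell size come from the table.

```python
SUBJECTS_PER_CELL = 3  # stated in the note to the pilot table

# Cell sums copied from the pilot table (memory condition by form).
cell_sums = {
    "Non-recall": {"Tabular": 287, "Graphic": 234, "Combination": 253},
    "Recall": {"Tabular": 274, "Graphic": 138, "Combination": 251},
}
forms = ["Tabular", "Graphic", "Combination"]


def mean(total, n):
    """Return total / n rounded to two decimals, as printed in the table."""
    return round(total / n, 2)


# Mean of each cell (three subjects each).
cell_means = {
    memory: {form: mean(s, SUBJECTS_PER_CELL) for form, s in row.items()}
    for memory, row in cell_sums.items()
}

# Row summaries: one mean per memory condition (nine subjects each).
memory_means = {
    memory: mean(sum(row.values()), SUBJECTS_PER_CELL * len(row))
    for memory, row in cell_sums.items()
}

# Column summaries: one mean per form (six subjects each).
form_means = {
    form: mean(sum(cell_sums[m][form] for m in cell_sums),
               SUBJECTS_PER_CELL * len(cell_sums))
    for form in forms
}

# Grand mean over all 18 pilot subjects.
grand_sum = sum(sum(row.values()) for row in cell_sums.values())
grand_mean = mean(grand_sum, SUBJECTS_PER_CELL * len(forms) * len(cell_sums))

print(cell_means)
print(memory_means)
print(form_means)
print(grand_mean)  # 79.83
```

The computed values agree with the means printed in the table, including the overall mean of 79.83.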

Appendix C. Pilot study corrections

Comments by pilot study subjects, each followed by the correction made by the experimental team

Comment: The answer to several of the questions appeared on a different page to the question itself. This hampered reading of the question and the possible answers.
Correction: We carefully placed page eject commands to eliminate this problem.

Comment: Subjects reported the 5 min to study the tables or graphics was too long. On the other hand, 5 min was too short to study the table/graphic combination.
Correction: We compromised and kept the time to study the materials at 5 min.

Comment: Some subjects did not like the questionnaire being on computer paper. It was unfamiliar and page flipping was difficult.
Correction: We photocopied the computer output onto ordinary (8½ x 11 in.) paper.

Comment: Subjects commented that several extremely difficult questions were located in the first 10 questions. This was very discouraging to the subjects.
Correction: We moved several of these questions to more appropriate positions in the questionnaire.

Comment: Several subjects did not understand the term "median" as explained in one of the questions.
Correction: A better explanation was substituted.

Appendix D. Experimental materials

Current test

Individual   Test          Individual   Test
number       scores        number       scores

 1           19            11           18
 2           17            12            9
 3            5            13           16
 4           20            14           15
 5           14            15           12
 6           12            16           16
 7            7            17           14
 8            9            18           15
 9           19            19           14
10           15            20           11

Average for current test is 13.85.

Current test                  Previous test

Letter    Percentage          Letter    Percentage
grades    of class            grades    of class

A         20                  A         15
B         15                  B         15
C         30                  C         25
D         10                  D         20
F         25                  F         25

Current test                  Previous test

Letter    Number of           Letter    Number of
grades    students            grades    students

A         4                   A         3
B         3                   B         3
C         6                   C         5
D         2                   D         4
F         5                   F         5
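The summary figures in these tables can be reproduced from the raw data with a short sketch. The variable names are our own (hypothetical); the scores and the letter-grade counts are copied from the tables above, and Python's standard `statistics` module supplies the mean and median.

```python
from statistics import mean, median

# Current test scores for individuals 1-20, copied from the table above.
scores = [19, 17, 5, 20, 14, 12, 7, 9, 19, 15,
          18, 9, 16, 15, 12, 16, 14, 15, 14, 11]

average = round(mean(scores), 2)  # stated in the materials as 13.85
middle = median(scores)           # middle of the sorted scores

# Number of students per letter grade, copied from the tables above.
current_counts = {"A": 4, "B": 3, "C": 6, "D": 2, "F": 5}
previous_counts = {"A": 3, "B": 3, "C": 5, "D": 4, "F": 5}


def percentages(counts):
    """Convert counts of students per grade into percentages of the class."""
    total = sum(counts.values())
    return {grade: 100 * n / total for grade, n in counts.items()}


print(average)                       # 13.85
print(middle)                        # 14.5
print(percentages(current_counts))   # A 20, B 15, C 30, D 10, F 25
print(percentages(previous_counts))  # A 15, B 15, C 25, D 20, F 25
```

The computed percentages match the "Percentage of class" tables, which confirms that the percentage and count tables describe the same 20 students per test.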

SCHEME 1. Current test and previous test (graphic not reproduced).

SCHEME 2. Current test scores, with the average (13.85) marked; horizontal axis: test scores (graphic not reproduced).

SCHEME 3. Current and previous tests by letter grade, A to F (graphic not reproduced).

Appendix E. Experimental questionnaire

Below are questions that will test your comprehension of the material you received. Circle the letter corresponding to the correct answer.

1. The highest score for the current exam was A. 18 B. 19 C. 20 D. 21 E. 22

2. The lowest score for the current exam was A. 7 B. 4 C. 9 D. 5 E. 8

3. The percent of students that got a letter grade of "C" or better on the current exam was A. 30 B. 35 C. 65 D. 45 E. 75

4. The letter grade which was received the most often on the current exam was A. A B. B C. C D. D E. F

5. The average numeric grade of the current exam was A. 16.75 B. 14.85 C. 13.5 D. 13.85 E. 12.25

6. The median score (that is, the middle score) on the current exam was A. 14.5 B. 13.85 C. 12.25 D. 14.2 E. 15.1

7. When comparing the previous exam results to the current exam results, one can conclude that
A. PREVIOUS EXAM SCORES WERE POORER THAN CURRENT EXAM SCORES
B. CURRENT EXAM SCORES WERE POORER THAN PREVIOUS EXAM SCORES
C. THE RESULTS OF BOTH EXAMS WERE THE SAME
D. THIS CAN NOT BE DETERMINED FROM THE INFORMATION GIVEN

8. The number of students that scored better than the average on the current exam was A. 9 B. 15 C. 11 D. 17 E. 13

9. The number of scores received by two and only two people on the current exam was A. 7 B. 1 C. 2 D. 4 E. 6

10. The number of students that received a "B" or better as letter grades on the current exam was A. 7 B. 3 C. 13 D. 4 E. 9

11. The number of students that took the current exam was A. 22 B. 16 C. 20 D. 18 E. 25

12. The letter grade which shows the most change between the current exam and the previous exam is A. A B. B C. C D. D E. F

13. The number of students that received a score lower than average on the current exam was A. 9 B. 3 C. 11 D. 7 E. 5

14. Assume that the highest score on the current exam was only 80% of the possible total score. The total number of points would have been A. 20 B. 15 C. 25 D. 37.5 E. 50.0

15. The percentage of students that failed the current exam was A. 1/4 B. 1/3 C. 1/5 D. 1/10 E. 1/7

16. The average score for the current test fell into which letter grade (assuming a 90, 80, 70, 60 cut-off for A, B, C, D, F) A. A B. B C. C D. D E. F F. CAN NOT BE DETERMINED FROM THE INFORMATION GIVEN

17. If the student who scored lowest on the current exam answered only 33% of the questions on the exam correctly, then the total number of questions must have been A. 17 B. 20 C. 10 D. 15 E. 21

18. If a student was to receive a score equal to the average on the current exam, the percent correct would be A. 64.25 B. 75.75 C. 65.75 D. 85.75 E. 57.50

19. Looking at the exam results as a whole, one could conclude that the exam was
A. EXTREMELY EASY
B. EXTREMELY HARD
C. MODERATELY EASY
D. MODERATELY HARD
E. AN EXAMPLE OF A NORMAL DISTRIBUTION OF GRADES
F. CAN NOT TELL FROM INFORMATION GIVEN

20. The letter grade which was received most often on the previous exam was A. A B. B C. A + C D. C + D E. C + F

21. The average score received on the previous exam was A. 13.25 B. 11.50 C. 14.95 D. 13.85 E. 15.25 F. CAN NOT BE DETERMINED FROM INFORMATION GIVEN

22. The number of students that took the previous exam was A. 22 B. 16 C. 20 D. 18 E. 24

23. The percent of students that received a letter grade of "A" on the current exam was A. 30 B. 33⅓ C. 25 D. 20 E. 10

24. The percent of students that failed the previous exam was A. 15 B. 25 C. 10 D. 30 E. 35

25. In comparing the current exam with the previous exam it is apparent that
A. MORE STUDENTS TOOK THE PREVIOUS EXAM THAN TOOK THE CURRENT EXAM
B. MORE STUDENTS TOOK THE CURRENT EXAM THAN TOOK THE PREVIOUS EXAM
C. THE NUMBER OF PARTICIPANTS WAS EQUAL
D. IMPOSSIBLE TO COMPARE THE TWO EXAMS ON THIS FACTOR

26. In comparison, the highest score on the previous exam was
A. GREATER THAN THE HIGHEST SCORE ON THE CURRENT EXAM
B. LESS THAN THE HIGHEST SCORE ON THE CURRENT EXAM
C. EQUAL TO THE HIGHEST SCORE OF THE CURRENT EXAM
D. INCOMPARABLE, INFORMATION NOT GIVEN

27. If the number of questions on the current exam was 20, then the number of students receiving 100% on the exam was A. 0 B. 1 C. 2 D. 3 E. 4
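Several of the questionnaire items call for counting or simple arithmetic on the current test scores of Appendix D. The sketch below works a few of them through as an illustration. It is not the authors' answer key: the variable names are our own (hypothetical), and the values are simply what the Appendix D scores imply.

```python
from collections import Counter

# Current test scores for individuals 1-20, copied from Appendix D.
scores = [19, 17, 5, 20, 14, 12, 7, 9, 19, 15,
          18, 9, 16, 15, 12, 16, 14, 15, 14, 11]
average = sum(scores) / len(scores)  # 13.85

# Item 8: students scoring better than the average.
above_average = sum(1 for s in scores if s > average)

# Item 13: students scoring lower than the average.
below_average = sum(1 for s in scores if s < average)

# Item 9: distinct scores received by exactly two students.
shared_by_exactly_two = sum(1 for n in Counter(scores).values() if n == 2)

# Item 14: total points if the highest score is 80% of the possible total.
total_points = round(max(scores) / 0.80)

# Item 17: total questions if the lowest score is 33% of the questions.
total_questions = round(min(scores) / 0.33)

# Item 27: students with a perfect score if the exam had 20 questions.
perfect_scores = scores.count(20)

print(above_average, below_average)    # 13 7
print(shared_by_exactly_two)           # 4
print(total_points, total_questions)   # 25 15
print(perfect_scores)                  # 1
```

Each computed value appears among the answer choices offered for the corresponding item.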