my masters thesis

The University of Queensland

Faculty of Business, Economics & Law

Department of Commerce

Information Request Ambiguity and End User Query

Performance: Theory and Empirical Evidence

A Thesis submitted to the Department of Commerce, the University of Queensland, in partial fulfilment of the requirements for the degree of

Master of Information Systems.

By Micheal Axelsen

15th June 2000

Supervisor: Dr Paul Bowen

Acknowledgments

I wish to express my appreciation and thanks to my supervisor, Dr Paul Bowen, for his

assistance, advice, and patience in the preparation of this thesis. To my mother I offer thanks

for making it all possible. I also express sincere gratitude to my wife, Leeanne Klan, whose

obstinate patience continues to assist in putting the world in focus.

I also thank workshop participants at Nanyang Technological University in Singapore for

their comments and contributions to this thesis.

Abstract

The increasing reliance of organisations on information technology and the persistent

shortage of IT/IS professionals requires end users to satisfy many information requests by

querying complex information systems. Because many business decisions are now based on

the results of the end users' queries, information request ambiguity has extensive

ramifications for business practices. Where the queries do not match the requirements of the

information requests, the business decisions are likely to be fundamentally flawed.

This paper develops a theory of ambiguity in information requests and reports the results of

an initial empirical investigation of that theory. The theory identifies seven ambiguities:

lexical, syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive. A laboratory

experiment with sixty-six participants was used to investigate the empirical effect of

ambiguity on end user query performance. End user query performance was measured by the

number of total errors in the proposed solution, the time taken to complete the solution, and

the end user's confidence in the solution.

The results indicate that ambiguity significantly degrades end user query performance. The

seven types of ambiguity were analysed to determine their individual effects on end user

query performance. Actual (pragmatic, extraneous) and imaginary (emphatic, suggestive)

ambiguities show significant relationships with total errors and duration. In general, potential

(lexical, syntactical, and inflective) ambiguities were not significantly associated with total

errors or end user confidence. The results should have important implications for consulting

firms, for organisations with ad hoc work groups, and for entities that make extensive use of

electronic mail for information requests.

Table of Contents

1. Introduction

2. Information Request Ambiguity and End User Query Performance

2.1 A Theoretical Model of Information Request Ambiguity

2.2 The Nature of Ambiguity

2.2.1 Potential Ambiguity

Lexical Ambiguity

Syntactical Ambiguity

Inflective Ambiguity

2.2.2 Actual Ambiguity

Pragmatic Ambiguity

Extraneous Ambiguity

2.2.3 Imaginary Ambiguity

Emphatic Ambiguity

Suggestive Ambiguity

2.2.4 Ambiguity in Practice

2.3 Task Complexity

2.4 Theoretical Model Summary

3. Methodology

3.1 Experimental Design

3.2 Experiment Participants

3.3 Assessment of Participant Responses

4. Results and Discussion

4.1 Overview of Experimental Results

4.2 Regression Analysis

4.3 Ambiguity Treatment Multiple Linear Regression Model Results

4.4 Multiple Linear Regression Model: Seven Types of Ambiguity

4.5 Summary of Results

4.5.1 Potential Ambiguity

4.5.2 Actual Ambiguity

4.5.3 Imaginary Ambiguity

4.5.4 Complexity

5. Implications For Business Practice

5.1.1 Electronic Mail

5.1.2 Personnel Turnover and Work Teams

6. Contributions, Limitations, and Future Research

6.1 Research Contributions

6.2 Research Limitations

6.3 Future Research

References

Appendix A: Experiment Information Requests and Model Answers

Appendix B: Experiment Instruction Sheet

Appendix C: Command Interpreter Unix Shell Script

Appendix D: Experiment Entity-Relationship Diagram

Appendix E: Experimental Design

Appendix F: Error Marking Sheets

Appendix G: Annotated Corrected Participant Response

Appendix H: Pearson Correlation Matrix of Variables

Appendix I: Analysis of Ambiguity's Effect On Error Type

Appendix J: Seven Ambiguity Types Question Assessment Ratings

Appendix K: Ambiguity Assessment Instrument

Appendix L: Internal Validity of the Experiment

Figures

Figure 1 Types of Ambiguity (adapted from Walton 1996)

Figure 2 The Theoretical Model of Ambiguity, Complexity, and End User Query Performance

Figure 3 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the total errors in the participant's response.

Figure 4 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the duration taken for the participant to prepare the response.

Figure 5 Depicting graphically the relationship between the treatment received (ambiguous or clear information request) and the participant's confidence in the response.

Tables

Table 1 Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests

Table 2 Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B

Table 3 Participant Demographic Information and Descriptive Statistics: Academic Record of Group A and Group B

Table 4 Participant Demographic Information and Descriptive Statistics: Participant Age in Group A and Group B

Table 5 Comparative Statistics for all Participant Responses Grouped by Question (Q) and Treatment (T). Note that for T, a = ambiguous, c = clear

Table 6 Confidence Rating Transformation to a Numerical Scale

Table 7 Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types Regression Model

Table 8 Regression Analysis Results for the General Ambiguity Regression Model

Table 9 Regression Analysis Results for the Seven Ambiguity Types Regression Model

Table 10 Summary of Analysis' Support for Hypotheses

Table 11 Participant Strata Classes

1. Introduction

Keen (1993) predicts that innovative applications of information technology will change the

competitive landscape to such an extent that fifty percent of companies in some industries

may not survive the next decade. This rise of the importance of information technology

innovation and application has led to an increased need for relevant, timely information at

the point where that information is used and understood (Conger 1994; Delligatta and

Umbaugh 1994; Nath and Lederer 1996).

The demand for information systems (IS) professionals vastly overwhelms the available

supply, both now and for the foreseeable future (Freeman et al. 2000; Rosenthal and

Jategaonkar 1995; Australian Bureau of Statistics 1997). Hence, the use of computerised

information systems by end users has become compulsory in most business organisations

(Cardinali 1992; Athey and Wickham 1995-1996). To provide appropriate, relevant

information requires identifying and eliminating ambiguities in communication between the

stakeholders or managers requesting information, and the end users querying the information

systems.

Traditional structured methodologies reduce ambiguity at the expense of timeliness,

flexibility, and learning. The insights that end users can achieve during interactive, iterative

query sessions are also of benefit. The need for timeliness, flexibility, learning and end user

insights, as well as the shortage of IS professionals, have led to the general decline of

structured reports (Ryan 1993). The use of ad hoc and iterative end user reports has

increased (Tayntor 1994). Nonetheless, many end users now use more formalised processes

in developing their reports than previously (Conger 1994; Tayntor 1994).

Information request ambiguity has potentially real and large impacts on business

organisations. An ambiguous information request can result in a report that, although it

appears acceptable to the person making the information request, does not contain the desired

information. If that wrong report is then used to make business decisions that the correct

report would not have supported, then information request ambiguity can cause substantial

negative impacts.

This paper develops a theory of the impact of ambiguity in information requests on end user

query performance, and tests that theory empirically. It empirically examines the strength

and direction of the relationships between ambiguity types (lexical, syntactical, inflective,

pragmatic, extraneous, emphatic, and suggestive), complexity, and end user query

performance. The current study extends previous work (Suh and Jenkins 1992; Borthick et

al. 1997; Rho and March 1997; Borthick et al. 2000) and builds upon the theory of end users'

query performance in the tradition of Dubin (1978).

2. Information Request Ambiguity and End User Query Performance

Different forms of ambiguity can be present in a natural language information request. The

primary aim of this research is to explore the impact of ambiguity on end user query

performance. This chapter develops a theory of the relationship between information request

ambiguity and end user query performance.

2.1 A Theoretical Model of Information Request Ambiguity

The development of an accurate SQL query by an end user depends on the user's knowledge

of the information needed, the database structure, and the query language (Ogden et al. 1986).

A lack of skill in any of these three domains will lead to inaccurate SQL queries (Ogden et al.

1986).

A natural language information request requires end users to transform the natural language

constructs into the query components consisting of lexical items (Katzeff 1990). End users

must conceptualise the information requirement and then mentally map this conceptualisation

to their understanding of the database structure. Reisner (1977) proposed a template model

for the manner in which users create SQL queries from a natural language information

request. Each query's operator components (Halstead 1977) are drawn from a set of known

query language components to address the requirements of the natural language information

request.

Ambiguity affects the user's interpretation of the information needed. Because information

requests are expressed using a natural language, they are ambiguous and uncertain. End users

must interpret and analyse the information requests to develop queries that meet the

requestors' needs. The end users' uncertainty in determining the required response affects the

required cognitive effort because multiple interpretations of the actual information required

may be legitimately constructed (Almuallim et al. 1997).

The impact of natural language's seven types of ambiguity has not previously been examined

in the context of end user query performance. These seven types of ambiguity are lexical,

syntactical, inflective, pragmatic, extraneous, emphatic and suggestive (Walton 1996; Fowler

and Aaron 1998). These ambiguities affect the number of legitimate interpretations of the

natural language statement of the information request. The information request has

"multiplicity of meaning" (Walton 1996).

Tasks that are more complex require increased cognitive effort (Campbell 1988). In the

context of database queries, task complexity generally negatively impacts end user query

performance (Borthick et al. 1997; Borthick et al. 2000). Task complexity is included in this

research to control for complexity's established impact on end user query performance.

Query performance can be measured on a number of dimensions including correctness, time

required, and confidence.

Hence, the following hypotheses are proposed:

H1a: Higher ambiguity in the information request leads to an increase in the total errors

in the query formulation.

H1b: Higher ambiguity in the information request leads to an increase in the time taken

to complete the query formulation.

H1c: Higher ambiguity in the information request leads to lower end user confidence in

the accuracy of the query formulation.

2.2 The Nature of Ambiguity

Ambiguity is an inherent property of all natural languages, including English (Jespersen

1922; Williamson 1994). Absolute precision of a language is pragmatically undesirable,

because such a language would be unable to adapt to new concepts (Williamson 1994). The

communication needed to ensure effective and efficient report production, however, requires

complete clarity. Hence, a tension exists between the natural language's need for flexibility

in the long term and the need for precision in the short term. Natural language is at once both

dysfunctional and poorly adapted to the functions language needs to perform, yet flexible and

broad-based such that it is useable in practice (Chomsky 1990).

Interest in linguistic ambiguity has an extensive history, and the subject has been recognised as a

separate branch of study since at least Aristotle's time (Kooij 1971). Aristotle noted that

language must be ambiguous, as a language has limited words but an infinite number of

things and concepts to which those words must apply (Kooij 1971).

Russell (1923) recognised that all natural languages are vague and ambiguous. Excluding the

realm of mathematical symbolism, constructing completely unambiguous expressions is not

possible with the syntax and vocabulary tools available within natural languages (Williamson

1994). To endure and survive, language requires the flexibility to communicate new

concepts. Ambiguity necessarily derives from the flexibility of natural language.

Kooij (1971) states that ambiguity arises where a sentence can be interpreted in more than

one way. Similarly, Walton (1996) considers a sentence or statement to be more ambiguous

as the number of legitimate interpretations of the sentence (or paragraph) increases.

Ambiguity implies multiplicity of meaning (Walton 1996).

In classical analysis, the multiplex (Latin for "multiple meaning") categorisation of

Alexander of Aphrodisias (Hamblin 1970) suggests a basis for the identification of categories

of ambiguity. In classical literature, Alexander of Aphrodisias identified three categories of

ambiguity: potential, actual, and imaginary. Walton (1996) adapts this classical multiplex

categorisation to his identified types of ambiguity.

Walton (1996) identifies six classical types of ambiguity in natural language: lexical,

syntactical, inflective, pragmatic, emphatic, and suggestive. In addition to Walton's (1996)

taxonomy, extraneous information and noise in the communication can also be a source of

ambiguity. Extraneous ambiguity arises where the communication is not parsimonious, or

the communication includes information that is not directly relevant to the message being

communicated (Fowler and Aaron 1998). Extraneous ambiguity is an actual ambiguity

within the Walton (1996) taxonomy.

Each ambiguity type can be independently present within the communication. Walton's

(1996) modified taxonomy and model of ambiguity is presented in Figure 1.

Figure 1

Types of Ambiguity (adapted from Walton 1996)

[Figure content: Ambiguity is divided into three multiplex categories of ambiguity, each containing types of ambiguity. Potential: Lexical, Syntactical, Inflective. Actual: Pragmatic, Extraneous. Imaginary: Emphatic, Suggestive.]

2.2.1 Potential Ambiguity

Potential ambiguity arises when a term or a sentence is ambiguous in and of itself, that is,

before its use in the context of a sentence or paragraph. Three types of ambiguity

are categorised as potential ambiguity: lexical, syntactical, and inflective.

Lexical Ambiguity

Lexical ambiguity is the most commonly known form of ambiguity (Reilly 1991; Walton

1996). It occurs when words have more than one meaning as commonly defined and

understood. Considerable potential ambiguity arises when a word with various meanings is

used in a statement of information request. For example, "bank" may variously mean the

"bank" of a river (noun), to "bank" as related to an aeroplane or a roller-coaster (verb), a savings

"bank" (noun), to "bank" money (verb), or a "bank" of computer terminals (noun) (Turner

1987). Lexical ambiguity is often reduced or mitigated by the context of the sentence.

In the case of an information request, lexical ambiguity exists in the statement "A report of

our clients for our marketing brochure mail-out". The word "report" may have several

meanings, independent of its context. A gunshot report may echo across the hillside. A

student can report to the lecturer. A heavy report can be dropped on the foot. Although the

context may make the meaning clear, the lexical ambiguity contributes to the overall

ambiguity of the statement and increases cognitive effort.

The following hypotheses are proposed:

H2a: Higher lexical ambiguity in the information request leads to an increase in the total

errors in the query formulation.

H2b: Higher lexical ambiguity in the information request leads to an increase in the time

taken to complete the query formulation.

H2c: Higher lexical ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Syntactical Ambiguity

Syntactical ambiguity is a structural or grammatical ambiguity of a whole sentence that

occurs in a sub-part of a sentence (Reilly 1991; Walton 1996). Syntactical ambiguity is a

grammatical construct, and results from the difficulty of applying universal grammatical laws

to sentence structure. An example of syntactical ambiguity is "Bob hit the man with the

stick". This phrasing is unclear as to whether a man was hit with a stick, or whether a man

with a stick was struck by Bob. The context can substantially reduce syntactical ambiguity.

For example, knowing that either Bob, or the man, but not both, had a stick resolves the

syntactical ambiguity.

Comparing the phrase "Bob hit the man with the stick" to the analogous "Bob hit the man

with the scar" provides some insights. As a scar is little suited to physical, violent use, the

latter formulation clearly conveys that the man with the scar was struck by Bob (Kooij 1971).

In the case of an information request, syntactical ambiguity exists in the request "A report of

poor-paying clients and client managers. Determine their effect on our profitability for the

last twelve months." The request is syntactically ambiguous because the end user can

interpret "their" to mean the poor-paying clients, the client managers, or both. Although the

context may reduce or negate the ambiguity, syntactically the request is ambiguous.
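The multiplicity of readings in this request can be made concrete with a short sketch. The Python below is illustrative only and is not part of the experiment: the list of candidate antecedents and the function name `interpretations` are assumptions introduced for this example.

```python
from itertools import combinations

# Illustrative sketch only: enumerates the possible readings of the pronoun
# "their" in the example request. The candidate list is an assumption.
REQUEST = (
    "A report of poor-paying clients and client managers. "
    "Determine their effect on our profitability for the last twelve months."
)

# Hypothetical antecedents that "their" could legitimately refer to.
ANTECEDENTS = ["poor-paying clients", "client managers"]


def interpretations(antecedents):
    """Return every non-empty combination of antecedents, smallest first."""
    readings = []
    for size in range(1, len(antecedents) + 1):
        for combo in combinations(antecedents, size):
            readings.append(" and ".join(combo))
    return readings


if __name__ == "__main__":
    for reading in interpretations(ANTECEDENTS):
        print(f'"their" = {reading}')
```

With two candidate antecedents the sketch yields three readings, matching the three interpretations described above: the poor-paying clients, the client managers, or both.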

The following hypotheses are proposed:

H3a: Higher syntactical ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H3b: Higher syntactical ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H3c: Higher syntactical ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Inflective Ambiguity

As Walton (1996) notes, inflective ambiguity is a composite ambiguity, containing elements

of both lexical and syntactical ambiguity. Like syntactical ambiguity, inflective ambiguity is

grammatical in nature. Inflection arises where a word is used more than once in a sentence or

paragraph, but with different meanings each time (Walton 1996). An example of inflective

ambiguity is to use the word "scheme" with two different meanings in the fallacious

argument, "Bob has devised a scheme to save costs by recycling paper. Therefore, Bob is a

schemer, and should not be trusted" (Ryle 1971; Walton 1996).

In the case of an information request, inflective ambiguity exists in the example, "A report

showing the product of our marketing campaign for our accounting software product".

Ambiguity derives from using the word "product" in two different senses in the one statement

(Walton 1996; Fowler and Aaron 1998).

The following hypotheses are proposed:

H4a: Higher inflective ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H4b: Higher inflective ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H4c: Higher inflective ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

2.2.2 Actual Ambiguity

Actual ambiguity refers to ambiguity that occurs in the act of speaking. It arises when a word

or phrase, without variation either in itself or in the way the word is put forward, has different

meanings. The statement does not contain adequate information to resolve the ambiguity,

resulting in a number of legitimate interpretations. Two distinct types of ambiguity are

categorised as actual ambiguity: pragmatic and extraneous.

Pragmatic Ambiguity

Pragmatic ambiguity arises when the statement is not specific, and the context does not

provide the information needed to clarify the statement. Information is missing, and must be

inferred. An example of pragmatic ambiguity is the story of King Croesus and the Oracle of

Delphi (adapted from Copi and Cohen 1990):

"King Croesus consulted the Oracle of Delphi before warring with Cyrus of

Persia. The Oracle replied that, "If Croesus went to war with Cyrus, he would

destroy a mighty kingdom". Delighted, Croesus attacked Persia, and Croesus'

army and kingdom were crushed. Croesus complained bitterly to the Oracle's

priests, who replied that the Oracle had been entirely right. By going to war with

Persia, Croesus had destroyed a mighty kingdom - his own."

The Oracle's statement is pragmatically ambiguous: it is not specific, and the context does not

provide the information needed to clarify the statement (Walton 1996). The information

necessary to clearly understand the message is omitted. Due to the need to infer the missing

information, pragmatically ambiguous statements have multiple possible interpretations

(Walton 1996). Croesus interpreted the Oracle's statement as indicating his success in battle -

the response he desired. As noted by Hamblin (1970), Croesus' logical response to the

oracular reply would have been to immediately ask the Oracle, "Which kingdom?" Further

information is needed to resolve pragmatic ambiguity.

In the case of an information request, pragmatic ambiguity exists in the request for "A report

of all the clients for a department." The ambiguity is that the request does not refer to a

specific department. The end user could legitimately prepare a report for any department.

Further information is needed to resolve this actual ambiguity in this case.
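As a rough illustration of how this request maps to more than one legitimate query, the following Python sketch runs the same parameterised SQL query once per department. The schema, the table and column names, the client names, and the "Audit" department are all hypothetical and invented for this example; they are not the experiment's database.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE client (
        client_id INTEGER PRIMARY KEY,
        client_name TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Audit'), (2, 'Tax and Business Services');
    INSERT INTO client VALUES (1, 'Acme', 1), (2, 'Birch', 2), (3, 'Cedar', 2);
""")


def clients_for(dept_name):
    """One reading of the request: the clients of a single named department."""
    rows = conn.execute(
        "SELECT c.client_name FROM client c "
        "JOIN department d ON d.dept_id = c.dept_id "
        "WHERE d.dept_name = ? ORDER BY c.client_name",
        (dept_name,),
    ).fetchall()
    return [name for (name,) in rows]


# The request names no department, so each department gives a different
# report that is an equally legitimate answer.
dept_names = [
    name for (name,) in
    conn.execute("SELECT dept_name FROM department ORDER BY dept_id").fetchall()
]
reports = {name: clients_for(name) for name in dept_names}

if __name__ == "__main__":
    for dept, clients in reports.items():
        print(dept, "->", clients)
```

The query text is identical in every case; only the missing parameter differs, which is the information the end user is forced to infer.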

The following hypotheses are proposed:

H5a: Higher pragmatic ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H5b: Higher pragmatic ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H5c: Higher pragmatic ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Extraneous Ambiguity

In contrast to pragmatic ambiguity, in which information necessary to clearly understand the

message is omitted, extraneous ambiguity arises from an excess of information. Clearer

communication arises where the minimally sufficient words needed to convey the message of

the statement are used (Fowler and Aaron 1998). Where more words are used than

necessary, or where unnecessary detail is provided in the communication that is not part of

the message, ambiguity arises. The excess detail obscures the essential message and

contributes to different emphases or interpretations.

The use of passive voice, vacuous words, or the repetition of phrases with the same meaning

all contribute to lack of clarity (Fowler and Aaron 1998). The use of clichés and the over-use

of figures of speech add volume to the statement, but add little or no meaning. Pretentious

and indirect writing also adds to the bulk of the statement, but without adding meaning.

Fowler and Aaron (1998) provide the following comparative example:

Pretentious: To perpetuate our endeavour of providing funds for our elderly citizens as

we do at the present moment, we will face the exigency of enhanced

contributions from all our citizens.

Revised: We cannot continue to fund Social Security and Medicare for the elderly

unless we raise taxes.

The extra volume contributes to vagueness in the first statement, and adds to the multiplicity

of legitimate interpretations of the statement. The first statement exhibits extraneous

ambiguity. The second statement communicates forcefully and concisely.

An example of extraneous ambiguity in an information request is "A report of all clients (and

their names and addresses only) for the Tax and Business Services department. Some of

those clients are our biggest earners, you know". The last sentence is extraneous, and

contains detail that is redundant, uninformative, or misleading relative to the fundamental

message. In information theoretic terms, extraneous ambiguity is "noise" in the

communication (Axley 1984; Eisenberg and Phillips 1991; Severin and Tankard 1997).

The following hypotheses are proposed:

H6a: Higher extraneous ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H6b: Higher extraneous ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H6c: Higher extraneous ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

2.2.3 Imaginary Ambiguity

Imaginary ambiguity occurs when a word with a fixed meaning seems to have a different one.

Imaginary ambiguity derives from the optional interpretation that the recipient of the

communication places on the information received. Two distinct types of ambiguity can be

categorised as imaginary ambiguity: emphatic and suggestive.

Emphatic Ambiguity

The question of ambiguity deriving from accent, or emphasis in speaking, is an ancient one

(Hamblin 1970). When a phrasing is rendered in the written form, the verbal emphasis may

only be crudely indicated. Significant meaning and context are lost. Rescher (1964) provides

the following example of emphatic ambiguity:

The intended meaning of the democratic credo "Men were created equal" can be

altered by stressing the word "created" (implying "that's how men started out, but

they are no longer so").

The verbal emphasis creates an inference of meaning that is a legitimate interpretation of the

phrasing. That is, changes in intonation can yield different interpretations.

In the case of an information request, emphatic ambiguity occurs in the example information

request of "A report of our good clients". Ambiguity can derive from placing different

emphases on the words. Depending on the context or on emphasis used, "good clients" could

be legitimately interpreted to be clients that pay on time or clients that have the highest

dollar-value sales. Indeed, with an ironic emphasis on the word "good", this request could be

interpreted as a list of our worst clients - those that do not pay. The information necessary to

resolve the ambiguity is often difficult to convey using only printed media.

The following hypotheses are proposed:

H7a: Higher emphatic ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H7b: Higher emphatic ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H7c: Higher emphatic ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

Suggestive Ambiguity

Despite the apparent clarity of the sentence in question, suggestive ambiguity creates diverse

implications and innuendos that can produce different interpretations (Walton 1996). Fischer

(1970) provides an example:

The First Mate of a ship docked in China returned drunk from shore leave, and

was unable to write up the ship's log. The displeased Captain completed the log,

adding, "The Mate was drunk all day". The next day, the now-sober Mate

challenged the Captain over the entry, as it would reflect poorly on him. The

Captain responded that the comment was true, and must stand. Whereupon the

mate added to that day's log, "The Captain was sober all day". In reply to the

Captain's challenge, the mate responded "the comment is true, and must stand"

(derived from Trow 1905, pp 14-15).

The phrase "The Captain was sober all day" contains suggestive ambiguity. As a further

example, the statement, "The President is now an honest man", is perfectly clear, and yet

considerable innuendo exists. The fact that the President's current honesty is worthy of

comment implies that the President was previously dishonest.

Both phrases are perfectly clear, and, indeed, true. However, considerable innuendo exists.

The fact that the Captain's sobriety, or the President's honesty, is singled out for special

comment implies that such a state of affairs is unusual (Walton 1996). The statements are

suggestively ambiguous.

In the case of an information request, an example of this ambiguity is, "A report of the clients

of this accounting practice that have lodged taxation returns in the past five years in

accordance with the requirements of the Australian Taxation Office". The request for

information is quite clear. By definition, however, all taxation returns should be lodged in

accordance with the Australian Taxation Office's requirements. The extra phrase introduces

suggestive ambiguity into the information request by suggesting that the report will not

consist of all taxation clients, because some clients may not have complied with the Tax

Office's requirements.

The following hypotheses are proposed:

H8a: Higher suggestive ambiguity in the information request leads to an increase in the

total errors in the query formulation.

H8b: Higher suggestive ambiguity in the information request leads to an increase in the

time taken to complete the query formulation.

H8c: Higher suggestive ambiguity in the information request leads to lower end user

confidence in the accuracy of the query formulation.

2.2.4 Ambiguity in Practice

Table 1 summarises the seven types of ambiguity identified in this paper and provides an example

information request for each type.

Table 1

Summary and Examples of the Seven Types of Ambiguity in Natural Language Information Requests

| Ambiguity Type | Information Request | Explanation |
| --- | --- | --- |
| Lexical | A report of our clients for our marketing brochure mail-out. | The word "report" may have several meanings, independent of its context. For example, there may be: a gunshot report echoing through the hillside; the Lieutenant reported to the Captain; I dropped the heavy report on my toe, etc. Although the context may make the meaning clear, the lexical ambiguity adds to cognitive effort and contributes to ambiguity overall. |
| Syntactical | A report of poor-paying clients and client managers. Determine their effect on our profitability for the last twelve months. | It is not clear whose effect on profitability is meant. Another example is "Bob hit the man with a stick". It is not clear, syntactically, whether the man with a stick was hit, or whether the man was hit, by Bob, with a stick. |
| Inflective | A report showing what the product of our last marketing campaign for sales of our accounting software product in the last month was. | Ambiguity here derives from the use of the word "product" with two different meanings in the one information request. |
| Pragmatic | A report of all the clients for a department. | The ambiguity here is that the department has not been specified. Information necessary to clearly understand the message is omitted. It would be legitimate to prepare a report for any department. Further information is needed to resolve this actual ambiguity. |
| Extraneous | A report of all clients (and their names and addresses only) for the Tax and Business Services department. Some of those clients are our biggest earners, you know. | The last sentence is extraneous. Unlike pragmatic ambiguity, the sentence contains information that is redundant, uninformative, or not necessary to derive the statement's message. "Noise" in the communication exists. More words are used than are necessary to make the statement. |
| Emphatic | A report of our good clients. | Ambiguity here could derive from the lack of ability to provide emphasis of the words in its written form. Depending on the emphasis used, "good clients" could be legitimately interpreted to be clients that pay on time, clients that have the most dollar-value sales, or even, with the correct ironic emphasis on the spoken word, our worst clients - those that do not pay. |
| Suggestive | A report of the clients of this accounting practice that have lodged taxation returns in the past five years in accordance with the requirements of the Australian Taxation Office. | The request for information is quite clear until the phrase "in accordance with the requirements of the Australian Taxation Office". By definition, all taxation returns should be lodged in accordance with these requirements. The extra phrase introduces suggestive ambiguity into the information request by suggesting that the report will not necessarily consist of all taxation clients. |

2.3 Task Complexity

More complex tasks require more cognitive effort and hence have a generally negative

impact on the user's performance in deriving database queries (Campbell 1988; Borthick et

al. 1997; Borthick et al. 2000). Task complexity, in the context of query development,

consists of the inherent task complexity associated with the query syntax, and the data

structure complexity associated with the organisation of the tables and attributes (Liew 1995).

Campbell (1988) and Wood (1986) document the general impact of task complexity. Jih et

al. (1989) studied task complexity and user performance in the context of the use of entity-

relationship diagrams and relational data models. Complexity in this context is generally

measured as a function of the total number of elementary mental discriminations required to

write a query (Halstead 1977).
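
As a rough illustration of how a measure of this kind can be computed, the sketch below uses the standard textbook form of Halstead's difficulty, D = (n1 / 2) x (N2 / n2), where n1 is the number of distinct operators, n2 the number of distinct operands, and N2 the total number of operand occurrences. The division of SQL tokens into operators and operands is an assumption made here purely for illustration; this section does not state how query components were classified, so the sketch should not be expected to reproduce the complexity values reported for the experimental questions.

```python
def halstead_difficulty(operators, operands):
    """Textbook Halstead difficulty: (n1 / 2) * (N2 / n2)."""
    distinct_operators = len(set(operators))  # n1
    distinct_operands = len(set(operands))    # n2
    total_operands = len(operands)            # N2
    if distinct_operands == 0:
        raise ValueError("at least one operand is required")
    return (distinct_operators / 2) * (total_operands / distinct_operands)


# Hypothetical token split (an assumption, not the thesis's classification) for:
#   SELECT name, address FROM client WHERE dept = 'TBS'
operators = ["SELECT", ",", "FROM", "WHERE", "="]
operands = ["name", "address", "client", "dept", "'TBS'"]
difficulty = halstead_difficulty(operators, operands)
```

Under this token split the query has five distinct operators and five operands that each occur once, so longer queries that reuse more operands would score higher.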

The following hypotheses are proposed:

H9a: Higher complexity in the information request leads to more total errors in the query

formulation.

H9b: Higher complexity in the information request leads to more time taken to complete

the query formulation.

H9c: Higher complexity in the information request leads to lower end user confidence in

the accuracy of the query formulation.

2.4 Theoretical Model Summary

Figure 2 summarises the theoretical model presented in this paper. Complexity and the seven

types of ambiguity have a negative impact on end user query performance as they increase.

Hypotheses 1 through 9 are derived from these hypothesised relationships.

[Figure 2: diagram. Lexical, syntactical, inflective, pragmatic, extraneous, emphatic, and suggestive ambiguity make up the ambiguity of the information request. Ambiguity and complexity each have a negative relationship with end user query performance.]

Figure 2

The Theoretical Model of Ambiguity, Complexity, and End User Query Performance

3. Methodology

3.1 Experimental Design

A laboratory experiment was conducted to test the hypotheses presented in this study. A two-

factor, within-groups experimental design was used (Huck et al. 1974). Participants were

randomly assigned to two groups (Group A and Group B). Each participant was presented

with up to sixteen questions. Each question was presented in either a clear or ambiguous

formulation.

Group A's question formulations were alternately ambiguous and clear. Group B's question

formulations were alternately clear and ambiguous. Using alternating formulations helped

promote equitable treatment of the two groups. That is, the alternating formulations ensured

that both groups would complete approximately the same number of questions during the

allotted time, expend approximately the same amount of cognitive effort, and would

experience approximately the same level of frustration in dealing with ambiguous

information requests. All participants spent two hours on the experiment. Appendix A

shows the questions presented to students together with the model answers.
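
The alternating assignment described above can be sketched as a small function. This is a minimal sketch that assumes the alternation starts at Question 1 (ambiguous for Group A, clear for Group B) and is strict throughout, which the description implies but does not spell out; the function name and the 'a'/'c' codes are chosen here for illustration.

```python
def treatment(group: str, question: int) -> str:
    """Return 'a' (ambiguous) or 'c' (clear) for a group and question number."""
    if group not in ("A", "B"):
        raise ValueError("group must be 'A' or 'B'")
    if question < 1:
        raise ValueError("question numbers start at 1")
    odd = question % 2 == 1
    # Assumption: Group A starts with an ambiguous formulation on Question 1,
    # Group B starts with a clear one, and both alternate strictly afterwards.
    if group == "A":
        return "a" if odd else "c"
    return "c" if odd else "a"


# Treatments for the first four questions in each group.
schedule = {g: [treatment(g, q) for q in range(1, 5)] for g in ("A", "B")}
```

Under this scheme every question is seen in its ambiguous form by one group and in its clear form by the other, so each group receives a similar mix of the two formulations.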

A set of instructions (Appendix B), including a synopsis of the query language syntax, was

provided to the participants. A Unix shell script (Appendix C) presented the questions

electronically to the participants and automatically captured their responses in text files. An

entity-relationship diagram describing the database is presented in Appendix D, and was

available to subjects. Further details regarding the experimental process are provided in

Appendix E.

3.2 Experiment Participants

Forty-seven undergraduate and nineteen postgraduate students participated in the experiment.

Participating students were enrolled either in an advanced undergraduate or in a postgraduate

database subject within the business school at the University of Queensland. All students

enrolled in the two database subjects participated in the experiment.

The motivation for student participation was the receipt of five percent of the students' final

mark for the subject (2.5% for participation, 2.5% for performance). Participants were aware

that they were participating in an experiment.

Participants had been previously trained in the use of the SQL query language, and had been

afforded the opportunity to practise SQL on the university systems. All practice took place

on databases different from those used for the experiment. Generally, student expertise with SQL

was low to intermediate. The experiment, for most students, was the first practical

application of their SQL skills.

3.3 Assessment of Participant Responses

Participant responses were captured in text files that showed each interactive response and

captured the start and end time of each question. This file was edited into a suitable format

for marking by two examiners. Each response was independently assessed by each examiner

to determine whether the response was the participant's final complete response. Responses

where participants did not finish the query formulation were removed from the study.

In some instances, the state of completion of the response was indeterminate. If the response

could only be corrected with substantial rework of the submitted response, the examiners

erred on the side of caution and removed these responses from the study.

Examiners then corrected the answers according to the model answers (Appendix A), using

the Semantic Error Counting, SQL Challenge Error Counting, and Intermediate Error

Counting Forms shown in Appendix F. Each examiner independently assessed the

participant responses and corrected the response. Each discrete alteration (addition or

deletion of a query component) counted as one "micro error" in the Semantic Error Counting

Form (Appendix F).

The corrected response that determined the total error count was the response that required

the fewest changes to the participant's response, and still produced the required result set.

This approach ensured a lower error count than a strict modification of the response to ensure

an exact match to the model answer. Appendix G provides an example corrected response.
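
The counting rule described above can be sketched as follows. This is a simplified illustration, not the Semantic Error Counting Form itself: it assumes a query can be treated as an unordered collection of components, counts each component that must be added or deleted as one micro error, and takes the smallest count over a set of candidate corrected queries. The function names and the example queries are hypothetical.

```python
from collections import Counter


def micro_errors(submitted, corrected):
    """Count additions plus deletions needed to turn submitted into corrected."""
    have, want = Counter(submitted), Counter(corrected)
    deletions = sum((have - want).values())  # components to remove
    additions = sum((want - have).values())  # components to add
    return additions + deletions


def total_errors(submitted, candidate_corrections):
    """Use the correction that needs the fewest changes."""
    return min(micro_errors(submitted, c) for c in candidate_corrections)


# Hypothetical participant response and two corrections that both return
# the required result set.
submitted = ["SELECT", "name", "FROM", "client"]
candidates = [
    ["SELECT", "name", "address", "FROM", "client"],
    ["SELECT", "client_name", "client_address", "FROM", "client"],
]
errors = total_errors(submitted, candidates)
```

Ignoring component order is a deliberate simplification; a marking scheme that also penalised misplaced components would need a sequence-based comparison instead.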

The examiners then compared their independent assessments to ensure that all errors had

been found and corrected and that the proposed formulations or corrected formulations

produced the correct output. If more than one correction method was found to produce a

correct query, the correction method that produced the smallest number of errors was used.

A diary of common errors and their corrections was kept to ensure consistency throughout the

assessment process. The final, moderated, error sheets were transcribed to a relational

database for analysis.

4. Results and Discussion

4.1 Overview of Experimental Results

Participant demographic information and statistics are presented in Tables 2, 3, and 4. The

demographic information indicates that the random assignment of participants produced two

relatively homogeneous groups in terms of course background, grade point average (GPA), and

age. In any case, both Group A and Group B received the ambiguity treatment on alternate

questions, which mitigates concerns about selection bias affecting the experimental results.

Table 2

Participant Demographic Information and Descriptive Statistics: Course Background of Group A and Group B

| Enrolled Degree | Group A | Group B | Total |
| --- | --- | --- | --- |
| Undergraduate Arts | 3 | 3 | 6 |
| Undergraduate Business | 20 | 18 | 38 |
| Undergraduate Computer Science/Information Systems | 3 | 0 | 3 |
| Postgraduate Business | 2 | 1 | 3 |
| Postgraduate Computer Science/Information Systems | 5 | 11 | 16 |
| Total Participants | 33 | 33 | 66 |

Table 3

Participant Demographic Information and Descriptive Statistics:

Academic Record of Group A and Group B

| Academic Record | Average | Standard Deviation | Min | Max |
| --- | --- | --- | --- | --- |
| GPA (65 students with academic records) | 4.94 | 0.90 | 3.26 | 7.00 |
| GPA (Group A: 33 students with academic records) | 5.04 | 0.83 | 3.26 | 6.84 |
| GPA (Group B: 32 students with academic records) | 4.83 | 0.97 | 3.29 | 7.00 |

Table 4

Participant Demographic Information and Descriptive Statistics:

Participant Age in Group A and Group B

| Age (in Years) | Average | Standard Deviation | Min | Max |
| --- | --- | --- | --- | --- |
| Average Age (65 students with date of birth available) | 24.94 | 7.72 | 18.74 | 61.25 |
| Average Age (Group A, 33 students with date of birth available) | 24.76 | 7.29 | 19.50 | 48.53 |
| Average Age (Group B, 32 students with date of birth available) | 25.13 | 8.26 | 18.74 | 61.25 |

Participants completed 425 responses in the experiment. The experiment contained sixteen

questions for both ambiguous and clear information requests. Because of the two-hour time

constraint, no participant completed more than twelve questions. Forty participants (60.61%

of the sample population) completed six questions. On average, participants completed 6.44

questions, with a standard deviation of 1.75.
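
The response totals quoted above can be checked against the per-question response counts reported in Table 5. The sketch below copies those counts by hand; the zero for the ambiguous formulation of Question 12 is an inference from the absence of such a row in the table, and the variable names are illustrative.

```python
# Completed responses per question as (ambiguous, clear), copied from Table 5.
# Table 5 has no ambiguous row for Question 12, so 0 is assumed there.
responses = {
    1: (32, 33), 2: (33, 33), 3: (33, 33), 4: (32, 33),
    5: (33, 30), 6: (17, 23), 7: (15, 15), 8: (6, 10),
    9: (3, 2), 10: (1, 4), 11: (2, 1), 12: (0, 1),
}
participants = 66

total_responses = sum(a + c for a, c in responses.values())
mean_per_participant = total_responses / participants
share_of_forty = 40 / participants * 100  # percentage of the sample
```

The standard deviation of 1.75 cannot be recomputed this way, because it depends on how many questions each individual participant completed.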

Table 5 provides an overview of the participants' results in the experiment. Total errors is

calculated as the average of the micro errors counted using the Semantic Error Counting

Sheet shown in Appendix F. Appendix H provides a Pearson correlation matrix of the

dependent and independent variables measured in the experiment. Appendix I provides

detailed reports of the errors participants made on each individual question.

Table 5

Comparative Statistics for all Participant Responses

Grouped by Question (Q) and Treatment (T). Note that for T, a = ambiguous, c = clear

| Q | T | Halstead's Complexity | Group | Response Count | Attempts Average | Attempts Standard Deviation | Confidence Average | Confidence Standard Deviation | Duration Average | Duration Standard Deviation | Total Errors Average | Total Errors Standard Deviation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | a | 1.6927 | A | 32 | 3.31 | 1.99 | 6.22 | 1.36 | 10.51 | 4.63 | 1.59 | 3.66 |
| 1 | c | 1.6927 | B | 33 | 3.18 | 2.16 | 6.42 | 0.87 | 11.63 | 6.60 | 1.12 | 2.48 |
| 2 | a | 5.4186 | B | 33 | 9.21 | 8.88 | 5.21 | 1.47 | 20.74 | 11.30 | 4.27 | 8.18 |
| 2 | c | 5.4186 | A | 33 | 3.61 | 3.43 | 6.30 | 1.05 | 9.03 | 6.89 | 0.30 | 0.81 |
| 3 | a | 6.8908 | A | 33 | 7.94 | 6.04 | 5.91 | 1.57 | 11.84 | 7.72 | 3.97 | 3.50 |
| 3 | c | 6.8908 | B | 33 | 5.09 | 6.18 | 6.27 | 1.42 | 8.63 | 5.29 | 1.03 | 2.86 |
| 4 | a | 4.4697 | B | 32 | 7.31 | 4.75 | 5.38 | 1.64 | 15.57 | 8.95 | 4.03 | 5.54 |
| 4 | c | 4.4697 | A | 33 | 6.52 | 7.36 | 6.21 | 1.47 | 10.95 | 8.46 | 0.67 | 2.23 |
| 5 | a | 12.2917 | A | 33 | 9.24 | 6.63 | 5.24 | 2.21 | 18.54 | 11.06 | 9.42 | 10.39 |
| 5 | c | 12.2917 | B | 30 | 7.07 | 5.98 | 5.37 | 2.16 | 15.65 | 9.74 | 5.20 | 7.70 |
| 6 | a | 18.8000 | B | 17 | 11.41 | 7.21 | 5.59 | 1.33 | 23.59 | 7.93 | 32.94 | 13.21 |
| 6 | c | 18.8000 | A | 23 | 14.91 | 9.36 | 4.87 | 1.91 | 25.63 | 10.13 | 8.00 | 10.49 |
| 7 | a | 16.0076 | A | 15 | 11.07 | 6.10 | 5.07 | 1.49 | 18.78 | 5.46 | 7.27 | 8.65 |
| 7 | c | 16.0076 | B | 15 | 7.67 | 4.20 | 5.07 | 1.98 | 15.31 | 7.86 | 6.13 | 7.41 |
| 8 | a | 16.2684 | B | 6 | 6.83 | 8.42 | 5.83 | 1.60 | 13.24 | 8.36 | 2.33 | 4.08 |
| 8 | c | 16.2684 | A | 10 | 6.40 | 2.46 | 5.00 | 1.94 | 12.53 | 5.35 | 6.40 | 6.52 |
| 9 | a | 23.8970 | A | 3 | 12.33 | 2.08 | 3.00 | 1.73 | 16.43 | 7.77 | 18.00 | 10.54 |
| 9 | c | 23.8970 | B | 2 | 6.50 | 3.54 | 6.50 | 0.71 | 15.36 | 2.51 | 15.50 | 21.92 |
| 10 | a | 19.4819 | B | 1 | 7.00 | - | 5.00 | - | 9.93 | - | 20.00 | - |
| 10 | c | 19.4819 | A | 4 | 7.25 | 3.20 | 4.25 | 2.50 | 9.56 | 1.40 | 5.00 | 2.58 |
| 11 | a | 22.4000 | A | 2 | 7.00 | 4.24 | 5.00 | 2.83 | 8.53 | 2.13 | 22.50 | 13.44 |
| 11 | c | 22.4000 | B | 1 | 4.00 | - | 7.00 | - | 9.45 | - | 8.00 | - |
| 12 | c | 29.1633 | B | 1 | 14.00 | - | 4.00 | - | 10.10 | - | 8.00 | - |

The relationships between the dependent variables (duration, confidence, and total errors) and

the independent variables (complexity, ambiguity) are graphically depicted in Figures 3, 4,

and 5. These figures illustrate that the hypothesised relationships for complexity and

ambiguity were supported for most measures by most queries.

[Figure 3: chart titled "Questions by Treatment and Error". Horizontal axis: Question (1 to 12). Vertical axis: Average Errors (0.00 to 35.00). Series: Ambiguous, Clear.]

Figure 3

Depicting graphically the relationship between the treatment received (ambiguous or clear information request)

and the total errors in the participant's response.

[Figure 4: chart titled "Questions by Treatment and Duration". Horizontal axis: Question (1 to 12). Vertical axis: Average Duration in minutes (0.00 to 30.00). Series: Ambiguous, Clear.]

Figure 4

Depicting graphically the relationship between the treatment received (ambiguous or clear information request)

and the duration taken for the participant to prepare the response.

[Figure 5: chart titled "Questions by Treatment and Confidence". Horizontal axis: Question (1 to 12). Vertical axis: Average Confidence (0.00 to 8.00). Series: Ambiguous, Clear.]

Figure 5

Depicting graphically the relationship between the treatment received (ambiguous or clear

information request) and the participant's confidence in the response.

Question Six, with an average of 32.94 errors (standard deviation of 13.21), caused the most

problems for participants in its ambiguous formulation. Nonetheless the seventeen

respondents to Question Six in its ambiguous formulation took on average slightly less time

to complete the response (23.59 average minutes, 7.93 standard deviation) than the twenty-

three respondents for the clear formulation (25.63 average minutes, 10.13 standard

deviation).

Participants who completed Question Eight in the clear formulation made more errors on average

(6.40, standard deviation of 6.52) than those who received the ambiguous formulation (2.33,

standard deviation of 4.08). Participants also reported higher average confidence ratings

for the ambiguous formulation of this question (5.83, standard deviation of 1.60) than

for the clear formulation (5.00, standard deviation of 1.94).

One possible reason for these results is that the length of the clear formulation introduces

extraneous ambiguity. Question Eight had only sixteen completed responses, however (six for

the ambiguous formulation and ten for the clear formulation), which limits the weight that can

be placed on this question's result. Because so few participants completed Questions Nine

through Twelve, analysis of differences in these individual questions is not appropriate.

4.2 Regression Analysis

Two multiple linear regression models were used to analyse the experimental results. The

model used to test H1a-c and H9a-c, for the effects of ambiguity and complexity respectively,

was:

(1) Performance = Ambiguity + Complexity

where ambiguity was a dichotomous variable and complexity was measured using the

Halstead (1977) complexity measure for difficulty.

The model used to test the seven individual types of ambiguity in H2a-c to H8a-c was:

(2) Performance = Lexical + Syntactical + Inflective + Pragmatic +

Extraneous + Emphatic + Suggestive + Complexity

where the ambiguity types were measured as shown in Appendix J, according to the

ambiguity assessment instrument presented in Appendix K.
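
Written out as conventional regression equations, models (1) and (2) might take the following form. The intercepts, the coefficient symbols, the error term, and the observation index i are notation introduced here as an assumption; the models are stated only schematically above, and the reported results do not list an intercept.

```latex
\begin{align}
\text{Performance}_i &= \beta_0 + \beta_1\,\text{Ambiguity}_i
  + \beta_2\,\text{Complexity}_i + \varepsilon_i \tag{1} \\
\text{Performance}_i &= \gamma_0 + \gamma_1\,\text{Lexical}_i
  + \gamma_2\,\text{Syntactical}_i + \gamma_3\,\text{Inflective}_i
  + \gamma_4\,\text{Pragmatic}_i \notag \\
 &\quad + \gamma_5\,\text{Extraneous}_i + \gamma_6\,\text{Emphatic}_i
  + \gamma_7\,\text{Suggestive}_i + \gamma_8\,\text{Complexity}_i
  + \varepsilon_i \tag{2}
\end{align}
```

Each model is estimated three times, once for each proxy of performance (total errors, duration, and confidence).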

Performance is end user query performance. The dependent variables that proxy for end user

query performance are total errors, duration, and confidence. Duration was measured as

decimal minutes. The Confidence Rating was self-assessed by participants and was

transformed to a numerical rating in accordance with Table 6. The numerical rating was used

as the measure for confidence in the regression analysis.
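
The transformation set out in Table 6 can be sketched as a lookup function. The band edges in the table overlap (for example, 70% appears in both "55-70%" and "70-85%"), so the sketch assumes that each band includes its upper bound, in line with the top band being written as ">85-100%". That boundary rule and the function name are assumptions made for illustration.

```python
def confidence_to_rating(percent: float) -> int:
    """Map a self-assessed confidence percentage to the 1-7 numerical rating."""
    if not 0 <= percent <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if percent < 10:
        return 1
    # Assumption: each band includes its upper bound, so 85 maps to 6 and
    # anything above 85 maps to 7.
    upper_bounds = [(25, 2), (40, 3), (55, 4), (70, 5), (85, 6)]
    for upper, rating in upper_bounds:
        if percent <= upper:
            return rating
    return 7


ratings = [confidence_to_rating(p) for p in (5, 10, 25, 26, 70, 85, 86, 100)]
```

A different boundary convention would shift only the responses that fall exactly on a band edge.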

Table 6

Confidence Rating Transformation to a Numerical Scale

| Confidence Rating | Numerical Rating |
| --- | --- |
| >85-100% | 7 |
| 70-85% | 6 |
| 55-70% | 5 |
| 40-55% | 4 |
| 25-40% | 3 |
| 10-25% | 2 |
| <10% | 1 |

In all regression models, the Halstead (1977) complexity measure for difficulty was used to

assess the complexity of the required model answer. This measure has been used in several

end user query performance studies (Jih et al. 1989).

For testing H1a-c and H9a-c, a dichotomous variable of 0 (clear formulation, or pseudo-SQL)

and 1 (ambiguous formulation, or manager-English) was used to indicate whether the

participant had received a clear formulation or an ambiguous formulation of the information

request. For testing H2a-c to H8a-c, the seven independent ambiguity parameters were

assessed in accordance with the scale presented in Table 7. Each question was assessed by

two independent non-researchers who had been briefed in the definitions of the seven types

of ambiguity. The initial scores were moderated by discussion and consideration between the

independent third parties and the researcher to ensure consistent and correct interpretation of

the seven ambiguity definitions. Cronbach's alpha (Cronbach 1951) for the two third parties'

ambiguity measurement scores was 0.6887, indicating that a moderately reliable measure of

ambiguity across the two raters was achieved.
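
For readers unfamiliar with the statistic, Cronbach's alpha for k raters is k / (k - 1) multiplied by one minus the ratio of the summed per-rater variances to the variance of the summed scores. The sketch below shows the calculation for two raters. The scores are made up for illustration on the 0 to 4 assessment scale and are not the thesis data, so the result is unrelated to the reported 0.6887.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha for k raters who each scored the same n items."""
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings_by_rater[0])
    if any(len(r) != n for r in ratings_by_rater):
        raise ValueError("every rater must score the same items")
    # Sum of each rater's sample variance across the items.
    rater_variances = sum(variance(r) for r in ratings_by_rater)
    # Variance of the per-item totals across raters.
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    return (k / (k - 1)) * (1 - rater_variances / variance(totals))


# Made-up scores for illustration only (not the thesis data).
rater_one = [0, 1, 2, 3]
rater_two = [1, 0, 3, 2]
alpha = cronbach_alpha([rater_one, rater_two])
```

Two raters who agree perfectly give an alpha of 1, and the value falls as their scores diverge.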

Table 7

Ambiguity Assessment Scale for the analysis of the Seven Ambiguity Types Regression Model

| Ambiguity Assessment Rating | Meaning |
| --- | --- |
| 0 | No ambiguity of this type present |
| 1 | A little ambiguity of this type present |
| 2 | Some ambiguity of this type present |
| 3 | Much ambiguity of this type present |
| 4 | A great deal of ambiguity of this type present |

Each question formulation, clear and ambiguous, for each information request was assessed

on this scale. The instrument used for this assessment, applied to the questions for which

responses exist, is reproduced in Appendix K. A five-point ambiguity assessment rating

provides a finer measure than a dichotomous variable would.

4.3 Ambiguity Treatment Multiple Linear Regression Model Results

Table 8 provides the results of the multiple linear regression (Newbold 1984) shown for

model (1) for the Total Errors, Duration, and Confidence measures of end user query

performance. These results provide evidence regarding H1a-c and H9a-c. All relationships

are in the hypothesised direction (positive for H1a, H1b, H9a, and H9b, and negative for H1c

and H9c), and indicate strong support for each hypothesis.

Table 8

Regression Analysis Results for the General Ambiguity Regression Model

| Source (n=425) | DF | Mean Square | F-Value | Pr > T (2 tailed) | Parameter Estimate | R2 |
| --- | --- | --- | --- | --- | --- | --- |
| Model (Total Errors) | 2 | 5430.30 | 88.44 | 0.0001 | | 0.2954 |
| Error | 422 | 61.40 | | | | |
| Ambiguity (H1a) | 1 | 2447.98 | 39.87 | 0.0001 | 4.8042 | |
| Complexity (H9a) | 1 | 8705.38 | 141.78 | 0.0001 | 0.7582 | |
| Model (Duration) | 2 | 2236.60 | 28.59 | 0.0001 | | 0.1193 |
| Error | 422 | 78.23 | | | | |
| Ambiguity (H1b) | 1 | 1250.63 | 15.99 | 0.0001 | 3.4339 | |
| Complexity (H9b) | 1 | 3352.81 | 42.86 | 0.0001 | 0.4705 | |
| Model (Confidence) | 2 | 42.87 | 16.25 | 0.0001 | | 0.0715 |
| Error | 422 | 2.64 | | | | |
| Ambiguity (H1c) | 1 | 13.03 | 4.94 | 0.0268 | -0.3505 | |
| Complexity (H9c) | 1 | 74.68 | 28.31 | 0.0001 | -0.0702 | |
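
The columns of Table 8 are related by standard identities: each F-value is the source's mean square divided by the error mean square, and R2 is the model sum of squares divided by the total sum of squares, where each sum of squares is the mean square multiplied by its degrees of freedom. The sketch below applies these identities to the Total Errors rows as a consistency check; the function names are illustrative.

```python
def f_value(mean_square, error_mean_square):
    """F statistic for a source of variation."""
    return mean_square / error_mean_square


def r_squared(model_df, model_ms, error_df, error_ms):
    """Share of the total sum of squares explained by the model."""
    model_ss = model_df * model_ms
    error_ss = error_df * error_ms
    return model_ss / (model_ss + error_ss)


# Total Errors model, values copied from Table 8.
f_model = f_value(5430.30, 61.40)
f_ambiguity = f_value(2447.98, 61.40)
r2_total_errors = r_squared(2, 5430.30, 422, 61.40)
```

Small differences from the published figures can arise because the table reports mean squares rounded to two decimal places.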

Ambiguity in an information request has a strong impact on the three measures of end user

query performance presented in H1a, H1b, and H1c. Total errors, duration, and end user

confidence are significantly and strongly affected by the presence of ambiguity in the

information request. The result is confirmatory of the general hypothesis of the model

presented in this paper: that an ambiguous information request is likely to result in a query

formulation that is less accurate, takes longer to prepare, and in which the end user is less

confident. Ceteris paribus, a clearly formulated information request is more effective and

efficient than an information request that is ambiguous and poorly specified.
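
To make the size of these effects concrete, the parameter estimates for the Total Errors model in Table 8 can be read as expected differences in the error count. The sketch below is illustrative only: Table 8 reports no intercept, so it computes differences between two requests and not absolute predictions, and the function and constant names are hypothetical.

```python
# Parameter estimates for the Total Errors model, copied from Table 8.
AMBIGUITY_EFFECT = 4.8042   # ambiguous (1) versus clear (0) formulation
COMPLEXITY_EFFECT = 0.7582  # per unit of Halstead difficulty


def expected_error_difference(ambiguity_change, complexity_change):
    """Expected change in total errors between two information requests."""
    return (AMBIGUITY_EFFECT * ambiguity_change
            + COMPLEXITY_EFFECT * complexity_change)


# Same question, ambiguous instead of clear.
same_question = expected_error_difference(1, 0)

# Ambiguous request at the complexity of Question 5 (12.2917) versus a
# clear request at the complexity of Question 1 (1.6927).
harder_and_ambiguous = expected_error_difference(1, 12.2917 - 1.6927)
```

On these estimates, an ambiguous formulation is associated with roughly 4.8 additional micro errors at any fixed level of complexity.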

The relationship between ambiguity and end user confidence, however, is generally weaker

than expected, although still significant at the 5% level. The small R2 (0.0715) for the

confidence model indicates that the ambiguity and complexity of an information request had

little impact on each participant's confidence in their query formulation.

Ambiguity is significant in all three models. Each model is significant (p = 0.0001), and the R2

values (0.2954, 0.1193, and 0.0715 for total errors, duration, and confidence respectively) support

the assertion that ambiguity and complexity negatively impact end user query performance, most

clearly for total errors.

4.4 Multiple Linear Regression Model: Seven Types of Ambiguity

Table 9 provides the results of the multiple linear regression model shown for model (2) for

the Total Errors, Duration, and Confidence measures of end user query performance. This

testing examines hypotheses H2a-c through H8a-c for individual types of ambiguity.

Table 9

Regression Analysis Results for the Seven Ambiguity Types Regression Model

| Source (n=425) | DF | Mean Square | F-Value | Pr > T (2 tailed) | Parameter Estimate | R2 |
| --- | --- | --- | --- | --- | --- | --- |
| Model (Total Errors) | 8 | 2177.52 | 46.81 | 0.0001 | | 0.4737 |
| Error | 416 | 46.52 | | | | |
| Lexical (H2a) | 1 | 78.41 | 1.69 | 0.1949 | -1.5545 | |
| Syntactical (H3a) | 1 | 7.99 | 0.17 | 0.6789 | -0.2274 | |
| Inflective (H4a) | 1 | 0.79 | 0.02 | 0.8963 | -0.4143 | |
| Pragmatic (H5a) | 1 | 385.36 | 8.28 | 0.0042 | 1.2621 | |
| Extraneous (H6a) | 1 | 254.77 | 5.48 | 0.0197 | 3.3940 | |
| Emphatic (H7a) | 1 | 394.51 | 8.48 | 0.0038 | 2.6906 | |
| Suggestive (H8a) | 1 | 167.54 | 3.60 | 0.0584 | 2.9079 | |
| Complexity | 1 | 2605.34 | 56.01 | 0.0001 | 0.4899 | |
| Model (Duration) | 8 | 832.24 | 11.23 | 0.0001 | | 0.1776 |
| Error | 416 | 74.10 | | | | |
| Lexical (H2b) | 1 | 1272.66 | 17.17 | 0.0001 | 6.2626 | |
| Syntactical (H3b) | 1 | 600.95 | 8.11 | 0.0046 | 1.9725 | |
| Inflective (H4b) | 1 | 780.00 | 10.53 | 0.0013 | -13.0021 | |
| Pragmatic (H5b) | 1 | 4.65 | 0.06 | 0.8023 | -0.1387 | |
| Extraneous (H6b) | 1 | 1008.31 | 13.61 | 0.0003 | 6.7520 | |
| Emphatic (H7b) | 1 | 129.85 | 1.75 | 0.1863 | -1.5436 | |
| Suggestive (H8b) | 1 | 457.05 | 6.17 | 0.0134 | -4.8029 | |
| Complexity | 1 | 1926.10 | 25.99 | 0.0001 | 0.4213 | |
| Model (Confidence) | 8 | 14.66 | 5.64 | 0.0001 | | 0.0978 |
| Error | 417 | 2.60 | | | | |
| Lexical (H2c) | 1 | 8.81 | 3.39 | 0.0664 | -0.5211 | |
| Syntactical (H3c) | 1 | 0.02 | 0.01 | 0.9292 | -0.0115 | |
| Inflective (H4c) | 1 | 1.27 | 0.49 | 0.4844 | 0.5253 | |
| Pragmatic (H5c) | 1 | 2.83 | 1.09 | 0.2973 | -0.1082 | |
| Extraneous (H6c) | 1 | 0.10 | 0.04 | 0.8435 | 0.0677 | |
| Emphatic (H7c) | 1 | 0.07 | 0.03 | 0.8697 | -0.0358 | |
| Suggestive (H8c) | 1 | 1.91 | 0.74 | 0.3915 | 0.3107 | |
| Complexity | 1 | 76.02 | 29.24 | 0.0001 | -0.0837 | |

4.5 Summary of Results

The experimental results indicate that the taxonomy presented in this paper explains a great

deal of the effect of ambiguity on end user query performance. The results also indicate that

further refinement of the theory presented in this paper is required. Table 10 provides a

summary of the results obtained in this experiment. All hypotheses indicated as "supported"

are significant at the p = 0.05 level or below according to a one-tailed test. The two-tailed p-

value is shown, and is immediately followed by the one-tailed p-value in brackets.
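
The relationship between the two p-values can be sketched as follows. The bracketed one-tailed values in Table 10 appear consistent with halving the two-tailed value and rounding half up to four decimal places; that rounding rule is inferred from the reported figures and is not stated in the text. Halving is meaningful as support only when the parameter estimate has the hypothesised sign, which is why the sketch checks direction separately. The function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def one_tailed(two_tailed: str) -> Decimal:
    """Halve a two-tailed p-value and round half up to four decimal places."""
    halved = Decimal(two_tailed) / 2
    return halved.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def supported(two_tailed: str, estimate: float, hypothesised_sign: int,
              alpha: str = "0.05") -> bool:
    """Supported if the estimate has the hypothesised sign and p <= alpha."""
    in_hypothesised_direction = estimate * hypothesised_sign > 0
    return in_hypothesised_direction and one_tailed(two_tailed) <= Decimal(alpha)


# H2c: lexical ambiguity and confidence (negative effect hypothesised).
h2c = supported("0.0664", -0.5211, -1)
# H4b: inflective ambiguity and duration (positive effect hypothesised).
h4b = supported("0.0013", -13.0021, +1)
```

This reproduces the pattern in Table 10 in which H2c is supported on the one-tailed test while H4b, despite a small p-value, is not supported because its parameter is negative.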

Table 10

Summary of Analysis' Support for Hypotheses

| Hypothesis | Statement | Result |
| --- | --- | --- |
| H1a | Higher ambiguity in the information request leads to an increase in the total errors in the query formulation. | Supported, p=0.0001 (0.0001) |
| H1b | Higher ambiguity in the information request leads to an increase in the time taken to complete the query formulation. | Supported, p=0.0001 (0.0001) |
| H1c | Higher ambiguity in the information request leads to lower end user confidence in the accuracy of the query formulation. | Supported, p=0.0268 (0.0134) |
| H2a | Higher levels of lexical ambiguity in the information request lead to more total errors in the query formulation. | Not Supported, p=0.1949 (0.0975) (negative parameter) |
| H2b | Higher levels of lexical ambiguity in the information request lead to more time taken to complete the query formulation. | Supported, p=0.0001 (0.0001) |
| H2c | Higher levels of lexical ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. | Supported, p=0.0664 (0.0332) |
| H3a | Higher levels of syntactical ambiguity in the information request lead to more total errors in the query formulation. | Not Supported, p=0.6789 (0.3395) |
| H3b | Higher levels of syntactical ambiguity in the information request lead to more time taken to complete the query formulation. | Supported, p=0.0046 (0.0023) |
| H3c | Higher levels of syntactical ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. | Not Supported, p=0.9292 (0.4646) |
| H4a | Higher levels of inflective ambiguity in the information request lead to more total errors in the query formulation. | Not Supported, p=0.8963 (0.4482) |
| H4b | Higher levels of inflective ambiguity in the information request lead to more time taken to complete the query formulation. | Not Supported, p=0.0013 (0.0007) (negative parameter) |
| H4c | Higher levels of inflective ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. | Not Supported, p=0.4844 (0.2422) |
| H5a | Higher levels of pragmatic ambiguity in the information request lead to more total errors in the query formulation. | Supported, p=0.0042 (0.0021) |
| H5b | Higher levels of pragmatic ambiguity in the information request lead to more time taken to complete the query formulation. | Not Supported, p=0.8023 (0.4012) |
| H5c | Higher levels of pragmatic ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. | Not Supported, p=0.2973 (0.1487) |
| H6a | Higher levels of extraneous ambiguity in the information request lead to more total errors in the query formulation. | Supported, p=0.0197 (0.0099) |
| H6b | Higher levels of extraneous ambiguity in the information request lead to more time taken to complete the query formulation. | Supported, p=0.0003 (0.0002) |
| H6c | Higher levels of extraneous ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. | Not Supported, p=0.8435 (0.4218) |
| H7a | Higher levels of emphatic ambiguity in the information request lead to more total errors in the query formulation. | Supported, p=0.0038 (0.0019) |
| H7b | Higher levels of emphatic ambiguity in the information request lead to more time taken to complete the query formulation. | Not Supported, p=0.1863 (0.0932) (negative parameter) |
| H7c | Higher levels of emphatic ambiguity in the information request lead to lower end user confidence in the accuracy of the query formulation. | Not Supported, p=0.8697 (0.4349) |

H8a Higher levels of suggestive ambiguity in the information

request lead to more total errors in the query formulation.

Supported

p=0.0584 (0.0292)

H8b Higher levels of suggestive ambiguity in the information

request lead to more time taken to complete the query

formulation.

Not Supported

p=0.0134 (0.0067)

(negative parameter)

H8c Higher levels of suggestive ambiguity in the information

request leads to lower end user confidence in the accuracy of

the query formulation.

Not Supported

p=0.3915 (0.1958)

H9a Higher complexity in the information request leads to more

total errors in the query formulation.

Supported

p=0.0001 (0.0001)

H9b Higher complexity in the information request leads to more

time taken to complete the query formulation.

Supported

p=0.0001 (0.0001)

H9c Higher complexity in the information request leads to lower

end user confidence in the accuracy of the query formulation.

Supported

p=0.0001 (0.0001)
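
The decision rule behind the "Supported" results in Table 10 can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the experiment. The function name and its arguments are hypothetical, and the sketch assumes a symmetric test statistic (such as a t statistic), so that the one-tailed p-value shown in brackets is half of the two-tailed value. A hypothesis counts as supported only when the parameter estimate lies in the hypothesised direction and the one-tailed p-value is at or below 0.05.

```python
def is_supported(two_tailed_p, in_hypothesised_direction, alpha=0.05):
    """Return True when a directional hypothesis is supported.

    Assumption of this sketch: the test statistic is symmetric, so the
    one-tailed p-value is half of the two-tailed p-value, matching the
    bracketed values reported in Table 10.
    """
    one_tailed_p = two_tailed_p / 2
    return in_hypothesised_direction and one_tailed_p <= alpha


# Two-tailed p-values and parameter directions taken from Table 10.
h2c = is_supported(0.0664, in_hypothesised_direction=True)   # lexical, confidence
h4b = is_supported(0.0013, in_hypothesised_direction=False)  # inflective, duration
h3a = is_supported(0.6789, in_hypothesised_direction=True)   # syntactical, total errors
```

Under this rule H2c is supported even though its two-tailed p-value exceeds 0.05, whereas H4b is not supported despite its small p-value, because its parameter is negative.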

4.5.1 Potential Ambiguity

The generally weak measured effects for the potential ambiguities assessed by the experiment

(lexical, syntactical, and inflective) provide only limited support for the hypotheses presented in this paper. As the

theoretical model indicates, potential ambiguities derive their ambiguity independently of the

context of the statement. A statement may contain lexical or syntactical ambiguity, but the

context of the statement resolves the ambiguity measured. Where the hypothesised effects were not

measurable, the likely reason is the clarification of the ambiguity by the context.

Lexical ambiguity did not show a statistically significant relationship with total errors (H2a).

Lexical ambiguity did demonstrate a statistically significant relationship with duration (H2b,

p=0.0001) and confidence (H2c, p=0.0332 for a one-tailed t-test). The implication of these

results is that lexical ambiguity requires more cognitive effort by the end users to determine

the meaning of the request. Once the meaning of the request has been determined, however,

users do not make significantly more errors in their query formulations. Lexical ambiguity

did result in end users being slightly less confident in their queries.

Although in the hypothesised direction (positive), the relationship between syntactical

ambiguity and total errors (H3a) is not significant (p=0.6789). Syntactical ambiguity does

show a significant relationship with the time taken to complete the query (H3b, p=0.0046), which indicates that

greater cognitive effort is required to resolve the contextual ambiguity. Syntactical

ambiguity's relationship with end user confidence is not significant (H3c, p=0.9292).

Inflective ambiguity does not show a significant relationship in the hypothesised direction for

H4a (p=0.8963), H4b (negative parameter, p=0.0013), or H4c (p=0.4844). Interestingly,

inflective ambiguity shows a significant negative relationship with duration, which is in the

opposite direction to that hypothesised. This result must be considered with caution,

however, as the level of inflective ambiguity present in the questions presented to subjects

was low (Appendix J).

4.5.2 Actual Ambiguity

The role of the actual ambiguity types (pragmatic and extraneous) in the theoretical model is

strongly supported by the empirical results. Actual ambiguities are not clarified by the

context of the statement, i.e., the context does not resolve pragmatic and extraneous

ambiguities. Actual ambiguities generally show a strong relationship with total errors, and

extraneous ambiguity (but not pragmatic ambiguity) displays a strong relationship with

duration. Neither pragmatic nor extraneous ambiguity shows a significant relationship with

end user confidence.

Pragmatic ambiguities are not clarified by context, and arise where information necessary to

properly answer the information request is missing. The hypothesised relationship between

pragmatic ambiguity and total errors is strongly supported (H5a, p=0.0042). The

hypothesised effects of pragmatic ambiguity on duration (H5b, negative parameter,

p=0.8023) and end user confidence (H5c, p=0.2973) were not significant. Pragmatic

ambiguity may require the end user to infer the missing information, which increases total errors.

In the current experiment, the need to infer missing information did not significantly affect

the time necessary to complete the query response or end user confidence in their query.

Extraneous ambiguity occurs when more information than is required is provided or when the

information request is indirectly and pretentiously written. Extraneous ambiguity misleads

end users as to the required response. The hypotheses for total errors (H6a, p=0.0197)

and duration (H6b, p=0.0003) in the end user query formulation were strongly supported. Extraneous ambiguity, where

more information is provided than is required, appears to require more time and cognitive

effort to resolve the ambiguity, and the query response is more likely to be inaccurate.

The parameter estimates (Table 9) for total errors (3.3940) and for duration (6.7520) indicate

that extraneous information produces severe negative impacts on end user query efficiency

and effectiveness. The result for H6c, which hypothesised that extraneous ambiguity

decreases end user confidence, is not significant (p=0.8435). Where information needs to be

inferred (pragmatic ambiguity), end users appear to recognise and grapple with the

ambiguity. End users appeared less able to recognise and adjust for extraneous ambiguity

than pragmatic ambiguity.

4.5.3 Imaginary Ambiguity

The results for imaginary ambiguities support the hypothesised relationships between these

ambiguities and query errors. The results do not support the hypothesised relationships with

duration or end user confidence. Imaginary ambiguities result in more total errors, but appear

to result in less time taken to complete the requests. These outcomes are important, because,

although not hypothesised, imaginary ambiguities appear to lead end users to infer the

requirements of the question more quickly (leading to a shorter duration required) and to

formulate the query response on that basis (leading to higher total errors). This result should

be treated with caution, as the imaginary ambiguities were not at a high level in this

experiment (Appendix J).

Emphatic ambiguity arises from the limited ability to convey intonation in written form. The

hypothesis regarding the effect of emphatic ambiguity on total errors (H7a) is strongly

supported (p=0.0038). Neither H7b (duration) nor H7c (confidence) was statistically

significant. Where the emphasis of the information request cannot be clearly expressed, end

users are required to supply their own emphasis when interpreting the meaning of the

information request. While they appear to make their interpretation quickly, the end users did

not recognise that their queries were more likely to contain errors.

The hypothesised relationship between suggestive ambiguity and total errors (H8a) is

supported (p=0.0292 for a one-tailed t-test). The relationship between suggestive

ambiguity and duration (H8b), however, is opposite to the hypothesised direction, and

significant (negative parameter, p=0.0134). The hypothesised relationship with end user

confidence (H8c) is not supported (p=0.3915). As with extraneous ambiguity, the results for

suggestive ambiguity indicate that end users are not able to recognise its negative impact

on their query formulations. Further research is required to

determine the reason for this anomalous result and to search for ways to ameliorate its

effects on end user query formulations.

4.5.4 Complexity

The results indicate strong support for the hypotheses regarding complexity (H9a, H9b, and

H9c all with p=0.0001). Task complexity increases total errors and duration, and decreases

the end user's overall confidence in the query formulation. These results are consistent with

previous research (e.g., Borthick et al. 1997; Borthick et al. 2000).

5. Implications For Business Practice

This research has developed an initial theory of ambiguity and end user queries. It

empirically investigated seven ambiguities, and measured how they differentially affect end

user query performance. Some ambiguities, e.g., lexical, extraneous, pragmatic, and

emphatic, affect end user query performance more than others. For some ambiguities, i.e.,

extraneous and suggestive, the results indicate that end users will potentially make decisions based on

results that are inaccurate or misleading.

5.1.1 Electronic Mail

In the business world, electronic mail is often used to transmit information requests,

frequently without the benefit of other channels of communication (Star 1995). Furthermore,

these information requests are hurriedly written (Star 1995; Fowler and Aaron 1998). Such

haste contributes to syntactical, lexical, and inflective ambiguities. The use of shorthand

notations often miscommunicates the intended message. Electronic mail messages frequently leave

assumptions about the business process unstated. These omissions contribute to

pragmatic ambiguity. The hurried state of the specification and the lack of a formal

specification process also contribute to extraneous ambiguity (Fowler and Aaron 1998).

Lexical, syntactical, inflective, and, to some extent, extraneous, ambiguity types are functions

of the grammar used to write the information request. The longer the request, the more likely

the request is to contain these ambiguities (Fowler and Aaron 1998). Concise writing is

important to reduce ambiguity. Good written communication skills on the part of the

individual making the information request are required.

All seven ambiguities arise in the daily business specification of reports. Several strategies

are available to reduce their impact. Electronic mail messages containing information requests need to

be concisely drafted and proofread to reduce pragmatic ambiguity. Providing concise

specifications and avoiding indirect writing, e.g., pretentious writing and passive voice,

reduce the lexical, syntactical, inflective, and extraneous ambiguity of information requests

(Fowler and Aaron 1998).

Emoticons (Sanderson 1993) and generally accepted formatting styles can be used to add

emphasis to electronic mail. These techniques can reduce emphatic ambiguity.

An objective reading of the information request to reduce innuendo addresses suggestive

ambiguity. Explaining the reason for the information request as much as possible will

enhance clarity and reduce the perception of hidden agendas.

Each of the above techniques enhances the clarity of the information request and thus increases

the effectiveness and efficiency of the response received. These techniques initially increase

the time necessary to write the information request. Nonetheless, this paper's results indicate

that the outcome will be an increase in the timeliness, accuracy, and relevance of the information

received.

5.1.2 Personnel Turnover and Work Teams

Information systems personnel and end users are frequently engaged on short-term contracts.

Turnover in many organisations, and especially within work groups, is high (Moore 2000).

As turnover increases, the ambiguity of information requests also tends to increase. End

users have less experience and understanding of the organisational culture and thus do not

understand the context and assumptions made in information requests. Especially when

faced with high turnover of information systems personnel and end users, strategies for

reducing the seven ambiguities can significantly benefit the organisation.

Jessup and Valacich (1993) suggest strategies for retaining group memory and enhancing

organisational learning. For work teams that often have new members, a library of previous

information requests and associated query responses will assist team members to reduce

information request ambiguity by providing a context for the request. To function properly,

new team members must understand the organisational procedures and have a context within

which to function.
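
One way to picture such a library of previous information requests is sketched below in Python. This is a hypothetical illustration rather than a system described or evaluated in this thesis: the class names, fields, and keyword search are all assumptions made for the example, and the stored entry reuses the first information request and model answer from Appendix A.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedRequest:
    """One past information request and the query that answered it."""
    request_text: str
    query: str
    notes: str = ""  # context or assumptions that were left unstated


@dataclass
class RequestLibrary:
    """Hypothetical in-memory library of previous information requests."""
    entries: list = field(default_factory=list)

    def add(self, request_text, query, notes=""):
        self.entries.append(ArchivedRequest(request_text, query, notes))

    def search(self, keyword):
        """Return archived requests whose text mentions the keyword."""
        needle = keyword.lower()
        return [e for e in self.entries if needle in e.request_text.lower()]


library = RequestLibrary()
library.add(
    "Management wants a list of each of our suppliers with no duplicates in the list.",
    "Select distinct(item_maker) from inventory;",
    notes="'Suppliers' was taken to mean the makers of stocked items.",
)
matches = library.search("suppliers")
```

A new team member who receives a similar request could look up the earlier wording, the query that satisfied it, and the recorded assumptions before writing a new query.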

Business would benefit from candidly assessing its methodology of making information

requests. Using methodologies that result in less ambiguity through formalisation of the

information request will reduce errors and improve the efficient use of the time of skilled end

users.

6. Contributions, Limitations, and Future Research

6.1 Research Contributions

This paper provided significant, unique contributions to the theory of ambiguity, complexity,

and end user query performance. The theory of communication linguistics has been applied

to end user query performance theory. The theory identified seven ambiguities: lexical,

syntactical, inflective, pragmatic, extraneous, emphatic and suggestive. The empirical results

obtained for the developed theory are robust, and indicate substantial support.

An instrument to measure ambiguity in an information request, at a finer level than

previously available, was developed and applied. Although requiring further refinement, this

instrument is a significant advance in the measurement of information request ambiguity.

This paper identifies areas for future research, and examines the implications for business

practices. This paper represents a significant advancement of the theory and application to

ensure the efficient and effective development of queries by end users.

6.2 Research Limitations

Huck et al. (1974) identify seven issues for the internal validity of experiments. Appendix L

provides a detailed analysis of these issues. Appendix L outlines how this experiment's

design controlled for each issue.

As with most controlled laboratory experiments that use student participants as subjects, there

are external validity issues. Generalisation from student subjects to the business setting may

be invalid. Students' motivations to obtain a high grade may differ from the motivations of business end

users. This experiment's use of advanced business and systems undergraduate students as

subjects, however, implies that this generalisation to the business setting is meaningful, as

these subjects are reflective of the skill levels of end users in a business context.

Generalising from this paper's results to a business setting is invalid to the extent that the

experimental information requests are not representative of information requests made in a

business setting. The information requests nonetheless are based on a close model of the

business world, undertaking likely real world tasks.

Another limitation is the need to extend the results to more extreme levels of ambiguity. The

ambiguity present in the experiment's questions was not extreme. Hence, generalising from

the results of the current experiment to more extreme levels of ambiguity may not be valid.

6.3 Future Research

Replication of this experiment, with more ambiguous information requests than those of the

current experiment, would strengthen the theoretical model. An experiment designed to

examine contextual reduction of the potential ambiguities (lexical, syntactical, and inflective)

would also be valuable. The weaker results of the current experiment may derive from a lack

of variation in ambiguity for some of the seven types of ambiguity. Instantiating ambiguity

into the experiment over a greater range and variation of ambiguity in the information

requests would add empirical insight into the theoretical model.

This paper presents what initially appear to be anomalous results for inflective and suggestive

ambiguity in the context of duration. A future experiment would do well to investigate the

circumstances of these results, and to empirically analyse the relationship between inflective

ambiguity, suggestive ambiguity, and duration.

A future experiment having particular regard to end user confidence would significantly

assist the development of the theoretical model. Of the hypotheses for the seven individual ambiguity types,

only that for lexical ambiguity (H2c) is supported for end user confidence. On the basis of the current

results, end user confidence often does not reflect the true state of affairs of the query

response's accuracy. End users do not appear to know when the query response is inaccurate.

Outside of the domain of laboratory research, an avenue for future research would be a field

experiment of ambiguity and the performance of business end users. This experiment would

allow the researcher to examine the prevalence and effects of the seven types of ambiguity in

actual business settings. Such a study would also make a contribution by assessing the extent

to which the current experimental results generalise to the business setting.

An experiment designed to analyse the empirical effectiveness of strategies to mitigate each

ambiguity in a business setting would hold considerable value for research and business

practice. This would allow the development and subsequent assessment of strategies to

reduce the effect of ambiguity on end user query performance.

The development and empirical testing of the ambiguity assessment instrument (Appendix K)

would provide the opportunity to refine and enhance the current initial instrument. Future

research is necessary to develop a reliable and robust instrument for the measurement of

ambiguity in information requests.

References

Almuallim, H., Akiba, Y., Yamazaki, T., and Kaneda, S. "Learning Verb Translation Rules

from Ambiguous Examples and a Large Semantic Hierarchy," Computational Learning Theory and Natural Learning Systems, (4), 1997, pp. 323-336.

Athey, S., and Wickham, M. "Required Skills for Information Systems Jobs in Australia,"

Journal of Computer Information Systems, (36:2), 1995-1996.

Australian Bureau of Statistics. "8669.0 Computing Services Industry, Australia, 1995-96".

Australian Bureau of Statistics. 1997.

Axley, S.R. "Managerial and organizational communication in terms of the conduit

metaphor," Academy of Management Review, (9), 1984, pp. 428-437.

Borthick, A.F., Bowen, P.L., and Diery, R.G. "Complexity and Errors in SQL Queries:

Development and Empirical Comparison of Complexity Measures." Workshop on

Information Technologies and Systems (WITS '97), pp. 31-40, December 13-14 1997.

Borthick, A.F., Bowen, P.L., Jones, D.R., and Tse, M.H.K. "The Effects of Information

Request Ambiguity and Construct Incongruence on Query Development," Proceedings of the Pacific Asia Conference on Information Systems, June 2000.

Campbell, D. J. "Task Complexity: A Review and Analysis," Academy of Management

Review, (13:1), 1988, pp. 40-52.

Cardinali, R. "Information Systems - A Key Ingredient to Achieving Organizational

Competitive Strategy," Computers in Industry, (18:3), 1992, pp. 241-245.

Chomsky, N. "Language and Mind," in Ways of Communicating, Cambridge University

Press, Cambridge, 1991, pp. 56-80.

Conger, S. The New Software Engineering, Wadsworth Publishing, Belmont, California,

1994.

Copi, I. M., and Cohen, C. Introduction to Logic (8th ed.), Macmillan, New York, New York,

1990.

Cronbach, L. J. "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, (16),

1951, pp. 297-334.

Delligatta, A., and Umbaugh, R. E. "EUC Becomes Enterprise Computing," Information

Systems Management, Fall 1993, pp. 53-55.

Dubin, R. Theory Building, Collier Macmillan Publishers, London, 1978.

Eisenberg, E.M., and Phillips, S.R. "Miscommunication in Organizations," in

"Miscommunication" and Problematic Talk, Sage Publications, London, 1991.

Fischer, D. H. Historians' Fallacies, Harper & Row, New York, 1970.

Fowler, H. R., and Aaron, J. E. The Little, Brown Handbook (7th ed.), Addison-Wesley

Publishers Inc., New York, New York, 1998.

Freeman, L.A., Jarvenpaa, S.L., and Wheeler, B. C. "The Supply and Demand of

Information Systems Doctorates: Past, Present and Future," MIS Quarterly, (24:2), June 2000.

Halstead, M. H. Elements of Software Science, Elsevier North-Holland Inc, Purdue University, 1977.

Hamblin, C. L. Fallacies, Methuen, London, 1970.

Huck, S. W., Cormier, W. H., and Bounds, W. G. Jr. Reading Statistics and Research,

Harper & Row, New York, New York, 1974.

Jespersen, O. Language: its nature, development and origin, Allen & Unwin, London, 1922.

Jessup, L.M., and Valacich, J.S. Group Support Systems, Macmillan Publishing Company,

New York, New York, 1993.

Jih, W.J.K., Bradbard, D.A., Snyder, C.A., and Thompson, N.G.A. "The Effects of

Relational and Entity-Relationship Data Models on Query Performance of End Users,"

International Journal of Man-Machine Studies, (31), 1989, pp. 257-267.

Katzeff, C. "Systems Demands on Mental Models for a Fulltext Database," International

Journal of Man-Machine Studies, (32), 1990, pp. 483-509.

Keen, P.G.W. "Information Technology and the Management Difference: A Fusion Map,"

IBM Systems Journal, (32:1), 1993, pp. 17-38.

Kooij, J.G. Ambiguity in Natural Language, North-Holland Publishing Company,

Amsterdam, Holland, 1971.

Liew, S.T. "The Effects of Normalization on Query Errors: An Experimental Evaluation,"

Unpublished Thesis, University of Queensland, 1995.

Moore, J.E. "One Road to Turnover: An Examination of Work Exhaustion in Technology

Professionals," MIS Quarterly, (24:1), March 2000, pp. 141-168.

Nath, R., and Lederer, A.L. "Team Building for IS Success," Information Systems

Management, Spring 1996, pp. 32-37.

Newbold, P. Statistics for Business and Economics, Prentice-Hall Inc, Englewood Cliffs,

New Jersey, 1984.

Ogden, W.C., Korenstein, R., and Smelcer, J.B. An Intelligent Front-End for SQL, IBM

General Products Division, San Jose, California, 1986.

Reilly, R.G. "Miscommunication at the Person-Machine Interface," in "Miscommunication"

and Problematic Talk, Sage Publications, London, 1991.

Reisner, P. "Use of Psychological Experimentation as an Aid to Development of a Query

Language," IEEE Transactions on Software Engineering, (SE-3:3), 1977, pp. 218-229.

Rescher, N. Introduction to Logic, St Martin's Press, New York, New York, 1964.

Rho, S., and March, S.T. "An Analysis of Semantic Overload in Database Access Systems

using Multi-Table Query Formulation," Journal of Database Management, (8:2), Spring

1997, pp. 3-14.

Rosenthal, D.A., and Jategaonkar, V.A. "Wanted: Qualified IS Professionals," Information

Systems Management, Spring 1995, pp. 27-31.

Russell, B.A.W. "Vagueness," Australasian Journal of Philosophy and Psychology, (1),

1923, pp. 84-92.

Ryan, H.W. "User-Driven Systems Development: Defining a New Role for IS," Information

Systems Management, Summer 1993, pp. 66-68.

Ryle, G. Collected Papers, (2), Hutchinson, London, 1971.

Sanderson, D. Smileys, O'Reilly, Sebastopol, California, 1993.

Sekine, S., Carroll, J.J., Ananiadou, S., and Tsujii, J. "Automatic learning for Semantic

Collocation," Third Conference on Applied Natural Language Processing, 1992, pp. 104-110.

Severin, W.J., and Tankard, J.W. Communication Theories: Origins, Methods, and Uses in

the Mass Media, Addison Wesley Longman, Inc., New York, New York, 1997.

Star, S.L. The Cultures of Computing, Blackwell Publishers/The Sociological Review,

Oxford, U.K., 1995.

Suh, K.S., and Jenkins, A.M. "A Comparison of Linear Keyword and Restricted Natural

Language Database Interfaces for Novice Users," Information Systems Research, (3:3), 1992, pp. 252-272.

Tayntor, C.B. "New Challenges or the End of EUC?," Information Systems Management,

Summer 1994, pp. 86-88.

Trow, C.E. The Old Shipmasters of Salem, New York, New York, 1905.

Turner, G.W. (Editor). The Australian Concise Oxford Dictionary of Current English,

Oxford University Press, Melbourne, 1987.

Walton, D. Fallacies Arising from Ambiguity, Kluwer Academic Publishers, Dordrecht,

1996.

Williamson, T. Vagueness, Routledge, New York, New York, 1994.

Wood, R.E. "Task Complexity: Definition of the Construct," Organizational Behaviour and

Human Decision Processes, (37), 1986, pp. 60-82.

Appendix A: Experiment Information Requests and Model Answers

No. Formulation Information Request

1. Ambiguous Management wants a list of each of our suppliers with no

duplicates in the list.

Clear List the distinct suppliers of the items we stock.

Model Answer (Halstead’s Complexity: 1.6927):

Select distinct(item_maker) from inventory;

2. Ambiguous Produce a report that lists the inventory items where the quantity

on hand is much larger, on a percentage basis, than the quantity ordered.

Clear List item number, item name, quantity on hand, quantity on order

where quantity on hand is greater than 2 * quantity ordered.

Model Answer (Halstead’s Complexity: 5.4186):

Select item_no, item_name, qty_hand, qty_ordered from inventory where qty_hand > 2 *

qty_ordered;

3. Ambiguous Management wants a list of all Japanese customers and customers

with credit limits over $15,000.

Clear List customer numbers, customer names, country, and credit limit

of customers with credit limits greater than $15,000 or of

customers in Japan.

Model Answer (Halstead’s Complexity: 6.8908):

Select cust_no, cust_name, country, credit_limit from customer where country = 'Japan' or

credit_limit > 15000;

4. Ambiguous Produce a report that statistically compares the credit limits for

customers in different countries.

Clear List country, average credit limit, and standard deviation of

customer credit limit grouped by country.

Model Answer (Halstead’s Complexity: 4.4697):

Select country, avg(credit_limit), stddev(credit_limit) from customer group by country;

5. Ambiguous Produce a report of clients that prefer the Speedair carrier and

addresses.

Clear List customer number, customer name, street, city, post code, and

country where the customer's preferred carrier is Speedair.

Model Answer (Halstead’s Complexity: 12.2917):

Select cust_no, cust_name, street, city, state, post_code, country From customer, carrier

where customer.pref_carrier_code = carrier.carrier_code and carrier_name = 'Speedair';

6. Ambiguous We're wondering if some of our winemakers are using poor quality

packaging and bottles - we've had a few complaints. Can you get

us a report that gives us some sort of idea about what items we are

shipping compared to what the customers are taking delivery of?

It would probably be a good idea while you're at it to give a

comparative percentage of the stuff shipped that doesn't make it -

just so the vintners won't try and weasel their way out of it, you

understand, they're good at that.

Clear List item maker, item number, item name, and 100 * (sum of

quantity shipped less sum of quantity accepted) / (sum of quantity shipped) where the type of alcohol is wine.

Model Answer (Halstead’s Complexity: 18.8):

Select item_maker, inventory.item_no, item_name, 100 * (sum(qty_shipped - qty_accepted) /

sum(qty_shipped)) From inventory, invoiceitem where inventory.item_no =

invoiceitem.item_no and type_of_alc = 'wine' Group by item_maker, inventory.item_no,

item_name;

7. Ambiguous Prepare a report that provides *all* customer's details and

indicates the number of different products they have ordered from

us.

Clear List customer number, and customer name for *all* customers,

and, if they have ordered anything, a count of unique items ordered.

Model Answer (Halstead’s Complexity: 16.0076):

Select customer.cust_no, cust_name, count(distinct(item_no)) from customer, invoice,

invoiceitem where customer.cust_no = invoice.cust_no (+) and invoice.invoice_no = invoiceitem.invoice_no (+) group by customer.cust_no, cust_name;

8. Ambiguous Management wants to know which customers we've shipped goods

more than 10 times to them by the shipper that they requested.

Clear List customer number, name, and count of invoices, where the

actual carrier is the same as the customer's preferred carrier,

having more than 10 shipments.

Model Answer (Halstead’s Complexity: 16.2684):

Select customer.cust_no, cust_name, count(*) from Invoice, Customer where

invoice.cust_no = customer.cust_no and invoice.carrier_code = customer.pref_carrier_code group by customer.cust_no, cust_name having count(*) > 10;

9. Ambiguous Produce a report, with best items first, on the gross contribution to

profitability of each inventory item for July 1999.

Clear List item number, item description, and (unit price less unit cost)

multiplied by units sold in July 1999. Sort your output by descending gross contribution to profitability.

Model Answer (Halstead’s Complexity: 23.897):

select inventory.item_no, item_name, avg(avg_unit_price - avg_unit_cost) *

sum(qty_accepted) from invoice, invoiceitem, inventory where invoice.invoice_no =

invoiceitem.invoice_no and invoiceitem.item_no = inventory.item_no and deliver_date

between '1-Jul-99' and '31-Jul-99' group by inventory.item_no, item_name order by 3 desc;

10. Ambiguous Produce a report with the relevant customer details that gives us an

idea of how much of our business is exposed to foreign currency

fluctuations.

Clear List customer number, customer name, customer country, and a

total of the amount paid where the settlement currency code for the

invoice is not equal to the currency code for Australian dollars.

Group results by customer number.

Model Answer (Halstead’s Complexity: 19.4819):

Select customer.cust_no, cust_name, country, sum(amt_paid) from customer, invoice,

currency where customer.cust_no = invoice.cust_no and invoice.currency_code =

currency.currency_code and currency.currency_name <> 'Australian Dollar' Group by customer.cust_no, cust_name, country;

11. Ambiguous Management is concerned about current slow-moving inventory

items, based on shipments since 1 June 1999. Produce a report of

the items that they might be most concerned about.

Clear List inventory item number, item description, quantity on hand,

and sum(quantity shipped) with ship dates greater than 1 June

1999 that have sums of the quantity shipped less than the sums of

the quantity on hand.

Model Answer (Halstead’s Complexity: 22.4):

Select inventory.item_no, item_name, sum(qty_hand), sum(qty_shipped) from inventory,

invoiceitem, invoice where inventory.item_no = invoiceitem.item_no and

invoiceitem.invoice_no = invoice.invoice_no and ship_date > '1-Jun-99' group by

inventory.item_no, item_name having sum(qty_shipped) < sum(qty_hand);


12. Ambiguous Produce a report that gives some idea about our best USA export

items where the amount since March is bigger than $5,000.

Clear List item numbers, item descriptions and the total accepted

quantity times agreed price of each item for items shipped to US

customers since 1 March 1999 and having a total accepted quantity

times agreed price greater than $5,000.

Model Answer (Halstead’s Complexity: 29.1633):

select inventory.item_no, item_name, sum(qty_accepted * agreed_unit_price) from invoice,

invoiceitem, inventory, customer where invoice.invoice_no = invoiceitem.invoice_no and

invoiceitem.item_no = inventory.item_no and customer.cust_no = invoice.cust_no and

ship_date > '1-Mar-99' and country = 'USA' group by inventory.item_no, item_name

having sum(qty_accepted * agreed_unit_price) > 5000;

13. Ambiguous Produce a report showing our Japanese client base that didn't order

anything in July. We're going to need an idea of how many

invoices and things like that that we have for them. We're

concerned about why our orders have dropped off. Can you use

that statistical thing (you know, the one that gives an idea of how

the numbers are varying, not variance, the other one) to show

whether the date the stuff is delivered is different to the date they wanted the stuff?

Clear List customer number, customer name, number of invoices, and

standard deviation of the difference between the deliver date and

the want date for Japanese customers who did not place an order in July 1999.

Model Answer (Halstead’s Complexity: 24.0168):

select customer.cust_no, cust_name, count(invoice_no), stddev(deliver_date - want_date)

from customer, invoice where customer.cust_no = invoice.cust_no and country = 'Japan'

and customer.cust_no not in (select cust_no from invoice where order_date between '1-Jul-99'

and '31-Jul-99') group by customer.cust_no, cust_name;

14. Ambiguous We want to have a mail-out to our best customers (say, those who

paid us more than $5000 or so recently, and those with credit

limits over $20,000). We're interested in seeing if we can move

that new Hunter Valley shipment. Can you get us a mailing list?

Clear List customer number, name, street, city, state, post code, and

country for those customers with credit limits greater than $20,000 or since 1 July 1999 have total paid invoices of more than $5,000.

Model Answer (Halstead’s Complexity: 29.9607):

select customer.cust_no, cust_name, street, city, state, post_code, country from customer,

invoice where customer.cust_no = invoice.cust_no group by customer.cust_no, cust_name,

street, city, state, post_code, country having sum(amt_paid) > 5000

UNION

select customer.cust_no, cust_name, street, city, state, post_code, country from customer where credit_limit > 20000;


15. Ambiguous Produce a report that shows the percentage of orders where we're

not meeting customers' delivery date expectations in each country.

Clear Count all invoices, where the date the order was delivered was

larger than the date the customer wanted the order. Group by country. Calculate the percentage of late orders by country.

Model Answer (Halstead’s Complexity: 34.992):

Create View TotalOrders as select country, count(*) Total_Orders from customer, invoice

where customer.cust_no = invoice.cust_no group by country;

Create view LateOrders as select country, count(*) Late_Orders from customer, invoice

where customer.cust_no = invoice.cust_no and deliver_date > want_date group by country;

Select totalorders.country, 100*(late_orders / total_orders) Percent_Late_Orders from

lateorders, totalorders where totalorders.country = lateorders.country;

16. Ambiguous Produce a report that shows, by country, which carriers are, on

average, not meeting their expected delivery times.

Clear List carrier code, carrier name, country, and average of (delivery

days less the difference between delivery date and ship date) by

country having that average difference greater than 1 day.

Model Answer (Halstead’s Complexity: 40.1661):

select carrier.carrier_code, carrier_name, delivdays.country, avg((deliver_date - ship_date)

- deliver_days) from carrier, invoice, customer, delivdays where carrier.carrier_code =

invoice.carrier_code and invoice.cust_no = customer.cust_no and carrier.carrier_code =

delivdays.carrier_code and customer.city = delivdays.city and customer.state =

delivdays.state and customer.country = delivdays.country group by carrier.carrier_code, carrier_name, delivdays.country having avg((deliver_date - ship_date) - deliver_days) > 1;


Appendix B: Experiment Instruction Sheet

INSTRUCTIONS

This laboratory session requires you to execute command files and query a database.

Please follow the instructions carefully.


Part 1 - Scenario

George Harford Wine Merchant distributes wines throughout the world. They predominantly

trade with customers in France, Japan, the USA, and the UK. Customers place orders for

wines which employees process, pack, and ship to the customers via an appropriate carrier.

The packers attach an invoice created by the Accounts Receivable department to the goods

when shipped. These invoices contain all relevant information generated from the invoice and

inventory databases. The data structures for the relevant tables are attached.


Part 2 - SQL Syntax Reminder

The SQL syntax for SELECT commands follows. Items in square brackets [ ] are optional,

and items in braces { } can be repeated zero or more times:

SELECT [DISTINCT]*|(((table. | view.)column | expression) [alias]

{, ((table. | view.)column | expression)[alias]})

FROM (table|view)[alias]{,(table | view)[alias]}

WHERE condition {, condition}

[GROUP BY expression{,expression} [HAVING condition{,condition}]]

[(UNION|UNION ALL|INTERSECT|MINUS) SELECT command]

[ORDER BY (expression|position)[DESC]{,(expression|position)

[DESC]}];

Only under highly unusual circumstances should you formulate a select command that

contains more than one table in the FROM clause without a join in the WHERE clause. As a

general rule, the number of joins should equal the number of foreign key attributes. Except

for extremely rare queries that usually produce only summary results (such as counting the

number of records in a table), all SQL queries, even those involving only one table, should

include WHERE conditions.

You may need to use some of the following keywords:

AND

AVG

COUNT

DISTINCT

IN

MAX

MIN

NOT

NULL

OR

STDDEV

SUM

SYSDATE

UNIQUE

VARIANCE

(+) (outer join)

The SQL syntax for VIEW commands follows:

CREATE VIEW viewname AS (SELECT command);

When you create a view with the same name as an already-existing view (for example, you

rerun your query), you will need to drop the already-existing view:

DROP VIEW viewname;

Reminders:

Aliases for columns in views should not be enclosed in quotes.

If you have multiple join conditions, i.e., more than one foreign key or a concatenated

foreign key, you may need to put the outer join symbol on other join conditions.


Part 3 - Getting started

Log into your area on valinor. For the purposes of assessment, everything you do in this

laboratory session needs to be recorded and sent to the instructor. Follow the instructions

carefully. In particular, please refrain from running more than one session on valinor because

running more than one session will mean that all your query attempts will not be recorded. To

begin this quiz, type the following at the valinor prompt:

valinor> ksh

valinor> /home/staff/bowen/startqz199b

Follow the instructions given by the program carefully. You can attempt each query as many

times as you wish.

You should note that once you accept a query, you cannot return to the question again.


Part 3 - Getting started

Log into your area on valinor. For the purposes of assessment, everything you do in this

laboratory session needs to be recorded and sent to the instructor. Follow the instructions

carefully. In particular, please refrain from running more than one session on valinor because

running more than one session will mean that all your query attempts will not be recorded. To

begin this quiz, type the following at the valinor prompt:

valinor> ksh

valinor> /home/staff/bowen/startqz199a

Follow the instructions given by the program carefully. You can attempt each query as many

times as you wish.

You should note that once you accept a query, you cannot return to the question again.


Part 4 - Your Mission

You are an internal auditor at George Harford. On 16 August 1999, your supervisor

approaches you with a list of questions. Some questions were designed by the supervisor,

who knows SQL well. Your supervisor was also given questions from management, who do

not know SQL all that well.

Your task is to formulate and execute SQL queries to answer these questions.

Your supervisor is gone for the day and getting answers for these questions is urgent.

Therefore, you need to make your best interpretation of the questions from management. You

can discuss with your supervisor the assumptions you made after she returns. However, she

will be most annoyed if you do not make an attempt to answer as many of the questions as

you can prior to her return.

The questions have been structured so that easier questions appear first and then become

progressively more difficult.

Your supervisor wants to see the complete SQL queries that you use. When the question is

phrased asking for a name, your query should use criteria that include that name i.e. you

should not look up the code to avoid joining to the table that contains the name.


Appendix C: Command Interpreter Unix Shell Script

Two Unix shell scripts were used to operate the experiment. The two scripts were essentially identical except that they used different source data (set by the variable $quizfile), depending on the treatment initially received by each experimental group. The script was developed, modified, and enhanced from those used in previous experiments undertaken within the Faculty of Commerce at the University of Queensland (Borthick et al. 1997; Borthick et al. 2000). The interface source code had been previously developed by Mr Andrew Jones.


#!/bin/ksh

## /\ndy. 28/08/98. version 0.02

## NB. this script requires ksh because it uses "read -u".

## The rest of it should run in any sh-compatible shell (sh, bash, ksh etc)

## DoLog() - A utility function to append a message to our log file.

## As it stands, each line contains the username, process ID, date, time,

## and a message

## eg.

## [jones] <4268> 28/08 11:41:09: Displaying question 3

## [jones] <4298> 28/08 11:41:12: Attempting question 3 Attempt number 1

DoLog()

{

## %a = day, %e = date, %m = month. %T = time.

now=`date +"[$username] <$$> %e/%m %T:"`

echo "$now $*" >> $logfile

}

## Obtain the username of the person running this program, for the log file.

## No need to change this.

###username=${USER:-$LOGNAME}

username=`whoami`

## CONFIGURE THIS:

## "quizfile" is a variable which contains the name of the file with the

## questions you wish to present to the students. You should edit this

## script to set this variable to the appropriate value.

## If this variable is null, then the program will expect a single

## command-line argument, which will be the filename of the question file.

##

## The question file should contain questions, one per line.

##

## Note that the user running this program must have access privs to the

## question file and the directories above it...

## eg. quizfile="/home/staff/bowen/questions"

quizfile="/home/staff/bowen/questions99qz1b"

## CONFIGURE THIS:

## Location of the log file to record what people do.

## You can reset this to whatever you like, but make sure that everyone

## can append to it. Also note that files in /tmp disappear when

## valinor is restarted. /var/tmp might be safer, but who knows.

##

## Probably best if you make a logfile directory in your home dir,

## chmod it to mode 1777 and put the log files in there...

##

## Note: If the log file does not already exist, this program will now

## create it. This better allows per-user log files to work.

## However, if you are using only one log file, it is a better idea

## if you create and chmod it yourself...


#logfile="/var/tmp/sql.log" # one log for all users..

#logfile="/var/tmp/sql.$username.log" # one log per user...

logfile="/home/staff/bowen/logfile/qz199/$username.log"

## Editor to use. pico is the easiest.. esp if we run it in "tool" mode...

editor="pico -t"

## temporary filenames.

tmp="/tmp/qn-$username.$$"

attfile="$HOME/answer.$$"

qnum=1 # question number

attnum=0 # attempt number

## Set up a clean up routine to clean up after ourselves in case we die..

trap 'rm -f "$attfile" "$tmp"; exit 1' 1 3 15 8

## "echo -n" is supposed to print without a newline.

## This little hack ensures it will on valinor...

PATH=/usr/ucb:${PATH}

## ---------------------------------------------------------------------

## End of configuration section: Start of program.

## Create the log file if it doesn't exist...

if [ ! -f "$logfile" ]

then

> $logfile

chmod 666 $logfile

DoLog "StartUp: Created this Log file."

fi

if [ -z "$quizfile" ]

then

## No $quizfile, so we expect a question file command-line argument.

if [ $# != 1 ]

then

echo "Usage: `basename $0` file-with-questions"

DoLog "Error: No quizfile and no cmd line argument."

exit 1

fi

quizfile="$1"

fi

## Make sure we can read the file. NB. this requires some permissions on the

## directory containing the file, and that directory's parent, and ...

if [ ! -f "$quizfile" ]

then

echo "Error: Unable to read file: \"$quizfile\"."

DoLog "Error: Can't open file $quizfile (pwd=`pwd`)"

exit 2


fi

## Splash screen telling them what will happen.

DoLog "Startup: Showing splash screen."

clear

cat <<ENDOFBLURB

CO365 DATABASE MANAGEMENT SYSTEMS IN BUSINESS

QUIZ ONE

In this exercise, you will be presented with a series of problems.

The first problem will be displayed, and then the system will wait

for you to hit the <RETURN> (aka the <ENTER>) key.

This gives you time to read and absorb the problem.

After you hit the <ENTER> key, you will be taken into the user-friendly

editor "pico", where you can compose a solution. When you are satisfied,

quit the editor with the Control-X command. Your solution will be run,

and any output will be displayed on your screen.

You will then be asked whether you are happy with your solution.

If you are not, then you can re-edit your first attempt and try again.

Otherwise, you will be asked to rank your confidence in your solution.

You then continue on to the second problem, and so on...

ENDOFBLURB

echo -n "Hit the <RETURN> key to continue."

read junk

echo

echo

clear

DoLog "Startup: Finished showing splash screen."

exec 3<"$quizfile"

qnum=1

## This is the main loop of the program.

while read -u3 question

do

## if we are between questions, make the screen tidier.

if [ "$qnum" -gt 1 ]

then

clear

## echo

echo "Ok. Onto the next question."

echo

fi

thisattmpt="retry"

attnum=0 # attempt number

> $attfile


## attempt the current question.

while [ "$thisattmpt" != "accept" ]

do

attnum=`expr $attnum + 1`

clear

echo "Question #$qnum:"

echo

echo "$question"

echo

if [ $attnum = 1 ]

then

echo

echo "--------------------------------------------------"

echo "When you are finished reading the question, hit the <ENTER> key, to start"

echo -n "using an editor to create your solution. "

DoLog "Displaying question $qnum"

else

echo

echo "--------------------------------------------------"

echo "Your current solution is ..."

sed -e 's/^/| /' < $attfile

echo

echo -n "Hit the <ENTER> key to re-edit this... "

fi

# pause here until they hit RETURN

read junk

DoLog " Attempting question $qnum Attempt number $attnum"

$editor $attfile

## cp $attfile $username.sql

## echo "quit" >> $username.sql

echo

echo "Ok. Now testing this solution..."

echo

## FIXME: Need to make sure that the Oracle environment

## is properly set up so that they can run sqlplus...

## Plus, the /dev/null thing is crude, but probably enough to

## prevent them getting into an interactive oracle session...

sqlplus / @$attfile < /dev/null

## Reformat of output allows users to use data more

## interactively. Micheal Axelsen 1999.

## Disabled since they can then end up in a cartesian

## product join.

## echo "Attempting Question: $qnum" > $username.lst

## echo "" >> $username.lst

## cat "$question" >> output_screen

## echo "" >> $username.lst

## echo "Your SQL Query:" >> $username.lst

## echo >> $username.lst

## cat $attfile >> $username.lst

## echo "" >> $username.lst


## echo "Results:" >> $username.lst

## sqlplus / @$username.sql >> $username.lst

## $editor $username.lst

## Should we pipe output into less for them to see?

echo

## Should we capture their attempt?

DoLog " The attempt was ..."

sed -e "s/^/[$username] <$$> Qn: $qnum Att: $attnum /" < $attfile >> $logfile

## ask if happy with this attempt or not

echo "Are you happy with this attempt, or do you want to try again?"

PS3="Choice: "

select thisattmpt in retry accept

do

if [ -n "$thisattmpt" ]

then

echo "Ok."

break

fi

echo "Invalid response. Try again."

done

echo

done

DoLog "Completed question $qnum Number of attempts was $attnum"

## DoLog "The final solution was ..."

## sed -e 's/^/| /' < $attfile >> $logfile

## Ask here how confident they are...

echo "How confident are you about your solution?"

PS3="Confidence? "

select conf in "85-100%" "70-85%" "55-70%" "40-55%" "25-40%" "10-25%" "<10%"

do

if [ -n "$conf" ]

then

echo "Ok."

break

fi

done

DoLog "Confidence for question $qnum was $conf"

echo

echo "Ok. Now what?"

PS3="What now? "

select whatnow in "Continue to next question" "Quit"

do

if [ -n "$whatnow" ]


then

break

fi

done

if [ "$whatnow" = "Quit" ]

then

echo

echo "Are you sure you want to quit?"

PS3="Confirm quit: "

select confirm in yes no

do

if [ -n "$confirm" ]

then

break

fi

done

if [ "$confirm" = "yes" ]

then

echo "Ok. Quitting now."

break

else

echo "Ok. Not quitting."

fi

fi

## NB. It's more efficient to use the shell's built in arithmetic...

qnum=`expr $qnum + 1`

done

DoLog "Quitting."

rm -f "$attfile" "$tmp"

echo "Bye..."
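The log written by DoLog in the script above lends itself to a simple post-session summary. The following is a minimal sketch only and was not part of the experimental tooling: the function name summarise_attempts is hypothetical, and it assumes the log is supplied on standard input in the format shown in the DoLog comments, with the "Completed question N Number of attempts was M" message intact.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): read a DoLog-style
# log on standard input and print "username question attempts" for each
# "Completed question" line.
summarise_attempts() {
    awk '
        /Completed question/ {
            # The first field looks like [username]; drop the brackets.
            user = substr($1, 2, length($1) - 2)
            for (i = 1; i < NF; i++) {
                if ($i == "question") q = $(i + 1)   # question number
                if ($i == "was")      n = $(i + 1)   # number of attempts
            }
            print user, q, n
        }
    '
}
```

A call such as `summarise_attempts < "$logfile"` would then list one line per completed question for the participant whose log is read.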


Appendix D: Experiment Entity-Relationship Diagram

Entities and their attributes (+ marks a primary key attribute):

Customer: Cust_no+, Cust_name, Phone_no, Street, City, State, Post_code, Country, Credit_limit, Outstanding_bal, Pref_carrier_code

Delivdays: Carrier_code+, City+, State+, Country+, Deliver_days

Carrier: Carrier_code+, Carrier_name, Carrier_type

Invoice: Invoice_no+, Order_date, Cust_no, Ship_date, Want_date, Deliver_date, Paid_date, Fob_code, Disc_pct, Disc_days, Currency_code, Amt_paid, Carrier_code, Emp_no

Employee: Emp_no+, Emp_name

Currency: Currency_code+, Currency_name, Currency_date+, Currency_rate

Fob: Fob_code+, Fob_name

Invoiceitem: Invoice_no+, Item_no+, Unit_meas, Quoted_unit_price, Agreed_unit_price, Qty_shipped, Qty_accepted, Diff_cause

Inventory: Item_no+, Item_name, Item_maker, Item_package, Item_year, Type_of_alc, Alc_category, Alc_content, Avg_unit_cost, Unit_meas, Avg_unit_price, Qty_hand, Qty_ordered

Legend: FK = Foreign Key; + = Primary Key

Foreign keys labelled on the relationship lines: Carrier_code (labelled twice), Emp_no, Currency_code + [Appropriate Dates], Invoice_no, Cust_no, Fob_code, Item_no


Abbreviation Type Description

Table: Invoice

Invoice_no Char(7) Invoice number

Order_date Date Date the order was placed

Cust_no Char(5) Customer number

Ship_date Date Date the order was shipped

Want_date Date Date the order was wanted by the customer

Deliver_date Date Date the order was delivered

Paid_date Date Date the invoice was paid

Fob_code Char(1) FOB code {1,2}

Disc_pct Number Discount percent, e.g. 1, 1.5, 2, 2.25

Disc_days Number Discount days - start day depends on FOB

Currency_code Char(1) Settlement currency code

Amt_paid Number Amount paid in Australian dollars

Carrier_code Char(5) Carrier code of carrier that delivered the order

Emp_no Char(4) Employee number of person who packed the order

Table: Customer

Cust_no Char(5) Customer number

Cust_name Char(20) Customer's name

Phone_no Char(15) Customer's telephone number

Street Char(30) Customer's street address

City Char(20) Customer's city

State Char(20) Customer's state

Post_code Char(10) Customer's post code

Country Char(20) Customer's country

Credit_limit Number Customer's credit limit

Outstanding_bal Number Customer's outstanding balance (amount owing)

Pref_carrier_code Char(5) Customer's preferred carrier

Table: Carrier

Carrier_code Char(5) Carrier code

Carrier_name Char(20) Carrier's name

Carrier_type Char(8) Type of carrier {air, surface}

Table: Currency

Currency_code Char(1) Currency code

Currency_name Char(15) Name of currency

Currency_date Date Date for which the currency rate applies

Currency_rate Number Currency rate as of the currency date, i.e. the number of units of the currency that one Australian dollar will purchase, e.g., one Australian dollar can currently be exchanged for approximately 0.65 US dollars.


Table: Delivdays

Carrier_code Char(5) Carrier code

City Char(20) Deliver to city

State Char(20) Deliver to state

Country Char(20) Deliver to country

Deliver_days Number Expected number of calendar days for the carrier to

deliver merchandise to the city, state, and country,

i.e., the carrier's estimate of the time required to

deliver an order to the destination described by city,

state, and country.

Table: Employee

Emp_no Char(4) Employee number

Emp_name Char(20) Employee's name

Table: Invoiceitem

Invoice_no Char(7) Invoice number

Item_no Char(7) Inventory item number

Unit_meas Char(5) Unit of measure for item {case, each}

Quoted_unit_price Number Quoted unit cost of the item in Australian dollars

Agreed_unit_price Number Agreed unit cost of the item in Australian dollars

Qty_shipped Number Quantity of the item shipped to the customer

Qty_accepted Number Quantity of the item accepted by the customer

Diff_cause Char(15) Reason for differences in costs or quantities {broken

bottle, damaged cork, late delivery, no diff,

shortage, sugary, vinegary}

Table: Inventory

Item_no Char(7) Inventory item number

Item_name Char(20) Name or description of the item

Item_maker Char(20) Maker of the item, e.g. the vintner

Item_package Char(15) How each component of the item is packaged

{bottle, can, cardboard box}

Item_year Number Year the item was produced.

Type_of_alc Char(5) Type of alcohol {beer, wine}

Alc_category Char(15) Alcohol category {dark, dry, full strength, light,

mid-strength, red, sparkling, white}

Alc_content Number Alcohol content e.g. full strength beers are typically

about 5.0 (percent) and wines are typically between

12 and 14 (percent)

Avg_unit_cost Number Average price per unit at which the item was

purchased from the item maker

Unit_meas Char(5) Unit of measure for item {case, each}

Avg_unit_price Number Average price per unit at which the item is sold to

customers

Qty_hand Number Quantity of the item on hand

Qty_ordered Number Quantity of the item ordered in the last 12 months


Appendix E: Experimental Design

Stratification Into Group A and Group B

To control for a testing effect (Huck et al. 1974), and to ensure even representation of skill sets across Group A and Group B, participants were stratified into classes according to their previous subject enrolments. Participants within each strata class were then ranked according to their current enrolment subject, their performance in earlier subjects, and their experience with database query languages. Thirteen groups were used to classify participants. Table 11 shows the final strata class ordering and the number of participants in each strata class.

This process resulted in a ranked listing of participants from one to sixty-six. The experimental treatment, manager-English (ambiguous) or pseudo-SQL (clear), was assigned randomly to the first student on this list and then alternately to each student thereafter. This resulted in two student groups with equal participant counts: Group A and Group B. Group A's first question formulation was ambiguous, and subsequent formulations alternated between clear and ambiguous. Group B's first question formulation was clear, and subsequent formulations alternated between ambiguous and clear.
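The alternating assignment over the ranked list can be illustrated with a short shell sketch. This is an illustration under stated assumptions rather than the procedure actually used: the function name assign_groups and its two arguments are hypothetical, and the group of the first-ranked participant (chosen randomly in the experiment) is passed in by the caller so that the output is reproducible.

```shell
#!/bin/sh
# Hypothetical helper: print "rank group" for ranks 1..N, alternating
# between Group A and Group B down the ranked list.
#   $1 = group given to the top-ranked participant ("A" or "B")
#   $2 = number of ranked participants
assign_groups() {
    group=$1
    count=$2
    rank=1
    while [ "$rank" -le "$count" ]
    do
        echo "$rank $group"
        # Flip the group for the next participant on the list.
        if [ "$group" = "A" ]; then group=B; else group=A; fi
        rank=$((rank + 1))
    done
}
```

Running `assign_groups A 66` prints sixty-six lines, thirty-three for each group, which matches the equal group sizes reported here.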


Table 11

Participant Strata Classes

Strata Class Participant Count Description

865(1) 4 Students in the postgraduate Database Design subject who had previously participated in more than one similar experiment.

365(1) 1 Students in the undergraduate Database Design subject who had previously participated in more than one similar experiment.

365(2) 1 Computer Science students in the undergraduate Database Design subject who had previously participated in a similar experiment.

865(2) 15 Students who had undertaken a database design course previously and enrolled in the postgraduate database design subject.

365(3) 10 Students who had undertaken a database design course previously and undertaking the undergraduate database design course.

865(3) 2 Students who had undertaken a database design course previously (but not at University of Queensland) and undertaking the postgraduate database design course.

365(4) 13 Students who had undertaken advanced information systems courses previously and undertaking the undergraduate database design course.

865(4) 3 Students who had undertaken information systems courses previously and undertaking the postgraduate database design course.

365(5) 6 Students who had undertaken introductory computer courses previously and undertaking the undergraduate database design course.

365(6) 3 Students who had undertaken no information system or computer courses previously and undertaking the undergraduate database design course.

865(5) 6 Students undertaking the postgraduate database design course with no available academic history.

365(7) 2 Students undertaking the undergraduate database design course with no available academic history.


The Experiment

The experiment was held over two days during the fourth week of instruction. Students

undertook a two-hour closed-book (no reference material allowed) experiment on computer,

with no perusal time, in their normal classes. The random assignment of membership to

Group A and Group B had the purpose and effect of ensuring an even representation of

Group A and Group B in each class.

Participants knew before the experiment that questions increased in complexity, that there

were sixteen questions in total, and that, once a question had been completed, they could not

return to their answer. Participants were also aware that the number of attempts they made

on the question did not affect their mark.

An instruction sheet was provided to participants (refer to Appendix B), depending on the treatment group (A or B) to which the participant had been previously assigned. The only point of difference between the two groups' instruction sheets was the name of the Unix command script file to use: startqz199a for Group A and startqz199b for Group B. The instruction sheet contained an overview of SQL syntax as a reference for participants. Further, an entity-relationship diagram was provided to describe the database being used, as reproduced in Appendix D.

Participants could make reference notes on working paper if they required. Participants

returned these materials to the examiner at the end of the experiment. The question

formulations used in the experiment and model answers are reproduced in Appendix A.


There were two examiners present (the course lecturer and the researcher). Assistance was

provided to participants in the operation of the experimental program (the Unix command

script). Assistance was also provided on some technical aspects of SQL on request.

User Interface and Query Development Process

Appendix C contains an example of the Unix command interpreter script used by participants

to enter information using the relatively easy-to-use Pico editor, with which they were

familiar. The command interpreter presented the question to the participant. On the

completion of an attempt, the SQL result set was displayed. If the participant did not

consider the results presented to be their final response, the participant could return to the

SQL formulation. If the participant considered the result satisfactory, the participant would

be prompted to rank their confidence in the solution, and proceed to the next question.

Hence, the participant was able to interactively build and test their response until they were

confident in their answer. This confidence was self-assigned on the following scale: >85-100%,

70-85%, 55-70%, 40-55%, 25-40%, 10-25% and <10%.

The questions were only available electronically. The questions were presented alternately

ambiguous (natural language) and clear (pseudo-SQL). A participant in Group A received an

ambiguous formulation for Question One, clear for Question Two, ambiguous for Question

Three, and so on. A participant in Group B had clear for Question One, ambiguous for

Question Two, clear for Question Three, and so on. The required answer was identical for

both formulations of the same question.
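The alternation of formulations by group and question number can be expressed compactly. The sketch below is illustrative only: the function name formulation is hypothetical and does not appear in the experimental scripts, which instead read pre-ordered question files.

```shell
#!/bin/sh
# Hypothetical helper: print the formulation ("ambiguous" or "clear")
# presented to a participant.
#   $1 = treatment group ("A" or "B")
#   $2 = question number (1 to 16)
formulation() {
    group=$1
    qnum=$2
    parity=$((qnum % 2))    # 1 for odd-numbered questions, 0 for even
    # Group A receives ambiguous formulations on odd-numbered questions,
    # Group B on even-numbered questions; all others are clear.
    if [ "$group" = "A" ] && [ "$parity" -eq 1 ]; then
        echo ambiguous
    elif [ "$group" = "B" ] && [ "$parity" -eq 0 ]; then
        echo ambiguous
    else
        echo clear
    fi
}
```

For instance, `formulation A 3` prints ambiguous and `formulation B 3` prints clear, so every question is seen in both formulations across the two groups.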


Appendix F: Error Marking Sheets

Semantic Error Counting Form

User Name Question Number Attempts

Confidence:

Duration:

MICRO ERRORS

Keywords View Select From Where Join Where Cond Group by Having Order by

Symbols View Select From Where Join Where Cond Group by Having Order by

Logical Operators View Select From Where Join Where Cond Group by Having Order by

Relational Operators View Select From Where Join Where Cond Group by Having Order by

Tables View Select From Where Join Where Cond Group by Having Order by

Attributes View Select From Where Join Where Cond Group by Having Order by

Values View Select From Where Join Where Cond Group by Having Order by

Set Operators Where Union Intersect Minus

MACRO ERRORS Columns Rows Aggregation


SQL Challenge Error Counting Form

User Name Question Number Attempts

Confidence:

SQL CHALLENGE EXPRESSION

Present Challenge Response Comment

Distinct Keyword in Select Clause P / A 1 2 3 4 5 6 7

Built-in Function (Avg, Sum, Std Dev, etc) P / A 1 2 3 4 5 6 7

Mathematical Expression in Select Clause P / A 1 2 3 4 5 6 7

Mathematical Expression in Where Clause P / A 1 2 3 4 5 6 7

Mathematical Expression in Having Clause P / A 1 2 3 4 5 6 7

ERD (Join not shown on ERD) P / A 1 2 3 4 5 6 7

Join P / A 1 2 3 4 5 6 7

Outer Join P / A 1 2 3 4 5 6 7

Subquery P / A 1 2 3 4 5 6 7

Or (Where or Having) P / A 1 2 3 4 5 6 7

Between P / A 1 2 3 4 5 6 7

Not Equal P / A 1 2 3 4 5 6 7

Group By P / A 1 2 3 4 5 6 7

Having P / A 1 2 3 4 5 6 7

View P / A 1 2 3 4 5 6 7


Intermediate Error Counting Form

User Name Question Number Attempts

Confidence:

Column Errors

Missing

Extra

Wrong (in contrast with missing & extra columns)

Table Errors

Missing

Extra

Wrong

Row Restriction

Missing

Extra

Wrong

Logical Operator

Join Restrictions

Missing

Extra

Wrong

Aggregation Level (Group by/Aggregation in Select)

Missing

Extra

Wrong

Aggregation Restriction (Having)

Missing

Extra

Wrong

Sort/Order by

Missing

Wrong Attribute Order

Wrong Direction (ascending, descending)

Wrong


Appendix G: Annotated Corrected Participant Response

This appendix provides an annotated example of the process used to correct participant

responses according to the model answer. This question was chosen to provide a flavour of

the methodology used to determine and classify errors. The response shown here is the fifth

participant's response (in order of assessment) to the third question.

Model Answer:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000 or country = 'Japan';

Actual Response:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000 and country = 'japan';

Annotated Response:

Select cust_no, cust_name, country, credit_limit from customer where credit_limit > 15000 and(1) or(2) country = 'j(3)J(4)apan';

In this annotated response, each number in brackets marks one counted error: the participant's

"and" (1), the model answer's "or" (2), the participant's "j" (3), and the model answer's "J" (4). In

this response there are four micro errors.


Micro Error Sheet:

Errors (1) and (2) result in a total of two logical operator errors in the WHERE COND clause.

Errors (3) and (4) result in a total of two value errors in the WHERE COND clause.

Macro Error Sheet

There are two row errors here, as there are two errors in the WHERE COND clause.

SQL Challenge Sheet

The SQL Challenge presented in this question is the "Or (Where or Having)" challenge. The

challenge is present, and the participant's response to the challenge was poor, resulting in a

"1" assessment.

Intermediate Error Counting Sheet

In this response there are two row restriction errors, one "wrong" row restriction and one

"logical operator" error.
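
To see why these differences are counted as row errors, the model answer and the participant's response can be run against a small table. The sketch below uses Python's built-in sqlite3 module rather than the database used in the experiment; the four customer rows are invented for illustration, only cust_no is selected to keep the output short, and it relies on SQLite's default case-sensitive comparison of text values, which is in line with the marking of 'japan' as a value error.

```python
import sqlite3

# In-memory database with a hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "cust_no INTEGER, cust_name TEXT, country TEXT, credit_limit REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Akira Trading", "Japan", 10000),
        (2, "Osaka Wines", "Japan", 20000),
        (3, "Brisbane Cellars", "Australia", 25000),
        (4, "Perth Imports", "Australia", 5000),
    ],
)

# Model answer: either condition qualifies a customer.
model_sql = (
    "SELECT cust_no FROM customer "
    "WHERE credit_limit > 15000 OR country = 'Japan' ORDER BY cust_no"
)
# Participant's response: both conditions required, lower-case value.
actual_sql = (
    "SELECT cust_no FROM customer "
    "WHERE credit_limit > 15000 AND country = 'japan' ORDER BY cust_no"
)

model_rows = [row[0] for row in conn.execute(model_sql)]
actual_rows = [row[0] for row in conn.execute(actual_sql)]
print("model answer rows:", model_rows)
print("participant rows: ", actual_rows)
```

With this sample data the model answer returns customers 1, 2 and 3, whereas the participant's query returns no rows at all, because no stored country value equals the lower-case 'japan'.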


Appendix H: Pearson Correlation Matrix of Variables

Ambiguity Complexity Attempts Confidence Duration Total Errors Lexical Syntactical Inflective Pragmatic Extraneous Emphatic Suggestive GPA

Ambiguity 1.0000

one-sided p 0.0000

Complexity -0.0330 1.0000

one-sided p 0.2488 0.0000

Attempts 0.1247 0.3312 1.0000

one-sided p 0.0050 0.0000 0.0000

Confidence -0.0961 -0.2463 -0.4242 1.0000

one-sided p 0.0239 0.0000 0.0000 0.0000

Duration 0.1729 0.2932 0.6905 -0.4282 1.0000

one-sided p 0.0002 0.0000 0.0000 0.0000 0.0000

Total Errors 0.2421 0.4783 0.2742 -0.3241 0.3653 1.0000

one-sided p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Lexical 0.7169 -0.0593 0.0847 -0.1213 0.2241 0.2165 1.0000

one-sided p 0.0000 0.1114 0.0406 0.0062 0.0000 0.0000 0.0000

Syntactical 0.6103 -0.1196 0.0532 0.0153 -0.0122 -0.0491 0.0855 1.0000

one-sided p 0.0000 0.0068 0.1367 0.3769 0.4007 0.1564 0.0391 0.0000

Inflective 0.3957 -0.0219 -0.0602 0.0698 0.0118 0.2534 0.2816 0.1606 1.0000

one-sided p 0.0000 0.3266 0.1079 0.0754 0.4045 0.0000 0.0000 0.0004 0.0000

Pragmatic 0.4735 -0.1131 0.0877 -0.0403 0.1057 0.2521 0.4378 0.1257 0.2299 1.0000

one-sided p 0.0000 0.0098 0.0354 0.2035 0.0146 0.0000 0.0000 0.0048 0.0000 0.0000

Extraneous 0.1855 0.3333 0.1410 -0.0223 0.2183 0.5764 0.2616 -0.2611 0.5837 0.3314 1.0000

one-sided p 0.0001 0.0000 0.0018 0.3234 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Emphatic 0.7173 0.1914 0.1886 -0.1490 0.2482 0.3588 0.7100 0.2746 0.1177 0.2486 0.2870 1.0000

one-sided p 0.0000 0.0000 0.0000 0.0010 0.0000 0.0000 0.0000 0.0000 0.0076 0.0000 0.0000 0.0000

Suggestive 0.4930 0.2863 0.1432 -0.0270 0.1927 0.5611 0.3881 0.1127 0.5723 0.4139 0.8347 0.4058 1.0000

one-sided p 0.0000 0.0000 0.0015 0.2893 0.0000 0.0000 0.0000 0.0101 0.0000 0.0000 0.0000 0.0000 0.0000

GPA (n=420) 0.0000 0.1256 -0.0842 0.1764 -0.1256 -0.1313 -0.0282 0.0336 0.0099 0.0079 -0.0013 0.0010 0.0275 1.0000

one-sided p 0.4999 0.0050 0.0424 0.0001 0.0050 0.0035 0.2820 0.2463 0.4196 0.4358 0.4891 0.4919 0.2869 0.0000
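
Each coefficient in the matrix above is a Pearson product-moment correlation. As a reminder of how such a value is computed, here is a minimal sketch in plain Python. The function name pearson_r and the sample vectors are assumptions made for illustration and are not the study's data; the one-sided p-values reported in the matrix additionally require a t-distribution and are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented example: duration rising in step with the number of attempts.
attempts = [1, 2, 3, 4, 5]
duration = [2, 4, 6, 8, 10]
print(round(pearson_r(attempts, duration), 4))
```

A value near +1 indicates that the two variables rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear association.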


Appendix I: Analysis of Ambiguity's Effect On Error Type

Question One

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 0.156 0.406 0.000 0.000 0.000 0.344 0.000 0.906

Select C 0.091 0.273 0.000 0.000 0.000 0.182 0.000 0.545

From A 0.000 0.000 0.000 0.000 0.250 0.000 0.000 0.250

From C 0.000 0.061 0.000 0.000 0.091 0.000 0.000 0.152

Where Join A 0.031 0.063 0.000 0.031 0.063 0.063 0.000 0.250

Where Join C 0.030 0.061 0.000 0.030 0.061 0.061 0.000 0.242

Where Cond A 0.031 0.063 0.000 0.031 0.000 0.031 0.031 0.188

Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 0.061 0.061

Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Group By C 0.000 0.030 0.000 0.000 0.000 0.030 0.000 0.061

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By C 0.030 0.000 0.000 0.000 0.000 0.030 0.000 0.061

Total A 0.219 0.531 0.000 0.063 0.313 0.438 0.031 1.594

Total C 0.152 0.424 0.000 0.030 0.152 0.303 0.061 1.121

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 1.594 32

Where C 0.000 Clear 1.121 33

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.000

Total C 0.000

Question Two

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 0.091 1.394 0.030 0.000 0.061 1.212 0.030 2.818

Select C 0.000 0.061 0.000 0.000 0.000 0.000 0.000 0.061

From A 0.061 0.030 0.000 0.000 0.121 0.000 0.000 0.212

From C 0.000 0.000 0.000 0.000 0.030 0.000 0.000 0.030

Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond A 0.182 0.364 0.000 0.091 0.121 0.364 0.121 1.242

Where Cond C 0.030 0.152 0.000 0.000 0.000 0.030 0.000 0.212

Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 0.333 1.788 0.030 0.091 0.303 1.576 0.152 4.273

Total C 0.030 0.212 0.000 0.000 0.030 0.030 0.000 0.303

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 4.273 33

Where C 0.000 Clear 0.303 33

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.000

Total C 0.000


Question Three

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 0.061 1.182 0.000 0.000 0.000 1.182 0.000 2.424

Select C 0.000 0.212 0.000 0.000 0.000 0.152 0.000 0.364

From A 0.030 0.000 0.000 0.000 0.030 0.000 0.000 0.061

From C 0.000 0.000 0.000 0.000 0.061 0.000 0.000 0.061

Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond A 0.061 0.182 0.788 0.091 0.000 0.030 0.303 1.455

Where Cond C 0.000 0.152 0.152 0.030 0.000 0.091 0.152 0.576

Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By C 0.000 0.000 0.000 0.000 0.000 0.030 0.000 0.030

Total A 0.152 1.364 0.788 0.091 0.030 1.212 0.303 3.939

Total C 0.000 0.364 0.152 0.030 0.061 0.273 0.152 1.030

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 3.970 33

Where C 0.000 Clear 1.030 33

Union A 0.030

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.030

Total C 0.000

Question Four

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.031 0.000 0.000 0.000 0.000 0.000 0.000 0.031

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 0.813 1.031 0.000 0.000 0.094 0.500 0.000 2.438

Select C 0.121 0.182 0.000 0.000 0.000 0.121 0.000 0.424

From A 0.063 0.094 0.000 0.000 0.219 0.000 0.000 0.375

From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond A 0.094 0.000 0.031 0.000 0.000 0.000 0.000 0.125

Where Cond C 0.061 0.061 0.000 0.000 0.000 0.061 0.000 0.182

Group By A 0.313 0.125 0.000 0.000 0.000 0.438 0.000 0.875

Group By C 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.030

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 0.031 0.063 0.000 0.000 0.000 0.094 0.000 0.188

Order By C 0.030 0.000 0.000 0.000 0.000 0.000 0.000 0.030

Total A 1.344 1.313 0.031 0.000 0.313 1.031 0.000 4.031

Total C 0.242 0.242 0.000 0.000 0.000 0.182 0.000 0.667

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 4.031 32

Where C 0.000 Clear 0.667 33

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.000

Total C 0.000


Question Five

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.091 0.000 0.000 0.000 0.030 0.000 0.000 0.121

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 0.121 1.273 0.000 0.000 0.030 1.273 0.000 2.697

Select C 0.033 0.267 0.000 0.000 0.067 0.333 0.000 0.700

From A 0.030 0.303 0.000 0.000 0.333 0.000 0.000 0.667

From C 0.033 0.233 0.000 0.000 0.333 0.000 0.000 0.600

Where Join A 0.030 1.212 0.333 0.515 1.273 1.303 0.000 4.667

Where Join C 0.033 0.700 0.200 0.233 0.667 0.733 0.000 2.567

Where Cond A 0.030 0.212 0.091 0.212 0.000 0.273 0.364 1.182

Where Cond C 0.000 0.233 0.100 0.200 0.067 0.433 0.267 1.300

Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Group By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 0.303 3.000 0.424 0.727 1.667 2.848 0.364 9.333

Total C 0.100 1.433 0.300 0.433 1.133 1.500 0.267 5.167

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.091 Ambiguous 9.424 33

Where C 0.033 Clear 5.200 30

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.091

Total C 0.033

Question Six

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.043 0.000 0.000 0.000 0.000 0.000 0.000 0.043

Select A 2.235 5.529 0.000 0.000 0.765 3.412 0.353 12.294

Select C 0.174 1.435 0.000 0.000 0.217 0.652 0.043 2.522

From A 0.000 0.471 0.000 0.000 0.588 0.000 0.000 1.059

From C 0.000 0.087 0.000 0.000 0.087 0.000 0.000 0.174

Where Join A 0.235 1.353 0.176 0.647 1.294 1.294 0.000 5.000

Where Join C 0.000 0.391 0.174 0.174 0.391 0.522 0.000 1.652

Where Cond A 0.176 2.118 0.941 1.647 0.294 2.118 1.471 8.765

Where Cond C 0.000 0.217 0.130 0.130 0.000 0.130 0.130 0.739

Group By A 0.706 2.059 0.000 0.000 0.647 2.353 0.000 5.765

Group By C 0.261 1.130 0.000 0.000 0.391 1.087 0.000 2.870

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 3.353 11.529 1.118 2.294 3.588 9.176 1.824 32.882

Total C 0.478 3.261 0.304 0.304 1.087 2.391 0.174 8.000

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.059 Ambiguous 32.941 17

Where C 0.000 Clear 8.000 23

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.059

Total C 0.000


Question Seven

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.200 0.000 0.000 0.000 0.067 0.000 0.000 0.267

View C 0.000 0.133 0.000 0.000 0.000 0.000 0.000 0.133

Select A 0.533 1.200 0.000 0.000 0.333 0.600 0.000 2.667

Select C 0.733 0.800 0.000 0.000 0.133 0.533 0.000 2.200

From A 0.133 0.067 0.000 0.000 0.200 0.000 0.000 0.400

From C 0.000 0.067 0.000 0.000 0.067 0.000 0.000 0.133

Where Join A 0.200 1.667 0.067 0.133 0.133 0.133 0.000 2.333

Where Join C 0.067 1.067 0.067 0.067 0.133 0.133 0.000 1.533

Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond C 0.000 0.067 0.067 0.067 0.067 0.067 0.067 0.400

Group By A 0.267 0.400 0.000 0.000 0.200 0.467 0.000 1.333

Group By C 0.133 0.467 0.000 0.000 0.200 0.400 0.000 1.200

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.067 0.133 0.000 0.067 0.133 0.133 0.000 0.533

Order By A 0.000 0.000 0.000 0.000 0.000 0.267 0.000 0.267

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 1.333 3.333 0.067 0.133 0.933 1.467 0.000 7.267

Total C 1.000 2.733 0.133 0.200 0.733 1.267 0.067 6.133

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 7.267 15

Where C 0.000 Clear 6.133 15

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.000

Total C 0.000

Question Eight

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 0.167 0.667 0.000 0.000 0.000 0.167 0.000 1.000

Select C 0.200 0.400 0.000 0.000 0.000 0.100 0.000 0.700

From A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join A 0.000 0.333 0.167 0.167 0.333 0.333 0.000 1.333

Where Join C 0.000 0.600 0.300 0.300 0.600 0.800 0.000 2.600

Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond C 0.000 0.000 0.100 0.000 0.000 0.000 0.000 0.100

Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Group By C 0.100 0.400 0.000 0.000 0.100 0.400 0.000 1.000

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.400 0.600 0.000 0.100 0.200 0.600 0.100 2.000

Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 0.167 1.000 0.167 0.167 0.333 0.500 0.000 2.333

Total C 0.700 2.000 0.400 0.400 0.900 1.900 0.100 6.400

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 2.333 6

Where C 0.000 Clear 6.400 10

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.000

Total C 0.000


Question Nine

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 2.000 1.000 0.000 0.000 0.000 1.000 0.000 4.000

Select C 1.500 1.000 0.000 0.000 0.000 2.000 0.000 4.500

From A 0.000 0.333 0.000 0.000 0.333 0.000 0.000 0.667

From C 0.000 0.500 0.000 0.000 0.500 0.000 0.000 1.000

Where Join A 0.000 0.667 0.333 0.333 0.667 0.667 0.000 2.667

Where Join C 0.000 1.000 0.500 0.500 1.000 1.000 0.000 4.000

Where Cond A 0.000 1.333 1.333 0.667 0.000 0.667 1.333 5.333

Where Cond C 0.000 1.000 1.000 0.500 0.000 0.500 1.000 4.000

Group By A 0.333 1.333 0.000 0.000 0.333 1.333 0.000 3.333

Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 1.000 0.333 0.000 0.000 0.000 0.667 0.000 2.000

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 3.333 5.000 1.667 1.000 1.333 4.333 1.333 18.000

Total C 1.500 4.500 1.500 1.000 1.500 4.500 1.000 15.500

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 18.000 3

Where C 0.000 Clear 15.500 2

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.000

Total C 0.000

Question Ten

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

From A 0.000 1.000 0.000 0.000 1.000 0.000 0.000 2.000

From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join A 0.000 2.000 1.000 1.000 2.000 2.000 0.000 8.000

Where Join C 0.000 0.500 0.250 0.250 0.500 1.000 0.000 2.500

Where Cond A 1.000 3.000 0.000 2.000 0.000 2.000 2.000 10.000

Where Cond C 0.000 0.500 0.000 0.500 0.000 0.000 0.500 1.500

Group By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Group By C 0.000 0.500 0.000 0.000 0.000 0.500 0.000 1.000

Having A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Having C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 1.000 6.000 1.000 3.000 3.000 4.000 2.000 20.000

Total C 0.000 1.500 0.250 0.750 0.500 1.500 0.500 5.000

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 20.000 1

Where C 0.000 Clear 5.000 4

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.000

Minus C 0.000

Total A 0.000

Total C 0.000


Question Eleven

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A 2.500 4.000 0.000 0.000 0.500 2.500 0.000 9.500

Select C 1.000 1.000 0.000 0.000 0.000 0.000 0.000 2.000

From A 0.500 0.000 0.000 0.000 0.500 0.000 0.000 1.000

From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond A 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 2.000 2.000

Group By A 0.500 1.000 0.000 0.000 0.500 1.000 0.000 3.000

Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000

Having A 3.000 2.000 0.000 1.000 0.000 2.000 0.000 8.000

Having C 1.000 1.000 0.000 0.000 0.000 0.000 0.000 2.000

Order By A 0.500 0.000 0.000 0.000 0.000 0.000 0.000 0.500

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A 7.000 7.000 0.000 1.000 1.500 5.500 0.000 22.000

Total C 2.000 3.000 0.000 0.000 0.000 1.000 2.000 8.000

SQL Component Type Set Operators Summary Error Average Response Count

Where A 0.000 Ambiguous 22.500 2

Where C 0.000 Clear 8.000 1

Union A 0.000

Union C 0.000

Intersect A 0.000

Intersect C 0.000

Minus A 0.500

Minus C 0.000

Total A 0.500

Total C 0.000

Question Twelve

SQL Component Type Keywords Symbols Logical Operators Relational Operators Tables Attributes Values Total

View A

View C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Select A

Select C 0.000 2.000 0.000 0.000 0.000 0.000 0.000 2.000

From A

From C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Join A

Where Join C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Where Cond A

Where Cond C 0.000 0.000 0.000 0.000 0.000 0.000 2.000 2.000

Group By A

Group By C 0.000 1.000 0.000 0.000 0.000 1.000 0.000 2.000

Having A

Having C 0.000 2.000 0.000 0.000 0.000 0.000 0.000 2.000

Order By A

Order By C 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Total A

Total C 0.000 5.000 0.000 0.000 0.000 1.000 2.000 8.000

SQL Component Type Set Operators Summary Error Average Response Count

Where A Ambiguous

Where C 0.000 Clear 8.000 1

Union A

Union C 0.000

Intersect A

Intersect C 0.000

Minus A

Minus C 0.000

Total A

Total C 0.000
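
In the tables of this appendix, the Summary error average for each formulation appears to equal the micro-error total for the matching type (A for ambiguous, C for clear) plus the corresponding set-operator total; for example, 3.939 + 0.030 is approximately 3.970 for the ambiguous formulation of Question Three. The Python sketch below checks this reading against a few rows copied from the tables. The relationship is inferred from the figures rather than stated in the thesis, and the tolerance is an assumption that allows for rounding to three decimal places.

```python
# (question, type, micro-error total, set-operator total, reported summary)
ROWS = [
    ("Three", "A", 3.939, 0.030, 3.970),
    ("Five", "A", 9.333, 0.091, 9.424),
    ("Five", "C", 5.167, 0.033, 5.200),
    ("Six", "A", 32.882, 0.059, 32.941),
    ("Eleven", "A", 22.000, 0.500, 22.500),
]

# Each printed value is rounded to three decimals, so allow a small slack.
TOLERANCE = 0.002


def consistent(micro_total, set_total, reported, tolerance=TOLERANCE):
    """True if micro + set-operator totals match the reported summary."""
    return abs((micro_total + set_total) - reported) <= tolerance


for question, kind, micro_total, set_total, reported in ROWS:
    combined = round(micro_total + set_total, 3)
    print(question, kind, combined, reported,
          consistent(micro_total, set_total, reported))
```

For questions with no set-operator errors the two figures coincide, which is why the Summary value simply repeats the Total row in most tables.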


Appendix J: Seven Ambiguity Types Question Assessment Ratings

This table displays the average of the ambiguity assessments provided by two independent

non-researchers. The scale used to assess the presence of the different types of ambiguity is:

0 = None, 1 = A little, 2 = Some, 3 = Much, 4 = A Great Deal

Question Formulation Lexical Syntactical Inflective Pragmatic Extraneous Emphatic Suggestive

1 Ambiguous 1.5 2 0.5 1 0.5 0.5 0.5

1 Clear 0.5 0.5 0 0 0 0 0

2 Ambiguous 2 1 0 1.5 0.5 1.5 0.5

2 Clear 1 0.5 0 1 0 0 0

3 Ambiguous 0.5 3.5 0 1 0 0.5 0.5

3 Clear 0.5 0 0 0 0.5 0.5 0

4 Ambiguous 1.5 1 0 3 0 0.5 0

4 Clear 0.5 2 0 2 0 0 0

5 Ambiguous 1.5 2.5 0 0.5 0 2 0

5 Clear 1 0.5 0 0 0.5 0 0

6 Ambiguous 1.5 0.5 0.5 3 3.5 1.5 2.5

6 Clear 0.5 0.5 0 0.5 0.5 0 0

7 Ambiguous 1.5 2.5 0 0.5 0 1 1

7 Clear 0.5 0.5 0 0 0 0 0

8 Ambiguous 0.5 3.5 0.5 0.5 0 0 0

8 Clear 0.5 0.5 0 1 0 0 0

9 Ambiguous 1.5 0.5 0 2.5 0 3 0

9 Clear 0.5 0.5 0 0.5 0 0 0.5

10 Ambiguous 2 0 0 2 0.5 1 1.5

10 Clear 0.5 0 0 0 0 0 0

11 Ambiguous 1.5 1 0 2 1 1 1.5

11 Clear 0.5 0 0 0 0 0 0

12 Clear 0.5 0 0 0.5 0.5 0 0
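
Because each entry is the mean of two raters' scores on the 0 to 4 scale, every value in the table is a multiple of 0.5. A minimal Python sketch of the averaging follows; the individual rater scores shown are hypothetical, since the thesis reports only the averages, and the function name average_rating is an assumption made for the example.

```python
AMBIGUITY_TYPES = [
    "Lexical", "Syntactical", "Inflective", "Pragmatic",
    "Extraneous", "Emphatic", "Suggestive",
]


def average_rating(rater_one, rater_two):
    """Average two raters' 0-4 scores for each ambiguity type."""
    averaged = {}
    for kind in AMBIGUITY_TYPES:
        first, second = rater_one[kind], rater_two[kind]
        if not (0 <= first <= 4 and 0 <= second <= 4):
            raise ValueError("ratings must lie on the 0-4 scale")
        averaged[kind] = (first + second) / 2
    return averaged


# Hypothetical individual scores for one question formulation.
rater_one = {"Lexical": 1, "Syntactical": 2, "Inflective": 0, "Pragmatic": 1,
             "Extraneous": 0, "Emphatic": 1, "Suggestive": 0}
rater_two = {"Lexical": 2, "Syntactical": 2, "Inflective": 1, "Pragmatic": 1,
             "Extraneous": 1, "Emphatic": 0, "Suggestive": 1}

averages = average_rating(rater_one, rater_two)
print(averages)
```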


Appendix K: Ambiguity Assessment Instrument

Ambiguity Measurement Questionnaire

Type Information Request

Lexical A report of our clients for our marketing brochure mail-out.

The word "report" may have several meanings, independent of its context.

There is: a gunshot report echoing through the hillside; the Lieutenant

reported to the Captain; I dropped the heavy report on my toe, etc.

Although the context may make the meaning clear, the lexical ambiguity

that is present adds to cognitive effort and contributes to ambiguity overall

in that manner.

Syntactical A report of clients in Brisbane and on our Gold list.

The natural language "and" does not map well to its Boolean equivalent. A

legitimate interpretation would be to assume that this request is for clients

that satisfy both conditions (Brisbane-based and on the Gold List), or for

clients that satisfy either condition (Brisbane-based or on the Gold list).

Another example is "Bob hit the man with a stick." It is not clear,

syntactically, whether it was the man holding a stick who was hit, or whether

Bob used a stick to hit the man.

Inflective A report that is the product of our last marketing campaign regarding sales

of our accounting software product in the last month.

Inflective ambiguity here derives from the use of the word "product" with

two different meanings in the one information request. Inflective

ambiguity is where the same word is used in the one grammatical structure

(paragraph, sentence, phrase) with different meanings. Natural writing

tends to avoid this.

Pragmatic A report of all the clients for a department.

The ambiguity here is that the department has not been specified. It would

be legitimate to prepare a report for any department, although it is likely

that this will not address the needs of the person making the information

request. Further information is needed to resolve this actual ambiguity.

Extraneous A report of all clients (and their names and addresses only) for the Tax and

Business Services department. Some of those clients are our biggest

earners, you know.

The last sentence is extraneous - unlike pragmatic ambiguity, it contains

information that is redundant, uninformative, or not necessary to meet the

needs of the question or task asked in the statement. It is "noise" in the

communication - where more words are used than are necessary to make

the statement.

Emphatic A report of our good clients.

Ambiguity here could derive from the inability to convey the verbal

emphasis of the words in written form. Depending on the emphasis

used, "good clients" could be legitimately interpreted to be clients that pay

on time, clients that have the most dollar-value sales, our very best clients


(a much shorter list than if based on dollar-value), or even, with the correct

sarcastic or ironic emphasis on the spoken word, our worst clients - those

that do not pay.

Suggestive A report of the clients of this accounting practice that have lodged taxation

returns in the past five years in accordance with the requirements of the

Australian Taxation Office.

The request for information is quite clear until the phrase "in accordance

with the requirements of the Australian Taxation Office". By definition, all

taxation returns should be lodged in accordance with these requirements.

The extra phrase introduces suggestive ambiguity into the information

request by suggesting that the report will not necessarily consist of all

taxation clients.
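
The syntactical example in the table above ("clients in Brisbane and on our Gold list") can be made concrete with set operations. The Python sketch below uses invented client names; it simply shows that the two legitimate readings of the natural-language "and" correspond to an intersection and a union, and that they return different reports.

```python
# Hypothetical client data for illustration only.
brisbane_clients = {"Acme Pty Ltd", "Coral Bay Wines", "Northside Motors"}
gold_list = {"Coral Bay Wines", "Dunmore Holdings"}

# Reading 1: clients that satisfy both conditions (Boolean AND).
both_conditions = brisbane_clients & gold_list

# Reading 2: clients that satisfy either condition (Boolean OR).
either_condition = brisbane_clients | gold_list

print("Brisbane-based and on the Gold list:", sorted(both_conditions))
print("Brisbane-based or on the Gold list: ", sorted(either_condition))
```

With this data the first reading yields a single client and the second yields four, so the choice of interpretation changes the report substantially.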


Mark all Information Requests in Accordance with the Following Scale

0 = None, 1 = A little, 2 = Some, 3 = Much, 4 = A Great Deal

No. Information Request

Ambiguity Type (Scale)

1. Management wants a list of each of our suppliers with no

duplicates in the list.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List the distinct suppliers of the items we stock.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

2. Produce a report that lists the inventory items where the quantity

on hand is much larger, on a percentage basis, than the quantity

ordered.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List item number, item name, quantity on hand, quantity on order

where quantity on hand is greater than 2 * quantity ordered.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4


3. Management wants a list of all Japanese customers and customers

with credit limits over $15,000.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List customer numbers, customer names, country, and credit limit

of customers with credit limits greater than $15,000 or of

customers in Japan.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

4. Produce a report that statistically compares the credit limits for

customers in different countries.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List country, average credit limit, and standard deviation of

customer credit limit grouped by country.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

5. Produce a report of clients that prefer the Speedair carrier and

addresses.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4


List customer number, customer name, street, city, post code, and

country where the customer's preferred carrier is Speedair.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

6. We're wondering if some of our winemakers are using poor quality

packaging and bottles - we've had a few complaints. Can you get

us a report that gives us some sort of idea about what items we are

shipping compared to what the customers are taking delivery of?

It would probably be a good idea while you're at it to give a

comparative percentage of the stuff shipped that doesn't make it -

just so the vintners won't try and weasel their way out of it, you

understand, they're good at that.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List item maker, item number, item name, and 100 * (sum of

quantity shipped less sum of quantity accepted) / (sum of quantity

shipped) where the type of alcohol is wine.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

7. Prepare a report that provides *all* customer's details and

indicates the number of different products they have ordered from

us.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4


List customer number, and customer name for *all* customers,

and, if they have ordered anything, a count of unique items

ordered.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

8. Management wants to know which customers we've shipped goods more than 10 times to them by the shipper that they requested.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List customer number, name, and count of invoices, where the actual carrier is the same as the customer's preferred carrier, having more than 10 shipments.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

9. Produce a report, with best items first, on the gross contribution to profitability of each inventory item for July 1999.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List item number, item description, and (unit price less unit cost) multiplied by units sold in July 1999. Sort your output by descending gross contribution to profitability.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

10. Produce a report with the relevant customer details that gives us an idea of how much of our business is exposed to foreign currency fluctuations.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List customer number, customer name, customer country, and a total of the amount paid where the settlement currency code for the invoice is not equal to the currency code for Australian dollars. Group results by customer number.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

11. Management is concerned about current slow-moving inventory items, based on shipments since 1 June 1999. Produce a report of the items that they might be most concerned about.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List inventory item number, item description, quantity on hand, and sum(quantity shipped) with ship dates greater than 1 June 1999 that have sums of the quantity shipped less than the sums of the quantity on hand.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

12. Produce a report that gives some idea about our best USA export items where the amount since March is bigger than $5,000.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4

List item numbers, item descriptions and the total accepted quantity times agreed price of each item for items shipped to US customers since 1 March 1999 and having a total accepted quantity times agreed price greater than $5,000.

Lexical 0 1 2 3 4
Syntactical 0 1 2 3 4
Inflective 0 1 2 3 4
Pragmatic 0 1 2 3 4
Extraneous 0 1 2 3 4
Emphatic 0 1 2 3 4
Suggestive 0 1 2 3 4
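
The clear formulations in this appendix are worded so that they map closely onto a query. As a rough illustration only, the clear formulation of request 8 might be expressed as the SQL below, run here through Python's built-in sqlite3 module. The table names, column names, sample rows, and the carrier name "RoadFreight" are assumptions made for this sketch; they are not the schema or data used in the experiment.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_no           INTEGER PRIMARY KEY,
    cust_name         TEXT,
    preferred_carrier TEXT
);
CREATE TABLE invoice (
    invoice_no     INTEGER PRIMARY KEY,
    cust_no        INTEGER REFERENCES customer(cust_no),
    actual_carrier TEXT
);
""")

# Invented sample data, for illustration only.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme Cellars", "Speedair"), (2, "Bay Wines", "RoadFreight")],
)
# Customer 1: 11 invoices by the preferred carrier and 1 by another carrier.
# Customer 2: exactly 10 invoices by the preferred carrier (not more than 10).
shipments = [(1, "Speedair")] * 11 + [(1, "RoadFreight")] + [(2, "RoadFreight")] * 10
conn.executemany(
    "INSERT INTO invoice (cust_no, actual_carrier) VALUES (?, ?)", shipments
)

# "List customer number, name, and count of invoices, where the actual carrier
# is the same as the customer's preferred carrier, having more than 10 shipments."
QUERY = """
SELECT c.cust_no, c.cust_name, COUNT(i.invoice_no) AS shipments
FROM customer AS c
JOIN invoice AS i ON i.cust_no = c.cust_no
WHERE i.actual_carrier = c.preferred_carrier
GROUP BY c.cust_no, c.cust_name
HAVING COUNT(i.invoice_no) > 10
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

In this sketch the carrier comparison sits in the WHERE clause and the threshold in the HAVING clause, mirroring the two conditions that the ambiguous wording of request 8 runs together.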

Appendix L: Internal Validity of the Experiment

A full explanation of the seven recognised threats to the internal validity of experiments is contained in Huck et al. (1974). The comments made below are based on that discussion.

History

The history threat to internal validity arises where an event outside the domain of the experiment occurs that may affect the dependent variable. As each experimental session lasted two hours in a controlled setting, and all testing was completed over two days, a history threat to internal validity is not considered to exist for this experiment.

Maturation

Maturation occurs where participants mature, grow, and learn during the course of the experiment, so that the passage of time, rather than the treatment, increases the recorded end user query performance. Any maturation effect is adequately controlled for in this instance, as the experiment was two hours in duration, homogeneous groups were used, and each tutorial group tested contained both Group A and Group B participants. Further, both groups received the ambiguity treatment on alternate questions. Any residual maturation effect (such as learning the use of the SQL experimental tool or increased proficiency in SQL during the experiment) applies equally to the clear and ambiguous treatment effects.
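
The balancing argument above can be sketched in a few lines of code. The sketch assumes a particular alternation rule (Group A receives the ambiguous formulation on odd-numbered questions and Group B on even-numbered questions) and a six-question session. Both are hypothetical choices made for illustration, not details taken from the experiment.

```python
from collections import Counter

N_QUESTIONS = 6  # hypothetical; not the number of questions in the experiment


def formulation(group: str, question: int) -> str:
    """Return the formulation a group sees for a 1-based question number.

    Assumed rule: Group A gets the ambiguous wording on odd-numbered
    questions, Group B on even-numbered questions.
    """
    a_is_ambiguous = question % 2 == 1
    if group == "A":
        return "ambiguous" if a_is_ambiguous else "clear"
    return "clear" if a_is_ambiguous else "ambiguous"


schedule = {
    group: [formulation(group, q) for q in range(1, N_QUESTIONS + 1)]
    for group in ("A", "B")
}

# At every question position one group sees each formulation, so a practice
# effect tied to position falls on clear and ambiguous wordings alike.
position_balanced = all(
    {schedule["A"][i], schedule["B"][i]} == {"clear", "ambiguous"}
    for i in range(N_QUESTIONS)
)

# Across the whole session each group answers the same number of each wording.
totals = {group: Counter(forms) for group, forms in schedule.items()}
print(schedule, position_balanced, totals)
```

Under this assumed rule, any learning that accumulates by question position is shared between the two formulations rather than being confounded with one of them.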

Testing

Testing occurs where individuals score higher on a subsequent sitting of a test than on their first sitting. Within this experiment, the possibility exists that participants learned more about the use of the experimental tools and process (the SQL editor) as they progressed. Later questions (for example, question six compared to question one) might show superior performance (particularly time for completion) due to the testing effect. Due to the factors cited for the maturation effect, any experimental testing effect, should there be any, applies equally to both the clear and ambiguous formulations of the question. Additionally, participants who had undertaken similar experiments previously were stratified into separate classes, and Group A and Group B were homogeneous in this respect. Therefore, any testing effect that exists in this experiment, whether arising within the experiment or from previous experiments, applies equally to both treatment effects.

Instrumentation

Instrumentation is identified by Huck et al. (1974) as a change in the observational technique that could account for an experimentally observed difference. This could arise in the current experiment through a change in the assessors' marking behaviour over the time taken to assess student responses: assessors could mark later participant responses differently from earlier participant responses.

This effect is controlled for in several ways. Firstly, when assessing responses, assessors had no means to identify participant responses by student name, only by student number. This avoided the influence of assessors' preconceptions about students' performance. The use of two independent assessors controlled for some differences in marking strategies, as did the use of diary notes to ensure consistency of marking over time. An exhaustive cross-checking and data correctness procedure also mitigates this effect.

Responses were assessed by student in no particular order. Group A and Group B participant responses were evenly distributed in the marking order, with a calculated non-parametric runs test z statistic of 0.9924 (Newbold 1984). This z statistic is not significant at conventional levels (the two-tailed p-value is approximately 0.32), which implies that any residual instrumentation effect, should it exist, applies evenly to both question formulations. Overall, the threat of instrumentation to the experimental results is controlled for.
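
For readers who wish to reproduce this kind of check, the following is a minimal sketch of a runs test on a two-group marking order, using the usual normal approximation. It assumes the standard Wald-Wolfowitz formulation, which may differ in detail from the procedure described in Newbold (1984). The function names are illustrative, and the actual marking order from the experiment is not reproduced here.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def runs_test(sequence):
    """Wald-Wolfowitz runs test for a sequence containing two labels.

    Returns (number of runs, z statistic, two-tailed p-value), using the
    normal approximation to the distribution of the number of runs.
    """
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two distinct labels")
    n1 = sum(1 for x in sequence if x == labels[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts wherever two neighbouring labels differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(variance)
    return runs, z, two_tailed_p(z)


# Hypothetical marking order: "A" and "B" mark the group of each script.
example_order = list("AABABBABAABBABAB")
print(runs_test(example_order))

# The p-value corresponding to the z statistic reported in the text.
print(round(two_tailed_p(0.9924), 2))
```

A z statistic close to zero indicates that the number of runs is close to what a random ordering would produce, which is the sense in which the two groups' responses are interleaved in the marking order.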

Statistical Regression

Statistical regression occurs where the analysis of the experiment focuses on extreme scores, such that scores on subsequent tests tend to regress to the mean (Huck et al. 1974). The current experiment is not exposed to this threat to internal validity, as extreme scores are not the focus of the experiment. Furthermore, the experimental design and assessment process used adequately control for this threat to internal validity, as previously described.

Mortality

Mortality occurs where participants drop out of the experiment during its course. As this experiment was short in duration (two hours), participant mortality did not occur during the experiment. In addition, all sixty-six students enrolled in the subjects participated in the experiment. The mortality effect is of some concern, however, in that incomplete participant responses were removed from the analysis. There were 506 participant responses, of which 425 (approximately 84%) were completed and statistically analysed in the experiment.

The effect of this acknowledged experimental bias is to reduce the total number of responses examined and, in general, to remove from the analysis responses with a significant number of errors. As this bias tends to work against the direction of the hypotheses made in this thesis, any conclusions drawn are strengthened rather than weakened by it. Overall, the mortality effect is therefore of limited concern as an internal validity issue for the current experiment.

Selection Bias

The selection process resulted in two homogeneous groups, Group A and Group B, drawn

from the entire student population of two information systems subjects. There is no evident

selection bias between Group A and Group B. In any case, both Group A and Group B

received the treatment effect of ambiguity on alternate questions, further mitigating concerns

of the effect of a selection bias on experimental results.