WHAT IS TQA?

TQA is a type of evaluation, but what is evaluation? Michael Scriven, a leading evaluation researcher, defines it as follows: "'Evaluation' is taken to mean the determination of merit, worth, or significance" (2007: 1). This definition itself presents a difficulty: how do we define value or worth, be it moral, aesthetic or utilitarian? By extension, evaluation involves asking a question that has challenged thinkers from time immemorial: is a particular thing good? Just like evaluation in the broad sense, TQA can be quantitative or qualitative: it can be based on mathematical/statistical measurement (as in the case of most academic instruments) or on reader response, interviews and questionnaires (e.g. Nida). TQA can be diagnostic (determining areas for improvement at the outset of a course of study), formative (measuring progress and giving feedback during a course of study) or summative (measuring the results of learning).

Quality in translation is certainly one of the most debated subjects in the field. The strong interest it continues to generate among different groups, from researchers and translation organisations to practitioners and translation teachers, has made it a field of inquiry in its own right, called translation quality assessment (TQA). This interest is motivated by both academic and economic/professional reasons: the need to evaluate students' work and translation providers' need to ensure a quality product. What makes a good translation? What are the standards that have to be met for a translation to be excellent, good or simply acceptable? Is there a universally acceptable model of evaluation? In the absence of any precise answer to these questions, one can understand the impetus to develop an evaluation system that would solve the problem of subjectivity by providing a standard specification of what an acceptable translation should or should not contain. But, as is widely recognised (Pym 1992, Sager 1989), there is still no universally accepted evaluation model in the translation world: "there are no generally accepted objective criteria for evaluating the quality of both translations and interpreting performance. Even the latest national and international standards in this area (DIN 2345 and the ISO 9000 series) do not regulate the evaluation of translation quality in a particular context. […] The result is assessment chaos." (Institut für Angewandte Linguistik und Translatologie, 1999, in Williams, 2001: 327). The reason why no single standard will suffice is that quality is context-dependent. This is what Sager (1989) means when he says there are no absolute standards of translation quality, only more or less appropriate translations for the purpose for which they are intended. For many types of texts, both vocative and informative, an important element of their appropriateness or fitness for purpose will be extrinsic: whether they are effectively usable by their consumers/readers in pursuit of their purpose. Since the establishment of such extrinsic standards of translation quality is elusive, a common tendency is to take a narrower view, focusing on intrinsic characteristics of translated texts and on errors committed in translation as a way of measuring quality.

PROBLEMS AND ISSUES IN DESIGNING AND APPLYING TQA

Why is it so difficult to establish and apply a TQA model? There are many reasons. I consider the following ones to be the most important, and they inevitably entail problems of validity or reliability.

a) The evaluator: Does the evaluator have the linguistic or subject-field knowledge required? The client, whose knowledge may be limited, inevitably evaluates the finished product too. Indeed, the client's assessment may be the only one. Further, a number of translation researchers, including Hönig and other functionalists, Dyson (1994) and Kingscott (1996), have implicitly or explicitly given precedence to the reader's response or requirements, not the translator's definition of an adequate translation, as the yardstick for gauging quality.

b) Level of target language rigour: Elegant style is considered essential by some evaluators, but not by others. Some evaluators consider typos and spelling and punctuation errors to be peccadillos and ignore them in their overall assessment, while others will regard them as serious because they are precisely the errors that the client/end user will detect.

c) Seriousness of errors of transfer: The same inconsistency is apparent in the assessment of level of accuracy. Some evaluators will ignore minor shifts in meaning if the core message is preserved in the translation, while others will insist on total "fidelity", even if an omission of a concept at one point is offset by its inclusion elsewhere in the text. Reasons (b) and (c) underlie the frequent complaints about evaluator subjectivity.

d) Sampling versus full-text analysis: TQA has traditionally been based on intensive error detection and analysis and has therefore required a considerable investment in human resources. It takes time. One means of obviating the problem has been sampling: the analysis of samples of translations instead of whole texts. Yet this approach has shortcomings. First, the evaluator may not take into account any "compensatory" efforts that the translator has made in unsampled parts of the text. Second, the evaluator may not have taken into account the co-text in order to grasp the meaning of the text as a whole. Third, as Daniel Gouadec has pointed out, "There is always a risk that the most serious errors may lie outside the samples. This is especially true of the work of established translators, who are capable of dramatic, uncontrolled deviations from the meaning of the source text" (1989: 56).

e) Quantification of quality: Microtextual analysis of samples has been used extensively not only because it saves time but also because it provides error counts as a justification for a negative assessment. Translation services and teachers of translation alike have developed TQA grids with several quality levels, or grades, based on the number of errors in a text of fixed length. It is felt that quantification lends objectivity and defensibility to the assessment. The problem lies with the borderline cases. Assuming that, in order to be user-friendly, such a grid does not allow for many levels of seriousness of error, it is quite possible for a translation containing one more error than the maximum allowed to be as good as, if not better than, another translation that contains exactly the maximum number of errors allowed, and yet be rated unsatisfactory (see the sketch after this list).

f) Levels of seriousness of error: One way to circumvent the drawbacks of quantification is to grade errors by seriousness: major, minor, weak point, etc. The problem then is to seek a consensus on what constitutes a major, as opposed to a minor, error. For example, an error in translating numerals may be considered very serious by some, particularly in financial, scientific or technical material, yet others will claim that the client or end user will recognize the slip-up and automatically correct it in the process of reading.

g) Multiple levels of assessment: Many authorities, including Nord (1991) and House (1997), identify a number of parameters against which the quality of a translation should be assessed: accuracy, target language quality, format (appearance of text), register, situationality, etc. The problem is this: assuming you can make a fair assessment against each parameter, how do you then generate an overall quality rating for the translation?

h) TQA purpose/function: The required characteristics of a TQA tool built for formative assessment in a university context may differ significantly from one developed for pre-delivery quality control by a translation supplier. According to Hatim and Mason, "Even within what has been published on the subject of evaluation, one must distinguish between the activities of assessing the quality of translations […] (translation criticism and translation quality control) on the one hand and those of assessing performance on the other" (1997: 199).
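The borderline problem raised in point (e) is easy to make concrete. Below is a minimal Python sketch of a hypothetical single-severity grid; the grade labels, passage length and thresholds are invented for illustration and do not come from any published scheme:

    # Hypothetical single-severity TQA grid: the grade depends only on the
    # number of errors found in a 400-word passage. Labels and thresholds
    # are invented for illustration.
    GRID = [
        (0, 4, "Superior"),
        (5, 8, "Fully acceptable"),
        (9, 12, "Revisable"),
        (13, float("inf"), "Unsatisfactory"),
    ]

    def grade(error_count: int) -> str:
        """Map a raw error count onto a quality level."""
        for low, high, label in GRID:
            if low <= error_count <= high:
                return label
        raise ValueError("error count must be non-negative")

    # The borderline case: a translation with 12 trivial slips is "Revisable",
    # while one with 13 equally trivial slips drops a whole grade, even though
    # the two texts may read almost identically.
    print(grade(12))  # Revisable
    print(grade(13))  # Unsatisfactory

Because the grid records only a count, a single extra peccadillo outweighs any difference in the seriousness of the errors, which is precisely the objection raised in points (e) and (f).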

PROFESSIONAL APPROACHES TO TRANSLATION EVALUATION

The 1970s saw developments in the field, in both practice and theory. A growing emphasis was placed on the creation of explicit and applicable correction scales and, as a result, on the creation of translation error typologies. Following this, the idea of a translation acceptability threshold based on a certain number of errors was introduced. The broader motivation was to reduce factors such as time, money, human effort and subjectivity and to introduce a more systematic type of analysis. The commitment to deliver error-free translations to clients, the enormous volume of material to be translated, and the growing competition between translation providers resulted in an increasing interest in quality assurance.

The first step towards an innovative TQA model built around the categorisation of errors was taken in Canada in the 1970s and resulted in the Canadian Language Quality Measurement System (Sical), the property of the Canadian government's Translation Bureau. This system, in its successive versions, was developed by Gouadec (Williams 2001) and was based on an error scheme which, on the one hand, distinguished between transfer and language errors and, on the other, labelled each error as major or minor. The complexity of the scale is such that it allows the identification of 675 error types (300 lexical and 375 syntactic) (Melis & Hurtado 2001: 274). In judging the acceptability of a translation, the major errors were the ones that counted. A major error was recorded when, in translating an essential element of the ST, the translator failed to render the exact meaning of the original, created confusion about meaning, or used incorrect or obviously inadequate language. As a result, the threshold for acceptability was not set very high, and translations of questionable quality could presumably still be judged acceptable: "In theory, then, a fully acceptable translation of 400 words could contain as many as 12 errors of transfer, provided no major error was detected. However, the designers of Sical III predicated the lowering of the tolerance level on the statistical probability that a translation with 12 such errors would also contain at least one major error." (Williams, 2001: 330)

Sical was clearly conducting a sample analysis at the word and sentence level, not on the text as a whole. Because it deals only with syntactic and semantic aspects, it overlooks any phenomena that occur at the level of sentence relations. This drew criticism concerning the acceptability of the content of a translation as a whole and the imprecision of the specific number of errors and their types (Williams, 2001: 331). Moreover, the large number of error types made the model hard to use. It nevertheless proved popular: numerous other organisations and agencies in Canada (the Ontario government translation services, Bell Canada) opted for a customised version of Sical.
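As a rough illustration of the Sical acceptability rule described above, here is a minimal Python sketch. It encodes only what the text states: any major error is disqualifying, and up to 12 minor transfer errors are tolerated per 400 words. The data structures and function names are invented, and Sical's actual 675-type taxonomy is not modelled:

    from dataclasses import dataclass

    @dataclass
    class Error:
        kind: str     # "transfer" or "language", per Sical's first distinction
        major: bool   # Sical's second distinction: major vs. minor

    def sical_like_verdict(errors: list[Error], words: int) -> str:
        """Toy acceptability check in the spirit of Sical: any major error
        fails the text outright; otherwise minor errors are tolerated up
        to a quota of 12 per 400 words (the tolerance cited by Williams)."""
        if any(e.major for e in errors):
            return "unacceptable (major error)"
        quota = 12 * words / 400
        return "acceptable" if len(errors) <= quota else "unacceptable (too many minor errors)"

    # Twelve minor transfer errors in a 400-word text still pass, which is
    # exactly the tolerance the designers of Sical III sought to lower.
    sample = [Error("transfer", major=False) for _ in range(12)]
    print(sical_like_verdict(sample, words=400))  # acceptable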
The search for workable evaluation schemes based on error classification has continued, with many following the Sical model and listing a number of error categories, with or without a score attached to each of them. Into that category fall schemes developed and adopted by large translation organisations such as the ATA (American Translators Association), whose scheme includes 22 error types ranging from terminology and register to accents and diacritical marks. The scheme requires the evaluator to spot the translation errors and then to assign 1, 2, 4, 8 or 16 error points to each one. A passage (usually of 225-275 words) with a final score of 18 or higher is marked "Fail". This tendency to assign a weighting on a pre-defined scale to every translation error, rather than simply marking it as minor or major, rapidly gained popularity, as it was considered a step forward in the development of translation quality evaluation models. But while acknowledging that such a scale is more refined, we cannot implicitly accept its objectivity: no meta-rules specify how an evaluator should apply these scores, that is, what constitutes a 1-point error versus a 16-point one. By making the scheme available on its website, the ATA gives translators an idea of what types of errors might be tolerated in a translation that meets ATA standards.
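The arithmetic of the ATA scheme is straightforward to sketch. The following Python fragment assumes only the points and threshold given above (1, 2, 4, 8 or 16 points per error; "Fail" at 18 or more); the example errors are invented, and the 22 published error categories are not modelled:

    # ATA-style point scheme: each error scores 1, 2, 4, 8 or 16 points;
    # a passage totalling 18 or more points is marked "Fail".
    ALLOWED_POINTS = {1, 2, 4, 8, 16}
    FAIL_THRESHOLD = 18

    def ata_like_result(error_points: list[int]) -> tuple[int, str]:
        if any(p not in ALLOWED_POINTS for p in error_points):
            raise ValueError("each error must score 1, 2, 4, 8 or 16 points")
        total = sum(error_points)
        return total, ("Fail" if total >= FAIL_THRESHOLD else "Pass")

    # One 16-point mistranslation plus a 2-point slip already fails, while
    # nine 1-point slips still pass: the weighting, not the raw count,
    # drives the verdict.
    print(ata_like_result([16, 2]))  # (18, 'Fail')
    print(ata_like_result([1] * 9))  # (9, 'Pass')

The sketch also exposes the meta-rule gap noted above: nothing in the scheme itself tells the evaluator why a given error should score 16 points rather than 1.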

In an attempt to further minimise the time and effort spent by the evaluator in objectively grading a translation, some companies have used the computer to handle the mathematical operations and manual processing involved in scoring translation errors. SAE J2450 is a quality metric developed by the SAE (Society of Automotive Engineers) in collaboration with GM (General Motors). The aim was to establish a standard quality metric for the automotive industry that could be used to provide an objective measure of linguistic quality for automotive service information, regardless of language or process. The metric became an SAE Recommended Practice in October 2001. The model is based on seven error categories focusing on content problems that might affect the overall understanding of the text, rather than on style (see Figure 2). The evaluator or translator classifies each error into one of these categories and assigns it a severity level (serious/minor); according to its relevance in the source text (ST), each category/severity combination carries a numeric weight. The final score is obtained by adding up the weights of the errors found and dividing the result by the number of words in the text.
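The per-word normalisation that distinguishes SAE J2450 from the count-based schemes above can be sketched as follows. The seven category names and the serious/minor weights in this Python fragment are illustrative placeholders, not the normative SAE figures; only the scoring formula (sum of weighted errors divided by word count) is taken from the text:

    # J2450-style weighted score: each error carries a weight determined by
    # its category and severity; the total is normalised by text length.
    # Category names and weights below are illustrative, not normative.
    WEIGHTS = {
        # category: (serious, minor)
        "wrong_term":      (5, 2),
        "syntactic_error": (4, 2),
        "omission":        (4, 2),
        "word_structure":  (4, 2),
        "misspelling":     (3, 1),
        "punctuation":     (2, 1),
        "miscellaneous":   (3, 1),
    }

    def j2450_like_score(errors: list[tuple[str, bool]], words: int) -> float:
        """Sum the weight of each (category, is_serious) error and divide
        by the number of words, as the metric prescribes."""
        total = sum(WEIGHTS[cat][0 if serious else 1] for cat, serious in errors)
        return total / words

    # Two serious wrong terms and one minor punctuation slip in a
    # 250-word passage: (5 + 5 + 1) / 250 = 0.044.
    errors = [("wrong_term", True), ("wrong_term", True), ("punctuation", False)]
    print(round(j2450_like_score(errors, words=250), 3))  # 0.044

Normalising by word count makes scores comparable across texts of different lengths, which a fixed-threshold scheme like the ATA's passage-based mark cannot do.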

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.3654&rep=rep1&type=pdf