language assessment tsl3123 notes
Post on 19-Jan-2017
TOPIC 1 – OVERVIEW OF ASSESSMENT (CONTEXT, ISSUES AND TRENDS)
DEFINITIONS OF TERMS
1. Test
A subset of ASSESSMENT intended to measure a test-taker’s language
proficiency, knowledge, performance or skills.
A technique of assessment.
2. Assessment
A comprehensive process of planning, collecting, analyzing, reporting and
using information on students over time.
Appraising the level of magnitude of a person’s attributes.
Consists of tests, projects, portfolios, anecdotal information and student self-
reflection.
3. Evaluation
Evaluation ≠ testing
Involves the interpretation of information.
Involves converting marks or test results into a performance report for test-takers.
4. Measurement
Assigning numbers or values to certain attributes of objects, events or people
according to a rule-governed system.
Measurement must be conducted according to explicit rules and procedures
e.g. test specifications, criteria and marking procedures.
Process of quantifying the observable performance of classroom learners.
Test scores are measurements. Conveying the meaning of those scores is evaluation.
Evaluation, however, can also take place WITHOUT measurement (e.g. learners’ responses in
the classroom).
Trends and Issues – read Module.
TOPIC 2 – ROLE AND PURPOSES OF ASSESSMENT IN TEACHING AND LEARNING
REASONS/PURPOSE OF ASSESSMENT
To determine the effective teaching strategies to be used in the classroom.
To improve classroom practice and instructions.
To provide information to children, parents and administrators.
To measure students’ achievement – identify students’ strengths and weaknesses.
Identify the difficult topic in a given unit – need to reteach.
Used to measure proficiency, place students into levels of course, diagnose students’
strengths and weaknesses.
ASSESSMENT OF LEARNING @ SUMMATIVE ASSESSMENT
A summary of learning.
Measure, record and report on students’ level of achievement with regard to specific
learning expectations.
Tells teacher the current status of the students’ learning.
Provide focus to improve on students’ achievement.
ASSESSMENT FOR LEARNING @ FORMATIVE ASSESSMENT/DIAGNOSTIC ASSESSMENTS
Conducted to improve students’ learning during the teaching and learning process.
Find out alternative teaching strategies
Find out students’ understanding of the instructions.
ASSESSMENT TYPES
FORMAL INFORMAL
Public exams (UPSR, PMR, SPM, STPM)
Year-end exams / semester exams.
Monthly test
Projects
Question and Answer
Quizzes
Students’ self-reflection
SIX TYPES OF INFORMATION TESTS PROVIDE ABOUT STUDENTS (HENNING, 1987)
Diagnosis and feedback
Screening and selection
Placement
Program evaluation
Providing research criteria
Assessment of attitudes and socio-psychological differences
CLASSIFICATION OF TESTS (ALDERSON, CLAPHAM & WALL, 1995)
Proficiency Tests
Not based on a particular curriculum or language program.
Assess overall language ability of students at varying levels.
Describes what students are capable of doing in a language.
E.g. ETS, Cambridge ESOL, American TOEFL, IELTS.
Achievement Tests
Similar to progress tests
Administered at mid-and-end point of the semester or academic year.
Content of achievement tests is generally based on the specific course content.
Diagnostic Tests
Seek to identify those language areas in which students need further help.
Information gained is crucial for further course activities and providing students
with remediation.
Aptitude Tests
Measure general ability or capacity to learn a foreign language a priori.
Predicts success in that undertaking.
Progress Test
Measure the progress the students are making towards defined course or program
goals.
Teacher-produced, narrowed focus.
Cover smaller amount of material and assess fewer objectives.
Placement Tests
Assess students’ level of language ability for placement in an appropriate course or
class.
Indicates the level at which a student will learn most effectively.
Create groups that are homogeneous in level.
TOPIC 3 – BASIC TERMINOLOGY IN TESTING
NORM-REFERENCED TEST (NRT)
In NRTs individual test-taker’s score is interpreted in relation to
o a mean (average score),
o median (middle score),
o standard deviation (extent of variance in scores)
o and/or percentile rank
Purpose: to place test-takers along a mathematical continuum in rank order.
Scores commonly reported back to test-taker in the form of a numerical score.
Administered to compare an individual’s performance with that of his or her peers or a group.
In SBE, NRTs are used for summative evaluation e.g. year-end examination for streaming
and selection.
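The statistics used to interpret NRT scores can be illustrated with a short sketch. The scores are made up for illustration, and the percentile rank uses one common definition (the percentage of scores strictly below a given score); other definitions exist.

```python
from statistics import mean, median, pstdev


def percentile_rank(scores, score):
    """Percentage of scores in the group that fall strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical class scores (illustrative only, not from the notes)
scores = [45, 52, 58, 60, 60, 67, 71, 75, 82, 90]

summary = {
    "mean": mean(scores),              # average score
    "median": median(scores),          # middle score
    "std_dev": pstdev(scores),         # population standard deviation
    "percentile_rank_of_75": percentile_rank(scores, 75),
}
print(summary)
```

A test-taker who scored 75 is then described relative to the group (here, above 70% of peers) rather than against a fixed standard.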
CRITERION-REFERENCED TEST (CRT)
A collection of information about student progress or achievement in relation to a
specified criterion.
Standards serve as the criteria or yardstick for measurement.
Provide:
o Feedback to test-takers mostly in the form of grades, on specific course or
lesson objectives.
o Information on students’ mastery based on outcomes or objectives as
specified in the syllabus.
Advantage: allows test-takers to make inferences about how much language proficiency
they originally had and their successive gains over time.
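A minimal sketch of criterion-referenced reporting: each objective is judged against a fixed cut-off, not against other students. The 80% cut-off, the objective names and the results are assumptions made for illustration.

```python
# Hypothetical mastery cut-off, in percent (an assumption, not from the notes)
MASTERY_CUTOFF_PERCENT = 80


def mastery_report(results, cutoff_percent=MASTERY_CUTOFF_PERCENT):
    """Map each objective to True if the share of correct items meets the cut-off.

    `results` maps an objective name to a (correct, total) pair.
    """
    return {
        objective: correct * 100 >= cutoff_percent * total
        for objective, (correct, total) in results.items()
    }


# Hypothetical results for one student
student = {
    "simple past tense": (9, 10),
    "prepositions": (6, 10),
    "articles": (8, 10),
}
report = mastery_report(student)
print(report)
```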
NRT vs CRT
Definition / Purpose
o NRT: Test that measures student achievement as compared to others. Determines performance differences among individuals and groups.
o CRT: Approach that provides information on students’ mastery based on a specified criterion. Determines learning mastery based on a specified criterion.
Test item / Frequency
o NRT: Easy to difficult – discriminates examinees’ ability. Continuous assessment in classroom.
o CRT: Guided by minimum achievement in the related objectives. Continuous assessment.
Appropriateness / Examples
o NRT: Summative evaluation, e.g. UPSR, PMR, SPM, STPM.
o CRT: Formative evaluation, e.g. monthly test, coursework, project.
OBJECTIVE TEST
A test that consists of items with right or wrong answers or responses.
Can be marked objectively.
Popular because easy to prepare and quick to mark, providing a quantifiable and
concrete result.
Focus more on specific facts than general ideas and concepts.
Types:
o MCQ
o True/false items
o Matching items
o Fill-in the blanks items.
Weaknesses of MCQ:
o Limit beneficial washback.
o Enable cheating
o Challenging to write successful items.
o Strictly limits what can be tested.
o Test only recognition knowledge.
o Encourage guessing.
SUBJECTIVE TEST
Evaluated by giving an opinion based on agreed criteria.
Include essay, short-answer questions, vocabulary tests, and take-home tests.
Provides more opportunity for test-takers to show/demonstrate their understanding and
in-depth knowledge and skills in the subject matter.
Test takers might provide some acceptable answers that might not be predicted.
Enable students to be more creative and critical.
E.g. extended-response items, restricted-response items, essay.
TOPIC 4 – BASIC PRINCIPLES OF ASSESSMENT
RELIABILITY
The degree to which an assessment tool produces stable and consistent results.
A reliable test is stable, consistent and dependable.
The same test, given to the same test-takers on two occasions, should yield the same results.
Lack of reliability threatens the test’s validity.
A reliable test:
o Is consistent in its conditions across two or more administrations.
o Gives clear directions for scoring / evaluation.
o Has uniform rubrics for scoring / evaluation.
o Is unambiguous to test-takers.
Two types of reliability
o Rater reliability (markers)
- Inter-rater reliability – degree of agreement between two raters who score
independently, without influencing one another.
- Intra-rater reliability – consistency of a single rater’s own scoring.
o Test administration reliability
- Deals with interference and conditions during the test administration.
- Outside interference – noise, temperature, variations in
photocopying, lighting, conditions of chairs and tables.
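The notes do not name a statistic for inter-rater reliability. One possible way to quantify it, sketched here as an assumption, is the proportion of scripts on which two raters award the same band, together with Cohen's kappa, which corrects that proportion for agreement expected by chance. The bands below are invented for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of scripts given the same band by both raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance: (observed - expected) / (1 - expected).

    Undefined (division by zero) if both raters use a single identical band throughout.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over bands of the product of each rater's band proportions
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical bands awarded independently by two raters to the same eight scripts
rater_a = ["A", "B", "B", "C", "A", "B", "C", "C"]
rater_b = ["A", "B", "C", "C", "A", "B", "B", "C"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, round(kappa, 3))
```

The same functions could be applied to one rater's first and second marking of the same scripts as a rough check on intra-rater consistency.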
Factors affecting the reliability of a test:
o Test length factor
- Longer tests tend to be more reliable because the effect of guessing is reduced.
o Teacher – student factor
- Teacher’s encouragement.
- Familiarity of test formats.
o Environmental factors
- Lighting
- Ventilation
- Comfortable setting
o Test administration factor
- Clear instruction
- Enough time
o Marking factors
- Markers’ objectivity and perception of the task of marking itself.
VALIDITY
Whether the assessment really measures what it claims to measure.
Types of validity
Face validity
o A test should look right, so that students take it seriously.
o Should appear to measure the knowledge or skills it is made to measure.
o Affects how the students see and respond to the test (seriously or otherwise).
Content validity
o Is all the content learnt represented in the test?
o Can be checked against the table of test specifications, which maps all the items to:
- Skills
- Level
- Topics
Construct validity
o Does the test measure the construct it claims to measure?
- Linguistic construct – fluency, proficiency.
- Oral test – speed and rhythm.
Concurrent validity
o Compare students’ performance on the test set with their performance on another reputable test.
Predictive validity
o Predict results on a public exam from the trial paper, e.g. Pre-SPM to SPM.
o High predictive validity = predictable results in the later measure.
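Predictive validity is often summarised with a correlation between the earlier and the later measure. The notes do not prescribe a statistic, so the Pearson correlation below is an assumption, and the trial and final marks are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


# Hypothetical marks for five students: trial paper vs. the later public exam
trial = [40, 55, 60, 72, 85]
final = [45, 58, 62, 70, 88]

r = pearson_r(trial, final)
print(round(r, 2))
```

A coefficient close to 1 would suggest that the trial paper ranks students in much the same way as the later exam.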
PRACTICALITY
Is the assessment tool practical to be used, and could be set up within possible means?
If carried out, would the effort of setting it up be worth the objectives of having the test in
the first place?
Could the results be easily interpreted?
OBJECTIVITY
Refers to the ability of the examiners to be objective, non-partisan and unbiased.
Scores must be awarded without being influenced by the examiners’ emotions,
opinions or skills.
WASHBACK EFFECT
Impact that tests have on teaching and learning.
Students’ mistakes can indicate areas of improvement.
Compliment students’ achievements.
More important tests have greater washback effects.
AUTHENTICITY
Assessment should reflect the actual features of the target language task.
Students are motivated when tasks involve real-world situations and contexts.
INTERPRETABILITY
Involves how meaning is assigned to the scores.
Take into consideration these factors:
o Reliability.
o Validity
o Scores, norms and related technical features.
o Administration and scoring variation.
TOPIC 5 – DESIGNING CLASSROOM LANGUAGE TEST
Stages of Test Planning
Determining
o Who is to be tested?
o What is the purpose of testing?
o What is to be tested?
o What is the scope of the test?
o How detailed and accurate must the results be?
Planning
o Set a specification for the test (Table of Specifications etc.)
o Determine the content, format and timing, criteria and level of performance.
Writing
o Construct test items.
o Collaborate with colleagues to find faults in the test paper.
Preparing
o Prepare the test items based on principles and techniques.
Reviewing
o Should not be reviewed straight away after construction.
o Other teachers / native speakers need to review the test paper prepared.
Pre-testing
o Test on a similar target group to analyse individual items as well as the whole test.
Validating
o Validate the test using the formulas:
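\[ \mathit{IF}\ (\text{item facility}) = \frac{\sum c\ (\text{no. of correct answers})}{N\ (\text{no. of students})} \]
\[ \mathit{IF}\ (\text{item difficulty}) = \frac{\sum w\ (\text{no. of wrong answers})}{N\ (\text{no. of students})} \]
Values range from 0 to 1 (<0.37 – difficult, >0.63 – easy).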
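The two indices can be computed as in the sketch below. It assumes the 0.37 and 0.63 cut-offs are applied to the item facility value (proportion correct); the label "moderate" for values in between and the response data are illustrative additions, not from the notes.

```python
def item_facility(responses):
    """Number of correct answers divided by number of students (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)


def item_difficulty(responses):
    """Number of wrong answers divided by number of students."""
    wrong = len(responses) - sum(responses)
    return wrong / len(responses)


def classify(facility):
    """Apply the cut-offs to the facility value (assumed interpretation)."""
    if facility < 0.37:
        return "difficult"
    if facility > 0.63:
        return "easy"
    return "moderate"  # hypothetical label for the middle band


# Hypothetical pre-test responses from ten students to three items
items = {
    "Q1": [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    "Q2": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    "Q3": [1, 0, 1, 1, 0, 1, 0, 1, 0, 0],
}

for name, responses in items.items():
    facility = item_facility(responses)
    print(name, facility, item_difficulty(responses), classify(facility))
```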
PREPARING A TEST SPECIFICATION / BLUEPRINT
Outlines the test construct and what the test will look like.
Covers:
o Content description
o Item types (MCQ, cloze, etc.)
o Tasks (essay, short passage)
o Skills included
o Scoring
o Reporting
BLOOM’S TAXONOMY
Cognitive dimensions, with definitions and key words:
Remembering
o Definition: Recalling information. Retrieving knowledge from long-term memory.
o Key words: Recognizing, Naming, Listing, Describing
Understanding
o Definition: Explaining ideas or concepts. Constructing meaning from instructional messages.
o Key words: Interpreting, Summarizing, Classifying, Explaining
Applying
o Definition: Using information in another familiar situation. Applying a procedure to a familiar task.
o Key words: Implementing, Carrying out, Executing
Analyzing
o Definition: Breaking materials into parts to explore understandings and relationships.
o Key words: Comparing, Organizing, Deconstructing
Evaluating
o Definition: Justifying a decision or making judgments based on criteria and standards.
o Key words: Checking, Hypothesizing, Critiquing, Experimenting, Evaluating
Creating
o Definition: Generating new ideas, putting elements together to form a coherent whole or new pattern.
o Key words: Designing, Constructing, Planning
SOLO (STRUCTURE OF THE OBSERVED LEARNING OUTCOME) TAXONOMY
Describes how a learner’s performance develops from simple to complex levels.
5 stages:
o Quantitative phase
- Pre-structural
- Uni-structural
- Multi-structural
o Qualitative phase
- Relational
- Extended abstract
Stages, with descriptions and sample verbs:
Prestructural
o Description: Incompetence
o Sample verbs: Fail, Incompetent, Misses the point
Unistructural
o Description: One relevant aspect
o Sample verbs: Identify, Name, Follow simple procedure
Multistructural
o Description: Several relevant independent aspects
o Sample verbs: Combine, Describe, Enumerate, Perform serial skills, List
Relational
o Description: Integrated into a structure
o Sample verbs: Analyse, Apply, Argue / compare, Criticize, Relate
Extended abstract
o Description: Generalized into a new domain
o Sample verbs: Create, Formulate, Generate, Hypothesize
GUIDELINES FOR CONSTRUCTING TEST ITEMS
When building test items, the following must be taken into consideration:
Aims of the test
Range of the topics to be tested
Range of skills to be tested.
Test format
Level of difficulty
International and cultural considerations
TEST FORMATS
Depends on the test itself
UPSR - paper 1 and paper 2
TOEFL – internet based test and paper based test
IELTS – four language skills
MUET – similar to IELTS
TOPIC 6 – ASSESSING LANGUAGE SKILLS CONTENT
OBJECTIVE AND SUBJECTIVE TEST
Select type tests – objective (MCQ, TRUE/FALSE)
Supply type tests – subjective (ESSAY, DICTATION)
Types of tests according to students’ expected response
o Selected response
- True false
- Matching
- Multiple choice
o Constructed response
- Fill in
- Short answer
- Performance test
o Personal response
- Conferences
- Portfolios
- Self and peer assessments
TYPES OF TEST ITEMS TO ASSESS LANGUAGE CONTENT
Discrete Point Test
o Examine one element at a time
o Language is seen to be made up of smaller units and it may be possible to test
language by testing each unit at a time.
o Example – MCQ
Integrative Test
o Requires the candidates to combine many language elements in the
completion of a task.
o A simultaneous measure of knowledge and ability of a variety of language
features, modes or skills.
o Example – essays
Communicative Test
o Testing involves many aspects that revolve around communicative elements
and meaningful content.
o Involves performance and authenticity, and is scored on real-life outcomes (usually
behavioral).
TOPIC 7 – SCORING, GRADING AND ASSESSMENT CRITERIA
APPROACHES TO SCORING
1. OBJECTIVE APPROACH / METHOD
Relies on quantified methods of evaluating students’ writing
o Establish standardization – limit length of assessment (word count)
o Identify elements to be assessed – identify mistakes (spelling error,
grammar mistakes, vocabulary, etc)
o Operationalize the assessment – assign a weight score to each error,
according to distortion of readability or flow error.
o Quantify the assessment using the correctness score:
\[ \text{correctness score} = \frac{\text{word limit}}{\text{sum of error scores}} \]
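A minimal sketch of the objective method, reading the correctness score as the word limit divided by the sum of the error scores. The 250-word limit and the weight given to each error are assumptions chosen for illustration.

```python
def correctness_score(word_limit, error_scores):
    """Word limit divided by the sum of the weighted error scores."""
    total_errors = sum(error_scores)
    if total_errors == 0:
        raise ValueError("no errors recorded, so the score is undefined")
    return word_limit / total_errors


# Hypothetical weights assigned to each error found in a 250-word composition,
# e.g. 1 = minor slip, 2 = noticeable error, 3 = error that distorts readability
error_scores = [1, 2, 2, 3, 1, 1]

score = correctness_score(250, error_scores)
print(score)
```

Under this reading, a higher score indicates fewer or less serious errors for the same length of writing.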
2. HOLISTIC APPROACH
Reader reacts to the students’ compositions as a whole and a single score is
awarded to the writing.
Each score will be accompanied with a general descriptor of abilities (rubric)
Primary trait scoring – particular functional focus is selected which is based on the
purpose of the writing and grading is based on how well the student is able to
express that function.
Emphasizes functional and communicative ability rather than form and accuracy.
3. ANALYTIC APPROACH
A familiar approach for many evaluators.
Raters assess students’ performance on a variety of categories which are
hypothesized to make up the skill of writing
Example:
Components Weight
Content 30%
Organization 40%
Vocabulary 30%
Points assigned to each component reflect the importance of the component in
question.
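The weighting in the example can be applied as in the sketch below. The weights come from the example table; the 0 to 10 rating scale and the ratings themselves are assumptions made for illustration.

```python
# Component weights in percent, taken from the example table
WEIGHTS = {"Content": 30, "Organization": 40, "Vocabulary": 30}


def analytic_score(ratings, weights=WEIGHTS):
    """Weighted average of component ratings; the weights must total 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(weights[name] * ratings[name] for name in weights) / 100


# Hypothetical ratings on a 0-10 scale for one composition
ratings = {"Content": 8, "Organization": 6, "Vocabulary": 7}

total = analytic_score(ratings)
print(total)
```

Because Organization carries the largest weight, a weak rating there lowers the total more than an equally weak rating on Content or Vocabulary.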
COMPARING THE THREE APPROACHES
Advantages and disadvantages of each approach:
Holistic
o Advantages
- Quickly graded.
- Provides a public standard that is understood by teachers and students alike.
- Relatively higher degree of rater reliability.
- Applicable to assessment of many different topics.
- Emphasizes students’ strengths rather than weaknesses.
o Disadvantages
- Single score may actually mask differences across individual compositions.
- Does not provide a lot of diagnostic feedback.
Analytic
o Advantages
- Provides clear guidelines for grading in the form of the various components.
- Allows graders to consciously address important aspects of writing.
o Disadvantages
- Writing ability is unnaturally split up into components.
Objective
o Advantages
- Emphasizes the students’ strengths rather than their weaknesses.
o Disadvantages
- Still some degree of subjectivity involved.
- Accentuates negative aspects of learners’ writing without giving credit for what they can do well.
TOPIC 9 – REPORTING OF ASSESSMENT DATA
PURPOSES OF REPORTING
To obtain information concerning a particular behaviour or characteristic.
Based on that information, several types of decisions can be made.
o Instructional decisions
o Grading decisions
o Diagnostic decisions
o Selection decisions
o Placement decisions
o Counselling & guidance decisions
o Program or curriculum decisions
o Administrative decisions
REPORTING METHODS
Student achievement and progress can be reported by comparing:
Norm-referenced assessment and reporting
o Assessing and reporting students’ achievement and progress in comparison to
other students.
Criterion-referenced assessment and reporting
o Assessing and reporting students’ achievement and progress in comparison to predetermined criteria.
Outcome-based approach
o Provides information showing that students can be working towards syllabus outcomes
anywhere along the continuum of learning.
PRINCIPLES OF EFFECTIVE AND INFORMATIVE ASSESSMENT AND REPORTING
Has clear, direct links with outcomes
Is integral to teaching and learning
Is balanced, comprehensive and varied.
Is valid.
Is fair.
Engages the learner.
Values teacher judgment.
Time efficient and manageable
Recognizes individual achievement and progress
Involves a whole school approach
Actively involves parents
Conveys meaningful and useful information.
TOPIC 10 – ISSUES AND CONCERNS RELATED TO ASSESSMENT IN MALAYSIAN
PRIMARY SCHOOLS
EXAM ORIENTED SYSTEM
The current education system is too examination-oriented and over-emphasizes rote
learning, with institutions of higher learning fast becoming diploma mills.
Too focused on public examination results as important determinants of students’
progression to higher levels of education or occupational opportunities.
PTS (defunct) > UPSR > PMR > SPM/STAM /STPM
Summative national examinations in themselves should not have any negative impact on
students – the issue is that they do not currently test the full range of skills that the
education system aspires to produce.
LP (Lembaga Peperiksaan, the Malaysian Examinations Syndicate) has started some reforms – PBS (Pentaksiran Berasaskan Sekolah, school-based assessment)
o School assessment – assesses students’ learning. Test questions and marking
schemes are developed, administered, scored and reported by school teachers
based on LP guidance.
o Central assessment – developed wholly by LP itself.
o Psychometric assessment – aptitude tests and personality inventory to assess
students’ skills, interests, aptitude and attitude and personality. Assess
students’ innate ability and acquired abilities.
o Physical, sports and co-curricular activities assessment – refers to
assessments of student performance and participation in physical and health
education, sports, uniformed bodies, etc.
o This new format enables broader assessment range over longer period of time.
o Teacher can focus more on delivering meaningful learning as stipulated in the
curriculum.
SCHOOL-BASED ASSESSMENT
Traditional assessment methods no longer adequate in today’s setting.
School-based assessment has immense potential in terms of flexibility and validity, but
has potential drawbacks in terms of reliability, quality control, and assurance.
Recommended because of the gains in validity when students’ performance on assessed
tasks can be judged over a greater range of contexts, and more often, than is possible
in time-limited tests.
Malaysian SBA context
o Standard-referenced assessment
o Holistic
o Integrated
o Balanced
o Robust
Components
o Academic
- School assessment (Performance Standards)
- Centralized assessment
o Non-academic
- Physical activities, co-curricular
- Psychometric tests
ALTERNATIVE ASSESSMENTS
Procedures that differ from traditional notions.
PORTFOLIOS
Contain:
o Introductory section
o Academic works section
o Personal section
o Assessment section
SELF ASSESSMENT AND PEER ASSESSMENT
Remind learners that they are not working in isolation.
Help create a community of learners
Improve the product
Improve the process
Help learners to be reflective
Stimulate metacognition
TOPIC 8 – ITEM ANALYSIS AND INTERPRETATION