
Automated Writing Evaluation: Enough about reliability! What really matters for students and teachers?

Jooyoung Lee, Zhi Li, Stephanie Link, Hyejin Yang, Volker Hegelheimer

Saturday, September 22, 2012

Beyond reliability

Automated essay scoring (AES) systems: PEG, IEA, IntelliMetric, E-rater

Correspondence with human rating, as reported across these systems: 61-87%, 77-89%, 70-85%, 85-91%, 96-98%; 87-97% (agreement) vs. 45-59% (exact agreement)
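The two agreement labels above point at different calculations: exact agreement counts only identical automated and human scores, while the broader agreement figure typically also admits scores within one point (exact plus adjacent). A minimal sketch with invented scores, not data from any of the systems above:

```python
# Minimal sketch: "exact" vs. "exact + adjacent" agreement between automated
# and human essay scores. The score lists below are invented for illustration.

def agreement_rates(auto_scores, human_scores, adjacent_within=1):
    """Return (exact, exact_or_adjacent) agreement as percentages."""
    assert len(auto_scores) == len(human_scores)
    n = len(auto_scores)
    exact = sum(a == h for a, h in zip(auto_scores, human_scores))
    adjacent = sum(abs(a - h) <= adjacent_within for a, h in zip(auto_scores, human_scores))
    return 100 * exact / n, 100 * adjacent / n

# Hypothetical essay scores on a 1-6 scale
automated = [4, 5, 3, 6, 4, 2, 5, 4]
human =     [4, 4, 3, 5, 5, 2, 5, 3]

exact, exact_or_adjacent = agreement_rates(automated, human)
print(f"Exact agreement: {exact:.1f}%")                         # 50.0% for this toy data
print(f"Exact + adjacent agreement: {exact_or_adjacent:.1f}%")  # 100.0% for this toy data
```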

From system-centric to user-centric evaluation:
• 1966: development of NLP tools for writing (AES)
• late 1990s: AES for the testing context
• 2003: AWE for classroom use

Literature Review: Needs Analysis

• Academic writing for international graduate students (U of Hawaii)
• Process skills: pre-writing, editing/revising
• Computer skills: getting help, finding & using resources
• Discourse/rhetorical skills: field-specific research paper (sections of the paper), posing research questions (finding a niche)
• Grammar “patches”, hedges, connectors
• Style/appropriacy
• Bibliographies/citing/plagiarism
(Negretti, 2001)

What research says about AWE

• Automated Writing Evaluation tools provide both numerical scores and formative feedback
• Positive findings:
  • Motivation (Grimes & Warschauer, 2006)
  • Grammar: articles and prepositions (Chodorow et al., 2010)
  • Rhetorical development (Cotos, 2011)
• Negative findings:
  • Heavy focus on grammatical and mechanical aspects
  • Losing sense of audience (CCCC, 2004)

Motivation (Gap) for the Study

• Lack of previous studies investigating stakeholders' actual needs
• Inconsistent opinions among AWE users
• Goal: to investigate what students and teachers actually need in ESL writing classes and how AWE can meet those needs

Research Questions

1.  What are the needs of students and teachers in the ESL writing curriculum?

2.  What are stakeholders’ views of the current status of AWE?

3.  In what ways can AWE improve to meet the needs of students and teachers?

Methodology: setting

§ ESL curriculum at Iowa State University
§ The purpose of the English 101 curriculum is:
  • To prepare undergraduate non-native speakers of English for success in various written assignments in academic contexts
  • To prepare them for English 150: first-year composition

Methodology: participants

• Coordinators (N = 3)
  • One coordinator was also a 101 teacher
• Teachers (N = 6)
  • Experienced & inexperienced users of AWE
• Students (N = 167)
  • Experienced: 72; inexperienced: 95

Methodology: Criterion

Methodology: data collection and analysis

§ Diverse participants
§ Questionnaires (descriptive statistics)
  • 1 questionnaire for experienced students
  • 1 questionnaire for new students
  • 1 questionnaire for teachers
§ Interviews (a priori → inductive coding)
  • 3 interviews with coordinators, 30-60 minutes each
§ Feedback tool analysis
(Long, 2005)

RQ1: Needs for students

Students' view (local features):
• Expressions
• Grammar
• Organization
• Content
• Process writing

Teachers' views (global features):
• Learner autonomy
• Skills in applying feedback
• Skills for writing improvement (esp. in content development and organization)

Coordinators' view (global features):
• Learner autonomy
• Skills in applying feedback
• Strategies for process writing
• Access to ample amount of feedback
• Genre awareness
• Focused feedback

"Using reference material, reading for information, synthesizing it to their own essay, and then making their judgment, that is independent learning and if they can do that on their own [that would be the ultimate goal]." (Teacher 2)

"I would say that [students] need little pieces that they need to learn and to not learn everything at once." (Coordinator 1)

RQ1: Needs for teachers

Teachers need help with:

Teachers' views:
• Evaluating writing
• Helping students with grammar
• Assisting students in becoming independent learners
• Other practical needs (platform for learner community and peer review, integration with course management system)

Coordinators' view:
• Reducing workload
• Understanding how to provide feedback
• Knowing what feedback to provide
• Providing feedback in a manageable fashion
• Effectively implementing and integrating technology into classrooms

"That fit really nicely with my preconception of Criterion removing some of the workload of the teachers, so that teachers end up reading better papers. That's how I envisioned it initially." (Coordinator 3)

RQ2: Student Views of Criterion

Q: Do you think Criterion helped you write your argumentative paper? (N = 72)
• Yes: 63 (87.5%)
• No: 9 (12.5%)

"I think Criterion cannot really give me some suggestion, so I hope my instructor can give me more suggestions after he/she finish reading my paper."

RQ2: Student-Teacher Views of Criterion

Ratings of major functions on Criterion, reported as M (SD) for experienced teachers (N = 3), experienced students (N = 72), and inexperienced students (N = 95):

• Feedback on Grammar: 5 (1) / 4.66 (1.00) / 4.74 (1.03)
• Feedback on Usage: 5 (1) / 4.40 (0.87) / 4.59 (1.09)
• Feedback on Mechanics: 5.33 (0.58) / 4.19 (0.87) / 4.76 (1.08)
• Feedback on Style: 4 (1) / 3.76 (1.20) / 4.38 (1.19)
• Feedback on Organization and Development: 2.67 (0.58) / 3.63 (1.22) / 4.52 (1.08)
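The M (SD) values in a table like this are ordinary descriptive statistics. Purely as an illustration (the ratings below are invented, not the study's data), here is a minimal sketch of how such figures can be computed:

```python
# Minimal sketch: computing M (SD) for Likert-scale questionnaire items.
# The ratings below are invented for illustration only.
from statistics import mean, stdev

# Hypothetical 1-6 ratings for two Criterion feedback categories
responses = {
    "Feedback on Grammar": [5, 4, 6, 5, 4, 5, 3, 6],
    "Feedback on Organization and Development": [3, 4, 2, 5, 3, 4, 3, 4],
}

for item, ratings in responses.items():
    # stdev() gives the sample standard deviation, the usual choice in survey reports
    print(f"{item}: M = {mean(ratings):.2f}, SD = {stdev(ratings):.2f}")
```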

RQ2: Teacher Views of AWE

Overall positive view, with some inconveniences:

§ Positive
  • "The reason that I give high ratings to Grammar, Usage, and Mechanics is that I believe students can benefit from them if they pay attention to them." (Teacher 1)
  • "As Criterion provides feedback repetitively, I hope it can help students learn how to improve their writing skills by themselves." (Teacher 3)
§ Negative
  • "Although Criterion enables students to save their drafts, one pitfall is that students can only save the very first and the last drafts, which are not good for students and teachers." (Teacher 4)

RQ2: Teacher Coordinators' Views of AWE

Likert-scale items (1 = not useful, 6 = very useful), rated by Coordinator & Teacher 1 / Coordinator 2 / Coordinator 3:

• Grading students' essays: 1 / 1 / 5
• Giving feedback to students: 4 / 5 / 3
• Setting up assignments: 1 / 3 / 5
• Receiving and collecting students' essays: 1 / 3 / 5
• Tracking students' progress: 1 / 4 / 2
• Reducing workload in terms of grading and giving feedback: 1.5 / 3 / 2

RQ2: Coordinator Views of AWE

• Changes in his attitude:

"My thought was Criterion should be able to alleviate some of the pressure on teachers... hoping that it would remove or take away some... of the grading burden on the side of teachers. Based on some of the things we've looked at, some of the problems that students had with, or teachers had with Criterion... some of the inconsistency in terms of grading, recognizing some mistakes, and some of your recent findings... I'm beginning to doubt as to whether or not it really helps instructors. I don't know yet. I'd like to learn more about it. I'm not as convinced as I once was about the utility of it. I still think there is... but I think I have to take a deeper look at it." (Coordinator 3)

RQ3: Suggestions for future AWE

Based on current needs and stakeholders' views (students, teachers, and coordinators):

• Organization/style
• Feedback is not comprehensible
• Utility (e.g. save draft/feedback; pop-up notes)
• Learner/teacher training (tech support/material support)
• Focus more on focused feedback (treatable errors)

"I wish they could see the submissions of other students. I wish they had a feature for peer review." (Coordinator 1)

RQ3: Suggestions for future AWE

"[Students] need other entity to tell them about their writing to make them look at their writing again; there's some good feedback that ESL students can benefit from; it's not perfect but pretty good at it." (Coordinator 2)

Implications

• Feedback categories (Ferris, 2001)

AWE feedback and treatability:
• fragment, missing comma: treatable
• run-on sentences: treatable
• garbled sentences: treatable
• subject-verb agreement: treatable
• ill-formed verb: treatable
• pronoun errors, possessive errors
• article errors: less treatable
• confused words, wrong/missing words, wrong form of word: treatable
• faulty comparison, nonstandard word form, negation error, preposition error: less treatable

Implications

• Error gravity (Vann, Meyer, & Lorenz, 1984)

Stakeholders' needs and AWE

Students' needs:
• Expressions
• Grammar
• Organization
• Content
• Process writing
• Learner autonomy
• Skills in applying feedback
• Skills for writing improvement (esp. in content development and organization)
• Access to ample amount of feedback
• Genre awareness
• Focused feedback

Teachers' needs:
• Evaluating writing
• Helping students with grammar
• Assisting students in becoming independent learners
• Other practical needs
• Reducing workload
• Understanding how to provide feedback
• Knowing what feedback to provide
• Providing feedback in a manageable fashion
• Effectively implementing and integrating technology into classrooms

Implications

•  “I think [AWE] should be used and we should figure out how to best use it. It may not be perfect for everybody but there is a better way of using it. We just have to find out...so I’m not ready to give up on it.” (Coordinator 3)

Thank you! Questions/Comments?

E-mail: ___

http://volkerh.public.iastate.edu/awe

References

• CCCC. (2004). Position statement on teaching, learning, and assessing writing in digital environments. Retrieved from http://www.ncte.org/cccc/resources/positions
• Chodorow, M., Gamon, M., & Tetreault, J. (2010). The utility of article and preposition error correction systems for English language learners: Feedback and assessment. Language Testing, 27(3), 419-436.
• Cotos, E. (2011). Potential of automated writing evaluation feedback. CALICO Journal, 28(2), 420-459.
• Grimes, D., & Warschauer, M. (2006, April). Automated essay scoring in the classroom. Paper presented at the American Educational Research Association annual meeting, San Francisco, California.
• Grimes, D., & Warschauer, M. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. Journal of Technology, Learning, and Assessment, 8(6). Retrieved from http://www.jtla.org
• James, C. L. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11(3), 167-178.
• Long, M. (2005). Methodological issues in learner needs analysis research. In M. H. Long (Ed.), Second language needs analysis. Cambridge: Cambridge University Press.
• Vann, R. J., Meyer, D. E., & Lorenz, F. O. (1984). Error gravity: A study of faculty opinion of ESL errors. TESOL Quarterly, 18(3), 427-440.