study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for...

Quantitative L2 Research (1990–2010): A Methodological Synthesis and Call for Reform, by Luke Plonsky (2014), The Modern Language Journal, 98 (1), 450-470. Mahsa Farahanynia

Upload: mahsa-farahanynia

Post on 11-Apr-2017


Page 1: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

Study Quality in Quantitative L2 Research (1990–2010): A

Methodological Synthesis and Call for Reform

By Luke Plonsky (2014)

The Modern Language Journal, 98 (1), 450-470

Mahsa Farahanynia

Page 2: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

Abstract

Purpose of the study: This article builds on the growing line of inquiry into methodological practices in quantitative second language (L2) research. Specifically, the study uses synthetic techniques to examine changes over time in research and reporting practices.

Method: 606 primary reports of quantitative L2 research from two journals, Language Learning and Studies in Second Language Acquisition, were surveyed on different design features, statistical analyses, and data reporting practices.

Data analysis: Frequencies and percentages of each feature were then calculated and compared across the 1990s and the first decade of the 2000s to examine changes taking place in the field.

Results: The results indicate numerous changes including increases in sample sizes, delayed posttesting, and the availability of critical data such as effect sizes, reliability estimates, and standard deviations to accompany means. With respect to statistical procedures, the range of analyses has not changed, and the field continues its unfortunate reliance on statistical significance.

Discussion: The findings are grouped according to three themes, which are discussed in light of previous reviews in this and other fields: (a) means-based analyses, (b) missing data, null hypothesis significance testing, and the “power problem,” and (c) design preferences. The article concludes with an extended call for reform targeting six groups of stakeholders in the field. Most notably, an argument is made for field-specific methodological standards and enhancements to graduate curricula and training.

Page 3: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

1. Introduction

Research synthesis and meta-analysis have heightened L2 researchers’ awareness of methodological quality, including transparency in data reporting, the value of replication, and the relative contribution of statistical significance.

Meta-analyses have identified flaws in design (e.g., lack of pretesting) and in data analysis and reporting (e.g., missing reliability estimates and SDs), which are threats to validity.

Page 4: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

1. Introduction

Purpose of the study

1. To evaluate L2 research and reporting practices

2. To provide direction for methodological reform targeted toward different stakeholders

3. To indicate areas in need of further methodological research

Page 5: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study

Much of the existing research on study quality stemming from the meta-analytic literature does not address all possible dimensions of quality. Many of its measures of study quality have been used to weight effect sizes before combining them via meta-analysis.

The present study neither combines effect sizes nor assigns composite quality scores to individual studies; however, the previous measures greatly informed both the design of the instrument for assessing study quality and the definition of methodological quality adopted in this study.

Methodological quality is defined as:

(a) adherence to standards of contextually appropriate, methodological rigor in research practice

(b) transparent and complete reporting of such practice.

Page 6: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study

The measures and definition might be criticized on three counts:

1. They lack the specificity to address certain subdomain-specific issues; i.e., although broad and field-specific, they are not relevant to all studies or areas within the scope of L2 research.

2. The focus is solely on quantitative research practices.

3. The notion of study quality can and should also be examined from a substantive (i.e., subdomain- and/or context-specific) rather than exclusively methodological perspective.

There are certain domains wherein it would be difficult to align the use of synthetic and particularly meta-analytic methods.

Page 7: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study: Concerns

Meta-analyses and other reviews raise two kinds of concerns:

1. Quality-related concerns

1) Designs: a) lack of pretesting and delayed posttesting in (quasi-)experiments, b) lack of random assignment to experimental conditions, and c) small samples, suggesting low statistical power

2) Reporting practices: missing and unreported data, such as a) SDs to accompany means, b) instrument and rater reliability, and c) effect size estimates, limit the interpretability of primary study results and lead to the exclusion of valuable data from meta-analytic reviews

2. Methodology-related concerns:

These center on methodological practices and trends in the field, for example the statistical analyses used, the research context (classroom vs. lab), and whether studies were experimental or observational.

Page 8: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study: Response to the concerns

Plonsky and Gass (2011) and Plonsky (2013), both of which greatly influenced this study and have parallel designs and findings, and the current study are attempts to respond to such concerns.

Page 9: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study: About studies

Plonsky and Gass (2011)

Used 174 primary studies from one specific area, interaction and interactionist research, which takes a cognitive, postpositivist, and quantitative approach

Based on a manual search of 14 journals

Purpose: To describe and evaluate quantitative research and reporting practices

Plonsky (2013)

Used a modified version of their instrument to check the generalizability of their findings

Used a sample of 606 reports of quantitative L2 research in 2 journals

Purpose: To describe and evaluate quantitative research and reporting practices

Encompassed a much more diverse set of theoretical and methodological approaches (compared to Plonsky and Gass’ study)

Current study

Its focus is not on overall practices but on methodological changes over the last two decades, and it has a strong concern for the future of the field (expressed in a call for reform)

Page 10: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study: Previous studies

The findings of previous studies are discussed in terms of:

1. Design

2. Analysis

3. Reporting practices

4. Changes over time

Page 11: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study: Design

Four design features related to methodological quality (in Plonsky and Gass [2011] & Plonsky [2013]):

1) Random assignment to experimental conditions,

2) Inclusion of a control or comparison group,

3) Pretesting,

4) Delayed post-testing.

Findings:

1) Random assignment was not common, especially in classroom-based studies, and was far from universal even in lab-based research (78% in interaction studies and 48% in the broader domain).

2) A majority of studies used a control or comparison group and included a pretest. The absence of a pretest can be justified by random assignment; however, random assignment was uncommon even in Plonsky’s sample of lab-based experiments.

3) The use of delayed posttests was very mixed, ranging from 29% in Plonsky’s sample of lab studies to 81% in Plonsky and Gass’ (2011) sample of classroom-based studies.

4) An average sample size of around 20 has a debilitating effect on statistical power in the domain.

Page 12: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2. Background and motivation for the current study: Analysis

Plonsky and Gass (2011) & Plonsky (2013) reviewed the statistical procedures used in their samples.

General findings:

1) The most frequently used statistics are those that compare means.

2) Correlation-type statistics, including multiple regression, were fairly common, while other multivariate statistics were scarce (order of frequency: ANOVA, correlation, regression).

3) Most studies employ multiple unique analyses (e.g., t-tests plus correlations).

Specific findings:

1) The range of analyses employed is relatively narrow.

2) Within this limited set of procedures, nearly all are related to mean scores and therefore to the general linear model (ANOVA being a special case of multiple regression), which limits our understanding and interpretation.

3) There is a discrepancy between the analyses employed in one subdomain of research and those employed in a broader cross-section of the field.

Page 13: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2.Background and motivation for the current study: Analysis (cont.)

Plonsky (2013): This reliance on means-based analyses arises because L2 researchers force continuous independent variables (e.g., motivation) into nominal categories (e.g., high and low) in order to enable comparisons on a continuous dependent variable.
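To illustrate why forcing a continuous variable into categories can be costly, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration (the variable names, the slope of 0.5, the sample size, and the random seed); none of it comes from Plonsky's data. It compares the correlation between a continuous predictor and an outcome with the correlation obtained after a median split into "high" and "low" groups.

```python
import random
from statistics import mean, median

def pearson_r(xs, ys):
    """Pearson correlation, computed by hand to avoid third-party libraries."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=2000, slope=0.5, seed=1):
    """Return (r with continuous predictor, r with median-split predictor)."""
    rng = random.Random(seed)
    # Hypothetical continuous predictor (e.g., a motivation score).
    motivation = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical outcome that depends linearly on the predictor plus noise.
    score = [slope * m + rng.gauss(0, 1) for m in motivation]
    # Median split: recode the predictor as high (1) or low (0).
    cut = median(motivation)
    high_low = [1 if m > cut else 0 for m in motivation]
    return pearson_r(motivation, score), pearson_r(high_low, score)

r_continuous, r_split = simulate()
print(f"continuous predictor:   r = {r_continuous:.2f}")
print(f"median-split predictor: r = {r_split:.2f}")
```

In a simulation like this the median-split correlation comes out smaller than the continuous one, which is the kind of information loss that a regression on the original scores would avoid.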

Both studies calculated post hoc power based on observed sample sizes and effect sizes. The results (.56 in interactionist research and .57 in L2 research overall) indicate that much of the field’s research has been underpowered and is therefore unreliable.
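As a rough illustration of what a power figure like this means, the sketch below approximates the power of a two-sided, two-group comparison from a standardized effect size and a per-group sample size. It uses a normal approximation to the t-test, which is an assumption of this example and not necessarily the procedure used in either study; the function name and the example values are hypothetical.

```python
from statistics import NormalDist

def approx_power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    d is the standardized mean difference (Cohen's d). The normal
    approximation used here slightly overstates power for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical scenarios: a medium effect with 20 vs. 64 learners per group.
for n in (20, 64):
    print(n, round(approx_power_two_groups(0.5, n), 2))
```

With a medium effect, 20 participants per group gives power well below the conventional .80 target, while roughly 64 per group reaches it.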

Page 14: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2.Background and motivation for the current study: Data reporting practices

Three categories (in Plonsky [2013]):

1) Descriptive statistics,

2) Data related to inferential statistics,

3) Additional reporting practices associated with transparency and/or recommended or required by the 6th edition of the APA manual.

Findings [both studies]:

1) Basic descriptive statistics are often missing: 31% of studies do not report SDs, and 20% do not report means when conducting t-tests or ANOVAs or do not report the t or F values. Such omissions are more frequent with nonsignificant results, which suggests a bias toward reporting results with p values below .05. Effect size reporting was similarly biased: effect sizes were omitted when p ≥ .05.

2) Regarding reports of instrument and rater reliability, both studies found high estimates (reported in 64% and 45% of the sample).

3) Other reporting practices related to quality and transparency were very rare (such as confidence intervals, evidence of having checked statistical assumptions, and power analyses).

The omission of such information can limit interpretability and introduce upward bias in the results at the meta-analytic level.
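A small sketch can show why missing SDs block meta-analysis: a standardized mean difference cannot be computed from means alone. The function below computes Cohen's d with a pooled standard deviation. The function name and the example numbers are invented for illustration and are not taken from any study in the sample.

```python
from math import sqrt

def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d for two independent groups, using the pooled SD."""
    if sd_1 is None or sd_2 is None:
        # Without SDs there is no way to standardize the mean difference.
        raise ValueError("SDs are required to compute a standardized effect size")
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_var)

# Hypothetical treatment and comparison groups of 20 learners each.
d = cohens_d(75.0, 10.0, 20, 70.0, 10.0, 20)
print(f"d = {d:.2f}")  # prints "d = 0.50"
```

A report that gives only the two means (75 and 70) would have to be dropped from a meta-analysis, or the effect would have to be recovered from a t or F value, if one was reported.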

Page 15: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

2.Background and motivation for the current study: Changes over time

Findings (in Plonsky & Gass [2011]):

Increases in the use of features typically associated with experimental quality, such as random group assignment and delayed posttests.

Increases in the reporting of p values, checks of statistical assumptions, effect sizes, means and SDs, and test statistics such as t and F values.

Shift toward more classroom-based research, as researchers seek to generalize their findings from lab studies to classrooms, where there is less control and more ecological validity

Potential sources of methodological changes:

Editorial policy, new guidelines from learned societies such as the APA or the Psychonomic Society, influential publications, amendments to graduate curricula in research methods, and shifts in substantive foci (e.g., the move toward psycholinguistic constructs and processes in recent years).

Page 16: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

3. Research question

The present study seeks to describe and evaluate the development of L2 methods over the last two decades. To this end, the following research question was addressed:

RQ: To what extent have design features, statistical procedures, and reporting practices in quantitative L2 research changed over time?

Page 17: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

4. Method: Study identification and retrieval

Step 1: To select journals, the author considered VanPatten and Williams’ (2002) list of 15 journals that regularly publish L2 research.

Step 2: After consulting journal descriptions, he reduced the list to four (Language Learning, Modern Language Journal, Second Language Research, and Studies in Second Language Acquisition) that focus primarily on second language learning (as opposed to language teaching or technology).

Step 3: He excluded Modern Language Journal (due to its wider range of interest in L2 pedagogy) and Second Language Research (due to its narrower focus on psycholinguistic techniques and/or L2 morphosyntax).

Step 4: Language Learning and Studies in Second Language Acquisition cover a broad range of topics, so the sampled studies were limited to areas including L2 morphosyntactic development, vocabulary, reading, writing, listening, speaking, pronunciation, individual differences, processing, perception, automaticity, pragmatics, sociolinguistic competence, instructional effectiveness, study abroad, task complexity, translation, interaction, and assessment.

Page 18: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

4.Method: Study identification and retrieval

The sample comprised all quantitative primary L2 studies published in Language Learning (n = 327) and Studies in Second Language Acquisition (n = 279) from 1990 to 2010.

Limitations:

1) Although neither journal quality nor any other index of journal prominence, such as impact factor, was considered in this selection, both are highly rated L2 journals according to VanPatten and Williams (2002), so the sample may present an overly positive view of methodological quality.

2) Substantive foci or subdomains that appear more often in these two journals than in other L2 journals may have produced a sample that does not evenly represent all areas of quantitative L2 research.

Page 19: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

4. Method: Coding and analysis

Coding:

Five different categories of data (based on a protocol devised by Plonsky [2011b] and adopted by Plonsky & Gass [2011]):

1) Study identification (e.g., year of publication, journal),

2) Design (e.g., random group assignment, pretesting),

3) Analyses (e.g., correlation, t-test),

4) Reporting of data (e.g., reliability coefficients, means),

5) Outcomes (e.g., effect sizes).

Coding scheme items were dichotomous in order to readily assess the presence/absence of methodological features and to reduce rater inference.

Analysis:

After data collection and coding, frequencies and percentages for design type/features, analyses, and reporting practices for studies published 1990-1999 and 2000-2010 were calculated and compared.
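The analysis step above can be sketched as a short tally. The Python below is a minimal illustration, assuming each coded study is a record with a publication year and dichotomous (True/False) feature codes. The field names and the four records are made up for the example; they are not data from the synthesis.

```python
from collections import defaultdict

def period_of(year):
    """Assign a study to one of the two periods compared in the synthesis."""
    return "1990-1999" if year <= 1999 else "2000-2010"

def percent_by_period(studies, feature):
    """Percentage of studies in each period coded as having the feature."""
    counts = defaultdict(lambda: [0, 0])  # period -> [with feature, total]
    for study in studies:
        tally = counts[period_of(study["year"])]
        tally[0] += 1 if study[feature] else 0
        tally[1] += 1
    return {p: round(100 * k / n, 1) for p, (k, n) in counts.items()}

# Made-up records for illustration only.
studies = [
    {"year": 1993, "pretest": False, "effect_size": False},
    {"year": 1998, "pretest": True, "effect_size": False},
    {"year": 2004, "pretest": True, "effect_size": True},
    {"year": 2009, "pretest": True, "effect_size": False},
]
print(percent_by_period(studies, "pretest"))
print(percent_by_period(studies, "effect_size"))
```

Because every item is coded as present or absent, the comparison across periods reduces to counting and dividing, which is part of why dichotomous coding keeps rater inference low.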

Page 20: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Designs

Design-related changes over time:

Figure 1: Increase in classroom-based and experimental research; decrease in lab-based and observational research

Figure 2: Increase in pretest and posttest use in experimental studies; decrease in the use of random assignment and control groups in experimental studies

Page 21: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Designs

Change in number of participants over time:

Table 1: The median increased by 11%, which indicates an increase in statistical power. The number of subgroups per study rose slightly (4%), which may offset the potential increase in power.

Page 22: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Analyses

Figure 3: Most tests were observed with greater frequency over time.

Most frequent in both decades: t-test and ANOVA

Dramatic increase: t-test and ANOVA

Moderate increase: chi-square, regression, MANOVA, ANCOVA, SEM, MANCOVA

Similar: Rasch model, nonparametric tests

Decrease: correlations, factor analyses, and discriminant function analyses

Page 23: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Analyses

Figure 4: Regarding the number of statistical analyses, L2 researchers are moving toward using a wider variety of analyses in each study.

Page 24: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Analyses

Table 2: The frequency of statistical testing, as measured by reported p values, increased from one decade to the next, which suggests a loss of statistical power and a greater likelihood of statistically significant findings due to chance (Type I error).
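The link between the number of tests and chance findings can be made concrete with a short sketch. It assumes the tests are independent and all null hypotheses are true, which is a simplification; the function names are hypothetical. The second function shows the Bonferroni-adjusted per-test alpha, which illustrates the power cost of controlling that error rate.

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(k, alpha=0.05):
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    return alpha / k

# Hypothetical numbers of significance tests within a single study.
for k in (1, 10, 20):
    print(k, round(familywise_error(k), 2), bonferroni_alpha(k))
```

With 10 tests the chance of at least one spurious significant result is already about 40%, and correcting for it lowers the per-test alpha to .005, which in turn lowers power for each individual test.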

Page 25: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Reporting practices (descriptive statistics)

Figure 5:

Slight increase: sample sizes, frequencies, percentages

Dramatic increase: means, SDs, confidence intervals, and effect sizes

Slight decrease: correlations

Dramatic decrease: means without SDs

Page 26: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Reporting practices (inferential statistics)

Figure 6: Increase in reporting: exact p values (as recommended by APA), relative p values, F values, t values, chi-square values

Decrease in the number of studies that are consistent in reporting either exact or relative p values.

Decrease in the percentage of studies that report ANOVAs/t-tests (means-based tests) without means, without SDs, or without F or t values.

Page 27: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5.Results: Reporting practices (other practices)

Increase in reporting: research questions, visual display of data, reliability coefficient, a predetermined level of statistical significance, checking of statistical assumptions, and power analyses (which was absent before 2000).

Page 28: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

5. Results: Summary of results

1. A slight shift from observational to experimental studies and from lab to classroom research

2. Decrease in random assignment and inclusion of control groups in experimental studies, and increase in the inclusion of pretests and posttests

3. Increase in sample size; however, the number of statistical tests and subgroups also increased, which offsets any possible increase in statistical power.

4. No big change in the variety of statistical analyses; increase in the number of unique tests along with the number of tests of statistical significance per study, which led to a decrease in overall power

5. Significant improvement in reporting practices: greater inclusion of means, SDs, F and t values, effect sizes, confidence intervals, exact p values, research questions, visual displays of data, reliability coefficients, a predetermined level of statistical significance, checks of statistical assumptions, and power analyses.

Page 29: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion

Limitations:

1) Data were collected from only two journals,

2) Although these journals cover a wide range of interests in L2 research, their perceived rigor and low acceptance rates may present a biased picture of methodological practices in the field,

3) The very notion of study quality as defined in this study,

4) Findings are presented and discussed only in their aggregate form.

In order to discuss findings, following Plonsky (2013), three major themes are selected:

1) Means-based analysis,

2) Missing data, null hypothesis significance testing (NHST), and power problem,

3) Design preferences.

Page 30: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: Means-based analysis

Change 1: Means-based analyses not only dominate quantitative L2 research but also appear to occupy an increasingly large share of the analyses over time.

Heavy reliance on ANOVA and t-tests in L2 research is not problematic if:

1) They are used appropriately,

2) Assumptions such as normality are met,

3) Results are reported faithfully and thoroughly.

Regarding 2 & 3: These conditions mostly go unmet.

Regarding 1: L2 research would do well to look to education and psychology, in which there has been a shift from simple means-based analyses toward multivariate analyses and regression, along with descriptive statistics and visual displays of data.

Page 31: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: Means-based analysis

Moving toward multiple regression and MANOVA has two benefits:

1) At the statistical level, these analyses address multiple relationships between variables simultaneously thus decreasing the need for substantial testing and preserving experiment-wise power.

2) At the conceptual level, these analytical approaches better reflect the complex multivariate nature of the constructs measured, much of which examine the relationship between multiple dependent and independent variables.

This shift has challenges:

1) Lack of sufficient training (need for graduate curriculum related to methodological training),

2) Lack of texts deeply discussing multivariate statistics.

Change 2: The diversity within individual studies has increased, i.e., there has been a shift in the number of different types of questions and objectives posed by each study.

Change 3: Increase in the use of inferential statistics

Page 32: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: Missing data, NHST, and power problem

Sources of the power problem (Plonsky, 2013):

1) Necessary data are missing,

2) It happens mostly when results are not significant,

3) L2 research relies heavily on the flawed practices of NHST (as opposed to focusing on practical significance),

4) The combined effect of 2 & 3 along with small samples and scarce concern over statistical assumptions and scarce use of power analyses.

This study shows that the reporting of descriptive statistics, effect sizes, CIs, etc. has increased, which may mitigate power problems.

Despite this progress, missing data among descriptive statistics remain common, which makes it hard to interpret primary studies and to synthesize them via meta-analysis, and introduces bias toward statistically significant effects.

Related to the bias toward statistically significant results is the default status of NHST as the primary (and only) means of analyzing and interpreting quantitative data in L2 research.

There is a trend toward practical significance and synthetic-mindedness (an increase in the reporting of effect sizes); however, researchers do not interpret what the effect sizes mean (a need that follows from their scale-free nature) and instead just apply Cohen’s benchmarks (although interpretation is field-specific).

Page 33: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: Missing data, NHST, and power problem

Along with the greater availability of effect sizes, two sources of bias appear:

1) Differences between earlier and more recent studies (due to factors such as theoretical maturity, subtlety of analysis, and improvements in research designs and instrumentation) may produce a heavy set of primary effects and bias across studies.

2) Different reporting practices across journals lead to greater meta-analyzability in one journal over another; e.g., Language Learning and MLJ are better targets because of their stated policies on reporting effect sizes.

Differences between Language Learning (LL) and SSLA: 1) articles in LL are generally based on slightly larger samples, but almost twice as many statistical tests are conducted on data from these samples; 2) research published in LL is almost 50% more likely to have been conducted with intact classes, whereas SSLA publishes a greater number of lab studies.

Future research should focus on differences across journals to reveal their identities and methodological orientations.

Page 34: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: Missing data, NHST, and power problem

To mitigate the power problem: A step forward is to employ larger samples. Of course, small samples are typical in L2 research because of the limited number of participants available.

Obtaining larger study-wise and group-wise samples introduces various logistical and financial constraints. There is a strong tension between obtaining large samples (and sufficient statistical power) and preserving ecological validity.

Researchers who must work with necessarily small samples would do better to limit their use of inferential statistics. In these cases, their focus should be on descriptive statistics such as CIs and effect sizes.
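As one way to picture reporting an effect size together with a CI for a small sample, the sketch below computes an approximate confidence interval around Cohen's d. It relies on a common large-sample formula for the standard error of d and on a normal critical value. Both are assumptions of this example, and the interval is only a rough guide with groups this small. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def d_confidence_interval(d, n_1, n_2, level=0.95):
    """Approximate CI for Cohen's d (large-sample standard error)."""
    se = sqrt((n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d - z_crit * se, d + z_crit * se

# Hypothetical study: d = 0.50 with 20 learners per group.
low, high = d_confidence_interval(0.5, 20, 20)
print(f"d = 0.50, 95% CI [{low:.2f}, {high:.2f}]")
```

An interval this wide, running from slightly below zero to a very large effect, tells readers more about the uncertainty in a small study than a lone p value would.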

Page 35: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: Design preferences

Experimental studies have increased in quantitative research, although observational studies are still in the majority.

This change reflects a field-wide shift in the type of relationships proposed in models of L2 learning, use, and teaching, and shows the maturity of our domain.

The trends for design features associated with experimental quality are mixed. Here, we see a decrease in control groups and random assignment, contrary to what Plonsky and Gass found.

The rise in pretesting is not surprising given that random assignment decreased (a kind of tradeoff).

The increase in delayed posttests might reflect developments in substantive areas during the period studied; in more mature areas, interest has turned to testing the longevity of experimental effects.

Three related challenges of this study:

1) Overwhelming presence of convenience sampling,

2) The need for clearer and more explicit definitions of the populations of interest,

3) Undersampling of important learner demographics such as older adults, children, low-literacy learners, and true beginners.

Page 36: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: A call for reform

The purpose of this study was to look not only back but also ahead, both for future studies of methodological quality and for implications for the field of applied linguistics.

The comments are directed toward different groups of stakeholders:

1) Individual/primary researchers

2) Journal editors

3) Meta-researchers

4) Graduate curriculum committees and researcher trainers

5) Grant-funding agencies and their reviewers

6) American Association for Applied Linguistics

Page 37: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: A call for reform to individual/primary researchers

1. When planning a study, consider power. Use an estimate of the anticipated effect size from a related study or meta-analysis to determine an appropriate sample size. Consider sample size and its inverse relationship with sampling error when interpreting results.

2. Be skeptical of p values: 1) a significant result with a very large sample may be meaningless (with a large enough sample, even a trivial group difference reaches significance), 2) a significant result with a small sample is unreliable (because of high sampling error), 3) using a small sample to study a small effect leads to overestimation of statistical significance.

3. Calculate and report effect sizes and explain what they mean.

4. Calculate and report reliability.

5. Report thoroughly (SDs with means, exact t or F values, exact p values).

6. Consider whether regression analysis is a more appropriate approach to your data than comparing group means.

7. Make a team consisting of both experimental and observational researchers

8. Work towards an in-depth understanding of one or more specialized research techniques or statistical analysis.
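The first recommendation in this list, using an anticipated effect size to choose a sample size, can be sketched as follows. The function uses the standard normal-approximation formula for a two-sided comparison of two independent groups, which is an assumption of this example and gives slightly smaller numbers than exact t-based power software. The function name and the example effect sizes are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group n to detect effect size d (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Smaller anticipated effects require sharply larger samples.
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Hypothetical anticipated effects taken from a related study or meta-analysis.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Running the numbers before data collection makes the tradeoff visible: a small anticipated effect calls for several hundred participants per group, far more than the samples of around 20 described earlier.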

Page 38: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: A call for reform to journal editors

They should ensure that methods in published L2 research are sound enough to contribute to L2 theory/practice. To do so:

1. Use your power to influence and improve research practices, follow your guidelines strictly, consult other editors and organizations to devise good guidelines,

2. Consider that sole report of effect size is not enough,

3. Demand consistency across and within papers,

4. Include methodological review as part of the review process.

Page 39: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: A call for reform to meta-researchers

1. Use your voice to make known the weak and strong research practices,

2. Do more than summarize. Examine relationships not addressed sufficiently in primary studies,

3. Examine methods in primary studies to explain variance in effects as well (ask questions such as: Were most of the samples very small? Were the studies carried out in the lab, the classroom, or both?),

4. Examine changes in effects over time for bias,

5. Use your findings to provide guidance to future studies’ efforts to interpret their results,

6. Cast a wide net for primary studies. They may vary in quality, and you can address quality as an empirical question.

Page 40: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: A call for reform to graduate curriculum committees and researcher trainers

1. There has been a growth in statistical know-how but there is still plenty of room for growth,

2. Although this article emphasizes multiple regression, ANOVA is still common. So train students to test the assumptions of ANOVA and to use, report, and interpret its results and effect sizes,

3. Emphasize the importance of understanding, interpreting, reporting descriptive statistics,

4. Emphasize the importance of and relationship among power, sampling error, effect sizes, and statistical significance,

5. Emphasize that a single study can not provide a conclusive answer to a question,

6. Graduate curriculum committees should encourage their students to take more specialized courses in research methods and statistics.

Page 41: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: A call for reform to grant-funding agencies and their reviewers

They determine which proposals are funded (e.g., the TESOL International Research Program and the Language Learning Grants Program).

They should determine and state a clear set of methodological standards (e.g., regarding study design and statistical power)for grant proposals.

Page 42: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

6. Discussion: A call for reform to the American Association for Applied Linguistics (AAAL)

Though focusing on research and policy related to substantive matters, AAAL has been silent on how applied linguistics research should be conducted.

AAAL can designate a task force to develop methodological standards for L2 research and establish field-specific norms for conducting and reporting L2 research.

The task force or committee consists of: at least one member of the executive committee of AAAL, members from editorial boards of applied linguistics journals, a small number of both quantitatively and qualitatively minded researchers, and one or more methodologists.

Page 43: Study quality in quantitative l2 research (1990–2010) a methodological synthesis and call for reform

7. Conclusion

The findings have indicated a number of methodological improvements taking place in the field.

The trends over time give reason for optimism about the future.

Systematic, field-wide methodological reform is required. To achieve it, the field must:

1) develop and enforce methodological standards in quantitative L2 research,

2) improve methodological training and graduate programs, aligning the curricula with the field’s standards.