
Page 1

Amy Rubinstein, Ph.D., Scientific Review Officer

Adrian Vancea, Ph.D., Program Analyst

Office of Planning, Analysis and Evaluation

Study on Direct Ranking of Applications: Advantages & Limitations

Page 2: Current System for Evaluating and Ranking Applications Reviewed in CSR Study Sections

• Applications are assigned to 3 reviewers, who provide preliminary impact scores (1-9) and critiques.

• After panel discussion of each application in the top 50%, all panel members vote on a final overall impact score.

• Each application's final score is the average of all panel members' votes multiplied by 10, yielding final scores of 10-90 (a short worked sketch follows this slide).

• R01 applications are assigned a percentile based on the scores of applications reviewed in the relevant study section in that round and the previous two rounds.
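The scoring arithmetic above can be stated compactly in code. A minimal sketch in Python, with an illustrative helper name and vote values; the rounding convention is an assumption, since the slide does not specify one:

    # Each panel member votes 1-9; the votes are averaged and
    # multiplied by 10 (rounding convention assumed, not stated).
    def final_score(votes: list[int]) -> int:
        assert all(1 <= v <= 9 for v in votes)
        return round(sum(votes) / len(votes) * 10)

    print(final_score([2, 2, 3, 2, 3]))  # -> 24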

Page 3: Why Consider Direct Ranking?

• The number of applications reviewed by NIH is at or near historic highs, and award rates are at historic lows.

• It can be difficult to differentiate among the top 1-20% of applications reviewed in a study section using raw scores.

• Concerns about the potential for an application to be funded result in compression of scores in the 1-3 range (final scores between 10 and 30).

• The current system of percentiles is used to rank applications reviewed in different study sections. However, score compression results in many applications with the same percentile, making funding decisions more difficult.

Page 4: Percentile Base Report (Council Date: 2015/01, IC: CSR)

Percentile  Score    Percentile  Score    Percentile  Score
1%          10-13    11%         25       21%         33
2%          14-15    12%         26       22%         34
3%          16-17    13%         27       24%         35
4%          18-19    14%         28       25%         36
6%          20       16%         29       27%         37
8%          21-22    17%         30       29%         38
9%          23       19%         31       31%         39
10%         24       20%         32       33%         40
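Read pairwise, the table is a lookup from final score to percentile. A minimal sketch of such a table-driven lookup (Python; the helper name is illustrative and the dictionary is transcribed from the report above):

    # Score ranges -> percentile, transcribed from the percentile
    # base report above (Council Date 2015/01, IC CSR).
    PERCENTILE_BASE = {
        (10, 13): 1,  (14, 15): 2,  (16, 17): 3,  (18, 19): 4,
        (20, 20): 6,  (21, 22): 8,  (23, 23): 9,  (24, 24): 10,
        (25, 25): 11, (26, 26): 12, (27, 27): 13, (28, 28): 14,
        (29, 29): 16, (30, 30): 17, (31, 31): 19, (32, 32): 20,
        (33, 33): 21, (34, 34): 22, (35, 35): 24, (36, 36): 25,
        (37, 37): 27, (38, 38): 29, (39, 39): 31, (40, 40): 33,
    }

    def percentile(score: int) -> int:
        for (lo, hi), pct in PERCENTILE_BASE.items():
            if lo <= score <= hi:
                return pct
        raise ValueError(f"score {score} outside the tabulated range")

    print(percentile(27))  # -> 13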

Page 5

[Figure slide; content not captured in the transcript.]

Page 6: Potential Advantages of a Rank Order Method

• Reviewers would not be forced to give applications higher (worse) overall impact scores than they think the applications deserve.

• Reviewers would be required to distinguish between applications of similar quality and to separate the very best from the rest.

• Reviewers would have the opportunity to re-rank applications after hearing the discussion of all applications, something that is less practical with the current system.

Page 7: Challenges Associated with Direct Ranking

• New Investigator applications are reviewed in a separate cluster but must be integrated into the final rank order of the applications reviewed.

• Applications cannot be ranked against applications from the previous two rounds, as is done with the percentile system.

• Reviewers in study sections that cover highly diverse scientific areas may find direct ranking more difficult.

• Private ranking may lack the transparency of the current system, in which reviewers who vote outside the range set by the assigned reviewers must provide justification during or after the discussion.

Page 8: Pilot Study for Direct Ranking of Applications

• The pilot was carried out in parallel with the current review system in the 2014_10 and 2015_01 council rounds.

• Applications were scored as usual; in a separate column on the score sheet, reviewers were asked to privately rank the top 10 R01 applications discussed.

• Ranking data were analyzed for informational purposes and were not used to influence funding decisions.

Page 9: Participating Study Sections

• 32 chartered scientific review groups (SRGs) from the 2014_10 and 2015_01 council rounds participated.

• The number of discussed R01 applications per SRG ranged from 12 to 39 (average 26.12).

• The number of reviewers per SRG ranged from 13 to 31 (average 22.97).

Page 10: Data Analysis

• Measure the correlation between percentiles/scores and the direct ranking results:

–Each application has an associated percentile/score.

–Associate an "average rank" with each application.

–Expect good correlation.

• Propose a method for breaking ties using the ranking results.

• Visualize the correlation between ranking and percentiles.

Page 11: Source Data Format

Each Rev column shows a reviewer's preliminary score and private rank (S/R):

Application  Rev1   Rev2   Rev3   …  Rev18  Rev19  Prior  Percentile
             S/R    S/R    S/R       S/R    S/R    Score
1            1/1    1/2    1/1    …  1/6    1/2    10     2
2            2/3    1/4    2/2    …  1/NR   1/NR   13     4
3            5/NR   5/NR   CF/CF  …  5/NR   6/1    52     52
4            1/2    1/1    CF/CF  …  1/1    CF/CF  10     2
5            2/6    2/5    4/6    …  1/NR   2/5    25     9
6            2/5    1/3    NP/NP  …  1/4    CF/CF  16     6
…            …      …      …      …  …      …      …      …
15           3/8    5/NR   4/NR   …  2/3    5/NR   40     36
16           5/NR   5/NR   4/NR   …  5/NR   5/NR   49     48
17           4/NR   4/NR   4/NR   …  4/NR   4/8    41     40

NP = Not present, CF = Conflict, NR = Not ranked

Page 12: Data with Imputed Ranks

Each Rev column shows a reviewer's preliminary score and rank (S/R), with NR entries replaced by imputed ranks:

Application  Rev1   Rev2   Rev3   …  Rev18  Rev19  Prior  Percentile
             S/R    S/R    S/R       S/R    S/R    Score
1            1/1    1/2    1/1    …  1/6    1/2    10     2
2            2/3    1/4    2/2    …  1/15   1/13   13     4
3            5/14   5/13   CF/CF  …  5/15   6/1    52     52
4            1/2    1/1    CF/CF  …  1/1    CF/CF  10     2
5            2/6    2/5    4/6    …  1/15   2/5    25     9
6            2/5    1/3    NP/NP  …  1/4    CF/CF  16     6
…            …      …      …      …  …      …      …      …
15           3/8    5/13   4/12   …  2/3    5/13   40     36
16           5/14   5/13   4/12   …  5/15   5/13   49     48
17           4/14   4/13   4/12   …  4/15   4/8    41     40
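Note that all of a given reviewer's NR entries receive the same imputed value (e.g., 14 for Rev1, 13 for Rev2, 15 for Rev18). The deck does not state the imputation rule; the pattern is consistent with giving every unranked application the tied average of the leftover rank positions. A minimal sketch under that assumption (the rule and the helper name are inferences, not confirmed by the slides):

    # Assumed rule: if a reviewer ranked k of the n applications they
    # could evaluate, each unranked (NR) application gets the tied
    # average of the leftover positions k+1..n, i.e. (k + 1 + n) / 2.
    def impute_ranks(column):
        """column: one reviewer's ranks, e.g. ['1', 'NR', 'CF', '3']."""
        eligible = [r for r in column if r not in ("CF", "NP")]
        n = len(eligible)                          # could be ranked
        k = sum(1 for r in eligible if r != "NR")  # actually ranked
        imputed = (k + 1 + n) / 2
        return [imputed if r == "NR"
                else r if r in ("CF", "NP")
                else int(r)
                for r in column]

    print(impute_ranks(["1", "3", "NR", "2", "CF", "NR"]))
    # -> [1, 3, 4.5, 2, 'CF', 4.5]   (n = 5, k = 3)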

Page 13: Data with Imputed Ranks

The next step is to calculate the average rank for each application.

(The table of imputed ranks from Page 12 is repeated here.)

Page 14: Average Rank

• For application A, only 19 - 1 = 18 reviewers can rank it (one reviewer is in conflict).

–Average rank = average of the 18 ranks = 83/18 = 4.61

• For application B, only 19 - 2 = 17 reviewers can rank it (one in conflict, one not present).

–Average rank = average of the 17 ranks = 165/17 = 9.71

Application  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10  R11  R12  R13  R14  R15  R16  R17  R18  R19  Avg Rank
A            3   6   3   6   7   2   5   7   CF  6    4    4    6    4    3    5    6    2    4    4.61
B            14  7   5   13  5   6   7   12  10  15   5    8    9    NP   15   10   10   14   CF   9.71

(A computational sketch follows this slide.)
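A minimal sketch of this average-rank computation (Python, illustrative helper name; CF and NP entries are excluded from both the sum and the denominator, as on the slide):

    # Average rank over the reviewers who could rank the application.
    def average_rank(ranks):
        numeric = [int(r) for r in ranks if r not in ("CF", "NP")]
        return sum(numeric) / len(numeric)

    A = "3 6 3 6 7 2 5 7 CF 6 4 4 6 4 3 5 6 2 4".split()
    B = "14 7 5 13 5 6 7 12 10 15 5 8 9 NP 15 10 10 14 CF".split()
    print(round(average_rank(A), 2))  # -> 4.61  (83/18)
    print(round(average_rank(B), 2))  # -> 9.71  (165/17)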

Page 15: Data with Imputed Ranks

Each Rev column shows a reviewer's preliminary score and imputed rank (S/R); the last column is the average rank computed as on Page 14:

Application  Rev1   Rev2   Rev3   …  Rev18  Rev19  Final  Percentile  Avg Rank
             S/R    S/R    S/R       S/R    S/R    Score
1            1/1    1/2    1/1    …  1/6    1/2    10     2           2.44
2            2/3    1/4    2/2    …  1/15   1/13   13     4           4.18
3            5/14   5/13   CF/CF  …  5/15   6/1    52     52          12.25
4            1/2    1/1    CF/CF  …  1/1    CF/CF  10     2           2.77
5            2/6    2/5    4/6    …  1/15   2/5    25     9           7.17
6            2/5    1/3    NP/NP  …  1/4    CF/CF  16     6           4.79
…            …      …      …      …  …      …      …      …           …
15           3/8    5/13   4/12   …  2/3    5/13   40     36          11.17
16           5/14   5/13   4/12   …  5/15   5/13   49     48          12.67
17           4/14   4/13   4/12   …  4/15   4/8    41     40          11.67

Page 16: Correlation Coefficient Between Rank and Percentile

[Figure slide; the correlation plot is not captured in the transcript.]
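Since the figure itself is missing, here is a minimal sketch of how such a correlation could be computed (Python with scipy; Spearman's rank correlation is a natural choice, though the deck does not say which coefficient was used, and the values below are only the six applications shown on Page 15, not the full data set):

    from scipy.stats import spearmanr

    # Average ranks and percentiles for applications 1-6 on Page 15.
    avg_ranks   = [2.44, 4.18, 12.25, 2.77, 7.17, 4.79]
    percentiles = [2,    4,    52,    2,    9,    6]

    rho, p = spearmanr(avg_ranks, percentiles)
    print(f"Spearman rho = {rho:.2f}")  # high rho = good agreement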

Page 17: Comparing Applications with Similar Percentiles

• How can one differentiate between two applications? We want something natural and easy to understand.

Application  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10  R11  R12  R13  R14  R15  R16  R17  R18  R19  Avg Rank
A            14  3   CF  NP  9   13  6   15  14  3    2    4    5    7    7    9    9    CF   7    7.94
B            5   7   NP  8   5   4   7   CF  14  13   4    1    4    5    12   6    7    4    5    6.53

• 19 - 5 = 14 common reviewers could compare both applications and expressed a preference (CF/NP entries and the one tie, R9, are excluded).

• 5 reviewers consider A better than B; 9 reviewers consider B better than A.

• Conclusion: B is better than A (a computational sketch follows this slide).
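A minimal sketch of this head-to-head comparison (Python, illustrative helper name; a lower rank is better, and reviewers with CF or NP entries, or identical ranks for both applications, contribute no preference):

    def head_to_head(a, b):
        """Count reviewers preferring application a vs. application b."""
        a_wins = b_wins = 0
        for ra, rb in zip(a, b):
            if ra in ("CF", "NP") or rb in ("CF", "NP") or ra == rb:
                continue  # no usable preference from this reviewer
            if int(ra) < int(rb):
                a_wins += 1
            else:
                b_wins += 1
        return a_wins, b_wins

    A = "14 3 CF NP 9 13 6 15 14 3 2 4 5 7 7 9 9 CF 7".split()
    B = "5 7 NP 8 5 4 7 CF 14 13 4 1 4 5 12 6 7 4 5".split()
    print(head_to_head(A, B))  # -> (5, 9): B preferred by 9 of 14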

Page 18: Direct Comparison Matrix

Rows and columns are application indices with their percentiles; each cell names the application preferred by the common reviewers and the vote split (preferred application: votes/common reviewers):

          8 (14%)   9 (16%)   10 (18%)   11 (21%)   12 (21%)   13 (21%)   14 (21%)
7 (14%)   7: 14/20  7: 14/19  7: 14/17   7: 11/20   7: 12/21   7: 16/21   7: 16/20
8 (14%)             8: 12/17  8: 12/16   11: 9/16   8*: 9/18   13: 11/19  8: 14/16
9 (16%)                       10: 8/15   11: 10/14  12: 8/13   13: 12/19  9*: 6/12
10 (18%)                                 11: 10/17  12: 9/17   13: 11/17  10: 9/13
11 (21%)                                            11: 9/14   13: 10/19  11: 11/15
12 (21%)                                                       13: 12/19  12: 10/14
13 (21%)                                                                  13: 15/19

Example (top-left cell): 14 out of 20 common reviewers ranked application 7 as better than application 8.

* indicates a tie (the common reviewers split evenly).

Page 19: Comparing Two Applications with the Same Percentile

                                Application A   Application B
Score (average)                 27              27
Percentile                      11%             11%
Score range                     2, 3, 4         2, 3
Reviewer preference by score    3/19 (16%)      3/19 (16%)
                                (the remaining 13/19, 68%, awarded each application the same score)
Ranking (average)               6.1             6.2
Ranking range                   1-NR            2-NR
Reviewer preference by rank     13/18 (72%)     5/18 (28%)

By rank, A is stronger than B, even though the scores and percentiles are tied.

Page 20: Visualization of Binning for All SRGs

[Figure slide; the binning visualization is not captured in the transcript.]

Page 21: Visualization of Binning for Single SRG

[Figure slide; the binning visualization is not captured in the transcript.]

Page 22: Reviewer Comments

• Ranking helped reviewers prioritize applications and improved score spreading.

• Reviewers were more engaged in discussions because of the need to rank.

• It was difficult to rank applications that the reviewer had not read.

• Ranking may provide some complementary information but should not replace the current system.

Page 23: Questions and Next Steps

• Does ranking add value to the peer review process?

• Could the rank ordering exercise be used as a tool by SROs to help panels spread scores and become more engaged in discussion?

• Can rank ordering be used by Program Staff to break ties or provide more information needed for funding decisions?


Page 24: Direct Ranking Pilot Working Group Members

• Dr. Ghenima Dirami, SRO, Lung Injury and Repair study section

• Dr. Gary Hunnicutt, SRO, Cellular, Molecular and Integrative Reproduction study section

• Dr. Raya Mandler, SRO, Molecular and Integrative Signal Transduction study section

• Dr. Atul Sahai, SRO, Pathobiology of Kidney Disease study section

• Dr. Wei-qin Zhao, SRO, Neurobiology of Learning and Memory study section

• Dr. Adrian Vancea, Program Analyst, Office of Planning, Analysis and Evaluation

• Dr. Amy Rubinstein, SRO, Gene and Drug Delivery Systems study section

Page 25: Q & A

Post Ranking Pilot

Office of Planning, Analysis and Evaluation