Original Investigation | Genetics and Genomics

Generalizability of Polygenic Risk Scores for Breast Cancer Among Women With European, African, and Latinx Ancestry

Cong Liu, PhD; Nur Zeinomar, PhD; Wendy K. Chung, MD, PhD; Krzysztof Kiryluk, MD; Ali G. Gharavi, MD; George Hripcsak, MD; Katherine D. Crew, MD; Ning Shang, PhD; Atlas Khan, PhD; David Fasel, BS; Teri A. Manolio, MD, PhD; Gail P. Jarvik, MD, PhD; Robb Rowley, MD; Ann E. Justice, PhD; Alanna K. Rahm, PhD; Stephanie M. Fullerton, DPhil; Jordan W. Smoller, MD, ScD; Eric B. Larson, MD; Paul K. Crane, MD; Ozan Dikilitas, MD; Georgia L. Wiesner, MD; Alexander G. Bick, MD, PhD; Mary Beth Terry, PhD; Chunhua Weng, PhD

Abstract

IMPORTANCE Multiple polygenic risk scores (PRSs) for breast cancer have been developed from large research consortia; however, their generalizability to diverse clinical settings is unknown.

OBJECTIVE To examine the performance of previously developed breast cancer PRSs in a clinical setting for women of European, African, and Latinx ancestry.

DESIGN, SETTING, AND PARTICIPANTS This cohort study using the Electronic Medical Records and Genomics (eMERGE) network data set included 39 591 women from 9 contributing medical centers in the US that had electronic medical records (EMR) linked to genotype data. Breast cancer cases and controls were identified through a validated EMR phenotyping algorithm.

MAIN OUTCOMES AND MEASURES Multivariable logistic regression was used to assess the association between breast cancer risk and 7 previously developed PRSs, adjusting for age, study site, breast cancer family history, and first 3 ancestry informative principal components.

RESULTS This study included 39 591 women: 33 594 with European, 3801 with African, and 2196 with Latinx ancestry. The mean (SD) age at breast cancer diagnosis was 60.7 (13.0), 58.8 (12.5), and 60.1 (13.0) years for women with European, African, and Latinx ancestry, respectively. PRSs derived from women with European ancestry were associated with breast cancer risk in women with European ancestry (highest odds ratio [OR] per 1-SD increase, 1.46; 95% CI, 1.41-1.51), women with Latinx ancestry (highest OR, 1.31; 95% CI, 1.09-1.58), and women with African ancestry (OR, 1.19; 95% CI, 1.05-1.35). For women with European ancestry, this association with breast cancer risk was largest in the extremes of the PRS distribution, with ORs ranging from 2.19 (95% CI, 1.84-2.53) to 2.48 (95% CI, 1.89-3.25) for the 3 different PRSs examined for those in the highest 1% of the PRS compared with those in the middle quantile. Among women with Latinx and African ancestries at the extremes of the PRS distribution, there were no statistically significant associations.

CONCLUSIONS AND RELEVANCE This cohort study found that PRS models derived from women with European ancestry for breast cancer risk generalized well for women with European, Latinx, and African ancestries across different clinical settings, although the effect sizes for women with African ancestry were smaller, likely because of differences in risk allele frequencies and linkage disequilibrium patterns. These results highlight the need to improve representation of diverse population groups, particularly women with African ancestry, in genomic research cohorts.

JAMA Network Open. 2021;4(8):e2119084. doi:10.1001/jamanetworkopen.2021.19084

Key Points

Question How do previously developed breast cancer polygenic risk scores (PRSs) perform in a clinical setting for women of different ancestries?

Findings In this multicenter cohort study linking electronic medical records to genotyping data that included 39 591 women, PRSs were significantly associated with breast cancer risk in women of all ancestries, although the effect sizes were smaller in women with African ancestry.

Meaning Previously developed PRS models for breast cancer risk performed well for women with European and Latinx ancestries in different clinical settings; these results suggest that larger studies are needed to develop and validate PRSs for women with African ancestry.


Author affiliations and article information are listed at the end of this article.

Open Access. This is an open access article distributed under the terms of the CC-BY License.

JAMA Network Open. 2021;4(8):e2119084. doi:10.1001/jamanetworkopen.2021.19084 (Reprinted) August 4, 2021 1/12

Downloaded From: https://jamanetwork.com/ on 02/16/2022


Introduction

Polygenic risk scores (PRSs) have consistently shown the ability to stratify the risk of breast cancer among women with European ancestry,1 but their generalizability to other racial/ethnic groups is more limited. For example, using large consortia of women with European ancestry, a PRS developed in the Breast Cancer Association Consortium (BCAC) reported approximately 2-fold and 4-fold increases in breast cancer risk for women in the top 10% and 1% of the PRS, respectively, compared with women in the middle quantiles of risk (40% to 60%).2 This association has been replicated in validation studies using large cohorts of women with European ancestry.3,4

Understanding the performance of these PRSs in diverse populations is crucial as we move toward clinical implementation of the PRS. In order to incorporate PRSs into clinical practice, models will need to be integrated with other clinical covariates, like family history, in the electronic medical record (EMR).5 With few exceptions,6 studies have not yet evaluated the performance of breast cancer PRSs using clinical data extracted from the EMR.

The Electronic Medical Records and Genomics (eMERGE) network is a federated network of academic medical centers in the US and has compiled EMRs and genotype data for genomic research.7 By using the rich resources of the eMERGE network, including the extensive breast cancer phenotyping algorithm and a diverse population assembled across the network’s federated sites, this study aims to provide a systematic evaluation of the generalizability of previously developed breast cancer PRSs for women of European, African, and Latinx ancestry.

Methods

Study Participants

The participants involved in this cohort study were women enrolled through the eMERGE network from 9 contributing US medical centers with EMRs linked to genotype data. We identified breast cancer cases and controls through a validated phenotyping algorithm. We established ancestry by requiring the observed/self-reported ancestry to match the genetic ancestry inferred by principal component analysis-based k-means grouping, as previously described.8 Note that for Latinx women, we used only self-report because of the diversity of their admixed genetic backgrounds. We did not include Asian, American Indian/Native American, Native Hawaiian/Pacific Islander, and other ancestry groups in this study given the corresponding small number of breast cancer cases.

This study was conducted in accordance with the principles of the Declaration of Helsinki.9 The institutional review board of each contributing institution approved the eMERGE study, and the Columbia University Health Sciences institutional review board approved this study because analysis was conducted using deidentified data. All participants provided written informed consent prior to study inclusion. A specific discussion of the ethical considerations across the eMERGE III study is described elsewhere.10 This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

PRS Models

We examined the performance of 7 PRS models previously developed and tested in women with European, African, or Latinx ancestries (Table 1). We reconstructed each PRS based on included variants and corresponding effect sizes in the original publications and used PLINK15,16 (version 1.9) to calculate each PRS (more details in the eMethods in the Supplement). We included 3 PRS models developed in women with European ancestry (2 developed from BCAC data with small and large numbers of variants [BCAC-S and BCAC-L, respectively]2 and 1 from UKBiobank data [UKBB]11), which included 313, 3820, and 5218 variants, respectively. We also included 2 PRS models developed in or adapted to Latinx women (Women’s Health Initiative [WHI-LA], 71 variants13; and a model developed by Shieh et al12 including multiple cohorts of US Latina and Latin American women [LATINAS], 179 variants), as well as 2 PRS models developed in women with African ancestry (WHI cohort of women with African ancestry [WHI-AA],13 75 variants; and a cohort from the African Diaspora study conducted by the Root consortium [ROOT],14 34 variants) (Table 1). For women with European ancestry, we also evaluated PRSs developed for estrogen receptor (ER)-positive and ER-negative breast cancers.2
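As an illustration of the scoring step, the following is a minimal sketch of the usual weighted-sum form of a PRS: each variant's effect-allele dosage is multiplied by its published per-allele weight and the products are summed. It is not the authors' pipeline (the study used PLINK version 1.9); the function name `polygenic_score`, the variant IDs, and the weights are hypothetical, and skipping variants that are absent from the genotype data is an assumption made here for simplicity.

```python
from typing import Dict, Tuple


def polygenic_score(dosages: Dict[str, float],
                    weights: Dict[str, float]) -> Tuple[float, int]:
    """Return (score, number of variants used) for one person.

    dosages: variant ID -> effect-allele dosage in [0, 2] (hypothetical data).
    weights: variant ID -> per-allele log odds ratio (hypothetical values).
    """
    score = 0.0
    used = 0
    for variant, beta in weights.items():
        if variant in dosages:  # assumption: missing variants are skipped
            score += beta * dosages[variant]
            used += 1
    return score, used


# Hypothetical three-variant model; one variant is missing for this person.
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
person = {"rs_a": 2.0, "rs_b": 1.0}
score, used = polygenic_score(person, weights)
```

Returning the number of variants actually used makes it easy to see how far a reconstructed score falls short of the published variant count.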

Genotyping

Details of the eMERGE genotyping, imputation, and quality control procedures have been previously described.8 For this study, variants that met the following 3 criteria were retained for PRS calculation: (1) a mean imputation quality (R²) greater than 0.3 across genotype array batches; (2) a P value greater than 1 × 10⁻⁶ in ancestry-specific Hardy-Weinberg equilibrium tests; and (3) a minor allele frequency (MAF) greater than 0.005. Principal component analysis was performed in both the combined data set and the ancestry-specific data sets after MAF filtering and linkage disequilibrium (LD) pruning.
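The three retention criteria can be expressed as a simple filter. The sketch below is only an illustration under stated assumptions: the `Variant` record, its field names, and the example values are hypothetical, and the thresholds are taken from the text above with strict "greater than" comparisons.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    mean_r2: float  # mean imputation R^2 across genotype array batches
    hwe_p: float    # ancestry-specific Hardy-Weinberg equilibrium P value
    maf: float      # minor allele frequency


def passes_qc(v: Variant,
              min_r2: float = 0.3,
              min_hwe_p: float = 1e-6,
              min_maf: float = 0.005) -> bool:
    """True if the variant exceeds all three thresholds (strictly)."""
    return v.mean_r2 > min_r2 and v.hwe_p > min_hwe_p and v.maf > min_maf


# Hypothetical variants, each failing at most one criterion.
variants = [
    Variant("rs_ok", 0.9, 0.5, 0.12),
    Variant("rs_low_r2", 0.2, 0.5, 0.12),
    Variant("rs_hwe_fail", 0.9, 1e-8, 0.12),
    Variant("rs_rare", 0.9, 0.5, 0.001),
]
kept = [v.variant_id for v in variants if passes_qc(v)]
```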

Phenotyping

We used EMR data to phenotype each participant, including breast cancer case-control status, demographic information, ER status, family history, and age. We classified women as cases or controls using a validated phenotyping algorithm (above 95% positive predictive value for cases and negative predictive value for controls) that incorporated information from International Classification of Diseases, Ninth Revision (ICD-9) and ICD-10 diagnostic codes (eTables 1 and 2 in the Supplement), breast pathology reports, and medications (eTable 3 in the Supplement). The phenotyping workflow is shown in eFigure 1 in the Supplement, and more details can be found in the eMethods in the Supplement.

Statistical Analysis

To evaluate the performance of each model, we standardized the PRSs to have a risk score unit expressed as an SD of the control distribution. The association of the standardized PRSs and breast cancer risk was evaluated by logistic regression adjusted for the first 3 ancestry-specific principal components,8 age, breast cancer family history, and study site. We defined age as the time period between the year the phenotyping algorithm was executed and the year of birth. In addition, we examined the association of breast cancer by percentiles of PRS, compared with the middle quantile (40% to 60%) or with the remainder of the population.
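A minimal sketch of two ideas from this paragraph follows: scaling a PRS in units of the control SD, and comparing a percentile group against a reference group with an odds ratio. It makes several assumptions that are not in the source: scores are also centered on the control mean, the sample SD is used, and the odds ratio is a crude 2 × 2 calculation rather than the covariate-adjusted logistic regression the study reports. All names and numbers are hypothetical.

```python
import statistics
from typing import List, Sequence


def standardize_to_controls(scores: Sequence[float],
                            is_case: Sequence[bool]) -> List[float]:
    """Express each score in SD units of the control distribution.

    Assumption: scores are centered on the control mean and scaled by
    the sample SD of controls.
    """
    controls = [s for s, c in zip(scores, is_case) if not c]
    mu = statistics.mean(controls)
    sd = statistics.stdev(controls)
    return [(s - mu) / sd for s in scores]


def crude_odds_ratio(cases_group: int, controls_group: int,
                     cases_ref: int, controls_ref: int) -> float:
    """Unadjusted OR of a PRS percentile group versus a reference group."""
    return (cases_group / controls_group) / (cases_ref / controls_ref)


# Hypothetical data: three controls and one case.
z = standardize_to_controls([1.0, 2.0, 3.0, 4.0], [False, False, False, True])

# Hypothetical counts: top group (20 cases, 80 controls) vs middle (10, 90).
or_top_vs_middle = crude_odds_ratio(20, 80, 10, 90)
```

Because the crude OR ignores age, site, family history, and ancestry components, it should not be expected to match adjusted estimates.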

Table 1. Seven Polygenic Risk Score (PRS) Models Previously Developed for Women With European Ancestry or Optimized for Other Ancestries

| PRS Models | No. of Variants^a | Source | Validation, No. in cohort (No. of cases)^b | Ancestry of Validation Cohort |
|---|---|---|---|---|
| BCAC-L | 3820 (2532) | Mavaddat et al,2 (2019) | 18 323 (11 428) | European |
| BCAC-S | 313 (209) | Mavaddat et al,2 (2019) | 18 323 (11 428) | European |
| UKBB | 5218 (4192) | Khera et al,11 (2018) | 157 895 (6586) | European |
| LATINAS | 180^c (140) | Shieh et al,12 (2020) | 4658 (7622) | Latinx |
| WHI-AA | 75 (67) | Allman et al,13 (2015) | 7539 (416 cases) | African |
| WHI-LA | 71 (66) | Allman et al,13 (2015) | 3363 (147 cases) | Latinx |
| ROOT | 34 (31) | Wang et al,14 (2018) | 3686 (1657) | African |

Abbreviations: BCAC-L, Breast Cancer Association Consortium with large variant total; BCAC-S, Breast Cancer Association Consortium with small variant total; LATINAS, model with multiple cohorts of US Latina and Latin American women; ROOT, African Diaspora study; UKBB, UKBiobank; WHI-AA, Women’s Health Initiative for women with African ancestry; WHI-LA, Women’s Health Initiative for women with Latinx ancestry.
a Variants overlap with the genotype data set in eMERGE.
b All studies used independent GWAS data sets to develop PRS models.
c While the original publication states there are 180 variants in the PRS, 1 variant was removed because of low imputation quality, which left 179 variants.

JAMA Network Open | Genetics and Genomics Generalizability of Polygenic Risk Scores for Breast Cancer by Race/Ethnicity

JAMA Network Open. 2021;4(8):e2119084. doi:10.1001/jamanetworkopen.2021.19084 (Reprinted) August 4, 2021 3/12

Downloaded From: https://jamanetwork.com/ on 02/16/2022

Page 4: Generalizability of Polygenic Risk Scores for Breast

To examine the discrimination of each PRS, we estimated the area under the receiver operating characteristic curve (AUC), with only the PRS used as a predictor. To estimate the percentage of the total variance in breast cancer risk explained by PRS, we used Nagelkerke’s pseudo R² calculated for the full model inclusive of the PRS plus the covariates minus R² for the covariates alone. We also chose the PRS showing the largest effect size within each ancestry to estimate the cumulative risk of breast cancer for high PRS risk (top tertile), moderate PRS risk (middle tertile), and low risk (bottom tertile) individuals in each ancestry using iCARE17 (see eMethods in the Supplement).
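The two summary measures in this paragraph can be sketched briefly. The AUC below is computed as the proportion of case-control pairs in which the case has the higher score (ties count one-half), and Nagelkerke's pseudo R² is computed from model log-likelihoods using its standard definition. This is an illustration only: the function names are hypothetical, the log-likelihoods would come from fitted logistic models that are not shown, and nothing here reproduces the study's R code.

```python
import math
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case outranks a random control."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke pseudo R^2 from null and fitted model log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2


def prs_variance_explained(ll_null: float, ll_covariates: float,
                           ll_full: float, n: int) -> float:
    """R^2 of (PRS + covariates) minus R^2 of covariates alone."""
    return nagelkerke_r2(ll_null, ll_full, n) - nagelkerke_r2(ll_null, ll_covariates, n)


# Hypothetical scores and log-likelihoods.
example_auc = auc([1.0, 2.0], [1.0, 2.0])
increment = prs_variance_explained(-50.0, -48.0, -45.0, 100)
```

The pairwise AUC loop is quadratic in sample size; for cohorts of this scale a rank-based formula would be preferable, but the pairwise form keeps the definition visible.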

We also assessed statistical power for testing associations of PRSs with breast cancer given the sample size for each ancestry. Based on ancestry-specific empirical effect sizes of the PRS obtained from the literature, we assumed odds ratios (ORs) of 1.61 (Mavaddat et al2), 1.23 (Allman et al13), and 1.58 (Shieh et al12) for women with European, African, and Latinx ancestry, respectively. Our power analysis showed that we had 100%, 58%, and 99% power to detect an association with the above assumed ORs for women with European, African, and Latinx ancestries, respectively. When we assumed a moderate PRS effect size (OR, 1.39) for women with Latinx ancestry as reported in Allman et al,13 we observed 79% power to detect an association in Latinx women. However, if we assumed the same OR estimated for women with European ancestry in non-European women (ie, OR, 1.61), we would have 100% and 99% power to detect an association for breast cancer in women with African and Latinx ancestry, respectively. All analyses were carried out in R version 3.0.2 (R Project for Statistical Computing). All statistical tests were 2-sided, and P values < .05 were considered significant.

Results

Our study included 39 591 women, including 33 594 women with European ancestry (mean [SD] age, 66.1 [17.7] years), 3801 with African ancestry (mean [SD] age, 59.6 [16.5] years), and 2196 with Latinx ancestry (mean [SD] age, 59.9 [19.4] years) (Table 2). The total number of variants included in the PRS calculation for each model is presented in Table 1.

Table 2. Participant Characteristics

Values are participants, No. (%), unless otherwise indicated.

| Characteristic | European ancestry (n = 33 594) | African ancestry (n = 3801) | Latinx ancestry (n = 2196) |
|---|---|---|---|
| Age, mean (SD), y^a | 66.1 (17.7) | 59.6 (16.5) | 59.9 (19.4) |
| Breast cancer diagnosis | 3960 (11.8) | 274 (7.2) | 147 (6.7) |
| Age at breast cancer diagnosis, mean (SD), y^b | 60.7 (13.0) | 58.8 (12.5) | 60.1 (13.0) |
| Estrogen receptor status, No. (% of cases) | | | |
| Positive | 1052 (26.6) | 20 (7.3) | 22 (15.0) |
| Negative | 241 (6.1) | 15 (5.5) | 4 (2.7) |
| Missing | 2667 (67.3) | 239 (87.2) | 121 (82.3) |
| eMERGE network site | | | |
| Columbia University Medical Center | 202 (0.6) | 73 (1.9) | 158 (7.2) |
| Geisinger | 1330 (4) | 4 (0.1) | 8 (0.4) |
| Partners Healthcare | 13 392 (39.9) | 927 (24.4) | 1093 (49.8) |
| Kaiser Permanente Washington Health Research Institute/University of Washington | 1646 (4.9) | 65 (1.7) | 55 (2.5) |
| Mayo Clinic | 3547 (10.6) | 10 (0.3) | 22 (1) |
| Marshfield Clinic Research Foundation | 2815 (8.4) | | 8 (0.4) |
| Mount Sinai^c | 220 (0.7) | 2515 (66.2) | 742 (33.8) |
| Northwestern University | 1878 (5.6) | 207 (5.4) | 23 (1) |
| Vanderbilt University | 8564 (25.5) | 0 | 87 (4) |

Abbreviation: eMERGE, Electronic Medical Records and Genomics.
a Age was calculated at the time of electronic phenotyping algorithm deployment.
b Age at breast cancer diagnosis was defined as the age at the first breast cancer International Classification of Diseases–related code.
c Mount Sinai only executed the phenotype algorithm for case-control definition; no estrogen receptor status data were extracted for this site.


Association of PRS With Breast Cancer Risk in Women of European Ancestry

Our primary analysis examined the association of BCAC-S, BCAC-L, and UKBB in 3960 breast cancer cases and 29 634 control women with European ancestry and is shown in Figure 1. We found statistically significant associations with overall breast cancer risk for all 3 PRSs examined, with mean ORs per SD of the PRS ranging from 1.36 to 1.46, adjusted for the first 3 ancestry-specific principal components, age, family history, and study site (BCAC-L: OR, 1.40; 95% CI, 1.35-1.45; BCAC-S: OR, 1.36; 95% CI, 1.31-1.41; UKBB: OR, 1.46; 95% CI, 1.41-1.51).

As illustrated in Figure 2, this association with breast cancer risk was largest in the extremes of the PRS distribution, with ORs ranging from 2.19 (95% CI, 1.84-2.53) to 2.48 (95% CI, 1.89-3.25) for the 3 different PRSs examined for those in the highest 1% of the PRS compared with those in the middle quantile. For example, for the UKBB PRS, we observed an approximate 2.5-fold increase in risk for those in the top 1% (OR, 2.48; 95% CI, 1.89-3.25) compared with those in the middle quantile (40%-60%) (Figure 2). Our findings were similar when we compared the extreme ends of the PRS distribution with those in the remainder of the PRS distribution (eFigure 2 in the Supplement). The AUCs were similar for all 3 PRSs (BCAC-L: AUC, 0.60; 95% CI, 0.59-0.61; BCAC-S: AUC, 0.59; 95% CI, 0.58-0.60; UKBB: AUC, 0.61; 95% CI, 0.60-0.62). The proportion of variance explained solely by PRS ranged from 1.7% to 2.5%, which is similar to what was reported originally (eg, 2.8% in the UKBB study11) (eTable 4 in the Supplement).

Figure 1. Association of Polygenic Risk Scores (PRSs) With Breast Cancer Risk in Women With European, African, and Latinx Ancestry in the eMERGE Cohorts

[Figure not reproduced. Horizontal axis: breast cancer risk, OR (log2 scale), 0.9 to 1.6. Vertical axis: PRS model (BCAC-L, BCAC-S, UKBB, LATINAS, WHI-LA, ROOT, WHI-AA). Series: European ancestry, African ancestry, Latinx ancestry.]

Odds ratios (ORs) are adjusted for the first 3 ancestry-specific principal components, age, family history, and study site. Breast Cancer Association Consortium with small variant total (BCAC-S) includes 313 variants in the original PRS, BCAC with large variant total (BCAC-L) includes 3820 variants in the original PRS, Women’s Health Initiative for women with Latinx ancestry (WHI-LA) includes 71 variants in the original PRS and was optimized for women with Latinx ancestry, WHI for women with African ancestry (WHI-AA) includes 75 variants in the original PRS and was optimized for women with African ancestry, UKBiobank (UKBB) includes 5218 variants in the original PRS, African Diaspora study (ROOT) includes 34 variants in the original PRS and was optimized for women with African ancestry, and the LATINAS model includes 179 variants from multiple cohorts in the original PRS and was optimized for women with Latinx ancestry.

Figure 2. The Association of Polygenic Risk Scores (PRSs) With Overall Breast Cancer Risk in Women With European Ancestry Relative to the Middle Quantile

[Figure not reproduced. Horizontal axis: model distribution, % (<1, 1-5, 5-10, 10-20, 20-40, 60-80, 80-90, 90-95, 95-99, ≥99). Vertical axis: breast cancer risk, OR (log2 scale), 0.25 to 2.00. Series: BCAC-L, BCAC-S, UKBB.]

Odds ratios (ORs) are adjusted for the first 3 ancestry-specific principal components, age, family history, and study site. BCAC-L indicates Breast Cancer Association Consortium with large variant total; BCAC-S, Breast Cancer Association Consortium with small variant total; UKBB, UKBiobank.

When we examined the association by ER status, we found significant associations for both ER-positive and ER-negative breast cancers, although the observed effect size was larger for ER-positive compared with ER-negative breast cancer (eFigure 3 in the Supplement). The findings were nearly identical for both overall PRSs and PRSs optimized for each breast cancer subtype (eFigure 3 in the Supplement).

Association of PRS With Breast Cancer Risk in Women of African Ancestry

We examined the association of 5 previously developed PRSs: 3 based on women with European ancestry (BCAC-S, BCAC-L, and UKBB) and 2 developed in women of African ancestry (ROOT and WHI-AA) in 3801 women with African ancestry (including 274 cases). We found statistically significant associations for the 3 PRS models based on women with European ancestry and breast cancer risk with average ORs per SD of the PRS ranging from 1.15 (95% CI, 1.03-1.30) to 1.19 (95% CI, 1.04-1.35), but not for PRSs based on women with African ancestry (Figure 1). Compared with women with European ancestry, we observed lower AUCs in women with African ancestry (BCAC-L: AUC, 0.55; 95% CI, 0.51-0.58; BCAC-S: AUC, 0.53; 95% CI, 0.50-0.57; UKBB: AUC, 0.55; 95% CI, 0.52-0.59) (eTable 4 in the Supplement). The AUCs for PRSs developed in women with African ancestry were 0.52 (95% CI, 0.48-0.55) for ROOT and 0.50 (95% CI, 0.47-0.54) for WHI-AA.

Association of PRS With Breast Cancer Risk in Latinx Women

We examined the association of 5 PRSs (BCAC-S, BCAC-L, UKBB, WHI-LA, and LATINAS), 2 of which were developed in or adapted to women with Latinx ancestry (WHI-LA and LATINAS) in 2196 Latinx women (including 147 cases). For Latinx women, we observed a statistically significant association for overall breast cancer risk for 3 of the PRSs examined (BCAC-L, UKBB, LATINAS), with ORs per SD ranging from 1.20 (95% CI, 1.01-1.42) to 1.31 (95% CI, 1.09-1.58) (Figure 1). Compared with women with European ancestry, we found lower AUCs in women with Latinx ancestry for BCAC-L, BCAC-S, and UKBB (with AUCs ranging from 0.53 to 0.56) (eTable 4 in the Supplement). The AUCs for PRSs developed in women with Latinx ancestry were 0.54 (95% CI, 0.47-0.62) for LATINAS and 0.48 (95% CI, 0.43-0.53) for WHI-LA.

Estimation of Absolute Risk of Breast Cancer

As shown in Figure 3, there were differences in cumulative absolute breast cancer risk by risk categories of PRS for women with European, African, and Latinx ancestries when individuals were grouped into tertiles of the PRS distribution. When we compared those in the high PRS risk category with those at the low risk, women with European ancestry had larger risk gradients than women with African and Latinx ancestries. For example, women with European, African, and Latinx ancestries in the low PRS risk category had a cumulative breast cancer risk of 6.5%, 7.6%, and 6.1%, respectively, by age 80 years, whereas women in the high PRS risk category had 19.6%, 12.6%, and 13.5% cumulative risk, respectively (Figure 3).

Discussion

For PRSs developed in cohorts of women with European ancestry (UKBB, BCAC-L, BCAC-S), we replicated associations for increased breast cancer risk in women with European ancestry, although the ORs we observed in our study were smaller in magnitude than those in the original studies (eTable 5 in the Supplement). For example, BCAC-L had an OR of 1.40 in women with European ancestry compared with an OR of 1.66 (95% CI, 1.61-1.70) reported in the original study.2 This smaller magnitude might be explained by the reduced variant set, caused by the genotype platform discrepancy between the eMERGE network and published studies.

It is important to note that our study had a limited sample size of women with non-European ancestries, despite using a large resource like eMERGE. Similar to other studies investigating the generalizability of PRSs in cohorts of women with European and non-European ancestry, we found that European ancestry–based PRS models generalized well in women with Latinx and African ancestry, but with attenuated associations observed in women with African ancestry, as reported in a recent study.18 This is likely due to Latinx individuals in the US having a greater proportion of European ancestry than individuals with African ancestry.19 Previous work showed that Latinx individuals in the eMERGE cohort have a complex genetic admixture, with its principal component-based substructure centered mainly on the European samples and arms extending into the African and Asian groups.8 As such, the association detected for Latinx women in our cohort is likely driven by the proportion of underlying European ancestry. Future studies are needed to examine this association in different Latinx groups with greater African ancestry (eg, Caribbean groups) and Native American ancestry (eg, Central American groups).

Given the Eurocentricity of genomic studies, the smaller effect sizes for European ancestry–based PRSs with breast cancer risk in women with African ancestry in our study are not surprising and are consistent with PRS performance in non-European cohorts for other diseases20-23 and a large study examining European ancestry–based PRS in over 19 000 women with African ancestry, including over 9000 cases of breast cancer.18 While our power analysis suggests we have limited power (58%) to recover the signal detected by the original African ancestry–based PRS (OR, 1.23), we did have 100% power to detect an association in women with African ancestry if the European ancestry–based PRS can generalize as well in women with African ancestry (ie, if OR equaled or exceeded 1.61). The flip-flop phenomenon, in which a variant is a risk factor in 1 population but protective in another, has been observed among approximately 30% to 40% of variants across studies.14 Although the ROOT model used in our study only consisted of variants with the effect size in the same direction among women with European and African ancestries, it did not generalize well in the women with African ancestry in the eMERGE network. The poor generalizability may also be partly explained by differences in risk allele frequencies and LD patterns among diverse ancestries.23
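The flip-flop phenomenon described above can be made concrete with a small sketch that counts shared variants whose per-allele log odds ratios have opposite signs in two populations. This is only an illustration: the function name `flip_flop_fraction`, the variant IDs, and the effect sizes are hypothetical, and a real analysis would also need to confirm that both studies report effects for the same allele.

```python
from typing import Dict


def flip_flop_fraction(beta_pop1: Dict[str, float],
                       beta_pop2: Dict[str, float]) -> float:
    """Fraction of shared variants whose effect directions disagree.

    Inputs map variant ID -> log odds ratio for the same effect allele
    (assumption: alleles are already harmonized between populations).
    """
    shared = beta_pop1.keys() & beta_pop2.keys()
    if not shared:
        raise ValueError("no variants shared between the two populations")
    flipped = [v for v in shared if beta_pop1[v] * beta_pop2[v] < 0]
    return len(flipped) / len(shared)


# Hypothetical effect sizes: "rs_b" changes direction between populations.
pop1 = {"rs_a": 0.10, "rs_b": -0.20, "rs_c": 0.30}
pop2 = {"rs_a": 0.05, "rs_b": 0.10, "rs_d": 0.20}
fraction = flip_flop_fraction(pop1, pop2)
```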

Among the European ancestry–based PRSs (BCAC-S, BCAC-L, and UKBB), UKBB achieved the largest effect size in the eMERGE cohort among women with European ancestry. Although UKBB used the same genome-wide association study (GWAS) summary statistics provided in the BCAC study, it developed and validated the PRS based on an independent, larger sample collected through UK Biobank, which may contribute to its stronger generalizability. Another possible explanation is that UKBB used a similar phenotype definition and its data were collected in the clinical setting using EMRs. However, our phenotype algorithm included women with ductal carcinoma in situ (DCIS), who have stage 0 or noninvasive breast cancer. Our sensitivity analysis suggested that defining cases excluding DCIS achieved a slightly higher OR (eTable 6 in the Supplement). Because DCIS cases often require definitive treatment with complete surgical resection, radiation therapy, and adjuvant hormonal therapy, we believe a validated PRS should also be able to make predictions for DCIS cases. Of note, some breast cancer risk prediction tools such as the Tyrer-Cuzick model24 account for both invasive and noninvasive breast cancer. Future PRS development work may consider including DCIS in the training sample.

Figure 3. Cumulative Risk of Breast Cancer From Birth Estimated Using UKBB Polygenic Risk Score Model in Women With European, African, and Latinx Ancestry

[Figure not reproduced. Panels: A, European ancestry; B, African ancestry; C, Latinx ancestry. Horizontal axis: age, y (20 to 80). Vertical axis: cumulative risk of breast cancer (0 to 0.20). Series: high risk, medium risk, low risk.]

For PRSs developed in non-European ancestry study populations (WHI-AA, WHI-LA, and ROOT) or adapted to non-European ancestry populations (LATINAS), we did not replicate the previously reported associations in the eMERGE cohort for women with Latinx or African ancestry, except for LATINAS in Latinx women. LATINAS is a multiethnic PRS that utilized effect sizes obtained from populations with European ancestry and further developed the PRS in a cohort of Latinx women, suggesting that combining training data from samples from individuals with European ancestry could improve the observed associations in non-European ancestry populations.25,26 We found that while 61 of 179 variants (34.1%) included in LATINAS were also included in the UKBB model, only 12 of 71 (16.9%) included in WHI-LA were included in the UKBB model. Because the PRSs developed in studies using populations of non-European ancestry are often based on much smaller GWAS cohorts, the uncertainty of the effect sizes used in those PRSs is larger, making their predictive power lower for populations with non-European ancestry.12,14 In addition, the PRSs based on individuals with non-European ancestry included fewer variants passing the statistical threshold because of the smaller sample size in the discovery GWAS cohort, which would possibly contribute further to their weaker generalizability. Of note is the limited sample size for women with Latinx or African ancestry in our study, so future studies with adequate power are warranted to evaluate PRS performance for these groups. Furthermore, even with an adequate sample size for populations with non-European ancestry, limitations inherent to the genotyping platforms used in GWAS27 can make this subpopulation optimization theoretically insufficient to reduce the bias if the subpopulation risk allele is not captured by the genotype platform, which is possible because many array designs are based on samples of populations with European ancestry. Moreover, previous findings that women with African ancestry have a 40% higher mortality rate,28 which is often attributed to later stage of diagnosis and related preventative health care barriers, underscore the urgent need to increase diversity in genomic studies so that future clinical applications of the PRS do not exacerbate existing health disparities.
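The variant overlap reported above (61 of 179 and 12 of 71) is a simple set comparison. A minimal sketch of that comparison follows; the function name and the variant identifiers are hypothetical placeholders, not the actual LATINAS, WHI-LA, or UKBB variant lists.

```python
def overlap_summary(prs_variants, reference_variants):
    """Count variants of one PRS that also appear in a reference PRS.

    Returns (shared count, total count in the first PRS, percentage shared
    rounded to 1 decimal place).
    """
    prs_set = set(prs_variants)
    if not prs_set:
        raise ValueError("prs_variants must contain at least one variant")
    shared = prs_set & set(reference_variants)
    percent_shared = round(100.0 * len(shared) / len(prs_set), 1)
    return len(shared), len(prs_set), percent_shared


if __name__ == "__main__":
    # Hypothetical variant identifiers, for illustration only.
    small_prs = ["rs0001", "rs0002", "rs0003", "rs0004"]
    reference_prs = ["rs0002", "rs0004", "rs0009"]
    print(overlap_summary(small_prs, reference_prs))

    # The percentages quoted in the text follow from the reported counts.
    print(round(100.0 * 61 / 179, 1), round(100.0 * 12 / 71, 1))
```

In practice, variants would need to be matched on a common identifier (for example, chromosome, position, and alleles on the same genome build) before such a comparison is meaningful.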

The eMERGE network29 and the All of Us Research Program30 are 2 programs actively involved in increasing recruitment of diverse patients to help address the gap. These EMR-derived cohorts provide a scalable approach to independently validate previously developed PRSs for different phenotypes across multiple clinical operation sites.31-33 We found similar magnitudes of PRS association in women with European ancestry across all study sites, except for Vanderbilt University (eFigure 4 in the Supplement). This difference might be related to the heterogeneity in the genotyping platforms and/or EMR systems.34,35 Breast cancer PRS models based on populations with non-European ancestry are still in development via large consortia studies, such as the Confluence Project,36 which aims to develop a large research resource including at least 300 000 breast cancer cases and 300 000 controls of different races/ethnicities by the confluence of existing GWAS and new genome-wide genotyping data.

Limitations

This study had several limitations. The small sample size of women with Latinx or African ancestry in our study is a limitation, particularly in being able to examine associations for women at the extreme ends of the PRS and by breast cancer subtype. Missing marker information was much more common in women in these groups than in women with European ancestry, and imputation is generally poorer in populations with non-European ancestry, potentially leading to important and unmeasurable biases. Also, while the eMERGE network is a rich and unique resource for this study, it is primarily focused on academic centers, and our findings may not be generalizable to patients in community practices. Additionally, our validation is based on PRSs constructed using a reduced variant set because of the genotype platform discrepancy between the eMERGE network and published studies. A variant in the original model can be excluded for multiple reasons, such as ambiguity (ie, those with complementary alleles, either C/G or A/T), low imputation quality, or allele mismatch. Theoretically, expected PRSs can be calculated for the full variant set in the original published PRS by taking the imputed probabilities for mismatched genotypes into consideration. However, given the low imputation quality for those mismatched genotypes we excluded in this study, the expected PRS for the full variant set could have a large variance, and as such, we did not conduct the calculation in our study. Our sensitivity analysis found that while using a more conservative imputation quality threshold (ie, imputation R2 > 0.8) significantly reduced the number of variants in the genotype data set, our results were largely unchanged (eFigure 5 and eTable 7 in the Supplement).
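The variant exclusions described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this study: the `Variant` record, the function names, the toy weights and dosages, and the choice to pass the imputation threshold as a required argument are all hypothetical. The score is the usual weighted sum of effect-allele dosages over the retained variants.

```python
from dataclasses import dataclass

# Allele pairs that read the same on both DNA strands (A/T and C/G).
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass(frozen=True)
class Variant:
    variant_id: str
    effect_allele: str
    other_allele: str
    weight: float          # per-allele log odds ratio from a published PRS
    imputation_r2: float   # imputation quality in the target data set


def is_strand_ambiguous(variant):
    """Return True for complementary allele pairs (A/T or C/G)."""
    alleles = frozenset((variant.effect_allele.upper(), variant.other_allele.upper()))
    return alleles in AMBIGUOUS_PAIRS


def retained_variants(variants, min_r2):
    """Drop strand-ambiguous variants and those at or below the R2 threshold."""
    return [
        v for v in variants
        if not is_strand_ambiguous(v) and v.imputation_r2 > min_r2
    ]


def polygenic_score(variants, dosages, min_r2):
    """Weighted sum of effect-allele dosages (0 to 2) over retained variants."""
    return sum(
        v.weight * dosages[v.variant_id]
        for v in retained_variants(variants, min_r2)
    )


if __name__ == "__main__":
    # Hypothetical variants and dosages for one individual.
    panel = [
        Variant("rs0001", "A", "G", 0.10, 0.95),
        Variant("rs0002", "A", "T", 0.20, 0.99),   # strand ambiguous
        Variant("rs0003", "C", "T", -0.05, 0.50),  # lower imputation quality
    ]
    dosages = {"rs0001": 1.0, "rs0002": 2.0, "rs0003": 1.5}
    print(polygenic_score(panel, dosages, min_r2=0.8))
```

Raising `min_r2` shrinks the retained variant set, which is the trade-off examined in the sensitivity analysis described above.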

Conclusions

In summary, we found PRS models based on populations with European ancestry were significantly associated with breast cancer risk in women with European ancestry in the eMERGE network. We also found that these PRSs generalized well to women with European and Latinx ancestry, and to a lesser degree to women with African ancestry, although further studies with larger sample sizes of women with African ancestry are needed. Additionally, we found that PRSs developed in small GWASs of populations with non-European ancestry did not generalize well in the respective ancestry group. Our results highlight the need to increase the inclusion of racially and ethnically diverse individuals, particularly individuals with African ancestry, in large-scale genomic studies. Until well-developed and validated PRSs for women with non-European ancestry become available, the current PRSs developed based on cohorts with European ancestry could be adapted for Latinx women, but not women with African ancestry, in clinical settings until additional data sets become available in this important and high-risk group.

ARTICLE INFORMATION

Accepted for Publication: May 23, 2021.

Published: August 4, 2021. doi:10.1001/jamanetworkopen.2021.19084

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Liu C et al. JAMA Network Open.

Corresponding Author: Cong Liu, PhD, Department of Biomedical Informatics, Columbia University, 622 W 168 St, PH-20, Rm 407, New York, NY 10032 ([email protected]).

Author Affiliations: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York (Liu, Hripcsak, Shang, Fasel, Weng); Department of Epidemiology, Columbia University Irving Medical Center, New York, New York (Zeinomar, Terry); Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey (Zeinomar); Department of Pediatrics, Columbia University Irving Medical Center, New York, New York (Chung); Department of Medicine, Columbia University Irving Medical Center, New York, New York (Kiryluk, Gharavi, Crew, Khan); National Human Genome Research Institute, Bethesda, Maryland (Manolio, Rowley); Department of Medicine, University of Washington, Seattle (Jarvik, Crane); Department of Population Health Sciences, Geisinger, Danville, Pennsylvania (Justice); Genomic Medicine Institute, Geisinger, Danville, Pennsylvania (Rahm); Department of Bioethics and Humanities, University of Washington, Seattle (Fullerton); Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts (Smoller); Kaiser Permanente Washington Health Research Institute, Seattle, Washington (Larson); Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota (Dikilitas); Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee (Wiesner, Bick).

Author Contributions: Dr Weng had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Liu and Zeinomar contributed equally to this work as first authors. Drs Terry and Weng contributed equally as senior authors.

Concept and design: Liu, Zeinomar, Kiryluk, Manolio, Weng.

Acquisition, analysis, or interpretation of data: Liu, Zeinomar, Chung, Kiryluk, Gharavi, Hripcsak, Crew, Shang, Khan, Fasel, Jarvik, Rowley, Justice, Rahm, Fullerton, Smoller, Larson, Crane, Dikilitas, Wiesner, Bick, Terry, Weng.

Drafting of the manuscript: Liu, Zeinomar, Gharavi, Manolio, Dikilitas.

Critical revision of the manuscript for important intellectual content: Liu, Zeinomar, Chung, Kiryluk, Hripcsak, Crew, Shang, Khan, Fasel, Manolio, Jarvik, Rowley, Justice, Rahm, Fullerton, Smoller, Larson, Crane, Dikilitas, Wiesner, Bick, Terry, Weng.

Statistical analysis: Liu, Zeinomar, Kiryluk, Khan, Bick, Terry.

Obtained funding: Kiryluk, Gharavi, Hripcsak, Jarvik, Rahm, Larson, Weng.

Administrative, technical, or material support: Liu, Hripcsak, Shang, Jarvik, Rowley, Justice, Rahm, Larson.

Supervision: Kiryluk, Crew, Jarvik, Bick, Terry, Weng.

Conflict of Interest Disclosures: Dr Gharavi reported receiving grants from Renal Research Institute and Natera, and reported service on the advisory board for Goldfinch Bio outside the submitted work. Dr Smoller reported receiving an honorarium for an internal seminar from Biogen, Inc outside the submitted work. No other disclosures were reported.

Funding/Support: The eMERGE Network was initiated and funded by National Human Genome Research Institute (NHGRI) through the following grants: U01HG006828 (Cincinnati Children's Hospital Medical Center and Boston Children's Hospital); U01HG006830 (Children's Hospital of Philadelphia); U01HG006389 (Essentia Institute of Rural Health, Marshfield Clinic Research Foundation, and Pennsylvania State University); U01HG006382 (Geisinger Clinic); U01HG006375 (Group Health Cooperative and the University of Washington); U01HG006379 (Mayo Clinic); U01HG006380 (Icahn School of Medicine at Mount Sinai); U01HG006388 (Northwestern University); U01HG006378 (Vanderbilt University Medical Center); and U01HG006385 (Vanderbilt University Medical Center serving as the coordinating center). This phase of the eMERGE network was initiated and funded by the NHGRI through the following grants: U01HG8657 (Group Health Cooperative/University of Washington); U01HG8685 (Brigham and Women's Hospital); U01HG8672 (Vanderbilt University Medical Center); U01HG6379 (Mayo Clinic); U01HG8679 (Geisinger Clinic); U01HG8680 (Columbia University Health Sciences); U01HG8684 (Children's Hospital of Philadelphia); U01HG8673 (Northwestern University); U01HG8701 (Vanderbilt University Medical Center serving as the Coordinating Center); U01HG8676 (Partners Healthcare and the Broad Institute); U54MD007593 (Meharry Translational Research Center); and U01HG8664 (Baylor College of Medicine). Drs Weng and Liu received additional support from National Library of Medicine/National Human Genomic Research Institute Grant R01LM012895. Dr Zeinomar was supported by the National Institutes of Health (NIH) National Center for Advancing Translational Sciences (NCATS) TL1 Training Program, grant No. TL1TR001875.

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Disclaimer: The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.

Additional Contributions: We would like to thank all the investigators and participants of the electronic Medical Records and Genomics (eMERGE) Network.

Additional Information: The database of Genotypes and Phenotypes accession number for the imputed genotype data reported in this paper is phs001584.v1.p1. The clinical data sets generated and analyzed during the current study contained personal health information and hence are not publicly available, but are available from the corresponding authors and the eMERGE consortium through authorized collaborations.

REFERENCES

1. Yanes T, Young M-A, Meiser B, James PA. Clinical applications of polygenic breast cancer risk: a critical review and perspectives of an emerging field. Breast Cancer Res. 2020;22(1):21. doi:10.1186/s13058-020-01260-3

2. Mavaddat N, Michailidou K, Dennis J, et al; ABCTB Investigators; kConFab/AOCS Investigators; NBCS Collaborators. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am J Hum Genet. 2019;104(1):21-34. doi:10.1016/j.ajhg.2018.11.002

3. Lakeman IMM, Rodríguez-Girondo M, Lee A, et al. Validation of the BOADICEA model and a 313-variant polygenic risk score for breast cancer risk prediction in a Dutch prospective cohort. Genet Med. 2020;22(11):1803-1811. doi:10.1038/s41436-020-0884-4

4. Jia G, Lu Y, Wen W, et al. Evaluating the utility of polygenic risk scores in identifying high-risk individuals for eight common cancers. JNCI Cancer Spectr. 2020;4(3):pkaa021. doi:10.1093/jncics/pkaa021

5. Lambert SA, Abraham G, Inouye M. Towards clinical utility of polygenic risk scores. Hum Mol Genet. 2019;28(R2):R133-R142. doi:10.1093/hmg/ddz187

6. Fritsche LG, Gruber SB, Wu Z, et al. Association of polygenic risk scores for multiple cancers in a phenome-wide study: results from the Michigan Genomics Initiative. Am J Hum Genet. 2018;102(6):1048-1061. doi:10.1016/j.ajhg.2018.04.001

7. Gottesman O, Kuivaniemi H, Tromp G, et al. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med. 2013;15(10):761-771. doi:10.1038/gim.2013.72

8. Stanaway IB, Hall TO, Rosenthal EA, et al; eMERGE Network. The eMERGE genotype set of 83 717 subjects imputed to ~40 million variants genome wide and association with the herpes zoster medical record phenotype. Genet Epidemiol. 2019;43(1):63-81.

9. World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013;310(20):2191-2194. doi:10.1001/jama.2013.281053

10. Fossey R, Kochan D, Winkler E, et al. Ethical considerations related to return of results from genomic medicine projects: the eMERGE Network (Phase III) experience. J Pers Med. 2018;8(1):2. doi:10.3390/jpm8010002

11. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50(9):1219-1224. doi:10.1038/s41588-018-0183-z

12. Shieh Y, Fejerman L, Lott PC, et al; COLUMBUS Consortium. A polygenic risk score for breast cancer in US Latinas and Latin American women. J Natl Cancer Inst. 2020;112(6):590-598. doi:10.1093/jnci/djz174

13. Allman R, Dite GS, Hopper JL, et al. SNPs and breast cancer risk prediction for African American and Hispanic women. Breast Cancer Res Treat. 2015;154(3):583-589. doi:10.1007/s10549-015-3641-7

14. Wang S, Qian F, Zheng Y, et al. Genetic variants demonstrating flip-flop phenomenon and breast cancer risk prediction among women of African ancestry. Breast Cancer Res Treat. 2018;168(3):703-712. doi:10.1007/s10549-017-4638-1

15. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi:10.1186/s13742-015-0047-8

16. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559-575. doi:10.1086/519795

17. Pal Choudhury P, Maas P, Wilcox A, et al. iCARE: An R package to build, validate and apply absolute risk models. PLoS One. 2020;15(2):e0228198. doi:10.1371/journal.pone.0228198

18. Du Z, Gao G, Adedokun B, et al; GBHS Study Team. Evaluating polygenic risk scores for breast cancer in women of African ancestry. J Natl Cancer Inst. 2021;djab050. doi:10.1093/jnci/djab050

19. Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet. 2015;96(1):37-53. doi:10.1016/j.ajhg.2014.11.010

20. Duncan L, Shen H, Gelaye B, et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun. 2019;10(1):3328. doi:10.1038/s41467-019-11112-0

21. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51(4):584-591. doi:10.1038/s41588-019-0379-x

22. Chande AT, Rishishwar L, Conley AB, Valderrama-Aguirre A, Medina-Rivas MA, Jordan IK. Ancestry effects on type 2 diabetes genetic risk inference in Hispanic/Latino populations. BMC Med Genet. 2020;21(suppl 2):132. doi:10.1186/s12881-020-01068-0

23. Fujiwara T, Yamamoto Y, Kim JD, Buske O, Takagi T. PubCaseFinder: a case-report-based, phenotype-driven differential-diagnosis system for rare diseases. Am J Hum Genet. 2018;103(3):389-399. doi:10.1016/j.ajhg.2018.08.003

24. Valero MG, Zabor EC, Park A, et al. The Tyrer–Cuzick model inaccurately predicts invasive breast cancer risk in women with LCIS. Ann Surg Oncol. 2020;27(3):736-740. doi:10.1245/s10434-019-07814-w

25. Márquez-Luna C, Loh PR, Price AL; South Asian Type 2 Diabetes (SAT2D) Consortium; SIGMA Type 2 Diabetes Consortium. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol. 2017;41(8):811-823. doi:10.1002/gepi.22083

26. Coram MA, Fang H, Candille SI, Assimes TL, Tang H. Leveraging multi-ethnic evidence for risk assessment of quantitative traits in minority populations. Am J Hum Genet. 2017;101(4):638. doi:10.1016/j.ajhg.2017.09.005

27. De La Vega FM, Bustamante CD. Polygenic risk scores: a biased prediction? Genome Med. 2018;10(1):100. doi:10.1186/s13073-018-0610-x

28. DeSantis CE, Ma J, Gaudet MM, et al. Breast cancer statistics, 2019. CA Cancer J Clin. 2019;69(6):438-451. doi:10.3322/caac.21583

29. McCarty CA, Chisholm RL, Chute CG, et al; eMERGE Team. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011;4:13. doi:10.1186/1755-8794-4-13

30. Denny JC, Rutter JL, Goldstein DB, et al; All of Us Research Program Investigators. The “all of us” research program. N Engl J Med. 2019;381(7):668-676. doi:10.1056/NEJMsr1809937

31. Mosley JD, Feng Q, Wells QS, et al. A study paradigm integrating prospective epidemiologic cohorts and electronic health records to identify disease biomarkers. Nat Commun. 2018;9(1):3522. doi:10.1038/s41467-018-05624-4

32. Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nat Rev Genet. 2020;21(8):493-502. doi:10.1038/s41576-020-0224-1

33. Bowton E, Field JR, Wang S, et al. Biobanks and electronic medical records: enabling cost-effective research. Sci Transl Med. 2014;6(234):234cm3. doi:10.1126/scitranslmed.3008604

34. Crosslin DR, Tromp G, Burt A, et al; electronic Medical Records and Genomics (eMERGE) Network. Controlling for population structure and genotyping platform bias in the eMERGE multi-institutional biobank linked to electronic health records. Front Genet. 2014;5:352. doi:10.3389/fgene.2014.00352

35. Zuvich RL, Armstrong LL, Bielinski SJ, et al. Pitfalls of merging GWAS data: lessons learned in the eMERGE network and quality control procedures to maintain high data quality. Genet Epidemiol. 2011;35(8):887-898. doi:10.1002/gepi.20639

36. Confluence Project. National Cancer Institute. Accessed July 6, 2021. https://dceg.cancer.gov/research/cancer-types/breast-cancer/confluence-project

SUPPLEMENT.
eTable 1. Female Breast Cancer Diagnostic Codes
eTable 2. Breast Cancer History Codes
eTable 3. Hormone Therapy Drugs for Breast Cancer
eTable 4. The AUC (95% CI) for Breast Cancer Prediction Using PRS as a Single Predictor in Women of Different Ancestries
eTable 5. Associations of PRS With Breast Cancer in the Original Studies for EA Women
eTable 6. The OR (95% CI) of Breast Cancer per Standard PRS Unit Increase for Women of Different Ancestries in the eMERGE Cohorts
eTable 7. The Different Factors Can Affect the Reducing of the Number of SNPs
eFigure 1. Breast Cancer Phenotyping Workflow
eFigure 2. The Association of Overall Breast Cancer (BC) for European Ancestry Women in the Highest 20%, 10%, 5%, and 1% of PRS Risk, Compared to Women With Lower PRS Risk (PRS Percentiles <80, 90, 95, and 99, Respectively)
eFigure 3. The Association Between Different PRSs and Estrogen Receptor (ER)-positive and ER-negative Breast Cancer for Women of European Ancestry
eFigure 4. The Association of the Different PRSs and Breast Cancer (BC) Risk in Women of European, African, and Latinx Ancestries, Stratified by eMERGE Study Sites
eFigure 5. The Association Between Different PRSs Generated by Various Filter Stacks and Overall Breast Cancer for Women
eMethods.
eReferences.
