Genetic Epidemiology 34 : 396–406 (2010)

Detection of SNP-SNP Interactions in Trios of Parents with Schizophrenic Children

Qing Li,1 M. Daniele Fallin,1,2 Thomas A. Louis,1 Virginia K. Lasseter,3 John A. McGrath,3 Dimitri Avramopoulos,4 Paula S. Wolyniec,3 David Valle,4 Kung-Yee Liang,1 Ann E. Pulver,3 and Ingo Ruczinski1*

1 Department of Biostatistics, Bloomberg School of Public Health, Baltimore, Maryland

2 Department of Epidemiology, Bloomberg School of Public Health, Baltimore, Maryland

3 Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, Maryland

4 McKusick-Nathans Institute of Genetic Medicine, School of Medicine, Johns Hopkins University, Baltimore, Maryland

Schizophrenia (SZ) is a heritable and complex psychiatric disorder with an estimated worldwide prevalence of about 1%. Research on the risk factors for SZ has thus far yielded few clues to causes, but has pointed to a heterogeneous etiology that likely involves multiple genes and gene-environment interactions. In this manuscript, we apply a novel method (trio logic regression, Li et al., 2009) to case-parent trio data from a SZ candidate gene study conducted on families of Ashkenazi Jewish descent, and demonstrate the method’s ability to detect multi-gene models for SZ risk in the family-based design. In particular, we demonstrate how this method revealed a genotype-phenotype association that includes an allele without marginal effect. Genet. Epidemiol. 34:396–406, 2010. © 2010 Wiley-Liss, Inc.

Key words: case-parent trios; interaction; logic regression; single nucleotide polymorphisms

Contract grant sponsor: National Institute of Mental Health (NIMH); Contract grant numbers: R01MH057314; R01MH58153; Contract grant sponsor: NIH; Contract grant number: R01 DK061662; Contract grant sponsor: National Institute of Diabetes, Digestive and Kidney Diseases; Contract grant number: R01 HL090577; Contract grant sponsor: CTSA.

*Correspondence to: Ingo Ruczinski, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205. E-mail: [email protected]

Received 3 August 2009; Revised 24 October 2009; Accepted 23 November 2009

Published online 21 June 2010 in Wiley InterScience (www.interscience.wiley.com).

DOI: 10.1002/gepi.20488

INTRODUCTION

Schizophrenia (SZ) is a complex psychiatric disorder with an estimated worldwide prevalence of 1%. The diagnosis and classification of SZ rely on observations of disease-related symptoms, including delusions, hallucinations, bizarre behavior, disorganized speech, flattened affect and lack of motivation. Research on the risk factors for SZ has thus far yielded few clues to causes, but has pointed to a heterogeneous etiology that likely involves multiple genes and gene-environment interactions. From very early on, segregation analysis has shown that SZ is heritable but does not follow a simple Mendelian pattern [see Baron, 1986; Carter and Chung, 1980; Risch and Baron, 1984]. To date, over 2000 papers on the genetics of SZ have been published (Schizophrenia Forum 2009, http://www.schizophreniaforum.org/) and about 150 genes have been implicated; however, for each gene identified there exists at least one paper reporting no linkage or association.

Regardless of whether candidate gene or genome-wide scan approaches are used, few studies have investigated the simultaneous effect of two or more genetic factors on SZ susceptibility, despite evidence suggesting that SZ risk may be a function of a set of genes rather than a single one [for some reviews see Chiu et al., 2002; Gogos and Gerber, 2006; Harrison and Owen, 2003; Jurewicz et al., 2001]. The multi-gene mechanism may act through gene-gene interaction or via additive effects of independent genetic causes. Either scenario can partially explain two common observations in genetic association analyses of SZ: no single gene or marker is found to have a large effect size, and each finding has been difficult to replicate across studies. Thus, methods that accommodate multiple genes simultaneously are essential when searching for SZ genetic risk factors.

The advantages of family-based designs include greater robustness against population stratification [Spielman and Ewens, 1996], direct observation of parental transmission, easier detection of Mendelian errors and de novo deletions, and the option to investigate parent-of-origin effects [Benyamin et al., 2009; Laird and Lange, 2006; Weinberg et al., 1998]. However, while several methods are available to identify gene-gene interactions in population-based studies (such as cohort and case-control studies), this task is challenging under family-based designs. We recently introduced a novel statistical method to detect SNP-SNP interactions in trios with affected probands [Li et al., 2009], using an extension of the logic regression approach [Kooperberg et al., 2001; Ruczinski et al., 2003]. In this manuscript, we describe and employ this case-parent trio-based logic regression approach to identify gene-gene interactions for SZ. Our approach can detect high-order interactions involving multiple independent SNPs while accounting for linkage disequilibrium (LD) among SNPs in a family data set. We evaluate and apply the proposed trio logic regression method to trio data from a SZ candidate gene study conducted on families of Ashkenazi Jewish descent [Fallin et al., 2005], and demonstrate the method’s ability to detect multi-gene models for SZ risk in the family-based design.

METHODS

TRIO LOGIC REGRESSION

Trio logic regression is an adaptation of the logic regression method [Kooperberg and Ruczinski, 2005; Ruczinski et al., 2003], which was originally developed for case-control data. The approach searches for the combinations of risk factors, represented as Boolean statements, that best explain differences between cases and controls. We have extended the method to identify high-order interactions in a case-parent trio design, i.e., to find the risk factor combinations that best distinguish transmitted from non-transmitted genotypes. Below we briefly outline the approach, and then apply the method to trio data from a SZ candidate gene study conducted on families of Ashkenazi Jewish descent [Fallin et al., 2005].

We assume only two risk groups in the population, defined by a combination of SNP genotypes. Each SNP is typically represented as two binary variables, one in dominant and one in recessive coding. Combined with the Boolean operators and, or, and not, these variables form the genotype pattern G that defines the high-risk group (“carriers”); individuals who do not match this pattern form the low-risk group (“non-carriers”). The underlying disease association model for the interaction can be written as follows:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta I_G, \qquad (1)$$

where $p$ is the probability of disease and $I_G$ is the indicator function (equal to one if G is true, and zero otherwise). Thus, $\alpha$ determines the log odds of disease among non-carriers, and $\beta$ determines the log odds ratio for carriers vs. non-carriers. We use the symbol $\vee$ for the Boolean operator or, $\wedge$ for and, and an overbar (e.g., $\overline{\mathrm{SNP}^D}$) for the Boolean complement. We denote the SNP coding by the superscripts D (dominant) and R (recessive). For example, $(\mathrm{SNP}_1^R \wedge \mathrm{SNP}_{15}^D)$ states that subjects with two variant alleles at SNP 1 and at least one variant allele at SNP 15 are at higher risk (assuming $\beta > 0$).
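The dominant/recessive coding and the Boolean pattern G in model (1) can be illustrated with a short Python sketch. It is not the trio logic regression implementation: the function names (`dominant`, `recessive`, `carrier_example`, `disease_prob`) are hypothetical, and genotypes are assumed to be coded as the number of variant alleles (0, 1, 2).

```python
import math

# Assumption: genotypes are coded as the number of variant alleles (0, 1 or 2).

def dominant(g: int) -> bool:
    """Dominant coding (superscript D): at least one variant allele."""
    return g >= 1

def recessive(g: int) -> bool:
    """Recessive coding (superscript R): two variant alleles."""
    return g == 2

def carrier_example(genotypes: dict[int, int]) -> bool:
    """Example pattern G = (SNP_1^R AND SNP_15^D); genotypes keyed by SNP index."""
    return recessive(genotypes[1]) and dominant(genotypes[15])

def disease_prob(alpha: float, beta: float, carrier: bool) -> float:
    """Invert model (1): p = 1 / (1 + exp(-(alpha + beta * I_G)))."""
    eta = alpha + beta * (1 if carrier else 0)
    return 1.0 / (1.0 + math.exp(-eta))

print(carrier_example({1: 2, 15: 1}))  # True: two variants at SNP 1, one at SNP 15
print(carrier_example({1: 1, 15: 2}))  # False: only one variant allele at SNP 1
# The complement of a dominant-coded SNP is true only for the common homozygote.
print([not dominant(g) for g in (0, 1, 2)])  # [True, False, False]
```

Representing each SNP by two Boolean variables is what lets a single search over and/or/not expressions cover dominant, recessive, and mixed genetic models at once.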

Our method requires major modifications to the original logic regression. Foremost, the trio data must be set up for conditional logistic regression with a 1:3 matching ratio: the case genotype set vs. the 3 other possible Mendelian genotype realizations, given the parents. Further, the permutation tests used to determine the optimal interaction model size have to be modified for this setting, and computational challenges, such as non-convergence of the model search when dealing with family data and correlated SNP data, have to be addressed. To generate the matched data for analysis, our method extends the concept of the “pseudo-control” (i.e., the set of other possible Mendelian realizations of genotype sets for a child, given the parental genotypes) to multiple markers, while adjusting for natural LD among markers. For unlinked markers, the number of pseudo-control genotype combinations grows exponentially with the number of markers, since all locus combinations are possible ($3^n$ pseudo-controls for n SNPs in the Boolean-defined risk set). This would require a 1:9 matched set for two-SNP models, a 1:27 matched set for three-SNP models, etc. To avoid this dimensionality problem, we restrict all analyses to 1:3 matching by choosing a random sample of three multi-locus pseudo-control combinations per model, regardless of the full dimensionality. To generate one pseudo-control serial genotype for a case, we randomly sample, for each marker, one of the three single-marker pseudo-control genotypes without replacement, and combine these into a series of genotypes that represents one realization of the possible pseudo-controls. In contrast, for markers in tight LD (e.g., markers belonging to the same haplotype block), pseudo-controls are first generated for an entire LD block and then sampled as a unit. As in the original logic regression approach, trio logic regression also requires complete data [various imputation approaches suitable for logic regression have been described in Dai et al., 2006].
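The single-marker pseudo-control construction and the per-marker sampling without replacement for unlinked markers can be sketched in Python as follows. This is a minimal illustration under stated assumptions (biallelic SNPs coded 0/1/2, complete and Mendelian-consistent genotypes, no LD between the markers); it does not implement the block-wise sampling used for markers in tight LD, and all function names are hypothetical.

```python
import random

def alleles(g: int) -> tuple[int, int]:
    """Split a genotype (number of variant alleles: 0, 1, 2) into two alleles."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]

def pseudo_controls(father: int, mother: int, child: int) -> list[int]:
    """The three single-SNP pseudo-control genotypes of one trio.

    The four equally likely parental transmissions are enumerated, and one
    copy of the genotype matching the child (the case) is removed.
    """
    possible = [a + b for a in alleles(father) for b in alleles(mother)]
    if child not in possible:
        raise ValueError("Mendelian inconsistency between child and parents")
    possible.remove(child)  # removes a single copy of the case genotype
    return possible

def multilocus_pseudo_controls(father, mother, child, rng: random.Random):
    """Three multi-locus pseudo-controls, assuming the markers are unlinked.

    For each marker the three pseudo-control genotypes are drawn without
    replacement (a random permutation); the k-th multi-locus pseudo-control
    takes the k-th draw at every marker.
    """
    per_marker = []
    for f, m, c in zip(father, mother, child):
        pcs = pseudo_controls(f, m, c)
        rng.shuffle(pcs)
        per_marker.append(pcs)
    return [[pcs[k] for pcs in per_marker] for k in range(3)]

rng = random.Random(2009)
print(pseudo_controls(1, 1, 2))  # [0, 1, 1]
print(multilocus_pseudo_controls([1, 0], [1, 2], [2, 1], rng))
```

Sampling the per-marker permutations independently is only justified when markers are unlinked; for markers in one LD block, the text above samples whole-block pseudo-controls instead.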

In the R package trio we have implemented a haplotype-based approach specifically to impute missing genotypes in data suitable for trio logic regression. In brief, the imputation method takes the LD block structure and the phase information into account, and is based on the observed trio genotype data after Mendelian errors have been removed or resolved. The enumeration of all possible haplotype combinations for the trios can be prohibitive, but an efficient approach that avoids deriving the entire mating table has been implemented. Instances where all genotypes of one or more subjects in the trio are missing (for example, if the father of the proband was unknown or not genotyped) can be accommodated. The completed trios are then used to generate the pseudo-controls, resulting in a data set suitable for trio logic regression [see Li et al., 2009 for more details].

When fitting a trio logic regression, one can specify the model size, i.e., the maximum number of binary factors (here, SNPs) that can be included in the final interaction term (the definition of the high-risk carriers). In addition to an interaction term expressed as a Boolean combination of susceptibility genotypes in their dominant or recessive coding, trio logic regression also returns the estimated log odds ratio for the Boolean risk set, as well as the associated score as a measure of model fit. Before settling on an optimal model in terms of SNP membership, we must first determine the appropriate number of markers to include in the interaction model, by fitting several models of different sizes and then using permutation tests to decide on the optimal model size. The permutation test is conducted by permuting the case and pseudo-control serial genotypes within each matched set after conditioning on a specific fitted logic model. For example, one begins by fitting the best single-SNP model, then examines the distribution of permutation results conditioning on this fitted logic model (size 1) to see how much more variability can be explained beyond that fitted model. Then one fits the best size 2 model and examines the distribution of permutation results conditioning on that fitted model, and so on, until a larger model size no longer explains additional variability in the data. Once the model size is chosen in this way, the logic model of that size with the lowest deviance in the conditional logistic regression analysis is selected as the optimal overall interaction model for the data set [see Li et al., 2009 for more details].
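For a fixed Boolean expression, each fit reduces to a conditional logistic regression on 1:3 matched sets with the indicator $I_G$ as the single covariate. The Python sketch below (Newton-Raphson on the conditional log-likelihood) shows where a log odds ratio estimate and a deviance come from in that setting. It assumes $I_G$ has already been evaluated for each case and its three pseudo-controls, it does not perform the simulated-annealing search over Boolean expressions, and all names are hypothetical.

```python
import math
from typing import Sequence

# One matched set: (I_G for the case, [I_G for each of its three pseudo-controls]).
MatchedSet = tuple[int, Sequence[int]]

def cond_loglik(beta: float, sets: Sequence[MatchedSet]) -> float:
    """Conditional log-likelihood: sum_j [beta*x_case - log sum_k exp(beta*x_k)]."""
    total = 0.0
    for case_x, controls in sets:
        xs = [case_x, *controls]
        total += beta * case_x - math.log(sum(math.exp(beta * x) for x in xs))
    return total

def fit_beta(sets: Sequence[MatchedSet], tol: float = 1e-10, max_iter: int = 100) -> float:
    """Newton-Raphson estimate of beta; assumes the estimate is finite."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for case_x, controls in sets:
            xs = [case_x, *controls]
            w = [math.exp(beta * x) for x in xs]
            total_w = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total_w
            mean_sq = sum(wi * x * x for wi, x in zip(w, xs)) / total_w
            score += case_x - mean          # first derivative contribution
            info += mean_sq - mean * mean   # minus second derivative contribution
        if info <= 0.0:
            raise ValueError("no informative matched sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def deviance(sets: Sequence[MatchedSet]) -> float:
    """-2 times the maximized conditional log-likelihood."""
    return -2.0 * cond_loglik(fit_beta(sets), sets)

# Toy data: 2 sets where the case is the only carrier, 3 where one pseudo-control is.
toy = [(1, [0, 0, 0])] * 2 + [(0, [1, 0, 0])] * 3
beta_hat = fit_beta(toy)  # closed form here: exp(beta) = 3 * 2 / 3, so beta = log 2
print(round(beta_hat, 4), round(deviance(toy), 4))
```

Comparing such deviances across candidate Boolean expressions of the same size is the selection criterion named in the text; the search that proposes those expressions is not shown.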

SZ CASE-PARENT TRIO ANALYSIS

We have applied this method to a SZ candidate gene study [Fallin et al., 2005] consisting of 375 SNPs in 55 genes, typed in each of 312 trios with affected cases. The data were collected on trios with at least one child diagnosed with SZ or schizoaffective (SZA) disorder, among participants of Ashkenazi Jewish descent. Details about data ascertainment are described in the previous analysis paper [Fallin et al., 2005]. Out of the original 440 SNPs, we excluded 65 SNPs on chromosome 22 due to a large proportion of missing data and Mendelian inheritance errors. In addition, we excluded nine trios with more than 90% missing parental genotype data.

SIMULATIONS

To evaluate the performance of trio logic regression, especially within the context of our SZ trio data, we conducted a simulation study addressing two issues. First, we show how often our proposed method correctly identifies susceptibility loci under a disease risk model with a high-order interaction of up to three SNPs. Second, we show the probability of detecting signals similar to those found in our SZ data analysis.

The simulations generate values that reflect the genetic variation observed in our SZ data. From the founders’ genotypes at the 375 SNPs, we first estimated haplotype blocks and frequencies using the software Haploview 4.1 [Barrett et al., 2005]. To be consistent with the method used to determine the block structure in the previous study [Fallin et al., 2005], we continued to rely on $D'$ to measure pairwise LD and on the solid spine method to determine the blocks, setting the cut-off value for partitioning blocks to $D' > 0.80$. Because of computational limitations, trio logic regression requires that haplotype blocks contain no more than seven loci. Therefore, several large blocks were divided into two blocks depending on the LD among markers. A total of 152 haplotype blocks were identified, and the remaining 53 SNPs, which were not located in haplotype blocks, were treated as single markers. We used these haplotype block structures, haplotype frequencies, and allele frequencies as generating values for our simulated trios.
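Pairwise LD in the block definition above is measured with $D'$. As a reminder of how that quantity is computed, the Python sketch below derives Lewontin's $|D'|$ from the four haplotype frequencies of two biallelic SNPs. It is a standalone illustration, not Haploview's code: it covers only the pairwise statistic and the 0.80 cut-off, not Haploview's block-partitioning rule, and the function names are hypothetical.

```python
def d_prime(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """Lewontin's |D'| for two biallelic SNPs from haplotype frequencies.

    Alleles A/a at the first SNP and B/b at the second; frequencies must sum to 1.
    """
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab        # allele frequency of A
    p_B = p_AB + p_aB        # allele frequency of B
    d = p_AB - p_A * p_B     # raw disequilibrium coefficient D
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return 0.0 if d_max == 0 else abs(d) / d_max

def strong_ld(p_AB, p_Ab, p_aB, p_ab, cutoff: float = 0.80) -> bool:
    """True if |D'| exceeds the cut-off used for the block definition."""
    return d_prime(p_AB, p_Ab, p_aB, p_ab) > cutoff

print(d_prime(0.3, 0.0, 0.0, 0.7))      # complete LD: only AB and ab haplotypes
print(d_prime(0.06, 0.14, 0.24, 0.56))  # independence: p_AB = p_A * p_B
```

Normalizing D by its maximum attainable value is what makes $D'$ comparable across SNP pairs with different allele frequencies, which is why a single cut-off can be applied throughout.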

When specifying two-SNP and three-SNP interaction models of risk, we used interaction structures and effect sizes in the range of those observed in our SZ data. All SNPs included in a particular interaction scenario were independent (e.g., from different haplotype blocks), and none of the risk SNPs had more than three other SNPs in high LD with it. For each interaction model, we set the population risk among the non-carriers to 1% and 0.1%, respectively. For each of those values, the odds of being affected in the carrier group were set to two, three, and four times those of the non-carrier group. Therefore, for each of the interaction terms we had six sub-scenarios, determined by the prevalence-by-effect size combinations (Table I). For each sub-scenario, we generated 1,000,000 trios in total, from which 300 or 500 trios were randomly selected to form a data replicate. This was repeated 2,500 times. For each of the 2,500 data replicates per sub-scenario, we ran logic regression with assumed model sizes (i.e., the number of variables in the Boolean term) of 1–5.
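The generating values in Table I follow directly from model (1): $\alpha$ is the logit of the disease risk among non-carriers and $\beta$ is the log of the odds ratio. The Python sketch below reproduces those values and also reports the implied risk among carriers. The function names are hypothetical, and the code does not simulate trios.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def scenario_params(baseline_risk: float, odds_ratio: float) -> tuple[float, float, float]:
    """Return (alpha, beta, risk among carriers) for logit(p) = alpha + beta * I_G."""
    alpha = logit(baseline_risk)   # log odds of disease among non-carriers
    beta = math.log(odds_ratio)    # log odds ratio, carriers vs. non-carriers
    return alpha, beta, expit(alpha + beta)

for p0 in (0.010, 0.001):
    for odds_ratio in (2, 3, 4):
        alpha, beta, p1 = scenario_params(p0, odds_ratio)
        print(f"P(D|I_G=0)={p0:.3f} alpha={alpha:.1f} "
              f"OR={odds_ratio} beta={beta:.2f} P(D|I_G=1)={p1:.4f}")
```

For a rare disease, the carrier risk is close to the baseline risk times the odds ratio (about 0.0198 for a 1% baseline and OR = 2).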

We note that, in principle, one can obtain a P-value for the parameter of the Boolean term in a logic model by simply fitting the regression model (conditional logistic regression in this case) with the identified Boolean expression as the covariate. However, this P-value reflects nothing about the model search or whether the method selects the true risk factors (in particular, the model search procedure is not based on hypothesis testing, but on optimizing an objective function such as the likelihood). Therefore, instead of relying on quantities such as the type I error rate and power, we evaluate the performance of our trio logic regression approach using “correct pick rates”, which measure how often logic regression identifies the SNPs involved in the simulated interaction.

In general, a logic regression analysis begins by fitting a series of logic models with model sizes from 1 to a pre-determined size n. Once the optimal model size m is chosen (via permutation tests), the “best” interaction model of size m is identified as the combination that best describes the association between predictors and response (e.g., the lowest deviance in conditional logistic regression). We consider a logic regression procedure “successful” for a given data set if the best interaction model of size m contains the true susceptibility SNPs. However, the permutation tests for model selection are the computationally most demanding step in a logic regression analysis and are prohibitive in a simulation setup such as the one described above. Thus, we fitted logic regression models of various sizes and calculated pick rates, as described in greater detail below.

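As a concrete example, for a true interaction of two SNPs, we define two types of correct pick rates in the simulations: (a) one based on all data sets that identify both susceptibility SNPs and (b) one based on all data sets that identify at least one of the susceptibility SNPs. We calculate these two types of pick rates for various a priori model size choices. For example, if the user had chosen a model size of 2, the correct pick rate for identifying at least one of two true SNPs (pick rate definition b above) is calculated as the percentage of simulated data set replicates in which at least one of the susceptibility SNPs is contained in the best size 2 interaction model of the logic regression analysis. For a chosen optimal model of size 3, the correct pick rate for identifying at least one of two true SNPs is calculated as the percentage of simulated data set replicates in which at least one of the susceptibility SNPs is contained in the best size 3 interaction model of the logic regression analysis, and so on.

TABLE I. The parameters and setup for the simulation study described in the Methods section. The SNP coding is indicated by the superscripts D (dominant) and R (recessive). We use the symbol $\vee$ for the Boolean operator or, and an overbar for the Boolean complement. The disease risk model is assumed to be $\mathrm{logit}(p) = \alpha + \beta I_G$, where $I_G$ is the indicator function (equal to one if G is true, and zero otherwise). Thus, $\alpha$ determines the log odds of disease among non-carriers, and $\beta$ determines the log odds ratio for carriers versus non-carriers.

| Scenario | Genotype combination (G) | $P(D \mid I_G = 0)$ ($\alpha$) | OR ($\beta$) |
|---|---|---|---|
| 1.1 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D$ | 0.010 (−4.6) | 2 (0.69) |
| 1.2 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D$ | 0.010 (−4.6) | 3 (1.10) |
| 1.3 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D$ | 0.010 (−4.6) | 4 (1.39) |
| 1.4 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D$ | 0.001 (−6.9) | 2 (0.69) |
| 1.5 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D$ | 0.001 (−6.9) | 3 (1.10) |
| 1.6 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D$ | 0.001 (−6.9) | 4 (1.39) |
| 2.1 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D \vee \mathrm{rs3735736}^D$ | 0.010 (−4.6) | 2 (0.69) |
| 2.2 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D \vee \mathrm{rs3735736}^D$ | 0.010 (−4.6) | 3 (1.10) |
| 2.3 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D \vee \mathrm{rs3735736}^D$ | 0.010 (−4.6) | 4 (1.39) |
| 2.4 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D \vee \mathrm{rs3735736}^D$ | 0.001 (−6.9) | 2 (0.69) |
| 2.5 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D \vee \mathrm{rs3735736}^D$ | 0.001 (−6.9) | 3 (1.10) |
| 2.6 | $\overline{\mathrm{rs3782219}^D} \vee \mathrm{rs1530848}^D \vee \mathrm{rs3735736}^D$ | 0.001 (−6.9) | 4 (1.39) |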

Generally, for a chosen model of size m, the correct pick rates $p_i^m$ are defined as the proportion of the N replicates for which the logic models of size m correctly identify at least i of the susceptibility loci, where i can vary from 1 to the number of susceptibility SNPs defined in the interaction term. Therefore, in our simulation evaluation, we calculated $p_1^m$ and $p_2^m$ for two-SNP interaction scenarios, and $p_1^m$, $p_2^m$, and $p_3^m$ for three-SNP interaction scenarios, where m varied from 1 to 5. Because we generated data with LD, we also calculated pick rates in which a replicate counts as correct if it contains the true risk SNP or any SNP in high LD (same haplotype block) with the risk SNP, as this is a more likely situation in real data. As a result, the pick rate based on blocks is always at least as high as the one for the particular risk SNPs.
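The pick-rate definitions above amount to counting, per replicate, how many true loci (or their haplotype blocks) appear in the selected model. A minimal Python sketch of that bookkeeping follows. The input format (one set of selected SNP names per replicate), the function name, and the example data are assumptions made for illustration, not output from the simulation.

```python
from typing import Iterable, Mapping, Optional

def pick_rates(selected: list[set[str]],
               true_snps: Iterable[str],
               block_of: Optional[Mapping[str, str]] = None) -> dict[int, float]:
    """Correct pick rates p_i^m for i = 1..(number of true loci or blocks).

    selected[r] is the set of SNPs in the best size-m model for replicate r.
    If block_of is given, SNPs are compared by haplotype block rather than by
    locus; SNPs missing from the mapping are treated as their own block.
    """
    def unit(snp: str) -> str:
        return block_of.get(snp, snp) if block_of else snp

    true_units = {unit(s) for s in true_snps}
    hits = [len({unit(s) for s in picked} & true_units) for picked in selected]
    n = len(selected)
    return {i: sum(h >= i for h in hits) / n for i in range(1, len(true_units) + 1)}

# Three made-up replicates for a two-SNP interaction with true SNPs a and b;
# x is assumed to lie in the same haplotype block as b.
replicates = [{"a", "b"}, {"a", "x"}, {"x", "y"}]
print(pick_rates(replicates, {"a", "b"}))                                  # by locus
print(pick_rates(replicates, {"a", "b"}, {"a": "A", "b": "B", "x": "B"}))  # by block
```

The block-based rate can never be lower than the locus-based one, because a hit on a true SNP is also a hit on its block.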

RESULTS

In this section we show the simulation study results, followed by the trio logic regression analysis results for our SZ trio data.

SIMULATION RESULTS

Our simulations show that the correct pick rate increases with the odds ratio as well as with sample size, but decreases as more susceptibility SNPs are involved in the interaction term. Occasionally, a small improvement in pick rate (around 1%) can be found when the population risk among non-carriers drops below 1% (very rare disease), but in general this parameter had very little influence on the simulation results. We therefore only present the pick rate results for scenarios with 1% population risk among non-carriers under two-way interaction models (Fig. 1) and three-way interaction models (Fig. 2).

Under the two-SNP interaction model with 300 trios and an odds ratio of 2, at least one correct block was identified approximately 45% of the time, using assumed model sizes from 1 to 5. Both blocks involved in the interaction, however, were correctly identified only about 5% of the time (Fig. 1). Across the five model sizes, the pick rate rose to 85% for picking one block and to above 30% for picking both blocks when the odds ratio was assumed to be 3. With a larger sample size of 500 trios and an odds ratio of 2, we saw a greater than 70% chance of finding at least one correct block, and a greater than 15% chance of finding both correct blocks, across all models of sizes one to five. The pick rates were much higher for an odds ratio of 3, at ≈100% and ≈70% for picking one and two blocks, respectively.

[Figure 1: two columns of panels, “Pick 1 correctly” (left) and “Pick 2 correctly” (right); upper row 300 trios, lower row 500 trios; x-axis: model size (1–5 on the left, 2–5 on the right); y-axis: correct pick rate, 0–100%; separate curves for OR = 2, 3, 4, each evaluated by block and by locus.]

Fig. 1. The correct pick rates $p_i^m$ (in %) for various model sizes (m) for the two-SNP interaction simulation, assuming 1.0% disease prevalence among the non-carriers (see scenarios 1.1–1.3 in Table I). For a chosen model of size m, the correct pick rates $p_i^m$ are defined as the proportion of the N = 2500 replicates for which the logic models of size m correctly identify at least i of the susceptibility loci, where i can vary from 1 to the maximum number of susceptibility SNPs defined in the interaction term. In our simulation evaluation we calculated $p_1^m$ (left) and $p_2^m$ (right), where m varies from 1 to 5. Because we generated data with LD, we also calculated pick rates where the replicate results contained the true risk SNP, or any SNP in high LD (same haplotype block) with the risk SNP. The upper panels correspond to the simulations using 300 trios, the lower panels to the simulations using 500 trios.

Assuming a three-SNP interaction, the pick rates were lower than the corresponding pick rates under the two-SNP interaction model, for equal odds ratios and sample sizes. The drop in pick rates was larger for smaller logic regression model sizes. For three-SNP interaction models with 300 trios and an odds ratio of 2, the rates were about 35% for picking one block and 5% for two blocks. These rates increased to ≈60% and 13%, respectively, with a sample size of 500 trios. As before, the rates increased substantially when the odds ratio was assumed to be 3 or higher. Even for 300 trios, we observed a greater than 70% chance of detecting at least one block and a 25% chance of detecting at least two blocks in the logic models. However, the chance of detecting all three loci/blocks was very small (at most around 15%) unless the sample size was increased to 500 trios. At best, there is about a 50% chance of detecting all three blocks at an odds ratio of 4 with 500 trios; in that setting, the chances of picking one or two correct blocks are about 100% and 85%, respectively.

Our results show that the trio logic regression method performs well even at moderately small sample sizes of a few hundred trios and at odds ratios of 2 or higher for a two-SNP interaction term, yielding non-trivial probabilities of detecting the genetic signal. For higher-order interaction models, our method continues to give satisfactory results at larger effect sizes. As expected, larger sample sizes increase the chance of detecting the loci or blocks that affect the phenotype.

SZ TRIO RESULTS

Trio logic regression essentially serves as a tool to explore possible interactions and to detect the genotype combinations best supported by the observed data. We first fit trio logic regression multiple times to our SZ data, setting the model size to 1, 2, 3, and 4. The results derived from logic regression can be subject to some variability for two reasons. First, logic regression employs simulated annealing to search for epistatic models, so findings can differ due to the probabilistic nature of the search algorithm. This is mainly an issue when a large number of SNPs are typed and, as a consequence, the search space is immense. Second, logic regression requires complete records and thus an imputation of missing genotypes, so results could differ if there are many missing data. Neither is the case here. Nonetheless, to investigate the robustness of our findings to the missing genotypes and the probabilistic search, we generated a total of six matched data sets based on our observed data (recall also that only three of the possible multi-locus pseudo-control genotype combinations are sampled per family in each analysis) and conducted the analysis on each of those data sets 10 times using different random seeds (60 logic regression runs in total for each model size). The findings were consistent, as described below.

[Figure 2: three columns of panels, “Pick 1 correctly”, “Pick 2 correctly”, and “Pick 3 correctly”; upper row 300 trios, lower row 500 trios; x-axis: model size (1–5, 2–5, and 3–5, respectively); y-axis: correct pick rate, 0–100%; separate curves for OR = 2, 3, 4, each evaluated by block and by locus.]

Fig. 2. The correct pick rates $p_i^m$ (in %) for various model sizes (m) for the three-SNP interaction simulation, assuming 1.0% disease prevalence among the non-carriers (see scenarios 2.1–2.3 in Table I). The description of this plot is the same as in the caption of Figure 1, except that we now evaluate $p_1^m$ (left), $p_2^m$ (middle), and $p_3^m$ (right).

The permutation test indicates that a model with two or three SNPs is optimal and that over-fitting occurs thereafter (Fig. 3). We ran 500 permutations for each conditional hypothesis test. Since the permutation scores conditioning on model sizes two and three are very similar, we conclude that the optimal model size is either two or three SNPs, with little additional phenotype variation explained by increasing the model size further. Given our limited sample size, this remaining ambiguity is unlikely to be resolved.
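As a generic illustration of how a permutation distribution such as those in Fig. 3 becomes evidence, the helper below computes an empirical p-value for an observed score, assuming (as for a deviance) that smaller scores are better. It does not reproduce the conditional permutation scheme of Ruczinski et al. [2003] and Li et al. [2009], which governs how the permuted data sets are generated; the function name and the +1 correction are our own choices.

```python
def permutation_p_value(observed_score, permuted_scores):
    """Empirical p-value when smaller scores indicate a better fit.

    Counts permuted data sets that score at least as well as the observed
    data; adding 1 to numerator and denominator counts the observed data set
    itself and keeps the p-value away from exactly zero.
    """
    if not permuted_scores:
        raise ValueError("need at least one permutation score")
    as_good = sum(1 for s in permuted_scores if s <= observed_score)
    return (as_good + 1) / (len(permuted_scores) + 1)


# Toy scores, not the values underlying Fig. 3.
print(permutation_p_value(0.30, [0.31, 0.32, 0.29, 0.33]))  # 0.4
```

With 500 permutations, as used here, the smallest attainable value of this estimate is 1/501.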

[Figure 3: four stacked histograms (Given model size 0, Given model size 1, Given model size 2, Given model size 3) of permutation scores; x-axis score from about 0.30 to 0.35, y-axis Frequency.]

Fig. 3. Histograms of the permutation scores from the sequential hypothesis tests as described in Ruczinski et al. [2003] and adapted by Li et al. [2009]. When conditioning on models of size 0, i.e., performing the null model test for signal in the data (upper panel), the distribution of scores is worse than for the permutation runs when conditioning on models of size 1 (second panel), size 2 (third panel), and size 3 (fourth panel). The latter two distributions are very similar but yield better scores than the one from size 1, indicating that the optimal model size is 2 or 3.


As described before, we used logic regression to identify the best set of SNPs and their specific Boolean expression for various model sizes. Ordered by model size, the four identified logic models are:

$$
\begin{aligned}
1:&\quad 0.665 \times \{\text{rs3782219D (NOS1)}\}\\
2:&\quad 0.887 \times \{\text{rs3782219D (NOS1)} \lor \text{rs1530848D (CHRNB3)}\}\\
3:&\quad 1.145 \times \{\text{rs3782219D (NOS1)} \lor \text{rs1530848D (CHRNB3)} \lor \text{rs3735736D (PNOC)}\}\\
4:&\quad 1.295 \times \{\text{rs3782219D (NOS1)} \lor \text{rs1530848D (CHRNB3)} \lor \text{rs3735736D (PNOC)} \lor \text{rs740603R (COMT)}\}
\end{aligned}
$$

The four SNPs, rs3782219D in the NOS1 gene on chromosome 12, rs1530848D in CHRNB3 on chromosome 8, rs3735736D in prepronociceptin (PNOC) on chromosome 8, and rs740603R in COMT on chromosome 22, were identified sequentially. Note that these models do not reflect the typical form of epistatic interaction. Because the susceptibility SNPs are connected via the ∨ (or) operator, the models identify SNPs with comparable, non-synergistic effects. The permutation results indicated that the two-SNP or three-SNP model appears to be optimal for this data set. Having the common homozygote at SNP rs3782219 (NOS1) almost doubles the disease risk relative to non-carriers (estimated odds ratio of about 1.94). The two-SNP logic model shows that an individual with the common homozygous genotype at rs3782219 (NOS1), or with at least one variant allele at rs1530848 (CHRNB3), is 143% more likely to be affected than other individuals (odds ratio of about 2.43). Further, the three-SNP model states that having the common homozygous genotype at rs3782219 (NOS1), or at least one minor allele at either rs1530848 (CHRNB3) or rs3735736 (PNOC), roughly triples the disease risk (odds ratio of about 3.14).
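The odds ratios quoted above follow from exponentiating the logic-model coefficients, and each logic model places an individual in the risk group when the Boolean OR of its binary SNP codings is TRUE. A minimal sketch of both steps; the dictionary layout and function names are illustrative and not part of any logic regression software.

```python
import math

# Fitted coefficients of the four logic models, keyed by model size.
LOGIC_COEFS = {1: 0.665, 2: 0.887, 3: 1.145, 4: 1.295}


def in_risk_group(*codings):
    """Boolean OR of the binary SNP codings (e.g. NOS1D, CHRNB3D, PNOCD)."""
    return any(codings)


def odds_ratio(beta):
    """Odds ratio of the risk group versus everyone else."""
    return math.exp(beta)


for size, beta in LOGIC_COEFS.items():
    print(size, round(odds_ratio(beta), 2))
# 1 1.94
# 2 2.43
# 3 3.14
# 4 3.65

# Two-SNP model: carrying only the CHRNB3 risk coding is enough.
print(in_risk_group(False, True))  # True
```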

Before interpreting the scientific implications of the identified models, we first elaborate on their statistical meaning from the perspective of traditional conditional logistic regression, using various combinations of the first three identified SNPs (Tables II and III). The marginal effect of SNP rs3782219D (NOS1) alone is substantial, with an estimated odds ratio of 1.95 and a P-value of 3.7 × 10^−5. The marginal effects of the other two factors, rs1530848D (CHRNB3) and rs3735736D (PNOC), are not statistically significant at a nominal type I error rate of 5%. The log-additive models including only two and three SNPs, respectively, without interaction terms, give similar estimates for the coefficients and the standard errors. This is not surprising, as the SNPs do not appear to be in LD, and thus the logistic likelihood factorizes.
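The conditional logistic regression estimates in Tables II and III treat each trio as a matched set made up of the affected child and its three pseudo-controls. The sketch below fits a single binary covariate by Newton-Raphson on the conditional likelihood, in which each set contributes exp(βx_case) / Σ_j exp(βx_j) over the case and its pseudo-controls. It is a self-contained illustration under those assumptions, not the software used to produce the tables; the toy data are invented.

```python
import math


def clogit_fit(matched_sets, beta=0.0, tol=1e-10, max_iter=100):
    """One-covariate conditional logistic regression via Newton-Raphson.

    matched_sets: list of (x_case, [x_pseudo_control, ...]) tuples.
    Returns the estimate of beta and its model-based standard error.
    """
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x_case, controls in matched_sets:
            xs = [x_case] + list(controls)
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            # Weighted mean and variance of x within the matched set.
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
            score += x_case - mean
            info += var
        if info == 0.0:
            raise ValueError("no informative matched sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Two trios in which only the case is exposed, one in which only a
# pseudo-control is exposed, and one uninformative trio.
sets = [(1, [0, 0, 0]), (1, [0, 0, 0]), (0, [1, 0, 0]), (1, [1, 1, 1])]
beta_hat, se = clogit_fit(sets)
print(round(beta_hat, 4), round(math.exp(beta_hat), 2))  # 1.7918 6.0
```

The fourth trio, in which the case and all pseudo-controls share the same exposure, adds nothing to either the score or the information, which is why trios of that kind carry no weight in the estimates.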

However, after the addition of interaction terms, the effects for each SNP are no longer trivial. The conditional logistic model containing rs3782219D (NOS1) and rs1530848D (CHRNB3), plus their interaction term, shows a statistically significant interaction (P-value = 0.004), as well as significantly increased risks for both rs3782219D (NOS1) and rs1530848D (CHRNB3), with an estimated odds ratio of 2.46 (P-value = 1.2 × 10^−6) for NOS1 and OR = 2.52 (P-value = 0.006) for CHRNB3 (Table II). It is worth noting that this model produces essentially identical effect estimates for both SNPs, as well as a negative interaction coefficient of similar magnitude (0.90 for NOS1, 0.92 for CHRNB3, and −1.09 for the interaction, all within error of each other). This explains the ∨ operator in the two-SNP logic model rs3782219D ∨ rs1530848D: carrying either risk coding raises the log odds ratio by about 0.9, whereas carrying both adds little further (0.90 + 0.92 − 1.09 = 0.73). This is best seen by illustrating the estimated log odds ratios stratified by genotypes at rs3782219D (NOS1) and rs1530848D (CHRNB3), based on the main effects only, main effects plus interaction, and the two-SNP logic regression model (Fig. 4).
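The arithmetic behind this reading of Table II can be made explicit. The snippet below is a sketch using the rounded coefficients from the main-effects-plus-interaction model; it evaluates the fitted log odds ratio for each combination of the two binary codings, and the three non-reference cells all land near the single logic-model coefficient of 0.89.

```python
# Rounded coefficients from the model with main effects plus interaction (Table II).
B_NOS1, B_CHRNB3, B_INTERACTION = 0.90, 0.92, -1.09


def fitted_log_or(nos1_d, chrnb3_d):
    """Fitted log odds ratio relative to the cell with both codings FALSE (0)."""
    return B_NOS1 * nos1_d + B_CHRNB3 * chrnb3_d + B_INTERACTION * nos1_d * chrnb3_d


cells = {(n, c): round(fitted_log_or(n, c), 2) for n in (0, 1) for c in (0, 1)}
print(cells)  # {(0, 0): 0.0, (0, 1): 0.92, (1, 0): 0.9, (1, 1): 0.73}
```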

Similarly, after we add all pairwise interaction terms to the linear model with the top three identified SNPs, each of the main effects has a coefficient estimate corresponding to an odds ratio of about 3, with interaction coefficients reflecting the results of the logic regression model (Table III). The saturated linear model with an additional term for the three-SNP interaction does not deviate substantially from this observation.
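The same check can be applied to the three-SNP model with all pairwise interactions in Table III. This sketch, again using the rounded published coefficients, sums the main effects and pairwise terms for each combination of the three binary codings. The single-SNP and two-SNP cells fall between about 0.9 and 1.3, close to the three-SNP logic coefficient of 1.15; the cell with all three codings TRUE is where this model departs most from the logic model.

```python
from itertools import product

# Rounded coefficients from the model with all two-SNP interactions (Table III).
MAIN = {"NOS1": 1.08, "CHRNB3": 1.13, "PNOC": 1.30}
PAIRWISE = {
    frozenset({"CHRNB3", "NOS1"}): -1.23,
    frozenset({"PNOC", "NOS1"}): -1.08,
    frozenset({"CHRNB3", "PNOC"}): -1.51,
}


def fitted_log_or(present):
    """Fitted log OR for the set of SNPs whose binary coding is TRUE."""
    total = sum(b for snp, b in MAIN.items() if snp in present)
    # A pairwise term applies only when both of its SNPs are present.
    total += sum(b for pair, b in PAIRWISE.items() if pair <= present)
    return total


for flags in product((0, 1), repeat=3):
    present = {snp for snp, flag in zip(MAIN, flags) if flag}
    print(sorted(present), round(fitted_log_or(present), 2))
```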

To understand how many trios contribute to these findings, we present a matched count table for trios based on the three SNPs identified by the logic models (Table IV), with emphasis on the number of "uninformative" trios, in which all three pseudo-controls match the case on the exposure, i.e., the risk factor or the Boolean expression (Table IV, last column). When considering rs1530848D (CHRNB3) or rs3735736D (PNOC) alone, over 70% of the trios were uninformative. This explains the large standard error estimates, which diminish the marginal effects of these two SNPs in single-SNP analyses. Combined with rs3782219D (NOS1), additional trios became informative. Meanwhile, because these two SNPs act on disease risk similarly to rs3782219D (NOS1), the estimated effects for the union sets increased compared to the estimates for rs1530848 (CHRNB3) and rs3735736 (PNOC) alone.
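Table IV follows directly from this definition: for each trio, count how many of its three pseudo-controls share the case's exposure value. The sketch below shows that count and then recomputes the share of uninformative trios (column n3) from the published counts; the dictionary keys and helper names are our own.

```python
# Published counts from Table IV: (n0, n1, n2, n3) for cases with
# exposure TRUE and FALSE, respectively.
TABLE_IV = {
    "NOS1D": [(16, 86, 0, 127), (0, 47, 20, 16)],
    "CHRNB3D": [(0, 44, 4, 6), (3, 35, 0, 220)],
    "PNOCD": [(0, 40, 1, 2), (1, 25, 0, 243)],
    "NOS1D or CHRNB3D": [(14, 74, 19, 145), (4, 34, 9, 13)],
    "NOS1D or CHRNB3D or PNOCD": [(12, 65, 40, 151), (4, 26, 4, 10)],
}


def n_matching(case_exposure, pseudo_control_exposures):
    """k: how many pseudo-controls share the case's exposure value."""
    return sum(x == case_exposure for x in pseudo_control_exposures)


def uninformative_share(rows):
    """Fraction of trios in column n3, plus the total number of trios."""
    total = sum(sum(row) for row in rows)
    return sum(row[3] for row in rows) / total, total


for exposure, rows in TABLE_IV.items():
    share, total = uninformative_share(rows)
    print(f"{exposure}: {share:.1%} of {total} trios uninformative")
```

Every row of the table sums to 312 trios, and the recomputed shares (about 72% for CHRNB3D and 79% for PNOCD alone, against roughly 51% for either union) agree with the statement above.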

TABLE II. Conditional logistic regression results for linear combinations of the top two identified SNPs and their interaction term (denoted by ":"). To simplify the display of the results, the corresponding gene names are used instead of the rs numbers, i.e. NOS1 for rs3782219, and CHRNB3 for rs1530848. While P-values for logic regression models are not particularly meaningful due to the large number of models visited in the search, they are included to allow for another comparison between the models listed.

Model                      Term                Est. OR   β̂ (SE)          Z        P-value
Marginal                   NOS1D               1.94      0.67 (0.16)     4.12     4e−05
                           CHRNB3D             1.17      0.16 (0.22)     0.71     0.480
Logic                      NOS1D ∨ CHRNB3D     2.43      0.89 (0.18)     4.89     1e−06
Additive                   NOS1D               1.95      0.67 (0.16)     4.15     3e−05
                           CHRNB3D             1.21      0.19 (0.22)     0.86     0.390
Additive (interaction)     NOS1D               2.46      0.90 (0.19)     4.86     1e−06
                           CHRNB3D             2.52      0.92 (0.33)     2.77     0.006
                           NOS1D : CHRNB3D     0.34      −1.09 (0.34)    −2.87    0.004


DISCUSSION

We have demonstrated the performance of trio logic regression in detecting higher-order interactions in the context of our SZ trio data. Using the proposed method, we conducted a search for possible SNP-SNP interactions for SZ and SZA disorder using case-parent trio data. We identified three susceptibility SNPs and their corresponding genes: NOS1 on chromosome 12, and CHRNB3 and PNOC on chromosome 8.

[Figure 4: three 3 × 3 grids of fitted log odds ratios, with the number of variant alleles at NOS1 (0, 1, 2) on the horizontal axis and at CHRNB3 (0, 1, 2) on the vertical axis. Main effects only (left): 0.67, 0, 0 for CHRNB3 = 0 and 0.86, 0.19, 0.19 for CHRNB3 = 1 or 2. Main effects + interaction (middle): 0.9, 0, 0 for CHRNB3 = 0 and 0.73, 0.92, 0.92 for CHRNB3 = 1 or 2. Logic regression (right): 0.89, 0, 0 for CHRNB3 = 0 and 0.89, 0.89, 0.89 for CHRNB3 = 1 or 2.]

Fig. 4. The fitted log odds ratios by separate strata for different two-SNP models (see Table II). The log odds ratios are based on the estimates from the conditional logistic regression model with main effects only (left), main effects plus interaction (middle), and the logic regression model (right). The integers 0, 1, and 2 indicate the number of variant alleles for the respective SNPs, thus defining a total of nine cells. The shades of the cells reflect the magnitude of the fitted log odds ratios. Since the nonzero fitted values from the model with main effects and interaction are within error of each other, the logic regression model can be understood as a "smoothed" one-parameter model that indicates the same risk region.

TABLE III. Conditional logistic regression results for linear combinations of the top three identified SNPs and their interaction terms (denoted by ":"). To simplify the display of the results, the corresponding gene names are used instead of the rs numbers, i.e. NOS1 for rs3782219, CHRNB3 for rs1530848, and PNOC for rs3735736. While P-values for logic regression models are not particularly meaningful due to the large number of models visited in the search, they are included to allow for another comparison between the models listed.

Term                               Est. OR   β̂ (SE)          Z        P-value
Logic model
NOS1D ∨ CHRNB3D ∨ PNOCD            3.14      1.15 (0.20)     5.67     2e−08
Additive model
NOS1D                              1.96      0.67 (0.16)     4.16     3e−05
CHRNB3D                            1.20      0.18 (0.22)     0.83     0.410
PNOCD                              1.54      0.43 (0.25)     1.70     0.088
Additive model of one marginal effect plus one two-SNP logic term
NOS1D                              1.96      0.67 (0.16)     4.17     3e−05
CHRNB3D ∨ PNOCD                    1.47      0.39 (0.18)     2.15     0.032
Additive model with all two-SNP interactions
NOS1D                              2.95      1.08 (0.21)     5.24     2e−07
CHRNB3D                            3.08      1.13 (0.35)     3.23     0.001
PNOCD                              3.68      1.30 (0.40)     3.24     0.001
CHRNB3D : NOS1D                    0.29      −1.23 (0.39)    −3.16    0.002
PNOCD : NOS1D                      0.34      −1.08 (0.45)    −2.42    0.015
CHRNB3D : PNOCD                    0.22      −1.51 (0.79)    −1.90    0.058
Additive model with all two-SNP interactions and the three-SNP interaction
NOS1D                              3.08      1.12 (0.21)     5.35     9e−08
CHRNB3D                            3.47      1.24 (0.36)     3.50     0.001
PNOCD                              4.25      1.45 (0.41)     3.53     4e−04
CHRNB3D : NOS1D                    0.25      −1.39 (0.40)    −3.46    0.001
PNOCD : NOS1D                      0.27      −1.30 (0.47)    −2.79    0.005
CHRNB3D : PNOCD                    0.06      −1.78 (1.24)    −2.24    0.025
NOS1D : CHRNB3D : PNOCD            9.35      2.24 (1.48)     1.51     0.130


Our two-SNP interaction model with predictor rs3782219D ∨ rs1530848D yields an estimated odds ratio of 2.43, indicating a 143% increase in disease risk if a person has at least one of the matching genotypes. Our three-SNP interaction model shows that the presence of any of the risk genotypes (rs3782219D, rs1530848D, rs3735736D) increases the risk of SZ/SZA in an individual by 214% (odds ratio 3.14). This type of interaction does not meet the typical expectation of epistasis, in which higher risk is associated with the simultaneous presence of multiple mutations. However, the two-SNP model can be interpreted in the sense that the simultaneous presence of at least one variant allele at rs3782219 and the common homozygote at rs1530848 is protective against the disease, with an odds ratio of about 1/2.43 ≈ 0.41, i.e., a risk reduction of roughly 59%. The three-SNP model states that, in addition to the above condition for the two-SNP model, the individual must have the common homozygous genotype at rs3735736, resulting in an odds ratio of about 1/3.14 ≈ 0.32, i.e., a risk reduction of roughly 68%.
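The percentages in this paragraph are simple transformations of the estimated odds ratios, which, for a disease as rare as SZ, approximate relative risks. A small sketch of the two conversions, assuming the rounded odds ratios 2.43 and 3.14 from Tables II and III; the function names are our own.

```python
def percent_increase(odds_ratio):
    """Percent increase in risk implied by an odds ratio above 1."""
    return 100.0 * (odds_ratio - 1.0)


def percent_reduction_in_complement(odds_ratio):
    """Percent reduction for the complementary (non-risk) group, whose
    odds ratio relative to the risk group is 1 / odds_ratio."""
    return 100.0 * (1.0 - 1.0 / odds_ratio)


for label, or_hat in [("two-SNP", 2.43), ("three-SNP", 3.14)]:
    print(label, round(percent_increase(or_hat)), round(percent_reduction_in_complement(or_hat)))
# two-SNP 143 59
# three-SNP 214 68
```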

Our simulations show that the chance of detecting a SNP effect like that observed for SNP rs3782219D (NOS1) or its neighboring SNPs is promising: about 86% for a true two-SNP interaction and 71% for a three-SNP interaction model. However, the chance of detecting effects like that observed for SNP rs1530848D (CHRNB3) is only about 30%, assuming a two-SNP interaction model, an odds ratio of 3, and a disease risk of 1% among non-carriers. Despite the potential over-fitting problem, the pick rates remain stable across model sizes. It adds to our confidence that the four fitted models for the SZ trio data set are nested (in the sense that every model of size 2 or larger contains the SNPs of the smaller models), reflecting a consistent improvement in model fit within the reasonable model sizes. Since rs3782219 (NOS1) has only one other SNP in high LD (a two-SNP block), while rs1530848 (CHRNB3) and rs3735736 (PNOC) each belong to a different block of four SNPs, we think the gain in pick rates for blocks over those for loci is not an artifact of larger block sizes but is mainly due to the strong LD among markers in the same blocks.
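The distinction between pick rates for loci and for blocks can be illustrated with a small sketch. The precise definitions of the pick rates are given with the simulation design and are not restated in this section, so the two used here are our assumption: a correct pick by locus requires the exact simulated SNP to be selected, whereas a correct pick by block accepts any selected SNP from the same LD block. The SNP and block labels are hypothetical.

```python
def pick_rates(selected_runs, true_snp, block_of):
    """Correct pick rates by locus and by LD block (assumed definitions).

    selected_runs: one set of selected SNP ids per simulation run.
    true_snp: the SNP carrying the simulated effect.
    block_of: maps each SNP id to its LD block label.
    """
    n = len(selected_runs)
    by_locus = sum(true_snp in run for run in selected_runs) / n
    target = block_of[true_snp]
    by_block = sum(any(block_of[s] == target for s in run) for run in selected_runs) / n
    return by_locus, by_block


runs = [{"a"}, {"b"}, {"c"}, {"a", "c"}]
blocks = {"a": 1, "b": 1, "c": 2}
print(pick_rates(runs, "a", blocks))  # (0.5, 0.75)
```

Under these definitions the block rate can never fall below the locus rate, and it can rise simply because a block contains more SNPs; that is the artifact the argument above rules out for the SZ data.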

According to the initial analysis [Fallin et al., 2005], a highly suggestive main effect for SNP rs3782219 (NOS1) was identified via single-marker TDT tests, with a reported P-value of 0.0003. SNP rs3782219 is located in neuronal nitric oxide synthase 1 (NOS1), an essential brain enzyme responsible for the synthesis of nitric oxide (NO). NO was identified as a widespread messenger molecule in the central nervous system, displaying many properties of a neurotransmitter [Bredt and Snyder, 1992]. Overproduction of NO by neuronal NOS is implicated in several neurodegenerative diseases, including SZ, Alzheimer's, and Parkinson's [Hogg et al., 2003]. Several genetic studies have evaluated the association between the NOS gene and SZ or SZA disorders. In a German [Reif et al., 2006] and a Japanese study [Shinkai et al., 2002], NOS1 was reported to be associated with SZ. Other studies have shown that neuronal NOS expression is higher among affected individuals than among controls, in areas of the prefrontal cortex [Baba et al., 2004] and in postmortem brain specimens of the cerebellum [Karson et al., 1996].

SNP rs1530848 on chromosome 8 was not found to be highly suggestive in previous analyses. Its related gene, CHRNB3, is also called nicotinic cholinergic receptor β3. Nicotinic acetylcholine receptors (nAChRs) are cholinergic receptors that form ligand-gated ion channels in the plasma membranes of certain neurons, and they have 12 subunits, α2 through α10 and β2 through β4. CHRNB3 encodes one of them [Novere et al., 2002]. The neuronal nAChRs are made up of five subunits of types α and β in a ratio of 3α:2β. These receptors take part in fast synaptic transmission (e.g., in autonomic ganglionic neurons and restricted brain areas), releasing high concentrations of neurotransmitters. Preferentially located at presynaptic sites in several brain regions, they influence the release of neurotransmitters [Hogg et al., 2003].

Receptors built from different nAChR subunits respond to nicotine at very different effective concentrations. Together with the CHRNA6 gene, CHRNB3 has been associated with nicotine dependence in several samples [Hoft et al., 2009; Zeiger et al., 2008]. The hypothesis that smoking is a means of self-medication by SZ patients has long been proposed, initially because SZ patients smoke more frequently and more heavily than the general population [Mobascher and Winterer, 2008; Ochoa and Lasalde-Dominicci, 2007].

Two other subunit genes, CHRNB2 (located on 1q21) and CHRNA4 (located on 20q13), were chosen as candidate genes for cognitive impairments in SZ, based on their presence in the high-affinity nicotine binding sites in the brain. In a Japanese case-control study, both genes were studied individually for SZ, and no association was reported [Kishi et al., 2008], while Luca et al. [2006] found that the two genes are jointly associated with SZ in Canadian families, although they reported no significant association for either gene individually. Furthermore, gene-gene interactions for nicotine dependence were reported among the CHRNA4, CHRNB2, brain-derived neurotrophic factor (BDNF), and neurotrophic tyrosine kinase receptor 2 (NTRK2) genes in a case-control study [Li et al., 2008].

It remains a complex question how different nAChR subunit genes interact in brain function. We cannot exclude the possibility that CHRNB3 interacts with the NOS1 gene in the etiology of SZ and related disorders. Currently, only one study has reported a contribution of nAChRs to NO production, finding that the α7 and/or α9/10 subunits play a role in nicotine-induced NO production in isolated rat dorsal root ganglion neurons [Papadopolou et al., 2004].

TABLE IV. Matched count table for trios based on the identified SNPs. Here, n_k refers to the number of trios in which k pseudo-controls have the same value for the exposure variable (the value of the Boolean term) as the case. Trios in column n3 are uninformative. As before, the corresponding gene names are used instead of the rs numbers to simplify the display of the results, i.e. NOS1 for rs3782219, CHRNB3 for rs1530848, and PNOC for rs3735736.

Exposure                     Case exposure   n0    n1    n2    n3
NOS1D                        TRUE            16    86     0   127
                             FALSE            0    47    20    16
CHRNB3D                      TRUE             0    44     4     6
                             FALSE            3    35     0   220
PNOCD                        TRUE             0    40     1     2
                             FALSE            1    25     0   243
NOS1D ∨ CHRNB3D              TRUE            14    74    19   145
                             FALSE            4    34     9    13
NOS1D ∨ CHRNB3D ∨ PNOCD      TRUE            12    65    40   151
                             FALSE            4    26     4    10

The SNP rs3735736 is located in the prepronociceptin (PNOC) gene. The initial study reported a suggestive association with SZ/SZA [Fallin et al., 2005]. The nociceptin/orphanin FQ (N/OFQ) peptide receptor (NOP) mediates several biological actions of N/OFQ. The PNOC gene encodes the precursor of N/OFQ, and the N/OFQ-NOP system may be involved in pain, learning, memory, fear, anxiety, auditory processing, neuroendocrine control, sleep, and neuronal development [Blaveri et al., 2001]. PNOC has also been investigated for a potential involvement in alcohol and other drug addictions; however, no association was found [Xuei et al., 2008]. The PNOC gene is not as widely studied as NOS1 and the nAChR genes. Currently, a biological justification for an interaction between the PNOC gene and the other identified genes does not seem plausible.

Our novel method, trio logic regression, assumes that only two risk groups (as defined by the Boolean statement) exist in the population. This assumption may pose a limitation in that only this particular form of interaction model can be evaluated. It might be beneficial to extend the method to cover more general forms, for example by estimating additive effects of multiple risk groups, and to incorporate gene-environment interactions. On the other hand, the trio logic method already enables us to explore a vast number of models (exponential in the number of predictors). We expect that the prominent effect of most underlying interaction mechanisms can be represented, possibly in a somewhat crude way, by a Boolean expression of some of the most influential markers (assuming that they are included in our analysis), and therefore can be revealed by our method.

Another limitation of the current implementation is that, for efficiency in the simulations and imputations, we enforce a maximum haplotype length (seven loci). This requires dividing larger haplotype blocks into smaller sub-blocks. Another possible remedy would be to pre-select SNPs in the study based on the LD pattern within and across blocks. In our SZ data set, only a few blocks were larger than seven loci, and, though not ideal, each of those was divided into two sub-blocks. Obviously, for studies using much denser marker maps, the maximum haplotype length requirement would pose a much more severe constraint, and a pre-selection of SNPs (including the choice of tag SNPs) might be necessary.
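The text states only that each block longer than seven loci was divided into two sub-blocks. One simple way to do that, assumed here rather than taken from the authors' implementation, is to cut an ordered block into the fewest contiguous pieces of near-equal size that respect the maximum length. The function name `split_block` is hypothetical.

```python
import math


def split_block(snps, max_len=7):
    """Split an ordered haplotype block into contiguous sub-blocks of
    near-equal size, none longer than max_len loci."""
    if not snps:
        return []
    n_sub = math.ceil(len(snps) / max_len)
    base, extra = divmod(len(snps), n_sub)
    sub_blocks, start = [], 0
    for i in range(n_sub):
        # Spread any remainder over the first sub-blocks.
        size = base + (1 if i < extra else 0)
        sub_blocks.append(snps[start:start + size])
        start += size
    return sub_blocks


print([len(b) for b in split_block([f"snp{i}" for i in range(10)])])  # [5, 5]
```

For blocks of 8 to 14 loci this yields exactly two sub-blocks, matching the situation described for the SZ data set.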

In conclusion, our interaction findings must be interpreted with caution, considering that our sample size is only about 300 trios (particularly in light of the simulation results). Replication of our findings in another sample would be extremely useful to validate the two-SNP or three-SNP interaction models. An investigation of the relationship among the NOS1, CHRNB3, and PNOC genes and their related functions, possibly yielding a satisfactory biological explanation, could also help us interpret the interaction mechanism in a more meaningful way.

ACKNOWLEDGMENTS

We thank the families for their participation in this research. The SZ/SZA trio study was funded by National Institute of Mental Health (NIMH) grants R01MH057314 and R01MH58153. Additional support was provided by NIH grants R01 DK061662 from the National Institute of Diabetes, Digestive and Kidney Diseases, R01 HL090577 from the National Heart, Lung, and Blood Institute, and a CTSA grant to the Johns Hopkins Medical Institutions.

REFERENCES

Baba H, Suzuki T, Arai H, Emson PC. 2004. Expression of nNOS and soluble guanylate cyclase in schizophrenic brain. Neuroreport 15:677–680.

Baron M. 1986. Genetics of schizophrenia: I. Familial patterns and mode of inheritance. Biol Psychiatry 21:1051–1066.

Barrett JC, Fry B, Maller J, Daly MJ. 2005. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–265.

Benyamin B, Visscher PM, McRae AF. 2009. Family-based genome-wide association studies. Pharmacogenomics 10:181–190.

Blaveri E, Kalsi G, Lawrence J, Quested D, Moorey H, Lamb G, Kohen D, Shiwach R, Chowdhury U, Curtis D, McQuillin A, Gramoustianou ES, Gurling HM. 2001. Genetic association studies of schizophrenia using the 8p21–22 genes: prepronociceptin (PNOC), neuronal nicotinic cholinergic receptor alpha polypeptide 2 (CHRNA2) and arylamine N-acetyltransferase 1 (NAT1). Eur J Hum Genet 9:469–472.

Bredt DS, Snyder SH. 1992. Nitric oxide, a novel neuronal messenger. Neuron 8:3–11.

Carter CL, Chung CS. 1980. Segregation analysis of schizophrenia under a mixed genetic model. Hum Hered 30:350–356.

Chiu YF, McGrath JA, Thornquist MH, Wolyniec PS, Nestadt G, Swartz KL, Lasseter VK, Liang KY, Pulver AE. 2002. Genetic heterogeneity in schizophrenia II: conditional analyses of affected schizophrenia sibling pairs provide evidence for an interaction between markers on chromosome 8p and 14q. Mol Psychiatry 7:658–664.

Dai JY, Ruczinski I, LeBlanc M, Kooperberg C. 2006. Imputation methods to improve inference in SNP association studies. Genet Epidemiol 30:690–702.

Fallin MD, Lasseter VK, Avramopoulos D, Nicodemus KK, Wolyniec PS, McGrath JA, Steel G, Nestadt G, Liang KY, Huganir RL, Valle D, Pulver AE. 2005. Bipolar I disorder and schizophrenia: a 440-single-nucleotide polymorphism screen of 64 candidate genes among Ashkenazi Jewish case-parent trios. Am J Hum Genet 77:918–936.

Gogos JA, Gerber DJ. 2006. Schizophrenia susceptibility genes: emergence of positional candidates and future directions. Trends Pharmacol Sci 27:226–233.

Harrison PJ, Owen MJ. 2003. Genes for schizophrenia? Recent findings and their pathophysiological implications. Lancet 361:417–419.

Hoft NR, Corley RP, McQueen MB, Schlaepfer IR, Huizinga D, Ehringer MA. 2009. Genetic association of the CHRNA6 and CHRNB3 genes with tobacco dependence in a nationally representative sample. Neuropsychopharmacology 34:698–706.

Hogg RC, Raggenbass M, Bertrand D. 2003. Nicotinic acetylcholine receptors: from structure to brain function. Rev Physiol Biochem Pharmacol 147:1–46.

Jurewicz I, Owen RJ, O'Donovan MC, Owen MJ. 2001. Searching for susceptibility genes in schizophrenia. Eur Neuropsychopharmacol 11:395–398.

Karson CN, Griffin WS, Mrak RE, Husain M, Dawson TM, Snyder SH, Moore NC, Sturner WQ. 1996. Nitric oxide synthase (NOS) in schizophrenia: increases in cerebellar vermis. Mol Chem Neuropathol 27:275–284.

Kishi T, Ikeda M, Kitajima T, Yamanouchi Y, Kinoshita Y, Kawashima K, Okochi T, Inada T, Ozaki N, Iwata N. 2008. Genetic association analysis of tagging SNPs in alpha4 and beta2 subunits of neuronal nicotinic acetylcholine receptor genes (CHRNA4 and CHRNB2) with schizophrenia in the Japanese populations. J Neural Transm 115:1457–1461.

Kooperberg C, Ruczinski I. 2005. Identifying interacting SNPs using Monte Carlo logic regression. Genet Epidemiol 28:157–170.

Kooperberg C, Ruczinski I, LeBlanc ML, Hsu L. 2001. Sequence analysis using logic regression. Genet Epidemiol 21:S626–S631.

Laird NM, Lange C. 2006. Family-based designs in the age of large-scale gene-association studies. Nat Rev Genet 7:385–394.

Li MD, Lou XY, Chen G, Ma JZ, Elston RC. 2008. Gene-gene interactions among CHRNA4, CHRNB2, BDNF, and NTRK2 in nicotine dependence. Biol Psychiatry 64:951–957.

Li Q, Louis TA, Fallin MD, Ruczinski I. 2009. Trio logic regression: detection of SNP-SNP interactions in case-parent trios. Working Paper 194, Johns Hopkins Department of Biostatistics.

Luca VD, Voineskos S, Wong G, Kennedy JL. 2006. Genetic interaction between alpha4 and beta2 subunits of high affinity nicotinic receptor: analysis in schizophrenia. Exp Brain Res 174:292–296.

Mobascher A, Winterer G. 2008. The molecular and cellular neurobiology of nicotine abuse in schizophrenia. Pharmacopsychiatry 41:S51–S59.

Novere NL, Corringer PJ, Changeux JP. 2002. The diversity of subunit composition in nAChRs: evolutionary origins, physiologic and pharmacologic consequences. J Neurobiol 53:447–456.

Ochoa EL, Lasalde-Dominicci J. 2007. Cognitive deficits in schizophrenia: focus on neuronal nicotinic acetylcholine receptors and smoking. Cell Mol Neurobiol 27:609–639.

Papadopolou S, Hartmann P, Lips KS, Kummer W, Haberberger RV. 2004. Nicotinic receptor mediated stimulation of NO-generation in neurons of rat thoracic dorsal root ganglia. Neurosci Lett 361:32–35.

Reif A, Herterich S, Strobel A, Ehlis AC, Saur D, Jacob CP, Wienker T, Topner T, Fritzen S, Walter U, Schmitt A, Fallgatter AJ, Lesch KP. 2006. A neuronal nitric oxide synthase (NOS-I) haplotype associated with schizophrenia modifies prefrontal cortex function. Mol Psychiatry 11:286–300.

Risch N, Baron M. 1984. Segregation analysis of schizophrenia and related disorders. Am J Hum Genet 36:1039–1059.

Ruczinski I, Kooperberg C, LeBlanc M. 2003. Logic regression. J Comput Graphical Stat 12:475–511.

Shinkai T, Ohmori O, Hori H, Nakamura J. 2002. Allelic association of the neuronal nitric oxide synthase (NOS1) gene with schizophrenia. Mol Psychiatry 7:560–563.

Spielman RS, Ewens WJ. 1996. The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 59:983–989.

Weinberg CR, Wilcox AJ, Lie RT. 1998. A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 62:969–978.

Xuei X, Flury-Wetherill L, Almasy L, Bierut L, Tischfield J, Schuckit Jr M, JIN, Foroud T, Edenberg HJ. 2008. Association analysis of genes encoding the nociceptin receptor (OPRL1) and its endogenous ligand (PNOC) with alcohol or illicit drug dependence. Addict Biol 13:80–87.

Zeiger JS, Haberstick BC, Schlaepfer I, Collins AC, Corley RP, Crowley TJ, Hewitt JK, Hopfer CJ, Lessem J, McQueen MB, Rhee SH, Ehringer MA. 2008. The neuronal nicotinic receptor subunit genes (CHRNA6 and CHRNB3) are associated with subjective responses to tobacco. Hum Mol Genet 17:724–734.
