The etiology of Parkinson disease (PD) involves both genetic susceptibility and environmental exposures. In particular, coffee consumption is inversely associated with PD but the mechanisms underlying this intriguing association are unknown. According to a recent genome-wide gene–environment interaction study, the inverse coffee–PD association was two times stronger among carriers of the T allele of SNP rs4998386 in gene GRIN2A than in homozygotes for the C allele. We attempted to replicate this result in a similarly sized pooled analysis of 2,289 cases and 2,809 controls from four independent studies (Denmark, France, Seattle-United States (US), and Rochester-US) with detailed caffeinated coffee consumption data and rs4998386 genotypes. Using a variety of definitions of coffee drinking and statistical modeling techniques, we failed to replicate this interaction. Notably, whereas in the original study there was an association between rs4998386 and coffee consumption among controls, but not among cases, none of the datasets analyzed here indicated an association between rs4998386 and coffee consumption among controls. Based on large, well-characterized datasets independent from the original study, our results are not in favor of an interaction between caffeinated coffee consumption and rs4998386 for PD risk and suggest that the original finding may have been driven by an association of coffee consumption with rs4998386 in controls. The next years will likely see an increasing number of papers examining gene–environment interactions at the genome-wide level, which poses important methodological challenges. Our findings underline the need for a careful assessment of the findings of such studies.
Genome-wide association studies (GWAS) have identified thousands of genetic risk variants for common diseases, which typically explain only a small proportion of the underlying heritability. Unexplained or missing heritability could be partly due to gene–environment interactions. PD is a good example of a disease for which numerous susceptibility loci and putative risk or protective environmental factors have been identified and may interact. Among environmental factors, there is robust epidemiological evidence that coffee consumption is inversely associated with PD independently of smoking. Caffeine is hypothesized to account for this association because it is an adenosine A2A-receptor antagonist, and this family of agents has been shown to be neuroprotective and to attenuate loss of dopaminergic neurons in animal models of PD; however, other explanations for this association, including reverse causation or confounding, cannot be ruled out.
A recent genome-wide gene–environment interaction study in PD (testing 811,597 single nucleotide polymorphisms [SNPs] across 1,458 cases and 931 controls) used a joint test of marginal association and gene–environment interaction, followed by analyses stratified by coffee consumption, to identify modifiers of the coffee–PD association. The inverse association between coffee and PD was about two times stronger among carriers of the minor T allele of rs4998386 in GRIN2A than in homozygotes for the major C allele (odds ratio [OR] for interaction, ORinteraction = 0.52, p = 4×10⁻³). This finding was replicated in a pooled analysis of three independent US datasets (1,014 cases, 1,917 controls; ORinteraction = 0.48, p = 5×10⁻⁴). The authors concluded that the inclusion of coffee consumption in their analyses to test for an interaction with rs4998386 allowed them to uncover one of the most important PD susceptibility genes, not previously identified in GWAS due to its small overall effect. GRIN2A encodes a subunit of the N-methyl-D-aspartate (NMDA) glutamate receptor and regulates excitatory neurotransmission in the brain. The authors considered it to be biologically plausible that GRIN2A plays a role in PD through an interaction with caffeinated coffee and suggested that GRIN2A genotypes may be a useful biomarker for pharmacogenetic studies on prevention and treatment in PD.
The study by Hamza et al. represents one of the first published attempts to identify gene–environment interactions at a genome-wide scale, a challenging task given the requirement of very large sample sizes with exposure data. The results from this study are of great interest as they may provide insight into the PD–coffee association and thus the underlying pathophysiology of PD. Analyses of gene–environment interactions can be performed through a variety of approaches, and, to better understand the findings presented by Hamza et al., we performed a re-analysis of their data by examining the association between coffee and rs4998386 separately in cases and controls (Table S1). We found a strong positive association in controls between rs4998386-T and heavy coffee drinking (OR = 1.48, 95% CI = 1.23, 1.78, p = 3×10⁻⁵), thus suggesting that GRIN2A-rs4998386-T is associated with an increased likelihood of drinking coffee among persons free of PD. In contrast, among PD cases, heavy coffee drinking tended to be less frequent in carriers of the rs4998386-T allele, but this association was not statistically significant (OR = 0.82, 95% CI = 0.65, 1.03, p = 0.08). Therefore, it appears that the interaction between rs4998386 and coffee consumption was in part explained by a positive association between the rs4998386-T allele and coffee consumption among controls, but not among PD cases.
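The link between these two stratum-specific associations and the interaction can be made explicit with a short calculation. The sketch below assumes that the gene–coffee odds ratio among cases divided by the gene–coffee odds ratio among controls approximates the multiplicative interaction OR, and that standard errors can be recovered from the reported 95% confidence intervals. The function names are illustrative and not part of the original analysis, and the result is only approximate because the quoted ORs are rounded.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log odds ratio recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def ratio_of_ors(or_cases, ci_cases, or_controls, ci_controls, z=1.96):
    """Ratio of two independent ORs with a Wald-type 95% CI on the log scale."""
    log_ratio = math.log(or_cases) - math.log(or_controls)
    # Cases and controls are separate samples, so the variances add.
    se = math.sqrt(se_from_ci(*ci_cases) ** 2 + se_from_ci(*ci_controls) ** 2)
    return (math.exp(log_ratio),
            math.exp(log_ratio - z * se),
            math.exp(log_ratio + z * se))

# ORs for rs4998386-T versus heavy coffee drinking quoted above (re-analysis, Table S1)
implied, low, high = ratio_of_ors(0.82, (0.65, 1.03), 1.48, (1.23, 1.78))
print(f"implied interaction OR = {implied:.2f} (95% CI {low:.2f}, {high:.2f})")
```

Under these assumptions the implied interaction OR is about 0.55, close to the interaction ORs of roughly 0.5 reported in the original study, which illustrates how much of that estimate rests on the association among controls.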
Because of the well-described constraints of genome-wide gene–environment interaction analyses and of this somewhat unusual pattern of gene–environment interaction, our objective was to replicate these findings by pooling data from four independent and well-characterized studies, three of them population-based, which had collected detailed coffee data.
Our analyses comprised 2,289 cases and 2,809 controls with complete data on coffee consumption, GRIN2A-rs4998386, and ever smoking. Rs4998386 genotypes were in Hardy-Weinberg equilibrium (HWE) in controls from each dataset (p≥0.05) and the frequency of the T allele was similar in controls across all studies (ranging from 8.7% to 11.8%). Rs4998386 was not associated with PD in any of the four datasets (Table S2). Danish participants had the highest level of coffee drinking. Ever coffee drinking was statistically significantly inversely associated with PD in the French and Danish datasets; in the Seattle-US dataset, PD cases were less frequently heavy coffee drinkers than controls, and there was no statistically significant association of coffee drinking and PD in the Rochester-US dataset (Table S2). Ever smoking was inversely associated with PD in all studies (Table S2). In pooled marginal association analyses of the French, Danish, and Seattle-US studies, rs4998386 showed no evidence for association with PD risk while ever coffee drinking was inversely associated with PD, showing a dose-response relation for all coffee variables (Table S3).
Table S4 shows the cross-tabulation of rs4998386 and coffee drinking by case-control status and dataset. Regardless of the definition of coffee drinking, there was no consistent significant departure from multiplicative effects of rs4998386 and coffee drinking in any of the individual datasets (Table 1, Table S5). In pooled analyses of the French, Danish, and Seattle-US datasets (Table 1), the inverse association with ever coffee drinking was stronger among CT+TT carriers (OR = 0.73/1.32 = 0.55) compared to CC carriers (OR = 0.77), but the difference was not statistically significant (ORinteraction = 0.72, p = 0.18). In analyses based on quantitative characteristics of coffee drinking, there was no evidence of statistically significant interactions, except for the category of 130–200 cupyears of coffee consumption (p = 0.038): the association with cupyears was stronger among CT+TT carriers (OR = 0.52/1.33 = 0.39) compared to CC carriers (OR = 0.70). Analyses based on the Rochester-US dataset and pooled analyses of all datasets revealed no statistically significant interactions. Analyses using the same approach to categorize coffee drinking as Hamza et al.  revealed no statistically significant interactions, except for participants from the Rochester-US dataset in the second quartile of cupyears; however, this result was only based on seven cases and 23 controls, this pattern was not apparent in the other studies, and there was no evidence of interaction at higher consumption levels (Table S6). In addition, interaction ORs with heavy coffee drinking tended to be greater than one, whereas Hamza et al.  reported interaction ORs smaller than one (Table S6). Pooled analyses of the French, Danish, and Seattle-US data using the empirical Bayes approach yielded results consistent with those of our main analyses; compared to the traditional case-control analysis, interaction ORs were generally closer to one and p-values greater (Table S7).
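The ratios quoted in this paragraph follow from the joint-effects parametrization. The sketch below assumes that CC never-drinkers are the reference group, that 1.32 and 1.33 are the ORs for CT+TT carriers who do not drink coffee, and that 0.73 and 0.52 are the ORs for CT+TT carriers in the exposed category. This is our reading of the quoted values, and the function names are illustrative.

```python
def stratum_or(joint_or, genotype_or):
    """OR for coffee among carriers: joint OR divided by the genotype-only OR."""
    return joint_or / genotype_or

def interaction_or(joint_or, genotype_or, coffee_or):
    """Multiplicative interaction OR: joint effect over the product of single effects."""
    return joint_or / (genotype_or * coffee_or)

# Ever coffee drinking, pooled French, Danish, and Seattle-US data (values quoted above)
ever_carriers = stratum_or(0.73, 1.32)
ever_inter = interaction_or(0.73, 1.32, 0.77)

# 130-200 cupyears category (values quoted above)
cup_carriers = stratum_or(0.52, 1.33)
cup_inter = interaction_or(0.52, 1.33, 0.70)

print(round(ever_carriers, 2), round(ever_inter, 2))
print(round(cup_carriers, 2), round(cup_inter, 2))
```

For ever coffee drinking this reproduces the stratum OR of 0.55 and the interaction OR of 0.72. For the 130–200 cupyears category, the interaction OR is not quoted in the text; the value obtained here is derived from rounded estimates and should be read as approximate.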
We found similar results in sensitivity analyses when excluding TT homozygotes, adjusting for packyears of smoking (2,140 cases, 2,602 controls) or Mini-Mental State Examination (MMSE) (686 cases, 1,100 controls), or upon stratification by sex, median disease duration (<5 versus ≥5 years), and median age (≤70 versus >70 years) (data not shown). In addition, in the Seattle-US dataset, there was no interaction between rs4998386 and total caffeine intake from seven food and beverage sources.
Case-only analyses of the association between rs4998386 CT+TT genotypes and coffee consumption showed no evidence of association regardless of the coffee definition (Table 2). Table 3 shows the same set of analyses in controls. While there was no statistically significant association between rs4998386 and coffee, OR estimates tended to be greater than one.
Taken together, these findings are not in favor of an interaction between rs4998386 and coffee drinking for the risk of PD.
In this large data pooling effort across multiple sites in the US and Europe, we found no evidence of an interaction between coffee intake and GRIN2A-rs4998386 in PD as previously reported , even though we included a similar number of cases and controls as the replication phase and more than twice as many participants as the discovery phase of the original study . We performed extensive sensitivity analyses, in which we considered alternative definitions of coffee consumption, applied different statistical approaches, and performed stratified analyses that demonstrated the robustness of our lack of replication of the interaction between coffee intake and GRIN2A-rs4998386 in PD.
There are several possible explanations for our lack of replication. First, one could argue that the approach of Hamza et al. is not specifically targeted at identifying gene–environment interactions: for the genome-wide discovery phase, they used the 2-df Kraft test, i.e., a test that combines marginal and interaction effects and was originally presented as a “tool for large-scale association scans where the true gene–environment interaction model is unknown”. For their replication, Hamza et al. specifically focused on the rs4998386–PD association among heavy coffee drinkers, which was genome-wide significant in their pooled analyses of discovery and replication data (OR = 0.51, p = 7×10⁻⁸); however, the test for the interaction between rs4998386 and coffee was not genome-wide significant (OR = 0.51, p = 3×10⁻⁵). Second, the interaction reported by Hamza et al. resulted in part from a highly significant association between coffee consumption and rs4998386 among controls. Interestingly, this is the only situation where the case-only approach is less efficient than traditional case-control studies at identifying gene–environment interactions. The interpretation of this pattern of association in the Hamza et al. study is not straightforward: while controls who carried the rs4998386-T allele were heavier coffee drinkers than noncarriers, there was a nonsignificant association between rs4998386-T and coffee in the opposite direction among PD patients, therefore suggesting that GRIN2A may play a role in coffee drinking behavior with opposite effects in healthy subjects and PD cases. In contrast, we found no association between coffee drinking and rs4998386 among population controls included in the present study.
This is supported by a meta-analysis of GWAS on coffee intake from eight Caucasian cohorts (n = 18,176) that found no association between the number of cups of coffee per day and GRIN2A-rs4998386 in healthy subjects (beta regression coefficient per one T allele = 0.0105, SE = 0.0165, p = 0.52; I² = 18%, p for heterogeneity = 0.41; personal communication). Third, PD patients included in the Hamza et al. study were younger than those included in the present analysis. However, we found no evidence of interaction in analyses restricted to younger PD patients and controls. Fourth, Hamza et al. used dataset-specific cutoffs to define coffee variables; this approach combines participants from separate datasets with different exposure levels in the same category and the resulting ORs do not have a simple interpretation. Our results were sensitive to the way coffee consumption data were categorized, as interaction estimates from analyses based on our main definition and those based on Hamza et al. were not comparable; it is therefore possible that findings from Hamza et al. may be sensitive to the way coffee data were categorized for their analyses.
According to our power calculations, our study was well powered to identify an interaction of the size estimated by Hamza et al.  or even weaker. The case-only approach, a method with increased statistical power to detect gene–environment interactions compared to traditional case-control analyses, relies on the assumption of gene–environment independence among controls . In our study, rs4998386 was not associated with coffee consumption among controls and the case-only approach also did not identify a statistically significant gene–environment interaction; moreover, the interaction estimate was not in the same direction as reported by Hamza et al. . Although the number of controls included in the present study was sufficient to detect an association between rs4998386 and coffee among controls of the size estimated based on the data from Hamza et al.  (Table S1), it could be argued that it was insufficient to detect a much weaker association. For this reason, we also implemented an empirical Bayes method that allows relaxing the gene–environment independence assumption while still maintaining increased efficiency compared to a traditional case-control analysis , and we also failed to detect an interaction using this approach.
The datasets included in the present analysis have considerable strengths. Notably, three of them were population-based with controls representative of the underlying population from which the cases arose. We included participants from various regions characterized by a wide range of coffee consumption behaviors, with a particularly high coffee consumption in the study from Denmark. These studies had a variety of designs and all failed to confirm the interaction. Most PD patients were clinically evaluated by movement disorders specialists in a standardized way in three of the studies. Finally, we used several analytic methods that all produced consistent results.
There are also limitations to this analysis. Three of the studies included prevalent PD patients; however, there was no evidence of interaction in those with shorter disease duration. Patients were not clinically assessed as part of the study in Denmark; however, PD patients were followed at neurological centers and an extensive effort was made to standardize diagnoses based on the review of the complete medical records . The inverse association between coffee and PD was weaker in the Seattle-US dataset than in the European dataset, but this is unlikely to bias interaction odds ratios . Coffee and PD were not inversely associated in the Rochester-US dataset, which is likely due to its use of sibling controls; this design may be less efficient to examine associations with environmental factors (because of overmatching), but it has been shown that they provide unbiased estimates of gene–environment interactions and have, in fact, increased power to detect them compared to traditional study designs . Finally, we included only cases and controls of self-reported non-Hispanic Caucasian race/ethnicity and our analyses were adjusted for and stratified by the dataset, but we cannot exclude the possibility that more subtle within-study population substructure may influence our results. However, it is very unlikely that this may account for our negative findings for several reasons: (i) The frequency of rs4998386 genotypes was comparable across all studies. Hence, the minor allele frequency does not appear to vary substantially across non-Hispanic Caucasians from different countries. (ii) One of the four studies included affected PD cases and their unaffected sibs; this design is not at risk of bias due to population stratification. (iii) In this paper, our main focus is the estimate of the GRIN2A-by-coffee interaction. 
Previous work shows that if there is no association between the genetic and the environmental factor within ethnic groups, the interaction estimate that is not adjusted for population stratification is unbiased. As there was no association between rs4998386 and coffee in any of the datasets, it is therefore unlikely that population substructure has a major impact on our results.
In summary, our results strongly suggest that GRIN2A-rs4998386 does not interact with coffee for the risk of PD. Future studies of PD, coffee consumption, and genes are of continued interest to improve our understanding of whether the association between PD and coffee is truly causal and, if so, what the underlying pathophysiological mechanisms are. Such investigations may benefit substantially from considering how the interaction manifests, i.e., whether it is driven by cases or controls. The coming years will likely see an increasing number of papers on gene–environment interactions at the genome-wide scale, which poses important methodological challenges. Our findings underline the need for a careful assessment of the findings of such studies.
Written informed consent was obtained from all subjects, and the study protocol was approved by the UCLA Institutional Review Board, the Danish Data Protection Agency, the ethics committee of Copenhagen, the ethics committee of the Pitié Salpêtrière University Hospital, the Institutional Review Boards at the University of Washington and Group Health Cooperative, and the Institutional Review Board of the Mayo Clinic (Rochester, MN).
A population-based case-control study was performed within a health insurance system (Mutualité Sociale Agricole, MSA). Patients (18–80 years) from five French districts who were treated for PD were included (2006–2007) . They were examined by neurologists and PD was diagnosed using standard criteria . Two controls per case were randomly drawn from the electronic list of all MSA members and individually matched on age, sex, and district of residency. The reference year was the year of PD onset in cases and the same year in matched controls. The participation rate was similar in cases (82%) and controls (77%). We excluded 31 cases and 62 controls without a DNA sample or of non-European ancestry, leaving 300 cases and 598 controls for the analyses.
PD patients treated at ten large neurological centers in Denmark were identified in the Danish National Hospital Register files (1996–2009), which include information since 1977 on all hospitalizations in Denmark, and matched on birth year and sex to 5–10 density sampled controls selected from the Danish Central Population Registry at time of case identification. From among 2,762 putative eligible PD cases, 179 were excluded from recruitment owing to lack of a PD diagnosis after an initial medical record review by medically trained research staff supervised by a movement disorder specialist, leaving 2,583 patients to be contacted. Of these, 497 (19%) declined participation, and the diagnosis of idiopathic PD could not be confirmed using standard diagnostic criteria for another 273 putative patients . The remaining 1,813 idiopathic PD cases provided exposure data by questionnaire and interview, of whom 1,575 (87%) also provided DNA samples for genotyping. Out of 3,626 eligible controls, 1,887 completed an interview and questionnaire and 1,607 (85%) provided a DNA sample. 287 cases and 213 controls had missing information for either coffee drinking or smoking, leaving 1,288 cases and 1,394 controls for the analyses. The reference date for exposure assessment was the occurrence of the first cardinal (motor) symptom or the first date of PD diagnosis from the medical records, and controls were assigned the date of their respective matched cases.
Newly diagnosed PD cases were identified in a population-based case-control study conducted in western Washington State at Group Health Cooperative (GHC), a health maintenance organization, and the University of Washington Neurology Clinic in Seattle (1992–2008). All diagnoses were confirmed by a neurologist or verified by a team of neurologists by consensus chart review. Cases were enrolled within four years of diagnosis (most within two years), and all had an MMSE score ≥24. Controls were neurologically normal (no history of multiple sclerosis, Alzheimer's disease, or other neurodegenerative disorder, MMSE score ≥24), enrolled in GHC, and frequency-matched to cases by sex, age, race and ethnicity, clinic, and year of GHC enrollment. A total of 386 cases and 502 controls who were non-Hispanic Caucasians with genotyping, coffee, and smoking data were included in the analyses.
This family-based dataset consists of 443 discordant sibling pairs, such that each sibling pair has one member affected with PD and one unaffected. Cases residing in Minnesota or one of the surrounding states (Wisconsin, Iowa, South Dakota, and North Dakota) were enrolled (1996–2004) at the Department of Neurology of the Mayo Clinic (Rochester, MN). All cases underwent a standardized clinical assessment performed by a neurologist; PD was diagnosed using standard criteria . Cases provided a genealogical history, and, when permitted, available siblings were contacted for a telephone interview to exclude parkinsonism via a validated screening instrument. Cases were matched to a single participating sibling without parkinsonism, first by sex (when possible) and then by closest age . Exposure data were obtained by direct (or proxy for incapacitated subjects) interview using a structured questionnaire administered via telephone by trained research assistants blinded to case-control status . Both genotype and relevant exposures were available for 315 pairs.
The source of DNA was saliva (Oragene) for France and Denmark, blood (86%) and buccal specimens (14%) for Seattle-US, and blood for Rochester-US.
SNP rs4998386 was genotyped in the French (call rate, 97%), Seattle-US (call rate, 97%) and Danish (call rate, 96%) datasets using allelic discrimination assays based on TaqMan chemistry according to the manufacturer's protocol (Life Technologies, Inc). For the genotyping of the French samples, six DNA samples from the PEG dataset included in the study of Hamza et al.  were included on each plate (concordance rate, 100%). For the Danish samples, each plate included 5% HapMap CEU (Northern/Western Europe ancestry) control samples (concordance rate, 100%).
The Rochester-US dataset contained individual-level genotype data from the “Mayo-Perlegen Linked Efforts to Accelerate Parkinson's Solutions (LEAPS) Collaboration”. Genotyping was performed using a Perlegen platform (198,345 SNPs). Data cleaning was performed with the PLINK toolset v1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/) as previously described. Briefly, SNPs were excluded if they had a minor allele frequency (MAF) <0.01, missing rates >2%, or HWE violations (p<1×10⁻⁶); samples with genotyping efficiency <95% were excluded. This resulted in 149,817 analyzable SNPs (433 PD cases, 428 controls). rs4998386 was not genotyped and was imputed as follows: uncovered autosomal SNPs were imputed on a genome-wide scale from the cleaned dataset using the IMPUTE program v2.0 (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html) and the precompiled HapMap 3 (release #2) and 1000 Genomes CEU+TSI (Pilot 1) panels (obtained on June 2nd, 2010). Individual-level genotypes were assigned according to the genotype called with 0.9 or greater posterior call probability, or coded as missing if the posterior call probability fell below 0.9. Imputed datasets were cleaned following the same thresholds as outlined above. The final dataset consisted of 735,746 SNPs (429 PD cases, 427 controls), including rs4998386.
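The genotype-calling rule for imputed SNPs can be written down compactly. The following is a minimal sketch of a best-guess call with a 0.9 posterior threshold; the function name and the dictionary input format are assumptions made for illustration and do not reflect the actual IMPUTE output format or the scripts used for the study.

```python
def call_genotype(posteriors, threshold=0.9):
    """Return the most probable genotype, or None (missing) below the threshold.

    posteriors: mapping from genotype label to posterior probability,
    for example {"CC": 0.95, "CT": 0.04, "TT": 0.01} (hypothetical values).
    """
    genotype, prob = max(posteriors.items(), key=lambda item: item[1])
    return genotype if prob >= threshold else None

# Hypothetical posterior probabilities for two subjects
confident = call_genotype({"CC": 0.95, "CT": 0.04, "TT": 0.01})
uncertain = call_genotype({"CC": 0.55, "CT": 0.44, "TT": 0.01})
print(confident, uncertain)
```

Subjects whose best call falls below the threshold are treated as missing rather than being assigned the most likely genotype, which trades sample size for lower genotype misclassification.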
Analyses were based on caffeinated coffee consumption, as for three out of the four datasets included in Hamza et al. . All studies assessed whether participants had ever been coffee drinkers. The French, Danish, and Rochester-US studies collected information on the usual number of cups of coffee per day that participants drank during several periods of life, thus allowing us to compute an average number of coffee cups per day and a cumulative number of coffee cupyears (number of cups per day multiplied by the number of years); only exposures occurring prior to PD onset in cases or the reference date in controls were considered. The Seattle-US study collected information on the typical lifetime number of coffee cups per day, but not on duration of coffee drinking; we used data from a PD case-control study conducted in California (PEG) and included in Hamza et al.  to impute average duration of coffee drinking. Among several covariates (PD disease status, smoking, age, sex, number of coffee cups per day), the main determinants of duration of coffee drinking were age and number of coffee cups per day; we used these covariates to impute duration of coffee drinking using linear regression for this study.
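As an illustration of how these exposure variables can be derived, the sketch below computes cumulative cupyears and the average number of cups per day from period-specific reports, truncating exposure at PD onset or the reference date. The tuple-based input format, the function names, and the choice to average over all reported years are assumptions; the questionnaires of the individual studies may have been structured differently.

```python
def exposed_years(start_age, end_age, cutoff_age):
    """Years of a period that fall before the cutoff age (never negative)."""
    return max(0.0, min(end_age, cutoff_age) - start_age)

def cupyears(periods, cutoff_age):
    """Cumulative cupyears (cups per day x years) before cutoff_age.

    periods: list of (start_age, end_age, cups_per_day) tuples.
    cutoff_age: age at PD onset (cases) or at the reference date (controls).
    """
    return sum(cups * exposed_years(start, end, cutoff_age)
               for start, end, cups in periods)

def average_cups_per_day(periods, cutoff_age):
    """Time-weighted average cups per day over the reported periods."""
    years = sum(exposed_years(start, end, cutoff_age) for start, end, _ in periods)
    return cupyears(periods, cutoff_age) / years if years > 0 else 0.0

# Hypothetical participant: 2 cups/day from age 20 to 40, 3 cups/day from 40 to 70,
# with PD onset (or reference date) at age 65.
history = [(20, 40, 2), (40, 70, 3)]
print(cupyears(history, 65), round(average_cups_per_day(history, 65), 2))
```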
For our main analyses, we used four different definitions of coffee intake: (i) never versus ever coffee drinking; (ii) number of cups per day in four classes (never, 1, 2, ≥3); (iii) cupyears (never, [0–65], [65–130], [130–200], >200); (iv) years of coffee drinking (never, [0–37], [37–45], [45–53], >53). Cutoffs were chosen a priori, based on the inspection of the distributions of the variables in each dataset, so that there was a sufficient number of exposed subjects in each category in all studies.
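A fixed set of cutoffs shared by all datasets can be applied with a simple lookup. The sketch below assumes right-closed intervals (a value of exactly 65 cupyears falls in the lowest exposed category), which the text does not specify; the labels and function name are illustrative.

```python
from bisect import bisect_left

CUPYEAR_CUTOFFS = [65, 130, 200]
CUPYEAR_LABELS = ["(0-65]", "(65-130]", "(130-200]", ">200"]

YEARS_CUTOFFS = [37, 45, 53]
YEARS_LABELS = ["(0-37]", "(37-45]", "(45-53]", ">53"]

def categorize(value, cutoffs, labels):
    """Map a non-negative exposure to 'never' or to one of the exposed categories."""
    if value <= 0:
        return "never"
    # bisect_left returns the index of the first cutoff >= value,
    # so a value equal to a cutoff stays in the lower category.
    return labels[bisect_left(cutoffs, value)]

for cy in (0, 40, 65, 131, 250):
    print(cy, categorize(cy, CUPYEAR_CUTOFFS, CUPYEAR_LABELS))
```

Because the same cutoffs are used in every study, a given category corresponds to the same exposure range regardless of the dataset of origin.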
In sensitivity analyses, we used the same approach as Hamza et al. : median cupyears was determined among controls (excluding those with zero intake) in each dataset separately and used to distinguish light (never, ≤median) from heavy (>median) drinkers; in addition, quartiles of cupyears were defined among controls from each dataset using the full range (from zero to maximum intake). We did not use this approach as our primary method, because it combines in the same category participants from different datasets with different exposure levels and creates difficulties for the interpretation of results. In addition, never coffee drinkers may have different characteristics compared to coffee drinkers, but are not considered as a distinct category. Additionally, we performed sensitivity analyses for the Seattle-US dataset using caffeine intake from seven food and beverage sources as the exposure variable .
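The dataset-specific median split can be sketched as follows. The code assumes that the cutoff is the median cupyears among controls with non-zero intake, computed within one dataset, and that never-drinkers are grouped with light drinkers, as described above. The function name and the example values are hypothetical.

```python
from statistics import median

def heavy_drinker_flags(cupyears_values, is_control):
    """Return the control-based median cutoff and a heavy-drinker flag per subject.

    cupyears_values: cupyears for every subject of ONE dataset.
    is_control: booleans of the same length, True for controls.
    """
    drinking_controls = [cy for cy, control in zip(cupyears_values, is_control)
                         if control and cy > 0]
    cutoff = median(drinking_controls)
    # Never-drinkers (0 cupyears) fall in the light category by construction.
    return cutoff, [cy > cutoff for cy in cupyears_values]

# Hypothetical dataset of eight subjects
values = [0, 20, 50, 80, 120, 0, 60, 200]
controls = [True, True, True, True, False, False, True, False]
cutoff, heavy = heavy_drinker_flags(values, controls)
print(cutoff, heavy)
```

Running this separately for each dataset yields a different cutoff per study, which is why a "heavy" drinker in one dataset may correspond to a "light" drinker in another.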
Power calculations were performed using the Quanto software.
Based on the following parameters (minor allele frequency, MAF: 10%; ever coffee drinking: 80%; marginal genetic OR: 1.0; marginal coffee OR: 0.75; number of cases = 2,289; number of controls = 2,809), our study had a power of 98.8% to detect an interaction OR of 0.5 (as estimated by the original study) at the 0.05 two-sided level; power was still adequate (86.7%) to detect a weaker interaction OR of 0.6. Assuming a 40% exposure frequency for heavy versus light coffee drinking, the power to detect an interaction OR of 0.5 was 99.3% and it was 91.0% to detect an interaction OR of 0.6.
Our study had a power of 99.9% to detect an interaction OR of 0.5; power was still adequate (85.0%) to detect a weaker interaction OR of 0.7. If we assumed a 40% exposure frequency (heavy versus light coffee drinking), the power to detect an interaction OR of 0.5 was 99.9% and it was 86.6% to detect an interaction OR of 0.7.
Only non-Hispanic Caucasian subjects were included in the analysis as there were very few subjects from other racial and ethnic groups in all studies. We checked that rs4998386 genotypes were in HWE among controls from each dataset using an exact test (p≥0.05). All analyses were first performed independently for each dataset. For the French, Danish, and Seattle-US studies, we computed ORs and 95% confidence intervals (CI) using unconditional logistic regression adjusted for age (in quartiles) and sex; we broke the matching for the French and Danish studies as some participants did not provide DNA. For the Rochester-US dataset, we used conditional logistic regression to take into account the fact that cases and controls were related. All analyses were also adjusted for ever cigarette smoking. Second, we obtained pooled OR estimates for the French, Danish, and Seattle-US studies by using unconditional logistic regression adjusted for age (in quartiles), sex, ever cigarette smoking, and dataset; we did not include the Rochester-US dataset at this stage due to the difference in study design. Finally, we combined the four studies in a single analysis by using an approach that allows pooling of individual data from matched and unmatched case-control studies.
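For readers unfamiliar with exact tests of HWE, the sketch below shows one common formulation: the probability of each possible heterozygote count is computed conditional on the observed allele counts, and the p-value is the sum of the probabilities that do not exceed that of the observed table. This is a generic illustration and not necessarily the implementation used for the present analyses; the genotype counts are hypothetical.

```python
import math

def hwe_exact_p(n_hom_major, n_het, n_hom_minor):
    """Two-sided exact test of Hardy-Weinberg equilibrium for a biallelic SNP."""
    n = n_hom_major + n_het + n_hom_minor
    n_minor = 2 * n_hom_minor + n_het   # copies of the minor allele
    n_major = 2 * n - n_minor           # copies of the major allele

    def log_prob(het):
        """Log-probability of `het` heterozygotes given the allele counts."""
        hom_minor = (n_minor - het) // 2
        hom_major = (n_major - het) // 2
        return (het * math.log(2)
                + math.lgamma(n + 1)
                - math.lgamma(hom_major + 1)
                - math.lgamma(het + 1)
                - math.lgamma(hom_minor + 1)
                + math.lgamma(n_minor + 1)
                + math.lgamma(n_major + 1)
                - math.lgamma(2 * n + 1))

    # Possible heterozygote counts share the parity of the allele counts.
    candidates = range(n_minor % 2, min(n_minor, n_major) + 1, 2)
    log_observed = log_prob(n_het)
    p_value = sum(math.exp(lp) for lp in map(log_prob, candidates)
                  if lp <= log_observed + 1e-9)
    return min(1.0, p_value)

# Hypothetical control genotype counts (CC, CT, TT) with a T-allele frequency of 10%
p_equilibrium = hwe_exact_p(810, 180, 10)
p_departure = hwe_exact_p(900, 0, 100)
print(p_equilibrium, p_departure)
```

With the first set of counts the observed table is the most probable one under equilibrium, so the test gives no evidence of departure; the second set (no heterozygotes at all) is strongly incompatible with HWE.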
We examined the marginal association of rs4998386 and coffee with PD. Since TT homozygotes were very rare (<1% of controls in all studies), we used a dominant model of inheritance (at least one T allele versus none); in sensitivity analyses, we excluded TT homozygotes to check for the robustness of our results.
To investigate the interaction between rs4998386 and coffee, we used a variety of approaches. We estimated the individual and joint effects of rs4998386 (dominant coding) and coffee and performed a statistical test of interaction by including multiplicative terms between rs4998386 and each category of coffee drinking in the models while retaining all respective main effects . A global test of interaction was performed by comparing the log-likelihood of this model with that of a model without interaction terms; this approach is preferable compared to including interactions with linear continuous variables that may lead to biased interaction estimates . We tested for interactions on a multiplicative scale as in the original paper ; when at least one exposure is “preventive” (e.g., coffee), multiplicative statistical models are appropriate according to several causal models . Interactions were also estimated through an empirical Bayes method that allows relaxing the gene–environment independence hypothesis among controls and is usually more powerful than the standard analysis ; this approach is only available for unmatched case-control data and was not implemented for the Rochester-US dataset.
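The global test of interaction described above is a likelihood ratio test. As a minimal sketch, assuming the two nested logistic models have already been fitted and their maximized log-likelihoods are available, the test statistic is twice the difference in log-likelihoods and is referred to a chi-square distribution whose degrees of freedom equal the number of interaction terms. The log-likelihood values below are hypothetical, and the chi-square tail function is a standard closed-form expression for integer degrees of freedom written here to keep the example self-contained.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total)

def likelihood_ratio_test(loglik_reduced, loglik_full, n_interaction_terms):
    """LRT comparing a model with interaction terms to one without them."""
    statistic = 2 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, n_interaction_terms)

# Hypothetical log-likelihoods; cups per day in four classes gives three
# genotype-by-coffee product terms, hence 3 degrees of freedom.
stat, p_value = likelihood_ratio_test(-3012.4, -3009.1, 3)
print(round(stat, 2), round(p_value, 3))
```

Testing all product terms jointly avoids selecting a single coffee category after the fact, at the cost of spreading power over several degrees of freedom.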
Finally, we studied the association between rs4998386 and coffee separately among cases and controls using unconditional logistic regression adjusted for age (in quartiles), sex, ever cigarette smoking, and dataset. Under the hypothesis of gene-environment independence among controls (i.e., rs4998386 is not associated with coffee drinking behavior), a significant association between rs4998386 and coffee among cases indicates an interaction. This approach is usually more powerful than a traditional case-control analysis; however, if the hypothesis of gene–environment independence among controls does not hold, interaction ORs are biased and type 1 error is inflated .
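The relation between the case-only, control-only, and case-control estimates can be illustrated with unadjusted counts. In a 2×2×2 table, the multiplicative interaction OR equals the gene–coffee OR among cases divided by the gene–coffee OR among controls, so the case-only OR coincides with the interaction OR only when the control OR is one. The counts below are hypothetical and chosen to show how an association among controls alone can produce an apparent interaction.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) from the four cells of a 2x2 table."""
    return (a * d) / (b * c)

def gene_exposure_or(table):
    """Gene-exposure OR within one group; keys are (carrier, exposed) -> count."""
    return odds_ratio(table[(1, 1)], table[(1, 0)], table[(0, 1)], table[(0, 0)])

def conventional_interaction_or(cases, controls):
    """Joint OR divided by the product of the single-factor ORs."""
    def disease_or(key):
        ref = (0, 0)
        return odds_ratio(cases[key], controls[key], cases[ref], controls[ref])
    return disease_or((1, 1)) / (disease_or((1, 0)) * disease_or((0, 1)))

# Hypothetical counts: no gene-coffee association among cases,
# positive gene-coffee association among controls.
cases = {(1, 1): 60, (1, 0): 40, (0, 1): 300, (0, 0): 200}
controls = {(1, 1): 90, (1, 0): 30, (0, 1): 400, (0, 0): 200}

case_only = gene_exposure_or(cases)
control_only = gene_exposure_or(controls)
interaction = conventional_interaction_or(cases, controls)
print(case_only, control_only, round(interaction, 3))
```

In this example the case-only estimate is exactly one, whereas the case-control interaction OR is about 0.67 and is entirely attributable to the association among controls.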
In sensitivity analyses, we performed analyses stratified by sex. Since participants included in the present study were on average older than those included in the original paper, we also performed analyses stratified by median age. We assessed whether disease duration influenced our findings by stratifying analyses by disease duration (<5 years, ≥5 years). We also checked whether adjusting for MMSE (available for the French and Seattle-US datasets) or pack-years of smoking had an impact on our findings.
Conditional and unconditional logistic regression and empirical Bayes analyses were performed with R, v3.0.1 (R Foundation for Statistical Computing, Vienna, Austria).
9. Thomas DC, Lewinger JP, Murcray CE, Gauderman WJ (2012) Invited commentary: GE-Whiz! Ratcheting gene-environment studies up to the whole genome and the whole exposome. Am J Epidemiol 175: 203–207.
10. Amin N, Byrne E, Johnson J, Chenevix-Trench G, Walter S, et al. (2012) Genome-wide association analysis of coffee drinking suggests association with CYP1A1/CYP1A2 and NRCAM. Mol Psychiatry 17: 1116–1129.
11. Yang Q, Khoury MJ, Flanders WD (1997) Sample size requirements in case-only designs to detect gene-environment interaction. Am J Epidemiol 146: 713–720.
12. Mukherjee B, Chatterjee N (2008) Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics 64: 685–694.
13. Wermuth L, Lassen CF, Himmerslev L, Olsen J, Ritz B (2012) Validation of hospital register-based diagnosis of Parkinson's disease. Dan Med J 59: A4391.
14. Morimoto LM, White E, Newcomb PA (2003) Selection bias in the assessment of gene-environment interaction in case-control studies. Am J Epidemiol 158: 259–263.
15. Witte JS, Gauderman WJ, Thomas DC (1999) Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs. Am J Epidemiol 149: 693–705.
16. Wang Y, Localio R, Rebbeck TR (2006) Evaluating bias due to population stratification in epidemiologic studies of gene-gene or gene-environment interactions. Cancer Epidemiol Biomarkers Prev 15: 124–132.
17. Fayard C, Bonaventure A, Benatru I, Roze E, Dumurgier J, et al. (2011) Impact of recommendations on the initial therapy of Parkinson's disease: a population-based study in France. Parkinsonism Relat Disord 17: 543–546.
18. Bower JH, Maraganore DM, McDonnell SK, Rocca WA (1999) Incidence and distribution of parkinsonism in Olmsted County, Minnesota, 1976–1990. Neurology 52: 1214–1220.
19. Checkoway H, Powers K, Smith-Weller T, Franklin GM, Longstreth WT, et al. (2002) Parkinson's disease risks associated with cigarette smoking, alcohol consumption, and caffeine intake. Am J Epidemiol 155: 732–738.
20. Searles Nielsen S, Checkoway H, Butler RA, Nelson HH, Farin FM, et al. (2012) LINE-1 DNA methylation, smoking and risk of Parkinson's disease. J Parkinsons Dis 2: 303–308.
21. Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, Farrer MJ, et al. (2005) High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet 77: 685–693.
22. Facheris MF, Schneider NK, Lesnick TG, de Andrade M, Cunningham JM, et al. (2008) Coffee, caffeine-related genes, and Parkinson's disease: a case-control study. Mov Disord 23: 2033–2040.
23. Searles Nielsen S, Franklin GM, Longstreth WT, Swanson PD, Checkoway H (2013) Nicotine from edible Solanaceae and risk of Parkinson disease. Ann Neurol 74: 472–477.
24. Gauderman WJ (2002) Sample size requirements for association studies of gene-gene interaction. Am J Epidemiol 155: 478–484.
25. Huberman M, Langholz B (1999) Re: “Combined analysis of matched and unmatched case-control studies: comparison of risk estimates from different studies”. Am J Epidemiol 150: 219–220.
26. Moreno V, Martín ML, Bosch FX, de Sanjosé S, Torres F, et al. (1996) Combined analysis of matched and unmatched case-control studies: comparison of risk estimates from different studies. Am J Epidemiol 143: 293–300.
27. Botto LD, Khoury MJ (2001) Commentary: facing the challenge of gene-environment interaction: the two-by-four table and beyond. Am J Epidemiol 153: 1016–1020.
28. Tchetgen Tchetgen EJ, Kraft P (2011) On the robustness of tests of genetic associations incorporating gene-environment interaction when the environmental exposure is misspecified. Epidemiology 22: 257–261.
29. Weinberg CR (1986) Applicability of the simple independent action model to epidemiologic studies involving two factors and a dichotomous outcome. Am J Epidemiol 123: 162–173.