Modeling of Environmental Effects in Genome-Wide Association Studies Identifies SLC2A2 and HP as Novel Loci Influencing Serum Cholesterol Levels

Genome-wide association studies (GWAS) have identified 38 larger genetic regions affecting classical blood lipid levels without adjusting for important environmental influences. We modeled diet and physical activity in a GWAS in order to identify novel loci affecting total cholesterol, LDL cholesterol, HDL cholesterol, and triglyceride levels. The Swedish (SE) EUROSPAN cohort (N_{SE} = 656) was screened for candidate genes and the non-Swedish (NS) EUROSPAN cohorts (N_{NS} = 3,282) were used for replication. In total, 3 SNPs were associated in the Swedish sample and were replicated in the non-Swedish cohorts. While SNP rs1532624 was a replication of the previously published association between CETP and HDL cholesterol, the other two were novel findings. For the latter SNPs, the p-value for association was substantially improved by inclusion of environmental covariates: SNP rs5400 (p_{SE,unadjusted} = 3.6×10⁻⁵, p_{SE,adjusted} = 2.2×10⁻⁶, p_{NS,unadjusted} = 0.047) in the SLC2A2 gene (Glucose transporter type 2) and rs2000999 (p_{SE,unadjusted} = 1.1×10⁻³, p_{SE,adjusted} = 3.8×10⁻⁴, p_{NS,unadjusted} = 0.035) in the HP gene (Haptoglobin-related protein precursor). Both showed evidence of association with total cholesterol. These results demonstrate that inclusion of important environmental factors in the analysis model can reveal new genetic susceptibility loci.

Published in the journal: PLoS Genet 6(1): e1000798. doi:10.1371/journal.pgen.1000798
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1000798

Introduction

Genome-wide association studies (GWAS) have identified more than 38 larger genetic regions which influence blood levels of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) [1]–[3]. These studies modeled basic anthropometric confounders, such as sex and age, while leaving out important environmental influences, such as diet and activity. This strategy is statistically suboptimal since the unexplained variation in the phenotype can increase the measurement error and as a result require larger sample sizes to detect a significant effect. Manolio [4] argued strongly for modeling of environmental covariates in GWAS and recommended lipid levels as a paradigmatic phenotype for studying the genetic and environmental architecture of quantitative traits.

In order to explore the usefulness of including both environmental and genetic factors in the analysis model, we used lipid measurements from the EUROSPAN study, comprising 3,938 individuals for whom genome-wide SNP data (N_SNP = 311,388) were available [5]. We measured daily intake of food and physical activity at work and at leisure and modeled the influence of those environmental covariates on serum lipid levels in a GWAS. First, data from the Northern Sweden Population Health Study (NSPHS) were used as a discovery cohort to screen for SNPs that displayed the lowest p-values when the model was adjusted for environmental covariates. We then used the other, non-Swedish EUROSPAN cohorts for replication of our strongest associations in a candidate gene association study (CGAS).

We chose a population living in northern Sweden for the selection of candidate loci because it shows strong natural heterogeneity in certain lifestyle factors (e.g. diet, activity), but homogeneity in other environmental aspects such as climate [6]. Whereas one group is living a modern, sedentary lifestyle found also in the southern part of Sweden and other western European countries, a subgroup of Swedes follows a traditional, semi-nomadic way of life based on reindeer herding. Reindeer herders typically show higher intake of game meat (reindeer, moose), which has a high protein and low fat content, and lower intake of non-game meat, fish, and dairy products among other, lesser differences. They also exert more physical activity at work to tend their reindeer herds, but less activity at leisure [7].

Results

Exploratory GWAS in NSPHS

We performed a GWAS with a lifestyle-adjusted model which included not only sex and age, but also daily intake of game meat, non-game meat, fish, and milk products, as well as physical activity at work and at leisure, as covariates. We focused on the 0.05% of all SNPs with the lowest p-values in the diet- and activity-adjusted model (corresponding to about 150 SNPs per lipid). For total cholesterol, 88 of these were located in a gene and 14 in genes that have been associated with energy metabolism (http://www.ncbi.nlm.nih.gov/omim/). For LDL-C, 65 SNPs were located in a gene, of which 8 were functionally relevant. Several of the SNPs for LDL-C were identical with those affecting total cholesterol, as expected from the high correlation (r = 0.91) between both phenotypes. For HDL-C, SNP rs2292883, located in the MLPH gene (Melanophilin), showed a genome-wide significant p-value (p = 1.06×10⁻⁷). For HDL-C, 69 SNPs were located in a gene and 14 of those genes were reported as having a metabolic effect. Finally, for triglycerides, 63 SNPs were located in a gene, but only 4 SNPs in genes with a functional annotation of interest (Table 1 and Table S1A, S1B, S1C, S1D).

**Tab. 1. Candidate SNPs (n = 39) selected from the Swedish discovery cohort.**

P-value changes

In order to evaluate the effect of including diet and activity covariates in the association analysis, we overlaid the p-values in the Manhattan plots from the NSPHS for the unadjusted and adjusted GWAS models (Figure 1, Figure 2, Figure 3, Figure 4). More refined GWAS results separating the effect of adjusting for either diet or physical activity are presented in Figure S1A, S1B, S1C, S1D; and Figure S2A, S2B, S2C, S2D. As expected, the p-values for a number of SNPs were sensitive to the inclusion of both diet and activity covariates in the model. We matched the 0.05% of SNPs with the lowest p-values (top SNP list) between the unadjusted and the adjusted model. For TC, 83 (53%) SNPs were found in both top SNP lists. Those lists contained 102 (64%) identical SNPs for LDL-C and 103 (65%) for HDL-C. The analyses resulted in the same 74 (47%) top SNPs for TG levels (Table S1A, S1B, S1C, S1D). Finally, for the resulting 39 candidate SNPs located in genes with a metabolic effect, we compared the p-values between the diet- and activity-adjusted (full) model and the unadjusted (restricted) model; adjustment decreased the p-value by up to 27-fold (Table 1).
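The comparison of top SNP lists described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy p-values are hypothetical, and rounding the list length to the nearest integer is an assumption, since the text only states "0.05%" and "about 150 SNPs".

```python
def top_snps(pvalues, fraction=0.0005):
    """Return the SNP ids with the lowest p-values (top `fraction` of all SNPs).

    `pvalues` maps a SNP id to its association p-value.
    Rounding the list length is an assumption; the paper only says "0.05%".
    """
    n_top = max(1, round(len(pvalues) * fraction))
    ranked = sorted(pvalues, key=pvalues.get)
    return set(ranked[:n_top])


def top_list_overlap(p_unadjusted, p_adjusted, fraction=0.0005):
    """Count SNPs shared by both top lists and their share of the adjusted list."""
    top_unadj = top_snps(p_unadjusted, fraction)
    top_adj = top_snps(p_adjusted, fraction)
    shared = top_unadj & top_adj
    return len(shared), len(shared) / len(top_adj)


# Toy data (hypothetical): 4,000 SNPs, so each top list holds 2 SNPs.
p_unadj = {f"rs{i}": (i + 1) / 4001 for i in range(4000)}
p_adj = dict(p_unadj)
p_adj["rs1"] = 0.9    # loses significance after adjustment
p_adj["rs7"] = 1e-6   # gains significance after adjustment
n_shared, share = top_list_overlap(p_unadj, p_adj)
```

With the 311,388 SNPs analysed here, this rounding convention would give lists of 156 SNPs per lipid, which is in line with the "about 150" quoted in the text.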

**Fig. 1. Manhattan plot of genome-wide effects on total cholesterol levels in the Swedish discovery cohort.**

**Fig. 2. Manhattan plot of genome-wide effects on LDL cholesterol levels in the Swedish discovery cohort.**

**Fig. 3. Manhattan plot of genome-wide effects on HDL cholesterol levels in the Swedish discovery cohort.**

**Fig. 4. Manhattan plot of genome-wide effects on triglyceride levels in the Swedish discovery cohort.**

Confirmatory CGAS in EUROSPAN

A food- and activity-adjusted candidate gene association study of the final 39 candidate SNPs was performed in the Scottish (SC) sample (N = 714) using similar lifestyle covariates (Table 2; Table S1E, S1F, S1G, S1H; Table S2). We replicated the effect of rs2000999 (p_{SC,unadj} = 6.16×10⁻³, p_{SC,adj} = 4.33×10⁻³) in the HP gene (Haptoglobin-related protein precursor) on TC level and the effect of rs1532624 (p_{SC,unadj} = 2.40×10⁻⁹, p_{SC,adj} = 1.96×10⁻⁹) in CETP (Cholesteryl ester transfer protein) on HDL-C. Effect sizes (ES) are given as the difference between the mean lipid levels (M) of the two homozygous genotypes divided by the pooled standard deviation, ES = (M_{A/A} − M_{B/B})/SD_{pooled}. In the Swedish cohort (SE), the unadjusted genetic effect of rs2000999 in the HP gene is equivalent to a moderately large difference in average TC level of 20.21 mg/dl between the homozygous genotypes (M_{SE,unadj}(TC|A/A) − M_{SE,unadj}(TC|G/G) = 243.16 mg/dl − 222.95 mg/dl, ES_{SE,unadj} = 0.41, ES_{SE,adj} = 0.44). Comparable effects were observed in the Scottish replication sample (M_{SC,unadj}(TC|A/A) − M_{SC,unadj}(TC|G/G) = 235.36 mg/dl − 222.54 mg/dl = 12.82 mg/dl, ES_{SC,unadj} = 0.29, ES_{SC,adj} = 0.52). SNP rs1532624 in the CETP gene is associated with a large, unadjusted difference in HDL-C level of 9.99 mg/dl (M_{SE,unadj}(HDL-C|A/A) − M_{SE,unadj}(HDL-C|C/C) = 68.14 mg/dl − 58.15 mg/dl, ES_{SE,unadj} = 0.73, ES_{SE,adj} = 0.48) in the discovery cohort and with effects of similar direction and size in the replication cohort (M_{SC,unadj}(HDL-C|A/A) − M_{SC,unadj}(HDL-C|C/C) = 69.79 mg/dl − 60.75 mg/dl = 9.04 mg/dl; ES_{SC,unadj} = 0.59, ES_{SC,adj} = 0.57).
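The effect size used above, ES = (M_A/A − M_B/B)/SD_pooled, can be illustrated with a short sketch. The text does not say how SD_pooled was obtained, so the textbook pooled standard deviation of two groups is assumed here, and the function names are hypothetical. The last lines back-calculate the pooled SD implied by the reported Swedish values for rs2000999; this is arithmetic on the published numbers, not an additional result.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two groups (textbook definition, assumed here)."""
    return math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2))


def effect_size(mean_a, mean_b, sd_pooled):
    """Standardized mean difference between the two homozygous genotype groups."""
    return (mean_a - mean_b) / sd_pooled


# Reported Swedish means for rs2000999 and total cholesterol (mg/dl).
mean_aa, mean_gg = 243.16, 222.95
difference = mean_aa - mean_gg      # 20.21 mg/dl
implied_sd = difference / 0.41      # pooled SD implied by ES = 0.41, roughly 49 mg/dl
```

Read this way, a difference of 20.21 mg/dl with ES = 0.41 corresponds to a pooled standard deviation of roughly 49 mg/dl for total cholesterol in the Swedish sample.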

**Tab. 2. SNPs (n = 3) discovered in a Swedish and replicated in a non-Swedish EUROSPAN cohort.**

We also performed an unadjusted candidate gene analysis of the 39 candidate SNPs in all non-Swedish (NS) EUROSPAN cohorts (Scotland, Croatia, The Netherlands, and Italy, N_{NS} = 3,282) and aggregated the results in a meta-analysis (Table 2; Table S1I, S1J, S1K, S1L). We confirmed the effect of rs5400 (p_{NS,unadj} = 4.68×10⁻²) in SLC2A2 on TC. We again found that rs2000999 (p_{NS,unadj} = 3.54×10⁻²) in HP influences TC levels and that rs1532624 (p_{NS,unadj} = 2.87×10⁻²⁰) in CETP (Cholesteryl ester transfer protein) affects HDL-C levels. The unadjusted genetic effect of rs5400 is equivalent to a moderately large difference in mean TC level of 27.11 mg/dl between homozygous genotypes (M_{SE,unadj}(TC|A/A) − M_{SE,unadj}(TC|G/G) = 249.30 mg/dl − 222.19 mg/dl, ES_{SE,unadj} = 0.57, ES_{SE,adj} = 0.66) in the Swedish cohort and to a small total effect in all non-Swedish samples (M_{NS,unadj}(TC|A/A) − M_{NS,unadj}(TC|G/G) = 236.69 mg/dl − 223.34 mg/dl = 13.35 mg/dl, ES_{NS,unadj} = 0.30).

No other associations, including LDL cholesterol or triglycerides levels, were replicated (all p>0.05). The genome-wide significant SNP rs2292883 in the Melanophilin (MLPH) gene found in the Swedish cohort was not confirmed.

Discussion

Environmental covariates may act as moderators, mediators or even suppressors, thereby affecting the discovery of genetic susceptibility loci [8],[9]. Therefore, we conducted a GWAS, modeling genetic and important environmental effects, such as food intake and physical activity, on serum levels of classical lipids. To our knowledge, this is the first GWAS on blood lipid levels modeling environmental factors, in particular major food categories and physical activity, in international cohorts. Our analysis replicated one known locus in the CETP gene [1] and identified two further loci, in the SLC2A2 and HP genes, respectively, which are involved in energy metabolism but were not previously reported to be associated with cholesterol levels.

SLC2A2 encodes the facilitated glucose transporter member 2 (GLUT-2, Solute carrier family 2) and is predominantly expressed in the liver. Mice deficient in GLUT-2 are hyperglycemic and have elevated plasma levels of glucagon and free fatty acids [10]. Mutations in GLUT-2 cause the Fanconi-Bickel syndrome (FBS) characterized by hypercholesterolemia and hyperlipidemia [11],[12]. Cerf [13] argued that a high-fat diet causes a decreased expression of the GLUT-2 glucose receptor on β-cell islets. As a result, glucose stimulation of insulin exocytosis is impaired causing hyperglycemia, a clinical hallmark of type 2 diabetes. In addition, Kilpelainen et al. [14] found that physical activity moderates the genetic effect of SLC2A2 on type 2 diabetes. These studies suggest that these lifestyle factors could have masked genetic effects in previous, unadjusted GWAS. This is emphasized by the strong increase in statistical significance of the SLC2A2 polymorphisms after adjusting for diet and physical activity, indicating that the examined lifestyle factors modified the effect of this gene. Our supplemental results show that physical activity markedly moderated the genetic effect on total cholesterol.

The HP gene encodes the Haptoglobin-related Protein Precursor (Hp), which binds hemoglobin (Hb) to form a stable Hp-Hb complex and, thereby, prevents Hb-induced oxidative tissue damage. Asleh et al. [15] identified severe impairment in the ability of Hp to prevent oxidation caused by glycosylated Hb. Diabetes is also associated with an increase in the non-enzymatic glycosylation of serum proteins, so these authors suggested that there is a specific interaction between diabetes, cardiovascular disease and the Hp genotype. It results from the increased need of rapidly clearing glycosylated Hb-Hp complexes from the subendothelial space before they oxidatively modify low-density lipoprotein to form the atherogenic oxidized low-density lipoprotein. The p-value for association between the HP SNP rs2000999 and total serum cholesterol concentration decreased in the model adjusted for diet and physical activity, suggesting that the genetic effect is moderated by diet and physical activity. Our supporting material points out the moderating role of physical activity in particular.

We also observed a highly significant association between rs1532624 in CETP and HDL-C levels. The CETP protein catalyzes the transfer of insoluble cholesteryl esters among lipoprotein particles. Variation in CETP is known to affect the susceptibility to atherosclerosis and other cardiovascular diseases [16]. Adjustment for diet and physical activity in our model caused an increase of the p-value of this SNP. Our supporting results indicate that the genetic effect is mediated by diet or by physical activity in a similar way.

This study also has some limitations. First, we are aware that our candidate gene association approach covers only a very small fraction of all genomic loci, which is one of the potential reasons why some classical lipid-influencing genes, such as APOE, are not represented in our candidate SNP list. Therefore, our approach is not comprehensive and may have failed to identify other relevant lifestyle-sensitive genetic variants. Nonetheless, we decided to apply this approach to make the best out of the available lifestyle data. Second, our study provides only limited information on the role of individual lifestyle factors for a genetic variant. However, in this study we aimed at amplifying genetic effects by adjusting for a maximum amount of environmental variance in a single model and, therefore, we neglected some of these aspects here. Third, we did not model genetic covariates in known lipid-relevant genes which may also moderate the effect of other genetic predictors. This is due to the focus of this paper on gene-environment relationships.

In summary, we have demonstrated that modeling environmental factors, in particular major food categories and physical activity, can improve statistical power and lead to the discovery of novel susceptibility loci. Such models also provide an understanding of the complex interplay of genetic and environmental factors affecting human quantitative traits. Inclusion of environmental covariates represents a much needed next step in the quest to model the complete environmental and genetic architecture of complex traits.

Methods

Ethics statement

All EUROSPAN studies were approved by the appropriate research ethics committees according to the Declaration of Helsinki [17]. The Northern Swedish Population Health Study (NSPHS) was approved by the local ethics committee at the University of Uppsala (Regionala Etikprövningsnämnden, Uppsala). The Scottish ORCADES study was approved by the NHS Orkney Research Ethics Committee and the North of Scotland REC. The Croatian VIS study was approved by the ethics committee of the medical faculty in Zagreb and the Multi-Centre Research Ethics Committee for Scotland. The Dutch ERF study was approved by the Erasmus institutional medical ethics committee in Rotterdam, The Netherlands. The Italian MICROS study was approved by the ethical committee of the Autonomous Province of Bolzano, Italy.

Participants

The examined subjects stem from five different population-representative, pedigree-based cohorts from the EUROSPAN consortium (http://www.eurospan.org). All studies include a comprehensive collection of data on family structure, lifestyle, blood samples for clinical chemistry, RNA and DNA analyses, medical history, and current health status. All participants gave their written informed consent [18]. A brief description of each population is given below:

The Northern Swedish Population Health Study (NSPHS) is a cross-sectional study conducted in the community of Karesuando in the subarctic region of the County of Norrbotten, Sweden, in 2006 [5]. This parish has about 1,500 eligible inhabitants of whom 740 participated in the study. The final sample consisted of 309 men and 347 women who were aged between 14 and 91 years. The inclusion of diet and activity covariates in the analytical model, and the corresponding missing values, reduced the effective sample size by less than 5%.

The Orkney Complex Disease Study (ORCADES) is a longitudinal study in the isolated Scottish archipelago of Orkney [19]. Participants from a subgroup of ten islands (N = 719) were used for the presented analysis. The sample comprised 334 men and 385 women aged between 18 and 100 years. The inclusion of diet and activity covariates in the analytical model, and the corresponding missing values, reduced the effective sample size by less than 5%.

The VIS study is a cross-sectional study in the villages of Vis and Komiza on the Dalmatian island of Vis, Croatia, and was conducted between 2003 and 2004 [20]–[22]. 795 participants who had both genotype and phenotypic data available were analysed. This cohort included 328 men and 467 women with an age between 18 and 93 years.

The Microisolates in South Tyrol Study (MICROS) is a cross-sectional study carried out in the villages of Stelvio, Vallelunga, and Martello, Venosta valley, South Tyrol, Italy, from 2001 to 2003 [23]. The 1,097 participants (475 males, 622 females, age between 18 and 88 years) presented in this study are those for whom both relevant genotype and phenotype data were available.

The Erasmus Rucphen Family Study (ERF) is a longitudinal study of a population descended from inhabitants of the Rucphen region, the Netherlands, in the 19th century [24]. Fasting total cholesterol, HDL cholesterol and triglyceride levels were available. LDL cholesterol was estimated using the Friedewald formula [25]. The 918 individuals included in this study consisted of the first series of participants with 354 men and 564 women aged between 18 and 92 years.

Genotyping

DNA samples were genotyped according to the manufacturer's instructions on Illumina Infinium HumanHap300v2 or HumanCNV370v1 SNP bead microarrays. Both arrays have 311,388 SNP markers in common that are distributed across the human genome. Analysis of the raw data was done in the BeadStudio software with the recommended parameters for the Infinium assay and using the genotype cluster files provided by Illumina. Individuals with a call rate below 95% were excluded from the analysis, as were SNPs with a call rate below 98%, SNPs deviating from Hardy-Weinberg equilibrium (p_{HWE} < 1×10⁻⁶), and SNPs with a minor allele frequency of less than 1%.
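The SNP-level exclusion criteria above can be expressed as a small filter. The sketch below is illustrative only: the function names are hypothetical, and the Hardy-Weinberg p-value is computed with a one-degree-of-freedom χ² approximation, which is an assumption, since the text does not state which test was used.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts.

    Assumption: the paper does not specify the HWE test; this is the
    simple chi-square approximation.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the three SNP-level thresholds quoted in the text."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p_value(n_aa, n_ab, n_bb) >= min_hwe_p)
```

For example, a SNP with genotype counts 160/320/160 and 5 missing calls passes all three thresholds, whereas the same counts with 40 missing calls fail on call rate.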

Lipids

Total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) were quantified by enzymatic photometric assays using an ADVIA1650 clinical chemistry analyzer (Siemens Healthcare Diagnostics GmbH, Eschborn, Germany) at the Institute for Clinical Chemistry and Laboratory Medicine, Regensburg University Medical Center, Germany.

Diet

In the NSPHS cohort, we collected data with a food frequency questionnaire based on the Northern Sweden 84-item Food Frequency Questionnaire (NoS-84-FFQ) [26]. We included in the questionnaire several items on foods specific to the lifestyle in this geographic region, in particular on game consumption (reindeer, moose). The answer options consisted of an 11-point format: 0 = “Never”, 1 = “less than 1 time per month”, 2 = “1 to 3 times per month”, 3 = “1 time per week”, 4 = “2 to 4 times per week”, 5 = “5 to 6 times per week”, 6 = “1 time per day”, 7 = “2 to 3 times per day”, 8 = “4 to 5 times per day”, 9 = “6 to 8 times per day”, 10 = “9 to 10 times per day”. The questionnaire was administered in electronic format by a trained study nurse acting as interviewer. For each food item we calculated daily intake in grams per day as a standardized unit of measurement and aggregated the items into food categories, such as game meat, non-game meat, fish, and dairy products. We evaluated the construct validity (known-groups validity) of the added items on game consumption in the NoS-84-FFQ questionnaire by comparing reindeer herders (N = 94) with non-reindeer herders (N = 505). We observed highly significant, large effect sizes in men (ES = 1.25, p = 9.7×10⁻⁴) and women (ES = 1.15, p = 2.9×10⁻⁵) in the expected direction, corresponding to an approximately three times higher absolute overall game intake in reindeer herders compared to others. A similar approach was used for the measurement and analysis of dietary data collected with a food frequency questionnaire in the Scottish cohort (Table S2).
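As a rough illustration of how answer codes can be converted to grams per day and aggregated into food categories, the sketch below maps each of the 11 answer options to an assumed daily frequency and multiplies it by a portion size. The midpoints of the stated ranges, the 30-day month, the portion sizes, and all names are assumptions made for this example; the conversion factors actually used are not reported in the text.

```python
# Assumed daily frequencies for the 11 answer codes (midpoints of the
# stated ranges; a month is taken as 30 days). These are illustrative only.
TIMES_PER_DAY = {
    0: 0.0,        # never
    1: 0.5 / 30,   # less than 1 time per month (assumed 0.5 per month)
    2: 2.0 / 30,   # 1 to 3 times per month
    3: 1.0 / 7,    # 1 time per week
    4: 3.0 / 7,    # 2 to 4 times per week
    5: 5.5 / 7,    # 5 to 6 times per week
    6: 1.0,        # 1 time per day
    7: 2.5,        # 2 to 3 times per day
    8: 4.5,        # 4 to 5 times per day
    9: 7.0,        # 6 to 8 times per day
    10: 9.5,       # 9 to 10 times per day
}


def daily_intake_grams(answer_code, portion_grams):
    """Daily intake of one food item in grams per day."""
    return TIMES_PER_DAY[answer_code] * portion_grams


def category_intake(responses, portions, items):
    """Sum the daily intake of all items belonging to one food category."""
    return sum(daily_intake_grams(responses[i], portions[i]) for i in items)


# Hypothetical respondent and portion sizes (grams per serving).
responses = {"reindeer": 4, "moose": 3, "salmon": 2}
portions = {"reindeer": 140.0, "moose": 140.0, "salmon": 120.0}
game_meat = category_intake(responses, portions, ["reindeer", "moose"])
```

In this toy case, reindeer three times per week and moose once per week at 140 g per serving add up to about 80 g of game meat per day.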

Physical activity

In the NSPHS cohort, we used two self-report scales to measure overall physical activity at work and at leisure. The Work Activity Scale (WAS, 6 items) addresses typical occupational physical activities: sitting, standing, walking, lifting, and general indicators of physical activity, i.e. sweating and tiredness after work. The Leisure Activity Scale (LAS, 4 items) asks about various typical free-time activities, such as walking, cycling, and other sporting activities, and about sweating as a general indicator of physical activity. Participants reported the frequency of each activity on a 5-point rating scale (1 = “never”, 2 = “seldom”, 3 = “sometimes”, 4 = “often”, and 5 = “always”). Both scales showed satisfactory internal consistency with Cronbach's α(WAS) = 0.73 and Cronbach's α(LAS) = 0.70. A similar approach was used for the measurement and analysis of data on physical activity collected with a self-report questionnaire in the Scottish cohort (Table S2).
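For readers unfamiliar with the internal-consistency measure quoted above, Cronbach's α can be computed from item ratings as sketched below. The function name and the toy ratings are hypothetical; only the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the sum score) is used.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                 # one tuple per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1.0 - sum_item_variances / total_variance)


# Toy ratings on the 5-point scale (hypothetical, 4 items as in the LAS).
consistent = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [5, 5, 5, 5]]
unrelated = [[1, 2], [2, 1], [1, 1], [2, 2]]
alpha_consistent = cronbach_alpha(consistent)
alpha_unrelated = cronbach_alpha(unrelated)
```

Items that move together perfectly give α = 1, while uncorrelated items give α = 0; the reported values of 0.73 and 0.70 lie between these extremes.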

Statistical analysis

Model selection

Sex and age were chosen as standard moderators of medical outcomes. Food and physical activity covariates were selected based on findings on natural variation in lifestyle factors in this (data not presented) and other [7] northern Swedish populations between a modern, sedentary and a traditional, semi-nomadic lifestyle based on reindeer herding. Mostly significant associations between diet and activity covariates and lipid levels were found in the examined Swedish EUROSPAN cohort in the following ranges: r = [−0.01;0.12] (p = [1.28×10⁻²;0.16]) for game meat, r = [−0.13;−0.05] (p = [8.63×10⁻⁴;0.74]) for non-game meat, r = [0.06;0.16] (p = [2.12×10⁻⁵;0.12]) for fish, r = [0.04;0.13] (p = [2.51×10⁻⁹;3.85×10⁻⁶]) for physical activity at work, and r = [−0.11;0.01] (p = [5.05×10⁻⁹;1.30×10⁻⁶]) for physical activity at leisure (Table S3). We finally selected sex, age, game meat, non-game meat, fish, dairy products, physical activity at work, and physical activity at leisure as covariates in our diet- and activity-adjusted model (“adjusted” model) in the Swedish EUROSPAN sample. Sex and age were used as covariates in the “unadjusted” model.

We tested whether the inclusion of those covariates in the explanatory model led to a statistically significant improvement of the goodness of model fit compared to a restricted model by applying a maximum likelihood ratio (MLR) test. We inferred a significantly better model fit of the full model if the difference of the χ² value between both models had an equal or lower probability than p = 0.05 (one-sided, upper tail) on a χ² distribution with k degrees of freedom. The degrees of freedom k are equal to the difference in the number of parameters between the two models. The difference of χ² values between both models is calculated according to the following formula, with MLE indicating the maximum likelihood estimates per model and ln the natural logarithm: χ²(rest−full) = −2 (ln(MLE_rest) − ln(MLE_full)). The comparison of the goodness of fit between the unadjusted and the diet- and activity-adjusted full model, using an MLR test, showed a statistically significant improvement for all four lipid traits (TC: χ²_diff = 59.69, df = 6, p = 5.21×10⁻¹¹; LDL-C: χ²_diff = 39.45, df = 6, p = 5.85×10⁻⁷; HDL-C: χ²_diff = 29.57, df = 6, p = 4.75×10⁻⁵; TG: χ²_diff = 69.32, df = 6, p = 5.65×10⁻¹³). All included polygenic, anthropometric and lifestyle factors (with the effect of including only the polygenic, sex, and age effects in parentheses) explained 64.07% (58.02%) of the variation of TC, 59.47% (56.47%) of the variation of LDL-C, 83.73% (82.59%) of the variation of HDL-C and 58.68% (41.80%) of the variation of TG levels. Dietary measures accounted for 22% (TC), 40% (LDL-C), 74% (HDL-C), and 7% (TG), respectively, of the variance explained by lifestyle factors, with physical activity being responsible for the rest. GWAS results for models adjusted for sex, age, and diet only (Figures S1A, S1B, S1C, S1D) or physical activity only (Figures S2A, S2B, S2C, S2D) are presented in the supporting figures.
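The likelihood ratio test described above can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions: the function names and the example log-likelihoods are hypothetical, and the χ² tail probability uses a closed form that is valid only for an even number of degrees of freedom, which covers the df = 6 comparison reported here (six lifestyle covariates added to sex and age).

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square distribution with even df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_restricted, loglik_full, df):
    """Return (chi-square difference, p-value) for nested models.

    Log-likelihoods are natural logarithms of the maximized likelihoods.
    """
    chi2 = -2.0 * (loglik_restricted - loglik_full)
    return chi2, chi2_upper_tail_even_df(chi2, df)


# Reported chi-square differences with df = 6.
p_tc = chi2_upper_tail_even_df(59.69, 6)   # total cholesterol
p_tg = chi2_upper_tail_even_df(69.32, 6)   # triglycerides

# Hypothetical log-likelihoods that differ by 29.845, i.e. chi-square = 59.69.
chi2_example, p_example = likelihood_ratio_test(-1029.845, -1000.0, 6)
```

Evaluating the reported χ² differences of 59.69 and 69.32 with six degrees of freedom gives tail probabilities of about 5.2×10⁻¹¹ and 5.6×10⁻¹³, in agreement with the p-values quoted for TC and TG.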

The confounding effect of treatment with statins on total cholesterol level and LDL cholesterol level was adjusted for by imputing untreated lipid concentrations of medicated individuals using the npsubtreated() function of the R/GenABEL package, which implements the algorithm of Tobin et al. [27]. Additionally, we conducted the same analysis in subsamples which did not receive any lipid-lowering treatment and found overall converging, but somewhat weaker results for rs2000999 (p_{SE,adj} = 2.55×10⁻⁴; p_{SC,adj} = 2.07×10⁻², p_{NS,unadj} = 5.93×10⁻²), rs1532624 (p_{SE,adj} = 2.26×10⁻⁵; p_{SC,adj} = 2.28×10⁻⁹, p_{NS,unadj} = 2.37×10⁻¹⁹), and rs5400 (p_{SE,adj} = 5.34×10⁻⁶; p_{SC,adj} = 2.23×10⁻¹, p_{NS,unadj} = 8.04×10⁻²) (Table S4).

Genome-wide association analysis

First, deviations from normality for all quantitative traits (lipids, age, diet, and physical activity) were corrected by inverse-normal transformation without adjusting for covariates. Second, linear mixed effects models were fitted for the transformed outcomes (TC, LDL-C, HDL-C, TG) using the above-mentioned covariates in the Swedish EUROSPAN sample and corresponding measures in the Scottish EUROSPAN sample (Table S2). The analysis was performed using the “polygenic” linear mixed effects model function polygenic() of the R/GenABEL package. Third, genome-wide association analysis was performed using a score test, a family-based association test [28], implemented in the mmscore() function of R/GenABEL. It uses the residuals and the variance-covariance matrix from the polygenic model and, additionally, the SNP fixed effect coded under an additive model (0 = A/A, 1 = A/B, 2 = B/B). Fourth, genome-wide significance of a genetic locus was based on a local type I error of α = 0.05/311,388 SNPs = 1.6×10⁻⁷ according to a Bonferroni adjustment.
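Three of the building blocks named above (inverse-normal transformation, additive genotype coding, and the Bonferroni threshold) can be sketched independently of GenABEL. The code below is an illustration, not a re-implementation of polygenic() or mmscore(): the function names are hypothetical, and the rank offset (rank − 0.5)/n is one common convention that is assumed here because the text does not specify it.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Rank-based inverse-normal transformation.

    Ties receive their average rank; quantiles are taken at (rank - 0.5) / n,
    which is an assumed convention.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def additive_code(genotype, effect_allele="B"):
    """Additive coding: number of copies of the effect allele (0, 1 or 2)."""
    return genotype.count(effect_allele)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni adjustment."""
    return alpha / n_tests


z_scores = inverse_normal_transform([5.2, 3.1, 7.8])   # toy lipid values
threshold = bonferroni_threshold(0.05, 311_388)        # about 1.6e-7
```

The threshold computed this way, 0.05/311,388, reproduces the 1.6×10⁻⁷ used for genome-wide significance.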

Candidate gene association analysis

The same statistical approach was used for association analysis of candidate loci with a local type I error of α = 0.05. No Bonferroni adjustment was applied to protect against α inflation since this method would be biased for the following reasons. The applied selection procedure for candidate loci makes the assumption of a global null hypothesis highly unlikely. Additionally, the phenotypes and some of the genotypes are highly correlated decreasing the number of independent tests. Instead all confirmatory tests are reported to allow the reader to evaluate the overall significance of the findings [29].

Relatedness

λ coefficients of lifestyle-adjusted genome-wide analysis varied in a low range between 1.00 and 1.04 in the Swedish cohort (see QQ-plots, Figures S3A, S3B, S3C, S3D, and Figure S4A, S4B, S4C, S4D) and between 1.00 and 1.01 in the Scottish cohort across all lipid traits. λ values for the unadjusted model used in the other three EUROSPAN cohorts did not exceed 1.01. These values indicate that our statistical model adequately handled relatedness in our pedigree-based samples since deflation of λ values is expected after correction for family structure.
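The genomic inflation factor λ referred to above is commonly estimated as the ratio of the median observed χ² statistic (1 df) to its expected median under the null hypothesis. The sketch below uses this median-based definition as an assumption, since the estimator used for the reported values is not stated; the function name and the simulated p-values are hypothetical.

```python
from statistics import NormalDist, median

_STANDARD_NORMAL = NormalDist()
# Expected median of a chi-square distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = _STANDARD_NORMAL.inv_cdf(0.75) ** 2


def genomic_lambda(pvalues):
    """Median-based genomic inflation factor from two-sided p-values in (0, 1]."""
    chi2_stats = [_STANDARD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated scan (lambda close to 1).
null_pvalues = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_lambda(null_pvalues)

# Systematically smaller p-values mimic inflation (lambda above 1).
lambda_inflated = genomic_lambda([p * p for p in null_pvalues])
```

Values close to 1, such as the 1.00 to 1.04 reported here, indicate that the test statistics are not systematically inflated by residual relatedness or stratification.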

Software and databases

We performed all analyses with the statistical analysis system R (V2.8.1) [30], mainly using the packages GenABEL (V1.4.2) [31] and biomaRt (V1.16.0) [32]. We accessed the following databases: Ensembl (http://www.ensembl.org) and Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/omim/).

Supporting Information

References

1. Aulchenko YS, Ripatti S, Lindqvist I, Boomsma D, Heid IM (2009) Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nat Genet 41: 47-55. doi:10.1038/ng.269

2. Sabatti C, Service SK, Hartikainen A, Pouta A, Ripatti S (2009) Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet 41: 35-46. doi:10.1038/ng.271

3. Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K (2009) Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet 41: 56-65. doi:10.1038/ng.291

4. Manolio TA (2009) Cohort studies and the genetics of complex disease. Nat Genet 41: 5-6. doi:10.1038/ng0109-5

5. Johansson A, Marroni F, Hayward C, Franklin CS, Kirichenko AV (2009) Common variants in the JAZF1 gene associated with height identified by linkage and genome-wide association analysis. Hum Mol Genet 18: 373-380. doi:10.1093/hmg/ddn350

6. Ross AB, Johansson A, Ingman M, Gyllensten U (2006) Lifestyle, genetics, and disease in Sami. Croat Med J 47: 553-65. doi:16909452

7. Ross A, Johansson Å, Vavruch-Nilsson V, Hassler S, Sjölander P (2009) Adherence to a traditional lifestyle affects food and nutrient intake among modern Swedish Sami. International Journal of Circumpolar Health 68: 313-416

8. Pearl J (2003) Statistics and causal inference: A review. TEST 12: 281-345. doi:10.1007/BF02595718

9. Baron RM, Kenny DA (1986) The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 51: 1173-82

10. Guillam MT, Hümmler E, Schaerer E, Yeh JI, Birnbaum MJ (1997) Early diabetes and abnormal postnatal pancreatic islet development in mice lacking Glut-2. Nat Genet 17: 327-330. doi:10.1038/ng1197-327

11. Santer R, Schneppenheim R, Dombrowski A, Götze H, Steinmann B (1997) Mutations in GLUT2, the gene for the liver-type glucose transporter, in patients with Fanconi-Bickel syndrome. Nat Genet 17: 324-326. doi:10.1038/ng1197-324

12. Manz F, Bickel H, Brodehl J, Feist D, Gellissen K (1987) Fanconi-Bickel syndrome. Pediatr Nephrol 1: 509-518

13. Cerf ME (2007) High fat diet modulation of glucose sensing in the beta-cell. Med Sci Monit 13: RA12-17

14. Kilpelainen TO, Lakka TA, Laaksonen DE, Laukkanen O, Lindstrom J (2007) Physical activity modifies the effect of SNPs in the SLC2A2 (GLUT2) and ABCC8 (SUR1) genes on the risk of developing type 2 diabetes. Physiol Genomics 31: 264-272. doi:10.1152/physiolgenomics.00036.2007

15. Asleh R, Marsh S, Shilkrut M, Binah O, Guetta J (2003) Genetically determined heterogeneity in hemoglobin scavenging and susceptibility to diabetic cardiovascular disease. Circ Res 92: 1193-1200. doi:10.1161/01.RES.0000076889.23082.F1

16. Dullaart RPF, Sluiter WJ (2008) Common variation in the CETP gene and the implications for cardiovascular disease and its treatment: an updated analysis. Pharmacogenomics 9: 747-763. doi:10.2217/14622416.9.6.747

17. World Medical Association (WMA) (2000) World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects [Internet]. Available: http://www.wma.net/e/policy/pdf/17c.pdf. Accessed 30 Sep 2009

18. Mascalzoni D, Janssens ACJ, Stewart A, Pramstaller P, Gyllensten U (2009) Comparison of participant information and informed consent forms of five European studies in genetic isolated populations. Eur J Hum Genet. Available: http://www.ncbi.nlm.nih.gov/pubmed/19826451. Accessed 19 Oct 2009

19. McQuillan R, Leutenegger A, Abdel-Rahman R, Franklin CS, Pericic M (2008) Runs of homozygosity in European populations. Am J Hum Genet 83: 359-372. doi:10.1016/j.ajhg.2008.08.007

20. Barać L, Pericić M, Klarić IM, Rootsi S, Janićijević B (2003) Y chromosomal heritage of Croatian population and its island isolates. Eur J Hum Genet 11: 535-542. doi:10.1038/sj.ejhg.5200992

21. Rudan I, Campbell H, Rudan P (1999) Genetic epidemiological studies of eastern Adriatic Island isolates, Croatia: objective and strategies. Collegium Antropologicum 23: 531-46. doi:10646227

22. Vitart V, Biloglav Z, Hayward C, Janicijevic B, Smolej-Narancic N (2006) 3000 years of solitude: extreme differentiation in the island isolates of Dalmatia, Croatia. Eur J Hum Genet 14: 478-87. doi:5201589

23. Pattaro C, Marroni F, Riegler A, Mascalzoni D, Pichler I (2007) The genetic study of three population microisolates in South Tyrol (MICROS): study design and epidemiological perspectives. BMC Med Genet 8: 29. doi:1471-2350-8-29

24. Aulchenko Y, Heutink P, Mackay I, Bertoli-Avella AM, Pullen J (2004) Linkage disequilibrium in young genetically isolated Dutch population. Eur J Hum Genet 12: 527-34. doi:15054401

25. Friedewald WT, Levy RI, Fredrickson DS (1972) Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem 18: 499-502

26. Johansson I, Hallmans G, Wikman A, Biessy C, Riboli E (2002) Validation and calibration of food-frequency questionnaire measurements in the Northern Sweden Health and Disease cohort. Public Health Nutr 5: 487-96. doi:10.1079/PHNPHN2001315

27. Tobin MD, Sheehan NA, Scurrah KJ, Burton PR (2005) Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med 24: 2911-2935. doi:10.1002/sim.2165

28. Chen W, Abecasis GR (2007) Family-based association tests for genomewide association scans. Am J Hum Genet 81: 913-926. doi:10.1086/521580

29. Proschan MA, Waclawiw MA (2000) Practical guidelines for multiplicity adjustment in clinical trials. Control Clin Trials 21: 527-539

30. R Development Core Team (2006) R: A language and environment for statistical computing. R Foundation for Statistical Computing

31. Aulchenko Y, Ripke S, Isaacs A, van Duijn C (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23: 1294-6. doi:btm108