Disproportionate Contributions of Select Genomic Compartments and Cell Types to Genetic Risk for Coronary Artery Disease

Download PDF České info

Coronary artery disease (CAD) and its subcomponent, myocardial infarction (MI), are the leading causes of infirmity and death worldwide. Large-scale genetic association studies have identified many genetic markers associated with CAD and MI. However, it has been difficult to determine the precise functional effects of these markers. Furthermore, it is unknown which cell types are biologically important in the development of MI/CAD. By intersecting findings from large-scale genetic association studies with functional genomic annotations, we show that genetic markers located in genomic regions that regulate expression of genes make up a large proportion of the genetic risk of MI/CAD. Furthermore, we show that this effect is particularly strong in certain tissues, including adipose, brain and spleen tissue. These results highlight the role of tissue-specific regulatory mechanisms in the genetic etiology of MI/CAD.

Published in the journal: . PLoS Genet 11(10): e32767. doi:10.1371/journal.pgen.1005622
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1005622

Summary

Introduction

Coronary artery disease (CAD) and myocardial infarction (MI) remain among the leading causes of infirmity and death worldwide despite advances in and widespread adoption of medical therapies treating this disease. Studies have shown a large genetic component for CAD, with the heritability estimated to be 30–60% [1]. Large-scale genome-wide association studies (GWAS) have identified common single nucleotide polymorphisms (SNPs) at 45 loci associated with MI/CAD risk [2–8]. Although these newly discovered loci have led to important new biological insights for MI/CAD [9,10], the proportion of the heritability explained by these loci represents approximately 15% of the estimated heritability [8]. Therefore, a large proportion of the genetic effects are apparently not explained by known loci. This phenomenon has been similarly observed with GWAS for other complex diseases [11].

Previously, we modeled the genetic architecture of MI and CAD using GWAS data [12]. Using simulated genetic models, we inferred that a polygenic model comprised of thousands of associated common variants with small effects explains the majority of the heritability (proportion of total liability-scale variance explained is 0.48, 76% of family-study h²) for MI/CAD [12]. Furthermore, recent work has partitioned heritability of complex diseases into broad categories of the genome [13]. Pooling 11 common diseases, including CAD, Gusev et al. have shown that heritability is disproportionally represented in regulatory elements, specifically in DNase I hypersensitivity sites (DHS) (h²_g = 79% for SNPs in DHS regions). However, a specific analysis of CAD yielded a wide interval of the true enrichment of DHS, with genotyped estimates of h²_g = 10.9 to 71.3% (95% confidence intervals) for SNPs in DHS regions. Hence, other than being broadly distributed throughout the genome, the molecular consequences of MI/CAD-associated common variants largely remain undefined. Furthermore, it is unclear which cell or tissue types are influenced the most by MI/CAD-associated SNPs.

We addressed these unresolved issues by leveraging data from the National Institutes of Health (NIH) Roadmap Epigenomics Project [14,15] and the Encyclopedia of DNA Elements (ENCODE) Consortium [16]. These projects have comprehensively catalogued biochemical functional regions such as those critical for transcription, transcription factor binding, chromatin structure and histone modification in different cell types, providing unique opportunities to scrutinize links between non-protein-coding DNA sequence variants and gene function. Studies have shown that SNPs in these functional DNA elements can regulate gene expression [17] and common disease-associated loci [18,19].

Using these data, we partitioned the genetic risk of MI/CAD into different categories, to discern drivers in specific cell types that may biologically influence MI/CAD. First, we investigated components of polygenicity and heritability in distinct genomic compartments. Second, we tested for differences in polygenicity, enrichment measures and heritability across diverse cell types within three histone modification marks. Third, we examined clusters of 45 loci discovered from recent GWAS meta-analyses [8] for MI/CAD mapping to the three histone marks in different cell types. Finally, we investigated whether specific biological networks were expressed differently in certain cell types.

Results

Relative contribution of genomic compartments to MI/CAD risk

We imputed two GWAS datasets, the Myocardial Infarction Genetics Consortium (MIGen) and the Wellcome Trust Case Control Consortium (WTCCC) CAD, using reference haplotypes from the 1000 Genomes Project [20]. We imputed ~7 million SNPs with a high imputation quality metric (>0.5) in both datasets. First, we investigated the relative contributions of different genomic compartments to MI/CAD risk. We partitioned the human genome into three distinct variant sets: “genic noncoding”, “genic coding” and “intergenic” (S1 Table) (see Materials and Methods). For each variant set, we performed two different analyses: 1) polygenic risk score analysis, where we test association of a genetic score comprised of multiple SNPs and 2) SNP-heritability analysis, where we estimate the variance in liability to MI/CAD [21] (see Materials and Methods).

In the polygenic risk score analysis, we observed that the ‘genic’ variant set, defined as genomic regions within 10 kilobases (kb) upstream and downstream of a gene, showed a substantially stronger signal than SNPs in intergenic regions (defined as genomic regions outside of 10 kb of a gene) (Fig 1 and S1 Fig). Similar results were observed amongst genomic regions with window sizes of 20 kb and 50 kb (S2 and S3 Figs). Among this set of variants, both association (Fig 1) and the phenotypic variance explained by the polygenic risk scores before and after normalization by the number of SNPs (S1–S3 Figs) were the most significant in regions adjacent to the protein-coding DNA regions, called “genic noncoding” regions. We observed that polygenic risk scores were strongly associated with MI/CAD in “genic noncoding” regions (P<10⁻¹⁰), explaining 1 to 1.5% of the phenotypic variance. By comparison, polygenic risk scores in regions within protein-coding DNA regions, called “genic coding” regions, were less strongly associated (P<10⁻⁵) and explained approximately 0.5% of the variability. Similar patterns were observed after normalizing by the number of SNPs in the polygenic risk score. These patterns were particularly evident for P value thresholds <10⁻⁵ (see Materials and Methods). The association signal remained among the variant sets with discovery P value thresholds greater than or equal to 0.05 after excluding regions within ±1 megabase of the 45 known SNPs associated with MI/CAD risk [8] (S4 Fig).

**Fig. 1. Contributions of three genomic compartments to the polygenicity of MI/CAD.**

We further examined the role of different genomic compartments on the heritability for MI/CAD risk by testing whether SNPs in the three compartments make up a large portion of the heritability for MI/CAD (Table 1 and S2 Table). Consistent with findings from our polygenic risk score analysis, we observed that most of the heritability resides within the “genic” regions. In a meta-analysis of the MIGen and WTCCC CAD studies, SNPs in “genic noncoding” regions explained approximately 58.9% (variance in liability = 0.25, P = 1×10⁻⁹) of the total heritability, resulting in a fold enrichment of variance of 1.2. In contrast, the heritability of MI/CAD explained by SNPs residing within “genic coding” regions was estimated to be only 10% of the total (variance in liability = 0.042, P = 0.07). SNPs residing within “genic coding” regions accounted for only 0.5% of the total number of variants, resulting in a high fold enrichment of variance of 19.1. Despite the enrichment of variance estimates for both “genic coding” and “genic noncoding”, neither category statistically deviated beyond expectation (P = 0.088 and P = 0.23 respectively). On the other hand, a statistically significant depletion of variance in liability was observed in the “intergenic” regions compared to expectation (observed variance in liability = 0.13, expected variance in liability = 0.22 [0.52 fraction of SNPs of total × 0.42 total variance in liability], difference in observed and expected variance in liability P = 0.0089). Similar results were observed amongst genomic regions with window sizes of 20 kb and 50 kb (S3 and S4 Tables). Heritability estimates with a prevalence of 3% of early-onset MI/CAD showed reduced heritability (variance in liability = 0.35) explained by the genomic compartment whereas P values and enrichment of variance remained the same (S5 Table).

**Tab. 1. Heritability of MI/CAD explained by three genomic compartment sets.**

Strength of polygenic association signal within regulatory elements

Given the high polygenicity for MI/CAD explained by SNPs in noncoding regions surrounding protein-coding regions, we further examined whether polygenicity is stronger within specific regulatory elements. Using data from the NIH Roadmap Epigenomics Project (see URL), we specifically examined three histone modification marks that are indicative of active promoters (H3K4me3/H3K9ac) or enhancers (H3K27ac). A polygenic association signal comprised of SNPs with association P<0.05 for MI/CAD was stronger in the histone marks, beyond what we expect by chance after randomly sampling “genic noncoding” regions outside of the marks (Mann-Whitney test P = 1.1×10⁻⁹⁵) (S5 Fig).

Genetic contribution of regulatory elements between specific cell types

We next tested for differences in polygenicity, enrichment and heritability estimates between different cell or tissue types within the three histone modification marks (Fig 2). We observed heterogeneity on the polygenicity of MI/CAD between cell types (Fig 2A). For example, SNPs in the H3K27ac and H3K9ac in bone marrow derived mesenchymal stem cell cultured cells and SNPs in the H3K4me3 regions in mesenchymal stem cell derived adipocyte cultured cells were amongst the strongest signals. Meanwhile, SNPs in any of the three histone marks in hematopoietic CD3, Treg, CD4, CD25, CD45RA primary cells were amongst the weakest signals. Enrichment analyses showed strong excesses for highly associated SNPs (P<10⁻⁵), compared with matched random SNPs, for mesenchymal stem cells, heart tissues such as fetal heart and ventricle, and intestinal mucosa including rectal and colonic mucosa, among others (Fig 2B). Cell-type specific effects were also generally consistent for heritability estimates (Fig 2C). The cell-type specific enrichment signals were strongest among strongly associated SNPs (Fig 2B).

**Fig. 2. Polygenic, enrichment and heritability analysis of three histone modification marks across cell types.**

When tested for ENCODE data, statistically significant polygenicity and high heritability estimates were observed for variants residing in specific active chromatin regions, including enhancers and weak transcription regions (about 3 and 11% of reference genome, respectively) across several cell lines, including skeletal muscle and vascular cell lines (S6 Fig). Polygenicity and heritability estimates were substantially weaker for variants in inactive chromatin states than those in active states (S7 Fig), with the notable exception of the quiescent state. Heritability estimates for variants in a quiescent state were particularly high although the variants in this state accounted for >60% of the reference epigenome (S7 Fig). However, the enrichment of variance (% variance of total divided by % SNPs of total) in the quiescent state was low (on average 0.76 [0.50−0.92] in different cell lines) compared with those for enhancers (on average 6.5 [3.2−10.3]) and weak transcription (1.9 [1.1–2.8]). In general, we note that ENCODE cell line chromHMM state results were largely driven by the sizes of these genomic compartments (i.e. % SNPs), which would be expected if alignment of MI/CAD genetic effects and ChromHMM stats were random.

Disproportionate number of MI/CAD-associated SNPs in regulatory elements in specific cell types

These observations have implications for fine-mapping loci discovered from large-scale GWAS. In particular, the results suggest that causal variants within GWAS loci are overrepresented in regulatory elements adjacent to protein-coding regions of the genome. Therefore, we investigated 45 independent, lead SNPs discovered from recent GWAS meta-analyses [8] for MI/CAD and tested whether a disproportionate number of these SNPs overlapped histone marks across diverse cell lines and tissues. We generated 10,000 random sets of SNPs that were selected to match the query GWAS SNPs based on similar minor allele frequency (±0.05 frequency), number of SNPs in linkage disequilibrium (LD) with the query SNP (±10% of number of SNPs in LD with query SNP using r²>0.5), distance to nearest gene (±10% of distance of nearest gene from query SNPs) and gene density (±10% of number of genes in loci around the query SNPs) [25]. We excluded two SNPs (rs3798220 and rs12205331) out of the 45 GWAS SNPs because we were unable to find appropriate matching null SNPs. We observed that 25 (58.1%) out of 43 GWAS SNPs overlapped one of three histone marks. From the 10,000 random null sets, a median of 15 (34.9%) out of the 43 random SNPs (1^st quartile of 13 and 3^rd quartile of 17) overlapped histone marks (S8 Fig). Only a very small fraction of random sets showed a higher number of GWAS SNPs overlapping with histone marks than what we observed (4 out of 10,000 random null sets) (P = 4×10⁻⁴).

Clustering of MI/CAD-associated SNPs in histone marks and cell types

Because histone modification differed between cell types, we next examined whether the 45 MI/CAD GWAS SNPs yielded differential gene regulation across various tissues. We mapped the 45 GWAS SNPs, along with SNPs in high LD (r²≥0.8), that reside within each of the three histone marks in different cell types. We observed distinct patterns between the different GWAS loci and cell types (Fig 3 and S9 and S10 Figs). For example, for the histone mark H3K27ac, 12 of the 45 loci were expressed in more than 80% of the cell types, whereas 13 of the 45 loci were expressed in less than 20%. Using HaploReg v2 [26], specific cell lines displayed enrichment with the 45 loci in strong enhancer regions. In particular, cell types related to adipose nuclei, spleen and brain tissue were amongst the strongest enrichment signals for inferred strong enhancer chromatin states (Table 2). We observed consistent results in these cell lines when also considering SNPs with less stringent significance levels using polygenic association analysis (Fig 2). HaploReg analysis for available ENCODE cell lines showed 24-fold enrichment with the 45 loci in enhancer regions in H1 cell lines (P = 3×10⁻⁵).

**Fig. 3. Hierarchical clustering of 45 MI/CAD GWAS SNPs and specific cell types for a histone modification mark (H3K27ac).**

**Tab. 2. Significant enrichment of 45 MI/CAD-associated SNPs in specific cell types detected by enrichment analysis.**

Connectivity in a protein-protein interaction network among MI/CAD-associated SNPs

Finally, to investigate whether specific biological networks are expressed in specific cell types, we examined connectivity of protein-protein interaction (PPI) networks among the 45 GWAS loci (45 GWAS SNPs, as well as SNPs in r²≥0.8) in different cell lines. Consistent with the HaploReg v2 and polygenic association analysis, we observed high direct connectivity in a PPI network involving known lipid genes, particularly apolipoprotein E (APOE), apolipoprotein C3 (APOC3) and low-density lipoprotein receptor (LDLR) in adipose nuclei (P = 0.002) and mesenchymal stem cell line derived adipocyte cultured cells (P = 0.001), and apolipoprotein B (APOB) and Sortilin 1 (SORT1) in adult liver (P = 0.009) (S11 Fig). Specific effects were observed in PPI networks in other cell types as well, including adult liver (S6 Table).

Discussion

We utilized several complementary human genetic approaches to partition the genetic risk of MI/CAD into different genomic categories and cell types. Three principal findings emerged: (1) genetic variants residing in noncoding regions flanking protein-coding genes make up a large proportion of the heritability for MI/CAD; (2) association signals are enriched in histone modification marks; and (3) clear cell-type specific effects emerged with genetic effects of MI/CAD-associated SNPs being enhanced in adipocyte cell lines.

We highlight an important role for genetic variants residing within specific regulatory elements including promoters and enhancer regions, suggesting that the genetic risk for MI/CAD is largely driven by variants in key regulatory elements. Because we did not adjust for traditional risk factors of MI/CAD, our heritability estimates may include the fraction derived from these risk factors. Furthermore, our heritability estimate of 0.43 for the total genome is consistent with previous family-based studies with estimates of 0.3–0.63 [1,28,29]. Our work is consistent with previous studies that showed that a significantly higher portion of GWAS SNPs overlap regulatory elements such as transcription factor binding regions and/or DNase I hypersensitive sites [18,30]. These findings have important implications for interpretation of GWAS signals, particularly for identifying causal variants. Our findings particularly highlight an important role for promoter and/or enhancer regions in fine-mapping GWAS signals.

Our analyses further highlighted the genetic effects of MI/CAD-associated SNPs occurring in specific cell types, including mesenchymal and adipocyte cell lines. Adipocytes may affect cardiovascular homeostasis by regulating diverse peptides and nonpeptide compounds, which have been implicated in cardiovascular disease pathogenicity [31]. Furthermore, obesity is a casual risk factor for MI/CAD [32]. Here, we demonstrate that alteration of chromatin-mediated gene regulation by DNA sequence variation at key genomic regions within adipocytes is an important determinant of MI/CAD risk. Our results are consistent with studies that have shown that adipocytes play an important role in the pathogenesis of obesity, with adverse effects on inflammation, hemodynamics, and cardiovascular function [33]. We propose that the adipose-cardiovascular axis is not only an acquired mediator of MI/CAD risk but also a critical genetic determinant of MI/CAD risk in the general population. Furthermore, we highlight other loci that are not known to be involved in cardio-metabolism but appear to map to histone marks in the adipocyte cell lines. For example, SNPs in the MIA3 and MRAS loci are highly associated with CAD [2–4] but have not been found to be associated with any cardio-metabolic intermediate trait [34]. Notably, we observe enrichment of MI/CAD-associated SNPs disrupting regulation in brain and spleen tissue raising new hypotheses in the pathogenesis of human atherosclerosis. Hypothalamic-pituitary regulation of the autonomic nervous system has a diverse array of impacts on cardiovascular disease determinants including blood pressure, heart rate, sodium regulation, metabolism, and physiologic responses to stress [35]. Furthermore, proinflammatory mediators within the spleen may have an important role in MI [36]. Because the spleen consists of a multitude of cell types, including immune B/T cells, macrophages, monocytes, endothelial cells, smooth muscle cells from larger arterioles, chromatin data of specific cell types in the spleen may be useful to identify specific functional roles related to MI. The mechanism by which these tissues contribute to a higher enrichment to MI/CAD associated SNPs may be through tissue-specific regulatory perturbation in functional regulatory regions (histone marks and/or chromatin enhancer states) adjacent to protein-coding regions, specifically in adipose, heart, brain and spleen tissues. Thus we propose that genetic determinants of early-onset myocardial infarction have biologic roles within distinct tissue types and these observations should prioritize experimental modeling strategies.

We note the following limitations. We note that the case samples in MIGen and WTCCC CAD were ascertained based on early-onset MI/CAD that was not fatal (male cases < 50 years old and female cases < 60 years old). Therefore, the genetic architecture and some of the tissue types highlighted in this study may be less relevant to the more general, broader CAD phenotype. Furthermore, we observed strong enrichment signals in some tissues (such as heart and intestinal mucosa) that may serve as a proxy for tissues that are biologically relevant to MI/CAD (such as smooth muscle in arterial walls). This may be due to tissues with similar cell types (i.e. smooth muscle cells in different tissues) having similar open chromatin structures. We note that we observed relatively weak signals for immune-related tissues and cell types despite a recent study suggesting that there was enrichment of tissue-specific eSNPs associated with CAD in pathways related to the immune system[37]. We have not examined our results in the context of biological pathways relevant to specific cell types (for example, immune/inflammation pathways in immune cells), and further investigation on this is warranted.

In summary, we have shown that disease-causing variants for MI/CAD are enriched in promoters and enhancer regions flanking protein-coding genes. Functional data from the NIH Roadmap Project and ENCODE provide key links to specific cell types such as mesenchymal stem cell derived adipocyte cultured cells, heart, brain and spleen cells with risk variants for MI/CAD. Our results highlight the importance of investigating the noncoding regions of the genome in genetic studies and suggest a key role of tissue-specific regulatory mechanisms on the etiology of MI/CAD. Identifying critical nodes that are significant drivers of a substantial portion of human disease can both inform biological investigation and prioritize therapeutic efforts.

Materials and Methods

Details regarding material and methods for primary analyses presented in main figures and tables are provided below and in corresponding figure and table legends. Details about methods for supplemental figures and tables are provided in the legends in the Supporting Information.

GWAS data for MI/CAD

We obtained GWAS data for 5,903 samples from MIGen [5] (2,905 cases and 2,998 controls) and 4,837 samples from WTCCC [38] (1,914 cases and 2,923 controls). For quality control, samples with extreme heterozygosity, gender mismatch or sample call rate <95% and SNPs with Hardy-Weinberg equilibrium P<10⁻⁶, minor allele frequency <1% or SNP call rate <98% were excluded prior to imputation.

Imputation into the 1000 Genomes reference panel

We prephased MIGen GWAS data using the MaCH software and imputed into the 1000 Genomes Phase I Integrated Release Version 3 Haplotypes panel using minimac [39]. We imputed WTCCC CAD data using IMPUTE2 [39–41]. To control for imputation quality, we removed low-frequency SNPs with minor allele frequency <0.5% and used SNPs with high imputation quality metric (minimac rsq or IMPUTE2 info) >0.5. We tested for association with additive tests for imputed dosage data using the SNPTEST [40,42] and PLINK [43] software. Gender, age and principal components for population structure [44] were used for covariates in the association analysis.

Genomic annotation

A categorization of SNPs was adopted in order to compare their relative aggregate properties (S1 Table). “Genic” regions were defined as ±10 kb of the 3′ or 5′ untranslated regions (UTR) of a gene. “Intergenic” regions were those that were beyond ±10 kb of the 3′ or 5′ UTR of a gene. “Genic coding” variants were those that code amino acid sequence (nonsense, missense, synonymous). “Genic noncoding” variants were those that were resided outside of the coding region but within the “genic” window size. This includes the region beyond the 3’ or 5’ UTR, as well as the introns. 19.5% of variants in the “genic noncoding” region are also observed in the DHS regions as defined by Gusev et al [45]. Genomic compartments for the “genic coding”, “genic noncoding”, and “intergenic” regions were defined to be non-overlapping. Other window sizes for “genic” regions of ±20 kb and ±50 kb were also tested.

Regulatory element annotation

To examine polygenic effects of specific chromatin marks, we obtained histone modification marks (histone H3 lysine 4 trimethylation or H3K4me3, histone H3 lysine 9 acetylation or H3K9ac, histone H3 lysine 27 acetylation or H3K27ac) in diverse cell types or tissues that are indicative of active enhancers or promoters from the NIH Roadmap Epigenomics Project [14,15] data repository (see URL). We identified histone mark using the MACS v1.4 software [46], with a P value cutoff of 10⁻⁵ and a false discovery rate cutoff of 0.01. We ran the histone mark test files with control files matched by individual if available in order to increase specificity. We also obtained chromatin core 15-state data for 16 ENCODE cell lines (Epigenome ID E114-E119) (see URL). We examined eight active chromatin states (active transcription start site [TSS] proximal promoter states [active TSS, flanking active TSS], a transcribed state at the 5′ and 3′ end of genes showing both promoter and enhancer signatures [transcription at gene 5′ and 3′], actively transcribed states [strong transcription, weak transcription], enhancer states [enhancers, genic enhancers], and a state associated with zinc finger protein genes [ZNF and repeats]) and seven inactive states (constitutive heterochromatin, bivalent regulatory states [bivalent TSS, flanking bivalent TSS/enhancers, bivalent enhancers], repressed PolyComb states [repressed PolyComb, weak repressed PolyComb], and a quiescent state) inferred by ChromHMM [15].

Polygenic risk score analysis across three distinct variant sets

Polygenic risk score (PRS_i) for individual i for a given variant set was calculated as PRSi = ∑j∈SNPsβj×dij, where β_j >0 is the natural log of odds ratio for the risk allele of SNP j from the association result in the MIGen discovery set and d_ij is the dosage (0−2) for that same allele of individual i from the WTCCC CAD validation set, as previously described [12]. Association of polygenic risk scores with disease status was tested using the Wald test and variance explained was estimated by Nagelkerke’s R² from logistic regression [47]. Variance explained after normalization was calculated using Nagelkerke’s R² divided by the number of SNPs. We used different discovery P value thresholds in MIGen (association P<5×10⁻⁸, 5×10⁻⁷, 5×10⁻⁶, 5×10⁻⁵, 5×10⁻⁴, 0.005, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1) to define various polygenic risk scores (Fig 1).

SNP-heritability analysis across three distinct variant sets

The proportion of phenotypic variance explained by each variant set was estimated using the restricted maximum likelihood method [48] implemented in the Genome-wide Complex Trait Analysis (GCTA) software, transformed to the underlying liability scale assuming a prevalence of 6% for CAD [22,23]. We removed one of each pair of individuals with estimated relatedness larger than 0.05 (grm-cutoff 0.05 in the GCTA software). Given a previous observation that LD does not substantially influence polygenic risk score analysis [49] or heritability explained by genotyped SNPs [13], and that imputed SNPs do not produce biased heritability estimates compared to genotyped SNPs [13], we included all SNPs in our variant sets (Table 1). We performed heritability estimates independently in the MIGen and WTCCC CAD studies, and then a meta-analysis across both studies using as weights the inverse variance (Tables 1 and S2–S4). We also estimated heritability with a prevalence of 3% for early-onset MI/CAD (S5 Table).

Polygenic risk score, enrichment and heritability analysis of three histone modification marks across cell types

We performed analyses on SNPs within three histone marks (H3K27ac, H3K4me3, H3K9ac) in different cell types. We performed polygenic risk score analysis as described above. We used a discovery P value threshold of <0.05 in MIGen to form the polygenic risk score and then validation in WTCCC CAD (Fig 2A). We performed enrichment analysis by comparing the proportion of significant variants passing a specific association P threshold of a variant set with that of a baseline set. We tested different association P thresholds P<5×10⁻⁷, 5×10⁻⁶, 5×10⁻⁵. For the baseline set, we examined variants that resided in the intergenic regions, were not conserved (Genomic Evolutionary Rate Profiling score [50] <5) and did not overlap with any of the studied regulatory elements (Fig 2B). We performed heritability analysis as described above. We restricted heritability analysis to variants within histone marks for the indicated cell line using the restricted maximum likelihood method [48] implemented in the GCTA software (Fig 2C) [22,23]. For all approaches, we performed analyses restricting to SNPs residing in the three histone marks (H3K27ac, H3K4me3, H3K9ac) that were present in the different cell types

Enrichment analysis of MI/CAD-associated SNPs in specific cell types

For enrichment analyses, we selected 45 independent SNPs that were shown to be significantly associated with CAD in a large-scale GWAS meta-analysis (Table 2) [8]. We also included tag SNPs in strong LD (r²≥0.8 in 379 individuals from European populations [85 CEU, 98 TSI, 93 FIN, 89 GBR, 14 IBS] from the 1000 Genomes Project [20]) with the 45 GWAS SNPs. We utilized the NIH Roadmap and ENCODE data and two mammalian conservation algorithms, GERP and SiPhy-omega, implemented in HaploReg v2 [26] (see URL) to examine if the 45 GWAS loci are enriched in regions of inferred strong enhancer chromatin states [27] in specific cell types. The background set for enhancer enrichment analysis was “All SNPs in 1000 Genomes phase I data”.

Hierarchical clustering of MI/CAD-associated SNPs, histone modification marks and specific cell types

For hierarchical clustering analysis, we mapped all 45 MI/CAD GWAS SNPs, along with SNPs in high LD (r²≥0.8) (same SNP set in enrichment analysis), to each of the three histone marks (H3K27ac, H3K4me3 and H3K9ac), which are associated with active regulatory regions, in different cell types (Fig 3). Hierarchical clustering was performed using the heatmap function in R (R Project for Statistical Computing). Reordering of the rows and columns to produce the dendrogram was based on the presence or absence of a SNP residing in a histone mark in different cell types.

Protein connectivity of MI/CAD-associated SNPs in specific cell types

We tested for direct connectivity of genes in GWAS loci in specific cell types. We tested 45 MI/CAD GWAS SNPs, in addition to SNPs in high LD (r²≥0.8) (same SNP set in enrichment analysis), that overlapped with the three histone marks (H3K4me3, H3K9ac, H3K27ac) in a specific cell type. DAPPLE [51] was utilized to test for direct connectivity in PPI networks. Gene regulatory regions were defined as within 110 kb upstream of transcription start site and 40 kb downstream of transcription end site of each of the 45 lead SNPs or tag SNPs were included in the analysis. We performed 1,000 permutations to obtain empirical significance for the observed connectivity compared with the expected connectivity.

Web resources

NIH Roadmap Epigenomics Project, http://www.roadmapepigenomics.org ENCODE Chromatin Core 15-State data, http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/ HaploReg v2, http://www.broadinstitute.org/mammals/haploreg/haploreg.php

Supporting Information

Zdroje

1. Marenberg ME, Risch N, Berkman LF, Floderus B, de Faire U (1994) Genetic susceptibility to death from coronary heart disease in a study of twins. N Engl J Med 330: 1041–1046. 8127331

2. McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, et al. (2007) A common allele on chromosome 9 associated with coronary heart disease. Science 316: 1488–1491. 17478681

3. Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, et al. (2007) Genomewide association analysis of coronary artery disease. N Engl J Med 357: 443–453. 17634449

4. Helgadottir A, Thorleifsson G, Manolescu A, Gretarsdottir S, Blondal T, et al. (2007) A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 316: 1491–1493. 17478679

5. Myocardial Infarction Genetics Consortium, Kathiresan S, Voight BF, Purcell S, Musunuru K, et al. (2009) Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet 41: 334–341. doi: 10.1038/ng.327 19198609

6. Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, et al. (2011) Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet 43: 333–338. doi: 10.1038/ng.784 21378990

7. Coronary Artery Disease Genetics Consortium (2011) A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat Genet 43: 339–344. doi: 10.1038/ng.782 21378988

8. CARDIoGRAMplusC4D Consortium, Deloukas P, Kanoni S, Willenborg C, Farrall M, et al. (2013) Large-scale association analysis identifies new risk loci for coronary artery disease. Nat Genet 45: 25–33. doi: 10.1038/ng.2480 23202125

9. Musunuru K, Strong A, Frank-Kamenetsky M, Lee NE, Ahfeldt T, et al. (2010) From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466: 714–719. doi: 10.1038/nature09266 20686566

10. Kathiresan S, Srivastava D (2012) Genetics of human cardiovascular disease. Cell 148: 1242–1257. doi: 10.1016/j.cell.2012.03.001 22424232

11. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747–753. doi: 10.1038/nature08494 19812666

12. Stahl EA, Wegmann D, Trynka G, Gutierrez-Achury J, Do R, et al. (2012) Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet 44: 483–489. doi: 10.1038/ng.2232 22446960

13. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, et al. (2014) Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases. Am J Hum Genet 95: 535–552. doi: 10.1016/j.ajhg.2014.10.004 25439723

14. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, et al. (2010) The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 28: 1045–1048. doi: 10.1038/nbt1010-1045 20944595

15. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, et al. (2015) Integrative analysis of 111 reference human epigenomes. Nature 518: 317–330. doi: 10.1038/nature14248 25693563

16. Encode Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74. doi: 10.1038/nature11247 22955616

17. Degner JF, Pai AA, Pique-Regi R, Veyrieras JB, Gaffney DJ, et al. (2012) DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482: 390–394. doi: 10.1038/nature10808 22307276

18. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337: 1190–1195. doi: 10.1126/science.1222794 22955828

19. Trynka G, Sandor C, Han B, Xu H, Stranger BE, et al. (2013) Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet 45: 124–130. doi: 10.1038/ng.2504 23263488

20. Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56–65. doi: 10.1038/nature11632 23128226

21. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42: 565–569. doi: 10.1038/ng.608 20562875

22. Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88: 76–82. doi: 10.1016/j.ajhg.2010.11.011 21167468

23. Lee SH, Wray NR, Goddard ME, Visscher PM (2011) Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet 88: 294–305. doi: 10.1016/j.ajhg.2011.02.002 21376301

24. Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, et al. (2011) Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet 43: 333–338. doi: 10.1038/ng.784 21378990

25. Pers TH, Timshel P, Hirschhorn JN (2015) SNPsnap: a Web-based tool for identification and annotation of matched SNPs. Bioinformatics 31: 418–420. doi: 10.1093/bioinformatics/btu655 25316677

26. Ward LD, Kellis M (2012) HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 40: D930–934. doi: 10.1093/nar/gkr917 22064851

27. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473: 43–49. doi: 10.1038/nature09906 21441907

28. Nora JJ, Lortscher RH, Spangler RD, Nora AH, Kimberling WJ (1980) Genetic—epidemiologic study of early-onset ischemic heart disease. Circulation 61: 503–508. 7353240

29. Zdravkovic S, Wienke A, Pedersen NL, Marenberg ME, Yashin AI, et al. (2002) Heritability of death from coronary heart disease: a 36-year follow-up of 20 966 Swedish twins. J Intern Med 252: 247–254. 12270005

30. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, et al. (2014) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature.

31. Poirier P, Giles TD, Bray GA, Hong Y, Stern JS, et al. (2006) Obesity and cardiovascular disease: pathophysiology, evaluation, and effect of weight loss. Arterioscler Thromb Vasc Biol 26: 968–976. 16627822

32. Juonala M, Magnussen CG, Berenson GS, Venn A, Burns TL, et al. (2011) Childhood adiposity, adult adiposity, and cardiovascular risk factors. N Engl J Med 365: 1876–1885. doi: 10.1056/NEJMoa1010112 22087679

33. Lavie CJ, Milani RV, Ventura HO (2009) Obesity and cardiovascular disease: risk factor, paradox, and impact of weight loss. J Am Coll Cardiol 53: 1925–1932. doi: 10.1016/j.jacc.2008.12.068 19460605

34. Wood GC, Still CD, Chu X, Susek M, Erdman R, et al. (2008) Association of chromosome 9p21 SNPs with cardiovascular phenotypes in morbid obesity using electronic health record data. Genomic Med 2: 33–43. doi: 10.1007/s11568-008-9023-z 18716918

35. Szczepanska-Sadowska E, Cudnoch-Jedrzejewska A, Ufnal M, Zera T (2010) Brain and cardiovascular diseases: common neurogenic background of cardiovascular, metabolic and inflammatory diseases. J Physiol Pharmacol 61: 509–521. 21081794

36. Swirski FK, Nahrendorf M, Etzrodt M, Wildgruber M, Cortez-Retamozo V, et al. (2009) Identification of splenic reservoir monocytes and their deployment to inflammatory sites. Science 325: 612–616. doi: 10.1126/science.1175202 19644120

37. Makinen VP, Civelek M, Meng Q, Zhang B, Zhu J, et al. (2014) Integrative genomics reveals novel molecular pathways and gene networks for coronary artery disease. PLoS Genet 10: e1004502. doi: 10.1371/journal.pgen.1004502 25033284

38. WTCCC (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678. 17554300

39. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR (2012) Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 44: 955–959. doi: 10.1038/ng.2354 22820512

40. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39: 906–913. 17572673

41. Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5: e1000529. doi: 10.1371/journal.pgen.1000529 19543373

42. Marchini J, Howie B (2010) Genotype imputation for genome-wide association studies. Nat Rev Genet 11: 499–511. doi: 10.1038/nrg2796 20517342

43. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575. 17701901

44. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909. 16862161

45. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjálmsson BJ, et al. (2014) Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases. American Journal of Human Genetics 95: 535–552. doi: 10.1016/j.ajhg.2014.10.004 25439723

46. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9: R137. doi: 10.1186/gb-2008-9-9-r137 18798982

47. Nagelkerke NJD (1991) A note on a general definition of the coefficient of determination. Biometrika 78: 691–692.

48. Lee SH, DeCandia TR, Ripke S, Yang J, Schizophrenia Psychiatric Genome-Wide Association Study C, et al. (2012) Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet 44: 247–250. doi: 10.1038/ng.1108 22344220

49. International Schizophrenia Consortium, Purcell SM, Wray NR, Stone JL, Visscher PM, et al. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460: 748–752. doi: 10.1038/nature08185 19571811

50. Cooper GM, Stone EA, Asimenos G, Program NCS, Green ED, et al. (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15: 901–913. 15965027

51. Rossin EJ, Lage K, Raychaudhuri S, Xavier RJ, Tatar D, et al. (2011) Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet 7: e1001273. doi: 10.1371/journal.pgen.1001273 21249183