Aberrant DNA methylation is an important cancer hallmark, yet the dynamics of DNA methylation changes in human carcinogenesis remain largely unexplored. Moreover, the role of DNA methylation for prediction of clinical outcome is still uncertain and confined to specific cancers. Here we perform the most comprehensive study of DNA methylation changes throughout human carcinogenesis, analysing 27,578 CpGs in each of 1,475 samples, ranging from normal cells in advance of non-invasive neoplastic transformation to non-invasive and invasive cancers and metastatic tissue. We demonstrate that hypermethylation at stem cell PolyComb Group Target genes (PCGTs) occurs in cytologically normal cells three years in advance of the first morphological neoplastic changes, while hypomethylation occurs preferentially at CpGs which are heavily Methylated in Embryonic Stem Cells (MESCs) and increases significantly with cancer invasion in both the epithelial and stromal tumour compartments. In contrast to PCGT hypermethylation, MESC hypomethylation progresses significantly from primary to metastatic cancer and defines a poor prognostic signature in four different gynaecological cancers. Finally, we associate expression of TET enzymes, which are involved in active DNA demethylation, to MESC hypomethylation in cancer. These findings have major implications for cancer and embryonic stem cell biology and establish the importance of systemic DNA hypomethylation for predicting prognosis in a wide range of different cancers.
Aberrant DNA methylation is one of the most important cancer hallmarks, yet its precise role in carcinogenesis and clinical prognosis remains ill-defined. Indeed, the dynamic changes in DNA methylation that occur during carcinogenesis, in particular those preceding morphological changes, have not yet been explored in detail. Moreover, no study has so far reported a DNA methylation signature capable of predicting prognosis across multiple human cancers, unlike gene expression and DNA copy number, for which such prognostic signatures have been described.
Both hyper- and hypomethylation are commonly observed in cancer. In contrast to hypomethylation, which seems to target large intergenic satellite repeat regions, hypermethylation appears to occur locally, preferentially targeting the promoters of genes. Several studies have reported that a statistically significant fraction of these promoters map to stem cell PolyComb Group Target genes (PCGTs), many of which encode transcription factors needed for differentiation, and which are normally suppressed in embryonic stem cells through a reversible mechanism mediated by the Polycomb Repressive Complex 2 (PRC2). This preferential hypermethylation at PCGTs in cancer supports the view that the reversible gene repression of PCGTs in stem cells may be replaced by permanent silencing in cancer, potentially impairing the differentiation capacity of cells. Although there are as yet no functional data causally linking PCGT methylation to carcinogenesis, there is accumulating evidence that factors which lead to cancer, for instance age or oxidative damage, are causally involved in DNA methylation at PCGTs.
Another feature of the epigenetic landscape characterising human embryonic stem cells (hESC) was described by Lister et al. Specifically, using single-base-resolution DNA methylation maps, they demonstrated that a substantial fraction of CpGs is heavily (>80%) Methylated in human Embryonic Stem Cells (MESC) (see Materials and Methods for the precise definition of MESC CpGs and Table S1 for the complete list of MESC CpGs on the 27 k array). However, it is unknown at present what role MESCs may play throughout carcinogenesis. Thus, which epigenetic stem cell features are retained or changed in human cancer and, even more importantly, at which stage during human carcinogenesis these epigenetic changes occur, is still unclear.
Motivated by these outstanding questions, we decided to (i) explore the dynamics of epigenetic changes at stem cell loci (PCGTs and MESCs) throughout all stages of human carcinogenesis and (ii) to investigate their potential role in predicting poor prognosis.
To address our first aim, we used as a model the uterine cervix, since screening programs in place allow easy access to this organ, and cervical carcinogenesis is also one of the few scenarios in humans where DNA methylation changes in the actual cell of origin and occurring throughout disease progression can be analyzed. Specifically, we measured DNA methylation at over 27,000 CpGs in cervical cells and at three different stages: (a) three years before onset of dysplastic changes, (b) at the stage of non-invasive dysplasia, and (c) at the stage of invasive cervical cancer. To address our second aim we analysed DNA methylation data from 5 independent cohorts encompassing a total of 1,026 tumour samples in 4 different gynaecological cancers. In total, we analysed DNA methylation data from 10 independent studies, encompassing normal and cancer tissue from 5 different tissue types, including metastases (Table 1).
Using these data we here report four major novel aspects of cancer epigenetics: (i) hypermethylation at PCGT stem cell loci occurs up to three years before the first signs of morphological transformation, (ii) hypomethylation at MESC stem cell loci is a hallmark of cancer invasion, affecting both epithelial and stromal compartments, and increases further in metastases, (iii) hypomethylation instability at MESCs defines a stem cell DNA methylation signature that predicts poor prognosis in multiple human cancers independently of standard prognostic factors, and (iv) expression of TET enzymes is strongly associated with MESC hypomethylation.
All methylation data in this study were generated with the Illumina Infinium Human Methylation27 BeadChip array (Materials and Methods), which assesses the DNA methylation status of 27,578 CpG sites located in the promoter regions of 14,495 genes, as described previously. Among these CpGs, 3,465 map to PCGTs, whilst 5,943 map to MESC CpGs (Materials and Methods, Table S1 and Table S2). We also distinguished between CpGs located within Partially Methylated Domains (PMDs) (a total of 4,750 CpGs on the array mapped to PMDs) and those that are not (termed non-PMDs). PMDs demonstrate reduced methylation levels in more differentiated embryonic tissue compared to embryonic stem cells, and consist of focally hypermethylated elements (corresponding overwhelmingly to CpG islands), concentrated within regions of long-range hypomethylation. PMDs were recently also described in cancer. For precise definitions see Text S1.
To investigate the dynamics of DNA methylation in human carcinogenesis we designed a study with samples from three different phases reflecting cervical carcinogenesis: (1) ‘Before Dysplasia (BDy)’: normal cervical epithelial cells collected within the ARTISTIC trial (n = 152), of which 75 developed a cervical intraepithelial neoplasia grade 2 or 3 (CIN2/3) after three years (cases), whereas the other 77 remained normal (controls). These samples were matched for age and HPV status. (2) ‘Dysplasia (Dy)’: age-matched non-invasive dysplastic epithelial cells (CIN2/3) (n = 18, all HPV+) and normal cervical epithelial cells (n = 30, 19 HPV− and 11 HPV+) collected within screening programs, and (3) ‘Invasive Cancer (CA)’: invasive cervical cancer tissue (n = 48) and normal cervical tissue (n = 15) collected within a clinical setting. Further details of the samples are described in Text S1 (see also Table 1).
As expected, PCGTs were highly enriched among CpGs hypermethylated in invasive cervical cancer (Figure 1A and 1C). In contrast, CpGs that become hypomethylated in invasive cervical cancer are to a large extent MESCs (Figure 1B and 1D). Most importantly, PCGTs were hypermethylated three years prior to any cytological changes (Figure 1C, OR = 2.44; 95%CI = 2.27–2.63; p<10^-100), especially for those PCGT CpGs located within PMDs (OR = 4.81; 95%CI = 4.19–5.52; p<10^-100). We verified that PCGT enrichment was also independent of HPV status (P<0.005 for HPV+ and HPV−). Notably, the frequency of hypermethylation remained fairly constant throughout the phases from non-invasive dysplasia to invasive cancer (Figure 2A and Figures S1, S2, S3, S4).
In contrast to PCGT methylation, MESC hypomethylation appears as a progressive process towards invasive cancer: whereas we observed a substantial enrichment of MESCs in the normal samples three years prior to the dysplastic changes (OR = 5.69 and 9.55 for PMD and nonPMD respectively), non-invasive dysplastic samples had an increased MESC enrichment in hypomethylated CpGs (OR = 7.62 and 12.30 for PMD and nonPMD, respectively) and eventually MESC CpGs contributed most significantly to hypomethylated CpGs in invasive cancer (OR = 18.84 and 26.85 for PMD and nonPMD respectively; Figure 1D, Figure 2A, and Figures S1, S2, S3, S4). In order to check that these enrichments are not just a consequence of the baseline methylation levels (i.e. the levels in normal tissue), we estimated the enrichment relative to other CpGs with specific baseline methylation levels (CpGs with mean β-values in normal cervical tissue samples of <0.2 and >0.4). This confirmed that the observed PCGT and MESC enrichment was independent of the initial methylation levels in normal tissue, and that this was particularly true for PCGT/MESC CpGs within PMDs (Figure S5). Thus, MESC CpGs that showed reduced methylation levels (<80%) in normal tissue compared to their levels in hESCs (>80%) were still more likely to exhibit further hypomethylation in dysplasia and cancer than a control set of CpGs with similar methylation levels in normal tissue (Figure S5).
To test if PCGT and MESC methylation changes are also present in cells which are not immediately involved in carcinogenesis we studied white blood cell DNA from women who carry BRCA1 mutations and who are therefore at an 80% lifetime risk of developing breast and/or ovarian cancer. Whereas MESC methylation was not altered, we observed that PCGTs were highly enriched among CpGs hypermethylated in blood cells from BRCA1 mutation carriers (Figures S6 and S7).
Next, we asked if the progressive hypomethylation of MESCs towards invasive cancer is a generic feature of tumour biology. We analysed DNA methylation profiles of breast, endometrial, colorectal and lung cancer (Text S1; Figure 2B and Figures S1, S2, S6, S7), and in all cancer types we observed a significant loss of methylation at MESC CpGs, concurrent with the expected hypermethylation of PCGT CpGs.
As demonstrated in Figure 2A and 2B, PCGT methylation enrichment exists prior to and at the stage of non-invasive dysplasia when analyzing only epithelial cells without stroma, and remains constant when studying invasive cancer tissue, which contains some stromal components. In contrast, MESC enrichment doubles in the hypomethylated fraction when comparing invasive cancer to non-invasive dysplastic cells. This pronounced enrichment could be due in part to MESC hypomethylation in the cancer-associated stromal component. To test this, we analyzed those PCGTs and MESCs that are enriched in the hyper- and hypomethylated fractions in lung cancer and asked if these CpGs are also enriched in lung cancer associated fibroblasts compared to normal lung fibroblasts. Interestingly, while there was no enrichment of PCGTs (Figure 2C), there was a clear enrichment of lung cancer MESCs among PMD CpGs that are hypomethylated in lung cancer fibroblasts (Figure 2D). This further supports the view that MESC hypomethylation is an important characteristic of cancer invasion, and that it may therefore be a molecular determinant of clinical outcome.
Molecular signatures, and in particular gene expression signatures, involving stem cell genes have been associated with poor prognosis in several cancers. Therefore, given the fundamental role of PCGT and MESC CpGs in the dynamics of DNA methylation in human cancer, as just described, it is natural to ask if DNA methylation changes at these stem cell loci can predict clinical outcome. In particular, we posited that epigenetic instability, as measured by DNA methylation changes from a normal reference, might indicate clinical outcome. To test this idea, we devised an Epigenetic Instability Index (EpI) to evaluate instability for each tumour sample as the fraction of significant DNA methylation changes relative to a corresponding normal reference profile (Materials and Methods). The instability index was divided into 4 types according to the baseline normal reference methylation (0 = unmethylated, 1 = hemimethylated, 2 = methylated) and the nature of DNA methylation changes (0→1/2, 1→2, 1→0, 2→0/1) observed in cancer (Materials and Methods, Figure 3A and 3B). In addition, we considered the EpI restricted to PCGT and MESC stem cell loci, and since very few PCGT CpGs were observed to be methylated (1 or 2) in normal tissue, this resulted in 3 stem cell EpI indices: PCGT (0→1/2), MESC (1→0), MESC (2→0/1). Remarkably, we observed that the demethylation instability index (DeMI) at MESCs (2→0/1) was associated with poor prognosis in endometrial, breast, ovarian, and cervical cancers (Figure 4). In multivariate analysis, the DeMI was a predictor of poor prognosis in all cancers independently of other prognostic factors (Table 2 and Table S3), demonstrating the clinical potential of this DNA methylation stem cell signature. In contrast, the methylation instability index defined at PCGTs only correlated with clinical outcome in ovarian cancer (Table S3).
Survival analysis at individual CpG level further demonstrated the consistent enrichment of MESC CpGs among prognostic CpGs hypomethylated in poor outcome samples in all 4 invasive cancers, whereas PCGT CpGs were not consistently enriched in either the hyper or hypomethylated prognostic component (Table S4). There was also substantial overlap between the MESC CpGs which have stable methylation levels in normal tissue and which become hypomethylated in cancer, and prognostic MESC CpGs that are hypomethylated in poor outcome tumour samples (Table S5).
To further demonstrate that MESC hypomethylation is an important determinant of poor outcome in human cancer, we tested if these epigenetic changes progress further in metastatic cancer. Thus, we compared DNA methylation profiles of primary endometrial cancers to extra-uterine metastases of endometrial cancer. Importantly, the DeMI index was higher in metastatic cancer compared to primary tumours, but not so for the hypermethylation instability index at PCGTs (Figure 5A). In fact, the DeMI index demonstrates clinical potential for discriminating primaries that may be destined to metastasize (Figure 5B). From these data we can therefore conclude that while PCGT hypermethylation is an important event in early oncogenesis, which persists at later stages, MESC hypomethylation is a progressive process and a key characteristic of more malignant cancers (Figure 3B).
The ability of the DeMI to predict clinical outcome in multiple cancers indicates that a core set of MESC CpGs may be involved. To investigate this we ranked the MESC CpGs according to the frequency of hypomethylation in each of the cancers considered. Many CpGs were observed to be hypomethylated in large fractions of tumours (Figure 6 and Table S6). While there were 6 MESC CpGs (FCGR3B, FLJ27255, FCN2, KRT82, CDH13, KRTAP8-1 on chromosomes 1, 6, 9, 12, 16 and 21 respectively) commonly hypomethylated at a frequency of at least 10% in all four cancers (P<10^-4), there were substantially larger overlaps between related cancers such as ovarian and endometrial cancer (overlap of 98 CpGs, OR = 134, 95%CI = 89–205, P = 3.2×10^-124). Gene Set Enrichment Analysis (GSEA) of the hypomethylated MESCs in each cancer also revealed a striking overlap of enriched terms, especially between endometrial and ovarian cancer, where we observed widespread hypomethylation at 20q11 and 9q34 (Table S7).
Until recently it had been assumed that DNA demethylation in cancer is a passive event, occurring as a result of absent re-methylation during DNA replication, with a consequent dilution of this covalent DNA modification. This view has now been substantially challenged by the identification of TET (ten-eleven translocation) dioxygenases, which can convert 5-methylcytosine into 5-hydroxymethylcytosine and 5-carboxylcytosine and thus constitute a pathway for active DNA demethylation. In particular, it has been demonstrated that TET3-mediated DNA hydroxylation is involved in epigenetic reprogramming of the zygotic paternal DNA following natural fertilization and that this may also contribute to somatic cell nuclear reprogramming during animal cloning. We therefore analysed mRNA expression of TET1, two isoforms of TET2, and TET3 (see Text S1 for primer information) to test whether hypomethylation is associated with TET expression. We observed a strong correlation between high TET expression, in particular of TET3, and hypomethylation, specifically at MESC CpGs (Figure 7 and Figure S8). We checked that the anti-correlation of TET expression with MESC CpG methylation was independent of the level of methylation in normal tissue (Figure S9). Although this observation is purely correlative, it is consistent with the view that TET3 overexpression (Figure S10) in cancer contributes to reprogramming of cancer cells via active DNA demethylation.
Epithelial cells of the uterine cervix offer a unique opportunity to study epigenetic alterations throughout carcinogenesis. Our first key result is the demonstration that normal cells of origin acquire methylation changes at least three years in advance of the first morphological changes. Specifically, our data demonstrate that PCGT hypermethylation and MESC hypomethylation are major contributors to early cervical carcinogenesis. This is independent of human papillomavirus (HPV) infection, since our study was matched for HPV status and PCGT enrichment was observed in both HPV+ and HPV− samples. Importantly, the observed enrichments were also independent of the levels of methylation in normal tissue. That is, MESCs which showed full methylation (i.e. β-value>0.8) or hemi-methylation (i.e. 0.3<β-value<0.7) were preferentially hypomethylated in dysplasia and cancer in comparison to control sets of CpGs with the same methylation levels in normal tissue.
The role of PCGT methylation as a very early event is further supported by our finding that PCGTs were highly enriched among CpGs which were hypermethylated in blood cells from BRCA1 mutation carriers, suggesting that BRCA1 is an important regulator of the DNA methylome and that aberrant BRCA1 function could lead to increased predisposition to cancer through increased methylation at PCGT loci. The fact that BRCA1 mutation carriers showed increased PCGT methylation in their blood cells but are at no substantially increased risk of developing blood-borne cancers suggests that PCGT hypermethylation confers a substantial risk but that additional factors are required (e.g. endocrine, paracrine or viral triggers).
Our second key result is that MESC hypomethylation occurs in both the epithelial and stromal components of cancer and that this is a progressive process, increasing significantly towards invasion and metastatic cancer. This in turn suggests that the level of MESC hypomethylation in primary tumours may be an important determinant of clinical outcome.
Indeed, our third key result is the report of a stem cell (MESC) DNA hypomethylation signature that can predict clinical outcome in multiple human cancers, independently of known prognostic factors. To the best of our knowledge this constitutes the first report of a common prognostic signature in cancer that is based on DNA methylation, and is therefore an epigenetic analogue to the prognostic genomic instability signature presented by Carter et al.
Besides the key distinction of PCGT and MESC CpGs, we also observed that the localisation of CpGs in relation to PMDs was another important facet of the pattern of DNA methylation changes. Specifically, PCGT hypermethylation was observed preferentially within PMDs, while the progressive MESC hypomethylation in cancer was equally strong in PMDs and non-PMDs. We point out that, while the PMDs considered here were defined for colon cancer cells, these broad regions of partial methylation overlap significantly between colon tissue and fibroblasts, suggesting that these regions may be largely similar also between different tissues.
The similarities between normal developmental and cancer epigenetic programming are intriguing. While embryonic stem cells suppress differentiation-inducing genes reversibly via promoter occupancy of PRC2, cancer cells suppress these same genes much more robustly via covalent DNA modification. Even more interestingly, trophoblast cells, whose core function is to invade the maternal tissue and form the placenta, are relatively more hypomethylated compared with the inner cell mass, which will differentiate into the embryo, supporting the view that hypomethylation may be associated with the capacity to invade neighbouring tissue such as the maternal endometrium. Similarly, the observed correlation between MESC hypomethylation and the malignant potential of cancers suggests that fully methylated MESCs may provide a protective mechanism against invasion. Thus, the fact that the great majority of MESCs exhibit similarly high methylation levels in stem cells and normal tissues means that high MESC methylation may be viewed as an intrinsic property of any normal cell, regardless of whether it is a stem cell or a mature differentiated one. In this model then, hypomethylation at MESCs would lead to a transformed cellular phenotype that is more prone to invasion. In this context, however, it is worth pointing out that the observed MESC hypomethylation could also reflect changes in the stromal cell content of the tumours. Indeed, the observation that cancer fibroblasts show similar hypomethylation changes at MESC loci suggests that the more frequent MESC hypomethylation in invasive cancers could be partly due to increased numbers of cancer fibroblasts.
It could also be argued that the other DNA methylation changes we have reported here are the result of changes in the stromal and immune cell compartments of the tumours. However, we verified using Principal Components Analysis (PCA) and GSEA on normal liquid based cytology (LBC) samples and separately on age-matched cervical dysplasias (Table 1, “Dy”-study) that the components of variation associated with stromal and immune cell markers were very similar between normal and dysplasia, in stark contrast to PCGTs, which showed a dramatic difference, with comparatively no variation in normal tissue but representing the dominant component of variation in dysplasia (manuscript in preparation). Thus, the DNA methylation changes at PCGT loci reported here are unlikely to be due to changes in the stromal cell composition of tumours.
Finally, the crucial role of TET3 in DNA demethylation and early development, its overexpression in cancer, and the observed correlation with MESC hypomethylation, supports the view that aberrant developmental programs leading to reprogramming of the epigenome in adult cells may be critical for carcinogenesis. Interfering with these aberrant programs may therefore lead to novel ways to treat cancer.
In summary, our findings suggest that epigenetic deregulation of two distinct sets of genes, both important for stem cell integrity, impact carcinogenesis in different ways: one process involves gain of methylation and is a hallmark of de-differentiation and early oncogenesis, while the other involves loss of methylation and is a key determinant of invasion and clinical outcome.
Materials and Methods
Definition of MESC
A recent study used bisulfite sequencing to map, at single-base resolution, DNA methylation throughout the majority of the human genome in both embryonic stem cells and fibroblasts. For each CpG site, the numbers of C and T reads covering each methylcytosine on both forward and reverse strands were provided. The multiple reads covering each methylcytosine can be used as a readout of the fraction of sequences within the sample that are methylated at that particular site (i.e. C reads/(C+T reads)), which is hence referred to as the methylation level of the site. In this study, Methylated in human Embryonic Stem Cells (MESC) CpGs are the CpG sites that were covered by at least 5 reads on both forward and reverse strands (i.e. the total number of C and T reads on both strands ≥ 5) and whose overall mean methylation level (i.e. the average methylation level of the forward and reverse strands) is greater than 80%. MESC CpGs were then mapped to those present on the Illumina 27 k array (Table S1). Functional annotation (gene assignment) of the MESC CpGs present on the array was obtained from Illumina and Bioconductor annotation packages.
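The MESC definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `StrandReads` and `is_mesc` are hypothetical, the coverage rule is read as "pooled C+T reads over both strands of at least 5", and a CpG with no reads on one strand is excluded because its strand-level methylation cannot be computed.

```python
from dataclasses import dataclass


@dataclass
class StrandReads:
    """Bisulfite read counts at one CpG on one strand (hypothetical container)."""
    c_reads: int  # reads reporting C (methylated)
    t_reads: int  # reads reporting T (unmethylated)

    @property
    def total(self) -> int:
        return self.c_reads + self.t_reads

    @property
    def methylation_level(self) -> float:
        # Methylation level as defined in the text: C reads / (C + T reads)
        return self.c_reads / self.total


def is_mesc(forward: StrandReads, reverse: StrandReads,
            min_reads: int = 5, min_level: float = 0.8) -> bool:
    """Return True if the CpG meets the (assumed reading of the) MESC criteria."""
    # Assumption: a methylation level must be computable on both strands.
    if forward.total == 0 or reverse.total == 0:
        return False
    # Assumption: coverage is pooled over both strands.
    if forward.total + reverse.total < min_reads:
        return False
    # Mean of the forward- and reverse-strand methylation levels.
    mean_level = (forward.methylation_level + reverse.methylation_level) / 2
    return mean_level > min_level
```

For instance, a CpG with 9 C and 1 T reads on the forward strand and 8 C and 2 T reads on the reverse strand has a mean level of 0.85 and would be classified as MESC under these assumptions.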
Definition of PCGTs
PolyComb Group Target genes (PCGTs) were defined as CpGs which are occupied by SUZ12 and/or EED and/or are trimethylated at Lysine 27 on histone H3 in human embryonic stem cells (Table S2, annotation file kindly provided by Benjamin P. Berman and Peter W. Laird).
DNA Methylation Assay
DNA from LBC samples and tissues was isolated using the Qiagen DNeasy Blood and Tissue Kit (Qiagen Ltd, UK, 69506) and quantified via spectrophotometry (Nanodrop, Thermo Scientific Ltd UK) with 600 ng DNA from each sample. DNA from whole blood was extracted using a chloroform based extraction method from 400 µL of blood. All DNA samples were bisulphite modified using the EZ DNA Methylation Kit D5004/8 (Zymo Research, Orange, CA, USA) according to the manufacturer's instructions.
DNA methylation profiling
The genome-wide methylation analyses were performed using the validated Illumina Infinium Human Methylation27 BeadChip (Illumina Inc USA, WG-311-1201). During the assay, bisulphite (BS) converted DNA is amplified, fragmented and hybridised to the BeadChip arrays (each chip accommodates 12 samples as designated by Sentrix positions A–L). A single base extension is then performed using DNP- and biotin-labelled dNTPs. The arrays were imaged using a BeadArray Reader. Image processing and intensity data extraction were performed according to Illumina's instructions. Each interrogated locus is represented by specific oligomers linked to two bead types: one representing the sequence for methylated DNA (M) and the other for unmethylated DNA (U). For each specific CpG site, the methylation status is calculated from the intensity of the M and U alleles, as the ratio of the fluorescent signals β = Max(M,0)/[Max(M,0)+Max(U,0)+100]. Hence, DNA methylation β-values are continuous variables between 0 (absent methylation) and 1 (completely methylated) representing the ratio of the methylated allele to the combined locus intensity.
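As a small worked illustration of the β-value formula above, the following Python sketch computes β from the M and U intensities. The function name `beta_value` and the `offset` parameter name are hypothetical; the constant 100 is the one given in the formula.

```python
def beta_value(m: float, u: float, offset: float = 100.0) -> float:
    """beta = max(M, 0) / (max(M, 0) + max(U, 0) + offset)."""
    # Negative background-corrected intensities are truncated at zero.
    m_pos = max(m, 0.0)
    u_pos = max(u, 0.0)
    return m_pos / (m_pos + u_pos + offset)
```

Because of the constant offset in the denominator, β approaches but never reaches exactly 1, and loci with very low total intensity are pulled towards 0: a probe with M = 900 and U = 0 gives β = 0.9.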
Total RNA was isolated as previously described. Reverse transcription of RNA was performed using M-MLV Reverse Transcriptase (Promega) according to the manufacturer's instructions. Primers and probes for the TET genes were designed using Primer Express (Applied Biosystems, Foster City, CA, USA). Samples in which TET was not amplified by real-time PCR after 45 cycles were classified as TET negative.
Quality control and inter-array normalisation
Quality control procedures and intra-array normalisation were run on all data except the ‘Colon CA’, ‘Lung CA’, and ‘Ovarian CA’ sets, for which the intra-array normalised data was downloaded directly from Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases. Background corrected U and M values, β-values (as generated from the Beadstudio software) and built-in controls were used to evaluate the quality of individual arrays. Samples with low BS conversion efficiency (BS control intensity values <4000) were excluded, as well as other outliers that we detected using boxplots of total intensity I = U+M values and histograms of β-values. Samples were filtered further according to CpG coverage, using the Beadstudio P-values of detection of signal above background.
Enrichment analysis was performed using a two-tailed Fisher's exact test. Odds ratios (OR) and 95% confidence intervals (CI) of enrichment were also computed and their corresponding significance levels estimated. Enrichment analysis was performed with a range of thresholds to check for robustness and using the Infinium 27 k array as reference to avoid array-specific bias.
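The enrichment calculation can be illustrated with a minimal, standard-library Python sketch for a 2×2 table [[a, b], [c, d]], for example a = PCGT CpGs that are hypermethylated, b = PCGT CpGs that are not, c = non-PCGT hypermethylated CpGs, and d = the remaining CpGs on the array. The function names are hypothetical, and the Woolf (log-normal) confidence interval is an assumption, since the text does not state how the intervals were computed.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def log_p(x: int) -> float:
        # Hypergeometric log-probability of seeing x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_obs = log_p(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(math.exp(log_p(x)) for x in range(lo, hi + 1)
                if log_p(x) <= log_obs + 1e-7)
    return min(1.0, total)


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Sample odds ratio with an assumed Woolf 95% CI; all cells must be non-zero."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            odds_ratio * math.exp(-z * se_log),
            odds_ratio * math.exp(z * se_log))
```

Working in log space keeps the hypergeometric terms finite for array-sized counts, although p-values far below the smallest representable float would still underflow to zero.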
A linear regression approach was used to model the association between disease status (cases or controls) and the CpG β-value methylation profile. Adjustment for age and experimental factors (e.g. bisulphite conversion) was performed by inclusion of these factors in the model as covariates. Chip effects were observed, and in this study all data were adjusted by either applying the “ComBat” method (a method that is robust to outliers and that allows for adjustment in cases where sample sizes per chip are small) or using the chip as a covariate in the linear model. The linear model was adopted over a non-linear logistic or probit model as the linear model performed better in capturing profiles with larger effect sizes.
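A per-CpG version of this covariate-adjusted linear model might look like the following sketch. It assumes NumPy is available and uses ordinary least squares with β as the response, which is one plausible reading of the description; the function name `cpg_association` and its arguments are hypothetical, and the software actually used is not stated in the text.

```python
import numpy as np


def cpg_association(beta, status, covariates):
    """Fit beta ~ intercept + status + covariates by OLS for one CpG.

    Returns the disease-status coefficient and its t-statistic.
    `covariates` may be a 1-D array (e.g. age) or a 2-D array with one
    column per covariate (e.g. age, BS conversion, chip dummy variables).
    """
    beta = np.asarray(beta, dtype=float)
    design = np.column_stack([np.ones(len(beta)), status, covariates])
    coef, _, _, _ = np.linalg.lstsq(design, beta, rcond=None)
    resid = beta - design @ coef
    dof = len(beta) - design.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof               # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stat = coef[1] / np.sqrt(cov[1, 1])      # column 1 is disease status
    return coef[1], t_stat
```

Ranking CpGs by the sign and size of the status t-statistic would then give the hyper- and hypomethylated lists; chip could be handled either by adding dummy columns to `covariates` or by adjusting the data beforehand.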
Given the two disease-status-associated CpG lists (hyper- or hypomethylated) obtained from the supervised analysis, the two-tailed binomial test was used to detect the skewness of the methylation in various categories (i.e. colon-PMD PCGTs, colon-PMD MESCs, nonPMD PCGTs, and nonPMD MESCs) of the CpGs (Figure 2, Figures S1 and S2).
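A two-tailed exact binomial test of this kind can be sketched with the standard library alone. The function name `binomial_two_sided` is hypothetical, and the null proportion `p` is an assumption to be chosen by the analyst (for example 0.5, or the array-wide fraction of hypermethylated CpGs), since the text does not state it.

```python
import math


def binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value for k successes in n trials (0 < p < 1)."""
    def log_pmf(i: int) -> float:
        # Log-space binomial probability, stable for thousands of CpGs.
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    log_obs = log_pmf(k)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    total = sum(math.exp(log_pmf(i)) for i in range(n + 1)
                if log_pmf(i) <= log_obs + 1e-7)
    return min(1.0, total)
```

Here `k` would be, say, the number of CpGs in one category (e.g. nonPMD MESCs) that fall in the hypomethylated list, and `n` the number of CpGs of that category across both lists.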
Epigenetic instability analysis
We devised an Epigenetic Instability Index (EpI) for each tumour sample as follows. First, CpG readings were defined as unmethylated (0) (β-value<0.25), hemimethylated (1) (0.25≤β-value≤0.7), and methylated (2) (β-value>0.7). Next, we selected CpGs with stable methylation profiles in normal tissue, defined as those CpGs with the same methylation state in all normal samples corresponding to the given tissue. These stable CpGs can undergo four types of DNA methylation changes in cancer: 0->1/2, 1->2, 1->0 and 2->0/1. Therefore, for each tumour sample, we computed four different “instability” indices, reflecting the fraction of stable CpGs undergoing the specific types of DNA methylation changes shown. When computing these indices, and to ensure their robustness to the choice of methylation thresholds above, we also required at least a 10% change in β-values for calling DNA methylation differences between normal and cancer tissue. This buffering therefore avoids calling potentially small differences in β-values (<10%), which nevertheless may trespass the methylation thresholds (0.25, 0.7) used. The EpI indices were also computed by restricting the set of stable CpGs to those mapping to PCGT and MESC stem cell loci. Since the great majority of PCGT CpGs were observed to be stably unmethylated (0) in normal tissue, this resulted in 3 “stem cell EpI” indices: PCGT (0->1/2), MESC (1->0), MESC (2->0/1). We call the latter index the Demethylation instability index (DeMI).
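The EpI computation described above can be sketched as follows, using only the thresholds stated in the text (0.25, 0.7, and a 10% minimum change). Several details are assumptions made for illustration: the normal reference for a stable CpG is taken as its mean β over the normal samples, the "10% change" is read as an absolute β difference of 0.10, and each index is expressed as a fraction of all stable CpGs. The function names are hypothetical.

```python
from typing import Dict, Sequence

LOW, HIGH, MIN_DELTA = 0.25, 0.7, 0.10  # thresholds given in the text


def state(beta: float) -> int:
    """0 = unmethylated, 1 = hemimethylated, 2 = methylated."""
    if beta < LOW:
        return 0
    return 1 if beta <= HIGH else 2


def stable_cpgs(normals: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Map each CpG index with one state across all normals to its mean beta."""
    stable = {}
    for j in range(len(normals[0])):
        column = [sample[j] for sample in normals]
        if len({state(b) for b in column}) == 1:
            stable[j] = sum(column) / len(column)  # assumed reference value
    return stable


def instability_indices(tumour: Sequence[float],
                        stable: Dict[int, float]) -> Dict[str, float]:
    """Fractions of stable CpGs showing each type of change in one tumour."""
    if not stable:
        raise ValueError("no stable CpGs supplied")
    counts = {"0->1/2": 0, "1->2": 0, "1->0": 0, "2->0/1": 0}
    for j, ref in stable.items():
        s_norm, s_tum = state(ref), state(tumour[j])
        # Require a state change AND at least a 0.10 shift in beta (buffering).
        if s_norm == s_tum or abs(tumour[j] - ref) < MIN_DELTA:
            continue
        if s_norm == 0:
            counts["0->1/2"] += 1
        elif s_norm == 2:
            counts["2->0/1"] += 1
        else:
            counts["1->2" if s_tum == 2 else "1->0"] += 1
    return {key: n / len(stable) for key, n in counts.items()}
```

Restricting `stable` to the indices of MESC CpGs and reading off the `"2->0/1"` entry would give a DeMI-like value for the tumour sample under these assumptions.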
Univariate and multivariate Cox regression models were used for the survival analysis. In the multivariate analysis, besides DNA methylation β-values (or the EpI index), those clinical and histological factors that were associated with survival in univariate analysis were also included as covariates.
1. Jones PA, Baylin SB (2007) The epigenomics of cancer. Cell 128: 683-692.
2. Baylin SB, Jones PA (2011) A decade of exploring the cancer epigenome - biological and translational implications. Nat Rev Cancer 11: 726-734.
3. Carter SL, Eklund AC, Kohane IS, Harris LN, Szallasi Z (2006) A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat Genet 38: 1043-1048.
4. Ting DT, Lipson D, Paul S, Brannigan BW, Akhavanfard S (2011) Aberrant overexpression of satellite repeats in pancreatic and other epithelial cancers. Science 331: 593-596.
6. Ohm JE, McGarvey KM, Yu X, Cheng L, Schuebel KE (2007) A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet 39: 237-242.
7. Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS (2006) Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125: 301-313.
8. O'Hagan HM, Wang W, Sen S, Destefano Shields C, Lee SS (2011) Oxidative Damage Targets Complexes Containing DNA Methyltransferases, SIRT1, and Polycomb Members to Promoter CpG Islands. Cancer Cell 20: 606-619.
9. Maegawa S, Hinkal G, Kim HS, Shen L, Zhang L (2010) Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res 20: 332-340.
10. Rakyan VK, Down TA, Maslau S, Andrew T, Yang TP (2010) Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Res 20: 434-439.
11. Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Weisenberger DJ (2010) Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res 20: 440-446.
12. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462: 315-322.
13. Pastor WA, Pape UJ, Huang Y, Henderson HR, Lister R (2011) Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells. Nature 473: 394-397.
14. Williams K, Christensen J, Pedersen MT, Johansen JV, Cloos PA (2011) TET1 and hydroxymethylcytosine in transcription and DNA methylation fidelity. Nature 473: 343-348.
15. Ficz G, Branco MR, Seisenberger S, Santos F, Krueger F (2011) Dynamic regulation of 5-hydroxymethylcytosine in mouse ES cells and during differentiation. Nature 473: 398-402.
16. Gu TP, Guo F, Yang H, Wu HP, Xu GF (2011) The role of Tet3 DNA dioxygenase in epigenetic reprogramming by oocytes. Nature.
17. Wu H, D'Alessio AC, Ito S, Xia K, Wang Z (2011) Dual functions of Tet1 in transcriptional regulation in mouse embryonic stem cells. Nature 473: 389-393.
18. Bibikova M, Fan JB (2010) Genome-wide DNA methylation profiling. Wiley Interdiscip Rev Syst Biol Med 2: 210-223.
19. Berman BP, Weisenberger DJ, Aman JF, Hinoue T, Ramjan Z (2011) Regions of focal DNA hypermethylation and long range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nature Genetics.
20. Kitchener HC, Almonte M, Gilham C, Dowie R, Stoykova B (2009) ARTISTIC: a randomised trial of human papillomavirus (HPV) testing in primary cervical screening. Health Technol Assess 13: 1-150, iii-iv.
21. Kitchener HC, Almonte M, Thomson C, Wheeler P, Sargent A (2009) HPV testing in combination with liquid-based cytology in primary cervical screening (ARTISTIC): a randomised controlled trial. Lancet Oncol 10: 672-682.
22. Apostolidou S, Hadwin R, Burnell M, Jones A, Baff D (2009) DNA methylation analysis in liquid-based cytology for cervical cancer screening. Int J Cancer 125: 2995-3002.
23. Navab R, Strumpf D, Bandarchi B, Zhu CQ, Pintilie M (2011) Prognostic gene-expression signature of carcinoma-associated fibroblasts in non-small cell lung cancer. Proc Natl Acad Sci U S A 108: 7160-7165.
24. Ben-Porath I, Thomson MW, Carey VJ, Ge R, Bell GW (2008) An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet 40: 499-507.
25. Eppert K, Takenaka K, Lechman ER, Waldron L, Nilsson B (2011) Stem cell gene expression programs influence clinical outcome in human leukemia. Nat Med 17: 1086-1093.
26. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102: 15545-15550.
27. He YF, Li BZ, Li Z, Liu P, Wang Y (2011) Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 333: 1303-1307.