Genome-Wide Association Study Identifies Novel Restless Legs Syndrome Susceptibility Loci on 2p14 and 16q12.1

Restless legs syndrome (RLS) is a sensorimotor disorder with an age-dependent prevalence of up to 10% in the general population above 65 years of age. Affected individuals suffer from uncomfortable sensations and an urge to move in the lower limbs that occur mainly at rest during the evening or at night. Moving the legs or walking improves the symptoms. Concomitantly, patients report sleep disturbances with consequences such as reduced daytime functioning. We conducted a genome-wide association study (GWA) for RLS in 922 cases and 1,526 controls (using 301,406 SNPs), followed by replication of 76 candidate SNPs in 3,935 cases and 5,754 controls, all of European ancestry. We identified six RLS susceptibility loci of genome-wide significance, two of them novel: an intergenic region on chromosome 2p14 (rs6747972, P = 9.03 × 10⁻¹¹, OR = 1.23) and a locus on 16q12.1 (rs3104767, P = 9.4 × 10⁻¹⁹, OR = 1.35) in a linkage disequilibrium block of 140 kb containing the 5′-end of TOX3 and the adjacent non-coding RNA BC034767.

Published in the journal: PLoS Genet 7(7): e1002171. doi:10.1371/journal.pgen.1002171
Category: Research Article




Restless legs syndrome (RLS) is a common neurological disorder with a prevalence of up to 10%, which increases with age [1]. Affected individuals suffer from an urge to move caused by uncomfortable sensations in the lower limbs in the evening or at night. The symptoms occur during rest and relaxation, and walking or moving the affected limb brings prompt relief. Consequently, sleep initiation and maintenance are impaired [1]. RLS has been associated with iron deficiency and responds pharmacologically to dopaminergic substitution. Known co-morbidities include increased cardiovascular events, depression, and anxiety [1].

Genome-wide association studies (GWAs) have identified genetic risk factors within MEIS1, BTBD9, PTPRD, and a locus encompassing MAP2K5 and SKOR1 [2]–[4]. To identify additional RLS susceptibility loci, we undertook an enlarged GWA in a German case-control population, followed by replication in independent case-control samples from Europe, the United States of America, and Canada. In the joint analysis, we identified six RLS susceptibility loci of genome-wide significance, two of them novel: an intergenic region on chromosome 2p14 and a locus on 16q12.1 in close proximity to TOX3 and the adjacent non-coding RNA BC034767.


We enlarged our previously reported GWA sample [2], [4] to 954 German RLS cases and 1,814 German population-based controls from the KORA-S3/F3 survey and genotyped them on Affymetrix 5.0 (cases) and 6.0 (controls) arrays. To correct for population stratification, we first performed a multidimensional scaling (MDS) analysis, which led to the exclusion of 18 controls as outliers. Second, we conducted a variance components analysis to detect any residual substructure in the remaining samples, which yielded an inflation factor λ of 1.025 (Figures S1 and S2). The first four axes of variation from the MDS analysis were included as covariates in the association analysis of the genome-wide stage, and all P-values were corrected for the observed λ.

Prior to statistical analysis, the genotyping data were subjected to extensive quality control. We excluded a total of 302 DNA samples with a genotyping call rate <98%. For SNP-level quality control, we adopted a stringent protocol to account for the complexity of an analysis combining 5.0 and 6.0 arrays. We excluded SNPs with a minor allele frequency (MAF) <5%, a call rate <98%, or a significant deviation from Hardy-Weinberg equilibrium (HWE) in controls (P < 0.00001). In addition, we dropped SNPs likely to yield false-positive associations due to differential clustering between the 5.0 and 6.0 arrays. To do this, we added a second set of cases with an unrelated phenotype and discarded SNPs showing association in this setup (see Materials and Methods). In the end, we tested 301,406 SNPs for association in 922 cases and 1,526 controls. Using a threshold of nominal λ-corrected P_GWA < 10⁻⁴, we selected a total of 47 SNPs distributed over 26 loci for follow-up in the replication study (Figure 1, Table S1).
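The three SNP-level filters above can be illustrated with a minimal Python sketch. It assumes that per-SNP genotype counts are already tabulated; `SnpCounts` and the function names are hypothetical. For brevity, all three criteria are computed from one set of counts, whereas in the study the HWE test was restricted to controls. The HWE test here uses the 1-df chi-square approximation; the text does not say which HWE test was used, and an exact test is a common alternative.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one SNP (hypothetical container)."""
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples without a genotype call


def call_rate(s: SnpCounts) -> float:
    called = s.n_aa + s.n_ab + s.n_bb
    return called / (called + s.n_missing)


def minor_allele_freq(s: SnpCounts) -> float:
    n = s.n_aa + s.n_ab + s.n_bb
    freq_a = (2 * s.n_aa + s.n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chisq_p(s: SnpCounts) -> float:
    """P-value of a 1-df chi-square test for deviation from HWE."""
    n = s.n_aa + s.n_ab + s.n_bb
    p = (2 * s.n_aa + s.n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (s.n_aa, s.n_ab, s.n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(s: SnpCounts, min_maf: float = 0.05,
              min_call_rate: float = 0.98, min_hwe_p: float = 1e-5) -> bool:
    """Keep a SNP only if it passes the MAF, call-rate and HWE thresholds."""
    return (call_rate(s) >= min_call_rate
            and minor_allele_freq(s) >= min_maf
            and hwe_chisq_p(s) >= min_hwe_p)
```

A SNP is retained only if it passes all three filters. This mirrors the text's wording, where failing any single criterion leads to exclusion.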

Fig. 1. Manhattan plot of the GWA.
Association results of the GWA stage. The x-axis represents the genomic position along the 22 autosomes and the X chromosome; the y-axis shows −log10(P) for each SNP assayed. SNPs with a nominal λ-corrected P < 10⁻⁴ are highlighted as circles.

We genotyped these 47 SNPs, together with 29 adjacent SNPs in strong linkage disequilibrium (LD, r² = 0.5–0.9), on the Sequenom iPLEX platform in seven case-control populations of European descent, comprising a total of 3,935 cases and 5,754 controls. We excluded eleven SNPs with a call rate <95%, MAF <5%, or P < 0.00001 for deviation from HWE in controls, as well as 432 samples with a genotyping call rate <90%. A set of 47 SNPs genotyped in 186 samples on both platforms (Affymetrix and Sequenom) yielded an average concordance rate of 99.24%.
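Platform concordance can be computed as the fraction of genotypes called on both platforms that agree. The sketch below pools all sample–SNP pairs. Whether the reported "average" concordance was pooled in this way or averaged per SNP or per sample is not stated, so this choice, the data layout and the function name are assumptions.

```python
def concordance_rate(calls_a, calls_b):
    """Fraction of genotypes that agree between two genotyping platforms.

    calls_a, calls_b: dicts mapping (sample_id, snp_id) -> genotype string
    such as "AG"; a missing call is either absent or stored as None.
    """
    shared = [k for k in calls_a
              if calls_a[k] is not None and calls_b.get(k) is not None]
    if not shared:
        raise ValueError("no genotypes called on both platforms")
    # Genotypes are unphased, so "AG" and "GA" count as identical
    agree = sum(sorted(calls_a[k]) == sorted(calls_b[k]) for k in shared)
    return agree / len(shared)
```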

The combined analysis of all replication samples confirmed the four known susceptibility loci and, in addition, identified two novel association signals on chromosomes 2p14 and 16q12.1 (Table 1). To address possible population stratification within the combined replication sample, we performed a fixed-effects meta-analysis. For four of the replication case-control populations, λ inflation factors were available from a Genomic Control experiment in a previous study of these populations [4]; we used them to correct the standard error estimates. Joint analysis of the GWA and all replication samples showed genome-wide significance (nominal λ-corrected P_JOINT < 5 × 10⁻⁸) for the two novel loci as well as for the known RLS loci in MEIS1, BTBD9, PTPRD, and MAP2K5/SKOR1 (Table 1). Depending on the power available in each subsample, separate analyses of the individual replication samples either confirmed the association after correction for multiple testing or yielded nominally significant results (Tables S2 and S3). The differing relevance of the risk loci in the individual samples is illustrated in forest plots (Figure 2). There was no evidence of epistasis between any of the six risk loci (P_Bonferroni > 0.45).

Fig. 2. Forest plots of the RLS risk loci (1 SNP per locus).
OR and corresponding confidence interval are depicted for the GWA sample, all individual replication samples, the combined replication sample, and the combined GWA and replication sample. ORs are indicated by squares whose size corresponds to the sample size of the individual populations. (A) rs2300478 in MEIS1; (B) rs9357271 in BTBD9; (C) rs1975197 in PTPRD; (D) rs12593813 in MAP2K5/SKOR1; (E) rs6747972 in the intergenic region on chromosome 2p14; (F) rs3104767 in TOX3/BC034767.

Tab. 1. Association results of GWA and joint analysis of GWA and replication.
RLS-associated SNPs with genome-wide significance. P_GWA, λ-corrected nominal P-value of the GWA stage. P_REPLICATION, nominal P-value obtained from meta-analysis of the replication-stage samples. P_JOINT, nominal P-value of the joint meta-analysis of the GWA and replication stages, λ-corrected in samples where λ-values were available. Nominal P-values in the GWA were calculated using logistic regression with sex, age, and the first four components from the MDS analysis of the IBS matrix as covariates. For nominal P_REPLICATION and P_JOINT values, a fixed-effects inverse-variance meta-analysis was performed. Risk allele frequencies and odds ratios were calculated in the joint sample. LD blocks were defined by D′ using Haploview 4.2 based on HapMap CEU population data from HapMap release #27. CI, 95% confidence interval. Genome positions refer to the Human March 2006 (hg18) assembly.

The association signal on 2p14 (rs6747972: nominal λ-corrected P_JOINT = 9.03 × 10⁻¹¹, odds ratio (OR) = 1.23) is located in an LD block of 120 kb within an intergenic region 1.3 Mb downstream of MEIS1 (Figure 3). Assuming a long-range regulatory function of the SNP-containing region, an in silico search for clusters of highly conserved non-coding elements using the ANCORA browser identified MEIS1 as well as ETAA1 as potential target genes [5], [6].

Fig. 3. New genome-wide significant RLS loci.
a) Risk locus on chromosome 2p14, showing the best-associated SNP rs6747972 and ±200 kb of surrounding sequence. b) Risk locus on chromosome 16q12.1, showing the best-associated SNP rs3104767 and ±200 kb of surrounding sequence. The left-hand y-axis shows the negative log10 of the nominal λ-corrected P-values of the GWA stage for all SNPs genotyped in the respective region. The right-hand y-axis shows the recombination rate in cM/Mb. The x-axis shows the genomic position in Mb based on the hg18 assembly. The r²-based LD between SNPs is colour-coded, ranging from red (r² > 0.8) to dark blue (r² < 0.2), with the best-associated SNP as reference. This SNP is depicted as a violet diamond. Recombination rates and r² values are calculated from the HapMap II (release 22) CEU population. Plots were generated with LocusZoom 1.1.

The second locus, on chromosome 16q12.1 (rs3104767: nominal λ-corrected P_JOINT = 9.4 × 10⁻¹⁹, OR = 1.35), is located within an LD block of 140 kb (Figure 3) that contains the 5′UTR of TOX3 (synonyms TNRC9 and CAGF9) and the non-coding RNA BC034767 (synonym LOC643714). TOX3 is a member of the high mobility group (HMG) box family of non-histone chromatin proteins. It interacts with CREB and CBP and plays a critical role in mediating calcium-dependent transcription in neurons [7]. GWAs have identified susceptibility variants for breast cancer in the same region [8]. The best-associated breast cancer SNP, rs3803662, is in low LD with rs3104767 (r² ≈ 0.1, HapMap CEU data) but nevertheless showed association with RLS (λ-corrected nominal P_GWA = 7.29 × 10⁻⁷). However, logistic regression analysis conditioned on rs3104767 demonstrated that this association depends on rs3104767 (rs3803662: P_GWA,conditioned = 0.2883).

BC034767 is represented in GenBank by two identical mRNA transcripts, BC034767 and BC029912. According to the gene models of the UCSC and Ensembl genome browsers, these mRNAs are predicted to be non-coding. Additional in silico analysis using the Coding Potential Calculator supported this by attributing only a weak coding potential to this RNA, suggesting a regulatory function instead [9]. We also searched for rare alleles with strong effects by sequencing all coding and non-coding exons of TOX3 and BC034767 in 188 German RLS cases (Table S4). In TOX3, we found a total of nine variants not listed in dbSNP (Build 130), three of which are non-synonymous. Only one of these is also annotated in the 1000 Genomes Project (November 2010 data release). Three additional new variants were located in putative exons 1 and 2 of BC034767. We then compared the frequencies of these variants, as well as of all known non-synonymous, frameshift, and splice-site coding SNPs in TOX3, in a subset of one of the replication samples (726 cases and 735 controls from the GER1 sample). This analysis did not reveal any association with RLS. However, this sample provided >80% power only for variants with an OR above 4.5 at a MAF ≥0.01; for lower MAFs, ORs ≥10 would have been necessary. Furthermore, fragment analysis in 100 population-based controls showed that the previously described CAG repeat within exon 7 of TOX3 was not polymorphic.

According to publicly available expression data, BC034767 is expressed only in the testes in humans, whereas TOX3 expression has been shown in the salivary glands, the trachea, and the CNS. Detailed real-time PCR profiling of TOX3 showed high expression levels in the frontal and occipital cortex, the cerebellum, and the retina [10]. To assess a putative eQTL function of rs6747972 or rs3104767, we studied SNP genotype-dependent expression in RNA expression microarray data from peripheral blood of 323 general population controls [11]. We examined TOX3 and BC034767, the genes known to interact directly with TOX3 (CREB-1/CREBBP/CITED1), and the potential target genes of long-range regulatory elements at the chromosome 2 locus (MEIS1/ETAA1). We found no genotype-dependent differences in expression.

To assess the potential for genetic risk prediction, we split our GWA sample into a training and a test set. We derived classifiers for case-control status in the training set and used them to predict case-control status in the test set. The two sets were independent of each other with respect to both the included individuals and the genotyping procedure, as their genotypes had been generated on different platforms. As the training set, we used those cases of the current GWA that had been genotyped on 500K arrays in a previous GWA, together with the corresponding control set [2]: in total, 326 cases and 1,498 controls. The test set comprised 583 cases and 1,526 controls genotyped on 5.0/6.0 arrays as part of the current study. Prior to the analysis, we removed the six known risk loci and performed LD-pruning to limit the analysis to SNPs not in LD with each other, leaving a pruned dataset of 76,532 SNPs. We conducted logistic regression with age and sex as covariates. Based on these association results, we chose a predictor variable for the test set: the sum score of the SNPs with the most significant effects (i.e. the number of risk alleles over all these SNPs), weighted by the ln(OR) of each effect. We then varied the P-value threshold for inclusion of SNPs in the sum score. At a threshold of P < 0.6, we observed a maximum area under the curve (AUC) of 63.9% and an explained genetic variance of 6.6% (Nagelkerke's R²). These values are comparable to estimates obtained for other complex diseases such as breast cancer or diabetes (Table S5) [12]–[14]. Including the six known risk loci in this analysis resulted in a maximum AUC of 64.2% and an explained genetic variance of 6.8%.
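A minimal sketch of the thresholded, ln(OR)-weighted risk allele score follows. It assumes that training-stage results are stored as a per-SNP odds ratio and P-value for a tested allele, and that test-set genotypes are coded as counts of that allele. The function and argument names are hypothetical. For SNPs whose tested allele is protective (OR < 1), the other allele is counted as the risk allele, matching the description of counting risk alleles.

```python
import math


def weighted_risk_score(dosages, training_results, p_threshold):
    """Risk-allele count weighted by ln(OR), summed over SNPs with P < threshold.

    dosages: dict snp_id -> copies (0, 1, 2) of the tested allele in one person
    training_results: dict snp_id -> (odds_ratio, p_value) for the tested allele
    SNPs missing from `dosages` are skipped.
    """
    score = 0.0
    for snp, (odds_ratio, p_value) in training_results.items():
        if p_value >= p_threshold or snp not in dosages:
            continue
        if odds_ratio >= 1.0:
            risk_count = dosages[snp]        # tested allele is the risk allele
            weight = math.log(odds_ratio)
        else:
            risk_count = 2 - dosages[snp]    # the other allele is the risk allele
            weight = math.log(1.0 / odds_ratio)
        score += risk_count * weight
    return score
```

You can vary `p_threshold` (for example over 0.001, 0.01, 0.1 and 0.6) and evaluate the resulting test-set scores. This mimics the structure of the threshold search described above, though not necessarily its details.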

Additionally, we performed risk prediction in the combined GWA and replication sample, including only the six established RLS risk loci. Using the weighted risk allele score, we obtained ORs of up to 8.6 (95% CI: 2.46–46.25) and an AUC of 65.1% (Figures S3 and S4).

By increasing the size of our discovery sample, we have identified two new RLS susceptibility loci. The top six loci show odds ratios between 1.22 and 1.77 and risk allele frequencies between 19% and 82%. They implicate genes in neuronal transcription pathways not previously suspected to be involved in the disorder.

Materials and Methods

Study population and phenotype assessment

Ethics statement

Written informed consent was obtained from each participant in their own language. The study was approved by the institutional review boards of the contributing authors' institutions. The primary review boards were those of the Bayerische Ärztekammer and the Technische Universität München, both in Munich.

RLS patients (GWA and replication phase)

A total of 2,944 cases of European descent (GWA = 954, replication = 1,990) were recruited in two cycles via specialized outpatient clinics for RLS. German and Austrian cases for the GWA and the replication sample (GER1) were recruited in Munich, Marburg, Kassel, Göttingen, and Berlin (Germany; n = 830 in GWA, n = 1,028 in GER1) and in Vienna and Innsbruck (Austria; n = 124 in GWA, n = 288 in GER1). The additional replication samples originated from Prague (Czech Republic (CZ), n = 351), Montpellier (France (FR), n = 182), and Turku (Finland (FIN), n = 141). In all patients, the diagnosis was based on the diagnostic criteria of the International RLS Study Group [1], as assessed in a personal interview conducted by an RLS expert. A positive family history was based on the report of at least one additional family member affected by RLS. We excluded patients with secondary RLS due to uremia, dialysis, or iron-deficiency anemia. The presence of secondary RLS was assessed by clinical interview, physical and neurological examination, blood chemistry, and nerve conduction studies whenever deemed clinically necessary.

In addition, 1,104 participants (GER2) of the “Course of RLS (COR-) Study”, a prospective cohort study on the natural course of disease in members of the German RLS patient organizations, were included as an additional replication sample. After providing informed consent, study participants sent blood samples for DNA extraction to the Institute of Human Genetics, Munich, Germany. For the majority of members, a limited validation of the RLS diagnosis was achieved through a diagnostic questionnaire. Five percent had also received a standardized physical examination and interview in one of the specialized RLS centers in Germany prior to recruitment. To avoid duplicates, we checked these subjects against those recruited through other German RLS centers and excluded samples with identical birth date and sex.

556 cases (US) were recruited in the United States at Departments of Neurology at Universities in Baltimore, Miami, Houston, and Palo Alto. Diagnosis of RLS was made as mentioned above.

285 cases (CA) were recruited and diagnosed as above in Montréal, Canada. All subjects were exclusively of French-Canadian ancestry as defined by having four grandparents of French-Canadian origin.

Detailed demographic data of all samples are provided in Table S6.

Control populations (GWA and replication phase)

Controls for German and Austrian cases were of European descent and recruited from the KORA S3/F3 and S4 surveys, general population-based controls from southern Germany. KORA procedures and samples have been described [15]. For the GWA phase, we included 1,814 subjects from S3/F3, and, for the replication stage, 1,471 subjects from S4.

As controls for the GER2 replication sample, we used participants of the Dortmund Health Study (DHS). The DHS is a population-based survey conducted in the city of Dortmund to determine the prevalence of chronic diseases and their risk factors in the general population. Participants were sampled randomly from the city's population register, stratified by five-year age group and gender [16]. 597 subjects randomly selected from the Czech blood and bone marrow donor registry served as Czech controls [17]. French controls included 768 parents of multiple sclerosis patients recruited through the French Group of Multiple Sclerosis Genetics Study (REFGENSEP) [18]. Finnish controls comprised 360 participants of the National FINRISK Study, a cross-sectional population survey on coronary risk factors conducted every five years. The current study contains individuals recruited in 2002. Detailed description of the FINRISK cohorts can be found at

French-Canadian controls were 285 unrelated individuals recruited at the same hospital as the cases.

1,200 participants of the Wisconsin Sleep Cohort (WSC), an ongoing longitudinal study on the causes, consequences, and natural course of sleep disorders, served as US controls [19].

None of the controls were phenotyped for RLS. All studies were approved by the institutional review boards in Germany, Austria, Czech Republic, France, Finland, the US, and Canada. Written informed consent was obtained from each participant. Detailed demographic data of all samples are provided in Table S6.

Genotyping and quality control in the GWA stage

Genotyping was performed on Affymetrix Genome-Wide Human SNP Arrays 5.0 (cases) and 6.0 (controls) following the manufacturer's protocol. The case sample included 628 cases from previous GWAs [2], [4] and 326 new cases. After genotype calling using the BRLMM-P clustering algorithm [20], a total of 475,976 SNPs present on both Affymetrix arrays were subjected to quality control. To filter out SNPs likely to yield false-positive associations, we added 655 cases with a different phenotype unrelated to RLS, genotyped on 5.0 arrays, to the analysis. We then excluded the SNPs that showed a significant difference in allele frequencies between cases (RLS and unrelated phenotype, 5.0) and controls (6.0) (n = 92). We excluded SNPs with a minor allele frequency (MAF) <5% (n = 88,582), a call rate <98% (n = 65,906), or a significant deviation from Hardy-Weinberg equilibrium (HWE) in controls (P < 0.00001) (n = 20,060). Cluster plots of the GWA genotyping data for the best-associated SNPs in Table 1 are shown in Figure S5. Genotypes of these SNPs are available in Table S7.

SNP selection and genotyping in the replication stage

We selected for replication all SNPs with a λ-corrected P_nominal < 10⁻⁴ in the GWA. These SNPs clustered in 26 loci (each defined as the best-associated SNP ±150 kb of flanking sequence). We genotyped a total of three SNPs in each of the 26 regions. These were either further associated neighbouring SNPs with a λ-corrected P_nominal < 10⁻³ or, for singleton SNPs, additional neighbouring SNPs from HapMap with the highest possible r² (r² > 0.5) with the best-associated SNP. We also genotyped the best-associated SNPs identified in the previous GWAs [2], [4].
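One plausible reading of the locus definition (the best-associated SNP ±150 kb) is a greedy clumping procedure. It repeatedly takes the most significant unassigned SNP and assigns all unassigned SNPs on the same chromosome within 150 kb to its locus. The text does not spell out the procedure, so the sketch below, including its input format and function name, is an assumption.

```python
def clump_into_loci(snps, flank_bp=150_000):
    """Greedily group significant SNPs into loci around lead SNPs.

    snps: list of (snp_id, chromosome, position_bp, p_value)
    Returns a list of loci; each locus lists snp_ids with the lead SNP first.
    """
    remaining = sorted(snps, key=lambda s: s[3])  # most significant first
    loci = []
    while remaining:
        _, lead_chr, lead_pos, _ = remaining[0]
        # All unassigned SNPs within the flanking window of the lead SNP
        members = [s for s in remaining
                   if s[1] == lead_chr and abs(s[2] - lead_pos) <= flank_bp]
        loci.append([s[0] for s in members])
        member_ids = {s[0] for s in members}
        remaining = [s for s in remaining if s[0] not in member_ids]
    return loci
```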

Genotyping was performed on the MassARRAY system using MALDI-TOF mass spectrometry with the iPLEX Gold chemistry (Sequenom Inc, San Diego, CA, USA). Primers were designed using AssayDesign with iPLEX Gold default parameters. Automated genotype calling was done with SpectroTYPER 3.4. Genotype clustering was visually checked by an experienced evaluator.

SNPs with a call rate <95%, MAF <5%, or P < 0.00001 for deviation from HWE in controls were excluded, as were DNA samples with a call rate <90%.

Population stratification analysis

GWA stage

To identify and correct for population stratification, we performed an MDS analysis on the IBS matrix of our discovery sample, as implemented in PLINK 1.07 [21]. After excluding outliers identified by plotting the main axes of variation against each other, we performed logistic regression with age, sex, and the values of the MDS components as covariates. Using the Genomic Control approach [22], we obtained an inflation factor λ of 1.11.

Additionally, we performed a variance components analysis using the EMMAX software [23] and again calculated the inflation factor with Genomic Control, now obtaining a λ of 1.025. EMMAX uses a mixed linear model and corrects not only for population stratification but also for hidden relatedness. We therefore based the correction for population substructure on the EMMAX results.
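Genomic Control estimates λ as the median of the observed 1-df association chi-square statistics divided by the median expected under the null (about 0.4549). It then corrects each statistic by dividing it by λ. Below is a minimal sketch, assuming the 1-df chi-square statistics are already available; the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 df (the square of the normal
# 75th percentile), approximately 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median 1-df chi-square / expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_corrected_p(chi2, lam):
    """Deflate a 1-df chi-square statistic by lambda and return its P-value."""
    return math.erfc(math.sqrt(chi2 / lam / 2.0))
```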

Replication stage

Correction for population stratification was performed for the German, Czech, and Canadian subsamples. The λ-values of 1.1032, 1.2286, and 1.2637 were derived from a previous Genomic Control experiment within the same samples using 176 intergenic or intronic SNPs [4]. In that study, we had applied the expanded Genomic Control method GCF developed by Devlin and Roeder [24]. In the meta-analysis of all replication samples, λ-corrected standard errors were included for the German, Czech, and Canadian samples. For the other replication samples from France, Finland, and the USA, no such data were available; therefore, no correction factor was included in the analysis for these samples.

Statistical analysis

Statistical analysis was performed using PLINK 1.07 [21]. In the GWA sample, we applied logistic regression with age, sex, and the first four axes of variation from the MDS analysis as covariates.

P-values were λ-corrected using the λ of 1.025 from the EMMAX analysis. In the individual analyses of the single replication samples, we tested for association using logistic regression. These analyses adjusted for sex and age as well as for population stratification where possible (see Population stratification analysis). For each replication sample, the Bonferroni correction was based on the number of SNPs that passed quality control in that sample.

For the combined analysis of all replication samples, we performed a fixed-effects inverse-variance meta-analysis. Where available, we used λ-corrected standard errors in this analysis. Bonferroni-correction was performed for 74 SNPs, i.e. the number of SNPs which passed quality control in at least one replication sample.
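The fixed-effects inverse-variance meta-analysis with λ-corrected standard errors can be sketched as below. Dividing a 1-df chi-square statistic by λ is equivalent to inflating the standard error of ln(OR) by √λ; this sketch assumes that form of correction. The input layout and function name are hypothetical, and λ = 1.0 stands for samples without an available correction factor.

```python
import math


def inverse_variance_meta(studies):
    """Fixed-effects inverse-variance meta-analysis of log odds ratios.

    studies: list of (beta, se, lam), where beta = ln(OR), se is its standard
    error and lam the sample's Genomic Control lambda (1.0 if unavailable).
    Returns (pooled_or, pooled_se, z, two_sided_p).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for beta, se, lam in studies:
        # Dividing chi-square by lambda == multiplying the SE by sqrt(lambda)
        se_corrected = se * math.sqrt(lam)
        w = 1.0 / se_corrected ** 2
        weighted_sum += w * beta
        weight_total += w
    pooled_beta = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return math.exp(pooled_beta), pooled_se, z, p
```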

For the joint analysis of the GWA and the replication samples, we also used a fixed-effects inverse-variance meta-analysis and again included λ-corrected values as far as possible. For the conditioned analysis, the SNP to be conditioned on was included as an additional covariate in the logistic regression analysis as implemented in PLINK.

Interaction analysis was performed using the --epistasis option in PLINK. Significance was assessed by Bonferroni correction (i.e. 0.05/28, as 28 SNP combinations were tested for interaction).

Power calculation

Power calculations were performed with the CaTS power calculator [25], assuming a prevalence of 0.08 and an additive genetic model (Table S3). The significance level was set at 0.05/74 for the replication-stage analysis and at 0.05/301,406 for genome-wide significance in the joint analysis of GWA and replication. For the rare-variant association study, the significance level was set at 0.05/12.
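CaTS computes power from disease prevalence and genotype relative risks. The sketch below is a simpler stand-in rather than the CaTS algorithm, and it will not reproduce CaTS output exactly. It approximates the power of a two-sided per-allele test using a normal approximation to ln(OR) from the expected 2×2 table of allele counts. The function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(p_ctrl, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided per-allele case-control test.

    p_ctrl: risk-allele frequency in controls; odds_ratio: per-allele OR.
    """
    # Risk-allele frequency in cases implied by the per-allele odds ratio
    p_case = odds_ratio * p_ctrl / (1.0 + p_ctrl * (odds_ratio - 1.0))
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls  # allele counts per group
    # Woolf standard error of ln(OR) from expected allele counts
    se = math.sqrt(1 / (a_case * p_case) + 1 / (a_case * (1 - p_case))
                   + 1 / (a_ctrl * p_ctrl) + 1 / (a_ctrl * (1 - p_ctrl)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the (negligible) chance of significance in the opposite direction
    return NormalDist().cdf(abs(math.log(odds_ratio)) / se - z_crit)
```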

Mutation screening of TOX3 and BC034767

All coding and non-coding exons including adjacent splice sites of TOX3 (reference sequence NM_001146188) and BC034767 (reference sequence IMAGE 5172237) were screened for mutations in 188 German RLS cases.

Mutation screening was performed by high-resolution melting curve analysis using the LightScanner technology and standard protocols (Idaho Technology Inc.). DNA samples were analyzed in duplicate. Samples with an aberrant melting pattern were sequenced using BigDye Terminator chemistry v3.1 (ABI) on an ABI 3730 sequencer. Sequence analysis was performed with the Staden package [26]. Primers were designed using ExonPrimer or Primer3plus. All identified variants were then genotyped on the MassARRAY system, as described above, in 735 RLS cases and 735 controls from the general population (KORA cohort).

In addition, fragment analysis of exon 7 of TOX3 was performed to screen for polymorphic CAG trinucleotide repeats. DNA from 100 controls (50 females, 50 males) was pooled and analyzed on an ABI 3730 sequencer with LIZ-500 (ABI) as size standard. Primers were designed using Primer3plus; the forward primer was FAM-labeled for detection. Analysis was performed using GeneMapper v3.5.

Expression analyses

We assessed associations of rs6747972 with MEIS1/ETAA1 RNA expression and of rs3104767 with TOX3/BC034767/CREB-1/CREBBP/CITED1 expression. For this, we combined genome-wide SNP data (Affymetrix 6.0 chip) with microarray expression data from human blood samples (n = 323 general population controls from the KORA cohort, Illumina Human WG6 v2 Expression BeadChip) [11]. Association was tested with a linear regression model with expression as the dependent variable, controlling for age and sex.

Prediction of genetic risk

Based on the performance of P-value-threshold selected SNPs in a training and a test sample

As the training sample, we used those GWA cases that had also been genotyped for our previous study [2], together with the control samples from that study. As a first quality control step, we carried out an association analysis comparing the Affymetrix 500K genotypes of these GWA cases with the Affymetrix 5.0 genotypes of the same cases. Significant P-values in this comparison would indicate systematic genotyping differences between the chips. For further analysis, we used only the 259,302 SNPs with P-values >0.10. In a second quality control step, we removed individuals with a call rate below 98%. We also removed SNPs with a call rate below 98%, a MAF below 5%, or a P-value for deviation from HWE < 0.00001.

Further, we excluded the four previously known risk loci as well as the two newly identified loci. We then performed LD-pruning to limit the analysis to SNPs not in LD with each other, using a window of 50 SNPs, shifted by 5 SNPs at each step, and a variance inflation factor (VIF) threshold of 2. The final training dataset included 76,532 SNPs, 326 cases, and 1,498 controls. We conducted logistic regression with age and sex as covariates. Based on these association results, we chose a predictor variable for the test set: the sum score of the SNPs with the most significant effects (i.e. the number of risk alleles over all these SNPs), weighted by the ln(OR) of each effect. The test set comprised the remaining 583 cases of the GWA sample and 1,526 controls. None of these cases or controls were included in the training sample, i.e. the test sample constitutes a completely independent sample. Based on this sum score, we calculated the ROC curve and Nagelkerke's R² to measure the explained variance.
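Nagelkerke's R² rescales the Cox-Snell pseudo-R² by its maximum attainable value. It can be computed from the log-likelihoods of a null (intercept-only) logistic model and of a model that includes the score, as sketched below. The text does not specify which covariates entered the models, so the inputs here are assumptions, and the function names are hypothetical.

```python
import math


def null_loglik(n_cases, n_controls):
    """Log-likelihood of an intercept-only logistic model for case-control status."""
    n = n_cases + n_controls
    return (n_cases * math.log(n_cases / n)
            + n_controls * math.log(n_controls / n))


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's pseudo-R² from null and fitted log-likelihoods for n people."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell
```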

Based on a weighted risk allele score

To evaluate the predictive value in our sample, we calculated a weighted sum score of risk alleles in the combined GWA and replication sample. To this end, we used one SNP from each RLS risk region, including markers from the two newly identified regions on chromosomes 16q12.1 and 2p14 (MEIS1: rs2300478, 2p14: rs6747972, BTBD9: rs9296249, PTPRD: rs1975197, MAP2K5: rs11635424, TOX3/BC034767: rs3104767). At each SNP, the number of risk alleles was weighted by the corresponding ln(OR) for this SNP. The distribution of the score in cases and controls is illustrated in Figure S3. Using this score for risk prediction resulted in an AUC of 0.651 (Figure S4).
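The AUC equals the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. This is the Mann-Whitney U statistic divided by the product of the group sizes. The text does not state how the ROC curve was computed; the rank-based sketch below is one standard way to obtain the AUC, and its function name is hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    pooled = sorted([(s, True) for s in case_scores]
                    + [(s, False) for s in control_scores])
    rank_sum_cases = 0.0
    i = 0
    while i < len(pooled):
        # Find the block of tied scores pooled[i..j]
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within the tied block
        n_cases_in_block = sum(1 for k in range(i, j + 1) if pooled[k][1])
        rank_sum_cases += avg_rank * n_cases_in_block
        i = j + 1
    n1, n0 = len(case_scores), len(control_scores)
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    return u / (n1 * n0)
```

Sorting makes this O(n log n). That matters for samples of several thousand individuals, where comparing all case-control pairs directly would be much slower.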

Supporting Information



The article was published in PLOS Genetics, 2011, Issue 7.
