Leprosy is an infectious disease caused by the obligate intracellular pathogen Mycobacterium leprae and remains endemic in many parts of the world. Despite several major studies on susceptibility to leprosy, few genomic loci have been replicated independently. We have conducted an association analysis of more than 1,500 individuals from different case-control and family studies, and observed consistent associations between genetic variants in both TLR1 and the HLA-DRB1/DQA1 regions with susceptibility to leprosy (TLR1 I602S, case-control P = 5.7×10−8, OR = 0.31, 95% CI = 0.20–0.48, and HLA-DQA1 rs1071630, case-control P = 4.9×10−14, OR = 0.43, 95% CI = 0.35–0.54). The effect sizes of these associations suggest that TLR1 and HLA-DRB1/DQA1 are major susceptibility genes in susceptibility to leprosy. Further population differentiation analysis shows that the TLR1 locus is extremely differentiated. The protective dysfunctional 602S allele is rare in Africa but expands to become the dominant allele among individuals of European descent. This supports the hypothesis that this locus may be under selection from mycobacteria or other pathogens that are recognized by TLR1 and its co-receptors. These observations provide insight into the long standing host-pathogen relationship between human and mycobacteria and highlight the key role of the TLR pathway in infectious diseases.
Leprosy is a chronic granulomatous disease affecting the skin and peripheral nerves and caused by Mycobacterium leprae. Despite its being the first identified pathogen in humans, leprosy remains endemic in central Africa, Southeast Asia and South America with more than 200,000 new cases per year globally. Our understanding of the bacterial pathogenesis and interaction with the human host is limited, because of the inability to culture the bacterium in vitro. Nevertheless, twin studies , familial clustering  and segregation analyses  studies have suggested that host genetics play an important role in susceptibility to this infectious disease with the heritability estimated up to 57% . This allows genetic research to help understand the immunity against M. leprae and provide insight into the host-pathogen relationship.
Although several genetic loci have been reported to associate with susceptibility to leprosy, candidates showing independent replications are scanty, and the vast majority of the phenotypic variation remains unexplained. To identify the genetic variants affecting susceptibility to leprosy, we conducted a population based case-control association study using a gene-centric 50 K microarray covering variants in 2,092 genes throughout the genome , and found TLR1 and HLA-DRB1/DQA1 as major determinants of leprosy susceptibility. We also observe a high degree of population differentiation at the TLR1 gene, suggesting that mycobacterial diseases may have contributed to the evolution of this locus. These observations refine our understanding of the long interaction between human and mycobacteria and suggest that modulation of the TLR1 pathway may be valuable in future treatment of mycobacterial diseases.
Primary association analysis
We genotyped 258 leprosy cases and 300 controls recruited from New Delhi, India where the disease is prevalent. Diagnosis of leprosy was made by at least two independent leprologists with standard histopathological examination of affected skin lesions. After quality control filtering, the genotype rate was 99.5% with separate multi-dimensional scaling (MDS) and principal component analysis (PCA) showing minimal population substructure in the 448 individuals (209 cases and 239 controls) carried forward for analysis (Figures S1 and S2, Tables S1 and S2). The strongest association was observed at the human leukocyte antigen (HLA) class II locus at chromosome 6p21 (Figure 1), with two SNPs showing genome-wide level of association (rs9270650 P = 6.4×10−10 and rs1071630 P = 8.5×10−10, Table 1, Table S3).
Human leukocyte antigen (HLA) region
We genotyped variants with P<1×10−4 in the primary association analysis, using the Sequenom MassArray primer extension assay in an independent case-control cohort recruited from Kolkata (220 cases and 162 controls), West Bengal as part of the replication study. To further minimise the possibility of population stratification resulting in spurious association, we undertook a family-based study recruited from Kumbakonam (N = 941 with 161 families), Tamil Nadu in south India which is robust to population substructure . These results provide further support for the HLA associations, with rs1071630 located in HLA-DQA1 showing convincing evidence of association across these populations (case-control P = 4.9×10−14, OR = 0.43, 95% CI = 0.35–0.54, Table 1, Figure 2, Figure S3 and Table S4). Strong evidence of association was also observed with rs9270650 in HLA-DRB1 (case-control P = 3.1×10−11, OR = 2.24, 95% CI = 1.76–2.86, Table 1, Figure S3 and Table S4). Despite the more significant associations observed in the recessive models as compared to the dominant models, the allelic tests remained most significant suggesting that the alleles may be acting with a multiplicative effect (Table S5). The associations remained highly significant after corrections for age and gender in logistic regression models (Table S5). Statistical tests of heterogeneity showed similar effect sizes between the multibacillary and paucibacillary subtypes (Table S6). These results are consistent with a recently published study conducted in the Han Chinese population, showing association at the same locus with a similar effect size . Conditional logistic regression analysis suggested that neither of these two variants can alone account for the observed association (Table S7), but instead form bi-variate haplotypes highly associated with disease states (haplotype TC, case-control P = 7.3×10−15, OR = 0.41, 95% CI = 0.33–0.52; haplotype CT, case-control P = 5.2×10−12, OR = 2.20, 95% CI = 1.75–2.76, Table S8). Individuals homozygous for the risk alleles at both loci are more than eight times more susceptible to leprosy than their homozygous protective counterparts (Figure S4 and Table S9).
Toll-like receptor 1 (TLR1) region
Consistent direction of effect was also observed at the Toll-like receptor (TLR) gene cluster at chromosome 4p14. The association peaks at a non-synonymous variant rs5743618 (TLR1 I602S/T1805G), which affects receptor translocation to the cell surface , with a significant protective association against leprosy in both New Delhi (P = 1.3×10−6, OR = 0.27, 95% CI = 0.15−0.47) and Kolkata (P = 0.012, OR = 0.40, 95% CI = 0.20–0.83, Table 1, Table S5). The combined statistic from these two case-control populations showed strong evidence of association (P = 5.7×10−8, OR = 0.31, 95% CI = 0.20–0.48). The Kumbakonam family study showed a consistent direction of effect (P = 0.090, OR = 0.61, 95% CI = 0.35–1.09). Analyzed together with a small reported set of 57 cases and 90 controls in Turkey , the best estimate of risk for the 602S allele is OR = 0.37 (95% CI = 0.26–0.51) with an overall P = 1.7×10−9 for the case-control studies (Table 1,Figure 2and Table S5). This association is the strongest among all 75 SNPs genotyped in the 80 kb flanking region, and accounts for the observed association in the locus (Table S10). The associations remained significant after corrections for age and gender (Table S5), with similar effect sizes between the multibacillary and paucibacillary subtypes (Table S6). Haplotypic analysis identified a haplotype containing the 602S variant that associated significantly with leprosy susceptibility (CTTTCT, P = 1.3×10−6, OR = 0.29, 95% CI = 0.18–0.48, Figures S5 and S6, Table S11), although this magnitude of effect is similar to the single locus association suggesting this effect is mainly driven by the 602S association. In fact, this TLR1 variant shows the most consistent association after HLA among the 2,092 genes tested in our microarray. However, this association was not observed in the recent GWA study , likely due to the absence of this SNP on the array utilized by the Zhang et al, and the low power to detect the association by proxy SNPs due to the low allele frequency (1.7%) in the Han Chinese population.
Previous candidates in leprosy susceptibility
The gene-centric microarray allows genic SNPs to be interrogated at a high density including putative leprosy susceptibility candidates (Table 2). Notably, evidence of association was observed at SNPs flanking MICA (rs12660741, P = 9.9×10−5) and TNF (rs3093662, P = 3.2×10−4), both located on chromosome 6. These associations remained nominally significant after correcting for the HLA variants rs9270650 and rs1071630. These data suggest that these genes may be involved in immunity against M. leprae. However, we did not replicate the reported association with the LTA promoter variant  (rs2239704/LTA +80, P = 0.20), or with the regulatory variants in PARK2/PACRG within the New Delhi case-control study as previously described . Despite consistent replication at the Chromosome 13 locus (C13orf31 and CCDC122), we did not observe any significant association with variants in NOD2, TNFSF15 or RIPK2 that were identified in the recent GWA study . The differences may be due to population specific effects which have been described in other diseases  or differences in allele frequencies.
Population differentiation at TLR1
Infectious diseases have been a major selective force during human evolution ,  and have contributed to the diversity of the mammalian HLA genes . Several studies have shown that this TLR gene cluster is highly differentiated between populations , , . To further investigate the geographic distribution of this locus, we genotyped 12 SNPs in this region in 1,463 individuals from 6 populations (New Delhi, Kolkata and Kumbakonam from India, Malawi, Gambia and United Kingdom, Table S1). We observed a high degree of population differentiation at this locus  (Figure S7), which peaks at the I602S variant with a FST value higher than any other variants in this locus (Table S12). To compare the degree of differentiation with the rest of the genome, we genotyped I602S in the CEU, CHB and YRI populations  and observed extreme differentiation (FST = 0.55) among over 3 million polymorphic SNPs in the genome (>99th percentile). Mapping the global frequency of the I602S mutation in 15 different populations reveals that the protective serine allele is derived and rare in Africa, but expands in frequency to become the dominant allele in Europe (Figure 3 and Figure S8). Today, more than 40% of individuals of European descent are homozygous for this mutation. The ancestral isoleucine allele is crucial for protein translocation to the cell membrane, which is abrogated by the hydrophilic serine substitution resulting in a hypo-functional phenotype , ,  that associates with protection against an important pathogen. Together with a recent study showing signature of selection at this locus , these suggest that mycobacteria may have contributed to the extreme population differentiation at this locus. This provides insight into the host-pathogen relationship between human and M. leprae.
Despite the long history of leprosy and medical advances, leprosy remains a scourge in many parts of the world. Our understanding of the pathogenesis of M. leprae and its interaction with humans is limited, in part due to the inability to culture the bacterium in vitro. For this reason, there is greater reliance on genetics to understand the disease and its causative pathogen. Some progress has been made to identify genetic loci which associate with susceptibility to leprosy; however the numbers of candidates showing consistent replication across populations are few. We here conduct a population based case-control association study, surveying genetic variants in more than 2,000 genes throughout the genome.
Our gene-centric study identified genetic variants at TLR1 and HLA-DRB1/DQA1 as major determinants of leprosy susceptibility. These associations showed a consistent direction of effect across the populations under study, including the family samples from Kumbakonam in South India although the two-sided association statistic was borderline for the TLR1 variant (Table 1). These results minimize the chance that the associations are spurious, because the transmission disequilibrium test is known to be robust to population stratification.
The human TLR1 has been previously shown to mediate immune response to M. leprae. Our results advance our understanding of human immunity to M. leprae. Firstly, this present study is the first to show the consistent effect of TLR1 on leprosy susceptibility, at a significance level exceeding a genome-wide threshold. Secondly, the TLR1 association is the most significant after the HLA class II locus, of the greater than 2,000 loci surveyed. Although this result does not exclude the possibility of other genes involved in leprosy, it does support that the innate immune system, activated by the TLR receptors, plays a significant role in the defence against M. leprae in humans. Thirdly, the hydrophilic serine substation (TLR1 I602S) has been shown to result in a functional knockout phenotype , . Although this mutation is rare in Africa and Asia, our data suggest that a significant proportion of individuals of European descent are homozygous for this knockout variant. These functional TLR1 knockout individuals have a normal immunological phenotype and are protected against leprosy, suggesting that M. leprae may have utilized TLR1 as part of its pathogenesis mechanisms, as certain bacteria are known to produce homologues of the TLR1 signaling domain to subvert the innate immune responses . Collectively, these results suggest that modulation of the TLR1 pathway may be valuable in future treatment of mycobacterial diseases without significant side effects.
When we compare these results to the recently published GWA study , we observed both consistent and discrepant results with the different genetic loci. Reassuringly, both studies confirm the class II HLA locus as a major leprosy susceptibility locus, a finding consistent with early studies reporting association of HLA with leprosy , . Consistent findings were also observed for genetic variants at C13orf31 and CCDC122. However, separate genotyping of the reported SNPs did not replicate the associations at the NOD2, TNFSF15 or RIPK2 genes ; nor did the Zhang et al. study report any notable association at the TLR1 locus . The disparity at the TLR1 locus is likely due to the low allele frequency of TLR1 I602S, and that the haplotype carrying this variant would have low power to detect the association. For the other associated genes (NOD2, TNFSF15 and RIPK2) there could be a number of reasons for the non-replication. The reported effect sizes of variants in these genes are moderate compared to the HLA and chromosome 13 locus. Assuming these effect estimates are real, the present study will have less power to detect these associations. Another possibility is a population specific effect whereby variants in these genes confer susceptibility in the Chinese population but not in the Indians. Given the significant differences in the genetic composition between the Indian and Chinese , this may have contributed to the heterogeneous signals between the studies. Previous studies have described population specific genetic effects for other diseases . Furthermore, despite the relative homogenous population of M. leprae, the 16 mycobacterial subtypes highly correlated with the geography can be readily identified . Given the highly specialized nature of the immune system, these mycobacterial strains may activate the host immune response in slightly different ways, accounting for overlapping yet some discordant results between the studies. Finally, we cannot exclude the possibility that NOD2, TNFSF15 and RIPK2 are not true leprosy susceptibility loci.
This study provides insight into the host-pathogen relationship between human and M. leprae. Our data showed that the TLR1 locus is more highly differentiated as compared with the rest of the genome. Although the historical selective agents behind this differentiation are difficult to ascertain, our data suggests that mycobacterial infections may contribute to such a high degree of differentiation. Several studies have suggested that leprosy may be associated with reduced reproductive fitness , , and the all-cause mortality of lepromatous leprosy was reported to be at least 3.5 times of the general populations , . These factors, together with the long human co-existence with M. leprae and intense social stigma associated with the disease , suggest that leprosy may have contributed to the present global distribution of human TLR1. Other infectious diseases especially tuberculosis could have also contributed, given the overlapping antigenic repertoire and immunogenicity of mycobacterial species, despite a lack of association in a Gambian case-control population due to low allele frequency (Table S13). However, given the lack of data on concurrent selective events, this hypothesis remains difficult to prove and other mechanisms are possible including genetic drift, population migration or bottlenecks, or a combined effect of two or more of these mechanisms.
In conclusion, this study suggests that TLR1 and HLA-DRB1/DQA1 are important genetic determinants of susceptibility to leprosy. This study also shows a protective effect against an important pathogen with a hypo-functional TLR1 variant. These observations further highlight the key role of the TLRs in infectious diseases ,  and suggest that modulation of the TLR1 pathway may be valuable in future treatment of mycobacterial diseases.
Materials and Methods
For the New Delhi cohort of samples, informed written consents following the Indian Council of Medical Research (ICMR) specification were obtained from all the individuals whose blood samples were collected. The study was approved by the Jawaharlal Nehru University ethics committee. For the Kolkata cohort of samples, approval was obtained from the ICMR and consent was obtained from the Tropical School of Calcutta and the Swasti Blood Donor Centre at Kolkata. For the Kumbakonam samples, ethical approval was obtained from the ICMR, New Delhi and Hindu Mission Hospital in Kumbakonam.
We genotyped 258 leprosy patients and 300 controls from New Delhi . After quality control filtering with call rates (>95% for SNPs and >90% for samples), minor allele frequency (>1%), Hardy-Weinberg equilibrium (P>1×10−6), relatedness (<20%), heterozygosity (22–27%) and outlying ancestry (within 97.5th percentile), the genotype rate was 99.5% with separate multi-dimensional scaling (MDS) and principal component analysis (PCA) showing minimal population substructure in the 448 individuals (209 cases and 239 controls) carried forward for analysis. The replication studies included 220 leprosy patients and 162 controls from Kolkata, and 941 individuals from 246 families from Kumbakonam , with which 299 individuals (168 cases and 131 controls) from Kolkata and 852 individuals from Kumbakonam were included after quality control filtering. Diagnosis and classification of leprosy was based on clinical and microscopic examination of the skin lesions. Patients were classified either as multibacillary (MB, with bacterial index >0), a group that included patients with lepromatous (LL), borderline lepromatous (BL) and borderline (BB) leprosy, or as paucibacillary (PB, with bacterial index of 0) which included patients with borderline tuberculoid (BT) and tuberculoid (TT) leprosy. Paucibacillary leprosy was characterized by the presence of large well-defined skin lesions which were less than five in number, dry and with almost 90–100% loss of sensation. Multibacillary leprosy was characterized by the presence of six or more skin lesions that were smaller in size, tending to be bilaterally symmetrical with about 10–40% loss of sensation. The study was approved by the relevant ethics committee. Informed consent was obtained from individuals whose blood samples were collected. In addition, we have also genotyped a total of 1,463 individuals from the Kumbakonam, India, Gambia, Malawi, United Kingdom, Tomsk and Tuva of Russia, and HapMap CEU, CHB and YRI populations  for the population differentiation study.
Genotyping and quality control
All individuals in the New Delhi cohort were genotyped with the Illumina IBC gene-centric 50 K array . The data quality control and analysis were performed using PLINK . Multi-dimensional scaling (MDS) and principal component analysis (PCA) were carried out with PLINK and EIGENSTRAT to remove population outliers. The microarray genotypes more than 48,000 SNPs distributed in approximately 2,100 genes throughout the genome, including 3,470 non-synonymous markers. A total of 209 leprosy cases and 239 controls were carried forwarded for analysis. The SNPs in the replication studies and the population differentiation analysis were genotyped using the Sequenom MassArray primer extension assay. The TLR1 I602S polymorphism was additionally genotyped with a restriction enzyme digestion assay (PstI), and the rs1071630 with a primer induced restriction assay (MlyI).
Separate MDS and PCA analyses were performed in PLINK to visualize the population structure in the New Delhi cohort (Figure S1), and individuals with outlying ancestry were removed. The analysis was performed in the resultant set of individuals. The primary test of association in the New Delhi and Kolkata cohorts was carried out with the Pearson's χ2 allelic test, Cochran-Armitage trend test and logistic regression (Table S5); although the allelic test statistics were used in the main text and tables unless otherwise specified. The transmission-disequilibrium-test was used for the Kumbakonam family cohort. Both analyses were performed in PLINK. The primary association with the allelic test was performed without any correction. However, the associations at TLR1 and HLA-DRB1/DQA1 were further tested in a logistic regression model with correction for age, gender and the first MDS component (Table S5). As this study involves genotypes of genic instead of random SNPs, the quantile-quantile (QQ) plot was generated with comparison to the statistics of the same SNPs in the WTCCC Crohn's disease, rheumatoid arthritis and type 1 diabetes cohorts in addition to the expected values under the null hypothesis (Figure S2), which suggests that the statistics are not inflated by population substructure. The Mantel Haenszel statistics was used to combine data from case-control cohorts, but only when there was no significant heterogeneity in odds ratio between studies indicated by the Woolf's test. The haplotypic analysis was performed by Haploview, with the haplotypes reconstructed according to the confidence interval definition . The fixation index FST was used for the population differentiation analysis. Let pi denote the allele frequency of the ith population and p-bar denote the mean allele frequency, the FST value is given bywhere n is the number of populations in comparison. The haplotype diversity analysis was performed with Haplosuite .