Linkage disequilibrium and haplotype block patterns in popcorn populations

Authors: Andréa Carla Bastos Andrade ^aff001; José Marcelo Soriano Viana ^aff001; Helcio Duarte Pereira ^aff001; Vitor Batista Pinto ^aff001; Fabyano Fonseca e Silva ^aff002
Author affiliations: Federal University of Viçosa, Department of General Biology, Viçosa, MG, Brazil ^aff001; Federal University of Viçosa, Department of Animal Science, Viçosa, MG, Brazil ^aff002
Published in the journal: PLoS ONE 14(9)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0219417

Summary

Linkage disequilibrium (LD) analysis provides information on the evolutionary aspects of populations. Recently, haplotype blocks have been used to increase the power of quantitative trait loci detection in genome-wide association studies and the prediction accuracy of genomic selection. Our objectives were as follows: to compare the degree of LD, LD decay, and LD decay extent in popcorn populations; to characterize the number and length of haplotype blocks in the populations; and to determine whether maize chromosomes also have a pattern of interspaced regions of high and low rates of recombination. We used a biparental population, a synthetic, and a breeding population, genotyped for approximately 75,000 single nucleotide polymorphisms (SNPs). The sample size ranged from 190 to 192 plants. For the whole-genome LD and haplotype block analyses, we assumed a window of 500 kb. To characterize the block and step patterns of LD in the populations, we constructed LD maps by chromosome, defining a cold spot as a chromosome segment including SNPs with the same LDU position. The LD and haplotype block analyses were also performed at the intragenic level, selecting 12 genes related to zein, starch, cellulose, and fatty acid biosynthesis. The populations with the higher and lower frequencies of |D'| values greater than 0.75 were the biparental (65–74%) and the breeding population (26–58%), respectively. There were slight differences between the populations regarding the average distance for SNPs with |D'| values greater than 0.75 (in the range of approximately 207 to 229 kb). The level of LD expressed by the r² values was low in the populations (0.02, 0.04, and 0.04, on average) but comparable to some non-isolated human populations. The frequency of r² values greater than 0.75 was lower in the biparental population (0.2–0.5%) and higher in the other populations (0.2–1.6%). 
The average distance for SNPs with r² values greater than 0.75 was much higher in the biparental population (approximately 80 to 126 kb). In the other populations, the ranges were approximately 6 to 19 and 6 to 35 kb. The heatmaps for the regions covered by the first 100 SNPs in each chromosome, in each population (1 to 3.3 Mb, approximately), provided evidence that the comparatively few high r² values (close to 1.0) occurred only for SNPs in close proximity, especially in the synthetic and breeding populations. Because the haplotype blocks in the populations include few SNPs (2 to 3), we do not expect an advantage from haplotype-based association studies and genomic selection along generations. The results concerning LD decay (rapid decay after 5–10 kb) and LD decay extent (along up to 300 kb) are in the range observed with maize inbred line panels. The LD maps indicate that maize chromosomes had a pattern of regions of extensive LD interspaced with regions of low LD. However, our simulated LD map provides evidence that this pattern can reflect regions with differences in allele frequencies and LD levels (expressed by |D'|) and not regions with high and low rates of recombination.

Keywords:

Plant genomics – Haplotypes – Maize – Molecular genetics – Genome-wide association studies – Inbred strains – Animal sexual behavior – Chromosome mapping

Introduction

Linkage disequilibrium (LD) analysis is important to human, animal, and plant geneticists because the results can be used for positional cloning; provide information on the rate of recombination, gene conversion, and evolutionary aspects of populations, including recombination history, mutation, selection, genetic drift, and admixture; and allow for the selection of populations and single nucleotide polymorphisms (SNPs) for association studies [1]. The most common LD measures are D' and r². The statistic D' is the ratio between D (the difference between products of haplotype frequencies, D = P(AB)·P(ab) − P(Ab)·P(aB)) and the maximum absolute value that D can take given the allele frequencies [2]. The statistic r² is the square of the correlation between the values of alleles at two loci in the same gamete, where D is the covariance [3].
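To make the two measures concrete, the following minimal sketch computes D, D', and r² from the four haplotype frequencies of two biallelic SNPs. It is an illustration only, not the software used in this study; the function name `ld_measures` and the example frequencies are hypothetical. D' is obtained by dividing D by the largest value that D can take for the observed allele frequencies, following Lewontin's normalization [2].

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r2) from the haplotype frequencies of two biallelic loci.

    Hypothetical helper for illustration; the inputs are the frequencies of
    haplotypes AB, Ab, aB, and ab and must sum to 1.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus

    # D: difference between the products of haplotype frequencies
    D = p_AB * p_ab - p_Ab * p_aB

    # Largest |D| attainable for these allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    # r2: squared correlation between alleles at the two loci
    variance_product = p_A * (1 - p_A) * p_B * (1 - p_B)

    d_prime = D / d_max if d_max > 0 else float("nan")
    r2 = D * D / variance_product if variance_product > 0 else float("nan")
    return D, d_prime, r2


# Made-up frequencies: one haplotype is absent and one allele is rare
D, d_prime, r2 = ld_measures(0.50, 0.48, 0.00, 0.02)
print(D, abs(d_prime), r2)
```

With these made-up frequencies, |D'| is 1 while r² is only about 0.02, which shows how a pair of SNPs can have complete LD according to |D'| and still a low r² when the allele frequencies at the two loci differ strongly.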

Additional information on historical recombination is provided by analysis of the haplotype block pattern in populations. A haplotype block is a chromosome region in which there are few haplotypes (combinations of alleles of multiple SNPs within a haplotype block) (2–4 per block), and for which the LD analysis provides evidence of a low rate of recombination [1]. Recently, haplotype blocks have been used to increase the power of QTL (quantitative trait loci) detection in genome-wide association studies (GWAS) and the prediction accuracy with genomic selection. Based on a panel including 183 maize inbred lines genotyped for 38,000 SNPs, Maldonado et al. [4] confirmed the advantage of haplotype-based GWAS for ear and plant height, the ear height/plant height ratio, and leaf angle relative to single SNP analysis. Hess et al. [5] observed an increase of up to 5.5% in the accuracy of genomic prediction in an admixed dairy cattle population using fixed-length haplotypes relative to the single SNP approach. Although there are several methods for defining a haplotype block, the most common procedure was proposed by Gabriel et al. [6]. Their criterion is that the one-sided upper 95% confidence bound on D' is > 0.98 and the lower bound is > 0.70.
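The pairwise part of the criterion quoted above can be sketched as a simple threshold check. This is a minimal illustration, assuming the confidence bounds on D' have already been computed elsewhere; the function names are hypothetical, and the complete block definition of Gabriel et al. [6] includes rules that are not described in this text and are not reproduced here.

```python
def is_strong_ld(lower_bound, upper_bound, min_lower=0.70, min_upper=0.98):
    """True when the D' confidence bounds exceed the thresholds quoted in the text.

    Hypothetical helper: lower_bound and upper_bound are the confidence
    bounds on D' for one SNP pair, assumed to be precomputed.
    """
    return upper_bound > min_upper and lower_bound > min_lower


def strong_ld_fraction(bounds):
    """Fraction of SNP pairs, given as (lower, upper) tuples, that pass the check."""
    if not bounds:
        return 0.0
    return sum(is_strong_ld(lower, upper) for lower, upper in bounds) / len(bounds)


# Made-up bounds for three SNP pairs
pairs = [(0.75, 0.99), (0.60, 0.99), (0.80, 0.95)]
print([is_strong_ld(lower, upper) for lower, upper in pairs])
print(strong_ld_fraction(pairs))
```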

Characterization of the LD and haplotype block patterns in human, domesticated animal, and plant populations has provided variable results concerning the degree of LD, LD decay, LD decay extent, and number and length of the haplotype blocks. Most maize LD studies have been done with inbred line panels. Thirunavukkarasu et al. [7] and Truntzler et al. [8] observed an overall average r² between 0.23 and 0.61, LD decay after 5–10 kb, and LD extent along 200–300 kb. Faster LD decay and shorter LD extent (less than 4 kb) were observed by Maldonado et al. [4]. Higher LD and slower LD decay were observed in biparental and multiparental maize populations [9]. The number and length of haplotype blocks is also highly variable [4, 7].

In several investigations in human populations, the structure of LD was described based on LD maps. In an LD map, each SNP has an LD position in LD units (LDUs). One LDU is the distance in kilobases at which disequilibrium (expressed as Malecot's prediction of association, ρ) declines to approximately 0.37 of its starting value. Assuming unrelated individuals, ρ equates to the absolute value of D'. The difference between the LD positions of two SNPs divided by the distance in kilobases (d) is the exponential decline of disequilibrium (ε). LDUs share an inverse relationship with the recombination rate. Thus, regions with extensive LD have few LDUs (plateaus or blocks), and regions with many LDUs have high recombination rates (steps). Holes in the LD maps are regions where greater marker density is required to provide a full characterization of the block and step patterns of the LD. Holes are identified by an LD map interval of 3, which is an arbitrary value because disequilibrium is indeterminate for εd > 3 and of doubtful reliability for εd > 2 [10, 11].
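The relationship between ε, d, and LDU positions can be illustrated with a short sketch. Under an exponential decline, association falls by a factor exp(−εd) over d kb, so an interval of one LDU (εd = 1) corresponds to exp(−1), approximately 0.37. The function names and the input values below are hypothetical, and treating the map interval of 3 as a cap on the width of an interval is an assumption made for illustration, not a description of the LDMAP program.

```python
import math


def relative_association(epsilon, d_kb):
    """Fraction of the starting association remaining after d_kb kilobases."""
    return math.exp(-epsilon * d_kb)


def ld_map_positions(epsilons, distances_kb, hole_width=3.0):
    """Cumulative LDU positions of SNPs from per-interval epsilon and distance.

    Returns (positions, holes): positions has one entry per SNP, starting
    at 0, and holes lists the indices of the intervals whose width
    epsilon * d reached hole_width (assumed here to be capped at that value).
    """
    positions = [0.0]
    holes = []
    for i, (epsilon, d_kb) in enumerate(zip(epsilons, distances_kb)):
        width = epsilon * d_kb  # interval length in LDUs
        if width >= hole_width:
            width = hole_width
            holes.append(i)
        positions.append(positions[-1] + width)
    return positions, holes


# One LDU: epsilon * d = 1
print(relative_association(0.01, 100))

# Made-up intervals: a plateau (no decline), a step of one LDU, and a hole
positions, holes = ld_map_positions([0.0, 0.02, 0.1], [50, 50, 40])
print(positions, holes)
```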

Because there is no information on LD and the structure of haplotype blocks in popcorn populations and no LD maps for maize, the objectives of this study were: (1) to compare the degree of LD, the LD decay, and the LD decay extent in popcorn populations; (2) to characterize the number and length of haplotype blocks in the populations; and (3) to elaborate the first LD map for maize, for elucidating whether maize chromosomes also have a pattern of interspaced regions of high and low rates of recombination.

Materials and methods

Populations

We used a biparental (F₂ generation) temperate population, a tropical synthetic (Synthetic UFV), and a tropical breeding population (Beija-Flor cycle 4). A biparental population is the most used maize population for deriving doubled haploids and inbred lines in hybrid breeding. Maize synthetic varieties are used as germplasm sources in breeding programs or as improved populations in developing countries. Theoretically, a biparental population shows LD only for linked genes and molecular markers. In a synthetic there is LD for genes and molecular markers with independent assortment. Because selection can change the LD degree, we also included a breeding population. The biparental population was derived from the single cross AP4502, developed by the Agricultural Alumni Seed Improvement Association, Romney, IN, USA. Synthetic UFV and Beija-Flor cycle 4 (BFc4) were developed by the Federal University of Viçosa (UFV), Minas Gerais, Brazil. The synthetic was derived by random crossings involving 20 elite inbred lines from the tropical population Viçosa and 20 elite inbred lines from the tropical population Beija-Flor. The inbred lines were selected based on expansion volume (a measure of popcorn quality). Beija-Flor cycle 4 was developed after four cycles of half-sib selection based on expansion volume.

DNA extraction, genotyping-by-sequencing (GBS), SNP calling, data quality control, and imputation

Leaf samples of young plants were collected for DNA extraction. The DNA extraction was performed using the CTAB (cetyl trimethylammonium bromide) protocol with modifications. After quantification, the DNA samples of 574 plants (190 or 192 from each population) were sent to the Institute of Biotechnology at Cornell University (two plates of 95 samples from the biparental population) and Institut de Recherche en Immunologie et en Cancérologie/IRIC at University of Montreal (four plates of 96 samples from the tropical populations) for GBS services based on HiSeq 2500 (paired-end reads of 125 bp) and NextSeq500 (single-end reads of 85 bp), respectively. The SNP variant call services were provided by the Institute of Biotechnology and Omega Bioservices, Norcross, GA, respectively, using B73 version 4 (current version) as the reference genome [12]. After reading the data using the R package vcfR [13], we filtered by missing allele and chromosome. Then, we computed the SNP and genotype call rates and the minor allele frequency (MAF), employing the R package HapEstXXR [14]. After filtering by MAF > 0.01, we imputed based on Beagle [15] using the R package synbreed [16]. The numbers of SNPs after data quality control and imputation were 145,420, 74,773, and 76,055 for the biparental population, Synthetic UFV, and Beija-Flor c4, respectively. To maintain a similar number of SNPs for the populations, we finally performed a random sampling of 75,000 SNPs from the biparental population.
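The MAF filter and the random sampling step can be sketched in a few lines. This is not the R pipeline described above (vcfR, HapEstXXR, synbreed); it is a minimal, hypothetical Python illustration that assumes genotypes are coded as the count of one allele (0, 1, or 2), with `None` for a missing call.

```python
import random


def minor_allele_frequency(genotypes):
    """MAF of one SNP from genotypes coded 0, 1, or 2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1 - allele_freq)


def filter_and_sample(snps, maf_min=0.01, n_keep=None, seed=0):
    """Keep SNPs with MAF > maf_min, then optionally draw a random subset.

    snps maps a SNP name to its list of genotypes. The subset keeps the
    original SNP order; seed makes the draw reproducible.
    """
    kept = [name for name, g in snps.items() if minor_allele_frequency(g) > maf_min]
    if n_keep is not None and len(kept) > n_keep:
        rng = random.Random(seed)
        chosen = sorted(rng.sample(range(len(kept)), n_keep))
        kept = [kept[i] for i in chosen]
    return kept


# Made-up genotypes for four SNPs in four plants
snps = {
    "snp1": [0, 0, 0, 0],      # monomorphic, removed by the MAF filter
    "snp2": [0, 1, 2, 1],
    "snp3": [2, 2, 2, None],   # fixed for the other allele, removed
    "snp4": [0, 1, 0, 0],
}
print(filter_and_sample(snps))
```

In the study, the corresponding call would use `n_keep=75000` for the biparental population only, because the other two populations already had approximately that number of SNPs.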

LD and haplotype block analyses

For Hardy-Weinberg equilibrium analysis by population and chromosome, the Bonferroni criterion was adopted to keep a global level of significance of 1%. To characterize the block and step patterns of LD in the populations, we constructed LD maps by chromosome using the interval method [17]. We defined a cold spot region as a chromosome segment including SNPs with the same LDU position. To evaluate if the LD maps allow inference of the overall degree of LD by chromosome in the populations, we also processed a simulated data set, generated with REALbreeding software (available by request). This software has been recently used in studies on population structure [18], QTL mapping [19], genomic selection [20], and genome-wide association studies [21]. We simulated the genotyping of 200 individuals in a population (generation 0) and 200 individuals in the same population after 10 generations of random crossings (generation 10), for 287 SNPs spanning 298 cM (density of 1 cM) of a single chromosome.
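The cold spot definition adopted here (a chromosome segment including SNPs with the same LDU position) can be expressed as a search for runs of consecutive SNPs with equal LDU positions. The sketch below is a hypothetical illustration with made-up positions, not the program used to build Table 2; the tolerance `tol` for comparing LDU positions is an assumption.

```python
def cold_spots(ldu_positions, kb_positions, tol=1e-9):
    """Find runs of two or more consecutive SNPs sharing the same LDU position.

    Returns a list of (start_kb, end_kb, n_snps) tuples, one per run.
    Both inputs are ordered along the chromosome and have equal length.
    """
    spots = []
    start = 0
    n = len(ldu_positions)
    for i in range(1, n + 1):
        # A run ends at the last SNP or when the LDU position changes
        run_ended = i == n or abs(ldu_positions[i] - ldu_positions[start]) > tol
        if run_ended:
            if i - start >= 2:
                spots.append((kb_positions[start], kb_positions[i - 1], i - start))
            start = i
    return spots


# Made-up LD map: seven SNPs with their LDU and physical (kb) positions
ldu = [0.0, 0.0, 0.0, 0.5, 1.2, 1.2, 2.0]
kb = [10, 40, 90, 120, 200, 260, 300]
print(cold_spots(ldu, kb))
```

In this made-up map, the first three SNPs form one plateau and the fifth and sixth SNPs form another; the remaining intervals, where the LDU position increases, correspond to steps.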

We then evaluated the degree of LD by chromosome in the populations concerning SNPs separated by up to 500 kb, using a two marker expectation-maximization (EM) algorithm [22]. For the whole-genome LD decay and LD decay extent analyses, we computed the average |D'| and r² values, defining intervals of 50 kb (0–50 to 451–500). To define a haplotype block, we adopted the criterion proposed by Gabriel et al. [6]. The haplotypes were estimated using an accelerated EM algorithm with a partition-ligation approach [23] to generate phased haplotypes for population frequency [24].
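The averaging by distance interval described above can be sketched as follows. This is a minimal illustration with made-up values, not the analysis code; the function name is hypothetical, and the handling of interval boundaries (a pair exactly 50 kb apart is placed in the first interval) is an assumption.

```python
import math


def mean_ld_by_distance(pairs, bin_kb=50, max_kb=500):
    """Average an LD measure within distance intervals.

    pairs is an iterable of (distance_kb, ld_value) for SNP pairs. Returns a
    list of (lower_kb, upper_kb, mean) with mean set to None for empty
    intervals. Pairs farther apart than max_kb are ignored.
    """
    n_bins = max_kb // bin_kb
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for distance, value in pairs:
        if distance <= 0 or distance > max_kb:
            continue
        index = math.ceil(distance / bin_kb) - 1  # (0, 50] -> 0, (50, 100] -> 1, ...
        sums[index] += value
        counts[index] += 1
    return [
        (i * bin_kb, (i + 1) * bin_kb, sums[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    ]


# Made-up r2 values for four SNP pairs
pairs = [(10, 0.4), (50, 0.2), (75, 0.1), (480, 0.02)]
for lower, upper, mean in mean_ld_by_distance(pairs):
    print(lower, upper, mean)
```

The same function covers the intragenic analysis by setting `bin_kb=1` and `max_kb=11`.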

The LD and haplotype block analyses were also performed at the intragenic level. We chose 12 genes related to zein (one), starch (four), cellulose (five), and fatty acid biosynthesis (two) (S1 Table). With two exceptions, the selected genes had at least five SNPs in each population (maximum of 21). For the intragenic LD decay and LD decay extent analyses, we computed the average |D'| and r² values defining intervals of 1 kb (0–1 to 10.1–11 kb). All analyses were performed using LDMAP [17] and Haploview [22]. Heatmaps were generated using the R package pheatmap. To assess the haplotype block information, the haplotype files for each population and chromosome were read by a program (Haplotype blocks summary) developed in REALbasic 2009 by Prof. José Marcelo Soriano Viana.

Results

With the exception of chromosome 10 in the breeding population, the number of SNPs was generally in proportion to the chromosome length, providing an SNP density in the range of 23.5 to 44.3 kb (one SNP per 30.0 kb on average) (Table 1). The average MAF was approximately 0.1 regardless of chromosome and population, but the populations differed in their MAF distribution. The biparental population had a bimodal distribution and showed a higher number of SNPs with frequencies close to 0.01 and greater than 0.45 (S1 Fig). The synthetic and breeding populations had similar MAF distributions. The analysis of Hardy-Weinberg equilibrium provided evidence that most of the SNPs in the biparental population had a nonsignificant deviation, whereas most of the SNPs in the other populations showed a significant deviation. We retained SNPs with significant deviation from Hardy-Weinberg equilibrium in the synthetic and breeding populations to keep a similar number of SNPs for the LD and haplotype block analyses. To maintain a similar number of SNPs for constructing the LD maps by chromosome, we used the SNPs in Hardy-Weinberg equilibrium in the synthetic and breeding populations as well as a sample of SNPs with no significant deviation from Hardy-Weinberg equilibrium from the biparental population.

**Tab. 1. Number of SNPs, SNP coverage (kb), average SNP interval (bp) and MAF, and minimum, average, and maximum LD measures by chromosome in each population.**

The LD map from the simulated data provided evidence that the LD units were lower for the generation with lower LD (generation 10) (Fig 1). Thus, the LD maps by chromosome revealed that the higher global LD (in LDUs) was observed in the synthetic but only for chromosomes 1 to 7 (S2 Fig). The higher global LD for chromosomes 8 and 9 was observed in the biparental population. The higher global LD for chromosome 10 was seen in the breeding population. The lowest global LD was observed in chromosome 6, and the highest global LD was observed in chromosome 10 of the breeding population. Because of the much higher number of SNPs in Hardy-Weinberg equilibrium in the biparental population, we only used this population for analysis of the number and length of the hot (high recombination rate) and cold (low recombination rate) spot regions of the chromosomes, as well as the number and length of the holes (Table 2). Except for chromosome 10, where the average lengths of the hot and cold spot regions were approximately 37 and 38 kb, respectively, the average lengths of the hot and cold spot regions for the other chromosomes ranged between approximately 45–55 and 83–110 kb, respectively. The number of hot spots ranged between 1,788 and 3,897, and the number of cold spots ranged from 608 to 1,507. The holes represented only 0.4 to 2.7% of the chromosome lengths.

**Fig. 1. LD maps for generations 0 and 10.**

**Tab. 2. Number and minimum, average, and maximum length (kb) of the hot spots (steps), holes, and cold spots (plateaus) by chromosome in the biparental population.**

Concerning SNPs separated by up to 500 kb, the biparental population and the synthetic had similar average |D'| values (0.77 and 0.75). The values were approximately 10–14% greater than the average value in the breeding population (Table 1). Interestingly, the average r² value in the biparental population was approximately half of the corresponding average values observed in the other populations (0.02 versus 0.04, and 0.04). Regardless of the chromosome, the populations with the higher and lower frequencies of |D'| values greater than 0.75 were the biparental population (65–74%) and the breeding population (26–58%), respectively. However, the frequency of r² values greater than 0.75 was lower in the biparental population (0.2–0.5%) and higher in the other populations (0.2–1.6%) (S2 Table). Furthermore, the average distance for SNPs with r² values greater than 0.75 was much higher in the biparental population (approximately 80 to 126 kb). In the other populations, the ranges were approximately 6 to 19 and 6 to 35 kb. There were slight differences between the populations regarding the average distance for SNPs with |D'| values greater than 0.75 (in the range of approximately 207 to 229 kb).

The heatmaps for the regions covered by the first 100 SNPs in each chromosome, in each population (1 to 3.3 Mb, approximately), provided evidence that the comparatively few high r² values (close to 1.0) occurred only for SNPs in close proximity, especially in the synthetic and breeding populations (S3 Fig). Although these regions do not represent the pattern of LD along the chromosomes (see the LD pattern for five segments of 100 SNPs along chromosome 4 in the biparental population in S4 Fig) there are some regions with blocks of intermediate r² values for distant SNPs, especially in the biparental population.

Regardless of the chromosome, population, and LD measurement, the LD decreased as the between-SNP distance increased from 0–50 to 451–500 kb (S5 and S6 Figs). In general, there was an initially higher LD decrease for SNPs separated by 51–100 kb (3 to 7% for |D'| and 28 to 66% for r², on average) and then a gradual decrease to the minimum LD value for SNPs separated by 451–500 kb. Because there were no significant differences between chromosomes, we can state that following an initial higher decrease after 50 kb, the |D'| and r² in the biparental population extended with similar magnitude for an interval of 450 kb (Fig 2A and 2B). In this interval, the average |D'| values decreased from 0.69–0.77 to 0.64–0.77 in the three populations, and the average r² values in the biparental population decreased from 0.025 to 0.020. However, in the other two populations, the average r² value decreased by approximately 50%. The r² decay from its maximum average value reached 36 to 73% after 5–10 kb (Fig 2C).

**Fig. 2. Overall average |D'| (a) and r² (b and c) values by distance interval (kb) in the biparental population (Bip), in the synthetic (Syn), and in the breeding population (BFc4).**

The biparental population also differed from the other populations concerning the pattern of haplotype blocks (Table 3). The biparental population presented a lower average number of haplotype blocks per chromosome (approximately 225 versus 700 and 730 on average), a lower block length (approximately 1 versus 11 kb on average), and a lower number of SNPs per block (approximately 2 versus 3 on average). Most of the haplotype blocks in the three populations included two SNPs, but the number of haplotype blocks with three or more SNPs was greater in the synthetic and breeding populations (S7 Fig). It is important to highlight that the total length of the haplotype blocks represents only 0.01 to 5.13% of the chromosome lengths.

**Tab. 3. Haplotype blocks structure of the populations.**

The intragenic LD analysis also revealed higher average |D'| values in the biparental population and synthetic relative to the average value observed in the breeding population (0.74 and 0.88 versus 0.67). The biparental population presented an average r² value that was much lower than the average values observed in the other two populations (0.02 versus 0.13 and 0.14) (Table 4). Regardless of the population, the maximum intragenic |D'| (1.0) was observed for SNPs separated by up to 10.6 kb, while most of the higher intragenic r² values (0.7 or greater) were only observed for the closest SNPs (S8 and S9 Figs). The intragenic heatmaps provided evidence of distinct LD patterns between genes and populations (S9 Fig). With regard to the intragenic LD decay, there was evidence of |D'| and r² decay in the breeding population and r² decay in the synthetic (Fig 3). Concerning the intragenic haplotype block structure, there was general evidence of a single block of variable size (0.03 to 8.72 kb) with two SNPs (Table 5). Genes Zm00001d018033 and Zm00001d041972 showed population differences in terms of block size and number of SNPs.

**Fig. 3. Intragenic LD decay and LD extent concerning SNPs separated by up to 10.6 kb (|D'| and r² average values in intervals of 1 kb).**

**Tab. 4. Intragenic minimum, average, and maximum LD values in each population.**

**Tab. 5. Intragenic haplotype blocks structure in each population.**

Discussion

It is difficult to characterize the LD and haplotype block patterns in two or more unrelated random cross populations based on an LD map and two measures of linkage disequilibrium. Based on studies of the LD pattern in human populations, LD maps demonstrated that the human chromosomes have a pattern of regions of extensive LD (plateaus or cold spots), interspaced with regions of high recombination rate (steps or hot spots) [25, 26]. Both regions are variable in number and length, and cold spots show equal (as assumed in this study) or similar LD in LDUs. The hot spots present distinct LDUs. The same pattern was seen in the LD maps of the chromosomes of the biparental population, elaborated under high density as recommended by Pengelly et al. [25]. To better understand the level of LD in the hot and cold spots, we analyzed two extreme segments of the chromosome 1 LD map, including 30 SNPs. Both segments have similar lengths in LDUs (4.1 and 3.6) and kb (970 and 828). The average |D'| was much greater for the SNPs in the seven cold spots (including three to 12 SNPs) relative to the average value for the SNPs in the 21 hot spots (including two to three SNPs) (0.89 versus 0.29). However, this was not verified via the r² statistic (0.004 versus 0.038).

When comparing populations that share a common origin, have a similar effective population size, and did not face an extreme reduction in size (population bottleneck), the statistics D, D', and r² should provide a comparable characterization of the LD pattern if there are similar allele frequencies. If the populations have distinct distributions of allelic frequencies, D' can be used for analyzing the recombination history, and r² should be the choice if recombination and mutation are important factors affecting the LD [1]. However, in the last two decades, most studies on LD in human populations have aimed to select populations and SNPs (tagging SNPs) for association studies [26, 27]. In general, both |D'| and r² have been used [27, 28], and because of their high level of LD, isolated populations have been recommended for association studies [29]. The statistic r² is the most relevant for association mapping because it has a simple inverse relationship with the sample size required to detect association [1]. The use of LD maps and two measures of LD for comparing the popcorn populations provided some contrasting results, but the general evidence is that the synthetic is the population with the higher LD. As expected, the lower average |D'| value in the breeding population reflects its recombination history. The synthetic and the biparental populations presented greater average |D'| and higher frequency of SNPs with elevated |D'| values because they have no recombination history.

Because of the differences regarding molecular marker type and density, sample size, and genome coverage, comparison of LD values of human, domesticated animal, and plant populations should be made with caution, even when the studies involve the same species. We were surprised by the low average r² values and the reduced frequency of SNPs with r² values greater than 0.25 (defined as useful LD in some studies) in the popcorn populations. In the study of Yan et al. [30], involving 632 maize inbred lines and 943 SNPs (density of one SNP each 2,121 kb), the average r² was only 0.009. However, for SNPs separated by up to 100 kb, the average was 0.2 (0.03, 0.09, and 0.10 for the biparental, synthetic, and breeding populations, respectively). Even higher LD values were reported in the maize NAM (nested association mapping) population [31] and in two biparental and four FPM (four parent maize) populations studied by Anderson et al. [9]. In general, the average r² values observed in the popcorn populations are also lower than the values observed in cattle and chicken populations (0.1 to 0.8 for SNPs separated by up to 100 kb) [32–34]. The density ranged from 27.8 to 112.3 kb in these three studies. Using a 600K SNP chip (density of one SNP per 6.3 kb), Pardo et al. [28] observed a median pairwise r² averaged across all chromosomes of 0.015 and 0.016 for the Dutch and HapMap-CEU populations, respectively.

The absence of a uniform criterion for defining the LD decay and the LD extent also makes comparison of the results with human, domesticated animal, and plant populations difficult. Angius et al. [26] used LD decay as the distance over which the average LD decreases to half of its maximum value (half-length). They defined LD extent as the distance over which the average LD declines to an asymptotic value. Anderson et al. [9] used LD decay as the distance over which the average r² dropped below 0.8, and LD extent as the distance over which the average r² fell below 0.2. Concerning LD decay, our results showed differences between LD measures and populations. There were slight differences between chromosomes, but the higher r² decay occurred after 5–10 kb (36 to 73%). Yan et al. [30] observed an LD decay of 64% after 5–10 kb in an inbred line panel, and the LD reached an approximate asymptotic r² value of 0.01 in the interval of 1–5 Mb (LD extent of 5 Mb). A similar LD extent (5 Mb) was observed in eight breeds of cattle, but a comparable LD decay (62%) occurred along 100 kb [35]. From the analysis of segments of one Mb in all chromosomes in Ashkenazi Jewish, Caucasian, and African American populations, Shifman et al. [36] observed LD decays of 17, 21, and 42% along 10 kb, respectively. A similar LD extent of 300 kb occurred in the populations (reaching an approximate asymptotic r² value of 0.05).
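The alternative definitions discussed above give different answers for the same decay curve, which the following sketch illustrates. The function names and the curve of average r² values are made up for illustration; they are not results from any of the cited studies.

```python
def half_length(distances_kb, mean_ld):
    """Distance at which the average LD first falls to half of its maximum."""
    peak = max(mean_ld)
    peak_index = mean_ld.index(peak)
    for distance, value in zip(distances_kb[peak_index:], mean_ld[peak_index:]):
        if value <= peak / 2:
            return distance
    return None  # the curve never reaches half of its maximum


def first_below(distances_kb, mean_ld, threshold):
    """Distance at which the average LD first drops below a fixed threshold."""
    for distance, value in zip(distances_kb, mean_ld):
        if value < threshold:
            return distance
    return None


def percent_decay(mean_ld, index):
    """Percentage decrease of the average LD at position index relative to the first value."""
    return 100 * (mean_ld[0] - mean_ld[index]) / mean_ld[0]


# Made-up decay curve: average r2 at increasing between-SNP distances
distances_kb = [1, 5, 10, 50, 100]
mean_r2 = [0.40, 0.30, 0.18, 0.10, 0.06]

print(half_length(distances_kb, mean_r2))        # half-length definition
print(first_below(distances_kb, mean_r2, 0.8))   # fixed threshold of 0.8
print(first_below(distances_kb, mean_r2, 0.2))   # fixed threshold of 0.2
print(percent_decay(mean_r2, 2))                 # percentage decay at 10 kb
```

For a curve whose maximum average r² is already below a fixed threshold, the threshold-based definition returns the first distance evaluated, whereas the half-length and percentage definitions still describe the shape of the decay. This is one reason why values from studies that use different criteria are not directly comparable.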

If there is a higher LD between QTLs and haplotypes than with individual SNPs, haplotype blocks can provide substantial statistical power in association studies [6] and increased accuracy of genomic prediction of complex traits [37]. Surprisingly, our results showed that the number and length of the haplotype blocks and the number of SNPs per haplotype block were proportional to the average r². The criterion of Gabriel et al. [6] appears to provide a reduced number of SNPs per haplotype block. In a study with 235 soybean varieties genotyped by 5,361 SNPs (density of one SNP per 208 kb), Ma et al. [38] observed six SNPs per haplotype block on average. This is not surprising because the group of varieties corresponded to a pure line panel (high LD). In studies with German Holstein cattle and four chicken populations, the average number of SNPs per haplotype block ranged from approximately four to 10, and the mean block length ranged from approximately 146 to 799 kb [32, 33]. Low average numbers of SNPs per haplotype block (approximately 4–5) and reduced average haplotype block lengths (approximately 5–7 kb) were also observed in human populations [6, 28]. However, the size of each block varied dramatically in the study of Gabriel et al. [6], from less than one to 173 kb.

Concerning the low intragenic LD and the minimum size of the haplotype blocks observed in the three populations, we believe that the lower LD for the biparental population is due to crossing two genetically similar high-quality inbred lines. Because there is no information on the LD and haplotype block patterns in the base populations Viçosa and Beija-Flor, we cannot infer that the higher average intragenic r² values observed in the synthetic and breeding populations (for 11 of the 12 genes) are due to selection for quality. Characterization of the LD and haplotype block patterns regarding specific chromosomal regions has only been made by human geneticists, generally aimed at SNP tagging. From the analysis of SNPs within the HLA region on chromosome 6, Evseeva et al. [39] observed 18 haplotype blocks in European populations, based on the criterion of Gabriel et al. [6]. Furthermore, the LD was slightly lower in southern than northern European populations. Using the same criterion, Nuchnoi et al. [40] observed six and four haplotype blocks across a 472-kb region on chromosome 5q31-33 in Southeast (Thai) and Northeast Asian (Chinese and Japanese) populations. Akesaka et al. [41] identified two to six blocks in Korean and Japanese populations, depending on the criterion of an LD block, spanning approximately 3 to 47 kb. The median r² value for the five genes in the region ranged from 0.03 to 0.89.

In conclusion, the level of LD expressed by the r² values in the three popcorn populations with different genetic structures—a biparental population, a synthetic, and a breeding population—is low but comparable to some non-isolated human populations. This finding does not imply that these populations cannot be used for GWAS because there is a fraction of high r² values for SNPs separated by less than 5 kb. The populations are also not excluded for genomic selection because the most important factor affecting this selection process is the relatedness between individuals in the training and validation sets. However, we do not expect a significant advantage from haplotype-based GWAS and genomic selection along generations due to the reduced number of SNPs in the haplotype blocks (2 to 3). The results on LD decay (rapid decay after 5–10 kb) and LD decay extent (along up to 300 kb) are in the range observed with maize inbred line panels. Our most important result is that, similar to human chromosomes, maize (popcorn is also Zea mays, but ssp. everta) chromosomes also have a pattern of regions with extensive LD (plateaus or cold spots), interspaced with regions of low LD (steps or hot spots). It should be highlighted, however, that our simulated LD map provides evidence that this pattern can reflect regions with differences in allele frequencies and LD level (expressed by D') and not regions with high and low rates of recombination as evidenced by Jeffreys et al. [42], since the simulation process assumes a rate of recombination that is proportional to the distance in cM.

Supporting information

S1 Table [pdf]
Gene name, annotation, and chromosome localization, and the number of intragenic SNPs in each population.

S2 Table [pdf]
Minimum and maximum LD values, average distance (kb), and frequency observed in chromosomes by population, concerning SNPs with |D'| and r² values higher than 0.75, in the interval 0.25–0.75, and lower than 0.25.

S1 Fig [a]

S2 Fig [pdf]
LD maps of the populations, by chromosome.

S3 Fig [pdf]
LD heatmaps by population and chromosome for the first 100 SNPs; the regions covered ranged from approximately 1.0 to 3.3 Mb; the r² and |D'| values are above and below the diagonal, respectively.

S4 Fig [pdf]
LD heatmaps for five segments of 100 SNPs along chromosome 4 in the biparental population; the regions covered ranged from approximately 1.4 to 6.0 Mb; the r² and |D'| values are above and below the diagonal, respectively.

S5 Fig [kb]

S6 Fig [kb]

S7 Fig [pdf]
Distribution of the haplotype blocks based on the number of SNPs in the biparental population (Bip), in the synthetic (Syn), and in the breeding population (BFc4).

S8 Fig [bp]

S9 Fig [pdf]
Intragenic LD heatmaps by population; the r² and |D'| values are above and below the diagonal, respectively.

References

1. Wall JD, Pritchard JK. Haplotype blocks and linkage disequilibrium in the human genome. Nature Reviews Genetics. 2003;4(8):587–97. doi: 10.1038/nrg1123 PubMed PMID: WOS:000184491300011. 12897771

2. Lewontin R. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics. 1964;49(1):49. 17248194

3. Weir BS. Linkage disequilibrium and association mapping. Annual review of genomics and human genetics. 2008;9:129–42. doi: 10.1146/annurev.genom.9.081307.164347 18505378.

4. Maldonado C, Mora F, Scapim CA, Coan M. Genome-wide haplotype-based association analysis of key traits of plant lodging and architecture of maize identifies major determinants for leaf angle: hapLA4. PloS one. 2019;14(3). doi: 10.1371/journal.pone.0212925 PubMed PMID: WOS:000460372100053. 30840677

5. Hess M, Druet T, Hess A, Garrick D. Fixed-length haplotypes can improve genomic prediction accuracy in an admixed dairy cattle population. Genetics Selection Evolution. 2017;49. doi: 10.1186/s12711-017-0329-y PubMed PMID: WOS:000405342400001. 28673233

6. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. Science. 2002;296(5576):2225–9. doi: 10.1126/science.1069424 PubMed PMID: WOS:000176379000060. 12029063

7. Thirunavukkarasu N, Hossain F, Shiriga K, Mittal S, Arora K, Rathore A, et al. Unraveling the genetic architecture of subtropical maize (Zea mays L.) lines to assess their utility in breeding programs. BMC genomics. 2013;14. doi: 10.1186/1471-2164-14-877 PubMed PMID: WOS:000328649800002. 24330649

8. Truntzler M, Ranc N, Sawkins MC, Nicolas S, Manicacci D, Lespinasse D, et al. Diversity and linkage disequilibrium features in a composite public/private dent maize panel: consequences for association genetics as evaluated from a case study using flowering time. Theoretical and Applied Genetics. 2012;125(4):731–47. doi: 10.1007/s00122-012-1866-y PubMed PMID: WOS:000307294600009. 22622520

9. Anderson SL, Mahan AL, Murray SC, Klein PE. Four Parent Maize (FPM) Population: Effects of Mating Designs on Linkage Disequilibrium and Mapping Quantitative Traits. Plant Genome. 2018;11(2). doi: 10.3835/plantgenome2017.11.0102 PubMed PMID: WOS:000450929300013. 30025026

10. Tapper WJ, Maniatis N, Morton NE, Collins A. A metric linkage disequilibrium map of a human chromosome. Annals of Human Genetics. 2003;67:487–94. doi: 10.1046/j.1469-1809.2003.00050.x PubMed PMID: WOS:000187442000001. 14641236

11. Zhang WH, Collins A, Maniatis N, Tapper W, Morton NE. Properties of linkage disequilibrium (LD) maps. Proceedings of the National Academy of Sciences of the United States of America. 2002;99(26):17004–7. doi: 10.1073/pnas.012672899 PubMed PMID: WOS:000180101600089. 12486239

12. Jiao YP, Peluso P, Shi JH, Liang T, Stitzer MC, Wang B, et al. Improved maize reference genome with single-molecule technologies. Nature. 2017;546(7659):524–7. doi: 10.1038/nature22971 PubMed PMID: WOS:000403814100037. 28605751

13. Knaus BJ, Grünwald NJ. vcfr: a package to manipulate and visualize variant call format data in R. Molecular Ecology Resources. 2017;17(1):44–53. doi: 10.1111/1755-0998.12549 27401132

14. Knueppel S, Rohde K, Knueppel MS. Package ‘HapEstXXR’. 2015.

15. Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84(2):210–23. doi: 10.1016/j.ajhg.2009.01.005 19200528; PubMed Central PMCID: PMC2668004.

16. Wimmer V, Albrecht T, Auinger H-J, Schön C-C. synbreed: a framework for the analysis of genomic prediction data using R. Bioinformatics. 2012;28(15):2086–7. doi: 10.1093/bioinformatics/bts335 22689388

17. Maniatis N, Collins A, Xu CF, McCarthy LC, Hewett DR, Tapper W, et al. The first linkage disequilibrium (LD) maps: Delineation of hot and cold blocks by diplotype analysis. Proceedings of the National Academy of Sciences of the United States of America. 2002;99(4):2228–33. doi: 10.1073/pnas.042680999 PubMed PMID: WOS:000174031100086. 11842208

18. Viana JMS, Valente MSF, Silva FF, Mundim GB, Paes GP. Efficacy of population structure analysis with breeding populations and inbred lines. Genetica. 2013;141(7–9):389–99. doi: 10.1007/s10709-013-9738-1 PubMed PMID: WOS:000325780600013. 24057807

19. Viana JMS, Silva FF, Mundim GB, Azevedo CF, Jan HU. Efficiency of low heritability QTL mapping under high SNP density. Euphytica. 2017;213(1). doi: 10.1007/s10681-016-1800-5 PubMed PMID: WOS:000392317900013.

20. Viana JMS, Pereira HD, Mundim GB, Piepho HP, Silva FFE. Efficiency of genomic prediction of non-assessed single crosses. Heredity. 2018;120(4):283–95. doi: 10.1038/s41437-017-0027-0 PubMed PMID: WOS:000426887000001. 29180718

21. Viana JMS, Mundim GB, Pereira HD, Andrade ACB, Silva FFE. Efficiency of genome-wide association studies in random cross populations. Molecular Breeding. 2017;37(8). doi: 10.1007/s11032-017-0700-2 WOS:000407491600010.

22. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–5. doi: 10.1093/bioinformatics/bth457 15297300

23. Qin ZS, Niu T, Liu JS. Partition-ligation–expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. The American Journal of Human Genetics. 2002;71(5):1242–7. doi: 10.1086/344207 12452179

24. Barrett JC. Haploview: Visualization and analysis of SNP genotype data. Cold Spring Harbor Protocols. 2009;2009(10):pdb.ip71. doi: 10.1101/pdb.ip71 20147036

25. Pengelly RJ, Tapper W, Gibson J, Knut M, Tearle R, Collins A, et al. Whole genome sequences are required to fully resolve the linkage disequilibrium structure of human populations. BMC genomics. 2015;16. doi: 10.1186/s12864-015-1854-0 PubMed PMID: WOS:000360607100006. 26335686

26. Angius A, Hyland FCL, Persico I, Pirastu N, Woodage T, Pirastu M, et al. Patterns of linkage disequilibrium between SNPs in a Sardinian population isolate and the selection of markers for association studies. Human Heredity. 2008;65(1):9–22. doi: 10.1159/000106058 PubMed PMID: WOS:000249305300002. 17652959

27. Evans DM, Cardon LR. A comparison of linkage disequilibrium patterns and estimated population recombination rates across multiple populations. American Journal of Human Genetics. 2005;76(4):681–7. doi: 10.1086/429274 PubMed PMID: WOS:000227516000014. 15719321

28. Pardo L, Bochdanovits Z, de Geus E, Hottenga JJ, Sullivan P, Posthuma D, et al. Global similarity with local differences in linkage disequilibrium between the Dutch and HapMap-CEU populations. European Journal of Human Genetics. 2009;17(6):802–10. doi: 10.1038/ejhg.2008.248 PubMed PMID: WOS:000266289100016. 19127282

29. Collins A. Allelic association: linkage disequilibrium structure and gene mapping. Mol Biotechnol. 2009;41(1):83–9. doi: 10.1007/s12033-008-9110-3 18841501.

30. Yan JB, Shah T, Warburton ML, Buckler ES, McMullen MD, Crouch J. Genetic Characterization and Linkage Disequilibrium Estimation of a Global Maize Collection Using SNP Markers. PloS one. 2009;4(12). doi: 10.1371/journal.pone.0008451 PubMed PMID: WOS:000273104000015. 20041112

31. Gore MA, Chia JM, Elshire RJ, Sun Q, Ersoz ES, Hurwitz BL, et al. A First-Generation Haplotype Map of Maize. Science. 2009;326(5956):1115–7. doi: 10.1126/science.1177837 PubMed PMID: WOS:000271951000045. 19965431

32. Qanbari S, Hansen M, Weigend S, Preisinger R, Simianer H. Linkage disequilibrium reveals different demographic history in egg laying chickens. BMC genetics. 2010;11. doi: 10.1186/1471-2156-11-103 PubMed PMID: WOS:000285302100001. 21078133

33. Qanbari S, Pimentel ECG, Tetens J, Thaller G, Lichtner P, Sharifi AR, et al. The pattern of linkage disequilibrium in German Holstein cattle. Animal Genetics. 2010;41(4):346–56. doi: 10.1111/j.1365-2052.2009.02011.x PubMed PMID: WOS:000279717800002. 20055813

34. Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Al Cavanagh J, Barris W, et al. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC genomics. 2008;9. doi: 10.1186/1471-2164-9-187 PubMed PMID: WOS:000256398400001. 18435834

35. McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, et al. Whole genome linkage disequilibrium maps in cattle. BMC genetics. 2007;8. doi: 10.1186/1471-2156-8-74 PubMed PMID: WOS:000252442300001. 17961247

36. Shifman S, Kuypers J, Kokoris M, Yakir B, Darvasi A. Linkage disequilibrium patterns of the human genome across populations. Human Molecular Genetics. 2003;12(7):771–6. doi: 10.1093/hmg/ddg088 PubMed PMID: WOS:000181981400008. 12651872

37. Jonas D, Ducrocq V, Fouilloux MN, Croiseau P. Alternative haplotype construction methods for genomic evaluation. Journal of dairy science. 2016;99(6):4537–46. doi: 10.3168/jds.2015-10433 PubMed PMID: WOS:000375876600041. 26995132

38. Ma YS, Reif JC, Jiang Y, Wen ZX, Wang DC, Liu ZX, et al. Potential of marker selection to increase prediction accuracy of genomic selection in soybean (Glycine max L.). Molecular Breeding. 2016;36(8). doi: 10.1007/s11032-016-0504-9 PubMed PMID: WOS:000382144700013. 27524935

39. Evseeva I, Nicodemus KK, Bonilla C, Tonks S, Bodmer WF. Linkage disequilibrium and age of HLA region SNPs in relation to classic HLA gene alleles within Europe. European Journal of Human Genetics. 2010;18(8):924–32. doi: 10.1038/ejhg.2010.32 PubMed PMID: WOS:000280145100011. 20354563

40. Nuchnoi P, Ohashi J, Naka I, Nacapunchai D, Tokunaga K, Nishida N, et al. Linkage disequilibrium structure of the 5q31-33 region in a Thai population. Journal of Human Genetics. 2008;53(9):850–6. doi: 10.1007/s10038-008-0309-8 PubMed PMID: WOS:000258615100008. 18574552

41. Akesaka T, Lee SG, Ohashi J, Bannai M, Tsuchiya N, Yoon Y, et al. Comparative study of the haplotype structure and linkage disequilibrium of chromosome 1p36.2 region in the Korean and Japanese populations. Journal of Human Genetics. 2004;49(11):603–9. doi: 10.1007/s10038-004-0195-7 PubMed PMID: WOS:000225238200003. 15480877

42. Jeffreys AJ, Holloway JK, Kauppi L, May CA, Neumann R, Slingsby MT, et al. Meiotic recombination hot spots and human DNA diversity. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences. 2004;359(1441):141–52. doi: 10.1098/rstb.2003.1372 PubMed PMID: WOS:000188425400017. 15065666
