Development of Genomic Markers and Mapping Tools for Assembling the Allotetraploid Gossypium hirsutum L. Draft Genome Sequence

Abstract

Cotton (Gossypium spp.) is the leading source of natural textile fiber. Most cotton fiber, both worldwide and domestically, is produced from cultivars of G. hirsutum L., an allotetraploid. Genetic improvement of cotton remains constrained by alarmingly low levels of genetic diversity, inadequate genomic tools for genetic analysis and manipulation, and the difficulty of effectively harnessing the vastly greater genetic diversity harbored by other Gossypium species. Developing large numbers of single nucleotide polymorphisms (SNPs) for use in intraspecific and interspecific populations will enable cotton germplasm diversity characterization, high-throughput genotyping, marker-assisted breeding, introgression of advantageous traits from wild species, and high-density genetic mapping. My research has focused on using next-generation sequencing data to develop and validate intraspecific and interspecific SNP markers and to create high-throughput genotyping methods that advance cotton research. I used transcriptome sequencing to develop and map the first gene-associated SNPs for five species: G. barbadense (Pima cotton), G. tomentosum, G. mustelinum, G. armourianum, and G. longicalyx. A total of 62,832 non-redundant SNPs were developed. These can be used for interspecific germplasm introgression into cultivated G. hirsutum, as well as for subsequent genetic analysis and manipulation. To create SNP-based resources for integrated physical mapping, I used BAC-end sequences (BESs) and resequencing data for 12 G. hirsutum lines, a Pima line, and G. longicalyx to derive 132,262 intraspecific and 693,769 interspecific SNPs located in BESs. These SNP data sets were used to help build the first high-throughput genotyping array for cotton, the CottonSNP63K, which now provides a standardized platform for global cotton research.
I applied the array to two F2 populations and produced the first two high-density SNP maps for cotton, one intraspecific and one interspecific. By resequencing two interspecific F1 hypo-aneuploids, I also demonstrated that chromosome-wide changes in SNP genotypes enable highly effective mass-localization of BACs to individual cotton chromosomes. These efforts provide additional validation and placement methods that can be directly integrated with the physical map being constructed for G. hirsutum and enable the production of a high-quality draft genome sequence for cultivated cotton.
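The BAC localization idea can be illustrated with a minimal sketch. It assumes that each interspecific F1 hypo-aneuploid is deficient for one G. hirsutum chromosome (or chromosome arm). Interspecific SNPs in BESs on that chromosome should then carry almost no G. hirsutum reads, because only the other parent's copy is present. SNPs on intact chromosomes should remain heterozygous. The `SnpCall` record, the `localize_bacs` function, and all depth and fraction thresholds are hypothetical choices made for illustration, not the pipeline used in this work.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Read counts at one interspecific SNP in a BAC-end sequence (hypothetical record)."""
    bac_id: str          # BAC whose end sequence carries this SNP
    hirsutum_reads: int  # reads in the F1 supporting the G. hirsutum allele
    other_reads: int     # reads in the F1 supporting the other parent's allele


def localize_bacs(calls, min_depth=4, max_hirsutum_fraction=0.1,
                  min_sites=3, min_hemizygous_share=0.8):
    """Return BACs whose SNPs look hemizygous in a hypo-aneuploid F1.

    Assumption: the F1 lacks one G. hirsutum chromosome (or arm), so SNPs on
    that chromosome carry almost no G. hirsutum reads, while SNPs on intact
    chromosomes stay heterozygous. All thresholds are illustrative only.
    """
    counts = {}  # bac_id -> [usable sites, hemizygous-looking sites]
    for call in calls:
        depth = call.hirsutum_reads + call.other_reads
        if depth < min_depth:
            continue  # too few reads to judge zygosity at this site
        tally = counts.setdefault(call.bac_id, [0, 0])
        tally[0] += 1
        if call.hirsutum_reads / depth <= max_hirsutum_fraction:
            tally[1] += 1  # G. hirsutum allele effectively absent here
    return sorted(
        bac for bac, (sites, hemi) in counts.items()
        if sites >= min_sites and hemi / sites >= min_hemizygous_share
    )


calls = [
    SnpCall("BAC_A", 0, 12), SnpCall("BAC_A", 1, 15), SnpCall("BAC_A", 0, 9),
    SnpCall("BAC_B", 7, 6), SnpCall("BAC_B", 5, 8), SnpCall("BAC_B", 6, 6),
]
print(localize_bacs(calls))  # ['BAC_A']
```

In a setup like this, the function would be run separately for each hypo-aneuploid F1, and flagged BACs would be assigned to the chromosome for which that F1 is deficient. Requiring several supporting SNPs per BAC, rather than trusting a single site, guards against sequencing or read-mapping errors at individual positions.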

Keywords

Cotton, Genome Sequence, Single Nucleotide Polymorphism, Resequencing, Physical Mapping, Intraspecific, Interspecific