Casein gene cluster in camelids: comparative genome analysis and new findings on haplotype variability and physical mapping.
The structure of the casein genes is fully characterized in llamas, whereas for other camelids this information is still incomplete. Structure and polymorphisms have been described for three of the four casein genes (CSN1S1, αs1-CN; CSN2, β-CN; CSN3, κ-CN), whereas the available information on the structure and genetic diversity of CSN1S2 (αs2-CN) is controversial. The genome assemblies available for feral camel, Bactrian, dromedary, and alpaca can help fill this gap. However, most of the scaffolds available in GenBank are still unplaced, and the comparative annotation is often inaccurate or lacking. Therefore, the aims of this study are (1) to perform a comparative genome analysis and synthesize the literature data on the camelid casein cluster; (2) to analyze casein variability in two dromedary populations (Sudanese and Nigerian) using polymorphisms at CSN1S1 (c.150G>T), CSN2 (g.2126A>G), and CSN3 (g.1029T>C); and (3) to physically map the casein cluster in alpaca. Exon structures, gene and intergenic distances, large insertion/deletion events, SNPs, and microsatellites were annotated. In all camelids, CSN1S2 consists of 17 exons, confirming the structure of the llama CSN1S2 gene. The comparative analysis of the complete casein cluster (∼190 kb) shows 12,818 polymorphisms. The most polymorphic gene is CSN1S1 (99 SNPs in Bactrian vs. 248 in dromedary vs. 626 in alpaca). The least polymorphic gene is CSN3 in the Bactrian (22 SNPs) and alpaca (301 SNPs), whereas it is CSN1S2 in the dromedary (79 SNPs). In the two dromedary populations investigated, the allele frequencies for the three markers differ slightly: the allele C at CSN1S1 is very rare in both Nigerian (0.054) and Sudanese (0.094) dromedaries, whereas the frequency of the allele G at CSN2 is almost inverted. Haplotype analysis identified GAC as the most frequent haplotype (0.288) and TGC as the rarest (0.005).
Analysis of R-banded metaphases hybridized with specific probes mapped the casein genes to chromosome 2q21 in alpaca. These data extend the information on the structure of the casein cluster in camelids and add to the knowledge of the cytogenetic map and haplotype variability.