Comparative genomics

Whole genome alignment is a typical method in comparative genomics. This alignment of eight Yersinia bacteria genomes reveals 78 locally collinear blocks conserved among all eight taxa. Each chromosome has been laid out horizontally and homologous blocks in each genome are shown as identically colored regions linked across genomes. Regions that are inverted relative to Y. pestis KIM are shifted below a genome's center axis.[1]

Comparative genomics is a branch of biological research that examines genome sequences across a spectrum of species, from humans and mice to a diverse array of other organisms ranging from bacteria to chimpanzees.[2][3] This large-scale, holistic approach compares two or more genomes to discover their similarities and differences and to study the biology of the individual genomes.[4] Comparing whole genome sequences gives a highly detailed view of how organisms are related to each other at the gene level, and lets researchers infer genetic relationships between organisms and study evolutionary changes.[2] The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Comparative genomics therefore provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics. Moreover, these studies can be performed at different levels of the genomes to obtain multiple perspectives on the organisms.[4]

Comparative genomic analysis typically begins with a simple comparison of general genome features such as genome size, number of genes, and chromosome number. Table 1 presents data on several fully sequenced model organisms and highlights some striking findings. For instance, while the tiny flowering plant Arabidopsis thaliana has a smaller genome than the fruit fly Drosophila melanogaster (157 million base pairs vs. 165 million base pairs, respectively), it possesses nearly twice as many genes (25,000 vs. 13,000). In fact, A. thaliana has approximately the same number of genes as humans (25,000). Thus, a very early lesson of the genomic era is that genome size does not correlate with evolutionary status, nor is the number of genes proportionate to genome size.[5]

Table 1: Comparative genome sizes of humans and other model organisms[2]

| Organism | Estimated size (base pairs) | Chromosome number | Estimated gene number |
| --- | --- | --- | --- |
| Human (Homo sapiens) | 3.1 billion | 46 | 25,000 |
| Mouse (Mus musculus) | 2.9 billion | 40 | 25,000 |
| Bovine (Bos taurus) | 2.86 billion[6] | 60[7] | 22,000[8] |
| Fruit fly (Drosophila melanogaster) | 165 million | 8 | 13,000 |
| Plant (Arabidopsis thaliana) | 157 million | 10 | 25,000 |
| Roundworm (Caenorhabditis elegans) | 97 million | 12 | 19,000 |
| Yeast (Saccharomyces cerevisiae) | 12 million | 32 | 6,000 |
| Bacteria (Escherichia coli) | 4.6 million | 1 | 3,200 |
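
The point that gene number is not proportionate to genome size can be checked with simple arithmetic on the figures in Table 1. The sketch below is only an illustration: the genome sizes and gene counts are copied from the table, while the function names and the genes-per-megabase measure are choices made here for the example, not part of the cited sources.

```python
# Values copied from Table 1: (estimated genome size in bp, estimated gene number).
TABLE_1 = {
    "Homo sapiens": (3_100_000_000, 25_000),
    "Mus musculus": (2_900_000_000, 25_000),
    "Bos taurus": (2_860_000_000, 22_000),
    "Drosophila melanogaster": (165_000_000, 13_000),
    "Arabidopsis thaliana": (157_000_000, 25_000),
    "Caenorhabditis elegans": (97_000_000, 19_000),
    "Saccharomyces cerevisiae": (12_000_000, 6_000),
    "Escherichia coli": (4_600_000, 3_200),
}


def genes_per_megabase(size_bp, gene_count):
    """Gene density: estimated genes per million base pairs."""
    return gene_count / (size_bp / 1_000_000)


def density_table(table=TABLE_1):
    """Return (organism, genes per Mb) pairs, densest genome first."""
    rows = [
        (name, genes_per_megabase(size_bp, genes))
        for name, (size_bp, genes) in table.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, density in density_table():
        print(f"{name:<26} {density:8.1f} genes/Mb")
```

With these estimates, gene density spans roughly two orders of magnitude, from several hundred genes per megabase in E. coli to fewer than ten in the mammals, and A. thaliana comes out about twice as gene-dense as D. melanogaster.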

In comparative genomics, synteny is the preserved order of genes on the chromosomes of related species, indicating their descent from a common ancestor. Synteny provides a framework in which the conservation of homologous genes and gene order is identified between genomes of different species.[9] Synteny blocks are more formally defined as regions of chromosomes between genomes that share a common order of homologous genes derived from a common ancestor.[10][11] Alternative names such as conserved synteny or collinearity have been used interchangeably.[12] Comparisons of genome synteny between and within species have provided an opportunity to study the evolutionary processes that lead to the diversity of chromosome number and structure in many lineages across the tree of life.[13][14] Early discoveries using such approaches include conserved chromosomal regions in nematodes and yeast,[15][16] the evolutionary history and phenotypic traits of the extremely conserved Hox gene clusters across animals and the MADS-box gene family in plants,[17][18] and karyotype evolution in mammals and plants.[19]
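
As a rough illustration of the block definition above, the following sketch scans two gene orders for runs of shared genes that are adjacent and in the same (or exactly reversed) order in both genomes. It is a deliberately simplified toy and not the method of any tool cited here: it assumes each gene identifier appears once per genome (one-to-one orthologs), allows no gaps or rearrangements inside a block, and uses hypothetical names (`collinear_blocks`, `min_genes`, the `g1`...`g7` identifiers).

```python
def collinear_blocks(order_a, order_b, min_genes=2):
    """Find maximal runs of shared genes that are collinear in both genomes.

    order_a, order_b: gene identifiers along one chromosome of each genome,
    where orthologous genes carry the same identifier (an assumption of this
    sketch). Returns a list of (genes, strand) tuples; strand is "+" when the
    run has the same order in both genomes and "-" when it is inverted.
    """
    genes_in_a = set(order_a)
    # Positions are counted over shared genes only, so that a gene unique
    # to one genome does not interrupt a block.
    shared_b = [g for g in order_b if g in genes_in_a]
    pos_b = {gene: i for i, gene in enumerate(shared_b)}
    shared_a = [g for g in order_a if g in pos_b]

    blocks = []
    current, direction = [], 0  # direction: +1, -1, or 0 (not yet known)
    for gene in shared_a:
        if current:
            step = pos_b[gene] - pos_b[current[-1]]
            if step in (1, -1) and direction in (0, step):
                current.append(gene)
                direction = step
                continue
            if len(current) >= min_genes:
                blocks.append((current, "+" if direction >= 0 else "-"))
        current, direction = [gene], 0
    if len(current) >= min_genes:
        blocks.append((current, "+" if direction >= 0 else "-"))
    return blocks


if __name__ == "__main__":
    genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
    genome_b = ["g1", "g2", "g3", "g7", "g6", "g5", "g4"]  # g4..g7 inverted
    for genes, strand in collinear_blocks(genome_a, genome_b):
        print(strand, genes)
```

In the usage example, the first three genes form one block in the same orientation and the last four form a second, inverted block. Real synteny detection typically has to tolerate gene loss, duplicates, and small local rearrangements, which this sketch ignores.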

Furthermore, comparing two genomes not only reveals conserved domains or synteny but also aids in detecting copy number variations, single nucleotide polymorphisms (SNPs), indels, and other genomic structural variations.
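
To make the smallest of these variant classes concrete, the sketch below walks along a pairwise alignment and reports single-base mismatches (candidate SNPs) and runs of gap characters (indels). It assumes the two sequences have already been aligned to equal length with `-` marking gaps; the function name, the return format, and the toy sequences are illustrative choices, and copy number or larger structural variation is outside its scope.

```python
def summarize_alignment(ref_aln, qry_aln):
    """List mismatches and indels between two pre-aligned sequences.

    Returns (snps, indels) where
      snps   = [(column, ref_base, query_base), ...]
      indels = [(start_column, length, "insertion" | "deletion"), ...]
    Insertions and deletions are named relative to the reference sequence.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    snps, indels = [], []
    run_kind, run_start = None, 0  # the gap run currently being extended
    for col, (r, q) in enumerate(zip(ref_aln, qry_aln)):
        if r == "-" and q != "-":
            kind = "insertion"
        elif q == "-" and r != "-":
            kind = "deletion"
        else:
            kind = None
            # Both positions hold bases (gap-only columns are skipped).
            if r != "-" and r.upper() != q.upper():
                snps.append((col, r, q))
        if kind != run_kind:
            if run_kind is not None:
                indels.append((run_start, col - run_start, run_kind))
            run_kind, run_start = kind, col
    if run_kind is not None:
        indels.append((run_start, len(ref_aln) - run_start, run_kind))
    return snps, indels


if __name__ == "__main__":
    ref = "ACGTAC--GTTA"
    qry = "ACCTACGGGT-A"
    snps, indels = summarize_alignment(ref, qry)
    print("mismatches:", snps)
    print("indels:", indels)
```

Columns are zero-based alignment columns rather than genome coordinates; mapping them back to positions in either genome would require counting the non-gap characters up to each column.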

Comparative genomics began almost as soon as the whole genomes of two organisms became available in 1995 (the bacteria Haemophilus influenzae and Mycoplasma genitalium), and it is now a standard component of the analysis of every new genome sequence.[2][20] With the explosion in the number of genome projects driven by advances in DNA sequencing technologies, particularly the next-generation sequencing methods of the late 2000s, the field has become more sophisticated, making it possible to deal with many genomes in a single study.[21] Comparative genomics has revealed high levels of similarity between closely related organisms, such as humans and chimpanzees, and, more surprisingly, similarity between seemingly distantly related organisms, such as humans and the yeast Saccharomyces cerevisiae.[22] It has also shown the extreme diversity of gene composition in different evolutionary lineages.[20]

  1. ^ Darling AE, Miklós I, Ragan MA (July 2008). "Dynamics of genome rearrangement in bacterial populations". PLOS Genetics. 4 (7): e1000128. doi:10.1371/journal.pgen.1000128. PMC 2483231. PMID 18650965.
  2. ^ a b c d Touchman J (2010). "Comparative Genomics". Nature Education Knowledge. 3 (10): 13.
  3. ^ Xia X (2013). Comparative Genomics. SpringerBriefs in Genetics. Heidelberg: Springer. doi:10.1007/978-3-642-37146-2. ISBN 978-3-642-37145-5. S2CID 5491782.
  4. ^ a b Wei L, Liu Y, Dubchak I, Shon J, Park J (April 2002). "Comparative genomics approaches to study organism similarities and differences". Journal of Biomedical Informatics. 35 (2): 142–150. doi:10.1016/s1532-0464(02)00506-3. PMID 12474427.
  5. ^ Bennett MD, Leitch IJ, Price HJ, Johnston JS (April 2003). "Comparisons with Caenorhabditis (approximately 100 Mb) and Drosophila (approximately 175 Mb) using flow cytometry show genome size in Arabidopsis to be approximately 157 Mb and thus approximately 25% larger than the Arabidopsis genome initiative estimate of approximately 125 Mb". Annals of Botany. 91 (5): 547–557. doi:10.1093/aob/mcg057. PMC 4242247. PMID 12646499.
  6. ^ Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, et al. (2009). "A whole-genome assembly of the domestic cow, Bos taurus". Genome Biology. 10 (4): R42. doi:10.1186/gb-2009-10-4-r42. ISSN 1465-6906. PMC 2688933. PMID 19393038.
  7. ^ Holečková B, Schwarzbacherová V, Galdíková M, Koleničová S, Halušková J, Staničová J, et al. (2021-08-27). "Chromosomal Aberrations in Cattle". Genes. 12 (9): 1330. doi:10.3390/genes12091330. ISSN 2073-4425. PMC 8468509. PMID 34573313.
  8. ^ Elsik CG, Tellam RL, Worley KC (2009-04-24). "The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution". Science. 324 (5926): 522–528. Bibcode:2009Sci...324..522A. doi:10.1126/science.1169588. ISSN 0036-8075. PMC 2943200. PMID 19390049.
  9. ^ Liu D, Hunt M, Tsai IJ (January 2018). "Inferring synteny between genome assemblies: a systematic evaluation". BMC Bioinformatics. 19 (1): 26. doi:10.1186/s12859-018-2026-4. PMC 5791376. PMID 29382321.
  10. ^ Vergara IA, Chen N (September 2010). "Large synteny blocks revealed between Caenorhabditis elegans and Caenorhabditis briggsae genomes using OrthoCluster". BMC Genomics. 11: 516. doi:10.1186/1471-2164-11-516. PMC 2997010. PMID 20868500.
  11. ^ Tang H, Lyons E, Pedersen B, Schnable JC, Paterson AH, Freeling M (April 2011). "Screening synteny blocks in pairwise genome comparisons through integer programming". BMC Bioinformatics. 12: 102. doi:10.1186/1471-2105-12-102. PMC 3088904. PMID 21501495.
  12. ^ Ehrlich J, Sankoff D, Nadeau JH (September 1997). "Synteny conservation and chromosome rearrangements during mammalian evolution". Genetics. 147 (1): 289–296. doi:10.1093/genetics/147.1.289. PMC 1208112. PMID 9286688.
  13. ^ Zhang G, Li B, Li C, Gilbert MT, Jarvis ED, Wang J (2014-12-11). "Comparative genomic data of the Avian Phylogenomics Project". GigaScience. 3 (1): 26. doi:10.1186/2047-217X-3-26. PMC 4322804. PMID 25671091.
  14. ^ Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ, Davis P, et al. (January 2016). "WormBase 2016: expanding to enable helminth genomic research". Nucleic Acids Research. 44 (D1): D774–D780. doi:10.1093/nar/gkv1217. PMC 4702863. PMID 26578572.
  15. ^ The C. elegans Sequencing Consortium (December 1998). "Genome sequence of the nematode C. elegans: a platform for investigating biology". Science. 282 (5396): 2012–2018. doi:10.1126/science.282.5396.2012. PMID 9851916.
  16. ^ Wong S, Wolfe KH (July 2005). "Birth of a metabolic gene cluster in yeast by adaptive gene relocation". Nature Genetics. 37 (7): 777–782. doi:10.1038/ng1584. PMID 15951822.
  17. ^ Luebeck EG (October 2010). "Cancer: Genomic evolution of metastasis". Nature. 467 (7319): 1053–1055. Bibcode:2010Natur.467.1053L. doi:10.1038/4671053a. PMID 20981088.
  18. ^ Ruelens P, de Maagd RA, Proost S, Theißen G, Geuten K, Kaufmann K (2013). "FLOWERING LOCUS C in monocots and the tandem origin of angiosperm-specific MADS-box genes". Nature Communications. 4: 2280. Bibcode:2013NatCo...4.2280R. doi:10.1038/ncomms3280. PMID 23955420.
  19. ^ Kemkemer C, Kohn M, Cooper DN, Froenicke L, Högel J, Hameister H, et al. (April 2009). "Gene synteny comparisons between different vertebrates provide new insights into breakage and fusion events during mammalian karyotype evolution". BMC Evolutionary Biology. 9 (1): 84. Bibcode:2009BMCEE...9...84K. doi:10.1186/1471-2148-9-84. PMC 2681463. PMID 19393055.
  20. ^ a b Koonin EV, Galperin MY (2003). Sequence - Evolution - Function: Computational approaches in comparative genomics. Dordrecht: Springer Science+Business Media.
  21. ^ Hu B, Xie G, Lo CC, Starkenburg SR, Chain PS (November 2011). "Pathogen comparative genomics in the next-generation sequencing era: genome alignments, pangenomics and metagenomics". Briefings in Functional Genomics. 10 (6): 322–333. doi:10.1093/bfgp/elr042. PMID 22199376.
  22. ^ Russell PJ, Hertz PE, McMillan B (2011). Biology: The Dynamic Science (2nd ed.). Belmont, CA: Brooks/Cole. pp. 409–410.
