Mitochondrial DNA (mtDNA)


Eukaryotic cells typically contain several mitochondria, each containing several double-stranded circular DNA molecules. In vertebrate mitochondria, gene content, genome architecture, and gene strand asymmetry are almost invariant (Gissi et al., 2008). 

Critical comments

In mitogenomics, there is a lack of basic conventions: 

(I) there is no unequivocal assignment of strands. Historically, the strand with the lower G+T content was referred to as the (L)-strand (Anderson et al., 1981). However, present-day strand assignments in vertebrates do not comply with the original definition (Lima & Prosdocimi, 2017). Recommendation: to avoid confusion, the strands should be distinguished by the relative number of genes they contain, with the (+)-strand being the one containing more genes than the (–)-strand (Taanman, 1999, Fig. 1; Gissi et al., 2008; Lima & Prosdocimi, 2017). 

(II) annotations may be based on either (+)- or (–)- strand. Recommendation: annotations should always be based on the (+)-strand. 

(III) annotations may start with any gene or the control region; they can even start within the control region. Recommendation: vertebrate annotations should start with gene F (Montaña-Lozano et al., 2022, Fig. 3) and have the control region at the end. 

(IV) circular maps may be oriented with genes and the control region arranged either clockwise or counter-clockwise. Recommendation: circular maps should always be displayed in clockwise orientation. 

(V) circular maps may display any gene or the control region at the 12 o’clock position. Recommendation: circular maps should always display gene F (the DNA template for tRNA Phe) at the top center, and the control region immediately left of it. 
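
Recommendations (I) to (III) can be applied mechanically to an existing record. The following Python lines are a minimal sketch under stated assumptions, not an established tool: the function names, the strand labels "a" and "b", and the toy gene list are hypothetical, and a real record would also need its gene coordinates shifted along with the sequence.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def majority_strand(genes):
    """Return the label of the strand that carries more genes.

    `genes` is a list of (gene_name, strand_label) pairs. The labels are
    arbitrary (hypothetical "a"/"b" here); the result is the strand that
    would be called the (+)-strand under recommendation (I).
    """
    ranked = Counter(strand for _name, strand in genes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise ValueError("both strands carry the same number of genes")
    return ranked[0][0]


def rotate_circular(items, start):
    """Rotate a circular string or list so that index `start` comes first."""
    start %= len(items)
    return items[start:] + items[:start]


# Toy data, not a real mitogenome.
toy_genes = [("F", "a"), ("12S", "a"), ("ND6", "b"), ("E", "b"), ("CYB", "a")]
plus_strand = majority_strand(toy_genes)          # strand "a" carries more genes

toy_order = ["ND6", "E", "CR", "F", "12S"]        # order as found in a record
standard_order = rotate_circular(toy_order, toy_order.index("F"))
# standard_order now starts with F and ends with the control region (CR)
```

If a record has been deposited on the (–)-strand, reverse_complement would be applied first, and the rotation afterwards, so that the start of gene F ends up at position 1 of the (+)-strand.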

Mitogenome composition

In the avian ground pattern, the mitochondrial genome (mitogenome) contains 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes, a non-coding control region, and possibly an extended tandem duplication. Compared to other vertebrates, in avian mitogenomes the positions of the gene clusters CYB:T:P and ND6:E are interchanged. The derived gene order CYB:T:P:ND6:E thus represents an avian ground-pattern autapomorphy (Montaña-Lozano et al., 2022). 

Circular map depicting the putative ancestral avian mitogenome organisation (not to scale). The tandem duplication (TD), which extends between the non-coding control region (CR) and F, is shown separately as it is absent in many bird taxa. When fully developed, it contains a pseudogene Ψ (a degenerate copy of CYB), four functional genes, and an extended control region (Urantówka et al., 2020). Transfer RNA genes are depicted by their one-letter amino-acid code [with L1 = trnL (CUN), L2 = trnL(UUR), S1 = trnS(AGN), and S2 = trnS(UCN)]. 

Avian tandem duplication

Most avian mitogenomes are distinguished from typical vertebrate mitogenomes by the presence of a large tandem duplication comprising several genes and the control region (Urantówka et al., 2018, 2020, 2021; Mackiewicz et al., 2019). Following Haring et al. (1999), this region is also referred to as the “pseudo-control region”. Duplicated genes are often (largely) identical to their counterparts, a phenomenon referred to as “concerted evolution” (Urantówka et al., 2021). The molecular mechanism underlying this phenomenon is unknown. In Galloanserae, tandem duplications are entirely absent, and it is unclear whether this absence is primary or secondary. Although the duplication is a putative ground-pattern trait of most avian orders (Mackiewicz et al., 2019), the observed configurations show a considerable amount of homoplasy: 

Schematic map of the multigene tandem duplication (type 0) and variously derived configurations (types 1-7). The CR copy 1 was lost in moa (Dinornithiformes), recently extinct palaeognaths from New Zealand (type 7). 

Annotation of mitogenomes

After sequencing, DNA strands have to be analysed in order to identify individual tRNA-, rRNA-, and protein-coding genes, the extended non-coding region commonly referred to as the control region, as well as intergenic spacers and overlaps. For protein-coding genes, putative start and stop codons need to be determined. Both this process and its outcome are referred to as "annotation". It should be noted, however, that the exact limits of genes often cannot be recognised with certainty (Slack et al., 2003). 

Summary of avian mitochondrial annotations. Duplicated regions are not considered. Partial stop codons (TA and T) are completed to UAA by posttranscriptional polyadenylation. 


The protein-coding gene ND3 is peculiar in having an extra nucleotide (mostly cytosine) at position 174. Its presence probably pertains to the avian ground pattern, but it has been lost many times during avian evolution (Jing et al., 2020, suppl. 12). The extra base, however, appears not to be translated: the downstream reading frame and amino-acid sequence are conserved owing to a translational +1 frameshift (Mindell et al., 1998b; Al-Arab et al., 2017; Andreu-Sánchez et al., 2020). 
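
Both sequence-level peculiarities mentioned in this section, partial stop codons and the extra ND3 nucleotide, affect how a coding sequence is prepared before it is translated or aligned in silico. The following Python lines are a minimal sketch under stated assumptions: the function names and toy sequences are hypothetical, the sequences are written as DNA (TAA rather than UAA), and position 174 is assumed to be counted 1-based from the first base of the start codon.

```python
def complete_partial_stop(cds):
    """Complete a partial stop codon (T or TA) to TAA.

    Mimics, at the DNA level, what polyadenylation does to the transcript.
    Sequences that already have a length divisible by three are returned
    unchanged.
    """
    remainder = len(cds) % 3
    if remainder == 1 and cds.endswith("T"):
        return cds + "AA"
    if remainder == 2 and cds.endswith("TA"):
        return cds + "A"
    return cds


def drop_extra_base(cds, position=174):
    """Remove the single base at the given 1-based position.

    Intended for ND3 sequences that carry the untranslated extra
    nucleotide; whether a given sequence carries it has to be checked
    beforehand (e.g. its length is not divisible by three).
    """
    if not 1 <= position <= len(cds):
        raise ValueError("position lies outside the sequence")
    return cds[:position - 1] + cds[position:]


# Toy sequences, not real ND3 data.
print(complete_partial_stop("ATGAAAT"))         # partial stop "T"
print(complete_partial_stop("ATGAAATA"))        # partial stop "TA"
print(drop_extra_base("ATGCAAA", position=4))   # removes the C at position 4
```

Removing the extra base restores a continuous reading frame, so that the sequence can be translated and aligned with ND3 sequences from taxa that lack the insertion.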

Control region

The control region (CR), which typically has a length of about 1,150 bp, is the only extended non-coding region of the mitogenome. This region is also referred to as the ‘D-loop’, although the true D-loop neither spans the entire control region nor is found in all mtDNA molecules at any given time (Pereira et al., 2008; Nicholls & Minczuk, 2014). 

For descriptive purposes, Brown et al. (1986) first divided the control region into three domains, with a central conserved domain that is flanked by highly variable domains. However, exact boundaries to delimit the domains have not been defined. 

Variation in the length of the CR, due to repeated terminal sequences, accounts for much mtDNA size variability. 

Both mitochondrial DNA strands are transcribed as long polycistronic molecules, with transcription initiated from the heavy-strand promoter (HSP) and light-strand promoter (LSP), respectively. 

Comparative mitogenomics

Before phylogenetic analyses can be performed, annotated mtDNA sequences have to be aligned, i.e. homologous nucleotide positions (orthologs, “sites”) need to be determined (Cucini et al., 2021). This process relies on the identification of conserved blocks (Castresana, 2000). 

Phylogenetic studies based on mitogenomes

For decades, some of the coding genes (e.g. CO1, CYB, ND2) have routinely been used in phylogenetic studies. While individual gene trees derived from the coding genes and control region usually differ from each other and from species trees based on nuclear DNA, phylogenies that are based on entire mitogenomes are mostly concordant with nuclear DNA-based species trees. Because of the observed gene-tree discordance between individual mtDNA genes, phylogenetic studies should no longer rely on limited sets of mitochondrial genes but on the mitogenome as a whole (Meiklejohn et al., 2014; Havird & Santos, 2014; Campillo et al., 2019). Since protein-coding mtDNA has a higher mutation rate than protein-coding nuclear DNA, mitochondrial genes are particularly suitable for studying shallow (intra- and interspecific) phylogenetic relationships. 

It may be helpful to partition the mitogenome into various subsets (e.g. according to codon position, RNA secondary structure pairing, and the coding/non-coding distinction) to accommodate data heterogeneity (Powell et al., 2012; de Panis et al., 2021). 
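
Partitioning by codon position, for instance, amounts to sorting the columns of an aligned protein-coding gene into three subsets. The following Python lines are a minimal sketch, assuming an in-frame alignment that starts at the first codon position; the function name and the toy alignment are hypothetical.

```python
def codon_position_partitions(alignment):
    """Split aligned coding sequences into first, second and third
    codon positions.

    `alignment` maps a taxon name to its aligned sequence. All sequences
    are assumed to be in frame, to start at codon position 1, and to have
    the same length (a multiple of three).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have the same length")
    if lengths.pop() % 3 != 0:
        raise ValueError("alignment length must be a multiple of three")
    return [
        {name: seq[offset::3] for name, seq in alignment.items()}
        for offset in range(3)
    ]


# Toy alignment, not real data.
toy_alignment = {"taxon_1": "ATGCCA", "taxon_2": "ATGCCG"}
first, second, third = codon_position_partitions(toy_alignment)
# `third` holds the third codon positions, here the only variable ones
```

Each subset can then be assigned its own substitution model in the phylogenetic analysis.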

CO1 barcoding

DNA barcoding is a method of species identification by comparing DNA sequences of an unknown sample with DNA sequences of known species via public online reference databases. For animal species, the sequence used for DNA barcoding is a 648-bp fragment of the mitochondrial CO1 gene. The length of the fragment is determined by the limits of Sanger sequencing. 

CO1 barcoding was chosen because most animal species (except cnidarians) are separated from congeneric species by CO1 sequence divergences higher than 2%, while sequence divergences among conspecifics are usually less than 2% (Hebert et al., 2003). This observation is referred to as the “barcode gap” (Meyer & Paulay, 2005). More than 94% of morphologically defined bird species have been confirmed with CO1 as a species-level marker gene (Wang et al., 2020). 
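
The 2% rule of thumb can be illustrated by computing the divergence between two aligned barcode sequences. The following Python lines are a minimal sketch under stated assumptions: they use the uncorrected proportion of differing sites (p-distance) as a simplification, skip sites with gaps or ambiguous bases, and the function names and toy sequences are hypothetical. Published barcoding studies may use model-corrected distances instead.

```python
def p_distance(seq_a, seq_b, ignore="-N?"):
    """Uncorrected proportion of differing sites between two aligned
    sequences. Sites where either sequence has a gap or an ambiguous
    base (characters in `ignore`) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in ignore or b in ignore:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def exceeds_barcode_threshold(seq_a, seq_b, threshold=0.02):
    """True if the divergence is above the 2% rule-of-thumb threshold."""
    return p_distance(seq_a, seq_b) > threshold


# Toy sequences, far shorter than a real 648-bp barcode.
print(p_distance("ACGT", "ACGA"))                  # one of four sites differs
print(exceeds_barcode_threshold("ACGT", "ACGA"))
```

A divergence above the threshold is only an indication that two samples may belong to different species; it is not a species delimitation in itself.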

In a comparative avian mitogenomic study, the CO1 gene proved to be the one with the least amount of rate heterogeneity across avian orders, thus being closest to a “molecular clock” (Pacheco et al., 2011). This explains its suitability as a reasonable indicator of species limits. 


