of elements


    CR1: EN  RT
    CRE: RT
    I: EN  RT
    Jockey: EN  RT
    L1: EN  EN
    L2: EN  RT
    R1: EN  RT
    R2: RT
    RandI: EN  RT
    Rex1: EN  RT
    RTE: EN  RT
    Tad1: EN  RT


  other MGEs

Non-long terminal repeat (non-LTR) retrotransposons are a class of mobile genetic elements (MGEs) that have been found in most eukaryotic genomes, sometimes in extremely high numbers. Computational methods for genome-wide identification of MGEs have become increasingly necessary for both genome annotation and evolutionary studies. We developed a computational approach to the identification of non-LTR retrotransposons in genomic sequences, based on a generalized hidden Markov model (GHMM). In this model, the hidden states represent the protein domains and linker regions encoded in the non-LTR retrotransposons, and their emission probabilities are modeled by profile hidden Markov models (for protein domains) and Gaussian Bayes classifiers (for linker regions). To classify each non-LTR retrotransposon into one of the twelve previously characterized clades using the same model, we defined separate hidden states for different clades. Our method was tested on the genome sequences of four eukaryotic organisms: Drosophila melanogaster, Daphnia pulex, Ciona intestinalis, and Strongylocentrotus purpuratus. For the D. melanogaster genome, our method found all known full-length elements and simultaneously classified them into the clades CR1, I, Jockey, LOA, and R1. Notably, for the D. pulex genome, in which no non-LTR retrotransposon has been annotated, our method found significantly more elements than RepeatMasker did using the current version of the RepBase Update (RU) library. We also identified novel elements in the other two genomes, which have been only partially studied for non-LTR retrotransposons.
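To make the GHMM idea concrete, the following is a minimal sketch of segment-level Viterbi decoding, in which each hidden state emits a variable-length segment rather than a single symbol. It is not the implementation described here: the function names (`ghmm_viterbi`, `toy_segment_score`), the three-state EN/linker/RT topology, the toy alphabet, and all numeric parameters are assumptions chosen for illustration. In particular, the per-character score for domain states is only a stand-in for a profile HMM score, and the Gaussian log-density on linker length is only a stand-in for a Gaussian Bayes classifier.

```python
import math

NEG_INF = float("-inf")

def ghmm_viterbi(seq, states, log_init, log_trans, seg_score, max_len):
    """Return (best log score, [(state, start, end), ...]) for seq.

    best[j][s] is the best log score of a parse of seq[:j] whose last
    segment is labelled s; back[j][s] stores (segment start, previous state).
    """
    n = len(seq)
    best = [{s: NEG_INF for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in states:
            for d in range(1, min(max_len, j) + 1):  # candidate segment length
                i = j - d
                if i == 0:
                    enter, prev = log_init.get(s, NEG_INF), None
                else:
                    enter, prev = NEG_INF, None
                    for p in states:
                        cand = best[i][p] + log_trans.get((p, s), NEG_INF)
                        if cand > enter:
                            enter, prev = cand, p
                if enter == NEG_INF:
                    continue  # state s cannot start a segment at position i
                total = enter + seg_score(s, seq[i:j])
                if total > best[j][s]:
                    best[j][s] = total
                    back[j][s] = (i, prev)
    last = max(states, key=lambda s: best[n][s])
    if best[n][last] == NEG_INF:
        raise ValueError("no valid parse for this sequence")
    # Trace back from the end of the sequence to recover the segments.
    segments, j, s = [], n, last
    while j > 0:
        i, prev = back[j][s]
        segments.append((s, i, j))
        j, s = i, prev
    return best[n][last], segments[::-1]

def toy_segment_score(state, segment):
    """Hypothetical emission scores (assumptions, not the paper's models)."""
    if state == "linker":
        # Gaussian log-density on segment length (mean 3, sd 1).
        mu, sd = 3.0, 1.0
        z = (len(segment) - mu) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))
    # Domain states: per-character match score, standing in for a profile HMM.
    target = {"EN": "E", "RT": "R"}[state]
    return sum(math.log(0.9 if ch == target else 0.05) for ch in segment)

states = ["EN", "linker", "RT"]
log_init = {"EN": 0.0}  # a parse must start in EN
log_trans = {("EN", "linker"): 0.0, ("linker", "RT"): 0.0}
score, segments = ghmm_viterbi(
    "EEEExxxRRRRR", states, log_init, log_trans, toy_segment_score, max_len=6
)
print(segments)
```

Under the same assumptions, clade classification could be sketched by giving each clade its own set of domain and linker states (for example, hypothetical labels such as `CR1_RT` and `Jockey_RT`), so that the best-scoring path identifies the clade as well as the element boundaries.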

Table 1. Summary of ORF-preserving non-LTR retrotransposons