Benchmarking TE annotation tools for non-model plant species with TEgenomeSimulator
Ting-Hsuan Chen 1@, Cecilia Deng 2, Susan Thomson 3
1 : Lincoln Research Centre, The New Zealand Institute for Plant and Food Research Limited, Lincoln, New Zealand
2 : Mount Albert Research Centre, The New Zealand Institute for Plant and Food Research Limited, Auckland, New Zealand
3 : Lincoln Research Centre, The New Zealand Institute for Plant and Food Research Limited, Lincoln, New Zealand

Transposable elements (TEs) have long been considered repetitive genomic noise that confuses bioinformatic and genomic analyses. TE annotation of genome assemblies has historically been underutilized, often applied only to hard-mask and filter out TE sequences before gene annotation. With increasing evidence of the genetic and epigenetic roles TEs play in gene regulation, environmental adaptation and genome evolution, accurate detection and categorization of these important elements is essential. However, not only do TE annotation tools deliver varied results, both in the representative sequences of TE families and in TE categories, but each TE annotator may also perform inconsistently across species.

Most TE annotators have been benchmarked on model organisms that have manually curated TE libraries for comparison. However, there is no reliable method to assess their performance on non-model organisms. To provide a benchmarking solution for non-model organisms, we integrated and enhanced the approaches of Wei et al. and Rodriguez and Makałowski to build TEgenomeSimulator, a Python package that can be operated in two modes:

  • Creating an unstructured synthetic genome with multiple chromosomes, followed by random TE insertions sourced from curated TE libraries;
  • Utilizing a hybrid approach that randomly inserts TEs into a user-provided non-TE genome, i.e., a real genome assembly from which the existing annotated TE features have been removed.
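The two modes differ only in where the TE-free backbone comes from. The sketch below illustrates the idea under stated assumptions; it is not TEgenomeSimulator's actual code or API. The function names (`random_chromosome`, `remove_intervals`, `insert_tes`), the GC-content parameter and the 0-based, half-open interval convention are all hypothetical. Insertion positions are drawn on the backbone and sorted first, so the ground-truth coordinates of every inserted copy can be recorded exactly (no nesting in this sketch).

```python
import random

def random_chromosome(length, gc=0.5, rng=None):
    """Mode 1 backbone (hypothetical): an unstructured random sequence with a target GC content."""
    rng = rng or random.Random()
    at, cg = (1 - gc) / 2, gc / 2
    return "".join(rng.choices("ACGT", weights=[at, cg, cg, at], k=length))

def remove_intervals(seq, intervals):
    """Mode 2 backbone (hypothetical): drop annotated TE intervals (0-based, half-open)."""
    kept, pos = [], 0
    for start, end in sorted(intervals):
        if start > pos:
            kept.append(seq[pos:start])  # keep the non-TE gap before this interval
        pos = max(pos, end)              # jump past the interval; handles overlaps
    kept.append(seq[pos:])
    return "".join(kept)

def insert_tes(backbone, library, n, rng=None):
    """Insert n TEs drawn at random from `library` (family -> sequence) at random
    backbone positions; return the new sequence and ground-truth (start, end, family)."""
    rng = rng or random.Random()
    positions = sorted(rng.randint(0, len(backbone)) for _ in range(n))
    pieces, truth, prev, offset = [], [], 0, 0
    for pos in positions:
        family = rng.choice(sorted(library))
        te = library[family]
        pieces.append(backbone[prev:pos])
        start = pos + offset             # coordinate in the output sequence
        pieces.append(te)
        truth.append((start, start + len(te), family))
        offset += len(te)                # later inserts shift by the inserted length
        prev = pos
    pieces.append(backbone[prev:])
    return "".join(pieces), truth
```

For example, `insert_tes(random_chromosome(10_000, gc=0.35), library, 50)` would give a mode 1 chromosome, while passing `remove_intervals(real_seq, annotated_tes)` as the backbone mimics the hybrid mode.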

In both modes, sequences from the curated TE libraries can be mutagenized prior to insertion to simulate varying levels of diversity, with parameters such as sequence divergence, target site duplication and nested insertion rate.
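Two of these operations can be sketched in a few lines. This is a minimal illustration, assuming that divergence is modelled as random point substitutions only (no indels) and that a target site duplication (TSD) is the host's target site copied onto both flanks of the inserted TE; the function names are hypothetical and do not describe how TEgenomeSimulator implements mutagenesis.

```python
import random

BASES = "ACGT"

def mutate(seq, divergence, rng=None):
    """Substitute bases at round(len * divergence) distinct random sites.
    Assumes an uppercase sequence; substitutions only, no indels."""
    rng = rng or random.Random()
    bases = list(seq)
    n_sub = round(len(bases) * divergence)
    for i in rng.sample(range(len(bases)), n_sub):
        # pick a different base so every chosen site really changes
        bases[i] = rng.choice([b for b in BASES if b != bases[i]])
    return "".join(bases)

def insert_with_tsd(host, te, pos, tsd_len):
    """Insert `te` at `pos` so the target site host[pos:pos + tsd_len]
    appears on both flanks of the inserted element."""
    if not 0 <= pos <= len(host) - tsd_len:
        raise ValueError("target site runs off the end of the host sequence")
    return host[:pos + tsd_len] + te + host[pos:]
```

Under the same assumptions, a nested insertion can be modelled by calling `insert_with_tsd` with another TE copy as the host, and a nested insertion rate would then be the probability of choosing an existing TE rather than the backbone as the target.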

To test TEgenomeSimulator, we used the hybrid approach with the recently published assembly of ‘Donghong’ (Actinidia chinensis) [7] as an example. We present results from benchmarking a series of tools, including EDTA, RepeatModeler2, and EarlGrey, on ‘Donghong’ after insertion of simulated TE sequences generated from the curated TE libraries of Arabidopsis thaliana, Oryza sativa, and Zea mays.
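Because the simulator knows where every TE was inserted, each tool's annotation can be scored against that ground truth. The abstract does not state which metrics were used; the sketch below assumes one common choice, base-pair-level sensitivity, precision and F1 on a single sequence, with 0-based, half-open intervals. The function names are hypothetical.

```python
def covered_bases(intervals):
    """Expand 0-based, half-open (start, end) intervals into a set of base positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases

def bp_metrics(truth, predicted):
    """Base-pair-level sensitivity, precision and F1 of predicted TE intervals vs. ground truth."""
    t, p = covered_bases(truth), covered_bases(predicted)
    tp = len(t & p)                               # bases annotated as TE in both
    sensitivity = tp / len(t) if t else 0.0       # tp / (tp + fn)
    precision = tp / len(p) if p else 0.0         # tp / (tp + fp)
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}
```

Expanding intervals into sets keeps the sketch short but is memory-hungry for whole chromosomes; a genome-scale version would merge sorted intervals per chromosome instead, and would score family or superfamily classification separately from coverage.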



  • Poster