Transposable elements (TEs) are drivers of evolution, acting as the substrate by which massive genotypic change can occur. Although often associated with deleterious effects such as disease states, TEs can also provide plasticity to endogenous regulatory networks by contributing additional promoter sequences, protein-coding regions and transcription factor binding sites. In addition, each species contains a unique repertoire of TEs, which therefore contributes differentially to diversity within that species. As such, it is difficult to overstate the importance of careful TE annotation in genome assemblies. With long-read sequencing technologies now available at lower cost, it is often tempting to rely solely on high-throughput computational pipelines. While an excellent starting point, automatic pipelines have their drawbacks. For example, novel repeats, such as chimeric elements and undiscovered repeat orders, will likely be overlooked. It is therefore crucial to assess individual models via manual curation in order to produce a high-quality repeat library. To illustrate, a curated library for the human genome reveals an additional 11% TE content over an uncurated RepeatModeler2 library, while an additional 3% can be gained for Drosophila melanogaster.
Current TE curation practices emphasize the use of consensus sequences. However, focusing only on the average sequence means that the rich information provided by the contributing insertions is often ignored or discarded. The full alignment for each potential TE family can provide subfamily information, precise classification, and an indication of whether the entire consensus has been discovered. Only with the combination of a high-quality TE library and a complete genome assembly can a full picture of evolution emerge.
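As a rough illustration of what the full alignment offers beyond the consensus, the sketch below derives a majority-rule consensus from a toy alignment and also reports, for each column, the fraction of insertions that agree with it. If agreement is still high at an edge of the aligned window, the family may extend beyond that edge; if it decays to background, the boundary has probably been reached. The function names, the toy alignment, and the window and threshold values are illustrative assumptions, not part of any published pipeline.

```python
from collections import Counter


def column_profile(alignment, gap="-"):
    """Return (consensus, agreement) for equal-length aligned sequences.

    agreement[i] is the fraction of rows carrying the majority base in
    column i; gap characters never count as agreeing.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned rows must all have the same length")

    consensus, agreement = [], []
    for column in zip(*alignment):
        bases = [base.upper() for base in column if base != gap]
        if bases:
            base, count = Counter(bases).most_common(1)[0]
        else:
            base, count = gap, 0  # column made only of gaps
        consensus.append(base)
        agreement.append(count / len(column))
    return "".join(consensus), agreement


def edges_still_conserved(agreement, window=2, threshold=0.8):
    """Return (left, right) flags: True if mean agreement over the outermost
    `window` columns is at least `threshold` (hypothetical cut-off)."""
    left = sum(agreement[:window]) / window
    right = sum(agreement[-window:]) / window
    return left >= threshold, right >= threshold


# Toy alignment (invented for illustration): conserved on the left,
# decaying into unrelated flanking sequence on the right.
toy_alignment = [
    "ACGTTGAC",
    "ACGTTGCT",
    "ACGTAGGA",
    "ACGTTGTG",
]

consensus, agreement = column_profile(toy_alignment)
left_open, right_open = edges_still_conserved(agreement)
print(consensus, agreement, left_open, right_open)
```

The consensus string alone would hide the difference between the two edges; the per-column agreement shows that the left side of this window is still fully conserved, whereas the right side has already dropped to background. In practice the threshold would need to be tuned to the divergence of the family being curated.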