The GRCm39 reference genome is commonly used for aligning genomic data in studies of mouse genetics. However, this assembly was generated from C57BL/6J mice, whereas many in vitro studies use E14 mouse embryonic stem cells, which are derived from 129/Ola mice. A recent study identified over 30,000 genomic insertions and 27,000 deletions in 129 mice compared to the GRCm39 reference (Ferraj et al., 2023). Moreover, sub-strains of 129 mice also differ genetically from one another. Transposable elements (TEs) are the main source of insertions, and their high sequence homology can promote other types of structural variation, such as deletions or translocations. Hence, a high-quality strain-specific reference genome is needed to accurately capture the abundance and distribution of repetitive elements and to enable the study of their functional roles.
Using direct whole-genome nanopore DNA sequencing and the 129S1/SvImJ reference generated by Ferraj et al. (2023), we have generated a novel assembly of the E14 genome. Here, we identify a further 3,000 novel insertions longer than 100 bp relative to the 129S1/SvImJ reference. These differences suggest that the GRCm38/39 references inaccurately reflect the number, type, and location of TE loci in the E14 genome, and that their use may result in incorrect assignment of TE-derived reads in genomic datasets.
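As an illustration of how insertions over 100 bp might be tallied from a structural-variant call set, the following is a minimal sketch and not the pipeline used in this work. It assumes biallelic calls in VCF format, with insertions either annotated as SVTYPE=INS with an SVLEN value or given as sequence-resolved alleles. The function names and the example records are hypothetical.

```python
def insertion_length(ref, alt, info):
    """Return the insertion length in bp for one VCF record, or 0 if it is not an insertion.

    Assumption: records are biallelic, and symbolic insertions carry SVTYPE=INS and SVLEN.
    """
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    if fields.get("SVTYPE") == "INS" and "SVLEN" in fields:
        return abs(int(fields["SVLEN"].split(",")[0]))
    # Fall back to sequence-resolved alleles (ALT longer than REF).
    if not alt.startswith("<") and len(alt) > len(ref):
        return len(alt) - len(ref)
    return 0


def count_large_insertions(vcf_lines, min_len=100):
    """Count records whose insertion length is strictly greater than min_len."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        ref, alt, info = cols[3], cols[4], cols[7]
        if insertion_length(ref, alt, info) > min_len:
            count += 1
    return count


# Hypothetical example records, for illustration only.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\tins1\tA\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=6500",
    "chr1\t5000\tins2\tG\tG" + "T" * 50 + "\t.\tPASS\tSVTYPE=INS;SVLEN=50",
    "chr2\t2000\tdel1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-800",
    "chr3\t3000\tins3\tC\tC" + "A" * 150 + "\t.\tPASS\t.",
])

if __name__ == "__main__":
    print(count_large_insertions(EXAMPLE_VCF.splitlines()))
```

In this sketch the threshold is strict, so an insertion of exactly 100 bp is not counted, matching the "over 100 bp" criterion stated above.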