APOBEC3

Exon Targeted Retrieval and Classification Toolbox (ExTRaCT): a gene search pipeline to find APOBEC3 Z-domains in novel bat genomes
Motivation: Several computational gene search tools exist to identify and annotate an ever-growing body of newly sequenced genomes from different species. Many annotation tools, however, fall short when the target species diverges from well-studied model organisms or when searching for short genes with multiple copies. Results: We have developed the Exon Targeted Retrieval and Classification Toolbox (ExTRaCT), an automated pipeline to identify any gene exon with conserved structure in genome assemblies of novel species. In the use cases presented here, we applied our search tool to 102 bat genomes to find APOBEC3 gene family members. We show that our homolog search algorithm is efficient (average run time of 5 hours for over 100 genomes), works well with reference sequences distantly related to the target (1 misclassification out of 498, 0 false positives and 2 false negatives), and is easy to use. As genomic sequencing becomes faster and more accessible, ExTRaCT has downstream applications in phylogenetic, biochemical and genomic studies. It is a simple computational tool that provides a solution to target gene identification, requiring neither whole-genome-assembly annotations nor prior knowledge of closely related species.
Human and bats genome robustness under COSMIC mutational signatures
Carcinogenesis is an evolutionary process, and mutations can fix the selected phenotypes in selective microenvironments. Both normal and neoplastic cells are robust to the mutational stressors in their microenvironment to the extent that secures their fitness. To test the robustness of genes under a range of mutagens, we developed a sequential mutation simulator, Sinabro, to simulate single base substitutions under a given mutational process. We then developed a pipeline to measure the robustness of genes and cells under those mutagenic processes. We discovered significant human genome robustness to the APOBEC mutational signature SBS2, which is associated with viral defense mechanisms and is implicated in cancer. Robustness evaluations across over 70,000 sequences against 41 signatures showed higher resilience under signatures predominantly causing C-to-T (G-to-A) mutations. Principal component analysis indicates that the GC content at the codon’s wobble position significantly influences robustness, with greater resilience under transition mutations than under transversions. We then tested our results in bats at the extremes of the lifespan-to-mass relationship and found that the long-lived bat is more robust to APOBEC than the short-lived one. By revealing APOBEC as the prime driver of robustness in the human (and other mammalian) genome, this work supports a potential key role for APOBECs in carcinogenesis, as well as for evolved countermeasures to this innate mutagenic process. It also provides a baseline of human and bat genome robustness under mutational processes associated with cancer.
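
The idea of simulating single base substitutions one at a time under a mutational signature can be illustrated with a minimal sketch. This is not Sinabro's implementation, which the abstract does not describe. The names `TOY_SIGNATURE`, `candidate_sites` and `mutate_sequentially` are hypothetical, and the weights are made-up placeholders loosely shaped like an APOBEC-type C-to-T pattern at TCN contexts, not real COSMIC SBS2 values. The sketch assumes each substitution is drawn in proportion to the weight of its trinucleotide context on either strand, and that the candidate sites are recomputed after every substitution.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical toy signature: (trinucleotide context, new centre base) -> weight.
# Placeholder numbers for illustration only, NOT real COSMIC values.
TOY_SIGNATURE = {
    ("TCA", "T"): 0.5,
    ("TCT", "T"): 0.3,
    ("TCC", "T"): 0.1,
    ("TCG", "T"): 0.1,
}


def candidate_sites(seq, signature):
    """List (position, new_base, weight) for each site matching a channel.

    A site matches if its trinucleotide equals a context on the forward
    strand, or if its reverse complement does (the reverse-strand case),
    in which case the substituted base is complemented as well.
    """
    sites = []
    for i in range(1, len(seq) - 1):
        tri = seq[i - 1:i + 2]
        for (context, alt), weight in signature.items():
            if tri == context:
                sites.append((i, alt, weight))
            elif revcomp(tri) == context:
                sites.append((i, alt.translate(COMPLEMENT), weight))
    return sites


def mutate_sequentially(seq, signature, n_mutations, seed=0):
    """Apply up to n_mutations substitutions, one after another.

    Sites are rescanned after every substitution, so an earlier mutation
    can create or destroy contexts available to later ones.
    """
    rng = random.Random(seed)
    bases = list(seq)
    for _ in range(n_mutations):
        sites = candidate_sites("".join(bases), signature)
        if not sites:
            break  # no context left that the signature can act on
        weights = [weight for _, _, weight in sites]
        pos, alt, _ = rng.choices(sites, weights=weights, k=1)[0]
        bases[pos] = alt
    return "".join(bases)
```

Calling `mutate_sequentially(gene, TOY_SIGNATURE, 10)` on a coding sequence returns a mutated copy of the same length. A robustness measure of one's own choosing (for example, the fraction of substitutions that leave the encoded protein unchanged) could then be computed by comparing the two sequences; the abstract does not specify which measure the authors use.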