Short tandem repeats (STRs) in the human genome are frequently used as genetic markers in population studies and they are also associated with genetic diseases.
Nevertheless, accurate localization and mapping of short tandem repeats in the reference genome have not been well addressed, and a systematic
catalog of short tandem repeats in the most recent human reference genome is lacking. Here, we compiled an updated census of mono-nucleotide/di-nucleotide/tri-nucleotide
repeats (MNRs/DNRs/TNRs) from the human reference genome GRCh38, and collectively termed them polytracts. The resultant polytract dataset encompasses TNRs, the
presumably more biologically significant repeat species, as well as the under-studied species MNRs and DNRs. With such a composition, the polytract dataset can
provide a negative control for genome analyses that are potentially confounded by polytract regions, and it can also be used as a discovery tool to screen for
particular MNR/DNR/TNR-characteristic genomic features. We integrated the polytract dataset with genome coordinates of RNA-editing sites, and found significant
enrichment of C-to-U and non-canonical RNA-editing events adjacent to MNR polytracts, especially at break-points of tandem polytracts. The same phenomenon was
not observed for the canonical A-to-I RNA-editing type. These distinct enrichment patterns between canonical and non-canonical RNA-editing events provide negative
evidence against the authenticity of non-canonical RNA-editing events. Similarly, we examined locations of enhancer sequences relative to polytracts, and found
varying degrees of locational enrichment among subtypes of polytracts. In practice, different researchers may be interested in different genomic features, so we
developed a tool, Polytrap, to assist with general locational enrichment analysis of polytracts with respect to localizable genomic features. The software package
Polytrap is released on GitHub.
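As a rough illustration of the kind of locational enrichment analysis described above, the following Python sketch counts how many feature positions (for example, RNA-editing sites) fall within a flanking window of any polytract, then compares that count with a null distribution obtained by re-placing the same number of positions uniformly at random. It is not Polytrap's implementation: the function names, the 5 bp flank, the uniform-placement null model, the empirical p-value, and the toy coordinates are all assumptions made for this example.

```python
import bisect
import random


def count_near(features, tracts, flank):
    """Count feature positions within `flank` bp of any tract.

    `tracts` must be sorted, non-overlapping, 0-based half-open (start, end)
    intervals on one chromosome; `features` are 0-based positions.
    """
    starts = [start for start, _ in tracts]
    hits = 0
    for pos in features:
        # Index of the last tract whose start is <= pos + flank.
        i = bisect.bisect_right(starts, pos + flank) - 1
        # For sorted, non-overlapping tracts this is the only candidate
        # that can still reach `pos` from the left.
        if i >= 0 and pos < tracts[i][1] + flank:
            hits += 1
    return hits


def enrichment(features, tracts, chrom_len, flank=5, n_perm=1000, seed=0):
    """Return (observed, expected, empirical one-sided p-value).

    Assumed null model: features placed uniformly at random on the chromosome.
    """
    rng = random.Random(seed)
    observed = count_near(features, tracts, flank)
    null_counts = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(chrom_len) for _ in features]
        null_counts.append(count_near(shuffled, tracts, flank))
    expected = sum(null_counts) / n_perm
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, expected, p_value


# Toy, made-up coordinates on a hypothetical 100 kb chromosome.
tracts = [(1000, 1010), (5000, 5012)]
features = [998, 1003, 1011, 5001, 5013, 9000]
observed, expected, p_value = enrichment(features, tracts, chrom_len=100_000)
print(observed, expected, p_value)
```

A uniform null ignores base composition and mappability, so a real analysis would likely need a more careful background than this sketch uses.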
Beyond the human genome, STRs are receiving increasing attention in other organisms as well. To maximize its utility, we made Polytrap capable of handling
nine organisms (human, macaque, mouse, rat, dog, chicken, zebrafish, fruitfly, and yeast) and made it extendable to other organisms, provided that they are covered by
Bioconductor’s genome support (DOI: 10.18129/B9.bioc.BSgenome). A UCSC track hub was configured to enable convenient visualization of polytracts in the UCSC
Genome Browser for four genomes (human, mouse, rat, and fruitfly).
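To give a sense of what cataloguing MNRs, DNRs, and TNRs in an arbitrary genome sequence involves, here is a minimal Python sketch based on regular-expression backreferences. It is an illustration only, not the procedure used to build the polytract dataset: the minimum copy numbers in `MIN_COPIES`, the function name, and the output tuple layout are hypothetical, since the text above does not state the thresholds that define a polytract.

```python
import re

# Hypothetical thresholds: repeat-unit length -> minimum number of copies.
MIN_COPIES = {1: 6, 2: 4, 3: 3}


def find_polytracts(seq, min_copies=MIN_COPIES):
    """Return sorted (start, end, unit, copies) tuples, 0-based half-open.

    Only whole copies of the unit are reported; bases other than A/C/G/T
    (such as N) never match and therefore break a tract.
    """
    seq = seq.upper()  # treat soft-masked (lowercase) sequence like any other
    tracts = []
    for k, n in sorted(min_copies.items()):
        if k == 1:
            pattern = r"([ACGT])\1{%d,}" % (n - 1)
            unit_group = 1
        else:
            # The negative lookahead rejects units made of a single base,
            # so homopolymers are reported as MNRs only.
            pattern = r"(?!([ACGT])\1{%d})([ACGT]{%d})\2{%d,}" % (k - 1, k, n - 1)
            unit_group = 2
        for m in re.finditer(pattern, seq):
            copies = (m.end() - m.start()) // k
            tracts.append((m.start(), m.end(), m.group(unit_group), copies))
    return sorted(tracts)


# Toy sequence containing one MNR, one DNR, and one TNR.
example = "GG" + "A" * 7 + "T" + "CA" * 4 + "G" + "AGC" * 4 + "T"
for tract in find_polytracts(example):
    print(tract)
```

The half-open coordinates follow the BED convention, so each tuple could be written out directly as a BED record once a chromosome name is added.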
Polytrap on GitHub (https://github.com/hui-sheen/polytrap)
Download polytracts (http://innovebioinfo.com/Annotation/Polytracts/tracts.html)
UCSC Track Hub (http://innovebioinfo.com/Annotation/Polytracts/trackHub/hub.txt)
Human track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/hg38/humanPolytracts.html)
Mouse track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/mm10/mousePolytracts.html)
Rat track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/rn6/ratPolytracts.html)
Fruitfly track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/dm6/fruitflyPolytracts.html)
Repeat BED files for CPDSeqer protocol
Contact: Dr Yan Guo, YaGuo@salud.unm.edu
Yu H, Zhao S, Ness S, Kang H, Sheng Q, Samuels DC, Oyebamiji O, Zhao YY, Guo Y. Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats. Submitted to PLoS Comput Biol.