Polytract

Abstract

       Short tandem repeats (STRs) in the human genome are frequently used as genetic markers in population studies, and they are also associated with genetic diseases. Nevertheless, accurate localization and mapping of STRs in the reference genome have not been well addressed, and a systematic catalog of STRs in the latest human reference genome is lacking. Here, we compiled an updated census of mono-nucleotide, di-nucleotide, and tri-nucleotide repeats (MNRs, DNRs, and TNRs) from the human reference genome GRCh38 and collectively termed them polytracts. The resulting polytract dataset encompasses TNRs, presumably the more biologically significant repeat species, as well as the under-studied MNRs and DNRs. With this composition, the polytract dataset can serve as a negative control for genome analyses that are potentially confounded by polytract regions, and it can also be used as a discovery tool to screen for genomic features characteristic of MNRs, DNRs, or TNRs. We integrated the polytract dataset with the genome coordinates of RNA-editing sites and found significant enrichment of C-to-U and non-canonical RNA-editing events adjacent to MNR polytracts, especially at break-points of tandem polytracts. The same phenomenon was not observed for the canonical A-to-I RNA-editing type. These distinct enrichment patterns between canonical and non-canonical RNA-editing events provide evidence against the authenticity of non-canonical RNA-editing events. Similarly, we examined the locations of enhancer sequences relative to polytracts and found varying degrees of locational enrichment among polytract subtypes. In practice, different researchers may be interested in different genomic features, so we developed a tool, Polytrap, to assist with general locational enrichment analysis of polytracts with respect to localizable genomic features. The software package Polytrap is released on GitHub.
       Beyond the human genome, STRs are receiving increasing attention in non-human organisms as well. To maximize its potential, we made Polytrap capable of handling nine organisms (human, macaque, mouse, rat, dog, chicken, zebrafish, fruitfly, and yeast) and extensible to other organisms for which Bioconductor provides genome support (DOI: 10.18129/B9.bioc.BSgenome). A UCSC track hub was configured to enable convenient visualization of polytracts in the UCSC Genome Browser for four genomes (human, mouse, rat, and fruitfly).
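
To make the notions of a polytract and of locational adjacency concrete, the following is a minimal Python sketch. It is not Polytrap's implementation. The function names (find_polytracts, distance_to_nearest_tract) and the minimum repeat counts in MIN_REPEATS are hypothetical choices made for illustration; the abstract does not state the thresholds used to build the polytract dataset.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumption, not from the source).
MIN_REPEATS = {1: 6, 2: 4, 3: 3}
KINDS = {1: "MNR", 2: "DNR", 3: "TNR"}


def find_polytracts(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, kind) tuples, 0-based and half-open."""
    seq = seq.upper()
    tracts = []
    for k, kind in KINDS.items():
        n_extra = min_repeats[k] - 1  # repeats required after the first motif copy
        if k == 1:
            pattern = r"(?P<motif>[ACGT])(?P=motif){%d,}" % n_extra
        else:
            # The negative lookahead rejects motifs made of a single base
            # (e.g. "AA", "AAA"), which are MNRs rather than DNRs or TNRs.
            pattern = (
                r"(?!([ACGT])\1{%d})(?P<motif>[ACGT]{%d})(?P=motif){%d,}"
                % (k - 1, k, n_extra)
            )
        for m in re.finditer(pattern, seq):
            tracts.append((m.start(), m.end(), m.group("motif"), kind))
    return sorted(tracts)


def distance_to_nearest_tract(pos, tracts):
    """Distance in bp from a 0-based position to the nearest tract (0 if inside).

    Returns None when no tracts are given. This is a linear scan for clarity only.
    """
    best = None
    for start, end, _motif, _kind in tracts:
        if start <= pos < end:
            return 0
        d = start - pos if pos < start else pos - (end - 1)
        if best is None or d < best:
            best = d
    return best


if __name__ == "__main__":
    demo = "GGC" + "A" * 8 + "TCG" + "CA" * 5 + "GT" + "CAG" * 4 + "TT"
    tracts = find_polytracts(demo)
    for t in tracts:
        print(t)
    # A hypothetical feature site (e.g. an RNA-editing site) at position 12
    print(distance_to_nearest_tract(12, tracts))
```

Tallying such distances for a set of feature sites and comparing them with distances for randomly placed sites is one simple way to approach a locational enrichment question. Genome-scale data would call for an interval index rather than the linear scan used here.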



Polytrap on GitHub (https://github.com/hui-sheen/polytrap)

Download polytracts (http://innovebioinfo.com/Annotation/Polytracts/tracts.html)

UCSC Track Hub (http://innovebioinfo.com/Annotation/Polytracts/trackHub/hub.txt)

Human track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/hg38/humanPolytracts.html)

Mouse track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/mm10/mousePolytracts.html)

Rat track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/rn6/ratPolytracts.html)

Fruitfly track: (http://innovebioinfo.com/Annotation/Polytracts/trackHub/dm6/fruitflyPolytracts.html)

Repeat BED files for CPDSeqer protocol

Contact: Dr Yan Guo, YaGuo@salud.unm.edu
Yu H, Zhao S, Ness S, Kang H, Sheng Q, Samuels DC, Oyebamiji O, Zhao YY, Guo Y. Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats. Submitted to PLoS Comput Biol.