Deep Learning Genomics Methods for the Study of Regulatory Signals in Breast Tissue, Briana Macedo UG '22 (2312983)
Large genomics datasets now make it possible to understand how mutations in our genetic code lead to disease and, in turn, to create fine-tuned therapies. We use DeepSEA, a deep learning sequence model, to predict the effect of noncoding variants on chromatin profiles. We analyze variants at specific positions surrounding the transcription start site of 17 breast-related genes to identify common pathways that may be involved in breast carcinomas. For each pair of genes, we performed Spearman correlation tests comparing their predicted mutation effects on every chromatin feature. We then scored each gene pair by the number of chromatin features with correlated mutation effects (corrected p-value < 0.05, correlation coefficient > 0.2). This analysis identified TBX3, GATA3, NOTCH4, and ROBO1 as having higher total correlation scores when compared with other breast-related genes than when compared with control background genes. Next, we reduced the dimensionality of the chromatin feature predictions using PCA and visualized the result with UMAP. This analysis yielded three clusters of breast-related genes, which may suggest that clustered genes are regulated by similar mechanisms and belong to related biological pathways. Future work will investigate these potential gene pathways and how their disruption can cause detrimental biological effects. We also seek to understand which characteristics of a gene influence its regulatory profile. Predicting and characterizing chromatin profile effects may help identify potential drug targets.
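The gene-pair scoring step can be illustrated with a minimal sketch. It is not the study's actual pipeline. It assumes each gene's predictions are stored as a dictionary mapping a chromatin feature name to a list of predicted mutation effects at aligned positions around the transcription start site. The abstract does not name the multiple-testing correction, so Bonferroni is assumed here. The p-value also uses a large-sample normal approximation in place of an exact test, and the function names and feature names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Return (rho, two-sided p); p uses a large-sample normal approximation."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0, 1.0  # constant input: treat as uncorrelated
    rho = cov / math.sqrt(var_x * var_y)
    z = abs(rho) * math.sqrt(n - 1)  # approximately N(0, 1) under the null
    p = math.erfc(z / math.sqrt(2))
    return rho, p


def pair_score(effects_a, effects_b, alpha=0.05, min_rho=0.2):
    """Count chromatin features whose mutation effects correlate for two genes."""
    n_tests = len(effects_a)
    score = 0
    for feature, a in effects_a.items():
        rho, p = spearman(a, effects_b[feature])
        p_corrected = min(1.0, p * n_tests)  # Bonferroni correction (assumed)
        if p_corrected < alpha and rho > min_rho:
            score += 1
    return score


# Toy data: 50 positions, two hypothetical chromatin features.
positions = list(range(50))
gene_a = {"feature_1": positions, "feature_2": positions}
gene_b = {
    "feature_1": [2 * v + 1 for v in positions],  # same ordering as gene_a
    "feature_2": positions[::-1],                 # reversed ordering
}
print(pair_score(gene_a, gene_b))
```

In this toy case only the first feature counts toward the score: the second is significantly correlated but negatively, so it fails the coefficient threshold. Summing such scores for one gene across all its partners would give a total correlation score of the kind compared between breast-related and background genes.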