De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding

Amr M Alexandari; Connor A Horton; Avanti Shrikumar; Nilay Shah; Eileen Li; Melanie Weilert; Miles A Pufall; Julia Zeitlinger; Polly M Fordyce; Anshul Kundaje

doi:10.1101/2023.05.11.540401

Preprint

Open access

bioRxiv : the preprint server for biology

05/11/2023

DOI: 10.1101/2023.05.11.540401

PMCID: PMC10197627

PMID: 37214836

URL: https://doi.org/10.1101/2023.05.11.540401

Preprint (Author's original). This preprint has not been evaluated by subject experts through peer review. Preprints may undergo extensive changes and/or become peer-reviewed journal articles. Open Access.

Abstract

Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential for understanding gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific in vivo binding profiles. Conversely, deep learning models trained on in vivo TF binding assays effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts the energetic impacts of sequence variation within and surrounding motifs on TF binding, as measured by diverse assays, with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of binding, suggest that deep learning models of binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughput experiments to explore the influence of sequence context and variation on both intrinsic affinity and occupancy.
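The idea of "marginalizing away the influence of genomic sequence context" can be illustrated with a small sketch. This is not the authors' implementation: `toy_model` is a hypothetical stand-in for a trained network, the backgrounds are uniformly random sequences (an assumption of this sketch), and the function names and parameters are invented for illustration. The sketch inserts a candidate site into many backgrounds and averages the change in predicted signal, so that context-dependent effects cancel out and a context-independent score for the site remains.

```python
import random

BASES = "ACGT"


def random_background(length, rng):
    """Generate one random background sequence (assumption: uniform bases)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def insert_motif(background, motif):
    """Overwrite the center of the background with the candidate site."""
    start = (len(background) - len(motif)) // 2
    return background[:start] + motif + background[start + len(motif):]


def toy_model(seq):
    """Hypothetical stand-in for a trained model's predicted log signal.

    It mixes a site-dependent term with a context-dependent nuisance term
    (GC content), which averaging over backgrounds is meant to cancel.
    """
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    site_term = 2.0 * seq.count("CACGTG") + 1.0 * seq.count("CACGTT")
    return site_term + 0.5 * gc_fraction


def marginalized_score(motif, predict, n_backgrounds=200, length=100, seed=0):
    """Average the change in prediction caused by inserting the site."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_backgrounds):
        background = random_background(length, rng)
        total += predict(insert_motif(background, motif)) - predict(background)
    return total / n_backgrounds


if __name__ == "__main__":
    for site in ("CACGTG", "CACGTT", "AAAAAA"):
        print(site, round(marginalized_score(site, toy_model), 3))
```

In this toy setting the averaged scores rank the three candidate sites in the order the stand-in model was built to prefer them. With a real model, `predict` would wrap the network's forward pass, and the choice of background distribution would need to be made with care.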

Details

Title: De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding
Creators: Amr M Alexandari
Connor A Horton
Avanti Shrikumar
Nilay Shah
Eileen Li
Melanie Weilert
Miles A Pufall
Julia Zeitlinger
Polly M Fordyce
Anshul Kundaje
Resource Type: Preprint
Publication Details: bioRxiv : the preprint server for biology
DOI: 10.1101/2023.05.11.540401
PMID: 37214836
PMCID: PMC10197627
Language: English
Date posted: 05/11/2023
Academic Unit: Biochemistry and Molecular Biology
Record Identifier: 9984419357402771
