Raluca Gordan

Positions:

Associate Professor in Biostatistics & Bioinformatics

Biostatistics & Bioinformatics
School of Medicine

Associate Professor of Computer Science

Computer Science
Trinity College of Arts & Sciences

Associate Professor in Molecular Genetics and Microbiology

Molecular Genetics and Microbiology
School of Medicine

Associate Professor of Cell Biology

Cell Biology
School of Medicine

Member of the Duke Cancer Institute

Duke Cancer Institute
School of Medicine

Education:

Ph.D. 2009

Duke University

Grants:

HARDAC+:Reproducible HPC for next-generation genomics

Administered By
Duke Center for Genomic and Computational Biology
Awarded By
North Carolina Biotechnology Center
Role
Major User
Start Date
End Date

Beyond GWAS: High Throughput Functional Genomics & Epigenome Editing to Elucidate the Effects of Genetic Associations for Schizophrenia

Administered By
Pediatrics, Medical Genetics
Awarded By
National Institutes of Health
Role
Co Investigator
Start Date
End Date

Design, prediction, and prioritization of systematic perturbations of the human genome

Administered By
Biostatistics & Bioinformatics
Awarded By
National Institutes of Health
Role
Co Investigator
Start Date
End Date

Helix-distorting DNA damages at transcription factor binding sites: causes and effects

Administered By
Biostatistics & Bioinformatics
Awarded By
US-Israel Binational Science Foundation
Role
Principal Investigator
Start Date
End Date

Role of DNA structural dynamics in mutagenesis and oncogenesis

Administered By
Biochemistry
Awarded By
National Institutes of Health
Role
Co Investigator
Start Date
End Date

Publications:

Nonconsensus Protein Binding to Repetitive DNA Sequence Elements Significantly Affects Eukaryotic Genomes.

Recent genome-wide experiments in different eukaryotic genomes provide an unprecedented view of transcription factor (TF) binding locations and of nucleosome occupancy. These experiments revealed that a large fraction of TF binding events occur in regions where only a small number of specific TF binding sites (TFBSs) have been detected. Furthermore, in vitro protein-DNA binding measurements performed for hundreds of TFs indicate that TFs are bound with a wide range of affinities to different DNA sequences that lack known consensus motifs. These observations have thus challenged the classical picture of specific protein-DNA binding and strongly suggest the existence of additional recognition mechanisms that affect protein-DNA binding preferences. We have previously demonstrated that repetitive DNA sequence elements characterized by certain symmetries statistically affect protein-DNA binding preferences. We call this binding mechanism nonconsensus protein-DNA binding in order to emphasize the point that specific consensus TFBSs do not contribute to this effect. In this paper, using the simple statistical mechanics model developed previously, we calculate the nonconsensus protein-DNA binding free energy for the entire C. elegans and D. melanogaster genomes. Using the available chromatin immunoprecipitation followed by sequencing (ChIP-seq) results on TF-DNA binding preferences for ~100 TFs, we show that DNA sequences characterized by low predicted free energy of nonconsensus binding have statistically higher experimental TF occupancy and lower nucleosome occupancy than sequences characterized by high free energy of nonconsensus binding. This is in agreement with our previous analysis performed for the yeast genome. We suggest therefore that nonconsensus protein-DNA binding assists the formation of nucleosome-free regions, as TFs outcompete nucleosomes at genomic locations with enhanced nonconsensus binding.
In addition, here we perform a new, large-scale analysis using in vitro TF-DNA preferences obtained from the universal protein binding microarrays (PBM) for ~90 eukaryotic TFs belonging to 22 different DNA-binding domain types. As a result of this new analysis, we conclude that nonconsensus protein-DNA binding is a widespread phenomenon that significantly affects protein-DNA binding preferences and need not require the presence of consensus (specific) TFBSs in order to achieve genome-wide TF-DNA binding specificity.
Authors
Afek, A; Cohen, H; Barber-Zucker, S; Gordân, R; Lukatsky, DB
MLA Citation
Afek, Ariel, et al. “Nonconsensus Protein Binding to Repetitive DNA Sequence Elements Significantly Affects Eukaryotic Genomes.” Plos Comput Biol, vol. 11, no. 8, Aug. 2015, p. e1004429. Pubmed, doi:10.1371/journal.pcbi.1004429.
URI
https://scholars.duke.edu/individual/pub1102703
PMID
26285121
Source
pubmed
Published In
Plos Computational Biology
Volume
11
Published Date
Start Page
e1004429
DOI
10.1371/journal.pcbi.1004429

Quantitative modeling of transcription factor binding specificities using DNA shape.

DNA binding specificities of transcription factors (TFs) are a key component of gene regulatory processes. Underlying mechanisms that explain the highly specific binding of TFs to their genomic target sites are poorly understood. A better understanding of TF-DNA binding requires the ability to quantitatively model TF binding to accessible DNA as its basic step, before additional in vivo components can be considered. Traditionally, these models were built based on nucleotide sequence. Here, we integrated 3D DNA shape information derived with a high-throughput approach into the modeling of TF binding specificities. Using support vector regression, we trained quantitative models of TF binding specificity based on protein binding microarray (PBM) data for 68 mammalian TFs. The evaluation of our models included cross-validation on specific PBM array designs, testing across different PBM array designs, and using PBM-trained models to predict relative binding affinities derived from in vitro selection combined with deep sequencing (SELEX-seq). Our results showed that shape-augmented models compared favorably to sequence-based models. Although both k-mer and DNA shape features can encode interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space. In addition, analyzing the feature weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not revealed by the DNA sequence. As such, this work combines knowledge from structural biology and genomics, and suggests a new path toward understanding TF binding and genome function.
Authors
Zhou, T; Shen, N; Yang, L; Abe, N; Horton, J; Mann, RS; Bussemaker, HJ; Gordân, R; Rohs, R
MLA Citation
Zhou, Tianyin, et al. “Quantitative modeling of transcription factor binding specificities using DNA shape.” Proc Natl Acad Sci U S A, vol. 112, no. 15, Apr. 2015, pp. 4654–59. Pubmed, doi:10.1073/pnas.1422023112.
URI
https://scholars.duke.edu/individual/pub1102705
PMID
25775564
Source
pubmed
Published In
Proc Natl Acad Sci U S A
Volume
112
Published Date
Start Page
4654
End Page
4659
DOI
10.1073/pnas.1422023112
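
The abstract above contrasts k-mer sequence features with DNA shape features and notes that shape features reduced the dimensionality of the feature space. As a rough illustration of how quickly a k-mer feature space grows, the sketch below one-hot encodes the k-mer found at each position of a binding site. The encoding scheme, the function name `kmer_one_hot`, and the example sequence are assumptions made for illustration and are not taken from the paper; DNA shape features and the support vector regression step are not modeled here.

```python
from itertools import product


def kmer_one_hot(site: str, k: int) -> list[int]:
    """One-hot encode every k-mer along a DNA site (A, C, G, T only).

    Each of the len(site) - k + 1 positions contributes a block of 4**k
    binary features, exactly one of which is set to 1.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    features: list[int] = []
    for start in range(len(site) - k + 1):
        block = [0] * len(kmers)
        block[index[site[start:start + k]]] = 1
        features.extend(block)
    return features


# Hypothetical 6-bp site: feature counts for k = 1, 2, 3.
site = "ACGTAC"
feature_counts = {k: len(kmer_one_hot(site, k)) for k in (1, 2, 3)}
```

Under this assumed encoding, a site of length L yields 4^k · (L − k + 1) features for each k, so adding 2-mers and 3-mers to capture dependencies between neighboring positions enlarges the feature vector considerably.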

Finding regulatory DNA motifs using alignment-free evolutionary conservation information

As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for using conservation information for TF binding site discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast. It requires neither sequence alignments nor the phylogenetic relationships between the orthologous sequences, yet it is more effective on real biological data than methods that do. © The Author(s) 2010. Published by Oxford University Press.
Authors
Gordân, R; Narlikar, L; Hartemink, AJ
MLA Citation
Gordân, R., et al. “Finding regulatory DNA motifs using alignment-free evolutionary conservation information.” Nucleic Acids Research, vol. 38, no. 6, Jan. 2010. Scopus, doi:10.1093/nar/gkp1166.
URI
https://scholars.duke.edu/individual/pub773816
Source
scopus
Published In
Nucleic Acids Research
Volume
38
Published Date
DOI
10.1093/nar/gkp1166
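
The relaxed conservation definition in the abstract above (a site counts as conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation) can be illustrated with a minimal Python sketch. The function names, the fixed site width, the exact-match criterion, and the toy sequences are assumptions made for illustration. This is not the authors' implementation, and it does not cover how the priors are derived from these scores or the Gibbs sampling step.

```python
def reverse_complement(site: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(site))


def is_conserved(site: str, ortholog: str) -> bool:
    """True if the site occurs anywhere in the ortholog, in either orientation."""
    return site in ortholog or reverse_complement(site) in ortholog


def conservation_scores(sequence: str, orthologs: list[str], width: int) -> list[float]:
    """For each start position, compute the fraction of orthologs in which
    the width-long site starting there is conserved. No alignment is used."""
    scores = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        hits = sum(is_conserved(site, orth) for orth in orthologs)
        scores.append(hits / len(orthologs))
    return scores


# Hypothetical toy data: the second ortholog carries the site on the
# opposite strand, the third does not carry it at all.
reference = "AGATTACA"
orthologs = ["CCGATTACCC", "TTGTAATCTT", "GGGGGGGG"]
scores = conservation_scores(reference, orthologs, width=6)
```

Because the check is a substring search on both strands, it needs neither a sequence alignment nor the phylogenetic relationships between the orthologs, which is the property the abstract emphasizes.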

Toward deciphering the mechanistic role of variations in the Rep1 repeat site in the transcription regulation of SNCA gene.

Short structural variants (variants other than single nucleotide polymorphisms) are hypothesized to contribute to many complex diseases, possibly by modulating gene expression. However, the molecular mechanisms by which noncoding short structural variants exert their effects on gene regulation have not been discovered. Here, we study simple sequence repeats (SSRs), a common class of short structural variants. Previously, we showed that repetitive sequences can directly influence the binding of transcription factors to their proximate recognition sites, a mechanism we termed non-consensus binding. In this study, we focus on the SSR termed Rep1, which was associated with Parkinson's disease (PD) and has been implicated in the cis-regulation of the PD-risk SNCA gene. We show that Rep1 acts via the non-consensus binding mechanism to affect the binding of transcription factors from the GATA and ELK families to their specific sites located right next to the Rep1 repeat. Next, we performed an expression analysis to further our understanding regarding the GATA and ELK family members that are potentially relevant for SNCA transcriptional regulation in health and disease. Our analysis indicates a potential role for GATA2, consistent with previous reports. Our study proposes non-consensus transcription factor binding as a potential mechanism through which noncoding repeat variants could exert their pathogenic effects by regulating gene expression.
Authors
Afek, A; Tagliafierro, L; Glenn, OC; Lukatsky, DB; Gordan, R; Chiba-Falek, O
MLA Citation
Afek, A., et al. “Toward deciphering the mechanistic role of variations in the Rep1 repeat site in the transcription regulation of SNCA gene.” Neurogenetics, vol. 19, no. 3, Aug. 2018, pp. 135–44. Pubmed, doi:10.1007/s10048-018-0546-8.
URI
https://scholars.duke.edu/individual/pub1315671
PMID
29730780
Source
pubmed
Published In
Neurogenetics
Volume
19
Published Date
Start Page
135
End Page
144
DOI
10.1007/s10048-018-0546-8

Distinguishing direct versus indirect transcription factor-DNA interactions.

Transcriptional regulation is largely enacted by transcription factors (TFs) binding DNA. Large numbers of TF binding motifs have been revealed by ChIP-chip experiments followed by computational DNA motif discovery. However, the success of motif discovery algorithms has been limited when applied to sequences bound in vivo (such as those identified by ChIP-chip) because the observed TF-DNA interactions are not necessarily direct: Some TFs predominantly associate with DNA indirectly through protein partners, while others exhibit both direct and indirect binding. Here, we present the first method for distinguishing between direct and indirect TF-DNA interactions, integrating in vivo TF binding data, in vivo nucleosome occupancy data, and motifs from in vitro protein binding microarray experiments. When applied to yeast ChIP-chip data, our method reveals that only 48% of the data sets can be readily explained by direct binding of the profiled TF, while 16% can be explained by indirect DNA binding. In the remaining 36%, none of the motifs used in our analysis was able to explain the ChIP-chip data, either because the data were too noisy or because the set of motifs was incomplete. As more in vitro TF DNA binding motifs become available, our method could be used to build a complete catalog of direct and indirect TF-DNA interactions. Our method is not restricted to yeast or to ChIP-chip data, but can be applied in any system for which both in vivo binding data and in vitro DNA binding motifs are available.
Authors
Gordân, R; Hartemink, AJ; Bulyk, ML
MLA Citation
Gordân, Raluca, et al. “Distinguishing direct versus indirect transcription factor-DNA interactions.” Genome Res, vol. 19, no. 11, Nov. 2009, pp. 2090–100. Pubmed, doi:10.1101/gr.094144.109.
URI
https://scholars.duke.edu/individual/pub773817
PMID
19652015
Source
pubmed
Published In
Genome Res
Volume
19
Published Date
Start Page
2090
End Page
2100
DOI
10.1101/gr.094144.109

Research Areas:

Algorithms
Binding Sites
Cell Cycle
Cell Cycle Proteins
Chromatin
Chromatin Immunoprecipitation
Chromosomal Proteins, Non-Histone
Conserved Sequence
DNA
DNA, Fungal
DNA-Binding Proteins
Drosophila Proteins
Drosophila melanogaster
Forkhead Transcription Factors
Genome
Humans
Linear Models
Models, Biological
Models, Molecular
Molecular Sequence Data
Nucleic Acid Conformation
Nucleosomes
Oligonucleotide Array Sequence Analysis
Origin Recognition Complex
Promoter Regions, Genetic
Protein Array Analysis
Protein Binding
Protein Interaction Domains and Motifs
Protein Interaction Mapping
Saccharomyces cerevisiae
Saccharomyces cerevisiae Proteins
Sequence Alignment
Sequence Analysis, DNA
Software
Support Vector Machine
Support Vector Machines
Telomere-Binding Proteins
Thermodynamics
Transcription Factors