Harmen Bussemaker
607E Fairchild, MC 2441
New York
Office Phone: (212) 854-9932
Lab Phone: (212) 854-1527
(212) 865-8246
Short Research Description: 

Data-driven predictive modeling of gene regulatory networks (lab website)

Full Research Description: 

SELEX-seq. We have uncovered new mechanistic principles governing the DNA binding specificity of multi-transcription-factor complexes. In close collaboration with the lab of Dr. Richard Mann at CUMC, we developed SELEX-seq, an approach that combines affinity-based selection from pools of “random” DNA molecules in electrophoretic mobility shift assays with massively parallel sequencing and statistical/biophysical modeling in order to generate detailed sequence-to-affinity models (Slattery et al., Cell, 2011 and Riley et al., Methods Mol. Biol., 2014). SELEX-seq is versatile and has quickly been adopted by other labs.
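
The core idea of turning selection data into relative affinities can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the published SELEX-seq pipeline: it assumes that the enrichment of each k-mer between the initial pool and the selected pool is proportional to its relative affinity, and it uses raw round-0 counts with a pseudocount. The function names, the `pseudocount` parameter, and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (sliding window) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def relative_affinities(round0_reads, selected_reads, k, rounds=1, pseudocount=1.0):
    """Estimate a relative affinity per k-mer from its enrichment.

    Assumption (illustrative only): enrichment after `rounds` rounds of
    selection scales as affinity ** rounds, so we take the rounds-th root.
    """
    before = count_kmers(round0_reads, k)
    after = count_kmers(selected_reads, k)
    total_before = sum(before.values())
    total_after = sum(after.values())

    enrichment = {}
    for kmer in after:
        # The pseudocount avoids division by zero for k-mers unseen in round 0.
        f_before = (before[kmer] + pseudocount) / total_before
        f_after = after[kmer] / total_after
        enrichment[kmer] = (f_after / f_before) ** (1.0 / rounds)

    # Normalize so that the most enriched k-mer has relative affinity 1.
    top = max(enrichment.values())
    return {kmer: value / top for kmer, value in enrichment.items()}


if __name__ == "__main__":
    # Toy reads, invented for illustration.
    pool = ["ACGT", "TTTT", "GGGG", "CCCC"]
    bound = ["ACGT", "ACGT", "ACGT", "TTTT"]
    for kmer, affinity in sorted(relative_affinities(pool, bound, k=4).items()):
        print(kmer, round(affinity, 3))
```

In real data the initial pool is far too diverse for direct k-mer counting to be reliable, so a sketch like this only conveys the logic of the enrichment calculation.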

Latent specificity of Hox proteins. In the same study (Slattery et al., Cell, 2011), we applied SELEX-seq to understand how Hox proteins – which play a crucial role in body plan formation and are conserved between fruit flies and humans – can have dramatically different mutant phenotypes even when as individual proteins they bind to DNA with very similar sequence preferences. We demonstrated that the presence of the co-factor Extradenticle (Exd) causes differences between the Hox proteins to reveal themselves. This “latent specificity” mechanism goes beyond standard cooperative binding and could be quite general. The differences between the Hox proteins seem to have their origin in how they read the “shape” of the DNA minor groove; we explore this question in collaboration with the group of Remo Rohs at the University of Southern California. In close collaboration with the Mann lab, we are currently investigating this in detail using SELEX-seq analysis of mutated proteins; we are also extending our approach to ternary homeodomain complexes, which can bind in a variety of configurations. Modeling the rich and dynamic DNA binding behavior of multi-transcription-factor complexes will remain an important theme in our research over the coming years.

Analyzing DNaseI biases reveals a novel DNA methylation readout mechanism. In a recent study whose original aim was to carefully characterize the intrinsic sequence preferences of the widely used footprinting enzyme DNaseI, we uncovered a new and unexpected general mechanism by which DNA methylation can enhance binding by transcription factors (Lazarovici et al., PNAS, 2013). Specifically, the study demonstrates how cytosine methylation can change the sequence preferences of DNA binding proteins by an order of magnitude. By analyzing deeply sequenced digests of purified human genomic DNA, we made two striking discoveries: (i) DNaseI cleavage rate varies over a thousand-fold range with the surrounding sequence, and (ii) cleavage near CpG dinucleotides is 10- to 20-fold higher when the cytosine is methylated. By combining computer simulations of DNA shape performed by the group of Remo Rohs with statistical analysis of massively parallel sequencing data collected in the laboratory of our collaborator Dr. John Stamatoyannopoulos at the University of Washington, we were able to find a unified explanation for these phenomena. It turns out that cytosine methylation narrows the DNA minor groove, which in turn strengthens interactions with positively charged amino-acid side chains. Such minor groove contacts occur for a wide range of transcription factors, as well as nucleosomes. The novel structural mechanism put forward in this study therefore has the potential to significantly deepen our understanding of how epigenetic information is "read" by the cell.

Building sequence-to-affinity models of unprecedented accuracy from high-throughput in vitro protein-DNA interaction data. A key long-term goal of our lab remains the construction of a data-driven “universal” protein-DNA recognition code. In recent years, there has been a surge in the development and application of high-throughput in vitro methods for profiling the DNA sequence specificity of transcription factors. These factors come in large families of closely related proteins that differ from each other in subtle but important ways. Only by first accurately quantifying their DNA binding specificity can we hope to understand and predict their specific functions in the cell through integration with in vivo measurements of genomewide TF binding (ChIP-seq) and gene expression (RNA-seq). The popular protein binding microarray (PBM) technology assays the interaction of a given protein with tens of thousands of DNA probes in parallel. Inferring in vitro binding models from PBM data, which are complex and contain various biases, is not at all straightforward. Indeed, at least 26 different algorithms have been developed to infer “motifs” from them. Of these, the FeatureREDUCE tool that our lab developed (Riley et al., eLife, 2015) emerged as the top performer in an unbiased benchmark comparison study in which we participated (Weirauch et al., Nature Biotech., 2013). It builds on the biophysical modeling principles of MatrixREDUCE (Foat et al., PNAS, 2005; Bioinformatics, 2006), but is more sophisticated and uses robust inference techniques that allow us to accurately capture dependencies between nucleotides in spite of data limitations and biases.
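
As a rough illustration of the kind of biophysical model referred to above, the following Python sketch scores a DNA probe with a position-specific affinity matrix in which each position contributes an independent multiplicative factor. It is a minimal sketch under that independence assumption and is not the FeatureREDUCE or MatrixREDUCE implementation; in particular, it ignores the reverse strand, dependencies between nucleotides, and probe-specific biases. The matrix values and function names are hypothetical.

```python
def window_affinity(psam, window):
    """Relative affinity of one window: product of per-position factors."""
    affinity = 1.0
    for position, base in enumerate(window):
        affinity *= psam[position][base]
    return affinity


def probe_affinity(psam, probe):
    """Sum window affinities over every offset along the probe.

    Only the forward strand is scanned in this simplified sketch.
    """
    width = len(psam)
    return sum(
        window_affinity(psam, probe[i:i + width])
        for i in range(len(probe) - width + 1)
    )


if __name__ == "__main__":
    # Hypothetical 3-position matrix; the optimal base at each position is 1.0
    # and every other base is a relative affinity between 0 and 1.
    psam = [
        {"A": 0.1, "C": 0.1, "G": 0.2, "T": 1.0},
        {"A": 0.05, "C": 0.1, "G": 1.0, "T": 0.1},
        {"A": 1.0, "C": 0.2, "G": 0.1, "T": 0.1},
    ]
    for probe in ["TGAT", "CCCC", "TGATGA"]:
        print(probe, probe_affinity(psam, probe))
```

Fitting such a matrix to measured probe intensities, and extending it with terms for dependencies between neighboring nucleotides, is where the inference methods described above come in.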

Discovery of PQM-1 as a key regulator of aging and longevity. Aging is fundamental to the human life cycle and intimately connected with disease. Understanding the genetic and molecular determinants of animal longevity will provide new avenues for slowing the negative effects of aging. Using the nematode C. elegans as a model organism, and applying a combination of computational and experimental methods, we discovered (Tepper et al., Cell, 2013) that the little-studied transcription factor PQM-1 is a key regulator of development and longevity, and the long-sought factor that binds the so-called DAF-16 associated element (DAE). As it turns out, PQM-1 complements the well-known aging transcription factor DAF-16/FOXO in many respects. Both act as transcriptional activators, but they control distinct sets of target genes (stress response vs. growth). Whether DAF-16 or PQM-1 is nuclear or cytoplasmic depends on the status of the insulin/IGF-1 signaling pathway, but in opposite ways. This causes only one of the factors to be active as a regulator at any given time, depending on the conditions (e.g., low nutrients) and genetic background (e.g., loss of daf-2 or daf-18/PTEN). At the same time, as we have shown, the two factors interact with each other in an essential way: loss of pqm-1 affects DAF-16 subcellular localization, and vice versa. The molecular mechanisms underlying these important processes, however, remain obscure. This work is done in close collaboration with Dr. Coleen Murphy at Princeton University.

Mapping tumorigenesis mechanisms using insertional mutagenesis. Having successfully pioneered an innovative methodology for mapping trans-acting loci that modulate transcription factor activity in yeast (Lee et al., Mol. Syst. Biol., 2010), we recently adapted it to the analysis of mouse tumorigenesis data. Each individual tumor harbors a unique combination of genetic lesions, which together are responsible for the aberrant behavior of its cells. Cancer-causing viruses have been used in mice to systematically sample this genetic diversity. Corresponding changes in global gene expression can be monitored using high-throughput technology. We developed a method – locus expression signature analysis (“LESA”) – that integrates information at the genetic and molecular level to construct a genomewide signature that captures the effect of an individual genetic lesion on the gene regulatory network of the cell. We demonstrated how these signatures can be exploited to gain insight into the regulatory pathways perturbed by each lesion, and suggest drugs that can counteract its effect (Lee et al., PNAS, 2014).

Chromatin context dependence of regulatory interactions. In collaboration with Dr. Bas van Steensel at the Netherlands Cancer Institute, we have investigated the influence of chromatin context on transcription factor binding in Drosophila. We found that most fly genes are organized into multi-gene chromatin domains bound by specific combinations of proteins; these domains are functionally coherent and evolutionary selection acts against chromosomal rearrangements that break them up. These findings have broad mechanistic implications for gene regulation and genome evolution (De Wit et al., PLoS Genet., 2009). In a related study, we analyzed the dependence of transcription factor function on local chromatin context, based on a classification of the Drosophila genome into five major “colors” (Filion et al., Cell, 2010).

Representative Publications: 

Two recent key achievements drive much of our current research agenda. First, we have uncovered a new structural mechanism by which cytosine methylation enhances protein-DNA interaction by narrowing the DNA minor groove (Lazarovici, 2013). Second, we have discovered that the previously little-studied transcription factor PQM-1 is a crucial antagonist of the widely studied DAF-16/FOXO regulatory pathway that underlies genetic variation in organism lifespan (Tepper, 2013).

  • A. Lazarovici, R. Sandstrom, P.J. Sabo, A. Shafer, A.C. Dantas Machado, R. Rohs, J. Stamatoyannopoulos, and H.J. Bussemaker. (2013) Probing DNA shape and methylation state on a genomewide scale with DNase I. Proc. Nat. Acad. Sci. U.S.A. 110(16):6376-81. PMCID: PMC3631675
  • A.C. Dantas Machado, T. Zhou, S. Rao, P. Goel, C. Rastogi, A. Lazarovici, H.J. Bussemaker, R. Rohs (2014) Evolving insights on how cytosine methylation affects protein-DNA binding. Brief. Funct. Genomics pii:elu040. PMCID: In Process
  • R.G. Tepper*, J. Ashraf*, R. Kaletsky, G. Kleemann, C.T. Murphy, H.J. Bussemaker. (2013) PQM-1 complements DAF-16 as a key transcriptional regulator of DAF-2-mediated development and longevity. Cell, 154(3):676-90. PMCID: PMC3763726

Our influential strategy for discovering cis-regulatory motifs (Bussemaker, 2001) recently culminated in our design of the FeatureREDUCE algorithm for building accurate models of protein-DNA interaction specificity from protein binding microarray (PBM) data (Riley et al., 2015), which emerged as the top-performing algorithm in a recent benchmark study (Weirauch, 2013). We have also co-developed the SELEX-seq method, and applied it to understand cooperative DNA binding by complexes of Hox proteins and their co-factors, in close collaboration with the groups of Richard Mann (Columbia) and Remo Rohs (USC).

  • H.J. Bussemaker, H. Li, and E.D. Siggia (2001). Regulatory element detection using correlation with expression. Nature Genet. 27, 167-171. PMCID: not assigned
  • M.T. Weirauch, A. Cote, R. Norel, M. Annala, Y. Zhao, T.R. Riley, J. Saez Rodriguez, T. Cokelaer, A. Vedenko, S. Talukder, DREAM5 consortium, H.J. Bussemaker, Q.D. Morris, M.L. Bulyk, G. Stolovitzky, T.R. Hughes. (2013) Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 31(2):126-34. PMCID: PMC3687085
  • T.R. Riley, A. Lazarovici, R.S. Mann, and H.J. Bussemaker. (2015) Building accurate sequence-to-affinity models from high-throughput in vitro binding data using FeatureREDUCE. eLife pii: e06397.
  • M. Slattery, T.R. Riley, P. Liu, N. Abe, P. Gomez-Alcala, R. Rohs*, B. Honig*, H.J. Bussemaker*, R.S. Mann*. (2011) Cofactor Binding Evokes Latent Differences in DNA Binding Specificity between Hox proteins. Cell 147(6):1270-82. PMCID: PMC3319069
  • http://bioconductor.org/packages/SELEX

Our work integrating data at different levels (sequence, transcription factor binding, mRNA expression, genetic variation) has led to pioneering strategies for estimating the protein-level activity of transcription factors, distinguishing functional from non-functional transcription factor binding (Gao, 2004), analyzing dynamic post-transcriptional regulation of mRNA stability (Foat, 2005), and using prior information about regulatory networks to discover and characterize trans-acting genetic loci (Lee, 2010; Lee, 2014).

  • F. Gao, B.C. Foat, and H.J. Bussemaker (2004). Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics 5, 31. PMCID: PMC407845
  • B.C. Foat, S.S. Houshmandi, W.M. Olivas, H.J. Bussemaker (2005). Profiling condition-specific, genome-wide regulation of mRNA stability in yeast. Proc. Natl. Acad. Sci. U S A. 102(49):17675-17680. PMCID: PMC1295595
  • E. Lee and H.J. Bussemaker. (2010) Identifying the genetic determinants of transcription factor activity. Mol. Syst. Biol. 6:412. PMCID: PMC2964119
  • E. Lee, J. de Ridder, J. Kool, L. Wessels, and H.J. Bussemaker (2014) Identifying regulatory mechanisms underlying tumorigenesis in mice. Proc. Nat. Acad. Sci. U.S.A. 111(15):5747-52. PMCID: PMC3992641

Finally, we have contributed to a better understanding of chromatin organization and the influence of local chromatin context on transcription factor function, as part of a long-standing and ongoing collaboration with the group of Bas van Steensel at the Netherlands Cancer Institute.

  • B. van Steensel, J. Delrow, and H.J. Bussemaker (2003). Genome-wide analysis of Drosophila GAGA Factor target genes reveals context-dependent DNA binding. Proc. Nat. Acad. Sci. U.S.A. 100, 2580-2585. PMCID: PMC151383
  • C. Moorman, L.V. Sun, J. Wang, E. de Wit, W. Talhout, F. Greil, K.P. White*, H.J. Bussemaker*, and B. van Steensel* (2006). Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc. Natl. Acad. Sci. U S A., 103(32):12027-32. PMCID: PMC1567692
  • G.J. Filion, J.G. van Bemmel, U. Braunschweig, W. Talhout, J. Kind, L.D. Ward, W. Brugman, I. de Castro Genebra de Jesus, R.M. Kerkhoven, H.J. Bussemaker, and B. van Steensel. (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143(2):212-24. PMCID: PMC3119929
  • J. van Arensbergen, B. van Steensel, H.J. Bussemaker (2014) In search of the determinants of enhancer-promoter interaction specificity. Trends Cell Biol. pii:S0962-8924(14)118-4. PMCID: In Process

Complete list of publications in PubMed: http://www.ncbi.nlm.nih.gov/pubmed/?term=bussemaker+HJ

Business Office

Department of Biological Sciences
500 Fairchild Center
Mail Code 2401
Columbia University
1212 Amsterdam Avenue
New York, NY 10027

Academic Office

Department of Biological Sciences
600 Fairchild Center
Mail Code 2402
Columbia University
1212 Amsterdam Avenue
New York, NY 10027
212 854-4581