Control Mechanisms of Genome Expression
Introduction
We seek to fully understand epigenetic regulation, transcription initiation, elongation, and termination, and RNA processing, all of which are necessary to make functional RNAs from the genome. Although the cells that make up our body share the same set of genes, they perform diverse functions. This is made possible by the elaborate cellular machinery that controls spatiotemporal expression of the genome. We have identified and characterized a set of protein factors involved in this process. We are also trying to understand how diverse transcriptomes are generated from the genome.
Regulatory mechanisms of transcription elongation
Transcription occurs in three steps: (i) the initiation step, in which RNA polymerase II (Pol II) binds to the promoter region of a gene and starts RNA synthesis, (ii) the elongation step, in which Pol II actively synthesizes RNA in a 5’-to-3’ direction, and (iii) the termination step, in which Pol II reaches the termination site and is released from DNA. These steps are accomplished by numerous factors that bind to DNA or Pol II. Turning the clock back to the 1990s, research on transcription initiation was in full bloom, and initiation was widely believed to be the key to on/off regulation of gene expression in eukaryotes.
Under these circumstances, our laboratory turned its attention to DRB, a specific inhibitor of RNA polymerase II transcription, which, when added to cultured mammalian cells, was reported to inhibit the synthesis of long transcripts and cause the accumulation of short transcripts. Thus, it was hypothesized that DRB might inhibit the transcription elongation step. Surprisingly, however, DRB did not inhibit purified RNA polymerase II. We therefore sought to elucidate the mechanism of DRB-mediated transcription inhibition using a biochemical approach.
As a result, we identified two new transcription elongation factors, DSIF (SPT4-SPT5) and NELF. We also found that P-TEFb, a transcription elongation factor discovered by David Price's lab, works antagonistically with DSIF and NELF. In other words, DSIF and NELF bind to RNA polymerase II immediately after transcription initiation and act as a brake to inhibit transcription elongation. P-TEFb (CDK9-Cyclin T) has protein kinase activity and phosphorylates the C-terminal domain (CTD) of RNA polymerase II and the C-terminal repeat region (CTR) of DSIF, thereby inducing the release of NELF and reactivating transcription. It has become clear that P-TEFb-mediated phosphorylation triggers the recruitment of additional protein factors that bind to the CTD and CTR, resulting in the formation of a mature transcription elongation complex.
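The brake-and-release logic described above can be sketched as a toy state model. This is only an illustration under stated assumptions, not a quantitative or mechanistic model: the class and function names (PolII, initiate, apply_ptefb) are hypothetical, pausing is reduced to "NELF is still bound", and DSIF is assumed to stay on Pol II after NELF leaves.

```python
from dataclasses import dataclass, field


@dataclass
class PolII:
    """Toy state of one Pol II complex (hypothetical names, not a real API)."""
    bound_factors: set = field(default_factory=set)
    ctd_phosphorylated: bool = False  # C-terminal domain of Pol II
    ctr_phosphorylated: bool = False  # CTR of DSIF

    @property
    def paused(self) -> bool:
        # Simplifying assumption: the brake is on whenever NELF is bound.
        return "NELF" in self.bound_factors


def initiate() -> PolII:
    """DSIF and NELF bind right after initiation and apply the brake."""
    return PolII(bound_factors={"DSIF", "NELF"})


def apply_ptefb(pol: PolII) -> PolII:
    """P-TEFb step: phosphorylate the CTD and CTR, which releases NELF."""
    pol.ctd_phosphorylated = True
    pol.ctr_phosphorylated = True
    pol.bound_factors.discard("NELF")  # DSIF is assumed to remain bound
    return pol


pol = initiate()
print("paused after initiation:", pol.paused)
apply_ptefb(pol)
print("paused after P-TEFb:", pol.paused)
```

In this sketch the only route out of the paused state is the P-TEFb step, which mirrors the antagonism between P-TEFb and DSIF/NELF described in the text.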
The field of transcription elongation research has grown with the identification of numerous transcription elongation factors. Although it is now established enough to be covered in university textbooks, it remains an active research field, with many functional and structural studies published in top journals.
Physiological significance of elongation control
In parallel with mechanistic studies on transcription elongation, we are conducting research to elucidate its physiological significance. Some of this work uses cultured human cells as a model system, and some is carried out as collaborative research using various model organisms.
The most evident physiological role of elongation control is in inducing transcription of a group of genes that are rapidly expressed in response to external stimuli, collectively referred to as immediate-early genes. Elongation control is thought to stall Pol II shortly after it begins mRNA synthesis, keeping chromatin in an active state so that transcription can resume promptly when an external stimulus arrives. Studies of genes rapidly expressed in response to heat shock, growth factors, hormones, and so on have indeed confirmed this idea.
Collaborative studies using zebrafish and Drosophila have also revealed that elongation control plays an important role in the development and differentiation of the central nervous system. The expression of a large number of cell type-specific genes fluctuates dynamically during development and differentiation; therefore, a situation similar to that of immediate-early genes seems to occur.
In addition, the recent development of next-generation sequencers has made it possible to obtain detailed information on the entire genome. This has made it possible to study the behavior of Pol II and transcription factors across the entire genome, which is not approachable by biochemistry alone. In fact, it has become clear that the regulation of transcription elongation by DSIF and NELF is important for overall genome expression.
Regulatory mechanisms of transcription termination: how diverse transcriptomes are generated
Our lab is also actively investigating the mechanism by which transcription termination control gives rise to diverse transcriptomes. Pol II-transcribed genes have three distinct 3' end processing pathways, and each gene normally uses one of them. In principle, 3' end processing is tightly coupled to transcription termination, with 3' end processing inducing termination. We have recently obtained evidence that factors thought to be involved in transcription initiation and elongation (NELF, CBC, Mediator, LEC, and so on) are also involved in the selection among the three processing pathways. Moreover, many genes have multiple transcription termination sites, and we are identifying and analyzing factors involved in the selection of these sites. It is likely that diverse transcriptomes are generated from the same genome in different cell types in part through the regulation of the transcription termination step.
We are trying to address these challenges by combining genomic and transcriptomic analysis using next-generation sequencers with proteomic analysis using state-of-the-art mass spectrometry and genetic methods using CRISPR/Cas9. This is where the bioinformatics approach becomes very important.