Repetitive DNA sequences in the immunoglobulin switch μ region form RNA-containing secondary structures and undergo hypermutation by activation-induced deaminase (AID). To examine how DNA structure affects transcription and hypermutation, we mapped the position of RNA polymerase II molecules and mutations across a 5-kb region spanning the intronic enhancer to the constant μ gene. For RNA polymerase II, the distribution was determined by nuclear run-on and chromatin immunoprecipitation assays in B cells from uracil-DNA glycosylase (UNG)–deficient mice stimulated ex vivo. RNA polymerases were found at a high density in DNA flanking both sides of a 1-kb repetitive sequence that forms the core of the switch region. The pileup of polymerases was similar in unstimulated and stimulated cells from Ung−/− and Aid−/−Ung−/− mice but was absent in cells from mice with a deletion of the switch region. For mutations, DNA was sequenced from Ung−/− B cells stimulated in vivo. Surprisingly, mutations of A nucleotides, which are incorporated by DNA polymerase η, decreased 10-fold before the repetitive sequence, suggesting that the polymerase was less active in this region. We propose that altered DNA structure in the switch region pauses RNA polymerase II and limits access of DNA polymerase η during hypermutation.
Class switch recombination is initiated by activation-induced deaminase (AID) in switch (S) regions preceding each CH gene (1). AID acts by deaminating cytosine to uracil in DNA, and the rogue uracil causes mutations and DNA strand breaks by error-prone processing. Mutations of C:G bp could be produced by replication past uracil or an abasic site caused by removal of uracil by uracil-DNA glycosylase (UNG). Mutations can also arise from the combined actions of MSH2-MSH6 mismatch repair proteins and the low fidelity DNA polymerase (pol) η to produce mutations of A:T bp (2). Strand breaks produced by cleavage of abasic sites on both strands by an abasic endonuclease (3) would produce substrates for nonhomologous end joining to different S regions. In the S region preceding the Cμ gene, mutations start downstream of an intronic (Iμ) promoter located in the enhancer (Eμ) region (4), accumulate for 2 kb, and then decrease before the Cμ gene (5). A similar pattern of mutations occurs downstream of intronic promoters in the S regions of other CH genes as well (5). This distribution supports the prevailing hypothesis that AID either travels with the transcription complex or is brought to the single-strand regions formed during transcription (6, 7).
The Sμ DNA sequence is unique in two aspects: it contains an abundance of hot spot motifs, and it has a stable secondary structure that exposes single-strand DNA for AID to act on. The substructure of this region contains a 1-kb highly repetitive sequence, which is comprised of tandem WGCW (W is A or T) motifs and clusters of three to four Gs on the nontranscribed strand. WGCW is a hot spot for AID activity, and the G clusters have been proposed to form secondary structures during transcription, such as R-loops and G-loops containing G-rich RNA and C-rich transcribed DNA (8, 9). Although the nontranscribed strand is mostly single stranded in these structures, the spectrum of mutations in Ung−/− Msh2−/− mice indicates that both strands are targeted for mutation by AID (5). The mechanism by which AID deaminates the transcribed strand containing an RNA-DNA hybrid is unknown, although three theories have been proposed: (a) the DNA upstream of an elongating RNA pol II may be supercoiled and unwound, which would allow AID access to both strands (10); (b) the DNA in R-loops may be collapsed by endogenous RNase H digestion, which would expose single-strand regions on the transcribed strand (11); and (c) single-strand DNA on the transcribed strand could be generated during antisense transcription. Antisense transcripts have been detected at low levels in S regions, but it is not known if they participate in hypermutation (12).
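The two sequence features described above, WGCW motifs and G clusters, can be counted directly from a nontranscribed-strand sequence. The following is a minimal sketch, not the analysis used in this study. The function name `motif_density` is hypothetical, and a G cluster is assumed to be any run of three or more consecutive Gs.

```python
import re

# W is A or T, so WGCW expands to [AT]GC[AT].
# The zero-width lookahead lets overlapping motifs (e.g. AGCTGCA) each be counted.
WGCW = re.compile(r"(?=[AT]GC[AT])")

# Assumption: a "G cluster" is a run of three or more consecutive Gs.
G_CLUSTER = re.compile(r"G{3,}")


def motif_density(seq: str) -> dict:
    """Count WGCW motifs and G clusters on one strand of a DNA sequence."""
    seq = seq.upper()
    wgcw = len(WGCW.findall(seq))
    g_clusters = len(G_CLUSTER.findall(seq))
    # Densities are reported per 100 bp so segments of different length compare.
    scale = 100.0 / len(seq) if seq else 0.0
    return {
        "wgcw": wgcw,
        "g_clusters": g_clusters,
        "wgcw_per_100bp": wgcw * scale,
        "g_clusters_per_100bp": g_clusters * scale,
    }
```

Applied to each ∼500-bp segment in turn, such a count would give a per-segment motif density that could be compared with the mutation frequency in the same segment.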
The DNA region upstream of the repetitive sequence has fewer WGCW motifs and G clusters, but it can also form secondary structure and undergo hypermutation (5, 13, 14). Huang et al. (11) have recently mapped the 5′ and 3′ boundaries of R-loops in the Sμ region and found that they begin 600 bp upstream and end 600 bp downstream of the 1-kb repetitive sequence. The existence of R-loops and associated single-strand DNA in the upstream region may also limit the participation of MSH2-MSH6 and DNA pol η during hypermutation, which would reduce the frequency of mutations of A:T bp.
RESULTS AND DISCUSSION
Sμ DNA structure affects the location of RNA pol II molecules
In view of the requirement of transcription for hypermutation and the possibility that AID may associate with the transcription complex, we systematically mapped the position of RNA pol II molecules across the Sμ locus using nuclear run-on and chromatin immunoprecipitation (ChIP) assays. The former measures RNA polymerase activity in a region for a given time, whereas the latter measures the amount of RNA pol II molecules bound to DNA at the instant of cell lysis. B cells from Ung−/− mice were studied because the entire Sμ region can be analyzed without deletions caused by switching (15). The 5-kb Sμ region was divided into a series of contiguous 500-bp probes for run-on analysis (Fig. 1 A, S1–9). However, the 1-kb region containing highly repetitive G:C-rich DNA could not be amplified by our PCR techniques, although we tried a variety of DNA polymerases and conditions (unpublished data). We prepared nuclei from naive B cells or cells stimulated ex vivo with LPS and IL-4 for 2 d and initiated de novo transcription by the addition of ribonucleotides and radioactive UTP for 30 min. The labeled nascent RNA molecules were hybridized to membranes containing DNA probes of the Sμ region, and membrane-bound radioactivity was quantified by phosphorimager analysis (Fig. 1 B). Data were normalized to β-actin transcription and corrected for dTTP content because radiolabeled UTP was incorporated. In nuclei from days 0 and 2, the strongest signals were obtained from probes S5 and S7, which flank the repetitive region and are within the R-loop region. Hybridizations were also performed with single-strand DNA probes made in M13 vectors. There was hybridization to transcribed strand probes but no detectable hybridization to nontranscribed strand probes (unpublished data), which indicates that the majority of RNA transcripts are complementary to the transcribed strand, as expected.
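The normalization step described above can be illustrated with a short sketch. It assumes that the correction for nucleotide content amounts to dividing each hybridization signal by the number of T residues in the nontranscribed-strand sequence of the probe (the positions at which labeled UTP is incorporated into nascent RNA), and that the β-actin signal is corrected in the same way before the ratio is taken. The function names and this exact formula are illustrative assumptions, not the procedure documented for this study.

```python
def t_content(seq: str) -> int:
    """Number of T residues in a probe sequence (nontranscribed strand)."""
    return seq.upper().count("T")


def normalize_run_on(raw_signal: float, probe_seq: str,
                     actin_signal: float, actin_seq: str) -> float:
    """Probe signal per T residue, relative to beta-actin signal per T residue.

    Assumption: label incorporation scales with the T count of the probe,
    so dividing by that count makes probes of different composition comparable.
    """
    probe_t = t_content(probe_seq)
    actin_t = t_content(actin_seq)
    if probe_t == 0 or actin_t == 0 or actin_signal == 0:
        raise ValueError("T content and beta-actin signal must be nonzero")
    return (raw_signal / probe_t) / (actin_signal / actin_t)
```

With this scheme, a value above 1 means more labeled transcript per incorporable position than for β-actin, which is the sense in which probes S5 and S7 would stand out from their neighbors.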
To determine if the accumulation of polymerases around the repetitive region was caused by mutations initiated by AID, nuclear run-ons were performed in B cells from Aid−/−Ung−/− mice 2 d after stimulation. As seen in Fig. 1 C, there was still an accumulation of polymerases in S5 and S7 in the absence of AID. We also saw a similar pattern of hybridization in nuclei from hybridoma cells that do not express AID (unpublished data). There were twofold fewer transcripts in S5 and S7 in day-2 cells from Ung−/− mice, which express the AID protein, compared with day-0 cells from Ung−/− mice and day-2 cells from Aid−/−Ung−/− mice. This suggests that AID affects the level of transcription, but the molecular basis for this is unknown. To see if the pileup of polymerases was directly caused by the Sμ DNA sequence, we performed run-on assays in cells from mice that had a 3.7-kb deletion in Sμ (Sμ del) (16) and have no detectable R-loop formation (11). As shown in Fig. 1 D, after 2 d of stimulation, there was no accumulation of RNA pol II when the Sμ sequence was virtually deleted.
To confirm the nuclear run-on data, the density of RNA pol II molecules was directly assayed by ChIP with anti-RNA pol II antibodies, which were used to enrich polymerase-bound chromatin. After purification, the associated genomic DNA was amplified by PCR with 10 primer sets located across the Sμ region (Fig. 2 A, Sa–j). The polymerase distribution across the region was the same in day-0 and day-2 activated cells. The highest levels of RNA pol II were detected in the amplicons that were closest to the repetitive region, Sg, Sh, and Si, which showed a twofold increase compared with adjacent amplicons. In contrast, there was no accumulation of RNA pol II in mice devoid of the Sμ region (Fig. 2 B). Therefore, both the nuclear run-on and ChIP assays indicate that RNA pol II accumulates close to the repetitive region. This pattern is directly related to the DNA sequence encoding Sμ and is not related to the activation status of the cells or to the presence of AID-dependent mutation. It should be noted that the quantity of RNA pol II flanking the repetitive region differs between the two assays in that the ChIP assay shows a twofold increase, whereas the run-on assay detects a four- to eightfold increase. This could happen if the ChIP assay depicts the loading of RNA pol II at a given instant, whereas the run-on assay shows the activity of the polymerases during a 30-min incubation time, which may produce a more intense signal, as demonstrated in Fig. S1. In both cases, the increase in polymerase density around the repetitive region supports the notion that DNA structure retards the movement of the polymerases.
To evaluate where the RNA pol II molecules initiated transcription, we measured the quantity of transcripts starting at the VDJ versus Iμ promoters. Expression of Iμ transcripts has been shown to be constitutive before and after mitogen stimulation (17), but a direct comparison to VDJ transcripts has not been performed. RNA was isolated from Ung−/− B cells and analyzed as described in the supplemental materials and methods. There was a threefold increase in RNA levels per cell 2 d after activation (Fig. S2 A), which is likely the result of increased transcription of many genes in cells preparing for G1 progression. For instance, transcription of the β-actin gene increased after stimulation as measured by PCR (unpublished data) and Northern assays (Fig. S2 B). Northern blots also showed that Cμ spliced transcripts originating from VDJ promoters were much more abundant than Iμ spliced transcripts before and after stimulation (Fig. S2 B). We then quantified the amount of transcripts per cell by absolute quantitative PCR (qPCR; Fig. 2 C) using B1-8 mice, which have a homozygous knockin of a rearranged VHJ558 gene (18). Total cellular RNA from cells before and after ex vivo stimulation was measured with VB1-8 and Iμ primers. Although these cells can switch, the primers should detect transcripts from any rearranged CH gene. The results show that VDJ transcripts increased twofold after ex vivo stimulation, whereas Iμ transcript levels remained constant. The data also show that spliced VDJ transcripts were 5- and 13-fold higher than Iμ transcripts before and after stimulation, respectively. Thus, transcripts initiating from the VDJ promoter may contribute substantially to forming secondary structure in the Sμ region.
Mutations increase before the Sμ repetitive region and decrease afterward
To evaluate the relationship between DNA structure and somatic hypermutation, we sequenced DNA in the Sμ region. This region has been previously sequenced in Ung−/−Msh2−/− mice, and it showed that the DNA sequence upstream of the repetitive region was targeted for deamination by AID, as measured by G:C to A:T transitions (5). However, somatic hypermutation also consists of mutations of A:T bp, which are made during error-prone processing of uracils by MSH2-MSH6 and DNA pol η. To assess where the A:T mutations occurred, we analyzed the pattern in Ung−/− mice, which have a normal frequency of A:T mutations (15), using DNA from Peyer's patch and immunized splenic B cells. A 4.3-kb region spanning Eμ to downstream of the repetitive region was amplified in ∼500-bp segments that corresponded to the location of probes used for the run-on analysis. As shown in Fig. 3 A and Fig. S3, mutations were sparse around the Eμ-Iμ region and increased sequentially up to the repetitive region, with the highest frequency in S5 (Fig. 3 B). Although we could not sequence the repetitive region because of difficulties in PCR amplification, a high frequency of mutations is likely to be found there as well. After the repetitive region, the frequency fell precipitously, with the exception of a hot spot at T in an AGGCTGGGA motif in S7. Because no mutations were seen in S8, we did not analyze sequences further downstream.
Sμ DNA structure may limit activity of DNA pol η
As expected in Ung−/− mice, the vast majority of G:C mutations were transitions (Fig. S4). After correction for nucleotide composition, it was noted that the frequency of mutations of C increased relative to mutations of G in S5 as recorded from the nontranscribed strand (Fig. 3 C), which has also been reported in previous studies of this region (19–21). This is consistent with R-loop structures exposing the nontranscribed strand for cytosine deamination. Furthermore, there was an unexpected decrease in the frequency of mutations of A bases in S4 and S5. As shown in Fig. 3 D, the frequency of mutations of A plummeted 10-fold to just 3% before the repetitive region, even though the germline dA density was high, and the WGCW density increased only twofold. We conclude that DNA pol η, which incorporates mutations of A during somatic hypermutation, may not be active in S4 and S5. The observation that the decrease in mutations of A was greater than the decrease in mutations of T is also consistent with a role for DNA pol η to introduce mutations on the nontranscribed strand (22, 23). The decline in A mutations begins at the proposed start of R-loops (11), implicating DNA structure as a reason for reduced recruitment of pol η. One way this could happen is by the inability of MSH2-MSH6, which interacts with pol η (2), to recognize single-strand DNA within R-loops and thereby not enlist pol η. Alternatively, pol η could be at the site but unable to copy in a repair patch if the template-transcribed strand is stably complexed to RNA. In either case, pol η would have less activity on R-loop structures. A potential benefit to preventing repair and synthesis by pol η would be to retain uracils for removal by UNG and subsequent nicking of the abasic site to induce switch recombination.
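The correction for nucleotide composition mentioned above can be sketched as follows. This is an illustrative calculation under stated assumptions, not the authors' pipeline: clones are assumed to be pre-aligned to the germline segment with no insertions or deletions, and the frequency for each base is taken as the number of mutations away from that base divided by the number of germline occurrences of that base across all clones. The function name is hypothetical.

```python
from collections import Counter


def mutation_frequency_by_base(germline: str, clones: list) -> dict:
    """Mutations of each germline base per occurrence of that base.

    Assumption: every clone is aligned to the germline and has the same
    length, so positions can be compared one to one.
    """
    germline = germline.upper()
    base_counts = Counter(germline)
    mutated = Counter()
    for clone in clones:
        clone = clone.upper()
        if len(clone) != len(germline):
            raise ValueError("clone length differs from germline length")
        for ref, obs in zip(germline, clone):
            if ref != obs:
                mutated[ref] += 1  # tally by the germline (mutated-from) base
    n = len(clones)
    return {
        base: mutated[base] / (base_counts[base] * n)
        for base in "ACGT"
        if base_counts[base] and n
    }
```

Because each count is divided by the number of opportunities for that base, a segment rich in A but with few A mutations yields a low A frequency, which is the kind of comparison drawn for S4 and S5.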
Setting up the Sμ locus for AID
Our interpretation is that the DNA sequence of the Sμ region influences the location of RNA pol II and hypermutation. Recently, Conticello et al. (24) identified a protein that interacts with AID and affects hypermutation, class switching, and gene conversion, possibly via the transcriptional spliceosome complex. One speculative model incorporating DNA structure, transcription, and mutation is presented in Fig. S5. RNA pol II molecules move down the DNA and then pause when they encounter the R-loop region. At the other end of the sequence, they are no longer stalled and proceed to the Cμ gene. Why would the polymerases pause? R-loops contain long RNA transcripts forming a stable hybrid with the transcribed DNA strand, and RNA polymerases may slow down when trying to unwind the tight RNA-DNA hybrid. Such blocks in transcription have been reported in vitro when the nontranscribed strand is G rich and the transcribed strand is C rich and can form R-loops (25, 26). If AID associates with the transcription complex, it would be paused as well and have the opportunity to bind to DNA. Along these lines, a recent study has described an in vitro system where AID-generated mutations were increased when T7 RNA polymerase was stalled (27). The decrease in mutation frequency after the repetitive region could occur if prolonged stalling caused AID to dissociate from the transcription complex in an unknown manner, resulting in fewer mutations. Alternatively, the local concentration of AID molecules may become depleted after being bound to DNA in the repetitive sequence. This model could be tested by detecting AID molecules bound to DNA before and after the repetitive region.
We also show that secondary structure affects the activity of DNA pol η because mutations of dA decrease at the start of R-loop formation. Therefore, both the pileup of RNA polymerases and the reduction of dA mutations provide independent and functional measures of DNA secondary structure in Sμ. In contrast, frog Sμ regions, which do not have G-clusters or R-loops, do not switch as efficiently as mouse S regions (28). Thus, the mouse Sμ structure appears to be the perfect substrate to amass mutations and associated strand breaks for switching. The pattern of RNA pol II accumulation is already present before B cell stimulation at day 0, which is most likely caused by R-loop formation from RNA polymerases transcribing membrane IgM. Both the stalling of transcription complexes and the presence of single-strand DNA on the nontranscribed strand would give AID molecules optimal access to initiate the mutagenic cascade. This secondary structure may also explain why mutations readily arise in the Sμ region but not in nearby variable regions in B cells stimulated ex vivo.
MATERIALS AND METHODS
Ung+/− mice on a mixed C57BL/6J × 129SV background were obtained from H. Krokan (Norwegian University of Science and Technology, Trondheim, Norway). Aid+/− mice on a C57BL/6 background were provided by T. Honjo (Kyoto University, Kyoto, Japan), courtesy of M. Scharff (Albert Einstein College of Medicine, Bronx, NY). B1-8 mice on a C57BL/6 background were obtained from R. Casellas (National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD). The mice were bred in our animal facilities and used at 3–6 mo of age. Spleens from Sμ del mice were provided by A. Khamlichi (16). All animal procedures were reviewed and approved by the National Institute on Aging Animal Care and Use Committee.
Stimulation of splenic B cells.
Spleen cells were treated with ACK lysing buffer (Quality Biological, Inc.) to lyse red blood cells, and B cells were purified by negative selection using anti-CD43 and anti-CD11b antibodies coupled to magnetic beads (Miltenyi Biotec). Cells were cultured at 10⁶ cells/ml and stimulated with 5 µg/ml of Escherichia coli lipopolysaccharide serotype 0111:B4 (Sigma-Aldrich) and 5 ng/ml of recombinant mouse IL-4 (BD). The cells were harvested on days 0 and 2 after stimulation.
Isolation of nuclei and preparation of nuclear RNA were performed as previously described (29) with some modifications. In brief, ∼5 × 10⁸ cells were pelleted, washed with PBS, resuspended in ice-cold lysis buffer (20 mM Tris-HCl, pH 7.4, 20 mM NaCl, 5 mM MgCl2, and 0.25% vol/vol NP-40), and incubated on ice for 10 min. Nuclei were spun down and resuspended in storage buffer (50 mM Tris-HCl, pH 8.0, 5 mM MgCl2, 0.1 mM EDTA-NaOH, pH 8.0, and 45% vol/vol glycerol). The nuclei in 200-µl aliquots were then mixed with 200 µl of reaction buffer (300 mM KCl, 10 mM MgCl2, and 1 mM each of ATP, CTP, and GTP) plus 500 µCi α-[32P]UTP (3,000 Ci/mmol, 10 mCi/ml; PerkinElmer) and incubated for 30 min at 30°C. Samples were then incubated with 100 U of RNase-free DNase I (Applied Biosystems) for 30 min at 37°C and 20 µg/ml of proteinase K for 45 min at 37°C. Finally, the labeled nascent RNA was purified by Sephadex G-25 column filtration (GE Healthcare) and hybridized to DNA probes as described in the following section.
Preparation of DNA probes and hybridization.
The Sμ region was amplified from genomic DNA with the primers listed in Table S1 and Taq DNA polymerase (Takara Bio Inc.). PCR products (1 μg) were then printed on Hybond N+ positively charged nylon membranes (GE Healthcare) as described in the manufacturer's protocol (Bio-Dot Microfiltration apparatus; Bio-Rad Laboratories). The membranes were prehybridized overnight with 3 ml MicroHyb hybridization solution (Invitrogen), 10 µg of Cot DNA (Invitrogen), and 8 µg of poly A DNA (Invitrogen). The labeled RNA was then added in 1 ml of hybridization solution for 24 h in a rotisserie-style incubator at 62°C. The membranes were rinsed in 2× SSC and 0.1% SDS at 62°C, followed by washes in SSC and 0.1% SDS twice at 62°C. Membranes were exposed for 2 d and scanned using a Phosphorimager (GE Healthcare). ImageQuant software (GE Healthcare) was used to convert the hybridization signals into raw intensity values, which were then analyzed.
ChIP assays were performed as previously described (30), using anti-RNA pol II antibodies (Millipore). The DNA from immunoprecipitation was quantified using Pico green (Invitrogen) and amplified by qPCR using the primers listed in Table S2. Each primer set was examined for >90% amplification efficiency and for the lack of secondary products.
Total cellular RNA was collected from unstimulated (day 0) and stimulated (day 2) B cells from B1-8 mice. Complementary DNA (cDNA) was produced using 0.5 µg and 1.0 µg RNA and the iScript cDNA synthesis kit (Bio-Rad Laboratories). The Iμ standard was amplified using cDNA template and primers Iμ-5′ 51019 (5′-GCTTGAGTAGTTCTAGTTTCCCCAAACTTAAG-3′) and Iμ-3′ 50615 (5′-GAGTTGGTGGTTGGTCGTACAAGTTG-3′) and cloned into a pGEM-T-easy (Promega) cloning vector. Similarly, the VB1-8 standard was amplified using primers VB1-8-2F (5′-CTGAGCACACAGGACCTCACC-3′) and VB1-8-2R (5′-GGACTCACCTGAGGAGACTGTG-3′). For the standard curve, plasmid standards were linearized with ScaI-HF (New England Biolabs, Inc.) and purified by agarose gel electrophoresis and gel extraction (QIAGEN). qPCR was performed using IQ SYBR green supermix (Bio-Rad Laboratories) with primers Iμ-5′ 50827 (5′-CCAATACCCGAAGCATTTACAGTGAC-3′) and Iμ-3′ 50726 (5′-GTGAAGCCGTTTTGACCAGAATGTC-3′) and with primers VB1-8-3F (5′-GACGAGGCCTTGAGTGGATTG-3′) and VB1-8-3R (5′-CATGTAGGCTGTGCTGGAGG-3′). Each primer set was examined for >90% amplification efficiency and for the lack of secondary products. Reactions were performed using a 7900HT real-time instrument (Applied Biosystems) and data analyzed using SDS 2.3 software (Applied Biosystems).
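Absolute quantification from a plasmid standard curve, and the amplification efficiency check mentioned above, are commonly computed as in the sketch below. It assumes the standard relationships Ct = slope × log10(copies) + intercept and efficiency = 10^(−1/slope) − 1. The function names and any input values are illustrative and are not taken from the SDS 2.3 software or from the data in this study.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Fractional efficiency; 1.0 means perfect doubling each cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a sample."""
    return 10 ** ((ct - intercept) / slope)
```

A slope near −3.32 corresponds to 100% efficiency, so a primer set passing a >90% threshold would have a slope no steeper than about −3.59. Dividing the estimated copies by the number of cells represented in the reaction would then give transcripts per cell.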
Activated B cells were obtained from the Peyer's patches of three mice and from the spleens of four mice immunized with sheep red blood cells for 1 mo and sacrificed 4 d after a boost. Cells were stained with phycoerythrin-labeled antibody to B220 (eBioscience) and fluorescein-labeled peanut agglutinin (PNA; E-Y Laboratories). The cells were separated by flow cytometry, and DNA was prepared from B220+PNA+ cells. Libraries were made by amplifying DNA using the primers listed in Table S1. 100 ng DNA was amplified using Pfx DNA polymerase and PCR enhancer (Invitrogen) in a 50-µl vol using the external primers for 35 cycles. Nested PCR was performed with 5 µl of the first reaction and the internal primers for another 35 cycles. The products were digested, cloned into pBluescript (Agilent Technologies), and sequenced. Only clones with unique patterns of mutation were recorded.
Online supplemental material.
The supplemental materials and methods describe RNA isolation and Northern assays. Fig. S1 presents a model to compare polymerase density by ChIP and run-on assays. Fig. S2 shows the promoter analysis of RNA by Northern blots. Fig. S3 shows the location of mutations in the Sμ region. Fig. S4 displays the spectrum of mutations for each base in the S1–7 regions. Fig. S5 shows a potential model for AID and DNA pol η activity in the Sμ region. Table S1 lists the primers used for sequencing and run-on assays. Table S2 lists the primers used for ChIP analysis.
We thank Joshua Erlandsen, William Yang, Jinshui Fan, and Nicholas Durham for assistance and advice and the Comparative Medicine Section for mouse breeding.
This research was supported entirely by the Intramural Research Program of the National Institutes of Health National Institute on Aging.
The authors have no conflicting financial interests.
D. Rajagopal and R.W. Maul contributed equally to this paper.
T. Chakraborty's present address is Immune Disease Institute, Harvard Medical School, Boston, MA 02115.