How correlated are proximally related CpG sites in human DNA?

Cytosine residues in DNA that can be methylated (i.e. CpG sites) are likely to be in the same methylation state if they are physically (proximally) close together.

I can only find one paper that states this empirically (1): 90% of CpG sites within 50 bp of one another are in the same methylation state (see the graph below).

However, this study is now quite dated (at least by the standards of epigenetics) and is very general.

I would like to know whether CpGs in some regions of the genome are more or less likely to be correlated than those in other regions (say, exons vs. promoter regions). It would also be interesting to know whether the correlations change with age, and whether this is related to any disease processes.

Thanks for the input.


  1. Eckhardt et al., 2006. Nature Genetics. doi: 10.1038/ng1909

I've got to dash off so I won't be able to give a fully in-depth answer today, but this basically boils down to the concept of CpG islands. Something like 70-80% of CpGs are methylated in humans, so if they were randomly scattered around the genome there is already a pretty high chance nearby CpGs are in the same state. However, because of CpG islands, CpGs of similar state are indeed grouped together, depending on promoter activity (to simplify things a bit).
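To put a rough number on that baseline: if each CpG were methylated independently with probability p, two CpGs would share a state with probability p^2 + (1 - p)^2. The short sketch below assumes exactly that independence, which real clustering violates. It gives roughly 58-68% for p = 0.70-0.80, well short of the 90% reported in (1) for CpGs within 50 bp.

```python
def same_state_probability(p_methylated: float) -> float:
    """Probability that two CpGs share a methylation state, assuming each is
    methylated independently with probability p_methylated (no clustering)."""
    if not 0.0 <= p_methylated <= 1.0:
        raise ValueError("p_methylated must lie between 0 and 1")
    both_methylated = p_methylated ** 2
    both_unmethylated = (1.0 - p_methylated) ** 2
    return both_methylated + both_unmethylated


if __name__ == "__main__":
    for p in (0.70, 0.75, 0.80):
        print(f"p = {p:.2f}: expected concordance {same_state_probability(p):.3f}")
```

The gap between this independence baseline and the observed concordance is a simple measure of how much co-methylation adds beyond the genome-wide methylation level.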

That being said, if you look for articles that cite your article, a plethora of useful references show up. In particular, this bad boy has a handy figure 1 which displays the overall landscape. Maybe that's too broad of a view, but in general, similar methylation states cluster together; otherwise, there'd be no such thing as a methylation pattern! This paper has a neat figure which sums it all up nicely:

Basically, it's usually high-methylation or low-methylation; rarely is there anything in between. There's a lot more in there so I'd recommend going through some of them; I'll try to get back to this later.


Correlation of Smoking-Associated DNA Methylation Changes in Buccal Cells With DNA Methylation Changes in Epithelial Cancer

Importance: The utility of buccal cells as an epithelial source tissue for epigenome-wide association studies (EWASs) remains to be demonstrated. Given the direct exposure of buccal cells to potent carcinogens such as smoke, epigenetic changes in these cells may provide insights into the development of smoke-related cancers.

Objective: To perform an EWAS in buccal and blood cells to assess the relative effect of smoking on the DNA methylation (DNAme) patterns in these cell types and to test whether these DNAme changes are also seen in epithelial cancer.

Design, setting, and participants: In 2013, we measured DNAme at more than 480,000 CpG sites in buccal samples provided in 1999 by 790 women (all aged 53 years in 1999) from the United Kingdom Medical Research Council National Survey of Health and Development. This included matched blood samples from 152 women. We constructed a DNAme-based smoking index and tested its sensitivity and specificity to discriminate normal from cancer tissue in more than 5000 samples.

Main outcomes and measures: CpG sites whose DNAme level correlates with smoking pack-years, and construction of an associated sample-specific smoking index, which measures the mean deviation of DNAme at smoking-associated CpG sites from a normal reference.
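A "mean deviation of DNAme at smoking-associated CpG sites from a normal reference" can be made concrete as an average of standardized differences. The sketch below is one plausible reading, not the authors' published formula. It z-scores each smoking-associated CpG against a normal reference panel and flips the sign for sites that lose methylation with smoking, so that a smoking-like change always counts as positive. It then averages the z-scores. The CpG identifiers and beta values are hypothetical.

```python
from statistics import mean, stdev


def smoking_index(sample, reference, directions):
    """Hypothetical smoking index for one sample.

    sample:     {cpg_id: beta value} for the sample being scored
    reference:  {cpg_id: [beta values from normal reference samples]}
    directions: {cpg_id: +1 if the site gains methylation with smoking, -1 if it loses it}
    Returns the mean signed z-score across the smoking-associated CpGs.
    """
    z_scores = []
    for cpg, sign in directions.items():
        ref_values = reference[cpg]
        ref_mean, ref_sd = mean(ref_values), stdev(ref_values)
        if ref_sd == 0:
            continue  # a site with no reference variability cannot be standardized
        # Orient each deviation so that a smoking-like change is always positive.
        z_scores.append(sign * (sample[cpg] - ref_mean) / ref_sd)
    return mean(z_scores)


if __name__ == "__main__":
    reference = {"cg_hyper": [0.2, 0.3, 0.4], "cg_hypo": [0.6, 0.7, 0.8]}
    directions = {"cg_hyper": +1, "cg_hypo": -1}
    print(smoking_index({"cg_hyper": 0.5, "cg_hypo": 0.5}, reference, directions))
```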

Results: In a discovery set of 400 women, we identified 1501 smoking-associated CpG sites at a genome-wide significance level of P < 10⁻⁷, which were validated in an independent set of 390 women. This represented a 40-fold increase of differentially methylated sites in buccal cells compared with matched blood samples. Hypermethylated sites were enriched for bivalently marked genes and binding sites of transcription factors implicated in DNA repair and chromatin architecture (P < 10⁻¹⁰). A smoking index constructed from the DNAme changes in buccal cells was able to discriminate normal tissue from cancer tissue with a mean receiver operating characteristic area under the curve of 0.99 (range, 0.99-1.00) for lung cancers and of 0.91 (range, 0.71-1.00) for 13 other organs. The corresponding area under the curve of a smoking signature derived from blood cells was lower than that derived from buccal cells in 14 of 15 cancer types (Wilcoxon signed rank test, P = .001).
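The area under the curve (AUC) values quoted above have a simple interpretation. An AUC is the probability that a randomly chosen cancer sample scores higher on the index than a randomly chosen normal sample. The minimal sketch below uses that Mann-Whitney formulation with made-up scores. It illustrates the statistic only and is not the authors' analysis code.

```python
def roc_auc(case_scores, control_scores):
    """ROC AUC computed as the Mann-Whitney probability that a randomly chosen
    case scores higher than a randomly chosen control (ties count one half)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    cancer = [2.1, 1.8, 0.9]   # hypothetical smoking-index values
    normal = [0.2, 0.5, 1.0]
    print(roc_auc(cancer, normal))  # 8 of the 9 case-control pairs are ordered correctly
```

An AUC of 0.99 therefore means that almost every cancer/normal pair is ranked correctly, while 0.5 is no better than chance.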

Conclusions and relevance: These data point toward buccal cells as being a more appropriate source of tissue than blood to conduct EWASs for smoking-related epithelial cancers.


Spatiotemporal specificity of correlated DNA methylation and gene expression pairs across different human tissues and stages of brain development

DNA methylation (DNAm) that occurs on promoter regions is primarily considered to repress gene expression. Previous studies indicated that DNAm could also show positive correlations with gene expression. Both DNAm and gene expression profiles are known to be tissue- and development-specific. This study aims to investigate how DNAm and gene expression are coordinated across different human tissues and developmental stages, as well as the biological significance of such correlations. By analyzing 2,239 samples with both DNAm and gene expression data in the same human subjects obtained from six published datasets, we evaluated the correlations between gene and CpG pairs (GCPs) at cis-regions and compared significantly correlated GCPs (cGCPs) across different tissues and across brains of different age groups. A total of 37,363 cGCPs were identified in the six datasets; approximately 38% of the cGCPs were positively correlated. The majority (>90%) of cGCPs were tissue- or development-specific. We also observed that the correlation direction can be opposite in different tissues and ages. Further analysis highlighted the importance of cGCPs for their cellular functions and potential roles in complex traits and human diseases. For instance, early developmental brain possessed a highly unique set of cGCPs that were associated with neurogenesis and psychiatric disorders. By assessing the epigenetic factors involved in cGCPs, we discovered novel regulatory mechanisms of positive cGCPs, distinct from those of negative cGCPs, which were related to multiple factors such as H3K27me3, CTCF and JARID2. The catalog of cGCPs compiled can be used to guide functional interpretation of genetic and epigenetic studies.
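At its core, finding correlated gene-CpG pairs means computing a correlation across subjects for each cis pair of CpG methylation and gene expression, then keeping the strong ones and noting their sign. The minimal sketch below uses a fixed |r| >= 0.5 cutoff, which is a hypothetical stand-in for the study's significance testing. The cis pair list is likewise assumed to be given rather than derived from a genomic window.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def classify_gcps(methylation, expression, cis_pairs, r_cutoff=0.5):
    """Label each (gene, cpg) pair by the sign of a strong correlation.

    methylation: {cpg_id: [value per subject]}
    expression:  {gene_id: [value per subject]}, same subject order
    cis_pairs:   iterable of (gene_id, cpg_id) pairs considered to be in cis
    r_cutoff:    hypothetical |r| threshold standing in for a significance test
    """
    labels = {}
    for gene, cpg in cis_pairs:
        r = pearson(methylation[cpg], expression[gene])
        if r >= r_cutoff:
            labels[(gene, cpg)] = "positive"
        elif r <= -r_cutoff:
            labels[(gene, cpg)] = "negative"
        else:
            labels[(gene, cpg)] = "not correlated"
    return labels
```

Running the same classification separately per tissue or age group and comparing the labels is what makes tissue-specific or direction-reversing pairs visible.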


Results

General approach for covalent capture of genomic CpG sites

Previously, we demonstrated that the HhaI DNA cytosine-5 methyltransferase, which serves as a structural and mechanistic paradigm for this class of enzymes, can be engineered to direct efficient transfer of extended linear groups from synthetic AdoMet analogues onto GCGC sites in DNA 23,24. Following this concept, we engineered the CpG-specific cytosine-5 MTase SssI (M.SssI) 25 by site-directed mutagenesis of two conserved positions in the cofactor-binding pocket (Q142A/N370A). The recombinant His-tagged protein was expressed in E. coli and isolated in an AdoMet-free form (Supplementary Methods). We also synthesized a series of optimized AdoMet analogues that contain a sulfonium-bound 6-substituted hex-2-ynyl side chain 26. The engineered M.SssI (eM.SssI) showed more than 100-fold higher alkylation activity with these synthetic cofactors than native M.SssI (Supplementary Fig. S1). As MTase-directed reactions are highly specific with respect to the target sequence, the modified residue and the atomic position 22,27, M.SssI-directed mTAG labelling selectively tags all unmodified and hemimethylated CpG sites 25 and excludes methylated target sites (5mCpG) in gDNA.

In this study, two biotin conjugation chemistries were explored (Fig. 1b). In the first series, we used conventional chemoselective coupling of a primary aliphatic amine group, which is absent from native DNA, with a biotin probe carrying an N-hydroxysuccinimide (NHS) group (Supplementary Fig. S2). Alternatively, we used a fully bioorthogonal copper-free click chemistry, namely Huisgen 1,3-dipolar cycloaddition of an azide to a ring-strained alkyne, dibenzocyclooctyne (DBCO) 28. In both cases, the biotin linker contained a cleavable S-S bond to facilitate detachment of the captured DNA fragments.

Fig. 1. (a) Flow diagram of the analytical procedure. gDNA is randomly sheared to short fragments (Step 1) and treated with an engineered SssI DNA methyltransferase (eM.SssI) and a cofactor analogue (Ado-6-amine or Ado-6-azide) to attach reactive groups to unmodified CpG sites (Step 2). The derivatized target sites are biotin-tagged using N-hydroxysuccinimidyl ester (Biotin-SS-NHS) (Step 3), and labelled fragments are selectively captured on streptavidin-coated magnetic beads (Step 4). Bound DNA fragments are recovered by cleavage of a disulphide bond in the biotin linker with DTT (Step 5). The enriched fragments are ligated to adaptors and PCR-amplified (Step 6) for microarray analysis or DNA sequencing (Step 7). (b) Covalent transformations during derivatization, biotin tagging and linker cleavage (Steps 2, 3 and 5) using amine-NHS (upper) or azide-DBCO (lower) conjugation chemistries (one of the two triazole regioisomers formed in Step 3 is shown). (c) Affinity capture of DNA fragments containing unmodified CpG sites. Reference DNA fragments containing two or no CpG sites (2-CG and 0-CG, respectively) were each combined with 300 ng of sonicated human gDNA and processed as described (Steps 1–4) using the amine or azide conjugation chemistries as indicated. The efficiency of CpG capture was assessed by on-bead qPCR analysis of the reference DNA fragments. Error bars represent ±s.d. from duplicate experiments.

MTAG labelling using amine-NHS chemistry

The general procedure for the enrichment of unmodified DNA consisted of five steps (Fig. 1a). Human gDNA was sonicated to yield short (50–300 bp) fragments (Step 1), aminoalkylated with eM.SssI and Ado-6-amine (Step 2), biotinylated with the amine-reactive reagent NHS-SS-biotin (Step 3) and captured on streptavidin beads (Step 4). In initial control experiments that measured the levels of labelling and DNA capture (Steps 2–4) by quantitative PCR (qPCR), we designed a series of 200–230 bp DNA fragments containing 0, 1, 2 or 4 unmodified CpG sites (probes 0-CG, 1-CG, 2-CG and 4-CG, respectively; Fig. 2a). These DNA fragments were tested individually and as spike-ins to sonicated gDNA samples. Robust mTAG labelling produced CpG capture efficiencies of around 90%, whereas a 0-CG spike was detectable at a level of ~1% (Fig. 1c). In subsequent experiments designed to optimize the procedure, three levels of mTAG labelling intensity (as determined by the streptavidin capture efficiency of the 2-CG probe) were explored: low (5–10% capture), medium (20–35% capture) and high (60–80% capture). At medium labelling, the capture of the model DNA fragments correlated linearly with the number of unmodified CpG sites (Fig. 2b), an effect that persisted in the presence of methylated DNA fragments (Supplementary Fig. S3) and upon deep dilution with native gDNA (Fig. 2c). Efficient recovery of streptavidin-bound DNA (Step 5) was achieved by mild chemical cleavage of a disulphide bond in the biotin connector with dithiothreitol (DTT). The released DNA fragments retain only part of the original linear side chain attached to the labelled cytosine residues (Fig. 1b and Supplementary Fig. S2), which did not interfere with downstream PCR amplification (Supplementary Fig. S4).
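Why should capture scale linearly with CpG number at medium labelling but saturate at high labelling or high CpG density? One simple way to see it uses our own idealization rather than a model given in the text. Assume each unmodified CpG is labelled independently with probability p, and that one biotin is enough to capture a fragment. Then P(capture) = 1 - (1 - p)^n, which is approximately n·p while n·p stays small. The sketch back-calculates p from the 2-CG capture level, using illustrative mid-range values of 7.5%, 25% and 70%. It then prints the predicted capture for 0 to 8 sites. Bead losses and the small 0-CG background are ignored.

```python
def capture_probability(n_sites: int, p_label: float) -> float:
    """Chance that a fragment carries at least one biotin tag, assuming each of
    its n_sites unmodified CpGs is labelled independently with probability p_label
    and that any single tag is enough for capture (a simplifying assumption)."""
    return 1.0 - (1.0 - p_label) ** n_sites


def per_site_rate(capture_of_2cg: float) -> float:
    """Back-calculate p_label from the observed capture of the 2-CG probe."""
    return 1.0 - (1.0 - capture_of_2cg) ** 0.5


if __name__ == "__main__":
    for label, capture_2cg in (("low", 0.075), ("medium", 0.25), ("high", 0.70)):
        p = per_site_rate(capture_2cg)
        curve = [round(capture_probability(n, p), 3) for n in (0, 1, 2, 4, 8)]
        print(label, round(p, 3), curve)
```

Under this model, the medium setting gives roughly proportional capture for 1 to 4 sites. The high setting is already close to saturation at 4 sites, which is consistent with the choice of medium labelling for quantitative work.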

Fig. 2. (a) Schematic of nonspecific (containing no CpG sites, 0-CG) and specific (containing 1, 2 or 4 CpG sites) DNA probes derived from the mouse genome for quantification of DNA by TaqMan qPCR. CpG sites are shown in black and qPCR primer locations are shown as arrows; 'm' denotes a premethylated CpG site. (b) DNA recovery through Steps 1–4. DNA probes (25 ng) as indicated were combined with 300 ng of sheared and blunt-ended gDNA and then mTAG-labelled at medium intensity (top, amine-NHS conjugation chemistry; bottom, azide-DBCO conjugation chemistry). DNA was further processed as described in Methods and the amount of captured DNA was determined by qPCR analysis. (c) DNA recovery through Steps 1–4 in a series of 10-fold dilutions (1:10 to 1:1,000,000). A specified amount of the 2-CG probe was combined with 300 ng of sheared gDNA and mTAG-labelled at medium intensity (top, amine-NHS conjugation chemistry; bottom, azide-DBCO conjugation chemistry). DNA was further processed as described and the amount of captured DNA was determined by qPCR analysis. Error bars represent ±s.d. from duplicate experiments.

To assess the performance of this new technology for epigenome studies, we carried out further control experiments with sheared human gDNA from which all cytosine modifications had been stripped by PCR amplification. These 'fully unmodified' DNA samples were mTAG-labelled, with separate aliquots for the low, medium and high labelling intensities, and bound to streptavidin beads. The enriched DNA samples were PCR-amplified and analysed on a human genome tiling microarray (the E array, covering chromosomes 5, 7 and 16, from the Affymetrix 2.0R whole-genome human tiling microarray set). Probe intensities for chromosome 5 (2.6 million probes covering 1,013 genes and 1,227 CpG islands) were scale-normalized and averaged. Optimal performance was again observed at the medium labelling intensity of around 25%, although in general, variations in the range of 10–80% produced only small changes in the mean signal profile (Fig. 3a). The mean log array signal intensity increased linearly with local CpG density (r = 0.99) from 0 to 10 unmethylated CpG sites per 200 bp fragment (Fig. 3a) and then reached a plateau at higher CpG densities. Given that 87% of the genome contains 10 or fewer CpGs per 200 bp (Fig. 3b) and only 20–30% of CpGs are unmodified 29, the overwhelming majority of native gDNA fragments should fall within the linear range of mTAG labelling and interrogation.
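The 87% figure comes from tallying CpGs in 200-bp windows across the genome. The minimal sketch below shows that tally on a toy sequence. It assumes non-overlapping windows, drops a trailing partial window and ignores CpGs that straddle a window boundary. These simplifications are ours and may differ from how Fig. 3b was computed.

```python
def cpg_counts_per_window(sequence: str, window: int = 200):
    """Count CpG dinucleotides in consecutive, non-overlapping windows.

    A trailing partial window is dropped, and a CpG straddling two windows is
    not counted in either (simplifications for illustration)."""
    seq = sequence.upper()
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        # "CG" cannot overlap itself, so str.count finds every CpG in the window
        counts.append(seq[start:start + window].count("CG"))
    return counts


def fraction_at_most(counts, threshold: int = 10) -> float:
    """Fraction of windows with at most `threshold` CpGs."""
    if not counts:
        raise ValueError("no complete windows")
    return sum(1 for c in counts if c <= threshold) / len(counts)


if __name__ == "__main__":
    toy = "CG" * 15 + "A" * 170 + "ACGT" * 50  # two 200-bp windows
    counts = cpg_counts_per_window(toy)
    print(counts, fraction_at_most(counts))
```

Applied to a real chromosome sequence, the fraction of windows at or below 10 CpGs corresponds to the share of fragments that fall in the linear part of the mTAG response.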

Fig. 3. Modification-devoid human gDNA (prepared by ligating double-stranded adaptors to sheared gDNA, followed by PCR amplification for 15 cycles as described in Methods) was mTAG-labelled using amine-NHS chemistry at different labelling intensities (low, LL; medium, ML; high, HL), enriched on streptavidin beads and analysed on DNA microarrays representing chr5. (a) Mean log ratios of microarray intensities, normalized to an unlabelled control (0% labelling intensity) and plotted against the number of CpG sites in 200 bp windows. (b) Distribution of CpG sites in 200 bp fragments of different CpG content on human chromosome 5. (c) Box plot of the array intensity log ratios for an ML experiment plotted against the number of CpG target sites. The bottom and top of each box mark the first and third quartiles; whiskers mark the lowest and highest data points within 1.5 times the interquartile range.

Next, we analysed native gDNA samples using mTAG-based enrichment coupled with interrogation on tiling arrays (mTAG-chip). mTAG enrichment of the DNA unmethylome from human tissues showed low technical variation (typical correlations of 0.89–0.93). Analysis of the mTAG-chip profiles consistently detected known unmethylated genomic regions (Supplementary Fig. S5) that showed strong associations with histone acetylation and H3K4 methylation, marks of active promoters and functional enhancers 30 (Supplementary Fig. S6).

We also performed a rigorous quantitative comparison of the mTAG approach with published data sets from methylation-sensitive restriction enzyme sequencing (MRE-seq), MBD-seq and MeDIP-seq experiments 8, using the IMR90 and H1 MethylC-seq maps 3 as the gold standard. For this, fetal lung fibroblast (IMR90) gDNA (gift of R. Lister) was processed with the mTAG procedure. Correlation analyses with the IMR90 MethylC-seq map at sequencing depths of >5, >10 or >15 reads (with effective genome coverage of 62%, 35% and 19%, respectively) were carried out for 1,000, 400 or 200 bp windows and were stratified across deciles of local CpG density (Fig. 4a; Supplementary Fig. S7). Depending on the decile, correlation coefficients varied from 0.14 to 0.31 (Fig. 4a). For the MRE-seq, MBD-seq and MeDIP-seq experiments on H1 human embryonic stem cells 8, compared with the corresponding reference MethylC-seq map 3, correlations were close to 0 except in the highest CpG density decile (Fig. 4a). Altogether, mTAG-chip outperformed the other methods in 8 or even 9 CpG density deciles, representing 80–90% of the human genome and 50–68% of all CpGs (Supplementary Table S2). Similarly, a concordance analysis showed that mTAG-chip performs better than MeDIP-chip (Fig. 4b). Notably, the overall precision in mapping the DNA methylome increases significantly in an 'integrative' mTAG/MeDIP-chip experiment, even in regions of higher CpG density, where both techniques are similarly faithful.
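Stratifying by CpG-density decile means sorting tiles by local CpG density, cutting them into ten equal-sized groups and computing the correlation separately within each group. The minimal sketch below does this in plain Python. The tile values are hypothetical, and ties in density are split arbitrarily by the sort. Both are assumptions that may differ from the authors' binning.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns nan if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def correlations_by_density_decile(cpg_density, assay_signal, reference_level):
    """Split tiles into ten equal-sized groups by CpG density and correlate the
    enrichment signal with the reference methylation level within each group."""
    order = sorted(range(len(cpg_density)), key=lambda i: cpg_density[i])
    n = len(order)
    per_decile = []
    for d in range(10):
        idx = order[d * n // 10:(d + 1) * n // 10]
        per_decile.append(pearson([assay_signal[i] for i in idx],
                                  [reference_level[i] for i in idx]))
    return per_decile
```

Stratifying this way keeps the comparison from being dominated by the few CpG-dense tiles, where every method tends to agree.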

Fig. 4. (a) Pearson correlations between experimental mTAG-chip (amine-NHS chemistry) and MeDIP-chip data for IMR90 gDNA, and published MeDIP-seq, MBD-seq and MRE-seq data for H1 gDNA 8, were determined for 1 kb tiles on chr4 against the corresponding MethylC-seq data 3 and stratified according to local CpG density. Mean log ratios of probes (chip data) or mean numbers of reads (seq data) in each tile were calculated and correlated with mean methylation scores from the MethylC-seq data (minimum 10 reads) using Pearson correlation. Tiles with missing values were excluded, and non-CG methylation sites in the H1 MethylC-seq data were removed before averaging and correlation with the MRE-seq and MBD-seq data. Aggregate correlation coefficients (r) obtained with each analytical procedure are shown above the plots. (b) Three-way concordance analysis of mTAG-chip and MeDIP-chip approaches with MethylC-seq. Mean log ratios of the probes in 1 kb tiles were calculated and the methylation type of each tile was defined as follows: weak methylation, signal ≤25% of the signal distribution; partial methylation, 25% < signal < 75%; high methylation, signal >75%. A tile was scored as concordant with the bisulfitome data (at >5 reads, covering human chr4, chr15 and chr18) if its type matched that of the MethylC-seq call; random calls give a concordance of ~0.375. Data are stratified according to the number of CpG sites per tile. (c) Correlation of mTAG-chip and MRE-chip versus mTAG-seq using Gaussian kernel smoothing. Gaussian kernel smoothing was used to examine Pearson correlations between mTAG-seq and mTAG-chip (click chemistry) or MRE-chip data representing human brain DNA unmethylomes. In both cases, the correlation increases with kernel bandwidth until it reaches a plateau at a bandwidth of around 1.9 kb for mTAG-seq and 3.1 kb for mTAG-chip data.
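The 0.375 chance level in panel (b) follows from the quartile-based classes. If both callers split tiles 25% weak, 50% partial and 25% high and call independently, they agree with probability 0.25^2 + 0.50^2 + 0.25^2 = 0.375. The sketch below reproduces that arithmetic and the tile-classification rule. It assumes that a signal exactly at the 75th percentile counts as "high", since the caption leaves that boundary open. It uses Python's `statistics.quantiles` for the cut points.

```python
from statistics import quantiles


def methylation_calls(tile_signals):
    """Call each tile weak/partial/high from its position in the signal distribution."""
    q25, _, q75 = quantiles(tile_signals, n=4)
    calls = []
    for s in tile_signals:
        if s <= q25:
            calls.append("weak")
        elif s < q75:
            calls.append("partial")
        else:
            calls.append("high")  # assumption: a tie at the 75th percentile counts as high
    return calls


def concordance(calls_a, calls_b):
    """Fraction of tiles whose calls agree."""
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


def random_concordance(proportions=(0.25, 0.50, 0.25)):
    """Expected agreement when two independent callers share the same class proportions."""
    return sum(p * p for p in proportions)


if __name__ == "__main__":
    print(random_concordance())  # 0.375
```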

The large difference in correlation between mTAG-chip and the MRE-, MeDIP- and MBD-sequencing-based methods prompted us to check whether the observed differences arose from the distinct platforms. We therefore performed DNA methylome analysis of the IMR90 gDNA using MeDIP-chip. The MeDIP-chip correlations were lower than those of mTAG-chip but significantly higher than those of the MRE-seq, MBD-seq and MeDIP-seq data sets (Fig. 4a; Supplementary Fig. S7). The reasons for the low correlation (<0.4) between MethylC-seq and the enrichment-based microarray and sequencing data sets are not entirely clear. In part, it may derive from insufficient fold coverage of the methylome in the MethylC-seq experiment (only 19% of the genome covered at >15 reads), which is required to offset the inherent unevenness of DNA sequencing 31. Depending on the degree of intra-individual variation in DNA modification, the minimal coverage may vary dramatically from locus to locus and may often require as many as 50–60 reads 32.
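Why can 15 reads be too few? Suppose each read at a CpG is treated as an independent draw. This binomial model is our simplification and ignores the sequencing unevenness mentioned above. The standard error of a methylation level m estimated from n reads is then sqrt(m(1 - m)/n). At m = 0.5, that is about 0.13 with 15 reads. Bringing it down to 0.07 takes 52 reads, the same range as the 50–60 reads cited above. A sketch:

```python
from math import ceil, sqrt


def methylation_se(level: float, reads: int) -> float:
    """Binomial standard error of a methylation level estimated from `reads` reads."""
    return sqrt(level * (1.0 - level) / reads)


def reads_needed(level: float, target_se: float) -> int:
    """Smallest read depth giving a binomial standard error of at most target_se."""
    return ceil(level * (1.0 - level) / target_se ** 2)


if __name__ == "__main__":
    for depth in (5, 10, 15, 50):
        print(depth, round(methylation_se(0.5, depth), 3))
    print(reads_needed(0.5, 0.07))
```

Partially methylated loci (m near 0.5) are the most demanding, and any extra locus-to-locus variation pushes the required depth higher still.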

MTAG labelling using azide-alkyne cycloaddition

In the second part of the study, we introduced a bioorthogonal copper-free click reaction 28 for mTAG labelling (Fig. 1b, bottom). The analytical procedure remained essentially the same, except that a different AdoMet analogue (Ado-6-azide, Supplementary Fig. S1a) and a matching biotin reagent (DBCO-SS-biotin) were used in Step 2 and Step 3 (Fig. 1a), respectively. Control qPCR experiments using the 200–230 bp DNA fragments showed nearly identical labelling efficiency but 10-fold lower background labelling compared with the previous conjugation chemistry (Fig. 1c). Other technical parameters appeared identical for both chemistries (Fig. 2; Supplementary Fig. S3), except that amplification of the enriched fragments after chemical cleavage from the beads was slightly reduced in the latter approach (a 20–25% drop in amplification efficacy over two target sites modified at high labelling intensity; Supplementary Fig. S4). At medium labelling intensity, an effect of similar magnitude would be expected for fragments containing many (>4) unmethylated CpG sites. This slight impairment most likely arises because a larger chemical group remains attached to the DNA after chemical cleavage of the S-S bond in the biotin linker (Fig. 1b), which may impede a DNA polymerase during the initial cycles of PCR. However, no detectable gross effect was reported for the TAB-seq 12 and TAmC-seq 33 methods, in which an even bulkier linker group (glucose-azide-DBCO) remains in the released DNA.

We further adapted the click version of mTAG for large-scale studies using a 96-well plate format (Supplementary Methods) and analysed mTAG-enriched gDNA samples on microarrays. Comparisons between mTAG-chip and MeDIP-chip were made using gDNA from human brain and sperm. In our hands, samples enriched with the mTAG technique displayed better hierarchical clustering than MeDIP-enriched samples (Supplementary Fig. S8). The MeDIP and mTAG techniques target different CpG sites: methylated and unmodified, respectively. This was confirmed in a tiling microarray experiment examining chromosomes 10, 13, 14 and 17. As expected, negative correlations were observed between mTAG and MeDIP-chip probes with high signal intensities in the brain (r = −0.52) and sperm (r = −0.35) samples.

It was previously shown that certain bacterial C5-MTases can catalyse sequence-specific removal of the 5-hydroxymethyl group from an hmC residue, yielding unmodified C 34. If this also occurred under mTAG labelling conditions, dehydroxymethylated CpGs would be spuriously labelled as unmodified cytosines and enriched. In control experiments with a PCR-generated 190 bp DNA fragment containing a single modified CpG site, a low amount of labelling was detected at hmC, which became negligible in reactions at pH 6.5 (Supplementary Fig. S9). To examine whether this side reaction creates a measurable difference in human brain gDNA, which has a higher hmC content than other tissues 35,36, the DNA was treated with M.SssI for an extended period. hmC sites were measured before and after M.SssI treatment using an assay in which hmC glucosylation was coupled with MspI restriction enzyme digestion 37. The microarray data (chromosomes 1 and 6; Supplementary Fig. S10) showed that the number of hmC sites detected in M.SssI-exposed and control samples was the same within error, suggesting a negligible contribution, if any, of this side reaction to genomic analyses. Although caC sites also show some labelling in control experiments (Supplementary Fig. S9), their extremely low abundance in gDNA 36 should not affect routine epigenomic studies.

We further assessed DNA unmethylome profiles in post-mortem human brain using the mTAG-seq approach. In this experiment, ~100 million reads per sample were generated. Biological and technical replicates were similar to each other but clearly distinct from a non-enriched control in all validation measures, including epigenome-wide pairwise correlation, mapping statistics, standard browser views and whole-chromosome profiles (Supplementary Figs S11–S14). The mTAG-seq results were also compared globally with those of mTAG-chip and MRE-chip experiments performed on the same brain gDNA samples. Using Gaussian kernel smoothing, which takes regional DNA modification effects into account, we found that the correlation between the microarray and mTAG-seq data sets gradually increased and reached a maximum (r = 0.78 and 0.49 for mTAG-chip and MRE-chip, respectively) when the kernel bandwidth expanded to 1.9 kb for mTAG-seq and 3.1 kb for the mTAG-chip data (Fig. 4c). The kernel bandwidths in the mTAG-seq and mTAG-chip experiments distributed 50% of their weight in a window of … 1.5 kb, respectively, which likely reflects an inherent 1–2 kb CpG co-methylation in the human genome 38. Correlation coefficients between mTAG-seq and mTAG-chip varied for regions of different GC content (r = 0.47, 0.85 and 0.75 for probes with low, medium and high GC content, respectively; Supplementary Fig. S15). MRE-chip showed a substantially weaker overall correlation with mTAG-seq (r = 0.44).
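Gaussian kernel smoothing replaces each bin's value with a weighted average of its neighbours, with weights falling off as a Gaussian function of distance. Locus-level noise therefore averages out, and the correlation between two tracks rises with bandwidth until it plateaus. The text does not specify the exact smoother. This sketch assumes a simple Nadaraya-Watson average with the bandwidth used as the Gaussian's standard deviation, and the toy tracks are made up.

```python
from math import exp, sqrt


def gaussian_smooth(positions, values, bandwidth):
    """Nadaraya-Watson smoothing: replace each value by a Gaussian-weighted
    average of all values, with `bandwidth` as the kernel's standard deviation
    in the same units as `positions` (e.g. base pairs)."""
    smoothed = []
    for x0 in positions:
        weights = [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in positions]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_vs_bandwidth(positions, track_a, track_b, bandwidths):
    """Correlation between the two smoothed tracks for each candidate bandwidth."""
    return {
        h: pearson(gaussian_smooth(positions, track_a, h),
                   gaussian_smooth(positions, track_b, h))
        for h in bandwidths
    }


if __name__ == "__main__":
    positions = [i * 100 for i in range(100)]      # hypothetical 100-bp bins
    trend = list(range(100))                       # shared regional signal
    noise = [30 * (-1) ** i for i in range(100)]   # bin-level disagreement
    a = [t + e for t, e in zip(trend, noise)]
    b = [t - e for t, e in zip(trend, noise)]
    print(correlation_vs_bandwidth(positions, a, b, [1, 200, 500, 1000]))
```

In the toy example, the raw bins barely correlate, but the shared regional trend dominates once the bandwidth spans several bins. That is the behaviour described for the mTAG-seq comparisons.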

Consistent with the ability of mTAG to enrich for unmethylated genomic regions, the relative mTAG-seq density in the brain across different gene-associated regions showed features inverse to those of the modified-cytosine maps generated by bisulfite sequencing of the H1 cell line 3 (Fig. 5a). At the level of individual genes, mTAG-seq profiles of typical protein-coding genes showed that promoters and CpG islands were unmethylated, consistent with high MRE-seq and no MeDIP-seq signal (Fig. 5b). We also demonstrated that the mTAG approach can identify and map unmodified genomic retroelements. L2b, a non-long terminal repeat retrotransposon (chr19: 41257434–41257808, 11 CpGs), appeared to be unmethylated in the brain, as supported by the lack of MeDIP reads and a detectable peak in the MRE profile (Fig. 5c). Another example, MLT1B, a mammalian long terminal repeat retrotransposon (chr14: 106804419–106804800, 3 CpGs), was not detectable by MRE owing to the scarcity of suitable restriction endonuclease sites and had a very weak MeDIP signal (Fig. 5c). In both cases, strong mTAG signals provided a positive display of these fairly abundant, differentially modified DNA retroelements. A more complex situation is shown in Fig. 5d, where seven SINE elements were marked by both mTAG-seq and MeDIP-seq in the brain samples, suggesting intra-individual epigenetic variation.

Fig. 5. (a) mTAG-seq and MethylC-seq profiles over a mega-gene. Left, mTAG-seq signal density (unmethylome) displayed with GenePlot 50 across different gene-associated regions. Gene annotations were obtained from the UCSC Genome Browser. The promoter was defined as 1 kb upstream of the transcription start site. Promoters, transcription start sites and 5′-UTRs all displayed increased mTAG-seq signal compared with gene bodies, consistent with hypomethylation of these regions. Right, MethylC-seq profile (methylome) of H1 human embryonic stem cells over a composite gene (replot of Fig. 3a from Lister et al. 3). (b) mTAG-seq profiles of two typical protein-coding genes. Genome browser view of mTAG-seq data over typical genes compared with MeDIP-seq and MRE-seq of the brain 51. The upper panel shows MLH1 and the lower panel shows SHANK3. Promoters of both genes were unmethylated, consistent with high mTAG-seq, high MRE-seq and no MeDIP-seq signal. The gene body of SHANK3 contained several unmethylated CpG islands, which exhibited high mTAG-seq signal. (c) Unmethylated retroelements revealed by mTAG-seq. Genome browser view of two retroelements. The upper panel shows L2b, which was marked by high mTAG-seq, high MRE-seq and low MeDIP-seq signal; these three methods agree in supporting the unmethylated status of the element. The lower panel shows MLT1B, which was marked by high mTAG-seq and low MeDIP-seq signal, indicating a hypomethylated status. MRE-seq produced no signal over this region owing to the low abundance of restriction endonuclease sites. (d) Partially methylated retroelements revealed by mTAG-seq. Genome browser view of seven SINE elements. These elements were consistently marked by mTAG-seq in two brain samples; the same elements displayed a MeDIP-seq signal, altogether suggesting that they were partially methylated (contained both unmodified and methylated CpGs). A non-enriched input track (mTAG Input) is also shown, indicating that the co-enrichment of MeDIP-seq and mTAG-seq signals over these SINE elements was not an artifact of sequence alignment.


Discussion

KLF7 is a highly conserved gene in humans and animals [23, 24]. Previous reports in mammalian species showed that KLF7 regulates neuroectodermal and mesodermal development [2] and plays a role in obesity [6], T2DM [7, 8] and blood disease [9]. Our previous study showed that KLF7 is an important regulator of chicken adipose tissue development [17, 18]. In the current study, KLF7 transcript levels in the adipose tissue of Chinese fast-growing yellow broilers were associated with age, in line with our previous report in white broilers [17]. Although there was no significant difference in KLF7 transcript levels among weeks 2, 4 and 6, there was a downward trend at week 6. The decline in KLF7 expression over weeks 2, 4 and 6 might reflect the suppressive function of KLF7 in adipose tissue formation at the early stage [17]. In addition, the increase in KLF7 transcripts at 8 weeks of age suggests that chicken KLF7 might have a function in mature adipose tissue, similar to reports on its human orthologue in vitro [7, 8].

Blood glucose levels are variable and associated with insulinaemia in chickens [25]. Here, chicken KLF7 transcript levels were correlated with fasting blood glucose, in line with our previous reports in white broilers that chicken KLF7 is involved in the regulation of adipogenesis and blood metabolic indicators [17, 18]. This provides additional evidence, from a non-rodent model animal, for a role of KLF7 in metabolic syndrome. In addition, a previous study showed that chickens selected for low fasting glycaemia (LG) were fattier than their counterparts selected for high fasting glycaemia (HG) [25]. The negative correlations of KLF7 expression with fasting glycaemia and with abdominal fat content in chickens [17, 18] suggest that a feedback regulation among KLF7 expression, glycaemia and obesity might exist in chickens (Fig. 6).

Fig. 6. Schematic representation of the relationships among DNA methylation, KLF7 transcripts, glycaemia and abdominal fat content in chickens. The negative associations marked in red and blue were confirmed by references [17] and [25], respectively.

DNA methylation is important for the regulation of gene expression and function in animals [19]. Previous studies in humans showed that DNA methylation of KLF7 was associated with the occurrence and development of gastric cancer [20, 21]; however, there are no reports on KLF7 DNA methylation in adipose tissue or in birds. Sequence analysis showed a similar distribution of CpG-rich sequence in chicken KLF7 as in human KLF7, suggesting that chicken KLF7 might also be regulated by DNA methylation.

Sequenom MassARRAY was used to study DNA methylation in the promoter and Exon 2 of chicken KLF7. A total of 22 valid datasets were obtained, and the level of DNA methylation in the promoter was lower than that in Exon 2. This is probably because KLF7 is continuously expressed during adipogenesis [17]; therefore, the promoter of chicken KLF7 could not be strongly silenced in adipose tissue by a long-term mechanism such as DNA methylation.

In addition, none of the loci examined differed significantly in methylation among ages 2, 4, 6, and 8 weeks, indicating that DNA methylation might not be the main mechanism regulating KLF7 expression in adipose tissue during development.

Association analysis showed that only methylation at PCpG6 was significantly associated with KLF7 transcripts in chicken abdominal adipose tissue, and the contribution of PCpG6 to the variation in KLF7 transcripts was 0.2545. Sequence analysis showed several transcription factor binding sites at the PCpG6 locus, including TFAP2C and TFAP2A (supplementary Table 1). The negative correlation between PCpG6 methylation and KLF7 transcripts might therefore be mediated by transcription factors; however, further investigation is needed to verify this hypothesis.
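The "contribution to the variation" reported here reads most naturally as the coefficient of determination (R²) of a simple linear regression of KLF7 transcripts on methylation at one CpG site; that interpretation is an assumption, since the excerpt does not name the statistic. The Python sketch below computes R² that way. The function name `contribution_r2` and all input values are hypothetical, not the study's data or software.

```python
import numpy as np


def contribution_r2(methylation, expression):
    """Coefficient of determination (R^2) of expression regressed on methylation.

    Assumption: "contribution to the variation" is read as R^2 of a simple
    linear regression with one CpG site as the only predictor.
    """
    x = np.asarray(methylation, dtype=float)
    y = np.asarray(expression, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # least-squares straight line
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only; these are not the study's data.
pcpg6_methylation = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41]
klf7_expression = [1.9, 1.7, 1.8, 1.4, 1.5, 1.2]
r2 = contribution_r2(pcpg6_methylation, klf7_expression)
print(f"R^2 = {r2:.4f}")
```

With a single predictor, this R² equals the squared Pearson correlation, so a negative association such as the one reported for PCpG6 still yields a positive contribution.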

Our previous studies showed that one SNP (c.A141G) in the KLF7 coding sequence was associated with blood very-low-density lipoprotein and abdominal fat content in broilers [18], and that chicken KLF7 regulated the promoter activity of lipoprotein lipase (LPL) [17]. In the current study, methylation of E2CpG9 was significantly associated with blood HDL level, further suggesting that KLF7 might play a role in fat transport in chickens. In addition, E2CpG9 is conserved between chicken and human KLF7 (supplementary Figure 1C), so this result might provide a clue to the function of KLF7 in humans.

To avoid misinterpreting interaction effects among different loci and to examine the relationship between overall DNA methylation and KLF7 transcripts, the methylation data were subjected to principal component analysis (PCA). Fourteen effective principal components (z1–z14) were extracted from these 22 methylation datasets. Six of these 14 principal components (z1–z6) were then retained by factor extraction and named Factors 1–6, respectively.

Factor analysis showed that Factor 1 had high loadings on loci PCpG3–PCpG9 and E2CpG4, indicating that Factor 1 (z1) mainly represents the effect of DNA methylation in the promoter. Factors 2–6 loaded highly on quite different loci in Exon 2 and on PCpG2. Therefore, Factors 2–6 might represent the effect of DNA methylation in Exon 2, and they differed considerably from one another.
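The PCA-then-loadings workflow described above can be sketched in Python as a correlation-matrix PCA, in which each loading is the correlation between a CpG locus and a component score. This is a minimal sketch under stated assumptions: the simulated matrix, the function name `pca_loadings` and the 0.5 loading cutoff are hypothetical, and the study's factor-extraction step (which may have included a rotation) is not reproduced.

```python
import numpy as np


def pca_loadings(data, n_components):
    """Correlation-matrix PCA of a samples x loci methylation matrix.

    Returns component scores (z1, z2, ... per sample), loadings (correlation
    of each locus with each component) and the retained eigenvalues.
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each locus
    corr = np.corrcoef(Z, rowvar=False)                # loci x loci correlations
    eigvals, eigvecs = np.linalg.eigh(corr)            # returned in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs
    loadings = eigvecs * np.sqrt(eigvals)
    return scores, loadings, eigvals


# Simulated data (hypothetical): loci 0-2 share one latent signal, 3-5 are noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=40)
shared = np.column_stack([latent + 0.2 * rng.normal(size=40) for _ in range(3)])
noise = rng.normal(size=(40, 3))
methylation = np.hstack([shared, noise])

scores, loadings, eigvals = pca_loadings(methylation, n_components=2)
high_on_z1 = set(np.flatnonzero(np.abs(loadings[:, 0]) > 0.5).tolist())
print(high_on_z1)
```

Loci whose absolute loading on z1 exceeds the cutoff play the role that PCpG3–PCpG9 and E2CpG4 play for Factor 1 in the text.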

Correlation analysis showed that the new variable z1 was negatively correlated with KLF7 transcripts, whereas none of z2–z6 was significantly correlated with them. A regression of KLF7 transcripts on z1 showed that the contribution of z1 to the variation in KLF7 transcripts was 0.3429, greater than that of the single locus PCpG6. Additionally, the ratio of the slope to the intercept was about 9.5%, indicating that the greatest effect of Factor 1 (z1) on KLF7 transcripts was about 9.5%. This magnitude is reasonable for an effect of DNA methylation on gene expression and indicates that KLF7 transcription in chicken abdominal adipose tissue might be inhibited by promoter DNA methylation. Sequence analysis showed many transcription factor binding sites at loci PCpG3–PCpG9 in the chicken KLF7 promoter (supplementary Table 1). The inhibitory effect of DNA methylation on KLF7 expression might therefore be achieved in part by affecting the binding of transcription factors to the KLF7 promoter, as reported for chicken ApoA-I [26].
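One reading of "the ratio of the slope to the intercept" is the change in predicted KLF7 expression per unit of z1, relative to the predicted expression when z1 is 0 (its mean, for centered PCA scores). The sketch below assumes that reading and fits the line with `numpy.polyfit`; the helper name `slope_intercept_ratio` and the constructed values are hypothetical.

```python
import numpy as np


def slope_intercept_ratio(factor_scores, expression):
    """Fit expression = intercept + slope * score; return (slope, intercept, ratio)."""
    slope, intercept = np.polyfit(np.asarray(factor_scores, dtype=float),
                                  np.asarray(expression, dtype=float), 1)
    return slope, intercept, slope / intercept


# Hypothetical z1 scores and expression values constructed so that
# expression = 2.0 - 0.19 * z1 exactly (not the study's data).
z1 = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
expression = 2.0 - 0.19 * z1
slope, intercept, ratio = slope_intercept_ratio(z1, expression)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, ratio={ratio:.3f}")
# slope=-0.190, intercept=2.000, ratio=-0.095
```

Because the slope is negative here, the ratio is negative; its magnitude (9.5% in this toy example) is the quantity the text interprets as the greatest relative effect of Factor 1.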

There was no significant correlation between z1 and blood metabolic indexes, possibly because DNA methylation of chicken KLF7 does not directly take part in regulating them. However, further investigation is needed into whether an indirect association exists between DNA methylation of chicken KLF7 and blood metabolic indexes.


Conclusions

DNA melting is the rate-limiting step for 5mC deamination (Lindahl and Nyberg 1974; Frederico, Kunkel, and Shaw 1990, 1993; Fryxell and Zuckerkandl 2000). Although this fact is well established, it does not necessarily follow that differential DNA melting is solely responsible for the correlation between CpG underrepresentation and GC content in the human genome. If it were solely responsible, then plots of the log10 CpG mutation rate versus GC content (expressed as a decimal fraction) should ideally have a slope of −3.0 (Fryxell and Zuckerkandl 2000). Based on an analysis of SNP frequencies in the human genome, we show that the best-fit slope is actually −2.7 for all SNPs and −3.0 for intergenic noncoding DNA (excluding CpG islands). This shows that the slope of −3 could not be specifically caused by exons, introns, differential methylation of CpG islands, transcription-coupled DNA repair, or transcription-induced cytosine deamination. Nor could regional differences in DNA methylation explain our results, because the majority of the CpG dinucleotides throughout the human genome are methylated (see above). By comparing alternative methods of analysis, we further show that our results did not depend on the DNA lengths over which the GC content of neighboring sequences was measured, and that they were not an artifact of the chromosomes or sequence comparisons used to identify the ancestral base.
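The predicted slope can be checked numerically: if log10 of the CpG mutation rate falls linearly with GC fraction, an ordinary least-squares fit recovers the slope. The Python sketch below uses synthetic rates built to follow a slope of −3.0 exactly; the data and the helper name `log_rate_slope` are illustrative assumptions, not the SNP analysis itself.

```python
import numpy as np


def log_rate_slope(gc_fraction, mutation_rate):
    """Least-squares slope of log10(mutation rate) versus GC content (0-1 fraction)."""
    gc = np.asarray(gc_fraction, dtype=float)
    log_rate = np.log10(np.asarray(mutation_rate, dtype=float))
    slope, _intercept = np.polyfit(gc, log_rate, 1)
    return slope


# Synthetic rates built to follow the predicted relationship exactly
# (illustrative values, not SNP-derived estimates).
gc = np.array([0.35, 0.40, 0.45, 0.50, 0.55, 0.60])
rates = 1e-7 * 10 ** (-3.0 * gc)
slope = log_rate_slope(gc, rates)
print(round(slope, 6))  # -3.0

# A slope of -3.0 means +0.1 in GC fraction multiplies the rate by 10**-0.3
print(round(10 ** (-3.0 * 0.1), 3))  # 0.501
```

In other words, a slope of −3.0 predicts that the rate roughly halves for every 10-percentage-point increase in GC content. Real SNP-derived rates scatter around the line, so fitted slopes such as −2.7 carry sampling error that this toy example does not model.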

We did observe a small but significant correlation between GC content and the rates of transitions in GpC dinucleotides inferred from human SNPs. This is consistent with the biased gene conversion hypothesis (Galtier et al. 2001), which implies that deamination events in both CpG and GpC dinucleotides would be “corrected” at higher rates by biased gene conversion in GC-rich sequences with higher rates of recombination. However, it is also consistent with the reaction mechanism of cytosine deamination itself (Fryxell and Zuckerkandl 2000). That is, unmethylated cytosines in GpC dinucleotides undergo deamination at lower rates in GC-rich sequences because of reduced DNA melting. This explanation is both simpler and more precise because it successfully predicts the slope of the 5mC deamination mutation rate with respect to neighboring GC content.

Takashi Gojobori, Associate Editor

We thank Megan V. Chapman and Ramya Sundararajan for technical assistance with the identification of orthologous chimpanzee sequences.


Conclusions

The mapping of associations between distal regulatory sites and the genes they control is a challenging task, which has only recently begun to be addressed on a genome-wide scale. Attempts to predict gene-enhancer pairs have been based on the profiling of chromatin states or transcription factor binding [27, 34], or of long-range DNA looping [28, 29]. Here, we show that enhancers can also be associated with genes using DNA methylation. In contrast to the above mapping approaches, methylation data are readily available and highly quantitative, and thus may enhance the mapping of gene-enhancer pairs.

We found that distal expression-related methylation sites are abundant in the human genome, co-localize with enhancer chromatin marks, and are more predictive of expression levels than promoter methylation. Although not all distal regulatory sites in the genome necessarily exhibit promoter-like methylation, we showed that a large number of enhancer sites demonstrate an inverse correlation between methylation and expression, as in gene promoters.
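An inverse methylation-expression relationship for a candidate enhancer-gene pair can be sketched as a Pearson correlation across cell types. The excerpt does not give the statistic, cutoff or multiple-testing correction the authors used, so the −0.5 threshold, the function name `inverse_pair` and the six-cell-type values below are hypothetical.

```python
import numpy as np


def inverse_pair(enhancer_methylation, gene_expression, r_threshold=-0.5):
    """Pearson r across cell types and whether it passes a (hypothetical) cutoff."""
    r = float(np.corrcoef(enhancer_methylation, gene_expression)[0, 1])
    return r, r <= r_threshold


# Hypothetical values for one enhancer-gene pair across six cell types.
methylation = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
expression = [1.0, 2.0, 4.0, 6.0, 8.0, 9.0]
r, is_candidate = inverse_pair(methylation, expression)
print(round(r, 3), is_candidate)  # -1.0 True
```

With only a handful of cell types, a single correlation is noisy, and a genome-wide scan would also need to correct for the many enhancer-gene pairs tested.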

We have further shown that the hypomethylation state is directly related to enhancer activity across cell types (Figure S3 in Additional file 1). The observation that hypomethylated enhancers bind more transcription factors than methylated ones (Figure 2E) suggests a possible mechanism underlying the connection between DNA methylation and enhancer activity. Consistent with this possibility, high-scoring enhancers are particularly enriched within a defined chromatin state (Figure 2D), which is particularly hypomethylated compared to non-regulatory chromatin (Figure S3C in Additional file 1). Whether this chromatin state holds particularly active enhancers, or perhaps a unique class of methylation-related enhancers, remains to be elucidated.

The range of cell types we analyzed in this study was determined by the availability of methylation and expression data. In addition, the RRBS and Infinium HumanMethylation450 BeadChip methylation data we used provide limited genomic coverage and are strongly biased towards promoters and certain other portions of the genome, while enhancers are not efficiently targeted. Because of this, it is likely that more complete methylomic coverage will expose many additional enhancer-gene pairs. Whole-genome bisulfite sequencing approaches have recently become popular, and whole-methylome analyses of human tissues are rapidly accumulating. Combined with our approach, these additional data should allow the production of more comprehensive maps of enhancer-gene pairing across tissues, cell types and conditions.

Unmethylated promoters are permissive for, but do not necessarily determine, transcription initiation. We showed that enhancer methylation associates with cell-type-specific expression levels, even when the promoter is constantly unmethylated (Figure 3). Moreover, enhancer methylation characterizes small (and larger) expression differences. Thus, enhancers are not just on-off switches of cell-type transcription levels, as previously suggested, but may also mediate ranges of expression levels across multiple cell types. In contrast to the traditional model of one enhancer site per cell type, we suggest that a gradient of methylation states at a single enhancer site may direct distinct expression levels in many different cell types (Figure 3C).

In a few cases, enhancer methylation levels have been suggested to be associated with the control of cancer-related genes [35–37]. However, to our knowledge, this is the first report of a global association between perturbed enhancer methylation and aberrant expression of cancer genes. We have shown that hypomethylated enhancers are associated with the upregulation of many cancer genes controlling various cellular functions (Figure 4C), some involved in cell proliferation and others in other cancer-related processes. Moreover, many of these genes with hypomethylated enhancers were found in most cancer types examined, suggesting a pan-cancer mechanism. In contrast, the larger group of hypermethylated enhancers seemed to target cancer-type-specific genes. Given the limited genomic coverage of this study, many additional cancer-related enhancers are expected in the genome.

To date, almost all studies of cancer-related methylation have focused on gene promoters and CpG islands. Among these, the predominant event in cancers is hypermethylation of polycomb-repressed promoters [9–11]. This hypermethylation does not directly affect expression levels, as the associated genes are inactive in the normal tissue and generally remain inactive in the cancer (although it may limit the potential for re-activation of silenced genes in the cancer). Here, we establish a very different phenomenon in the other large group of regulatory sites, the transcriptional enhancers. These sites are drastically altered in cancers, toward both hypo- and hypermethylation, and are closely related to substantial changes in the expression levels of cancer-related genes (Figure 4). Moreover, their aberrant methylation in cancers might derive from targeted methylation or demethylation, or from selection of the altered cells (Figure 5). Whether targeted or selected, aberrant enhancer methylation may be involved in important events during cancer development.

We have uncovered a class of distal methylation sites whose methylation closely reflects cell-type transcription levels. These sites reside in a particular subclass of transcriptional enhancers and are associated with cell-type-specific enhancer activity, possibly through an interplay with transcription factor binding. Methylation levels of these enhancers are associated with gradual expression differences across cell types, even when the linked promoters are consistently unmethylated across those cell types. The radical changes in methylation of these sites in cancer are beyond those expected from the general profile of the cancer methylome, and may reflect specific targeting of the methylation and demethylation machinery to these sites, and/or a functional contribution to tumor development. Further analyses of these sites may provide crucial information about paradigms of gene expression control in normal and cancerous cells.


LOCATION OF DNA METHYLATION

Although the brain contains some of the highest levels of DNA methylation of any tissue in the body, 5mC accounts for only ~1% of nucleic acids in the human genome (Ehrlich et al, 1982). Most DNA methylation occurs on cytosines that precede a guanine nucleotide, known as CpG sites. Overall, mammalian genomes are depleted of CpG sites, a depletion that may result from the mutagenic potential of 5mC, which can deaminate to thymine (Coulondre et al, 1978; Bird, 1980). The remaining CpG sites are spread across the genome, where they are heavily methylated, with the exception of those in CpG islands (Bird et al, 1985). Interestingly, there is evidence of non-CpG methylation in mouse and human embryonic stem cells; however, this methylation is lost in mature tissues (Ramsahoye et al, 2000; Lister et al, 2009). More thorough analysis of the murine frontal cortex has recently revealed that although the majority of methylation occurs at CpG sites, there is a significant percentage of methylated non-CpG sites (Xie et al, 2012). Because it was discovered only recently, the role of non-CpG methylation is still unclear.

DNA methylation is essential for silencing retroviral elements, regulating tissue-specific gene expression, genomic imprinting, and X chromosome inactivation. Importantly, DNA methylation in different genomic regions may exert different influences on gene activities based on the underlying genetic sequence. In the following sections, we will further elaborate upon the role of DNA methylation in different genomic regions.

Intergenic Regions

Approximately 45% of the mammalian genome consists of transposable and viral elements that are silenced by bulk methylation (Schulz et al, 2006). The vast majority of these elements are inactivated by DNA methylation or by mutations acquired over time as a result of 5mC deamination (Walsh et al, 1998). If expressed, these elements are potentially harmful, as their replication and insertion can lead to gene disruption and DNA mutation (Michaud et al, 1994; Wu et al, 1997; Kuster et al, 1997; Gwynn et al, 1998; Ukai et al, 2003). The intracisternal A particle (IAP) is one of the most aggressive retroviruses in the mouse genome (Walsh et al, 1998). IAP is heavily methylated throughout life: in gametogenesis, development, and adulthood (Walsh et al, 1998; Gaudet et al, 2004). Even in the embryo, when the rest of the genome is relatively hypomethylated, Dnmt1 maintains the repression of IAP elements (Gaudet et al, 2004). When Dnmt1 is depleted by genetic mutation, leading to extensive hypomethylation, IAP elements are expressed (Walsh et al, 1998; Hutnick et al, 2010). This demonstrates that within intergenic regions, one of the main roles of DNA methylation is to repress the expression of potentially harmful genetic elements.

CpG Islands

CpG islands are stretches of DNA roughly 1000 base pairs long that have a higher CpG density than the rest of the genome but often are not methylated (Bird et al, 1985). The majority of gene promoters, roughly 70%, reside within CpG islands (Saxonov et al, 2006). In particular, the promoters of housekeeping genes are often embedded in CpG islands (Gardiner-Garden and Frommer, 1987). CpG islands, especially those associated with promoters, are highly conserved between mice and humans (Illingworth et al, 2010). The location and preservation of CpG islands throughout evolution implies that these regions possess a functional importance.
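The CpG density that distinguishes islands from bulk DNA is usually quantified with the observed/expected CpG ratio. The Python sketch below applies thresholds in the style of Gardiner-Garden and Frommer (1987), cited above: a length of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6. These are the classic published criteria, not necessarily the definition used in this review, and the function names and toy sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: N_CpG * length / (N_C * N_G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(seq) / (n_c * n_g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Gardiner-Garden and Frommer (1987)-style thresholds applied to one sequence."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_obs_exp)


island_like = "CG" * 150   # 300 bp, CpG-dense (toy sequence)
cpg_free = "GCAT" * 75     # 300 bp, 50% GC but no CpG dinucleotides
print(cpg_obs_exp(island_like), looks_like_cpg_island(island_like))  # 2.0 True
print(cpg_obs_exp(cpg_free), looks_like_cpg_island(cpg_free))        # 0.0 False
```

A genome-wide scan would slide a window along each chromosome and merge qualifying windows; stricter variants of these thresholds are also in use.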

It appears that CpG islands have been evolutionarily conserved to promote gene expression by regulating chromatin structure and transcription factor binding. DNA is wrapped around histone proteins, forming small packaged units called nucleosomes. The more tightly DNA is associated with histone proteins, the less permissive it is for gene expression. One common feature of CpG islands is that they contain fewer nucleosomes than other stretches of DNA (Tazi and Bird, 1990; Ramirez-Carrozzi et al, 2009; Choi, 2010). The few nucleosomes with which CpG islands are associated often contain histones with modifications involved in enhancing gene expression (Tazi and Bird, 1990; Mikkelsen et al, 2007). Although �% of CpG islands contain known transcription start sites, CpG islands are often devoid of common promoter elements such as TATA boxes (Carninci et al, 2006). As many transcription factor binding sites are GC-rich, CpG islands are likely to enhance transcription factor binding at transcription start sites. Thus, despite their lack of common promoter elements, CpG islands enhance the accessibility of DNA and promote transcription factor binding.

The methylation of CpG islands results in stable silencing of gene expression (Mohn et al, 2008). During gametogenesis and early embryonic development, CpG islands undergo differential methylation (Wutz et al, 1997; Caspary et al, 1998; Zwart et al, 2001; Kantor et al, 2004). The ability of methylation to regulate gene expression through CpG islands is particularly important for establishing imprinting (Wutz et al, 1997; Caspary et al, 1998; Zwart et al, 2001; Choi et al, 2005). Imprinted genes are expressed from only one of the two inherited parental chromosomes, and their expression is determined by the parent of origin. Beyond imprinted genes, DNA methylation of CpG islands regulates gene expression during development and differentiation (Shen et al, 2007; Weber et al, 2007; Fouse et al, 2008; Mohn et al, 2008; Meissner et al, 2008). As CpG islands are associated with the control of gene expression, one might expect them to display tissue-specific patterns of DNA methylation. However, although CpG islands in intragenic and gene body regions can have tissue-specific patterns of methylation, CpG islands associated with transcription start sites rarely do (Rakyan et al, 2004; Eckhardt et al, 2006; Meissner et al, 2008; Illingworth et al, 2010; Maunakea et al, 2010). Instead, regions called CpG island shores, located up to 2 kb from CpG islands, have highly conserved patterns of tissue-specific methylation (Irizarry et al, 2009). As with CpG islands, methylation of CpG island shores is highly correlated with reduced gene expression (Irizarry et al, 2009).
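The island/shore distinction above amounts to a distance rule. The sketch below labels a coordinate as island, shore (within 2 kb of an island edge, following the distance given in the text) or other. The half-open interval convention, the function name `classify_position` and the coordinates are assumptions for illustration; finer categories used elsewhere in the literature are not modeled.

```python
def classify_position(pos, islands, shore_width=2000):
    """Label a 0-based coordinate as 'island', 'shore' or 'other'.

    islands: (start, end) half-open intervals on the same chromosome.
    A position outside every island but within `shore_width` bp of the
    nearest island edge is called a shore.
    """
    nearest = None
    for start, end in islands:
        if start <= pos < end:
            return "island"
        # distance to the first base (upstream) or last base (downstream) of the island
        distance = start - pos if pos < start else pos - (end - 1)
        nearest = distance if nearest is None else min(nearest, distance)
    if nearest is not None and nearest <= shore_width:
        return "shore"
    return "other"


islands = [(10_000, 11_000)]  # hypothetical island coordinates
for pos in (10_500, 9_000, 12_500, 20_000):
    print(pos, classify_position(pos, islands))
# 10500 island
# 9000 shore
# 12500 shore
# 20000 other
```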

The role of CpG islands in regulating gene expression is still being uncovered. Methylation of CpG islands can impair transcription factor binding, recruit repressive methyl-binding proteins, and stably silence gene expression. However, CpG islands, especially those associated with gene promoters, are rarely methylated. Further studies are needed to determine to what degree DNA methylation of CpG islands regulates gene expression.

Gene Body

As the majority of CpG sites within the mammalian genome are methylated, genes themselves must also contain methylation. The gene body is considered the region of the gene beyond the first exon, because methylation of the first exon, like promoter methylation, leads to gene silencing (Brenet et al, 2011). Evidence suggests that gene body methylation is associated with higher levels of gene expression in dividing cells (Hellman and Chess, 2007; Ball et al, 2009; Aran et al, 2011). However, in slowly dividing and nondividing cells, such as those in the brain, gene body methylation is not associated with increased gene expression (Aran et al, 2011; Guo et al, 2011a, 2011b; Xie et al, 2012). Furthermore, in the murine frontal cortex, methylation of non-CpG sites within gene bodies is negatively correlated with gene expression (Xie et al, 2012). How DNA methylation of the gene body contributes to gene regulation is still unclear.


RESULTS

It was the aim of this study to investigate details of the DNA recognition and interaction mechanisms of the catalytic domain of the human DNA methyltransferase DNMT3B. We specifically addressed the combined interaction and readout of the CpG target region together with flanking sequence base pairs. To this end, CpG and non-CpG methylation was tested in libraries of substrates containing CpX target sites embedded into a context of 10 randomized base pairs on either side and the methylation levels were determined by bisulfite conversion coupled to ultra-deep next generation sequencing (NGS) readout. In order to study the role of individual amino acid residues in the DNA interaction process, 13 DNMT3B mutants with amino acid exchanges of DNA-interacting residues were generated (N652A, N656A, N656D, V657A, N658A, R661A, T775A, T775N, T775Q, K777A, N779A, N779D, R823A) and the mutant proteins were purified in the context of the catalytic domain of DNMT3B ( Supplementary Figure S2 ). Afterwards, the effects of the mutations on the methylation of CpG and non-CpG sites in different flanking contexts were studied in detail.

CpG methylation activity of wildtype DNMT3B and DNMT3B mutants

The catalytic activities of the wildtype human DNMT3B catalytic domain (WT) enzyme and the 13 selected mutants were determined with a radioactive DNA methylation assay using a 30mer oligonucleotide substrate containing a single CpG site in a TTC CG GGA sequence context (Figure 2A). In addition, we conducted deep enzymology experiments and investigated methylation of a pool of DNA substrates, in which the target CpX site was flanked by 10 random nucleotides on either side. The substrate pool was methylated by WT DNMT3B and DNMT3B mutants, and the reaction products were subjected to hairpin ligation, bisulfite conversion, PCR amplification and NGS analysis as described previously (14, 23, 29). As the T775 mutants showed very weak activity in the radioactive kinetics (<0.5% of WT DNMT3B), only T775A was used for the deep enzymology experiments. Data were generated in two independent repeats and sequenced at great depth (Supplementary Table S4).
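The bisulfite readout used in these experiments rests on a simple rule: bisulfite converts unmethylated cytosine to uracil (read as T after PCR), while 5-methylcytosine is protected and still reads as C. The Python sketch below turns per-read calls at the target cytosine into a methylation level for each CpX type. It assumes the original, unconverted substrate sequence is already known for each read (the hairpin ligation is one way to recover it) and that the target C sits at index 10; the function name, constant and toy reads are hypothetical, and the real pipeline's filtering steps are omitted.

```python
from collections import defaultdict

TARGET = 10  # assumed index of the target C: 10 randomized bases precede it


def methylation_by_cpx(records):
    """Average methylation level per CpX type from bisulfite calls.

    records: (substrate, call) pairs. `substrate` is the original, unconverted
    sequence; `call` is the base read at the target position after bisulfite
    conversion: 'C' = protected (methylated), 'T' = converted (unmethylated).
    """
    counts = defaultdict(lambda: [0, 0])  # CpX -> [methylated reads, informative reads]
    for substrate, call in records:
        if call not in ("C", "T"):
            continue  # ambiguous or erroneous base call at the target
        cpx = substrate[TARGET:TARGET + 2]  # e.g. "CG", "CA", "CT", "CC"
        counts[cpx][0] += int(call == "C")
        counts[cpx][1] += 1
    return {cpx: methylated / total for cpx, (methylated, total) in counts.items()}


flank = "ACGTACGTAC"  # toy 10-nt flank used on both sides
records = ([(flank + "CG" + flank, c) for c in "CCT"]
           + [(flank + "CA" + flank, c) for c in "TCTT"])
levels = methylation_by_cpx(records)
print(levels)
```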

CpG and non-CpG catalytic activities of WT DNMT3B and DNMT3B mutants. (A) Relative CpG methylation activities determined by radioactive kinetics. Numbers represent averages of three experiments, error bars show the SD. (B) Relative CpG methylation activities determined in the deep enzymology experiments. Numbers represent averages of two experiments, error bars show the SD. (C) Relative non-CpG methylation activities determined in the deep enzymology experiments. Numbers represent averages of two experiments, error bars show the SD.

In the initial analysis, methylation levels of CpG sequences were analyzed. As shown in Figure 2A and B, the results of both activity assays agree closely with each other. When considering both assays, the activities of N652A, N656A, N779A and N779D were similar to WT (<25% reduction when compared to WT). V657A, R661A, K777A and R823A showed a moderate reduction in activity (residual activities between 25 and 75% of WT). N656D and N658A showed a strong reduction of activity with residual activities between 5 and 15% of WT. The activity of the T775 mutants was even weaker (around 0.5% of WT activity). Next, the global non-CpG methylation activity of WT DNMT3B and the mutants was determined (Figure 2C). The CpA methylation activity of the WT enzyme was 17% of the activity observed at CpG sites, CpT activity was 7% and CpC 6%. The relative overall non-CpG activities of most mutants were similar to WT, with a few interesting exceptions: K777A showed an increase in non-CpG methylation in all three sequence contexts. Similarly, T775A showed an increase in non-CpG methylation, but error margins between the repeats were larger due to the very low overall methylation levels. N658A showed increased relative activity at CpA sites, and V657A showed an increase in the relative activity at CpT sites. Moreover, N656D and R823A showed globally reduced non-CpG methylation.

Flanking sequence preferences of CpG and non-CpG methylation by WT DNMT3B

To investigate whether, and to what extent, flanking sequence interactions modulate CpG recognition, two approaches were applied to analyze the deep enzymology data. First, in a global analysis of each type of CpX methylation by human WT DNMT3B, the average methylation levels of all substrates containing a particular base at each of the –8 to +8 flank sites were determined to identify bases favorable or unfavorable for activity. The data were expressed as observed/expected values of the methylation levels. We first compared the results of both independent experimental repeats, which showed low error levels (Supplementary Figure S3A and S3B) and high correlation of the derived profiles (Supplementary Figure S3C). Therefore, the two data sets were merged, and the magnitude of the position-specific enrichments and depletions was calculated, showing that the –2 to +3 flanks were most important for catalytic activity (Supplementary Figure S3D). In the merged data set, the position-specific enrichment of bases was calculated for the different types of CpX methylation for the –4 to +4 flanking region (Figure 3A). For WT CpG methylation, the profile was very similar to previous results obtained with murine DNMT3B, showing the strongest preferences for T at the –2 site and G/A at +1 (14). Weaker trends from the previous data set were also confirmed, such as a preference for A and against G at –1, and a preference for C and against A at +2. In this global analysis, all four types of CpX methylation showed similar trends in flanking sequence preferences, indicating that, at a global level, a flanking sequence favorable for CpG methylation is also favorable for non-CpG methylation (Figure 3B). Correlation analysis showed that the flanking sequence preferences of CpA and CpT methylation were the most similar overall, followed by those of CpG and CpA. The flanking sequence preferences of CpC methylation were the most distinct from the other profiles (Figure 3C).
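The observed/expected values of the global analysis can be sketched as follows: for each flank position and base, the mean methylation of reads carrying that base is divided by the mean methylation over all reads. This is a minimal Python sketch under stated assumptions: flank positions follow the NNCXNN notation (−1 directly before the C, +1 directly after the X base), the target C is at index 10, and the helper names and toy substrates are hypothetical.

```python
from collections import defaultdict

TARGET = 10  # assumed index of the target C in every substrate


def flank_index(position):
    """Map a flank position (-k upstream of the CpX, +k downstream) to an index."""
    return TARGET + position if position < 0 else TARGET + 1 + position


def flank_obs_exp(records, positions=(-4, -3, -2, -1, 1, 2, 3, 4)):
    """Observed/expected methylation for each (flank position, base).

    records: (substrate, methylated) pairs with methylated = 1 or 0 per read.
    Observed = mean methylation of reads carrying that base at that position;
    expected = mean methylation over all reads.
    """
    expected = sum(m for _, m in records) / len(records)
    table = {}
    for position in positions:
        sums = defaultdict(lambda: [0, 0])  # base -> [methylated, reads]
        for substrate, m in records:
            base = substrate[flank_index(position)]
            sums[base][0] += m
            sums[base][1] += 1
        for base, (methylated, n) in sums.items():
            table[(position, base)] = (methylated / n) / expected
    return table


def toy_substrate(base_at_minus2):
    """Hypothetical 22-nt substrate: 10 flank nt, CG, 10 flank nt."""
    return "A" * 8 + base_at_minus2 + "A" + "CG" + "A" * 10


records = [(toy_substrate("T"), 1), (toy_substrate("T"), 1),
           (toy_substrate("G"), 0), (toy_substrate("G"), 1)]
table = flank_obs_exp(records)
print(round(table[(-2, "T")], 3), round(table[(-2, "G")], 3))  # 1.333 0.667
```

A value above 1 marks a favored base at that position and a value below 1 a disfavored one; the function assumes at least one methylated read overall, because the expected value is the divisor.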

Global analysis of flanking sequence effects on CpG and non-CpG methylation by wildtype DNMT3B showing overall correlations of CpX methylation. (A) Average CpX methylation levels of all substrates containing specific bases at –4 to +4 flank positions. Methylation levels are given as observed/expected (obs/exp) values. (B) Correlation of non-CpG and CpG –4 to +4 flanking sequence preferences. (C) Pearson correlation factors of the different CpX methylation flanking sequence profiles. (D) Global correlation of CpX methylation in 256 NNCXNN sequences. (E) Boxplot showing the correlation of CpG and non-CpG methylation levels of NNCXNN sequences. The boxes display the first and third quartiles with medians indicated by vertical lines. Whiskers display the data range. (F) Pearson correlation factors of the flanking preferences of all types of CpG and non-CpG methylation. (G) Ratio of the average methylation rates of the 15% most preferred and most disfavored flanking sites for CpG and CpN methylation.

Next, a more specific analysis was performed. For this, the sequencing reads for each CpX specificity were separated, and average methylation levels of all 256 NNCXNN hexanucleotides were determined. Because each NNCXNN sequence is treated separately, this analysis is optimized to detect the combined effect of more than one flanking position on methylation activity. The methylation levels of both repeats were highly correlated (Supplementary Figure S4A), and the corresponding data sets were therefore merged. A correlation analysis of the CpG, CpA, CpT and CpC methylation levels in the different NNCXNN flanking sequence contexts (Figure 3D–F) revealed good correlation and trends similar to those observed in the –4 to +4 flanking sequence enrichment analysis (Figure 3A–C). However, comparison of the average methylation rates of the 15% most favored and 15% most disfavored sites clearly indicates that the overall flanking sequence preferences were stronger for non-CpG methylation (Figure 3G).
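The hexanucleotide analysis groups reads by their complete NNCXNN context before averaging, which lets combinations of flank bases show through. The sketch below computes per-context averages and the Pearson correlation between two CpX profiles over their shared flank contexts. The key format, the index of the target C, the helper names and the toy data are assumptions for illustration, not the authors' pipeline.

```python
from collections import defaultdict
from itertools import product

import numpy as np

TARGET = 10  # assumed index of the target C in every substrate


def hexamer_levels(records):
    """Mean methylation per (flank, CpX) context.

    The flank key concatenates the -2, -1, +1 and +2 bases, so the context
    TTCGAT becomes ('TTAT', 'CG').
    """
    sums = defaultdict(lambda: [0, 0])
    for substrate, methylated in records:
        cpx = substrate[TARGET:TARGET + 2]
        flank = substrate[TARGET - 2:TARGET] + substrate[TARGET + 2:TARGET + 4]
        sums[(flank, cpx)][0] += methylated
        sums[(flank, cpx)][1] += 1
    return {key: m / n for key, (m, n) in sums.items()}


def profile_correlation(levels, cpx_a, cpx_b):
    """Pearson r between two CpX types across the flank contexts both contain."""
    shared = sorted({f for f, c in levels if c == cpx_a}
                    & {f for f, c in levels if c == cpx_b})
    a = [levels[(f, cpx_a)] for f in shared]
    b = [levels[(f, cpx_b)] for f in shared]
    return float(np.corrcoef(a, b)[0, 1])


# Toy data: methylation rises with the number of T's in the flank, and CpA is
# methylated at half the CpG rate in every context (hypothetical values).
records = []
for bases in product("ACGT", repeat=4):
    flank = "".join(bases)
    k = flank.count("T")
    for cpx, n_reads in (("CG", 4), ("CA", 8)):
        substrate = "A" * 8 + flank[:2] + cpx + flank[2:] + "A" * 8
        records += [(substrate, 1 if i < k else 0) for i in range(n_reads)]

levels = hexamer_levels(records)
print(len(levels), round(profile_correlation(levels, "CG", "CA"), 6))  # 512 1.0
```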

Detailed analysis of non-CpG flanking sequence preferences

A more detailed comparison of the CpG profiles with those observed for CpA, CpT and CpC methylation revealed few but striking differences in flanking sequence effects at the –4 to +4 flank positions (Figure 4A). For CpA methylation, many trends observed for CpG methylation were enhanced: T was more favored and G more disfavored at the –2 site, and A was more preferred at the –1 site, where G was more disfavored. In contrast, G was more favored at +1, +2 and +3. Strikingly, the preference for A at +1 observed with CpG methylation was specific to the CpG context and was lost with CpA and CpT. For CpT methylation, the profile was highly similar to that of CpA. For CpC, the pattern looks different, with T favored at many positions, most prominently at –2 and +2; T is also tolerated at +1, where it is highly disfavored in CpG methylation. It is also noteworthy that G is highly disfavored at –2 in all types of non-CpG methylation.

Detailed differences of WT DNMT3B flanking sequence preferences in CpG and non-CpG methylation and comparison with DNMT3B dependent genomic non-CpG methylation patterns. (A) Global methylation analysis showing average methylation levels of all substrates which carry a specific base at each –4 to +4 flank site. Methylation levels are given as observed/expected (obs/exp) values. (B) Analysis based on correlation of the hexanucleotide sequences. The 10 NNCXNN flanks with highest CpX methylation were used to illustrate the enrichment of bases with a Weblogo. (C) Enrichment and depletion of bases at –4 to +4 flank positions for CpX methylation in the genomic DNMT3B dependent methylation (Genome) and the DNMT3B in vitro preferences (DNMT3B). Data are shown as observed/expected values (obs/exp). (D) Correlation of genomic (Genome) and enzymatic (DNMT3B) observed/expected methylation profiles. The P-values for the correlations of CpA, CpT and CpC methylation are 2.8 × 10⁻⁷, 7.6 × 10⁻⁶ and 9.2 × 10⁻⁵, respectively.

To explore the flanking sequence effects in CpG and non-CpG methylation in more detail, we inspected the average methylation levels in NNCXNN sequence contexts. This analysis illustrated the strong effects of the –2 to +2 flanks, mainly on the non-CpG methylation activity of DNMT3B: the ratio of the average methylation rates of the 15% best and 15% worst flanking sites was 5.1 for CpG, but 18, 21 and 52 for CpA, CpT and CpC, respectively (Figure 3G and Supplementary Figure S5A). In the case of CpT and CpC, many of the worst sites showed no detectable methylation. Next, the flanks with the highest CpX methylation levels were used to calculate WebLogos (Figure 4B). Overall, this analysis reproduced most of the conclusions drawn from the global analysis shown in Figure 4A, including the preference for T, A, G/A and C at the –2, –1, +1 and +2 sites in CpG methylation. Notably, the human SatII sequence context (TTCGAT), although it does not contain the most preferred residues for CpG methylation at the –1 and +2 positions, ranks 19th among all 256 NNCGNN sites (where a low rank indicates high activity). This result illustrates that the combined interaction with all positions leads to a high preference for this sequence. For non-CpG methylation, key findings of the global analysis were reproduced as well, including the strong preference for A(–1) in CpA methylation, the reduced A preference at +1, and the strong preference for T(–2) in CpC methylation. Another striking result was observed for CpA methylation: the four sequences with the highest CpA methylation were all TACAGN sequences. As shown in Supplementary Figure S5B, the preference of DNMT3B for the TACXG sequence was much higher in CpA and CpT methylation than in CpG methylation, providing an impressive example of the coupling of flanking sequence contacts with CpG recognition and of the stronger effects of flanking sequences on non-CpG methylation.
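The two summary statistics used in this paragraph, the best-to-worst ratio and the rank of the SatII context, can be computed from a table of per-flank methylation rates. The sketch below assumes such a table is available as a Python dict mapping each flank string (e.g. "TTCGAT") to its average methylation rate; how the 15% cut-off is rounded (here, to the nearest whole number of flanks) and how ties are ranked are assumptions, not details given in the text. The helper relative_activity computes the site-versus-mean ratio of the kind used for GGCGGG and GGCGTG in Figure 6A. All function names are hypothetical.

```python
def best_worst_ratio(rates: dict[str, float], fraction: float = 0.15) -> float:
    """Mean rate of the best `fraction` of flanks divided by the mean rate of the worst `fraction`.

    Returns infinity when the worst flanks show no detectable methylation at all.
    """
    ordered = sorted(rates.values(), reverse=True)
    k = max(1, round(len(ordered) * fraction))  # e.g. 38 of 256 NNCGNN flanks (assumed rounding)
    best = sum(ordered[:k]) / k
    worst = sum(ordered[-k:]) / k
    return float("inf") if worst == 0 else best / worst


def preference_rank(rates: dict[str, float], flank: str) -> int:
    """1-based rank of `flank` when flanks are sorted by decreasing rate (rank 1 = most preferred).

    Ties keep their input order because Python's sort is stable; this tie rule is an assumption.
    """
    ordered = sorted(rates, key=rates.get, reverse=True)
    return ordered.index(flank) + 1


def relative_activity(rates: dict[str, float], flank: str) -> float:
    """Rate at one flank divided by the mean rate over all flanks."""
    mean_rate = sum(rates.values()) / len(rates)
    return rates[flank] / mean_rate
```

Given a WT rate table for all 256 NNCGNN flanks, preference_rank(rates, "TTCGAT") is the quantity reported above as rank 19.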

Comparison of genomic non-CpG methylation patterns with in vitro preferences

To determine the in vivo flanking preferences of DNMT3B, public whole-genome bisulfite sequencing data of CpG and non-CpG methylation from human ES cells (hESC) were compared with data from the same cell line after knockout (KO) of DNMT3B (33). The difference between both data sets was used as an indication of the activity of DNMT3B in the native cells. As already reported (33), the data show a major contribution of DNMT3B to CpA and CpT methylation in this cell line (Table 1). The contribution of DNMT3B to CpG methylation was minor (Table 1), in agreement with the general observation that CpG methylation is most strongly determined by DNMT1 (29). CpC methylation levels were lowest and close to background, and the measurable contribution of DNMT3B was moderate. Next, the flanking sequence-dependent levels of DNMT3B-associated genomic methylation were determined and compared with the in vitro flanking sequence preferences of DNMT3B (Figure 4C and D). For CpG methylation, no significant correlation was observed, in agreement with the notion that DNMT3B does not strongly contribute to CpG methylation in this cell line. However, the flanking profiles of genomic CpA and CpT methylation were very strongly and highly significantly correlated with the activity profiles of DNMT3B, with Pearson correlation coefficients (r) of 0.96 and 0.90 and P-values below 1 × 10⁻⁵ (Figure 4C and D, Table 1). For CpC methylation, a weaker but still highly significant correlation was observed (r = 0.68). These findings indicate that the cellular non-CpG methylation activity of DNMT3B is strongly influenced by its flanking sequence preferences, in particular at CpA and CpT sites.
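The observed/expected profiles and the genome-versus-enzyme correlation described above can be sketched as follows. This is a minimal illustration under stated assumptions: rates are given per NNCXNN context as a dict, only the –2 to +2 flanks are used for brevity (the figure covers –4 to +4), a genomic profile would be built the same way from per-context DNMT3B-dependent methylation (for example WT minus KO levels, an assumption), and the correlation is a plain Pearson coefficient over all shared (position, base) pairs. The names hexamer_contexts, obs_exp_profile and profile_correlation are hypothetical.

```python
import itertools
from collections import defaultdict
from math import sqrt

BASES = "ACGT"
# Flank position -> string index in an NNCXNN hexanucleotide (C at index 2, X at index 3).
FLANK_INDEX = {-2: 0, -1: 1, +1: 4, +2: 5}


def hexamer_contexts(center: str) -> list[str]:
    """All 256 NN<center>NN contexts, e.g. center='CA' for CpA sites."""
    return ["".join(p[:2]) + center + "".join(p[2:])
            for p in itertools.product(BASES, repeat=4)]


def obs_exp_profile(rates: dict[str, float]) -> dict[tuple[int, str], float]:
    """(flank position, base) -> mean rate of contexts carrying that base there / mean rate of all contexts."""
    overall = sum(rates.values()) / len(rates)
    sums: dict[tuple[int, str], float] = defaultdict(float)
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for ctx, rate in rates.items():
        for pos, idx in FLANK_INDEX.items():
            sums[(pos, ctx[idx])] += rate
            counts[(pos, ctx[idx])] += 1
    return {key: sums[key] / counts[key] / overall for key in sums}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def profile_correlation(genome: dict, enzyme: dict) -> float:
    """Correlate two obs/exp profiles over the (position, base) keys they share."""
    keys = sorted(set(genome) & set(enzyme))
    return pearson([genome[k] for k in keys], [enzyme[k] for k in keys])
```

Expressing both profiles as obs/exp values makes them comparable even though genomic and in vitro methylation levels differ by orders of magnitude.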

Table 1. Methylation levels in hESC and DNMT3B KO hESC

Site | WT H1 hESC methylation (%) | DNMT3B KO H1 methylation (%) | WT − KO (%) | DNMT3B contribution to WT methylation (%) | r, genomic vs. enzymatic obs/exp profiles | P-value
CpG | 84.34 | 79.16 | 5.18 | 6.1 | 0.05 | n.s.
CpA | 2.47 | 0.85 | 1.62 | 65.5 | 0.96 | 2.8 × 10⁻⁷
CpT | 0.81 | 0.46 | 0.35 | 43.5 | 0.90 | 7.6 × 10⁻⁶
CpC | 0.57 | 0.50 | 0.07 | 12.1 | 0.68 | 9.2 × 10⁻⁵

n.s., not significant.
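The derived columns of Table 1 follow from the two measured levels. A minimal sketch, assuming the DNMT3B contribution is defined as (WT − KO)/WT × 100 (the text does not state the formula explicitly). Recomputed from the two-decimal levels shown, it gives 6.1, 65.6, 43.2 and 12.3% for CpG, CpA, CpT and CpC; the small differences from the tabulated 65.5, 43.5 and 12.1% are within what rounding of the low non-CpG levels allows, which suggests the table was computed from unrounded values.

```python
# Methylation levels (%) as listed in Table 1: (WT H1 hESC, DNMT3B KO H1 cells).
TABLE1_LEVELS = {
    "CpG": (84.34, 79.16),
    "CpA": (2.47, 0.85),
    "CpT": (0.81, 0.46),
    "CpC": (0.57, 0.50),
}


def dnmt3b_contribution(wt: float, ko: float) -> float:
    """Share of WT methylation (in %) that disappears after DNMT3B knockout.

    Assumed definition: (WT - KO) / WT * 100.
    """
    if wt <= 0:
        raise ValueError("WT methylation level must be positive")
    return (wt - ko) / wt * 100.0


if __name__ == "__main__":
    for context, (wt, ko) in TABLE1_LEVELS.items():
        print(f"{context}: WT - KO = {wt - ko:.2f} points, "
              f"DNMT3B contribution = {dnmt3b_contribution(wt, ko):.1f}%")
```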

Detailed flanking sequence preferences of CpG and non-CpG methylation by DNMT3B mutants

Next, detailed deep enzymology-based flanking sequence preferences of all DNMT3B mutants (except T775A) were determined. For all CpG and most non-CpG data sets, the individual repeats were strongly correlated (Supplementary Figure S6), indicating good data quality, so the repeats were merged for further analysis. Supporting the reliability of the data, most of the profiles reproduced the key preferences of WT DNMT3B, including the preferences for T(–2), A(–1) and G(+1) (Figure 5). However, several interesting global and mutant-specific effects were observed. K777A showed the strongest local changes in flanking sequence preferences at the +1 site, where G was strongly disfavored in CpG methylation and T was strongly favored in non-CpG methylation. N656D showed a massively altered profile at the –2 to +1 sites, indicating that the negative charge of the aspartate caused a larger perturbation of the catalytic loop conformation. Another striking effect was observed at the +1 flanking site, where G and A are preferred by WT in CpG methylation. As described above, the A preference was lost in non-CpG methylation, and the same effect was also observed for many mutants even in CpG methylation, most strongly for N779D, but also for N779A, N658A and N656A.

Enrichment and depletion of bases in the –4 to +4 flank region of CpG and non-CpG substrates methylated by DNMT3B mutants. On the x-axis the flank positions are shown. On the y-axis the relative methylation levels are shown as observed/expected.

To investigate whether the mutations affected the equalization of methylation rates across different flanking contexts, we compared the strength of the flanking sequence preferences of WT DNMT3B and its mutants. Visual inspection of the data revealed that GGCGGG and GGCGTG sites were often only weakly methylated by the mutants. We therefore determined the relative methylation rates of these two target sites for WT and all mutants, showing that WT methylated them with good activity, but four of the mutants (N652A, V657A, N658A and K777A) had no activity at one or both of these sites (Figure 6A). We also compared the ratios of the average methylation rates of the 15% most preferred and 15% most disfavored sites, which is around 5.1 for WT. Strikingly, this value increased to 15 for N656D, 20 for N658A and 17 for N779D (Figure 6B), indicating that these mutants showed increased sensitivity towards flanking sequence effects. For example, N656D had almost zero activity at A(C/T)CGTG sites, N779D was not active on GGCGA sites, and N658A had zero activity at 6 of the 256 flanking contexts, indicating that this residue has a very important role in the adaptation of DNMT3B to different flanking sequences.

Effects of amino acid mutations on flanking sequence preferences of DNMT3B. (A) Mutations in DNMT3B increase the flanking sequence sensitivity of CpG methylation, leading to a drop in the ability of some DNMT3B mutants to methylate GGCGGG and GGCGTG sequences, here expressed as the ratio of their methylation to the average of all NNCGNN methylation rates. Note that four of the mutations led to a complete loss of GGCGGG and/or GGCGTG methylation (N652A, V657A, N658A and K777A). (B) Ratio of the methylation rates of the 15% best and 15% worst NNCGNN sites. Note the elevated effects of flanking sequences in N658A, N779D and N656D. (C) Rank of the SatII sequence (TTCGAT) methylation preference by WT and mutant DNMT3B among all 256 NNCGNN flanks. A low rank indicates high methylation activity. (D) Heatmap of the activity of DNMT3B and R823A at the 4096 different NNNCGNNN sequences, sorted by the difference in preferences. (E) Occurrence of bases at the +1 to +3 flank sites in the 5% of sequences most preferred and most disfavored by R823A. (F) DNA shape effect on the CpG methylation activity of WT DNMT3B and the K777A and N779D mutants. Differences in minor groove width were determined between the methylated and unmethylated sequences. In both mutants, an increased minor groove width at the +2 to +4 flanks is associated with activity. Data are shown as the average of two repeats; error bars indicate the standard deviation.

Next, we investigated the role of the mutated amino acid residues in the preferred interaction of DNMT3B with the SatII sequence. As mentioned above, this sequence ranks 19th for WT DNMT3B. Interestingly, the preference was reduced moderately for N656A, V657A, N658A, T775A and N779A, and more strongly for N656D, R661A, K777A and N779D. For N779D, the SatII sequence even fell into the disfavored fraction of flanking sequences (Figure 6C). For N779A/D, the reduced preference for the SatII sequence is correlated with the loss of the preference for A(+1). The strong effects of R661A and K777A can be connected to their loss of the T(–1) and T(–2) preferences, which is unique among all investigated mutants. Overall, these results show that the SatII interaction depends on the positioning of both the catalytic loop and the TRD loop through several protein-DNA contacts.

Finally, we investigated the CpG flanking sequence preference of R823A more closely. R823 forms a phosphate contact at the +3 flank site (14). The corresponding residue in DNMT3A (R882) is a hotspot of DNMT3A mutations in AML (24), and its mutation has been shown to cause strong changes in the flanking sequence preferences of DNMT3A (23, 25). To investigate potential changes in preferences at the +3 site, CpG methylation data sets of WT DNMT3B and R823A obtained with a substrate pool containing a hemimethylated CpG target site were sequenced at greater depth (Supplementary Table S4). Initially, the data were analyzed with respect to the average methylation levels of all substrates containing a particular base at one of the –4 to +4 flank sites (Supplementary Figure S7). This analysis revealed high similarity between the profiles determined with the substrate libraries containing CN and hemimethylated CpG target sites, despite the much higher sequencing depth of the latter. Moreover, the profiles of WT DNMT3B and R823A were very similar to each other. Next, the more sensitive analysis of the NNNCGNNN flanking sequence preferences was conducted using the hemimethylated CpG data. As shown in Figure 6D, WT DNMT3B and R823A showed similar activities at many sites, in agreement with the previous analysis. However, sorting the data by the ratio of the activities of R823A and WT revealed a subset of sites that was methylated better by R823A than by WT, and vice versa. An analysis of the 5% of sequences most favored and most disfavored by R823A (Figure 6E) revealed that sites preferred by R823A show strong enrichment of GGG sequences at flank positions +1 to +3. T at +1 appeared enriched in both the favored and the disfavored sequences because T at the +1 site is strongly disfavored; the resulting low catalytic rates amplify the ratios of rates.
Hence, one role of the R823 residue appears to be to balance a strong interaction with triple-G sequences in the +1 to +3 flank region. More generally, our data show that the DNA phosphate contact of R823 plays a role in the interaction with the 3′ flank and that, as observed with its R882 counterpart in DNMT3A, mutation of R823 leads to changes in the flanking sequence preferences.
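The ratio-sorting analysis of Figure 6D and E can be sketched as below, assuming per-flank average methylation rates for WT and R823A are available as dicts keyed by NNNCGNNN strings. The pseudocount is an addition of this sketch, not something the text describes; it avoids division by zero and damps the caveat noted above, since without it very slowly methylated flanks (such as those with T at +1) produce large ratios from small absolute differences. Function names are hypothetical, and the toy data in the demo are not from the paper.

```python
import itertools
from collections import Counter
from math import log2

BASES = "ACGT"


def cg_contexts(flank: int = 3) -> list[str]:
    """All N..NCGN..N contexts with `flank` random bases per side (4096 for flank=3)."""
    return ["".join(p[:flank]) + "CG" + "".join(p[flank:])
            for p in itertools.product(BASES, repeat=2 * flank)]


def ratio_extremes(mutant: dict[str, float], wt: dict[str, float],
                   fraction: float = 0.05, pseudocount: float = 1e-3):
    """Return (favored, disfavored): the `fraction` of flanks with the highest and the
    lowest log2((mutant + pseudocount) / (wt + pseudocount)) activity ratio."""
    shared = [f for f in mutant if f in wt]
    ordered = sorted(
        shared,
        key=lambda f: log2((mutant[f] + pseudocount) / (wt[f] + pseudocount)),
        reverse=True,
    )
    k = max(1, round(len(ordered) * fraction))
    return ordered[:k], ordered[-k:]


def plus_side_counts(flanks: list[str], flank: int = 3, positions=(1, 2, 3)) -> dict[int, Counter]:
    """Base counts at 3' flank positions (+1 is the base right after the G at index flank + 1)."""
    return {pos: Counter(f[flank + 1 + pos] for f in flanks) for pos in positions}


if __name__ == "__main__":
    # Toy data: the "mutant" is three times more active on flanks with GGG at +1..+3.
    contexts = cg_contexts()
    wt = {c: 1.0 for c in contexts}
    mutant = {c: 3.0 if c[5:8] == "GGG" else 1.0 for c in contexts}
    favored, disfavored = ratio_extremes(mutant, wt)
    print(plus_side_counts(favored))
```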

DNA shape readout

In addition to contacts with the edges of the bases in the major and minor grooves (direct readout), DNA-binding proteins can recognize DNA sequence by probing sequence-specific conformational preferences of the DNA (DNA shape readout or indirect readout). To investigate whether the sequence preferences determined here may be related to DNA conformation, the DNA shape server (http://rohslab.cmb.usc.edu) (35) was used, which provides values for roll, helical twist, minor groove width and propeller twist based on penta- or hexanucleotide sequences centered on one base pair (minor groove width, propeller twist) or one base pair step (roll, helical twist). The occurrence and methylation of all base-pair-centered pentanucleotides and base-pair-step-centered hexanucleotides were determined, and the corresponding changes in the DNA shape parameters between methylated sequences and all sequences were calculated. While no significant changes were observed in most cases, an increased minor groove width at flank positions +2 to +4 was correlated with the activity of the K777A and N779D mutants (Figure 6F). The TRD loop approaches the DNA from the major groove on the plus flank side, and a compression of the major groove has been observed in the K777A mutant structure (14). According to our activity data, the corresponding enlargement of the minor groove is favorable for methylation activity of this mutant. Similar effects may occur with N779D. The correlation of an increased minor groove width at flank sites +2 to +4 with the activity of the K777A and N779D mutants illustrates the cooperative effects of DNA contacts from the major and minor groove sides.
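The comparison of a shape parameter between methylated sequences and all sequences can be sketched as below. This assumes that a per-pentamer shape lookup (for example minor groove width) has been obtained from a DNA shape table such as the one behind the server cited above; pentamer_table is a hypothetical stand-in, and the two values in the example are arbitrary placeholders rather than real minor groove widths. In a sequence with three bases 5′ of the CpG, index 3 is the C and index 4 + k is flank position +k; a pentamer centered at +k needs bases up to +k+2.

```python
from statistics import mean


def centered_pentamer(seq: str, index: int) -> str:
    """Pentanucleotide centered on the base pair at `index` (two bases on each side)."""
    if index < 2 or index + 3 > len(seq):
        raise ValueError("the pentamer would run past the end of the sequence")
    return seq[index - 2:index + 3]


def shape_shift(seqs: list[str], methylated: list[bool], index: int,
                pentamer_table: dict[str, float]) -> float:
    """Mean shape value at `index` among methylated sequences minus the mean over all sequences.

    A positive result means the parameter (e.g. minor groove width) is larger in the
    methylated subset than in the library as a whole.
    """
    values = [pentamer_table[centered_pentamer(s, index)] for s in seqs]
    meth_values = [v for v, m in zip(values, methylated) if m]
    return mean(meth_values) - mean(values)


if __name__ == "__main__":
    # Placeholder shape values, not real DNAshape output.
    table = {"GTTTA": 6.0, "GAAAA": 4.0}
    seqs = ["AAACGTTTA", "AAACGAAAA"]
    print(shape_shift(seqs, [True, False], 6, table))  # index 6 = flank position +2
```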

Comparison of the flanking sequence preferences of human and mouse DNMT3B

The catalytic domains of human and mouse DNMT3B differ at 17 of 285 amino acid residues. Among these is a K-to-R substitution at the residue corresponding to human DNMT3B K782, which is located in the TRD loop and contacts the backbone of the +2 and +3 flank of the non-target strand. In contrast, the catalytic domains of human and mouse DNMT3A are identical in amino acid sequence. To find out whether the flanking sequence preferences of human and mouse DNMT3B differ, the CpG methylation preferences of human and mouse DNMT3B (mouse data were taken from our previous publication (14)) were analyzed with respect to the average methylation of NNNCGNNN sites, because our previous paper has shown that the +3 site is very important for SatII preferences (14). As shown in Figure 7A, the flanking sequence preferences of human and mouse DNMT3B were highly correlated, but the preferences at some flanking sequences differ significantly (Figure 7B), suggesting that some minor changes in flanking sequence preferences occurred during the evolution of mouse and human DNMT3B. Genetic studies have shown that human SatII repeats are a major methylation target of DNMT3B in human cells (10–12), while minor satellite repeats are major targets of mouse DNMT3B (36). Therefore, we investigated the preferences of human and mouse DNMT3B for the CpG sites in these repeat elements. For this analysis, DNMT3B methylation preferences were taken for the preferred DNA strand, assuming that DNMT1 will convert hemimethylated sites generated by DNMT3B into the fully methylated state. Our data revealed that human DNMT3B shows an increased preference for SatII sequences, while murine DNMT3B prefers minor satellite repeats (Figure 7C and D). Although these changes are only moderate, they suggest an evolutionary adaptation of the flanking sequence preferences of DNMT3B enzymes to their most important physiological targets.
In contrast, and as reported before, DNMT3A strongly disfavors both sequences (Supplementary Figure S8).
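The preferred-strand rule and the median-rank comparison of Figure 7C and D can be sketched as follows, assuming that ranks are obtained by sorting each enzyme's average NNNCGNNN methylation rates in decreasing order and that repeat sequences are uppercase ACGT strings. Because CG is palindromic, the bottom-strand context of a CpG site is the reverse complement of its top-strand context and is again an NNNCGNNN string; taking the lower of the two ranks implements the "preferred DNA strand" choice. Function names are hypothetical.

```python
import itertools
from statistics import median

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def all_cg_contexts(flank: int = 3) -> list[str]:
    """All N..NCGN..N contexts (4096 NNNCGNNN strings for flank=3)."""
    return ["".join(p[:flank]) + "CG" + "".join(p[flank:])
            for p in itertools.product(BASES, repeat=2 * flank)]


def rank_table(rates: dict[str, float]) -> dict[str, int]:
    """Rank every context by decreasing rate; rank 1 = most preferred."""
    ordered = sorted(rates, key=rates.get, reverse=True)
    return {ctx: i + 1 for i, ctx in enumerate(ordered)}


def cpg_contexts(repeat: str, flank: int = 3) -> list[str]:
    """Context of every CpG in `repeat` that has `flank` bases on both sides."""
    return [repeat[i - flank:i + 2 + flank]
            for i in range(flank, len(repeat) - 1 - flank)
            if repeat[i:i + 2] == "CG"]


def median_rank(repeat: str, ranks: dict[str, int], flank: int = 3) -> float:
    """Median rank of the repeat's CpG sites, scoring each site on its better-ranked strand."""
    per_site = [min(ranks[ctx], ranks[revcomp(ctx)]) for ctx in cpg_contexts(repeat, flank)]
    return median(per_site)
```

A Figure 7D-style ratio would then be median_rank(satii, human_ranks) / median_rank(satii, mouse_ranks), where satii, human_ranks and mouse_ranks are placeholders for a repeat sequence and the two enzymes' rank tables; a value below 1 means the human enzyme ranks that repeat's CpG sites better.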

Comparison of the flanking sequence preferences of human and mouse DNMT3B. (A) Heatmap of the average methylation activities of mouse and human DNMT3B and DNMT3A at NNNCGNNN sites, sorted by the average activity of DNMT3B. The amino acid sequences of the catalytic domains of human and mouse DNMT3A are identical. DNMT3A data were taken from (14). (B) Heatmap of the average methylation activities of mouse and human DNMT3B at NNNCGNNN sites, sorted by the difference between the human and mouse enzymes. (C) Boxplot of the ranks of the CpG sites in the human SatII repeat (n = 67) and the murine minor satellite repeat (n = 11) in the NNNCGNNN flanking preferences of human and mouse DNMT3B. A low rank indicates high preference. The boxes display the first and third quartiles, with medians indicated by vertical lines. Whiskers display the data range. (D) Ratio of the median ranks of human and mouse DNMT3B for the methylation of CpG sites in human SatII and mouse MinSat repeats.


References

López-Otín C, Blasco MA, Partridge L, Serrano M, Kroemer G. The hallmarks of aging. Cell. 2013153:1194–217.

Holliday R, Pugh JE. DNA modification mechanisms and gene activity during development. Science. 1975187:226–32.

Li E, Beard C, Jaenisch R. Role for DNA methylation in genomic imprinting. Nature. 1993366:362–5.

Riggs AD, Pfeifer GP. X-chromosome inactivation and cell memory. Trends Genet. 19928:169–74.

Ehrlich M, Gama-Sosa MA, Huang L-H, Midgett RM, Kuo KC, McCune RA, et al. Amount and distribution of 5-methylcytosine in human DNA from different types of tissues or cells. Nucleic Acids Res. 198210:2709–21.

Deaton AM, Bird A. CpG islands and the regulation of transcription. Genes Dev. 201125:1010–22.

Berdishev MT, Korotaev GK, Boiarskikh GV, Vanyushin BF. Nucleotide composition of DNA and RNA from somatic tissues of humpback salmon and its changes during spawning. Biokhimiia. 196738:988–93.

Romanov GA, Vanyushin BF. Methylation of reiterated sequences in mammalian DNAs. Effects of the tissue type, age, malignancy and hormonal induction. Biochim Biophys Acta. 1981653:204–18.

Wilson VL, Smith RA, Ma S, Cutler RG. Genomic 5-methyldeoxycytidine decreases with age. J Biol Chem. 1987262:9948–51.

Wilson VL, Jones PA. DNA methylation decreases in aging but not in immortal cells. Science. 1983220:1055–7.

Fairweather DS, Fox M, Margison GP. The in vitro lifespan of MRC-5 cells is shortened by 5-azacytidine-induced demethylation. Exp Cell Res. 1987168:153–9.

Heyn H, Li N, Ferreira HJ, Moran S, Pisano DG, Gomez A, et al. Distinct DNA methylomes of newborns and centenarians. Proc Natl Acad Sci U S A. 2012109:10522–7.

Bjornsson HT, Sigursson MI, Fallin MD, Irizarry RA, Aspelund T, Cui H, et al. Intra-individual change over time in DNA methylation with familial clustering. JAMA. 2008299:2877–83.

Issa J-PJ, Ottaviano YL, Celano P, Hamilton SR, Davidson NE, Baylin SB. Methylation of the oestrogen receptor CpG island links ageing and neoplasia in human colon. Nat Genet. 19947:536–40.

Florath I, Butterbach K, Müller H, Bewerunge-Hudler M, Brenner H. Cross-sectional and longitudinal changes in DNA methylation with age: an epigenome-wide analysis revealing over 60 novel age-associated CpG sites. Hum Mol Genet. 201423:1186–201.

Bocklandt S, Lin W, Sehl ME, Sánchez FJ, Sinsheimer JS, Horvath S, et al. Epigenetic predictor of age. PLoS One. 20116:e14821.

Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 201314:R115.

Weidner CI, Lin Q, Koch CM, Eisele L, Beier F, Ziegler P, et al. Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol. 201415:R24.

Steegenga WT, Boekschoten MV, Lute C, Hooiveld GJ, de Groot PJ, Morris TJ, et al. Genome-wide age-related changes in DNA methylation and gene expression in human PBMCs. Age (Dordr). 201436:9648.

Almén MS, Nilsson EK, Jacobsson JA, Kalnina I, Klovins J, Fredriksson R, et al. Genome-wide analysis reveals DNA methylation markers that vary with both age and obesity. Gene. 2014548:61–7.

De Mello VDF, Pulkkinen L, Lalli M, Kolehmainen M, Pihlajamäki J, Uusitupa M. DNA methylation in obesity and type 2 diabetes. Ann Med. 201446:103–13.

Bind M-A, Lepeule J, Zanobetti A, Gasparrini A, Baccarelli A, Coull BA, et al. Air pollution and gene-specific methylation in the Normative Aging Study: association, effect modification, and mediation analysis. Epigenetics. 20149:448–58.

Lambrou A, Baccarelli A, Wright RO, Weisskopf M, Bollati V, Amarasiriwardena C, et al. Arsenic exposure and DNA methylation among elderly men. Epidemiology. 201223:668–76.

Noreen F, Röösli M, Gaj P, Pietrzak J, Weis S, Urfer P, et al. Modulation of age- and cancer-associated DNA methylation change in the healthy colon by aspirin and lifestyle. J Natl Cancer Inst. 2014106:dju161.

Besingi W, Johansson Å. Smoke related DNA methylation changes in the etiology of human disease. Hum Mol Genet. 201323:2290–7.

Bind M-A, Zanobetti A, Gasparrini A, Peters A, Coull B, Baccarelli A, et al. Effects of temperature and relative humidity on DNA methylation. Epidemiology. 201425:561–9.

Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006125:315–26.

Boyer LA, Plath K, Zeitlinger J, Brambrink T, Medeiros LA, Lee TI, et al. Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature. 2006441:349–53.

Bracken AP, Dietrich N, Pasini D, Hansen KH, Helin K. Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions. Genes Dev. 200620:1123–36.

Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, et al. Control of developmental regulators by Polycomb in human embryonic stem cells. Cell. 2006125:301–13.

Margueron R, Reinberg D. The Polycomb complex PRC2 and its mark in life. Nature. 2011469:343–9.

Eskeland R, Leeb M, Grimes GR, Kress C, Boyle S, Sproul D, et al. Ring1B compacts chromatin structure and represses gene expression independent of histone ubiquitination. Mol Cell. 201038:452–64.

Francis NJ, Kingston RE, Woodcock CL. Chromatin compaction by a Polycomb group protein complex. Science. 2004306:1574–7.

Rauch T, Li H, Wu X, Pfeifer GP. MIRA-assisted microarray analysis, a new technology for the determination of DNA methylation patterns, identifies frequent methylation of homeodomain-containing genes in lung cancer cells. Cancer Res. 200666:7939–47.

Schlesinger Y, Straussman R, Keshet I, Farkash S, Hecht M, Zimmerman J, et al. Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat Genet. 200739:232–6.

Ohm JE, McGarvey KM, Yu X, Cheng L, Schuebel KE, Cope L, et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet. 200739:237–42.

Widschwendter M, Fiegl H, Egle D, Mueller-Holzner E, Spizzo G, Marth C, et al. Epigenetic stem cell signature in cancer. Nat Genet. 200739:157–8.

Hahn MA, Hahn T, Lee D-H, Esworthy RS, Kim B-W, Riggs AD, et al. Methylation of polycomb target genes in intestinal cancer is mediated by inflammation. Cancer Res. 200868:10280–9.

Kalari S, Jung M, Kernstine KH, Takahashi T, Pfeifer GP. The DNA methylation landscape of small cell lung cancer suggests a differentiation defect of neuroendocrine cells. Oncogene. 201332:3559–68.

Johnson KC, Koestler DC, Cheng C, Christensen BC. Age-related DNA methylation in normal breast tissue and its relationship with invasive breast tumor methylation. Epigenetics. 20149:268–75.

Lynch MD, Smith AJH, De Gobbi M, Flenley M, Hughes JR, Vernimmen D, et al. An interspecies analysis reveals a key role for unmethylated CpG dinucleotides in vertebrate Polycomb complex recruitment. EMBO J. 201231:317–29.

Maegawa S, Hinkal G, Kim HS, Shen L, Zhang L, Zhang J, et al. Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res. 201020:332–40.

Reddington JP, Perricone SM, Nestor CE, Reichmann J, Youngson NA, Suzuki M, et al. Redistribution of H3K27me3 upon DNA hypomethylation results in de-repression of Polycomb target genes. Genome Biol. 201314:R25.

Blackledge NP, Farcas AM, Kondo T, King HW, McGouran JF, Hanssen LLP, et al. Variant PRC1 complex-dependent H2A ubiquitylation drives PRC2 recruitment and Polycomb domain formation. Cell. 2014157:1445–59.

Cedar H, Bergman Y. Programming of DNA methylation patterns. Annu Rev Biochem. 201281:97–117.

Gal-Yam EN, Egger G, Iniguez L, Holster H, Einarsson S, Zhang X, et al. Frequent switching of Polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line. Proc Natl Acad Sci U S A. 2008105:12979–84.

Finkel T, Serrano M, Blasco MA. The common biology of cancer and ageing. Nature. 2007448:767–74.

Wilson AS, Power BE, Molloy PL. DNA hypomethylation and human diseases. Biochim Biophys Acta. 20071775:138–62.

Berman BP, Weisenberger DJ, Aman JF, Hinoue T, Ramjan Z, Liu Y, et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat Genet. 201244:40–6.

Howard G, Eiges R, Gaudet F, Jaenisch R, Eden A. Activation and transposition of endogenous retroviral elements in hypomethylation induced tumors in mice. Oncogene. 200827:404–8.

Vidal AC, Henry NM, Murphy SK, Oneko O, Nye M, Bartlett JA, et al. PEG1/MEST and IGF2 DNA methylation in CIN and in cervical cancer. Clin Transl Oncol. 201416:266–72.

Fujii H, Biel MA, Zhou W, Weitzman SA, Baylin SB, Gabrielson E. Methylation of the HIC-1 candidate tumor suppressor gene in human breast cancer. Oncogene. 199816:2159–64.

Yuan Y, Qian ZR, Sano T, Asa SL, Yamada S, Kagawa N, et al. Reduction of GSTP1 expression by DNA methylation correlates with clinicopathological features in pituitary adenomas. Mod Pathol. 200821:856–65.

Sutherland KD, Lindeman GJ, Choong DYH, Wittlin S, Brentzell L, Phillips W, et al. Differential hypermethylation of SOCS genes in ovarian and breast carcinomas. Oncogene. 200423:7726–33.

Dammann R, Li C, Yoon JH, Chin PL, Bates S, Pfeifer GP. Epigenetic inactivation of a RAS association domain family protein from the lung tumour suppressor locus 3p21.3. Nat Genet. 200025:315–9.

Cody DT, Huang Y, Darby CJ, Johnson GK, Domann FE. Differential DNA methylation of the p16 INK4A/CDKN2A promoter in human oral cancer cells and normal human oral keratinocytes. Oral Oncol. 199935:516–22.

Virmani AK, Rathi A, Sathyanarayana UG, Padar A, Huang CX, Cunnigham HT, et al. Aberrant methylation of the adenomatous polyposis coli (APC) gene promoter 1A in breast and lung carcinomas. Clin Cancer Res. 20017:1998–2004.

Gaudet MM, Campan M, Figueroa JD, Yang XR, Lissowska J, Peplonska B, et al. DNA hypermethylation of ESR1 and PGR in breast cancer: pathologic and epidemiologic associations. Cancer Epidemiol Biomarkers Prev. 200918:3036–43.

Issa JP, Ahuja N, Toyota M, Bronner MP, Brentnall TA. Accelerated age-related CpG island methylation in ulcerative colitis. Cancer Res. 200161:3573–7.

Lund G, Andersson L, Lauria M, Lindholm M, Fraga MF, Villar-Garea A, et al. DNA methylation polymorphisms precede any histological sign of atherosclerosis in mice lacking apolipoprotein E. J Biol Chem. 2004279:29147–54.

Gowers IR, Walters K, Kiss-Toth E, Read RC, Duff GW, Wilson AG. Age-related loss of CpG methylation in the tumour necrosis factor promoter. Cytokine. 201156:792–7.

Dayeh T, Volkov P, Salö S, Hall E, Nilsson E, Olsson AH, et al. Genome-wide DNA methylation analysis of human pancreatic islets from type 2 diabetic and non-diabetic donors identifies candidate genes that influence insulin secretion. PLoS Genet. 201410:e1004160.

Tohgi H, Utsugisawa K, Nagane Y, Yoshimura M, Genda Y, Ukitsu M. Reduction with age in methylcytosine in the promoter region −224 approximately −101 of the amyloid precursor protein gene in autopsy human cortex. Brain Res Mol Brain Res. 199970:288–92.

Fuso A, Seminara L, Cavallaro RA, D’Anselmi F, Scarpa S. S-adenosylmethionine/homocysteine cycle alterations modify DNA methylation status with consequent deregulation of PS1 and BACE and beta-amyloid production. Mol Cell Neurosci. 200528:195–204.

Fuso A, Cavallaro RA, Zampelli A, D’Anselmi F, Piscopo P, Confaloni A, et al. gamma-Secretase is differentially modulated by alterations of homocysteine cycle in neuroblastoma and glioblastoma cells. J Alzheimers Dis. 200711:275–90.

Schrack JA, Knuth ND, Simonsick EM, Ferrucci L. “IDEAL” aging Is associated with lower resting metabolic rate: The Baltimore Longitudinal Study of Aging. J Am Geriatr Soc. 201462:667–72.

Mason JB. Biomarkers of nutrient exposure and status in one-carbon (methyl) metabolism. J Nutr. 2003133:941S–7S.

Ulrey CL, Liu L, Andrews LG, Tollefsbol TO. The impact of metabolism on DNA methylation. Hum Mol Genet. 200514 Spec No 1:R139-47.

Chiang PK, Gordon RK, Tal J, Zeng GC, Doctor BP, Pardhasaradhi K, et al. S-adenosylmethionine and methylation. FASEB J. 1996;10:471–80.

James SJ, Melnyk S, Pogribna M, Pogribny IP, Caudill MA. Elevation in S-adenosylhomocysteine and DNA hypomethylation: potential epigenetic mechanism for homocysteine-related pathology. J Nutr. 2002;132:2361S–6S.

Choi S-W, Claycombe KJ, Martinez JA, Friso S, Schalinske KL. Nutritional epigenomics: a portal to disease prevention. Adv Nutr. 2013;4:530–2.

Kuo H-K, Sorond FA, Chen J-H, Hashmi A, Milberg WP, Lipsitz LA. The role of homocysteine in multisystem age-related problems: a systematic review. J Gerontol A Biol Sci Med Sci. 2005;60:1190–201.

Bae S, Ulrich CM, Bailey LB, Malysheva O, Brown EC, Maneval DR, et al. Impact of folic acid fortification on global DNA methylation and one-carbon biomarkers in the Women’s Health Initiative Observational Study cohort. Epigenetics. 2014;9:396–403.

Pogribny IP, Tryndyak VP, Boureiko A, Melnyk S, Bagnyukova TV, Montgomery B, et al. Mechanisms of peroxisome proliferator-induced DNA hypomethylation in rat liver. Mutat Res. 2008;644:17–23.

Ions LJ, Wakeling LA, Bosomworth HJ, Hardyman JEJ, Escolme SM, Swan DC, et al. Effects of Sirt1 on DNA methylation and expression of genes affected by dietary restriction. Age (Dordr). 2013;35:1835–49.

Denis H, Ndlovu MN, Fuks F. Regulation of mammalian DNA methyltransferases: a route to new mechanisms. EMBO Rep. 2011;12:647–56.

Bestor TH. The DNA methyltransferases of mammals. Hum Mol Genet. 2000;9:2395–402.

Casillas Jr MA, Lopatina N, Andrews LG, Tollefsbol TO. Transcriptional control of the DNA methyltransferases is altered in aging and neoplastically-transformed human fibroblasts. Mol Cell Biochem. 2003;252:33–43.

Lopatina N, Haskell JF, Andrews LG, Poole JC, Saldanha S, Tollefsbol T. Differential maintenance and de novo methylating activity by three DNA methyltransferases in aging and immortalized fibroblasts. J Cell Biochem. 2002;84:324–34.

Liu L, van Groen T, Kadish I, Li Y, Wang D, James SR, et al. Insufficient DNA methylation affects healthy aging and promotes age-related health problems. Clin Epigenetics. 2011;2:349–60.

Armstrong VL, Rakoczy S, Rojanathammanee L, Brown-Borg HM. Expression of DNA methyltransferases is influenced by growth hormone in the long-living Ames dwarf mouse in vivo and in vitro. J Gerontol A Biol Sci Med Sci. 2014;69:923–33.

Hahn MA, Qiu R, Wu X, Li AX, Zhang H, Wang J, et al. Dynamics of 5-hydroxymethylcytosine and chromatin marks in Mammalian neurogenesis. Cell Rep. 2013;3:291–300.

Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science. 2009;324:930–5.

Tan L, Shi YG. Tet family proteins and 5-hydroxymethylcytosine in development and disease. Development. 2012;139:1895–902.

Horvath S, Zhang Y, Langfelder P, Kahn RS, Boks MP, Van Eijk K, et al. Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biol. 2012;13:R97.

Issa J-P. Age-related epigenetic changes and the immune system. Clin Immunol. 2003;109:103–8.

Day K, Waite LL, Thalacker-Mercer A, West A, Bamman MM, Brooks JD, et al. Differential DNA methylation with age displays both common and dynamic features across human tissues that are influenced by CpG landscape. Genome Biol. 2013;14:R102.

Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013;49:359–67.

Pollina EA, Brunet A. Epigenetic regulation of aging stem cells. Oncogene. 2011;30:3105–26.

Bröske A-M, Vockentanz L, Kharazi S, Huska MR, Mancini E, Scheller M, et al. DNA methylation protects hematopoietic stem cell multipotency from myeloerythroid restriction. Nat Genet. 2009;41:1207–15.

Beerman I, Bock C, Garrison BS, Smith ZD, Gu H, Meissner A, et al. Proliferation-dependent alterations of the DNA methylation landscape underlie hematopoietic stem cell aging. Cell Stem Cell. 2013;12:413–25.

Sun D, Luo M, Jeong M, Rodriguez B, Xia Z, Hannah R, et al. Epigenomic profiling of young and aged HSCs reveals concerted changes during aging that reinforce self-renewal. Cell Stem Cell. 2014;14:673–88.

Ooi SK, Qiu C, Bernstein E, Li K, Jia D, Yang Z, et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature. 2007;448:714–7.

Osorio FG, Varela I, Lara E, Puente XS, Espada J, Santoro R, et al. Nuclear envelope alterations generate an aging-like epigenetic pattern in mice deficient in Zmpste24 metalloprotease. Aging Cell. 2010;9:947–57.

Oommen AM, Griffin JB, Sarath G, Zempleni J. Roles for nutrients in epigenetic events. J Nutr Biochem. 2005;16:74–7.
