How many palindromic sequences in human genome?


I'm writing an article on palindromes (the words) and I wanted to mention the existence of palindromic gene sequences. Roughly how many palindromes exist in the human genome? I understand the number will vary from person to person. All I want to know is the order of magnitude. Is it 10^2, 10^3, 10^6?


The human genome contains approximately 1.25*10^7 palindromes longer than 6 bp (that is 8 bp or longer). Some interesting facts are also known about their distribution.

We found that 24 palindrome-abundant intervals are mostly located on G-bands, which condense early, replicate late, and are relatively A+T rich. In general, palindromes are overrepresented in introns but underrepresented in exons. Upstream region has enriched palindrome distribution, where palindromes can serve as transcription factor binding sites. We created a Human DNA Palindrome Database (HPALDB) which is accessible at http://vhp.ntu.edu.sg/hpaldb . It contains 12,556,994 entries covering all palindromes in the human genome longer than 6 bp.

And no, the number of shorter ones (shorter than 8 bp) isn't mentioned in the paper because they are considered unimportant, biologically.
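To make the definition concrete: a DNA palindrome is a stretch that reads the same as its own reverse complement, so it always has even length (which is why "longer than 6 bp" means 8 bp or more). The sketch below is only an illustration of that definition, not the method used to build HPALDB; the function name `maximal_palindromes` and the choice to report only maximal palindromes are assumptions made here.

```python
# Hypothetical helper: enumerate maximal reverse-complement palindromes.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def maximal_palindromes(seq, min_len=8):
    """Return (start, length) for each maximal palindrome of at least min_len bp."""
    seq = seq.upper()
    hits = []
    # A reverse-complement palindrome has even length, so its centre
    # lies between two bases: left = c - 1 and right = c.
    for c in range(1, len(seq)):
        left, right = c - 1, c
        # Extend outwards while the two flanking bases are complementary.
        while (left >= 0 and right < len(seq)
               and COMPLEMENT.get(seq[left]) == seq[right]):
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 1, length))
    return hits

print(maximal_palindromes("CCTGAATTCACC"))  # the 8 bp palindrome TGAATTCA
```

Running this over a whole genome would need a much faster implementation, but the counting rule is the same: every centre whose complementary arms span at least 8 bp contributes one entry.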


Rationale

Improvements in the efficiency of DNA sequencing have both broadened the applications for sequencing and dramatically increased the size of sequencing datasets. Technologies from Illumina (San Diego, CA, USA) and Applied Biosystems (Foster City, CA, USA) have been used to profile methylation patterns (MeDIP-Seq) [1], to map DNA-protein interactions (ChIP-Seq) [2], and to identify differentially expressed genes (RNA-Seq) [3] in the human genome and other species. The Illumina instrument was recently used to re-sequence three human genomes, one from a cancer patient and two from previously unsequenced ethnic groups [4–6]. Each of these studies required the alignment of large numbers of short DNA sequences ('short reads') onto the human genome. For example, two of the studies [4, 5] used the short read alignment tool Maq [7] to align more than 130 billion bases (about 45× coverage) of short Illumina reads to a human reference genome in order to detect genetic variations. The third human re-sequencing study [6] used the SOAP program [8] to align more than 100 billion bases to the reference genome. In addition to these projects, the 1,000 Genomes project is in the process of using high-throughput sequencing instruments to sequence a total of about six trillion base pairs of human DNA [9].

With existing methods, the computational cost of aligning many short reads to a mammalian genome is very large. For example, extrapolating from the results presented here in Tables 1 and 2, one can see that Maq would require more than 5 central processing unit (CPU)-months and SOAP more than 3 CPU-years to align the 140 billion bases from the study by Ley and coworkers [5]. Although using Maq or SOAP for this purpose has been shown to be feasible by using multiple CPUs, there is a clear need for new tools that consume less time and computational resources.

Maq and SOAP take the same basic algorithmic approach as other recent read mapping tools such as RMAP [10], ZOOM [11], and SHRiMP [12]. Each tool builds a hash table of short oligomers present in either the reads (SHRiMP, Maq, RMAP, and ZOOM) or the reference (SOAP). Some employ recent theoretical advances to align reads quickly without sacrificing sensitivity. For example, ZOOM uses 'spaced seeds' to significantly outperform RMAP, which is based on a simpler algorithm developed by Baeza-Yates and Perleberg [13]. Spaced seeds have been shown to yield higher sensitivity than contiguous seeds of the same length [14, 15]. SHRiMP employs a combination of spaced seeds and the Smith-Waterman [16] algorithm to align reads with high sensitivity at the expense of speed. Eland is a commercial alignment program available from Illumina that uses a hash-based algorithm to align reads.
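The hash-table idea can be sketched in a few lines. The example below indexes the reference (the side SOAP is described as hashing) with contiguous k-mers, then verifies each candidate position by counting mismatches. It is a toy illustration under those assumptions, not the algorithm of any of the tools named above; `build_kmer_index` and `align_read` are hypothetical names.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def align_read(read, reference, index, k, max_mismatches=2):
    """Seed with non-overlapping k-mers of the read, then verify each candidate."""
    hits = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # where the whole read would begin
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)

reference = "ACGTTGCAAGGCTTAACCGG"
index = build_kmer_index(reference, k=4)
print(align_read("TGCAAGGCTTCA", reference, index, k=4))  # one mismatch vs. position 4
```

With three non-overlapping seeds and at most two mismatches, at least one seed must match exactly, which is why the lookup still finds the read. Spaced seeds refine this by ignoring selected positions inside each seed.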

Bowtie uses a different and novel indexing strategy to create an ultrafast, memory-efficient short read aligner geared toward mammalian re-sequencing. In our experiments using reads from the 1,000 Genomes project, Bowtie aligns 35-base pair (bp) reads at a rate of more than 25 million reads per CPU-hour, which is more than 35 times faster than Maq and 300 times faster than SOAP under the same conditions (see Tables 1 and 2). Bowtie employs a Burrows-Wheeler index based on the full-text minute-space (FM) index, which has a memory footprint of only about 1.3 gigabytes (GB) for the human genome. The small footprint allows Bowtie to run on a typical desktop computer with 2 GB of RAM. The index is small enough to be distributed over the internet and to be stored on disk and re-used. Multiple processor cores can be used simultaneously to achieve even greater alignment speed. We have used Bowtie to align 14.3× coverage worth of human Illumina reads from the 1,000 Genomes project in about 14 hours on a single desktop computer with four processor cores.
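For readers unfamiliar with Burrows-Wheeler indexing, the following is a minimal sketch of exact matching by backward search over a Burrows-Wheeler transform. It is not Bowtie's implementation: it builds the suffix array naively, stores full occurrence tables, and handles no mismatches, so it has none of the memory efficiency described above. All names (`bwt_index`, `find_exact`) are hypothetical.

```python
def bwt_index(text):
    """Build a toy FM-style index: suffix array, first-column offsets, occurrence table."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in the text strictly smaller than c
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ

def find_exact(pattern, sa, first, occ):
    """Return sorted start positions of pattern, matching it right to left."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))

sa, first, occ = bwt_index("GATTACAGATTA")
print(find_exact("ATTA", sa, first, occ))
```

Each pattern character narrows a range of suffix-array rows, so lookup time depends on the read length rather than the genome length. A practical index samples the occurrence table and suffix array instead of storing them in full.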

Bowtie makes a number of compromises to achieve this speed, but these trade-offs are reasonable within the context of mammalian re-sequencing projects. If one or more exact matches exist for a read, then Bowtie is guaranteed to report one, but if the best match is an inexact one then Bowtie is not guaranteed in all cases to find the highest quality alignment. With its highest performance settings, Bowtie may fail to align a small number of reads with valid alignments, if those reads have multiple mismatches. If the stronger guarantees are desired, Bowtie supports options that increase accuracy at the cost of some performance. For instance, the '--best' option will guarantee that all alignments reported are best in terms of minimizing mismatches in the seed portion of the read, although this option incurs additional computational cost.

With its default options, Bowtie's sensitivity measured in terms of reads aligned is equal to SOAP's and somewhat less than Maq's. Command line options allow the user to increase sensitivity at the cost of greater running time, and to enable Bowtie to report multiple hits for a read. Bowtie can align reads as short as four bases and as long as 1,024 bases. The input to a single run of Bowtie may comprise a mixture of reads with different lengths.


Contents

Satellite DNA, together with minisatellite and microsatellite DNA, constitutes the tandem repeats. [3]

The major satellite DNA families in humans are called:

Satellite family | Size of repeat unit (bp) | Location in human chromosomes
α (alphoid DNA) | 170 [4] | All chromosomes
β | 68 | Centromeres of chromosomes 1, 9, 13, 14, 15, 21, 22 and Y
Satellite 1 | 25-48 | Centromeres and other regions in heterochromatin of most chromosomes
Satellite 2 | 5 | Most chromosomes
Satellite 3 | 5 | Most chromosomes

A repeated pattern can range from 1 base pair (a mononucleotide repeat) to several thousand base pairs in length, [5] and the total size of a satellite DNA block can be several megabases without interruption. Long repeat units have been described containing domains of shorter repeated segments and mononucleotides (1-5 bp), arranged in clusters of microsatellites, wherein differences among individual copies of the longer repeat units were clustered. [5] Most satellite DNA is localized to the telomeric or the centromeric region of the chromosome. The nucleotide sequence of the repeats is fairly well conserved across species. However, variation in the length of the repeat is common. For example, minisatellite DNA is a short region (1-5 kb) of repeating elements with length >9 nucleotides, whereas microsatellites are considered to have a repeat length of 1-8 nucleotides. [6] The difference in how many repeats are present in the region (and hence the length of the region) is the basis for DNA fingerprinting.

Microsatellites are thought to have originated by polymerase slippage during DNA replication. This comes from the observation that microsatellite alleles are usually length polymorphic; specifically, the length differences observed between microsatellite alleles are generally multiples of the repeat unit length. [7]
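As an illustration of what a microsatellite looks like in sequence data, the sketch below uses a regular expression with a backreference to find short units repeated in tandem. The 1-8 nucleotide unit size follows the definition given above; the minimum of four copies and the function name `find_microsatellites` are assumptions chosen for the example.

```python
import re

def find_microsatellites(seq, max_unit=8, min_copies=4):
    """Return (start, unit, copies) for tandem repeats of a 1..max_unit nt unit."""
    # ([ACGT]{1,8}?) lazily captures the shortest unit; \1{3,} requires
    # at least min_copies - 1 further copies directly after it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    results = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        results.append((match.start(), unit, copies))
    return results

# Two hypothetical alleles that differ by exactly one CA unit.
print(find_microsatellites("GGCACACACACATT"))
print(find_microsatellites("GGCACACACACACATT"))
```

The two example alleles differ in length by 2 bp, one repeat unit, which is the pattern of length polymorphism that polymerase slippage is expected to produce.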

Microsatellite expansion (trinucleotide repeat expansion) is often found in transcription units. Often the base pair repetition will disrupt proper protein synthesis, leading to diseases such as myotonic dystrophy. [8]

Satellite DNA adopts higher-order three-dimensional structures in eukaryotic organisms. This was demonstrated in the land crab Gecarcinus lateralis, whose genome contains 3% of a GC-rich satellite band consisting of a 2100 base pair (bp) "repeat unit" sequence motif called RU. [9] [10] The RU was arranged in long tandem arrays with approximately 16,000 copies per genome. Several RU sequences were cloned and sequenced to reveal conserved regions of conventional DNA sequences over stretches greater than 550 bp, interspersed with five "divergent domains" within each copy of RU.

Four divergent domains consisted of microsatellite repeats, biased in base composition, with purines on one strand and pyrimidines on the other. Some contained mononucleotide repeats of C:G base pairs approximately 20 bp in length. These strand-biased domains ranged in length from approximately 20 bp to greater than 250 bp. The most prevalent repeated sequences in the embedded microsatellite regions were CT:AG, CCT:AGG, CCCT:AGGG, and CGCAC:GTGCG. [11] [12] [5] These repeating sequences were shown to adopt altered structures including triple-stranded DNA, Z-DNA, stem-loop and others under superhelical stress. [11] [12] [5]

Between the strand-biased microsatellite repeats and C:G mononucleotide repeats, all sequence variations retained one or two base pairs with A (purine) interrupting the pyrimidine-rich strand and T (pyrimidine) interrupting the purine-rich strand. This sequence feature appeared between microsatellite repeats and C:G mononucleotides in all four of the strand-biased domains sequenced. These interruptions in compositional bias adopted highly distorted conformations as shown by their response to nuclease enzymes, presumably due to steric effects of the larger (bicyclic) purines protruding into the complementary strand of smaller (monocyclic) pyrimidine rings. The sequence TTAA:TTAA was found in the longest such domain of RU, which produced the strongest of all responses to nucleases. That particular strand-biased divergent domain was subcloned and its altered helical structure was studied in greater detail. [11]

A fifth divergent domain in the RU sequence was characterized by variations of a symmetrical DNA sequence motif of alternating purines and pyrimidines shown to adopt a left-handed Z-DNA/stem-loop structure under superhelical stress. The conserved symmetrical Z-DNA was abbreviated Z4Z5NZ15NZ5Z4, where Z represents alternating purine/pyrimidine sequences. A stem-loop structure was centered in the Z15 element at the highly conserved palindromic sequence CGCACGTGCG:CGCACGTGCG and was flanked by extended palindromic Z-DNA sequences over a 35 bp region. Many RU variants showed deletions of at least 10 bp outside the Z4Z5NZ15NZ5Z4 structural element, while others had additional Z-DNA sequences lengthening the alternating purine and pyrimidine domain to over 50 bp. [13]

One extended RU sequence (EXT) was shown to have six tandem copies of a 142 bp amplified (AMPL) sequence motif inserted into a region bordered by inverted repeats where most copies contained just one AMPL sequence element. There were no nuclease-sensitive altered structures or significant sequence divergence in the relatively conventional AMPL sequence. A truncated RU sequence (TRU), 327 bp shorter than most clones, arose from a single base change leading to a second EcoRI restriction site in TRU. [9]

Another crab, the hermit crab Pagurus pollicaris, was shown to have a family of AT-rich satellites with inverted repeat structures that comprised 30% of the entire genome. Another cryptic satellite from the same crab with the sequence CCTA:TAGG [14] [15] was found inserted into some of the palindromes. [16]


CRISPR/Cas9 in Genome Editing and Beyond

The Cas9 protein (CRISPR-associated protein 9), derived from type II CRISPR (clustered regularly interspaced short palindromic repeats) bacterial immune systems, is emerging as a powerful tool for engineering the genome in diverse organisms. As an RNA-guided DNA endonuclease, Cas9 can be easily programmed to target new sites by altering its guide RNA sequence, and its development as a tool has made sequence-specific gene editing several magnitudes easier. The nuclease-deactivated form of Cas9 further provides a versatile RNA-guided DNA-targeting platform for regulating and imaging the genome, as well as for rewriting the epigenetic status, all in a sequence-specific manner. With all of these advances, we have just begun to explore the possible applications of Cas9 in biomedical research and therapeutics. In this review, we describe the current models of Cas9 function and the structural and biochemical studies that support it. We focus on the applications of Cas9 for genome editing, regulation, and imaging, discuss other possible applications and some technical considerations, and highlight the many advantages that CRISPR/Cas9 technology offers.

Keywords: CRISPR applications; Cas9 structure; dCas9; epigenetic regulation; gene regulation; genomic imaging.


Discussion

Recent research has shown that palindromes may be critical to several cellular processes, including transcription, replication, and DNA recombination [37]. Therefore, it is important to study palindrome distribution in the genome to understand their functions and disease associations. Palindromes are abundantly present in the human genome, but their distribution is non-uniform. This distribution can be correlated with their participation in important biological functions. The palindrome lengths also vary greatly in the genome. In general, shorter palindromes are expected to be more abundant than longer palindromes, and both short and long palindromes have been implicated in genomic instability [7].

The frequency and distribution, from our analysis, of both long and short palindromes of varying lengths in each chromosome are shown in Supplementary Fig. 1 and Supplementary Table 1. We observed that palindromes with lengths between 8 and 20 bp were the most frequent, and chromosomes 3 and 19 had a high number of long palindromes (>200 bp). Long palindromes form secondary structures that act as hot spots for genomic rearrangements and translocations [8], and are known to participate in as many as 30% of integration events in the human DNA [38]. These palindromes constitute fragile sites, are correlated with breakage and deletion, and are associated with diseases [7]. Our analysis revealed the presence of a large number of palindromes in both the reference genome (GRCh37/hg19 build) and in 1000G. According to our results, the very long palindromes were mostly AT repeats.

AT richness of palindromes

We defined the AT richness of a palindrome using the following formula:

\[ \text{AT richness (\%)} = \frac{A + T}{A + T + C + G} \times 100 \]

where A/T/C/G represent the number of respective bases present in the palindrome.

AT-rich palindromes are those with an AT-richness percentage of ≥50. Other palindromes are referred to as CG-rich. Eighty percent of the palindromes had an AT content of 80–90%, whereas only 2% of the palindromes had CG content >80%. The analysis of AT- and CG-rich palindromes is shown in Supplementary Table 1. Palindromic AT-rich repeats (PATRRs) are sites frequently associated with double-strand breakage and hairpin or cruciform DNA formation that lead to translocations and recombinations [11]. We found that the longest palindromes were AT-rich.
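The classification can be expressed as a short function. This sketch assumes that AT richness is the percentage of A and T bases among all A, T, C, and G bases in the palindrome, which is consistent with the ≥50 threshold stated above; the function names are hypothetical.

```python
def at_richness(palindrome):
    """Percentage of A and T among the A/T/C/G bases (assumed definition)."""
    seq = palindrome.upper()
    a, t, c, g = (seq.count(base) for base in "ATCG")
    total = a + t + c + g
    if total == 0:
        raise ValueError("sequence contains no A, T, C, or G bases")
    return 100.0 * (a + t) / total

def classify(palindrome):
    """Label a palindrome AT-rich (>= 50% A+T) or CG-rich, as in the text."""
    return "AT-rich" if at_richness(palindrome) >= 50 else "CG-rich"

print(at_richness("ATTTAAAT"), classify("ATTTAAAT"))
print(at_richness("CGCACGTGCG"), classify("CGCACGTGCG"))
```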

Palindromes in functional regions

The distribution of palindromes in various genomic regions, such as exons, introns, and regulatory regions, including TFBS, CpG islands, and ncRNA regions, is shown in Supplementary Fig. 2 and Supplementary Table 1. Regulatory regions contain promoters and enhancers, and palindromic sequences in these regions are known to serve as TFBS for regulating gene expression. For example, palindromic sequences were found in promoter regions that may be binding sites for TFs, such as CREB, USF, and NRF-1, providing further support for their role in gene regulation [12].

As a preliminary study to understand the association of palindromes with diseases, we analyzed the GWAS Catalog SNPs for their influence on the nature of the palindromes, that is, whether they altered the palindromes to make them longer/shorter, or near/perfect. We found that, overall, disease-associated risk variants (GWAS SNPs) were 14 times more likely to be present in palindromic regions than expected. Many diseases/traits are associated with SNPs that cause palindrome changes. For example, 7% (62) of obesity-related trait SNPs, 15% (30) of Crohn’s disease-associated SNPs, and 50% (28) of intelligence-associated SNPs induced palindrome changes; these variants were also found in 1000G (Supplementary Table 6).

Using the eQTL calculator from GTEx [39], we tested whether any of the SNPs in palindromic regions associated with six diseases (diabetes, rheumatoid arthritis (RA), schizophrenia, Alzheimer’s disease, breast cancer, and coronary heart disease) affected the expression of the genes to which they had been mapped. This association was tested in tissues relevant to the diseases. Of the 15 SNPs that were significant eQTLs in any of these 6 diseases (Supplementary Table 7), 14 were intronic variants, and 1 was a downstream gene variant (rs610932 mapped to MS4A6A, Alzheimer’s disease). Two of these intronic variants (rs3825932, CTSH, diabetes; rs4239702, CD40, RA) overlapped with regions called “retained introns.” These introns are retained during transcription and introduce premature stop codons into mRNA, leading to erroneous gene expression.

When analyzing palindrome-altering GWAS variants with low allele counts (AC < 100) in 1000G (which represents a healthy population), we found one SNP (rs11571833) that is associated with both breast and lung cancers in the BRCA2 gene (with AC = 22). Similarly, an intronic SNP (AC = 15) that is associated with ovarian cancer formed a new palindrome in BRIP1, a gene encoding the BRCA1-interacting protein, required for BRCA1-mediated DNA repair [40]. In our previous pilot study of palindrome alterations by breast cancer-associated variants in The Cancer Genome Atlas (TCGA), we found that many palindrome changes were associated with oncogenes and breast cancer genes [33]. Of all the palindromes that showed any variation in cancer genomes (matched normal and tumor samples), 38% of those near breast cancer genes were among the most differentiated palindromes in tumor samples. The palindromes that are associated with oncogenes, such as RAD21, NBN and KMT2A, were found to have changed significantly in the tumor samples. In addition, we observed that the palindromes that were associated with the oncogene NUP98 were completely absent in tumors. These results further support the possible role of palindromes in various diseases, including cancer.

We also identified the individual SNPs in or near palindromes that are associated with multiple diseases or traits. An SNP (rs6857) that is present in the palindromic region of the NECTIN2 (alias PVRL2) gene is associated with many diseases that are related to neuronal functions. Examples include macular degeneration, Alzheimer’s disease, memory, and frontotemporal dementia. Interestingly, this SNP is present in the 3′ untranslated region (UTR), leading to the formation of a perfect palindrome in 553 individuals in 1000G. The 3′ UTR is involved in gene regulation and influences polyadenylation, mRNA stability, and translation. It also contains binding sites for transcriptional regulators such as miRNA [41]. miRNAs regulate neurogenesis and brain development. Hence, palindrome changes in UTR regions may affect miRNA binding, leading to disease progression. Further, due to the presence of a missense variant in the PTPN22 gene that is associated with multiple diseases, such as Crohn’s disease, diabetes, and RA, a non-palindromic sequence became perfectly palindromic in 1000G.

We then reviewed the palindrome sequences that have SNPs with significant RegulomeDB scores. We learned that one of the SNPs (rs7386474), which is associated with bipolar disorder and schizophrenia, is a binding site for FOXP2 protein, a TF playing a significant role in these mental illnesses [42]. Another palindrome-altering SNP (rs2535629) associated with autism spectrum disorder and other mental disorders [43] was present in the intron of the ITIH3 gene and is bound by several proteins such as FOS, MYC, CTCF, RAD21, SMC3, and ZNF143. These results further support the role of palindromes in diseases since these SNPs lead to palindrome changes that may affect the binding of TFs, hinting at a possible mechanism for disease pathogenesis. A missense variant (A/G) in rs2476601 on chr1, mapped to the gene PTPN22, was associated with the formation of a new palindrome. In a study, this SNP, which is associated with RA, was identified as a “functional SNP” modulating the binding of TFs [44]. The least frequent allele (minor allele) in 1000G in this case was “A” (frequency = 0.027). This allele had a frequency of 0.09 in a particular RA cohort and was linked to RA (p value = 9E−170) [45]. Compared with the controls, marked downregulation of PTPN22 expression was observed in the RA patients carrying this risk allele (p value = 7E−03) [46]. RegulomeDB assigned a score of “2b” to rs2476601, indicating that “protein binding is likely to get affected.” From the ENCODE ChIP-Seq data, the two TFs that bind within this region (chr1:114377420–chr1:114377736) were FOS and STAT3, both of which have been linked to RA. Figure 7 provides illustrative examples of palindrome-mediated mechanisms of disease, as indicated in the literature.

(a) In certain subsets of bipolar disorder and schizophrenia patients, a mutation in the promoter region of PIK3C3 (−432C -> T) extends a 6-base palindrome (“TTTAAA”) into an 8-base palindrome (“ATTTAAAT”), which also acts as a 6/8 recognition sequence for POU domain transcription factors such as POU2F1 (OCT-1) and POU3F3 (BRN-1), whose consensus sequence is “ATTTGCAT” [51]. These transcription factors are regulators of brain development. Binding of POU domain transcription factors to the palindromic sequence may lead to the transcriptional activation of PIK3C3 and PIK3C3-mediated neurodevelopmental changes. (b) X-linked congenital generalized hypertrichosis is a rare genetic condition characterized by hair overgrowth over the entire body. In families in which this condition is segregated, chromosomal breakpoints are observed in a 180-base palindromic sequence located 82 kb downstream of the SOX3 gene on Xq27.1 [52]. SOX (SRY-related HMG-box) transcription factors are regulators of embryonic development. The 180-base palindromic sequence mediates an interchromosomal insertion of either a 125,577 bp fragment from COL23A1 of 5q35.3 or a 300,036 bp fragment from 4q31.2 (including the genes PRMT9 and TMEM184C and sections of EDNRA and ARHGAP10) into these breaks. New regulatory elements may be introduced with the insertion of these fragments. It has been conjectured that, as a result of these new elements, SOX3 may be ectopically expressed in hair follicles or precursor cells during the early stages of hair follicle development. Structures of chromosomes 18 and X, and gene structures of PIK3C3, SOX3, COL23A1, PRMT9, TMEM184C, EDNRA, and ARHGAP10 were taken from UCSC Genome Browser (https://genome.ucsc.edu/) [53]. The images were produced based on the GRCh38 (hg38) assembly. The protein structure of POU2F1 [54] (PDB ID: 1OCT) was downloaded from RCSB PDB [55]. The image of POU2F1 was created using UCSF Chimera [56].

We believe that these results will help researchers to understand palindrome distribution and conservation across various populations. These results will also help to identify individual palindromes that undergo rearrangements due to the presence of variants such as SNPs that could affect various cellular processes leading to gene dysregulation and disease pathogenesis.

The catalog of palindromic sequences (COPS)

The COPS will serve as a resource to investigate palindromic variations in genomics studies of diseases. Specifically, COPS can serve as control data for the comparison of palindrome variations in patient genomes with the palindromes in 1000G. This was demonstrated in our pilot study on TCGA data in which we compared palindromes in matched tumor and normal pairs of genomes with the 1000G data presented in COPS [33].

We are making available the location and length of every palindrome that appears in the reference genome or the 1000G genomes and its variation in each of the 2504 individual genomes with respect to the reference genome. In addition to the individual occurrences of palindromes, aggregated results are presented to show the distribution in coding and non-coding regions, palindrome conservation across the genomes, the presence of rare and common variants within the palindromes, and the GWAS SNPs that are associated with palindromic changes for various diseases.


The Enormous Power of This Genetic Tool

The CRISPR-Cas9 system has enabled countless scientists to achieve research goals that would have been difficult or near impossible without this technology. The possibility that CRISPR-Cas9 could edit out diseases in so many people is tantalizing. Aside from research and human health, in the field of plant biology, CRISPR-Cas9 has been used to make staple crops more resistant to drought and to pathogens. The technology may play a significant role in the improvements to food security and quality, especially in light of climate change (Joshi et al. 2020). We thank Emmanuelle Charpentier and Jennifer A Doudna for their irreplaceable contributions towards making these developments possible.


Libraries

A "library" is a convenient storage mechanism of genetic information.

  • They are typically either "genomic" or "cDNA" (i.e. mRNA in DNA form) genetic information.
  • Deduced genetic sequences from corresponding polypeptide information can be used to identify specific genetic information within a library.

cDNA library construction

The enzyme responsible for converting mRNA into cDNA is an RNA-dependent DNA polymerase called reverse transcriptase.

  • Reverse transcriptases have traditionally been isolated from viruses whose genome is actually in an RNA form and must be converted to duplex DNA.
  • These viruses typically carry a functional reverse transcriptase along with their mRNA genetic component when they infect cells.
  • One of the most common commercially available reverse transcriptases is the one from Moloney murine leukemia virus (MMLV).
  • This RNA-dependent DNA polymerase (as with all polymerases) adds nucleotides to a nascent polynucleotide in the 5' to 3' direction, using RNA as the template. It does not contain any 3'->5' exonuclease (proofreading) activity.

MMLV will use mRNA as a template, but requires a primer (it can extend a DNA primer but cannot synthesize one).

  • One of the really neat things about eukaryotic mRNAs is the presence of the 3' poly(A) tract, to which an oligo(dT) primer can anneal to prime first-strand synthesis.

Note that we have produced complementary DNA (or cDNA) to the original mRNA strand.
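In sequence terms, the first-strand cDNA is simply the reverse complement of the mRNA, written with DNA bases. The short sketch below illustrates this; the function name `first_strand_cdna` and the example mRNA are made up for the illustration.

```python
# RNA template base -> DNA base incorporated opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3') templated by an mRNA given 5'->3'."""
    # Synthesis starts at the 3' end of the mRNA, so read the template backwards.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

# Hypothetical mRNA ending in a short poly(A) tail.
print(first_strand_cdna("AUGGCCAAAA"))  # begins with the T's laid down opposite the tail
```

The run of T's at the 5' end of the product corresponds to the primer region annealed to the 3' poly(A) tail.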

If we can introduce "nicks" into the RNA half of this DNA/RNA duplex then the situation would be very similar to that observed in "lagging strand" synthesis of prokaryotic genomic DNA.

  • Nicks in the RNA half of the molecule can be introduced via the action of the enzyme RNAse H.
  • This enzyme exhibits endonucleolytic cleavage of the RNA moiety of RNA/DNA hybrids, as well as 5'->3' and 3'->5' exoribonuclease activity.
  • In other words, it will nick the RNA and then proceed to digest back in both directions:
  • These RNA fragments can now serve as primers for DNA synthesis by E. coli Pol I. This enzyme will also translate the "nicks" to effectively remove the RNA primers:

Figure 3.6.3: DNA synthesis

Note that we will potentially have either a residual 5' RNA cap region, or a gap at the 5' end of the original mRNA strand.

Insertion of cDNA into plasmid.

To complete our construction of a useful cDNA library we need a way to maintain and propagate our cDNA.

  • We can accomplish this by inserting the cDNA into an appropriate plasmid.
  • There are two classical ways of accomplishing this feat:
  1. Homopolymeric tailing
  2. Linker addition

Homopolymeric tailing

Terminal transferase is an unusual DNA polymerase found only in a type of eukaryotic cell called a prelymphocyte.

  • In the presence of a divalent cation the enzyme catalyzes the addition of dNTPs to the 3'-hydroxyl termini of DNA.
  • When the nucleotide to be added is a purine, Mg2+ is the cation used.
  • When the nucleotide to be added is a pyrimidine, Co2+ is used.
  • Depending on the reaction conditions, anywhere from three to several thousand bases will be added.

Figure 3.6.4: Terminal transferase activity

  • If we cut our plasmid and also treat it with terminal transferase, except now we add the complementary base to the one we added to our cDNA, we can anneal and ligate the cDNA into the plasmid.

Figure 3.6.5: Ligating cDNA into the plasmid

  • The utility of inserting the C-tailed cDNA insert into a G-tailed Pst I site in the vector is as follows:
  1. The Pst I recognition sequence and cleavage site is
    5' C T G C A G 3'
    3' G A C G T C 5'
  2. Cleavage of this site by Pst I, followed by G-tailing will produce
    5' C T G C A (G)n G 3'
    3' G (G)n A C G T C 5'
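The Pst I site is itself a palindrome (CTGCAG reads the same on both strands), and the enzyme cuts between the A and the final G, leaving a 3' overhang that terminal transferase can extend. The sketch below shows only the top-strand cut positions; the function name `digest_top_strand` and the example sequence are hypothetical, and the bottom strand and the overhang are not modelled.

```python
def digest_top_strand(seq, site="CTGCAG", cut_offset=5):
    """Split the top strand at each site; Pst I cuts CTGCA^G (offset 5 into the site)."""
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments

print(digest_top_strand("AACTGCAGTT"))
```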

Linkers

An alternate method to insert cDNA fragments into a library vector is through the addition of "linkers".

    Linkers are short oligonucleotides (

The steps in linker addition are as follows:

  1. Treatment of cDNA with S1 nuclease (to remove possible 5' cap mRNA fragment remaining in cDNA duplex)
  2. Convert potential "ragged" ends to blunt by treatment with Pol I (will fill in 5' overhangs and chew back 3' overhangs)
  3. Methylate cDNA at potential internal Eco RI sites by treatment with Eco RI methylase (plus S-adenosyl methionine)
  4. Ligate linkers to blunt, methylated cDNA using T4 DNA ligase
  5. Cut linkers with Eco RI restriction endonuclease
  6. Remove linker fragments from cDNA fragments by agarose gel electrophoresis
  7. Ligate cDNA to vector DNA fragment (opened up by Eco RI restriction endonuclease)

This textbook was published in 1998. The Human Genome Project was completed in 2003.


High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells

The applications of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing can be limited by a lack of compatible protospacer adjacent motifs (PAMs), insufficient on-target activity and off-target effects. Here, we report an extensive comparison of the PAM-sequence compatibilities and the on-target and off-target activities of Cas9 from Streptococcus pyogenes (SpCas9) and the SpCas9 variants xCas9 and SpCas9-NG (which are known to have broader PAM compatibility than SpCas9) at 26,478 lentivirally integrated target sequences and 78 endogenous target sites in human cells. We found that xCas9 has the lowest tolerance for mismatched target sequences and that SpCas9-NG has the broadest PAM compatibility. We also show, on the basis of newly identified non-NGG PAM sequences, that SpCas9-NG and SpCas9 can edit six previously unedited endogenous sites associated with genetic diseases. Moreover, we provide deep-learning models that predict the activities of xCas9 and SpCas9-NG at the target sequences. The resulting deeper understanding of the activities of xCas9, SpCas9-NG and SpCas9 in human cells should facilitate their use.
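To illustrate what PAM compatibility means in practice, the sketch below scans one strand of a sequence for 20-nt protospacers immediately followed by a given PAM. It assumes the canonical NGG PAM for SpCas9 and the relaxed NG PAM reported for SpCas9-NG; it says nothing about editing activity, which is what the study above measures. The function name `find_targets` and the example sequence are hypothetical.

```python
import re

def find_targets(seq, pam="NGG", spacer_len=20):
    """Return (start, protospacer, pam) for forward-strand sites; N matches any base."""
    pam_regex = pam.upper().replace("N", "[ACGT]")
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]

seq = "A" * 20 + "TGG" + "C"  # hypothetical target region
print(len(find_targets(seq, pam="NGG")))  # candidate sites with an NGG PAM
print(len(find_targets(seq, pam="NG")))   # the relaxed NG PAM admits more sites
```

A complete search would also scan the reverse complement, since Cas9 can target either strand.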


Main Text

Introduction

The American Society of Human Genetics (ASHG) Workgroup on Human Germline Genome Editing developed the present position statement and explanatory paper between August 2015 and January 2017. This group, composed of a combination of basic and clinical scientists, bioethicists, health services researchers, lawyers, and genetic counselors, worked together to integrate the scientific status of and socio-ethical views toward human germline genome editing (defined as using genome-editing techniques in a human germ cell or embryo) into this statement. The group met regularly through a series of weekly conference calls and email discussions, proposed a draft statement to the ASHG Board of Directors in April 2016, presented the draft policy statement to ASHG and European Society of Human Genetics (ESHG) members at the ASHG-ESHG Building Bridges session in May 2016, and requested comments from ASHG members in June 2016. A total of 27 comments were received, 4 of which were in opposition to the statement. All comments and recommended modifications were reviewed by the committee and discussed as part of the development of this explanatory paper, which was reviewed and approved by the ASHG Board of Directors in March 2017.

The workgroup included representation from the following professional organizations (in alphabetical order), which then also approved the position statement and paper: the Association of Genetic Nurses and Counsellors, Canadian Association of Genetic Counsellors, International Genetic Epidemiology Society, and National Society of Genetic Counselors. This resulting policy statement was then reviewed and endorsed by the following professional organizations (also listed in alphabetical order) before submission for publication: the American Society for Reproductive Medicine, Asia Pacific Society of Human Genetics (APSHG), British Society for Genetic Medicine, Human Genetics Society of Australasia, Professional Society of Genetic Counselors in Asia, and Southern African Society for Human Genetics. (The APSHG would like to add a comment that we also express a concern that in some countries with inadequate ethics committee oversight or strong institutional review boards [IRBs], the potential for abuse exists. Hence, there is a strong need to continue to educate our professionals, researchers, journal reviewers, journals, and IRBs about this technology. The potential benefits of this technology should not be stifled because of the possibility of poor oversight or misuse.)

Scientific Background

“Genome editing” collectively refers to a set of technologies, including a new tool based on the CRISPR/Cas9 mechanism discovered in Streptococcus pyogenes. This and other organisms use this system to protect themselves from viral infections. The system can be engineered to facilitate the targeted modification of specific DNA sequences in the genomes of living cells. CRISPR/Cas9 and other genome-editing methods have been thoroughly reviewed elsewhere. 1 , 2 , 3 Like many other robust DNA modification technologies, CRISPR/Cas9 has quickly become a widely used research tool, and its embrace testifies to the ease with which it can be customized and its effectiveness in multiple cell types and species. In many ways, preceding gene-transfer technologies that fell short of “genome editing” (i.e., introduced genes into cells but did not permanently incorporate them into the genome) laid the groundwork for the issues presented in this statement. 4 Of relevance here are several key issues raised by early somatic gene-therapy trials: (1) a real prospect of treating and even curing previously intractable diseases, especially in cases where the primary cause is a defective gene; (2) the possibility of undesirable side effects, sometimes due to the delivery method or to the random insertion site of the transferred DNA itself; and (3) regulatory oversight.

In the 1980s, true genome-targeting techniques—that is, the targeted modification of a specific sequence at its normal genomic location rather than the insertion of gene copies at other locations—were pioneered for germline engineering in mice. These early studies catalyzed much research and thought into the scientific advantages of gene targeting over traditional gene-transfer methods. By 2010, decades of work had culminated in the development of a variety of engineered nucleases such as zinc-finger nucleases, meganucleases, and transcription activator-like effector nucleases. In early 2013, the introduction of an RNA-guided nuclease—the CRISPR/Cas9 system adapted from the bacterial species Streptococcus pyogenes—was shown to specifically cleave target sequences 5 and enable a new approach to precise genome modification in mammalian cells. 6 , 7 , 8 , 9 Since then, additional RNA-guided nucleases from other bacterial species have been described and are being investigated for their potential as genome-editing tools.

Genome-editing tools all work in a similar fashion. They “target” specific DNA sequences for individual genes or non-coding regions by engineering certain proteins or protein-RNA complexes that can then recognize and bind the sequences and generate single-strand or double-strand DNA breaks. For example, a Cas9 protein along with a CRISPR “guide RNA” can find a target gene among the thousands of genes in a cell’s genome and cleave both DNA strands at the target site. It is this cleavage event that can be exploited to create a mutation in, or “edit,” the target gene.

The cell’s normal DNA repair machinery then attempts to repair the DNA break. The outcome of this process is often the introduction of a mutation, most frequently the deletion of some DNA at the target site. If a separately engineered “donor” DNA fragment is also provided, the repair machinery can use this as a template to fix the DNA break; thus, the engineered DNA molecule can allow new sequences to be introduced at the target site. This latter process is key to many potential genome-editing applications, because the donor DNA fragment can carry a normal sequence intended to replace a pre-existing deleterious mutation or, alternatively, a novel, beneficial variant. In this way, mutations that cause disease could potentially be corrected, or new mutations could be introduced to alter gene function in such a way as to prevent or treat disease.

RNA-guided nucleases such as CRISPR/Cas9 have two clear advantages over previous gene-editing tools. First, they can be easily customized to target specific sequences via alteration of only a small number of nucleotides in the guide RNA (20 nucleotides in the case of Streptococcus pyogenes CRISPR/Cas9), a fast and inexpensive process that is much simpler than previous gene-editing methods. Second, RNA-guided nucleases are dramatically efficient at cleaving target genomic sequences in some cell types and organs, 10 , 11 , 12 so much so that for many applications, the delivery of the protein and RNA components into target cells, rather than the targeting itself, is now the main rate-limiting step in genome editing.
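
To make the "20 nucleotides" point concrete, candidate SpCas9 target sites can be listed with a few lines of code. The sketch below assumes the canonical NGG PAM immediately 3' of a 20-nt protospacer and scans only the strand given; a fuller search would also scan the reverse complement. The function name `find_targets` and the example sequence are hypothetical.

```python
import re

def find_targets(seq, guide_len=20):
    """List (start, protospacer, pam) for every NGG PAM found on the given strand.

    Assumes the PAM sits directly 3' of a `guide_len`-nt protospacer.
    """
    seq = seq.upper()
    targets = []
    # A lookahead is used so that overlapping PAMs (e.g. in "AGGG") are all reported
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= guide_len:  # need a full-length protospacer upstream
            start = pam_start - guide_len
            targets.append((start, seq[start:pam_start], match.group(1)))
    return targets

# Made-up sequence: a 20-nt protospacer followed by a TGG PAM
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
for start, protospacer, pam in find_targets(example):
    print(start, protospacer, pam)
```

Retargeting the nuclease then amounts to changing the protospacer string that the guide RNA is designed against, which is why customization is so cheap compared with re-engineering a protein.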

Thus, individual genes can be targeted for engineering in cells grown in the laboratory or even within live animal tissues. In fact, engineered nucleases have been shown to be efficient in a wide variety of organisms, including many mammals. Human cells are also readily amenable to genome editing. Accordingly, there is considerable interest in using genome-editing tools to develop cell-based human therapeutics that could potentially deliver lifesaving treatments for diseases such as HIV infection, sickle-cell anemia, and cancers.

Genome editing has been shown to work in embryos from many species. This is already accelerating the pace of many areas of biology as researchers use genome-editing methods to more quickly and cheaply study the function of genes in model organisms and economically important species such as crops, livestock, and energy feedstock. It has been shown that engineered nucleases, especially CRISPR/Cas9, can be easily used to edit genes in mammalian embryos such as mice, rats, and even monkeys. 11 , 13 , 14 These embryos can then be implanted into foster animals and carried to term, generating live-born animals carrying precise changes in their DNA. However, off-target mutagenesis and mosaicism in the resulting animals can be significant drawbacks of the technology. 15

The similarity between human embryos and other animal embryos raises the possibility that genome-editing methods could be incorporated into human-assisted reproduction procedures. Already, CRISPR/Cas9-mediated genome editing of 1-cell-stage mouse zygotes is routine; 16 in this context, reports that human embryos could be similarly edited are not surprising. In early 2015, the first study demonstrating that CRISPR/Cas9 could be used to modify genes in early-stage human embryos was published. 17 Although the embryos employed for those experiments were not capable of developing to term, the work clearly demonstrated that genome editing with CRISPR/Cas9 in human embryos can readily be performed. This report has stimulated many scientists and organizations to clarify their stance on the use of genome-editing methods.

Here, it is important to note the distinction between somatic and germline genome editing. Somatic genome editing refers to the alteration of cells that cannot contribute to gamete formation and thus cannot be passed on from the individual to offspring. In contrast, germline genome editing, which is the primary focus of this position statement, refers to genome editing that occurs in a germ cell or embryo and results in changes that are theoretically present in all cells of the embryo and that could also potentially be passed from the modified individual to offspring. In theory, modification of gamete-producing cells at any point in development could permit this. Because human germline genome editing has potential effects on both the treated individual and subsequent generations of persons, it entails ethical considerations beyond those of somatic genome modification.

Regardless of whether a genome-editing method entails somatic or germline editing, its efficacy and safety must be established before it is given any consideration as a potential therapeutic approach. CRISPR/Cas9 is indeed highly efficient in many cell types, but it is seldom 100% effective at introducing alterations at a target site, although double-digit percentages are routine. More concerning is that the desired “editing” event usually competes with the generation of unwanted mutations at the target site. Thus, genome-editing applications usually generate a mixture of genetically heterogeneous cells.

It has also been well documented that DNA cleavage by native CRISPR/Cas9 does not always require perfect pairing between all bases in its guide RNA and the target, sometimes permitting unwanted cleavage at off-target locations. 18 , 19 , 20 , 21 Although these off-target effects are low enough to permit most research applications, 22 , 23 the safety requirements for any human clinical genome-editing application are more stringent. New methods and combinations of methods are being used to better estimate the risk that off-target mutations will occur and their potential effects on the patient. We note that rapid strides are being made to reduce the off-target effects of CRISPR/Cas9. 24 , 25
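
The idea of imperfect pairing can be illustrated by counting mismatches between a guide and every same-length window of a sequence. This is only a rough sketch: it ignores the PAM requirement, bulges, and position-dependent mismatch tolerance, and the default threshold of 3 mismatches is an arbitrary assumption rather than a value from the cited studies. Function names and the example strings are hypothetical.

```python
def mismatches(guide, site):
    """Count positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))

def candidate_off_targets(guide, sequence, max_mismatches=3):
    """Return (position, mismatch_count) for each window that is close to,
    but not identical to, the guide (1..max_mismatches differences)."""
    n = len(guide)
    hits = []
    for i in range(len(sequence) - n + 1):
        d = mismatches(guide, sequence[i:i + n])
        if 0 < d <= max_mismatches:
            hits.append((i, d))
    return hits

# Toy example with a very short "guide" so the windows are easy to follow
print(candidate_off_targets("AAAA", "AAAATAAA", max_mismatches=1))
```

Tools used in practice weight mismatches by their position in the guide and validate predictions experimentally; the sketch only shows why near-matches elsewhere in a genome are a concern.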

In summary, there remains no agreement as to which specific platforms, methods, and interpretations of benefits and risks will need to be applied in the validation of the safety of genome-editing therapeutic applications. Nevertheless, when considered in the context of somatic therapy, novel methods of genome editing such as CRISPR/Cas9 will probably raise few truly novel ethical issues that have not been addressed in previous contexts, such as with gene-therapy trials. However, CRISPR/Cas9 is so efficacious in human embryos that germline gene editing is also now possible in our species, raising a host of ethical, social, and legal issues that warrant careful consideration and deliberation.

Ethical Issues

The ethical assessment of human germline genome editing falls, broadly, into two categories: (1) those arising from its potential failure and (2) those arising from its success.

Ethical Issues Related to the Potential Failure of Human Germline Genome Editing

Exposing individuals to the health consequences of interventions with potentially harmful effects is of concern when the potential benefits do not outweigh such risks. In human germline genome editing, the magnitude of the potential risks of off-target or unintended consequences is yet to be determined. For this reason, safeguards against misguided or premature attempts at this intervention should rely, at a minimum, on existing mechanisms governing the clinical introduction of other reproductive therapies.

There are both national and international policies that regulate embryo research and interventions early in human development 26 , 27 , 28 that apply to research and the potential clinical translation of human germline genome editing. Their underlying normative frameworks typically address the broad ethical context of human-assisted reproduction technologies and human subjects and genomics research and take into consideration core ethical principles of autonomy, beneficence, non-maleficence, and justice. Differences in these policies include the very definition of what constitutes a human embryo or a reproductive cell, the particular policy tool adopted (legislation, regulation, or professional guidance) and the document’s enforcement (legally binding or self-compliance), and oversight mechanisms (e.g., licensing of activities). Overall, the majority of available statements and recommendations (summarized in Table 1) restrict applications from attempting to initiate a pregnancy with an embryo or reproductive cell whose germline has been altered.

Table 1

Summary of Recommendations in Major Group, Organizational, and Government Statements Related to Human Germline Gene Editing

Arguments / Organizations
Organizations: The Hinxton Group 51; NAS, NAM, CAS, and UK Royal Society International Summit 52; NAS and NAM Committee on Human Gene Editing 53; ASGCT and JSGT 54; ISSCR 55; Baltimore et al. 56; EGE 57; Lanphier et al. 58; ACMG 59; NIH 60; HFEA 61
Basic research should be conducted: xxxxxx x
Preclinical research should be conducted: xx
There should be a partial or full moratorium on research: xx x a
Diverse stakeholders should be involved in decision making: xxxxxxxxx
Clinical use should not proceed currently: xxxxxxxxx
Clinical use should proceed only if safety and efficacy issues are resolved: xxxxxxx x
Clinical use should proceed only if society has agreed on bounds: xxxxx x x
Clinical use should proceed only if appropriate oversight is in place: xxx x
Clinical use should proceed only if justice and equity concerns are addressed: x x x
Clinical use should proceed only if it is transparent: x x
Clinical use should be discouraged worldwide: x
Any public policies regulating this area of science should be flexible: x

Only main, overt arguments made in each statement are marked by an “x.” Thus, the lack of an “x” does not necessarily indicate disagreement. The table includes only major recommendations from each statement rather than background and is not exhaustive. Also, because this table cannot capture every nuance of each statement, whether a statement addresses a particular point is in some cases subjective. Many groups speaking independently have made statements about human germline gene editing and related research. These organizations vary in composition from coalitions of experts to professional societies to government entities or representatives, but the content of many of the reports and recommendations is fairly similar. Most statements agree that basic research should be conducted but that clinical applications should be avoided at least in the short term. Many of the statements outline criteria that must be met before clinical use of human germline gene modification should be considered, including overcoming safety and technological barriers, achieving societal consensus on bounds, putting appropriate and transparent oversight mechanisms in place, and addressing equity concerns. The most significant area of disagreement is with regard to the types of research that should be allowed currently, including whether there should be a partial or full moratorium. Abbreviations are as follows: NAS, US National Academy of Sciences; NAM, US National Academy of Medicine; CAS, Chinese Academy of Sciences; ASGCT, American Society for Gene and Cell Therapy; JSGT, Japan Society of Gene Therapy; ISSCR, International Society for Stem Cell Research; EGE, European Group on Ethics in Science and New Technologies; ACMG, American College of Medical Genetics; NIH, National Institutes of Health; HFEA, UK Human Fertilization and Embryology Authority.

Across jurisdictions, the regulation of human embryo and/or germline manipulation could be categorized as restrictive, intermediate, and permissive. Under the restrictive approach, wide-ranging prohibitions (or moratoria) to activities carried out in a human embryo or germ cell are adopted. In contrast, the intermediate and permissive approaches allow some degree of research and clinical activities to be carried out, although with limitations and oversight in place for research activities linked to reproductive purposes. It is important to note that restrictive policies and limited availability or use of basic research funding do not necessarily prevent certain research or the development of new technologies from taking place. 29 For example, in 2001, President George W. Bush restricted federally funded embryonic stem cell research in the US to the use of a small number of cell lines available at the time. 30 This, however, did not prevent individual states (e.g., California funded the California Institute for Regenerative Medicine through Proposition 71), private funders, and other countries from providing research dollars for embryonic stem cell research, sometimes in settings with limited transparency and oversight. From a broader perspective, the effect of diverting public funding away from certain areas of research could result in the degradation, or the complete omission, of the usual required mechanisms that ensure that the research is subject to ethical oversight (via research ethics boards and their equivalents) and that it remains in the public domain. The latter enables oversight and transparency through data sharing, peer-reviewed publication, and dissemination of research resources. 31 It ultimately ensures that the research is in the public interest.

Ethical Issues Related to the Success of Human Germline Genome Editing

Beyond the potential and yet unknown risks of human germline genome editing, there are a number of ways in which the impact of these novel technologies could be ethically problematic if and when they function as intended. Concerns regarding the impact of these technologies on an individual, a family, and society more broadly are similar to those raised by gene therapy in general, as well as embryo research and reproductive technologies (e.g., in vitro fertilization, pre-implantation genetic diagnosis, and prenatal testing).

Impact on the Individual and Family: One of the most significant issues related to human genome editing relates to the impact of the technology on future individuals whose genes are modified de facto without their consent. Clinical ethics accepts the idea that parents are, almost always, the most appropriate surrogate medical decision makers for their children until the children develop their own autonomy and decision-making capacity. This is based on the assumption that, except under rare circumstances, parents have the most to lose or gain from a decision and will ultimately make decisions that reflect the future values and beliefs of their children. 32 , 33 By extension, we might assume that parents are the most appropriate decision makers for their future children as well. Although there are anecdotal reports of children and adults who disagree with the medical decisions made by a parent during pregnancy or early childhood, particularly when death was a possible outcome, the idea that a person would have been better off if they had not existed has not gained much traction with the public or in the judicial system, which have usually rejected so-called “wrongful life” suits on the basis of the same principle. 34 Of note, there are also published patient stories by individuals who feel strongly that they would not wish to change or remove their own medical condition if given the choice 35 and individuals who disagree with medical decisions made by their parents during childhood (e.g., surgical decisions around sex assignment for disorders of sexual differentiation and surgical decisions for craniofacial disorders).

Although these examples provide important considerations regarding the lack of consent for individuals most directly affected by genome editing, they compare non-existence and existence with a disability, which is not an exact parallel to comparing existence with and without genetic alterations. It is worth considering, however, whether germline genome editing involves something fundamentally different or new that would change the alignment between the interests of parents and those of their children, as well as where the range of opinions regarding the value of treatment is diverse enough to warrant preserving autonomous choice at the point of decision-making capacity. This recalibrates the argument against genetic testing in childhood for adult-onset conditions, which is discouraged so that the future autonomy of the child is preserved, particularly when there is no medical action in childhood or when there is significant debate about the desirability of knowing predictive information. 36 , 37

Ethical concerns about non-maleficence also surface in contemplating the potential for creating unsanctioned pressure on the resulting child and imbalance within the family. Arguably, the ability to “easily” request interventions intended to reduce medical risks and costs could make parents less tolerant of perceived imperfections or differences within their families. Clinical use of germline genome editing might not be in the best interest of the affected individual if it erodes parental instincts for unconditional acceptance. At a minimum, the potential for harm to individuals and families, on whose ramifications we can only speculate, provides a strong argument for prudence and further research. By proceeding with caution, we can ensure better understanding of the potential risks and benefits of gene editing from a scientific perspective and, as such, provide families with a more fulsome exercise of their autonomous decision making through the consent process. Moving with less haste also limits reliance on early and often inadequate models of cause and effect in our understanding of genetic inheritance and could mitigate the impact of decisions based on unsubstantiated notions of genetic determinism.

Impact on Society: Two major ethical questions related to germline editing occur at a societal level: (1) concerns related to eugenics and (2) concerns related to social justice and equal access to technologies.

Eugenics refers to both the selection of positive traits (positive eugenics) and the removal of diseases or traits viewed negatively (negative eugenics). Eugenics in either form is concerning because it could be used to reinforce prejudice and narrow definitions of normalcy in our societies. This is particularly true when there is the potential for “enhancement” that goes beyond the treatment of medical disorders. Historically, eugenics has also been associated with exaggerated notions of genetic determination and pseudoscience, and its use through force or tacit support by the state has resulted in devastating consequences.

Although the use of human germline genome editing seems unlikely to result in the loss of genetic diversity in future generations in the population as a whole, it could have a greater effect within select subgroups with both the desire and the means to implement specific changes, as has already been seen in the case of Down syndrome. 38 One concern that arises in discussions of trait selection, prenatal testing, and the potential for gene therapy or gene editing is the possibility that allowing parents the choice to control aspects of their child’s genetic inheritance (procreative autonomy) could create expectations of this sort of control or even obligations to “create the best children” in what has been called procreative beneficence. 39 , 40

These are among the specific concerns about eugenics expressed by the bioethics community and the public, but perhaps the most deeply felt uneasiness is conceptual: the sense that in identifying some individuals and their traits as “unfit,” we experience a collective loss of our humanity. A concern often articulated is that we might be “overstepping” and “playing God” by making such changes in a way that modifies the germline and thereby affects future generations. 41 Some might find human germline genome editing less offensive than other approaches (such as prenatal testing and selective abortion of affected fetuses) because it involves altering genes rather than selecting against individuals. 42 However, others point out that any form of selection of individuals (including through already existing prenatal diagnosis and testing) sends a message about the “fitness” of such traits or conditions, thereby reflecting on the worth and value of people who have that trait in our society.

Finally, one of the most important and far-reaching effects of human germline genome editing, if it is successful and implemented clinically, might be increasing the already troubling inequities within and between societies. The clinical use of human germline genome editing is hypothetical at this point, and any discussion of access or price is speculative. That said, human germline genome editing is likely to be expensive, and access, should it ever become a reality, is likely to be limited geographically and might not be covered by all payors and health systems. Unequal access and cultural differences affecting uptake could create large differences in the relative incidence of a given condition by region, ethnic group, or socioeconomic status. Genetic disease, once a universal common denominator, could instead become an artifact of class, geographic location, and culture. In turn, reduced incidence and reduced sense of shared risk could affect the resources available to individuals and families dealing with genetic conditions. 38 , 43 , 44

Accordingly, we have come to an agreement on the positions below and include clarifications and elaborations:

  • 1. At this time, given the nature and number of unanswered scientific, ethical, and policy questions, it is inappropriate to perform germline gene editing that culminates in human pregnancy.
  • As summarized above, there is not yet a high quality evidence base to support the use of germline genome editing, there remains an unknown risk of health consequences, and the ethical issues have not been fully explored and resolved by society.
  • Scientifically, preclinical studies should establish reliability, validity, safety, and efficacy before attempting any germline genome editing that leads to the potential for implantation or human pregnancy at any post-implantation stage. Here, we define some issues that pertain to establishing acceptable thresholds for safety in the context of human gene editing. Two major categories of safety concerns are the effect of unwanted or off-target mutations and the potential unintended effects of the desired on-target base changes (edits) being made. Various methods are being explored for the monitoring of off-target mutations in genome-editing experiments. It is reasonable to presume that any human genome-editing therapeutic application will require stringent monitoring of off-target mutation rates, but there remains no consensus on which methods would be optimal for this or what a desirable maximum off-target mutation rate would be when these techniques are translated clinically.
  • Deep next-generation DNA sequencing at specific sites in the genome is feasible, allowing for the interrogation of selected sites in thousands or even millions of cells. However, it is not yet practical to identify rare off-target mutations comprehensively by deep whole-genome sequencing; this is even more challenging when biopsied material is limited. Recently reported unbiased techniques that can empirically determine sites prone to off-target mutations (e.g., GUIDE-seq, Digenome-seq, and BLESS) are currently limited to use in cultured cells. It is not clear that a priori off-target measurements in vitro could be considered sufficient to pre-validate in vivo editing approaches. Therefore, new methods will need to be developed for identifying and monitoring off-target mutation sites in vivo after somatic genome editing (whether in preclinical animal models or, eventually, in humans) and, if human germline genome editing is to be at all considered, within human germ cells and embryos.
  • Identification and monitoring of potential off-target mutation sites are further complicated by the existence of naturally occurring polymorphisms, meaning that off-target predictions should not be based solely on the analysis of a single person’s genome but rather on a collection of genomes that represent a genotypically diverse group of individuals. On the other hand, the relative health risk of off-target mutations is not clear. The genome can clearly tolerate a burden of new mutations that might already exceed the risk posed by current gene-editing methods (given that we are each born with at least 50 new genetic variants), but it is not clear how this burden translates into disease risk. At the same time, it seems that these risks might be modest in relation to the health consequences of the serious diseases that genome editing could be used to treat.
  • With regard to potential unintended effects of the desired on-target mutations, this could be uncontroversial for many genome-editing applications, particularly those for which a clearly deleterious variant is replaced by a common variant that restores normal gene function. Less clear are editing approaches that introduce novel variants that are known to either augment or disrupt gene function and/or variants that are rare or not known to exist in human populations. Ethical issues regarding novel gene modifications are not new with regard to somatic applications, given that they pertain to other types of somatic gene therapy. But one of the major differences between germline gene editing and somatic gene editing is that the former introduces edits to all cells in the body (and potentially to future generations), thus warranting deeper consideration.
  • Given these considerations, minimum necessary developments should include the following:
  • • Definitions of broadly acceptable methodologies and minimum standards for measuring off-target mutagenesis.
  • • Consensus regarding the likely impact of, and maximum acceptable thresholds for, off-target mutations.
  • • Consensus regarding the types of acceptable genome edits with regard to their potential for unintended consequences.
  • 2. Currently, there is no reason to prohibit in vitro germline genome editing on human embryos and gametes, with appropriate oversight and consent from donors, to facilitate research on the possible future clinical applications of gene editing. There should be no prohibition on making public funds available to support this research.
  • Consistent with the sentiment of the 2001 ASHG Statement on Stem Cell Research, animal studies should occur to provide the foundation for human investigation. Human germline gene-editing research is acceptable when performed on already existing embryos that are donated for research with appropriate written donor consent. Rigorous basic scientific research covering multiple generations should be conducted to determine the potential medical and scientific issues before any consideration of translational research for human germline genome editing. Such research can be performed ethically via compliance with all applicable laws and policies and can be beneficial through potential discoveries that might occur around the biological processes of pregnancy and infertility and underlying related diseases and their potential treatments. Any study involving in vitro genome editing on human embryos and gametes should be conducted under rigorous and independent governance mechanisms, including approval by ethics review boards and meeting any other policy or regulatory requirements. Second, although we acknowledge that different countries will have different prohibitions on federal funding of embryo research, we feel strongly that without public funding to support germline-editing research, there is a risk that research will move offshore and/or to areas where it is subject to fewer regulations and less oversight and where work is done without transparency.
  • 3. Future clinical application of human germline genome editing should not proceed unless, at a minimum, there is (a) a compelling medical rationale, (b) an evidence base that supports its clinical use, (c) an ethical justification, and (d) a transparent public process to solicit and incorporate stakeholder input.
  • If the preclinical research, as described above, supports the potential clinical translation of human germline genome editing, many more things need to happen before translational research in human germline genome editing is considered. We encourage the global community to begin to address the following medical, ethical, and societal questions in a deliberative and inclusionary way while answering the relevant scientific questions that have been discussed above.
  • First, ASHG feels strongly that there should be a compelling medical rationale for any conditions for which germline genome editing might occur. Using a conceptual model that addresses various aspects of disabling conditions and quality of life, 45 this might include consideration of the following: the medical severity of the condition, treatability, risk of occurrence, and potential availability of other options for treatment, including somatic gene editing and prenatal or preimplantation diagnosis.
  • Second, the clinical translation of technologies to health care is typically preceded by health technology assessment (HTA), which provides a rigorous means of informing clinical and policy decision making through systematic assessment of the supporting evidentiary base. This includes consideration of clinical effectiveness (e.g., validity, utility, and safety), cost effectiveness (e.g., economic evaluation), and risks and benefits for health-care delivery and society (e.g., impact on health services and consistency with societal and ethical values). 46 As an example, in the US, the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) was established as an advisory body to the Office of Public Health Genomics at the Centers for Disease Control and Prevention to provide evidence reviews for genomic technologies. This independent group adopted review methods similar to HTA frameworks, 47 although HTA frameworks typically include broader considerations of health service delivery, economic analysis, and ethical or social issues. Although evaluation of the evidentiary base of a technology is a fundamental step in the translation of any new therapeutic, procedure, or diagnostic test into clinical care, emerging developments could threaten this standard. Genome editing is a widely accessible and relatively easy technique that could enable the technology’s uptake or dissemination across unregulated labs or clinics, sidestepping its formal review and approval before large-scale use. Nonetheless, once evidence begins to build on the validity, utility, safety, and health-care impacts, independent advisory bodies, taking an approach similar to that of EGAPP, ought to be funded and tasked to review and make recommendations about the clinical use and reimbursement of germline genome editing in clinical practice.
  • Third, ethical and social values regarding germline genome editing need to be solicited and considered. There are three general approaches to addressing the ethical justification and stakeholder assessment of germline genome editing: conducting primary research; conducting secondary analyses of published literature on the perceptions, acceptability, quality of life, attitudes, or values of stakeholders; and commissioning an expert review. 48 , 49 Surveys of the general public 41 and various scientific and health professional groups on their views toward genome editing have already begun (Alyssa Armsby et al., unpublished data; A.V. et al., unpublished data), but it is difficult to assess the impact of these attitudes in a population that has limited understanding of the technologies they are evaluating, as well as their generalizability to other populations and societies. New approaches to public engagement for addressing ethical and social issues in such complex topics include deliberative democracy, citizen juries, and community-based participatory research. Such public-engagement techniques are increasingly being used, and even mandated by some jurisdictions (e.g., the UK National Institute for Health and Care Excellence), 50 in an effort to incorporate citizen values or patient perspectives into technology assessment and ensuing guidance. 48 Engaging broader stakeholder groups, including the medical and scientific communities, persons and families dealing with genetically based disabilities, and the general public, would be warranted given the potential uses and impacts of germline genome-editing technology. These debates and engagements should weigh the risks, benefits, alternatives, unknown consequences, and access, as well as distributive and procedural justice, both on a societal level (across and within societies) and on an individual or community basis. 
Given the global diversity in culture and social norms around health, illness, and disability, it will be challenging to develop representative stakeholder groups and to know when enough data on public views have been collected. Ultimately, these debates and engagements will inform the frameworks to enable ethical uses of the technology while prohibiting unethical ones.

Summary and Conclusion

Many scientific, medical, and ethical questions remain around the potential for human germline genome editing. ASHG supports somatic genome editing and preclinical (in vitro human and animal) germline genome research but feels strongly that it is premature to consider human germline genome editing in any translational manner at this time. We encourage ethical and social consideration in tandem with basic science research in the upcoming years.