Last updated: Jun 14, 2018

Understanding technology

There are many genomic technologies, each using different methods to determine DNA and RNA sequences and structure. The following summary includes information about how these approaches work, and some of their strengths and weaknesses. However, this is a rapidly moving field, with new technologies and platforms being developed and released regularly. Discussions with sequencing providers or laboratories with genomics experience are therefore invaluable. 

Sequencing options

A key consideration in genomics research is determining which sequencing technology and method will provide adequate genomic coverage and depth to answer the research question at hand. Sequencing options include whole genome, whole exome (regions of the genome that code for proteins) or targeted gene sequencing panels.
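As a rough illustration of how depth relates to the amount of sequence generated, the sketch below uses the common approximation that mean depth equals the number of reads multiplied by read length, divided by the size of the target. The function name, the target sizes and the run size are illustrative assumptions, not figures from a particular platform or capture kit.

```python
def mean_depth(num_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Approximate mean depth: total sequenced bases divided by target size."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return num_reads * read_length_bp / target_size_bp


# Illustrative target sizes (assumed round numbers, not kit specifications).
TARGETS_BP = {
    "whole genome": 3_100_000_000,
    "whole exome": 50_000_000,
    "gene panel": 1_000_000,
}

# The same hypothetical run (400 million reads of 150 bp) spread over each target.
# Real experiments lose some reads to duplicates and off-target capture.
for name, size in TARGETS_BP.items():
    print(f"{name}: ~{mean_depth(400_000_000, 150, size):,.0f}x")
```

The same sequencing output gives far greater depth over a smaller target, which is one reason exomes and panels cost less per sample than whole genomes.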

Genomes, exomes and panels

Whole genome sequencing (WGS) is comprehensive but expensive. WGS provides consistent coverage over gene regions, includes regulatory regions that affect genes and can differentiate between pseudogenes and their functional counterparts. However, due to the sheer volume of data, analysis can be complex and time-consuming [1].

Whole exome sequencing (WES) is a cheaper, more targeted option that still provides a wealth of information as it covers the regions that are best understood and that most clearly relate to genetic disorders. WES is particularly useful for studying Mendelian disorders but exomes can be of variable quality depending on the kits and targeting approaches used [1].

Targeted gene panels are cheaper still and often include a set of genes or gene regions that are associated with a disease or phenotype. Similar to WES, panels generally involve isolating particular sections of the genome prior to sequencing. Targeted approaches are useful if the genes or regions involved in the study are already known, and are generally not used to investigate novel associations. The specificity of panels also reduces the chances of detecting incidental findings.

The type of approach used may be tailored to the research question and the analysis methods that will be undertaken.

Have you thought about?

  • Talking to your sequencing provider or laboratory about which regions of the genome need the best coverage and depth for you to be certain about your findings?


Sequencing and array technologies

Short read

Short read sequencing machines were the first high-throughput technologies available and continue to be the most commonly used. In the preparation stages of sequencing, DNA is broken into fragments. These fragments are often copied, or amplified, before being read by the sequencing machine.

A major company that provides machines for short read sequencing is Illumina, which uses a sequencing-by-synthesis approach that attaches fluorescent nucleotides to the DNA fragments, then uses lasers and high resolution photography to determine the order of the DNA or RNA bases [1].

There are numerous other companies that have developed alternative sequencing methods. For instance, Thermo Fisher's Ion Torrent sequencer also uses a sequencing-by-synthesis approach. Rather than using fluorescent tags, it monitors pH changes during DNA synthesis. When a base is added to a growing DNA strand, a hydrogen ion (H+) is released, which lowers the pH of the solution [1].

Pros

Throughput, cost and accuracy vary between different models from these companies and even between labs.

Short read sequencing is currently the most accurate approach for detecting single nucleotide variants, as well as small insertions, deletions and substitutions. It can be up to 99.9% accurate and is currently the workhorse for cohort and population level research [2].
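To put a per-base accuracy figure in context, accuracy is often expressed as a Phred quality score, Q = -10 × log10(error probability), so 99.9% accuracy corresponds to roughly Q30. The sketch below converts between the two and estimates the expected number of miscalled bases in a read. The function names and the 150-base read length are illustrative assumptions.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.999) to a Phred quality score."""
    error = 1.0 - accuracy
    if not 0.0 < error < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(error)


def expected_errors(accuracy: float, read_length: int) -> float:
    """Expected number of miscalled bases in a read, assuming independent errors."""
    return (1.0 - accuracy) * read_length


print(round(phred_from_accuracy(0.999)))       # quality score for 99.9% accuracy
print(round(expected_errors(0.999, 150), 2))   # expected errors in a 150-base read
```

Even at high per-base accuracy, errors accumulate across millions of reads, which is why variant calls rely on multiple reads covering the same position.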

Cons

Many of the disadvantages of short read sequencing are due to the read length. It takes time, specialist expertise and computing power to accurately assemble a genome from short fragments. Complex regions of the genome, such as repetitive or homologous sequences and mobile elements, remain very hard to read, and structural variants can also be challenging to characterise.
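The effect of read length on repetitive regions can be illustrated with a toy example. The sketch below counts how many places a read matches exactly in a made-up reference that contains the same repeat element twice. A read that lies entirely inside the repeat matches in several places, while a read that extends into the unique flanking sequence matches only once. The sequences and the function name are hypothetical, and real aligners are far more sophisticated than exact string matching.

```python
def match_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Made-up reference: the same repeat element appears twice between unique flanks.
REPEAT = "ACACACACAC"
reference = "GGTTCA" + REPEAT + "TTGACCGT" + REPEAT + "CATGGA"

short_read = REPEAT[:8]               # lies entirely inside the repeat
long_read = "TTCA" + REPEAT + "TTGA"  # spans the repeat and both unique flanks

print(len(match_positions(reference, short_read)))  # more than one candidate location
print(match_positions(reference, long_read))        # a single location
```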


Long read

Long read technologies are able to sequence far longer stretches of DNA and RNA than their short read cousins. In some cases, a single read can span an entire microbial genome, chromosome or short RNA. Long read technologies can provide more complete coverage and assembly of genomes and tolerate more genomic complexity. The major companies in this space are Pacific Biosciences (PacBio) and Oxford Nanopore.

PacBio’s sequencers work by generating a complementary strand to the target DNA template and monitoring, in real time, which fluorescently tagged nucleotide is incorporated at each position.

Oxford Nanopore's sequencers work by passing an electrical current through tiny pores, called nanopores, sitting in a membrane. As DNA or RNA moves through a nanopore, each base creates a disruption in the electrical current, which is analysed in real time to determine the sequence.

Pros

Long read sequencing makes identifying and aligning complex regions, such as large structural changes, repetitive regions and sequences with extreme GC content, less complex bioinformatically [2]. PacBio and Oxford Nanopore are also commonly used to detect epigenetic modifications, such as methylation.

These technologies allow real-time sequencing, which means that the genomic data can be viewed and analysed as it comes off the sequencing machine. This enables simple conveniences such as stopping an experiment when you have the data you need or because of a quality issue.

Cons

Long read sequencing technologies are currently more expensive and have higher error rates than short read technologies. As a result, a common practice is to use short read technologies to “polish” long-read assemblies. Consequently, long read sequencing is widely used in research settings, but not commonly in clinical settings.

This is an area of significant growth, and a combination of improvements to the machines as well as our ability to process the data could soon increase their accuracy and decrease costs.


Arrays

Some studies do not require a next generation sequencing approach and could be done more cheaply and simply with an older, tried-and-true method such as DNA microarrays.

DNA microarrays are small chips covered in thousands of short, synthetic DNA sequences, which together represent a gene or region of interest. A sample and a control are each labelled with a different colour and then washed over the chip. If the sample has a variant at a particular point, it will not bind to the synthetic sequence, but the control will [3]. DNA microarrays fall into two broad categories:

  • Comparative Genomic Hybridisation (CGH) arrays which pick up large structural changes
  • Single nucleotide polymorphism (SNP) arrays which detect single base changes

In a clinical setting, microarrays are used as front-line tests for autism, intellectual disability, developmental delay and congenital abnormalities [4]. They are also increasingly used for prenatal testing [5]. Depending on the presentation, microarrays have a diagnostic yield of 10-20%, compared to 3-5% with a traditional karyotype. They have high sensitivity and reproducibility and have been well tested, so are sometimes more trusted than newer technologies.

Pros

While in many cases WES and WGS have superseded microarrays in quality and detection, the lower price and comparatively smaller datasets generated by microarrays mean that they remain useful for studies involving a large number of participants.

Microarrays can often be better at dealing with poor quality samples.

Cons

Microarrays have a significantly lower resolution than sequencing, which can be both a limitation and a benefit.

Some panels cover a specific area of the genome, so are only useful if the study is investigating a specific area of interest [5].

Another limitation is that microarrays can only detect gains or losses of genetic material, and do not show the position of the DNA in relation to the rest of the genome. Hence, balanced chromosomal abnormalities will not be picked up by microarrays.  Imbalances in genomic regions not represented in the array will also not be detected.

Finally, both array CGH and SNP arrays only work in regions that are unique, which means they are not useful in areas that have pseudogenes or large duplications.
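The reason balanced rearrangements are invisible to arrays can be sketched with the log2 ratio commonly used to summarise array CGH signal, that is, the ratio of sample to control signal at each probe. In the simplified sketch below, signal is assumed to be proportional to copy number, and the function names and the call threshold are illustrative assumptions rather than a validated calling method.

```python
import math


def log2_ratio(sample_copies: float, control_copies: float = 2.0) -> float:
    """log2 of sample over control copy number (signal assumed proportional to copies)."""
    if sample_copies <= 0 or control_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(sample_copies / control_copies)


def call_region(ratio: float, threshold: float = 0.3) -> str:
    """Classify a region using a hypothetical symmetric threshold."""
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "no change detected"


# Copy number in the sample for three scenarios, against a two-copy control.
scenarios = {
    "heterozygous deletion": 1,
    "duplication": 3,
    "balanced translocation": 2,  # material has moved, but the amount is unchanged
}
for name, copies in scenarios.items():
    ratio = log2_ratio(copies)
    print(f"{name}: log2 ratio {ratio:+.2f} -> {call_region(ratio)}")
```

Because a balanced translocation leaves the amount of DNA unchanged, its ratio stays at zero and nothing is flagged, whereas a karyotype or sequencing-based method could still reveal the rearrangement.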


Genomic methods and experimental designs

Transcriptomics

Transcription is the process of copying a DNA sequence into RNA [6]. The transcriptome is all of the RNA transcripts in a cell, sample or organism [6]. RNA has a variety of roles. Some RNAs are involved in protein building, such as messenger RNA (mRNA), which is translated into amino acid chains by ribosomes. Others have regulatory roles, like microRNA (miRNA), which down- or up-regulates gene expression.

By looking at the transcriptome, we can determine which genes are turned on as well as the level of gene expression [6]. Furthermore, knowing where and when particular parts of the genome are expressed provides clues to their function [6].
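As an illustration of how a level of gene expression is often derived from sequencing data, the sketch below converts raw read counts per gene into counts per million (CPM), one simple normalisation that makes samples sequenced to different depths comparable. The gene names and counts are made up, and real analyses typically use dedicated tools and more robust normalisation.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw per-gene read counts so they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical raw counts for one sample.
raw = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0, "GENE_D": 8000}

cpm = counts_per_million(raw)
expressed = [gene for gene, value in cpm.items() if value > 0]

print(cpm)
print("Genes with detectable expression:", expressed)
```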


Epigenomics

Epigenomics is the study of the chemical compounds that modify or mark the genome and as a result, regulate the activity (expression) of genes. These modifications alter DNA accessibility and chromatin structure, thereby regulating patterns of gene expression, without altering the underlying DNA sequence.

One frequently studied epigenetic modification is DNA methylation, where chemical tags called methyl groups are added to DNA bases to moderate their interaction with other proteins [7]. Interactions between DNA and key structural proteins called histones are also important. When DNA is tightly wrapped around histones it is inaccessible and is thus “turned off” [8].
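Methylation at a given site is commonly summarised as the fraction of reads (or signal) that show the methylated state. The sketch below computes that fraction from hypothetical read counts. The site labels, counts and function name are invented for illustration.

```python
def methylation_fraction(methylated: int, unmethylated: int) -> float:
    """Fraction of reads supporting methylation at one site (0 = none, 1 = all)."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no coverage")
    return methylated / total


# Hypothetical (methylated, unmethylated) read counts per site.
sites = {"site_1": (18, 2), "site_2": (1, 19), "site_3": (10, 10)}

for site, (meth, unmeth) in sites.items():
    print(site, methylation_fraction(meth, unmeth))
```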

Sample preparation for an epigenomics study is different to standard DNA sequencing: consult your local sequencing provider for guidance.


Single cells

Genomic research often involves extracting and sequencing genomic material from large numbers of cells within a single sample. This obscures the many differences between cells in our body and prevents the detailed study of individual cells' functions and interactions. Single-cell approaches address this by isolating, amplifying and sequencing an individual cell's DNA, RNA or epigenome.
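A small numerical sketch shows why pooling cells can obscure differences between them. Here a made-up gene is highly expressed in a minority of cells and silent in the rest. The pooled average suggests modest expression everywhere, while the per-cell values reveal two distinct groups. All numbers are invented.

```python
from statistics import mean

# Hypothetical expression of one gene in ten cells from the same sample.
per_cell = [0, 0, 0, 0, 0, 0, 0, 0, 50, 50]

bulk_average = mean(per_cell)  # what a pooled measurement would approximate
expressing_cells = sum(1 for value in per_cell if value > 0)

print("Bulk average:", bulk_average)
print("Cells expressing the gene:", expressing_cells, "of", len(per_cell))
```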

Increasingly, it is understood that DNA sequences vary cell to cell, through random mutation events, splicing events in stem and immune cells or as a result of mosaicism at birth. Being able to identify and closely examine these differences has important implications for understanding a range of conditions, including autoimmune and neurological disorders.

Single-cell sequencing also has many applications in the study of microbial communities, providing insights into gut and skin microbiomes and viromes.  Single-cell research has the added advantage of allowing researchers to study microorganisms that have not been cultured.

Single-cell approaches also allow researchers to investigate how cancer cells evolve and change, both spatially and over time. These methods have allowed researchers to map the different types of cancers that are present in a patient (some of which may be vulnerable to different therapies).

Using single-cell technology, researchers are learning more about the transcriptional and epigenetic differences between neighbouring cells or communities of cells. These approaches are being used to interrogate low-abundance transcripts, RNA transcript splicing and how changes in gene expression lead to different cell characteristics.


Non-coding regions

The best studied regions of the genome are sections that are directly involved in the production of proteins. However, there is increasing interest in regions of the genome – called the non-coding or non-protein-coding regions – which are involved in regulation and protein expression. 

Non-coding regions play significant roles in many diseases, acting as enhancer elements, promoters, intronic sites that influence alternative splicing of protein products, and structural regions that affect how different parts of the genome are expressed. Information about these regions may be captured through standard sequencing methods, but analysis of the data requires different approaches to understand how a variant can impact cellular activity outside a protein-coding sequence.

Have you thought about?

  • Which technology platform to use?
  • Speaking with a bioinformatician about how to analyse the data you generate?
  • How would you confirm your findings?


Metagenomics

Metagenomics is the study of multiple genomes simultaneously and can be a useful tool for investigating microbial and viral communities. Metagenomics has been used to investigate gut and skin microbiomes and detect pathogens in an ecosystem. This technique also allows researchers to study microorganisms that have not been cultured.

Two common methods include:

  • Taxonomic analysis (16S and 18S rRNA gene sequencing) - “Who is there?”
  • Functional analysis - “What are they doing?”
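The “Who is there?” question is often answered by assigning each read to a taxon and reporting relative abundances. The sketch below performs only that final counting step on a made-up list of per-read genus assignments. The classification of reads against a reference database is assumed to have happened elsewhere, and the function name is hypothetical.

```python
from collections import Counter


def relative_abundance(assignments: list[str]) -> dict[str, float]:
    """Proportion of reads assigned to each taxon, most abundant first."""
    if not assignments:
        raise ValueError("no assignments")
    counts = Counter(assignments)
    total = len(assignments)
    return {taxon: n / total for taxon, n in counts.most_common()}


# Hypothetical per-read genus assignments.
reads = ["Bacteroides"] * 5 + ["Lactobacillus"] * 3 + ["Escherichia"] * 2

print(relative_abundance(reads))
```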


De novo genome assembly

De novo genome assembly is the process of assembling a genome from sequencing reads without a reference genome. This can be used for organisms without a good reference genome and generally requires the use of long read or synthetic long read sequencing to be successful.
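The idea of building a sequence from reads alone can be sketched with a toy greedy approach that repeatedly merges the two sequences sharing the longest suffix-to-prefix overlap. This is a teaching simplification under stated assumptions (error-free reads, no repeats, a single strand). Production assemblers use overlap-layout-consensus or de Bruijn graph methods and must cope with sequencing errors and repeats. The function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no remaining overlaps, so stop with several contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


# Made-up reads drawn from the sequence "ATGGCGTGCAATCCG".
reads = ["ATGGCGTG", "GCGTGCAAT", "GCAATCCG"]
print(greedy_assemble(reads))
```

With longer reads, overlaps are longer and less ambiguous, which is part of why long read data makes assembly without a reference more tractable.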


Functional studies

DNA and RNA sequencing can suggest that a variant has a role in a particular phenotype or disorder, but unless there are many cases to support this, such a connection may need to be confirmed by functional studies.

Functional studies can be carried out in vivo, in vitro or in silico. Many functional studies involve altering the genotype of a cell or animal, then measuring the changes in the phenotype, which could be on a cellular or molecular level (e.g. expression of a protein) or on a larger scale (e.g. behavioural or physical) [9].

Understanding the function of a gene through functional studies can:

  • Provide validation that a variant is the pathogenic cause of a disorder
  • Enable exploration of structural and functional changes in proteins and downstream changes in tissues, organ systems, and metabolic processes
  • Lead to potential manipulation of gene products or testing of drug therapies to restore normal function


Have you thought about?

  • Can you consult with an existing network or research organisation to access the tools and expertise you need?
  • Which technology platform to use?
  • Speaking with a bioinformatician about how to analyse the data you generate?
  • How would you confirm your findings?
  • What samples do you have available to you?
  • Talking to the people who will analyse the data?
  • Is the sequencing platform or approach appropriate for your research question and within your budget?



[1] Clinical Genomics 101, 2017 edition, Frontline Genomics

[2] Goodwin S, McPherson J D, McCombie W R. (2016) Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics 17:333–351

[3] DNA Microarray Technology. The National Human Genome Research Institute. (Updated 2017) https://www.genome.gov/10000533/dna-microarray-technology/

[4] Miller, D. T., Adam, M. P., Aradhya, S., Biesecker, L. G., Brothman, A. R., Carter, N. P., … Ledbetter, D. H. (2010). Consensus Statement: Chromosomal Microarray Is a First-Tier Clinical Diagnostic Test for Individuals with Developmental Disabilities or Congenital Anomalies. American Journal of Human Genetics, 86(5), 749–764.

[5] Wapner R J, Martin C L, Levy B. (2012) Chromosomal microarray versus karyotyping for prenatal diagnosis. N Engl J Med. 367:2175–84.

[6] Transcriptomics. The National Human Genome Research Institute. (Updated 2015) https://www.genome.gov/13014330/transcriptome-fact-sheet/

[7] Epigenetics. The National Human Genome Research Institute. (Updated 2016) https://www.genome.gov/27532724/epigenomics-fact-sheet/

[8] Buguliskis J S. The Epigenetic Insights of RNA-Seq. The Use of RNA-Seq Has Enormous Applicability toward the Development of Clinical Diagnostics (2016) GEN Exclusives http://www.genengnews.com/gen-exclusives/the-epigenetic-insights-of-rna-seq/77900651

[9] Wangler M F, Yamamoto S, Chao H-T, et al. (2017). Model Organisms Facilitate Rare Disease Diagnosis and Therapeutic Research. Genetics. 207(1):9-27