Last updated: Apr 27, 2018

Wet Laboratory

Most clinical genomic sequencing-based studies to detect variants follow these key steps: sample preparation and sequencing; processing and alignment of sequencing data; identification, filtering and annotation of variants; and biological and clinical interpretation and validation of variants. Each step should include quality checks to ensure that no errors have occurred. The success of a sequencing study depends on the initial steps performed in the wet laboratory and on sample management.

There are countless genomic methodologies and steps that cannot be described as part of this resource. More information can be found in research publications and by talking with sequencing providers or bioinformaticians. 

Sample preparation 

Successful genomic data analysis is critically dependent on high quality sample preparation. Methods and protocols vary between research experiments; however, consistency within an experiment (controls and experimental samples) is paramount.

There are three common stages in DNA and RNA extractions:

  • Cell lysis
  • Removal of cell membranes
  • DNA or RNA purification

There are numerous protocols and commercial kits available for preparing samples for high throughput sequencing. Protocols and kits are tailored for specific cell types, sample type (frozen tissue or formalin-fixed paraffin-embedded (FFPE)), quantity and size of starting material, the experimental aims as well as the sequencing platforms used.

When preparing samples, it is important to be aware of contamination, which can originate from:  

  • External sources - bacteria, yeast, other individuals
  • The participant - sample heterogeneity or tissue contamination, e.g. the percentage of tumour to non-malignant tissue, or the percentage of different tissue types in epigenetic or transcriptional studies

Observed variability can occur as a result of real biological differences or technical biases. Technical bias is variation introduced during an experiment as a result of:

  • Sample preparation, storage or processing (kits, date and location prepared)
  • Library construction
  • Machine, flow cell or lane variability
  • Technical staff

Technical biases can be minimised by randomising the order in which samples are prepared and sequenced. For example, bias could be introduced if the controls and experimental samples were prepared on different days or by different people [1].
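
One way to picture this randomisation is a small script that spreads each group across preparation batches in random order. This is a minimal sketch only: the function name, sample names and batch count are hypothetical, and a real design may also need to balance other factors such as operator, kit lot or flow cell lane.

```python
import random
from collections import defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Spread each group evenly across batches in random order.

    `samples` is a list of (sample_id, group) pairs. All names here are
    hypothetical and used for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    batches = defaultdict(list)
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)
        offset = rng.randrange(n_batches)
        # Deal the shuffled samples round-robin so that no batch ends up
        # holding only controls or only experimental samples.
        for i, sample_id in enumerate(ids):
            batches[(i + offset) % n_batches].append((sample_id, group))
    return dict(batches)


samples = [(f"control_{i}", "control") for i in range(6)] + [
    (f"case_{i}", "case") for i in range(6)
]
layout = assign_batches(samples, n_batches=3)
for batch_number in sorted(layout):
    print(batch_number, layout[batch_number])
```

With six controls and six cases dealt into three batches, each batch receives two samples from each group, so preparation day or operator is not confounded with the experimental condition.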

Quality control

  • Most sequencing providers will have a minimum quality and quantity threshold for sequencing. Common methods for testing quality and quantity of DNA or RNA include visualisation on an agarose gel, spectrophotometry and fluorometric methods.
  • Samples can often be checked for contamination and percentage of sample heterogeneity, although sometimes this is detected after sequencing or analysis.

Have you thought about?

  • Which extraction kit and protocol will you use for the sample preparation? Would this method generate sufficient data for bioinformatics analyses and for addressing your biological questions? (e.g. replicates, coverage)
  • How were your tissues stored or transported?
  • Have you set up your experimental preparation to avoid technical biases?
  • If you are planning to create complementary datasets, can you do multiple extractions at once? (e.g. rather than extract just DNA, consider extracting RNA and protein at the same time)

Talk with your sequencing provider to understand how sample preparation can affect your sequencing experiment and what precautions and considerations are important. It is always best to check with the provider that will be generating the sequence data, as many providers have specific requirements.

Library preparation

A library in high throughput genomics research often refers to DNA or RNA prepared for a sequencing experiment. Library preparation and protocols vary widely among sequencing methods and platforms. Sequencing providers will be able to provide advice on library preparations.

Broadly speaking, there are five main steps in short read library preparation, although not all protocols will require each of these steps:

  • DNA fragmentation
  • End repair
  • Adapter ligation
  • Amplification
  • Purification

Quality control

  • Many methods require uniform fragment sizes. Library quantification is performed before sequencing to measure the size and quantity of the fragmented DNA. Common methods include visualisation on an agarose gel, spectrophotometry and fluorometric methods.

Have you thought about?

  • Have you considered the impact of different methods involved in library preparation, such as barcoding, indexing, multiplexing, hybridisation, capture and amplification, on the downstream analysis?
  • Have you set up your experimental preparation to avoid technical biases?

Talk with your sequencing provider, bioinformatician and/or an experienced lab technician to understand how library preparation can affect your sequencing experiment and what precautions and considerations are important for your study.

Genomic sequencing

Improvements in high throughput sequencing technologies have made sequencing more affordable and accessible than ever before, although it can still be relatively expensive compared to some conventional methods (Sanger sequencing, microarrays). Understanding which sequencing technology will cost-effectively answer the research question is an important first step in genomic research. The pros and cons of common sequencing technologies are discussed in Understanding technology.

It is critical to ensure that the depth of sequencing will allow accurate and confident variant detection and that there are enough biological replicates to perform statistical analysis. There are many conventions and tools that can help with determining the depth of sequencing and replicates required. However, it is important to note that some common conventions might not be suitable for the samples or biological questions, so consultation with an expert is recommended.
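
As a rough planning aid, the number of reads needed for a target mean depth can be estimated from the common approximation C = N × L / G (mean depth C, number of reads N, read length L, genome or target size G). The sketch below rearranges this for N. The function name and the example values are assumptions for illustration; the estimate is idealised and ignores duplicate reads, unmapped reads and uneven coverage, so a provider will usually recommend sequencing more than this minimum.

```python
import math


def reads_needed(target_depth, genome_size_bp, read_length_bp, paired=True):
    """Estimate reads (or read pairs) for a target mean depth via C = N * L / G."""
    # A read pair contributes two reads' worth of bases.
    bases_per_unit = read_length_bp * (2 if paired else 1)
    return math.ceil(target_depth * genome_size_bp / bases_per_unit)


# Illustrative values only: 30x mean depth over a genome of about 3.1 Gb
# using 2 x 150 bp paired-end reads.
pairs = reads_needed(30, 3_100_000_000, 150)
print(f"{pairs:,} read pairs")
```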

The read depth is the number of reads (or bases) aligned to a given location on the reference genome. The coverage is the average read depth across the reference genome or target region. Because coverage is an average, the depth at individual positions varies across the genome, but coverage is still a good indication of the level of sequencing performed. Variability in coverage can be due to a range of factors including:

  • Biases in genome structure
  • Complexity in aligning reads to repetitive regions, highly aneuploid genomes or heterogeneous cell populations (e.g. tumour samples)
  • Relative abundance of reads and biases from sample and library preparation

The coverage required will depend on the biology of the sample and the research questions. For example, if the study is investigating a gene or variant in a repetitive region, then a greater depth of coverage will be required compared to a gene in a protein coding region.

The read depth, not coverage, determines whether variant calls can be made with a certain degree of confidence at a particular base position. 
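
The distinction between read depth and coverage can be illustrated with a toy calculation. The sketch below assumes reads are given as (start, end) intervals with 0-based starts and exclusive ends; the coordinates and function name are made up for illustration, and real analyses would take alignments from a BAM file using an established tool.

```python
def per_base_depth(alignments, region_length):
    """Return the read depth at each position of a region.

    `alignments` holds (start, end) intervals, 0-based with end exclusive.
    """
    diff = [0] * (region_length + 1)
    for start, end in alignments:
        # Clamp intervals to the region so out-of-range reads are ignored.
        lo = min(max(start, 0), region_length)
        hi = min(max(end, 0), region_length)
        diff[lo] += 1
        diff[hi] -= 1

    depth, running = [], 0
    for change in diff[:region_length]:
        running += change  # running total of reads overlapping this base
        depth.append(running)
    return depth


# Toy data: four 5 bp reads on a 12 bp region (hypothetical coordinates).
reads = [(0, 5), (2, 7), (3, 8), (6, 11)]
depth = per_base_depth(reads, region_length=12)
mean_coverage = sum(depth) / len(depth)
print(depth, round(mean_coverage, 2))
```

In this toy region the mean coverage is about 1.7x, yet the depth at individual positions ranges from 0 to 3. A variant at the final position could not be called at all, which the average alone would not reveal.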

The sequencing provider or sequencing machine processes the raw sequence files into a data file with base calls (or nucleotide sequences) and their corresponding quality scores. Most sequencing providers will also perform an initial quality check of the input sample and sequence output.
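
Base calls and quality scores are commonly delivered as FASTQ records, in which each quality character encodes a Phred score Q = -10 × log10(P), where P is the estimated probability that the base call is wrong. The sketch below assumes four-line records and the Phred+33 character encoding; some older platforms used a different offset, so the encoding should be confirmed with the provider. The read name and sequence are invented for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call is wrong for Phred score q."""
    return 10 ** (-q / 10)


def mean_read_quality(fastq_lines):
    """Yield (read_id, mean Phred score) for each 4-line FASTQ record."""
    for i in range(0, len(fastq_lines) - 3, 4):
        header, _seq, _plus, qual = fastq_lines[i:i + 4]
        scores = phred_scores(qual.strip())
        yield header.strip()[1:], sum(scores) / len(scores)


# One hypothetical record: header, sequence, separator, quality string.
record = ["@read_1", "GATTACAG", "+", "IIII5555"]
results = list(mean_read_quality(record))
print(results)
```

Here the character `I` corresponds to Q40 (an error probability of 1 in 10,000) and `5` to Q20 (1 in 100), which shows how quality can drop along a read.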

Quality control

  • Quality control (QC) of read sequences using base quality is critical. Other checks can be used to look for contamination, sequencing biases, adapter content, technical biases from sample handling and instrument operation, and so on.
  • Some common tools and software used for interrogating sequence quality include:
    • FastQC
    • Integrative Genomics Viewer (IGV)
    • SMRT portal (PacBio)
    • SFF (Ion Torrent)
    • ClinQC

Please note this is not a recommended list of tools but a small subset of frequently used tools. We recommend consulting an expert or the literature before selecting a tool.

Have you thought about?

  • Is the sequencing platform or approach appropriate for your research question and within your budget?
  • Will the read depth and coverage allow you to answer your biological question?
  • Do you have enough biological replicates to support statistical analysis?

Talk to your sequencing provider and/or a bioinformatician to decide which sequencing technology or platform is suitable for your study and to discuss the quality of your sequence data.

[1] McIntyre L M, Lopiano K K, Morse A M, Amin V, Oberg A L, Young L J, and Nuzhdin S V. (2011) RNA-seq: technical variability and sampling. BMC Genomics 12:293.