Next-generation sequencing (NGS) techniques allow high-throughput, cost-effective detection of a vast number of genetic variants. However, there are still inconsistencies and debates about how to process and analyze this “big data”. To accurately extract relevant information from genomics data, choosing appropriate tools, knowing how best to use them, and interpreting the results correctly are crucial.
In this article, we discuss an important step in the NGS data analysis workflow, variant filtering and prioritization, including how to overcome common challenges, which best practices to follow, and how to determine which NGS variant analysis solution is right for you.
What is variant filtering?
Variant filtering is a secondary NGS analysis step that consists of identifying high-confidence variants and removing those that were falsely called. The variant filtering step used to be largely left out of deeper testing, even though it has been shown to significantly improve the precision of variant calls.
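To make the idea concrete, here is a minimal sketch of a hard filter that keeps high-confidence calls and drops likely false positives. The `VariantCall` fields, the `MIN_QUAL` and `MIN_DEPTH` thresholds, and the sample calls are all hypothetical and chosen only for illustration; real pipelines tune these criteria per assay and variant caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """One called variant (illustrative fields, not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # Phred-scaled call quality
    depth: int   # read depth at the site


# Hypothetical thresholds, assumed here for illustration only.
MIN_QUAL = 30.0
MIN_DEPTH = 10


def passes_hard_filter(call, min_qual=MIN_QUAL, min_depth=MIN_DEPTH):
    """Return True when the call meets both confidence thresholds."""
    return call.qual >= min_qual and call.depth >= min_depth


def filter_calls(calls, min_qual=MIN_QUAL, min_depth=MIN_DEPTH):
    """Keep only the calls that pass the hard filter."""
    return [c for c in calls if passes_hard_filter(c, min_qual, min_depth)]


# Made-up example calls.
calls = [
    VariantCall("chr1", 12345, "A", "G", qual=99.0, depth=42),
    VariantCall("chr2", 67890, "C", "T", qual=12.5, depth=35),  # low quality
    VariantCall("chr3", 13579, "G", "GA", qual=60.0, depth=4),  # low depth
]
confident = filter_calls(calls)
print(f"{len(confident)} of {len(calls)} calls kept")
```

Fixed thresholds like these are the simplest form of filtering; they trade some sensitivity for precision, which is why the cut-offs should be validated against known calls before use.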
What is variant prioritization?
Variant prioritization is a vital step in identifying disease-causing (causal) variants. This is because the results of NGS applications, such as whole-exome sequencing (WES) or whole-genome sequencing (WGS), often consist of a list of several thousand variants of unknown significance, many of which prove to be benign (even though any rare variant has the potential to be pathogenic).
On average, WES of a human sample detects approximately 20,000–30,000 single nucleotide variant (SNV) and indel calls (1).
Variant prioritization therefore accelerates and simplifies interpretation by ranking variants of unknown significance so that the most likely candidates are reviewed first. Together with filtering, it identifies which variants found via NGS testing are likely to affect the function of a gene.
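As a rough illustration of prioritization, the sketch below ranks variants so that those with a more severe predicted consequence, and then a lower allele frequency, come first. The `IMPACT_WEIGHT` table, the dictionary fields, and the example variants are assumptions made for this sketch; they do not represent any particular tool's scoring scheme.

```python
# Hypothetical weights: higher means more likely to affect gene function.
IMPACT_WEIGHT = {
    "stop_gained": 3,
    "missense": 2,
    "synonymous": 0,
}
DEFAULT_WEIGHT = 1  # assumed weight for consequences not listed above


def priority_key(variant):
    """Sort key: impact weight first, then rarity (lower MAF ranks higher)."""
    weight = IMPACT_WEIGHT.get(variant["consequence"], DEFAULT_WEIGHT)
    return (weight, -variant["maf"])


def prioritize(variants):
    """Return variants ordered from most to least likely to be relevant."""
    return sorted(variants, key=priority_key, reverse=True)


# Made-up example variants.
variants = [
    {"id": "var_a", "consequence": "synonymous", "maf": 0.0005},
    {"id": "var_b", "consequence": "missense", "maf": 0.002},
    {"id": "var_c", "consequence": "stop_gained", "maf": 0.0001},
    {"id": "var_d", "consequence": "missense", "maf": 0.0003},
]
ranked = prioritize(variants)
print([v["id"] for v in ranked])
```

A reviewer working down such a ranked list sees the strongest candidates first, which is what makes it possible to stop early once a causal variant is found.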
3 key challenges in variant filtering and prioritization
1. Content and curation – During the variant filtering and prioritization processes, narrowing down tens of thousands of variants to a manageable list often entails a laborious process of seeking gene-phenotype relationships by consulting numerous separate databases. In addition, when testing for rare genetic diseases, public databases are widely used for the initial elimination of common variants [minor allele frequency (MAF) > 0.01]. However, some public databases may not have variants that are present within private databases, such as the Human Gene Mutation Database (HGMD) Professional.
2. Narrow down to only the most relevant – The more accurate and concise a filtered and prioritized list is, the faster and more precise the downstream analysis. A recent study found that a translational lab using a manual and semi-automated variant filtering and prioritization workflow reported an average of 600 variants per case, analyzed by two people (one analyst and one director) over three hours, for a throughput of ≈100 variants per person-hour (2).
If causative variants can be identified earlier because prioritization ranks them highly, review of the full filtered variant list can be cut short, reducing the total number of variants reviewed and therefore the time needed to analyze a case. Additionally, accurate prioritization is a step towards the ultimate goal of fully automating the analysis of sequencing data for NGS testing.
3. Standardization – Variant filtering and prioritization must be consistent, reproducible, and standardized. However, many clinical research and translational labs use multiple software tools to perform different steps in the NGS secondary analysis workflow, including annotation, variant filtering, and prioritization. Using multiple software tools can introduce workflow complexities and inconsistencies. The current gold standard is for NGS labs to use one solution that automates and standardizes variant annotation, filtering, and prioritization through a single, user-controlled workflow.
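Two of the numbers above can be sketched in a few lines of code: the initial elimination of common variants (MAF > 0.01) and the throughput arithmetic from the cited study. The variant records and function names are hypothetical, and keeping variants that have no frequency in the database is an assumption of this sketch, not a rule stated in the text.

```python
COMMON_MAF_THRESHOLD = 0.01  # variants with MAF above this count as common


def remove_common(variants, threshold=COMMON_MAF_THRESHOLD):
    """Drop variants whose minor allele frequency exceeds the threshold.

    Variants with no recorded frequency (maf is None) are kept, on the
    assumption that absence from a public database suggests rarity.
    """
    return [v for v in variants if v["maf"] is None or v["maf"] <= threshold]


def variants_per_person_hour(n_variants, n_people, hours):
    """Review throughput: variants divided by total person-hours spent."""
    return n_variants / (n_people * hours)


# Made-up example variants.
variants = [
    {"id": "var_1", "maf": 0.25},    # common, removed
    {"id": "var_2", "maf": 0.0004},  # rare, kept
    {"id": "var_3", "maf": None},    # not in the database, kept
    {"id": "var_4", "maf": 0.01},    # exactly at the threshold, kept
]
rare = remove_common(variants)
print([v["id"] for v in rare])

# 600 variants reviewed by two people over three hours.
print(variants_per_person_hour(600, 2, 3))
```

Encoding the threshold and the missing-frequency rule once in code, rather than applying them by hand per case, is one small way to keep filtering consistent and reproducible.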
Considerations for selecting a solution for variant filtering and prioritization
1. Automated curation – To help accelerate the variant filtering and prioritization process, select an NGS assessment software solution that eliminates manual curation by giving users access to multiple sources of pre-curated content to rapidly and comprehensively prioritize variants [including single nucleotide variants (SNVs), indels, structural variants, and copy number variants (CNVs)].
2. Powerful algorithms – With the exponential growth of biomedical data sets in NGS testing, it is increasingly important to combine large data sets with machine learning and powerful algorithms to identify potential causal variants within minutes.
3. Transparency and full user control – To enable consistent, reproducible, and standardized variant filtering and prioritization, select a software solution that dynamically computes assessments with full transparency and gives users total control over parameters, policies, and output.
Learn more about QCI Interpret Translational
QCI Interpret Translational is an NGS variant assessment software solution that enables rapid, evidence-powered variant annotation, filtering, and triage for human exome, genome, and large cohort sequencing data.
Leveraging the QIAGEN Knowledge Base, the industry’s largest collection of biological and clinical findings, QCI Interpret Translational improves research efficiency and accuracy by automating manual curation processes, dynamically and transparently assessing variants according to society guidelines with full user control, and optimizing resource allocation. This allows users to focus on what matters most: transforming genomic data into publishable insights.
1. Bao R, Huang L, Andrade J, et al. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Inform. 2014;13(Suppl 2):67-82. doi:10.4137/CIN.S13779
2. Bick D, Fraser PC, Gutzeit MF, et al. Successful application of whole genome sequencing in a medical genetics clinic. J Pediatr Genet. 2017;6(2):61-76.