The Quality Sequencing Minimum (QSM): providing comprehensive, consistent, transparent next generation sequencing data quality assurance

Next generation sequencing (NGS) is routinely used in clinical genetic testing. Quality management of NGS testing is essential to ensure performance is consistently and rigorously evaluated. Three primary metrics are used in NGS quality evaluation: depth of coverage, base quality and mapping quality. To provide consistency and transparency in the utilisation of these metrics we present the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene or other genomic region, as appropriate. The QSM format is CX_BY(P Y)_MZ(P Z). X is the parameter threshold for C, Y the parameter threshold for B, P Y the percentage of reads that must reach Y, Z the parameter threshold for M, P Z the percentage of reads that must reach Z. The data underlying the QSM is in the BAM file, so a QSM can be easily and automatically calculated in any NGS pipeline. We used the QSM to optimise cancer predisposition gene testing using the TruSight Cancer Panel (TSCP). We set the QSM as C50_B10(85)_M20(95). Test regions falling below the QSM were automatically flagged for review, with 100/1471 test regions QSM-flagged in multiple individuals. Supplementing these regions with 132 additional probes improved performance in 85/100. We also used the QSM to optimise testing of genes with pseudogenes such as PTEN and PMS2. In TSCP data from 960 individuals the median number of regions that passed QSM per sample was 1429 (97%). Importantly, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. We believe many laboratories would find the QSM useful. Furthermore, widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories.


Introduction
Next generation sequencing (NGS) is now routinely used in clinical and research settings to investigate whether genomic variation has caused, or has the potential to cause, human disease 1. Such genetic tests must be able to robustly detect pathogenic variants (positive tests) and to exclude the presence of pathogenic variants (negative tests). The accuracy of NGS in this regard depends on the performance of the assay generating the sequence data and the software tools that analyse the data. Suboptimal technical and/or analytical performance can lead to false positive or false negative results. Comprehensive quality management of NGS pipelines is thus essential and must encompass both quality assurance and quality control 2. Guidelines for quality management of the technical and analytical aspects of clinical NGS pipelines have been published 2-6. One of the recommendations is for clear information about test performance and limitations to be provided on the test report, as healthcare professionals need this for clinical decision making 3-5. Although this is well accepted as best practice, there are no specific guidelines for how it can be achieved, and most test reports provide limited or no information.
Three primary metrics are used to evaluate sequence quality in NGS data: depth of coverage (how many sequence reads are present at a given position), base quality (have the correct bases been called in sequence reads) and mapping quality (have the reads been mapped to the correct position in the genome) 7 . Insufficient depth of coverage is a common cause of false negative errors 8 . A summary statement of coverage, such as the minimum depth or average depth of coverage achieved is often provided as a proxy for overall performance. However, using only the number of reads is not a sufficient measure of performance. False negative errors can still occur even with good depth of coverage, for example if the reads have been aligned incorrectly to the genome. Moreover, depth of coverage has limited utility in reducing false positive errors. Base quality and mapping quality are useful for this and most NGS base callers and read mappers provide Phred-scaled quality scores that quantify the probability that a particular base has been identified incorrectly (base quality score, BQ) 9 , or a read has aligned to the wrong genomic position (mapping quality score, MQ) 10 .
Quality assurance of an NGS test therefore requires attention to the quality control of depth of coverage, base quality and mapping quality during the design, optimisation and utilisation of the test. To bring consistency and transparency to these processes we have developed and implemented the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement that a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene, or other genomic region, as appropriate. The QSM allows consistent, automated flagging of test regions that fall below minimum quality requirements and thus need additional scrutiny. In addition to its use in optimisation and quality control of NGS pipelines, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance.
A standard BAM file contains the data required for the QSM and a QSM can be easily and automatically applied in any NGS pipeline. We have developed a freely available tool called CoverView to do this 11 . Alternatively, custom scripting within any NGS analytical pipeline should readily allow application of a QSM.
We have found use of a QSM a highly effective and efficient way to meet the quality management recommendations for NGS testing, and we believe others may also find it useful. Furthermore, general adoption of the QSM would help to standardise the communication of NGS quality information by the thousands of laboratories now undertaking NGS tests.

Methods
A QSM includes three sequence quality metrics: depth of coverage (C), base quality (B), and mapping quality (M). The values for these metrics are dependent on the tools used to generate them.
The standardised format for a fully comprehensive QSM is CX_BY(P Y)_MZ(P Z), together with the tool information, where X is the parameter threshold for C, Y is the parameter threshold for B, P Y is the percentage of reads that must reach Y, Z is the parameter threshold for M, P Z is the percentage of reads that must reach Z, and 1v1 through nvn are the tool name(s) and version(s) for read, base and mapping quality score generation, and variant calling.
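To illustrate how the metric part of the format can be handled programmatically, the following minimal Python sketch parses and re-generates a string of the form CX_BY(P Y)_MZ(P Z). The names QSM, parse_qsm and format_qsm are hypothetical and are not part of CoverView; the tool name and version component of the full format is deliberately not handled here.

```python
import re
from typing import NamedTuple


class QSM(NamedTuple):
    """Thresholds encoded in a QSM string (hypothetical container)."""
    min_depth: int   # X: minimum depth of coverage (C)
    min_bq: int      # Y: base quality threshold (B)
    pct_bq: float    # P Y: percentage of reads that must reach Y
    min_mq: int      # Z: mapping quality threshold (M)
    pct_mq: float    # P Z: percentage of reads that must reach Z


# Matches only the metric part, e.g. "C50_B10(85)_M20(95)".
_QSM_RE = re.compile(
    r"^C(\d+)_B(\d+)\((\d+(?:\.\d+)?)\)_M(\d+)\((\d+(?:\.\d+)?)\)$"
)


def parse_qsm(text: str) -> QSM:
    """Parse the metric part of a QSM string into its five parameters."""
    match = _QSM_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a QSM string: {text!r}")
    c, b, pb, m, pm = match.groups()
    return QSM(int(c), int(b), float(pb), int(m), float(pm))


def format_qsm(qsm: QSM) -> str:
    """Render the parameters back into the CX_BY(P Y)_MZ(P Z) form."""
    return (
        f"C{qsm.min_depth}_B{qsm.min_bq}({qsm.pct_bq:g})"
        f"_M{qsm.min_mq}({qsm.pct_mq:g})"
    )
```

Keeping the five parameters in one structure means the same object can drive both the automated flagging and the text placed on a report.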
Metric C uses the number of sequencing reads (depth of coverage) that have mapped to the reference genome for a given base position. Metric B uses the BQ of a base call that quantifies the probability that the base calling is incorrect. Metric M uses the MQ that quantifies the probability that a read is mapped to the wrong position during alignment.
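Because BQ and MQ are Phred-scaled, each threshold corresponds to an error probability through Q = -10 log10(P). The short Python sketch below shows the conversion; the function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call (BQ) or read placement (MQ) is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)
```

Under this scale a BQ of 10 corresponds to a 1 in 10 chance that the base call is incorrect, and an MQ of 20 to a 1 in 100 chance that the read is mapped to the wrong position.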
The test region for which a QSM is generated is supplied in a BED file and the C, B, M values are in the sample's BAM file and corresponding .BAI file. These are all routinely output by NGS analysis pipelines and used by the variant caller to detect and assign confidence to variant calls.
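As a sketch of how test regions might be read in a custom script, the Python example below parses tab-separated BED lines. It assumes the standard BED convention of 0-based, half-open coordinates; the Region type, the read_bed function and the example coordinates used to exercise it are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def read_bed(lines: Iterable[str]) -> Iterator[Region]:
    """Yield one Region per data line, skipping blank and header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Fall back to a 1-based coordinate label when no name column exists.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start + 1}-{end}"
        yield Region(chrom, start, end, name)
```

Each Region covers end - start bases, which is the number of positions that must individually meet the QSM for the region to pass.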
Variant callers apply data filtering at both the read-level and the base-level. Use of a preset parameter to exclude data is called 'hard filtering'. If the variant caller performs hard filtering, these parameters should be used to define the minimum quality requirements for C, B and M. If the caller performs hard filtering at a read-level, but not at a base-level, the minimum quality requirement can be defined in a more flexible fashion.
The actual quality of NGS data generated may be well above a QSM, but any data of lower quality for any of the three metrics is automatically flagged for additional scrutiny. We set the QSM for the TSCP as C50_B10(85)_M20(95). This means that 100% of the constituent bases of the test region must minimally have a depth of coverage of ≥50 reads with a BQ of ≥10 in at least 85% of reads, and MQ of ≥20 in at least 95% of reads. The rationale for these choices is explained below.
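The pass/fail rule described above can be sketched in Python as follows. This is not the CoverView implementation: it assumes each base position is represented as a list of (BQ, MQ) pairs, one per overlapping read, that depth is the number of such pairs, and the function names are hypothetical. The defaults correspond to C50_B10(85)_M20(95).

```python
from typing import Iterable, Sequence, Tuple

# One (base_quality, mapping_quality) pair per read covering a position.
Pileup = Sequence[Tuple[int, int]]


def base_passes(pileup: Pileup, min_depth: int = 50,
                min_bq: int = 10, pct_bq: float = 85.0,
                min_mq: int = 20, pct_mq: float = 95.0) -> bool:
    """True if a single base position meets all three QSM requirements."""
    depth = len(pileup)
    if depth == 0 or depth < min_depth:
        return False
    bq_ok = sum(1 for bq, _ in pileup if bq >= min_bq)
    mq_ok = sum(1 for _, mq in pileup if mq >= min_mq)
    # Compare as products to avoid division and rounding at the boundary.
    return bq_ok * 100 >= pct_bq * depth and mq_ok * 100 >= pct_mq * depth


def region_passes(pileups: Iterable[Pileup], **thresholds) -> bool:
    """A region meets the QSM only if 100% of its bases pass."""
    return all(base_passes(p, **thresholds) for p in pileups)
```

A region with a single failing position is therefore flagged, regardless of how far above the thresholds the remaining positions are.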

Setting a QSM
For the QSM depth of coverage we selected C50, i.e. ≥50 reads per base. The average depth of coverage achieved for the TSCP pipeline in TGLclinical is >1000x. We consider a base position that fails to achieve 5% of the average depth of coverage (i.e. <50 reads) as suboptimal and requiring further evaluation. For many pipelines with much lower average depth of coverage, C50 would likely be too high, and would lead to an intolerably large number of additional evaluations.
For the QSM base quality minimum we selected B10(85), i.e. a BQ of ≥10 in at least 85% of reads. Platypus v.0.2.4, the variant caller used in TGLclinical, performs hard filtering at the read-level, discarding reads that have fewer than 20 bases with a quality score of BQ ≥20. Platypus uses retained reads to call variants, without hard filtering at the base-level, such that bases with BQ<20 in retained reads are used for calling variants.
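The read-level filter described above can be illustrated with a short Python sketch. This is a simplified model of the behaviour as described in the text, not Platypus source code, and the function name and defaults are assumptions made for illustration.

```python
from typing import Sequence


def read_is_retained(base_qualities: Sequence[int],
                     min_good_bases: int = 20, min_bq: int = 20) -> bool:
    """Keep a read only if enough of its bases reach the BQ threshold."""
    good = sum(1 for q in base_qualities if q >= min_bq)
    return good >= min_good_bases
```

For example, a 100-base read with 20 bases at BQ 30 and 80 bases at BQ 5 would be retained under this model, and its 80 low quality bases would still contribute to variant calling, which is why a base-level minimum is set separately in the QSM.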
Bases of low quality tend to occur at the ends of reads 16 , but to mitigate the impact of this, the TSCP probes are densely spaced giving read overlap. This means that a base can have a proportion of low (BQ<20) quality bases yet still have sufficient high quality bases for variant calling. To accommodate this we set the BQ minimum as 10 in at least 85% of reads.
For the QSM mapping quality minimum we selected M20(95), i.e. an MQ of ≥20 in at least 95% of reads. Platypus uses a mapping quality threshold of MQ ≥20 to perform hard filtering at the read-level. MQ is only generated per read so this hard filter also applies at the base-level. MQ is a reflection of the sequence context of the region. In a region with unique sequence context (i.e. not of low complexity or high homology with another part of the genome), only a small proportion of reads
We also reviewed the Sanger data for any additional variants that were not detected by TSCP and three variants, all in exon 14, were observed. Finally we generated TSCP data in five samples with PMS2 variants, in exons 3, 4, 10 and 11, which had been detected by another method. All five were detected by the TSCP pipeline. Taken together these data show that TSCP performance for PMS2 exons 1-11 is excellent, and fulfils the QSM. However, TSCP data for PMS2 exons homologous to PMS2CL can lead to false negative and false positive results, and do not routinely fulfil the QSM. Therefore, if PMS2 gene testing is requested we first perform TSCP, and if negative we perform Sanger sequencing with long-range PCR primers that avoid PMS2CL.

Using a QSM on a gene test report
Detailed information about how the QSM was set and used to optimise testing would be provided in a laboratory's full documentation and accreditation information. However, we believe the QSM also has potential utility at an individual report level. The QSM can succinctly provide consistent, transparent, comprehensive information about the performance of an individual test. This can be included on the individual test report and could be provided in different ways as shown in Table 1. Our personal preference is to include the short QSM statement 'This test met QSM C50_B10(85)_M20(95)'. For tests in which some regions have not met QSM and were not repeated, for example because a pathogenic mutation was found in another gene, we would include a statement such as 'This test met QSM C50_B10(85)_M20(95) {except PMS2 exons 12-15}'. This provides clarity about the genes that have been fully or suboptimally tested.
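A report statement of this kind is straightforward to generate automatically. The Python sketch below follows the wording of the two example statements given above; the function name and argument layout are hypothetical.

```python
from typing import Sequence


def qsm_statement(qsm: str, failed_regions: Sequence[str] = ()) -> str:
    """Build the short QSM statement for an individual test report."""
    if not failed_regions:
        return f"This test met QSM {qsm}"
    listed = ", ".join(failed_regions)
    # Doubled braces produce the literal { } around the exceptions.
    return f"This test met QSM {qsm} {{except {listed}}}"
```

Calling qsm_statement("C50_B10(85)_M20(95)", ["PMS2 exons 12-15"]) reproduces the second example statement.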

Conclusion
Quality assurance and quality control are essential requirements for genetic testing. It has proved challenging for laboratories to communicate how they are fulfilling these requirements for NGS tests. We have developed the Quality Sequencing Minimum (QSM), to achieve this. The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage, base quality and mapping quality. The QSM is easy to generate and can be flexibly applied per base, exon, gene, or other genomic region, as best suits the laboratory. The QSM is very useful in the optimisation and quality control of NGS pipelines, allowing consistent, automated flagging of test regions that fall below the designated minimum quality requirements. The QSM can also be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. Widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories.

Data availability
1. Page 3. The paragraph beginning with "Variant callers apply data filtering at both...." Additional language would be useful in describing how this influences the QSM. Variation in filtering can affect this score, suggesting that some provision may be needed for describing how filtering is applied.
2. Page 3. The paragraph beginning with "The actual quality of NGS data generated may be well above......" Related to this is the paragraph on Page 4 beginning with "Detailed review of regions that were QSM-flagged........." It is not clear when lower quality findings should be tagged and, if such findings occur, whether the authors are implying that the QSM score does not apply in these cases. Additional discussion of when the QSM is and is not valid, and of what additional criteria should be reported to support these findings, should be provided in a general sense. If all lower scores require further evaluation, this seems to defeat the intended simplicity of this measure. Does tagging low quality reads sometimes indicate that the parameters were set too high?
3. Page 4. The paragraph beginning with "For the QSM depth of coverage, we selected C50...." Selection of coverage at C50 seems to be somewhat arbitrary. It would be helpful if additional evidence or published studies could be cited.
4. Page 4. The paragraphs beginning with "For the QSM base quality minimum we selected B10(85)." and "For the QSM mapping quality minimum we selected M20(95)..." The values chosen are rationalized by considering the filtering and analysis that Platypus provides. There should be discussion of how this may vary with other methods. This is somewhat addressed in Table 1 by the proposed format that includes the software used. Additional discussion within the text would be helpful. In reporting, one may also need to describe the application of the QSM in conjunction with the types of variants targeted (e.g., SNPs, indels, repeats), each of which may have different requirements. In addition, allelic fraction may be important for cancer and mosaic samples. This should be recognized within the text as a potential confounder.

Is the description of the method technically sound? Partly
Are sufficient details provided to allow replication of the method development and its use by