The VCF file format comes with many quality assurance and statistics fields that can be used for filtering in VarSeq. Open your files in a text editor to see all the fields they contain. Each field has a header line with a description of its content. See the VCF Specifications for help interpreting this information.
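If you would rather not scroll through the header by hand, the field descriptions can also be pulled out with a few lines of Python. This is only a minimal sketch: the names `HEADER_RE` and `list_fields` and the example header lines are made up for illustration, and the pattern assumes simple INFO/FORMAT header lines without escaped quotes in the description.

```python
import re

# Matches header lines such as:
# ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">
HEADER_RE = re.compile(r'^##(INFO|FORMAT)=<ID=([^,]+),.*Description="([^"]*)"')

def list_fields(lines):
    """Return (kind, field_id, description) for each INFO/FORMAT header line."""
    fields = []
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM column header
        match = HEADER_RE.match(line)
        if match:
            fields.append(match.groups())
    return fields

# Hypothetical header lines, for illustration only.
example_header = [
    '##fileformat=VCFv4.2',
    '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1',
]

for kind, field_id, description in list_fields(example_header):
    print(kind, field_id, description)
```

To run it on a real file, pass an open file handle instead of the example list, for example `list_fields(open("my_sample.vcf"))`, where the file name is a placeholder.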
One of the most commonly used values for filtering variants in a somatic mutation workflow is the Alternate Allele Frequency for each sample. This field is not always provided directly in the VCF data, but don’t worry: VarSeq will automatically calculate the frequency from the allelic depth fields in the file.
Depending on the variant caller that produced your files, the allelic depth information can come from a variety of fields within the VCF file. VarSeq checks these fields in the order described below and uses the first set it finds to compute the Alternate Allele Frequency (Alt Allele Freq).
VarSeq first looks for observed counts for both the reference and alternate alleles. These values are provided in the AO and RO fields. They can also be available as Flow Evaluator observed counts in the FAO and FRO fields (Flow Evaluator fields are preferred).
Next, VarSeq looks for observed alternate allele counts together with total allelic depth. The alternate allele counts once again come from either the AO or FAO field, and the total allelic depth is found in the DP or FDP field, respectively.
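The first two lookups can be sketched roughly as follows. This is not VarSeq's actual code: the function name `freq_from_counts` is hypothetical, the frequency is assumed to be alternate count divided by reference plus alternate count (or by total depth), and the Flow Evaluator preference is assumed to apply to the second lookup as well.

```python
def freq_from_counts(fields):
    """Estimate alt allele frequency from a dict of per-sample integer counts.

    `fields` maps VCF field names (AO, RO, FAO, FRO, DP, FDP) to counts
    for a single alternate allele. Returns None if nothing usable is found.
    """
    # Lookup 1: alternate and reference observation counts.
    # Flow Evaluator fields (FAO/FRO) are tried before AO/RO.
    for alt_key, ref_key in (("FAO", "FRO"), ("AO", "RO")):
        if alt_key in fields and ref_key in fields:
            total = fields[alt_key] + fields[ref_key]
            return fields[alt_key] / total if total else None

    # Lookup 2: alternate observation count and total depth.
    # Assumption: FAO pairs with FDP, and AO pairs with DP.
    for alt_key, depth_key in (("FAO", "FDP"), ("AO", "DP")):
        if alt_key in fields and depth_key in fields:
            depth = fields[depth_key]
            return fields[alt_key] / depth if depth else None

    return None

print(freq_from_counts({"AO": 25, "RO": 75}))   # 25 alt reads out of 100
print(freq_from_counts({"FAO": 5, "FDP": 20}))  # 5 alt reads at depth 20
```

Returning `None` when the denominator is zero avoids a division error at sites with no coverage. How VarSeq itself reports such sites is not covered here.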
If none of the above fields are available, VarSeq then uses the unfiltered counts of all reads that carried the reference or alternate alleles, found in the AD field. This field is an array in which the first entry is the count for the reference allele and each following entry is the count for one alternate allele at this locus.
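As a rough illustration of the AD layout, the sketch below computes a frequency for one alternate allele. The function name `freq_from_ad` is hypothetical, and using the sum of all AD entries as the denominator is an assumption. VarSeq may handle multi-allelic sites differently.

```python
def freq_from_ad(ad, alt_index=1):
    """Estimate alt allele frequency from an AD array.

    ad[0] is the reference count; ad[1], ad[2], ... are the counts for each
    alternate allele. `alt_index` selects which alternate allele to report.
    Assumption: the denominator is the sum of all entries in the array.
    """
    total = sum(ad)
    return ad[alt_index] / total if total else None

# A site with one reference allele and two alternate alleles.
ad = [60, 30, 10]
print(freq_from_ad(ad))               # first alternate allele
print(freq_from_ad(ad, alt_index=2))  # second alternate allele
```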
As a last resort, VarSeq will look for the DP4 field which can commonly be found in VCF files prepared by SAMTools. This field has four entries in the following order: forward reference count, reverse reference count, forward alternate count and reverse alternate count.
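The DP4 entry order described above translates into a small calculation like the one below. Again, this is only a sketch: `freq_from_dp4` is a hypothetical name, and the formula (alternate reads over all four counts) is an assumption rather than a documented VarSeq formula.

```python
def freq_from_dp4(dp4):
    """Estimate alt allele frequency from a DP4 array.

    Entry order: forward reference, reverse reference,
    forward alternate, reverse alternate.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    alt = alt_fwd + alt_rev
    total = ref_fwd + ref_rev + alt
    return alt / total if total else None

# DP4 usually appears in the file as a comma-separated list, e.g. DP4=30,30,20,20
dp4 = [int(x) for x in "30,30,20,20".split(",")]
print(freq_from_dp4(dp4))  # 40 alternate reads out of 100 total
```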
If your data contains this information in a different field or format then you can compute your own alternate allele frequency using the Add > Computed Data… > Compute Fields algorithm. If you have questions about the computation or need assistance computing your own field just send us an email at firstname.lastname@example.org!