New features and improvements
- The Trim Sequences tool under the Preparing Raw Data section of the Toolbox has been renamed to Trim Reads.
- A new option has been added to the Trim Reads tool: "Automatic read-through adapter trimming". This option automatically identifies overlap in paired reads and trims the region that is not part of that overlap. It is turned on by default. This new default affects workflows that include Trim Reads (formerly named Trim Sequences): the parameter will be turned on and locked by default. This change also affects the inbuilt workflow Prepare Raw Data.
- The New Trim Adapter List dialog has been updated to a new and more user-friendly interface.
- It is now possible to reverse complement an adapter sequence with a "Reverse Complement" button to the right of the sequence field.
- It is now possible to specify whether the trim should be performed on all reads, or only on the first or second read of a pair.
- A visual shows the adapter and the sequence being trimmed in relation to the rest of the sequence depending on the option chosen when an adapter is found.
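The idea behind "Automatic read-through adapter trimming" can be illustrated with a minimal sketch. This is not the Workbench's implementation; the function names, the `min_overlap` and `max_mismatches` parameters, and the exact-match strategy are assumptions made for illustration. When the sequenced fragment is shorter than the read length, each read runs through the fragment into adapter sequence, so the start of read 1 matches the reverse complement of the start of read 2 over the fragment length, and everything beyond that overlap can be trimmed.

```python
# Illustrative sketch only; names and parameters are hypothetical.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read_through(read1, read2, min_overlap=10, max_mismatches=0):
    """Trim both reads to the overlapping fragment, if one is found.

    Candidate fragment lengths are tried from longest to shortest. For a
    candidate length n, the first n bases of read 1 are compared with the
    reverse complement of the first n bases of read 2.
    """
    longest = min(len(read1), len(read2))
    for fragment_len in range(longest, min_overlap - 1, -1):
        candidate = read1[:fragment_len]
        mate = reverse_complement(read2[:fragment_len])
        mismatches = sum(a != b for a, b in zip(candidate, mate))
        if mismatches <= max_mismatches:
            # Bases beyond the fragment are read-through and are dropped.
            return read1[:fragment_len], read2[:fragment_len]
    # No overlap found: leave the pair untouched.
    return read1, read2


fragment = "ACGTTGCAAGGCTA"
read1 = fragment + "AGATCG"                      # fragment + adapter bases
read2 = reverse_complement(fragment) + "CTGTCT"  # mate + adapter bases
trimmed1, trimmed2 = trim_read_through(read1, read2)
```

In this example both reads are trimmed back to the 14-base fragment. A production trimmer would typically also take base qualities and partial overlaps into account.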
- Paired sequence lists can now be exported to two fastq formatted files, one file containing the first member of each pair, the other containing the second member. This is now the default for Fastq Export when exporting paired data.
- The option "Output as single file" is now disabled by default.
- The introduction of the new default setting "Export paired sequence lists to two files" means that existing workflows that include a fastq export step will be in a state of conflict after they are updated for use on this release. This is because this option is not compatible with the option "Output as single file", which was turned on by default in earlier versions. Affected workflows must be edited to remove either the option "Export paired sequence lists to two files" or the option "Output as single file". Messaging about this is provided when upgrading affected workflows.
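The two-file layout produced when paired sequence lists are exported to two fastq files can be pictured with a short sketch. The function names and the in-memory record format below are hypothetical and only illustrate the layout: the first member of each pair goes to one file and the second member to the other, in the same order.

```python
import io


def format_fastq(name, sequence, quality):
    """Format one read as a four-line FASTQ record."""
    return f"@{name}\n{sequence}\n+\n{quality}\n"


def export_pairs_to_two_files(pairs, first_handle, second_handle):
    """Write first mates to one handle and second mates to another.

    `pairs` is an iterable of (first, second) tuples, where each read is a
    (name, sequence, quality) tuple. Returns the number of pairs written.
    """
    count = 0
    for first, second in pairs:
        first_handle.write(format_fastq(*first))
        second_handle.write(format_fastq(*second))
        count += 1
    return count


pairs = [
    (("pair1/1", "ACGT", "IIII"), ("pair1/2", "TTGA", "IIII")),
    (("pair2/1", "GGCA", "IIII"), ("pair2/2", "CCAT", "IIII")),
]
# StringIO stands in for two open output files.
first_file, second_file = io.StringIO(), io.StringIO()
written = export_pairs_to_two_files(pairs, first_file, second_file)
```

Because both files keep the pairs in the same order, record N in the first file and record N in the second file always belong to the same pair.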
- RPKM is now always calculated when running the RNA-Seq Analysis tool with the options "Genome annotated with genes only" and "One reference sequence per transcript".
- The default for the reference type parameter is now "Genome annotated with genes and transcripts".
- In the RNA-Seq Analysis tool, the option "Calculate RPKM for genes without transcripts" has been renamed to "Calculate expression for genes without transcripts".
- The behavior of the RNA-Seq Analysis tool has been changed when the option "Genome annotated with genes and transcripts" is used together with the option "Calculate expression for genes without transcripts".
- The counts of genes without transcripts are calculated. Previously only the TPM and RPKM were calculated.
- For a gene without a corresponding transcript, where that gene is overlapped by the intron of another gene, reads aligning to this region are counted towards the expression of the gene without the transcript. Previously such reads were counted as belonging to the intronic region of the overlapping gene.
- A single-exon transcript for each gene without transcripts is now added to the output TE track.
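For readers unfamiliar with the expression measures named in the RNA-Seq Analysis items above, the following sketch shows the commonly used definitions of RPKM and TPM. It is an illustration under stated assumptions: which length and which read total the Workbench uses are not described in these notes, and the function and parameter names are hypothetical.

```python
def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


def tpm(counts, lengths_bp):
    """Transcripts per million for a list of features.

    Each count is first divided by the feature length, and the resulting
    rates are scaled so that they sum to one million.
    """
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


# 100 reads on a 2,000 bp feature, with 1,000,000 mapped reads in total.
example_rpkm = rpkm(100, 2000, 1_000_000)
# Two features of equal length with 100 and 300 reads.
example_tpm = tpm([100, 300], [1000, 1000])
```

Unlike RPKM, TPM values always sum to one million within a sample, which makes the proportions directly comparable between samples.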
- Workbenches without a license can now be run in Viewing Mode. In this mode, data can be viewed, imported and exported. Plugins needed for viewing certain data types can be installed. Viewing mode, with its added functionalities, replaces Limited Mode.
- A dialog is now presented on startup if there are installed workflows that need to be updated before they can be run. The information about what to do when a workflow needs to be updated has been improved.
- The history of a data element can now be exported as a CSV format file.
- An option to include reads that partially overlap variants has been added to the Identify Known Mutations from Sample Mappings tool, enabling detection of variants that are longer than the reads.
- The Identify Known Mutations from Sample Mappings tool has been made slightly more strict when handling insertions and replacements, requiring reads to overlap adjacent reference positions to be counted as fully covering the variant.
- The speed of the Illumina High-Throughput Sequencing Import has been substantially improved. The largest gains are seen on paired read files compressed by gzip with speed improvements of up to 30%.
- Changed amino acid colors to better suit users with various forms of color blindness.
- In table views, it is now possible to filter columns with the filters "Is in list" and "Is not in list" when the values are numbers.
- When exporting files to SAM or BAM format files, information is now entered into the optional fields NM (edit distance) and MD (mismatch string).
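As an illustration of what the NM and MD fields contain, the sketch below derives both from a gapped pairwise alignment. It follows the definitions in the SAM specification (NM counts mismatches plus inserted and deleted bases; MD lists match run lengths, mismatched reference bases, and deleted reference bases prefixed by "^"). It is not the exporter's code, and the function name and input representation (two equal-length strings with "-" for gaps) are assumptions.

```python
def nm_and_md(ref_aligned, read_aligned):
    """Compute the NM value and MD string for one gapped alignment.

    Both arguments are aligned strings of equal length in which "-"
    marks a gap. Soft-clipped bases are assumed to be excluded already.
    """
    nm = 0
    md_parts = []
    match_run = 0
    in_deletion = False
    for ref_base, read_base in zip(ref_aligned, read_aligned):
        if ref_base == "-":
            # Insertion in the read: counted in NM, invisible in MD.
            nm += 1
            continue
        if read_base == "-":
            # Deletion from the reference: "^" followed by deleted bases.
            nm += 1
            if not in_deletion:
                md_parts.append(str(match_run))
                md_parts.append("^")
                match_run = 0
                in_deletion = True
            md_parts.append(ref_base)
            continue
        if ref_base == read_base:
            match_run += 1
        else:
            # Mismatch: emit the preceding match run, then the reference base.
            nm += 1
            md_parts.append(str(match_run))
            md_parts.append(ref_base)
            match_run = 0
        in_deletion = False
    md_parts.append(str(match_run))
    return nm, "".join(md_parts)


mismatch_example = nm_and_md("ACGTACGT", "ACGAACGT")   # one mismatch
deletion_example = nm_and_md("ACGTACGT", "ACG--CGT")   # two deleted bases
insertion_example = nm_and_md("ACG-TAC", "ACGGTAC")    # one inserted base
```

Here the mismatch gives NM 1 with MD "3T4", the deletion gives NM 2 with MD "3^TA3", and the insertion gives NM 1 with MD "6", since inserted bases do not appear in MD.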
- The filter terms for the Identify Candidate Variants tool now include the numeric operators '>=', '<=', 'abs value >=' and 'abs value <='.
- Importing a GO annotation file with the Standard Import tool, specifying the format "Generic annotation file for expression data", now fails with an informative warning if any of the GO annotations are truncated.
- Warnings are now reported if truncated GO annotations are found when opening data created by the Create Expression Browser tool.
- The 'Expression Browser Table' (output from the Create Expression Browser tool) now preserves sorting when changing the grouping, if sorting is not on any of the grouped columns.
- All wizard steps are now shown in the wizard sidebar when starting a tool or workflow.
- Visualization of features that wrap around the origin of circular sequences has been improved for sequences and tracks.
- Table filtering and search now interpret thousands and decimal separators in the same manner as the displayed table. Previously US punctuation was always used. This change means that if a table displays numbers in the form "123.456,7" then it is possible to find numbers less than ten by searching for "< 10,0" or "<10", but not "<10.0". If the table displays numbers in the form "123,456.7" then it is possible to find numbers less than ten by searching for "<10.0" and "<10", but not "<10,0".
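The separator handling described for table filtering can be sketched as follows. This is a simplified illustration, not the Workbench's parser: the function name is hypothetical, and silently dropping the thousands separator is an assumption about how a mismatched search term would be read.

```python
def parse_displayed_number(text, decimal_sep, thousands_sep):
    """Parse a number written with the table's own separators.

    The thousands separator is removed and the decimal separator is
    replaced by "." before conversion.
    """
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


# Table displaying numbers in the form "123.456,7".
comma_decimal = parse_displayed_number("123.456,7", ",", ".")
ten_comma = parse_displayed_number("10,0", ",", ".")
# "10.0" in the same table: the "." is taken as a thousands separator.
not_ten = parse_displayed_number("10.0", ",", ".")

# Table displaying numbers in the form "123,456.7".
point_decimal = parse_displayed_number("123,456.7", ".", ",")
```

In this sketch "10.0" typed into a table that displays "123.456,7" is read as 100 rather than 10, which shows why a search for "<10.0" does not find numbers less than ten in such a table.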
- When a tool is disabled in a right-click context menu, hovering the mouse over the tool name will now reveal why a tool was disabled in most cases.
- The help window can now be closed by pressing the escape key.
- HTML formatting tags are now removed during export of data to Excel .xlsx or .xls format. This change does not affect the export of hyperlinks.
- The history information for data generated using the Identify Candidate Variants tool now includes a match criteria field, recording if the option 'match all' or 'match any' was used.
- For Reads tracks, the side panel option "Highlight reverse paired reads" is now enabled by default.
- Parameters for the Trim Sequences tool are now shown in the same order when running the tool from the Toolbox or within a workflow.
- The column headings in the table containing statistics for each mapping, optionally produced by the QC for Read Mapping tool, have been made more descriptive.
- The Search for Reads in SRA tool now reports in the top left corner the number of rows being displayed.
- Communication of error messages from the NCBI when running the Search for Reads in SRA tool has been improved.
- Map Reads to Reference now outputs an empty read mapping and report when the input contains 0 reads.
- A warning message is now presented when the tool Extract Sequences is run with the "Extract to single sequences" option selected and more than 100 sequences would result.
- The Roche 454 and SOLiD Import tools have been moved to the Legacy folder of the Workbench Toolbox.
- The option "Search on both strands" has been removed in the Trim Reads tool (formerly named Trim Sequences) and the Extract and Count tool.
- The Create Mapping Graph tool has been modified so that the coverage of overlapping paired end reads is now only counted as one in the overlapping region, instead of two as done previously.
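The coverage change for overlapping paired end reads can be illustrated with a small sketch. It is not the tool's implementation: the function name and the representation of reads as half-open (start, end) intervals on the reference are assumptions. Each pair contributes at most one to the coverage of any position, even where both mates cover it.

```python
def pair_coverage(reference_length, pairs):
    """Per-position coverage where each pair counts once per position.

    `pairs` is a list of ((start1, end1), (start2, end2)) tuples using
    0-based, half-open coordinates on the reference.
    """
    coverage = [0] * reference_length
    for (start1, end1), (start2, end2) in pairs:
        # The union of both mates, so the overlap is not counted twice.
        covered = set(range(start1, end1)) | set(range(start2, end2))
        for position in covered:
            coverage[position] += 1
    return coverage


# Mates covering 0-6 and 4-10 overlap at positions 4 and 5.
overlapping = pair_coverage(10, [((0, 6), (4, 10))])
```

With the previous behavior, positions 4 and 5 in this example would have had a coverage of two; counting the pair once gives a coverage of one at every position.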
- Removed the line "Total consensus length" from Detailed Mapping Report when using a Read Mapping Track as input, as these tracks do not contain consensus information.
- Clicking "Select genes in other views" in a Volcano Plot with an empty selection no longer gives an error message.
- The SAM and BAM Mapping Files importer now fails if there are reads with more than one primary alignment where both are marked as being the first in a pair or both are marked as being second in a pair.
- Scrolling in a table now scrolls a fixed number of pixels, and not a fixed number of rows or columns.
- The "Adapter trimming" section of the Workbench Preferences has been removed. This section supported functionality that was already retired.
- The "Help" and "Reset" buttons in pop-up dialogs are now buttons with text labels. They were previously buttons with icons.
- The underlying read mapper and de novo binaries included in the Biomedical Genomics Workbench 5.0 are from QIAGEN CLC Assembly Cell 5.0.5.
- Fixed an issue with the QC for Target Sequencing tool where "GC %" was reported as a ratio. It is now reported as a percentage.
- Fixed an issue where paired distances were calculated incorrectly for paired reads in Forward-Reverse orientation where there is adapter read-through. Paired distances can be seen in the report from the Map Reads to Reference tool and the RNA-Seq Analysis tool. The paired distance calculation is also used by the "auto-detect paired distances" option in these tools, although this issue is unlikely to affect the inferred distances.
- Fixed an issue with the Add Information about Amino Acid Changes tool when used with a circular sequence with a CDS annotation placed across the origin. Variants outside such a wrapped annotation could previously be incorrectly annotated with coding region changes.
- Fixed an issue with the Add Information about Amino Acid Changes tool when used with a circular sequence with an intron across the origin. Previously, nearby variants were not annotated with coding region changes. Now, variants in such introns that are within 2 nucleotides of the nearest exon will be annotated with coding region changes, if such changes are identified.
- Fixed a bug in the Add Information about Amino Acid Changes tool where the CDS reference was used instead of the RNA reference when annotating coding region changes if the RNA and CDS annotations could not be matched. This could result in variants in UTR regions not being reported. The matching has been improved by supporting the 'parent' field used by the GFF3 file format to pair CDS and RNA references.
- Fixed a bug in the RNA-Seq Analysis tool where, when run in "Genes and transcripts" mode, and using "Total counts" as Expression value, the expression values reported for GE tracks would not include shared exon counts. Downstream analyses based on the Set Up Experiment tool could be affected by this issue. Using affected GE tracks as input to the following tools would *not* affect their results: Differential Expression for RNA-Seq, Create Heat Map for RNA-Seq and PCA for RNA-Seq.
- Fixed an issue where the option to run the Differential Expression for RNA-Seq tool in batch mode was made available, leading to an error if it was selected.
- Fixed an issue where it was possible to start the Create Heat Map for RNA-Seq tool with invalid parameters that would cause the tool to fail.
- Fixed an issue where the number of input samples to the Map Reads to Reference and Map Reads to Contigs tools would be silently limited to 120. The execution is now aborted with a warning message. Each analysis must be started with at most 120 samples.
- Fixed an issue with the mapping tool in the Workbench, which is used in tools involving a mapping stage, such as Map Reads to Reference, Map Reads to Contigs and RNA-Seq Analysis, where length and similarity fraction cut-offs were in some cases ignored for reads longer than 500 bp.
- Fixed an issue with the InDels and Structural Variants tool that caused it to crash if it encountered a particular set of conditions relating to reads with deletions.
- Fixed an issue with the InDels and Structural Variants tool where duplicate breakpoints and variants were reported if reads mapping as broken pairs were included in the analysis.
- Fixed an issue where filtering a log for a job that was still running would result in error dialogs.
- Fixed an issue that had previously prevented configuration of the export option "Output as single file" in workflows.
- Fixed an issue where data exported with gzip or zip compression did not have the .gz or .zip suffix appended to the filename when earlier exports had been made with the same name and export location specified.
- An issue has been fixed so that it is now possible to export in BAM format reads that contain synonyms, for instance 'X' as synonym for 'N'.
- Fixed a bug that caused the fasta exporter to fail when exporting read mappings where one or more reference sequences have no reads mapped to them.
- Fixed an issue that could cause exports of reports with line graphs to fail.
- Fixed an issue where resetting the default parameter values when configuring the Identify Candidate Variants tool did not work.
- Fixed an issue that would prevent the Trim Sequences tool being run with certain length filter settings.
- Fixed an issue where the option to "Highlight reverse paired reads" in the side panel of a reads track would cause paired end reads to be colored incorrectly if the reads completely overlapped, as would happen in the case of adapter read-through.
- Fixed a bug where a cell containing multiple hyperlinked URLs caused export to Excel 2010 or Excel 97-2007 format to fail. Such cell contents are now written in plain text.
- Fixed an issue where the Low Frequency Variant Detection tool could return NaN for the Probability value in rare instances for small datasets.
- Improved performance for several tools when handling genomes with many chromosomes. Examples include Add Fold Changes, Add Information from Overlapping Variants and the BED Exporter.
- Licenses for commercial modules are no longer required to install a module on a Workbench nor to view data generated by tools of a commercial module.
- The flexibility associated with network module licenses has been improved. Workbench module licenses provided via a CLC License Server are now initially loaded only when a tool provided by that module is launched. Such licenses are returned when 4 hours have elapsed since the last module tool was launched from that Workbench.
- A new plugin, QIAseq Targeted Panel Analysis 1.0, unifies the three QIAseq Targeted Panel plugins that were previously available: QIAseq DNA V3 Panel Analysis, QIAseq Targeted RNA Panel Analysis and QIAseq Targeted RNAscan Panel Analysis. The new plugin covers Targeted DNA for variant calling, Targeted RNA for differential expression and Targeted RNAscan for fusion gene detection with improvements resulting in more accurate variant calling and fusion gene detection.
- SOLiD colorspace data support, including import, will be retired and will not be available in the next major release of the software.
- Roche 454 NGS import has been moved to the Legacy Tools folder and will be removed in a future release, but will still be available in the next major release of the software.
- The "Prepare Overlapping Raw Data" workflow will be retired, and thus not distributed with the next major release of the software.
If you are concerned about the proposed changes, please contact our Support team (AdvancedGenomicsSupport@qiagen.com).