New features and improvements
Template workflows
Template workflows are available from the Toolbox. These can be run as they are, or can be copied and customized for your needs. The template workflows provided are:
The new Sequence Lists folder under Toolbox | Utility Tools contains tools for working with sequence lists. This includes existing tools, with new names and expanded functionality, as well as new tools:
- Split Sequence List (new tool): Splits nucleotide or peptide sequence lists. The output can be a specified number of lists, lists containing a specified number of sequences, or lists containing sequences with particular attribute values, such as terms in the description.
- Update Sequence Attributes in Lists (new tool): Updates and adds information about the sequences in a list. For example, descriptions can be updated, or new information types can be added based on information provided in an Excel file.
- Create Sequence List (existing tool): Creates new sequence lists from sequence elements and/or sequence list elements. Previously available only from the File | New menu.
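Split Sequence List is run from the Toolbox, but the three ways of dividing a list described above can be sketched in Python as follows. This is a minimal illustration, not the Workbench's implementation: sequences are stand-ins (any Python objects), and the assumption that a list is split into contiguous, near-equal chunks is ours.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def split_into_n(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous lists whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size, extra = divmod(len(items), n)
    lists, start = [], 0
    for i in range(n):
        # The first `extra` lists each take one additional item.
        end = start + size + (1 if i < extra else 0)
        lists.append(items[start:end])
        start = end
    return lists


def split_by_size(items: List[T], k: int) -> List[List[T]]:
    """Split items into lists of at most k items; the last list may be shorter."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [items[i:i + k] for i in range(0, len(items), k)]


def split_by_attribute(items: List[T], key: Callable[[T], str]) -> Dict[str, List[T]]:
    """Group items by an attribute value, keeping input order within each group."""
    groups: Dict[str, List[T]] = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


# Usage: hypothetical sequences represented as (name, description) pairs,
# split according to whether the description contains the term "kinase".
seqs = [("s1", "kinase A"), ("s2", "transporter"), ("s3", "kinase B")]
by_term = split_by_attribute(seqs, lambda s: "kinase" if "kinase" in s[1] else "other")
# by_term == {"kinase": [("s1", "kinase A"), ("s3", "kinase B")], "other": [("s2", "transporter")]}
```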
Access to Amazon S3 and BaseSpace
Other new functionality
- MGI/BGI importer: An importer for MGI/BGI fastq format files.
- Rename Sequences in Lists: Rename sequences within sequence lists by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- Rename Elements: Rename elements by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- A Heat map graphics exporter has been introduced for exporting heat maps to graphics file formats.
- Files containing tab separated values (.tsv) can be imported as tables using Standard Import.
- Export VDJ tools: Tools for exporting T-cell VDJ repertoires in txt format.
- New top-level menus:
  - Connections: For tools and functionality relevant to connections to other systems, such as a CLC Genomics Server.
  - Utilities: For general tools and functionality, such as search tools, the Plugin Manager, the Workflow Manager and the Reference Data Manager.
- Improvements to the contents and order of tools in other top-level menus
- The Favorites tab, where favorite and frequently used tools are listed for easy access, is now available in the Launch dialog and in the workflow Add Elements dialog, in addition to the Toolbox area at the bottom left of the Workbench.
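The renaming operations offered by Rename Elements and Rename Sequences in Lists (adding characters, removing characters, and replacing parts of names, optionally with a regular expression) can be sketched with Python's standard `re` module. The helper names `add_affix`, `remove_chars` and `replace_part` are hypothetical and do not correspond to any Workbench API.

```python
import re


def add_affix(name: str, prefix: str = "", suffix: str = "") -> str:
    """Add characters to the start and/or end of a name."""
    return f"{prefix}{name}{suffix}"


def remove_chars(name: str, from_start: int = 0, from_end: int = 0) -> str:
    """Remove a number of characters from the start and/or end of a name."""
    end = max(len(name) - from_end, from_start)  # never slice past the start
    return name[from_start:end]


def replace_part(name: str, old: str, new: str, regex: bool = False) -> str:
    """Replace part of a name, either literally or as a regular expression."""
    if regex:
        return re.sub(old, new, name)
    return name.replace(old, new)


names = ["sample_01_R1", "sample_02_R1"]
renamed = [replace_part(n, r"_R(\d)$", r"_read\1", regex=True) for n in names]
# renamed == ["sample_01_read1", "sample_02_read1"]
```

In the regular expression replacement, `\1` refers to the first captured group, so the digit after `_R` is carried over into the new name.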
RNA-Seq and Expression Analysis improvements
Demultiplex Reads
- Demultiplex Reads now supports setting barcodes from table elements in addition to importing barcodes from local files.
- The barcode import table format has been extended to support additional columns.
- Barcode columns can be sorted.
- When running a workflow that contains a Demultiplex Reads element, the workflow wizard can calculate a preview and remove barcodes, similar to what is seen when running Demultiplex Reads directly from the Toolbox.
- When multiple elements are provided as input, the Preview pane includes information obtained from all of them. Previously, only the first input element was used.
BLAST improvements
- BLAST has been upgraded to BLAST+ 2.12.0 that includes a number of improvements and bug fixes. A full list of BLAST+ 2.12.0 changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- The list of databases available using BLAST at NCBI has been expanded, including the addition of ‘16S ribosomal RNA sequences (Bacteria and Archaea)’ and ‘28S ribosomal RNA sequences from Fungi type and reference material (LSU)’.
- When BLAST at NCBI is used with multiple query sequences, the job continues even if the searches for individual sequences fail. Results for successful searches (including those with no hits) are returned, and sequences missing from the results because their searches failed are recorded in the job log.
- Searches against the Patented protein sequences database using BLAST at NCBI work again. Previously, these searches always failed, and the dialog reported only that no hits were found, even though NCBI had returned an error. For affected searches, the error was recorded in the job log.
- Fixed an issue affecting BLAST HSP Tables where the percent overlap between BLAST hits in the reverse direction and the query sequence was calculated using a sequence length that was 2 base pairs too short, leading to incorrect values.
- Improvements have been made to make it less likely that a “CPU usage limit was exceeded” error will be returned when running blastp, blastx, tblastn or tblastx using BLAST at NCBI.
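The per-query behavior of BLAST at NCBI with multiple query sequences can be illustrated with a short Python sketch: each query is searched independently, an empty hit list counts as a successful search, and a failure is logged rather than ending the job. The `search` callable and `fake_search` are hypothetical placeholders, not a real NCBI client or Workbench API.

```python
import logging
from typing import Callable, Dict, List, Sequence, Tuple

logger = logging.getLogger("blast_job")


def run_queries(
    queries: Sequence[str],
    search: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Search each query independently so one failure does not stop the job."""
    results: Dict[str, List[str]] = {}
    failed: List[str] = []
    for query in queries:
        try:
            # An empty hit list ("no hits") still counts as a successful search.
            results[query] = search(query)
        except Exception as exc:  # a failed search is logged, not propagated
            failed.append(query)
            logger.warning("Search for %s failed: %s", query, exc)
    return results, failed


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a remote search."""
    if query == "seq2":
        raise RuntimeError("simulated NCBI error")
    return [] if query == "seq3" else ["hit_for_" + query]


results, failed = run_queries(["seq1", "seq2", "seq3"], fake_search)
# results == {"seq1": ["hit_for_seq1"], "seq3": []}; failed == ["seq2"]
```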
Importer and exporter improvements
- Multiple tables can be exported to a single file when using the following exporters: Tab delimited text, Annotation tab delimited text, Table CSV, Annotation CSV.
- A new custom reads option has been added to the Illumina importer. These extended options for fastq file import support 10X data. For example, it is now possible to import three fastq files with R1, R2 and I1 as paired reads, where I1 is added in front of R1.
- When exporting variant tracks to VCF format, variants that fall below the export thresholds can now optionally be excluded entirely from the resulting VCF file.
- When using the VCF export setting for complex variant representation “Reference overlap and depth estimate”, complex overlapping reference alleles are now exported with a homozygous reference genotype.
- The list of supported GVF attributes in column 9 has been expanded when importing GVF files using the GFF2/GTF/GVF track importer.
- 1000 Genomes annotations are now better supported by the GFF2/GTF/GVF track importer.
- The Zygosity field is now included when exporting to GVF format.
- A subset of columns to export can be specified when exporting Mapping Coverage data.
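Adding I1 in front of R1 amounts to concatenating both the bases and the quality strings of matching records. The Python sketch below illustrates this for well-formed, four-line FASTQ records whose read names match across the three files; the record layout and the names `read_fastq` and `pair_with_index` are assumptions for illustration, not the Illumina importer's implementation.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse four-line FASTQ records; the name is the first token after '@'."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield FastqRecord(header.rstrip("\n")[1:].split()[0], seq, qual)


def pair_with_index(r1, r2, i1) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield (I1+R1, R2) pairs, prepending the index read to read 1."""
    for a, b, idx in zip(r1, r2, i1):
        if not (a.name == b.name == idx.name):
            raise ValueError(f"Records out of sync: {a.name}, {b.name}, {idx.name}")
        yield FastqRecord(a.name, idx.seq + a.seq, idx.qual + a.qual), b


# Usage with in-memory lines; open file handles work the same way.
r1 = ["@r1 1:N:0\n", "ACGT\n", "+\n", "IIII\n"]
r2 = ["@r1 2:N:0\n", "TTTT\n", "+\n", "JJJJ\n"]
i1 = ["@r1 1:N:0\n", "GG\n", "+\n", "FF\n"]
pairs = list(pair_with_index(read_fastq(r1), read_fastq(r2), read_fastq(i1)))
# pairs[0][0] == FastqRecord("r1", "GGACGT", "FFIIII")
```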
Other improvements
- Copy Number Variant Detection (CNVs) can use coverage tables generated by QC for Targeted Sequencing as control mappings. Read mappings can still be used as control mappings.
- Copy Number Variant Detection (CNVs) allows different fold-change thresholds for deletions and amplifications.
- When working with paired reads, Trim Reads allows a fixed number of bases to be trimmed from read 1 only, read 2 only, or both reads of each pair.
- An option has been added to Extract Reads or Create Reads Track from Selection to allow just one member of a pair to be extracted when only one meets the extraction criteria.
- Extract Reads accepts stand-alone read mappings as input, in addition to reads tracks.
- Create Sample Report can take both the Graphical and the Supplementary Report created by QC for Sequencing Reads as input.
- An option has been added to Amino Acid Changes for using one letter amino acid codes in HGVS annotations.
- Filter on Custom Criteria now accepts expression tracks as input.
- In Quantify miRNA the option to select strand-specific analysis has been removed. The analysis is now always strand-specific.
- Remove Duplicate Mapped Reads now determines whether reads are duplicates based on their start position only, instead of both their start and end positions. This allows reads that have undergone quality trimming to be recognized as duplicates.
- The distance to consider around an intron-exon boundary when using Predict Splice Site Effect can be specified. Previously a length of 2 was always used.
- A choice of extinction coefficients has been introduced in Create Sequence Statistics.
- Create Mapping Graph can now generate graphs for forward read coverage and reverse read coverage.
- The Sample Reads tool is now named Subsample Sequence List and is located under the Utility Tools | Sequence Lists subfolder of the Toolbox. Peptide sequence lists are now accepted by this tool, in addition to nucleotide sequence lists. Existing workflows containing a Sample Reads element can be upgraded as normal. The element in the workflow will remain named “Sample Reads”, as seen in the Workflow Editor.
- In the tooltip displayed when moving the mouse cursor over reads in a fully zoomed-in view of a read mapping track, the number of reads supporting a deletion is displayed, in addition to the number of reads supporting particular base calls.
- When viewing data, tabs within the same tab area can be re-ordered by drag and drop.
- When a Track List and the tracks it refers to are copied in a single operation, the new copy of the Track List will refer to the new copies of the tracks. Previously, the new Track List continued to refer to the original tracks.
- For workflows that import paired reads as part of the workflow run, and that are launched in batch mode or contain Iterate elements, paired read handling is now the same as in the relevant NGS importer tools (Illumina, Fasta, Sanger), irrespective of how batch units are defined or organized. Previously, when batch units were based on data organization and all files were in the same folder, each file was treated as a separate batch unit, even if the Paired option was checked.
- In a workflow, Extract Annotated Regions (formerly Extract Annotations) can be connected to many more downstream tools than before.
- A Create Sequence List workflow element is available, replacing the New | Sequence List element. Create Sequence List can be connected to many more tools downstream than the earlier element.
- Memory usage when launching workflows in batch mode has been improved.
- Trim Sequences specifies which version of the UniVec database was used, both in the report and in the history of the trimmed sequences output.
- When the option to create a log is enabled when launching analyses in batch mode, a log file is created for each batch unit, as well as a combined log for all the analyses. Previously, only the combined log was generated.
- The table search criteria “is in list” and “is not in list” can be used with integers without including thousands separators in the search term.
- The few tools that directly manipulate input elements, instead of generating a new element containing the changes as output, now generate a new element as output when used within a workflow. This allows them to be handled like any other tool in a workflow context.
- In addition to sequence elements, Add attB Sites accepts sequence lists with fewer than 10,000 sequences as input.
- Internal compression of CLC data has been improved. Elements created with this version of the software, with compression enabled, can be opened in version 21.0.5 and higher. Data must be exported or saved as uncompressed if sharing data with earlier versions of the software.
- Various minor improvements
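The change to Remove Duplicate Mapped Reads can be illustrated by comparing two duplicate keys: start and end position versus start position only. This is a simplified sketch for single-end reads, not the tool's algorithm. It assumes that the start of a reverse-strand read is its 5' end (its rightmost mapped position) and simply keeps the first read seen for each key, whereas the actual tool may choose which read to keep differently and also handles paired reads.

```python
from typing import Callable, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    start: int  # leftmost mapped reference position
    end: int  # rightmost mapped reference position (inclusive)
    reverse: bool  # True if the read maps to the reverse strand


def five_prime(read: Read) -> int:
    # Assumption: the "start" of a reverse-strand read is its 5' end,
    # i.e. its rightmost mapped position.
    return read.end if read.reverse else read.start


def key_start_and_end(read: Read) -> Tuple:
    return (read.reverse, read.start, read.end)


def key_start_only(read: Read) -> Tuple:
    return (read.reverse, five_prime(read))


def remove_duplicates(reads: List[Read], key: Callable[[Read], Tuple]) -> List[Read]:
    """Keep the first read seen for each key; later reads with that key are duplicates."""
    seen = set()
    kept = []
    for read in reads:
        k = key(read)
        if k not in seen:
            seen.add(k)
            kept.append(read)
    return kept


reads = [
    Read("a", 100, 249, False),
    Read("b", 100, 230, False),  # same start as "a", but quality-trimmed at the 3' end
]
old = remove_duplicates(reads, key_start_and_end)  # keeps both reads
new = remove_duplicates(reads, key_start_only)  # keeps only "a"
```

With the start-and-end key, read b survives because trimming shortened it; with the start-only key it is recognized as a duplicate of read a.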
Bug fixes
- Fixed an issue in Create Box Plot where percentiles reported in the history of a box plot element were off by one. For example, the “25%-ile” value was given the 24th percentile value. The correct values were used in the plots themselves.
- Fixed an issue in Demultiplex Reads where dual barcodes were not allowed to have mismatches in both barcodes.
- Fixed an issue in Demultiplex Reads where dual barcodes could previously be selected in random combinations. Dual barcodes are now handled in pairs.
- When using the “Genome annotated with genes only” option in RNA-Seq Analysis, the range of annotation track types that can be used has been expanded. This includes the use of CDS annotation tracks, among others.
- Fixed an issue in Create Sample Report where, when QC thresholds had been specified for Trim Reads, wrong values from the Trim Reads report were shown in table 1.1 Quality Control of the sample report.
- Fixed an issue that caused Create Sample Report to fail when input reports did not contain values for specified QC thresholds.
- Fixed an issue in Combine Reports and Create Sample Report where the “Mean coverage per target” section would report coverages 10x too high when including a report from QC for Targeted Sequencing.
- Fixed an issue in VCF export where, in rare cases, variants below a specified minimum allele fraction threshold were not removed.
- Fixed an issue affecting Local Realignment where large indels upstream of a target region were sometimes not used when provided as guidance variants.
- Fixed an issue that in rare cases could cause Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection to fail on very high coverage samples when the “Remove pyro-errors variants” option was enabled.
- Fixed an issue where Remove Duplicate Mapped Reads did not always de-duplicate paired reads with read-through correctly.
- Fixed an issue where Remove Duplicate Mapped Reads did not always de-duplicate reverse mapping single-end reads correctly.
- Fixed an issue affecting QC for Targeted Sequencing, where it failed with an error when an RNA-Seq read mapping containing paired reads was provided as input.
- Fixed an issue in Filter on Custom Criteria where numeric annotations were sometimes not allowed to be filtered using numerical operators such as “<”, “>” and “=”.
- Fixed an issue in Trio Analysis where, in rare cases, inconsistent zygosity between mother and father could lead to a wrong annotation of inheritance. Trio Analysis now reports inheritance as ‘Inconsistent zygosity’ if the zygosity or the number of alleles is inconsistent among the child, mother and father.
- Fixed an issue with VCF files exported from the CLC Genomics Workbench, where fusions that had one breakpoint in common were represented in a way that prevented QIAGEN Clinical Insight Interpret from displaying the counts.
- Fixed an issue causing Quantify miRNA to fail when there were empty entries in the Accession column of miRBase.
- Fixed an issue causing BLAST hits with an identity below 40% to be shown in black even if the threshold for coloring was set lower than this.
- Fixed an issue where threshold values for color selectors in the side panel of the View Area could not be adjusted.
- Fixed an issue where specifying the color range values for heat maps in the side panel settings did not work.
- Fixed an issue where the names of outputs from Output elements attached directly to an Iterate element in workflows were not as intended when the metadata ({3}) placeholder was used. We generally recommend specifying the input number(s) to include in output names when configuring workflows that contain control flow elements.
- An element’s position within a folder in the Navigation Area can be controlled when copy/pasting, with the pasted element appearing above a selected element in the same folder. This fixes an issue introduced in CLC Genomics Workbench 12.0, where pasted elements were always placed at the bottom of the list in a folder when pasting.
- Fixed an issue where the content of the recycle bin was not shown correctly after the recycle bin had been emptied.
- Various bug fixes
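The two Demultiplex Reads fixes for dual barcodes can be illustrated together: barcodes are matched as defined pairs rather than in arbitrary combinations, and mismatches are tolerated in each barcode of the pair. The Python sketch below assumes equal-length barcodes, Hamming-distance mismatches (no indels) and a hypothetical per-barcode limit `max_mismatches`; it is not the Workbench's implementation.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(
    bc1: str,
    bc2: str,
    samples: Dict[str, Tuple[str, str]],
    max_mismatches: int = 1,
) -> Optional[str]:
    """Return the sample whose barcode pair matches, or None if none or several match."""
    hits = [
        sample
        for sample, (p1, p2) in samples.items()
        # Mismatches are allowed in each barcode of the pair independently.
        if hamming(bc1, p1) <= max_mismatches and hamming(bc2, p2) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


samples = {"S1": ("ACGT", "TTGG"), "S2": ("GGCC", "AATT")}
s1 = assign_sample("ACGA", "TTGC", samples)  # one mismatch in each barcode: "S1"
mixed = assign_sample("ACGT", "AATT", samples)  # barcode 1 of S1 with barcode 2 of S2: None
```

Because barcodes are only accepted as the defined pairs, a read combining barcode 1 from one sample with barcode 2 from another is left unassigned.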
Changes
- The Sample Reads tool is now named Subsample Sequence List and is located under the Utility Tools | Sequence Lists subfolder of the Toolbox. The functionality of this tool has been expanded. See the Improvements listing above, or refer to the manual.
- The Extract Annotations tool is now named Extract Annotated Regions.
- The tool Set Up Experiment is now named Set Up Microarray Experiment.
- The Track Tools folder, containing tools for working with track elements, has been moved from the top level of the Toolbox to under the Utility Tools folder. Correspondingly, the workflow element for creating track lists is under the Utility Tools folder in the Add Elements dialog and no longer under the “New” list.
- The workflow element for creating sequence lists is under the Utility Tools folder. It no longer appears under the “New” list in the Add Elements dialog.
- The “Number of duplicates distribution” section has been removed from the report produced by Remove Duplicate Mapped Reads.
- When exporting BAM files, file names are limited to a maximum of 254 characters.
- Input modifying tools within workflows generate an output element instead of directly modifying the input provided. Workflows containing these tools may need to be edited.
- The Cut, Copy and Paste buttons have been removed from the toolbar. These functionalities are still available using items under the Edit menu or standard keyboard shortcuts.
- The Restore option under the Edit menu, for moving elements back out of the recycle bin, is now called Restore from Recycle bin.
- The Empty option under the Edit menu, for emptying the recycle bin, is now called Empty Recycle bin.
- The Java version bundled with CLC Genomics Workbench 22.0 is Java 11.0.10, using the JRE from AdoptOpenJDK.
The following tools have been moved to the Legacy folder of the Workbench Toolbox and will be retired in a future version of the software:
Functionality retirement
The following tools have been retired:
- Create Track from Experiment (legacy)
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
- Roche 454 NGxS import (legacy)
- Create Combined RNA-Seq Report (legacy)
- Remove Reference Variants (legacy)
The right-click option “Run in batch mode (legacy)”, for launching installed, multi-input workflows in batch mode, has been retired. Workflows can be launched in batch mode using the standard launch functionality.