New features and improvements
Create Sample Report and Combine Reports have been substantially improved.
- Content is customizable. Sections to include and their order can be specified.
- Quality assessment criteria in Create Sample Report can be assigned traffic light colors. Combine Reports uses this information, providing a way to quickly assess overall sample quality in combined sample reports.
- Create Sample Report has support for additional quality control summary items for reports produced by QC for Sequencing Reads, QC for Read Mapping, QC for Targeted Sequencing, and RNA-Seq Analysis.
- Configurations can be re-used in future runs of Create Sample Report and Combine Reports.
- Modify Report Type: A tool for changing the type of a report, which affects where the contents of that report will be included in a sample or combined report.
- Sample names to use in sample reports are configurable.
These improvements have resulted in the following changes relative to earlier versions:
- When reports that are not supported by Create Sample Report and Combine Reports are provided as input, these tools will fail. Previously, the tools would run, but would ignore the unsupported reports. In practice, this is relevant in the context of workflows where output channels for unsupported reports have been connected to the input channel of a Create Sample Report or Combine Reports workflow element. Unsupported reports cannot be entered as input in the tools' launch wizards.
- Sample reports and combined reports are no longer supported as input to Create Sample Report.
- Reports from Map Reads to Reference, Map Reads to Contigs and Map Bisulfite Reads now each have their own section. Previously, information from these reports was included in the “Read mapping summary” section.
- Renaming within sample reports and combined reports:
- “Methylation levels” is now “Call methylation levels”.
- “Duplicated mapped reads” is now “Remove duplicate mapped reads”.
- “Variants” is now “Create variant track statistics report”.
- “QC summary” is now “Quality control” (combined reports only).
Due to these updates, sample reports and combined reports created in earlier versions of the software should not be used as input to Combine Reports in CLC Genomics Workbench 24.0 or above.
- QIAseq Panel Analysis Assistant: Provides access to workflows for analyzing data generated using QIAseq panels and kits, as well as associated functionality, such as downloading reference data and creating customized copies of workflows.
- Two new control flow elements:
- Branch on Sequence Count – used to control the downstream processing of a sequence list depending on the number of sequences in that list.
- Branch on Sample Quality – used to control the downstream processing of any data element based on quality available in sample reports.
- The workflow build id is included in the Workflow details section of the History view for data elements generated using installed workflows. Previously only the workflow name and workflow version were reported.
- Workflow inputs can be preconfigured with files stored on AWS S3. This is of particular relevance when using reference data stored on AWS S3.
- Options have been added to jump to the workflow element at the source or destination of a connection between elements.
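The idea behind Branch on Sequence Count, described above, can be pictured with a minimal Python sketch. The function name, the threshold parameter and the "pass"/"fail" branch labels are hypothetical and chosen only for illustration; they do not describe how the workflow element is configured.

```python
from typing import Sequence


def branch_on_sequence_count(sequences: Sequence[str], threshold: int) -> str:
    """Choose a downstream branch from the number of sequences in a list.

    Hypothetical rule: lists with at least `threshold` sequences go to the
    "pass" branch, all others go to the "fail" branch.
    """
    return "pass" if len(sequences) >= threshold else "fail"
```

For example, a list of two sequences with a threshold of three would be routed to the "fail" branch under this assumed rule.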
Import and export
New import and export functionality
Other import and export improvements
- All importers of fastq format files now annotate sequences with UMIs if UMI information is detected in read headers.
- The Illumina importer supports fastq format files with more than 2 billion reads. Such files are imported into multiple, smaller sequence lists.
- The MGI/BGI importer is more flexible in the ways it can determine the files to pair together when importing paired reads.
- The MGI/BGI importer supports joining lanes.
- SAM and BAM files can be imported from AWS S3 buckets.
- Drag-and-drop can be used for selecting files in import tools.
- When exporting heterozygous insertions or deletions to VCF as symbolic alleles, Export VCF no longer creates a non-symbolic VCF line for the reference allele.
- Export history to PDF includes information about the workflow that produced the data element.
- Non-CLC format files can be directly saved to disk from the Navigation Area using the “Save to disk…” option in the right-click menu or by dragging and dropping from the Navigation Area to a file browser.
- Non-CLC format files in a CLC File Location can be opened in a relevant program by dragging them from the Navigation Area to a program icon in a toolbar, or similar, on systems that support this. This adds to the existing functionality where double clicking on such files in the Navigation Area opens them in a relevant program on systems that support this.
- A View Settings menu has replaced the “Save View” functionality at the bottom of Side Panels of elements open for viewing. The updated functionality makes it simple to save, apply and manage view settings for elements and element types.
- The font size of the Navigation Area, Toolbox tab and Favorites tab can be increased and decreased.
- A new Track List can be created by selecting track elements in the Navigation Area and dragging them onto a track based on a compatible reference genome that is open in Track view.
- Search functionality is available in the Reference Data Manager.
- Column order can be adjusted by moving the corresponding column names up or down in the Side Panel. Previously re-ordering could be done only by dragging a column to the desired location within the table view.
- The column order for most table types can be saved and applied as view settings. Previously, column order could be adjusted when viewing tables, but the revised order could not be saved for later use or for use with other tables of the same type.
- Sets of criteria used to filter tables can be saved as Filter Sets. These can be reapplied to other tables in a single click, or exported and imported, for sharing purposes.
- When exporting an element with no contents to Excel format (.xlsx, .xls), the sheet created contains column headers. Previously, column headers were not included in this case.
- Expression data tables containing more than 1 million rows can now be sorted.
- The list of databases available using BLAST at NCBI has been expanded, including the addition of the experimental taxonomic nt databases ‘Eukaryota nt (nt_euk)’, ‘Prokaryota (bacteria and archaea) nt (nt_prok)’ and ‘Viruses nt (nt_viruses)’.
- Spaces in the names of BLAST database locations and folders in the path are supported.
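As a rough illustration of what detecting UMI information in fastq read headers can involve, the sketch below assumes the common Illumina header layout in which the UMI is an eighth colon-separated field of the read identifier. The function name and the pattern are hypothetical, and this is not a description of how the importers are implemented.

```python
import re
from typing import Optional

# Assumption: Illumina-style header "@instrument:run:flowcell:lane:tile:x:y:UMI ...",
# where the UMI uses the letters A, C, G, T, N, optionally as two parts joined by "+".
_UMI_PATTERN = re.compile(r"^@(?:[^:\s]+:){7}([ACGTN]+(?:\+[ACGTN]+)?)(?:\s|$)")


def extract_umi(header: str) -> Optional[str]:
    """Return the UMI found in a fastq read header, or None if none is detected."""
    match = _UMI_PATTERN.match(header)
    return match.group(1) if match else None
```

Headers with only the seven standard identifier fields yield `None` under this assumed layout, so reads without UMI information would be left unannotated.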
The following improvements refer to data available via the Reference Data Manager:
- In Download Genomes, “Dbsnp variants” for Homo sapiens hg19 and hg38 has been updated from dbSNP version 151 to 156.
- Under the QIAGEN Sets tab:
- The following Reference Data Elements have been added:
- Version refseq_GRCh38.p14_no_alt_analysis_set of RefSeq Genes, CDS and mRNA elements.
- Version 20231112_hg38_no_alt_analysis_set for Clinvar.
- Version 20231009_hg38_no_alt_analysis_set for Gene Ontology.
- Version dbsnp_common_v151_ucsc_hg38_no_alt_analysis_set for dbSNP Common. Note that this contains variants on the alt contigs, where the earlier versions 151_refseq_hg38_no_alt_analysis_set, which was based on NCBI’s dbSNP Common, and 151_ensembl_hg38_no_alt_analysis_set did not.
- Version dbsnp_common_v151_ucsc_hg19 for dbSNP Common.
- Reference Data Sets referring to Reference Data Elements listed above have been updated to refer to the newly added versions.
- The RNA trim adapter list included in multimodal Reference Data Sets has been updated.
- Items under the Previous Reference Data Sets tab and Previous Reference Elements tab from earlier releases are no longer available for download. Data that has already been downloaded is not affected. It will still be listed and can still be deleted using functionality in the Reference Data Manager.
Other new features and improvements
- Custom color gradients can be defined, including specifying the type and number of boundaries in the gradient, and colors to use at those boundaries.
- The Volcano plot view of Statistical Comparison Table and Tracks supports interactive customization for generation of publication-ready figures, with features’ coloring determined by p-values and fold changes, including fading of non-significant features and
- Reads can be extracted from read mappings based on the orientation they were mapped in using Extract Reads, Create Reads Track from Selection and Extract from Selection.
- Reads can be extracted from read mappings as broken pairs when only one read of a pair matches the extraction criteria using Extract from Selection, bringing this tool’s option in line with the options available in Extract Reads and Create Reads Track from Selection. The organization of the options in the Extract from Selection wizard has been updated accordingly.
- The wizard layout and options for Filter on Custom Criteria have been improved.
- Filter criteria configured in Filter on Custom Criteria can be re-used in future runs of the tool.
- Annotate with Nearby Information, previously named Annotate with Nearby Gene Information, can use any annotation track for annotating input annotation tracks.
- Substantial speed improvements for Detect and Refine Fusion Genes when the “Detect with novel exon boundaries” option is enabled and the reference sequence contains thousands of chromosomes.
- In variant tracks, replacements consisting of a combination of deletions and SNVs show the SNVs aligned to the right and the deletions aligned to the left, consistent with the general representation of SNVs and deletions in read mapping tracks. Previously, such replacements in variant tracks had SNVs aligned to the left and deletions aligned to the right.
- The names of outputs generated by Homology Based Cloning now contain the name of each of the sequences included in the cloning reaction.
- Improvements to the placement of amino acids in the Amino Acid tracks produced by Amino Acid Changes.
- A warning is shown when read mappings containing Oxford Nanopore or PacBio long reads are provided as input to Fixed Ploidy Variant Detection, Low Frequency Variant Detection or Basic Variant Detection. These tools are not recommended for use with such data.
- Read mapping tracks containing long reads (>10kbp) load faster and are more responsive. This update can affect the order reads are presented compared to when opened using earlier versions of the software.
- Speed improvements when working with large numbers of chromosomes as references (e.g. hundreds of thousands). Examples of tools affected include Convert to Tracks, Create Mapping Graph, and Identify Graph Threshold Areas.
- Outlier calculation has been improved for Combine Reports to be insensitive to rounding. As a result, existing combined reports that contain box plots and outliers may show fewer outliers.
- Large reports open faster.
- When connected to a CLC Server, subfolders of CLC Server File System Locations that you have access to are listed at the top of the containing folder in the Workbench Navigation Area. Previously subfolders were not ordered according to access level.
- The history for elements created using an external application includes the version of the external application used.
- CLC File Locations can be removed and re-indexed when the Workbench is being run in Viewing Mode.
- Fixed a rare issue that would result in an error dialog being shown during a drag and drop operation in the Navigation Area.
- New policy property: ‘run_on_workbench_when_server_is_available’. When set to ‘deny’, server-enabled tools and workflows cannot be run locally on the CLC Genomics Workbench when it is connected to a CLC Genomics Server. The default is set to ‘allow’.
- Various minor improvements
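The Combine Reports outlier change above is described only as making the calculation insensitive to rounding. The sketch below shows one way such insensitivity could be achieved. It assumes the common 1.5 × IQR box plot rule and a small tolerance on the fences; both are assumptions made for illustration, not the method used by the software.

```python
import statistics


def find_outliers(values, tolerance=1e-9):
    """Return values outside the 1.5 * IQR fences.

    Differences smaller than `tolerance` are ignored, so a value that sits on
    a fence apart from rounding noise is not reported as an outlier.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower - tolerance or v > upper + tolerance]
```

A tolerance of this kind can only reduce the number of reported outliers, which is consistent with existing combined reports showing fewer outliers after the change.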
- Fixed an issue that caused Annotate with Repeat and Homopolymer Information to fail when annotating variants in the second to last position on a chromosome.
- Fixed an issue that caused Annotate with Repeat and Homopolymer Information to not annotate variants, when a homopolymer or repeat spanned the origin of a circular reference sequence.
- Fixed an issue in the QC for Targeted Sequencing report section “Minimum coverage of target regions positions”, where the reported percentage of positions with a certain coverage or higher only included positions where coverage was greater than threshold values. Now positions that have a coverage that is equal to or greater than threshold values are included.
- Fixed a rare issue that caused Fixed Ploidy Variant Detection and Low Frequency Variant Detection to incorrectly assign variants that should have been heterozygous as homozygous. This happened when the quality scores for the different nucleotides at a variant position had non-overlapping distributions.
- Fixed an issue where references that contained * and/or = within their name were skipped when importing SAM or BAM mapping files.
- Fixed an issue where checking the “Create subfolders per batch unit” option in the MGI/BGI importer had no effect.
- Fixed an issue that could cause VCF export to fail when exporting fusion tracks containing fusions that were not annotated as PASS in the filter column.
- Fixed an issue where paired reads with unaligned ends overlapping within an insertion were not shown in different colors for forward and reverse reads after the option to show the strands was selected in the Side Panel view settings.
- Fixed an issue where the axis scale range for plots in reports exported to PDF format could sometimes differ from the range seen when viewing that plot in a CLC Workbench.
- Fixed an issue that resulted in some box plots from combined reports not being included in reports when exported to PDF.
- Fixed an issue where infinite values were included in report plots when the report was exported to PDF format.
- Fixed an issue where providing incomplete metadata (e.g. a column name missing) when launching a workflow containing an Iterate element would lead to the analysis stalling, instead of failing with an error.
- Sorting Local Search results by size now sorts in numerical order. Previously, the sorting was alphabetical.
- Fixed an issue in Download BLAST Databases that caused part of the description of the available databases to be hidden in the launch wizard.
- Fixed an issue affecting the Illumina and MGI/BGI importers where, when all the read files provided as input were compressed with zip, the “Paired reads” option was disabled.
- Fixed an issue in the Workflow Manager where multiple workflow installer files (.cpw) could be selected at the same time. Selection is now restricted to a single workflow file per installation action.
- Fixed an issue where retired Reference Data Elements could be listed under the QIAGEN Sets tab of the Reference Data Manager even though they were no longer available for download.
- Various minor bug fixes
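The QC for Targeted Sequencing fix listed above concerns whether positions whose coverage equals the threshold are counted. A minimal sketch of the inclusive calculation follows; the function name and the example values are illustrative only.

```python
def percent_positions_at_or_above(coverages, threshold):
    """Percentage of positions with coverage equal to or greater than `threshold`."""
    if not coverages:
        return 0.0
    # Inclusive comparison: positions exactly at the threshold are counted.
    covered = sum(1 for c in coverages if c >= threshold)
    return 100.0 * covered / len(coverages)
```

With coverages of 5, 10, 10 and 30 and a threshold of 10, the inclusive comparison counts three of four positions (75%), whereas a strict greater-than comparison would count only one (25%).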
- The following tool names have been updated to better reflect their functionality:
- The following tools have been moved in the Toolbox:
- De Novo Assembly no longer accepts PacBio and PacBio HiFi long reads. For de novo assembly of long reads, use tools from the Long Read Support plugin.
- Map Reads to Reference no longer uses a specialised mapping algorithm when mapping PacBio reads. For this data type, we recommend using Map Long Reads to Reference, available from the Long Read Support plugin.
- The SRA blast database can no longer be used with BLAST at NCBI, because NCBI no longer supports blasting against that database via their API.
- BLAST has been upgraded to BLAST+ 2.14.0. BLAST+ changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- The SRA toolkit has been updated to version 3.0.2.
- The Java version bundled with CLC Genomics Workbench 24.0 is Java 188.8.131.52, where we use the JRE from the Azul OpenJDK builds.
- Dedicated installers for Intel-based and ARM-based Mac systems are available.
The following tools have been retired:
- QIAGEN GeneReader importer (Legacy)
Important: For network licenses, CLC Network License Manager 5.5.3, or above when available, is needed. See the latest improvements page for release notes. Upgrading the software is described in the manual. Get the CLC Network License Manager installers for the latest version via the website. The version running on a system can be found in the VERSION.txt file in the installation folder of the CLC Network License Manager.
This section includes information not included elsewhere. Please see the dedicated plugin latest improvements pages for details about improvements to other plugins. Links to plugin latest improvements pages are provided on plugin product webpages.
- Tools for analyzing long reads are available from the Long Read Support plugin.
- Various improvements to the Navigation Tools plugin, including bookmarking items in CLC Server File System Locations, renaming bookmarks, and easier navigation to bookmarked items.
- Tools delivered by the Vector NTI import plugin are now legacy. This plugin will be retired in the future.
The Sequence Representation functionality, allowing the display name for sequence elements in the Navigation Area to be based on the accession, Latin name or common name of that sequence, has been marked as legacy functionality. Setting the display name for sequence elements in the Navigation Area to the accession, Latin name or common name in a sequence record using the Sequence Representation functionality, via a right-click menu or in Preference settings, will be retired in a future release.