Server specific
New features and improvements
- Contents of import/export directories can be browsed from the web interface in the Browse server import/export directories tab under Element info.
- Contents of AWS S3 buckets accessible using AWS Connections configured in the CLC Server can be browsed from the web interface in the Browse S3 locations tab under Element info. Data can be uploaded to S3, downloaded from S3 and deleted in S3 from this area.
- In External Applications, a static script can be specified using the new parameter type: Included script. A script provided using this option becomes accessible to the external process at runtime. This enables integration scripts or extensive parameter files to be included in the External Application and injected into the execution context, rather than being an external dependency. For containerized External Applications, the included script can serve as the injected integration that enables direct use of a publicly available container.
- Files from AWS S3 can now be selected for the External file parameter type of External Applications.
Other improvements
- Search functionality has been substantially improved. Please see the “Important change to search indexes – action needed” section below about changes to search indexes related to this improvement. Indexes for all CLC Locations can be rebuilt using the “Rebuild all indexes” button under Configuration | Main Configuration | File system locations.
- Admin level access to the audit log can be granted to specified groups. The ability to broaden access beyond admin users to installing and configuring workflows, configuring and enabling external applications, and viewing the CLC Server queue, was introduced with version 22.0.
- The message returned upon successful login to the CLC Server now includes information about the connection (username, the CLC Server description, and encryption status). Previously the return message was “Login successful”.
- The full names of graphics exporters are listed when configuring External Applications. Previously, the name “Graphics” was used for each of these.
- A search box has been added to several locations in the web client where long lists are presented, for example, in the Algorithms section under Global Permissions.
- Active CLC File System Locations are listed in alphanumeric order in the web administrative interface. Previously they were listed in the order they were added.
- Apache Tomcat has been updated to version 9.0.65.
- Various minor improvements
Bug fixes
- Fixed an issue where CLC Workbenches could not interact with elements stored on a CLC Server if those elements were created using tools provided by a plugin that was no longer installed on the CLC Server.
- Fixed an issue where text in installer screens was not visible when installing the software in ‘dark mode’ on Linux.
- Various minor bugfixes
Changes
- AWS account details are now entered into AWS Connections. This term replaces the earlier term: “S3 locations”. An AWS region can be specified. When upgrading from an earlier version with AWS account information already configured, the region will be set to the default for the specified AWS partition. For AWS Standard, this is us-east-1. The region can be updated by editing the connection. The region setting is primarily relevant if you plan to submit analyses from a CLC Server with the Cloud Server Plugin installed to run on a CLC Genomics Cloud setup.
- The Core tasks area under the Global Permissions tab has been removed. Standard Data Import is now listed under the Algorithms section with the name “Import Standard Data”. The Data Export setting under Core tasks was legacy functionality, only relevant to External Applications with exporters configured in CLC Genomics Server 9.x and earlier. Permissions previously set for both standard import and legacy exporters are retained.
- The Java version bundled with CLC Genomics Server 23.0 is Java 17.0.4, where we use the JRE from the Azul OpenJDK builds.
Important change to search indexes – action needed
- Search functionality has been substantially improved. Associated with this, indexes for all CLC Server data locations must be rebuilt after upgrading to 23.0. If they are not, searches for elements in these locations will not find any results, and data associations to CLC Metadata Tables will not be registered. Indexes built using version 23.0 are placed in a folder called “searchindex2” in the installation area of the CLC Server.
- Old search indexes are not automatically deleted. They can be left in place without detrimental effect, or deleted manually. They are found in the folder “searchindex” in the installation area of the CLC Server.
Functionality retirement
- Boolean compound parameters in External Applications. These were made legacy with version 21.0 and are no longer supported in External Application configurations.
Shared with CLC Workbenches
- Create K-medoids Clustering for RNA-Seq finds clusters of features, e.g., genes, transcripts or miRNAs, whose expressions behave similarly, for example first increasing over time and then decreasing. The tool produces a Clustering Collection which contains a Sankey plot showing how these features move between clusters under different conditions, for example different treatments. A line graph representation of features from individual clusters or pairs of clusters is present as well.
- Detect and Refine Fusion Genes – Find fusion genes in RNA-Seq data by identifying potential fusions and then refining that list by evaluation of the evidence for each fusion. This is an updated version of the tool formerly distributed in the Biomedical Genomics Analysis Server Plugin. The updates made are listed in an Improvements section below.
- Target Region Coverage Analysis – Analyze and compare coverage from multiple samples. This tool was formerly distributed in the Biomedical Genomics Analysis Server Plugin.
- Create Consensus Sequences from Variants – Create consensus sequences from a variant track and a reference sequence. This tool was formerly distributed in the Biomedical Genomics Analysis Server Plugin.
- Annotate with GFF/GVF/GTF file – Add annotations from a GFF, GVF or GTF format file onto sequences, individual or in sequence lists. This tool was formerly distributed in the Annotate with GFF file server plugin.
Other new functionality and improvements
- Substantial speed improvements to RNA-Seq Analysis. Reads that map to multiple transcripts or genes will be distributed differently than earlier due to different choices of random seed in the new implementation. The algorithm is still deterministic.
- Transcripts are no longer renamed in Transcript Expression (TE) output unless renaming is necessary to avoid duplicate names. Previously, transcripts were renamed to the gene name plus a number e.g. “BRCA_1”. This change means that TE tracks in this version of the software cannot typically be used together with TE tracks generated using older versions to produce Heat Maps, PCA plots, Expression, etc.
- Reports UMI fragment counts when relevant. UMI counts are included in the Fragment statistics section of the report if the input reads are annotated with UMIs by tools from the Biomedical Genomics Analysis plugin, and if the library type is set to 3′ sequencing for RNA-Seq Analysis.
- Venn diagrams support four and five groups. Previously up to 3 were supported. Tooltips indicate which groups are part of a specific intersection.
- Quantify miRNA:
- Handles custom databases containing duplicated names.
- Does not allow custom databases containing sequences longer than 60bp. This avoids misallocation of reads to sequences that are similar to small RNAs.
- When adding multiple inputs to Extract IsomiR Counts, the extracted expression tables contain an entry for the combined set of IsomiRs identified among the samples, making them compatible for analysis in Differential Expression in Two Groups and Differential Expression for RNA-Seq.
Differential Expression for RNA-Seq and Differential Expression in Two Groups
Detect and Refine Fusion Genes
This is an updated version of Detect and Refine Fusion Genes, formerly distributed in the Biomedical Genomics Analysis Server Plugin. The updates listed here are relative to the version distributed with Biomedical Genomics Analysis Server Plugin 22.2.
- Fusions will not be called for overlapping genes.
- Novel exon boundary improvements:
- Options have been expanded to allow for detecting fusions with a single fusion partner (“Detect with novel exon boundaries”) as well as detecting those with 2 fusion partners (“Allow fusions with novel exon boundaries in both genes”)
- The “Detect exon skippings” option supports detection of fusions with novel exon boundaries.
- An option has been added to omit non-significant breakpoints from the report.
- A minimum Z-score can now be specified for use when evaluating evidence for a fusion.
- Speed improvements
- The option “Allow fusions with novel exon boundaries in both genes” now defaults to false to reduce the number of false positive fusions. Setting it to true is useful for exhaustive searches of novel fusions.
- Changes to the maximum number of equivalent matches to the reference allowed for a single read to be retained:
- When remapping reads to a fusion chromosome, the maximum number is now 30. Previously it was 10.
- When searching for unaligned ends, the maximum number remains unchanged at 10.
- The option “Maximum number of hits for a read” has been removed. Its value was ignored in previous versions.
- Fusions from mRNA transcripts without an associated gene in the Gene track are not used when detecting fusions. mRNA transcript features must have a gene id in one of the following columns to be matched with the associated gene: “Parent”, “gene_id” or “gene_name”.
- Fixed an issue where paired end reads were treated as single end reads when the option to “Only use fusion primer reads” was enabled.
- Fixed an issue where unaligned ends could be too long or too short for reads containing insertions and deletions. This change may lead to small differences in results compared to earlier versions, expected to be due to a decrease in the number of false positives and false negatives reported.
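The gene-matching requirement for mRNA transcript features described above can be illustrated with a short Python sketch. It assumes each transcript feature is available as a plain dictionary mapping column names to values; the function name `has_matching_gene` and this data layout are hypothetical and are not the tool's implementation.

```python
from typing import Collection, Mapping

# Column names taken from the release note.
GENE_ID_COLUMNS = ("Parent", "gene_id", "gene_name")


def has_matching_gene(feature: Mapping[str, str], known_gene_ids: Collection[str]) -> bool:
    """Return True if any of the listed columns holds an identifier of a known gene.

    `feature` is a hypothetical dict view of one mRNA transcript feature and
    `known_gene_ids` is a hypothetical collection of identifiers from the Gene track.
    """
    return any(feature.get(column) in known_gene_ids for column in GENE_ID_COLUMNS)


# Transcripts without a matching gene would be left out of fusion detection.
genes = {"G1", "G2"}
transcripts = [{"Parent": "G1"}, {"gene_name": "G2"}, {"Name": "orphan"}]
usable = [t for t in transcripts if has_matching_gene(t, genes)]
```

Checking a transcript table this way before running the tool may help explain why some expected fusions are absent from the results.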
Bisulfite mapping
- Map Reads to Bisulfite Reference speed improvement. This is data dependent, with about a 50% improvement likely for most data sets. This speedup might change the details of results very slightly.
- Call Methylation Level speed improvement. This speedup might, in some cases, change results very slightly.
- Import of read mappings from SAM/BAM now uses methylation information from the optional SAM tags XR for read conversion and XG for reference conversion. The recognized values are “CT” and “GA”. Support for these tags is added so that information is not lost if a bisulfite mapping is exported and then re-imported.
- Export of read mappings to SAM/BAM format now includes details on bisulfite conversion. These are specified using the SAM tags XR for read conversion and XG for reference conversion. The possible values of these tags are “CT” and “GA”. This is provided for increased compatibility with third party tools.
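For readers inspecting exported files with their own scripts, the following Python sketch shows one way to read the XR and XG tags from a SAM record. It assumes the tags are written as string-typed optional fields (for example `XR:Z:CT`), which is the convention used by some third party bisulfite tools; the type code and the helper names are assumptions, not a description of the importer or exporter.

```python
from typing import Dict, Optional, Tuple

# Values named in the release note.
VALID_CONVERSIONS = {"CT", "GA"}


def parse_optional_fields(sam_line: str) -> Dict[str, str]:
    """Parse the TAG:TYPE:VALUE optional fields (column 12 onwards) of a SAM record."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def bisulfite_conversion(sam_line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (read conversion, reference conversion); None if absent or unrecognized."""
    tags = parse_optional_fields(sam_line)
    read_conv = tags.get("XR")
    ref_conv = tags.get("XG")
    return (
        read_conv if read_conv in VALID_CONVERSIONS else None,
        ref_conv if ref_conv in VALID_CONVERSIONS else None,
    )


# A made-up record: 11 mandatory fields followed by the two optional tags.
mandatory = ["r1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII"]
record = "\t".join(mandatory + ["XR:Z:CT", "XG:Z:GA"])
conversion = bisulfite_conversion(record)
```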
Import and export
- VCF Import:
- Supports symbolic alleles for inversions (<INV>), insertions (<INS>), deletions (<DEL>) and tandem duplications (<TANDEM:DUP>). Symbolic alleles that do not contain sequence information or are longer than 100,000 base pairs are imported to annotation tracks instead of variant tracks. Previously symbolic alleles were not imported.
- Improved handling of variants with multiple loci encoded in the same VCF record.
- VCF Export supports symbolic allele representation for insertions (<INS>), deletions (<DEL>) and tandem duplications (<TANDEM:DUP>). (Inversions (<INV>) were already supported.) With the exception of deletions, variants in annotation tracks are always exported as symbolic alleles. Deletions in annotation tracks and variants in variant tracks above a specified size are also exported as symbolic alleles. The default size is 1000 bp, which corresponds with the QCI Interpret requirement that InDels > 1000 bp must be represented as symbolic alleles.
- The PacBio importer supports HiFi reads.
- The read length when exporting to FASTQ format files has been increased from 524,288 bp to 16,777,216 bp.
- SAM/BAM Mapping Files importer:
- Performance improvements
- The circular flag of references is now retained.
- Import Tracks from File has been updated to show a warning if the file is not imported.
- GFF3 Export retains the case of attribute headers. Previously, all headers were adjusted to lower case during export.
- The history information of elements imported using Standard Import includes the specific importer used (e.g. “CSV table importer”, “Fasta Importer”, etc).
- Standard Import can be used to import files from AWS S3 locations.
- When exporting images to bitmap-based formats, the Screen resolution and High resolution options are now bounded so the maximum supported number of pixels will not be exceeded.
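The VCF import and export rules for symbolic alleles described above can be summarized as a small decision sketch in Python. The function names, the string labels, and the `has_sequence` flag are hypothetical and used only for illustration; the two size limits are the ones stated in the notes (100,000 bp on import, a default of 1000 bp on export).

```python
IMPORT_MAX_VARIANT_LENGTH = 100_000       # bp, from the VCF Import note
DEFAULT_EXPORT_SYMBOLIC_THRESHOLD = 1000  # bp, default from the VCF Export note


def import_destination(has_sequence: bool, length: int) -> str:
    """Where a symbolic allele would end up on import, per the rule above."""
    if not has_sequence or length > IMPORT_MAX_VARIANT_LENGTH:
        return "annotation track"
    return "variant track"


def export_as_symbolic(source_track: str, variant_type: str, length: int,
                       threshold: int = DEFAULT_EXPORT_SYMBOLIC_THRESHOLD) -> bool:
    """Whether a variant would be written as a symbolic allele on export.

    `source_track` is "annotation" or "variant"; `variant_type` is a label
    such as "insertion" or "deletion". Both encodings are assumptions.
    """
    if source_track == "annotation" and variant_type != "deletion":
        return True  # non-deletions in annotation tracks are always symbolic
    return length > threshold  # deletions, and anything from a variant track
```

For example, under these assumptions a 1001 bp deletion from a variant track would be exported as `<DEL>`, while a 1000 bp deletion would be written with explicit sequence.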
Various
- Read mapping speed on Apple Silicon processors has been improved. Read mapping results are not affected by this. Tools benefiting from this change include Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
- Branch on Coverage – a new workflow control flow element where the downstream processing of read mappings can be controlled based on coverage values within reports.
- Barcodes can be preconfigured in Demultiplex Reads elements in workflows.
- Demultiplex Reads has been updated to:
- Report barcodes without any matched reads
- Show the barcode names in the history.
- Workflow Export elements can be preconfigured to export to locations on AWS S3.
- When Low Frequency Variant Detection, Fixed Ploidy Variant Detection or Basic Variant Detection was used with a mapping realigned using Local Realignment with a guidance variant track, it was possible for partial insertions to be called. Now, the full insertion must be present within at least one, individual read for it to be reported.
- QC for Targeted Sequencing:
- Can report coverage statistics per gene.
- Supports analysis of read mappings generated by RNA-Seq Analysis.
- Annotate with Exon Numbers:
- Can add exon numbers to elements in annotation, expression and statistical comparison tracks. Previously only variant tracks could be annotated with exon numbers.
- Adds exon numbers when input elements start outside an exon but still overlap the exon.
- Adds all exons when multiple exons overlap a single input element.
- Allows annotation with exons from only one transcript or CDS.
- Filter on Custom Criteria can be used to filter Statistical Comparison Tracks, Statistical Comparison Tables, IsomiR tables, and miRNA Seed Tables.
- Reports from Create Sample Reports and Combined Report generated using RNA-Seq reports now include the percentage of reads mapped to exons in the Fragment counting statistics table.
- In Create Sample Report, the percentage of target region positions with coverage above a set threshold can be used as a QC metric.
- QC for Sequencing Reads processes only the first 100,000 base pairs in long reads. Previously, the tool would fail when provided with very long reads.
- When Annotate with Overlap Information is included more than once in the same workflow, columns with overlap information are now always added in the same order. Previously, concurrency issues could cause column order to be different between different runs.
- Local Realignment no longer realigns reads into regions with no coverage, such as introns in RNA-Seq read mappings.
- Remove Duplicate Mapped Reads uses an improved method to identify duplicate reads when handling paired end reads. In general, this improvement results in slightly more reads being considered duplicates.
- The options for extracting reads according to their location relative to features in an overlap track have been expanded in Extract Reads. Previously reads had to lie fully within an annotated region to be extracted. Now, in addition to that condition, options are provided for extracting any overlapping reads, extracting only reads that fully span annotated regions or extracting all reads except those that overlap with annotations in the overlap track.
- Assemble Sequences to Reference supports alignment of reads that span the origin of a circular reference.
- Secondary Peak Calling has a new option “Peak detection stringency”.
- The report from Copy Number Variant Detection (CNVs):
- Includes a table showing the number of genes affected by CNV calls.
- Contains new coverage plots at genome and chromosome levels.
- The Trim Reads report now includes statistics for the number of reads in intact pairs and in broken pairs.
- Updated restriction site database to REBASE 2022-06-30.
- The Identify Known Mutations from Mappings output channel names when used in a workflow have been improved. The elements produced by the tool have not been changed.
- While viewing data, in most situations, tooltips can be suppressed by holding down the Ctrl key. Similarly those tooltips can be displayed immediately, instead of a moment after the mouse cursor stops moving, by holding down the Shift key.
- Various minor improvements
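The four ways of selecting reads relative to an overlap track, as described for Extract Reads above, can be expressed as simple interval predicates. The Python sketch below assumes half-open `(start, end)` coordinates and uses made-up mode names; these are not the option values used by the tool.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open (start, end); an assumption of this sketch


def overlaps(read: Interval, region: Interval) -> bool:
    return read[0] < region[1] and region[0] < read[1]


def within(read: Interval, region: Interval) -> bool:
    return region[0] <= read[0] and read[1] <= region[1]


def spans(read: Interval, region: Interval) -> bool:
    return read[0] <= region[0] and region[1] <= read[1]


def keep_read(read: Interval, regions: Iterable[Interval], mode: str) -> bool:
    """Decide whether a read is extracted under a (hypothetically named) mode."""
    regions = list(regions)
    if mode == "within":           # read lies fully within an annotated region
        return any(within(read, r) for r in regions)
    if mode == "overlapping":      # read overlaps an annotated region at all
        return any(overlaps(read, r) for r in regions)
    if mode == "spanning":         # read fully spans an annotated region
        return any(spans(read, r) for r in regions)
    if mode == "not_overlapping":  # read overlaps no annotated region
        return not any(overlaps(read, r) for r in regions)
    raise ValueError(f"unknown mode: {mode}")
```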
Bug fixes
- Low Frequency Variant Detection, Fixed Ploidy Variant Detection and Basic Variant Detection:
- Fixed an issue that in very rare cases caused insertions to be called twice. Now, the same insertion is always only included once in the variant track.
- Fixed an issue in the remove pyro-error variants filter. Previously, the frequency threshold for removing pyro-error variants was ignored and more variants than intended were removed. The filter is generally only used for Ion Torrent data. This fix may result in a small improvement to the precision of variant detection.
- Fixed a rare issue affecting variant calling in very low coverage regions, where a variant could be reported that was not present in any single read in the mapping.
- Fixed an issue causing Map Reads to Reference to fail if a masking track covering a whole chromosome was provided as input.
- RNA-Seq Analysis
- Fixed an issue where reads were not counted as unique for a transcript in the GE track table, if the read could map in multiple ways to the same transcript, but only to that transcript.
- Fixed an issue that could lead to an IndexOutOfBounds error when the option “Calculate expression for genes without transcripts” was selected, two or more genes had the same name, at least one of these had no transcripts, and the Region column of the table view of the gene track contained the text “join”, “>”, or “<” (i.e., the genes had splice structure or uncertain end positions).
- Fixed an issue where the gene identifier would be removed from the statistical comparison track and tables produced by the Differential Expression for RNA-Seq tool when it was not recognized to be an Ensembl gene identifier.
- Fixed an issue in Differential Expression in Two Groups and Differential Expression for RNA-Seq that affected the estimation of dispersions when including information from nearby genes. This leads to slightly different p-values being produced by these two tools.
- Fixed an issue affecting Extract Consensus Sequence where annotations transferred from the reference sequence to the consensus sequence could be wrongly positioned if the read mapping had an insertion in a region that was removed due to low coverage.
- Fixed an issue where, if two genes had the same name and overlapped, their transcripts might become assigned to only one of the genes. The fix only applies when the gene and transcript annotations are imported from GFF3.
- Fixed an issue affecting the naming of outputs from Local Realignment when the tool was provided with multiple read mappings as input and not run in batch mode. Each resulting realigned read mapping is now named after the corresponding input. Previously all the realigned read mappings were named after the first read mapping in the set of inputs.
- QC for Sequencing Reads
- Fixed an issue in the report where the graph for R1 nucleotide contributions would be truncated to only show the same number of nucleotides as the R2 plot.
- Fixed an issue where the median read length in the supplementary report could be incorrect when the number of reads was very low. The median reported in the graphical report was correct.
- Amino Acid Changes
- Fixed an issue causing the output to be named after the reference data instead of the input data.
- Fixed an issue that caused the transcripts and proteins listed in the Coding region change and Amino acid change columns in the annotated variant track output to be inconsistently ordered.
- Fixed an issue in the Trim Reads report, where the number of reads under “No trim” could be incorrect when “Remove fixed number of bases” was enabled.
- Fixed an issue causing Show Enzymes Cutting Inside/Outside Selection to give wrong results when the selection crossed the junction of a circular sequence and a desired number of cut sites outside the selection was not specified.
- Fixed an issue in VCF Export, where specified minimum ploidy was not always enforced for complex variants. The issue would only occur when an allele had first been removed from a locus to adhere to the specified maximum ploidy.
- Fixed an issue where the wrong entry in a trim adapter list would be opened for editing if the list had been sorted or filtered.
- Fixed a rare issue in K-means/medoids clustering where a gene could be output in multiple clusters. This would occur when genes with identical expressions were chosen to be medoids, and so would only happen when K was comparable to the number of genes with unique expressions across samples.
- Fixed issues with Quantify miRNA where:
- It would fail on paired reads if using spike-ins.
- Opening a sequence list to view it would cause this tool to fail if that same sequence list had been used as input.
- In the report from Create Sample Report the value column in the summary table is coloured green or yellow according to whether the threshold is met. Previously, the threshold column was coloured.
- Workflow related
- Fixed an issue affecting the location of outputs generated from a workflow element that was also linked to a Collect and Distribute element. In cases where the output folder name was defined using the {input} or {2} placeholder, these outputs were sometimes all saved to the first folder created, instead of to different folders as intended.
- Fixed an issue where default names were applied to outputs from Output elements attached directly to an Iterate element in workflows, even when naming placeholders had been configured.
- Fixed an issue affecting workflows with nested Iterate elements where results from the outer level of iteration flowed into a Distribute and Collect element. Any output elements generated in the inner iteration, which should have been saved, were lost.
- Fixed an issue where unlocked options for on-the-fly importers in a workflow would be locked if the Input element was re-opened for editing.
- Fixed an issue affecting hyperlinked table entries, where html tags were sometimes included as text in the information exported to Excel or CSV formats.
- Fixed an issue where text in installer screens was not visible when installing the software in ‘dark mode’ on Linux.
- Various other minor bug fixes
The following tools are now legacy tools and will be retired in a future version of the software:
- QIAGEN GeneReader importer (Legacy)
Functionality retirement
The following tools have been retired:
- Compare Sample Variant Tracks (Legacy)
- Empirical Analysis of DGE (Legacy)
Plugin notes
Plugin retirements
Compatibility
The following are the corresponding client applications for CLC Genomics Server 23.0:
- CLC Genomics Workbench 23.0
- CLC Main Workbench 23.0
- CLC Command Line Tools 23.0
Please see the CLC Genomics Server 23.0 listings above for the details about the new tools and features listed here.
These are the draft release notes for CLC Server Command Line Tools 23.0, due for release mid-January, 2023.
Installers for this product are available as “early access” via links at the bottom of this page. These products are not supported, and we recommend they are not used in production during the early access period.
- create_kmedoids_for_rnaseq
- annotate_with_gff (previously distributed in the Annotate with GFF file plugin)
- consensus_from_variants (previously distributed in the Biomedical Genomics Analysis plugin)
- detect_and_refine_fusion_genes (previously distributed in the Biomedical Genomics Analysis plugin)
- target_region_coverage_analysis (previously distributed in the Biomedical Genomics Analysis plugin)
- differential_expression_rna_seq
- option added: --downweight-outliers
- differential_expression_two_groups
- option added: --downweight-outliers
- download_sra
- option removed: --aspera-limit
- option removed: --enable-aspera
- extract_overlapping_reads
- option removed: --in-interval
- option added: --overlap-type
- process_tagged_sequences
- option added: --barcode-values
Barcode structure and barcode values are now provided in separate parameters:
--barcode-1 "linker type#MULTIPLEX_BARCODE#fixedLength#3;linker type#MULTIPLEX_SEQUENCE#maxLength#1000"
--barcode-2 "linker type#MULTIPLEX_BARCODE#fixedLength#3;linker type#MULTIPLEX_SEQUENCE#maxLength#1000"
--barcode-values "a/b#AAA/GGG#ATA/ATA"
Previously, structure and values were provided in the same parameter.
- secondary_peak_calling
- option added: --peak-slope-stringency
- statistics_target_regions
- option added: --create-gene-coverage-track
- option added: --genes
- ngs_import_pacbio
- option added: --hifi-reads
- option removed: --only-sequencing-zmws
- option removed: --read-hq-regions
- Basic data operations, such as copying, can be carried out on data elements created using plugins.
- compare_sample_variant_tracks
- empirical_analysis_dge
- Fixed an issue where text in installer screens was not visible when installing the software in ‘dark mode’ on Linux.