Bases2Fastq Release Notes
v2.0.0
July 19, 2024
This release of Bases2Fastq adds troubleshooting options for errors caused by corrupted input files or missing data, and updates logging. These options provide an alternative to using the --exclude-tiles argument to locate and skip invalid data.
New Features
- Added the following optional arguments:
  - --error-on-missing: Terminates the execution and logs an error when Bases2Fastq detects that a non-critical file is missing, rather than proceeding with the execution and logging a warning.
  - --no-error-on-invalid: Skips non-critical files that Bases2Fastq identifies as having an invalid format and logs a warning, rather than terminating the execution and logging an error.
- For Linux OS, the static binary executable requires glibc v2.19 or later, rather than v2.17 or later.
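The two new arguments each invert the default handling for one class of non-critical file problem. The following sketch models only the behavior described above. The function, enum, and parameter names are hypothetical and do not reflect how Bases2Fastq is implemented internally.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for a problem with a non-critical file."""
    WARN_AND_CONTINUE = "log a warning and continue without the file"
    ERROR_AND_TERMINATE = "log an error and terminate the execution"


def handle_noncritical_file(problem: str,
                            error_on_missing: bool = False,
                            no_error_on_invalid: bool = False) -> Action:
    """Model the documented v2.0.0 behavior for a non-critical file.

    problem is "missing" or "invalid". The booleans stand in for the
    --error-on-missing and --no-error-on-invalid arguments (hypothetical names).
    """
    if problem == "missing":
        # Default: warn and proceed; --error-on-missing makes it fatal.
        return Action.ERROR_AND_TERMINATE if error_on_missing else Action.WARN_AND_CONTINUE
    if problem == "invalid":
        # Default: fatal; --no-error-on-invalid skips the file with a warning.
        return Action.WARN_AND_CONTINUE if no_error_on_invalid else Action.ERROR_AND_TERMINATE
    raise ValueError(f"unknown problem type: {problem!r}")
```

Modeling the arguments as independent booleans reflects that each one affects a different class of problem, so they can be combined in one execution.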
v1.8.0
April 29, 2024
In anticipation of the upcoming release of AVITI OS 2.6.0 and Cloudbreak UltraQ sequencing kits in June 2024, this release of Bases2Fastq supports Q50 metrics. Bases2Fastq executions that use data from runs with Cloudbreak UltraQ chemistry:
- Report Q50 in metrics files, including RunStats.json, SampleStats.json, and Metrics.csv.
- Display an additional metric for Q50 in the HTML QC Report.
When an execution does not use data from a run with Cloudbreak UltraQ chemistry, Bases2Fastq does not provide a Q50 value. The JSON files report Q50 values as null, and the CSV files leave Q50 values empty.
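Downstream scripts that read these files need to treat both representations of a missing Q50 value the same way. The following is a minimal sketch, assuming the raw value has already been read from a file: json.loads turns null into None, and csv.DictReader returns an empty string for an empty field. The function name is hypothetical and the function does not depend on any particular field name in the metrics files.

```python
from typing import Optional, Union


def parse_q50(raw: Union[str, float, int, None]) -> Optional[float]:
    """Return Q50 as a float, or None when the run did not report it.

    JSON metrics files report a missing Q50 as null (None after json.loads);
    CSV metrics files leave the field empty ("" after csv parsing).
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        raw = raw.strip()
        if raw == "":
            return None
    return float(raw)
```

Returning None, rather than 0.0, keeps runs without UltraQ chemistry distinguishable from runs that reported a Q50 of zero.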
Improvements
- Added values for Cloudbreak UltraQ sequencing kits to optional arguments.
- Increased the maximum number of samples for which the HTML QC report displays per-sample charts from 96 to 120.
Resolved Issues
- If Bases2Fastq executes with data from a run that used the Individually Addressable Lanes add-on, and the R1FastQMask and R2FastQMask for the same sample have different lengths in the two lanes, Bases2Fastq logs a warning. The execution proceeds with aggregating sample metrics, but the mask differences between lanes can produce inaccurate metrics.
v1.7.0
March 27, 2024
Improvements
- Added values for Cloudbreak Freestyle sequencing kits to optional arguments.
- Updated validations to remove cycle maximums for I1/I2/R1/R2 for all sequencing kits.
- Updated IndexingAssignment.csv, UnassignedSequences.csv, and Metrics.csv to include metrics for individual lanes (1 or 2) and the total summed across both lanes (1+2).
Resolved Issues
- The default I1Mask and I2Mask are correctly set to Y* when no masks are provided in the run manifest for an Adept library using the 2 x 300 Sequencing Kit Cloudbreak Medium Output.
- HTML QC reports correctly state the total number of polonies, rather than the number of polonies after adapter trimming.
v1.6.0
November 20, 2023
Improvements
- To support the Individually Addressable Lanes add-on, automatic adapter detection for adapter trimming supports samples from each lane.
Resolved Issues
- If the local disk runs out of space during an execution, Bases2Fastq stops the execution and displays a warning message rather than truncating FASTQ files or ending unexpectedly.
- Bases2Fastq identifies + and - as adapter sequence delimiters rather than returning an error.
v1.5.1
September 6, 2023
New Features
None
Resolved Issues
- When you specify the --filter-mask option with the --qc-only or --demux-only optional parameters, the software successfully applies the custom filter mask to the output dataset, rather than failing to apply it.
- When you specify the --filter-mask option with no cycles for a single read (e.g., R1:Y15N*-R2:N*), the software correctly interprets the basemask and uses it during execution, rather than the execution failing.
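In the example mask R1:Y15N*-R2:N*, R2 has no cycles selected at all, which is the case that previously caused executions to fail. The following sketch expands such a mask into one character per cycle. It assumes common basemask notation: read segments separated by "-", a letter followed by a cycle count, and "*" meaning all remaining cycles in that read. These semantics and all function names are assumptions for illustration, not a specification of how Bases2Fastq parses --filter-mask.

```python
import re
from typing import Dict

# Assumed grammar for one read's mask: one or more tokens, each a letter
# followed by a cycle count or "*" (all remaining cycles in the read).
_SPEC = re.compile(r"(?:[A-Za-z](?:\d+|\*))+")
_TOKEN = re.compile(r"([A-Za-z])(\d+|\*)")


def expand_read_mask(spec: str, num_cycles: int) -> str:
    """Expand one read's mask (e.g. "Y15N*") to one character per cycle."""
    if not _SPEC.fullmatch(spec):
        raise ValueError(f"malformed mask: {spec!r}")
    expanded = ""
    for letter, count in _TOKEN.findall(spec):
        remaining = num_cycles - len(expanded)
        n = remaining if count == "*" else int(count)
        if n > remaining:
            raise ValueError(f"mask {spec!r} exceeds {num_cycles} cycles")
        expanded += letter.upper() * n
    if len(expanded) != num_cycles:
        raise ValueError(f"mask {spec!r} covers {len(expanded)} of {num_cycles} cycles")
    return expanded


def parse_filter_mask(mask: str, cycles: Dict[str, int]) -> Dict[str, str]:
    """Split a mask such as "R1:Y15N*-R2:N*" into expanded per-read masks."""
    result = {}
    for part in mask.split("-"):
        read, _, spec = part.partition(":")
        if read not in cycles or not spec:
            raise ValueError(f"unrecognized read segment: {part!r}")
        result[read] = expand_read_mask(spec, cycles[read])
    return result
```

With 150-cycle reads, the example mask selects the first 15 cycles of R1 and no cycles of R2, so any logic that assumes at least one selected cycle per read has to handle an all-N mask explicitly.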
v1.5.0
August 14, 2023
The Bases2Fastq v1.5.0 Software is available.
New Features
- Projects are introduced to group sample analyses. Projects are defined per sample in Sample Custom Metadata columns in the Run Manifest.
- Within each project directory, a {ProjectName}_RunStats.json file and an HTML QC report are available for each project and use project-aggregated samples. Project-level files exclude run-specific values.
- The default directory structure has changed and now includes project folders that group samples. Use the --no-projects optional argument to retain the original output directory structure.
- For Cloudbreak chemistry sequencing runs, you can customize the filter mask with the --filter-mask option to adjust which reads pass filter during base calling.
Resolved Issues
- Previously, if one read was greater than 48 cycles, the other read was less than 32 cycles, and paired-end adapter trimming was used, Bases2Fastq failed during execution. Bases2Fastq now displays an error when cycle constraints are violated for the selected adapter trimming type.
Known Issues
- When you specify the --filter-mask option with the --qc-only or --demux-only optional parameters, the software fails to apply the custom filter mask to the output dataset.
- When you specify the --filter-mask option with no cycles for a single read (e.g., R1:Y15N*-R2:N*), the execution fails.
v1.4.0
April 27, 2023
The Bases2Fastq v1.4.0 Software is available. This version is required for all AVITI instrument runs performed on AVITI OS v2.0.0 or later.
New Features
The software supports the output data format of AVITI OS v2.0.0. If you use AVITI OS v2.0.0 or later, you must upgrade to Bases2Fastq v1.4.0 to process the new output data structure.
- AVITI OS v2.0.0 or later archives the base call files for each cycle into a zip archive, reducing the total number of output files compared to previous AVITI OS versions.
- AVITI OS v2.0.0 also introduces a new structure for input data files. Run manifests no longer contain the [RunParameters] section; run parameters are instead written to a RunParameters.json file.
- Bases2Fastq recognizes both data formats.
When the run manifest does not specify adapter sequences and adapter trimming is enabled, Bases2Fastq detects adapter sequences by default. Both R1 and R2 must have at least 48 cycles to use this feature. Use the --detect-adapters option to override any adapter sequences in the run manifest.
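The conditions for automatic adapter detection can be summarized as a single predicate. This is a minimal sketch that encodes only the rules stated above: detection runs when the manifest has no adapter sequences and trimming is enabled, or when --detect-adapters is given, and in either case both reads need at least 48 cycles. The function and parameter names are hypothetical, and the assumption that --detect-adapters also requires trimming to be enabled is not stated in these notes.

```python
MIN_DETECTION_CYCLES = 48  # R1 and R2 must each have at least 48 cycles


def would_detect_adapters(r1_cycles: int, r2_cycles: int,
                          manifest_has_adapters: bool,
                          trimming_enabled: bool,
                          detect_adapters_flag: bool = False) -> bool:
    """Return True if automatic adapter detection applies (hypothetical model)."""
    # Assumption: detection is only relevant when adapter trimming is enabled.
    if not trimming_enabled:
        return False
    # Detection is the default without manifest adapters; the flag overrides them.
    triggered = detect_adapters_flag or not manifest_has_adapters
    enough_cycles = min(r1_cycles, r2_cycles) >= MIN_DETECTION_CYCLES
    return triggered and enough_cycles
```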
Improvements
- In addition to individual lanes, the Metrics.csv output file reports the full run summary of both lanes (1+2).
Resolved Issues
- When Bases2Fastq runs in an Amazon EC2 instance that uses a detected IAM role as the credential, Bases2Fastq supports AWS_REGION and AWS_DEFAULT_REGION. Role credential resolution within an EC2 instance works as expected, rather than failing to resolve the region because of an update to the Amazon region metadata.
- The IndexAssignment.csv output file reports the complete run yield, rather than a single lane yield.
- Console output no longer uses color encoding, which is not compatible with some operating systems and with terminals that do not display color. Removing color, rather than adding dependencies to support it, provides a smoother pipeline experience in programmatic compute environments.
Known Issues
- If one read is greater than 48 cycles, the other read is less than 32 cycles, and paired-end adapter trimming is used, Bases2Fastq fails during execution. As a workaround, single-end type adapter trimming can be used instead.
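Before choosing an adapter trimming type, a pipeline can check whether a run's cycle counts fall into the combination described in this known issue. The following sketch encodes only that stated condition; the function name and the "paired-end" and "single-end" strings are hypothetical labels, not Bases2Fastq argument values.

```python
def hits_paired_end_trimming_issue(r1_cycles: int, r2_cycles: int,
                                   trimming_type: str) -> bool:
    """Return True for the cycle combination described in the known issue."""
    if trimming_type != "paired-end":
        # The documented workaround: single-end trimming avoids the failure.
        return False
    # One read longer than 48 cycles while the other is shorter than 32.
    r1_long = r1_cycles > 48 and r2_cycles < 32
    r2_long = r2_cycles > 48 and r1_cycles < 32
    return r1_long or r2_long
```

Checking both orientations matters because the notes describe "one read" and "the other read" without specifying which is R1.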
v1.3.0
October 20, 2022
New Features
- The software detects the orientation of index sequences during demultiplexing instead of using the orientation specified in RunManifest.csv. If the detected orientation differs from the run manifest, the statistics output file indicates the detected orientation. This makes demultiplexing more robust by reducing the impact of user errors in the run manifest.
- A new statistics file, IndexingAssignment.csv, reports the indexing metrics for each sample pair.
- A new optional argument, --qc-only, reports statistics without generating any FASTQ files.
- The software supports rclone, a command-line program that manages files in cloud storage, for input and output remotes, so it can interact with cloud storage from various providers. Element has not tested all rclone configurations.
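One simple way to detect index orientation is to count how many observed index reads match the manifest sequence exactly and how many match its reverse complement. The sketch below shows that idea under clearly simplified assumptions: exact matching only, a single index, and hypothetical function names. It is an illustration of the general technique, not the method Bases2Fastq uses, which these notes do not describe.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def detect_index_orientation(observed: Iterable[str], manifest_index: str) -> str:
    """Return "forward" or "reverse-complement", whichever matches more reads.

    Exact matching only; ties (including palindromic indexes) return "forward".
    """
    counts = Counter(read.upper() for read in observed)
    forward = counts[manifest_index.upper()]
    reverse = counts[reverse_complement(manifest_index)]
    return "reverse-complement" if reverse > forward else "forward"
```

A production demultiplexer would also tolerate sequencing errors in the index reads, which this sketch deliberately ignores.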
Resolved Issues
- To reduce download time when executing Bases2Fastq on a subset of data, Bases2Fastq downloads only the data selected by the filtering options, rather than the complete dataset.
- If RunParameters.json is missing FlowCellID, Bases2Fastq reverts to using the RunID as the flow cell identifier to prevent blanks in FASTQ headers and unidentified flow cells.
- If RunParameters.json is missing RunName, Bases2Fastq reverts to using the RunID as the run name to prevent blanks in FASTQ headers.
- If any field in the FASTQ header contains added whitespace, Bases2Fastq trims the whitespace to prevent processing errors in the FASTQ headers.
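The three fixes above combine into a simple resolution rule for header fields. The following sketch applies them to a dictionary loaded from RunParameters.json: FlowCellID and RunName fall back to the run ID when absent, and whitespace is trimmed. The function name is hypothetical, the run ID is passed in because these notes do not say where Bases2Fastq reads it from, and treating a whitespace-only value as missing is an assumption.

```python
from typing import Dict, Mapping


def header_fields(run_params: Mapping[str, object], run_id: str) -> Dict[str, str]:
    """Resolve FlowCellID and RunName for FASTQ headers (hypothetical model)."""
    def clean(key: str) -> str:
        value = run_params.get(key)
        # Trim whitespace; a missing key becomes an empty string.
        return str(value).strip() if value is not None else ""

    fallback = run_id.strip()
    # Assumption: a whitespace-only value is treated the same as a missing one.
    return {
        "FlowCellID": clean("FlowCellID") or fallback,
        "RunName": clean("RunName") or fallback,
    }
```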
Known Issues
- If one read is greater than 48 cycles, the other read is less than 32 cycles, and paired-end adapter trimming is used, Bases2Fastq fails during execution. As a workaround, single-end type adapter trimming can be used instead.