Introduction
During a sequencing run, the Element AVITI™ System records base calls and associated quality scores (Q scores) in .bases files. Bases2Fastq operates off-instrument through a command-line interface (CLI) and converts the .bases files into the FASTQ file format for secondary analysis with the FASTQ-compatible software of your choice.
Analysis begins with demultiplexing, which identifies each sample by the index sequences and assigns polonies to that sample. If samples are not indexed, Bases2Fastq skips demultiplexing and assigns all polonies to one sample. The software converts the demultiplexed bases into FASTQ files, generating one FASTQ file per read (e.g., Read 1 or Read 2) per sample.
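The demultiplexing step described above can be illustrated with a minimal Python sketch. It is not the Bases2Fastq implementation: the input layout (tuples of index read and Read 1 sequence), the `max_mismatches` tolerance, and the `DefaultSample` and `Unassigned` labels are all assumptions chosen for illustration.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(polonies, sample_indexes, max_mismatches=1):
    """Group Read 1 sequences by sample (illustrative only).

    polonies: iterable of (index_read, read_1) tuples (hypothetical layout).
    sample_indexes: dict mapping sample name to expected index sequence.
    """
    assigned = defaultdict(list)
    # No indexes: every polony is assigned to a single sample.
    if not sample_indexes:
        assigned["DefaultSample"] = [read for _, read in polonies]
        return dict(assigned)
    for index_read, read in polonies:
        matches = [
            name
            for name, expected in sample_indexes.items()
            if len(expected) == len(index_read)
            and hamming(expected, index_read) <= max_mismatches
        ]
        # Assign only when exactly one sample matches; ambiguous or
        # unmatched polonies go to a hypothetical "Unassigned" bin.
        key = matches[0] if len(matches) == 1 else "Unassigned"
        assigned[key].append(read)
    return dict(assigned)


polonies = [("ACGT", "AAAA"), ("ACGA", "CCCC"), ("TTTT", "GGGG"), ("GGGG", "TTTT")]
groups = demultiplex(polonies, {"S1": "ACGT", "S2": "TTTT"})
for sample, reads in groups.items():
    print(sample, reads)
```

Each resulting group would then be written to its own FASTQ file, one file per read per sample.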
Bases2Fastq includes the following features:
- Demultiplexing: Identify sequencing libraries by index sequence and generate FASTQ files
- Native Adapter Trimming: Trim adapter sequences during FASTQ generation, including the automated detection of adapter sequences
- QC Report: Generate HTML quality control (QC) reports that summarize run and sample quality
- Unique Molecular Identifier (UMI): Generate UMI FASTQ files
Setting Up Automatic FASTQ Generation in ElemBio Cloud
While sequencing run data can be demultiplexed by Bases2Fastq in any local or cloud compute environment, Element provides options to automate FASTQ generation using ElemBio Cloud. The following cloud providers can be used to automate FASTQ generation on run completion in ElemBio Cloud:
- ElemBio Catalyst, a native data storage and analysis add-on within ElemBio Cloud.
- AWS HealthOmics, supported by your own AWS account.
- DNAnexus, supported by your own DNAnexus account.
Run Manifest
A run manifest is a CSV file that specifies demultiplexing settings, FASTQ file settings, and sample information. By default, Bases2Fastq uses the run manifest that the AVITI System outputs into the run folder (RunManifest.csv).
You can execute Bases2Fastq with the original run manifest from a sequencing run or an alternate corrected manifest. Optional arguments provided at execution time override run manifest settings.
For complete information on run manifests, including preparation instructions and use cases for a corrected run manifest, see the Sequencing Run Manifest Documentation.
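As a rough illustration of running Bases2Fastq with a corrected manifest, the following Python sketch assembles a command line. The positional order (run folder, then output folder), the `--run-manifest` flag name, and all paths are assumptions, so check the CLI help before relying on them. Only `--detect-adapters` is named in this document, in the Adapter Trimming section.

```python
import shlex
from typing import List, Optional


def build_command(
    run_dir: str,
    out_dir: str,
    manifest: Optional[str] = None,
    detect_adapters: bool = False,
) -> List[str]:
    """Assemble a Bases2Fastq argument list (illustrative only)."""
    # Assumed layout: bases2fastq [options] <run folder> <output folder>
    cmd = ["bases2fastq"]
    if manifest:
        # Assumed flag name for an alternate run manifest.
        cmd += ["--run-manifest", manifest]
    if detect_adapters:
        # Optional argument named in the Adapter Trimming section.
        cmd.append("--detect-adapters")
    cmd += [run_dir, out_dir]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_command(
    "/data/run_001",
    "/data/run_001_fastq",
    manifest="/data/CorrectedManifest.csv",
)
print(shlex.join(cmd))
# To execute the command: subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.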
Adapter Trimming
Library prep adds Read 1 and Read 2 adapters to each sample. When the length of Read 1 or Read 2 exceeds the length of the DNA insert, the run sequences into the adapter. Adapter trimming removes the adapter sequences from the 3' end of each read to prevent adapter-based errors in certain analyses.
Run manifest settings enable adapter trimming and specify adapter sequences. When adapter trimming is enabled in the run manifest, Bases2Fastq automatically detects and trims adapter sequences if the run manifest contains no adapter values or if the execution uses the --detect-adapters optional argument. Adapter sequence detection requires a passing filter (PF) rate greater than 70% to select reference regions during detection. If you are using the Individually Addressable Lanes add-on, automatic detection leverages data from each lane. Alternatively, you can explicitly specify the adapters using the adapter values (R1Adapter and R2Adapter) in the run manifest.
[Figure: Trimming adapter sequences from Read 1 and Read 2]
For more information on adapter trimming settings, see the Run Manifest Documentation.
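The rule for when adapters are detected versus taken from the run manifest can be written as a small decision function. This is a sketch of the rule as stated in this section, not Bases2Fastq source code. The function name and return labels are hypothetical, and treating "no adapter values" as "both R1Adapter and R2Adapter are empty" is an assumption.

```python
from typing import Optional


def adapter_source(
    trimming_enabled: bool,
    r1_adapter: Optional[str],
    r2_adapter: Optional[str],
    detect_flag: bool,
) -> str:
    """Return where adapter sequences come from: 'none', 'detected', or 'manifest'."""
    if not trimming_enabled:
        # Trimming is off, so no adapter sequences are needed.
        return "none"
    # Assumption: "no adapter values" means both manifest values are empty.
    no_manifest_adapters = not r1_adapter and not r2_adapter
    if detect_flag or no_manifest_adapters:
        # --detect-adapters, or a manifest without adapter values, triggers detection.
        return "detected"
    return "manifest"


print(adapter_source(True, None, None, detect_flag=False))
print(adapter_source(True, "ACGT", "TGCA", detect_flag=False))
print(adapter_source(True, "ACGT", "TGCA", detect_flag=True))
```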
Paired-End versus Single-End Trimming
Bases2Fastq includes paired-end and single-end adapter trimming. When a sample includes insertions and deletions (indels), the software accurately trims adapters that are as short as one base.
For single-end trimming, the software determines where the adapter starts by matching the expected adapter sequences to each position in a read. Single-end adapter trimming individually processes each read, removing the adapter sequences without alignment.
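The position-by-position matching used in single-end trimming can be sketched as follows. This is a simplified illustration and not the Bases2Fastq algorithm: it ignores indels and quality scores, and the `max_mismatch_rate` and `min_overlap` parameters are assumptions. The adapter and read sequences are toy values.

```python
def find_adapter_start(
    read: str,
    adapter: str,
    max_mismatch_rate: float = 0.1,
    min_overlap: int = 3,
) -> int:
    """Return the first position where the adapter appears to start.

    Returns len(read) when no adapter is found.
    """
    for start in range(len(read) - min_overlap + 1):
        # Compare the adapter prefix with the read from this position on.
        overlap = min(len(adapter), len(read) - start)
        mismatches = sum(read[start + i] != adapter[i] for i in range(overlap))
        if mismatches <= max_mismatch_rate * overlap:
            return start
    return len(read)


def trim_single_end(read: str, quals: str, adapter: str):
    """Cut the read and its quality string at the detected adapter start."""
    cut = find_adapter_start(read, adapter)
    return read[:cut], quals[:cut]


# Toy example: a 10-base insert followed by the first 5 adapter bases.
adapter = "AGGTTAGG"
read = "CCCCCCCCCC" + adapter[:5]
trimmed, trimmed_quals = trim_single_end(read, "I" * len(read), adapter)
print(trimmed, trimmed_quals)
```

A partial adapter at the 3' end is matched against only the bases that remain in the read, which is why short adapter fragments are harder to trim confidently with this approach.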
For paired-end trimming, the software considers data from the expected adapter sequences and compares Read 1 and Read 2 data to determine where the adapters start. Paired-end adapter trimming aligns the Read 1 and Read 2 inserts to accurately trim short adapters.
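The idea behind paired-end trimming is that when the insert is shorter than the reads, the insert portion of Read 1 is the reverse complement of the insert portion of Read 2, and everything beyond it is adapter. The sketch below finds the insert length from that overlap alone. It is a simplified illustration under stated assumptions, not the Bases2Fastq algorithm: it ignores indels and the expected adapter sequences, and `min_insert` and `max_mismatch_rate` are hypothetical parameters.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_insert_length(
    read1: str,
    read2: str,
    min_insert: int = 8,
    max_mismatch_rate: float = 0.1,
) -> Optional[int]:
    """Return the insert length if it is shorter than both reads, else None."""
    limit = min(len(read1), len(read2))
    # Try longer candidates first: a longer overlap is stronger evidence.
    for length in range(limit - 1, min_insert - 1, -1):
        candidate = reverse_complement(read2[:length])
        mismatches = sum(a != b for a, b in zip(read1[:length], candidate))
        if mismatches <= max_mismatch_rate * length:
            return length
    return None


def trim_paired_end(read1: str, read2: str) -> Tuple[str, str]:
    """Trim both reads to the insert when read-through is detected."""
    length = find_insert_length(read1, read2)
    if length is None:
        return read1, read2
    return read1[:length], read2[:length]


# Toy example: a 12-base insert with 4 bases of read-through on each read.
insert = "AACCACACAACC"
r1 = insert + "GGGG"
r2 = reverse_complement(insert) + "CCCC"
print(trim_paired_end(r1, r2))
```

Because the insert boundary is located from the agreement between the two reads, even a one-base adapter remnant can be removed, which a pure adapter-sequence match cannot do reliably.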
Paired-end adapter trimming is more accurate but requires that Read 1 and Read 2 each include at least 17 cycles. Single-end adapter trimming supports applications that do not meet this requirement. Neither type of adapter trimming increases the run time.
License Agreement
Use of Bases2Fastq is subject to the license agreement available at the Element Biosciences website.