Settings
Settings specify details for Bases2Fastq processes, such as demultiplexing and adapter trimming. The following sections describe the available settings and their default values. Settings that use a Boolean data type accept the case-insensitive values true, false, t, f, 0, and 1. The values true, t, and 1 indicate true, and false, f, and 0 indicate false.
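As a minimal sketch of the Boolean rule above, the following Python function accepts exactly the six case-insensitive spellings and rejects everything else. The function name `parse_manifest_bool` is hypothetical. Trimming surrounding whitespace is also an assumption, not documented Bases2Fastq behavior.

```python
# Accepted spellings, compared case-insensitively (from the Settings description).
_TRUE_VALUES = {"true", "t", "1"}
_FALSE_VALUES = {"false", "f", "0"}


def parse_manifest_bool(value: str) -> bool:
    """Convert a Settings Boolean string to a Python bool.

    Assumption: surrounding whitespace is ignored. Any other spelling,
    such as "yes", raises ValueError.
    """
    normalized = value.strip().lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError(f"not a valid Boolean setting value: {value!r}")


print(parse_manifest_bool("FALSE"), parse_manifest_bool("T"))  # False True
```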
Columns
The Settings section includes SettingName and Value columns and an optional Lane column.
Column | Constraints | Description |
---|---|---|
SettingName | Required | The name of the setting |
Value | Required | The value applied to the setting |
Lane | Optional | The lane to restrict the setting to: 1, 2, or 1+2 (default) |
Base Masks
A base mask specifies a set of cycles for a particular operation in Bases2Fastq. A series of operators indicates which cycles are included in the base mask. A positive integer or asterisk follows each operator to specify the applicable cycles.
- A Y (yes) operator indicates that a cycle is included in the mask.
- An N (no) operator indicates that a cycle is excluded from the mask.
- A positive integer indicates the number of cycles to include or exclude.
- An asterisk matches any remaining cycles in the read.
For example, Y4N* creates a base mask for the first four cycles in a read. The base mask N3Y2N* excludes the first three cycles of a read, includes the fourth and fifth cycles, and excludes all remaining cycles.
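To make the operator rules concrete, the following Python sketch expands one read section of a base mask into a per-cycle include/exclude list. It is illustrative, not Bases2Fastq's parser. The function name `expand_base_mask` is hypothetical, and two rules are assumptions inferred from the examples in this section: a bare operator (like the trailing N in R1:Y*N) stands for one cycle, and a section contains at most one asterisk.

```python
import re
from typing import List, Optional, Tuple

# One operator (Y = include, N = exclude) followed by a positive integer, an
# asterisk, or nothing. Assumption: a bare operator, such as the final N in the
# default mask R1:Y*N, covers exactly one cycle.
TOKEN = re.compile(r"([YN])([1-9][0-9]*|\*)?")


def expand_base_mask(ops: str, read_length: int) -> List[bool]:
    """Expand one read section (without its "R1:"-style prefix) into per-cycle flags.

    Element i is True when cycle i + 1 is included in the mask.
    """
    tokens: List[Tuple[bool, Optional[int]]] = []  # (include, count); None means "*"
    pos = 0
    while pos < len(ops):
        match = TOKEN.match(ops, pos)
        if match is None:
            raise ValueError(f"cannot parse {ops!r} at position {pos}")
        op, count = match.groups()
        if count == "*":
            tokens.append((op == "Y", None))
        else:
            tokens.append((op == "Y", int(count) if count else 1))
        pos = match.end()

    fixed = sum(n for _, n in tokens if n is not None)
    stars = sum(1 for _, n in tokens if n is None)
    if stars > 1:
        raise ValueError("this sketch allows at most one asterisk per read section")
    remaining = read_length - fixed  # cycles left for the asterisk to absorb
    if remaining < 0 or (stars == 0 and remaining != 0):
        raise ValueError(f"{ops!r} covers {fixed} cycles but the read has {read_length}")

    flags: List[bool] = []
    for include, n in tokens:
        flags.extend([include] * (remaining if n is None else n))
    return flags


included = [i + 1 for i, keep in enumerate(expand_base_mask("N3Y2N*", 8)) if keep]
print(included)  # [4, 5]
```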
Read Identifiers
A base mask can include read identifiers that restrict the mask to cycles for Index 1 (I1), Index 2 (I2), Read 1 (R1), or Read 2 (R2). Each read identifier is encoded as the abbreviated read name followed by a colon (e.g., R1:). If the base mask does not include a read identifier, Bases2Fastq uses a default read that depends on the base mask setting.
To specify one read for a base mask, start the base mask with the read identifier. If you are specifying multiple reads for a base mask, enter multiple read sections that each start with the read identifier. Separate each read section with a hyphen.
- Example base mask that applies to one read: I1:Y3N*
- Example base mask that applies to two reads: I1:Y3N*-I2:Y2N*
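A multi-read base mask can be split into its read sections along the lines of this Python sketch. The function `split_read_sections` and its `default_read` parameter are hypothetical. `default_read` stands in for the setting-dependent default read described above, and rejecting a repeated read identifier is an assumption.

```python
from typing import Dict

# Read identifiers listed in this section.
READ_IDS = ("I1", "I2", "R1", "R2")


def split_read_sections(mask: str, default_read: str) -> Dict[str, str]:
    """Map each read identifier in a base mask to its operator string.

    default_read is a hypothetical stand-in for the default read that
    Bases2Fastq picks when a section has no identifier.
    """
    sections: Dict[str, str] = {}
    for section in mask.split("-"):  # read sections are separated by hyphens
        read_id, sep, ops = section.partition(":")
        if not sep:  # no "XX:" prefix, so fall back to the default read
            read_id, ops = default_read, section
        if read_id not in READ_IDS:
            raise ValueError(f"unknown read identifier {read_id!r}")
        if read_id in sections:  # assumption: each read appears at most once
            raise ValueError(f"read {read_id} appears more than once")
        sections[read_id] = ops
    return sections


print(split_read_sections("I1:Y3N*-I2:Y2N*", default_read="R1"))
# {'I1': 'Y3N*', 'I2': 'Y2N*'}
```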
Cycle Lengths
A base mask must account for the full cycle length of a read, whether it includes all cycles or only a subset. A mask that includes a subset of cycles must still cover the remaining cycles. Otherwise, Bases2Fastq displays a validation error.
For example, if Read 1 consists of 30 cycles and you want a base mask for the first 15 cycles, you must end the base mask with the remaining number of cycles. The base mask R1:Y15N15 includes the first 15 cycles (Y15) of Read 1 (R1:) and excludes the remaining 15 cycles (N15). Alternatively, R1:Y15N* achieves the same result by using an asterisk to cover the remaining cycles.
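The cycle-length rule reduces to simple arithmetic. Without an asterisk, the explicit counts must sum to the read length. With an asterisk, they must not exceed it. The hypothetical `cycle_length_error` function below sketches that check. Its messages are illustrative and are not the text of Bases2Fastq's validation errors, and a bare operator is again assumed to count as one cycle.

```python
import re
from typing import Optional

# A read section is one or more Y/N operators, each followed by a positive
# integer, an asterisk, or nothing (assumed here to mean one cycle).
SECTION = re.compile(r"(?:[YN](?:[1-9][0-9]*|\*)?)+")
TOKEN = re.compile(r"[YN]([1-9][0-9]*|\*)?")


def cycle_length_error(section: str, read_length: int) -> Optional[str]:
    """Return a message if the section does not cover the read exactly, else None."""
    ops = section.rpartition(":")[2]  # "R1:Y15N15" -> "Y15N15"
    if not SECTION.fullmatch(ops):
        return f"{section}: not a valid base mask"
    counts = [m.group(1) for m in TOKEN.finditer(ops)]
    explicit = sum(int(c) if c else 1 for c in counts if c != "*")
    if "*" in counts:
        # The asterisk absorbs whatever the explicit counts leave over.
        if explicit > read_length:
            return f"{section}: {explicit} explicit cycles exceed the {read_length}-cycle read"
        return None
    if explicit != read_length:
        return f"{section}: covers {explicit} of {read_length} cycles"
    return None


for mask in ("R1:Y15N15", "R1:Y15N*", "R1:Y15N14"):
    print(mask, cycle_length_error(mask, 30))
```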
Base Mask Settings
Setting | Value | Default |
---|---|---|
R1FastQMask | A base mask that defines which cycles to record in the Read 1 FASTQ file | R1:Y*N |
R2FastQMask | A base mask that defines which cycles to record in the Read 2 FASTQ file | R2:Y*N |
I1Mask | A base mask that defines which cycles to use for Index 1 demultiplexing¹ | |
I2Mask | A base mask that defines which cycles to use for Index 2 demultiplexing¹ | |
UmiMask | A base mask that defines which cycles sequence the unique molecular identifier (UMI). The following details apply to UmiMasks: | I1:N* |

¹ No indexing indicates that indexed libraries are missing or each lane contains only one unindexed library.
Example Base Masks
The following table provides scenarios and example base masks.
Scenario | Base Mask |
---|---|
Create a base mask that includes the first two cycles of Read 1. | R1:Y2N* |
Create a base mask that includes the fourth and fifth cycles of the default read. | N3Y2N* |
Create a base mask that includes all but the first two and last two cycles of Index 1. | I1:N2Y*N2 |
Create a base mask that includes all but the last cycle of Read 1 and Read 2. | R1:Y*N-R2:Y*N |
Use a base mask for a library that recommends using only 28 cycles of Read 1. | R1:Y28N* |
Use base masks for a 7-base UMI that is in line with Read 1. | |
Use base masks for an 8-base UMI that is in line with Index 1. | |
Set up base masks for a single-index library that requires Index 2 FASTQ files for secondary analysis. | |
UMI, Index, and Control Settings
Setting | Value | Default |
---|---|---|
UmiFastQ | A Boolean value that specifies whether to generate a UMI FASTQ file. When true, Bases2Fastq generates a UMI FASTQ file based on the UmiMask setting. | False |
I1FastQ | A Boolean value that specifies whether to generate an Index 1 (I1) FASTQ file. When true, Bases2Fastq generates an I1 FASTQ file based on the I1Mask setting. | False |
I2FastQ | A Boolean value that specifies whether to generate an Index 2 (I2) FASTQ file. When true, Bases2Fastq generates an I2 FASTQ file based on the I2Mask setting. | False |
I1MismatchThreshold | An integer from 0 through 2 that specifies the number of mismatches Bases2Fastq allows when demultiplexing the Index 1 sequence¹ | 1 |
I2MismatchThreshold | An integer from 0 through 2 that specifies the number of mismatches Bases2Fastq allows when demultiplexing the Index 2 sequence¹ | 1 |
SpikeInAsUnassigned | A Boolean value that specifies whether to categorize PhiX Control Library reads as unassigned: | True or false |
¹ A mismatch is a base in the observed index read that differs from the expected index sequence. The threshold is the number of mismatched bases that Bases2Fastq tolerates.
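The mismatch threshold can be illustrated with a Hamming-distance comparison. This Python sketch is a simplified stand-in, not Bases2Fastq's demultiplexing algorithm. It checks a single index, the names `hamming` and `assign_sample` and the sample names are hypothetical, and reporting a read that falls within the threshold of several samples as unassigned is an assumption.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed: str, expected: Dict[str, str], threshold: int = 1) -> str:
    """Return the sample whose index is within `threshold` mismatches of `observed`.

    Assumption: a read within the threshold of zero samples, or of more than one
    sample, is reported as "Unassigned".
    """
    if not 0 <= threshold <= 2:
        raise ValueError("the mismatch threshold must be an integer from 0 through 2")
    hits = [name for name, index in expected.items() if hamming(observed, index) <= threshold]
    return hits[0] if len(hits) == 1 else "Unassigned"


samples = {"SampleA": "ACGTACGT", "SampleB": "TTGCAAGC"}  # hypothetical indexes
print(assign_sample("ACGTACGA", samples))               # SampleA (1 mismatch)
print(assign_sample("ACGTACGA", samples, threshold=0))  # Unassigned
```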
Adapter Trimming
Library prep adds Read 1 and Read 2 adapters to each sample. When the length of Read 1 or Read 2 exceeds the length of the DNA insert, the run sequences into the adapter. Adapter trimming removes the adapter sequences from the 3' end of each read to prevent adapter-based errors in certain analyses.
Run manifest settings enable adapter trimming and specify the options. When adapter trimming is enabled, Bases2Fastq automatically detects and trims adapter sequences if the run manifest contains no adapter values or the execution uses the --detect-adapters optional argument. For more information, see the Bases2Fastq Documentation.
Figure 3: Trimming adapter sequences from Read 1 and Read 2
Paired-End versus Single-End
Bases2Fastq includes paired-end and single-end adapter trimming. Paired-end adapter trimming aligns the Read 1 and Read 2 inserts to accurately trim short adapters. Even when a sample includes insertions and deletions (indels), paired-end trimming accurately trims adapters as short as one base. Single-end adapter trimming processes each read individually, removing the adapter sequences without alignment.
Paired-end adapter trimming is more accurate but requires that Read 1 and Read 2 each include at least 17 cycles. Single-end adapter trimming supports applications that do not meet this requirement. Neither type of adapter trimming increases the run time.
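The idea behind paired-end trimming can be sketched by comparing the start of Read 1 with the reverse complement of Read 2. When the insert is shorter than the reads, both reads begin with the same insert (one forward, one reverse-complemented), so the longest agreeing overlap gives the insert length and everything after it is adapter. This Python sketch is a simplification, not Bases2Fastq's implementation. It tolerates only substitutions, not indels, and the function names, the `min_insert` cutoff, and the 10% mismatch allowance are hypothetical.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def estimate_insert_length(read1: str, read2: str, min_insert: int = 5,
                           max_mismatch_fraction: float = 0.1) -> Optional[int]:
    """Find the longest L for which read1[:L] matches reverse_complement(read2[:L]).

    Starts at one base shorter than the reads (one base of adapter) and returns
    None when no read-through into the adapter is detected.
    """
    n = min(len(read1), len(read2))
    for length in range(n - 1, min_insert - 1, -1):
        a = read1[:length]
        b = reverse_complement(read2[:length])
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_fraction * length:
            return length
    return None


def trim_pair(read1: str, read2: str) -> Tuple[str, str]:
    """Cut both reads back to the estimated insert length, if one is found."""
    length = estimate_insert_length(read1, read2)
    if length is None:
        return read1, read2
    return read1[:length], read2[:length]


# An 8-base insert (GATTACAG) followed by 4 adapter bases in each read.
print(trim_pair("GATTACAGAGAT", "CTGTAATCAGAT"))  # ('GATTACAG', 'CTGTAATC')
```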
Default Adapter Sequences
The default R1Adapter and R2Adapter values for the Adept Workflow are blank. Consult the third-party library prep documentation for adapter trimming recommendations. If you do not specify values, Read 1 and Read 2 must each include at least 48 cycles. Otherwise, Bases2Fastq cannot detect and trim the adapters.
For the Elevate Workflow, the following sequences are the default values:
- R1Adapter—5' ATGTCGGAAGGTGTGCAGGCTACCGCTTGTCAACT 3'
- R2Adapter—5' ATGTCGGAAGGTGTCTGGTGAGCCAATCCAGCACG 3'
Adapter Trimming Settings
Setting | Value | Default |
---|---|---|
AdapterTrimType | A value of Paired-End or Single-End to specify the type of adapter trimming to perform | Paired-End |
R1AdapterTrim | A Boolean value that specifies whether to trim the adapter sequence from Read 1 | False |
R2AdapterTrim | A Boolean value that specifies whether to trim the adapter sequence from Read 2 | False |
R1Adapter | The adapter sequence to trim from Read 1. Valid values are A, C, G, N, and T. Separate multiple entries with a hyphen or a plus sign (e.g., ATTCCGGGGAATTTGCAT-CGGATTTTGCATT or ATTCCGGGGAATTTGCAT+CGGATTTTGCATT). | See Adapter Trimming |
R2Adapter | The adapter sequence to trim from Read 2. Valid values are A, C, G, N, and T. Separate multiple entries with a hyphen or a plus sign (e.g., ATTCCGGGGAATTTGCAT-CGGATTTTGCATT or ATTCCGGGGAATTTGCAT+CGGATTTTGCATT). | See Adapter Trimming |
R1AdapterNMask | A Boolean value that specifies whether to mask each base in the Read 1 adapter sequence with an N. This N-masking is an alternative to adapter trimming. | False |
R2AdapterNMask | A Boolean value that specifies whether to mask each base in the Read 2 adapter sequence with an N. This N-masking is an alternative to adapter trimming. | False |
R1AdapterMinimumOverlap | An integer from 1 through the Read 1 length that specifies the minimum adapter overlap, in bases, required for single-end adapter trimming. Bases2Fastq does not trim adapters shorter than the value. | 3 |
R2AdapterMinimumOverlap | An integer from 1 through the Read 2 length that specifies the minimum adapter overlap, in bases, required for single-end adapter trimming. Bases2Fastq does not trim adapters shorter than the value. | The lesser value: |
R1AdapterMinimumStringency | A value 0-1 that specifies the fraction of bases that must match the Read 1 adapter sequence for single-end adapter trimming | 0.9 |
R2AdapterMinimumStringency | A value 0-1 that specifies the fraction of bases that must match the Read 2 adapter sequence for single-end adapter trimming | 0.9 |
R1AdapterMinimumTrimmedLength | An integer from 1 through the Read 1 length that specifies the minimum read length after adapter trimming. If a read is shorter than the value, Bases2Fastq removes the entire read, including the corresponding read from all FASTQ files. | The lesser value: |
R2AdapterMinimumTrimmedLength | An integer from 1 through the Read 2 length that specifies the minimum read length after adapter trimming. If a read is shorter than the value, Bases2Fastq removes the entire read, including the corresponding read from all FASTQ files. | The lesser value: |
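The single-end settings in the table above can be read as parameters of a 3' adapter search. The Python sketch below takes the minimum overlap, minimum stringency, and minimum trimmed length as plain arguments. It splits multi-entry adapter values on a hyphen or plus sign and offers N-masking as an alternative to trimming. It is not Bases2Fastq's algorithm. All function names are hypothetical, and treating an N in the adapter as a wildcard is an assumption. The `min_trimmed_length` default of 1 is a placeholder because the documented default is not reproduced here. The example read ends with the first 10 bases of the Elevate R1Adapter.

```python
import re
from typing import List, Optional

ADAPTER_ENTRY = re.compile(r"[ACGTN]+")


def parse_adapters(value: str) -> List[str]:
    """Split an adapter setting on '-' or '+' and check each entry's alphabet."""
    entries = re.split(r"[-+]", value.strip())
    for entry in entries:
        if not ADAPTER_ENTRY.fullmatch(entry):
            raise ValueError(f"invalid adapter entry {entry!r}")
    return entries


def adapter_start(read: str, adapter: str, min_overlap: int, stringency: float) -> Optional[int]:
    """Return the first read position where the adapter plausibly begins, or None.

    The overlap must be at least min_overlap bases, and at least `stringency`
    of the overlapping bases must match. Assumption: adapter N matches any base.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        matches = sum(
            a == "N" or a == r
            for a, r in zip(adapter[:overlap], read[start:start + overlap])
        )
        if matches >= stringency * overlap:
            return start
    return None


def find_adapter(read: str, adapter_value: str, min_overlap: int, stringency: float) -> Optional[int]:
    """Earliest adapter start position over all adapter entries, or None."""
    starts = [adapter_start(read, a, min_overlap, stringency) for a in parse_adapters(adapter_value)]
    found = [s for s in starts if s is not None]
    return min(found) if found else None


def trim_read(read: str, adapter_value: str, min_overlap: int = 3,
              stringency: float = 0.9, min_trimmed_length: int = 1) -> Optional[str]:
    """Cut the adapter from the 3' end; return None if the read becomes too short."""
    start = find_adapter(read, adapter_value, min_overlap, stringency)
    trimmed = read if start is None else read[:start]
    return trimmed if len(trimmed) >= min_trimmed_length else None


def n_mask_read(read: str, adapter_value: str, min_overlap: int = 3,
                stringency: float = 0.9) -> str:
    """Alternative to trimming: keep the read length but replace adapter bases with N."""
    start = find_adapter(read, adapter_value, min_overlap, stringency)
    return read if start is None else read[:start] + "N" * (len(read) - start)


ELEVATE_R1 = "ATGTCGGAAGGTGTGCAGGCTACCGCTTGTCAACT"
print(trim_read("GATTACAGATGTCGGAAG", ELEVATE_R1))  # GATTACAG
```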
Analysis Lane
Adding a Lane column to the Settings section restricts each setting to a specified lane. If you are not using the Individually Addressable Lanes add-on, you can use the column to divide samples and enable parallel analysis in secondary analysis software. The following values are valid:
- 1 for lane 1
- 2 for lane 2
- 1+2 for both lanes
If you omit the Lane column, Bases2Fastq applies all settings to both lanes.
```
[SETTINGS]
SettingName,Value,Lane
AdapterTrimType,Paired-End,1+2
R1AdapterTrim,FALSE,1
R1AdapterNMask,FALSE,1
R2AdapterTrim,FALSE,2
R2AdapterNMask,FALSE,2
```
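The lane rules above can be sketched as a small parser that collects [SETTINGS] rows into per-lane dictionaries, using only the standard `csv` module. The function name `settings_by_lane` is hypothetical. Stripping whitespace around cells and treating a blank Lane cell like 1+2 are assumptions, not documented Bases2Fastq behavior.

```python
import csv
import io
from typing import Dict


def settings_by_lane(manifest_text: str) -> Dict[int, Dict[str, str]]:
    """Collect [SETTINGS] rows into a dictionary of settings per lane.

    A Lane value of 1 or 2 applies to that lane; 1+2, a blank cell (assumption),
    or a missing Lane column applies to both lanes.
    """
    lanes: Dict[int, Dict[str, str]] = {1: {}, 2: {}}
    in_settings = False
    header = None
    for row in csv.reader(io.StringIO(manifest_text)):
        cells = [c.strip() for c in row]  # assumption: surrounding spaces are ignored
        if not any(cells):
            continue
        if cells[0].startswith("["):  # a section header such as [SETTINGS]
            in_settings = cells[0].upper() == "[SETTINGS]"
            header = None
            continue
        if not in_settings:
            continue
        if header is None:  # the first row of the section names the columns
            header = cells
            continue
        record = dict(zip(header, cells))
        lane = record.get("Lane", "") or "1+2"
        targets = {"1": [1], "2": [2], "1+2": [1, 2]}.get(lane)
        if targets is None:
            raise ValueError(f"invalid Lane value {lane!r}")
        for target in targets:
            lanes[target][record["SettingName"]] = record["Value"]
    return lanes


manifest = """[SETTINGS]
SettingName,Value,Lane
AdapterTrimType,Paired-End,1+2
R1AdapterTrim,FALSE,1
R2AdapterTrim,FALSE,2
"""
print(settings_by_lane(manifest)[2])
```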