Bases2Fastq

The Bases2Fastq Software demultiplexes sequencing data and converts base calls into FASTQ files for secondary analysis with the FASTQ-compatible software of your choice. The Element AVITI™ System records base calls, which are the main output of a sequencing run, with associated quality scores (Q-scores) in bases files. Bases files must be converted into the FASTQ file format for secondary analysis.

This workflow guide provides information on running Bases2Fastq, including system requirements, commands and arguments, and input and output files. Use is subject to the license available at go.elembio.link/eula.

Figure 1: Converting bases into FASTQ files

Software Features

Bases2Fastq runs off-instrument and has a command-line interface (CLI). A general run command executes the software. A Bases2Fastq run starts with demultiplexing, which identifies each sample by the index sequence and assigns polonies based on the sequence. If samples are not indexed, Bases2Fastq skips demultiplexing and assigns all polonies to one sample. Bases2Fastq converts the demultiplexed bases into FASTQ files, generating one FASTQ file per read (e.g., Read 1 or Read 2) per sample.

Arguments appended to the general run command let you adjust demultiplexing and FASTQ file generation to suit the application and leverage the following features.

For a complete list of options, see Optional Arguments.

Paired-end and single-end adapter trimming, including detection of adapter sequences
Base masking that includes or excludes cycles from FASTQ files
Index sequence orientation detection, which you can enable or disable
Quality control (QC) reports that open in a browser
Unique molecular identifier (UMI) support

Run Manifest

A run manifest is a comma-separated values (CSV) file that specifies demultiplexing settings, FASTQ file settings, and sample information. Bases2Fastq uses the run manifest to execute a run, by default accessing the output run manifest named RunManifest.csv in the run folder.

Run Bases2Fastq with the output run manifest or a prepared corrected version. You can also use Bases2Fastq arguments to override run manifest settings. The Run Manifest Workflow Guide (MA-00011) provides complete information, including preparation instructions.

Default Run Manifest

If you do not upload a prepared run manifest during run setup, the AVITI System creates a default run manifest based on the run parameters entered. A default run manifest does not contain index sequences and assigns all reads to one sample during FASTQ generation.

Demultiplexing indexed libraries is not possible with a default run manifest. If sequencing indexed libraries, prepare and upload a run manifest with index sequences to the AVITI System. Alternatively, you can create a corrected run manifest with all samples and their associated index sequences to use with Bases2Fastq after sequencing completes.

Corrected Run Manifest

To create a corrected run manifest, update the run manifest used for the sequencing run. Run Bases2Fastq with the corrected version. For an example, see Run a Corrected Run Manifest.

The following cases require a corrected run manifest:

You sequenced indexed libraries with a run manifest that does not include index sequences.
A run manifest failed or reported incorrect results due to incorrect settings or indexes, requiring reprocessing.
During sequencing, you did not upload a run manifest with indexed samples. Use a corrected run manifest with all samples and their associated index sequences to demultiplex.

Software and System Setup

Bases2Fastq is run as either a static binary executable or in a containerized execution with Docker. Therefore, setting up Bases2Fastq requires setting up Docker or the static binary on a computer that meets system requirements. The computer must also be configured to transfer input and output files.

note

Bases2Fastq cannot run on an Arm processor.

System Requirements

The computer running Bases2Fastq must meet the following requirements:

Operating system (OS)—Docker runs on any OS. Static binary requires Linux OS on an x86 architecture with glibc 2.17 or later installed. To verify the glibc version for static binary, run ldd --version.
Memory—Both Docker and static binary require 4 GB RAM per concurrent thread.
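The memory requirement puts a ceiling on how many threads a machine can support. The following sketch is an illustration only, not part of Bases2Fastq: it applies the 4 GB per thread rule to a machine size you supply. The function name and the example machine sizes are hypothetical.

```python
# Illustrative helper, not part of Bases2Fastq: estimate a thread ceiling
# from the 4 GB RAM per concurrent thread requirement.
GB_PER_THREAD = 4  # value taken from the system requirements


def max_threads_for_ram(total_ram_gb, cpu_count):
    """Return the largest thread count that both RAM and CPU cores allow."""
    by_memory = int(total_ram_gb // GB_PER_THREAD)
    return max(0, min(by_memory, cpu_count))


# Made-up machine sizes for demonstration.
print(max_threads_for_ram(64, 32))   # memory is the limiting resource
print(max_threads_for_ram(256, 48))  # CPU cores are the limiting resource
```

A result of this kind is a candidate value for the --num-threads argument; the actual maximum still depends on your system.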

Software performance depends on the resources dedicated to the processing environment. Element recommends running Bases2Fastq in the cloud on the Amazon m5dn.12xlarge EC2 instance, a virtual server, for optimal performance. Review instance specifications at aws.amazon.com/ec2/instance-types/m5/.

File Transfer and Storage

Store input and output files in a local location or the cloud using Amazon Web Services (AWS), Google Cloud Storage (GCS), or an rclone-compatible provider. Transferring files requires paths to the input and output locations. If the locations are both AWS or both GCS, they must use the same set of credentials.

When using cloud storage, Bases2Fastq downloads input files and stages output files in a temporary directory. The temporary directory must have approximately 800 GB of scratch space. By default, Bases2Fastq uses the temporary directory of the OS. To change the location of the temporary directory, set the environment variable TMPDIR. In Linux, run the command export TMPDIR="/path/to/scratch", replacing /path/to/scratch with the desired directory.
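Before starting a cloud-backed run, it can help to confirm that the temporary directory has enough free space. The following sketch is a hedged illustration that uses only the Python standard library. It assumes that the directory Python resolves (TMPDIR first, then the OS default) is the same one Bases2Fastq uses, which this guide does not confirm. The function names are hypothetical.

```python
import shutil
import tempfile

REQUIRED_GB = 800  # approximate scratch space stated in this guide


def scratch_free_gb(path=None):
    """Return free space, in GB, for the temporary directory or a given path."""
    # tempfile.gettempdir() honors TMPDIR, then falls back to the OS default.
    target = path or tempfile.gettempdir()
    return shutil.disk_usage(target).free / 1e9


def has_enough_scratch(path=None, required_gb=REQUIRED_GB):
    """Return True when the free space meets the required amount."""
    return scratch_free_gb(path) >= required_gb


print(f"Temporary directory: {tempfile.gettempdir()}")
print(f"Free space: {scratch_free_gb():.1f} GB")
print(f"Meets the scratch requirement: {has_enough_scratch()}")
```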

AWS Storage

AWS storage requires Uniform Resource Identifiers (URIs) to serve as paths to the Amazon Simple Storage Service (S3) buckets that contain the input and output files. Accessing the buckets requires AWS credentials configured for the setup.

For AWS with an EC2 instance, Bases2Fastq detects the credentials or Identity and Access Management (IAM) role associated with the instance.
For AWS without an EC2 instance, define AWS credentials with the following environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
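A quick pre-flight check can catch a missing variable before a run starts. The following sketch is illustrative only and is not part of Bases2Fastq. It reports which of the three variables listed above are unset or empty; the function name is hypothetical.

```python
import os

# Variable names taken from the list above.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS variables: " + ", ".join(missing))
else:
    print("All AWS variables are set.")
```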

GCS Storage

GCS storage requires Uniform Resource Locators (URLs) to serve as preconfigured paths to the Cloud Storage buckets that contain the input and output files. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to link to the file containing application credentials.

Rclone-Compatible Storage

The rclone file transfer service allows Bases2Fastq to access cloud storage locations. The service is compatible with many cloud storage providers. However, Element has not tested every available rclone provider.

Follow the instructions at rclone.org/install to download and install rclone. Configure rclone to communicate with your cloud storage per the storage provider-specific instructions at rclone.org/#providers.

Docker Setup

To download and install Docker, follow the OS-specific instructions at docs.docker.com/get-docker/. Element maintains a public registry at hub.docker.com/repository/docker/elembio/bases2fastq.

Static Binary Setup

Static binary setup requires downloading and extracting the static binary. Current and previous versions are available for installation.

Set Up the Current Version of Bases2Fastq

Download the static binary using one of the following methods:
- Visit go.elementbiosciences.com/bases2fastq-download and follow the onscreen prompts.
- Run the wget command:

wget https://bases2fastq-release.s3.amazonaws.com/bases2fastq-latest.tar.gz

- Run the curl command:

curl -O https://bases2fastq-release.s3.amazonaws.com/bases2fastq-latest.tar.gz

Run tar -xzvf bases2fastq-latest.tar.gz to extract the downloaded file.
Run ./bases2fastq --version to display the software version and confirm that Bases2Fastq is operational.
To generate HTML QC reports, run the following commands to install Python 3.6 or newer with NumPy, Bokeh, and bs4 packages:

sudo apt install python3 python3-pip libjpeg-dev zlib1g-dev
pip3 install numpy==1.\* bokeh==2.\* bs4==0.\*

The bs4 package requires Pillow, which in turn requires libjpeg-dev and zlib1g-dev. If you do not install Python 3.6 or newer or a package is missing, Bases2Fastq logs a warning and does not generate the report.

Set Up a Previous Version of Bases2Fastq

Visit go.elembio.link/documentation to review release notes for previous versions of Bases2Fastq.
Download the static binary using one of the following methods:

- Run the following wget command with a version number in place of <version>:

wget https://bases2fastq-release.s3.amazonaws.com/bases2fastq-<version>.tar.gz

- Run the following curl command with a version number in place of <version>:

curl -O https://bases2fastq-release.s3.amazonaws.com/bases2fastq-<version>.tar.gz

For example, to use wget to download Bases2Fastq v1.2.0, run the command wget https://bases2fastq-release.s3.amazonaws.com/bases2fastq-v1.2.0.tar.gz.

Run tar -xzvf bases2fastq-<version>.tar.gz to extract the downloaded file.
Run ./bases2fastq --version to display the software version and confirm that Bases2Fastq is operational.
To generate HTML QC reports, run the following commands to install Python 3.6 or newer with NumPy, Bokeh, and bs4 packages:

sudo apt install python3 python3-pip libjpeg-dev zlib1g-dev
pip3 install numpy==1.\* bokeh==2.\* bs4==0.\*

Running Bases2Fastq

Type the general run command that applies to your storage solution.

Storage	General Run Command
Amazon S3	`bases2fastq s3://bucket/input s3://bucket/output [options]`
GCS	`bases2fastq gs://bucket/input gs://bucket/output [options]`
Local	`bases2fastq /path/to/input /path/to/output [options]`
rclone	`bases2fastq path/to/input/remote path/to/output/remote --input-remote "input-name" --output-remote "output-name" [options]`

When using Docker, the general run command automatically pulls the image to your local environment.

Replace the input and output locations in the general run command.

Storage	Location to Replace	Replacement
Amazon S3	s3://bucket/input	URI to the Amazon S3 bucket that stores the input files
	s3://bucket/output	URI to the Amazon S3 bucket that stores the FASTQ files and other outputs
GCS	gs://bucket/input	URI to the Cloud Storage bucket that stores the input files
	gs://bucket/output	URI to the GCS bucket that stores the FASTQ files and other outputs
Local	/path/to/input	Directory where Bases2Fastq accesses the input files
	/path/to/output	Directory where Bases2Fastq writes the FASTQ files and other outputs
rclone	path/to/input/remote	Provider path to the rclone remote that points to the input files
	input-name	Name of the rclone remote that points to the input files
	path/to/output/remote	Provider path to the rclone remote that points to the FASTQ files and other outputs
	output-name	Name of the rclone remote that points to the location of the FASTQ files and other outputs

Adjust arguments in the general run command.
- To add arguments, replace [options] with any of the arguments listed in Optional Arguments.
- To run Bases2Fastq without any arguments, delete [options].
Press Enter to run Bases2Fastq. The terminal displays run progress.
Wait for the terminal to display the elapsed time, which indicates that processing is complete.
Access the output files in the location specified in the general run command.
- To view logs, go to info/Bases2Fastq.log.
- To view the HTML QC report, go to the output folder, double-click the file, and move through each tab.
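When runs are scripted, the general run command for local storage can be assembled as an argument list. The following sketch is one possible approach, not an official wrapper. It uses only the bases2fastq command name and the --num-threads, --run-manifest, and --settings arguments named in this guide. The function name and the example values are hypothetical.

```python
import shlex


def build_command(input_dir, output_dir, threads=None, run_manifest=None, settings=()):
    """Assemble a local-storage Bases2Fastq command as an argument list."""
    cmd = ["bases2fastq", input_dir, output_dir]
    if threads is not None:
        cmd += ["--num-threads", str(threads)]
    if run_manifest is not None:
        cmd += ["--run-manifest", run_manifest]
    # Each setting becomes its own --settings argument in "Name,Value" form.
    for name, value in settings:
        cmd += ["--settings", f"{name},{value}"]
    return cmd


cmd = build_command(
    "/path/to/input",
    "/path/to/output",
    threads=8,
    settings=[("I1Fastq", "True")],
)
print(shlex.join(cmd))  # review the command before running it
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Passing a list to subprocess.run avoids shell quoting problems with values such as base masks that contain asterisks.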

Optional Arguments

The following arguments are optional additions to the general run command. Arguments that affect run parameters default to a value recorded in RunParameters.json, which is an output of a sequencing run and an input for Bases2Fastq.

Argument	Description	Default
--chemistry-version	Overwrite the sequencing kit version. Valid values are 1 for a version 1 kit or Cloudbreak for a Cloudbreak™ kit.	RunParameters.json
--demux-only, -d	Enable demultiplexing-only mode, which performs demultiplexing and outputs index metrics without generating any FASTQ files.	False
--detect-adapters	Detect adapter sequences, overriding any sequences present in the run manifest.	False
--exclude-tile, -e	Use regular expression (regex) to specify a subset of tiles to exclude from processing (e.g., `L1.*C02S.`). To specify multiple subsets, enter the argument multiple times.	Not applicable
--flowcell-id	Overwrite the flowcell ID used for a run.	RunParameters.json
--force-index-orientation	Perform demultiplexing without detecting the index sequence orientation. When true, Bases2Fastq applies the orientation recorded in the run manifest.	False
--help, -h	Display the usage statement.	Not applicable
--i1-cycles	Overwrite the number of cycles in Index1.	RunParameters.json
--i2-cycles	Overwrite the number of cycles in Index2.	RunParameters.json
--include-tile, -i	Use regex to specify a subset of tiles for processing after excluding all tiles from processing (e.g., `--exclude-tile ".*" --include-tile "L1.*C02S."`). To specify multiple subsets, enter the argument multiple times.	Not applicable
--input-remote	Sets the rclone remote name that points to the input files.	Not applicable
--kit-configuration	Overwrite the kit configuration. Valid values are 150Cycles for a 2 x 75 kit or 300Cycles for a 2 x 150 kit.	RunParameters.json
--legacy-fastq	Apply a legacy file naming convention to FASTQ files, such as SampleName_S1_L001_R1_001.fastq.gz.	False
--log-level, -l	Specify the minimum level required to log an event: INFO, DEBUG, WARNING, or ERROR.	INFO
--num-threads, -p	Specify the number of threads to use for processing. The minimum value is 1, and the maximum value depends on your system.	1
--num-unassigned	Specify a value ≤ 1000 that indicates the maximum number of unassigned sequences to support.	30
--output-remote	Sets the rclone remote name that points to the location of the FASTQ files and other outputs.	Not applicable
--preparation-workflow	Overwrite the library prep workflow. Valid values are Adept and Elevate.	RunParameters.json
--qc-only	Enable QC-only mode, which generates a representative view of run metrics on one tile without generating any FASTQ files.	False
--r1-cycles	Overwrite the number of cycles in Read 1.	RunParameters.json
--r2-cycles	Overwrite the number of cycles in Read 2.	RunParameters.json
--run-manifest, -r	Overwrite the location of RunManifest.csv, which is the run manifest the run generated, with the path to another run manifest.	Not applicable
--settings	Overwrite run manifest settings by entering the argument for each setting you want to overwrite. For example: `--settings "I1Fastq,True"`, `--settings "I2Fastq,True"`, `--settings "I1Mask,I1:N3Y*"`	Not applicable
--skip-qc-report	Do not create an HTML QC report.	False
--split-lanes	Divide FASTQ files by flowcell lane.	False
--strict, -s	Enable strict mode, which prevents processing when any input files are invalid. When strict mode is off, Bases2Fastq warns you that files are missing or corrupt but continues processing.	False
--version, -v	Display the current version of Bases2Fastq. Bases2Fastq logs the version at the start of FASTQ file generation regardless of whether you include this argument.	Not applicable

The available settings are listed in Run Manifest Settings.

Additional Commands

Bases2Fastq supports the help and version arguments as standalone commands that you can run without the general run command.

Adapter Trimming

Library prep adds Read 1 and Read 2 adapters to each sample. When the length of Read 1 or Read 2 exceeds the length of the DNA insert, the run sequences into the adapter. Adapter trimming removes the adapter sequences from the 3′ end of each read to prevent adapter-based errors in certain analyses.

Run manifest settings enable adapter trimming and specify the options. When adapter trimming is enabled, Bases2Fastq automatically detects and trims adapter sequences regardless of whether the run manifest specifies the adapter sequences.

Figure 2: Trimming adapter sequences from Read 1 and Read 2

Paired-End versus Single-End

Bases2Fastq includes paired-end and single-end adapter trimming. Paired-end adapter trimming aligns the Read 1 and Read 2 inserts to accurately trim short adapters. When a sample includes insertions and deletions (indels), the software accurately trims adapters that are as short as one base. Single-end adapter trimming individually processes each read, removing the adapter sequences without alignment.

Paired-end adapter trimming is more accurate but requires that Read 1 and Read 2 each include at least 17 cycles. Single-end adapter trimming supports applications that do not meet this requirement. Neither type of adapter trimming increases the run time.

Default Adapter Sequences

The default R1Adapter and R2Adapter values for the Adept Workflow are blank. Consult the third-party library prep documentation for adapter trimming recommendations. If you do not specify values, Read 1 and Read 2 must each include at least 48 cycles. Otherwise, Bases2Fastq cannot detect and trim the adapters.

For the Elevate Workflow, the following sequences are the default values:

R1Adapter—5' ATGTCGGAAGGTGTGCAGGCTACCGCTTGTCAACT 3'
R2Adapter—5' ATGTCGGAAGGTGTCTGGTGAGCCAATCCAGCACG 3'

Base Masks

A base mask specifies a set of cycles for a demultiplexing operation. Within a base mask, a series of operators indicates whether cycles are masked. A positive integer or asterisk follows each operator to indicate which cycles to mask.

A Y (yes) operator indicates that a cycle is included in the mask.
An N (no) operator indicates that a cycle is excluded from the mask.
A positive integer indicates the number of cycles to include or exclude.
An asterisk matches any remaining cycles in the read.

For example, Y4N* masks the first four cycles in a read. The base mask N3Y2N* leaves the first three cycles of a read unmasked, masks the fourth and fifth cycles, and leaves the remaining cycles unmasked.
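To make the operator rules concrete, the following sketch expands a base mask without a read identifier into one Boolean per cycle, where True means the cycle is included in the mask (Y). It is an illustration written for this guide, not the parser that Bases2Fastq uses. The function name, the single-asterisk limit, and the error messages are assumptions.

```python
import re

# One operator (Y or N) followed by a positive integer or an asterisk.
TOKEN = re.compile(r"([YN])([1-9]\d*|\*)")


def expand_mask(mask, read_length):
    """Return a list with one Boolean per cycle; True marks a Y cycle."""
    tokens = TOKEN.findall(mask)
    # findall skips unmatched text, so rebuild the mask to detect stray characters.
    if not tokens or "".join(op + count for op, count in tokens) != mask:
        raise ValueError(f"unrecognized base mask: {mask!r}")
    wildcards = [count for _, count in tokens if count == "*"]
    if len(wildcards) > 1:
        raise ValueError("this sketch supports at most one asterisk")
    fixed = sum(int(count) for _, count in tokens if count != "*")
    remaining = read_length - fixed  # cycles left for the asterisk, if any
    if remaining < 0 or (not wildcards and remaining != 0):
        raise ValueError("base mask does not cover the full read length")
    cycles = []
    for op, count in tokens:
        span = remaining if count == "*" else int(count)
        cycles.extend([op == "Y"] * span)
    return cycles


print(expand_mask("Y4N*", 6))
print(expand_mask("N3Y2N*", 8))
```

The length check mirrors the requirement that a base mask account for every cycle in the read.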

Read Identifiers

A base mask can include read identifiers that restrict the mask to Index1, Index2, Read 1, or Read 2. Each read identifier is encoded as the abbreviated read name followed by a colon (e.g., R1:). If the base mask does not include a read identifier, Bases2Fastq uses a default read that depends on the following settings: I1Mask, I2Mask, R1FastQMask, R2FastQMask, and UmiMask.

To specify one read for a base mask, start the base mask with the read identifier. If you are specifying multiple reads for a base mask, enter multiple read sections that each start with the read identifier. Separate each read section with a plus sign.

Example base mask that applies to one read: I1:Y3N*
Example base mask that applies to two reads: I1:Y3N*+I2:Y2N*
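The read-section layout can be sketched as a small splitting routine. The following example is illustrative only and does not reproduce Bases2Fastq behavior. It separates a base mask into read sections at each plus sign and maps every read identifier to its mask. The function name and the default_read parameter are hypothetical.

```python
def split_read_sections(base_mask, default_read=None):
    """Map each read identifier to its mask, splitting sections on plus signs."""
    sections = {}
    for section in base_mask.split("+"):
        if ":" in section:
            read, mask = section.split(":", 1)
        else:
            # No read identifier: fall back to the caller-supplied default read.
            read, mask = default_read, section
        if read in sections:
            raise ValueError(f"read {read!r} appears more than once")
        sections[read] = mask
    return sections


print(split_read_sections("I1:Y3N*"))
print(split_read_sections("I1:Y3N*+I2:Y2N*"))
```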

Cycle Lengths

A base mask must define the full cycle length of a read, regardless of whether you mask select bases in the read or all bases. A read with select bases masked must still account for the remaining cycles. Otherwise, Bases2Fastq displays a validation error.

For example, if Read 1 consists of 30 bases and you want to mask the first 15, end the base mask with the remaining number of cycles. The base mask R1:Y15N15 masks the first 15 bases (Y15) of Read 1 (R1:) and leaves the remaining 15 bases unmasked (N15). Alternatively, R1:Y15N* achieves the same goal but uses an asterisk to cover the remaining number of cycles.

Example Base Masks

Base Mask	Result
`R1:Y2N*`	Matches the first two cycles of Read 1
`N3Y2N3`	Matches the fourth and fifth cycles of the default read
`I1:N2Y*N2`	Matches all but the first two and last two cycles of Index1
`R1:Y*N-R2:Y*N`	Matches all but the last cycles of Read 1 and Read 2

Example Run Commands

The following sections provide example run commands, which demonstrate how to perform the following functions:

Run Bases2Fastq in QC-only mode.
Run Bases2Fastq with a corrected run manifest. For more information, see Corrected Run Manifest.
Run Bases2Fastq with the settings argument to adjust run manifest settings. For a list of settings, see Run Manifest Settings.

Run QC-Only Mode

Static Binary

./bases2fastq /path/to/input /path/to/output --qc-only

Docker

docker run --rm -v </path/to/input>:/input -v </path/to/output>:/output bases2fastq /input /output --qc-only

Run a Corrected Run Manifest

Static Binary

./bases2fastq /path/to/input /path/to/output -r /path/to/corrected_manifest_filename.csv

Docker

docker run --rm -v </path/to/input>:/input -v </path/to/output>:/output bases2fastq /input /output -r /input/<corrected_manifest_filename.csv>

Adjust Run Manifest Settings

Static Binary

The following example generates a FASTQ file for Index1 and a UMI FASTQ file.

./bases2fastq /path/to/input /path/to/output --settings "I1Fastq,True" --settings "UmiFastQ,True"

The following example enables single-end adapter trimming for Read 2.

./bases2fastq /path/to/input /path/to/output --settings "R2AdapterTrim,True" --settings "AdapterTrimMode,Single-End"

Docker

The following example generates a FASTQ file for Index1 and a UMI FASTQ file.

docker run --rm -v </path/to/input>:/input -v </path/to/output>:/output bases2fastq /input /output --settings "I1Fastq,True" --settings "UmiFastQ,True"

The following example enables single-end adapter trimming for Read 2.

docker run --rm -v </path/to/input>:/input -v </path/to/output>:/output bases2fastq /input /output --settings "R2AdapterTrim,True" --settings "AdapterTrimMode,Single-End"

Inputs and Outputs

Inputs to Bases2Fastq are the files that a sequencing run generates and transfers to a run folder in your storage location. In turn, Bases2Fastq generates an output folder that contains the output files, an info directory, and a Samples directory. The Samples directory organizes FASTQ files and sample metrics into one folder for each sample.

Figure 3: Bases2Fastq output directory

Input Files

The following table lists the files that serve as input for Bases2Fastq. Bases2Fastq only uses the alignment file when the run manifest setting SpikeInAsUnassigned is set to true.

File	Directory and File Name	Description	Quantity
Alignment	Root/Alignment/{read}_{tile}.aln	Binary files that indicate which polonies align to PhiX Control Library	One per tile per read
Bases	Root/BaseCalls/{tile}/{read}_{tileName}_C{cycle:000}.bases.gz	Binary files that contain base calls and associated Q-scores	One per read, tile, and cycle
Filter	Root/Filter/{tile}.filter	Binary files that contain the filter status for each polony	One per tile
Location	Root/Location/{tile}.loc	Binary files that identify polony locations on the flowcell	One per tile
Run manifest	Root/RunManifest.csv	CSV file that records biological sample information and analysis settings	One per run
Run manifest	Root/RunManifest.json	JavaScript Object Notation (JSON) file reserved for Element processes	One per run
Run parameters	Root/RunParameters.json	JSON file that records information about the run configuration	One per run

Output Files

The following table lists the files that Bases2Fastq outputs. Subsequent sections detail the HTML QC report, FASTQ files, run metrics, and sample metrics.

File	Directory	Description
Bases2Fastq.log	info	Log file that records software events
HTML QC report	Root	Interactive report on run performance and quality
FASTQ	Samples/{sample}	The primary output of Bases2Fastq
IndexAssignment.csv	Root	Yield and the number and rate of polonies assigned for each sample and index combination
Metrics.csv	Root	The mismatch rates, percent assigned, and per-sample yield for each lane
RunManifest.csv	Root	The AVITI Operating Software (AVITI OS)- or user-created run manifest
RunManifestErrors.json	info	A record of errors in the run manifest
RunParameters.json	Root	A copy of the original run parameters file
RunStats.json	Root	Information on run performance
Sample metrics	Samples/{sample}	Information on the performance of each sample in the run
UnassignedSequences.csv	Root	The most frequent unassigned index sequences with approximate counts*

* Counts indicate how many times an incorrect index sequence appears.

HTML QC Report

The HTML QC report opens in a browser so you can move through various tabs. The tabs let you review histograms and other charts that visualize index assignment and other quality metrics. Bases2Fastq names the report per the convention {runname}_QC.html. The report does not show per-sample charts if more than 96 samples are present in the run manifest.

FASTQ Files

A FASTQ file records all genomic data and corresponding Q-scores for a sample. FASTQ files are GZIP-compressed text files named per the convention {sample}_{read}.fastq.gz.

Each entry in a FASTQ file corresponds to one read and includes the following four lines:

A sequence identifier that includes run and polony information
Base calls assembled into a sequence composed of A, C, G, T, and N
A plus sign (+) that separates the sequence from the Q-scores
A Q-score for each base in the sequence
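The four-line layout can be read with the Python standard library alone. The following sketch is a minimal illustration, not a replacement for a dedicated FASTQ parser. It yields one tuple per entry from a GZIP-compressed file and checks the markers described above. The function name and the demonstration records are made up.

```python
import gzip
import os
import tempfile
from itertools import islice


def read_fastq(path):
    """Yield (identifier, sequence, qualities) for each four-line FASTQ entry."""
    with gzip.open(path, "rt") as handle:
        while True:
            entry = [line.rstrip("\n") for line in islice(handle, 4)]
            if not entry:
                return  # clean end of file
            if len(entry) != 4 or entry[0][:1] != "@" or entry[2][:1] != "+":
                raise ValueError(f"malformed FASTQ entry: {entry!r}")
            identifier, sequence, _, qualities = entry
            if len(sequence) != len(qualities):
                raise ValueError("sequence and Q-score lines differ in length")
            yield identifier, sequence, qualities


# Demonstration on a tiny, made-up file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "Sample1_R1.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read_a\nACGTN\n+\nIIII#\n@read_b\nGGCC\n+\nIIII\n")
    demo_records = list(read_fastq(demo_path))

for identifier, sequence, qualities in demo_records:
    print(identifier, sequence, qualities)
```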

Sequence Identifiers

A sequence identifier includes the components described in the following table, formatted in one line: @<instrument>:<run name>:<flow cell ID>:<lane>:<tile>:<x-pos>:<y-pos>:<UMI> <read>:N:0:<index sequence>.

Component	Value	Description
@	@	Start of the sequence identifier line
`<instrument>`	Upper and lowercase letters, integers 0-9, and underscores (_)	Instrument name
`<run name>`	Upper and lowercase letters, integers 0-9, and underscores (_)	Run name as defined during run setup
`<flow cell ID>`	Upper and lowercase letters and integers 0-9	Flow cell ID from the barcode scan. If no barcode is present, the Run ID replaces the flow cell ID.
`<lane>`	1 or 2	Lane number
`<tile>`	An integer	Tile number
`<x-pos>`	A zero-padded integer	X-coordinate of the polony
`<y-pos>`	A zero-padded integer	Y-coordinate of the polony
`<UMI>`	A, T, C, G, N, +	UMI sequence with a plus sign separating the Read 1 and Read 2 sequences, if applicable
`<read>`	1 or 2	Read number
`<is filtered>`	N	A legacy filtering value of N. The value exists only for backwards compatibility and does not change.
`<control number>`	0	A legacy control number of 0. The value exists only for backwards compatibility and does not change.
`<index sequence>`		A value that depends on the indexing strategy indicated in the run manifest: no indexing, the sample number; single indexing, the observed index sequence; dual indexing, the observed Index 1 sequence, a plus sign, and the observed Index 2 sequence.
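The identifier layout can be split into named fields with basic string operations. The following sketch is a hedged illustration based only on the layout described above. It assumes that the UMI field is simply absent when a run has no UMI, which this guide does not state. The function name, the dictionary keys, and the example identifier are all made up.

```python
FIXED_KEYS = ("instrument", "run_name", "flow_cell_id", "lane", "tile", "x_pos", "y_pos")


def parse_identifier(line):
    """Split a sequence identifier line into a dictionary of named fields."""
    if not line.startswith("@"):
        raise ValueError("a sequence identifier starts with @")
    left, right = line[1:].split(" ", 1)
    fields = left.split(":")
    if len(fields) not in (7, 8):
        raise ValueError("unexpected number of colon-separated fields")
    parsed = dict(zip(FIXED_KEYS, fields))
    parsed["lane"] = int(parsed["lane"])
    parsed["tile"] = int(parsed["tile"])
    # Assumption: an eighth field, when present, holds the UMI sequence.
    parsed["umi"] = fields[7] if len(fields) == 8 else None
    read, is_filtered, control_number, index_sequence = right.split(":", 3)
    parsed.update(
        read=int(read),
        is_filtered=is_filtered,
        control_number=control_number,
        index_sequence=index_sequence,
    )
    return parsed


# Made-up identifier that follows the documented layout.
example = "@AV000001:ExampleRun:FC0001:1:10102:0421:0107:ACGT+TTGA 1:N:0:AACCGGTT+TTGGCCAA"
parsed_example = parse_identifier(example)
print(parsed_example)
```

The coordinates stay as strings so that their zero padding is preserved.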

Quality Scores

A Q-score indicates the confidence of a base call based on the Phred scale. A Phred quality score (Q) is logarithmically related to error rate (E): Q = -10 log10(E). For example, Q30 corresponds to an error rate of 1 in 1000. In a FASTQ file, an ASCII character represents each Q-score. Bases2Fastq encodes quality scores with a +33 offset (Phred33).
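The encoding and the error-rate relationship can be worked through in a few lines. The following sketch decodes a Phred33 quality string into integer Q-scores and converts a Q-score into its error probability. The function names are hypothetical and the quality string is a made-up example.

```python
def decode_qualities(quality_line, offset=33):
    """Convert a Phred33 quality string into a list of integer Q-scores."""
    return [ord(character) - offset for character in quality_line]


def error_probability(q_score):
    """Return the error rate E for a Q-score, from Q = -10 log10(E)."""
    return 10 ** (-q_score / 10)


scores = decode_qualities("!5I")  # made-up quality string
for score in scores:
    print(f"Q{score}: error probability {error_probability(score):.4f}")
```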

Run Metrics

A run metrics file, RunStats.json, reports the following performance metrics in JSON file format. The metrics are specific to the Bases2Fastq run.

Metric	Value
AnalysisVersion	The current version of Bases2Fastq
AnalysisID	The unique, Bases2Fastq-generated identifier for the analysis
AssignedYield	The run yield based on assigned reads in gigabases
FileVersion	The current version of the file format
FlowCellID	A flowcell identifier sourced from RunParameters.json or, if blank, the letter R followed by the RunID value
I1IsReverseComplement	The observed orientation of the Index1 sequences relative to the orientation recorded in the run manifest
I2IsReverseComplement	The observed orientation of the Index 2 sequences relative to the orientation recorded in the run manifest
Lanes	A detailed list of per-lane metrics
MeanReadLength	The average read length after adapter trimming
NumPolonies	The total number of polonies calculated for the run
NumPoloniesBeforeTrimming	The total number of polonies calculated for the run before adapter trimming
PercentAssignedReads	The percentage of reads assigned to a sample
PercentUnexpectedIndexPairs	The percentage of all polonies with Index1 and Index2 reads that matched different samples*
PercentMismatch	The percentage of polonies assigned to a sample with mismatch
PercentQ30	The percentage of ≥ Q30 Q-scores for the run, including assigned and unassigned reads
PercentReadsTrimmed	The percentage of reads that Bases2Fastq trimmed
QualityScore10thPercentile	The 10th percentile of quality scores
QualityScore25thPercentile	The 25th percentile of quality scores
QualityScore50thPercentile	The 50th percentile of quality scores
QualityScore75thPercentile	The 75th percentile of quality scores
QualityScore90thPercentile	The 90th percentile of quality scores
QualityScoreHistogram	A per-base call Q-score distribution with integer resolution
QualityScoreMean	The average Q-score of base calls for a sample
RunName	A text-based run identifier sourced from RunParameters.json
RunID	A universally unique identifier (UUID) assigned to the run and sourced from RunParameters.json
Samples	A list of libraries the run sequenced
SampleStats	The per-sample metrics listed in the sample metrics files for the run
TotalYield	The total yield of all reads in gigabases
UnassignedSequences	A list of unassigned index sequences with a count for each unassigned sequence

* For demultiplexing to be successful, both index reads must match the same sample.
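Because RunStats.json is a JSON file, a short script can pull out a few headline metrics. The following sketch is an illustration only. It assumes the metric names in the table appear as top-level keys in the file, which this guide does not spell out. The function name and the demonstration values are made up.

```python
import json
import os
import tempfile

SUMMARY_KEYS = ("RunName", "NumPolonies", "PercentAssignedReads", "PercentQ30", "TotalYield")


def summarize_run_stats(path, keys=SUMMARY_KEYS):
    """Return selected run-level metrics; keys missing from the file map to None."""
    with open(path) as handle:
        stats = json.load(handle)
    return {key: stats.get(key) for key in keys}


# Demonstration with made-up values standing in for a real RunStats.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "RunStats.json")
    with open(demo_path, "w") as out:
        json.dump({"RunName": "ExampleRun", "NumPolonies": 1000, "PercentQ30": 95.5}, out)
    demo_summary = summarize_run_stats(demo_path)

print(demo_summary)
```

For a real run, pass the path to RunStats.json in the output folder instead of the temporary file.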

Sample Metrics

A sample metrics file reports the following sample-specific performance metrics in JSON file format. Bases2Fastq names the file per the convention {sample}_stats.json.

Field	Value
AnalysisVersion	The current version of Bases2Fastq
BaseComposition	Counts for each A, C, G, T, and N base
ExternalID	An ExternalID specified in the run manifest, if applicable
FileVersion	The current version of the file format
MeanReadLength	The average read length after adapter trimming
NumPolonies	The total number of polonies assigned to the sample
NumPoloniesBeforeTrimming	The number of polonies assigned to a sample before adapter trimming
Occurrences	Additional information per occurrence of the sample
PercentMismatch	The percentage of polonies assigned to the sample with mismatch
PercentQ30	The percentage of ≥ Q30 Q-scores for the sample
PercentQ40	The percentage of ≥ Q40 Q-scores for the sample
PercentReadsTrimmed	The percentage of reads that Bases2Fastq trimmed
PerReadGCCountHistogram	A list of counts: the value at index i is the number of reads with i G/C calls
QualityScoreMean	The mean Q-score of base calls for the sample
RemovedAdapterLengthHistogram	A histogram showing the number of bases trimmed from an adapter in a given position
SampleID	A globally unique sample identifier
SampleName	The alphabetical sample identifier
SampleNumber	The numeric sample identifier
RunName	A text-based run identifier sourced from RunParameters.json
RunID	A UUID assigned to the run and sourced from RunParameters.json
Yield	The number of bases in the sample in gigabases
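Per-sample files can be gathered into one summary for a quick comparison across samples. The following sketch is a hedged illustration. It assumes each {sample}_stats.json file sits in its Samples/{sample} folder and exposes the table fields as top-level keys. The function name, the Q30 threshold, and the demonstration values are made up.

```python
import json
import tempfile
from pathlib import Path


def collect_sample_metrics(output_dir, fields=("NumPolonies", "PercentQ30", "Yield")):
    """Return {sample name: {field: value}} for every sample metrics file found."""
    rows = {}
    for path in sorted(Path(output_dir).glob("Samples/*/*_stats.json")):
        with path.open() as handle:
            stats = json.load(handle)
        # Fall back to the file name when SampleName is missing.
        sample = stats.get("SampleName") or path.name[: -len("_stats.json")]
        rows[sample] = {field: stats.get(field) for field in fields}
    return rows


# Demonstration with made-up samples in a temporary output folder.
with tempfile.TemporaryDirectory() as tmp:
    for name, q30 in (("SampleA", 96.1), ("SampleB", 88.4)):
        folder = Path(tmp, "Samples", name)
        folder.mkdir(parents=True)
        content = json.dumps({"SampleName": name, "PercentQ30": q30})
        (folder / f"{name}_stats.json").write_text(content)
    demo_metrics = collect_sample_metrics(tmp)

# Flag samples below an arbitrary, illustrative Q30 threshold of 90 percent.
low_quality = [name for name, row in demo_metrics.items() if row["PercentQ30"] < 90]
print(low_quality)
```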

Occurrences

Occurrences are a set of fields in a sample metrics file that allocate sample performance metrics by specific occurrences of a sample in the run. For example, if a sample appears in both lanes, Bases2Fastq lists an occurrence for each lane.

Each occurrence includes the identifiers Lane and ExpectedSequence and reports the following performance metrics.

Occurrences	Field Value
BaseComposition	Counts for each A, C, G, T, and N base
CustomMetadata	Custom metadata specified in the run manifest, if applicable
MeanReadLength	The average read length after adapter trimming
NumPolonies	The total numberof polonies assigned to the sample
NumPoloniesBeforeTrimming	The number of polonies assignedto a sample before adapter trimming
PercentMismatch	The percentage of polonies assigned to the sample with mismatch
PercentQ30	The percentage of ≥ Q30 Q-scores for the sample
PercentQ40	The percentage of ≥ Q40 Q-scores for the sample
PercentReadsTrimmed	The percentage of reads that Bases2Fastq trimmed
PerReadGCCountHistogram	A list of counts: the value at index i is the numberof reads with i G/C calls
QualityScoreMean	The mean Q-score of base calls for the sample
R1Adapters	The Read 1 adapter sequences associated with the lane the occurrence belongs to
R2Adapters	The Read 2 adapter sequences associated with the lane the occurrence belongs to
RemovedAdapterLengthHistogram	A histogram showing the number of bases trimmed from an adapter in a given position
Yield	The number of bases in the sample, in gigabases
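
To show how occurrences can be used, the following Python sketch sums NumPolonies per lane for one sample. It assumes the sample metrics have already been loaded into a dictionary with an Occurrences list whose entries carry the Lane and NumPolonies fields described above. The exact layout of the file is an assumption, and the function name and example values are hypothetical.

```python
from collections import defaultdict


def polonies_per_lane(sample_metrics):
    """Sum NumPolonies across the occurrences of a sample, grouped by lane.

    Assumes sample_metrics["Occurrences"] is a list of dictionaries that
    each contain "Lane" and "NumPolonies" (an assumed layout).
    """
    totals = defaultdict(int)
    for occurrence in sample_metrics.get("Occurrences", []):
        totals[occurrence["Lane"]] += occurrence["NumPolonies"]
    return dict(totals)


# Hypothetical sample that appears in both lanes.
example_sample = {
    "SampleName": "SampleA",
    "Occurrences": [
        {"Lane": 1, "ExpectedSequence": "ACGTACGT", "NumPolonies": 100},
        {"Lane": 2, "ExpectedSequence": "ACGTACGT", "NumPolonies": 150},
    ],
}
lane_totals = polonies_per_lane(example_sample)
print(lane_totals)
```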

Troubleshooting

The following table provides resolutions to common problems that can occur during FASTQ file generation. If a problem persists, contact Element Technical Support.

Problem	Resolution
Bases2Fastq fails to detect credentials attached to an AWS storage location with an EC2 instance	When using automatic role detection in AWS, make sure the region environment variable is set correctly in the EC2 instance: `export AWS_DEFAULT_REGION=$aws_region`.
Indexing performance does not meet specifications	Make sure the run manifest includes the PhiX Control Library index sequences. Spiking in PhiX Control Library without recording the index sequences affects index assignment.
Indexing performance does not meet specifications	Review the index charts in the HTML QC report. The charts show the index assignment percentage rate, the number of polonies assigned to each index, and the most frequent unassigned indexes. Use this information to correct the run manifest or QC-fail the run.
Indexing performance does not meet specifications	Review the I1IsReverseComplement and I2IsReverseComplement metrics in RunStats.json. The metrics show the observed orientation of index sequences relative to the orientation recorded in the Index1 and Index2 columns of the run manifest. If a column contains inconsistent orientations, correct the run manifest accordingly.
The HTML QC report is missing from the output.	If the system is configured for static binary, make sure Python 3.6 or newer is installed with the necessary packages. Bases2Fastq always generates the report on Docker-configured systems, but the static binary requires Python 3.6 or newer.
The HTML QC report is missing from the output.	If Python is correctly installed or the system is configured for Docker, review the error in info/QCReportErrors.txt for the cause. Use this information to generate the HTML QC report.
Flow cell ID is missing from the output.	Run Bases2Fastq with the flow cell ID argument to add a flow cell ID. For example: `--flowcell-id "1234567890"`.
A corrected run manifest requires reprocessing the run.	Run Bases2Fastq with the QC-only argument to validate indexes on one tile in a corrected run manifest. For an example command, see Run QC-Only Mode on page 18.
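
Two of the resolutions above depend on the environment: the AWS_DEFAULT_REGION variable for AWS role detection, and Python 3.6 or newer for the HTML QC report with the static binary. The following Python sketch checks both before a run. It is a hypothetical helper rather than part of Bases2Fastq, it does not verify that the necessary Python packages are installed, and the region check only applies when using AWS storage.

```python
import os
import sys


def preflight_issues(environ=None, version_info=None):
    """Return a list of human-readable problems found in the environment.

    environ and version_info default to the current process values and
    can be overridden for testing.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    issues = []
    # Only relevant when relying on automatic role detection in AWS.
    if not environ.get("AWS_DEFAULT_REGION"):
        issues.append("AWS_DEFAULT_REGION is not set")
    # The static binary needs Python 3.6 or newer for the HTML QC report.
    if tuple(version_info[:2]) < (3, 6):
        issues.append("Python 3.6 or newer is required for the HTML QC report")
    return issues


for issue in preflight_issues():
    print("Warning:", issue)
```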

