Getting Started with Bases2Fastq

Bases2Fastq processes Element AVITI™ System sequencing data and converts base calls into FASTQ files. While ElemBio Cloud offers a Bases2Fastq verified flow that executes in cloud environments, Element makes Bases2Fastq available as an executable through a Docker container or a static binary. This tutorial uses simulated data to demonstrate how to manually set up and execute Bases2Fastq in a Linux or Windows environment.

This tutorial covers the following topics:

The structure of a Bases2Fastq execution command
How to execute Bases2Fastq for your OS and distribution method
The creation of a corrected run manifest to troubleshoot an error
How to reexecute Bases2Fastq with a corrected run manifest

Before You Begin

Complete the following prerequisites before you start this tutorial:

Install Docker or the static binary.
- The static binary is only compatible with Linux OS.
- When installing Docker for Windows OS, review the system requirements. Element recommends enabling the WSL 2 backend feature.
Install tree, a CLI tool that maps folder directories.
- Windows OS includes tree by default.
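- For Debian-based Linux distributions, such as Ubuntu, use the following command to install tree. Other distributions provide tree through their own package managers.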

apt-get install tree
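
The prerequisites above can be confirmed from the CLI before continuing. The following is a minimal sketch for a POSIX shell on Linux. The function names have_tool and report_tools are hypothetical helpers introduced for illustration and are not part of Bases2Fastq or Docker.

```shell
#!/bin/sh
# have_tool (hypothetical helper): succeeds if the named command is on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools (hypothetical helper): prints one status line per tool name.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Example call for the tools this tutorial uses:
#   report_tools docker tree curl tar
```

Any tool reported as missing needs to be installed before the later steps can run.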

Set Up Docker

If you are using Docker for this tutorial, complete the following setup steps:

Make sure Docker Desktop is running on your system.
Open a CLI terminal.
In the CLI terminal, run the following command to pull the latest Bases2Fastq image from the Element public registry hosted on DockerHub.
The CLI displays the current Bases2Fastq version.

docker run elembio/bases2fastq bases2fastq --version

Create a Directory

Create a bases2fastq-setup folder for this tutorial.

mkdir bases2fastq-setup

To prepare for downloading the simulated data, make the bases2fastq-setup folder the working directory in the CLI.

cd ./bases2fastq-setup

Create a fastq subfolder in the bases2fastq-setup folder.

mkdir fastq

Download the Simulated Data

Download and extract the simulated output data for a 2 x 150 AVITI System run to the bases2fastq-setup folder.
An AWS S3 cloud storage bucket hosts the data. The data use the standard format and structure for output files from an AVITI System.

curl http://element-public-data.s3.amazonaws.com/bases2fastq-share/bases2fastq-v2/20230404-bases2fastq-sim-151-151-9-9.tar.gz -o sim-data.tar.gz

Extract the simulated data from the compressed tar archive.

tar -xvf sim-data.tar.gz

Use tree to visualize the files in a tree format and confirm successful extraction of the data.

tree 20230404-bases2fastq-sim-151-151-9-9
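
As a complement to the visual check with tree, the extraction can also be checked with a short script. The sketch below assumes a POSIX shell and uses the five entries that the Troubleshooting section of this tutorial lists for the simulated run folder. The name check_run_folder is a hypothetical helper, not a Bases2Fastq command.

```shell
# Entries that the Troubleshooting section lists for the simulated run folder.
EXPECTED_ENTRIES="BaseCalls Filter Location RunManifest.csv RunParameters.json"

# check_run_folder (hypothetical helper): prints each missing entry and
# returns nonzero if at least one expected entry is absent.
check_run_folder() {
    run_dir=$1
    missing=0
    for entry in $EXPECTED_ENTRIES; do
        if [ ! -e "$run_dir/$entry" ]; then
            echo "missing: $run_dir/$entry"
            missing=1
        fi
    done
    return "$missing"
}

# Example call from the bases2fastq-setup folder:
#   check_run_folder 20230404-bases2fastq-sim-151-151-9-9
```

If the function prints nothing and succeeds, the run folder contains every expected entry.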

Execution Command Overview

The structure of the execution command depends on your OS and Bases2Fastq distribution.

Docker for Windows OS and Docker for Linux OS

For Docker, the Bases2Fastq execution commands in this tutorial include the following components:

docker run creates and starts a container from an image.
elembio/bases2fastq identifies the image to run. Docker pulls the image if it is not available locally.
bases2fastq invokes the Bases2Fastq software inside the container.

The commands end with the input and output locations for the execution.

When using Docker to run Bases2Fastq, you must mount your test directory to the Docker container. The commands in this tutorial use a bind mount (-v) to mount the present working directory to the /data path in the container, which makes the input and output locations for the execution accessible to the Docker container. In the Windows Command Prompt, %cd% identifies the present working directory. Linux systems use ${PWD}.

Static Binary for Linux OS

For the static binary, the Bases2Fastq execution commands in this tutorial use ./bases2fastq to invoke the executable. The command then states the input and output locations for the execution. The basic structure of the command is as follows:

./bases2fastq <input> <output>
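
To make the two command structures described in this section concrete, the following sketch assembles each form as a string without executing it. It assumes a POSIX shell. The function b2f_command is a hypothetical helper introduced for illustration, and the Docker form mirrors the Linux command used later in this tutorial.

```shell
# b2f_command (hypothetical helper): prints a command line and runs nothing.
# Arguments: mode (docker or binary), run folder, output folder.
# In docker mode, both folders are relative to the mounted working directory.
b2f_command() {
    mode=$1
    run_dir=$2
    out_dir=$3
    case "$mode" in
        docker)
            # The working directory is bind-mounted at /data in the container.
            echo "docker run -v \${PWD}:/data elembio/bases2fastq bases2fastq /data/$run_dir /data/$out_dir"
            ;;
        binary)
            echo "./bases2fastq $run_dir $out_dir"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

# Example call:
#   b2f_command docker 20230404-bases2fastq-sim-151-151-9-9 fastq/test1
```

Printing the command first makes it easy to confirm that the input and output paths sit under the mounted directory before starting an execution.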

Execute Bases2Fastq

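This section provides separate instructions for Docker for Windows OS, Docker for Linux OS, and the static binary for Linux OS. Follow the instructions for your setup.

Docker for Windows OS

Execute Bases2Fastq with the following command, which uses %cd%:/data to mount the present working directory to the Docker container.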

docker run -v %cd%:/data  elembio/bases2fastq bases2fastq /data/20230404-bases2fastq-sim-151-151-9-9 /data/fastq/test1

Wait until the execution completes.
The execution is complete when the CLI displays execution statistics, timing, and output information.

========= Stats Summary =========
Polony count:      100,000
Reads assigned:    98.267%
Mean Q score:      40.715
Percent Q30:       93.117%
Percent Q40:       78.998%
Assigned yield:    0.029 Gb
=================================
============ Timing =============
FASTQ generation:    6.936s
Stats reports:       13.991s
Total elapsed:       20.928s
=================================
Output stored in /data/fastq/test1

Use tree to visualize the output of the execution.

tree .\fastq\test1 /f

Examine the tree directory to make sure all output files are present.
The CLI lists the files as in the following example.

./fastq/test1
├── IndexAssignment.csv
├── Metrics.csv
├── RunManifest.csv
├── RunManifest.json
├── RunParameters.json
├── RunStats.json
├── Samples
│   └── DefaultSample
│       ├── DefaultSample_R1.fastq.gz
│       ├── DefaultSample_R2.fastq.gz
│       └── DefaultSample_stats.json
├── UnassignedSequences.csv
├── high-elevate-sim_QC.html
└── info
    ├── Bases2Fastq.log
    └── RunManifestErrors.json

Access files in the bases2fastq-setup/fastq/test1 folder.
- Access info/Bases2Fastq.log to view logs and check if any errors occurred.
- Open the HTML QC report, which is the file that ends in _QC.html in the output folder.
- Notice that the Indexing page in the HTML QC report shows polonies successfully assigned to five samples.

Docker for Linux OS

Execute Bases2Fastq with the following command, which uses ${PWD}:"/data" to mount the present working directory to the Docker container.

docker run -v ${PWD}:"/data"  elembio/bases2fastq bases2fastq /data/20230404-bases2fastq-sim-151-151-9-9 /data/fastq/test1

Wait until the execution completes.
The execution is complete when the CLI displays execution statistics, timing, and output information.

========= Stats Summary =========
Polony count:      100,000
Reads assigned:    98.267%
Mean Q score:      40.715
Percent Q30:       93.117%
Percent Q40:       78.998%
Assigned yield:    0.029 Gb
=================================
============ Timing =============
FASTQ generation:    6.936s
Stats reports:       13.991s
Total elapsed:       20.928s
=================================
Output stored in /data/fastq/test1

Use tree to visualize the output of the execution.

tree ./fastq/test1

Examine the tree directory to make sure all output files are present.
The CLI lists the files as in the following example.

./fastq/test1
├── IndexAssignment.csv
├── Metrics.csv
├── RunManifest.csv
├── RunManifest.json
├── RunParameters.json
├── RunStats.json
├── Samples
│   └── DefaultSample
│       ├── DefaultSample_R1.fastq.gz
│       ├── DefaultSample_R2.fastq.gz
│       └── DefaultSample_stats.json
├── UnassignedSequences.csv
├── high-elevate-sim_QC.html
└── info
    ├── Bases2Fastq.log
    └── RunManifestErrors.json

Access files in the bases2fastq-setup/fastq/test1 folder.
- Access info/Bases2Fastq.log to view logs and check if any errors occurred.
- Open the HTML QC report, which is the file that ends in _QC.html in the output folder.
- Notice that the Indexing page in the HTML QC report shows polonies successfully assigned to five samples.

Static Binary for Linux OS

Make sure you are at the home directory in the CLI, where the static binary executable is installed. If you are still in the bases2fastq-setup folder, use the command cd .. to move to the home directory.
Execute Bases2Fastq with the following command.

./bases2fastq bases2fastq-setup/20230404-bases2fastq-sim-151-151-9-9 bases2fastq-setup/fastq/test1

Wait until the execution completes.
The execution is complete when the CLI displays execution statistics, timing, and output information.

========= Stats Summary =========
Polony count:      100,000
Reads assigned:    98.267%
Mean Q score:      40.715
Percent Q30:       93.117%
Percent Q40:       78.998%
Assigned yield:    0.029 Gb
=================================
============ Timing =============
FASTQ generation:    6.936s
Stats reports:       13.991s
Total elapsed:       20.928s
=================================
Output stored in /bases2fastq-setup/fastq/test1

Use tree to visualize the output of the execution.

tree ./bases2fastq-setup/fastq/test1

Examine the tree directory to make sure all output files are present.
The CLI lists the files as in the following example.

./bases2fastq-setup/fastq/test1
├── IndexAssignment.csv
├── Metrics.csv
├── RunManifest.csv
├── RunManifest.json
├── RunParameters.json
├── RunStats.json
├── Samples
│   └── DefaultSample
│       ├── DefaultSample_R1.fastq.gz
│       ├── DefaultSample_R2.fastq.gz
│       └── DefaultSample_stats.json
├── UnassignedSequences.csv
├── high-elevate-sim_QC.html
└── info
    ├── Bases2Fastq.log
    └── RunManifestErrors.json

Access files in the bases2fastq-setup/fastq/test1 folder.
- Access info/Bases2Fastq.log to view logs and check if any errors occurred.
- Open the HTML QC report, which is the file that ends in _QC.html in the output folder.
- Notice that the Indexing page in the HTML QC report shows polonies successfully assigned to five samples.
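
Beyond confirming that the output files are present, the integrity of the compressed FASTQ files can be checked on Linux. The sketch below assumes a POSIX shell with gzip installed and the Samples/<sample>/ layout shown in the example tree output. The function verify_fastq_gz is a hypothetical helper and not part of Bases2Fastq.

```shell
# verify_fastq_gz (hypothetical helper): runs gzip -t on every *.fastq.gz
# file under <output folder>/Samples/<sample>/ and returns nonzero if any
# file fails the test or if no FASTQ file is found.
verify_fastq_gz() {
    out_dir=$1
    status=0
    count=0
    for f in "$out_dir"/Samples/*/*.fastq.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        count=$((count + 1))
        if gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "corrupt: $f"
            status=1
        fi
    done
    [ "$count" -gt 0 ] || status=1
    return "$status"
}

# Example call from the bases2fastq-setup folder:
#   verify_fastq_gz ./fastq/test1
```

A nonzero return value indicates either a damaged file or an output folder without FASTQ files, both of which suggest checking info/Bases2Fastq.log.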

Troubleshooting

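This section provides separate troubleshooting steps for Docker for Windows OS, Docker for Linux OS, and the static binary for Linux OS. Follow the steps for your setup.

Docker for Windows OS

If you receive an error message that states the input files do not exist, make sure the input files are available to the Docker container through the mounted file system.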

Run the Docker container in interactive mode and attempt to list the files in the mounted file system.

docker run -i   -v %cd%:/data  elembio/bases2fastq ls /data/20230404-bases2fastq-sim-151-151-9-9

Examine the returned list of files.
If the input folder for the command is correct, the CLI lists the following files.

BaseCalls
Filter
Location
RunManifest.csv
RunParameters.json

If the CLI does not list the expected files, complete the following troubleshooting actions:
- Make sure Docker has permission to access the directory that you are mounting.
- Make sure the input data is present in your current working directory.

Docker for Linux OS

If you receive an error message that states the input files do not exist, make sure the input files are available to the Docker container through the mounted file system.

Run the Docker container in interactive mode and attempt to list the files in the mounted file system.

docker run -i   -v ${PWD}:"/data"  elembio/bases2fastq ls /data/20230404-bases2fastq-sim-151-151-9-9

Examine the returned list of files.
If the input folder for the command is correct, the CLI lists the following files.

BaseCalls
Filter
Location
RunManifest.csv
RunParameters.json

If the CLI does not list the expected files, complete the following troubleshooting actions:
- Make sure Docker has permission to access the directory that you are mounting.
- Make sure the input data is present in your current working directory.

Static Binary for Linux OS

If you receive an error message that states the input files do not exist, make sure the input files are available in the input folder.

Attempt to list the files in the input directory.

ls bases2fastq-setup/20230404-bases2fastq-sim-151-151-9-9

Examine the returned list of files.
If the input folder for the command is correct, the CLI lists the following files.

BaseCalls Filter Location RunManifest.csv RunParameters.json

If the CLI does not list the expected files, complete the following troubleshooting actions:
- Make sure your current working directory is correct.
- Make sure the input data is present in your current working directory.

Corrected Run Manifest Overview

When you review the output files from your Bases2Fastq execution, you might discover an error in the run manifest for the execution. For example, you might realize that the run manifest mislabels samples, omits settings, or does not include index sequences. To resolve these errors, you can edit the original run manifest to create a corrected run manifest. The --run-manifest or -r optional argument lets you replace the original run manifest with the corrected version.

The following instructions demonstrate how to create a corrected run manifest from the original run manifest in the simulated data. After you reexecute Bases2Fastq with the new run manifest, the HTML QC report shows charts and data that reflect the corrections.

Reexecute Bases2Fastq with a Corrected Run Manifest

Navigate to the bases2fastq-setup/20230404-bases2fastq-sim-151-151-9-9 folder on your computer.
Open RunManifest.csv in a compatible editor.
Edit the SampleName column so that the first three rows are sample_0 and the fourth and fifth rows are sample_1.
The Samples section of the run manifest appears as follows in a comma-separated format.

[Samples]
SampleName, Index1, Index2
sample_0, ACGTGTAGC, GCTAGTGCA
sample_0, CACATGCTG, AGACACTGT
sample_0, GTACACGAT, CTCGTACAG
sample_1, TGTGCATCA, TAGTCGATC
sample_1, TGTAGGCCA, TCTAGCCTC

Save the file as RunManifestCorrected.csv in the bases2fastq-setup folder.
Append the -r optional argument to the end of your execution command and reexecute the command. The following commands also write the output to a new test2 folder.

Docker for Windows OS

docker run -v %cd%:/data elembio/bases2fastq bases2fastq /data/20230404-bases2fastq-sim-151-151-9-9 /data/fastq/test2 -r /data/RunManifestCorrected.csv

Docker for Linux OS

docker run -v ${PWD}:"/data" elembio/bases2fastq bases2fastq /data/20230404-bases2fastq-sim-151-151-9-9 /data/fastq/test2 -r /data/RunManifestCorrected.csv

Static Binary for Linux OS

./bases2fastq bases2fastq-setup/20230404-bases2fastq-sim-151-151-9-9 bases2fastq-setup/fastq/test2 -r bases2fastq-setup/RunManifestCorrected.csv

Wait until the execution completes.
Open the HTML QC report, which is the file that ends in _QC.html in the bases2fastq-setup/fastq/test2 folder, to view the new report.
The sample assignment charts and data in the Indexing page of the HTML QC report only show two samples, sample_0 and sample_1.
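
As an alternative to editing RunManifest.csv by hand, the SampleName change described in this section could be scripted on Linux. The following is a minimal sketch that assumes a POSIX shell with awk, a [Samples] section header followed by one column-header row, and comma-separated values, as in the example above. The function correct_manifest is a hypothetical helper and not part of Bases2Fastq.

```shell
# correct_manifest (hypothetical helper): prints a copy of the run manifest
# in which the first three sample rows are named sample_0 and the fourth
# and fifth sample rows are named sample_1. The input file is not modified.
correct_manifest() {
    awk -F',' -v OFS=',' '
        # A line that starts with "[" opens a new section.
        /^\[/ { in_samples = ($0 ~ /^\[Samples\]/); row = 0; print; next }
        # Inside [Samples], row 1 is the column header; rows 2-6 are samples.
        in_samples && NF > 1 {
            row++
            if (row >= 2 && row <= 4) $1 = "sample_0"
            else if (row >= 5 && row <= 6) $1 = "sample_1"
        }
        { print }
    ' "$1"
}

# Example call from the bases2fastq-setup folder:
#   correct_manifest 20230404-bases2fastq-sim-151-151-9-9/RunManifest.csv > RunManifestCorrected.csv
```

Writing the result to a new file keeps the original run manifest intact, which matches the manual steps in this section.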

Additional Resources

For more information on the topics in this tutorial, see the following resources.
