Executing Bases2Fastq
To execute Bases2Fastq on a local compute environment, enter a command in the command line interface (CLI) terminal. The command that you enter is based on the cloud storage setup and execution type, such as Docker or static binary. To specify demultiplexing and FASTQ file generation settings, append optional arguments to the command.
The command has the following four parts:
- The bases2fastq executable
- An input directory
- An output directory
- Optional arguments to manipulate the output
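As a rough sketch of how the four parts fit together, the following shell function assembles a command string from an input directory, an output directory, and any optional arguments. The function name build_b2f_command is hypothetical and the paths are placeholders. The function only prints the command and does not run Bases2Fastq.

```shell
# Hypothetical helper (not part of Bases2Fastq): prints a command assembled
# from the executable, an input directory, an output directory, and options.
# Usage: build_b2f_command INPUT_DIR OUTPUT_DIR [OPTIONAL ARGUMENTS...]
build_b2f_command() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_b2f_command INPUT_DIR OUTPUT_DIR [OPTIONS...]" >&2
        return 1
    fi
    b2f_input=$1
    b2f_output=$2
    shift 2
    # "$*" joins the remaining arguments with spaces; it is empty when none are given.
    b2f_options=$*
    if [ -n "$b2f_options" ]; then
        printf '%s\n' "bases2fastq $b2f_input $b2f_output $b2f_options"
    else
        printf '%s\n' "bases2fastq $b2f_input $b2f_output"
    fi
}

# Example with placeholder paths and the -p argument described on this page.
build_b2f_command /input /output -p 8
```

Printing the command first makes it easy to review the paths and arguments before executing anything.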
Review Run Manifest
Before you execute Bases2Fastq, make sure that the run manifest in the input directory includes the correct settings. For more information, see the Run Manifest Documentation.
If the sequencing run manifest does not include index sequences, Bases2Fastq skips demultiplexing and assigns all polonies to one sample. If necessary, create a corrected run manifest.
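A quick pre-check can confirm that a run manifest is present before execution. The sketch below assumes the manifest is named RunManifest.csv and sits at the top level of the input directory, which matches the example terminal output on this page. The function name check_run_manifest is hypothetical, and the check does not validate the manifest settings.

```shell
# Hypothetical helper: confirms that a run manifest file exists.
# Assumption: the manifest is INPUT_DIR/RunManifest.csv.
check_run_manifest() {
    manifest="$1/RunManifest.csv"
    if [ -f "$manifest" ]; then
        echo "Found run manifest: $manifest"
    else
        echo "No run manifest found at: $manifest" >&2
        return 1
    fi
}

# Example: check a placeholder input directory and report instead of aborting.
check_run_manifest /input || echo "Review the input directory before executing."
```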
Execute the Software
1. Copy the template execution command for your storage type from one of the following subsections:
   - Execute with Local Storage
   - Execute with Amazon S3
   - Execute with GCS
   - Execute with Rclone-Compatible Cloud
2. Update the directory paths in the template command, including input and output, for the intended input and output locations.
3. Adjust the optional arguments in the run command.
   - To add arguments, replace {options} with any of the arguments listed in Optional Arguments.
   - To execute Bases2Fastq without any arguments, delete {options}.
4. Execute the command in the terminal.
The terminal displays execution progress, as shown in the following sequencing example. The execution is complete when the terminal displays the elapsed time.
Run command: bases2fastq /input /output
Parsing run manifest /input/RunManifest.csv
Starting FASTQ generation for Indexing-Sim
Processing tile L1R02C01S1
Processing tile L1R02C01S2
...
....
Aggregating run stats
========== Stats Summary ==========
Reads assigned: 35.883%
Mean Q score: 24.001
Percent Q30: 38.307%
Percent Q40: 17.019%
Assigned yield: 0.010 Gb
===================================
Generating HTML QC-report
========== Timing ==========
FASTQ generation: 14.870s
Stats reports: 16.304s
Total elapsed: 31.174s
============================
Output stored in /output
5. Access the output files in the location that is specified in the execution command.
   - To view logs, go to info/Bases2Fastq.log.
   - To view the HTML QC report, go to the output folder, double-click the file, and move through each tab.
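For a quick look at the log from the terminal, a small helper like the following can print the last lines of the log file. This is a minimal sketch: the function name show_b2f_log is hypothetical, /output is a placeholder, and the relative path info/Bases2Fastq.log is the one given in the steps.

```shell
# Hypothetical helper: prints the last lines of the Bases2Fastq log.
# Usage: show_b2f_log OUTPUT_DIR [LINE_COUNT]   (LINE_COUNT defaults to 20)
show_b2f_log() {
    log_file="$1/info/Bases2Fastq.log"
    if [ -f "$log_file" ]; then
        tail -n "${2:-20}" "$log_file"
    else
        echo "Log not found: $log_file" >&2
        return 1
    fi
}

# Example: show the last 20 log lines for a placeholder output directory.
show_b2f_log /output 20 || echo "Check the output path from the execution command."
```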
Execute with Local Storage
Copy the template execution command for Docker or static binary. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
- Docker
- Static Binary
When you use Docker, the general execution command automatically pulls the image to the local environment. The command uses the following structure:
- docker run: Starts a container from the specified image.
- --rm: Removes the container after the execution completes. Element recommends using this argument to keep your system clean.
- elembio/bases2fastq: Identifies the image that you want to pull. By default, this image is the latest. For a specific version, include a tag, such as elembio/bases2fastq:1.4.0.
- /input: Defines the path to the input directory.
- /output: Defines the path to the output directory. If two executions use the same output directory, the second overwrites the first.
- -p: Sets the number of threads that Bases2Fastq uses for parallel execution. The value is based on your system setup. The example command has a value of 8, which requires at least 8 CPUs.
- {options}: Defines the optional arguments.
Executing in local environments might require mounting the directory to Docker:
- -v: Mounts the local path to Docker as a volume.
- :/input: Binds the mounted local path to the /input path inside the container for use by Bases2Fastq.
docker run --rm -v /Users/username/path/to/input/:/input -v /Users/username/path/to/output/:/output elembio/bases2fastq bases2fastq /input /output -p 8 {options}
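The mount arguments are easier to review when the host paths are kept in variables. The following sketch prints a Docker command with one -v mount for the input directory and one for the output directory, assuming the two are separate directories on the host. The function name print_docker_command and the host paths are hypothetical. The function only prints the command and does not run Docker.

```shell
# Hypothetical helper: prints (does not run) a Docker command with separate mounts.
# Use absolute host paths; Docker treats a relative name as a named volume.
# Usage: print_docker_command HOST_INPUT_DIR HOST_OUTPUT_DIR THREADS
print_docker_command() {
    host_input=$1
    host_output=$2
    threads=$3
    printf '%s\n' "docker run --rm -v $host_input:/input -v $host_output:/output elembio/bases2fastq bases2fastq /input /output -p $threads"
}

# Example with placeholder host paths.
print_docker_command /Users/username/path/to/input /Users/username/path/to/output 8
```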
The command for static binary uses the following structure:
- ./bases2fastq: Invokes the installed static binary executable.
- input: Defines the path to the input run directory.
- output: Defines the path to the output directory. If two executions use the same output directory, the second overwrites the first.
- -p: Sets the number of threads that Bases2Fastq uses for parallel execution. The value is based on your system setup. The example command has a value of 8, which requires at least 8 CPUs.
- {options}: Defines the optional arguments.
./bases2fastq input output -p 8 {options}
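Because the -p value depends on the number of CPUs, one option is to cap it at the CPU count that the system reports. The sketch below is an assumption about how you might choose the value and is not a Bases2Fastq feature. It uses getconf _NPROCESSORS_ONLN, which is available on Linux and macOS, and only prints the resulting command.

```shell
# Sketch: cap the requested -p value at the number of online CPUs.
# Fall back to 1 if getconf is unavailable or returns something unexpected.
cpu_count=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case $cpu_count in
    ''|*[!0-9]*) cpu_count=1 ;;
esac

requested_threads=8
if [ "$requested_threads" -gt "$cpu_count" ]; then
    threads=$cpu_count
else
    threads=$requested_threads
fi

# Print the command for review; input and output are placeholder paths.
echo "./bases2fastq input output -p $threads"
```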
Execute with Amazon S3
Amazon S3 requires Uniform Resource Identifiers (URIs) to serve as paths to the Amazon S3 buckets that contain the input and output files. Access to the buckets requires AWS credentials to be configured for the setup. If both the input and output locations are in Amazon S3, they must use the same credentials.
- For Amazon S3 within an EC2 instance, Bases2Fastq detects the credentials or Identity and Access Management (IAM) role that is associated with the instance.
- For Amazon S3 access without an EC2 instance, define AWS credentials with the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.
If Bases2Fastq fails to detect the credentials for an Amazon S3 location from within an EC2 instance, the AWS_DEFAULT_REGION environment variable might be incorrect. Make sure that this variable is set correctly in the EC2 instance:
export AWS_DEFAULT_REGION=$aws_region
To set the credentials as environment variables, run the following export commands in the terminal:
export AWS_ACCESS_KEY_ID=ExampleAccessKey
export AWS_SECRET_ACCESS_KEY=EXAMPLEnhMUUOhQF/T51c6A5+DtQas9ghebs
export AWS_DEFAULT_REGION=us-west-2
Copy the following template execution command for Docker or static binary:
- Docker
- Static Binary
For Docker, the execution command includes -e to pass the credentials that are set as environment variables into the container.
docker run --rm -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION elembio/bases2fastq bases2fastq s3://bucket/input s3://bucket/output -p 8 {options}
For static binary, use the following command:
./bases2fastq s3://bucket/input s3://bucket/output -p 8 {options}
- Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
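Before executing outside an EC2 instance, it can help to confirm that all three AWS variables are set in the current shell. The following is a minimal sketch. The function name missing_aws_vars is hypothetical, and the check prints only variable names, never their values.

```shell
# Hypothetical helper: lists which of the three AWS variables are unset or empty.
# Empty output means that all three are set.
missing_aws_vars() {
    missing=""
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
        # eval reads the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

result=$(missing_aws_vars)
if [ -n "$result" ]; then
    echo "Missing: $result"
else
    echo "All AWS credential variables are set."
fi
```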
Execute with GCS
Google Cloud Storage (GCS) requires URLs to serve as preconfigured paths to the Cloud Storage buckets that contain the input and output files. Access to the buckets requires GCS credentials to be configured for the setup. If both the input and output locations are in GCS, they must use the same credentials.
The environment variable GOOGLE_APPLICATION_CREDENTIALS must point to the file that contains the application credentials. For more information, see the Google documentation.
To set the credential as an environment variable, run the following export command in the terminal:
export CREDENTIALS="/path/to/GCP-creds.json"
Copy the following template execution command for Docker or static binary:
- Docker
- Static Binary
For Docker, the execution command includes -e to pass the credential that is set as an environment variable into the container.
docker run --rm -e GOOGLE_APPLICATION_CREDENTIALS=$CREDENTIALS elembio/bases2fastq bases2fastq gs://bucket/input gs://bucket/output -p 8 {options}
For static binary, use the following command:
./bases2fastq gs://bucket/input gs://bucket/output -p 8 {options}
- Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
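A missing or unreadable credentials file is easy to catch before execution. The sketch below checks the path stored in CREDENTIALS, the variable name used in the export command in this section. The function name check_gcs_credentials is hypothetical, and the check does not validate the contents of the file.

```shell
# Hypothetical helper: checks that a credentials path points to a readable file.
# Usage: check_gcs_credentials PATH_TO_CREDENTIALS_FILE
check_gcs_credentials() {
    creds_path=$1
    if [ -z "$creds_path" ]; then
        echo "No credentials path given." >&2
        return 1
    fi
    if [ ! -f "$creds_path" ] || [ ! -r "$creds_path" ]; then
        echo "Credentials file is missing or unreadable: $creds_path" >&2
        return 1
    fi
    echo "Credentials file found: $creds_path"
}

# Example: check the path in CREDENTIALS and report instead of aborting.
check_gcs_credentials "${CREDENTIALS:-}" || echo "Set CREDENTIALS before executing."
```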
Execute with Rclone-Compatible Cloud
Before you execute Bases2Fastq with rclone, review the rclone requirements for system setup.
If you use rclone, Element recommends that you execute with the static binary. For execution, use the --input-remote and --output-remote options as shown in the following code example. Then, update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
- Static Binary
./bases2fastq path/to/input/remote path/to/output/remote --input-remote "input-name" --output-remote "output-name" -p 8 {options}