
Executing Bases2Fastq

Execute Bases2Fastq in a local compute environment by entering a command in the terminal. The command varies based on the storage setup (local or cloud) and the execution type (Docker or static binary). Append optional arguments to the command to specify demultiplexing and FASTQ file generation settings.

The command has the following four parts:

  • The bases2fastq executable
  • An input directory
  • An output directory
  • Optional arguments to manipulate the output
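As a minimal sketch of how the four parts fit together, the following shell function assembles a command string from an executable, an input directory, an output directory, and any optional arguments. The function name build_cmd is hypothetical and only illustrates the ordering of the parts; it prints the command rather than running it.

```shell
# Hypothetical helper: assemble the four parts of the command in order.
# It only prints the command; it does not execute Bases2Fastq.
build_cmd() {
  exe="$1"
  in_dir="$2"
  out_dir="$3"
  shift 3
  # Any remaining arguments are treated as the optional arguments.
  if [ "$#" -gt 0 ]; then
    echo "$exe $in_dir $out_dir $*"
  else
    echo "$exe $in_dir $out_dir"
  fi
}

build_cmd bases2fastq /input /output
build_cmd bases2fastq /input /output -p 8
```

The first call corresponds to a command with {options} deleted, and the second to a command with one optional argument appended.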

Review Run Manifest

Before executing Bases2Fastq, make sure the run manifest in the input directory includes the correct demultiplexing settings. For more information, see the Run Manifest Documentation.

Note

If the run manifest does not include index sequences, Bases2Fastq skips demultiplexing and assigns all polonies to one sample. If necessary, create a corrected run manifest.
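A quick pre-flight check can confirm that a run manifest is present before execution. The sketch below assumes the manifest is named RunManifest.csv and sits at the top level of the input directory, which matches the example terminal output in this section. The function name check_manifest is hypothetical, and the check does not validate the demultiplexing settings inside the file.

```shell
# Hypothetical helper: report whether the input directory has a run manifest.
# Assumes the file name RunManifest.csv; it does not inspect the contents.
check_manifest() {
  input_dir="$1"
  if [ -f "$input_dir/RunManifest.csv" ]; then
    echo "found"
  else
    echo "missing"
    return 1
  fi
}

# Usage (path is a placeholder):
#   check_manifest /Users/username/path/to/input
```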

Execute the Software

  1. Copy the template execution command for your storage type from one of the following subsections.
  2. Update the directory paths in the template command so that input and output point to the intended input and output locations.
  3. Adjust the optional arguments in the run command.
    • To add arguments, replace {options} with any of the arguments listed in Optional Arguments.
    • To execute Bases2Fastq without any arguments, delete {options}.
  4. Execute the command in the terminal.

    The terminal displays execution progress, as shown in the following code example. The execution is complete when the terminal displays the elapsed time.

Run command: bases2fastq /input /output
Parsing run manifest /input/RunManifest.csv
Starting FASTQ generation for Indexing-Sim
Processing tile L1R02C01S1
Processing tile L1R02C01S2
...
....
Aggregating run stats
========== Stats Summary ==========
Reads assigned: 35.883%
Mean Q score: 24.001
Percent Q30: 38.307%
Percent Q40: 17.019%
Assigned yield: 0.010 Gb
===================================
Generating HTML QC-report
========== Timing ==========
FASTQ generation: 14.870s
Stats reports: 16.304s
Total elapsed: 31.174s
============================
Output stored in /output
  5. Access the output files in the location specified in the execution command.
    • To view logs, go to info/Bases2Fastq.log.
    • To view the HTML QC report, go to the output folder, double-click the file, and move through each tab.
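Because the execution is complete when the terminal displays the elapsed time, a script can test for that line in saved terminal output. The sketch below assumes you captured the terminal output to a file yourself (for example with tee) and that the final timing line starts with "Total elapsed:", as in the example above. The function name run_finished and the file name run.log are hypothetical.

```shell
# Hypothetical helper: succeed if captured terminal output contains the
# elapsed-time line that marks the end of an execution.
run_finished() {
  grep -q "Total elapsed:" "$1"
}

# Usage (run.log is a placeholder for your captured output):
#   bases2fastq /input /output 2>&1 | tee run.log
#   run_finished run.log && echo "execution complete"
```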

Execute with Local Storage

Copy the template execution command for Docker or static binary. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

When using Docker, the general execution command automatically pulls the image to the local environment if it is not already there. The command uses the following structure:

  • docker run invokes the Docker daemon to start the specified image.
  • --rm removes the container after the execution completes. Element recommends using this argument to keep your system clean.
  • elembio/bases2fastq identifies the image that you want to pull. By default, Docker pulls the latest image. For a specific version, include a tag, such as elembio/bases2fastq:1.4.0.
  • /input defines the path to the input directory.
  • /output defines the path to the output directory. Executing twice with the same output directory overwrites its contents.
  • -p sets the number of threads that Bases2Fastq uses for faster parallel execution. The value depends on your system setup. The example command has a value of 8, which requires at least 8 CPUs.
  • {options} defines the optional arguments.
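Since the -p value depends on the number of CPUs available, one option is to cap the requested value at the CPU count that the system reports. The sketch below is an assumption about how you might choose the value, not part of Bases2Fastq. The function names available_cpus and cap_threads are hypothetical, and the CPU count visible inside a Docker container can be lower than the count on the host.

```shell
# Hypothetical helper: report the number of online CPUs.
# Uses nproc when present, otherwise falls back to getconf.
available_cpus() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    getconf _NPROCESSORS_ONLN
  fi
}

# Hypothetical helper: return the requested thread count, capped at the
# number of CPUs that the system reports.
cap_threads() {
  requested="$1"
  available=$(available_cpus)
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# Usage: pass the result as the -p value in the execution command.
#   threads=$(cap_threads 8)
```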

Executing in local environments might require mounting the directory to Docker:

  • -v mounts the local path to Docker as a volume.
  • :/input binds the mounted path to the /input path inside the container for use by Bases2Fastq.
docker run --rm -v /Users/username/path/to/input/:/input -v /Users/username/path/to/output/:/output elembio/bases2fastq bases2fastq /input /output -p 8 {options}
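Because executing again with the same output directory overwrites earlier results, a guard that checks the local output directory first can prevent accidental loss. The sketch below treats a directory as safe when it does not exist yet or is empty. The function name output_is_safe is hypothetical, and the check applies only to local paths, not to cloud storage locations.

```shell
# Hypothetical helper: succeed when the output directory is missing or empty,
# so that a new execution would not overwrite earlier results.
output_is_safe() {
  out_dir="$1"
  [ ! -d "$out_dir" ] || [ -z "$(ls -A "$out_dir")" ]
}

# Usage (path is a placeholder):
#   output_is_safe /Users/username/path/to/output || echo "output directory is not empty"
```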

Execute with AWS S3

AWS S3 requires Uniform Resource Identifiers (URIs) to serve as paths to the Amazon S3 buckets that contain the input and output files. Accessing the buckets requires AWS credentials configured for the setup. If both input and output locations are AWS, they must use the same credentials.

  • For AWS S3 within an EC2 instance, Bases2Fastq detects the credentials or Identity and Access Management (IAM) role associated with the instance.
  • For AWS S3 access without an EC2 instance, define AWS credentials with the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.

Note

If Bases2Fastq fails to detect credentials attached to an AWS storage location with an EC2 instance, the AWS_DEFAULT_REGION environment variable might be incorrect or unset. Make sure this variable is set correctly in the EC2 instance: export AWS_DEFAULT_REGION=$aws_region.

  1. Run the following export commands in the terminal to set the credentials as environment variables.
export AWS_ACCESS_KEY_ID=ExampleAccessKey
export AWS_SECRET_ACCESS_KEY=EXAMPLEnhMUUOhQF/T51c6A5+DtQas9ghebs
export AWS_DEFAULT_REGION=us-west-2
  2. Copy the following template execution command for Docker or static binary.

For Docker, the execution command includes -e to pull in the credentials set as environment variables.

docker run --rm -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION elembio/bases2fastq bases2fastq s3://bucket/input s3://bucket/output -p 8 {options}
  3. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
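Before executing outside an EC2 instance, it can help to confirm that all three AWS environment variables are set, because an empty variable is passed to Docker as an empty value. The sketch below lists any of the three variables that are unset or empty. The function name missing_aws_vars is hypothetical, and the check does not verify that the credentials are valid.

```shell
# Hypothetical helper: print the names of any required AWS variables that are
# unset or empty. Empty output means all three are set.
missing_aws_vars() {
  missing=""
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] || missing="$missing AWS_ACCESS_KEY_ID"
  [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || missing="$missing AWS_SECRET_ACCESS_KEY"
  [ -n "${AWS_DEFAULT_REGION:-}" ] || missing="$missing AWS_DEFAULT_REGION"
  # Strip the leading space before printing.
  echo "${missing# }"
}

# Usage:
#   [ -z "$(missing_aws_vars)" ] || echo "Set these first: $(missing_aws_vars)"
```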

Execute with GCS

Google Cloud Storage (GCS) requires URLs to serve as preconfigured paths to the Cloud Storage buckets that contain the input and output files. Accessing the buckets requires configuring GCS credentials for the setup. If both input and output locations are GCS, they must use the same credentials.

The environment variable GOOGLE_APPLICATION_CREDENTIALS must link to the file that contains the application credentials. For more information, see the Google documentation.

  1. Run the following export command in the terminal to set the credential as an environment variable.
export CREDENTIALS="/path/to/GCP-creds.json"
  2. Copy the following template execution command for Docker or static binary.

For Docker, the execution command includes -e to pull in the credential set as an environment variable.

docker run --rm -e GOOGLE_APPLICATION_CREDENTIALS=$CREDENTIALS elembio/bases2fastq bases2fastq gs://bucket/input gs://bucket/output -p 8 {options}
  3. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
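Because the credential variable must point to an existing file, a short check on the path can catch a typo before execution. The sketch below assumes the path is stored in the CREDENTIALS variable from step 1. The function name check_gcs_creds is hypothetical, and the check confirms only that the file is readable, not that the credentials are valid.

```shell
# Hypothetical helper: report whether a credentials path is set and readable.
check_gcs_creds() {
  creds="${1:-}"
  if [ -z "$creds" ]; then
    echo "unset"
    return 1
  fi
  if [ ! -r "$creds" ]; then
    echo "unreadable"
    return 1
  fi
  echo "ok"
}

# Usage:
#   check_gcs_creds "$CREDENTIALS"
```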

Execute with Rclone-Compatible Cloud

Before executing Bases2Fastq with rclone, review the rclone requirements for system setup.

If you are using rclone, Element recommends executing with the static binary. For execution, use the --input-remote and --output-remote options as shown in the following code example. Then update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

./bases2fastq path/to/input/remote path/to/output/remote --input-remote "input-name" --output-remote "output-name" -p 8 {options}
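Before executing, you might confirm that the remote names passed to --input-remote and --output-remote exist in your rclone configuration. The sketch below assumes the rclone listremotes command, which prints one configured remote per line with a trailing colon, and reads that list from standard input. The function name remote_configured is hypothetical.

```shell
# Hypothetical helper: succeed if the named remote appears in a list of
# remotes read from standard input (one per line, each ending in a colon).
remote_configured() {
  grep -qxF "$1:"
}

# Usage (requires rclone to be installed and configured):
#   rclone listremotes | remote_configured "input-name" || echo "remote not found"
```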