
Executing Bases2Fastq

To execute Bases2Fastq on a local compute environment, enter a command in the command line interface (CLI) terminal. The command that you enter depends on your storage setup (local storage or cloud storage) and on the execution type, such as Docker or static binary. To specify demultiplexing and FASTQ file generation settings, append optional arguments to the command.

The command has the following four parts:

  • The bases2fastq executable
  • An input directory
  • An output directory
  • Optional arguments to manipulate the output
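The four parts can be assembled mechanically. The following minimal sketch is a POSIX shell function that only prints the command (a dry run) so that you can inspect it before executing. The function name build_b2f_cmd is hypothetical, and the sketch assumes that the static binary is invoked as ./bases2fastq, as in the rclone example later on this page.

```shell
#!/bin/sh
# Hypothetical helper: print a Bases2Fastq command from its four parts.
# $1 = executable, $2 = input directory, $3 = output directory,
# remaining arguments = optional arguments (may be empty).
build_b2f_cmd() (
  exe=$1 input=$2 output=$3
  shift 3
  # "$*" joins the optional arguments with single spaces.
  if [ $# -gt 0 ]; then
    printf '%s %s %s %s\n' "$exe" "$input" "$output" "$*"
  else
    printf '%s %s %s\n' "$exe" "$input" "$output"
  fi
)

# Usage (prints the command; does not run it):
#   build_b2f_cmd ./bases2fastq /data/input /data/output -p 8
```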

Review Run Manifest

Before you execute Bases2Fastq, make sure that the run manifest in the input directory includes the correct settings. For more information, see the Run Manifest Documentation.

Note:  

If the sequencing run manifest does not include index sequences, Bases2Fastq skips demultiplexing and assigns all polonies to one sample. If necessary, create a corrected run manifest.
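A quick pre-flight check can confirm that the input directory contains a run manifest before you execute. This sketch assumes that the manifest is named RunManifest.csv and sits at the top level of the input directory, as in the example output later on this page (Parsing run manifest /input/RunManifest.csv). The function name check_manifest is hypothetical, and the check only confirms that the file exists and is non-empty, not that its settings or index sequences are correct.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the input directory
# contains a non-empty RunManifest.csv (assumed file name and location).
# $1 = input directory.
check_manifest() (
  manifest="$1/RunManifest.csv"
  if [ -s "$manifest" ]; then
    echo "Found run manifest: $manifest"
  else
    echo "Missing or empty run manifest: $manifest" >&2
    exit 1
  fi
)
```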

Execute the Software

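  1. Copy the template execution command for your storage type from one of the following subsections: Execute with Local Storage, Execute with Amazon S3, Execute with GCS, or Execute with Rclone-Compatible Cloud.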

  2. Update the input and output directory paths in the template command to point to the intended input and output locations.

  3. Adjust the optional arguments in the run command.

    • To add arguments, replace {options} with any of the arguments listed in Optional Arguments.
    • To execute Bases2Fastq without any arguments, delete {options}.
  4. Execute the command in the terminal.

    The terminal displays execution progress, as shown in the following example output. The execution is complete when the terminal displays the elapsed time.

    Run command: bases2fastq /input /output
    Parsing run manifest /input/RunManifest.csv
    Starting FASTQ generation for Indexing-Sim
    Processing tile L1R02C01S1
    Processing tile L1R02C01S2
    ...
    ....
    Aggregating run stats
    ========== Stats Summary ==========
    Reads assigned: 35.883%
    Mean Q score: 24.001
    Percent Q30: 38.307%
    Percent Q40: 17.019%
    Assigned yield: 0.010 Gb
    ===================================
    Generating HTML QC-report
    ========== Timing ==========
    FASTQ generation: 14.870s
    Stats reports: 16.304s
    Total elapsed: 31.174s
    ============================
    Output stored in /output
  5. Access the output files in the location that is specified in the execution command.

    • To view logs, open info/Bases2Fastq.log in the output location.
    • To view the HTML QC report, go to the output folder, double-click the HTML report file to open it, and review each tab.
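Because the execution is complete when the terminal displays the elapsed time, you can capture the terminal output (for example, by piping the command through tee) and check it afterward. This sketch assumes that the output format matches the example above, with a Total elapsed: line and the Stats Summary lines. The function name summarize_run is hypothetical, and the format of info/Bases2Fastq.log might differ from the terminal output.

```shell
#!/bin/sh
# Hypothetical helper: report whether captured Bases2Fastq terminal output
# shows a completed run, then print two of the summary statistics.
# $1 = file that contains the captured terminal output.
summarize_run() (
  out=$1
  # The example output prints "Total elapsed: <seconds>s" when execution completes.
  if ! grep -q '^ *Total elapsed:' "$out"; then
    echo "Run not complete: no 'Total elapsed' line in $out" >&2
    exit 1
  fi
  # Print the matching summary lines with leading whitespace stripped.
  grep -E '^ *(Reads assigned|Percent Q30):' "$out" | sed 's/^ *//'
)

# Usage: capture the terminal output while running, then summarize it.
#   bases2fastq /input /output -p 8 2>&1 | tee run.txt
#   summarize_run run.txt
```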

Execute with Local Storage

Copy the template execution command for Docker or static binary. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

When you use Docker, the execution command automatically pulls the image if it is not already available locally. The command uses the following structure:

  • docker run: Invokes the Docker daemon to start a container from the specified image.
  • --rm: Automatically removes the container after the execution completes. Element recommends using this argument to keep your system clean.
  • elembio/bases2fastq: Identifies the image that you want to pull. If you omit a tag, Docker uses the latest tag. For a specific version, include a tag, such as elembio/bases2fastq:1.4.0.
  • /input: Defines the path to the input directory.
  • /output: Defines the path to the output directory. If two executions use the same output directory, the later execution overwrites the earlier output.
  • -p: Sets the number of threads that Bases2Fastq uses for parallel execution. Set the value based on your system. The example command uses a value of 8, which requires at least 8 CPUs.
  • {options}: Defines the optional arguments.
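Because the -p value should not exceed the available CPUs, you can derive it from the host instead of hard-coding 8. This sketch uses getconf _NPROCESSORS_ONLN, which is available on Linux and macOS, to count online CPUs and caps the result at a chosen maximum. The function name pick_threads and the cap are hypothetical choices, not Bases2Fastq defaults.

```shell
#!/bin/sh
# Hypothetical helper: print a thread count for -p that does not exceed
# the number of online CPUs.
# $1 = desired maximum (for example, 8);
# $2 = optional CPU count override (defaults to the host's online CPUs).
pick_threads() (
  max=$1
  cpus=${2:-$(getconf _NPROCESSORS_ONLN)}
  if [ "$cpus" -lt "$max" ]; then
    echo "$cpus"
  else
    echo "$max"
  fi
)

# Usage:
#   bases2fastq /input /output -p "$(pick_threads 8)"
```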

Executing in local environments might require mounting the local input and output directories into the Docker container:

  • -v: Mounts a local path into the container as a volume.

  • :/input and :/output: Set the paths inside the container where the local directories are mounted. Bases2Fastq uses these container paths as its input and output directories.

    docker run --rm -v /Users/username/path/to/input/:/input -v /Users/username/path/to/output/:/output elembio/bases2fastq bases2fastq /input /output -p 8 {options}
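The following dry-run sketch builds the local Docker command with separate input and output mounts and prints it for review instead of running it. It refuses to mount the same host directory for both input and output, because a shared path would mix results into the input data. The function name docker_local_cmd is hypothetical. Passing a pinned tag such as elembio/bases2fastq:1.4.0 makes repeated runs use the same version.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print a Docker command that mounts the local
# input and output directories at /input and /output inside the container.
# $1 = host input dir, $2 = host output dir, $3 = image (tag optional),
# remaining arguments = optional Bases2Fastq arguments.
docker_local_cmd() (
  in_dir=$1 out_dir=$2 image=$3
  shift 3
  # Refuse to mount the same host directory for input and output.
  if [ "$in_dir" = "$out_dir" ]; then
    echo "Input and output directories must differ" >&2
    exit 1
  fi
  if [ ! -d "$in_dir" ]; then
    echo "Input directory not found: $in_dir" >&2
    exit 1
  fi
  cmd="docker run --rm -v $in_dir:/input -v $out_dir:/output"
  cmd="$cmd $image bases2fastq /input /output"
  [ $# -eq 0 ] || cmd="$cmd $*"
  printf '%s\n' "$cmd"
)
```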

Execute with Amazon S3

Amazon S3 requires Uniform Resource Identifiers (URIs), such as s3://bucket/input, to serve as paths to the Amazon S3 buckets that contain the input and output files. Access to the buckets requires configured AWS credentials. If both the input and output locations are in Amazon S3, they must use the same credentials.

  • For Amazon S3 access from an EC2 instance, Bases2Fastq detects the credentials or Identity and Access Management (IAM) role that is associated with the instance.
  • For Amazon S3 access without an EC2 instance, define AWS credentials with the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.

Note:  

If Bases2Fastq fails to detect the credentials attached to an EC2 instance, the AWS_DEFAULT_REGION environment variable might be incorrect. Make sure that this variable is set correctly on the EC2 instance: export AWS_DEFAULT_REGION=$aws_region.
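When you run outside an EC2 instance, a missing variable is easy to overlook. The following minimal pre-flight check, written for a POSIX shell, reports which of the three variables listed above are unset or empty without printing their values. The function name require_aws_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check for S3 access outside EC2: fail if any of
# the three AWS environment variables is unset or empty.
require_aws_env() (
  missing=""
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    # Indirect lookup via eval; only the variable name is ever printed.
    eval "val=\${$var:-}"
    [ -n "$val" ] || missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "Missing AWS variables:$missing" >&2
    exit 1
  fi
  echo "AWS credential variables are set"
)
```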

  1. To set the credentials as environment variables, run the following export commands in the terminal:

    export AWS_ACCESS_KEY_ID=ExampleAccessKey
    export AWS_SECRET_ACCESS_KEY=EXAMPLEnhMUUOhQF/T51c6A5+DtQas9ghebs
    export AWS_DEFAULT_REGION=us-west-2
  2. Copy the following template execution command for Docker or static binary:

    For Docker, the execution command includes -e to pass the credentials, which are set as environment variables, into the container.

    docker run --rm -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION elembio/bases2fastq bases2fastq s3://bucket/input s3://bucket/output -p 8 {options}
  3. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

Execute with GCS

Google Cloud Storage (GCS) requires URLs, such as gs://bucket/input, to serve as paths to the preconfigured Cloud Storage buckets that contain the input and output files. Access to the buckets requires configured GCS credentials. If both the input and output locations are in GCS, they must use the same credentials.

The GOOGLE_APPLICATION_CREDENTIALS environment variable must contain the path to the file that contains the application credentials. For more information, see the Google documentation.
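Because GOOGLE_APPLICATION_CREDENTIALS holds a file path, a typo in the path is a common cause of authentication failures. The following sketch checks that a given path is non-empty and points to a readable regular file. The function name check_gcs_creds is hypothetical, and the check does not validate the contents of the credentials file.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm that a credentials path is set
# and points to a readable regular file.
# $1 = path (for example, "$GOOGLE_APPLICATION_CREDENTIALS" or "$CREDENTIALS").
check_gcs_creds() (
  path=$1
  if [ -z "$path" ]; then
    echo "Credentials path is empty" >&2
    exit 1
  fi
  if [ ! -f "$path" ] || [ ! -r "$path" ]; then
    echo "Credentials file not readable: $path" >&2
    exit 1
  fi
  echo "Using credentials file: $path"
)
```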

  1. To set the credentials file path as an environment variable, run the following export command in the terminal:

    export CREDENTIALS="/path/to/GCP-creds.json"
  2. Copy the following template execution command for Docker or static binary:

    For Docker, the execution command includes -e to pass the credentials file path, which is set as an environment variable, into the container.

    docker run --rm -e GOOGLE_APPLICATION_CREDENTIALS=$CREDENTIALS elembio/bases2fastq bases2fastq gs://bucket/input gs://bucket/output -p 8 {options}
  3. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

Execute with Rclone-Compatible Cloud

Before you execute Bases2Fastq with rclone, review the rclone requirements for system setup.

If you use rclone, Element recommends that you execute with the static binary. For execution, use the --input-remote and --output-remote options as shown in the following code example. Then, update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

./bases2fastq path/to/input/remote path/to/output/remote --input-remote "input-name" --output-remote "output-name" -p 8 {options}