
Executing Bases2Fastq

After installing Bases2Fastq, you run the software by entering a command in a terminal. The command varies with the cloud storage setup and the execution type (Docker or static binary). In general, the command has the following four parts:

  • The bases2fastq executable
  • An input directory
  • An output directory
  • Optional arguments to manipulate the output

Optional arguments appended to the command let you adjust demultiplexing and FASTQ file generation to suit the application. For a complete list, see Optional Arguments.
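To show how the four parts combine, the following sketch assembles a command and prints it without running it. The function name build_b2f_command is hypothetical and is not part of Bases2Fastq. It only illustrates the argument order: executable, input directory, output directory, then any optional arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Bases2Fastq): assembles the four parts of
# the command and prints the result instead of running it (a dry run).
# Usage: build_b2f_command INPUT_DIR OUTPUT_DIR [OPTIONS...]
build_b2f_command() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_b2f_command INPUT_DIR OUTPUT_DIR [OPTIONS...]" >&2
    return 2
  fi
  local input_dir="$1" output_dir="$2"
  shift 2                                  # whatever remains is the optional arguments
  local -a cmd
  cmd=(bases2fastq "$input_dir" "$output_dir" "$@")
  local quoted
  quoted="$(printf '%q ' "${cmd[@]}")"     # shell-quote each part so paths with spaces survive
  printf '%s\n' "${quoted% }"              # drop the trailing space
}
```

For example, `build_b2f_command /input /output -p 8` prints `bases2fastq /input /output -p 8`.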

Review Run Manifest

Before executing Bases2Fastq, make sure the run manifest in the input directory correctly defines demultiplexing. For more information, see the Run Manifest Documentation.

Note

If samples are not indexed in the run manifest, Bases2Fastq skips demultiplexing and assigns all polonies to one sample. If necessary, create and use a corrected run manifest.
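You can check the run manifest before a run with a short script like the following. It is a sketch under two assumptions: the manifest is named RunManifest.csv and sits at the top of the input directory, as in the example terminal output on this page, and indexed samples have an Index1 column. The function name check_run_manifest is hypothetical, and searching for the Index1 header is only a rough proxy for "samples are indexed".

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check. Assumptions: the manifest is INPUT_DIR/RunManifest.csv
# and indexed samples appear under an Index1 column header.
# Returns 0 if the manifest looks indexed, 1 if it is missing, 2 if no Index1 column is found.
check_run_manifest() {
  local manifest="$1/RunManifest.csv"
  if [ ! -f "$manifest" ]; then
    echo "missing: $manifest" >&2
    return 1
  fi
  if ! grep -q 'Index1' "$manifest"; then
    echo "warning: no Index1 column in $manifest; samples might not be indexed" >&2
    return 2
  fi
  echo "ok: $manifest"
}
```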

Execute the Software

  1. Copy the template execution command for your storage type from one of the following subsections: Execute with Local Storage, Execute with AWS S3, Execute with GCS, or Execute with Rclone-Compatible Cloud.
  2. Update the input and output directory paths in the template command to point to the intended input and output locations.
  3. Adjust the optional arguments in the run command.
    • To add arguments, replace {options} with any of the arguments listed in Optional Arguments.
    • To execute Bases2Fastq without any arguments, delete {options}.
  4. Execute the finalized command in terminal.

    The terminal displays execution progress, as shown in the following code example. Execution completes when the terminal displays the elapsed time.

Run command: bases2fastq /input /output
Parsing run manifest /input/RunManifest.csv
Starting FASTQ generation for Indexing-Sim
Processing tile L1R02C01S1
Processing tile L1R02C01S2
...
....
Aggregating run stats
========== Stats Summary ==========
Reads assigned: 35.883%
Mean Q score: 24.001
Percent Q30: 38.307%
Percent Q40: 17.019%
Assigned yield: 0.010 Gb
===================================
Generating HTML QC-report
========== Timing ==========
FASTQ generation: 14.870s
Stats reports: 16.304s
Total elapsed: 31.174s
============================
Output stored in /output
  5. Access the output files in the location specified in the execution command.
    • To view logs, go to info/Bases2Fastq.log.
    • To view the HTML QC report, open the output folder, double-click the HTML report file, and move through each tab.
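After a run, a sketch like the following can confirm that execution finished and locate the outputs named in the last step above. It assumes you saved the terminal output to a file, for example with `bases2fastq /input /output -p 8 2>&1 | tee b2f-console.txt`, and that the HTML QC report sits at the top level of the output folder. The function name and the console file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check.
# Assumptions: CONSOLE_LOG holds the saved terminal output, and the HTML QC
# report is at the top level of OUTPUT_DIR.
# Usage: check_b2f_output CONSOLE_LOG OUTPUT_DIR
check_b2f_output() {
  local console_log="$1" output_dir="$2"
  # Execution is complete once the elapsed-time line has been printed.
  if ! grep -q 'Total elapsed' "$console_log"; then
    echo "run incomplete: no elapsed time in $console_log" >&2
    return 1
  fi
  # Log location as documented: info/Bases2Fastq.log inside the output folder.
  if [ ! -f "$output_dir/info/Bases2Fastq.log" ]; then
    echo "missing: $output_dir/info/Bases2Fastq.log" >&2
    return 1
  fi
  # Print any HTML report files at the top level of the output folder.
  find "$output_dir" -maxdepth 1 -name '*.html' -print
}
```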

Execute with Local Storage

Copy the template execution command for Docker or static binary. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

When using Docker, the general execution command automatically pulls the image to the local environment if it is not already there. The command uses the following structure:

  • docker run instructs the Docker daemon to start a container from the specified image.
  • --rm removes the container after execution completes. Element recommends using this argument to keep your system clean.
  • elembio/bases2fastq identifies the image that you want to pull. Without a tag, Docker pulls the latest image. For a specific version, include a tag, such as elembio/bases2fastq:1.4.0.
  • /input defines the path to the input directory.
  • /output defines the path to the output directory. Executing twice with the same output directory overwrites the existing output.
  • -p sets the number of threads Bases2Fastq uses for parallel execution. The value depends on your system setup. The example command uses a value of 8, which requires at least 8 CPUs.
  • {options} defines the optional arguments.

Executing in local environments might require mounting the input and output directories to Docker:

  • -v mounts a local path into the Docker container as a volume.
  • :/input and :/output set the paths inside the container where the local directories appear. Bases2Fastq reads from /input and writes to /output.
docker run --rm -v /Users/username/path/to/input/:/input -v /Users/username/path/to/output/:/output elembio/bases2fastq bases2fastq /input /output -p 8 {options}
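If you run the local Docker command often, a wrapper can fill in the paths and thread count. The following dry-run sketch prints the command rather than executing it. The function name and the B2F_THREADS and B2F_TAG variables are hypothetical. By default it sets -p to the CPU count reported by getconf, which keeps the value within the CPUs available. Pass absolute host paths for the directories.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper for the local Docker command.
# B2F_THREADS (default: online CPU count) and B2F_TAG (default: latest) are
# illustrative variable names, not Bases2Fastq settings.
# Usage: b2f_docker_local INPUT_DIR OUTPUT_DIR [OPTIONS...]
b2f_docker_local() {
  if [ "$#" -lt 2 ]; then
    echo "usage: b2f_docker_local INPUT_DIR OUTPUT_DIR [OPTIONS...]" >&2
    return 2
  fi
  local input_dir="$1" output_dir="$2"
  shift 2                                  # remaining arguments are passed through as options
  local threads="${B2F_THREADS:-$(getconf _NPROCESSORS_ONLN)}"
  local image="elembio/bases2fastq:${B2F_TAG:-latest}"  # pin a tag such as 1.4.0 for reproducibility
  local -a cmd
  cmd=(docker run --rm
    -v "$input_dir:/input"
    -v "$output_dir:/output"
    "$image" bases2fastq /input /output -p "$threads" "$@")
  local quoted
  quoted="$(printf '%q ' "${cmd[@]}")"     # shell-quote each part for safe copy-paste
  printf '%s\n' "${quoted% }"
}
```

For example, `B2F_THREADS=8 B2F_TAG=1.4.0 b2f_docker_local /data/in /data/out` prints `docker run --rm -v /data/in:/input -v /data/out:/output elembio/bases2fastq:1.4.0 bases2fastq /input /output -p 8`.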

Execute with AWS S3

AWS S3 storage uses Uniform Resource Identifiers (URIs), such as s3://bucket/input, as paths to the Amazon S3 buckets that contain the input and output files. Accessing the buckets requires AWS credentials configured for your setup. If both the input and output locations are in AWS S3, they must use the same credentials.

  • For AWS S3 within an EC2 instance, Bases2Fastq detects the credentials or Identity and Access Management (IAM) role associated with the instance.
  • For AWS S3 access without an EC2 instance, define AWS credentials with the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.
Note

If Bases2Fastq running on an EC2 instance fails to detect the credentials attached to the instance, the AWS_DEFAULT_REGION environment variable might be unset or incorrect. Make sure this variable is set correctly in the EC2 instance: export AWS_DEFAULT_REGION=$aws_region.

  1. Run the following export command in terminal to set the credentials as environment variables.
export AWS_ACCESS_KEY_ID=ExampleAccessKey
export AWS_SECRET_ACCESS_KEY=EXAMPLEnhMUUOhQF/T51c6A5+DtQas9ghebs
export AWS_DEFAULT_REGION=us-west-2
  2. Copy the following template execution command for Docker or static binary.

For Docker, the execution command includes -e to pull in the credentials set as environment variables.

docker run --rm -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION elembio/bases2fastq bases2fastq s3://bucket/input s3://bucket/output -p 8 {options}
  3. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
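Outside an EC2 instance, Bases2Fastq reads the credentials from the three environment variables set in step 1, and the Docker command forwards them with -e. The following sketch checks that all three are set and non-empty before you start a run. The function name require_aws_env is hypothetical, and the check does not apply on EC2 instances that rely on an attached IAM role.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for S3 runs outside EC2.
# Returns 1 and names each missing variable if any credential variable is unset or empty.
require_aws_env() {
  local name missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_aws_env && docker run --rm -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID ...` starts the container only when all three variables are present.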

Execute with GCS

GCS uses URLs, such as gs://bucket/input, as paths to the Cloud Storage buckets that contain the input and output files. Accessing the buckets requires GCS credentials configured for your setup. If both the input and output locations are in GCS, they must use the same credentials.

The environment variable GOOGLE_APPLICATION_CREDENTIALS must link to the file that contains the application credentials. For more information, see the Google documentation.

  1. Run the following export command in terminal to set the credential as an environment variable.
export CREDENTIALS="/path/to/GCP-creds.json"
  2. Copy the following template execution command for Docker or static binary.

For Docker, the execution command includes -e to pull in the credential set as an environment variable.

docker run --rm -e GOOGLE_APPLICATION_CREDENTIALS=$CREDENTIALS elembio/bases2fastq bases2fastq gs://bucket/input gs://bucket/output -p 8 {options}
  3. Update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.
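With Docker, -e passes only the path string into the container, not the file it names, so a credentials file such as /path/to/GCP-creds.json on the host is not visible inside the container unless it is also mounted. The following dry-run sketch mounts the file read-only and points GOOGLE_APPLICATION_CREDENTIALS at the mounted copy. The container path /creds/gcp.json and the function name b2f_docker_gcs are hypothetical choices for this example.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper for GCS with Docker. Mounts the host credentials
# file at an arbitrary container path (/creds/gcp.json) and sets
# GOOGLE_APPLICATION_CREDENTIALS to that container path.
# Usage: b2f_docker_gcs CREDS_FILE INPUT_URL OUTPUT_URL [OPTIONS...]
b2f_docker_gcs() {
  local creds="$1" input_url="$2" output_url="$3"
  shift 3
  if [ ! -r "$creds" ]; then
    echo "cannot read credentials file: $creds" >&2
    return 1
  fi
  local -a cmd
  cmd=(docker run --rm
    -v "$creds:/creds/gcp.json:ro"                    # read-only mount of the key file
    -e GOOGLE_APPLICATION_CREDENTIALS=/creds/gcp.json # path as seen inside the container
    elembio/bases2fastq bases2fastq "$input_url" "$output_url" "$@")
  local quoted
  quoted="$(printf '%q ' "${cmd[@]}")"
  printf '%s\n' "${quoted% }"
}
```

For example, `b2f_docker_gcs "$CREDENTIALS" gs://bucket/input gs://bucket/output -p 8` prints the full Docker command for review.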

Execute with Rclone-Compatible Cloud

Before executing Bases2Fastq with rclone, review the rclone requirements for system setup.

If you are using rclone, Element recommends executing with the static binary. For execution, use the --input-remote and --output-remote options as shown in the following code example. Then update the template command and execute Bases2Fastq as described in steps 2–5 of the execution instructions.

./bases2fastq path/to/input/remote path/to/output/remote --input-remote "input-name" --output-remote "output-name" -p 8 {options}
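Before a run, you can confirm that the names passed to --input-remote and --output-remote match configured rclone remotes. The following sketch reads the output of rclone listremotes, which prints one configured remote per line with a trailing colon. The function name remotes_configured is hypothetical, and input-name and output-name are the placeholders from the template command.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for rclone runs.
# Reads `rclone listremotes` output (one "name:" per line) on stdin and
# returns 1 if any remote name given as an argument is not listed.
remotes_configured() {
  local listing name missing=0
  listing="$(cat)"
  for name in "$@"; do
    # -x: match the whole line, -F: treat the name as a fixed string.
    if ! grep -qxF "${name}:" <<<"$listing"; then
      echo "rclone remote not configured: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `rclone listremotes | remotes_configured input-name output-name && ./bases2fastq ...` runs Bases2Fastq only when both remotes exist.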

Troubleshooting

The following sections provide resolutions to common problems that can occur during FASTQ file generation. If a problem persists, contact Element Technical Support.

Indexing Performance

If indexing performance does not meet specifications, complete the following steps to examine potential causes:

  1. Make sure the run manifest includes the PhiX Control Library index sequences.

    Spiking in PhiX Control Library without recording the index sequences affects index assignment.

  2. Make sure the orientation of the PhiX Control Library index sequences matches the orientation of the sample index sequences.
  3. Review the index charts in the HTML QC report and use the information to correct the run manifest or QC-fail the sequencing run.

    The charts show the index assignment percentage rate, the number of polonies assigned to each index, and the most frequent unassigned indexes.

  4. Review the I1IsReverseComplement and I2IsReverseComplement metrics in RunStats.json. If a column contains inconsistent orientations, correct the run manifest accordingly.

    The metrics show the observed orientation of index sequences relative to the orientation recorded in the Index1 and Index2 columns of the run manifest.
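For step 4, a quick text search can show whether each metric holds a single orientation or a mix. The following sketch assumes the metrics are stored as JSON booleans (true or false) somewhere in RunStats.json. It is a grep-based approximation rather than a JSON parser, and the function name check_index_orientation is hypothetical. A mixed result suggests that the corresponding column in the run manifest needs correcting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports the distinct values of I1IsReverseComplement and
# I2IsReverseComplement found anywhere in RunStats.json.
# Assumption: values are JSON booleans; adjust the patterns if your file differs.
check_index_orientation() {
  local stats="$1" metric values
  for metric in I1IsReverseComplement I2IsReverseComplement; do
    # Collect the distinct true/false values recorded for this metric.
    values="$(grep -oE "\"$metric\"[[:space:]]*:[[:space:]]*(true|false)" "$stats" \
      | grep -oE '(true|false)$' | sort -u | tr '\n' ' ' || true)"
    values="${values% }"
    case "$values" in
      "") echo "$metric: not found" ;;
      "false true") echo "$metric: mixed orientations ($values), review the run manifest" ;;
      *) echo "$metric: $values" ;;
    esac
  done
}
```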

Missing Flow Cell ID

If the flow cell ID is missing from the output, execute Bases2Fastq with the flow cell ID argument (e.g., --flowcell-id "1234567890"). For an example command, see Add a Missing Flow Cell ID.

Reprocessing with a Corrected Run Manifest

If you must reprocess a run with a corrected run manifest, first execute Bases2Fastq with the QC-only argument to validate the indexes in the corrected run manifest on one tile. For an example command, see Run QC-Only Mode.