Integrating DNAnexus for FASTQ Automation
Working together, DNAnexus and ElemBio Cloud provide an end-to-end solution for AVITI data analysis. With a DNAnexus provider and storage connection configured in ElemBio Cloud, the AVITI System streams data directly to a DNAnexus project. By configuring a Bases2Fastq flow in ElemBio Cloud and a workflow in DNAnexus, you can automatically generate FASTQ files and start secondary analysis. This tutorial demonstrates how to integrate the ElemBio Cloud and DNAnexus platforms to run downstream analyses of run data automatically.
The tutorial covers the following topics:
- Understanding the FASTQ generation process with a DNAnexus workflow, the Bases2Fastq applet, and ElemBio Cloud
- Creating a workflow that supports the Bases2Fastq applet in DNAnexus
- Connecting the DNAnexus workflow to ElemBio Cloud for automatic FASTQ generation
- Exploring how to connect secondary analysis to FASTQ generation in DNAnexus
Process Overview
The AVITI System generates raw bases files. Through the storage connection in ElemBio Cloud, the AVITI System transfers the bases files and other output files directly to a project in DNAnexus. In a DNAnexus workflow, the Bases2Fastq applet converts the bases files into single-sample FASTQ files. Later steps in the workflow, using the DNAnexus apps of your choice, process the samples for secondary analysis, such as alignment and variant calling. Secondary analysis completes the end-to-end analysis workflow.
Before You Begin
This tutorial requires a DNAnexus account. If you do not have one, sign up for an account at the DNAnexus website, then create a project. In the DNAnexus user interface, select the project to display its details, including the project ID. ElemBio Cloud requires this ID for integration with DNAnexus.
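Because ElemBio Cloud needs the project ID exactly as DNAnexus displays it, a quick format check can catch copy-and-paste mistakes. The following bash sketch assumes that project IDs take the form project- followed by 24 letters and digits; compare against the ID shown for your own project before relying on it. The function name validate_project_id and the sample ID are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a DNAnexus project ID before entering it
# in ElemBio Cloud.
# Assumption: project IDs look like "project-" plus 24 alphanumeric characters.
validate_project_id() {
  local id="$1"
  if [[ "$id" =~ ^project-[0-9A-Za-z]{24}$ ]]; then
    return 0
  fi
  echo "error: '$id' does not look like a DNAnexus project ID" >&2
  return 1
}
```

For example, validate_project_id project-AAAABBBBCCCCDDDDEEEEFFFF succeeds, while a truncated ID or a workflow ID fails with an error message on stderr.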
DNAnexus Command-Line Tools
Use the DNAnexus Platform SDK command-line tool, dx-toolkit, to complete this tutorial. Install the tool and configure it with your project and access token before proceeding. Run the commands in this tutorial in a terminal on the system where dx-toolkit is installed. For more information on the tool, see the DNAnexus Command Line Quickstart.
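Before running any dx commands, it can help to confirm that the dx executable is on your PATH. This is a minimal bash sketch; the install hint assumes dx-toolkit was installed from PyPI as the dxpy package, and the function name require_dx is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm that the dx command from dx-toolkit
# is available on PATH.
# Assumption: dx-toolkit is installed with pip (pip3 install dxpy).
require_dx() {
  if ! command -v dx >/dev/null 2>&1; then
    echo "dx not found; install dx-toolkit (for example: pip3 install dxpy)" >&2
    return 1
  fi
  echo "dx found at $(command -v dx)"
}
```

If the check passes, dx whoami reports the logged-in user once you have logged in, which is a quick way to confirm that the access token is configured.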
Set Up the Tutorial Directory
The following commands create a folder for the tutorial and set it as the working directory in the CLI. Run the remaining commands from this directory so that the run data you download later is saved there.
mkdir dnanexus-tutorial
cd ./dnanexus-tutorial
Set Up the DNAnexus Workflow
The DNAnexus workflow must use the Bases2Fastq applet to initiate analysis when sequencing completes.
- Clone the Elembio GitHub repository that contains the code for the Bases2Fastq applet. Cloning over SSH requires an SSH key registered with your GitHub account.
git clone git@github.com:Elembio/bases2fastq-dx.git
- Log in to the DNAnexus CLI. Complete the login prompts and, when prompted, select the project to use for the workflow.
dx login
- Build the applet in the selected project.
dx build -f bases2fastq/
- In the DNAnexus UI, add a new workflow to your project.
- Enter dx_tutorial as the new name for the workflow.
- Add the Bases2Fastq applet as the first step in your dx_tutorial workflow.
- Return to the project and select the new workflow.
DNAnexus displays details for the workflow, including the name, ID, and status.
- Copy the workflow ID for the new workflow to use when configuring ElemBio Cloud.
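As an alternative to copying the workflow ID from the UI, you can look it up from the CLI. This bash sketch assumes that dx find data accepts --class workflow, --name, and --brief, and that --brief prints one project-ID:workflow-ID line per match; run dx help find data to confirm this on your dx-toolkit version. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for retrieving the dx_tutorial workflow ID from the CLI.

# Keep only the workflow ID from a --brief result such as
# "project-xxxx:workflow-yyyy". A bare "workflow-yyyy" line is returned as-is.
workflow_id_from_brief() {
  local line="$1"
  local id="${line#*:}"   # drop everything up to and including the first colon
  if [[ "$id" != workflow-* ]]; then
    echo "error: unexpected output '$line'" >&2
    return 1
  fi
  echo "$id"
}

# Find exactly one workflow by name in the current project and print its ID.
# Assumption: requires a prior dx login with the tutorial project selected.
find_workflow_id() {
  local name="${1:-dx_tutorial}"
  local matches
  matches=$(dx find data --class workflow --name "$name" --brief) || return 1
  # Count non-empty lines so that zero or multiple matches are rejected.
  if [ "$(printf '%s\n' "$matches" | grep -c .)" -ne 1 ]; then
    echo "error: expected exactly one workflow named '$name'" >&2
    return 1
  fi
  workflow_id_from_brief "$matches"
}
```

Rejecting zero or multiple matches avoids silently pasting the wrong ID into ElemBio Cloud if several workflows share the dx_tutorial name.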
Configure ElemBio Cloud
After you create the DNAnexus workflow for the Bases2Fastq applet, configure ElemBio Cloud to integrate with the workflow. For a successful integration, complete the following tasks:
- Set up a DNAnexus provider. Make sure the access key for the provider has permissions to access the project with the dx_tutorial workflow.
- Use the provider to set up a DNAnexus storage connection. Make sure to use the project ID for the project that contains the Bases2Fastq applet.
- Use the provider to set up a Bases2Fastq flow with the dx_tutorial workflow.
Test the Bases2Fastq Applet
With the workflow set up, run the following commands to test the Bases2Fastq applet on simulated run data.
- Download and extract simulated data for a 2 x 150 AVITI System run.
curl http://element-public-data.s3.amazonaws.com/bases2fastq-share/bases2fastq-v2/20230404-bases2fastq-sim-151-151-9-9.tar.gz -o sim-data.tar.gz
tar -xzvf sim-data.tar.gz
- Upload the simulated data to the dx_tutorial folder of the DNAnexus project. If the upload is successful, the dx ls -l command displays the files from the run data.
time dx upload -r 20230404-bases2fastq-sim-151-151-9-9/ -p --path dx_tutorial
dx ls -l dx_tutorial
- Run the Bases2Fastq applet on the simulated data. Replace {your-projectid} with your project ID.
dx run bases2fastq-dx -i analysis_directory={your-projectid}:/dx_tutorial
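The test steps can also be collected into one re-runnable bash sketch. It reuses the URL, folder, and applet name from the steps above; the helper names, the DRY_RUN switch (which prints each command instead of running it), and the curl flags are assumptions added for illustration, and you supply your own project ID.

```shell
#!/usr/bin/env bash
# Hypothetical script version of the test steps above.
SIM_URL="http://element-public-data.s3.amazonaws.com/bases2fastq-share/bases2fastq-v2/20230404-bases2fastq-sim-151-151-9-9.tar.gz"
SIM_DIR="20230404-bases2fastq-sim-151-151-9-9"

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build the applet input value in the form analysis_directory=<project-id>:/<folder>.
analysis_input() {
  local project_id="$1" folder="${2:-dx_tutorial}"
  echo "analysis_directory=${project_id}:/${folder#/}"
}

# Download, extract, upload, and run; stop at the first failing step.
test_bases2fastq() {
  if [ -z "${1:-}" ]; then
    echo "usage: test_bases2fastq PROJECT_ID" >&2
    return 1
  fi
  local project_id="$1"
  run curl -fSL "$SIM_URL" -o sim-data.tar.gz || return 1
  run tar -xzf sim-data.tar.gz || return 1
  run dx upload -r "$SIM_DIR/" -p --path dx_tutorial || return 1
  run dx run bases2fastq-dx -i "$(analysis_input "$project_id")"
}
```

For example, DRY_RUN=1 test_bases2fastq project-AAAABBBBCCCCDDDDEEEEFFFF prints the four commands for review; drop DRY_RUN and use your real project ID to execute them.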
Explore Secondary Analysis
The current workflow generates FASTQ files in DNAnexus. To make use of these files, you can add steps to the workflow for secondary analysis. A Nextflow Tower applet that Element provides can prepare the FASTQ files for your samples for additional analysis, and a Parabricks applet can serve as a first level of secondary analysis.
Clone and build these applets from their Elembio GitHub repositories. Follow the same procedure as the Bases2Fastq applet to add them as the second and third steps of your DNAnexus workflow.
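If you prefer the CLI to the UI for adding the extra steps, the following bash sketch builds each applet and appends it to the workflow as a new stage. It assumes that dx add stage takes a workflow followed by an executable name or ID (check dx help add stage), and that each applet's name matches the name field in its dxapp.json. The directory and applet names passed in, such as nf-dir=nf-applet, are hypothetical placeholders for the repositories you cloned.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build secondary-analysis applets and add each one as a
# stage of the dx_tutorial workflow, in the order given.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: add_secondary_stages WORKFLOW_ID DIR=APPLET_NAME [DIR=APPLET_NAME ...]
add_secondary_stages() {
  if [ -z "${1:-}" ]; then
    echo "usage: add_secondary_stages WORKFLOW_ID DIR=APPLET_NAME..." >&2
    return 1
  fi
  local workflow_id="$1"; shift
  local spec dir name
  for spec in "$@"; do
    dir="${spec%%=*}"    # source directory of the applet
    name="${spec#*=}"    # applet name from that directory's dxapp.json
    run dx build -f "$dir/" || return 1
    run dx add stage "$workflow_id" "$name" || return 1
  done
}
```

Passing the Nextflow Tower applet before the Parabricks applet keeps them as the second and third steps after Bases2Fastq; running with DRY_RUN=1 first shows the commands without changing the workflow.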