Software and System Setup
Bases2Fastq runs either as a static binary executable or in a Docker container. Setting up Bases2Fastq therefore requires installing Docker or the static binary on a compute environment that meets the system requirements. You must also configure the compute environment to transfer input and output files.
System Requirements
Operating System
The Dockerized Bases2Fastq runs on any operating system (OS) that has Docker installed. The static binary executable requires a Linux OS on an x86 architecture with glibc v2.17 or later. To verify the glibc version for the static binary, run the following command:
ldd --version
# The version appears at the end of the first line of output, for example:
ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31
Bases2Fastq cannot run on an Arm processor.
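The two static binary requirements above (x86 architecture and glibc v2.17 or later) can be checked together before downloading anything. The following is a minimal sketch and not part of Bases2Fastq. The function names version_ge, glibc_version_from, and check_static_binary_host are hypothetical, and the check assumes that supported x86 hosts report x86_64 from uname -m.

```shell
# Return success when dotted version $1 is greater than or equal to $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Print the last field of the first line of `ldd --version` output,
# for example 2.31 from "ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31".
glibc_version_from() {
    printf '%s\n' "$1" | awk 'NR == 1 { print $NF }'
}

# Hypothetical pre-flight check for the static binary.
check_static_binary_host() {
    arch="$(uname -m)"
    # Assumption: supported x86 hosts report x86_64.
    if [ "$arch" != "x86_64" ]; then
        echo "Unsupported architecture: $arch" >&2
        return 1
    fi
    glibc="$(glibc_version_from "$(ldd --version 2>/dev/null)")"
    if [ -z "$glibc" ] || ! version_ge "$glibc" "2.17"; then
        echo "glibc 2.17 or later is required (found: ${glibc:-unknown})" >&2
        return 1
    fi
    echo "Host meets the static binary requirements (glibc $glibc)"
}
```

Calling check_static_binary_host on the target machine prints a confirmation or an error message and returns a nonzero status on failure, so it can gate a later download step.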
Memory and Performance
Software performance depends on the resources dedicated to the processing environment. For optimal performance, make sure at least 16 central processing unit (CPU) cores are available and enable multithreading in Bases2Fastq with the -p argument. Both Docker and the static binary require 4 GB RAM per concurrent thread.
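The 4 GB per thread requirement limits the useful value of -p on memory-constrained hosts. As a rough sketch, the thread count can be taken as the smaller of the available cores and the available memory divided by 4 GB. The helper name max_threads is hypothetical, and this sizing rule is an illustration derived from the figure above rather than documented Bases2Fastq behavior.

```shell
# Print the largest thread count that fits both limits:
#   $1 = available CPU cores, $2 = available RAM in GB.
# Uses the 4 GB RAM per concurrent thread figure stated above.
max_threads() {
    cores="$1"
    mem_gb="$2"
    by_mem=$(( mem_gb / 4 ))
    if [ "$by_mem" -lt "$cores" ]; then
        threads="$by_mem"
    else
        threads="$cores"
    fi
    # Clamp to one thread so the result is always a usable value.
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}
```

For example, max_threads 16 32 prints 8 because 32 GB supports only eight threads at 4 GB each. On a Linux host, the inputs could come from nproc and free -g, which is an assumption about the available tools.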
The following benchmarks estimate the time Bases2Fastq takes to execute a 2 x 150 sequencing run:
Setup | Time Estimate |
---|---|
A non-volatile memory express (NVMe) solid-state drive (SSD) using 8 threads | 60 minutes |
An NVMe SSD using 20 threads | 30 minutes |
An Amazon m5.12xlarge EC2 instance with 48 virtual CPUs and onboard SSD storage | < 30 minutes |
Temporary Directory
When using cloud storage, Bases2Fastq downloads input files and stages output files in a temporary directory. Intermediate files generated during analysis are also stored in the temporary directory. The temporary directory is cleared after the completion of a Bases2Fastq execution.
The temporary directory typically uses 400–500 GB for a 2 x 150 run (approximately 1 billion reads). For some applications, a run can use up to 800 GB. The necessary amount of scratch space depends on the number of polonies and cycles in the run and the optional arguments in the Bases2Fastq execution.
By default, Bases2Fastq uses the temporary directory of the OS. To change the location of the temporary directory, set the environment variable TMPDIR. Use the following example command and replace /path/to/scratch with the desired directory:
export TMPDIR="/path/to/scratch"
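Because a run can need several hundred GB of scratch space, it may help to confirm the free space on the target file system before pointing TMPDIR at it. The sketch below assumes a POSIX df. The helper names scratch_free_gb and has_scratch_space are hypothetical and are not part of Bases2Fastq.

```shell
# Print the free space, in whole GB (1024^3 bytes), on the file system holding $1.
scratch_free_gb() {
    # -P forces one output line per file system; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed when the file system holding directory $1 has at least $2 GB free.
has_scratch_space() {
    free_gb="$(scratch_free_gb "$1")"
    [ -n "$free_gb" ] && [ "$free_gb" -ge "$2" ]
}
```

A call such as has_scratch_space /path/to/scratch 500 succeeds only when at least 500 GB is free, so it can guard the export command. The 500 GB threshold is an illustrative choice based on the typical usage described above.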
File Transfer and Storage Setup
To transfer files, Bases2Fastq requires paths to input and output locations. You can store input and output files in a local location or the cloud. For cloud storage, the following providers are compatible:
- Amazon Web Services Simple Storage Service (AWS S3)
- Google Cloud Storage (GCS)
- Any rclone-compatible provider
AWS S3 and GCS storage connections require credential configuration for Bases2Fastq execution. See Execute with AWS S3 and Execute with GCS for more information.
Rclone Requirements
Rclone is a command-line program to manage files on cloud storage. Rclone provides the ability to mount any local, cloud, or virtual file system. Rclone allows Bases2Fastq to access many cloud storage providers. However, Element has not tested every available rclone provider.
Follow the instructions at rclone.org/install to download and install rclone. Configure an rclone remote to communicate with your cloud storage. For more information, see the storage provider-specific instructions at rclone.org/#providers.
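After configuring a remote, it can be worth confirming that rclone lists it before starting a run. A minimal sketch follows. The rclone listremotes command prints one configured remote per line with a trailing colon. The helper name has_remote and the remote name my-remote are hypothetical.

```shell
# Read `rclone listremotes` output on stdin and succeed when the
# remote named $1 is present. Each remote is listed as "name:".
has_remote() {
    grep -Fxq -- "$1:"
}
```

For example, rclone listremotes | has_remote my-remote succeeds when my-remote is configured, after which rclone lsd my-remote: lists its top-level directories or buckets.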
Bases2Fastq Installation
Set up Bases2Fastq using Docker or the static binary. The static binary must be downloaded and extracted before use. Current and previous versions are available for installation.
Install the Current Version
Docker
Follow the OS-specific instructions at docs.docker.com/get-docker/ to install Docker.
Run the following command to pull the latest image from the Element public registry at DockerHub:
docker pull elembio/bases2fastq
- To confirm that Bases2Fastq is operational, run the following commands to display the software version and help content:
docker run elembio/bases2fastq bases2fastq --version
docker run elembio/bases2fastq bases2fastq --help
Static Binary
Download the latest version of the static binary using one of the following methods:
- Visit the Element website and follow the onscreen prompts.
- Run the following wget command:
wget https://bases2fastq-release.s3.amazonaws.com/bases2fastq-latest.tar.gz
- Run the following tar command to extract the file:
tar -xvf bases2fastq-latest.tar.gz
- To confirm that Bases2Fastq is operational, run the following commands to display the software version and help content:
./bases2fastq --version
./bases2fastq --help
- To generate HTML QC reports, run one of the following commands to install Python 3.6 or newer with NumPy, Bokeh, and bs4 packages:
# Debian or Ubuntu (apt)
sudo apt install python3 python3-pip libjpeg-dev zlib1g-dev
pip3 install 'numpy==1.*' 'bs4==0.*'
pip3 install 'bokeh>=2.3,<3'
# RHEL or CentOS (yum)
sudo yum install python3 python3-pip libjpeg-turbo-devel zlib-devel
pip3 install 'numpy==1.*' 'bs4==0.*'
pip3 install 'bokeh>=2.3,<3'
The Bokeh package requires Pillow, which in turn requires the libjpeg and zlib development libraries (libjpeg-dev and zlib1g-dev with apt, libjpeg-turbo-devel and zlib-devel with yum). Some packages might also require you to upgrade pip3. Use the commands that match the package manager of your system.
If Python 3.6 or newer is not installed or a required package is missing, Bases2Fastq logs a warning and does not generate the HTML QC report.
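Because a missing package only surfaces as a warning at run time, the Python requirements can be checked up front. The sketch below is not part of Bases2Fastq. The names python_ok and missing_py_modules are hypothetical, and it assumes the interpreter is invoked as python3.

```shell
# Succeed when python3 exists and is version 3.6 or newer.
python_ok() {
    command -v python3 >/dev/null 2>&1 &&
        python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

# Print each module name from the arguments that python3 cannot import.
missing_py_modules() {
    for mod in "$@"; do
        python3 -c "import $mod" >/dev/null 2>&1 || echo "$mod"
    done
}
```

Running missing_py_modules numpy bokeh bs4 prints nothing when all three packages import cleanly and otherwise prints the names that still need to be installed.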
Install a Previous Version
Bases2Fastq follows the semantic versioning specification. All major, minor, and patch versions are available by tag for both the Docker and static binary.
See Release Notes for details on previous versions of Bases2Fastq.
Complete the instructions for Bases2Fastq installation using a command to download a previous version, as shown in the following code examples.
Docker
# Get the latest release of major version 1
docker pull elembio/bases2fastq:1
# Get the latest patch release of minor version 1.4
docker pull elembio/bases2fastq:1.4
# Get a specific major, minor, and patch version
docker pull elembio/bases2fastq:1.4.0
Static Binary
Replace {version} in the URL https://bases2fastq-release.s3.amazonaws.com/bases2fastq-{version}.tar.gz to retrieve the desired version.
# Get the latest release of major version 1
wget https://bases2fastq-release.s3.amazonaws.com/bases2fastq-1.tar.gz
# Get the latest patch release of minor version 1.4
wget https://bases2fastq-release.s3.amazonaws.com/bases2fastq-1.4.tar.gz
# Get a specific major, minor, and patch version
wget https://bases2fastq-release.s3.amazonaws.com/bases2fastq-1.4.0.tar.gz
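The Docker tags and the download URLs follow the same version pattern, so a version string can be validated once and turned into either form. This is a sketch under assumptions: the helper names are hypothetical, and the pattern accepts only numeric major, major.minor, or major.minor.patch versions like those in the examples.

```shell
# Succeed when $1 looks like 1, 1.4, or 1.4.0.
is_b2f_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+){0,2}$'
}

# Print the Docker image reference for version $1.
b2f_image_ref() {
    is_b2f_version "$1" || return 1
    echo "elembio/bases2fastq:$1"
}

# Print the static binary download URL for version $1.
b2f_tarball_url() {
    is_b2f_version "$1" || return 1
    echo "https://bases2fastq-release.s3.amazonaws.com/bases2fastq-$1.tar.gz"
}
```

The output can be passed directly to the commands shown earlier, for example docker pull "$(b2f_image_ref 1.4)" or wget "$(b2f_tarball_url 1.4.0)". Rejecting malformed versions early avoids a failed pull or download caused by a typo.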