Executing Cells2Stats

Execute Cells2Stats in a local compute environment by entering a command in the terminal. The command varies based on the cloud storage setup and the execution type, either Docker or static binary. Append optional arguments to the command to specify file generation settings.

The command has the following three parts:

  • The cells2stats executable
  • The path to the input and output files
  • Optional arguments to manipulate the output

Execute the Software

  1. Copy the template execution command for your storage type from one of the following subsections.
  2. Adjust the optional arguments in the run command.
    • To add arguments, replace {options} with any of the arguments listed in Optional Arguments.
    • To execute Cells2Stats without any arguments, delete {options}.
  3. Execute the command in the terminal.
  4. Confirm that the execution completed successfully. The terminal displays the following log lines.
<timestamp> [info]: Analysis completed successfully.
<timestamp> [info]: Output written to </output>
  5. Access the output files in the run directory. By default, they are written to the Cytoprofiling/<timestamp> folder.
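
The completion check in step 4 can be scripted. The following is a minimal sketch, assuming the terminal output was saved to a file, for example with tee. The file name cells2stats.log and the function name run_succeeded are hypothetical. The function looks for the two log messages shown above.

```shell
# Return 0 if a saved log contains both completion messages, 1 otherwise.
run_succeeded() {
  log_file=$1   # hypothetical file holding the saved terminal output
  grep -q 'Analysis completed successfully\.' "$log_file" &&
    grep -q 'Output written to' "$log_file"
}

# Example usage (assumes the execution command's output is captured with tee):
#   <execution command> 2>&1 | tee cells2stats.log
#   run_succeeded cells2stats.log && echo "run completed"
```

Matching on the message text rather than the timestamp keeps the check independent of the timestamp format.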

Execute with Local Storage

Copy the template execution command for Docker or static binary. Update the template command and execute Cells2Stats as described in steps 2–5 of the execution instructions.

When using Docker, the general execution command automatically pulls the image to the local environment if it is not already there. The command uses the following structure:

  • docker run invokes the Docker daemon to start a container from the specified image.
  • --rm removes the container after the execution completes. Element recommends using this argument to keep your system clean.
  • elembio/cells2stats identifies the image that you want to pull. By default, Docker pulls the latest image. For a specific version, include a tag, such as elembio/cells2stats:1.0.0.
  • -j sets the number of threads that Cells2Stats uses for faster parallel execution. The value depends on your system setup. The example command has a value of 8, which requires at least 8 CPUs.
  • {options} defines the optional arguments.
  • {directory} defines the path to the directory for input and output files.
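
Because the -j value depends on the CPUs available, it can help to derive it from the system rather than hard-coding 8. The following is a minimal sketch, not part of the Cells2Stats tooling. The function name pick_threads is hypothetical, and capping the value at the CPU count is an assumption based on the statement that a value of 8 requires at least 8 CPUs.

```shell
# Cap a requested -j value at the number of CPUs available.
pick_threads() {
  requested=$1   # desired thread count
  available=$2   # CPUs available on the system
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# getconf _NPROCESSORS_ONLN reports the online CPU count on Linux and macOS.
# Fall back to 1 if the query is not supported.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "-j $(pick_threads 8 "$cpus")"
```

The printed value can replace -j 8 in the template command.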

Executing in local environments might require mounting the directory into the Docker container:

  • -v mounts the local path into the container as a volume.
  • :/input sets the path inside the container where the mounted directory appears, for use by Cells2Stats.
docker run --rm elembio/cells2stats cells2stats {directory} -j 8 {options}
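
The template command above does not show the -v mount in place. The following is a minimal sketch of how the mount could be combined with the template. It assumes the local run directory is mounted at /input and that /input is then passed to cells2stats as the directory. The function name build_local_command and the path /data/run01 are hypothetical. The function prints the command for review instead of running it.

```shell
# Print a Docker command that mounts a local run directory into the container.
# The container path /input follows the ":/input" binding described above.
# Passing /input to cells2stats is an assumption, not a documented template.
build_local_command() {
  run_dir=$1   # hypothetical local run directory (no spaces; quote it if needed)
  threads=$2   # value for -j
  printf 'docker run --rm -v %s:/input elembio/cells2stats cells2stats /input -j %s\n' \
    "$run_dir" "$threads"
}

build_local_command /data/run01 8
```

Append any optional arguments to the printed command before executing it.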

Execute with AWS S3

AWS S3 requires Uniform Resource Identifiers (URIs) to serve as paths to the Amazon S3 buckets that contain the input and output files. Accessing the buckets requires AWS credentials configured for the setup. If both the input and output locations are in AWS S3, they must use the same credentials.

  • For AWS S3 within an EC2 instance, Cells2Stats detects the credentials or Identity and Access Management (IAM) role associated with the instance.
  • For AWS S3 access without an EC2 instance, define AWS credentials with the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.
Note:

If Cells2Stats fails to detect the credentials for an AWS storage location from within an EC2 instance, the region environment variable might be incorrect. Make sure this variable is set correctly in the EC2 instance: export AWS_DEFAULT_REGION=$aws_region.

  1. Run the following export commands in the terminal to set the credentials as environment variables.
export AWS_ACCESS_KEY_ID=ExampleAccessKey
export AWS_SECRET_ACCESS_KEY=EXAMPLEnhMUUOhQF/T51c6A5+DtQas9ghebs
export AWS_DEFAULT_REGION=us-west-2
  2. Copy the following template execution command for Docker or static binary.

For Docker, the execution command includes -e to pull in the credentials set as environment variables.

docker run --rm -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION elembio/cells2stats cells2stats s3://bucket/input -o s3://bucket/output -j 8 {options}
  3. Update the template command and execute Cells2Stats as described in steps 2–5 of the execution instructions.
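
Because the Docker command passes the three AWS variables through with -e, an unset variable reaches the container as an empty value. The following is a minimal sketch of a check to run before the execution command. The function name check_aws_env is hypothetical and is not part of Cells2Stats.

```shell
# Report any of the three AWS variables that are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_aws_env() {
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_aws_env; then
  echo "AWS environment variables are set"
fi
```

The check only confirms that the variables are non-empty. It does not validate the credentials against AWS.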

Execute with GCS

Google Cloud Storage (GCS) requires URLs to serve as preconfigured paths to the Cloud Storage buckets that contain the input and output files. Accessing the buckets requires configuring GCS credentials for the setup. If both the input and output locations are in GCS, they must use the same credentials.

The environment variable GOOGLE_APPLICATION_CREDENTIALS must point to the file that contains the application credentials. For more information, see the Google documentation.

  1. Run the following export command in the terminal to set the credential as an environment variable.
export CREDENTIALS="/path/to/GCP-creds.json"
  2. Copy the following template execution command for Docker or static binary.

For Docker, the execution command includes -e to pull in the credential set as an environment variable.

docker run --rm -e GOOGLE_APPLICATION_CREDENTIALS=$CREDENTIALS elembio/cells2stats cells2stats gs://bucket/input -o gs://bucket/output -j 8 {options}
  3. Update the template command and execute Cells2Stats as described in steps 2–5 of the execution instructions.
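
A wrong path in CREDENTIALS is easy to miss because the export command succeeds either way. The following is a minimal sketch of a check to run before the execution command. The function name check_gcs_credentials is hypothetical and is not part of Cells2Stats.

```shell
# Return 0 if the credentials file exists and is readable, 1 otherwise.
check_gcs_credentials() {
  creds_path=$1   # path stored in the CREDENTIALS variable
  if [ ! -r "$creds_path" ]; then
    echo "credentials file not readable: $creds_path" >&2
    return 1
  fi
}

if check_gcs_credentials "${CREDENTIALS:-}"; then
  echo "GCS credentials file found"
fi
```

The check confirms only that the file can be read from the local shell. It does not verify the contents of the file.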

Execute with Rclone-Compatible Cloud

Before executing Cells2Stats with rclone, review the rclone requirements for system setup.

If you are using rclone, Element recommends executing with the static binary. For execution, use the --input-remote and --output-remote options as shown in the following code example. Then update the template command and execute Cells2Stats as described in steps 2–5 of the execution instructions.

./cells2stats path/to/input/remote path/to/output/remote --input-remote "input-name" --output-remote "output-name" -j 8 -s {options}