Software and System Setup
To set up Cells2Stats on a system, install either the static binary executable or the Docker container. In addition to setting up a Docker or static binary compute environment that meets the system requirements, you must configure the compute environment to transfer files.
System Requirements
Operating System
Cells2Stats is compatible with various operating systems (OS). Review the following compatibility matrix to determine the appropriate Cells2Stats distribution for your OS.
Operating System | Docker Compatibility | Static Binary Compatibility | OS Notes |
---|---|---|---|
Linux OS | Yes | Yes | The static binary executable is compatible with any Linux OS on an x86 architecture that uses glibc v2.19 or later. To verify the glibc version, run the following command: ldd --version |
Windows OS | Yes | No | Windows OS is not directly compatible with the static binary executable. Install Docker and run Cells2Stats using Docker. Make sure to review the system requirements. Element recommends enabling the WSL 2 backend feature. |
Windows OS with Windows Subsystem for Linux (WSL) | See Windows OS | Yes | If you install WSL on Windows OS, you can use the static binary executable in the WSL environment. |
macOS | Yes | No | macOS is not compatible with the static binary executable. Install Docker and run Cells2Stats using Docker. |
Cells2Stats is not supported on ARM processors.
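The static binary requirements above (x86 architecture, glibc v2.19 or later, no ARM) can be checked with a short script before downloading. This is a minimal sketch, not an Element-provided tool: it assumes a POSIX shell, uses getconf GNU_LIBC_VERSION as an alternative to the ldd --version command in the table, and the helper name glibc_at_least is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether this machine meets the static binary requirements
# (x86 architecture, glibc v2.19 or later). glibc_at_least is a hypothetical helper.

# glibc_at_least VERSION MINIMUM: succeed if VERSION (e.g. 2.35) >= MINIMUM (e.g. 2.19).
# Compares the major number first, then the minor number.
glibc_at_least() {
  awk -v v="$1" -v m="$2" 'BEGIN {
    split(v, a, "."); split(m, b, ".")
    if (a[1] + 0 != b[1] + 0) exit (a[1] + 0 > b[1] + 0 ? 0 : 1)
    exit (a[2] + 0 >= b[2] + 0 ? 0 : 1)
  }'
}

# ARM processors are not supported; only x86_64 passes this check.
case "$(uname -m)" in
  x86_64|amd64) echo "Architecture: x86_64 (supported)" ;;
  *) echo "Architecture $(uname -m) is not supported by the static binary" >&2 ;;
esac

# getconf GNU_LIBC_VERSION prints e.g. "glibc 2.35" on glibc-based Linux only;
# it fails on macOS and on non-glibc Linux distributions.
if libc=$(getconf GNU_LIBC_VERSION 2>/dev/null); then
  ver=${libc#glibc }
  if glibc_at_least "$ver" 2.19; then
    echo "glibc $ver meets the v2.19 minimum"
  else
    echo "glibc $ver is older than v2.19; use Docker instead" >&2
  fi
else
  echo "glibc not detected; use Docker instead" >&2
fi
```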
Memory and Performance
Software performance depends on the resources dedicated to the processing environment and on whether CellProfiler is enabled. For optimal performance, make sure at least 16 CPU cores are available and enable multithreading in Cells2Stats with the -j argument. The following memory requirements apply to both Docker and static binary distributions:
- If CellProfiler is enabled, Cells2Stats requires 2 GB RAM per concurrent thread.
- If CellProfiler is not enabled, Cells2Stats requires 200 MB RAM per concurrent thread.
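To choose a -j value, divide the available memory by the per-thread requirement and cap the result at the number of CPU cores. The following is a minimal sketch under stated assumptions: the helper name max_threads is hypothetical, 2 GB is treated as 2048 MB, and automatic detection relies on the Linux /proc/meminfo file and the nproc command. It reserves no memory for the OS or other processes, so consider leaving some headroom.

```shell
#!/bin/sh
# Sketch: suggest a -j value from the per-thread memory figures
# (2 GB per thread with CellProfiler, 200 MB without).
# max_threads is a hypothetical helper; 2 GB is treated as 2048 MB.

# max_threads CORES AVAILABLE_MB PER_THREAD_MB: print the largest thread count
# that fits in memory, capped at the number of CPU cores.
# Prints 0 if there is not enough memory for a single thread.
max_threads() {
  local by_mem
  by_mem=$(( $2 / $3 ))
  if [ "$by_mem" -lt "$1" ]; then echo "$by_mem"; else echo "$1"; fi
}

# On Linux, MemAvailable in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ] &&
   avail_mb=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo) &&
   [ -n "$avail_mb" ]; then
  cores=$(nproc)
  echo "CellProfiler enabled:  -j $(max_threads "$cores" "$avail_mb" 2048)"
  echo "CellProfiler disabled: -j $(max_threads "$cores" "$avail_mb" 200)"
fi
```

For example, a machine with 16 cores and 8192 MB of available memory gets a suggestion of 4 threads with CellProfiler enabled (8192 / 2048) and 16 without (8192 / 200 = 40, capped at 16 cores).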
The following benchmarks estimate the time Cells2Stats takes to regenerate output files from an AVITI24 System:
Setup | Time Estimate |
---|---|
A non-volatile memory express (NVMe) solid-state drive (SSD) using 8 threads | 2 hours |
An NVMe SSD using 20 threads | 1 hour |
An Amazon c5ad.8xlarge EC2 instance with 48 virtual CPUs and onboard SSD storage | < 50 minutes |
The following benchmarks estimate the time Cells2Stats takes to generate CytoCanvas visualization data from an AVITI24 System:
Setup | Time Estimate |
---|---|
A non-volatile memory express (NVMe) solid-state drive (SSD) using 8 threads | < 30 minutes |
An NVMe SSD using 20 threads | 30 minutes |
An Amazon c5ad.8xlarge EC2 instance with 48 virtual CPUs and onboard SSD storage | < 30 minutes |
Temporary Directory
When using cloud storage, Cells2Stats downloads input files and stages output files in a temporary directory. Intermediate files generated during analysis are also stored in the temporary directory. After an execution completes, the temporary directory is cleared. A typical cytoprofiling run uses approximately 50 GB in the temporary directory.
By default, Cells2Stats uses the temporary directory of the OS. To change the location of the temporary directory, set the TMPDIR environment variable. Use the following example command and replace /path/to/scratch with the desired directory:
export TMPDIR="/path/to/scratch"
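Because a typical cytoprofiling run needs about 50 GB of temporary space, it can help to check the free space in the chosen directory before starting. This is a minimal sketch assuming a POSIX shell with df; has_free_space is a hypothetical helper, and falling back to /tmp when TMPDIR is unset is an assumption about the OS default.

```shell
#!/bin/sh
# Sketch: confirm the temporary directory has room for a typical run
# (about 50 GB) before starting Cells2Stats. has_free_space is hypothetical.

# has_free_space DIR MIN_GB: succeed if DIR's filesystem has at least MIN_GB free.
has_free_space() {
  local avail_kb
  # df -P prints a header plus one POSIX-format line; column 4 is available 1K blocks.
  avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 {print $4}')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Assumption: /tmp is the OS default when TMPDIR is unset.
tmp_dir=${TMPDIR:-/tmp}
if has_free_space "$tmp_dir" 50; then
  echo "$tmp_dir has at least 50 GB free"
else
  echo "Warning: $tmp_dir has less than 50 GB free (or does not exist)" >&2
fi
```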
File Transfer and Storage Setup
To transfer files, Cells2Stats requires paths to input and output locations. You can store input and output files in a local location or the cloud. For cloud storage, the following providers are compatible:
- Amazon Web Services Simple Storage Service (AWS S3)
- Google Cloud Storage (GCS)
- Any rclone-compatible provider
AWS S3 and GCS storage connections require credential configuration for Cells2Stats execution. See Execute with AWS S3 and Execute with GCS for more information.
Rclone Requirements
Rclone is a command-line program to manage files on cloud storage. Rclone provides the ability to mount any local, cloud, or virtual file system. Rclone allows Cells2Stats to access many cloud storage providers. However, Element has not tested every available rclone provider.
Follow the instructions at rclone.org/install to download and install rclone. Configure an rclone remote to communicate with your cloud storage. For more information, see the provider-specific instructions at rclone.org/#providers.
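After configuring a remote, you can confirm that rclone can see it before pointing Cells2Stats at it. This is a minimal sketch: the remote name mys3 and the helper has_rclone_remote are hypothetical, and the script relies only on the standard rclone listremotes and rclone lsd commands.

```shell
#!/bin/sh
# Sketch: check that an rclone remote exists and is reachable.
# The remote name "mys3" and the helper has_rclone_remote are hypothetical.

# has_rclone_remote NAME: succeed if `rclone listremotes` lists "NAME:".
has_rclone_remote() {
  command -v rclone >/dev/null 2>&1 || return 1   # rclone is not installed
  rclone listremotes | grep -qxF "$1:"
}

remote=mys3
if has_rclone_remote "$remote"; then
  # List top-level buckets or directories to confirm the credentials work.
  rclone lsd "$remote:"
else
  echo "No rclone remote named '$remote'. Create one with: rclone config" >&2
fi
```

rclone config walks through remote creation interactively; replace mys3 with the name you chose there.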
Cells2Stats Installation
Set up Cells2Stats using Docker or the static binary.
The static binary executable for Cells2Stats is only compatible with specific OS configurations. Review the OS requirements before attempting installation.
Docker
Install Docker by following the OS-specific instructions at docs.docker.com/get-docker/.
Run the following command to pull the latest version of the Cells2Stats image from the Element public registry on Docker Hub:
docker pull elembio/cells2stats
- To confirm that Cells2Stats is operational, run the following commands to display the software version and help content:
docker run elembio/cells2stats cells2stats --version
docker run elembio/cells2stats cells2stats --help
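When running Cells2Stats in Docker, the container can only read and write paths that are mounted into it, and it does not inherit host environment variables such as TMPDIR. The following minimal sketch wraps the docker run commands above using standard docker flags (--rm, -v, -w, -e). The wrapper names c2s_docker and c2s_run, and the assumption that your data sits under the current directory, are illustrative and not part of the documented setup.

```shell
#!/bin/sh
# Sketch of a hypothetical wrapper that runs Cells2Stats in the
# elembio/cells2stats image. The current directory is mounted at the same
# path inside the container so local paths resolve identically. If TMPDIR is
# set on the host, it is mounted and passed through with -e.
# Set DRY_RUN=1 to print the docker command instead of running it.

c2s_run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

c2s_docker() {
  if [ -n "${TMPDIR:-}" ]; then
    c2s_run docker run --rm -v "$PWD:$PWD" -w "$PWD" \
      -v "$TMPDIR:$TMPDIR" -e "TMPDIR=$TMPDIR" \
      elembio/cells2stats cells2stats "$@"
  else
    c2s_run docker run --rm -v "$PWD:$PWD" -w "$PWD" \
      elembio/cells2stats cells2stats "$@"
  fi
}
```

For example, c2s_docker --version runs the same check as above, and DRY_RUN=1 c2s_docker --help prints the assembled docker command without running it. On Docker Desktop (Windows or macOS), the mounted directory must be one that Docker is allowed to share.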
Static Binary
Download the latest version of the static binary using one of the following methods:
- Visit the Element website and follow the onscreen prompts.
- Run the following curl command:
curl https://cells2stats-release.s3.amazonaws.com/cells2stats-latest.tar.gz -o cells2stats-latest.tar.gz
- Run the following tar command to extract the file:
tar -xvf cells2stats-latest.tar.gz
- To use the --visualization flag, which prepares your run output for data visualization in CytoCanvas, Cells2Stats requires Python 3.9 or later and the following package dependencies. Install them with the commands for your Linux distribution.
On apt-based distributions (for example, Ubuntu or Debian):
sudo apt install -y python3 python3-pip libgl1-mesa-glx
pip3 install zarr tifffile numpy opencv-python pandas ome-zarr scikit-image geopandas pyarrow
On yum-based distributions (for example, RHEL, CentOS, or Amazon Linux):
sudo yum install -y python3 python3-pip mesa-libGL
pip3 install zarr tifffile numpy opencv-python pandas ome-zarr scikit-image pyarrow geopandas
- To confirm that Cells2Stats is operational, run the following commands to display the software version and help content:
./cells2stats --version
./cells2stats --help
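The extraction and visualization-dependency steps above can be checked with a short script. This is a minimal sketch, not an Element tool: extract_cells2stats and missing_modules are hypothetical helper names, and the script assumes the archive unpacks a cells2stats file into the current directory, as the ./cells2stats commands imply. Several packages are imported under different names than they are installed with.

```shell
#!/bin/sh
# Sketch of post-install checks for the static binary distribution.
# Helper names extract_cells2stats and missing_modules are hypothetical.

# extract_cells2stats TARBALL: extract into the current directory and succeed
# only if ./cells2stats exists; restores the executable bit if it was lost.
extract_cells2stats() {
  tar -xf "$1" || return 1
  if [ ! -f ./cells2stats ]; then
    echo "cells2stats not found after extracting $1" >&2
    return 1
  fi
  [ -x ./cells2stats ] || chmod +x ./cells2stats
}

# missing_modules MOD...: print each Python module that python3 cannot import.
# Import names differ from pip names: opencv-python -> cv2,
# scikit-image -> skimage, ome-zarr -> ome_zarr.
missing_modules() {
  local mod
  for mod in "$@"; do
    python3 -c "import $mod" >/dev/null 2>&1 || echo "$mod"
  done
}

# Prerequisites for --visualization: Python 3.9 or later plus the packages above.
if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)' 2>/dev/null; then
  missing=$(missing_modules zarr tifffile numpy cv2 pandas ome_zarr skimage geopandas pyarrow)
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing visualization packages:" $missing >&2
  else
    echo "Visualization dependencies OK"
  fi
else
  echo "Python 3.9 or later is required for --visualization" >&2
fi
```

For example, after the curl step: extract_cells2stats cells2stats-latest.tar.gz && ./cells2stats --version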
If you need to rerun with CellProfiler, you must install CellProfiler on your machine. The installation depends on your system; the following commands show an example installation for Ubuntu (the wxPython wheel URL targets Ubuntu 22.04). Run the apt commands as root or prefix them with sudo. Note that CellProfiler requires Python 3.8.
export DEBIAN_FRONTEND=noninteractive
apt -y update && apt -y upgrade
apt install -y software-properties-common
add-apt-repository -y ppa:deadsnakes/ppa
apt install -y python3.8
apt install -y python3-virtualenv
apt install -y make gcc build-essential libgtk-3-dev
apt-get install -y python3-pip openjdk-11-jdk-headless default-libmysqlclient-dev libnotify-dev libsdl2-dev
apt-get install -y freeglut3 freeglut3-dev libgl1-mesa-dev libglu1-mesa-dev libgstreamer-plugins-base1.0-dev libgtk-3-dev libjpeg-dev libsm-dev libtiff-dev libwebkit2gtk-4.0-dev libxtst-dev
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:/home/ubuntu/.local/bin
apt install -y python3.8-distutils
apt-get install -y python3.8-dev
virtualenv cellprofiler_env --python=python3.8
source cellprofiler_env/bin/activate
pip install --upgrade pip
pip install wheel cython==0.29.37 numpy scipy
pip install https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04/wxPython-4.2.1-cp38-cp38-linux_x86_64.whl
pip install --no-build-isolation scikit-learn==0.24.2
pip install cellprofiler
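To confirm the CellProfiler environment created above, check that its interpreter is Python 3.8 and that pip reports the cellprofiler package. This is a minimal sketch assuming the environment directory is cellprofiler_env in the current directory (override it with ENV_DIR); python_is_38 is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm the CellProfiler virtual environment is usable.
# Assumption: the environment lives at ./cellprofiler_env unless ENV_DIR is set.
ENV_DIR=${ENV_DIR:-cellprofiler_env}

# python_is_38 PYTHON: succeed if PYTHON reports version 3.8.x.
python_is_38() {
  "$1" -c 'import sys; sys.exit(0 if sys.version_info[:2] == (3, 8) else 1)' 2>/dev/null
}

if [ -x "$ENV_DIR/bin/python" ] && python_is_38 "$ENV_DIR/bin/python"; then
  # pip show prints package metadata (Name, Version, ...) or fails if not installed.
  "$ENV_DIR/bin/python" -m pip show cellprofiler
else
  echo "No Python 3.8 interpreter found in $ENV_DIR" >&2
fi
```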