
Cytoprofiling Run Output Files

The following is an example run output directory of a successful cytoprofiling run:

example-storage-location
└── DemoInstrumentName
    └── 20240506_DemoInstrumentName_ExampleRunName
        ├── BaseCalling
        │   ├── BaseCalls
        │   │   └── BXX_CXXX.zip ... (for n cycles)
        │   ├── CellXform
        │   │   └── BXX ... (for n batches)
        │   │       └── LXRXXCXXS1.xform ... (for n tiles)
        │   ├── Location
        │   │   └── BXX ... (for n batches)
        │   │       └── LXRXXCXXS1ZXX.loc ... (for n tiles)
        │   └── Filter
        │       └── BXX ... (for n batches)
        │           └── LXRXXCXXS1ZXX.filter ... (for n tiles)
        ├── CellSegmentation
        │   └── WellXX ... (for n wells)
        │       ├── LXRXXCXXS1_Cell.tif ... (for n tiles)
        │       └── LXRXXCXXS1_Nuclear.tif ... (for n tiles)
        ├── Thumbnails
        │   └── WellXX_Thumbnail.png ... (for n wells)
        ├── Projection
        │   └── WellXX ... (for n wells)
        │       └── BXX_LXRXXCXXS1_Target.tif ... (for each target and n tiles)
        ├── Cytoprofiling
        │   └── Instrument
        │       ├── AverageNormWellStats.csv
        │       ├── RawCellStats.csv
        │       ├── RawCellStats.parquet
        │       ├── RunStats.json
        │       ├── Versions.json
        │       └── Wells
        │           └── WellXX ... (for n wells)
        │               └── BXX ... (for n batches)
        │                   └── LXRXXCXXS1_barcodes.parquet ... (for n tiles)
        ├── RunManifest.csv
        ├── RunManifest.json
        ├── RunParameters.json
        ├── Panel.json
        ├── RunStats.bin
        ├── RunAnalysisFilesUploaded.json
        └── RunUploaded.json

Run Output Files

The following table defines the key cytoprofiling run output files from an AVITI24 System. Parquet is a column-oriented file format designed for efficient data storage and retrieval. For more information, see the Apache Parquet Documentation.

Directory and File Name | File Format | Description | Quantity
--- | --- | --- | ---
{root}/BaseCalling/BaseCalls/{batch}_C{cycle}.zip | Binary | Reports the raw cytoprofiling base call data | One per batch per cycle
{root}/BaseCalling/Filter/{batch}/{tile}.filter | Binary | Reserved for future applications | One per tile per batch
{root}/BaseCalling/CellXform/{batch}/{tile}.xform | Binary | Transforms polony locations onto cells | One per tile per batch
{root}/BaseCalling/Location/{batch}/{tile}.loc | Binary | Identifies polony locations on the flow cell | One per tile per batch
{root}/BaseCalling/RunStats.bin | Log | Internal troubleshooting log for offline runs | One per run
{root}/CellSegmentation/{well}/{tile}_Cell.tif | TIF | Cell segmentation mask for a tile in a well, in which each pixel inside a cell holds that cell's ID | One per tile per well
{root}/CellSegmentation/{well}/{tile}_Nuclear.tif | TIF | Nuclear segmentation mask for a tile in a well, in which each pixel inside a nucleus has the value 1 | One per tile per well
{root}/Cytoprofiling/Instrument/AverageNormWellStats.csv | CSV | Reports filtered and averaged metrics for each well in the run | One per run
{root}/Cytoprofiling/Instrument/Versions.json | JSON | Reports the version numbers of the CSV output files and the bundled software programs | One per run
{root}/Cytoprofiling/Instrument/RawCellStats.csv | CSV | Reports per-cell values for all morphology features and raw target counts in the run | One per run
{root}/Cytoprofiling/Instrument/RawCellStats.parquet | Parquet | Reports per-cell values for all morphology features and raw target counts in the run | One per run
{root}/Cytoprofiling/Instrument/RunStats.json | JSON | Reports run metrics | One per run
{root}/Cytoprofiling/Instrument/Wells/{well}/{batch}/{tile}_barcodes.parquet | Parquet | Barcoding information for each polony in a tile | One per tile per batch per well
{root}/Panel.json | JSON | Records target detection information for the run | One per run
{root}/Projection/{well}/{batch}_{tile}_{target}.tif | TIF | Z-projected images of cell paint targets | One per target per tile
{root}/QC_S12_Avid/B02/C001/L1RXXCXXSXZ00_GRN_F4.tif | TIF | Internal support and run diagnostics files | One per tile per run
{root}/RunManifest.csv | CSV | Manifest that records biological sample information and well mapping | One per run
{root}/RunManifest.json | JSON | Version of the run manifest that is reserved for Element processes | One per run
{root}/RunParameters.json | JSON | Records information about the run configuration | One per run
{root}/RunAnalysisFilesUploaded.json | JSON | Written after the last analysis file is transferred, signaling that post-run analysis can begin | One per run
{root}/RunUploaded.json | JSON | The last file transferred; marks run completion | One per run
{root}/Thumbnails/{well}_Thumbnail.png | PNG | Thumbnail image for a well | One per well
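Because the barcode files follow a fixed path pattern, a script can enumerate them and recover the well, batch, and tile from each path. The following is a minimal sketch that uses only the Python standard library; `BarcodeFile` and `iter_barcode_files` are hypothetical names, and reading the Parquet contents would additionally require a library such as pyarrow or pandas.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class BarcodeFile(NamedTuple):
    """Hypothetical record describing one {tile}_barcodes.parquet file."""
    well: str
    batch: str
    tile: str
    path: Path


def iter_barcode_files(root: Path) -> Iterator[BarcodeFile]:
    """Yield one record per barcode file under a run folder.

    Follows the documented layout:
    {root}/Cytoprofiling/Instrument/Wells/{well}/{batch}/{tile}_barcodes.parquet
    """
    wells_dir = root / "Cytoprofiling" / "Instrument" / "Wells"
    suffix = "_barcodes.parquet"
    # Two directory levels (well, batch) above each barcode file.
    for path in sorted(wells_dir.glob(f"*/*/*{suffix}")):
        tile = path.name[: -len(suffix)]  # strip the suffix to get the tile name
        batch = path.parent.name
        well = path.parent.parent.name
        yield BarcodeFile(well, batch, tile, path)
```

Pass the run folder (for example, the 20240506_DemoInstrumentName_ExampleRunName directory) as `root`; files that do not end in `_barcodes.parquet` are ignored.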

Metrics

The run output files contain a variety of metrics, such as tile-specific and average metrics.

  • RunStats.json reports statistics and metrics for a run.
  • RawCellStats.csv and RawCellStats.parquet contain a full set of morphology and quantification metrics for each target and batch.
  • AverageNormWellStats.csv provides metric averages for each well. Metrics that end with .std provide the standard deviation for the metric.
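Since standard-deviation columns in AverageNormWellStats.csv end with .std, each average can be paired with its spread from the header alone. A minimal sketch, assuming the standard-deviation column is named exactly `<metric>.std` (the function names are hypothetical):

```python
import csv
from typing import Dict, List, Optional


def pair_std_columns(header: List[str]) -> Dict[str, Optional[str]]:
    """Map each metric column to its '<metric>.std' column, or None if absent."""
    columns = set(header)
    pairs: Dict[str, Optional[str]] = {}
    for name in header:
        if name.endswith(".std"):
            continue  # standard-deviation columns are reached through their metric
        std_name = f"{name}.std"
        pairs[name] = std_name if std_name in columns else None
    return pairs


def read_header(csv_path: str) -> List[str]:
    """Return the first row (column names) of a CSV file."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle))
```

For example, `pair_std_columns(read_header("AverageNormWellStats.csv"))` lists which metrics have an accompanying standard deviation.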

RunStats File

The following table defines the fields that are listed in the RunStats.json file:

Field | Description
--- | ---
AnalysisID | The identifier for the analysis that is generated by Cells2Parquet and assigned by the analysis software
AnalysisVersion | The Cells2Stats or Molecule software version
AssignedCountPerMM2 | The target count per mm² of cell area, reported for each control target in the panel
AssignedCountsPerMM2 | The target counts per mm² of cell area for all barcoding targets
AverageAssignedCountsPerMM2 | The target counts per mm² of cell area for all barcoding targets, averaged across wells
Batches | Identifies a set of information for each batch
BatchName | The batch name (string)
CellCount | The number of cells that are identified by cell segmentation (integer)
ControlType | The type of control target (for example, Negative Control 1)
Count | The number of polonies with a specific basecalled sequence that did not match the expected sequence of any target within the allowable mismatch threshold (integer)
DemuxStats | The statistics related to the demultiplexing process
ExpectedSequence | The expected barcode sequence for a target (string)
FileVersion | The file format version, such as 1.2.0 (string)
FlowCellID | The flow cell identifier (string)
MeanAssignedCountPerCell | The mean number of polonies per cell that are assigned to a target sequence, across all cells (float)
MedianCellDiameter | The median diameter of cells after cell segmentation (float)
NumPolonies | The number of polonies (integer)
PercentAssignedReads | The percentage of reads that are assigned to a target (float)
PercentConfluency | The percentage of the flow cell culture area that is covered by cells (float)
PercentMismatch | The percentage of assigned reads that have 1 or 2 base mismatches in their base calls relative to the reference (float)
PercentNucleatedCells | The percentage of cells that have an identified nucleus
RunID | The run ID (string)
RunName | The run name (string)
Targets | Identifies a set of information for each target
TargetName | The target name (string)
Sequence | The basecalled sequence of a polony that remained unassigned because it did not match the expected sequence of any target in the panel (string)
UnassignedSequences | Identifies a set of information for sequences that were not assigned
WellLocation | The location ID of a well. Valid values are A1, A2, B1, B2, C1, C2, D1, D2, E1, E2, F1, and F2
Wells | Identifies a set of information for each well that was used
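The table lists field names but not how the objects are nested inside RunStats.json. A minimal sketch that avoids assuming a nesting is to search the parsed JSON recursively for a field name; `find_values` and `load_run_stats` are hypothetical helpers, not part of any Element software.

```python
import json
from typing import Any, List


def find_values(node: Any, key: str) -> List[Any]:
    """Return every value stored under `key`, at any depth of parsed JSON."""
    found: List[Any] = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_values(value, key))  # keep searching nested objects
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


def load_run_stats(path: str) -> Any:
    """Parse a RunStats.json file."""
    with open(path) as handle:
        return json.load(handle)
```

For example, `find_values(load_run_stats("RunStats.json"), "PercentConfluency")` returns every PercentConfluency value in the file, in document order. If the same field appears at several levels (for example, per well and per run), inspect the surrounding objects to tell them apart.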

RawCellStats Files

The following table defines the fixed fields that are listed in the RawCellStats.csv and RawCellStats.parquet files and are the same for every run:

Field | Description
--- | ---
Area | The area of the cell in pixels
AreaUm | The area of the cell in square microns (μm²), converted from pixels
Cell | The unique ID of the cell within the run. The cell ID is computed as (tile_index × $2^{16}$) + the tile-specific local cell ID. Because each tile has its own range of $2^{16}$ IDs, cell IDs can be computed independently per tile, and the tile-specific cell ID can be recovered from the run-level cell ID
NuclearArea | The area of the nucleus in pixels
NuclearAreaUm | The area of the nucleus in square microns (μm²), converted from pixels
Tile | The imaging tile (subregion) that the cell was imaged from
Well | The position of the well in the plate, derived from the row and column number (for example, A1)
WellLabel | The well label that the user enters in the run manifest. If the run manifest provides no label, this column duplicates the Well column
X | The X-coordinate of the cell in pixels, measured from the tile origin
Xum | The X-coordinate of the cell in microns (μm), converted from pixels and measured from the tile origin
Y | The Y-coordinate of the cell in pixels, measured from the tile origin
Yum | The Y-coordinate of the cell in microns (μm), converted from pixels and measured from the tile origin
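The Cell ID formula reserves $2^{16}$ IDs per tile, so the run-level ID can be split back into its parts with integer division and remainder. A minimal sketch of both directions; the function names are hypothetical, and how tile_index is derived from a tile name is not described here, so it is taken as a given integer.

```python
from typing import Tuple

TILE_ID_SPAN = 2 ** 16  # number of cell IDs reserved for each tile


def encode_cell_id(tile_index: int, local_cell_id: int) -> int:
    """Run-level Cell ID = tile_index * 2**16 + tile-specific local cell ID."""
    if not 0 <= local_cell_id < TILE_ID_SPAN:
        raise ValueError("local cell ID must be in the range [0, 2**16)")
    return tile_index * TILE_ID_SPAN + local_cell_id


def decode_cell_id(cell_id: int) -> Tuple[int, int]:
    """Recover (tile_index, local_cell_id) from a run-level Cell ID."""
    return divmod(cell_id, TILE_ID_SPAN)
```

For example, local cell 42 on tile index 3 has the run-level ID 3 × 65536 + 42 = 196650, and `decode_cell_id(196650)` returns `(3, 42)`.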

The following table defines the variable fields that are listed in the RawCellStats.csv and RawCellStats.parquet files and are based on the barcoding and cell paint targets within a run:

Field | Description
--- | ---
Cellular Counts for Barcoding Targets | The total cellular count of each barcoding target, including nuclear counts. Column names use the format {target}.{batch} (for example, AKT1.B01).
Cellular Intensity for Cell Paint Targets | The background-subtracted sum of cellular intensity for each cell paint target (for example, Mitochondria.CP02).
Morphology Metrics for Cell Paint Targets | The morphology metric output for each cell paint target. Column names use the format {metric}_{target}.{batch} (for example, Intensity_MeanIntensityEdge_Mitochondria.CP02).
Nuclear Counts for Barcoding Targets | The nuclear count of each barcoding target. Column names use the format {target}_Nuclear.{batch} (for example, AKT1_Nuclear.B01).
Nuclear Intensity for Cell Paint Targets | The background-subtracted sum of nuclear intensity for each cell paint target (for example, Mitochondria_Nuclear.CP02).

Morphology metrics come from the following CellProfiler modules:
  • MeasureObjectSizeShape
  • MeasureGranularity
  • MeasureObjectIntensity
  • MeasureObjectIntensityDistribution
  • MeasureTexture

Some CellProfiler metrics are not reported in the RawCellStats.csv and RawCellStats.parquet files. For example, Zernike metrics are omitted. Z-axis metrics are not relevant to the analysis output and are not reported; in some files, their columns are present but contain only values of 0.

For more information on morphology metrics, see the measurement information in the CellProfiler Manual.
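The variable column names encode the metric, target, batch, and nuclear compartment, so they can be split programmatically when selecting columns. A minimal sketch, assuming target and batch names contain no underscore or period (metric names may contain underscores); `TargetColumn` and `parse_target_column` are hypothetical names.

```python
from typing import NamedTuple, Optional


class TargetColumn(NamedTuple):
    metric: Optional[str]  # None for count and intensity columns
    target: str
    batch: str
    nuclear: bool


def parse_target_column(name: str) -> Optional[TargetColumn]:
    """Split a variable RawCellStats column name into its parts.

    Handles {target}.{batch}, {target}_Nuclear.{batch}, and
    {metric}_{target}.{batch}. Returns None for fixed columns such as
    'Area', which have no '.{batch}' suffix.
    """
    stem, dot, batch = name.rpartition(".")
    if not dot or not stem or not batch:
        return None
    nuclear = stem.endswith("_Nuclear")
    if nuclear:
        stem = stem[: -len("_Nuclear")]
    # Assumes the target is the text after the last underscore.
    metric, underscore, target = stem.rpartition("_")
    return TargetColumn(metric if underscore else None, target, batch, nuclear)
```

For example, `parse_target_column("Intensity_MeanIntensityEdge_Mitochondria.CP02")` yields metric `Intensity_MeanIntensityEdge`, target `Mitochondria`, and batch `CP02`. A target name that itself contains an underscore would be split incorrectly, so check your panel's target names before relying on this rule.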

Barcodes Parquet Files

The {tile}_barcodes.parquet files provide the raw barcode data for each tile in a batch. Each file records where barcoded targets are located and which cells they are associated with.

The following table defines the columns in the parquet files.

Field | Description | Data Type
--- | --- | ---
BarcodeIndex | A barcode ID number that corresponds to the order of targets for a batch as listed in the Panel.json file. A value of 0 indicates an unassigned barcode. | Int16
Cell | A tile-specific ID for the cell that the barcode is associated with. This ID differs from the run-level Cell ID reported in other Parquet files, such as RawCellStats.parquet. | Int16
IsNuclear | Indicates whether the barcode is in the nucleus of the cell | Boolean
X | The position of the barcode on the X-axis of the tile | UInt16
X μm | The X position in microns | Float
Y | The position of the barcode on the Y-axis of the tile | UInt16
Y μm | The Y position in microns | Float
Z | The position of the barcode on the Z-axis of the tile | String
Z μm | The Z position in microns | Float
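Because BarcodeIndex refers to the order of targets in Panel.json and IsNuclear flags nuclear barcodes, per-cell total and nuclear counts can be tallied from the barcode rows. A minimal sketch, assuming index 1 maps to the first target of the batch (since 0 means unassigned) and that rows are plain dictionaries, for example from `pyarrow.parquet.read_table(path).to_pylist()`; `count_barcodes` is a hypothetical name, and how barcodes outside any cell are encoded in the Cell column is not specified here, so they are not filtered.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def count_barcodes(
    rows: Iterable[Mapping[str, Any]],
    batch_targets: List[str],
) -> Dict[Tuple[int, str], Dict[str, int]]:
    """Tally total and nuclear barcode counts per (Cell, target) for one tile.

    `batch_targets` lists the batch's targets in Panel.json order; index 1
    is assumed to map to batch_targets[0] because 0 means "unassigned".
    """
    totals: Counter = Counter()
    nuclear: Counter = Counter()
    for row in rows:
        index = int(row["BarcodeIndex"])
        if index == 0:
            continue  # unassigned barcode
        key = (int(row["Cell"]), batch_targets[index - 1])
        totals[key] += 1  # total counts include nuclear barcodes
        if row["IsNuclear"]:
            nuclear[key] += 1
    return {key: {"total": totals[key], "nuclear": nuclear[key]} for key in totals}
```

The keys use the tile-specific Cell ID from the barcode file, not the run-level Cell ID in RawCellStats files.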

Panel File

The Panel.json file contains target information for each batch in the run. The file is organized into sections, such as ImagingPrimerTubes, BarcodingPrimerTubes, ImagingTargets, and BarcodingTargets, that define batches and their targets.

The following tables describe the Panel.json file information:

Field | Description
--- | ---
AnalysisSettings | Identifies an array of settings for the run analysis
Application | The application type for the run, such as Counting
CellBoundaryTarget | Identifies the target that is used to generate cell segmentation masks
FileVersion | The Panel.json file version for the run
ID | A unique ID that is assigned to the panel
KitType | Specifies the type of reagent or assay kit that is used for the run
Name | The name of the cytoprofiling kit that is associated with the panel
NuclearTarget | Identifies the target that is used to generate nuclear segmentation masks
PanelCartridgePartNumbers | Identifies a set of information that lists the cartridge part numbers for the panel
SpikeInId | The ID of your custom add-on protein panel JSON
SpikeInName | The name of your custom add-on protein panel JSON
SupplementaryCellBoundaryTarget | Identifies an optional additional target that is used to generate cell segmentation masks

Primer Tube Sections

The following table defines information in the ImagingPrimerTubes and BarcodingPrimerTubes sections of the Panel.json file. These sections provide information about the settings for different batches.

Field | Description
--- | ---
BarcodeMask | The mask that is used to support barcoding for target analysis
BarcodingPrimerTubes | Identifies a set of information about the settings for batches with targets for analysis
BatchName | The name of a specific batch
DefaultMismatch | The number of base mismatches that are permitted when assigning a barcode to a target, which is typically 2
ImagingPrimerTubes | Identifies a set of information about the settings for batches related to cell paint
MinCycles | The minimum number of cycles for a specific batch
PMGMask | A base mask that is used to generate the map of polonies
RunOrder | Identifies the ordinal position of a batch in the run
Type | The type of batch relative to amplification, either PreAmp or PostAmp
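To illustrate what a DefaultMismatch of 2 means, the following sketch assigns a basecalled sequence to the target whose barcode differs from it at the fewest positions, accepting the match only if it has at most `max_mismatch` differences and no other target ties. This Hamming-distance rule with tie rejection is an assumption for illustration, not Element's documented assignment algorithm, and it ignores BarcodeMask; `hamming` and `assign_barcode` are hypothetical names.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(
    read: str, barcodes: Dict[str, str], max_mismatch: int = 2
) -> Optional[str]:
    """Return the closest target within `max_mismatch`, or None.

    None is returned when no barcode is close enough or when two targets
    tie for the best distance (the read is then treated as unassigned).
    """
    best_target: Optional[str] = None
    best_distance = max_mismatch + 1  # anything farther is rejected
    tied = False
    for target, barcode in barcodes.items():
        distance = hamming(read, barcode)
        if distance < best_distance:
            best_target, best_distance, tied = target, distance, False
        elif distance == best_distance:
            tied = True
    return None if tied else best_target
```

Under this rule, reads that return None correspond to the unassigned sequences that RunStats.json summarizes.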

Target Sections

The following table defines information in the ImagingTargets and BarcodingTargets sections of the Panel.json file. The sections provide information about the targets in each batch.

Field | Description
--- | ---
Analyses | Identifies an array with the types of analysis for a batch
Barcode | The barcode bases for a specific target in a batch
BarcodingTargets | Identifies the set of information for targets in each batch for analysis
Base | The base for a particular cycle in a batch
BatchName | The name of a specific batch
ControlType | Identifies the type of control for a target in a batch, if applicable
Cycle | The cycle number associated with a particular base in a batch
CycleBases | Identifies a set of information that defines the base in a specific batch cycle
ImagingTargets | Identifies the set of information for targets in each batch that is related to amplification
ProbeConcentration | The concentration of the probe for a specific target in a batch
Target | The name of a target in a batch
TargetType | The type of target in a batch: CellPaint, Protein, or Transcript
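Because barcode files refer to targets by their order within a batch, it can help to build an ordered target list per batch from Panel.json. A minimal sketch, assuming `BarcodingTargets` is a top-level list of objects that each carry `BatchName` and `Target` fields; the actual nesting may differ, so adjust the lookup to match your file. `targets_by_batch` and `load_panel` are hypothetical names.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def targets_by_batch(panel: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group BarcodingTargets entries by BatchName, preserving file order.

    Assumes panel["BarcodingTargets"] is a list of objects with
    "BatchName" and "Target" fields.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in panel.get("BarcodingTargets", []):
        grouped[entry["BatchName"]].append(entry["Target"])
    return dict(grouped)


def load_panel(path: str) -> Dict[str, Any]:
    """Parse a Panel.json file."""
    with open(path) as handle:
        return json.load(handle)
```

Each resulting list keeps the order in which targets appear in the file, which is the order that BarcodeIndex refers to.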

Run Parameters

The RunParameters.json file contains a record of the input information for a run. When you evaluate run performance or troubleshoot, review this file to confirm that the correct parameters were used for the run.

The following table describes the information in the RunParameters.json file:

Field | Description
--- | ---
AdvancedSettings | Includes information about advanced run settings, such as custom recipes
AnalysisLanes | The lanes that the run uses, such as 1, 2, or 1+2
ApplicationName | The type of application for the run, such as Counting
BarcodeStr | The barcode number for a consumable, which the instrument scans and identifies during consumable loading
BaseForChannels | Identifies the set of bases for the channels
Batches | Identifies the set of batches for the run
Buffer | Identifies a set of information that describes the buffer for the run
Channels | Identifies the colors to associate with channels
ColorForChannels | Identifies the channels to associate with bases and colors
Consumables | Identifies the set of information for run consumables
CustomRecipeName | The name of the custom recipe file that is uploaded in Advanced Run Settings
Cycles | Identifies the number of cycles in a batch
CycleFormat | Defines the format in which cycles appear
Date | The timestamp for the run
Expiration | The expiration timestamp for a consumable, which the instrument identifies or a user enters during consumable loading
ExpirationStr | An expiration ID number for a consumable that is associated with the expiration date
FileVersion | The version of the RunParameters.json file for the run
Flowcell | Identifies a set of information that describes the flow cell for the run
ImageHeight | The height of the image in pixels
ImageInfo | Identifies the set of information that defines characteristics of the cytoprofiling image
ImageWidth | The width of the image in pixels
Lanes | Defines an array that lists the lanes of the flow cell for the run
InstrumentName | The name of the instrument for the run
LotNumber | The lot number for a consumable, which the instrument identifies during consumable loading before the run
Name | The name of a tile in a set of Tiles
OperatorName | The name of the person who set up the run on the instrument
PanelCartridge | Identifies a set of information that describes the sequencing cartridge for the run
PanelName | The name of the panel that is used for the run
PartNumber | The part number for a consumable, which the instrument scans or a user enters during consumable loading
PlatformVersion | The version of AVITI OS for the run
PMGMask | A base mask that is used to generate the map of polonies
RecipeExecutionID | A UUID for the run recipe, which governs the stages of a sequencing run
RecipeValues | Contains additional values for the recipe that the run uses, such as a filterMask value
RunDescription | An optional description for the run
RunFolderName | The name of the output folder that AVITI OS creates for the run
RunID | A UUID assigned to the run
RunName | A text-based run identifier that the operator enters
RunOrder | Identifies the ordinal position of a batch in the run
RunType | The type of AVITI24 System run, such as Cytoprofiling
SerialNumber | The serial number for a consumable, which the instrument scans or a user enters during consumable loading
Side | The side of the instrument that the run uses, such as SideA or SideB
StorageConnectionID | A UUID for the storage connection that the run uses
Tags | The tags that the operator applies to the run, if any
ThroughputSelection | Identifies the path to the text file that is used to determine throughput selection for the run
Tiles | Identifies a set of tiles on the flow cell for a well
Type | Identifies the type of batch, such as BarcodingBatch, PreAmpImagingBatch, or PostAmpImagingBatch
WellLayout | Identifies the well layout, such as 48-well, 12-well, or 1-well
Wells | Identifies a set of information for the wells in the run
WellLocation | Identifies the well that the associated Tiles belong to
XMillimeters | The well position in millimeters on the X-axis of the image
YMillimeters | The well position in millimeters on the Y-axis of the image
ZPositions | Identifies the Z-positions that are associated with each batch in the run
Zs | Identifies the order of batches for the run to associate them with Z-positions
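When confirming the run configuration, one useful check is which tiles were imaged in each well. A minimal sketch, assuming each well appears as an object with a `WellLocation` field and a `Tiles` list whose items carry a `Name` field; because the exact nesting is not shown in the table, the parsed JSON (for example, from `json.load`) is searched recursively. `tiles_by_well` is a hypothetical name.

```python
from typing import Any, Dict, List


def tiles_by_well(node: Any) -> Dict[str, List[str]]:
    """Collect tile names per WellLocation from a parsed RunParameters.json."""
    result: Dict[str, List[str]] = {}

    def visit(item: Any) -> None:
        if isinstance(item, dict):
            tiles = item.get("Tiles")
            if "WellLocation" in item and isinstance(tiles, list):
                names = [t["Name"] for t in tiles if isinstance(t, dict) and "Name" in t]
                result.setdefault(item["WellLocation"], []).extend(names)
            for value in item.values():
                visit(value)  # tile objects lack WellLocation, so nothing is counted twice
        elif isinstance(item, list):
            for value in item:
                visit(value)

    visit(node)
    return result
```

Comparing the resulting wells and tile counts against the intended WellLayout is a quick way to spot a misconfigured run.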

Run Uploaded

The RunUploaded.json file indicates the completion of the run. The file contains high-level information about the run and an outcome field that confirms the run outcome. AVITI OS always transfers this file last, which allows its creation to serve as a trigger to start automated downstream analysis.

The following table defines the fields in the RunUploaded.json file:

Field | Description
--- | ---
version | The version of the RunUploaded.json file
instrument | The name of the instrument for the run
instrumentId | A UUID for the instrument
outcome | The final outcome of the run, such as OutcomeCompleted, OutcomeStopped, or OutcomeFailed
runType | The type of AVITI System run, such as Cytoprofiling
recipeExecutionId | A UUID for the run recipe that is sourced from RunParameters.json
runID | A UUID assigned to the run that is sourced from RunParameters.json

Run Analysis Files Uploaded

The RunAnalysisFilesUploaded.json file indicates that the data required to begin downstream analysis was successfully transferred to the output location. This file contains high-level information about the run and an outcome field that confirms the run outcome. AVITI OS transfers this file only after it confirms that the files required for analysis have been transferred, so the creation of this file can trigger automated downstream analysis.
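Because these marker files are written last, an automation script can poll the run folder for them and then check the outcome. A minimal sketch using only the standard library; `wait_for_marker` and `run_completed` are hypothetical names, the polling interval and timeout are arbitrary defaults, and the sketch assumes RunAnalysisFilesUploaded.json uses the same `outcome` key as RunUploaded.json.

```python
import json
import time
from pathlib import Path
from typing import Optional


def wait_for_marker(run_dir: Path, name: str, timeout_s: float = 3600.0,
                    poll_s: float = 30.0) -> Optional[dict]:
    """Poll run_dir until the marker file `name` appears; return its parsed JSON.

    Returns None if the file does not appear (or never parses) before the timeout.
    """
    marker = run_dir / name
    deadline = time.monotonic() + timeout_s
    while True:
        if marker.is_file():
            try:
                return json.loads(marker.read_text())
            except json.JSONDecodeError:
                pass  # file may still be mid-transfer; retry on the next poll
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)


def run_completed(marker: Optional[dict]) -> bool:
    """True only when the marker reports the OutcomeCompleted outcome."""
    return marker is not None and marker.get("outcome") == "OutcomeCompleted"
```

A pipeline might wait for RunAnalysisFilesUploaded.json to start analysis early, then wait for RunUploaded.json and call `run_completed` before treating the run as final.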