Cytoprofiling Run Output Files
The following is an example run output directory of a successful cytoprofiling run:
example-storage-location
└── DemoInstrumentName
    └── 20240506_DemoInstrumentName_ExampleRunName
        ├── BaseCalling
        │   ├── BaseCalls
        │   │   └── BXX_CXXX.zip ... (for n cycles)
        │   ├── CellXform
        │   │   └── BXX ... (for n batches)
        │   │       └── LXRXXCXXS1.xform ... (for n tiles)
        │   ├── Location
        │   │   └── BXX ... (for n batches)
        │   │       └── LXRXXCXXS1ZXX.loc ... (for n tiles)
        │   └── Filter
        │       └── BXX ... (for n batches)
        │           └── LXRXXCXXS1ZXX.filter ... (for n tiles)
        ├── CellSegmentation
        │   └── WellXX ... (for n wells)
        │       ├── LXRXXCXXS1_Cell.tif ... (for n tiles)
        │       └── LXRXXCXXS1_Nuclear.tif ... (for n tiles)
        ├── Thumbnails
        │   └── WellXX_Thumbnail.png ... (for n wells)
        ├── Projection
        │   └── WellXX ... (for n wells)
        │       └── BXX_LXRXXCXXS1_Target.tif ... (for each target and n tiles)
        ├── Cytoprofiling
        │   └── Instrument
        │       ├── AverageNormWellStats.csv
        │       ├── RawCellStats.csv
        │       ├── RawCellStats.parquet
        │       ├── RunStats.json
        │       ├── Versions.json
        │       └── Wells
        │           └── WellXX ... (for n wells)
        │               └── BXX ... (for n batches)
        │                   └── LXRXXCXXS1_barcodes.parquet ... (for n tiles)
        ├── RunManifest.csv
        ├── RunManifest.json
        ├── RunParameters.json
        ├── Panel.json
        ├── RunStats.bin
        ├── RunAnalysisFilesUploaded.json
        └── RunUploaded.json
Run Output Files
The following table defines the key cytoprofiling run output files from an AVITI24 System. Parquet files are column-based files that efficiently store data. For more information, see the Apache Parquet Documentation.
Directory and File Name | File Format | Description | Quantity |
---|---|---|---|
{root}/BaseCalling/BaseCalls/{batch}_C{cycle}.zip | Binary | Reports the raw cytoprofiling base call data | One per batch per cycle |
{root}/BaseCalling/Filter/{batch}/{tile}.filter | Binary | For use with future applications | One per tile per batch |
{root}/BaseCalling/CellXform/{batch}/{tile}.xform | Binary | Transforms polony locations onto cells | One per tile per batch |
{root}/BaseCalling/Location/{batch}/{tile}.loc | Binary | Identifies polony locations on the flow cell | One per tile per batch |
{root}/BaseCalling/RunStats.bin | Log | Internal troubleshooting log of offline runs | One per run |
{root}/CellSegmentation/{well}/{tile}_Cell.tif | TIF | Cell segmentation masks for a well, where the value for a pixel in a cell is the cell ID | One per tile per well |
{root}/CellSegmentation/{well}/{tile}_Nuclear.tif | TIF | Nuclear segmentation masks for a well, where the value for a pixel in a nucleus is 1 | One per tile per well |
{root}/Cytoprofiling/Instrument/AverageNormWellStats.csv | CSV | Reports filtered and average metrics for each well in the run | One per run |
{root}/Cytoprofiling/Instrument/Versions.json | JSON | Reports the version number for CSV output files and bundled software programs | One per run |
{root}/Cytoprofiling/Instrument/RawCellStats.csv | CSV | Reports values per cell for all morphology features and raw target counts in a run | One per run |
{root}/Cytoprofiling/Instrument/RawCellStats.parquet | Parquet | Reports values per cell for all morphology features and raw target counts in a run | One per run |
{root}/Cytoprofiling/Instrument/RunStats.json | JSON | Reports run metrics | One per run |
{root}/Cytoprofiling/Instrument/Wells/{well}/{batch}/{tile}_barcodes.parquet | Parquet | Barcoding information for each polony in a tile | One per tile per batch per well |
{root}/Panel.json | JSON | Records target detection information for the run | One per run |
{root}/Projection/{well}/{batch}_{tile}_{target}.tif | TIF | Z-projected images of cell paint targets | One per target per tile |
{root}/QC_S12_Avid/B02/C001/L1RXXCXXSXZ00_GRN_F4.tif | TIF | Internal support and run diagnostics files | One per tile per run |
{root}/RunManifest.csv | CSV | Manifest that records biological sample information and well mapping | One per run |
{root}/RunManifest.json | JSON | Version of the run manifest that is reserved for Element processes | One per run |
{root}/RunParameters.json | JSON | Records information about the run configuration | One per run |
{root}/RunAnalysisFilesUploaded.json | JSON | Written after the last analysis file is transferred, which indicates that post-run analysis can begin | One per run |
{root}/RunUploaded.json | JSON | The last file transferred, which marks run completion | One per run |
{root}/Thumbnails/{well}_Thumbnail.png | PNG | Thumbnail image for a well | One per well |
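The path patterns in the table can be expressed as templates. The following Python sketch is illustrative only and is not part of the instrument software. The `TEMPLATES` dictionary, the `output_path` helper, and the example well, batch, and tile names are hypothetical. Only the path patterns come from the table.

```python
from pathlib import Path

# Path patterns copied from the table above, relative to {root}.
# The dictionary keys are hypothetical labels chosen for this sketch.
TEMPLATES = {
    "cell_mask": "CellSegmentation/{well}/{tile}_Cell.tif",
    "nuclear_mask": "CellSegmentation/{well}/{tile}_Nuclear.tif",
    "barcodes": "Cytoprofiling/Instrument/Wells/{well}/{batch}/{tile}_barcodes.parquet",
    "projection": "Projection/{well}/{batch}_{tile}_{target}.tif",
    "thumbnail": "Thumbnails/{well}_Thumbnail.png",
}


def output_path(root, kind, **parts):
    """Return the expected location of one output file under the run directory."""
    return Path(root) / TEMPLATES[kind].format(**parts)


# Example with hypothetical names that follow the WellXX, BXX, and LXRXXCXXS1 patterns.
print(output_path("run", "barcodes", well="WellA1", batch="B01", tile="L1R02C03S1"))
```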
Metrics
The run output files contain a variety of metrics, such as tile-specific and average metrics.
RunStats.json reports statistics and metrics for a run.
RawCellStats.csv and RawCellStats.parquet contain a full set of morphology and quantification metrics for each target and batch.
AverageNormWellStats.csv provides metric averages for each well. Metrics that end with .std provide the standard deviation for the metric.
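As a minimal sketch of the .std naming convention, the following Python example separates standard deviation columns from the remaining columns of a CSV header. The sample header and values are hypothetical and only illustrate the suffix rule. Actual column names depend on the run.

```python
import csv
import io


def split_std_columns(fieldnames):
    """Separate columns that end with '.std' from all other columns."""
    std = [name for name in fieldnames if name.endswith(".std")]
    other = [name for name in fieldnames if not name.endswith(".std")]
    return other, std


# Hypothetical header and row, used only to demonstrate the suffix rule.
sample = io.StringIO("Well,AreaUm,AreaUm.std\nA1,310.5,42.1\n")
reader = csv.DictReader(sample)
other, std = split_std_columns(reader.fieldnames)
print(other, std)
```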
RunStats File
The following table defines the fields that are listed in the RunStats.json file:
Field | Description |
---|---|
AnalysisID | The identifier for the analysis that is generated by Cells2Parquet and assigned by the analysis software |
AnalysisVersion | The Cells2Stats or Molecule software version |
AssignedCountPerMM2 | The target count per mm² of cell area that is reported for each control target in the panel |
AssignedCountsPerMM2 | The target counts per mm² of cell area for all barcoding targets |
AverageAssignedCountsPerMM2 | The target counts per mm² of cell area for all barcoding targets, averaged across wells |
Batches | Identifies a set of information for each batch |
BatchName | The batch name (string) |
CellCount | The number of cells that are identified by cell segmentation (integer) |
ControlType | The type of control target (for example, Negative Control 1) |
Count | The number of polonies with a specific basecalled sequence that did not match any target's expected sequence within the allowable mismatch threshold (integer) |
DemuxStats | The statistics that are related to the demultiplexing process |
ExpectedSequence | The expected barcode sequence for a target (string) |
FileVersion | The file format version such as 1.2.0 (string) |
FlowCellID | The flow cell identifier (string) |
MeanAssignedCountPerCell | The mean number of polonies that are assigned to a target sequence within a cell, across all cells (float) |
MedianCellDiameter | The median diameter of cells after cell segmentation (float) |
NumPolonies | The number of polonies (integer) |
PercentAssignedReads | The percentage of reads that are assigned to a target (float) |
PercentConfluency | The percent of the flow cell culture area that is covered by cells (float) |
PercentMismatch | The percentage of assigned reads that have a mismatch of 1 or 2 bases in their base calls relative to the reference (float) |
PercentNucleatedCells | The percentage of cells that have an identified nucleus |
RunID | The run ID (string) |
RunName | The run name (string) |
Targets | Identifies a set of information for each target |
TargetName | The target name (string) |
Sequence | The sequence that was basecalled but remained unassigned for a polony because it did not match an expected sequence for any target in the panel (string) |
UnassignedSequences | Identifies a set of information for sequences that were not assigned |
WellLocation | The location ID of a well. A1, A2, B1, B2, C1, C2, D1, D2, E1, E2, F1, and F2 are valid values |
Wells | Identifies a set of information for each well that was used |
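The table lists fields alphabetically and does not show how they nest inside the file. As a hedged sketch, the following Python example collects every value stored under a given field name wherever it appears in the parsed JSON. The `find_values` helper and the sample fragment are hypothetical. The real file layout can differ.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical fragment that reuses field names from the table above.
doc = json.loads(
    '{"RunName": "Demo", "Wells": ['
    '{"WellLocation": "A1", "CellCount": 1200}, '
    '{"WellLocation": "B1", "CellCount": 950}]}'
)
counts = list(find_values(doc, "CellCount"))
print(counts)
```

For a real run, replace the sample fragment with the result of `json.load` on the RunStats.json file.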
RawCellStats Files
The following table defines the fixed fields that are listed in the RawCellStats.csv and RawCellStats.parquet files and are the same for every run:
Field | Description |
---|---|
Area | The area of the cell in pixels |
AreaUm | The area of the cell in square microns (μm²), which is converted from pixels |
Cell | The unique ID of the cell within the run. The cell ID is equal to the sum of (tile_index * $2^{16}$) and the tile-specific local cell ID. This allows cell IDs to be computed independently per tile and ensures that the tile-specific cell ID can be retrieved from the cell ID |
NuclearArea | The area of the nucleus in pixels |
NuclearAreaUm | The area of the nucleus in square microns (μm²), which is converted from pixels |
Tile | The specific imaging tile or subregion that the cell was imaged from |
Well | The position of the well in the plate. This is calculated from the row and column number (for example, A1) |
WellLabel | The label of the well, based on user input from the run manifest. If there is no user input, then this is a duplicate of the Well column |
X | The X-coordinate of the cell in pixels, within the tile or image. This is the distance from the tile origin |
Xum | The X-coordinate of the cell in microns (μm), which is converted from pixels. This is the distance from the tile origin |
Y | The Y-coordinate of the cell in pixels, within the tile or image. This is the distance from the tile origin |
Yum | The Y-coordinate of the cell in microns (μm), which is converted from pixels. This is the distance from the tile origin |
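The Cell field combines a tile index and a tile-specific cell ID as tile_index * 2^16 + local ID. The following Python sketch shows the arithmetic in both directions. The function names are hypothetical, and the sketch assumes that the local ID is a non-negative integer smaller than 2^16, which the formula requires for the decoding to be unambiguous. How a tile index maps to a tile name is not defined here.

```python
TILE_SHIFT = 16  # Cell = tile_index * 2**16 + tile-specific local cell ID


def encode_cell_id(tile_index, local_id):
    """Combine a tile index and a tile-specific cell ID into a run-wide cell ID."""
    if not 0 <= local_id < (1 << TILE_SHIFT):
        raise ValueError("local_id must fit in 16 bits")
    return (tile_index << TILE_SHIFT) + local_id


def decode_cell_id(cell_id):
    """Recover (tile_index, local_id) from a run-wide cell ID."""
    return cell_id >> TILE_SHIFT, cell_id & ((1 << TILE_SHIFT) - 1)


# Worked example: tile index 3, local cell 42 gives 3 * 65536 + 42 = 196650.
print(encode_cell_id(3, 42))
print(decode_cell_id(196650))
```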
The following table defines the variable fields that are listed in the RawCellStats.csv and RawCellStats.parquet files and are based on the barcoding and cell paint targets within a run:
Field | Description |
---|---|
Cellular Counts for Barcoding Targets | The total cellular counts, with nuclear counts included, of each barcoding target. The column name is based on the target and batch and is shown in the format of {target.batch} (for example, ATF2KT1.BO1). |
Cellular Intensity for Cell Paint Targets | The background-subtracted sum of cellular intensity for each cell paint target (for example, Mitochondria.CP02). |
Morphology Metrics for Cell Paint Targets | The morphology metric output for each cell paint target. The column name is based on the metric, target, and batch and is shown in the format of metric_target.batch (for example, Intensity_MeanIntensityEdge_Mitochondria.CP02). Metrics come from CellProfiler modules. Certain metrics are not available in these files. For example, the output files do not report Zernike metrics. Z-axis metrics are also not available because they are not relevant to the analysis output, although in some files columns for Z-axis metrics appear with values of 0. For more information on morphology metrics, see the CellProfiler Manual measurement information. |
Nuclear Counts for Barcoding Targets | The nuclear counts of each barcoding target. The column name is based on the target and batch and is shown in the format of {target_Nuclear.batch} (for example, AKT1_Nuclear.BO1). |
Nuclear Intensity for Cell Paint Targets | The background-subtracted sum of nuclear intensity for each cell paint target (for example, Mitochondria_Nuclear.CP02). |
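Because cellular counts include nuclear counts, the count outside the nucleus can be derived by subtraction. The following Python sketch parses the column naming formats in the table and computes that difference for one row. The helper names, the TargetA target, and the B01 batch are hypothetical. The sketch does not try to separate the metric from the target in morphology columns, because target names may themselves contain underscores.

```python
def split_column(name):
    """Split 'stem.batch' at the last dot and flag a '_Nuclear' suffix on the stem."""
    stem, sep, batch = name.rpartition(".")
    if not sep:
        raise ValueError(f"no batch suffix in column name: {name!r}")
    is_nuclear = stem.endswith("_Nuclear")
    if is_nuclear:
        stem = stem[: -len("_Nuclear")]
    return stem, batch, is_nuclear


def non_nuclear_count(row, target, batch):
    """Cellular counts include nuclear counts, so subtract to get the remainder."""
    cellular = float(row[f"{target}.{batch}"])
    nuclear = float(row[f"{target}_Nuclear.{batch}"])
    return cellular - nuclear


print(split_column("Mitochondria_Nuclear.CP02"))

# Hypothetical row with one barcoding target named TargetA in batch B01.
row = {"TargetA.B01": "10", "TargetA_Nuclear.B01": "4"}
print(non_nuclear_count(row, "TargetA", "B01"))
```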
Barcodes Parquet Files
The {tile}_barcodes.parquet files provide the raw barcode data for each tile in a batch. The files indicate the location of targets in different cells.
The following table defines the columns in the parquet files.
Field | Description | Data Type |
---|---|---|
BarcodeIndex | A barcode ID number that corresponds to the order of targets for a batch as listed in the Panel.json file. A value of 0 indicates an unassigned barcode. | Int16 |
Cell | A tile-specific ID that is associated with a cell. The Cell ID in barcode parquet files differs from the Cell ID for the run in other parquet files. | Int16 |
IsNuclear | Indicates whether the barcode is in the nucleus of the cell | Boolean |
X | The position of the barcode on the X-axis of the tile | UInt16 |
X μm | The X position in microns | Float |
Y | The position of the barcode on the Y-axis of the tile | UInt16 |
Y μm | The Y position in microns | Float |
Z | The position of the barcode on the Z-axis of the tile | String |
Z μm | The Z position in microns | Float |
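As a hedged illustration of how BarcodeIndex relates to the target order in Panel.json, the following Python sketch counts assigned barcodes per cell and target. It assumes that index 1 refers to the first target listed for the batch, which follows from 0 being reserved for unassigned barcodes but is not stated explicitly. Rows are represented as plain dictionaries. Reading the parquet file itself requires a third-party Parquet reader, which this sketch does not cover. The target names are hypothetical.

```python
from collections import Counter


def target_for_index(barcode_index, batch_targets):
    """Map a BarcodeIndex to a target name. A value of 0 means unassigned."""
    if barcode_index == 0:
        return None
    return batch_targets[barcode_index - 1]  # assumption: 1-based order


def counts_per_cell(rows, batch_targets):
    """Count assigned barcodes for each (tile-specific Cell, target) pair."""
    counts = Counter()
    for row in rows:
        name = target_for_index(row["BarcodeIndex"], batch_targets)
        if name is not None:
            counts[(row["Cell"], name)] += 1
    return counts


# Hypothetical targets and rows that use the column names from the table above.
targets = ["TargetA", "TargetB"]
rows = [
    {"BarcodeIndex": 1, "Cell": 7},
    {"BarcodeIndex": 2, "Cell": 7},
    {"BarcodeIndex": 1, "Cell": 7},
    {"BarcodeIndex": 0, "Cell": 7},
]
print(counts_per_cell(rows, targets))
```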
Panel File
The Panel.json file contains target information for each batch in the run. Each section of the file defines information for batches and targets, such as ImagingPrimerTubes, BarcodingPrimerTubes, ImagingTargets, and BarcodingTargets.
The following tables describe the Panel.json file information:
Field | Description |
---|---|
AnalysisSettings | Identifies an array of settings for the run analysis |
Application | The application type for the run, such as Counting |
CellBoundaryTarget | Identifies the target that is used to generate cell segmentation masks |
FileVersion | The Panel.json file version for the run |
ID | A unique ID that is assigned to the panel |
KitType | Specifies the type of reagent or assay kit that is used for the run. |
Name | The name of the cytoprofiling kit that is associated with the panel |
NuclearTarget | Identifies the target that is used to generate nuclear segmentation masks |
PanelCartridgePartNumbers | Identifies a set of information that lists the cartridge part numbers for the panel |
SpikeInId | The ID of your custom add-on protein panel JSON |
SpikeInName | The name of your custom add-on protein panel JSON |
SupplementaryCellBoundaryTarget | Identifies an optional additional target that is used to generate cell segmentation masks |
Primer Tube Sections
The following table defines information in the ImagingPrimerTubes and BarcodingPrimerTubes sections of the Panel.json file. These sections provide information about the settings for different batches.
Field | Description |
---|---|
BarcodeMask | The mask that is used to support barcoding for target analysis |
BarcodingPrimerTubes | Identifies a set of information about the settings for batches with targets for analysis |
BatchName | The name of a specific batch |
DefaultMismatch | The number of base mismatches that are permitted to assign a barcode to a target, which is typically 2 |
ImagingPrimerTubes | Identifies a set of information about the settings for batches related to cell paint |
MinCycles | The minimum number of cycles for a specific batch |
PMGMask | A base mask that is used to generate the map of polonies |
RunOrder | Identifies the ordinal position for a batch in the run |
Type | The type of batch relative to amplification: PreAmp or PostAmp |
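To illustrate what a mismatch threshold such as DefaultMismatch means, the following Python sketch assigns a basecalled sequence to the closest expected barcode when the number of differing bases is within the threshold. This is a generic Hamming-distance example and not the instrument's demultiplexing algorithm. The function names, the sequences, and the rule that ties are left unassigned are assumptions made for the sketch.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_target(read, expected, max_mismatch=2):
    """Return the closest target within max_mismatch, or None if none or tied."""
    ranked = sorted((hamming(read, seq), name) for name, seq in expected.items())
    if not ranked or ranked[0][0] > max_mismatch:
        return None
    if len(ranked) > 1 and ranked[1][0] == ranked[0][0]:
        return None  # assumption: an ambiguous read stays unassigned
    return ranked[0][1]


# Hypothetical expected barcodes for two targets.
expected = {"TargetA": "ACGTAC", "TargetB": "TTGCAA"}
print(assign_target("ACGTTC", expected))  # one mismatch against TargetA
print(assign_target("GGGGGG", expected))  # too far from both targets
```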
Target Sections
The following table defines information in the ImagingTargets and BarcodingTargets sections of the Panel.json file. The sections provide information about the targets in each batch.
Field | Description |
---|---|
Analyses | Identifies an array with the types of analysis for a batch |
Barcode | The barcode of bases for a specific target in a batch |
BarcodingTargets | Identifies the set of information for targets in each batch for analysis |
Base | The base for a particular cycle in a batch |
BatchName | The name for a specific batch |
ControlType | Identifies the type of control for a target in a batch, if applicable |
Cycle | The cycle that a particular base uses in a batch |
CycleBases | Identifies a set of information that defines the base in a specific batch cycle |
ImagingTargets | Identifies the set of information for targets in each batch that is related to amplification |
ProbeConcentration | The concentration of the probe for a specific target in a batch |
Target | The name of a target for a batch |
TargetType | The type of target for a batch: CellPaint, Protein, or Transcript |
Run Parameters
The RunParameters.json file contains a record of the input information for a run. When you evaluate run performance or troubleshoot, review this file to confirm that the correct parameters were used for the run.
The following table describes the information in the RunParameters.json file:
Field | Description |
---|---|
AdvancedSettings | Includes information about advanced run settings, such as custom recipes |
AnalysisLanes | The lanes that the run uses, such as 1, 2, or 1+2 |
ApplicationName | The type of application for the run, such as Counting |
BarcodeStr | The barcode number for a consumable, which the instrument scans and identifies during consumable loading |
BaseForChannels | Identifies the set of bases for the channels |
Batches | Identifies the set of batches for the run |
Buffer | Identifies a set of information that describes the buffer for the run |
Channels | Identifies the colors to associate with channels |
ColorForChannels | Identifies the channels to associate with bases and colors |
Consumables | Identifies the set of information for run consumables |
CustomRecipeName | The name of the custom recipe file that is uploaded in Advanced Run Settings |
Cycles | Identifies the number of cycles in a batch |
CycleFormat | Defines the format in which cycles appear |
Date | Timestamp for the run |
Expiration | The expiration timestamp for a consumable, which the instrument identifies or a user inputs during consumable loading |
ExpirationStr | An expiration ID number for a consumable that is associated with the expiration date |
FileVersion | The version of the RunParameters.json file for the run |
Flowcell | Identifies a set of information that describes the flow cell for the run |
ImageHeight | The height of the image in pixels |
ImageInfo | Identifies the set of information that defines characteristics of the cytoprofiling image |
ImageWidth | The width of the image in pixels |
Lanes | Defines an array that lists the lanes of the flow cell for the run |
InstrumentName | The name of the instrument for the run |
LotNumber | The lot number for a consumable, which the instrument identifies during consumable loading before the run |
Name | The name of a tile in a set of Tiles |
OperatorName | The name of the person who set up the run on the instrument |
PanelCartridge | Identifies a set of information that describes the sequencing cartridge for the run |
PanelName | The name of the panel that is used for the run |
PartNumber | The part number for a consumable, which the instrument scans or a user inputs during consumable loading |
PlatformVersion | The version of AVITI OS for the run |
PMGMask | A base mask that is used to generate the map of polonies |
RecipeExecutionID | A UUID for the run recipe, which governs the stages of a sequencing run |
RecipeValues | Contains additional values for the recipe that the run uses, such as a filterMask value |
RunDescription | An optional description for the run |
RunFolderName | The name of the output folder that AVITI OS creates for the run |
RunID | A UUID assigned to the run |
RunName | A text-based run identifier that is entered by the operator |
RunOrder | Identifies the ordinal position for a batch in the run |
RunType | The type of AVITI24 System run, Cytoprofiling |
SerialNumber | The serial number for a consumable, which the instrument scans or a user inputs during consumable loading |
Side | The side of the instrument that the run uses, such as SideA or SideB |
StorageConnectionID | A UUID for the storage connection that the run uses |
Tags | The tags that the Operator applies to the run, as applicable |
ThroughputSelection | Identifies the path for the text file that is used to determine throughput selection for the run |
Tiles | Identifies a set of tiles on the flow cell for a well |
Type | Identifies the type of batch, such as BarcodingBatch, PreAmpImagingBatch, or PostAmpImagingBatch |
WellLayout | Identifies the well layout, such as 48-well, 12-well, or 1-well |
Wells | Identifies a set of information for the wells in the run |
WellLocation | Identifies the well that the Tiles listed after it belong to |
XMillimeters | The well position in millimeters on the X-axis of the image |
YMillimeters | The well position in millimeters on the Y-axis of the image |
ZPositions | Identifies the Z-positions that are associated with each batch in the run |
Zs | Identifies the order of batches for the run to associate them with Z-positions |
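When you confirm run parameters, a simple comparison against the values you intended can help. The following Python sketch is an assumption-laden example: it treats the checked fields as top-level keys of the parsed file, which the table does not confirm, and the `check_parameters` helper and sample values are hypothetical.

```python
def check_parameters(params, expected):
    """Return {field: (expected, actual)} for every field that does not match."""
    return {
        field: (value, params.get(field))
        for field, value in expected.items()
        if params.get(field) != value
    }


# Hypothetical parsed content. For a real run, use json.load on RunParameters.json.
params = {"RunName": "ExampleRunName", "RunType": "Cytoprofiling", "Side": "SideA"}
expected = {"RunType": "Cytoprofiling", "Side": "SideB"}
print(check_parameters(params, expected))
```

An empty result means that every checked field matched the expected value.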
Run Uploaded
The RunUploaded.json file indicates the completion of the run. The file contains high-level information about the run and an outcome field that confirms the run outcome. AVITI OS always transfers this file last, which allows its creation to serve as a trigger to start automated downstream analysis.
The following table defines the fields in the RunUploaded.json file:
Field | Description |
---|---|
version | The version of the RunUploaded.json file |
instrument | The name of the instrument for the run |
instrumentId | A UUID for the instrument |
outcome | The final outcome of the run, such as OutcomeCompleted, OutcomeStopped, or OutcomeFailed |
runType | The type of AVITI System run, such as Cytoprofiling |
recipeExecutionId | A UUID for the run recipe that is sourced from RunParameters.json |
runID | A UUID assigned to the run that is sourced from RunParameters.json |
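A downstream script can read the outcome field before it starts further processing. The following Python sketch is a minimal example under stated assumptions: the helper names are hypothetical, and a file that cannot be parsed yet is treated the same as a file that is not present.

```python
import json
from pathlib import Path


def run_outcome(run_dir):
    """Return the 'outcome' value from RunUploaded.json, or None if unavailable."""
    marker = Path(run_dir) / "RunUploaded.json"
    if not marker.is_file():
        return None
    try:
        with marker.open(encoding="utf-8") as handle:
            return json.load(handle).get("outcome")
    except json.JSONDecodeError:
        return None  # assumption: treat an unreadable file as not yet available


def run_completed(run_dir):
    """True only when the run reports OutcomeCompleted."""
    return run_outcome(run_dir) == "OutcomeCompleted"
```

For example, `run_completed("path/to/run")` returns False until the file exists and reports OutcomeCompleted, where the path is a placeholder for the run output directory.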
Run Analysis Files Uploaded
The RunAnalysisFilesUploaded.json file indicates that the required data to begin downstream analysis was successfully transferred to the output location. This file contains high-level information about the run and an outcome field that confirms the run outcome. AVITI OS transfers this file only after it confirms that the files required to begin analysis have been transferred. This allows the file creation to trigger the start of automated downstream analysis.
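One way to use this file as a trigger is to poll the output location until the file appears. The following Python sketch shows that approach. The `wait_for_marker` function, its default timeout, and its polling interval are hypothetical choices for illustration. Storage systems that provide event notifications may offer a better mechanism than polling.

```python
import time
from pathlib import Path


def wait_for_marker(run_dir, name="RunAnalysisFilesUploaded.json",
                    timeout_s=3600.0, poll_s=30.0):
    """Poll until the marker file exists. Return False if the timeout elapses."""
    marker = Path(run_dir) / name
    deadline = time.monotonic() + timeout_s
    while True:
        if marker.is_file():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A caller can start downstream analysis when the function returns True and report a timeout otherwise.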