Output Files

Output Files for Sequencing

The following table lists the Bases2Fastq output files for sequencing:

File	Directory	Description
`Bases2Fastq.log`	info	Log file that records software events
`IndexAssignment.csv`	Root	Yield, number, and rate of polonies that are assigned for each sample and index combination
`Metrics.csv`	Root	Mismatch rates, percent assigned, and per-sample yield for each lane
`MultiQC_report.html`	Root	MultiQC report on the performance and quality of the sequencing samples
`{ProjectName}_QC.html`	Samples/{ProjectName}	Interactive HTML QC report on the performance and quality of the samples aggregated by project
`{ProjectName}_index_assignment.csv`	Samples/{ProjectName}	Yield, number, and rate of polonies that are assigned for each sample and index combination in a project
`{ProjectName}_metrics.csv`	Samples/{ProjectName}	Mismatch rates, percent assigned, and per-sample yield for each lane in a project
`{ProjectName}_RunStats.json`	Samples/{ProjectName}	Information on the performance of samples in a project
`{RunName}_QC.html`	Root	Interactive HTML QC report on run performance and quality for all samples and projects
`RunManifest.csv`	Root	Run manifest for the Bases2Fastq execution
`RunManifest.json`	Root	Machine-readable copy of the run manifest as a JSON file
`RunManifestErrors.json`	info	Record of errors in the run manifest
`RunParameters.json`	Root	Copy of the original run parameters file
`RunStats.json`	Root	Information on run performance
`{SampleName}_{Read}.fastq.gz`	Samples/{SampleName} or Samples/{ProjectName}/{SampleName}	The primary output of Bases2Fastq
`{SampleName}_stats.json`	Samples/{SampleName}	Information on the performance of each sample in the run
`UnassignedSequences.csv`	Root	The most frequent unassigned index sequences with approximate counts¹

¹ Counts indicate how many times an incorrect index sequence appears.
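Because FASTQ files land either in Samples/{SampleName} or in Samples/{ProjectName}/{SampleName}, a recursive search under the Samples directory finds them in both layouts. The following Python sketch assumes only the directory names shown in the table; `find_fastqs` is a hypothetical helper, not part of Bases2Fastq.

```python
from pathlib import Path
from typing import List


def find_fastqs(output_dir: str) -> List[Path]:
    """Return all FASTQ files under <output_dir>/Samples, sorted by path.

    rglob searches recursively, so it finds files in both
    Samples/{SampleName}/ and Samples/{ProjectName}/{SampleName}/.
    """
    samples_dir = Path(output_dir) / "Samples"
    return sorted(samples_dir.rglob("*.fastq.gz"))
```

Files in the output root, such as RunStats.json, are ignored because the search starts at Samples.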

Output Files for Cytoprofiling

The following table is a list of Bases2Fastq output files for cytoprofiling:

File	Directory	Description
`Bases2Fastq.log`	info	The log file that records software events
`{TargetSite}_{Well}_{Batch}_R1.fastq.gz`	Samples/{Well}	The primary output of Bases2Fastq for targeted runs, triggered by `--per-target-mask`
`{Well}_{Batch}_R1.fastq.gz`	Samples/{Well}	The primary output of Bases2Fastq for untargeted runs
`panel.json`	Root	The panel file for the Bases2Fastq execution
`RunManifest.csv`	Root	The run manifest for the Bases2Fastq execution
`RunManifest.json`	Root	The machine-readable copy of the run manifest as a JSON file
`RunParameters.json`	Root	The copy of the original run parameters file
`TargetCellAssignmentManifest.csv`	Root	The target cell assignment manifest for the Bases2Fastq execution
`TargetCellAssignmentManifest.json`	Root	The machine-readable copy of the target cell assignment manifest as a JSON file

FASTQ Files

A FASTQ file records the sequence data and corresponding Q scores for a sample or cell. FASTQ files are GZIP-compressed text files. Bases2Fastq names them {SampleName}_{Read}.fastq.gz for sequencing runs, {Well}_{Batch}_R1.fastq.gz for untargeted cytoprofiling runs, and {TargetSite}_{Well}_{Batch}_R1.fastq.gz for targeted cytoprofiling runs.

Each entry in a FASTQ file corresponds to one read and includes the following four lines:

A sequence identifier that includes run and polony information
Base calls that are assembled into a sequence composed of A, C, G, T, and N
A plus sign (+) that separates the sequence from the Q scores
A Q score for each base in the sequence
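Because each entry spans exactly four lines, a FASTQ file can be read four lines at a time. The following Python sketch uses only the standard library `gzip` module; `parse_fastq` and `read_fastq_gz` are hypothetical helper names, and the sketch assumes each sequence and Q score string occupies a single line, as in the four-line layout listed here.

```python
import gzip
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # sequence identifier line without the leading "@"
    sequence: str  # base calls (A, C, G, T, N)
    quality: str   # Phred33-encoded Q scores, one character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an open text handle, four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Stream records from a GZIP-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:  # "rt" decompresses to text
        yield from parse_fastq(handle)
```

Streaming with a generator keeps memory use constant, which matters for FASTQ files that hold hundreds of millions of reads.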

Note:

If you use a sequencing run manifest with no samples or associated index sequences, then Bases2Fastq assigns all reads to DefaultSample and produces only the FASTQ files DefaultSample_R1.fastq.gz and DefaultSample_R2.fastq.gz.

Sequence Identifiers for Sequencing

The sequencing FASTQ sequence identifier includes the components in the following table and is formatted in one line:

@<instrument>:<run name>:<flow cell ID>:<lane>:<tile>:<x_pos>:<y_pos>:<UMI> <read>:N:0:<index sequence>

Component	Allowed Values	Description
`@`	@	Start to the sequence identifier line
`<instrument>`	Upper and lowercase letters, integers 0–9, and underscores (_)	Instrument name
`<run name>`	Upper and lowercase letters, integers 0–9, hyphens (-), and underscores (_)	Run name that is defined during the run setup
`<flow cell ID>`	Upper and lowercase letters and integers 0–9	Flow Cell ID from the barcode scan. If the barcode scan fails during the run and no barcode is present, then the Run ID replaces the Flow Cell ID.
`<lane>`	1 or 2	Lane number
`<tile>`	An integer	Tile number
`<x_pos>`	A zero-padded integer	X-coordinate of the polony
`<y_pos>`	A zero-padded integer	Y-coordinate of the polony
`<UMI>`	A, C, G, T, and N	UMI sequence with a plus sign that separates the Read 1 and Read 2 sequences, if applicable
`<read>`	1 or 2	Read number
`<is filtered>`	N	A legacy filtering value of N that exists only for backward compatibility and does not change
`<control number>`	0	A legacy control number of 0 that exists only for backward compatibility and does not change
`<index sequence>`	Varies	A value that is based on the indexing strategy in the run manifest. No indexing: the sample number. Single indexing: the observed index sequence. Dual indexing: the observed Index 1 sequence, a plus sign (+), and the observed Index 2 sequence.
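Because none of the polony-level components can contain a colon or a space, the identifier can be split on the single space and then on colons. The following Python sketch is a hypothetical parser, not a Bases2Fastq API. It assumes that runs without UMIs omit the UMI field entirely (7 colon-separated polony fields instead of 8); check a real header from your run before relying on that.

```python
from typing import NamedTuple, Optional


class SequencingHeader(NamedTuple):
    instrument: str
    run_name: str
    flow_cell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    umi: Optional[str]   # None when the header carries no UMI field
    read: int
    is_filtered: str     # legacy value, always "N"
    control_number: str  # legacy value, always "0"
    index_sequence: str  # sample number, Index 1, or "Index1+Index2"


def parse_sequencing_header(line: str) -> SequencingHeader:
    """Split a sequencing FASTQ identifier into its components."""
    if not line.startswith("@"):
        raise ValueError("sequence identifier must start with '@'")
    # A single space separates the polony fields from the read fields.
    polony_part, read_part = line[1:].rstrip("\r\n").split(" ", 1)
    fields = polony_part.split(":")
    if len(fields) == 7:
        umi = None  # assumption: no UMI field when the run has no UMIs
    elif len(fields) == 8:
        umi = fields.pop()
    else:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    instrument, run_name, flow_cell_id, lane, tile, x_pos, y_pos = fields
    # maxsplit=3 keeps a dual-index value such as "ACGT+TTGA" intact.
    read, is_filtered, control_number, index_sequence = read_part.split(":", 3)
    return SequencingHeader(
        instrument, run_name, flow_cell_id, int(lane), int(tile),
        int(x_pos), int(y_pos), umi, int(read), is_filtered,
        control_number, index_sequence,
    )
```

For a made-up identifier such as `@AV000001:MyRun_01:2345678901:1:10102:0123:0456:ACGTAC 1:N:0:ATCACG+GCTAGC`, the parser returns lane 1, x_pos 123 (zero padding dropped by `int`), UMI `ACGTAC`, and index sequence `ATCACG+GCTAGC`.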

Sequence Identifiers for Cytoprofiling

The cytoprofiling FASTQ sequence identifier includes the components in the following table and is formatted in one line:

@<instrument>:<run_name>:<flowcell_id>:<well_index>:<z_tile>:<xloc>:<yloc>:<cell_id>:<nuclear_status>:<batch_id>:<polony_index>:0:N:0:1

Component	Allowed Values	Description
`@`	@	The start to the sequence identifier line
`<instrument>`	Upper and lowercase letters, integers 0–9, dashes (-), and underscores (_)	Instrument name
`<run_name>`	Upper and lowercase letters, integers 0–9, and dashes (-)	Run name
`<flowcell_id>`	Upper and lowercase letters, integers 0–9, and dashes (-)	Flow cell ID, `R{Run_ID}`, or `UNKNOWN_FLOWCELL`
`<well_index>`	Numerical	Well ID
`<z_tile>`	Numerical	The tile plus the z slice in the format of `SRRCCZZ` (slice, row, row, col, col, z, z)
`<xloc>`	Numerical, zero-padded	The x-coordinate of the polony
`<yloc>`	Numerical, zero-padded	The y-coordinate of the polony
`<cell_id>`	Numerical	The unique ID of the cell
`<nuclear_status>`	Boolean	The cellular location of a polony, whether the polony is in the nucleus or not
`<batch_id>`	String	The on-instrument cycling batch
`<read>`	1	Always 1 for cytoprofiling
`<is_filtered>`	N	A legacy filtering value of N that exists only for backward compatibility and does not change
`<control_number>`	0	A legacy control number of 0 that exists only for backward compatibility and does not change
`<indexsequence>`	1	Always 1 for cytoprofiling runs
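The `<z_tile>` value packs four positions into seven digits (`SRRCCZZ`). The following Python sketch splits it by digit position. `decode_z_tile` is a hypothetical helper, and padding back to seven digits is an assumption that only matters if leading zeros were lost, for example when the value was stored as a number.

```python
from typing import NamedTuple, Union


class ZTile(NamedTuple):
    slice_index: int  # S
    row: int          # RR
    col: int          # CC
    z: int            # ZZ


def decode_z_tile(z_tile: Union[int, str]) -> ZTile:
    """Split a z_tile value in SRRCCZZ form into slice, row, column, and z."""
    # Assumption: restore leading zeros dropped by numeric storage.
    digits = str(z_tile).zfill(7)
    if len(digits) != 7 or not digits.isdigit():
        raise ValueError(f"expected a 7-digit SRRCCZZ value, got {z_tile!r}")
    return ZTile(
        slice_index=int(digits[0]),
        row=int(digits[1:3]),
        col=int(digits[3:5]),
        z=int(digits[5:7]),
    )
```

For example, `decode_z_tile("1020304")` yields slice 1, row 2, column 3, and z slice 4.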

Quality Scores

A Q score is based on the Phred scale and indicates the confidence of a base call. A Phred quality score (Q) is logarithmically related to the error rate (E): $Q = -10 \log_{10} E$.

In a FASTQ file, an ASCII code represents the Q score. Bases2Fastq encodes quality scores with a +33 offset (Phred33).

Q Score	ASCII Code	Character	Q Score	ASCII Code	Character	Q Score	ASCII Code	Character
0	33	!	19	52	4	38	71	G
1	34	"	20	53	5	39	72	H
2	35	#	21	54	6	40	73	I
3	36	$	22	55	7	41	74	J
4	37	%	23	56	8	42	75	K
5	38	&	24	57	9	43	76	L
6	39	'	25	58	:	44	77	M
7	40	(	26	59	;	45	78	N
8	41	)	27	60	<	46	79	O
9	42	*	28	61	=	47	80	P
10	43	+	29	62	>	48	81	Q
11	44	,	30	63	?	49	82	R
12	45	-	31	64	@	50	83	S
13	46	.	32	65	A	51	84	T
14	47	/	33	66	B	52	85	U
15	48	0	34	67	C	53	86	V
16	49	1	35	68	D	54	87	W
17	50	2	36	69	E	55	88	X
18	51	3	37	70	F	56	89	Y
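The table follows directly from the +33 offset: each character's ASCII code minus 33 is its Q score. The following Python sketch shows the conversion in both directions and inverts the Phred formula to recover the error rate. The function names are illustrative, not part of Bases2Fastq.

```python
from typing import Iterable, List

PHRED_OFFSET = 33  # Phred33: ASCII code = Q score + 33


def decode_quality(quality: str) -> List[int]:
    """Convert a FASTQ quality line into integer Q scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def encode_quality(scores: Iterable[int]) -> str:
    """Convert integer Q scores back into Phred33 characters."""
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(E) to get the estimated error rate E."""
    return 10 ** (-q / 10)
```

For example, the character `?` (ASCII 63) decodes to Q30, which corresponds to an estimated error rate of 10^-3, or 1 error in 1,000 base calls.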

HTML QC Reports

HTML QC reports are generated only for sequencing runs and are organized into tabs that display histograms and other charts. The charts visualize index assignment and other quality metrics. If the run manifest includes more than 120 samples, then the report does not display per-sample charts.

Bases2Fastq names the QC report for a run {RunName}_QC.html and project-level QC reports {ProjectName}_QC.html.

HTML QC Reports for Individually Addressable Lanes

To generate HTML QC reports for each lane, create projects for each lane in your run manifest. For an example, see the Run Manifest Documentation.

Missing HTML QC Report

If an HTML QC report is not generated on a system that is configured to run the static binary, then complete the following troubleshooting steps:

Make sure that compatible versions of Python and the necessary packages are installed.
To identify the cause, review the error in info/QCReportErrors.txt. Then, use this information to generate the HTML QC report.

MultiQC Reports

MultiQC reports, which are generated by Seqera and designed by Element Biosciences, are available for use with Cells2Stats. MultiQC reports analyze results and statistics from bioinformatics tool outputs, such as log files and console output. They summarize experiments that contain multiple samples and multiple analysis steps, and are designed to run at the end of a pipeline or manually after your tools finish. MultiQC reports also provide the parsed data in a structured format that is ready for further downstream analysis. For more information, see the MultiQC Documentation.

Metrics Files

Bases2Fastq reports metrics in different files and formats to support different use cases:

Metrics.csv offers a high-level overview of yield and assignment metrics, per lane and overall.
IndexAssignment.csv summarizes index assignment rates per sample-index pair, per lane, and overall. The project-level index assignment CSV files provide metrics at the level of specific projects.
The JSON metrics files provide aggregate metrics at the run, project, and sample levels with more details than the summary files. The sample-level files also provide metrics at the level of specific occurrences.

Note:

Only runs that use Cloudbreak UQ chemistry report PercentQ50 values in metrics output files. For other sequencing chemistries, the JSON files report PercentQ50 as null, and the CSV files leave PercentQ50 empty.
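Because PercentQ50 arrives as `null` in JSON (which Python's `json` module reads as `None`) and as an empty cell in CSV (which `csv.DictReader` reads as `""`), downstream code can normalize both to a single missing value. The following Python sketch is a hypothetical helper, not part of Bases2Fastq.

```python
from typing import Optional, Union


def parse_percent_q50(value: Union[float, int, str, None]) -> Optional[float]:
    """Normalize PercentQ50: JSON null and empty CSV cells both become None."""
    if value is None:
        return None  # JSON null
    if isinstance(value, str):
        value = value.strip()
        if not value:
            return None  # empty CSV cell
    return float(value)
```

Treating the missing value as `None` rather than `0.0` avoids reporting a misleading 0% Q50 for chemistries that do not measure it.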

Run Metrics (RunStats.json)

The run metrics file RunStats.json reports the following performance metrics in JSON format. The metrics are specific to the Bases2Fastq execution.

Metric	Value
`AnalysisID`	The unique identifier that Bases2Fastq generates for the analysis
`AnalysisVersion`	The current version of Bases2Fastq
`AssignedYield`	The run yield that is based on assigned reads in gigabases
`FileVersion`	The current version of the file format
`FlowCellID`	A flow cell identifier that is sourced from `RunParameters.json`. If blank, then the letter R followed by the `RunID` value is used.
`I1IsReverseComplement`	The observed orientation of the Index 1 sequences relative to the orientation recorded in the run manifest
`I2IsReverseComplement`	The observed orientation of the Index 2 sequences relative to the orientation recorded in the run manifest
`Lanes`	A detailed list of per-lane metrics
`MeanReadLength`	The average read length after adapter trimming
`NumPolonies`	The total number of polonies that are calculated for the run
`NumPoloniesBeforeTrimming`	The total number of polonies that are calculated for the run before adapter trimming
`PercentAssignedReads`	The percentage of reads that are assigned to a sample
`PercentBelowFilterThreshold`	The percentage of all polonies that are below the quality filtering threshold for each cycle
`PercentMismatch`	The percentage of polonies that are assigned to a sample with a mismatch
`PercentMismatchI1`	The percentage of polonies that are assigned to Index 1 sequences with a mismatch
`PercentMismatchI2`	The percentage of polonies that are assigned to Index 2 sequences with a mismatch
`PercentQ30`	The percentage of ≥ Q30 Q scores for the run. This includes assigned and unassigned reads and excludes filtered reads and no calls.
`PercentQ40`	The percentage of ≥ Q40 Q scores for the run. This includes assigned and unassigned reads and excludes filtered reads and no calls.
`PercentQ50`	The percentage of ≥ Q50 Q scores for the run. This includes assigned and unassigned reads and excludes filtered reads and no calls.
`PercentReadsTrimmed`	The percentage of reads that Bases2Fastq trimmed
`PercentUnexpectedIndexPairs`	The percentage of all polonies with Index 1 and Index 2 reads that matched different samples¹
`PerReadMeanQualityScoreHistogram`	The distribution of per-read average quality scores
`QualityScore10thPercentile`	The 10th percentile of quality scores
`QualityScore25thPercentile`	The 25th percentile of quality scores
`QualityScore50thPercentile`	The 50th percentile of quality scores
`QualityScore75thPercentile`	The 75th percentile of quality scores
`QualityScore90thPercentile`	The 90th percentile of quality scores
`QualityScoreHistogram`	A per-base call Q score distribution with integer resolution
`QualityScoreMean`	The average Q score of base calls for the run, excluding filtered reads and no calls
`RemovedAdapterLengthHistogram`	A histogram that shows the number of bases trimmed from an adapter in a given position
`RunName`	A text-based run identifier that is sourced from `RunParameters.json`
`RunID`	A universally unique identifier (UUID) that is assigned to the run and sourced from `RunParameters.json`
`Samples`	A list of libraries that the run sequenced
`SampleStats`	The per-sample metrics that are listed in the sample metrics files for the run
`TotalYield`	The total yield of all reads in gigabases
`UnassignedSequences`	A list of unassigned index sequences with a count for each unassigned sequence

¹ For demultiplexing to be successful, both index reads must match the same sample.
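The following Python sketch reads RunStats.json and pulls a few top-level metrics named in the table, using the standard `json` module. The helper names are hypothetical. It also recomputes a percentage of Q scores at or above a threshold from `QualityScoreHistogram`, assuming that index i of the histogram holds the number of base calls with Q score i; the result may not match the reported PercentQ30 exactly if the histogram counts differ from the reads that PercentQ30 includes.

```python
import json
from typing import Any, Dict, List, Optional

# Top-level keys taken from the RunStats.json table; values are read as-is.
SUMMARY_KEYS = [
    "RunName", "TotalYield", "AssignedYield",
    "PercentAssignedReads", "PercentQ30", "PercentQ50",
]


def load_run_stats(path: str) -> Dict[str, Any]:
    """Read RunStats.json into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_run(stats: Dict[str, Any]) -> Dict[str, Any]:
    """Return the summary keys; missing keys and JSON nulls both map to None."""
    return {key: stats.get(key) for key in SUMMARY_KEYS}


def percent_at_or_above(histogram: List[int], threshold: int) -> Optional[float]:
    """Percentage of base calls with Q >= threshold.

    Assumes histogram[i] is the number of base calls with Q score i.
    """
    total = sum(histogram)
    if total == 0:
        return None
    return 100.0 * sum(histogram[threshold:]) / total
```

A typical use is `summarize_run(load_run_stats("RunStats.json"))` in the output root, followed by `percent_at_or_above(stats["QualityScoreHistogram"], 30)` as a cross-check.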

Project Metrics ({ProjectName}_RunStats.json)

When a run manifest groups samples by project, Bases2Fastq creates JSON project metrics files. Bases2Fastq names the files {ProjectName}_RunStats.json. The files report the following performance metrics for the samples in the project:

Metric	Value
`AnalysisID`	The unique identifier that Bases2Fastq generates for the analysis
`AnalysisVersion`	The current version of Bases2Fastq
`BaseComposition`	Counts for each A, C, G, T, and N base
`FileVersion`	The current version of the file format
`FlowCellID`	A flow cell identifier that is sourced from `RunParameters.json`. If blank, then the letter R followed by the `RunID` value is used.
`I1IsReverseComplement`	The observed orientation of the Index 1 sequences relative to the orientation recorded in the run manifest
`I2IsReverseComplement`	The observed orientation of the Index 2 sequences relative to the orientation recorded in the run manifest
`Lanes`	A detailed list of per-lane metrics
`MeanReadLength`	The average read length after adapter trimming
`NumPolonies`	The total number of polonies that are calculated for the samples in the project
`NumPoloniesBeforeTrimming`	The total number of polonies that are calculated for the samples in the project before adapter trimming
`PercentMismatch`	The percentage of polonies that are assigned to samples with a mismatch in the project
`PercentMismatchI1`	The percentage of polonies that are assigned to Index 1 sequences with a mismatch
`PercentMismatchI2`	The percentage of polonies that are assigned to Index 2 sequences with a mismatch
`PercentQ30`	The percentage of ≥ Q30 Q scores for the project. This includes assigned and unassigned reads and excludes filtered reads and no calls.
`PercentQ40`	The percentage of ≥ Q40 Q scores for the project. This includes assigned and unassigned reads and excludes filtered reads and no calls.
`PercentQ50`	The percentage of ≥ Q50 Q scores for the project. This includes assigned and unassigned reads and excludes filtered reads and no calls.
`PercentReadsTrimmed`	The percentage of reads that Bases2Fastq trimmed
`PerReadGCCountHistogram`	A list of counts: the value at index i is the number of reads with i G/C calls
`Project`	The alphanumeric project identifier
`QualityScore10thPercentile`	The 10th percentile of quality scores
`QualityScore25thPercentile`	The 25th percentile of quality scores
`QualityScore50thPercentile`	The 50th percentile of quality scores
`QualityScore75thPercentile`	The 75th percentile of quality scores
`QualityScore90thPercentile`	The 90th percentile of quality scores
`QualityScoreMean`	The mean Q score of base calls for the samples in the project, excluding filtered reads and no calls
`Reads`	A detailed list of per-read metrics
`RemovedAdapterLengthHistogram`	A histogram that shows the number of bases trimmed from an adapter in a given position
`RunName`	A text-based run identifier that is sourced from `RunParameters.json`
`RunID`	A UUID that is assigned to the run and sourced from `RunParameters.json`
`Samples`	A list of libraries that are sequenced for the project
`SampleStats`	The per-sample metrics that are listed in the sample metrics files for the project
`SampleID`	A globally unique sample identifier
`SampleName`	The alphanumeric sample identifier
`SampleNumber`	The numeric sample identifier
`Yield`	The number of bases in the project in gigabases
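`PerReadGCCountHistogram` stores, at index i, the number of reads with i G/C calls, so the mean G/C count per read is a weighted average of the indices. The following Python sketch computes it and, as a rough approximation only, divides by `MeanReadLength` to estimate a GC fraction; because trimmed reads vary in length, that ratio is an estimate, not a reported metric. The function names are hypothetical.

```python
from typing import List, Optional


def mean_gc_calls_per_read(histogram: List[int]) -> Optional[float]:
    """Mean number of G/C calls per read.

    histogram[i] is the number of reads with exactly i G/C calls.
    """
    reads = sum(histogram)
    if reads == 0:
        return None
    return sum(i * count for i, count in enumerate(histogram)) / reads


def approx_gc_fraction(histogram: List[int], mean_read_length: float) -> Optional[float]:
    """Rough GC fraction: mean G/C calls divided by MeanReadLength."""
    mean_gc = mean_gc_calls_per_read(histogram)
    if mean_gc is None or mean_read_length <= 0:
        return None
    return mean_gc / mean_read_length
```

The same functions apply to the `PerReadGCCountHistogram` field in sample metrics files and in occurrences.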

Sample Metrics ({SampleName}_stats.json)

Sample metrics files report the following sample-specific performance metrics in the JSON file format. Bases2Fastq names the files {SampleName}_stats.json.

Metric	Value
`AnalysisVersion`	The current version of Bases2Fastq
`BaseComposition`	Counts for each A, C, G, T, and N base
`ExternalID`	An external ID that is specified in the run manifest, if applicable
`FileVersion`	The current version of the file format
`MeanReadLength`	The average read length after adapter trimming
`NumPolonies`	The total number of polonies that are assigned to the sample
`NumPoloniesBeforeTrimming`	The number of polonies that are assigned to a sample before adapter trimming
`Occurrences`	Additional information per occurrence of the sample
`PercentMismatch`	The percentage of polonies that are assigned to a sample with a mismatch
`PercentQ30`	The percentage of ≥ Q30 Q scores for the sample. This includes assigned reads and excludes filtered reads and no calls.
`PercentQ40`	The percentage of ≥ Q40 Q scores for the sample. This includes assigned reads and excludes filtered reads and no calls.
`PercentQ50`	The percentage of ≥ Q50 Q scores for the sample. This includes assigned reads and excludes filtered reads and no calls.
`PercentReadsTrimmed`	The percentage of reads that Bases2Fastq trimmed
`PerReadGCCountHistogram`	A list of counts: the value at index i is the number of reads with i G/C calls
`QualityScoreMean`	The mean Q score of base calls for the sample, excluding filtered reads and no calls
`RemovedAdapterLengthHistogram`	A histogram that shows the number of bases trimmed from an adapter in a given position
`RunName`	A text-based run identifier that is sourced from `RunParameters.json`
`RunID`	A UUID that is assigned to the run and sourced from `RunParameters.json`
`SampleID`	A globally unique sample identifier
`SampleName`	The alphanumeric sample identifier
`SampleNumber`	The numeric sample identifier
`Yield`	The number of bases in the sample in gigabases
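To compare samples, the per-sample files can be loaded together and each sample's share of polonies computed from `NumPolonies`. The following Python sketch assumes the files sit under the Samples directory with the {SampleName}_stats.json naming shown in the output file table; the helper names are hypothetical. The share is relative to the polonies summed across the loaded sample files, so unassigned polonies are not counted.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_sample_stats(output_dir: str) -> List[Dict[str, Any]]:
    """Load every {SampleName}_stats.json file found under <output_dir>/Samples."""
    paths = sorted(Path(output_dir, "Samples").rglob("*_stats.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


def polony_share(samples: List[Dict[str, Any]]) -> Dict[str, float]:
    """Percentage of the summed NumPolonies that belongs to each sample."""
    total = sum(s["NumPolonies"] for s in samples)
    if total == 0:
        return {s["SampleName"]: 0.0 for s in samples}
    return {s["SampleName"]: 100.0 * s["NumPolonies"] / total for s in samples}
```

A strongly uneven share can point to pooling imbalance before you look at per-lane details.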

Occurrences

Occurrences are a set of fields in a sample metrics file that allocate sample performance metrics by specific occurrences of a sample in the run. For example, if a sample appears in both lanes, then Bases2Fastq lists an occurrence for each lane.

Each occurrence includes the identifiers Lane and Expected Sequence, and reports the following performance metrics:

Metric	Value
`BaseComposition`	Counts for each A, C, G, T, and N base
`CustomMetadata`	Custom metadata that is specified in the run manifest, if applicable
`MeanReadLength`	The average read length after adapter trimming
`NumPolonies`	The total number of polonies that are assigned to the sample
`NumPoloniesBeforeTrimming`	The number of polonies that are assigned to a sample before adapter trimming
`PercentMismatch`	The percentage of polonies that are assigned to a sample with a mismatch
`PercentQ30`	The percentage of ≥ Q30 Q scores for the specific occurrence of a sample. This includes assigned reads and excludes filtered reads and no calls.
`PercentQ40`	The percentage of ≥ Q40 Q scores for the specific occurrence of a sample. This includes assigned reads and excludes filtered reads and no calls.
`PercentQ50`	The percentage of ≥ Q50 Q scores for the specific occurrence of a sample. This includes assigned reads and excludes filtered reads and no calls.
`PercentReadsTrimmed`	The percentage of reads that Bases2Fastq trimmed
`PerReadGCCountHistogram`	A list of counts: the value at index i is the number of reads with i G/C calls
`QualityScoreMean`	The mean Q score of base calls for the sample, excluding filtered reads and no calls
`R1Adapters`	The Read 1 adapter sequences that are associated with the lane that the occurrence belongs to
`R2Adapters`	The Read 2 adapter sequences that are associated with the lane that the occurrence belongs to
`RemovedAdapterLengthHistogram`	A histogram that shows the number of bases trimmed from an adapter in a given position
`Yield`	The number of bases in the sample in gigabases
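Because a sample that appears in both lanes has one occurrence per lane, its polonies can be broken down by lane by summing over occurrences. The following Python sketch is a hypothetical helper. It assumes each entry in the `Occurrences` list is a JSON object with `Lane` and `NumPolonies` keys; the key name `Lane` is inferred from the identifier described here and may differ in your files.

```python
from collections import defaultdict
from typing import Any, Dict


def polonies_by_lane(sample_stats: Dict[str, Any]) -> Dict[Any, int]:
    """Sum NumPolonies over a sample's occurrences, grouped by lane.

    Assumes each entry in "Occurrences" has "Lane" and "NumPolonies" keys.
    """
    totals: Dict[Any, int] = defaultdict(int)
    for occurrence in sample_stats.get("Occurrences", []):
        totals[occurrence["Lane"]] += occurrence["NumPolonies"]
    return dict(totals)
```

Summing rather than overwriting also handles a sample that occurs more than once in the same lane, for example with different expected sequences.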
