
Output Files

The following table lists the files that Bases2Fastq outputs.

| File | Directory | Description |
| --- | --- | --- |
| Bases2Fastq.log | info | Log file that records software events |
| IndexAssignment.csv | Root | Yield and the number and rate of polonies assigned for each sample and index combination |
| Metrics.csv | Root | The mismatch rates, percent assigned, and per sample yield for each lane |
| {ProjectName}_QC.html | Samples/{ProjectName} | Interactive HTML QC report on the performance and quality of the samples aggregated by project |
| {ProjectName}_index_assignment.csv | Samples/{ProjectName} | Yield and the number and rate of polonies assigned for each sample and index combination in a project |
| {ProjectName}_metrics.csv | Samples/{ProjectName} | The mismatch rates, percent assigned, and per sample yield for each lane in a project |
| {ProjectName}_RunStats.json | Samples/{ProjectName} | Information on the performance of samples in a project |
| {RunName}_QC.html | Root | Interactive HTML QC report on run performance and quality for all samples and projects |
| RunManifest.csv | Root | The AVITI OS- or user-created run manifest |
| RunManifest.json | Root | Machine-readable copy of the run manifest as a JSON file |
| RunManifestErrors.json | info | A record of errors in the run manifest |
| RunParameters.json | Root | A copy of the original run parameters file |
| RunStats.json | Root | Information on run performance |
| {SampleName}_{Read}.fastq.gz | Samples/{SampleName} or Samples/{ProjectName}/{SampleName} | The primary output of Bases2Fastq |
| {SampleName}_stats.json | Samples/{SampleName} | Information on the performance of each sample in the run |
| UnassignedSequences.csv | Root | The most frequent unassigned index sequences with approximate counts [1] |

[1] Counts indicate how many times an incorrect index sequence appears.

HTML QC Reports

The HTML QC reports open in a browser so you can move through various tabs. The tabs display histograms and other charts that visualize index assignment and other quality metrics. Bases2Fastq names the QC report for a run per the convention {RunName}_QC.html. Project-level QC reports follow the convention {ProjectName}_QC.html.

If the run manifest includes more than 96 samples, the report does not display per sample charts.

HTML QC Reports for Individually Addressable Lanes

If you are using the Individually Addressable Lanes add-on and want HTML QC reports for each lane, create projects for each lane in your run manifest. For an example, see the Run Manifest Documentation.

Missing HTML QC Report

If an HTML QC report is not generated on a system configured for the static binary, complete the following troubleshooting steps.

  1. If you are using the static binary executable, make sure compatible versions of Python and the necessary packages are installed.
  2. Review info/QCReportErrors.txt to identify the cause of the error, and then use this information to resolve the problem and generate the HTML QC report.

FASTQ Files

A FASTQ file records all genomic data and corresponding Q-scores for a sample. FASTQ files are GZIP-compressed text files named per the convention {SampleName}_{Read}.fastq.gz.

Each entry in a FASTQ file corresponds to one read and includes the following four lines:

  • A sequence identifier that includes run and polony information
  • Base calls assembled into a sequence composed of A, C, G, T, and N
  • A plus sign (+) that separates the sequence from the Q-scores
  • A Q-score for each base in the sequence
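
The four-line record layout can be read with a few lines of Python. The following is a minimal sketch and not part of Bases2Fastq: the function name read_fastq is hypothetical, and the sketch assumes that every record occupies exactly four lines and that the file is GZIP compressed.

```python
import gzip
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, qualities) for each four-line record."""
    # Mode "rt" decompresses the GZIP stream and returns text.
    with gzip.open(path, "rt", encoding="ascii") as handle:
        while True:
            identifier = handle.readline().rstrip("\n")
            if not identifier:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            separator = handle.readline().rstrip("\n")
            qualities = handle.readline().rstrip("\n")
            if not identifier.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {identifier!r}")
            # Each base call must have exactly one Q-score character.
            if len(sequence) != len(qualities):
                raise ValueError("sequence and Q-score lines differ in length")
            yield identifier, sequence, qualities
```

Because the function is a generator, it holds only one record in memory at a time, which matters for large FASTQ files.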

Sequence Identifiers

A sequence identifier includes the components described in the following table, formatted in one line:

@<instrument>:<run name>:<flow cell ID>:<lane>:<tile>:<x-pos>:<y-pos>:<UMI> <read>:N:0:<index sequence>

| Component | Allowed Values | Description |
| --- | --- | --- |
| @ | @ | Start of the sequence identifier line |
| <instrument> | Upper and lowercase letters, integers 0–9, and underscores (_) | Instrument name |
| <run name> | Upper and lowercase letters, integers 0–9, hyphens (-), and underscores | Run name as defined during run setup |
| <flow cell ID> | Upper and lowercase letters and integers 0–9 | Flow Cell ID from the barcode scan. If no barcode is present, the Run ID replaces the Flow Cell ID. |
| <lane> | 1 or 2 | Lane number |
| <tile> | An integer | Tile number |
| <x-pos> | A zero-padded integer | X-coordinate of the polony |
| <y-pos> | A zero-padded integer | Y-coordinate of the polony |
| <UMI> | A, C, G, T, and N | UMI sequence with a plus sign separating the Read 1 and Read 2 sequences, if applicable |
| <read> | 1 or 2 | Read number |
| <is filtered> | N | A legacy filtering value of N. The value exists only for backwards compatibility and does not change. |
| <control number> | 0 | A legacy control number of 0. The value exists only for backwards compatibility and does not change. |
| <index sequence> | Varies | A value that depends on the indexing strategy indicated in the run manifest. No indexing: the sample number. Single indexing: the observed index sequence. Dual indexing: the observed Index 1 sequence, a plus sign, and the observed Index 2 sequence. |
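
The identifier layout can be split into its components with plain string operations. The following is a sketch under stated assumptions rather than an official parser: the names SequenceID and parse_sequence_id are hypothetical, the example identifiers in the usage note are invented, and the sketch assumes that the UMI field is simply absent when a run has no UMI.

```python
from typing import NamedTuple, Optional


class SequenceID(NamedTuple):
    instrument: str
    run_name: str
    flow_cell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    umi: Optional[str]
    read: int
    is_filtered: str
    control_number: str
    index_sequence: str


def parse_sequence_id(line: str) -> SequenceID:
    """Split one FASTQ sequence identifier line into named fields."""
    if not line.startswith("@"):
        raise ValueError("a sequence identifier must start with '@'")
    # A single space separates the polony fields from the read fields.
    left, _, right = line[1:].strip().partition(" ")
    polony = left.split(":")
    # Assumption: 7 fields without a UMI, 8 fields with a UMI.
    if len(polony) not in (7, 8):
        raise ValueError(f"unexpected number of polony fields: {len(polony)}")
    umi = polony[7] if len(polony) == 8 else None
    read_fields = right.split(":", 3)
    if len(read_fields) != 4:
        raise ValueError("expected <read>:N:0:<index sequence>")
    read, is_filtered, control_number, index_sequence = read_fields
    return SequenceID(
        polony[0], polony[1], polony[2],
        int(polony[3]), int(polony[4]),
        int(polony[5]), int(polony[6]),  # int() drops the zero padding
        umi, int(read), is_filtered, control_number, index_sequence,
    )
```

For example, parse_sequence_id("@AV234:Run-01:2345678:1:10102:0215:0043:ACGT+TTGA 1:N:0:AACCGGTT+TTGGCCAA") returns a record whose lane is 1, whose umi is "ACGT+TTGA", and whose index_sequence is "AACCGGTT+TTGGCCAA".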

Quality Scores

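A Q-score indicates the confidence of a base call based on the Phred scale. A Phred quality score (Q) is logarithmically related to the error rate (E): Q = -10 × log10(E). For example, Q30 corresponds to an error rate of 1 in 1,000, and Q40 corresponds to an error rate of 1 in 10,000.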

In a FASTQ file, an ASCII code represents the Q-score. Bases2Fastq encodes quality scores with a +33 offset (Phred33).

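| Q-Score | ASCII Code | Character | Q-Score | ASCII Code | Character | Q-Score | ASCII Code | Character |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 33 | ! | 17 | 50 | 2 | 34 | 67 | C |
| 1 | 34 | " | 18 | 51 | 3 | 35 | 68 | D |
| 2 | 35 | # | 19 | 52 | 4 | 36 | 69 | E |
| 3 | 36 | $ | 20 | 53 | 5 | 37 | 70 | F |
| 4 | 37 | % | 21 | 54 | 6 | 38 | 71 | G |
| 5 | 38 | & | 22 | 55 | 7 | 39 | 72 | H |
| 6 | 39 | ' | 23 | 56 | 8 | 40 | 73 | I |
| 7 | 40 | ( | 24 | 57 | 9 | 41 | 74 | J |
| 8 | 41 | ) | 25 | 58 | : | 42 | 75 | K |
| 9 | 42 | * | 26 | 59 | ; | 43 | 76 | L |
| 10 | 43 | + | 27 | 60 | < | 44 | 77 | M |
| 11 | 44 | , | 28 | 61 | = | 45 | 78 | N |
| 12 | 45 | - | 29 | 62 | > | 46 | 79 | O |
| 13 | 46 | . | 30 | 63 | ? | 47 | 80 | P |
| 14 | 47 | / | 31 | 64 | @ | 48 | 81 | Q |
| 15 | 48 | 0 | 32 | 65 | A | 49 | 82 | R |
| 16 | 49 | 1 | 33 | 66 | B | 50 | 83 | S |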
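
The +33 offset and the Phred relation can be checked with a short Python sketch. The function names are hypothetical and chosen only for illustration.

```python
from typing import List

PHRED_OFFSET = 33  # Phred33 encoding


def q_to_char(q: int) -> str:
    """Return the ASCII character that encodes a Q-score."""
    return chr(q + PHRED_OFFSET)


def char_to_q(char: str) -> int:
    """Return the Q-score encoded by one ASCII character."""
    return ord(char) - PHRED_OFFSET


def q_to_error_rate(q: int) -> float:
    """Invert Q = -10 * log10(E) to get the error rate E."""
    return 10 ** (-q / 10)


def decode_qualities(quality_line: str) -> List[int]:
    """Decode the fourth line of a FASTQ record into Q-scores."""
    return [char_to_q(char) for char in quality_line]
```

For example, q_to_char(40) returns "I" and char_to_q("?") returns 30, which agrees with the table.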

Run Metrics (RunStats.json)

A run metrics file, RunStats.json, reports the following performance metrics in a JSON file format. The metrics are specific to the Bases2Fastq execution.

| Metric | Value |
| --- | --- |
| AnalysisID | The unique, Bases2Fastq-generated identifier for the analysis |
| AnalysisVersion | The current version of Bases2Fastq |
| AssignedYield | The run yield based on assigned reads in gigabases |
| FileVersion | The current version of the file format |
| FlowCellID | A flow cell identifier sourced from RunParameters.json or, if blank, the letter R followed by the RunID value |
| I1IsReverseComplement | The observed orientation of the Index 1 sequences relative to the orientation recorded in the run manifest |
| I2IsReverseComplement | The observed orientation of the Index 2 sequences relative to the orientation recorded in the run manifest |
| Lanes | A detailed list of per lane metrics |
| MeanReadLength | The average read length after adapter trimming |
| NumPolonies | The total number of polonies calculated for the run |
| NumPoloniesBeforeTrimming | The total number of polonies calculated for the run before adapter trimming |
| PercentAssignedReads | The percentage of reads assigned to a sample |
| PercentMismatch | The percentage of polonies assigned to a sample with a mismatch |
| PercentMismatchI1 | The percentage of polonies assigned to Index 1 sequences with a mismatch |
| PercentMismatchI2 | The percentage of polonies assigned to Index 2 sequences with a mismatch |
| PercentQ30 | The percentage of ≥ Q30 Q-scores for the run, including assigned and unassigned reads |
| PercentQ40 | The percentage of ≥ Q40 Q-scores for the run, including assigned and unassigned reads |
| PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
| PercentUnexpectedIndexPairs | The percentage of all polonies with Index 1 and Index 2 reads that matched different samples [1] |
| PerReadMeanQualityScoreHistogram | The distribution of per-read average quality scores |
| QualityScore10thPercentile | The 10th percentile of quality scores |
| QualityScore25thPercentile | The 25th percentile of quality scores |
| QualityScore50thPercentile | The 50th percentile of quality scores |
| QualityScore75thPercentile | The 75th percentile of quality scores |
| QualityScore90thPercentile | The 90th percentile of quality scores |
| QualityScoreHistogram | A per-base call Q-score distribution with integer resolution |
| QualityScoreMean | The average Q-score of base calls for a sample |
| RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
| RunName | A text-based run identifier sourced from RunParameters.json |
| RunID | A universally unique identifier (UUID) assigned to the run and sourced from RunParameters.json |
| Samples | A list of libraries the run sequenced |
| SampleStats | The per-sample metrics listed in the sample metrics files for the run |
| TotalYield | The total yield of all reads in gigabases |
| UnassignedSequences | A list of unassigned index sequences with a count for each unassigned sequence |

[1] For demultiplexing to be successful, both index reads must match the same sample.
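
Because RunStats.json is plain JSON, the run metrics can be loaded with the Python standard library. The following is a hedged sketch: the function names are hypothetical, and it assumes that the metrics appear as top-level keys with numeric values, which this page does not state explicitly.

```python
import json


def load_run_stats(path: str) -> dict:
    """Load a RunStats.json file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_run(stats: dict) -> dict:
    """Collect a few headline metrics from the run metrics.

    Assumption: metric names are top-level keys with numeric values.
    """
    total = stats["TotalYield"]
    assigned = stats["AssignedYield"]
    return {
        "run": stats.get("RunName"),
        "polonies": stats["NumPolonies"],
        "percent_q30": stats["PercentQ30"],
        # Share of the total yield that was assigned to samples.
        "assigned_yield_fraction": assigned / total if total else 0.0,
    }
```

A typical call is summarize_run(load_run_stats("RunStats.json")), where the path points to the file in the root of the output directory.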

Project Metrics ({ProjectName}_RunStats.json)

When a run manifest groups samples by project, Bases2Fastq creates JSON project metrics files. Bases2Fastq names the files per the convention {ProjectName}_RunStats.json. The files report the following performance metrics for the samples in the project.

| Metric | Value |
| --- | --- |
| AnalysisVersion | The current version of Bases2Fastq |
| AnalysisID | The unique, Bases2Fastq-generated identifier for the analysis |
| BaseComposition | Counts for each A, C, G, T, and N base |
| FileVersion | The current version of the file format |
| FlowCellID | A flow cell identifier sourced from RunParameters.json or, if blank, the letter R followed by the RunID value |
| I1IsReverseComplement | The observed orientation of the Index 1 sequences relative to the orientation recorded in the run manifest |
| I2IsReverseComplement | The observed orientation of the Index 2 sequences relative to the orientation recorded in the run manifest |
| Lanes | A detailed list of per lane metrics |
| MeanReadLength | The average read length after adapter trimming |
| NumPolonies | The total number of polonies calculated for the samples in the project |
| NumPoloniesBeforeTrimming | The total number of polonies calculated for the samples in the project before adapter trimming |
| PercentMismatch | The percentage of polonies assigned to samples with a mismatch in the project |
| PercentMismatchI1 | The percentage of polonies assigned to Index 1 sequences with a mismatch |
| PercentMismatchI2 | The percentage of polonies assigned to Index 2 sequences with a mismatch |
| PercentQ30 | The percentage of ≥ Q30 Q-scores for the project, including assigned and unassigned reads |
| PercentQ40 | The percentage of ≥ Q40 Q-scores for the project, including assigned and unassigned reads |
| PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
| PerReadGCCountHistogram | A list of counts: the value at index i is the number of reads with i G/C calls |
| Project | The alphanumeric project identifier |
| QualityScore10thPercentile | The 10th percentile of quality scores |
| QualityScore25thPercentile | The 25th percentile of quality scores |
| QualityScore50thPercentile | The 50th percentile of quality scores |
| QualityScore75thPercentile | The 75th percentile of quality scores |
| QualityScore90thPercentile | The 90th percentile of quality scores |
| QualityScoreMean | The mean Q-score of base calls for the samples in the project |
| Reads | A detailed list of per read metrics |
| RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
| RunName | A text-based run identifier sourced from RunParameters.json |
| RunID | A UUID assigned to the run and sourced from RunParameters.json |
| Samples | A list of libraries sequenced for the project |
| SampleStats | The per-sample metrics listed in the sample metrics files for the project |
| SampleID | A globally unique sample identifier |
| SampleName | The alphanumeric sample identifier |
| SampleNumber | The numeric sample identifier |
| Yield | The number of bases in the project in gigabases |

Sample Metrics ({SampleName}_stats.json)

A sample metrics file reports the following sample-specific performance metrics in a JSON file format. Bases2Fastq names the file per the convention {SampleName}_stats.json.

| Metric | Value |
| --- | --- |
| AnalysisVersion | The current version of Bases2Fastq |
| BaseComposition | Counts for each A, C, G, T, and N base |
| ExternalID | An external ID specified in the run manifest, if applicable |
| FileVersion | The current version of the file format |
| MeanReadLength | The average read length after adapter trimming |
| NumPolonies | The total number of polonies assigned to the sample |
| NumPoloniesBeforeTrimming | The number of polonies assigned to a sample before adapter trimming |
| Occurrences | Additional information per occurrence of the sample |
| PercentMismatch | The percentage of polonies assigned to a sample with mismatch |
| PercentQ30 | The percentage of ≥ Q30 Q-scores for the sample |
| PercentQ40 | The percentage of ≥ Q40 Q-scores for the sample |
| PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
| PerReadGCCountHistogram | A list of counts: the value at index i is the number of reads with i G/C calls |
| QualityScoreMean | The mean Q-score of base calls for the sample |
| RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
| RunName | A text-based run identifier sourced from RunParameters.json |
| RunID | A UUID assigned to the run and sourced from RunParameters.json |
| SampleID | A globally unique sample identifier |
| SampleName | The alphanumeric sample identifier |
| SampleNumber | The numeric sample identifier |
| Yield | The number of bases in the sample in gigabases |
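
The index-based layout of PerReadGCCountHistogram lends itself to a weighted average. The following sketch computes the mean number of G/C calls per read from such a list. The function name is hypothetical, and the input is assumed to be a flat list of integer counts as the description states.

```python
from typing import Sequence


def mean_gc_calls_per_read(histogram: Sequence[int]) -> float:
    """Return the average number of G/C calls per read.

    histogram[i] is the number of reads that contain exactly i G/C calls.
    """
    reads = sum(histogram)
    if reads == 0:
        return 0.0  # no reads, so no meaningful average
    # Weight each G/C count i by the number of reads that have it.
    gc_calls = sum(i * count for i, count in enumerate(histogram))
    return gc_calls / reads
```

For a histogram of [0, 2, 2], two reads have one G/C call and two reads have two, so the function returns 1.5.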

Occurrences

Occurrences are a set of fields in a sample metrics file that break down sample performance metrics by each occurrence of the sample in the run. For example, if a sample appears in both lanes, Bases2Fastq lists an occurrence for each lane.

Each occurrence includes the identifiers Lane and Expected Sequence and reports the following performance metrics.

| Metric | Value |
| --- | --- |
| BaseComposition | Counts for each A, C, G, T, and N base |
| CustomMetadata | Custom metadata specified in the run manifest, if applicable |
| MeanReadLength | The average read length after adapter trimming |
| NumPolonies | The total number of polonies assigned to the sample |
| NumPoloniesBeforeTrimming | The number of polonies assigned to a sample before adapter trimming |
| Occurrences | The average read length after adapter trimming |
| PercentMismatch | The percentage of polonies assigned to a sample with mismatch |
| PercentQ30 | The percentage of ≥ Q30 Q-scores for the run, including assigned and unassigned reads |
| PercentQ40 | The percentage of ≥ Q40 Q-scores for the run, including assigned and unassigned reads |
| PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
| PerReadGCCountHistogram | A list of counts: the value at index i is the number of reads with i G/C calls |
| QualityScoreMean | The mean Q-score of base calls for the sample |
| R1Adapters | The Read 1 adapter sequences associated with the lane the occurrence belongs to |
| R2Adapters | The Read 2 adapter sequences associated with the lane the occurrence belongs to |
| RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
| Yield | The number of bases in the sample in gigabases |
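
Per-occurrence metrics can be rolled up by lane after loading a sample metrics file. The following is a minimal sketch, not a documented interface: the function name is hypothetical, and the sketch assumes that Occurrences is a list of objects and that the lane identifier is stored under the key Lane. Confirm the key names against an actual {SampleName}_stats.json file.

```python
from collections import defaultdict
from typing import Dict


def polonies_per_lane(sample_stats: dict) -> Dict[int, int]:
    """Sum NumPolonies over all occurrences of a sample, grouped by lane.

    Assumption: each occurrence is a dict with "Lane" and "NumPolonies" keys.
    """
    totals: Dict[int, int] = defaultdict(int)
    for occurrence in sample_stats.get("Occurrences", []):
        totals[occurrence["Lane"]] += occurrence["NumPolonies"]
    return dict(totals)
```

The input is the dictionary returned by json.load for one sample metrics file. A sample with no occurrences yields an empty dictionary.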