Output Files
The following table lists the files that Bases2Fastq outputs.
File | Directory | Description |
---|---|---|
Bases2Fastq.log | info | Log file that records software events |
IndexAssignment.csv | Root | Yield and the number and rate of polonies assigned for each sample and index combination |
Metrics.csv | Root | The mismatch rates, percent assigned, and per sample yield for each lane |
{ProjectName}_QC.html | Samples/{ProjectName} | Interactive HTML QC report on the performance and quality of the samples aggregated by project |
{ProjectName}_index_assignment.csv | Samples/{ProjectName} | Yield and the number and rate of polonies assigned for each sample and index combination in a project |
{ProjectName}_metrics.csv | Samples/{ProjectName} | The mismatch rates, percent assigned, and per sample yield for each lane in a project |
{ProjectName}_RunStats.json | Samples/{ProjectName} | Information on the performance of samples in a project |
{RunName}_QC.html | Root | Interactive HTML QC report on run performance and quality for all samples and projects |
RunManifest.csv | Root | The AVITI OS- or user-created run manifest |
RunManifest.json | Root | Machine-readable copy of the run manifest as a JSON file |
RunManifestErrors.json | info | A record of errors in the run manifest |
RunParameters.json | Root | A copy of the original run parameters file |
RunStats.json | Root | Information on run performance |
{SampleName}_{Read}.fastq.gz | Samples/{SampleName} or Samples/{ProjectName}/{SampleName} | The primary output of Bases2Fastq |
{SampleName}_stats.json | Samples/{SampleName} | Information on the performance of each sample in the run |
UnassignedSequences.csv | Root | The most frequent unassigned index sequences with approximate counts¹ |
¹ Counts indicate how many times an incorrect index sequence appears.
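As a rough illustration of the layout in the table above, the following Python sketch collects every FASTQ file beneath the Samples directory, whether or not the run manifest groups samples by project. The function name find_fastq_files and the output directory argument are hypothetical; only the Samples folder and the .fastq.gz suffix come from the table.

```python
from pathlib import Path


def find_fastq_files(output_dir):
    """Return all FASTQ files under <output_dir>/Samples, sorted by path.

    rglob searches recursively, so it covers both documented layouts:
    Samples/{SampleName} and Samples/{ProjectName}/{SampleName}.
    """
    samples_dir = Path(output_dir) / "Samples"
    return sorted(samples_dir.rglob("*.fastq.gz"))
```

Calling find_fastq_files on a Bases2Fastq output directory returns a list of Path objects. If the Samples directory does not exist, the list is empty.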
HTML QC Reports
The HTML QC reports open in a browser so you can move through various tabs. The tabs display histograms and other charts that visualize index assignment and other quality metrics. Bases2Fastq names the QC report for a run per the convention {RunName}_QC.html. Project-level QC reports follow the convention {ProjectName}_QC.html.
If the run manifest includes more than 96 samples, the report does not display per-sample charts.
HTML QC Reports for Individually Addressable Lanes
If you are using the Individually Addressable Lanes add-on and want HTML QC reports for each lane, create projects for each lane in your run manifest. For an example, see the Run Manifest Documentation.
Missing HTML QC Report
If an HTML QC report does not generate on a system configured for static binary, complete the following troubleshooting steps.
- If you are using the static binary executable, make sure compatible versions of Python and the necessary packages are installed.
- Review the error in info/QCReportErrors.txt for the cause, and then use this information to generate the HTML QC report.
FASTQ Files
A FASTQ file records all genomic data and corresponding Q-scores for a sample. FASTQ files are GZIP-compressed text files named per the convention {SampleName}_{Read}.fastq.gz.
Each entry in a FASTQ file corresponds to one read and includes the following four lines:
- A sequence identifier that includes run and polony information
- Base calls assembled into a sequence composed of A, C, G, T, and N
- A plus sign (+) that separates the sequence from the Q-scores
- A Q-score for each base in the sequence
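The four-line structure above can be read with a few lines of Python. This is a minimal sketch, not part of Bases2Fastq: the function name read_fastq is hypothetical, and the code assumes each entry occupies exactly four lines with no line wrapping.

```python
import gzip


def read_fastq(path):
    """Yield (identifier, sequence, qualities) for each four-line entry."""
    with gzip.open(path, "rt") as handle:
        while True:
            identifier = handle.readline().rstrip("\n")
            if not identifier:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            separator = handle.readline().rstrip("\n")
            qualities = handle.readline().rstrip("\n")
            # Line 1 starts with '@' and line 3 starts with '+'.
            if not identifier.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"Malformed FASTQ entry near: {identifier!r}")
            # One Q-score character is expected per base call.
            if len(sequence) != len(qualities):
                raise ValueError(f"Length mismatch in entry: {identifier!r}")
            yield identifier, sequence, qualities
```

Because the function is a generator, it reads one entry at a time and does not load the whole file into memory.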
Sequence Identifiers
A sequence identifier includes the components described in the following table, formatted in one line:
@<instrument>:<run name>:<flow cell ID>:<lane>:<tile>:<x_pos>:<y_pos>:<UMI> <read>:N:0:<index sequence>
Component | Values | Description |
---|---|---|
@ | @ | Start of the sequence identifier line |
<instrument> | Upper and lowercase letters, integers 0–9, and underscores (_) | Instrument name |
<run name> | Upper and lowercase letters, integers 0–9, hyphens (-), and underscores | Run name as defined during run setup |
<flow cell ID> | Upper and lowercase letters and integers 0–9 | Flow Cell ID from the barcode scan. If no barcode is present, the Run ID replaces the Flow Cell ID. |
<lane> | 1 or 2 | Lane number |
<tile> | An integer | Tile number |
<x_pos> | A zero-padded integer | X-coordinate of the polony |
<y_pos> | A zero-padded integer | Y-coordinate of the polony |
<UMI> | A, C, G, T, and N | UMI sequence with a plus sign separating the Read 1 and Read 2 sequences, if applicable |
<read> | 1 or 2 | Read number |
<is filtered> | N | A legacy filtering value of N. The value exists only for backwards compatibility and does not change. |
<control number> | 0 | A legacy control number of 0. The value exists only for backwards compatibility and does not change. |
<index sequence> | Varies | A value that depends on the indexing strategy indicated in the run manifest:
|
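To illustrate how the components fit together, the following Python sketch splits an identifier line into named fields. It assumes the layout shown in the format line above: eight colon-separated fields, a space, then the read number, the two legacy values, and the index sequence. The names SequenceId and parse_sequence_id are hypothetical, and treating a missing UMI field as None is an assumption rather than documented behavior.

```python
from typing import NamedTuple, Optional


class SequenceId(NamedTuple):
    instrument: str
    run_name: str
    flow_cell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    umi: Optional[str]
    read: int
    index_sequence: str


def parse_sequence_id(line: str) -> SequenceId:
    """Split one identifier line into its documented components."""
    if not line.startswith("@"):
        raise ValueError("A sequence identifier must start with '@'")
    location, _, read_info = line[1:].strip().partition(" ")
    fields = location.split(":")
    if len(fields) not in (7, 8):
        raise ValueError(f"Unexpected identifier layout: {line!r}")
    # Assumption: the UMI field is absent or empty when the run has no UMI.
    umi = fields[7] if len(fields) == 8 and fields[7] else None
    # read_info has the form "<read>:N:0:<index sequence>".
    read, _is_filtered, _control, index_sequence = read_info.split(":", 3)
    return SequenceId(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),
        int(fields[5]), int(fields[6]),  # int() accepts zero-padded digits
        umi, int(read), index_sequence,
    )
```

For example, parsing the made-up identifier @AV000001:Demo-Run_1:2301234567:1:10102:0123:0456:ACGT+TTGA 1:N:0:ATCACG+CGATGT gives lane 1, tile 10102, x_pos 123, y_pos 456, and the UMI ACGT+TTGA.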
Quality Scores
A Q-score indicates the confidence of a base call based on the Phred scale. A Phred quality score (Q) is logarithmically related to the error rate (E): Q = -10 log10(E). For example, Q30 corresponds to an error rate of 1 in 1,000.
In a FASTQ file, an ASCII code represents the Q-score. Bases2Fastq encodes quality scores with a +33 offset (Phred33).
Q-Score | ASCII Code | Character | Q-Score | ASCII Code | Character | Q-Score | ASCII Code | Character |
---|---|---|---|---|---|---|---|---|
0 | 33 | ! | 17 | 50 | 2 | 34 | 67 | C |
1 | 34 | " | 18 | 51 | 3 | 35 | 68 | D |
2 | 35 | # | 19 | 52 | 4 | 36 | 69 | E |
3 | 36 | $ | 20 | 53 | 5 | 37 | 70 | F |
4 | 37 | % | 21 | 54 | 6 | 38 | 71 | G |
5 | 38 | & | 22 | 55 | 7 | 39 | 72 | H |
6 | 39 | ' | 23 | 56 | 8 | 40 | 73 | I |
7 | 40 | ( | 24 | 57 | 9 | 41 | 74 | J |
8 | 41 | ) | 25 | 58 | : | 42 | 75 | K |
9 | 42 | * | 26 | 59 | ; | 43 | 76 | L |
10 | 43 | + | 27 | 60 | < | 44 | 77 | M |
11 | 44 | , | 28 | 61 | = | 45 | 78 | N |
12 | 45 | - | 29 | 62 | > | 46 | 79 | O |
13 | 46 | . | 30 | 63 | ? | 47 | 80 | P |
14 | 47 | / | 31 | 64 | @ | 48 | 81 | Q |
15 | 48 | 0 | 32 | 65 | A | 49 | 82 | R |
16 | 49 | 1 | 33 | 66 | B | 50 | 83 | S |
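The encoding in the table can be reversed by subtracting the offset of 33 from each character code. The following Python sketch shows the conversion along with the Phred relationship between Q-score and error rate. Both function names are hypothetical.

```python
PHRED_OFFSET = 33  # Phred33 encoding


def decode_qualities(quality_line):
    """Convert a FASTQ quality string into a list of integer Q-scores."""
    return [ord(character) - PHRED_OFFSET for character in quality_line]


def error_probability(q_score):
    """Return the error rate E implied by a Q-score: E = 10 ** (-Q / 10)."""
    return 10 ** (-q_score / 10)
```

With these helpers, decode_qualities("!5I") returns the Q-scores 0, 20, and 40, matching the characters in the table, and error_probability(20) is 0.01, or one expected error in 100 base calls.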
Run Metrics (RunStats.json)
A run metrics file, RunStats.json, reports the following performance metrics in a JSON file format. The metrics are specific to the Bases2Fastq execution.
Metric | Value |
---|---|
AnalysisID | The unique, Bases2Fastq-generated identifier for the analysis |
AnalysisVersion | The current version of Bases2Fastq |
AssignedYield | The run yield based on assigned reads in gigabases |
FileVersion | The current version of the file format |
FlowCellID | A flow cell identifier sourced from RunParameters.json or, if blank, the letter R followed by the RunID value |
I1IsReverseComplement | The observed orientation of the Index 1 sequences relative to the orientation recorded in the run manifest |
I2IsReverseComplement | The observed orientation of the Index 2 sequences relative to the orientation recorded in the run manifest |
Lanes | A detailed list of per lane metrics |
MeanReadLength | The average read length after adapter trimming |
NumPolonies | The total number of polonies calculated for the run |
NumPoloniesBeforeTrimming | The total number of polonies calculated for the run before adapter trimming |
PercentAssignedReads | The percentage of reads assigned to a sample |
PercentMismatch | The percentage of polonies assigned to a sample with a mismatch |
PercentMismatchI1 | The percentage of polonies assigned to Index 1 sequences with a mismatch |
PercentMismatchI2 | The percentage of polonies assigned to Index 2 sequences with a mismatch |
PercentQ30 | The percentage of ≥ Q30 Q-scores for the run, including assigned and unassigned reads |
PercentQ40 | The percentage of ≥ Q40 Q-scores for the run, including assigned and unassigned reads |
PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
PercentUnexpectedIndexPairs | The percentage of all polonies with Index 1 and Index 2 reads that matched different samples¹ |
PerReadMeanQualityScoreHistogram | The distribution of per-read average quality scores |
QualityScore10thPercentile | The 10th percentile of quality scores |
QualityScore25thPercentile | The 25th percentile of quality scores |
QualityScore50thPercentile | The 50th percentile of quality scores |
QualityScore75thPercentile | The 75th percentile of quality scores |
QualityScore90thPercentile | The 90th percentile of quality scores |
QualityScoreHistogram | A per-base call Q-score distribution with integer resolution |
QualityScoreMean | The average Q-score of base calls for the run |
RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
RunName | A text-based run identifier sourced from RunParameters.json |
RunID | A universally unique identifier (UUID) assigned to the run and sourced from RunParameters.json |
Samples | A list of libraries the run sequenced |
SampleStats | The per-sample metrics listed in the sample metrics files for the run |
TotalYield | The total yield of all reads in gigabases |
UnassignedSequences | A list of unassigned index sequences with a count for each unassigned sequence |
¹ For demultiplexing to be successful, both index reads must match the same sample.
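Because RunStats.json is a standard JSON file, a short script can extract the headline metrics. The following Python sketch reads a handful of the top-level keys listed in the table. The function name load_run_summary and the choice of keys are illustrative, and the sketch assumes the metrics appear as top-level keys of a single JSON object. Keys missing from the file are returned as None.

```python
import json

# A small, illustrative subset of the metrics listed in the table above.
SUMMARY_KEYS = (
    "RunName",
    "NumPolonies",
    "PercentAssignedReads",
    "PercentQ30",
    "PercentQ40",
    "AssignedYield",
    "TotalYield",
)


def load_run_summary(path):
    """Read RunStats.json and return selected top-level metrics as a dict."""
    with open(path, encoding="utf-8") as handle:
        stats = json.load(handle)
    # dict.get returns None for any key that is absent from the file.
    return {key: stats.get(key) for key in SUMMARY_KEYS}
```

The same approach applies to the project and sample metrics files, with the key list adjusted to the metrics each file reports.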
Project Metrics ({ProjectName}_RunStats.json)
When a run manifest groups samples by project, Bases2Fastq creates JSON project metrics files. Bases2Fastq names the files per the convention {ProjectName}_RunStats.json. The files report the following performance metrics for the samples in the project.
Metric | Value |
---|---|
AnalysisVersion | The current version of Bases2Fastq |
AnalysisID | The unique, Bases2Fastq-generated identifier for the analysis |
BaseComposition | Counts for each A, C, G, T, and N base |
FileVersion | The current version of the file format |
FlowCellID | A flow cell identifier sourced from RunParameters.json or, if blank, the letter R followed by the RunID value |
I1IsReverseComplement | The observed orientation of the Index 1 sequences relative to the orientation recorded in the run manifest |
I2IsReverseComplement | The observed orientation of the Index 2 sequences relative to the orientation recorded in the run manifest |
Lanes | A detailed list of per lane metrics |
MeanReadLength | The average read length after adapter trimming |
NumPolonies | The total number of polonies calculated for the samples in the project |
NumPoloniesBeforeTrimming | The total number of polonies calculated for the samples in the project before adapter trimming |
PercentMismatch | The percentage of polonies assigned to samples with a mismatch in the project |
PercentMismatchI1 | The percentage of polonies assigned to Index 1 sequences with a mismatch |
PercentMismatchI2 | The percentage of polonies assigned to Index 2 sequences with a mismatch |
PercentQ30 | The percentage of ≥ Q30 Q-scores for the project, including assigned and unassigned reads |
PercentQ40 | The percentage of ≥ Q40 Q-scores for the project, including assigned and unassigned reads |
PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
PerReadGCCountHistogram | A list of counts: the value at index i is the number of reads with i G/C calls |
Project | The alphanumeric project identifier |
QualityScore10thPercentile | The 10th percentile of quality scores |
QualityScore25thPercentile | The 25th percentile of quality scores |
QualityScore50thPercentile | The 50th percentile of quality scores |
QualityScore75thPercentile | The 75th percentile of quality scores |
QualityScore90thPercentile | The 90th percentile of quality scores |
QualityScoreMean | The mean Q-score of base calls for the samples in the project |
Reads | A detailed list of per read metrics |
RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
RunName | A text-based run identifier sourced from RunParameters.json |
RunID | A UUID assigned to the run and sourced from RunParameters.json |
Samples | A list of libraries sequenced for the project |
SampleStats | The per-sample metrics listed in the sample metrics files for the project |
SampleID | A globally unique sample identifier |
SampleName | The alphanumeric sample identifier |
SampleNumber | The numeric sample identifier |
Yield | The number of bases in the project in gigabases |
Sample Metrics ({SampleName}_stats.json)
A sample metrics file reports the following sample-specific performance metrics in a JSON file format. Bases2Fastq names the file per the convention {SampleName}_stats.json.
Metric | Value |
---|---|
AnalysisVersion | The current version of Bases2Fastq |
BaseComposition | Counts for each A, C, G, T, and N base |
ExternalID | An external ID specified in the run manifest, if applicable |
FileVersion | The current version of the file format |
MeanReadLength | The average read length after adapter trimming |
NumPolonies | The total number of polonies assigned to the sample |
NumPoloniesBeforeTrimming | The number of polonies assigned to a sample before adapter trimming |
Occurrences | Additional information per occurrence of the sample |
PercentMismatch | The percentage of polonies assigned to a sample with mismatch |
PercentQ30 | The percentage of ≥ Q30 Q-scores for the sample |
PercentQ40 | The percentage of ≥ Q40 Q-scores for the sample |
PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
PerReadGCCountHistogram | A list of counts: the value at index i is the number of reads with i G/C calls |
QualityScoreMean | The mean Q-score of base calls for the sample |
RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
RunName | A text-based run identifier sourced from RunParameters.json |
RunID | A UUID assigned to the run and sourced from RunParameters.json |
SampleID | A globally unique sample identifier |
SampleName | The alphanumeric sample identifier |
SampleNumber | The numeric sample identifier |
Yield | The number of bases in the sample in gigabases |
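As an example of working with a histogram metric, the following Python sketch computes the mean number of G/C calls per read from PerReadGCCountHistogram. It relies only on the description in the table: the value at index i is the number of reads with i G/C calls. The function name mean_gc_count is hypothetical, and the sketch assumes the histogram is stored as a plain list of integers.

```python
def mean_gc_count(histogram):
    """Return the average number of G/C calls per read.

    histogram[i] is the number of reads that contain exactly i G/C calls,
    so the mean is the count-weighted average of the indices.
    """
    total_reads = sum(histogram)
    if total_reads == 0:
        return 0.0  # avoid dividing by zero for an empty sample
    weighted_sum = sum(index * count for index, count in enumerate(histogram))
    return weighted_sum / total_reads
```

For the histogram [0, 2, 2], two reads have one G/C call and two reads have two, so the mean is 1.5.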
Occurrences
Occurrences are a set of fields in a sample metrics file that allocate sample performance metrics by specific occurrences of a sample in the run. For example, if a sample appears in both lanes, Bases2Fastq lists an occurrence for each lane.
Each occurrence includes the identifiers Lane and Expected Sequence and reports the following performance metrics.
Metric | Value |
---|---|
BaseComposition | Counts for each A, C, G, T, and N base |
CustomMetadata | Custom metadata specified in the run manifest, if applicable |
MeanReadLength | The average read length after adapter trimming |
NumPolonies | The total number of polonies assigned to the sample |
NumPoloniesBeforeTrimming | The number of polonies assigned to a sample before adapter trimming |
Occurrences | The average read length after adapter trimming |
PercentMismatch | The percentage of polonies assigned to a sample with mismatch |
PercentQ30 | The percentage of ≥ Q30 Q-scores for the occurrence |
PercentQ40 | The percentage of ≥ Q40 Q-scores for the occurrence |
PercentReadsTrimmed | The percentage of reads that Bases2Fastq trimmed |
PerReadGCCountHistogram | A list of counts: the value at index i is the number of reads with i G/C calls |
QualityScoreMean | The mean Q-score of base calls for the sample |
R1Adapters | The Read 1 adapter sequences associated with the lane the occurrence belongs to |
R2Adapters | The Read 2 adapter sequences associated with the lane the occurrence belongs to |
RemovedAdapterLengthHistogram | A histogram showing the number of bases trimmed from an adapter in a given position |
Yield | The number of bases in the sample in gigabases |
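Per-occurrence metrics can be rolled up to compare how a sample performed in each lane. The following Python sketch totals Yield by lane for one parsed sample metrics file. It assumes that Occurrences is a list of JSON objects and that each object stores its lane under the key Lane. The exact key name is an assumption, so check it against an actual {SampleName}_stats.json file. The function name yield_by_lane is hypothetical.

```python
from collections import defaultdict


def yield_by_lane(sample_stats):
    """Sum the Yield (gigabases) of each occurrence, grouped by lane.

    sample_stats is the dict parsed from a sample metrics JSON file.
    Assumption: every occurrence has "Lane" and "Yield" keys.
    """
    totals = defaultdict(float)
    for occurrence in sample_stats.get("Occurrences", []):
        totals[occurrence["Lane"]] += occurrence["Yield"]
    return dict(totals)
```

A sample with occurrences in both lanes produces one total per lane. A file without an Occurrences field produces an empty dictionary.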