================================
HIV Subclustering
- Single consensus estimation of all full-length genome runs
- Hypothesis: SGA runs 0021-463M-D6 or 0020-463M_I3 are possible
founders of the 463F SGA recipient.
- Best outcome: identify, to the base, the exact HIV founder genome in
the bulk PCR runs that was transmitted from the donor to the
recipient.
================================
HIV Subclustering Overview
- What is the recipient identity?
- The -0015-463F_28 and -0024-463F_74 consensus sequences differ at a
single position (an A deletion after TTTT).
- Cluster high coverage bulk PCR run -0003.
- Original cluster. 154 variant feature positions.
- Cut at 0.5 which is near the theoretical threshold for 154 selected
features.
- There are 17 groups at the 0.5 cutoff, of which 10 have 28 or more
sequences.
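
The size filter in the last bullet can be sketched as follows. This is
a minimal illustration, not the pipeline used for run -0003: the
function name large_groups, the label values, and the group sizes are
all hypothetical, and it assumes one group label per sequence is
already available after cutting the tree.

```python
from collections import Counter


def large_groups(labels, min_size=28):
    """Return {group_label: size} for groups with at least min_size members."""
    sizes = Counter(labels)
    return {group: n for group, n in sizes.items() if n >= min_size}


# Hypothetical labels: one entry per sequence, naming its group after the cut.
labels = ["g1"] * 40 + ["g2"] * 28 + ["g3"] * 5

kept = large_groups(labels)
print(len(set(labels)), "groups in total,", len(kept), "with 28 or more sequences")
```

Keeping the threshold as a parameter makes it easy to check how the
number of retained groups changes with the minimum size.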
================================
HIV Subclustering Clusters
- 10 clustering plots after subsetting. Number of features per cluster
plot: 1, 2, 70, 10, 10, 45, 13, 0, 0, 0
- Alignment of the 10 cluster consensus groups
getQuiverSubpop.aln
================================
HIV Subclustering Identification
- Align the 10 clusters to the 28 full-run consensus estimates
- Two subclusters mapped to SGA estimates:
- cg_2->0022-463M_I1 and cg_3->0020-463M_I3.
- Subcluster cg_2->0022-463M_I1 matches exactly across all 9067 bases
with NO errors.
- Subcluster cg_3->0020-463M_I3 is 99.3821% concordant, with 49
mismatches, 2 deletions, and 5 insertions.
- Subcluster 2 perfect (green) and subcluster 3 99.38% concordance (blue)
- Alignment of the 10 cluster consensus sequences to the recipient
estimate (run -0024) shows cg_3 to be the closest, but errors remain
(47 mismatches, 17 deletions, 4 insertions).
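
The mismatch, deletion, and insertion counts above can be tallied from
a pairwise alignment along these lines. This is a sketch under stated
assumptions, not the tool that produced the slide numbers: it assumes
two gapped rows of equal length with "-" as the gap character, and it
assumes concordance is the fraction of non-empty alignment columns
without a difference (the denominator behind the slide's percentages
is not stated). The function names and the toy sequences are
hypothetical.

```python
def alignment_differences(ref, query):
    """Count (mismatches, deletions, insertions) of query relative to ref."""
    if len(ref) != len(query):
        raise ValueError("aligned rows must have equal length")
    mismatches = deletions = insertions = 0
    for r, q in zip(ref, query):
        if r == "-" and q == "-":
            continue  # gap in both rows: no information in this column
        if q == "-":
            deletions += 1  # base present in ref, missing in query
        elif r == "-":
            insertions += 1  # base present in query, missing in ref
        elif r != q:
            mismatches += 1
    return mismatches, deletions, insertions


def concordance(ref, query):
    """Percent of non-empty alignment columns where the rows agree (assumed definition)."""
    columns = sum(1 for r, q in zip(ref, query) if not (r == "-" and q == "-"))
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * (1 - sum(alignment_differences(ref, query)) / columns)


# Toy alignment with one mismatch, one insertion, and one deletion.
ref = "ACGT-ACGTA"
query = "ACCTTAC-TA"
print(alignment_differences(ref, query), round(concordance(ref, query), 4))
```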
================================
HIV Subclustering Ambiguity
- Subcluster cg_3 still shows substantial variation, with 70 identified
variant positions in the subcluster.
- There are 116 degenerate positions at 10% or greater in cluster group 3!
61 R
30 Y
10 K
10 M
3 W
2 H
- There are 177 degenerate positions at 10% in the bulk PCR before subclustering.
84 R
56 Y
15 M
10 K
4 W
3 S
2 D
1 H
1 N
1 V
- There is no ambiguity at the 10% level in the closest matched
recipient SGA -0020-463M_I3.
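
The degenerate-position tallies above (R, Y, K, M, ...) can be
reproduced in outline as follows. This is a minimal sketch, assuming a
position is called with the IUPAC code covering every base whose read
frequency is at least 10%; the exact calling rule used for the slides
is not stated. The function names and the pileup counts are
hypothetical; only the IUPAC code table is standard.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
IUPAC = {frozenset(bases): code for code, bases in _CODES.items()}


def call_position(counts, min_freq=0.10):
    """Return the IUPAC code for all bases with frequency >= min_freq."""
    total = sum(counts.values())
    if total == 0:
        return "N"  # no coverage: nothing can be called
    present = frozenset(b for b, n in counts.items() if n / total >= min_freq)
    return IUPAC.get(present, "N")


def degenerate_tally(pileup, min_freq=0.10):
    """Count degenerate (non-ACGT) calls across a list of per-position counts."""
    calls = (call_position(c, min_freq) for c in pileup)
    return Counter(code for code in calls if code not in "ACGT")


# Hypothetical per-position base counts (keys limited to A, C, G, T).
pileup = [
    {"A": 90, "G": 10},          # R: G reaches exactly 10%
    {"A": 95, "G": 5},           # A: G is below the threshold
    {"C": 50, "T": 50},          # Y
    {"A": 40, "C": 30, "T": 30}, # H
]
print(degenerate_tally(pileup))
```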