================================

HIV Subclustering

- Single consensus estimation of all full-length genome runs.
- Hypothesis: SGA runs 0021-463M-D6 or 0020-463M_I3 are possible founders of the 463F SGA recipient.
- Best outcome: identify, to the base, the exact HIV founder genome in the bulk PCR runs that was transmitted from the donor to the recipient.

================================

HIV Subclustering Overview

- What is the recipient identity? The -0015-463F_28 and -0024-463F_74 consensus sequences have a single difference (A deletion after TTTT).
- Cluster the high-coverage bulk PCR run -0003.
- Original cluster: 154 variant feature positions.
- Cut at 0.5, which is near the theoretical threshold for 154 selected features.
- There are 17 groups at the 0.5 cutoff, of which 10 have 28 or more sequences.
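
The cut-and-count step above can be sketched as follows. This is a minimal illustration, not the pipeline used for these slides: the slides do not state the linkage method or the distance metric, so single linkage and a normalized Hamming distance over the feature positions are assumptions here, and the function names and toy reads are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of feature positions at which two equal-length reads differ."""
    if len(a) != len(b):
        raise ValueError("reads must cover the same feature positions")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cut_single_linkage(reads, cutoff):
    """Label reads by cutting a single-linkage tree at `cutoff`.

    Cutting single linkage at height h is equivalent to taking the connected
    components of the graph that joins reads whose distance is <= h.
    (Single linkage is an assumption; the slides do not name the method.)
    """
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(reads)), 2):
        if hamming_fraction(reads[i], reads[j]) <= cutoff:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(reads))]


def group_sizes(labels, min_size):
    """Return (number of groups, number of groups with >= min_size members)."""
    sizes = Counter(labels)
    return len(sizes), sum(1 for n in sizes.values() if n >= min_size)


# Toy data: 4 hypothetical feature positions per read.
reads = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC", "CCCC"]
labels = cut_single_linkage(reads, cutoff=0.25)
n_groups, n_large = group_sizes(labels, min_size=2)
```

For the run described above, the same counting step would be called with `cutoff=0.5` and `min_size=28` on the 154 selected feature positions.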
================================

HIV Subclustering Clusters

- 10 clustering plots after subsetting. Number of features per cluster plot, in plot order: 1, 2, 70, 10, 10, 45, 13, 0, 0, 0.
- Alignment of the 10 cluster consensus groups: getQuiverSubpop.aln

================================

HIV Subclustering Identification

- Align the 10 clusters to the 28 full-run consensus estimates.
- Two subclusters mapped to SGA estimates: cg_2->0022-463M_I1 and cg_3->0020-463M_I3.
- Subcluster cg_2->0022-463M_I1 is exact over all 9067 bases, with no errors.
- Subcluster cg_3->0020-463M_I3 is 99.3821% concordant, with 49 mismatches, 2 deletions, and 5 insertions.
- Subcluster 2 is perfect (green) and subcluster 3 has 99.38% concordance (blue).
- Alignment of the 10 cluster consensus sequences to the recipient estimate (run -0024) shows cg_3 to be the closest, but it still has errors (47 mismatches, 17 deletions, 4 insertions).
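
The mismatch, deletion, and insertion counts above can be tallied from a pairwise alignment along these lines. This is a sketch under stated assumptions: concordance is taken to be matching columns divided by all aligned columns, which the slides do not confirm, and `alignment_stats` and the toy sequences are hypothetical. With 56 total differences over roughly 9,067 bases this definition gives about 99.38%, in line with the cg_3 figure, though the exact denominator behind 99.3821% is not stated.

```python
def alignment_stats(ref_aln, query_aln, gap="-"):
    """Count matches, mismatches, insertions and deletions in a pairwise alignment.

    Both strings must be the gapped rows of the same alignment.
    A gap in the reference counts as an insertion in the query;
    a gap in the query counts as a deletion.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")
    stats = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for r, q in zip(ref_aln, query_aln):
        if r == gap and q == gap:
            continue  # column carries no information for this pair
        if r == gap:
            stats["insertion"] += 1
        elif q == gap:
            stats["deletion"] += 1
        elif r == q:
            stats["match"] += 1
        else:
            stats["mismatch"] += 1
    columns = sum(stats.values())
    # Assumed definition: matching columns over all informative columns.
    stats["concordance"] = stats["match"] / columns if columns else 0.0
    return stats


# Toy alignment (hypothetical sequences): 1 mismatch, 1 deletion, 1 insertion.
ref = "ACGTAC-GTA"
query = "ACCTA-GGTA"
stats = alignment_stats(ref, query)
```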
================================

HIV Subclustering Ambiguity

- Subcluster cg_3 still has large variation, with 70 variant positions identified within the subcluster.
- There are 116 degenerate positions at 10% or greater in cluster group 3: 61 R, 30 Y, 10 K, 10 M, 3 W, 2 H.
- There are 177 degenerate positions at 10% in the bulk PCR before subclustering: 84 R, 56 Y, 15 M, 10 K, 4 W, 3 S, 2 D, 1 H, 1 N, 1 V.
- There is no ambiguity at the 10% level in the closest matched recipient SGA -0022-463M_I3.
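
One way to produce per-code tallies like those above is to call an IUPAC ambiguity code at every alignment column where two or more bases each reach the 10% level. The sketch below assumes the threshold is applied to per-column base frequencies among A, C, G, and T with gaps ignored; the slides do not spell this out, and `degenerate_calls` and the toy reads are hypothetical.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_calls(reads, min_frac=0.10):
    """Map 0-based column index to an IUPAC code for degenerate columns.

    A column is degenerate when two or more bases each make up at least
    `min_frac` of the A/C/G/T calls in that column (gaps are ignored).
    Reads are assumed to be aligned rows of equal length.
    """
    calls = {}
    for pos, column in enumerate(zip(*reads)):
        counts = Counter(b for b in column if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            continue  # column is all gaps or unknown symbols
        kept = frozenset(b for b, n in counts.items() if n / total >= min_frac)
        if len(kept) > 1:
            calls[pos] = IUPAC[kept]
    return calls


# Toy data: 10 hypothetical aligned reads of length 4.
reads = ["ACGT"] * 8 + ["GCGC", "GCAC"]
calls = degenerate_calls(reads, min_frac=0.10)
tally = Counter(calls.values())  # counts per IUPAC code
```

Running the same tally on the bulk PCR reads and on the cg_3 reads separately would give the before and after comparison listed above.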