, and we find that the distribution of HB 36 is less likely than the distribution of cys2, indicating that HB 36 is a stronger marker of severe disease than cys2 in the Malian population. This mirrors what we observed in the Kenyan population, where the expression rate of HB 36 dominates PC 1, the principal component that correlates most strongly with severe disease (Figure 5E). Additionally, in the Malian population we find that HBs 60, 64, 79, 163, and 179 are differentially expressed in cerebral versus mild hyperparasitaemic cases (p < .05). For the Malian dataset [14],
we also compare the recall (hit rate), accuracy and precision of two predictive models: (1) expressed DBLα sequence tags containing two cysteines predict severe malaria, whereas those with any other number of cysteines predict
mild hyperparasitaemic malaria; and (2) expressed sequence tags lacking HB 36 predict severe malaria, whereas those containing HB 36 predict mild disease. The hit rate, accuracy and precision are given by TP/P, (TP + TN)/(P + N) and TP/(TP + FP), respectively, where TP is the number of truly positive instances classified as positive, TN is the number of truly negative instances classified as negative, FP is the number of truly negative instances classified as positive, P is the total number of truly positive instances regardless of classification (P = TP + FN, where FN counts truly positive instances classified as negative), and N is the total number of truly negative instances regardless of classification (N = TN + FP) [32]. For the purpose of predicting severe disease from sequence features of expressed DBLα var tags in the Malian population, classification by HB 36 outperforms
classification by cys2 on all three measures: the hit rate is 0.723 versus 0.617, the accuracy is 0.765 versus 0.724, and the precision is 0.773 versus 0.763. Among the unique set of sequences expressed within the cerebral and hyperparasitaemia isolates, the rank correlations (both Spearman and Kendall) of rosetting with each of HB 60, 79, 153 and 219 are all greater in magnitude than the rank correlation of rosetting with cys2. These HBs are also associated with rosetting in the Kenyan dataset [10] and thus appear to be more informative predictors of rosetting than the number of cysteines within the var DBLα tag.

Conclusions

Even though the HBs were designed using a very small number of var sequences isolated from a few parasite genomes, they cover the sequence diversity of a local population, leaving only a minority of sites unaligned. We find that the variation described by HB diversity within the var DBLα tag is not completely redundant with the diversity already described by classic methods. Furthermore, relative to classic methods, HB composition appears to be more informative for predicting whether a tag’s expression is associated with various disease phenotypes.
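The three classification measures used in the results to compare the cys2 and HB 36 models can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the authors' analysis code. The `Tag` record, the helper functions and the six toy tags are hypothetical, and the toy tags are invented for illustration only, with no relation to the Malian data. Severe disease is treated as the positive class, and hit rate TP/P, accuracy (TP + TN)/(P + N) and precision TP/(TP + FP) are computed as defined in the text, with P = TP + FN and N = TN + FP.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tag:
    """Hypothetical record for one expressed DBLα tag (illustrative only)."""
    n_cysteines: int
    has_hb36: bool
    severe: bool  # True if the tag was expressed in a severe-disease case


def confusion_counts(tags: List[Tag], predict_severe: Callable[[Tag], bool]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), with 'severe' as the positive class."""
    tp = tn = fp = fn = 0
    for t in tags:
        pred = predict_severe(t)
        if pred and t.severe:
            tp += 1
        elif not pred and not t.severe:
            tn += 1
        elif pred and not t.severe:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def metrics(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Hit rate TP/P, accuracy (TP+TN)/(P+N), precision TP/(TP+FP)."""
    p = tp + fn  # all truly positive instances
    n = tn + fp  # all truly negative instances
    hit_rate = tp / p if p else float("nan")
    accuracy = (tp + tn) / (p + n) if (p + n) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return hit_rate, accuracy, precision


def cys2_model(t: Tag) -> bool:
    # Model (1): two cysteines -> severe; any other number -> mild
    return t.n_cysteines == 2


def hb36_model(t: Tag) -> bool:
    # Model (2): lacking HB 36 -> severe; containing HB 36 -> mild
    return not t.has_hb36


if __name__ == "__main__":
    # Hypothetical toy data, NOT the study's data
    toy = [
        Tag(2, False, True),
        Tag(2, True, True),
        Tag(4, False, True),
        Tag(4, True, False),
        Tag(2, True, False),
        Tag(4, True, False),
    ]
    for name, model in [("cys2", cys2_model), ("HB 36", hb36_model)]:
        hr, acc, prec = metrics(*confusion_counts(toy, model))
        print(f"{name}: hit rate={hr:.3f} accuracy={acc:.3f} precision={prec:.3f}")
```

On this toy set the two rules tie on hit rate (0.667), while the HB 36 rule has a higher accuracy (0.833 versus 0.667) and precision (1.000 versus 0.667). The comparison reported in the text rests on the Malian dataset [14], not on these invented counts.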