### This file is the same as kmmout.eg with the addition of some explanatory
notes. Search on # to find the next note.

Estimated mean (as a row vector) for each group   # This first batch of
   0.000000                                       # numbers is simply the
   2.750000                                       # various input values
                                                  # specified by the user
                                                  # in the corresponding
Estimated common covariance matrix                # input file. It provides
   1.000000                                       # a useful check that you
                                                  # have supplied the correct
                                                  # file!
Estimated mixing proportion for each group        #
   0.500 0.500                                    #

With one group the log likelihood is -293.554

### This is the log likelihood of the best-fitting single Gaussian to the
dataset under study.

In loop 17 log likelihood is -289.903

### This gives the log likelihood of the 2-group (in this case) fit once the
KMM algorithm has converged. For this dataset the algorithm has converged
relatively rapidly, after only 17 iterations. [Currently the program is
terminated if convergence has not occurred within 250 iterations.]

Estimate of mixing proportion for each group
   0.517 0.483

### This is the mixing proportion for the 2-group fit as calculated by KMM.
(Note: the actual number of objects assigned to each group is given later in
the output. The above mixing proportion does not always agree with the "naive"
mixing proportion obtained from the ratio of the numbers of objects assigned
to the different groups. We are currently not sure whether this is because
the estimated mixing proportion is more subtle than the simple ratio of the
numbers of objects assigned to each group.)

### The following numbers provide an estimate of the reliability of the
assignment of each datapoint to its preferred group. The first column is just
the identifier of the datapoint (thus the first number in the column of data
from the input file is assigned the number 1, and so on). The second column is
the posterior probability that the datapoint is a member of group 1, the third
column that it is a member of group 2.
Obviously, datapoints are assigned to the group for which they have the
highest membership probability. These numbers are a guide to how reliable
those assignments are.

Entity: Final estimates of posterior probabilities of group membership
    1  0.976  0.024
    2  0.995  0.005
    3  0.991  0.009
    4  0.948  0.052
    5  0.986  0.014
    6  0.851  0.149
    7  0.999  0.001
    8  0.989  0.011
    9  0.759  0.241
   10  0.813  0.187
   11  0.956  0.044
   12  0.931  0.069
   13  0.989  0.011
   14  0.810  0.190
   15  0.887  0.113
   16  0.766  0.234
   17  1.000  0.000
   18  0.995  0.005
   19  0.677  0.323
   20  0.561  0.439
   21  0.929  0.071
   22  0.999  0.001
   23  0.986  0.014
   24  0.960  0.040
   25  0.985  0.015
   26  0.999  0.001
   27  0.957  0.043
   28  0.936  0.064
   29  0.940  0.060
   30  0.709  0.291
   31  0.982  0.018
   32  0.829  0.171
   33  0.975  0.025
   34  0.953  0.047
   35  0.967  0.033
   36  0.952  0.048
   37  0.987  0.013
   38  0.999  0.001
   39  0.731  0.269
   40  0.925  0.075
   41  0.999  0.001
   42  0.996  0.004
   43  0.989  0.011
   44  0.978  0.022
   45  0.631  0.369
   46  0.996  0.004
   47  0.999  0.001
   48  0.998  0.002
   49  0.766  0.234
   50  0.080  0.920
   51  0.439  0.561
   52  0.427  0.573
   53  0.993  0.007
   54  0.938  0.062
   55  0.966  0.034
   56  0.691  0.309
   57  0.960  0.040
   58  0.721  0.279
   59  0.984  0.016
   60  0.968  0.032
   61  0.997  0.003
   62  0.933  0.067
   63  0.911  0.089
   64  0.154  0.846
   65  0.793  0.207
   66  0.982  0.018
   67  0.584  0.416
   68  0.952  0.048
   69  0.297  0.703
   70  0.958  0.042
   71  0.657  0.343
   72  0.995  0.005
   73  0.812  0.188
   74  0.576  0.424
   75  0.998  0.002
   76  0.292  0.708
   77  0.031  0.969
   78  0.487  0.513
   79  0.000  1.000
   80  0.004  0.996
   81  0.438  0.562
   82  0.015  0.985
   83  0.144  0.856
   84  0.046  0.954
   85  0.044  0.956
   86  0.106  0.894
   87  0.021  0.979
   88  0.128  0.872
   89  0.010  0.990
   90  0.649  0.351
   91  0.015  0.985
   92  0.003  0.997
   93  0.104  0.896
   94  0.225  0.775
   95  0.010  0.990
   96  0.035  0.965
   97  0.669  0.331
   98  0.013  0.987
   99  0.206  0.794
  100  0.665  0.335
  101  0.426  0.574
  102  0.198  0.802
  103  0.931  0.069
  104  0.697  0.303
  105  0.018  0.982
  106  0.061  0.939
  107  0.010  0.990
  108  0.037  0.963
  109  0.008  0.992
  110  0.286  0.714
  111  0.022  0.978
  112  0.178  0.822
  113  0.042  0.958
  114  0.574  0.426
  115  0.605  0.395
  116  0.001  0.999
  117  0.494  0.506
  118  0.023  0.977
  119  0.167  0.833
  120  0.074  0.926
  121  0.009  0.991
  122  0.091  0.909
  123  0.000  1.000
  124  0.329  0.671
  125  0.007  0.993
  126  0.011  0.989
  127  0.058  0.942
  128  0.010  0.990
  129  0.591  0.409
  130  0.038  0.962
  131  0.027  0.973
  132  0.450  0.550
  133  0.283  0.717
  134  0.568  0.432
  135  0.328  0.672
  136  0.061  0.939
  137  0.001  0.999
  138  0.002  0.998
  139  0.019  0.981
  140  0.006  0.994
  141  0.023  0.977
  142  0.010  0.990
  143  0.039  0.961
  144  0.045  0.955
  145  0.004  0.996
  146  0.035  0.965
  147  0.199  0.801
  148  0.067  0.933
  149  0.381  0.619
  150  0.011  0.989

Resulting partition of the entities into NG groups
   1 1 1 1 1 1 1 1 1 1
   1 1 1 1 1 1 1 1 1 1
   1 1 1 1 1 1 1 1 1 1
   1 1 1 1 1 1 1 1 1 1
   1 1 1 1 1 1 1 1 1 2
   2 2 1 1 1 1 1 1 1 1
   1 1 1 2 1 1 1 1 2 1
   1 1 1 1 1 2 2 2 2 2
   2 2 2 2 2 2 2 2 2 1
   2 2 2 2 2 2 1 2 2 1
   2 2 1 1 2 2 2 2 2 2
   2 2 2 1 1 2 2 2 2 2
   2 2 2 2 2 2 2 2 1 2
   2 2 2 1 2 2 2 2 2 2
   2 2 2 2 2 2 2 2 2 2

### This grid provides a visual representation of the assignment of the
datapoints to group 1 or group 2 (in entity order, reading across each row).

Number assigned to each group
   79 71

### 79 of the datapoints were assigned to group 1, 71 to group 2.

Estimates of correct allocation rates for each group
   0.892 0.865

### This is the sum of the individual posterior probabilities of group
membership for the datapoints assigned to each group, normalised so that it is
a rate between 0 and 1 (in effect an average posterior probability).

Estimate of overall correct allocation rate 0.879

### This is the weighted mean of the previous two numbers.

Estimated mean (as a row vector) for each group
   0.075989
   2.701117

### The mean values of group 1 and group 2.

Estimated common covariance matrix
   1.212705

### KMM's estimate of the covariance common to both groups (for this
one-dimensional dataset, a single variance).

The likelihood ratio test statistic is 7.302 with 2 degrees of freedom
and the p value for this statistic is 0.025

### The P-value estimates the probability of obtaining a likelihood ratio test
statistic at least this large if the data were in fact drawn from a single
Gaussian; a small value means the 2-group fit is a significant improvement
over a single Gaussian.
It is only an approximation, based on the assumption that the likelihood ratio
test statistic follows a chi-squared distribution, with the number of degrees
of freedom being the difference in the number of degrees of freedom of the
hypotheses under study (here 2-group versus single Gaussian). Simulations have
shown that for the 2-group versus 1-group homoscedastic case, this
approximation is justified. However, for more complicated cases (particularly
the heteroscedastic case) the justification for this assumption becomes
increasingly poor, so some care is warranted. In the simple 2-group
homoscedastic case considered here, the procedure is believed to be reliable.
A P-value of 0.025 is interpreted as a rejection of the single Gaussian model
at a confidence level of 97.5% (conventionally, rejection at better than 95%
is regarded as strongly significant). Thus, for the dataset analysed here, a
bimodal distribution is a statistically significant improvement over the
single Gaussian.

################################################################################
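
### The summary numbers quoted in this file can be cross-checked by hand. The
sketch below is not part of KMM; it is a minimal illustration in Python, and
every function and variable name in it is hypothetical. It assumes that the
test statistic is twice the gain in log likelihood, that the P-value is the
upper tail of a chi-squared distribution with 2 degrees of freedom, and that
the overall allocation rate is weighted by the numbers of datapoints assigned
to each group (weighting by the estimated mixing proportions gives the same
value to three decimal places). Under those assumptions the statistic comes
out as 7.302 and the tail probability as about 0.026, close to the 0.025
printed by the program.

```python
import math

# Values copied from the KMM output in this file.
loglik_one_group = -293.554        # best-fitting single Gaussian
loglik_two_group = -289.903        # converged 2-group fit
n_assigned = (79, 71)              # datapoints assigned to groups 1 and 2
allocation_rates = (0.892, 0.865)  # per-group correct allocation rates


def likelihood_ratio_statistic(loglik_null, loglik_alt):
    """Twice the gain in log likelihood of the alternative over the null."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_upper_tail_2dof(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of
    freedom. For exactly 2 degrees of freedom this is exp(-x / 2)."""
    return math.exp(-x / 2.0)


def naive_mixing_proportions(counts):
    """Fraction of all datapoints assigned to each group."""
    total = sum(counts)
    return tuple(c / total for c in counts)


def weighted_mean(values, weights):
    """Mean of values weighted by weights (assumed weighting, see note)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


lrts = likelihood_ratio_statistic(loglik_one_group, loglik_two_group)
p_value = chi2_upper_tail_2dof(lrts)
naive = naive_mixing_proportions(n_assigned)
overall = weighted_mean(allocation_rates, n_assigned)

print(f"likelihood ratio test statistic: {lrts:.3f}")
print(f"chi-squared (2 dof) tail probability: {p_value:.4f}")
print(f"naive mixing proportions: {naive[0]:.3f} {naive[1]:.3f}")
print(f"overall correct allocation rate: {overall:.3f}")
```

The naive mixing proportions printed by this sketch (79/150 and 71/150) can be
compared directly with KMM's estimate of 0.517 and 0.483 given earlier.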