What’s remarkable (to me) is that the signal of the confusion matrix is weaker than that of the adjacency matrix, since there are fewer nonzero entries in the confusion matrix than in the adjacency matrix. Despite this, the coloring actually is better.
Nicely identified as a problem, and nicely solved!
I’d say that the confusion matrix is a better estimator of your variable of interest than the adjacency matrix is. You want to know which cluster pairs need different colours, so your variable of interest is which cluster pairs have the most visual overlap or adjacency. And confusion (of a 2D classifier model) is a better estimator of overlap than topological adjacency of cluster centers.
‘Signal’ (I assume you mean sensitivity or specificity, or some other accuracy metric?) is nice, but what is really useful is a measured variable that is strongly related to your (not directly measured) variable of interest.
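To make "confusion of a 2D classifier" concrete, here is a minimal numpy sketch. It assumes a leave-one-out k-nearest-neighbour majority vote on the 2D points stands in for the classifier; the thread does not say which classifier was actually used. The names knn_confusion and class_overlap, and k=10, are hypothetical. The symmetric off-diagonal counts are what you would feed into the colouring as "how much do classes i and j overlap".

```python
import numpy as np

def knn_confusion(X, y, n_classes, k=10):
    """Leave-one-out k-NN confusion matrix for 2D points X (n, 2) with integer labels y.

    Hypothetical stand-in for "a 2D classifier model"; any classifier would do.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    # Pairwise squared distances between all points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # leave-one-out: a point never votes for itself
    neighbours = np.argsort(d2, axis=1)[:, :k]
    votes = y[neighbours]  # (n, k) labels of each point's neighbours
    # Majority vote per point; argmax breaks ties towards the smaller label
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    pred = counts.argmax(axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y, pred), 1)  # rows: true class, columns: predicted class
    return cm

def class_overlap(cm):
    """Symmetric off-diagonal confusion: how often classes i and j get mixed up."""
    overlap = cm + cm.T
    np.fill_diagonal(overlap, 0)  # correct predictions say nothing about colour conflicts
    return overlap

# Toy data: classes 0 and 1 overlap heavily, class 2 is far away from both
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]])
X = np.concatenate([rng.normal(c, 1.0, size=(100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)
overlap = class_overlap(knn_confusion(X, y, n_classes=3))
```

Here overlap[0, 1] is large while overlap[0, 2] and overlap[1, 2] are zero, even though all three cluster centers might well be "adjacent" in a triangulation of the centers.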
Thanks!
That’s a good point you make! Initially I thought the adjacency matrix would capture the topological structure better (since it’d be nice to have different colors for adjacent classes, even if they’re not misclassified), but it’s too much of a constraint. And there is no direct way of encoding overlap as a higher priority than just normal neighboring cluster adjacency.
Start with the confusion matrix, find the lowest non-zero value Xmin, and create a new matrix pmax(confusion_matrix, 0.5 * Xmin * adjacency_matrix)? I’m using pmax, element-wise max, in its R sense here: the confusion, adjacency, and result matrices all have the same shape, and each cell in the result contains the corresponding cell from the confusion or (0.5 * Xmin * adjacency) matrix, whichever is greater.
What could also be fun: the CIFAR-100 dataset contains lots of colours, some pretty similar. It would be nice if groups with more overlap got more distinct colours (e.g. blue and yellow, not light green and yellow). Compute a distance matrix for the colours, and then tell the solver to maximize Sum(confusion(A, B) * colour_distance(colour for A, colour for B)) (constraints omitted). Perhaps this can even be done by passing the right weight colour matrix to scipy.optimize.quadratic_assignment? Not sure, I’d have to sit down and work out the multiplications and the traces to see if the right sum gets maximized/minimized.
I really like that idea of combining the two matrices! Sounds very reasonable to encode the adjacency as a minor requirement, so the optimization can hopefully ignore it when it has to.
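The pmax combination suggested above translates directly to numpy, where np.maximum is the element-wise counterpart of R’s pmax. A minimal sketch; the function name, the weight parameter, zeroing the diagonal first (on the assumption that correct predictions are irrelevant to colour conflicts) and the fallback when nothing is confused are all my own choices, not from the thread. Because weight < 1, every pair that is adjacent but never confused ends up strictly below the smallest real confusion value, so actual overlap always outranks plain adjacency.

```python
import numpy as np

def combine_confusion_adjacency(confusion, adjacency, weight=0.5):
    """Hypothetical helper: element-wise max of confusion and weight * Xmin * adjacency."""
    confusion = np.array(confusion, dtype=float)  # copy, so the caller's matrix is untouched
    adjacency = np.asarray(adjacency, dtype=float)
    if confusion.shape != adjacency.shape:
        raise ValueError("confusion and adjacency must have the same shape")
    # Assumption: the diagonal counts correct predictions, irrelevant for colouring
    np.fill_diagonal(confusion, 0.0)
    nonzero = confusion[confusion > 0]
    # Xmin: smallest non-zero confusion value (assumed fallback 1.0 if nothing is confused)
    x_min = nonzero.min() if nonzero.size else 1.0
    # Element-wise max, like R's pmax(confusion_matrix, 0.5 * Xmin * adjacency_matrix)
    return np.maximum(confusion, weight * x_min * adjacency)

# Classes 0 and 2 are adjacent but never confused; classes 1 and 2 are confused but not adjacent
confusion = np.array([[50, 4, 0],
                      [4, 60, 2],
                      [0, 2, 70]])
adjacency = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]])
combined = combine_confusion_adjacency(confusion, adjacency)
# combined == [[0, 4, 1], [4, 0, 2], [1, 2, 0]]
```

With Xmin = 2, the adjacency-only pair (0, 2) gets weight 1, half of the weakest real confusion.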
Yes, I had a very similar conversation with someone yesterday! I think it should be possible to achieve that by turning the color block matrix into a color affinity matrix (somewhat similar to an element-wise inverse distance matrix).
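A rough sketch of how the colour-distance objective could be handed to scipy.optimize.quadratic_assignment. With method="faq" it minimizes (or, with options={"maximize": True}, maximizes) trace(A^T P B P^T), which equals the sum over i, j of A[i, j] * B[perm[i], perm[j]]. Taking A = class overlap and B = colour distance therefore gives Sum(confusion(A, B) * colour_distance(...)). The affinity variant minimizes overlap-weighted affinity instead; since 1 / (1 + d) is nonlinear in d, that is a related but not identical objective. Assumptions: plain RGB distance (a perceptual space such as CIELAB would be more faithful), the hypothetical helper names below, and padding with dummy classes because quadratic_assignment needs A and B of equal size. FAQ is a heuristic, so the permutation is good but not guaranteed optimal.

```python
import numpy as np
from scipy.optimize import quadratic_assignment

def colour_distance_matrix(palette_rgb):
    """Euclidean distances between palette colours given as RGB triples in [0, 1]."""
    p = np.asarray(palette_rgb, dtype=float)
    return np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)

def colour_affinity_matrix(distance):
    """Inverse-distance style affinity: 1 on the diagonal, towards 0 for very different colours."""
    return 1.0 / (1.0 + distance)

def assign_colours(overlap, palette_rgb, use_affinity=False):
    """Return palette index per class, pushing overlapping classes towards distinct colours."""
    overlap = np.asarray(overlap, dtype=float)
    dist = colour_distance_matrix(palette_rgb)
    n_classes, n_colours = overlap.shape[0], dist.shape[0]
    if n_colours < n_classes:
        raise ValueError("need at least as many colours as classes")
    # quadratic_assignment needs equal-size square matrices: pad with zero-overlap dummy classes
    padded = np.zeros((n_colours, n_colours))
    padded[:n_classes, :n_classes] = overlap
    if use_affinity:
        # Minimize sum_ij overlap[i, j] * affinity[c_i, c_j]
        res = quadratic_assignment(padded, colour_affinity_matrix(dist),
                                   method="faq", options={"maximize": False})
    else:
        # Maximize sum_ij overlap[i, j] * distance[c_i, c_j]
        res = quadratic_assignment(padded, dist, method="faq",
                                   options={"maximize": True})
    # col_ind[i] is the palette index assigned to class i; drop the dummy classes
    return res.col_ind[:n_classes]

# Classes 0 and 1 overlap; palette: yellow, a near-identical yellow, and blue
overlap = np.array([[0, 5, 0],
                    [5, 0, 0],
                    [0, 0, 0]])
palette = [(1.0, 1.0, 0.0), (0.9, 1.0, 0.0), (0.0, 0.0, 1.0)]
colours = assign_colours(overlap, palette)
colours_affinity = assign_colours(overlap, palette, use_affinity=True)
```

The intent is that the two overlapping classes do not both get a yellow; one of them should end up blue.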
I’m thinking of turning this into a small package, where this functionality would probably be more useful than the binary is-color-distinct matrix that I used for the blog post.
If you have any feedback on the overall design of the website, I would also appreciate it.