Plotting Heatmaps in R

Tuesday, January 24th, 2012

I recently had to create a heatmap visualisation as part of the results in a paper we had submitted to a conference, and as it took way more time than I had anticipated, I figured it’s something worth documenting. My first port of call was obviously Sgt. Google and the first hit was How to Make a Heatmap – a Quick and Easy Solution, which I naturally liked since the sample dataset was a basketball stats one 😀 However, I quickly realised that this was not enough for what I wanted: my x-axis showed time and, instead of nice, fat blocks, my heatmap/graph showed thin, coloured lines. Another problem I ran into was interpretation. I initially had something like this:

n <- 50
matrix_to_be_plotted <- rnorm(n*n) # generate 50 x 50 = 2500 random numbers
dim(matrix_to_be_plotted) <- c(n,n) # change vector to matrix of dimension 50 x 50

heatmap(matrix_to_be_plotted, # as name suggests, the matrix of the data to be plotted
# scale is important; I did not realise this at first and spent an evening wondering why the data values did not match what I was describing in the heatmap (I'd assumed the darker the shade, the higher the value). Basically, you can scale the colours by row or by column, and the default is row. This is so important in my opinion that I'll quote the documentation: "character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. The default is "row" if symm false, and "none" otherwise."
scale = "row",
main = "Greyscale heatmap – squares", # name/title of figure
Colv=NA, Rowv=NA, # set to NA, or the columns/rows of the matrix would be reordered by hierarchical clustering and dendrograms drawn
margins=c(3,3), # column/row margin space, the higher the number, the more space you get
cexCol=1, cexRow=1, # size of column/row labels
col=grey(seq(1,0,-0.01)) # colour is greyscale, sequence from 1 (white) to 0 (black) in steps of 0.01
)

which was a grey-scale version of what I wanted. You can read the documentation for the parameters of R’s heatmap, but my own explanation/interpretation of the parameters, in the context of what I’ve given, is written in the comments above.
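To see what scale = "row" does to the numbers before they are coloured, here is a minimal sketch that redoes the row-wise centring and scaling by hand. The tiny matrix `m` and the name `row_scaled` are made up for illustration, and the sketch assumes row scaling means subtracting each row's mean and dividing by that row's standard deviation, which is how the quoted documentation reads.

```r
# A small matrix where the second row holds much larger values than the first
m <- rbind(c(1, 2, 3),
           c(100, 200, 300))

# Row scaling by hand: scale() works on columns, so transpose, scale, transpose back
row_scaled <- t(scale(t(m)))

print(row_scaled)
# Each row is now centred on 0 with unit standard deviation, so shades are only
# comparable within a row, not between rows.
```

If you want the shade to reflect the raw value across the whole matrix, scale = "none" is the setting to try.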

Here’s a picture of what should come out (tips on Saving Plots in R):

Not bad: Example of Greyscale heatmap, square matrix
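For getting a figure like this out of R and into a paper, one option is to draw it straight into a PDF device. This is only a sketch: the file name, the 6 x 6 inch size and the matrix `m` are assumptions for illustration, not the settings used for the figure above.

```r
# Regenerate a square matrix of random numbers like the one plotted above
set.seed(1)
n <- 50
m <- matrix(rnorm(n * n), nrow = n)

# Hypothetical output location; pdf() takes width and height in inches
out_file <- file.path(tempdir(), "greyscale_heatmap.pdf")

pdf(out_file, width = 6, height = 6)
heatmap(m, scale = "row", Colv = NA, Rowv = NA,
        col = grey(seq(1, 0, -0.01)),
        main = "Greyscale heatmap - squares")
dev.off()  # close the device so the file is actually written
```

Everything drawn between pdf() and dev.off() goes into the file rather than onto the screen.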

This doesn’t look so bad, and if this is all you need, great. However, here’s a rectangular matrix, i.e. more similar to what I originally had, which can be emulated by changing the dimensions like so:

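matrix_to_be_plotted_thin <- matrix_to_be_plotted # start from a copy of the square matrix
dim(matrix_to_be_plotted_thin) <- c(n/2,2*n) # reshape to 25 rows x 100 columns
colnames(matrix_to_be_plotted_thin) <- rep(" ", 2*n) # blank label for every column
colnames(matrix_to_be_plotted_thin)[seq(1,2*n,3)] <- paste("Wk", seq(1,2*n,3)) # label every third column only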

The column names are changed so they don’t get so squished on the x-axis:

Kind of Ugly: Example of Greyscale heatmap, rectangular matrix

This is less visually appealing. Moreover, it didn’t suit my graph because mine looked like this:

Yuck: My own graph, Greyscale insufficient

Part of the problem was that I didn’t realise I hadn’t set the scaling properly: by default it was row-scaled, but I was describing it as if it were column-scaled. However, in retrospect, it still wouldn’t have worked with the correct scaling because, as a figure taking up barely one-sixth of a page, it was fairly difficult to read when printed. (I couldn’t figure out how to get a border around the graphic either so if anyone knows, please comment and let me know!) Anyway, my Saturday morning thus became an online Google image and documentation hunt with keywords of R, heatmap, image, matrix image, visualisation, <what have you>, and finally I converged on heatmap.2, an upgraded version of heatmap from the gplots library (I also had heatmap.plus loaded), with custom colours to prettify the graph.

This apparently Easy Guide To Drawing Heat Maps To PDF With R (With Color Key) was a great starting point, but ultimately, I found the full heatmap.2 documentation combined with this colorRamp pdf most useful, as they actually explained what you need to do to customise the plot.

A slight detour – colour

I am a bit particular about my colours, both from an explanatory viewpoint and from an aesthetic one. I utterly despise bad graphs and badly-coloured ones even more so. In academia (or any good documentation that requires printing out), the best thing to do is to have graphs free of colour, such that they still make perfect sense when printed in greyscale, and I do prefer that. However, when colour is needed, I think it’s important to get it right, e.g. the most common type of colour blindness is red-green, so avoid relying on those two colours to distinguish values.

This arbitrary compulsive requirement of mine led me to actually create my own palette for my heatmap… (Yay, as if I don’t have enough ways to spend my time.) More importantly, graphs are supposed to be a more efficient way of explaining data, not made just for the sake of it. If I am going to use colour in my graph, there should be a reason for it, and it shouldn’t require more text to explain why. (Picture – a thousand words – that sort of thing.)

A useful resource for creating your own palette is this R colour chart. Here are a few examples:

TestPalette <- colorRampPalette(c('aliceblue','aquamarine1','azure3','blue','blueviolet','darkcyan','darkblue','darkgreen','darkmagenta','darkolivegreen','darkmagenta','darkviolet','black'))
WarmPalette <- colorRampPalette(c('antiquewhite','pink','rosybrown3','rosybrown4','saddlebrown','brown','black'))
CoolPalette <- colorRampPalette(c('lavender','mediumslateblue','blue','turquoise4','seagreen2','seagreen4','black'))
library(RColorBrewer) # provides brewer.pal and display.brewer.all
BluesPalette <- colorRampPalette(brewer.pal(9,"Blues"))(100)

brewer.pal comes from the RColorBrewer package and gives you its ready-made palettes (as described in detail in the colorRamp pdf I linked earlier); the handiest query for me was:

display.brewer.all()

which shows the name of each palette (like “Blues”) and the range of colours in it. Also, (100) in the BluesPalette example sets how fine you want the shading to be. So, if you had (3) then there’d be three shades of blue, something like Light Blue, Blue, Dark Blue, whereas (100) gives you one hundred shades varying from light to dark.
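The difference between a palette and the colours it produces is easy to trip over, so here is a small sketch using only base R. The names `light_to_dark`, `three_shades` and `hundred_shades`, and the two end colours, are arbitrary picks for illustration.

```r
# colorRampPalette() returns a function; calling that function with n
# gives n colours interpolated between the ones you supplied
light_to_dark <- colorRampPalette(c("lightblue", "darkblue"))

three_shades   <- light_to_dark(3)    # coarse: three steps
hundred_shades <- light_to_dark(100)  # fine: one hundred steps

print(three_shades)            # a character vector of three hex colours
print(length(hundred_shades))  # 100
```

In the examples above, TestPalette, WarmPalette and CoolPalette are palette functions, while BluesPalette is already a vector of 100 colours because of the trailing (100). Base heatmap needs the vector form for col (e.g. CoolPalette(100)); heatmap.2 appears to accept the function as well.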

Finally…

The code for my final plot and comments for explanation of new things:

library(gplots) # provides heatmap.2 and its colour key
library(heatmap.plus) # loaded in my session; heatmap.2 itself comes from gplots

heatmap.2(MyMatrix,
Rowv=FALSE, Colv=FALSE, dendrogram="none", # no reordering of rows/columns and no dendrograms
cexCol=1, cexRow=1,
key=TRUE, keysize=0.1, # display colour key
density.info="none", # options of different plots to be drawn in colour key
trace="none", # character string indicating whether a solid "trace" line should be drawn across 'row's or down 'column's, 'both' or 'none'.
margins=c(5,9),
lmat=rbind( c(0,3), c(2,1), c(0,4) ), lhei=c(0.2, 8.5, 2), # where to display colour key
col=CoolPalette # custom colours for the heatmap and its colour key
)

...and the result:

Much nicer: Heatmap with Cool Palette colours
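The lmat and lhei arguments in the heatmap.2 call are the least obvious part, so here is a rough way to preview them with base R's layout(). As far as I understand the heatmap.2 documentation, the numbers in lmat mean 1 = heatmap, 2 = row dendrogram, 3 = column dendrogram and 4 = colour key, with 0 leaving a cell empty; treat that mapping as an assumption to check against the documentation. Column widths are left at layout()'s default here, which will not match what heatmap.2 uses.

```r
# Same layout matrix and row heights as in the heatmap.2 call
lmat <- rbind(c(0, 3), c(2, 1), c(0, 4))
lhei <- c(0.2, 8.5, 2)

print(lmat)

# Draw an outline of each numbered region: a thin strip on top (3),
# the tall middle row (2 and 1), and the key (4) in the bottom right
layout(lmat, heights = lhei)
layout.show(4)
```

Putting 4 in a third row is what moves the colour key underneath the heatmap, and the matching third entry in lhei sets how tall that row is.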

My only and final annoyance with this is the “Value” label, which floats a bit too near the displayed numbers, but I don’t think it impedes readability so much that it’s worth unnecessary tweaking.

Funnily enough, the greyscale versions don’t look as bad on screen in a blog post. But trust me, colour makes one hell of a difference on paper.