Update: This tool is now available on CRAN.
Researchers in academia and industry consistently use visualizations to better understand their data. The standard two-dimensional scatter plot is a staple of many exploratory data analyses. However, there are many cases where two dimensions are not enough. For instance, in the plot below, I’ve set up a pairwise plot between three distributions (x, y, and z). The plots are duplicated on opposite sides of the diagonal (showing x vs. y …or… y vs. x). I find it extraordinarily difficult to perceive the three-dimensional structure of this data solely from the series of two-dimensional plots shown here.
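A pairwise plot of this kind can be drawn with base R's pairs() function. The sketch below is only an illustration: the noisy helix is made-up sample data, not the three distributions used for the plot in this post.

```r
# Hypothetical sample data: a noisy helix, chosen only because its 3D
# structure is hard to read from 2D projections.
set.seed(42)
n <- 500
theta <- seq(0, 4 * pi, length.out = n)
x <- cos(theta) + rnorm(n, sd = 0.1)
y <- sin(theta) + rnorm(n, sd = 0.1)
z <- theta / (4 * pi) + rnorm(n, sd = 0.05)

# Scatter-plot matrix: each off-diagonal panel is one 2D projection,
# mirrored across the diagonal (x vs. y and y vs. x).
pairs(cbind(x = x, y = y, z = z), pch = 20, col = rainbow(n))
```

Each panel shows only a circle or a wave, and it takes some effort to work out that the points lie along a spiral.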
Luckily, R has a package called rgl that lets me plot in three dimensions using OpenGL. It even allows for simple interactions with the mouse, like rotation and zooming.
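As a rough sketch, the same kind of data can be sent to rgl with plot3d(). The helix below is again made-up sample data, and the plotting call is guarded so that it only runs in an interactive session with rgl installed.

```r
# Hypothetical sample data (a noisy helix), for illustration only.
set.seed(42)
n <- 500
theta <- seq(0, 4 * pi, length.out = n)
x <- cos(theta) + rnorm(n, sd = 0.1)
y <- sin(theta) + rnorm(n, sd = 0.1)
z <- theta / (4 * pi) + rnorm(n, sd = 0.05)

# Open an rgl window only when a person is at the console and rgl is available.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  # Drag with the mouse to rotate; use the scroll wheel or right button to zoom.
  rgl::plot3d(x, y, z, col = rainbow(n), size = 4)
}
```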
This allows for a much better sense of the underlying data… but it’s still a bit limiting. Many times, there is “interesting structure” in the three-dimensional representation that I would like to focus on, or perhaps there’s just a lot of junk that I want to get rid of. Often this structure is not isolated in one or two dimensions, but only becomes apparent through rotating and zooming the three-dimensional display.
rgl has a useful function called “select3d” for selecting points in the plot, but it’s a bit tricky to use. Using select3d involves typing the call into the R console, then clicking and dragging on the plot. This produces output from the select3d function. However, the output of the function is… another function! It is then necessary to apply this returned function to the original data to determine which data points fall inside the selected region… and then it’s completely up to you what you do at that point. Do you filter them out? Crop them? Color them differently than the others? It becomes necessary to keep a further supply of functions at hand that can perform these routines on the output of select3d.
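The select3d workflow described above looks roughly like the sketch below. The data frame pts and the names in_selection, cropped, deleted, and cols are made up for illustration. Outside an interactive session, a hypothetical stand-in function with the same call shape replaces the real selection so the rest of the code still runs.

```r
# Hypothetical sample data for illustration.
set.seed(1)
pts <- data.frame(x = rnorm(200), y = rnorm(200), z = rnorm(200))

if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(pts$x, pts$y, pts$z)
  # Waits for a rectangle to be dragged on the rgl window, then returns
  # a function that tests points against the selected region.
  in_selection <- rgl::select3d()
} else {
  # Stand-in (an assumption, not part of rgl): same call shape as the
  # function select3d() returns, so the steps below run without a display.
  in_selection <- function(x, y, z) x > 0 & y > 0
}
stopifnot(is.function(in_selection))  # fails if no selection was made

# Apply the returned function to the original data: one TRUE/FALSE per point.
keep <- in_selection(pts$x, pts$y, pts$z)

cropped <- pts[keep, ]                   # keep only the selected points
deleted <- pts[!keep, ]                  # or drop the selected points
cols <- ifelse(keep, "green", "black")   # or just color them differently
```

Every one of these follow-up steps has to be typed at the console, away from the plot, which is the back-and-forth that gets tiring.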
This is still all a bit too much to keep in my head for extended periods of time. It’s also a pain to constantly switch back and forth between the three dimensional plot environment and the console in order to pare down the data.
So, I’ve cooked up a little GUI toolbar that makes selecting, labeling, cropping, and deleting points in the rgl plot much simpler. It’s called “sculpt3d”, since it’s focused on altering and shaping the underlying data. In order to launch the toolbar, you first install a small package, and then make a function call: sculpt3d(x,y,z) where x, y, and z are the three dimensions you are interested in plotting.
I put together a small video that shows me playing around with it:
In the video, you can see how I choose a selection color (a mint green color that I thought would be noticeable against the bright rainbow colored points). Then I crop the data on this selection. This is a bit jarring because the crop function automatically zooms in on the selected points, and reverts them back to their original color. The delete function works in a similar way, and is a little easier to follow.
Once I have a smaller collection of data points, I might be interested in labeling them (assuming I passed labels as an argument to sculpt3d). I can toggle the labels with a click of the “Label” button.
If I’m interested in saving the results of the pruning and cropping, I can access the currently selected points by calling the function sculpt3d.selected(), or the currently visible scene by calling sculpt3d.current(). Each returns the same sort of logical vector that the function produced by select3d does, so I can now use it to filter my dataset, save it under a new name, and come back to it later on. Furthermore… since it’s in R, and uses a cross-platform GUI (GTK+ via RGtk2 and Glade), it’s possible to use this tool and the data on any platform I want to.
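Filtering and saving with such a logical vector might look like the sketch below. The helper subset_by_mask, the data frame dat, and the file name are made up for illustration, and the code assumes the vector has one entry per original data point. The sculpt3d calls appear only as comments, since they need a live toolbar session; a stand-in mask takes their place so the example runs on its own.

```r
# Hypothetical helper: subset a data frame with a logical vector, assuming
# one TRUE/FALSE entry per row.
subset_by_mask <- function(data, mask) {
  stopifnot(is.logical(mask), length(mask) == nrow(data))
  data[mask, , drop = FALSE]
}

# Hypothetical sample data for illustration.
set.seed(7)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

# In a live session, run these by hand, sculpting in the toolbar in between:
#   library(sculpt3d)
#   sculpt3d(dat$x, dat$y, dat$z)
#   mask <- sculpt3d.selected()   # currently selected points
#   mask <- sculpt3d.current()    # currently visible points
mask <- dat$z > 0                 # stand-in mask so the example runs without the GUI

picked <- subset_by_mask(dat, mask)

# Save the filtered data under a new name and read it back later.
out_file <- file.path(tempdir(), "picked.rds")
saveRDS(picked, out_file)
restored <- readRDS(out_file)
```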
Currently, I only have this tool on a local webserver, and it’s still a bit rough. However, I thought I’d make it available. To try it out, enter the following in R’s console:
install.packages("rgl")
install.packages("RGtk2")
rep <- 'http://ethos.informatics.indiana.edu/~jjdonald/r'
install.packages("sculpt3d", repos = rep)
It’s currently source-only, so if you’re on Windows, you’ll need to install Rtools in order to compile everything. You’ll also need the GTK+ framework and Glade (which should get installed automatically with RGtk2).
However, once that is done, you can check the demo out by entering:
Thanks to Daniel Adler and Duncan Murdoch for rgl, and Michael Lawrence and Duncan Temple Lang for RGtk2. Between the four of them, they’ve produced a lot of great stuff for R.