New t-SNE package for R
I finished up a rough version of my t-SNE package for R. If you haven’t heard of t-SNE and are into dimensionality reduction and/or visualization, you should check it out. The “tsne” package for R is available on CRAN.
t-SNE is a nonlinear dimensionality reduction technique in the same family as non-metric multidimensional scaling. It creates an embedding by gradient descent, much like isoMDS and a number of other approaches. However, it does some interesting things to the high-dimensional data before it proceeds.
First, it translates the matrix of raw numeric values into probabilities. Each probability estimates how likely it is that another given datapoint is a neighbor. The number of neighbors considered is controlled by a “perplexity” argument, which is roughly the effective number of neighbors for a given datapoint. Gradient descent then arranges the low-dimensional points so that their pairwise probabilities match the high-dimensional ones as closely as possible, by minimizing the Kullback-Leibler divergence between the two distributions.
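To make the perplexity step concrete, here is a minimal sketch in base R of how one datapoint's distances can be turned into neighbor probabilities. It is not the package's actual code: the function names `row_probs` and `probs_for_perplexity`, the use of base-2 entropy, and the tolerance and iteration limit are all assumptions chosen for illustration. The idea is to search for a Gaussian width whose probability distribution has the requested perplexity.

```r
# Conditional probabilities and Shannon entropy (in bits) for one datapoint,
# given its squared distances to the other points and a Gaussian precision beta.
# (Hypothetical helper, not part of the tsne package.)
row_probs <- function(sq_dists, beta) {
  p <- exp(-(sq_dists - min(sq_dists)) * beta)  # shift for numerical stability
  p <- p / sum(p)
  h <- -sum(p[p > 0] * log2(p[p > 0]))
  list(p = p, entropy = h)
}

# Bisection on beta so that 2^entropy matches the target perplexity.
probs_for_perplexity <- function(sq_dists, perplexity,
                                 tol = 1e-5, max_tries = 50) {
  target <- log2(perplexity)
  beta <- 1
  lo <- 0
  hi <- Inf
  for (i in seq_len(max_tries)) {
    r <- row_probs(sq_dists, beta)
    h_diff <- r$entropy - target
    if (abs(h_diff) < tol) break
    if (h_diff > 0) {
      # Too many effective neighbors: narrow the Gaussian (raise beta).
      lo <- beta
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {
      # Too few effective neighbors: widen the Gaussian (lower beta).
      hi <- beta
      beta <- (beta + lo) / 2
    }
  }
  r$p
}

# Example: neighbor probabilities for the first iris flower.
x <- as.matrix(iris[, 1:4])
d2 <- as.matrix(dist(x))^2
i <- 1
p <- probs_for_perplexity(d2[i, -i], perplexity = 30)
effective_neighbors <- 2^(-sum(p[p > 0] * log2(p[p > 0])))
```

After the search converges, `effective_neighbors` should land very close to the requested perplexity, which is why the argument can be read as a soft neighbor count.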
The technique copes fairly well with large differences in variance, whether across dimensions or across the datapoints themselves, and it is fairly quick in the implementations that van der Maaten provides. My R version, unfortunately, is much slower.
In the meantime, I’ve been testing it out against various datasets. It does seem to provide a more natural clustering in many cases, such as the classic “iris” flower measurement dataset available in R. I’ve made a simple animation of how the gradient descent process sorts out the clusters and gradually arranges them to minimize its error function:
This separates the clusters better than a comparable PCA projection, which is based on simple covariance:
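For anyone who wants to reproduce the PCA side of this comparison, a projection along these lines can be sketched with base R's `prcomp`. The choice of an unscaled (covariance-based) PCA and the plotting details are my assumptions here, not necessarily what produced the original figure.

```r
# PCA of the four iris measurements via the covariance matrix (no scaling).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]  # coordinates on the first two principal components
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Color each flower by species.
colors <- rainbow(nlevels(iris$Species))
plot(scores, col = colors[iris$Species], pch = 19,
     xlab = "PC1", ylab = "PC2", main = "iris: PCA projection")
legend("topright", legend = levels(iris$Species), col = colors, pch = 19)
```

The `var_explained` vector shows how much of the total variance each component carries, which is a quick way to see how much of the data's structure a two-dimensional PCA plot keeps.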

I’ve tried it on some other datasets, and it works even better with larger sets of more complex data. I’ll update this post with a link to the package once it goes through the verification process at CRAN.
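For the t-SNE side of the iris experiment, a call might look roughly like the sketch below. It assumes the `tsne()` signature as I understand it from the CRAN documentation (`k`, `perplexity`, `max_iter`, `epoch`, and `epoch_callback` arguments); treat the argument names and the values chosen here as assumptions and check `?tsne` for the version you have installed. The call is guarded so the snippet does nothing if the package is missing.

```r
embedding <- NULL

if (requireNamespace("tsne", quietly = TRUE)) {
  x <- as.matrix(iris[, 1:4])
  colors <- rainbow(nlevels(iris$Species))

  # Redraw the current embedding every `epoch` iterations, which gives a
  # crude animation of the gradient descent (assumed callback behavior).
  draw_epoch <- function(y) {
    plot(y, col = colors[iris$Species], pch = 19,
         xlab = "", ylab = "", main = "iris: t-SNE embedding")
  }

  # Illustrative settings only; a small max_iter keeps the run short.
  embedding <- tsne::tsne(x, k = 2, perplexity = 30, max_iter = 200,
                          epoch = 50, epoch_callback = draw_epoch)
}
```

If the call succeeds, `embedding` should be a matrix with one row per flower and `k` columns.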





Justin
I am using the R implementation of t-SNE. Thank you for writing it. I got it to work once, but now I have a recurring problem.
Namely, the error:
Error in while (abs(Hdiff) > tol && tries < 50) { :
missing value where TRUE/FALSE needed
Using the command with an object of class='dist' produced the same error. To work around that I did this:
nbsed.df <- data.frame(as.matrix(nbsed))
Which worked once, but not on the second data set.
nbsed.tsne <- …
Error in while (abs(Hdiff) > tol && tries < 50) { :
missing value where TRUE/FALSE needed
Any attempt to change any of the parameters results in an error. I have been in touch with Laurens, and I followed the link on the t-SNE website to post to this website.
Thanks for your help,
Mark Grimes
University of Montana
Division of Biological Sciences
Hi Mark, Thanks for trying out the tsne package.
I have to admit that I’m not quite sure what is happening in your case. But, I noticed that somehow I forgot to push a batch of tsne changes to CRAN some time ago. So, I’ve dusted them off and sent them in. If you check ftp://cran.r-project.org/incoming, you should see the package I’ve uploaded called tsne_0.1-2.tar.gz. If it’s not there, then it’s probably already available on CRAN.
If you continue to have problems, feel free to send me a note at my e-mail address listed in the tsne docs.
P.S. I’ve also added the code for this package to a repo on github: https://github.com/jdonaldson/rtsne