New t-SNE package for R

February 20, 2010

I finished up a rough version of my t-SNE package for R. If you haven’t heard of t-SNE and are into dimensionality reduction and/or visualization, you should check it out. The “tsne” package for R is available on CRAN.

t-SNE is a non-metric multidimensional scaler. It builds an embedding by gradient descent, much like isoMDS and a number of other approaches. However, it does some interesting things to the high-dimensional data before it proceeds.

First, it translates the matrix of raw numeric values into probabilities. Each probability estimates how likely it is that one datapoint is a neighbor of another. The number of neighbors considered is controlled by a “perplexity” argument, which acts roughly as the effective number of neighbors for each datapoint. Gradient descent then tries to arrange the embedding so that it preserves these probability distributions (formally, t-SNE minimizes the Kullback-Leibler divergence between the high-dimensional and low-dimensional distributions).
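To make the perplexity step concrete, here is a minimal base-R sketch. It is not the code from the tsne package. It assumes the usual t-SNE formulation, in which the neighbor probabilities for one datapoint come from a Gaussian kernel over squared distances and the kernel width is tuned until the perplexity (2 raised to the Shannon entropy) matches the requested value. The function names perplexity_of, neighbor_probs, and fit_row are hypothetical.

```r
# Sketch only: hypothetical helper names, not the tsne package internals.

# Perplexity of a probability vector: 2^(Shannon entropy in bits).
perplexity_of <- function(p) {
  p <- p[p > 0]                      # 0 * log(0) is treated as 0
  2^(-sum(p * log2(p)))
}

# Gaussian neighbor probabilities for one datapoint, given its squared
# distances to all OTHER datapoints and a precision beta = 1 / (2 * sigma^2).
neighbor_probs <- function(d2, beta) {
  w <- exp(-(d2 - min(d2)) * beta)   # shift by min(d2) for numerical stability
  w / sum(w)
}

# Bisection on log(beta) until the perplexity matches the target.
fit_row <- function(d2, target, tol = 1e-5, max_tries = 100) {
  lo <- -20
  hi <- 20
  for (i in seq_len(max_tries)) {
    mid  <- (lo + hi) / 2
    p    <- neighbor_probs(d2, exp(mid))
    diff <- perplexity_of(p) - target
    if (abs(diff) < tol) break
    # Perplexity too high means the kernel is too wide: raise beta.
    if (diff > 0) lo <- mid else hi <- mid
  }
  p
}

X  <- as.matrix(iris[, 1:4])
D2 <- as.matrix(dist(X))^2           # squared Euclidean distances

i <- 1
p <- fit_row(D2[i, -i], target = 30) # probabilities that each other point neighbors point 1
```

The result p sums to one and has a perplexity of about 30, so roughly 30 datapoints carry most of the probability mass for the first flower. A smaller target concentrates the mass on fewer, closer neighbors.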

The technique copes fairly well with large differences in variance, whether across the dimensions or across the datapoints themselves, and it is fairly quick in the implementations that van der Maaten provides. Unfortunately, my R version is much slower.

In the meantime, I’ve been testing it out against various datasets. It does seem to provide a more natural clustering in many cases, such as the classic “iris” flower measurement dataset available in R. I’ve made a simple animation of how the gradient descent process sorts out the clusters and gradually arranges them to minimize its error function:
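A run like this can be sketched as follows. The sketch assumes the tsne() interface as documented for the CRAN package (the k, perplexity, max_iter, and epoch_callback arguments). The colors, the seed, the iteration count, and the draw_epoch helper are illustrative choices, not settings taken from this post, and the call is skipped when the package is not installed.

```r
# Sketch: embed iris with tsne and redraw the layout at each reporting epoch.
# Requires the "tsne" package from CRAN; the call is skipped if it is missing.
X      <- as.matrix(iris[, 1:4])
labels <- iris$Species
colors <- c("red", "darkgreen", "blue")[as.integer(labels)]

# Hypothetical callback: receives the current embedding, one row per flower.
draw_epoch <- function(y) {
  plot(y, col = colors, pch = 19, xlab = "", ylab = "")
}

if (requireNamespace("tsne", quietly = TRUE)) {
  set.seed(1)  # the starting configuration is random
  emb <- tsne::tsne(X, k = 2, perplexity = 30, max_iter = 300,
                    epoch_callback = draw_epoch)
}
```

Each call to draw_epoch is one frame, so saving the plots in sequence gives the raw material for an animation of the descent.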

The t-SNE embedding separates the clusters better than a comparable PCA projection, which is based on simple covariance:

[Figure: PCA of the iris dataset]
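For comparison, a PCA projection of the same data needs only base R. This is a sketch of one reasonable setup (centered, unscaled measurements, so the components come straight from the covariance matrix) and not necessarily the exact settings behind the figure. The plotting colors are illustrative.

```r
# Sketch: PCA of the four iris measurements using base R only.
X   <- as.matrix(iris[, 1:4])
pca <- prcomp(X, center = TRUE, scale. = FALSE)  # unscaled: covariance-based PCA

scores        <- pca$x[, 1:2]                    # first two principal components
var_explained <- pca$sdev^2 / sum(pca$sdev^2)    # share of variance per component

# The component variances are the eigenvalues of the covariance matrix.
cov_eigen <- eigen(cov(X))$values

plot(scores,
     col = c("red", "darkgreen", "blue")[as.integer(iris$Species)],
     pch = 19, xlab = "PC1", ylab = "PC2")
```

PCA is a single linear projection, so it cannot pull apart groups that overlap along the directions of greatest variance. The iterative t-SNE layout has no such restriction.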

I’ve tried it on some other datasets, and it works even better with larger sets of more complex data. I’ll update this post with a link to the package once it goes through the verification process at CRAN.

  1. Mark Grimes

    I am using the R implementation of t-SNE, thank you for writing it. I got it to work once, but now have a recurring problem.

    Namely, the error:
    Error in while (abs(Hdiff) > tol && tries < 50) { :
    missing value where TRUE/FALSE needed

    Using the command with an object of class='dist' produced the same error. To work around that I did this:
    nbsed.df <- data.frame(as.matrix(nbsed))

    Which worked once, but not on the second data set.

    nbsed.tsne <- tsne(…)
    Error in while (abs(Hdiff) > tol && tries < 50) { :
    missing value where TRUE/FALSE needed

    Any attempts to change any of the parameters results in an error. I have been in touch with Laurens. I followed the link on the t-SNE website to post to this website.

    Thanks for your help,

    Mark Grimes
    University of Montana
    Division of Biological Sciences

  2. Justin Donaldson

    Hi Mark, Thanks for trying out the tsne package.

    I have to admit that I’m not quite sure what is happening in your case. But, I noticed that somehow I forgot to push a batch of tsne changes to CRAN some time ago. So, I’ve dusted them off and sent them in. If you check, you should see the package I’ve uploaded called tsne_0.1-2.tar.gz. If it’s not there, then it’s probably already available on CRAN.

    If you continue to have problems, feel free to send me a note at my e-mail address listed in the tsne docs.

    P.S. I’ve also added the code for this package to a repo on github:

  3. Sirus

    Hi Justin,
    Nice package. I want to use it for automatic selection, but I couldn’t figure out how.
    I went through your code, and it seems to assume that the features are already ordered.
    Could you please give me some advice?

  4. Justin Donaldson

    Hi Sirus, I’m not sure what you mean by automatic selection, or ordering. What are you trying to do?
