
sculpt3d

Update: This tool is now available on CRAN.

Researchers in academia and industry consistently use visualizations to better understand their data.  The standard two-dimensional scatter plot is a staple of many exploratory data analyses.  However, there are many cases where two dimensions are not enough.  For instance, in the plot below I’ve set up a pairwise plot between three distributions (x, y, and z).  The plots are duplicated on opposite sides of the diagonal (showing x vs. y …or… y vs. x).  I find it extraordinarily difficult to perceive the three-dimensional structure of this data solely from the series of two-dimensional plots shown here.

[Figure: pairwise scatter plots of x, y, and z]

Luckily, R has a package called rgl that lets me plot in three dimensions using OpenGL.  It even allows for simple interactions with the mouse, like rotation and zooming.

[Figure: interactive rgl 3D scatter plot]

This allows for a much better sense of the underlying data… but it’s still a bit limiting.  Often there is “interesting structure” in the three-dimensional representation that I would like to focus in on, or perhaps there’s just a lot of junk that I want to get rid of.  This structure is usually not isolated in one or two dimensions, and only becomes apparent through rotating and zooming the three-dimensional display.

rgl has a useful function called select3d() for selecting points in the plot, but it’s a bit tricky to use.  Using select3d involves typing into the R console, then clicking and dragging on the plot.  However, the output of select3d is… another function!  You then have to apply this returned function to the original data to determine which points fall inside the selection range… and then it’s completely up to you what to do with them.  Do you filter them out?  Crop to them?  Color them differently than the others?  You end up needing a further supply of helper functions to perform these routines on the output of select3d.
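As a rough sketch of that workflow (the x, y, z variables here are placeholders for whatever you plotted), the select3d dance looks something like this:

library(rgl)
plot3d(x, y, z)                   # plot the three dimensions
f <- select3d()                   # click and drag on the rgl window to select
inside <- f(x, y, z)              # the returned function yields a logical vector
points3d(x[inside], y[inside], z[inside], col = 'red')   # e.g. recolor the selection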

This is still all a bit too much to keep in my head for extended periods of time.  It’s also a pain to constantly switch back and forth between the three-dimensional plot environment and the console in order to pare down the data.

So, I’ve cooked up a little GUI toolbar that makes selecting, labeling, cropping, and deleting points in the rgl plot much simpler.  It’s called “sculpt3d”, since it’s focused on altering and shaping the underlying data.  To launch the toolbar, you first install a small package and then call sculpt3d(x, y, z), where x, y, and z are the three dimensions you are interested in plotting.
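For example, with some made-up data, launching it would look something like this (just a sketch; see the package documentation for the full argument list):

library(sculpt3d)
x <- rnorm(500)                   # three made-up dimensions
y <- rnorm(500)
z <- x + y + rnorm(500)
sculpt3d(x, y, z)                 # opens the rgl plot along with the toolbar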

I put together a small video that shows me playing around with it:

In the video, you can see how I choose a selection color (a mint green that I thought would be noticeable against the bright rainbow-colored points).  Then I crop the data to this selection.  This is a bit jarring because the crop function automatically zooms in on the selected points and reverts them to their original color.  The delete function works in a similar way, and is a little easier to follow.

Once I have a smaller collection of data points, I might be interested in labeling them (assuming I passed labels as an argument to sculpt3d).  I can toggle the labels with a click of the “Label” button.

If I’m interested in saving the results of the pruning and cropping, I can access the currently selected points by calling sculpt3d.selected(), or the currently visible scene with sculpt3d.current().  These return the same sort of logical vector that select3d does, so I can use it to filter my dataset, save it under a new name, and come back to it later on.  Furthermore, since it’s in R and uses a cross-platform GUI (GTK+ via RGtk2 and Glade), it’s possible to use this tool and the data on any platform I want.
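A minimal sketch of that round trip (reusing the x, y, and z from before):

keep <- sculpt3d.current()                    # logical vector for the visible scene
pruned <- data.frame(x = x, y = y, z = z)[keep, ]
save(pruned, file = 'pruned.RData')           # save the filtered data for later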

Currently, I only have this tool on a local webserver, and it’s still a bit rough.  However, I thought I’d make it available.  To try it out, enter the following in R’s console:

install.packages("rgl")
install.packages("RGtk2")
rep <- 'http://ethos.informatics.indiana.edu/~jjdonald/r'
install.packages("sculpt3d", repos = rep)

It’s currently source-only, so if you’re on Windows you’ll need to install Rtools in order to compile everything.  You’ll also need the GTK+ framework and Glade (which should get installed automatically with RGtk2).

However, once that is done, you can check the demo out by entering:

library(sculpt3d)
demo(sculpt3d)

Thanks to Daniel Adler and Duncan Murdoch for rgl, and Michael Lawrence and Duncan Temple Lang for RGtk2.  Between the four of them, they’ve produced a lot of great stuff for R.

HaXe Demonstration

I recently gave a presentation on haXe at the Strands offices in Corvallis and Seattle.  I think it raised a few eyebrows, but probably ended up raising more questions than it answered.  This was to be expected, as haXe is pretty unique in its ability to target multiple web-related platforms.  The devil is in the details, so I thought it was best to post everything here on the blog so that interested individuals can go over the specifics in more detail.

The demo that I showed actually had two parts: a simple “hello world” trace example, which I compile to swf, php, and javascript; then a more complex example that compiles some code to each of those targets and uses conditional compilation to build a web page that displays output from every target at once (the PHP target references the javascript source, and a javascript method references the swf file).
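To give a flavor of the conditional compilation side of that (this is only a sketch, not the actual demo source), a single class can trace something different depending on which target it is compiled to:

// Hello.hx (hypothetical file, for illustration only)
class Hello {
    public static function main():Void {
        #if js
        trace('hello from the javascript target');
        #elseif php
        trace('hello from the php target');
        #elseif flash
        trace('hello from the swf target');
        #end
    }
}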

This, to me, is perhaps the most satisfying aspect of haXe: the ability to treat each target as a minor variation of an API, rather than as a set of totally unrelated platforms.  Developing, targeting, and displaying results simultaneously is a strange feeling.  All of a sudden, javascript, php, and swf are operating off of the same script, and you are totally in control.

You can download the source as an archive here, or browse the source here.  Once you compile the Demo2 file, you should see something like this.

There’s a README in the source that’ll explain what’s going on.

If you’re totally new to haXe, you’ll want to download the compiler from the haxe.org download page.  After installing, you can run the build1.hxml through build4.hxml examples by simply typing:

haxe build1.hxml

for each of the numbered build files.  For the non-numbered build.hxml, you will need to edit it so that it outputs to a working web folder.  If you use TextMate, you can download the TextMate bundle for haXe and use the editor to browse and build the files.
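For reference, an hxml file is just a list of compiler arguments, with --next separating independent builds.  A sketch of a multi-target build (the class name and output paths here are placeholders, not the ones in the archive) might look like:

# compile the same entry class to each target
-main Hello
-js www/hello.js

--next
-main Hello
-swf www/hello.swf

--next
-main Hello
-php www/php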

Injecting Methods into HaXe with "using"

Nicolas just added a new keyword to the haXe compiler called “using”.  In short, this keyword lets you inject additional methods into instances of types.

“using” explained

The haXe “using” keyword is a special import command for classes.  The command imports the given class normally.  However, it has special behavior if the imported class has any static functions that accept an argument.

The “using” keyword takes any static functions defined in the class, and adds those static methods to the argument type, as if the argument type defined the method itself.  In this sense, it behaves like a sort of implicit mixin, or trait.  However, it’s also very different from these approaches, and to the best of my knowledge, nothing like this exists in any other language.

Explaining what’s happening in simple prose is actually a bit cumbersome, so here’s some example code.  First, we’ll give a simple haXe example without “using”.  The first class is a Demo class that contains the main function:

// in Demo.hx
class Demo {
    public static function main():Void {
        trace(BarStringClass.addBar('foo'));
    }
}

The second is a class that contains additional methods:

// in BarStringClass.hx
class BarStringClass {
    public static function addBar(str:String):String {
        return str + ' bar';
    }
}

If we compile this and run main() from Demo, we’ll get “foo bar”.  We’re simply using the static function from BarStringClass to process the string and then tracing the result.  Pretty simple.

Now let’s look at what we can do to Demo.hx with “using”.  We’ll use the same BarStringClass from before, but we’ll change Demo slightly:

// in Demo.hx
using BarStringClass; // here's the 'using' keyword in use

class Demo {
    public static function main():Void {
        trace('foo'.addBar());
    }
}

haXe now lets us access BarStringClass static functions as if they were functions of the String class.  In this fashion, the standard base classes can be augmented with additional methods without the need to extend them explicitly.

Notice how in this example it wasn’t even necessary to specify the “str:String” argument required by addBar().  When the haXe compiler injects the methods from BarStringClass, it automatically fills in the current instance as the first argument, turning BarStringClass.addBar('foo') into 'foo'.addBar().  All of the first-argument types of the “using” classes are augmented in this way.

So in this case, BarStringClass had one static function (addBar), and its first argument type was String; therefore every String instance now gets the addBar() function.  You could have many different static functions in a “using” class, and each first-argument type picks up its corresponding methods in the same way, as the sketch below illustrates.
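To make that concrete, here’s a sketch with a hypothetical class whose static functions take different first-argument types; after “using” it, each type picks up its own extra methods:

// in Extensions.hx (a hypothetical class, just for illustration)
class Extensions {
    public static function addBar(str:String):String {
        return str + ' bar';
    }
    public static function double(i:Int):Int {
        return i * 2;
    }
}

// in Demo.hx
using Extensions;

class Demo {
    public static function main():Void {
        var n = 21;
        trace('foo'.addBar());   // String instances get addBar()
        trace(n.double());       // Int instances get double()
    }
}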

“using” with types and/or classes

The other nice thing about this approach is that it works with the more broadly defined types/typedefs, and not just classes.  So my IterTools functions, which work (mainly) with Iterable types, can be injected with “using”, and then you can do things like:

for (i in [1, 2, 3, 4].cycle(5)) {
    trace(i);
}

…to iterate through the numbers 1 through 4 five times. “cycle()” and the rest of the functions that operate on Iterables would also then work with Lists, FastLists, Hashes, and so forth.  When you add “using IterTools;” at the top of a class file, anything that can iterate can now “cycle()”.
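Here’s a slightly fuller sketch (cycle()’s exact signature is assumed from the example above) showing the same call working on different Iterables:

using IterTools;

class CycleDemo {
    public static function main():Void {
        // works on Arrays...
        for (i in [1, 2, 3, 4].cycle(5)) {
            trace(i);
        }
        // ...and on anything else that's Iterable, such as a List
        var lst = new List<Int>();
        lst.add(10);
        lst.add(20);
        for (i in lst.cycle(2)) {
            trace(i);
        }
    }
}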

Caveats

There are two significant caveats with “using”.  The first is that the injected methods are not available through reflection; they’re not part of the run-time class instance.  The second is that it’s easy for a single method name to collide when multiple “using” classes define it.  Currently, haXe’s behavior is to use only the first matching static method from a “using” class.

“using” can be habit forming

Some of the positives to the “using” keyword approach are:

  1. You can extend classes from outside the code that defines them, seamlessly adding functionality to classes and typedefs.
  2. Anything you can do through the “using” keyword can be done with plain static classes, so there is very little ‘magic’ going on… just some clever rewriting of function calls.
  3. There are a ton of static utility classes out there (in addition to my own) that already contain a lot of good, practical functionality.  You can easily reuse these as “using” classes, or write your own that handle specific needs.
  4. Method completion works with --display.  So once you include the “using” keyword with an appropriate class, you can instantly see the extra available methods by triggering a code-completion request inside your editor (assuming the editor does --display code completion).

I’m assuming “using” will be added in the next version of haXe (2.04).  Until then, you can play around with it by downloading and building from the CVS sources.

7/26 Update: “using” is now officially part of haXe (2.04).

Cognitive Phase Points in Social Network Sizes

Social networking has been a huge buzzword for the last few years.  Companies have sprung up around helping people keep in touch with friends, family, colleagues, and potential colleagues in a semi-public, semi-permanent format.  In typical “buzzword” fashion, social networking has been mercilessly exploited and shoehorned into dozens of contrived contexts.  Examples range from Walmart trying to set up a social network for teens… who perhaps didn’t want to associate with the Walmart aesthetic… to iYomu, which tried to cater to an older demographic… typically the slowest population to adopt new technology, particularly new communication technology.

In addition to assuming that social networks can be applied to nearly any business endeavor, there was also the sense that a greater utilization of these networks would somehow improve some sort of social awareness, and by corollary, provide the participating users with greater ‘access’ to individuals in otherwise inaccessible or distant locations.

To a certain degree, this is true.  People can keep tabs on distant friends, family, and colleagues in a very informal fashion using facebook and twitter, but there have to be limits to how many meaningful social connections an individual can maintain at even a minimum level of awareness…

The general sense seems to be that “more friends = better”, and entire careers seem to have been built on exploiting social networks for self-promotion.  However, turning your personal social network into a competitive ‘numbers game’ seems… well… wrong.  It turns out there are pretty well-defined averages for how many meaningful relationships an individual can maintain.  Basically, there are three thresholds for social connections.

  1. 5 individuals = a number influenced by our working short-term memory.  This seems to be the maximum number of individuals you can become ‘intimately familiar’ with.  You might be able to predict how these individuals would respond to given situations, or finish their sentences for them.
  2. 15 individuals = the maximum number of individuals you can form a “deep trust” with.  These are individuals that you feel completely relaxed and unguarded around.
  3. 150 individuals = the maximum number of individuals you can “enumerate” and identify with in a meaningful fashion.

The 150 number is a far cry from the thousands of friends on facebook/myspace that many people seem driven to acquire.  In fact, the very act of ‘friending’ someone on facebook may have undermined the meaning of the word ‘friend’.  Perhaps all facebook/myspace needs is clearer semantics behind the connections its users form, but it does feel to me at times that the notion of an online ‘friend’ has become garbled… and it seems that limiting the number of connections you can make might be a start towards making them more useful.

Designing for social networks using these real-world social/cognitive limits might lead to some different interaction arrangements.  I had thought of the following:

  • For my “select 5”, I would give them the opportunity to directly connect with/interrupt me at almost any time.  Richer communication channels (video/voice) would be enabled by default.
  • For my “trusted 15”, I would grant the opportunity to respond directly on any blog postings/etc. without explicit editorial permission.  I would publish personal status messages and inside jokes through a limited public interface.
  • For my “group of 150”, I would grant permission to see pictures of me and my other friends (I think face recognition technology would be great for automatically tagging and filtering photos so that I wouldn’t have to).  I would publish less personal/professional messages that they would be able to see.
  • For any individuals above that number, they would be able to see non-personally-identifiable pictures, blog posts, and professional messages.

In general, I think that as social networks mature, ‘less’ will mean more.

The Dissertation Marathon

I’m getting close (hopefully) to submitting my dissertation proposal (on music corpus visualization, natch).  This has been a goal of mine for a long time, and I feel like my enthusiasm alone has carried me up to this point.  So maybe I should feel lucky that only recently have I started to feel very run down and disenchanted with the process of writing/editing/reviewing for 4-6 hours every day.

Anyways, I’ve started doing some things that I think help greatly, and since many of my friends/colleagues are going through this same process and have mentioned the same problems, I thought I’d share some of what I’ve started doing that seems to help.  I’ll give them here in order of importance:

  1. Mood:  It’s really easy to get into weird mood swings, particularly if you’re like me and are no longer on campus, without a bunch of others around to keep you in check.  If you have serious emotional problems, then this advice is not for you.  However, I think most people will go through some sort of strange mood while writing; it’s just part of the process.  What has helped me a lot in this area has been a simple amino acid called L-methionine.  This is a biochemical precursor to SAM-e, a more expensive natural compound associated with joint health and mood.  I take L-methionine and have noticed a marked improvement in mood with no adverse side effects.  I realize, of course, that all of this is unproven according to the FDA, and some people consider it useless.  However, it is perfectly safe.  YMMV.
  2. Sleep: I’ve always had problems sleeping, and it gets worse when I travel.  Some of my colleagues suggested melatonin, which they use to get over jet lag.  I have to give it my endorsement as well.  I’ve tried prescription-strength sleep aids, but this is far better.  No side effects, non-habit-forming, and it doesn’t even make me dangerously drowsy.  I just feel ready to go to sleep, and I sleep better than I do without it.
  3. Exercise:  It’s remarkable to me how exercise can change my mental state.  It only takes about 30 min. of jogging, etc. before I have a new perspective on things.  It’s also my “canary in the coal mine” for my weekly routine.  If I’m not having a good workout, it usually means I’m not getting enough sleep, or not eating right, etc.  Then it’s time for a change.
  4. Diet:  Some people are able to control all of these factors with diet and exercise.  More power to them.  I like more of  a variety, so I tend to eat whatever I want (within reason).  I’ve got nothing really profound here:  the wrong kinds of food can throw everything off kilter, so it’s good to be aware of how they affect you.  Generally, I eat first, and ask questions later.  It doesn’t get me into too much trouble.

Anyways, these are just tips that seem to work for me, and they aren’t supposed to be a guideline.  I also don’t consider myself one of the “all-natural” crowd, nor do I endorse any sort of chemical stimulant/barbiturate (everything I’ve referenced here has weaker side effects than caffeine or Tylenol PM).  Writing the dissertation is shaping up to be a long, drawn-out affair, and I’m very aware of the percentage of people who end up A.B.D. (all but dissertation), which may be as high as 50% in some areas.  I’m just going to use every tool I have available, and I wanted to share them as well.

Tutorial: Music Visualization for Discovery

[Figure: visualizing music tutorial]

Paul Lamere and I recently got our tutorial accepted to ISMIR 2009.  It is titled Using Music Visualizations for Discovery and will be a survey of all the different ways that researchers have approached visualizing music collections, and how we can use these methods to find new and interesting music.

At first glance, this perhaps appears to be a bit of a ‘fringe’ topic.  Visualizations are often seen as more of an art form than a straightforward research tool.  However, I believe that music is a special form of information.  On one hand it behaves according to a set of physical rules, following certain spectral behaviors as an acoustic signal.  On the other hand, it behaves as a cultural icon, following another set of rules in a network of contextual associations and systems of meaning.  On yet another hand, it is understood in a musicological system of compositional symbols and structures.  To study and understand only one aspect of music is to have an incomplete grasp of the role it plays in the other systems.  You could spend an incredible amount of time understanding how musical information is related in each context.  Perhaps this is why music information retrieval projects spanning different music metadata types are somewhat rare.

Paul and I are believers that visualizations can be effective “big picture” overviews of music corpus information in many different contexts, and can be a great first step towards understanding how each form of music metadata reflects an important way of understanding music.  We hope to use our combined expertise to produce a useful overview of the current state of the art in music corpus visualization, as well as provide some useful open source tools for researchers to investigate their own musical information visually.

We’ve already been gathering a lot of existing relevant projects, and have tagged them all on del.icio.us.  There are a lot of great projects on there, and it’s worth a look.  If you think we’ve missed something, tag the respective website with “MusicVizIsmir2009” and we’ll check it out and add it to our collection.

Paul has some information available on his blog… but just don’t believe everything he says about me; I’m the understudy in this group :)

Misconceptions of dividend producing stock

(Self Disclosure:  My father is an investment manager whose portfolios and strategies include dividend producing companies.)

This is way off topic for me, but I’ve been really bothered by all of the anti-wealth sentiment that seems to be growing in the US… the idea that people with a lot of money are somehow the cause of the economic collapse, or that they are hurting the economy by continuing to take their pre-collapse salaries.  This sentiment is routinely expressed (of course) towards the CEOs of big firms, but also now towards college basketball coaches.

Apparently, limiting the salaries of corporate/organization heads is not enough, and so right now there is growing sentiment that dividends should be taxed higher, since many heads of companies invest in these stocks to generate income, and by doing so avoid higher tax rates on salaries.

The Christian Science Monitor article advancing this argument goes on to assert that investment in dividend-producing stocks led to greater speculation, which in turn led to the current market state.

One example of “speculative dividend investment gone awry” is the Madoff funds, where individuals and companies bought into his Ponzi scheme and ended up paying taxes on dividends they never received.  However, this had nothing to do with the real culprit of the crash, the mortgage-backed security.  The folks that invested with Madoff were duped and robbed; they have nobody to blame but themselves for getting hoodwinked by one of the oldest investment cons in the book.  It doesn’t mean the underlying dividend system is flawed, and it doesn’t mean the dividend system contributed to the current market situation.

The reason dividend income is taxed at a lower rate is that it has already been taxed once: the companies that offer dividends have already paid taxes on their earnings.  Companies (not funds) that offer dividends are typically older, more stable companies with limited opportunity for growth.  By offering dividends, they provide an incentive to invest in them over another faster-growing but less established company.

The Christian Science Monitor article is wrong on several points.  Investing in dividend-producing companies is almost always less speculative than investing in non-dividend-producing companies.  You are typically investing in companies that produce positive cash flow and that have been around for a while.  This is far safer than investing in other types of companies/funds.  This method of investment is also not the sole domain of the super-rich.  Anyone can do it, and it remains a powerful tool for people looking to make safer investments in the market.

The government meddles with investment strategies at its own peril.  It encouraged us to buy homes we couldn’t afford, and it directly brought about arrangements for loan agencies to buy and sell bad debt.  The last thing it should be doing now is discouraging us from investing in strong, stable companies.

It is my belief that we are going to get out of this bad market through “back to basics” economic policies and investment strategies, not by eating the rich.
